Computer Association of SIUE - Forums

CAOS Forums => News and Commentary => Topic started by: Jerry on 2009-07-06T16:57:28-05:00 (Monday)

Title: Deleting Spam
Post by: Jerry on 2009-07-06T16:57:28-05:00 (Monday)

Ok, so the CAPTCHA modules for CAOS have been updated; however, there is still some spam making it onto the forum. Greg & Jeff think that this is manually entered spam, which means there is no automatic way to keep it from showing up.

Jeff, Greg, and I get notices for new accounts, and we delete suspicious ones. We also delete any associated posts.

However, an occasional spam post gets left up for a little while. We will delete it eventually.

Here's my question: If I delete a spam post, should I delete a post by a regular CAOS member who posts a message saying the last post is a spam post?

Title: Re: Deleting Spam
Post by: Tangent Orchard on 2009-07-06T19:44:22-05:00 (Monday)
Quote from: Jerry
Here's my question: If I delete a spam post, should I delete a post by a regular CAOS member who posts a message saying the last post is a spam post?
Please? ^_^ The last few times we've had a spam post that was deleted, I came in after the fact and only saw the CAOS member's post, which seems to be about the same off-topic-wise, in all honesty.
Title: Re: Deleting Spam
Post by: raptor on 2009-07-07T04:13:49-05:00 (Tuesday)
Yes, when I delete spam I delete the post calling it spam as well.  I try to check the forum at least once a day and keep an eye on things, especially when I receive an email telling me of a new member joining (almost daily).  The other officers have been informed to check, but may not be keeping as close an eye on things.
Title: Re: Deleting Spam
Post by: arcdrag on 2009-07-07T08:51:11-05:00 (Tuesday)
Quote from: SomethingFunny on 2009-07-07T04:41:27-05:00 (Tuesday)
Yep,it's very hard to control it.Anyway,spam is 1 part of the internet.We gotta accept it.

Coming from someone making their first post with 4 websites in their sig...I must say your name suits you. 

I'm just really curious how bad one's life must be in order to reach the point where you think that the best way to make money is by manually spamming random web forums. 
Title: Re: Deleting Spam
Post by: Shaun Martin on 2009-07-07T13:07:32-05:00 (Tuesday)
Wow, just wow.  We've got spam bots with attitude on this forum!
Title: Re: Deleting Spam
Post by: raptor on 2009-07-09T04:22:37-05:00 (Thursday)
I just went through and removed these 'spam-esque' users who are from Vietnam and Pakistan.

Scott
Title: Re: Deleting Spam
Post by: raptor on 2009-07-15T12:54:32-05:00 (Wednesday)
So a conversation arose on a thread that was hijacked by a spammer.  Essentially the problem is this:

Users, be they bots or actual people, are creating accounts and making related yet irrelevant posts on old threads.  They also use some horrible yet amusing broken English.  These users happen to have spam ads galore in their sigs.  This now seems to be happening on a daily basis.

I have opened the floor (and bribed with a CS @ SIUE shirt) to anyone who has a solution to the problem.  Currently it seems as though most of these spammers are from distant foreign countries.  I considered doing an IP check upon account creation, but that could prohibit people with a valid reason from being members, and could be easily avoided with a proxy server.

Greg has suggested the following:

"How about a short English grammar exam on the registration page.  Laughing

There is a small possibility that some of them are actually bots.  There was a lot of news recently about bots getting better at breaking captchas.  They were targeting Google at one point and generating massive numbers of spam email accounts.  They go for whatever they can that will give them the most bang for their buck so to speak.  Because SMF is so popular, there may be a group out there that specializes in writing bots that can beat SMF's captcha.  If this is the case, all we need to do is make a trivial change to our captcha so that it is different enough from the standard that the specialized bots will not be able to figure it out.  What I really want to do is change the fonts that are used.  If you want to help, see if you can find some free, heavily styled font and convert it to the "gdf" type such that I can put it in CAOS's fonts directory (I'll need the individual letter gifs as well).  If you do convert some obscure font, email it to me (I don't want some would-be hacker to download the zip from our site and add it to their arsenal). "


If anyone would like to do this, please feel free.  I am also open to other thoughts and ideas.  Mark Sands suggested some sort of test that someone familiar with the department would pass.  Again, I am concerned this could keep out our legitimate users.

The only option I've thought of so far that would work for sure, yet isn't very nice to users, would be requiring administrator/moderator approval before a new account is created.  I considered maybe an audio-only captcha, but that won't work for people using machines without speakers.

Scott
Title: Re: Deleting Spam
Post by: Gregory Bartholomew on 2009-07-15T13:12:06-05:00 (Wednesday)
I've just disabled the audio option on the registration page as it occurred to me that the audio is probably easier for a computer to crack these days than the captcha.  Let me know if this is unacceptable.  As for the grammar test, I was thinking of something along the lines of:

Which of the following is proper English:

A. I was to tired to post yesterday.
B. They were happy, but there mouths didn't show it.
C. One plus one equals three.

We would need to build some sort of database of such questions to pull randomly from, and having a higher ratio of incorrect to correct answers would be good too.

gb
Title: Re: Deleting Spam
Post by: Gregory Bartholomew on 2009-07-15T13:59:43-05:00 (Wednesday)
I just had a thought.  To cut down on the number of sentences that we would have to contrive, could someone instead write a program that would generate random, simple sentences but substitute the more common grammatical errors?  This way, we would only need a few lists of nouns, verbs, adjectives and the like.  It wouldn't matter that the sentences didn't make any sense, only that the grammar was incorrect.  The program would need to substitute common grammatical errors such as "your" in place of "you're" - things like what are listed here: http://news.zdnet.co.uk/itmanagement/0,1000000308,39273376,00.htm.  I think we should still go with a database of correct statements though, just to be sure.  I would also make it a rule that the "correct" sentences should not be logically correct or sensible to be sure that they blend in with what the computer is generating.
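
The generator described in this post could be sketched roughly as follows. This is only an illustration in Python, not anything that ran on CAOS: the word lists, the templates, the `SWAPS` table, and the function names are all made up for the example, and a real quiz would need far larger lists. Each template marks one confusable word in square brackets; a "wrong" sentence gets the common mistaken substitute, and a "right" one keeps the correct word. As suggested above, the sentences do not need to make sense.

```python
import random
import re

# Hypothetical word lists (nouns chosen so that adding "s" pluralizes them).
NOUNS = ["cat", "server", "professor", "robot"]
ADJECTIVES = ["tired", "purple", "loud", "hungry"]
VERBS = ["paint", "reboot", "follow", "compile"]

# The bracketed word is the one that may be replaced by a common error.
TEMPLATES = [
    "The {noun}s lost [their] {noun2} yesterday.",
    "I was [too] {adj} to {verb} the {noun}.",
    "I think [you're] going to {verb} the {noun}.",
    "The {adj} {noun} chased [its] own {noun2}.",
]

# Correct word -> commonly confused wrong word.
SWAPS = {"their": "there", "too": "to", "you're": "your", "its": "it's"}


def make_sentence(rng, correct):
    """Build one random sentence, grammatically right or wrong."""
    text = rng.choice(TEMPLATES).format(
        noun=rng.choice(NOUNS),
        noun2=rng.choice(NOUNS),
        adj=rng.choice(ADJECTIVES),
        verb=rng.choice(VERBS),
    )

    def fill(match):
        word = match.group(1)
        return word if correct else SWAPS[word]

    # Replace the bracketed slot with the right or the wrong word.
    return re.sub(r"\[([^\]]+)\]", fill, text)


def make_question(rng, wrong_count=3):
    """Return (options, index_of_correct_option) with one correct sentence."""
    answer = make_sentence(rng, correct=True)
    options = [make_sentence(rng, correct=False) for _ in range(wrong_count)]
    options.append(answer)
    rng.shuffle(options)
    return options, options.index(answer)


if __name__ == "__main__":
    options, answer_index = make_question(random.Random())
    for letter, sentence in zip("ABCD", options):
        print(f"{letter}. {sentence}")
```

Generating several wrong sentences per right one keeps the higher ratio of incorrect to correct answers mentioned earlier in the thread.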
Title: Re: Deleting Spam
Post by: raptor on 2009-07-15T18:38:06-05:00 (Wednesday)
To be honest, I like this idea.  We would make them answer like two or three of them.  LOL example B is subtle, switching there and their.  I'm not sure everyone would grab that right away. 

Don't worry about audio disabled.  I was simply throwing out thoughts.
Title: Re: Deleting Spam
Post by: Robert Kennedy on 2009-07-15T19:29:57-05:00 (Wednesday)
I'm just going to throw out a couple of low-tech ideas that I've seen other forums use.

1.  Disable signatures/links in a user's posts until a user has x amount of posts
2.  Only automatically approve users that have a semi-local IP address.  All other users will require an admin approval.
3.  Automatically lock threads older than x years. 
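
Ideas 1 and 3 boil down to two small checks, sketched below in Python purely as an illustration. This is not SMF code, and the thresholds (the "x" values) and function names are assumptions picked for the example.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds standing in for the "x" values above.
MIN_POSTS_FOR_SIGNATURE = 10
MAX_THREAD_AGE = timedelta(days=2 * 365)


def can_show_signature(post_count):
    """Idea 1: hide signatures and links until a user has enough posts."""
    return post_count >= MIN_POSTS_FOR_SIGNATURE


def is_thread_locked(last_post_at, now):
    """Idea 3: treat a thread as locked once its last post is too old."""
    return now - last_post_at > MAX_THREAD_AGE


if __name__ == "__main__":
    now = datetime(2009, 7, 15)
    print(can_show_signature(2))                       # new account
    print(is_thread_locked(datetime(2006, 1, 1), now)) # old thread
```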
Title: Re: Deleting Spam
Post by: Tangent Orchard on 2009-07-15T20:05:33-05:00 (Wednesday)
Just as a tangent from the original thread this came up in; I'm also not entirely certain these are actual bots.  I've been going to a couple of different forums for several years now (long enough to get acquainted with the process) and we've had maybe two people that actually posted something that wasn't a deluge of e-commerce links.  (But even then, they were posting a news article word-for-word.)  These new spam-users look to be more like humans than any bot I've ever seen, which would make this task difficult. =|

As for the post above, I partially disagree with #2, although the admin-approval thing is nice if someone looks at it daily.  I know of several students that go back and forth (sort of) from here to India, for example.  We wouldn't want to block those by accident.
Title: Re: Deleting Spam
Post by: William Grim on 2009-07-15T22:43:44-05:00 (Wednesday)
Before jumping to a solution to a problem that may not exist and wasting a lot of someone's time, how do we know that CAPTCHA has been defeated?  From what I understand, it was actual people being paid low wages to beat Google's CAPTCHA, not an algorithm.  Though, look up the historical meaning of "computer" to see the irony about saying computers themselves weren't breaking the CAPTCHA; wikipedia can help with this.

How about we start simply and record the estimated number of spam accounts created per day.  Next, make a very slight and simple change to CAPTCHA and see if the new number of spam accounts goes down.  If it does, then the problem is CAPTCHA, and you need to improve it; if not, then the problem is most likely an exploit in SMF itself, and you should start searching there.
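
A minimal sketch of that before/after comparison, in Python. The daily counts here are invented placeholders, not data from the CAOS forum, and the function name is an assumption; the point is only to show the arithmetic.

```python
from statistics import mean


def relative_change(before, after):
    """Fractional change in the average daily count of spam accounts.

    A negative result means fewer spam accounts per day after the change.
    """
    baseline = mean(before)
    if baseline == 0:
        raise ValueError("no spam recorded before the change; nothing to compare")
    return (mean(after) - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical counts of spam accounts created per day.
    before_change = [4, 6, 5, 5]
    after_change = [2, 3, 2, 3]
    print(f"{relative_change(before_change, after_change):+.0%}")
```

With only a handful of days on either side, a drop like this is suggestive at best, so the counts would need to be collected for a while before drawing a conclusion.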

Secondly, if it is CAPTCHA that's the problem, the most awesome CAPTCHA system ever would be playing a simple game from a random set of simple games.  You'd have to win in order to pass.

UPDATE: I just saw another spam attempt.  Greg, I think a scan of the Apache logs may reveal something useful to you, but I'm only guessing.  Also, I'm too lazy right now, but perhaps you could use Fiddler for IE (or the Mozilla/Safari equivalent) to see if any odd websites are being loaded when you look at CAOS, which would indicate the site has been compromised.
Title: Re: Deleting Spam
Post by: William Grim on 2009-07-15T22:48:16-05:00 (Wednesday)
Also, just for the record, I have to disagree with #2 completely.  I would have to stop coming here if the forum became hostile to friendlies.
Title: Re: Deleting Spam
Post by: raptor on 2009-07-16T07:41:15-05:00 (Thursday)
Ryan Balfanz proposed some English math...
Ex:
Q- What is you + me   A- Us

But I think one can see how this could become not so straightforward in all cases.

What about this:
When creating an account we show a simple picture.  A ball, a cat, a dog, a house.

The user has to type into a text box what the object in the picture is.  We would use very straightforward, simple images and one-word answers.

This will tell us right away if these are bots or real people.

Scott
Title: Re: Deleting Spam
Post by: Robert Kennedy on 2009-07-16T08:16:20-05:00 (Thursday)
Well, it does seem interesting that this spamming is happening almost immediately after China banned virtual currencies.  It makes me wonder if all those out of work World of Warcraft farmers are now making their $5/month by spamming forums instead. 
Title: Re: Deleting Spam
Post by: Jerry on 2009-07-16T10:46:10-05:00 (Thursday)

I'm currently of the opinion that the spammers we are experiencing are not bots, so any solution suggested to combat bots will not be very useful.

Coincidentally, I went to a talk at the AI conference I'm attending in LA right now that was on how to break captcha. The state of the art for automatically breaking captcha according to this talk is not very sophisticated.

Title: Re: Deleting Spam
Post by: raptor on 2009-07-16T11:25:53-05:00 (Thursday)
I'm working with Chris Lintott here at Oxford.  He recently spoke with the man who invented captcha regarding OCR on ancient texts.  The conversation apparently included a portion about how the best captchas haven't really been beaten, that it's been people brute-forcing through them and saving character images.  Anyone have any other info on this?

Again, I also agree these aren't bots, and don't readily have any ideas minus LOTS of monitoring.
Title: Re: Deleting Spam
Post by: Gregory Bartholomew on 2009-07-16T11:31:32-05:00 (Thursday)
For the record, the idea that I was proposing about adding a simple, randomized grammar quiz was supposed to be an attempt to cut down on the humans who appear to be spamming this site. I was attempting to exploit the apparent correlation between those whose English skills are extremely limited and the spammers. I figure that the grammatical subtleties that frequently trip up those of us who have English as our first language would be almost impossible for the uneducated foreigners who seem to be spamming this site to pass. It was just an idea. I thought (was hoping) maybe some ambitious CS student might find the program that I proposed a worthwhile challenge.

As for the apache logs, what I have only goes back so far, but I'm attaching the access logs for the registration page of CAOS to this post (you will probably need to be logged in to see the link). I did find the number of accesses to the captcha audio playback interesting, but nothing else in the logs looked particularly interesting or out of the ordinary. What do you all think - does the activity shown in the logs suggest a human with really poor captcha skills or a computer with really good captcha skills?

I have already made a subtle change to the captcha that should be enough to keep a specialized captcha-cracking program from breaking our captcha. I'm not sure that a reduction in the number of spammers that we see at this point would be all that informative, though, as all the hostile comments that we have been making about spammers on this site lately might also cause a reduction in the number of human-generated spam posts (at least for a while).

gb
Title: Re: Deleting Spam
Post by: William Grim on 2009-07-16T12:00:36-05:00 (Thursday)
I don't really see anything in the logs, but I don't know what code index.php has in it.  I kind of want to know what the "rand" variable contains.  I guessed it was base64 data, but when I decoded it, I got non-ASCII characters.  However, they may be entering foreign characters and my Windows desktop couldn't display it.

Further, you could just write some code to prevent more than one link in a signature.  You could also intercept each forum post as it's made and make sure there aren't too many URLs.  If there are, then postings can be denied.
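
A rough sketch of that URL check, written in Python for illustration only (it is not SMF code). The regular expression, the per-post limit, and the function names are assumptions; only the one-link signature limit comes from the suggestion above.

```python
import re

# Matches anything starting with http://, https:// or www. up to whitespace.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

MAX_URLS_PER_POST = 3        # hypothetical limit for "too many URLs"
MAX_URLS_PER_SIGNATURE = 1   # one link per signature, as suggested


def count_urls(text):
    """Count URL-like strings in a post body or signature."""
    return len(URL_PATTERN.findall(text))


def post_allowed(body, signature=""):
    """Deny the post if the body or the signature has too many URLs."""
    return (
        count_urls(body) <= MAX_URLS_PER_POST
        and count_urls(signature) <= MAX_URLS_PER_SIGNATURE
    )


if __name__ == "__main__":
    sig = "http://a.example http://b.example www.c.example"
    print(post_allowed("Nice thread!", sig))  # signature has three links
```

A pattern this simple will miss obfuscated links (for example "a dot example dot com"), so it would reduce link spam rather than eliminate it.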
Title: Re: Deleting Spam
Post by: Mark Sands on 2009-07-16T12:06:20-05:00 (Thursday)
Scott and I are going to put our heads together and hopefully we can write a new anti-foreigner captcha system soon. I had the idea to make it an SIUE-centered test, but since legitimate outsiders aren't prohibited, maybe a combination of pictures and English math will do for now and we can go from there.
Title: Re: Deleting Spam
Post by: Jerry on 2009-07-16T13:09:28-05:00 (Thursday)
Please be sensitive to what is proposed and implemented.

A solution that is biased to native English speakers could easily be discriminatory toward students and faculty whose second language is English.
Title: Re: Deleting Spam
Post by: arcdrag on 2009-07-16T13:55:03-05:00 (Thursday)
Quote from: raptor on 2009-07-16T11:25:53-05:00 (Thursday)
Again, I also agree these aren't bots, and don't readily have any ideas minus LOTS of monitoring.

The groups responsible for the spamming will likely narrow down the scope of their efforts and begin to focus on forums in which it is most effective and their messages can stay up for more than a few days.  Simple monitoring will get rid of the problem temporarily after a few weeks, and will probably take less actual effort than writing entirely new captcha code. 
Title: Re: Deleting Spam
Post by: raptor on 2009-07-16T14:48:54-05:00 (Thursday)
I agree with Dr. Weinberg: we cannot have a native-English bias.

Honestly, if these are real people and are familiar with English, then nothing automated we can do will help.  The only REAL option I see is to require admin/moderator approval before allowing membership.  I know that I myself am on the forum at least once in a 24-hour period, if not multiple times, let alone that I would see an email the forum generated to me very quickly.  Now, I realize this doesn't guarantee that one of the admins/mods would answer within 24 hrs.  So, to be fair to users waiting on us, we could put a 24-hour time limit on it.  If an admin/mod doesn't approve/deny within 24 hrs., it auto-approves.  Of course, 24 hrs. could be any length of time; this is just arbitrary.

The one thing I DON'T like about this is not letting members join and be an active part right away.  I'm not sure if it would turn newcomers away.  SO, why not only do this if they have an out-of-SIUE/area IP address?  All of the IPs from these attacks have been from very foreign countries.
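
The approval rule described in this post (local addresses join immediately, everyone else waits for an admin/mod, with automatic approval after the time limit) might look something like the Python sketch below. It is only an illustration: the function names are invented, and the network range is a reserved documentation block used as a placeholder, not SIUE's actual address space.

```python
from datetime import datetime, timedelta
from ipaddress import ip_address, ip_network

APPROVAL_TIMEOUT = timedelta(hours=24)  # arbitrary, as noted above

# Placeholder range (reserved for documentation), NOT the real SIUE network.
LOCAL_NETWORKS = [ip_network("192.0.2.0/24")]


def is_local(addr):
    """True if the registering address falls inside a 'local' network."""
    ip = ip_address(addr)
    return any(ip in net for net in LOCAL_NETWORKS)


def registration_status(addr, registered_at, now, admin_decision=None):
    """Return 'approved', 'denied', or 'pending' for a new account.

    admin_decision is None until an admin/mod acts, then True or False.
    """
    if is_local(addr):
        return "approved"                  # local users join right away
    if admin_decision is not None:
        return "approved" if admin_decision else "denied"
    if now - registered_at >= APPROVAL_TIMEOUT:
        return "approved"                  # nobody answered in time
    return "pending"


if __name__ == "__main__":
    signup = datetime(2009, 7, 16, 9, 0)
    print(registration_status("198.51.100.7", signup, signup + timedelta(hours=2)))
```

One consequence of the auto-approve rule is that a spam account still gets through whenever no admin or mod checks within the window, so the timeout trades spam protection against fairness to new members.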

Thoughts?

Scott
Title: Re: Deleting Spam
Post by: William Grim on 2009-07-16T18:14:06-05:00 (Thursday)
I think you should just start simply and deny posts that contain too many URLs and also limit URLs in signatures to just one.
Title: Re: Deleting Spam
Post by: Jerry on 2009-07-17T08:26:16-05:00 (Friday)

Interestingly I attended a talk by Luis von Ahn, creator of captcha: (http://en.wikipedia.org/wiki/Luis_von_Ahn).

First, he is an excellent speaker and you should attend one of his talks if you ever have the chance.

Second, he talked about reCAPTCHA (http://recaptcha.net/), which he developed as a way to harness the cognitive power of everyone typing in a captcha to do tasks that would require an enormous amount of human power to accomplish. reCAPTCHA is being used to digitize books and periodicals. In about 4 months from now they will have digitized the entire collection of the New York Times, which is over 100 years of papers. When it is done it will have only taken a total of 8 months!

Third, he talked about captcha sweatshops. He had a funny story about interacting with one over email, but the important part of that story is that reCAPTCHA has IP blocking to prevent spamming from these sweatshops.

So I suggest that, if possible within the framework of the CAOS site, we adopt reCAPTCHA. When it is used it will help do something good for society, and it combats captcha sweatshops up front.
Title: Re: Deleting Spam
Post by: Gregory Bartholomew on 2009-07-17T08:47:46-05:00 (Friday)
I think I remember seeing a digg article about recaptcha a few months ago.  Sounds pretty cool.  It looks like there is already an SMF module for it: http://custom.simplemachines.org/mods/index.php?mod=1044.  I'll look into getting it integrated with our site ASAP.  Right now, I want to install another network card in one of the CS servers.  Please be patient if you notice a few of the CS sites are down for a few minutes during the next hour (I'll also be re-routing services after I get the card installed).

Thanks,
gb
Title: Re: Deleting Spam
Post by: raptor on 2009-07-17T11:21:13-05:00 (Friday)
I downloaded and reviewed the smf mod.  I think it should install right in.  I can test on an smf install for you if you like.

Scott
Title: Re: Deleting Spam
Post by: Gregory Bartholomew on 2009-07-17T11:41:24-05:00 (Friday)
OK, all the CS websites should be back now.

Thanks for offering to help with the SMF module Scott.  I would like to see some of the CAOS admins get a little more involved in managing this site.  If I make a backup of the current database and website and email you the password to the server, would you be willing to install the mod?

Thanks,
gb
Title: Re: Deleting Spam
Post by: raptor on 2009-07-17T12:53:30-05:00 (Friday)
Tell you what, I have some SMF installs running on other servers.  Why don't I try it there first just to be sure.  Less risk involved.  I'll get back with you when I have results.

Re: getting more involved,

Mark should be taking a MUCH more prominent role come fall.  The new site he was working on was placed on hold whilst working with Microsoft this summer, but we should be ready to launch early next semester.  I too agree the "web-admin" should be more of a web admin.  We are working toward that.

Scott
Title: Re: Deleting Spam
Post by: Gregory Bartholomew on 2009-07-17T13:10:02-05:00 (Friday)
Thanks Scott.
Title: Re: Deleting Spam
Post by: Mark Sands on 2009-07-18T07:59:34-05:00 (Saturday)
Yes, that'd be very helpful if you could even somehow do an alias like the wordpress page is currently set up! :D
Title: Re: Deleting Spam
Post by: thatguy on 2009-07-18T23:16:17-05:00 (Saturday)
Might I suggest:

http://arstechnica.com/old/content/2006/04/6554.ars

Title: Re: Deleting Spam
Post by: Mark Sands on 2009-07-19T02:07:05-05:00 (Sunday)
I don't think the captchas are the big issue here though. As we've noticed, it seems a lot of the spam comes from real people.
Title: Re: Deleting Spam
Post by: thatguy on 2009-07-19T10:37:41-05:00 (Sunday)
My suggestion was based on two ideas:
1) That if we are going to have a captcha, it might as well be interesting.
2) That the captchas are being beaten by captcha sweatshops.

I have to agree with Scott's idea that requires an approval for new members' posts.  I've been to sites that do that and it isn't much of an issue.  The time it takes to approve a post shouldn't be an issue; an email could be sent to the entire CAOS board.  If the officers can't respond to something like this in a reasonable time, then the issue is with the board, not the system.
Title: Re: Deleting Spam
Post by: Gregory Bartholomew on 2009-07-19T11:48:52-05:00 (Sunday)
Quote from: Mark Sands on 2009-07-18T07:59:34-05:00 (Saturday)
Yes, that'd be very helpful if you could even somehow do an alias like the wordpress page is currently set up! :D

Hey Mark, Yah, sorry about that wordpress link being broken.  When I upgraded SMF to 1.1.9, I also upgraded the whole OS on the server.  I need to reinstall the wordpress stuff and recreate the links.  It is on my todo list, along with getting the Gallery going again.  I have some work that needs to be done on home.cs.siue.edu that might keep me busy for the first few workdays of the week though.

gb