Welcome to Computer Association of SIUE - Forums.
 

Deleting Spam

Started by Jerry, 2009-07-06T16:57:28-05:00 (Monday)


Robert Kennedy

Well, it does seem interesting that this spamming is happening almost immediately after China banned virtual currencies.  It makes me wonder if all those out of work World of Warcraft farmers are now making their $5/month by spamming forums instead. 

Jerry


I'm currently of the opinion that the spammers we are experiencing are not bots, so any solution suggested to combat bots will not be very useful.

Coincidentally, I went to a talk at the AI conference I'm attending in LA right now on how to break captchas. According to the talk, the state of the art in automatically breaking captchas is not very sophisticated.

"Make a Little Bird House in Your Soul" - TMBG...

raptor

I'm working with Chris Lintott here at Oxford. He recently spoke with the man who invented captcha about using OCR on ancient texts. Apparently part of the conversation was about how the best captchas haven't really been beaten; instead, it has been people brute-forcing their way through them and saving the character images. Does anyone have any other information on this?

Again, I also agree these aren't bots, and I don't readily have any ideas other than LOTS of monitoring.
President of CAOS
Software Engineer NASA Nspires/Roses Grant

Gregory Bartholomew

For the record, the idea I was proposing, adding a simple randomized grammar quiz, was meant to cut down on the humans who appear to be spamming this site. I was trying to exploit the apparent correlation between spammers and those whose English skills are extremely limited. I figure that the grammatical subtleties that frequently trip up those of us who speak English as our first language would be almost impossible for the uneducated foreigners who seem to be spamming this site to get past. It was just an idea. I thought (or hoped) that some ambitious CS student might find the program I proposed a worthwhile challenge.

As for the apache logs, what I have only goes back so far, but I'm attaching the access logs for the registration page of CAOS to this post (you will probably need to be logged in to see the link). I did find the number of accesses to the captcha audio playback interesting, but nothing else in the logs looked particularly interesting or out of the ordinary. What do you all think - does the activity shown in the logs suggest a human with really poor captcha skills or a computer with really good captcha skills?
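For anyone who wants to dig through the attached logs, a short script can tally how often each client IP requested the captcha audio. This is a sketch assuming the logs are in Apache's Common or Combined Log Format; the `sound` marker used to recognize audio requests is a hypothetical placeholder, so adjust it to match the real request URLs.

```python
import re
from collections import Counter
from typing import Iterable

# Common Log Format: host ident authuser [date] "METHOD URL PROTOCOL" status bytes
# (Combined Log Format appends referer and user-agent, which are ignored here.)
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3})')


def count_hits(lines: Iterable[str], marker: str = "sound") -> Counter:
    """Count requests per client IP whose URL contains `marker`.

    `marker` is a hypothetical substring identifying captcha audio requests.
    Lines that do not parse as CLF/Combined entries are skipped.
    """
    hits: Counter = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and marker in m.group(4):  # group 4 is the requested URL
            hits[m.group(1)] += 1       # group 1 is the client host/IP
    return hits
```

Calling `count_hits(open(path)).most_common(10)` would list the heaviest audio users first, which makes it easy to see whether the audio hits come from one address or many.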

I have already made a subtle change to the captcha that should be enough to keep a specialized captcha-cracking program from breaking it. I'm not sure that a drop in the number of spammers at this point would be all that informative, though, since all the hostile comments we have been making about spammers on this site lately might also reduce the number of human-generated spam posts (at least for a while).

gb
......

William Grim

I don't really see anything in the logs, but I don't know what code index.php has in it. I'd like to know what the "rand" variable contains. I guessed it was base64 data, but when I decoded it, I got non-ASCII characters. However, they may have been entering foreign characters that my Windows desktop couldn't display.
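One possible explanation is that `rand` is not base64-encoded text at all (for example, a hex digest or raw random bytes), in which case a base64 decode will naturally produce non-ASCII garbage. A rough triage helper might look like this; the labels and the check-for-hex-first heuristic are assumptions of this sketch, not a description of what index.php actually does.

```python
import base64
import binascii
import string


def inspect_token(token: str) -> str:
    """Heuristically classify a query-string token.

    Hex is checked first because every hex string is also valid base64
    alphabet; this is a guess, not proof of how the token was produced.
    """
    if token and all(c in string.hexdigits for c in token):
        return "hex"
    try:
        # validate=True rejects characters outside the base64 alphabet
        raw = base64.b64decode(token, validate=True)
    except (binascii.Error, ValueError):
        return "not base64"
    try:
        raw.decode("utf-8")
        return "base64 text"
    except UnicodeDecodeError:
        # Decodes as base64, but the bytes are not text
        return "base64 binary"
```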

Further, you could just write some code to prevent more than one link in a signature.  You could also intercept each forum post as it's made and make sure there aren't too many URLs.  If there are, then postings can be denied.
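A minimal sketch of that check, assuming posts and signatures arrive as plain text (BBCode included) before being saved. The URL pattern, the function names, and the post limit of 2 are hypothetical choices; the one-link signature limit comes from the suggestion itself.

```python
import re
from typing import List

# Rough URL matcher: anything starting with http://, https:// or www.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def count_urls(text: str) -> int:
    return len(URL_PATTERN.findall(text))


def check_submission(post_body: str, signature: str,
                     max_post_urls: int = 2, max_sig_urls: int = 1) -> List[str]:
    """Return a list of reasons to deny the submission (empty means OK)."""
    problems = []
    sig_urls = count_urls(signature)
    if sig_urls > max_sig_urls:
        problems.append(f"signature has {sig_urls} links (limit {max_sig_urls})")
    post_urls = count_urls(post_body)
    if post_urls > max_post_urls:
        problems.append(f"post has {post_urls} links (limit {max_post_urls})")
    return problems
```

Returning reasons rather than a bare boolean lets the forum tell a legitimate user why the post was refused.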
William Grim
IT Associate, Morgan Stanley

Mark Sands

Scott and I are going to put our heads together, and hopefully we can write a new anti-foreigner captcha system soon. I had the idea of making it an SIUE-centralized test, but since legitimate outsiders aren't prohibited, maybe a combination of pictures and English math will do for now, and we can go from there.
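The "English math" half of this idea could be as simple as spelling the operands out in words. This is a hypothetical sketch: the question wording, the 0 to 10 operand range, and accepting either digits or a number word as the answer are assumptions, and the picture component is left out.

```python
import random
from typing import Tuple

NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]


def make_question(rng: random.Random) -> Tuple[str, int]:
    """Return (question text, expected integer answer)."""
    a, b = rng.randint(0, 10), rng.randint(0, 10)
    if rng.choice([True, False]):
        return f"What is {NUMBER_WORDS[a]} plus {NUMBER_WORDS[b]}?", a + b
    # Subtract the smaller from the larger so the answer is never negative
    hi, lo = max(a, b), min(a, b)
    return f"What is {NUMBER_WORDS[hi]} minus {NUMBER_WORDS[lo]}?", hi - lo


def check_answer(reply: str, expected: int) -> bool:
    """Accept digits, or a number word for answers up to ten."""
    reply = reply.strip().lower()
    if reply.isdigit():
        return int(reply) == expected
    return expected < len(NUMBER_WORDS) and reply == NUMBER_WORDS[expected]
```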
Mark Sands
Computer Science Major

Jerry

Please be sensitive to what is proposed and implemented.

A solution that is biased to native English speakers could easily be discriminatory toward students and faculty whose second language is English.

arcdrag

Quote from: raptor on 2009-07-16T11:25:53-05:00 (Thursday)
Again, I also agree these aren't bots, and don't readily have any ideas minus LOTS of monitoring.

The groups responsible for the spamming will likely narrow the scope of their efforts and focus on forums where it is most effective and where their messages can stay up for more than a few days. A few weeks of simple monitoring will probably get rid of the problem, at least temporarily, and will likely take less actual effort than writing entirely new captcha code.

raptor

I agree with Dr. Weinberg; we cannot have a native English bias.

Honestly, if these are real people and they are familiar with English, then nothing automated we can do will help. The only REAL option I see is to require admin/moderator approval before new members are allowed to join. I myself am on the forum at least once in any 24-hour period, if not multiple times, and I would also see any email the forum sent me very quickly. Now, I realize this doesn't guarantee that one of the admins/mods would answer within 24 hours. So, to be fair to users waiting on us, we could put a 24-hour timeout on it: if an admin/mod doesn't approve or deny the request within 24 hours, it is automatically approved. Of course, 24 hours is arbitrary; it could be any length of time.

The one thing I DON'T like about this is not letting members join and become an active part right away. I'm not sure whether it would turn newcomers away. So, why not only do this if they have an IP address from outside SIUE/the local area? All of the IPs from these attacks have been from foreign countries.
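The proposal (hold registrations from outside the local network for moderator review, auto-approving after 24 hours) could be modeled roughly as follows. This is a sketch only: `LOCAL_NETWORKS` uses the 192.0.2.0/24 documentation range as a placeholder for the real SIUE/area address blocks, and the `Registration` record and status strings are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Placeholder only: substitute the real SIUE/area address blocks.
LOCAL_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
AUTO_APPROVE_AFTER = timedelta(hours=24)  # arbitrary, as the post notes


@dataclass
class Registration:
    username: str
    ip: str
    submitted: datetime
    decision: Optional[str] = None  # "approved"/"denied" by a moderator, or None


def needs_review(reg: Registration) -> bool:
    """Only registrations from outside the local networks are held."""
    addr = ipaddress.ip_address(reg.ip)
    return not any(addr in net for net in LOCAL_NETWORKS)


def status(reg: Registration, now: datetime) -> str:
    if not needs_review(reg):
        return "approved"
    if reg.decision is not None:  # a moderator acted in time
        return reg.decision
    if now - reg.submitted >= AUTO_APPROVE_AFTER:
        return "approved"         # nobody answered, so auto-approve
    return "pending"
```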

Thoughts?

Scott

William Grim

I think you should just start simply and deny posts that contain too many URLs and also limit URLs in signatures to just one.

Jerry


Interestingly, I attended a talk by Luis von Ahn, the creator of captcha (http://en.wikipedia.org/wiki/Luis_von_Ahn).

First, he is an excellent speaker and you should attend one of his talks if you ever have the chance.

Second, he talked about reCAPTCHA (http://recaptcha.net/), which he developed as a way to harness the cognitive effort of everyone typing in a captcha to do tasks that would otherwise require an enormous amount of human labor. reCAPTCHA is being used to digitize books and periodicals. About 4 months from now they will have digitized the entire collection of the New York Times, which is over 100 years of papers. When it is done, it will have taken a total of only 8 months!

Third, he talked about captcha sweatshops. He had a funny story about interacting with one over email, but the important part of that story is that reCAPTCHA has IP blocking to prevent spamming from these sweatshops.

So I suggest that the CAOS site adopt reCAPTCHA, if its framework allows it. Using it helps do something good for society, and it combats captcha sweatshops up front.

Gregory Bartholomew

I think I remember seeing a Digg article about reCAPTCHA a few months ago. Sounds pretty cool. It looks like there is already an SMF module for it: http://custom.simplemachines.org/mods/index.php?mod=1044. I'll look into getting it integrated with our site ASAP. Right now, I want to install another network card in one of the CS servers. Please be patient if you notice a few of the CS sites are down for a few minutes during the next hour (I'll also be re-routing services after I get the card installed).

Thanks,
gb
......

raptor

I downloaded and reviewed the SMF mod. I think it should install right in. I can test it on an SMF install for you if you like.

Scott

Gregory Bartholomew

OK, all the CS websites should be back now.

Thanks for offering to help with the SMF module Scott.  I would like to see some of the CAOS admins get a little more involved in managing this site.  If I make a backup of the current database and website and email you the password to the server, would you be willing to install the mod?

Thanks,
gb
......

raptor

Tell you what, I have some SMF installs running on other servers. Why don't I try it there first, just to be sure? Less risk involved. I'll get back to you when I have results.

Re: getting more involved,

Mark should be taking on a MUCH more prominent role come fall. The new site he was working on was put on hold while he works with Microsoft this summer, but we should be ready to launch it early next semester. I too agree that the "web-admin" should be more of a web admin. We are working toward that.

Scott