<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>spam</title>
  <link rel="alternate" type="text/html" href="http://vhata.net/tags/spam"/>
  <link rel="self" type="application/atom+xml" href="http://vhata.net/taxonomy/term/23/atom/feed"/>
  <id>http://vhata.net/taxonomy/term/23/atom/feed</id>
  <updated>2007-06-10T20:14:04+02:00</updated>
  <entry>
    <title>Fight spam and read books</title>
    <link rel="alternate" type="text/html" href="http://vhata.net/blog/2007/07/20/fight-spam-and-read-books" />
    <id>http://vhata.net/blog/2007/07/20/fight-spam-and-read-books</id>
    <published>2007-07-20T15:18:52+02:00</published>
    <updated>2008-09-27T12:25:23+02:00</updated>
    <author>
      <name>Jonathan Hitchcock</name>
    </author>
    <category term="blog" />
    <category term="facebook imports" />
    <category term="spam" />
    <summary type="html"><![CDATA[<p>
Well, I fixed my <a href="/blog/2007/06/10/drupal-anti-spam">spam problems</a>, it seems.  I am now using <a href="http://en.wikipedia.org/wiki/CAPTCHA">CAPTCHAs</a> on blog comments.  A CAPTCHA is a way of checking whether the person accessing a web page is a "real" person by asking them to do something that computers find hard to do.  Traditionally, this has involved asking them to type out a word in a picture, because computers have always had trouble with image processing.  However, software has improved at reading images, and this approach has started failing.  Some other ways to determine whether the user is a real person have been suggested:
<blockquote>
In order to prove your authenticity, please provide the answer to the following formula:
<img src="/files/2007/07/20/formula.jpg" alt="formula" title="formula" />
</blockquote>
And then there's:
</p>
<p style="text-align: center">
<a href="http://xkcd.com/c233.html">
<img src="http://imgs.xkcd.com/comics/a_new_captcha_approach.png" alt="a new captcha approach" title="a new captcha approach" />
</a>
</p>
<p>
I am using neither of these methods, unfortunately.  <a href="http://whijo.net/">Brad</a> pointed me to <a href="http://recaptcha.net/">ReCAPTCHA</a>, which is now the <a href="http://en.wikipedia.org/wiki/ReCAPTCHA">recommended implementation</a> of the CAPTCHA system.  As described <a href="http://recaptcha.net/learnmore.html">on their page</a>, people perform word recognition all the time when they answer CAPTCHAs, and ReCAPTCHA uses this to assist in scanning the world's library archives into digital format.  When pages of books are scanned in, the software can't always work out what the words are supposed to be, so these words get used in CAPTCHAs, and we let the people of the world work out what they are.  If you're wondering how unknown words can be used in a CAPTCHA, go and read the link above.
</p>
<p>
Anyway, the point is, we're helping to digitize humanity's knowledge and fighting spam at the same time.  It's like <a href="http://nigoro.jp/game/rosecamellia/rosecamellia.php">hitting two birds</a> with one stone.  I notice that Facebook also uses ReCAPTCHA in its sign-up form.  I think it's awesome.
</p>
<p>
Please let me know if there are any issues using the new CAPTCHAs when submitting comments.
</p>
<p>
<b>Update:</b> 
<a href="http://www.russellheimlich.com/blog/did-wikipedia-just-insult-me/">More captcha amusements</a> and <a href="http://www.tonsai.de/blog-english/2007/craziest-captchas-on-the-web/">yet more</a>.
</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>
Well, I fixed my <a href="/blog/2007/06/10/drupal-anti-spam">spam problems</a>, it seems.  I am now using <a href="http://en.wikipedia.org/wiki/CAPTCHA">CAPTCHAs</a> on blog comments.  A CAPTCHA is a way of checking whether the person accessing a web page is a "real" person by asking them to do something that computers find hard to do.  Traditionally, this has involved asking them to type out a word in a picture, because computers have always had trouble with image processing.  However, software has improved at reading images, and this approach has started failing.  Some other ways to determine whether the user is a real person have been suggested:
<blockquote>
In order to prove your authenticity, please provide the answer to the following formula:
<img src="/files/2007/07/20/formula.jpg" alt="formula" title="formula" />
</blockquote>
And then there's:
</p>
<p style="text-align: center">
<a href="http://xkcd.com/c233.html">
<img src="http://imgs.xkcd.com/comics/a_new_captcha_approach.png" alt="a new captcha approach" title="a new captcha approach" />
</a>
</p>
<p>
I am using neither of these methods, unfortunately.  <a href="http://whijo.net/">Brad</a> pointed me to <a href="http://recaptcha.net/">ReCAPTCHA</a>, which is now the <a href="http://en.wikipedia.org/wiki/ReCAPTCHA">recommended implementation</a> of the CAPTCHA system.  As described <a href="http://recaptcha.net/learnmore.html">on their page</a>, people perform word recognition all the time when they answer CAPTCHAs, and ReCAPTCHA uses this to assist in scanning the world's library archives into digital format.  When pages of books are scanned in, the software can't always work out what the words are supposed to be, so these words get used in CAPTCHAs, and we let the people of the world work out what they are.  If you're wondering how unknown words can be used in a CAPTCHA, go and read the link above.
</p>
<p>
Anyway, the point is, we're helping to digitize humanity's knowledge and fighting spam at the same time.  It's like <a href="http://nigoro.jp/game/rosecamellia/rosecamellia.php">hitting two birds</a> with one stone.  I notice that Facebook also uses ReCAPTCHA in its sign-up form.  I think it's awesome.
</p>
<p>
Please let me know if there are any issues using the new CAPTCHAs when submitting comments.
</p>
<p>
<b>Update:</b> 
<a href="http://www.russellheimlich.com/blog/did-wikipedia-just-insult-me/">More captcha amusements</a> and <a href="http://www.tonsai.de/blog-english/2007/craziest-captchas-on-the-web/">yet more</a>.
</p>
    ]]></content>
  </entry>
  <entry>
    <title>Drupal anti-spam</title>
    <link rel="alternate" type="text/html" href="http://vhata.net/blog/2007/06/10/drupal-anti-spam" />
    <id>http://vhata.net/blog/2007/06/10/drupal-anti-spam</id>
    <published>2007-06-10T20:14:04+02:00</published>
    <updated>2007-06-10T20:14:04+02:00</updated>
    <author>
      <name>Jonathan Hitchcock</name>
    </author>
    <category term="blog" />
    <category term="drupal" />
    <category term="spam" />
    <summary type="html"><![CDATA[<p>
Lazyweb, O, lazyweb, I call out to thee in my hour of need.  I installed the spam and trackback modules for Drupal, and to the outside observer, my blog is nicely spam-free.  However, I get about fifty spam comments and spam trackbacks a day, which get trapped in the approval queue, and I have to manually wade through Cialis and porn adverts/links to see if there are any real comments/trackbacks for any of my posts.
</p>
<p>
Depressingly, there generally aren't.
</p>
<p>
What's the best way to keep one's comments and trackbacks spam-free, without having to manually delete every single dodgy one, and without getting any false positives?
</p>
<p>
A side note is that the trackback module isn't great: if I want to send a trackback, I have to manually find the trackback URL and put it in the little textbox.  Isn't there a nice Drupal module that checks all outgoing URLs, autodiscovers the trackbacks, and pings them?  The trackback module that I have installed seems to think that this is what it does, but it has delusions of grandeur, in my opinion.
</p>
    ]]></summary>
    <content type="html"><![CDATA[<p>
Lazyweb, O, lazyweb, I call out to thee in my hour of need.  I installed the spam and trackback modules for Drupal, and to the outside observer, my blog is nicely spam-free.  However, I get about fifty spam comments and spam trackbacks a day, which get trapped in the approval queue, and I have to manually wade through Cialis and porn adverts/links to see if there are any real comments/trackbacks for any of my posts.
</p>
<p>
Depressingly, there generally aren't.
</p>
<p>
What's the best way to keep one's comments and trackbacks spam-free, without having to manually delete every single dodgy one, and without getting any false positives?
</p>
<p>
A side note is that the trackback module isn't great: if I want to send a trackback, I have to manually find the trackback URL and put it in the little textbox.  Isn't there a nice Drupal module that checks all outgoing URLs, autodiscovers the trackbacks, and pings them?  The trackback module that I have installed seems to think that this is what it does, but it has delusions of grandeur, in my opinion.
</p>
    ]]></content>
  </entry>
</feed>
