You are here


Drupal anti-spam

Lazyweb, O, lazyweb, I call out to thee in my hour of need. I installed the spam and trackback modules for drupal, and to the outside observer, my blog is nicely spam-free. However, I get about fifty spam comments and spam trackbacks a day, which get trapped in the approval queue, and I have to manually wade through cialis and porn adverts/links to see if there are any real comments/trackbacks for any of my posts.

Depressingly, there generally aren't.

What's the best way to keep one's comments and trackbacks spam-free, without having to manually delete every single dodgy one, and without getting any false-positives?

A side note is that the trackback module isn't great - if I want to send a trackback, I have to manually find the trackback URL and put it in the little textbox - isn't there a nice drupal module that checks all outgoing URLs, and autodiscovers the trackbacks, and pings them? The trackback module that I have installed seems to think that this is what it does, but it has delusions of grandeur, in my opinion.

Converting from Serendipity to Drupal

My old blog, on, was running Serendipity, but I've switched to using Drupal for everything on this site, Apart from being written in PHP, which is unfortunate, Drupal seems to be a fairly decent piece of software - pretty, easy to use, and well written. If I needed persuading of this fact, I would have been convinced by how easy it turned out to be to migrate my blog entries from Serendipity to Drupal.

Since I started setting up my hosting (maybe I should blog about that a bit later?), I've been using Postgres as much as possible, where there's a choice. I am using it behind my shorl generator, my quote database, and behind the various drupal sites on this machine, including this blog. The database that stored my serendipity entries on rucus, however, was a MySQL database.

It was easy enough to extract the information that I wanted to keep from the serendipity database:

   select id, title, timestamp, concat(body, extended) from serendipity_entries;
I wasn't that interested in trying to keep the comments, categories, and so on - I have retained the database, and I will go through the comments later and update various entries to include the comments, I think.

After that, the question was how to insert this data into Drupal. At first I thought of doing it manually: I created a test blog entry, with pg_dumps before and after, so I could compare the states of the database, and how it changed when a blog entry was created. It seemed simple enough, but the whole idea didn't sit right with me. So I had a look at the PHP code behind Drupal, and as I've said, it's incredibly simple and elegantly written.

It turns out, there's a node_save() function that you can call, passing it a node object (which needs properties such as 'title', 'body', etc), and it will update everything for you. It was that simple. All I needed was to write some PHP code that did the MySQL selection above from serendipity, created a node with the right properties, and saved it. This code would, of course, need to run within the Drupal environment so that it had access to the node_save() function, and was connected to the right database. This was also trivial to achieve: There is a nice tutoral on creating Drupal modules that made it easy.

I pre-created a table called 'blogdata' to contain the data I wanted:

CREATE TABLE blogdata (
  id int(11),
  title varchar(200),
  timestamp int(10),
  body text,
  done int(3) default '0'
And then populated it:
insert into blogdata select id, title, timestamp, concat(body, extended), 0 from serendipity_entries;
The relevant part of my Drupal module (which I could have actually stuck into any existing Drupal module in order for it to be run) was as follows:
$q = mysql_query("select * from blogdata where done=0 order by timestamp asc");
while($f = mysql_fetch_assoc($q)) {
   $newent = array('created' => $f[`timestamp`], 'title' => utf8_encode($f["title"]), 'body' => utf8_encode($f["body"]),
      'teaser' => utf8_encode($f["body"]), 'format' => 3, 'uid' => 1, 'type' => 'blog', 'status' => 1, "comment" => 2,
      'promote' => 0, 'sticky' => 0);
   $newento = (object)$newent;
$q = mysql_query("update blogdata set done=1");
I did actually have some issues at first because my data was encoded in ISO-8859-1/latin1, and Postgres was expecting UTF-8 data, but as you can see, I call the PHP utf8_encode() function to get around this. Many thanks to bje (whose domain is ironically called "serendipity" ;-) for getting my mind straight when I was being kak about this.

And that was it. My blog entries were imported perfectly. I still need to go through a few of them and fix entries that still hard-link to rucus, but that shouldn't take too long.

The only gripe with Drupal at first was the horrible URLs it created: "/node/124" sort of thing. However, with the nifty Pathauto module, those are a thing of the past.

Subscribe to RSS - drupal