Phoblogging

An apology

The sharp-eyed among you will have noticed that my blog has suddenly become one of those sites. I apologise for flooding your feed readers with pictures of seals, but let me explain.

A justification

The whole "let me post random photos from my life on my blog" thing was more an exercise in "how easy would it be to make a phoblog?" than a desire to share what my shoes look like. I will admit that when I took the sunset photo, I thought "this would be a really good thing to share with the world", because, let's admit it, Cape Town is one ridiculously beautiful city, and people need to hear that. But that got me thinking how easy it would be to make a photo shareable, and here is what I came up with.

An explanation

There is actually a function on my phone labelled "blog this", but I think it sends the image (or whatever) to a Sony-sponsored blogging site, and I'm frankly not interested in that. I wanted to solve the problem academically, for the general case, and as a side effect solve it for my specific case - I run this blog in a Drupal instance on my own server hosted with Layered Tech.

A discussion

So, the various ways to get information from my phone to my server were MMS, email, some form of push to a web page, or a Bluetooth/cable upload to a laptop/desktop, which would then send it on. The last option defeated the point - I wanted to be able to blog a photo from anywhere, using nothing but my phone. A web-page push is what the "blog this" function does, but for my specific case I'd have to write a custom application for the phone, which was way more effort than I wanted to expend. Sending an MMS would require me to have a GSM modem listening somewhere to receive it, and had the added disadvantage that the images would get resized down. So, it seemed, the best way to get the information from my phone to my server was simply to send an email (with images attached).

A technical discussion

The rest of this post describes the technical details of what happens to the email when it arrives at my server.

As an overview: I catch mail meant for the phoblog using a procmail recipe, and pipe the mail to a Python script, which parses the message and pulls out the relevant parts, constructing the body text, creating thumbnails of the images and saving them in the right place. Having deconstructed the message and constructed the blog post, it passes the bits (title, body, and publication date, which it extracts from the EXIF information in the photos) to a PHP script, which hooks into the Drupal API and actually creates the blog post.

The PHP script is necessary, since there's no other way to hook into the Drupal API. I could do something like faking a bunch of HTTP GETs and POSTs, and passing the information in as if I was actually blogging it from the web interface, but that's even more klunky than simply piping it into a PHP script. The question then arises why I couldn't write the whole thing in PHP, and save myself the expense of running two scripts requiring two different interpreters, but frankly, trying to get PHP to do what is necessary would end in such an inelegant, ugly, hackish result that it just wouldn't be worth it.

An added advantage to separating the Python parser and the PHP script is that you can replace the PHP script with one that injects an entry into a different blogging platform, and it'll still work fine. So, somebody could write a script that talks to Wordpress, and simply drop it into place.


The injector (the PHP script)

The PHP script needs to hook into the Drupal API, so we first need to bootstrap into the Drupal environment. First we fake some HTTP headers in the $_SERVER array so that Drupal knows which site is being "requested" (Drupal does some clever multi-site stuff based on which URL is being requested). Then we change to the Drupal base directory (defined as a constant at the top), include the bootstrap code (also defined at the top), and then simply run the drupal_bootstrap() function:

<?php 
// Defined as a constant, could/should be passed as an option or loaded from a config file:
define('PHOBLOG_DRUPAL_URI', 'http://vhata.net/');
// Fairly standard for Drupal installations, but as above:
define('PHOBLOG_DRUPAL_ROOT', '/usr/share/drupal');
define('PHOBLOG_DRUPAL_BOOTSTRAP', 'includes/bootstrap.inc');

// Fake the necessary HTTP headers that Drupal needs:
$drupal_base_url = parse_url(PHOBLOG_DRUPAL_URI);
$_SERVER['HTTP_HOST'] = $drupal_base_url['host'];
$_SERVER['PHP_SELF'] = $drupal_base_url['path'].'/index.php';
$_SERVER['REQUEST_URI'] = $_SERVER['SCRIPT_NAME'] = $_SERVER['PHP_SELF'];
$_SERVER['REMOTE_ADDR'] = NULL;
$_SERVER['REQUEST_METHOD'] = NULL;

// Change to Drupal root dir.
chdir(PHOBLOG_DRUPAL_ROOT);

require_once(PHOBLOG_DRUPAL_BOOTSTRAP);
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

?>

Now we are running in a Drupal environment. The next step is to collect the information that we want to insert as a blog entry. We take the title and publish date from the arguments passed to the script, and then do a loop to read the body from standard input:

<?php 
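// Publish date and title come in as command-line arguments:
$date = $_SERVER['argv'][1];
$subject = $_SERVER['argv'][2];

// The body arrives on standard input:
$fp = fopen('php://stdin', 'r');
$body = "";
// Compare against false explicitly, so that a final line of "0" is not dropped:
while (($line = fgets($fp, 4096)) !== false) {
        $body .= $line;
}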

?>

I may be wrong, but I don't think there are any sanitization problems in the above code. Let me know if you can see any. I'm pretty sure I don't need to escape anything, since I pass all variables as-is to Drupal, which does full sanitization before using them. Anyway, the final step is to simply call the Drupal node_save() function to save the blog post as a node (passing it some default values):

<?php 
node_save((object)(array('created' => $date,
        'title' => $subject,
        'body' => $body,
        'teaser' => $body,
        'format' => '3',
        'uid' => 1,
        'type' => 'blog',
        'status' => 1,
        'comment' => 2,
        'promote' => 0,
        'sticky' => 0)));

?>

My only worry there is that I specify that the format is type 3 (unfiltered HTML) - this might leave the phoblogger open to code-injection exploits. I should probably specify type 1, filtered HTML, to make sure that nobody can accidentally blog something nasty.

So, that's the PHP script that injects the entry into Drupal. The other part of the system is, of course, the Python script that parses the email in the first place.

The parser (the Python script)

I'm not posting the full script here, for a number of reasons, mostly to do with it "not being finished yet". It works, but it doesn't do everything it should (including a complete security check, since that's kinda hard to implement on emails, which can be faked). Suffice it to say, it uses the optparse, ConfigParser and logging modules to be nicely configurable, runnable, and debuggable, and all that. But, yeah, I'm still embarrassed about it, and won't post source code until I think it's good-looking enough for public consumption. What I will post here is bits of Python code that demonstrate the actual meat of the thing - how I deconstruct and process the email that I receive.

The basic steps I perform are:

  1. Break up the email and extract the bits I need from it.
  2. Process each attachment part:
    • Text attachments get HTML-ified
    • HTML attachments get inserted as-is
    • Image attachments get thumbnailed, and the thumbnails and originals get stored somewhere web-accessible, and a chunk of HTML that references them gets created.
  3. Send the results of this processing to the injector script with the right subject and date.

Breaking up the email is trivial using the email module in Python:

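import sys
import email
import email.header
import email.utils

msg = email.message_from_file(sys.stdin)
# decode_header() returns (string, charset) pairs; charset is None for plain ASCII parts
subject = u''.join(unicode(part, encoding or 'us-ascii') for part, encoding in email.header.decode_header(msg.get('subject')))
msgfrom = email.utils.getaddresses([msg.get('from')])[0][1]
msgid = msg.get('message-id')

# For a multipart message, get_payload() returns the list of sub-messages
for piece in msg.get_payload():
   processpiece(piece)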

As you can see, no regular expressions needed to match headers, do MIME decoding, or break up an email address. You can even give it a list of all the different stupid formats for addresses that mail clients seem to use these days, and it will understand them:

>>> getaddresses(["jonathan@vhata.net", '"Jonathan III" <vhata@clug.org.za>', 'pope@vatican.org (Benedict)'])
[('', 'jonathan@vhata.net'),
 ('Jonathan III', 'vhata@clug.org.za'),
 ('Benedict', 'pope@vatican.org')]

I break each attachment up and send them to the processpiece() function one at a time.
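
For anyone trying this on a current interpreter, here is a minimal sketch of the same deconstruction in Python 3 (the snippets in this post are Python 2). The function name parse_message is made up for illustration, and two things differ from the code above as assumptions of the sketch: it uses parseaddr() for the single From address, and it uses walk() so that nested multipart messages are flattened into their leaf parts.

```python
import email
import email.header
import email.utils


def parse_message(raw_text):
    """Return (subject, sender_address, message_id, leaf_parts) for one email."""
    msg = email.message_from_string(raw_text)
    # make_header() joins the decoded fragments of an RFC 2047 encoded header.
    subject = str(email.header.make_header(
        email.header.decode_header(msg.get("subject", ""))))
    # parseaddr() returns (realname, address) for a single address.
    sender = email.utils.parseaddr(msg.get("from", ""))[1]
    msgid = msg.get("message-id")
    # walk() also yields the multipart containers, so keep only the leaves.
    parts = [part for part in msg.walk() if not part.is_multipart()]
    return subject, sender, msgid, parts
```

Each element of the returned list can then be handed to something like processpiece() one at a time.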

Inside the processpiece() function, I can get at the content-type of the chunk I'm processing by using the get_content_type() method:

>>> piece.get_content_type()
'image/jpeg'
>>> piece.get_content_maintype()
'image'
>>> piece.get_content_subtype()
'jpeg'

and I can use this to work out what I want to do with the chunk. I can also get the chunk in its raw form (i.e. decoded from the MIME transport encoding that email uses) by simply calling get_payload() on it with decode=True:

payload = piece.get_payload(decode=True)

If it's text, I simply replace all the newlines with HTML line breaks:

payload = payload.replace("\n", "<br />\n")
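
To make the dispatch on content type concrete, here is a small Python 3 sketch of how the text cases could be handled. The name render_piece is hypothetical, the payload is assumed to have been decoded to a string already, and the html.escape() call is an addition that is not in my script: it stops a plain-text part from smuggling markup into the post.

```python
import html


def render_piece(content_type, payload):
    """Return an HTML fragment for one decoded text part, or None otherwise."""
    maintype, _, subtype = content_type.partition("/")
    if maintype == "text" and subtype == "html":
        # HTML attachments are inserted as-is.
        return payload
    if maintype == "text":
        # Escape &, < and > first, then turn newlines into line breaks.
        return html.escape(payload).replace("\n", "<br />\n")
    # Images and anything else are handled elsewhere.
    return None
```

For example, render_piece("text/plain", "a < b\nc") gives "a &lt; b<br />\nc".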

The difficult case is, of course, when it's an image. Here, I use the Python Imaging Library to process the image. I extract the EXIF timestamp and turn it into a datetime structure, so that I can create a hierarchical directory tree to store the images. Then, I construct a thumbnail filename and create the thumbnail:

import datetime
import os
import StringIO
from PIL import Image

# EXIF_DATETIME, TARGETDIR and THUMBSIZE are constants defined elsewhere in the script.
payload = piece.get_payload(decode=True)
image = Image.open(StringIO.StringIO(payload))

# EXIF timestamps look like "2008:03:14 18:22:05"
timestamp = datetime.datetime.strptime(image._getexif()[EXIF_DATETIME], "%Y:%m:%d %H:%M:%S")
self.entrystamp = timestamp

# Images are stored under TARGETDIR/YYYY/MM/DD
targetdir = "%04d/%02d/%02d" % (timestamp.year, timestamp.month, timestamp.day)
try:
   os.makedirs("%s/%s" % (TARGETDIR, targetdir), 0755)
except OSError:
   # Most likely the directory already exists
   pass

fname = piece.get_filename()
(rootname, ext) = os.path.splitext(fname)
ext = ext.lower()
fname = "%s%s" % (rootname, ext)
thumbname = "%s-thumb%s" % (rootname, ext)

image.save("%s/%s/%s" % (TARGETDIR, targetdir, fname))
os.chmod("%s/%s/%s" % (TARGETDIR, targetdir, fname), 0644)
# thumbnail() resizes in place, so work on a copy
image2 = image.copy()
image2.thumbnail([THUMBSIZE,THUMBSIZE])
image2.save("%s/%s/%s" % (TARGETDIR, targetdir, thumbname))
os.chmod("%s/%s/%s" % (TARGETDIR, targetdir, thumbname), 0644)

Then I return a templated chunk of text to dump into the blog post. Easy as pie.
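
The naming part of that step can be pulled out and tried without any imaging library. This is only a sketch in Python 3: storage_names is a made-up helper, and the os.path.basename() call is an extra precaution that is not in my script, there so that an attachment name containing directory components cannot end up outside the target tree.

```python
import datetime
import os.path


def storage_names(exif_datetime, filename):
    """Return (relative_dir, image_name, thumb_name) for one photo."""
    stamp = datetime.datetime.strptime(exif_datetime, "%Y:%m:%d %H:%M:%S")
    reldir = "%04d/%02d/%02d" % (stamp.year, stamp.month, stamp.day)
    # Keep only the final path component of whatever name the sender supplied.
    rootname, ext = os.path.splitext(os.path.basename(filename))
    ext = ext.lower()
    return reldir, rootname + ext, "%s-thumb%s" % (rootname, ext)
```

With a hypothetical photo called DSC00042.JPG taken on 14 March 2008, this yields the directory 2008/03/14 and the names DSC00042.jpg and DSC00042-thumb.jpg.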

The last step is to pipe the individually formatted pieces to the injector script, passing it the date (extracted from the EXIF information above) and subject as parameters:

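import subprocess

# strftime("%s") gives seconds since the epoch; it is a platform extension (works on Linux)
injector = subprocess.Popen([ADDCMD, entrystamp.strftime("%s"), "Phoblog: %s" % subject],stdin=subprocess.PIPE)
for piece in body:
   injector.stdin.write(piece)
# communicate() closes stdin and waits for the injector to finish
injector.communicate()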

And off it goes.
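
Since the injector is just "a command that takes a date and a title and reads the body on stdin", the hand-off is easy to sketch on its own. The following Python 3 version is an illustration under a few assumptions: run_injector is a hypothetical name, the command is passed as a list, the body is sent as UTF-8, and the timestamp is computed with datetime.timestamp() (which treats a naive datetime as local time) rather than strftime("%s").

```python
import subprocess


def run_injector(command, when, subject, pieces):
    """Pipe the rendered pieces to an injector command; return its exit code."""
    # Arguments mirror the injector's interface: epoch seconds, then the title.
    argv = list(command) + [str(int(when.timestamp())), "Phoblog: %s" % subject]
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE)
    # communicate() writes the body, closes stdin and waits for the process.
    proc.communicate("".join(pieces).encode("utf-8"))
    return proc.returncode
```

Swapping in a different blogging platform then only means passing a different command, as long as it accepts the same two arguments and reads the body from standard input.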

Some concerns

First and foremost, security is a problem. If I'm sending an email from my phone, anybody can send the same email from their own phone - there is no identification in the email. One way around this would be to require a keyword in the subject before accepting it. This is security by obscurity - anybody who gets hold of the keyword will be in. I can decrease this risk by forcing some sort of variation on the keyword. For example, if the keyword was "pilates", I could require that the number of letters in the name of the current day be appended to that: "pilates6" on a Sunday, "pilates7" on a Tuesday. This slightly decreases the risk, but not much. There are other, even cleverer variations on this theme, but they are all basically just security by obscurity. A better way would be to use authenticated SMTP, and only accept phoblog messages that were authenticated through my own SMTP server, and I think I might implement this, unless I can think of a flaw in the idea.
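
Just to show how little the day-of-the-week trick involves, here is a sketch of it in Python 3. The function names are invented for illustration, the day names are hard-coded in English so the result does not depend on the server's locale, and none of this makes the scheme any less security by obscurity.

```python
import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def expected_keyword(base, today):
    """Append the number of letters in today's day name to the base keyword."""
    # date.weekday() numbers the days from Monday = 0 to Sunday = 6.
    return "%s%d" % (base, len(DAY_NAMES[today.weekday()]))


def subject_is_accepted(subject, base, today):
    """Accept the mail only if the subject contains today's keyword as a word."""
    return expected_keyword(base, today) in subject.split()
```

A message would be rejected before any parsing happens if subject_is_accepted() returns False for its subject line.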

Another problem is that I might lay myself open to HTML/JavaScript/etc injections, but I think this will be mitigated if I solve the problem above.

A conclusion

This has been a somewhat rambling, somewhat disjointed explication, but I hope it gives you the general gist of what I did. If I ever look at the script again, maybe I'll fix it up properly, and make it publicly available. I even registered phoblog.za.net but that's taking some time. Meantime, enjoy piccies.