Dynamically generating files for periodic secure copying

Here is the situation:

Machine A needs to periodically fetch MySQL dumps from machine B, process them, and produce some report. A fairly trivial problem, and I could think of a number of ways to do this, but they were all slightly inelegant:

  • Periodically create the dump on machine B, and have A fetch it periodically and process it. I didn't like this because the fetch is not triggered by the dump, so there's no sequentiality - you're not guaranteed that A will fetch it immediately when, and only when, B has finished creating the dump.
  • Periodically create the dump on machine B and push it to machine A, which then periodically processes it. This has the same problem - the processing is not triggered when the data arrives on A. I really needed the dump, the transfer and the processing to happen sequentially, triggered by each other.

The obvious thing to do is:

  • Have machine A ssh to machine B and run a script that creates the dump, and then scp the dump back and process it.

However, since I'm automating this to run periodically, I need to set up a passphrase-less key for machine A to ssh to machine B, and I don't like having those lying around unrestricted. Fortunately, ssh has a mechanism to restrict a key: you can add the 'command=""' option to a key, which means that when using that keypair to ssh to a machine, you will only be able to run that specific command. So, having written a script to perform the database dump on machine B, I restricted machine A's key so that it can only run that script when it connects to B.
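To give a rough idea of what such a restricted entry looks like, here is a small Python sketch that formats one line for machine B's authorized_keys file. The helper name, the script path and the key material are all made up for the example, and the extra no-* options are standard sshd options that I'm assuming you would also want on an unattended key.

```python
def restricted_key_line(command, public_key,
                        extra_options=("no-port-forwarding",
                                       "no-X11-forwarding",
                                       "no-agent-forwarding",
                                       "no-pty")):
    """Format one authorized_keys entry that forces `command` for this key."""
    # sshd expects double quotes inside the command to be backslash-escaped
    escaped = command.replace('"', '\\"')
    options = ['command="%s"' % escaped] + list(extra_options)
    return "%s %s" % (",".join(options), public_key.strip())


# Hypothetical script path and placeholder key, for illustration only
example_line = restricted_key_line("/usr/local/bin/dump-db.sh",
                                   "ssh-rsa PLACEHOLDERKEY machineA-dump-key")
```

Printing example_line gives a single line that can be appended to the account's authorized_keys file on machine B.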

But now we have a further problem: since machine A can only run the dump script when ssh'ing to B, it can no longer scp the dump back to itself (since 'scp' runs over ssh, it won't be able to do its thing, because it is restricted to running the dump script only). We could, of course, set up another keypair with a different restriction, although it's not trivial to work out what "command" to allow in order to accept scp - the normal way would be to create a new account, and give it the 'scponly' shell, but now we have a whole new account, and... Do you see where this is going? It just gets more and more complicated and basically inelegant. It's easily doable, but it's just messy.

I was discussing this with Michael Gorven, and realised that even though the command restriction means that machine A will always run the dump script when it connects to machine B, that doesn't mean I can't put stuff in the dump script that lets machine A also do the scp. When a machine tries to run a different command using a restricted key, the original command is just run anyway, but the attempted command is passed in, via the SSH_ORIGINAL_COMMAND variable. So, I thought, I could examine that, see if it was an scp command, and manually run it if it was. This was going to cause security problems, though, because I would have to check exactly what the command was, and might miss some devious ways of getting around my checks and running some other command.
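To make that rejected idea concrete, here is a minimal Python sketch of a forced-command wrapper that inspects SSH_ORIGINAL_COMMAND. The allowed command string and the function names are hypothetical. It only accepts an exact match, which is about the only check I would trust; anything cleverer (prefix or pattern matching) is exactly where the devious workarounds creep in.

```python
import os
import shlex
import sys

# Hypothetical: the one scp invocation machine A is expected to make
ALLOWED_COMMANDS = {"scp -f /tmp/dump.sql.bz2"}


def is_allowed(original_command):
    """Accept only commands that exactly match an allowlisted string."""
    return original_command is not None and original_command in ALLOWED_COMMANDS


def main(environ=os.environ):
    """Run the requested command if it is allowlisted, else refuse."""
    requested = environ.get("SSH_ORIGINAL_COMMAND")
    if not is_allowed(requested):
        sys.stderr.write("command not permitted\n")
        return 1
    # Split into an argument list so that no shell interprets the string
    argv = shlex.split(requested)
    os.execvp(argv[0], argv)
```

Used as the forced command, the wrapper would finish with sys.exit(main()). Even then, it has to know the exact file name in advance, which is part of why I dropped the idea.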

Finally, Michael suggested that I not bother making the distinction between ssh'ing in to do the dump, and connecting in to retrieve it via scp. Just put both steps in one, assume machine A is trying to scp the file, and create and pass back the dump over scp. The way to serve a file over scp is "scp -f $filename" - normal users never see this because it is handled behind the scenes by scp. So here is the final script (with certain details left out):

#!/bin/bash

FNAM=/tmp/dump.$$.sql
mysqldump DB1 Table1 > $FNAM
mysqldump DB2 Table2 >> $FNAM
bzip2 $FNAM
scp -f $FNAM.bz2
rm $FNAM.bz2

Now, no matter how machine A connects to machine B, this script will be run, and unless machine A is running its side of the scp protocol, the dialog will fail. An amusing extra is that no matter what file machine A is trying to retrieve, my database dump will be generated on the fly and sent to it:

$ scp -i mykey_dsa machineB:something.txt .
dump.2835.sql.bz2       100%     6909KB  1.2MB/s  00:03
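For completeness, machine A's side of the job could look something like the following Python sketch: fetch, and only then process. The function names, the dummy remote file name and the "count the INSERT statements" processing step are placeholders I made up. Newer OpenSSH releases changed some of scp's defaults, so extra client flags may be needed there; I have not accounted for that here.

```python
import bz2
import subprocess


def build_fetch_command(key_path, remote_host, local_path):
    """Build the scp invocation; the remote file name is a dummy because
    the forced command on machine B ignores it."""
    return ["scp", "-q", "-i", key_path, "%s:dump" % remote_host, local_path]


def count_insert_statements(local_path):
    """Stand-in for the real processing: count INSERT lines in the dump."""
    with bz2.open(local_path, "rt", encoding="utf-8", errors="replace") as dump:
        return sum(1 for line in dump if line.startswith("INSERT INTO"))


def fetch_and_process(key_path, remote_host, local_path):
    """Fetch the dump, then process it; a failed fetch stops the run."""
    # check=True raises CalledProcessError if scp exits non-zero,
    # so processing only ever runs after a completed transfer
    subprocess.run(build_fetch_command(key_path, remote_host, local_path),
                   check=True)
    return count_insert_statements(local_path)
```

Run from cron on machine A, a single call to fetch_and_process() gives the dump, the transfer and the processing in strict sequence.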

I thought that was cute, and I wanted to share it.

Phoblogging

An apology

The sharp-eyed among you will have noticed that my blog has suddenly become one of those sites. I apologise for flooding your feed readers with pictures of seals, but let me explain.

A justification

The whole "let me post random photos from my life on my blog" thing was more an exercise in "how easy would it be to make a phoblog?" than a desire to share what my shoes look like. I will admit that when I took the sunset photo, I thought "this would be a really good thing to share with the world", because, let's admit it, Cape Town is one ridiculously beautiful city, and people need to hear that. But that got me thinking how easy it would be to make a photo shareable, and here is what I came up with.

An explanation

There is actually a function on my phone labelled "blog this", but I think it sends the image (or whatever) to a Sony-sponsored blogging site, and I'm frankly not interested in that. I wanted to solve the problem academically, for the general case, and as a side effect solve it for my specific case - I run this blog in a Drupal instance on my own server hosted with Layered Tech.

A discussion

So, the various ways to get information from my phone to my server were MMS, email, some form of push to a web-page, or bluetooth/cable upload to a laptop/desktop which will send it on. The last option defeated the point - I wanted to be able to blog a photo from anywhere, using nothing but my phone. Using the web-page push is what the "blog this" function does, but for my specific case, I'd have to write a custom application for the phone, which was way more effort than I wanted to expend. Sending an MMS would require me to have a GSM modem listening somewhere to receive it, and had the added disadvantage of requiring that the images got resized down. So, it seems, the best way to get the information from my phone to my server was to simply send an email (with images attached).

A technical discussion

The rest of this post describes the technical details of what happens to the email when it arrives at my server.

As an overview: I catch mail meant for the phoblog using a procmail recipe, and pipe the mail to a python script, which parses the message and pulls out the relevant parts, constructing the body text, creating thumbnails of the images and saving them in the right place. Having deconstructed the message and constructed the blog post, it passes the bits (title, body, and publication date, which it extracts from the EXIF information in the photos) to a PHP script, which hooks into the Drupal API and actually creates the blog post.

The PHP script is necessary, since there's no other way to hook into the Drupal API. I could do something like faking a bunch of HTTP GETs and POSTs, and passing the information in as if I was actually blogging it from the web interface, but that's even more klunky than simply piping it into a PHP script. The question then arises why I couldn't write the whole thing in PHP, and save myself the expense of running two scripts requiring two different interpreters, but frankly, trying to get PHP to do what is necessary would end in such an inelegant, ugly, hackish result that it just wouldn't be worth it.

An added advantage to separating the Python parser and the PHP script is that you can replace the PHP script with one that injects an entry into a different blogging platform, and it'll still work fine. So, somebody could write a script that talks to Wordpress, and simply drop it into place.


The injector (the PHP script)

The PHP script needs to hook into the Drupal API, so we first need to bootstrap into the Drupal environment. First we fake some HTTP headers in the $_SERVER array so that Drupal knows which site is being "requested" (Drupal does some clever multi-site stuff based on which URL is being requested). Then we change to the Drupal base directory (defined as a constant at the top), include the bootstrap code (also defined at the top), and then simply run the drupal_bootstrap() function:

<?php
// Defined as a constant, could/should be passed as an option or loaded from a config file:
define('PHOBLOG_DRUPAL_URI', 'http://vhata.net/');
// Fairly standard for Drupal installations, but as above:
define('PHOBLOG_DRUPAL_ROOT', '/usr/share/drupal');
define('PHOBLOG_DRUPAL_BOOTSTRAP', 'includes/bootstrap.inc');

// Fake the necessary HTTP headers that Drupal needs:
$drupal_base_url = parse_url(PHOBLOG_DRUPAL_URI);
$_SERVER['HTTP_HOST'] = $drupal_base_url['host'];
$_SERVER['PHP_SELF'] = $drupal_base_url['path'].'/index.php';
$_SERVER['REQUEST_URI'] = $_SERVER['SCRIPT_NAME'] = $_SERVER['PHP_SELF'];
$_SERVER['REMOTE_ADDR'] = NULL;
$_SERVER['REQUEST_METHOD'] = NULL;

// Change to Drupal root dir.
chdir(PHOBLOG_DRUPAL_ROOT);

require_once(PHOBLOG_DRUPAL_BOOTSTRAP);
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
?>

Now we are running in a Drupal environment. The next step is to collect the information that we want to insert as a blog entry. We take the title and publish date from the arguments passed to the script, and then do a loop to read the body from standard input:

<?php
$date = $_SERVER['argv'][1];
$subject = $_SERVER['argv'][2];

$fp = fopen('php://stdin', 'r');
$body = "";
while ($line = fgets($fp, 4096)) {
        $body .= $line;
}
?>

I may be wrong, but I don't think there are any sanitization problems in the above code. Let me know if you can see any. I'm pretty sure I don't need to escape anything, since I pass all variables as-is to Drupal, which does full sanitization before using them. Anyway, the final step is to simply call the Drupal node_save() function to save the blog post as a node (passing it some default values):

<?php
node_save((object)(array('created' => $date,
        'title' => $subject,
        'body' => $body,
        'teaser' => $body,
        'format' => '3',
        'uid' => 1,
        'type' => 'blog',
        'status' => 1,
        'comment' => 2,
        'promote' => 0,
        'sticky' => 0)));
?>

My only worry there is that I specify that the format is type 3 (unfiltered HTML) - this might leave the phoblogger open to code-injection exploits. I should probably specify type 1, filtered HTML, to make sure that nobody can accidentally blog something nasty.

So, that's the PHP script that injects the entry into Drupal. The other part of the system is, of course, the Python script that parses the email in the first place.

The parser (the Python script)

I'm not posting the full script here, for a number of reasons, mostly to do with it "not being finished yet". It works, but it doesn't do everything it should (including a complete security check, since that's kinda hard to implement on emails, which can be faked). Suffice it to say, it uses the optparse, ConfigParser and logging modules to be nicely configurable, runnable, and debuggable, and all that. But, yeah, I'm still embarrassed about it, and won't post source code until I think it's good-looking enough for public consumption. What I will post here is bits of Python code that demonstrate the actual meat of the thing - how I deconstruct and process the email that I receive.

The basic steps I perform are:

  1. Break up the email and extract the bits I need from it.
  2. Process each attachment part:
    • Text attachments get HTML-ified
    • HTML attachments get inserted as-is
    • Image attachments get thumbnailed, and the thumbnails and originals get stored somewhere web-accessible, and a chunk of HTML that references them gets created.
  3. Send the results of this processing to the injector script with the right subject and date.

Breaking up the email is trivial using the email module in Python:

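import sys
import email
import email.header
import email.utils

# Parse the whole message from standard input (procmail pipes it in)
msg = email.message_from_file(sys.stdin)
# Decode the (possibly MIME-encoded) subject header into a unicode string
subject = u''.join(unicode(part, encoding or 'us-ascii') for part, encoding in email.header.decode_header(msg.get('subject')))
# Keep only the bare address from the From: header
msgfrom = email.utils.getaddresses([msg.get('from')])[0][1]
msgid = msg.get('message-id')

# Hand each MIME part to processpiece(); this assumes a multipart message
for piece in msg.get_payload():
   processpiece(piece)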

As you can see, no regular expressions needed to match headers, do MIME decoding, or break up an email address. You can even give it a list of all the different stupid formats for addresses that mail clients seem to use these days, and it will understand them:

>>> getaddresses(["jonathan@vhata.net", '"Jonathan III" <vhata@clug.org.za>', 'pope@vatican.org (Benedict)'])
[('', 'jonathan@vhata.net'),
 ('Jonathan III', 'vhata@clug.org.za'),
 ('Benedict', 'pope@vatican.org')]

I break each attachment up and send them to the processpiece() function one at a time.
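One caveat with looping over get_payload() is that it returns a string rather than a list of parts when the message is not multipart, and it does not descend into nested multipart containers. A more defensive variant, sketched here in Python 3 (the rest of my script is Python 2), uses the walk() method instead. The function names and the sample message are made up for the illustration.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def leaf_parts(msg):
    """Yield every non-container MIME part, however deeply nested."""
    for part in msg.walk():
        # multipart/* parts are only containers for other parts
        if part.get_content_maintype() != "multipart":
            yield part


def build_sample():
    """Build a small nested multipart message to exercise leaf_parts()."""
    outer = MIMEMultipart("mixed")
    outer["Subject"] = "Table Mountain"
    inner = MIMEMultipart("alternative")
    inner.attach(MIMEText("sunset", "plain"))
    inner.attach(MIMEText("<p>sunset</p>", "html"))
    outer.attach(inner)
    # Round-trip through text, the way the message would arrive on stdin
    return email.message_from_string(outer.as_string())
```

Each part that leaf_parts() yields can be handed to processpiece() exactly as before, and a plain single-part email simply yields itself.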

Inside the processpiece() function, I can get at the content-type of the chunk I'm processing by using the get_content_type() method:

>>> piece.get_content_type()
'image/jpeg'
>>> piece.get_content_maintype()
'image'
>>> piece.get_content_subtype()
'jpeg'

and I can use this to work out what I want to do with the chunk. I can also get the chunk in its raw form (i.e. decoded from the MIME transport that email uses) by simply calling get_payload() on it:

payload = piece.get_payload(decode=True)

If it's text, I simply replace all the newlines with HTML line breaks:

payload = payload.replace("\n", "<br />\n")
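A plain replace() passes any markup in the text straight through to the blog post. A slightly more careful version, sketched in Python 3, escapes the text before adding the line breaks. The function name and the default charset are my assumptions, not what the script currently does.

```python
import html


def text_to_html(payload, charset="utf-8"):
    """Decode a text/plain payload, escape it, and add HTML line breaks."""
    text = payload.decode(charset, errors="replace")
    # Email commonly uses CRLF line endings; normalise them first
    text = text.replace("\r\n", "\n")
    # Escape &, <, > and quotes so the text cannot inject markup
    return html.escape(text).replace("\n", "<br />\n")
```

With this, a text part containing "<script>" ends up in the post as harmless visible text instead of live markup.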

The difficult case is, of course, when it's an image. Here, I use the Python Imaging Library to process the image. I extract the EXIF timestamp and turn it into a datetime structure, so that I can create a hierarchical directory tree to store the images. Then, I construct a thumbnail filename and create the thumbnail:

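import datetime
import os
import StringIO
from PIL import Image

# EXIF_DATETIME, TARGETDIR and THUMBSIZE are constants defined elsewhere in the script

# Load the decoded attachment into PIL without writing a temporary file
payload = piece.get_payload(decode=True)
image = Image.open(StringIO.StringIO(payload))

# The EXIF timestamp becomes both the post date and the directory name
timestamp = datetime.datetime.strptime(image._getexif()[EXIF_DATETIME], "%Y:%m:%d %H:%M:%S")
self.entrystamp = timestamp

targetdir = "%04d/%02d/%02d" % (timestamp.year, timestamp.month, timestamp.day)
try:
   os.makedirs("%s/%s" % (TARGETDIR, targetdir), 0755)
except OSError:
   # Most likely the directory already exists
   pass

# Normalise the extension to lower case and derive the thumbnail name
fname = piece.get_filename()
(rootname, ext) = os.path.splitext(fname)
ext = ext.lower()
fname = "%s%s" % (rootname, ext)
thumbname = "%s-thumb%s" % (rootname, ext)

# Save the original, then a copy shrunk to fit THUMBSIZE x THUMBSIZE
image.save("%s/%s/%s" % (TARGETDIR, targetdir, fname))
os.chmod("%s/%s/%s" % (TARGETDIR, targetdir, fname), 0644)
image2 = image.copy()
image2.thumbnail([THUMBSIZE,THUMBSIZE])
image2.save("%s/%s/%s" % (TARGETDIR, targetdir, thumbname))
os.chmod("%s/%s/%s" % (TARGETDIR, targetdir, thumbname), 0644)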

Then I return a templated chunk of text to dump into the blog post. Easy as pie.
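The templated chunk is nothing fancy. Something along these lines would do, sketched in Python 3; the template text, the function name and the base URL parameter are hypothetical stand-ins, since the real template is not shown here.

```python
import html

# Hypothetical template: a thumbnail that links to the full-size image
IMAGE_TEMPLATE = '<a href="{full}"><img src="{thumb}" alt="{alt}" /></a>'


def image_chunk(base_url, targetdir, fname, thumbname):
    """Return the HTML that links a thumbnail to its full-size image."""
    prefix = "%s/%s" % (base_url.rstrip("/"), targetdir)
    # Escape every value, since the file name comes from the email
    return IMAGE_TEMPLATE.format(
        full=html.escape("%s/%s" % (prefix, fname)),
        thumb=html.escape("%s/%s" % (prefix, thumbname)),
        alt=html.escape(fname),
    )
```

The targetdir, fname and thumbname arguments correspond to the variables of the same names in the image-handling code above.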

The last step is to pipe the individually formatted pieces to the injector script, passing it the date (extracted from the EXIF information above) and subject as parameters:

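# strftime("%s") yields seconds since the epoch, but it relies on the platform's C library
injector = subprocess.Popen([ADDCMD, entrystamp.strftime("%s"), "Phoblog: %s" % subject],stdin=subprocess.PIPE)
for piece in body:
   injector.stdin.write(piece)
# communicate() closes the pipe and waits for the injector to finish
injector.communicate()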

And off it goes.

Some concerns

First and foremost, security is a problem. If I'm sending an email from my phone, anybody can send the same email from their own phone - there is no identification in the email. One way around this would be to require a keyword in the subject before accepting it. This is security by obscurity - anybody who gets hold of the keyword will be in. I can decrease this risk by forcing some sort of hash on the keyword. For example, if the keyword was "pilates", I could require that the number of letters in the name of the current day be appended to that: "pilates6" on a Sunday, "pilates7" on a Tuesday. This slightly decreases the risk, but not much. There are other, even cleverer variations on this theme, but they are all basically just security by obscurity. A better way would be to use authenticated SMTP, and only accept phoblog messages that were authenticated through my own SMTP server, and I think I might implement this, unless I can think of a flaw in the idea.
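The day-of-week keyword idea is easy enough to sketch in Python 3. I've matched the "pilates6" and "pilates7" examples, which work out to the length of the day name. The function names are made up, and requiring the keyword to be the first word of the subject is my own assumption. As noted, this is still only obscurity.

```python
import datetime

# Hard-coded so the result does not depend on the system locale
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def expected_keyword(base, day):
    """Append the length of the weekday name to the base keyword."""
    # date.weekday() returns 0 for Monday through 6 for Sunday
    return "%s%d" % (base, len(DAY_NAMES[day.weekday()]))


def subject_is_accepted(subject, base, day):
    """True if the first word of the subject is the keyword for `day`."""
    words = subject.split()
    return bool(words) and words[0] == expected_keyword(base, day)
```

The parser would call subject_is_accepted() with datetime.date.today() before doing any other work, and silently drop anything that fails.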

Another problem is that I might lay myself open to HTML/javascript/etc injections, but I think this will be allayed if I solve the problem above.

A conclusion

This has been a somewhat rambling, somewhat disjointed explication, but I hope it gives you the general gist of what I did. If I ever look at the script again, maybe I'll fix it up properly, and make it publicly available. I even registered phoblog.za.net but that's taking some time. Meantime, enjoy piccies.
