Auto Generating PDF Covers on file upload with Django and ImageMagick

Note: This used to be on my tumblr which I shut down a bit ago. So if it looks familiar, it is.

Python Thumbnailing of a PDF

The following are simple instructions on creating a thumbnail from an uploaded PDF using ImageMagick. Our project calls ImageMagick as a terminal command from within a Django Project.

Environment:

I was developing on an iMac running OSXv10.6 (Snow Leopard), additionally we had a developer on a MacBook Pro running OSXv10.7 (Lion), and our production environment running Fedora release 14 (Laughlin).

Install ImageMagick and do a dry run

OSX with MacPorts: Get MacPorts here.

$ sudo port selfupdate
$ sudo port install ImageMagick

OSX with Homebrew: Get Homebrew here.

$ brew update
$ brew install imagemagick

Fedora with yum:

$ sudo yum install ImageMagick

Ubuntu with apt-get:

$ sudo apt-get install libmagickwand-dev imagemagick libmagickcore-dev

Testing to make sure ImageMagick Is Working

First, grab a PDF off the internet if you don’t have one handy. (I’ve been debugging some PayPal lately and happened to have the url for this readily available)

$ wget https://cms.paypal.com/cms_content/en_US/files/developer/PP_Sandbox_User_Guide.pdf

Next, test to make sure you can make a thumbnail from that PDF.

$ convert -thumbnail 222 PP_Sandbox_User_Guide.pdf[0] test.png

In this command we are making a 222px wide thumbnail of the [0]th, or first, page of the supplied PDF. The thumbnail is saved as the output file test.png.

If everything looks good at this point lets get it integrated into your django project.

Integration into Django

Requirement: A user should be able to upload a PDF to a model and a thumbnail should be generated and saved into a separate thumbnail field on that model instance.

An Example Model

from django.db import models
from django.db.models.signals import post_save
from django.utils.translation import ugettext as _
from django.conf import settings
import subprocess

class Pdf(models.Model):
    """Pdf class"""
    title = models.CharField(
        verbose_name = _(u'Title'),
        help_text = _(u'Commentary Title'),
        max_length = 255
    )
    slug = models.SlugField(
        verbose_name = _(u'Slug'),
        help_text = _(u'The unique uri component for this commentary'),
        max_length = 255,
        unique = True
    )
    document = models.FileField(
        verbose_name = _(u'The Pdf'),
        help_text = _(u'Upload a pdf document.'),
        upload_to = 'uploads/pdfs/'
    )
    thumbnail = models.ImageField(
        verbose_name = _(u'Thumbnail'),
        help_text = _(u'The thumbnail'),
        upload_to = 'uploads/pdfs/',
        blank = True, 
        null = True
    )

    def save(self):
        thumbnail = "uploads/pdfs/%s.png" % (self.slug,)
        self.thumbnail = thumbnail
        super(Pdf, self).save()
    
    def __unicode__(self):
        return self.title

# What to do after a PDF is saved
def pdf_post_save(sender, instance=False, **kwargs):
    """This post save function creates a thumbnail for the commentary PDF"""
    pdf = Pdf.objects.get(pk=instance.pk)
    command = "convert -quality 95 -thumbnail 222 %s%s[0] %s%s" % (settings.MEDIA_ROOT, pdf.document, settings.MEDIA_ROOT, pdf.thumbnail)

    proc = subprocess.Popen(command,
        shell=True,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    stdout_value = proc.communicate()[0]

# Hook up the signal
post_save.connect(pdf_post_save, sender=Pdf)

You will notice that most of the magic is happening in the overridden save and post save methods of my Pdf class. Firstly, In my overridden save method I set up the field value for the thumbnail. It is the relative url to the image from my MEDIA_ROOT variable.

I then add hook into the post_save signal. Using signals is necessary in this instance, because prior to the save function being run, the document hasn’t been moved into the expected upload location, and therefore the convert command will fail. Inside the post_save signal, run the thumbnailing command against the system using popen. Originally, I set this up using os.system, however I quickly realized that I needed more debugging information than was returned.

The Django Admin

from django.contrib import admin
from django import forms
from publications.models import Pdf
from publications.widgets.admin import AdminPdfThumnailWidget

class PdfAdminForm(forms.ModelForm):
    class Meta:
        model = Pdf
        widgets = {
            'thumbnail' : AdminPdfThumnailWidget(),
        }

class PdfAdmin(admin.ModelAdmin):
    """Pdf Administration"""
    form = PdfAdminForm

prepopulated_fields = {"slug": ("title",)}

Above is a pretty simple ModelAdmin, with a overridden widget on the thumbnail field of the ModelForm.

And the custom Widget: AdminPdfThumnailWidget

from django.contrib.admin.widgets import AdminFileWidget
from django.utils.translation import ugettext as _
from django.utils.safestring import mark_safe

class AdminPdfThumnailWidget(AdminFileWidget):
    
    def render(self, name, value, attrs=None):
        output = []
    
        if value != None:
            output.append(u'<img alt="%s" src="%s" />' % (value.url, value.url,))
        else :
            output.append(_(u'Thumbnail will be automatically generated from uploaded document.'))

        # This is commented out b/c maybe you want to be able to override the thumbnail?
        #output.append(super(AdminFileWidget, self).render(name, value, attrs))
    
        return mark_safe(u''.join(output))

this is pretty straight forward, it takes an AdminFileWidget and replaces the render output with an <img> tag

Written on March 6, 2013