Fork me on GitHub

Julython Code Refresh

When we launched over 6 months ago, Julython was built on Google Appengine. The site was built with Django, but used the built-in datastore for models. Google Appengine provides a complete set of tools to run any web application. Running on a PAAS removes a lot of the headaches associated with running a production site.

Then last month I started a new job at Rackspace working on the Cloud Database product, which is open source. It is really great to work for a company that recognizes the power of open source. So I got to work on converting the site to use standard Django models, Tastypie, Django Social Auth, and South. I am quite pleased with the results. Ironically the site now takes advantange of more open source projects, which we are trying to promote :)

Architecture

The new site uses a couple of products from the cloud services at Rackspace. First there is a Cloud Load Balancer in front of two 512 MB Cloud Servers. The two web nodes are running a nginx proxy in front of gunicorn. The web nodes talk to a 512 MB MySQL Cloud Database.

Appengine has a great tool for monitoring performance of your web apps. The dashboard has a number of graphs as well to check on the health of your web app. This is hard to give up, which is why I turned to New Relic. With New Relic you get much more data; I do not regret the switch.

During J(an)ulython I'll be working on my project cannula, which is a deployment tool for websites. Using cannula I can deploy the site like so:

$ git push cannula master

Living on the Edge

Even though Django 1.5 is still in beta, it is worth running to get the new configurable User model. This made the transition super easy from webapp2 user models on Appengine to a custom class to interact with Django Social Auth properly. Webapp2 stores the auth_ids in the format 'provider:uid' in a list property on the User class. It also provides methods to add new auth ids, etc. Now with Django 1.5 you can alter the User Class to add custom fields or methods. Here is the User model from Julthon:

from django.db import models
from django.contrib.auth.models import AbstractUser
from social_auth.models import UserSocialAuth

class User(AbstractUser):
    description = models.TextField(blank=True)
    url = models.URLField(blank=True, null=True)
    # more custom fields here

    def add_auth_id(self, auth_str):
        """
        Example::
            user = User.objects.get(username='foo')
            user.add_auth_id('email:foo@example.com')
        """
        provider, uid = auth_str.split(':')
        UserSocialAuth.create_social_auth(self, uid, provider)

    @classmethod
    def get_by_auth_id(cls, auth_str):
        """
        Example::
            user = User.get_by_auth_id('twitter:julython')
        """
        provider, uid = auth_str.split(':')
        sa = UserSocialAuth.get_social_auth(provider, uid)
        if sa is None:
            return None
        return sa.user

    @property
    def auth_ids(self):
        auths = self.social_auth.all()
        return [':'.join([a.provider, a.uid]) for a in auths]

Note

Using the provider 'email' you can store multiple email addresses for the user and let Django Social Auth handle the uniqueness of email addresses.

Now with very little change to the api endpoint for commits it is easy to associate a commit with a user by email address.

class Commit(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, blank=True, null=True)
    hash = models.CharField(max_length=255, unique=True)
    author = models.CharField(max_length=255, blank=True)
    name = models.CharField(max_length=255, blank=True)
    email = models.CharField(max_length=255, blank=True)
    message = models.CharField(max_length=2024, blank=True)

    @classmethod
    def user_model(cls):
        """No need to import our custom user model."""
        return cls._meta.get_field('user').rel.to

    @classmethod
    def create_by_email(cls, email, commits, project=None):
        """Create a commit by email address"""
        user = cls.user_model().get_by_auth_id('email:%s' % email)
                # create the commit here

You could accomplish everything just by interacting with Django Social Auth models. It just seems a little cleaner to use methods on the User model to get a user or add properties to it. The commit model in this case really should not care about the Social Auth models.

SQL, Oh How I Missed You

The Appengine Datastore is a NoSQL database which is really easy to work with. There is no configuration needed, and you don't have to create tables or alter your schema in order to make changes. My biggest complaints of the Datastore are the lack of a (fast and complete) count method and if you want to do a query there has to be an Index built for it. While there is more overhead associated with SQL databases, some things are much easier to do, which is one reason we decided to make the switch.

Location and Team Totals

One major pain point with the code last July was when people changed their location or team. Since all of the data was denormalized this meant that both location or team totals needed to be updated. Appengine has a nice builtin deferred task tool to spawn background tasks to do this. But in the SQL world this is just a simple JOIN query. OK, it is slightly complex, but still easy to pull off with a raw query.

First, here is the Location model:

class Location(models.Model):
    slug = models.SlugField(primary_key=True)
    name = models.CharField(max_length=64, blank=False)
    total = models.IntegerField(default=0) # this field is never updated!

The total field is never actually updated; it is just there to have a property to display the total from the raw query. If you have never used a raw query this is a great use case for it. Raw queries allow you to run any custom SQL and return the Model objects. In this example, the total for any one location is the sum of all the people in that location. Here is what that looks like in SQL:

SELECT july_user.location_id AS slug,
    people_location.name AS name,
    SUM(game_player.points) AS total
    FROM game_player, july_user, people_location
    WHERE game_player.user_id = july_user.id
    AND july_user.location_id = people_location.slug
    AND game_player.game_id = %s
    GROUP BY july_user.location_id
    ORDER BY total DESC
    LIMIT 50;

The magic is all in the "GROUP BY" statement. This SQL takes all the players (people who commited during the month), groups them by their location and sums up all their scores. Also you will notice it's returning the fields ('slug', 'name', 'total'), which are the same fields on the Location model. All it needs is the game id and it will return the top 50 locations:

>>> query = Location.objects.raw(LOCATION_SQL, [1])
>>> locations = [l for l in query]
>>> print locations
[<Location: Atlanta, GA>,
 <Location: Philadelphia, PA>,
 <Location: Boston, MA, USA>,
 <Location: Austin, TX>,
 ...

>>> atl = locations[0]
>>> print atl.name
Atlanta, GA
>>> print atl.total
1625

Warning

I you plan on using raw queries be sure not to use string formatting on the SQL. This will protect you from SQL injection attacks.

This is cleaner and more exact compared to the old code. Now the locations and teams are up to date without a deferred task or other background tasks. Normalization FTW!

Migrations

The Datastore on Google Appengine like other NoSQL databases does not require any schema modifications or table creation staments. This is great for development, as you can freely change your data models or add new ones without any errors or extra work. Tranditional relational databases require a bit more hand holding. If you haven't heard it yet you should use South to manage migrations. On top of managing the standard alter table statements it also provides support for data migrations as well.

Wrap up

Moving away from a complete PAAS solution can be a little scary. Thankfully there are a number of projects that can help your transition. I plan on detailing other aspects of the site in this blog as well. Hit the comment section up if you have any questions!