Field Notes Inside an Integrated Communications Agency


  • Stronger Validators for the Django Admin

    In the current development version of Django, model field validation in the admin has one difficult shortcoming: validator functions don't know if the fields they are validating are for an object that is being created or edited. In the common case, this isn't a problem, but certain problems cannot be solved without this information. In particular, these problems can't be solved with Django validators:

    • Automatically creating unique URL slugs from other fields in the model (e.g. the "title" CharField)
    • Preventing self-referencing foreign key infinite loops

    Consider this partial implementation of a BlogArticle model with URLs like "/2008-3-11/my-article/". This scheme allows articles to share a title (more precisely, a slug) so long as they are published on different dates:

    #...
    class BlogArticle(models.Model):
       title = models.CharField(
          'Name',
          max_length = 100,
       )
       slug = models.CharField(
          'Slug',
          max_length = 100,
          editable = False,
       )
       published = models.DateField(
           'Date',
       )
       # more fields here...

       class Admin:
          # options...
          pass

       class Meta:
          unique_together = (('slug', 'published',),)

       def save(self):
          from django.template.defaultfilters import slugify
          self.slug = slugify(self.title)
          super(BlogArticle, self).save()

       def get_absolute_url(self):
          p = self.published
          # the slug lives on the article, not on the date object
          return "/%d-%d-%d/%s/" % (p.year, p.month, p.day, self.slug,)
    #...

    This model has a "slug" field that gets populated from the "title" field in the "save()" method. This seems okay at first glance, but consider what would happen if a user entered these two articles in the admin:

    # first article
    title = "My Article"
    published = "2008-3-11"

    # second article
    title = "my article"
    published = "2008-3-11"

    Despite the change in letter case, these articles will have the same slug and the same date, so the produced URLs won't be unique. Unfortunately, the "unique_together" constraint will not help us here in the admin, because "slug" is not calculated until "save" is called, which happens after all validators have run. This situation will produce a database exception rather than a validation error, resulting in an ugly and confusing user experience. This little experiment also shows that adding a second "unique_together" constraint for ('title', 'published',) would be ineffective, because we could still get non-unique URLs from two titles that differ only in letter case or in other details that slugification discards.
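
    The collision is easy to reproduce outside Django. The sketch below uses "simple_slugify", a hypothetical stand-in that only approximates what Django's slugify filter does (lowercase, drop punctuation, hyphenate whitespace); it is not the framework's implementation.

```python
import re

def simple_slugify(value):
    """Rough stand-in for Django's slugify filter (an approximation only)."""
    # drop anything that is not a word character, whitespace, or a hyphen
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    # collapse runs of whitespace and hyphens into a single hyphen
    return re.sub(r"[-\s]+", "-", value)

titles = ["My Article", "my article", "My  Article!"]
slugs = [simple_slugify(t) for t in titles]

# Three distinct titles, one slug: a uniqueness check on the raw title
# would let all of them through.
print(slugs)
print(len(set(titles)), "titles ->", len(set(slugs)), "slug")
```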

    To solve this properly, we'll need to write a validator for the "title" field that checks whether the slugified version of the title is actually unique for the given date. It should look something like this:

    def validate_title(field_data, all_data):
       # do nothing if 'title' or 'published' didn't pass initial validation
       if 'title' not in all_data or 'published' not in all_data:
          return

       # calculate the slug and published date and check if another article with the
       # same slug and published date already exists
       from django.template.defaultfilters import slugify
       from datetime import date
       slug = slugify(field_data)
       published = date(*[int(d) for d in all_data['published'].split('-')])

       if BlogArticle.objects.filter(slug = slug, published = published).count():
          # error ???
          pass

    This looks almost right. It will work perfectly when we're creating a new BlogArticle, but it will prove to be a miserable bug when editing an existing BlogArticle: the query matches the very article being edited, so we'll be forced to change the title or publish date just to get the validation to pass. To correct this logic, the validator needs access to the BlogArticle object being edited. Suppose the validator has access to this object, call it "original_object," which will be set to None when we are creating an object. Then the validator would look like this:

    def validate_title(original_object, field_data, all_data):
       # do nothing if 'title' or 'published' didn't pass initial validation
       if 'title' not in all_data or 'published' not in all_data:
          return

       # calculate the slug and published date
       from django.core.validators import ValidationError
       from django.template.defaultfilters import slugify
       from datetime import date
       slug = slugify(field_data)
       published = date(*[int(d) for d in all_data['published'].split('-')])

       # we need to check for uniqueness if we are creating a new object or
       # the existing object has changed
       if not original_object or original_object.slug != slug or original_object.published != published:
          if BlogArticle.objects.filter(slug = slug, published = published).count():
             raise ValidationError("A blog article with a similar title was already published on this date. Please change the title.")

    This will accomplish the validation we need, but there's a big problem here: how do we figure out the "original_object"? We could try to query for an existing object based on some data in "all_data," but that would be unreliable, because "all_data" is being edited by the user and may no longer match an existing object in the database. We know that the admin view code must be aware of whether an object is being added or edited. If you take a quick look in the admin view functions (django.contrib.admin.views.main), you'll see an "add_stage" and a "change_stage" function. These functions instantiate an object of your model class's AddManipulator or ChangeManipulator, respectively, to accomplish the adding or editing process. The ChangeManipulator object stores the object being edited in its "original_object" attribute, and the AddManipulator has no such attribute. We need to find a point to hook into the admin / field / manipulator code to gain access to this "original_object" so we can pass it to our validator.

    The validation occurs after POST when the manipulator object's "get_validation_errors" function is called. This function calls "get_validation_errors" for each oldforms.FormField object in the manipulator (the manipulator's "fields" are all objects that subclass django.oldforms.FormField). The oldforms.FormField's "get_validation_errors" function calls each validator function in its "validator_list" attribute, passing each function "field_data" and "all_data," the value of the current field being processed and all the POSTed values, respectively. This is a clean mechanism for handling validation, but the oldforms.FormFields' "get_validation_errors" functions are still hopelessly unaware of the current object being edited.
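
    To make that call chain concrete, here is a much-simplified sketch of it in plain Python. "SketchManipulator" and "SketchFormField" are hypothetical stand-ins, not the real oldforms classes, and ValueError stands in for the framework's ValidationError. Note that nothing in the chain ever hands a validator the object being edited.

```python
class SketchFormField:
    """Hypothetical stand-in for an oldforms.FormField."""

    def __init__(self, field_name, validator_list):
        self.field_name = field_name
        self.validator_list = validator_list

    def get_validation_errors(self, all_data):
        errors = []
        field_data = all_data.get(self.field_name)
        for validator in self.validator_list:
            try:
                # validators only ever receive these two arguments
                validator(field_data, all_data)
            except ValueError as exc:  # stand-in for ValidationError
                errors.append(str(exc))
        return {self.field_name: errors} if errors else {}


class SketchManipulator:
    """Hypothetical stand-in for an Add/ChangeManipulator."""

    def __init__(self, fields):
        self.fields = fields

    def get_validation_errors(self, all_data):
        errors = {}
        for field in self.fields:
            errors.update(field.get_validation_errors(all_data))
        return errors


def require_non_empty(field_data, all_data):
    if not field_data:
        raise ValueError("This field is required.")


manipulator = SketchManipulator([SketchFormField("title", [require_non_empty])])
bad = manipulator.get_validation_errors({"title": ""})
good = manipulator.get_validation_errors({"title": "My Article"})
print(bad)
print(good)
```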

    The trick that I've discovered is to hook into the code at the point where the manipulator's oldforms.FormField objects are created. At this point in the code, the "validator_list" can be modified with more powerful validators. When the admin views construct an AddManipulator or ChangeManipulator object, the manipulator constructor creates an oldforms.FormField object for each field in your model class (for your reference, the model class fields inherit from django.db.models.fields.Field). The manipulator constructor accomplishes this by calling each model field's "get_manipulator_fields" function, passing itself to each function as a parameter. The "get_manipulator_fields" function creates the oldforms.FormField object and, of course, sets its "validator_list" from the corresponding attribute stored in the model field object. At this point, we can hook in and modify the "validator_list" before the framework creates the oldforms.FormFields. I suggest doing this by subclassing a model field class. For django.db.models.fields.CharField, it looks like this:

    class ContextValidatedCharField(models.CharField):
       def __init__(self, context_validators = [], *args, **kwargs):
          # call the superclass constructor
          super(ContextValidatedCharField, self).__init__(*args, **kwargs)

          # keep track of the original validator list
          import copy
          self._orig_validator_list = copy.deepcopy(self.validator_list)

          # context_validators can be a single function or a list of functions
          if type(context_validators) is not list:
             context_validators = [context_validators,]
          self.context_validators = context_validators

       def get_manipulator_fields(self, opts, manipulator, change, name_prefix = '', rel = False, follow = True):
          # pass on original_object information to the custom validator(s)
          from django.utils.functional import curry
          new_validators = []
          for validator in self.context_validators:
             # convert the three-parameter validator into a two-parameter function by
             # currying in the original_object as the first parameter
             new_validators.append(curry(validator, getattr(manipulator, 'original_object', None)))
          self.validator_list = self._orig_validator_list + new_validators

          # just use the framework, which will incorporate our modified validator_list
          return super(ContextValidatedCharField, self).get_manipulator_fields(opts, manipulator, change, name_prefix, rel, follow)

       # we need this to make the field behave correctly
       def get_internal_type(self):
          return "CharField"
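
    The currying step is the heart of the trick, so it may help to see it in isolation. This sketch uses functools.partial from the standard library as a stand-in for django.utils.functional.curry (for binding a leading positional argument the two should behave the same way); the two "Fake" manipulator classes and "bind_context" are hypothetical names used only for illustration.

```python
from functools import partial

def validate_title(original_object, field_data, all_data):
    # three-parameter validator; here it just reports what it received
    return (original_object, field_data, all_data)

class FakeChangeManipulator:
    # a ChangeManipulator carries the object being edited
    original_object = "existing article"

class FakeAddManipulator:
    # an AddManipulator has no original_object attribute at all
    pass

def bind_context(validator, manipulator):
    """Turn a three-parameter validator into a two-parameter one."""
    return partial(validator, getattr(manipulator, "original_object", None))

edit_validator = bind_context(validate_title, FakeChangeManipulator())
add_validator = bind_context(validate_title, FakeAddManipulator())

# The framework would call both with just (field_data, all_data).
data = {"title": "My Article"}
print(edit_validator("My Article", data))
print(add_validator("My Article", data))
```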

    Now, in our BlogArticle model, we'll change the "title" field to this:

       title = ContextValidatedCharField(
          verbose_name = 'Name',
          context_validators = [validate_title,],
          max_length = 100,
       )

    Note the new parameter "context_validators" that accepts our modified validate_title validator function. Now, slugs will be safely created from the title field and all blog articles will have unique URLs.
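
    The add-versus-edit behavior we are after can be checked without a database. In this sketch, the "EXISTING" list is a hypothetical in-memory stand-in for the BlogArticle table, and "check_unique" mirrors only the decision logic of validate_title, returning a message instead of raising.

```python
from collections import namedtuple
from datetime import date

Article = namedtuple("Article", "slug published")

# hypothetical in-memory stand-in for the BlogArticle table
EXISTING = [Article("my-article", date(2008, 3, 11))]

def is_taken(slug, published):
    return any(a.slug == slug and a.published == published for a in EXISTING)

def check_unique(original_object, slug, published):
    """Return an error message, or None if the slug/date pair is acceptable."""
    changed = (
        original_object is None
        or original_object.slug != slug
        or original_object.published != published
    )
    if changed and is_taken(slug, published):
        return "A blog article with a similar title was already published on this date."
    return None

saved = EXISTING[0]
day = date(2008, 3, 11)
adding_duplicate = check_unique(None, "my-article", day)
resaving_unchanged = check_unique(saved, "my-article", day)
editing_to_free_slug = check_unique(saved, "another-title", day)
print(adding_duplicate)
print(resaving_unchanged)
print(editing_to_free_slug)
```

    Only the first call reports an error; re-saving an unchanged article is allowed, which is exactly what the two-parameter validator could not manage.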

    Now, what about preventing infinite loops in self-referencing foreign keys? As a concrete example of this problem, consider this: We want to build a model representing a simple website's navigation system. We'll create a NavigationNode model that stores each node's parent NavigationNode (with top-level nodes having no parent) and information about the URL and page to display. A stripped-down version of the model might look like this:

    class NavigationNode(models.Model):
       parent = ContextValidatedForeignKey(
          'self',
          verbose_name = 'Parent Node',
          context_validators = [validate_parent,],
          blank = True,
          null = True,
       )
       slug = models.SlugField(
          'Slug',
       )
       full_path = models.CharField(
          max_length = 255,
          editable = False,
       )
       page = models.ForeignKey(
          FlatPage, # assume that we have a model called FlatPage
          verbose_name = "Flat Page",
       )
       # more fields here...

       class Admin:
          pass

       def save(self):
          # store the full path for easy URL lookups and other performance optimizations
          if self.parent:
             self.full_path = self.parent.full_path + self.slug + '/'
          else:
             self.full_path = '/' + self.slug + '/'
          super(NavigationNode, self).save()

       def get_absolute_url(self):
          return self.full_path

    Here, we've used a new field subclass called "ContextValidatedForeignKey" that works similarly to the "ContextValidatedCharField" described above. Its definition looks like this:

    class ContextValidatedForeignKey(models.ForeignKey):
       def __init__(self, to, context_validators = [], *args, **kwargs):
          # call the superclass constructor
          super(ContextValidatedForeignKey, self).__init__(to, *args, **kwargs)

          # keep track of the original validator list
          import copy
          self._orig_validator_list = copy.deepcopy(self.validator_list)

          if type(context_validators) is not list:
             context_validators = [context_validators,]
          self.context_validators = context_validators

       def get_manipulator_fields(self, opts, manipulator, change, name_prefix = '', rel = False, follow = True):
          # pass on context information to the custom validator(s)
          from django.utils.functional import curry
          new_validators = []
          for validator in self.context_validators:
             new_validators.append(curry(validator, getattr(manipulator, 'original_object', None)))
          self.validator_list = self._orig_validator_list + new_validators

          return super(ContextValidatedForeignKey, self).get_manipulator_fields(opts, manipulator, change, name_prefix, rel, follow)

       def get_internal_type(self):
          return "ForeignKey"

    Finally, we need to implement the validator:

    def validate_parent(original_object, field_data, all_data):
       from django.core.validators import ValidationError

       # if previous validation of parent or slug failed, skip
       if 'parent' not in all_data or 'slug' not in all_data:
          return

       # get the parent and slug values
       slug = all_data['slug']
       if field_data:
          parent_id = int(field_data)
       else:
          parent_id = None

       # if we're creating a new object or the existing object has changed, we need to verify
       # that the URL will be unique
       if not original_object or original_object.parent_id != parent_id or original_object.slug != slug:
          # first check if the slug is okay here
          if parent_id:
             if NavigationNode.objects.filter(parent__id = parent_id, slug = slug).count():
                raise ValidationError("Another node already uses this URL. Please change the slug.")
          else:
             if NavigationNode.objects.filter(parent__isnull = True, slug = slug).count():
                raise ValidationError("Another root node already uses this URL. Please change the slug.")

       # next, we need to check for an infinite loop
       # if we're editing an object, we need to verify that the object doesn't exist anywhere
       # in its ancestor path
       if original_object and parent_id:
          p_id = parent_id
          while p_id:
             # stop climbing if an ancestor row has gone missing
             try:
                parent = NavigationNode.objects.get(pk = p_id)
             except NavigationNode.DoesNotExist:
                break

             if original_object.id == parent.id:
                raise ValidationError("Recursive path detected! This node cannot be in its own parent path.")

             # move up the path
             p_id = parent.parent_id

    And that's it. An attempt to solve a similar problem was written up on djangosnippets. The author wrote a standard validator, but the code falls short of solving the problem, because it assumes that the slug field does not change. The code is still prone to producing an infinite loop.
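
    The ancestor walk in validate_parent can also be sketched on its own, with a plain dictionary standing in for the NavigationNode table. The function name "creates_cycle" and the "parent_of" mapping are hypothetical, and the "seen" set is an extra guard (not in the validator above) against a loop that already exists in the data.

```python
def creates_cycle(node_id, new_parent_id, parent_of):
    """Return True if giving node_id the parent new_parent_id would form a loop.

    parent_of maps each node id to its parent id (None for root nodes).
    """
    seen = set()
    current = new_parent_id
    while current is not None and current not in seen:
        if current == node_id:
            return True
        seen.add(current)
        # move up the path
        current = parent_of.get(current)
    return False

# 1 is a root, 2 is its child, 3 is its grandchild
parent_of = {1: None, 2: 1, 3: 2}

under_own_grandchild = creates_cycle(1, 3, parent_of)
legal_move = creates_cycle(3, 1, parent_of)
own_parent = creates_cycle(2, 2, parent_of)
print(under_own_grandchild, legal_move, own_parent)
```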

    Clearly, this "ContextValidatedField" method is a bit of a hack. A better solution might be to modify the Django framework to pass the original object data to all validator functions, preferably in a backwards-compatible way. One obvious problem with that approach is that validator functions could do perverse things like delete the original object. Maybe a better solution would be to have the framework pass another dictionary called "old_data" or "original_data" as a third parameter to each validator function that would contain all of the original object's data but would provide no mechanism to alter the original object.
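
    As a rough illustration of that last idea, a read-only snapshot could be built with types.MappingProxyType from the standard library. Everything here is an assumption made for the sake of the sketch: the "snapshot" helper, the stand-in "Article" class, and the validator signature with an optional third "old_data" parameter are not part of Django.

```python
from types import MappingProxyType

def snapshot(obj, field_names):
    """Copy the named attributes into a read-only mapping."""
    return MappingProxyType({name: getattr(obj, name) for name in field_names})

class Article:
    # stand-in for a model instance
    def __init__(self, slug, published):
        self.slug = slug
        self.published = published

def validate_title(field_data, all_data, old_data=None):
    # old_data is None when adding and a read-only mapping when editing
    if old_data is not None and old_data["slug"] == all_data["slug"]:
        return "unchanged"
    return "needs uniqueness check"

article = Article("my-article", "2008-3-11")
old_data = snapshot(article, ["slug", "published"])

print(validate_title("My Article", {"slug": "my-article"}, old_data))
print(validate_title("My Article", {"slug": "my-article"}))

# The validator can read the original values but cannot alter them.
try:
    old_data["slug"] = "tampered"
except TypeError:
    print("old_data is read-only")
```

    Making "old_data" an optional trailing parameter is one way the change could stay backwards-compatible with existing two-parameter validators, though the framework would still need a way to decide which validators accept it.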

    I wasn't able to find any Trac tickets or other information with a quick search of djangoproject.com, and I think this problem may merit opening a ticket. I welcome any comments or ideas on how to solve this problem more cleanly or easily in Django.