Archive for the ‘Ordenadores’ Category

First impressions with Liferay

Sunday, October 9th, 2011

A couple of weekends ago I had my first experience customizing a Liferay site. I wrote a very simple theme to change the standard look & feel and I also wrote some portlets in several different languages. The goal was to find out which portlet technology best suited our needs.

Writing the theme was not difficult at all. Liferay has an excellent SDK for writing plugins, which includes themes. I didn’t start from scratch but used the great HTML5Goodness responsive theme. The main navigation menu was easy to do since there is a method to iterate over first-level pages. However, when I tried to do the same with the footer menu I couldn’t find an easy and clean way to do it. I wanted to put standard links such as the Terms of Use and the Privacy Policy in the footer menu, but I didn’t know how to organize this content in the Liferay CMS so that I could retrieve it easily. I thought about using a portlet for the footer, but I don’t think that’s the way to do it since it would affect the portlet layout of every page. So I added some variables to the XML file that describes the theme and hardcoded the links there. At least I don’t have to change the theme code if any of those links changes.

About the portlets, we needed to write a portlet that pulls content from an external service via REST calls and renders it nicely using some kind of templates. These were the attempts I made and my conclusions:

  • Portlet written in Java: this was the obvious choice. The advantages were full access to the Liferay API and easy integration of the portlet with the standard SDK procedures. The disadvantage was, well, that it has to be written in Java. We are far less productive in Java than in other languages. Just making an HTTP request is quite involved. Fortunately, Liferay has APIs that make this easier.
  • Portlet written in Javascript: this looked promising and was easy to set up. The problem was that the importPackage and importClass functions were not available in the Rhino environment. This made Javascript just a toy language in Liferay, since the language itself has no useful standard library and has to rely on the runtime it runs on for network, filesystem or any other calls. This makes Javascript a very embeddable language but also a very dependent one. If we can’t call the Liferay API from Javascript and there are no network functions in the language itself, we can’t use it for our purposes.
  • Portlet written in Python: our last try was writing the portlet in Python. First we had to update the Jython jar included in Liferay since it was a little bit old. Then we added the jyson jar to the Jython jar itself to get JSON parsing support. We also used the Liferay network APIs since we couldn’t import urllib2 from Python. Finally we even managed to use the Python debugger (PDB) by running Tomcat in the foreground (bin/catalina.sh run).

One important thing to know when writing portlets in a scripting language like Python or Javascript is that Liferay concatenates all your modules into a single big file before running it in the scripting engine (Jython or Rhino, in our case). This matters when reading error messages, because the line numbers are not always what you expect. In Python we can avoid this behavior by changing the module search path (the PYTHONPATH) dynamically at runtime, at the beginning of our script; after that, importing our regular modules works again.
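A minimal sketch of that path trick. In Python (Jython included) the module search path is available at runtime as the list sys.path. The helper name add_module_dir and the idea of keeping the modules in one known directory are assumptions made for illustration, not something Liferay provides.

```python
import os
import sys


def add_module_dir(path):
    """Put `path` at the front of the module search path (only once)."""
    path = os.path.abspath(path)
    if path not in sys.path:
        # Inserting at position 0 makes our modules win over others
        # with the same name found later in the path.
        sys.path.insert(0, path)
    return path
```

Calling add_module_dir with the directory that holds the portlet modules at the top of the entry script should be enough for later import statements to find them as separate files.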

Protecting Mercurial repositories with repoze.who and repoze.what

Saturday, November 27th, 2010

After months of planning and thinking about it we are starting to add support for Mercurial at Yaco. It’s far from finished but it has already been tested in a couple of projects. We used and still use Subversion for most of our projects and our workflow is really tied to it.

When I started to think about using Mercurial at Yaco I sketched up a list of things that needed to be done:

  1. Migrate to Trac 0.12 since it has support for multiple repositories of different types. This is really important not only because you can see your Mercurial repos in Trac but also because it allows us to migrate step by step. You can have a Subversion repository for the QA and graphic design people and one or more Mercurial repos for developers. Also, it’s up to the project manager to decide whether to use Subversion or Mercurial. We don’t want to impose a technology when Subversion is just fine for many situations.
  2. Add several hooks in different stages of a changeset lifecycle:
    1. Before commit, check basic things about the quality of the code (pyflakes, pep8, pdb, …)
    2. Before commit check that the message mentions one or more Trac tickets
    3. After commit update one or more Trac tickets
  3. Serve the repositories via HTTPS and add a web interface to them. This means adding authentication and authorization for them too.
  4. Train our people to use the new system
  5. Define common workflows for our development patterns
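To make the second hook of step 2 more concrete, here is a rough sketch of the check it has to perform. This is not the hghooks code: the function names are hypothetical, and it assumes tickets are referenced in the commit message with the usual Trac notation, a hash sign followed by the ticket number.

```python
import re

# Assumed convention: tickets are referenced as "#123" in the message.
TICKET_RE = re.compile(r'#(\d+)')


def find_ticket_ids(message):
    """Return the ticket numbers mentioned in a commit message, in order."""
    return [int(number) for number in TICKET_RE.findall(message)]


def check_message(message):
    """Return an error string if no ticket is mentioned, else None."""
    if not find_ticket_ids(message):
        return 'commit message must mention at least one Trac ticket (e.g. #42)'
    return None
```

A real hook would call something like check_message on the changeset description and abort the commit when it returns an error.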

Step 1 was done several months ago and we are really happy with Trac 0.12.

Step 2 was achieved using hghooks and hghooks.yaco, a couple of Python packages I wrote to suit this goal.

Step 3 is the reason for this post. More on this later.

We are between steps 4 and 5 right now and that is causing minor grief among our developers :-)

So, let’s talk about authentication and authorization in the context of Mercurial repositories. Mercurial comes with a WSGI application for serving the repositories via HTTP called hgwebdir. It’s pretty easy to set up and has a nice looking web interface with lots of cool features.

But we have a couple of problems with hgwebdir:

  • It doesn’t handle authentication at all. This is actually a feature, but we need to add some bits for this.
  • It reads authorization from a different file for each repository it handles. This authorization information does not have group support either. This means we need to duplicate a lot of information among repositories. Most of our developers have access to most of our repositories, while each customer has (read only) access just to the repositories related to his project.

With Subversion we do LDAP authentication with an Apache module and use mod_authz_svn for authorization purposes. This last module is really good and it allows you to keep all your authorization information in a single file. Really convenient.

Besides that, we would like to experiment with web servers other than Apache, like nginx or Cherokee, so we don’t want to depend on Apache for authentication either.

Our solution involves using repoze.who for authentication and repoze.what for authorization. These are great Python packages that allow you to build really flexible middleware. repoze.who and repoze.what don’t do anything useful on their own; they define a set of rules and components to interact with the users and applications. Then there are plugins for both packages that actually do the authentication and authorization work. One nice thing about repoze is that it uses the WSGI standard heavily, so it’s easy to integrate with other WSGI applications like hgwebdir.

WSGI middleware is like onion layers
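The onion idea can be shown with a toy example. This is only a sketch of the general WSGI middleware pattern, not what repoze.who does internally: hello_app and RequireUser are made-up names, and the check on the REMOTE_USER key stands in for a real authentication layer.

```python
def hello_app(environ, start_response):
    """Innermost layer: a trivial WSGI application."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello']


class RequireUser(object):
    """Middleware layer: reject requests that carry no authenticated user."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if not environ.get('REMOTE_USER'):
            start_response('401 Unauthorized',
                           [('Content-Type', 'text/plain'),
                            ('WWW-Authenticate', 'Basic realm="demo"')])
            return [b'authentication required']
        # Authenticated: hand the request to the wrapped layer unchanged.
        return self.app(environ, start_response)


# Each wrapper is itself a WSGI application, so layers can be stacked.
wrapped_app = RequireUser(hello_app)
```

Because the wrapper has the same calling convention as the application it wraps, any number of layers can be stacked around hgwebdir in the same way.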

We use repoze.who.plugins.ldap.LDAPSearchAuthenticatorPlugin for authentication. It connects to our LDAP server and checks that the username and password entered by the user match a user in our directory. The user enters his username and password through HTTP basic authentication. repoze.who allows several other mechanisms but that one is fine for us.

Then for authorization we use the repoze.what.plugins.hgwebdir package that I wrote for our needs but trying to make it general enough so it can be used by other people. This plugin allows the system administrator to write all the repositories authorization information in a single file very similar to the one used by mod_authz_svn.

To put everything together we use a buildout that configures Mercurial, an nginx web server and all the WSGI middleware and applications to serve the repositories.

Let’s see what our final WSGI application looks like:

from optparse import OptionParser
import sys
from wsgiref.simple_server import make_server

import mercurial.util
import mercurial.dispatch

from mercurial.hgweb.hgwebdir_mod import hgwebdir

from repoze.who.plugins.basicauth import BasicAuthPlugin
from repoze.who.plugins.ldap import LDAPSearchAuthenticatorPlugin

from repoze.what.middleware import setup_auth

from repoze.what.plugins.hgwebdir.adapters import HgwebdirGroupsAdapter
from repoze.what.plugins.hgwebdir.adapters import HgwebdirPermissionsAdapter
from repoze.what.plugins.hgwebdir.adapters import get_public_repositories
from repoze.what.plugins.hgwebdir.middleware import protect_repositories

def add_auth(app, authz_file):
   # Defining the group adapters; you may add as many as you need:
   groups = {'all_groups': HgwebdirGroupsAdapter(authz_file)}

   # Defining the permission adapters; you may add as many as you need:
   permissions = {'all_perms': HgwebdirPermissionsAdapter(authz_file)}

   # repoze.who identifiers; you may add as many as you need:
   basicauth = BasicAuthPlugin('Yaco Hg repositories')
   identifiers = [('basicauth', basicauth)]

   # repoze.who authenticators; you may add as many as you need:
   ldap_auth = LDAPSearchAuthenticatorPlugin('ldap://ldap.yaco.es',
                                     'ou=People,dc=yaco,dc=es', returned_id='login')
   authenticators = [('ldap_auth', ldap_auth)]

   # repoze.who challengers; you may add as many as you need:
   challengers = [('basicauth', basicauth)]

   app_with_auth = setup_auth(
                         app,
                         groups,
                         permissions,
                         identifiers=identifiers,
                         authenticators=authenticators,
                         challengers=challengers)
   return app_with_auth

def make_app(hg_config, authz_file):
   original_app = hgwebdir(hg_config)
   secured_app = protect_repositories(original_app,
                         get_public_repositories(authz_file))
   return add_auth(secured_app, authz_file)

The main entry point here is the make_app function. It receives two filenames, one for configuring the repositories themselves and one for the authorization rules. As you can see we are adding a couple of layers to the hgwebdir application. The inner one deals with authorization and the outer one with authentication. We have to deactivate hgwebdir’s own authorization support to make it work. Otherwise authorization would be done by two different modules and it wouldn’t work. To deactivate it we configure all of our repositories with this option:

[web]
allow_push = *

By default hgwebdir allows read access to everybody and denies write access to everybody. We keep read access as hgwebdir defaults and add write access to everybody. repoze.what.plugins.hgwebdir will handle this stuff for us.

Now, to configure the authorization you just need to write a file like this:

[groups]
group1 = lgs, mviera
group2 = john, peter

[repo1]
@group1 = rw

[repo2]
@group2 = rw
@group1 = r

[repo3]
* = r
@group1 = rw

As I mentioned before, the syntax and semantics are very similar to the ones used in mod_authz_svn. One feature that repoze.what.plugins.hgwebdir does not support is authorization rules for specific directories inside a repository.
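To give an idea of how such a file can be interpreted, here is a small sketch written for Python 3 with the standard configparser module. It is not the plugin’s code: the function names are hypothetical, and the semantics it assumes (a user gets the union of every rule that matches him, either by name, by group or through the * wildcard) are my reading of the example above.

```python
import configparser


def load_rules(text):
    """Parse the authorization file into (groups, repositories)."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep user and group names case-sensitive
    parser.read_string(text)
    groups = {}
    if parser.has_section('groups'):
        for name, members in parser.items('groups'):
            groups[name] = {member.strip() for member in members.split(',')}
    repos = {}
    for section in parser.sections():
        if section != 'groups':
            repos[section] = dict(parser.items(section))
    return groups, repos


def access_for(groups, repos, user, repo):
    """Return the set of permission letters ('r', 'w') granted to user."""
    granted = set()
    for who, perms in repos.get(repo, {}).items():
        if who == '*':
            matches = True
        elif who.startswith('@'):
            matches = user in groups.get(who[1:], set())
        else:
            matches = who == user
        if matches:
            granted.update(perms)  # 'rw' adds both 'r' and 'w'
    return granted
```

With the file shown above, lgs would end up with read and write access to repo3, while john would only be able to read it.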

Final architecture

I hope everything is clearer with this last figure.

Managing email, the Unix way

Saturday, August 21st, 2010

Introduction

I’ve been using Evolution, the mail client, since 2000. I think it was the first Linux program along with Netscape that I remember using on a daily basis. What can I say about a program I’ve used for more than 10 years? Well I’m very grateful to it and I hope its development continues and it gets better and better every day. But now it’s time for a change.

Motivation

There are two main reasons I want to switch from Evolution to Mutt:

  • Speed. Evolution is now much faster than it used to be but when you have 5 Gigabytes of email it starts to behave slowly.
  • Knowledge. As an engineer I don’t feel comfortable using something I don’t totally understand and I even feel embarrassed about my (lack of) understanding of email, something that I use (dammit, sometimes I even depend on) every day. That’s why switching to Mutt makes sense. Its take on email forces you to do it the Unix way. This means using several small programs together to accomplish your task, and by dividing the task into small pieces you understand the big picture much better. Now I can tell the difference between an MTA, a MUA and an MDA.

General architecture

General architecture

As you can see in the above figure, there are quite a lot of small programs (blue boxes and red circle) cooperating to deliver my mail for me. I’ll try to explain what each of them does and how to configure it in the following sections.

As there are a lot of configuration files involved I won’t paste them here but point to the Mercurial repository where I keep them versioned.

Step 1. Fetching email: getmail

getmail is an MRA or Mail Retrieval Agent. Its main task is to fetch your email from a POP3 or IMAP server. Unless you have a fixed IP and you run your own SMTP and IMAP/POP3 server you probably have your email delivered to your ISP’s or company’s own servers.

Another popular MRA is fetchmail. I use getmail because it’s way simpler to configure and whenever I have to decide between two programs and one of them is written in Python I go for that one. I’m very comfortable with Python so if I ever need to change something or go to the source to understand what’s going on, that way is easier for me.

Anyway, this is my configuration file for getmail. You have to call the getmail program every time you want to download email from your IMAP/POP3 server. A typical run looks like this:

[lgs@ticotico ~]$ getmail
getmail version 4.20.0
Copyright (C) 1998-2009 Charles Cazabon.  Licensed under the GNU GPL version 2.
SimpleIMAPSSLRetriever:lgs@correo.yaco.es:993:
Enter password for SimpleIMAPSSLRetriever:lgs@correo.yaco.es:993:  *******
  0 messages (0 bytes) retrieved, 0 skipped

As this is a very tedious task I have set up a cron task that runs every 5 minutes. Yes, I’m that anxious about email :-) But the problem is that if I put this in a cron job it will ask for the password every time it runs. One solution is to put the password in the getmailrc configuration file but I don’t feel like sharing my password with the whole world and I really want to keep my configuration files in Bitbucket. So the other solution is to use expect. This is a common Unix tool for manipulating other programs. This is my getmail-no-password expect script:

#!/usr/bin/expect -f

set timeout 600
spawn /usr/bin/getmail
expect -exact "Enter password for SimpleIMAPSSLRetriever:lgs@correo.yaco.es:993:  "
send -- "put-your-real-password-here\r"
expect eof
exit

So expect will call getmail and as soon as the “Enter password …” prompt shows up in getmail’s output it will send my real password, followed by the ‘\r’ character. One minor issue is that I have to give expect a timeout; once it has passed, expect stops waiting. 600 seconds (10 minutes) is more than enough for running this script from my cron job, but the first synchronization with an IMAP server takes way longer than that, so for that run I just call getmail and enter my password manually instead of using the getmail-no-password expect wrapper.

This script is not in my Bitbucket repository for obvious reasons :-)

This is my crontab for fetching email:

*/5 * * * * /home/lgs/bin/getmail-no-password > /dev/null

Note that I redirect stdout to /dev/null but not stderr so cron will email me if something goes wrong with getmail or the expect wrapper script.

Step 2. Filtering email: maildrop

As soon as getmail gets the mail it can save it to the final destination, e.g. my Maildir directory, or it can hand it to an MDA (Mail Delivery Agent) to do more powerful things. In my case I went for maildrop, but as always there are other options like procmail. Procmail is more established in the Unix world but I like maildrop’s syntax better.

getmail fetches an email message from the server and then runs maildrop, passing it the message. As you can see in my maildrop configuration file (called .mailfilter), I do several things with a mail message:

  • Pipe it through reformail to add a Lines header, which is useful in Mutt.
  • Pipe it to spambayes (sb_filter) to add antispam features to my mail system. More on this later.
  • Send a notification to my GNOME desktop in case the message is not spam. More on this later.
  • Filter the message and store it in one of my mail folders depending on several regular expression rules that match against some header values of the mail message.
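The last item is the heart of the filter: the first rule whose regular expression matches a header decides the folder. The following Python sketch only illustrates that first-match logic; it is not maildrop’s filter language, and the rules, folder names and the choose_folder function are invented for the example.

```python
import email
import re

# Hypothetical rules: (header name, regular expression, destination folder).
RULES = [
    ('List-Id', r'pycha', 'lists.pycha'),
    ('From', r'@yaco\.es', 'work'),
]


def choose_folder(raw_message, rules=RULES, default='inbox'):
    """Return the folder of the first rule that matches the message."""
    message = email.message_from_string(raw_message)
    for header, pattern, folder in rules:
        value = message.get(header, '')
        if re.search(pattern, value, re.IGNORECASE):
            return folder
    return default
```

The order of the rules matters: a mailing list message sent by a coworker lands in the list folder because that rule comes first.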

Step 3. Anti spam: spam bayes

The typical antispam software on Unix systems is spamassassin but I use Spam Bayes for several reasons:

  • I set up spambayes in the previous company I worked for and I was very happy with its results.
  • In my current company, spamassassin is used and it’s not very effective. This may be a configuration issue.
  • Spamassassin eats a lot of CPU.
  • Did I mention that spam bayes is written in Python?

The way spam bayes works is quite simple: first you need to train it by giving it a bunch of spam messages and a bunch of ham messages (ham is the opposite of spam, in other words, good mail). You do it with the following command line:

sb_mboxtrain.py -g .inbox/ -s .spam/

This will create a hammie.db database in Berkeley format. Now you should write a configuration file to tell spam bayes where this database file is located so it can use it to filter spam messages when called from maildrop. Here is my spambayes configuration file.

If you remember my maildrop configuration file this was the line that calls spam bayes:

xfilter "python -W ignore:::: /usr/bin/sb_filter.py -d ~/.hammie.db"

What’s that -W ignore:::: argument to Python? Well, spam bayes spits out some warnings when run under Python 2.5 or 2.6, and that’s not good for maildrop: Python writes the warnings to stderr, maildrop thinks there is a problem with that process and aborts the mail delivery. So by calling Python with the -W ignore:::: argument we tell it to silence the warnings instead of writing them to stderr. I hope spam bayes gets updated soon so I can remove this hack.

The good thing about spam bayes is that it can learn with your help. Any time it fails when filtering a message you can retrain it to fix its behaviour. By the way, all that spam bayes does is add some headers to the message; you then need to add some rules to your maildrop configuration file to put the message where it belongs. The interesting thing is that sometimes a message is classified as neither spam nor ham because spam bayes is not sure. You can store these messages in a folder different from the spam folder so you can look at it more frequently and teach spam bayes with them.
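The three-way decision can be sketched in a few lines of Python. This is an illustration, not my maildrop rules: the folder names and the folder_for function are made up, and it assumes the classification header is called X-Spambayes-Classification and starts with the verdict word (spam, ham or unsure), which is how I understand the default spam bayes setup.

```python
import email

# Assumed header name and value layout ("spam; 0.99"); check your own
# spam bayes configuration before relying on it.
HEADER = 'X-Spambayes-Classification'


def folder_for(raw_message):
    """Pick a folder from the verdict spam bayes wrote into the headers."""
    message = email.message_from_string(raw_message)
    # Keep only the verdict word that precedes the optional score.
    verdict = message.get(HEADER, '').split(';')[0].strip().lower()
    if verdict == 'spam':
        return 'spam'
    if verdict == 'unsure':
        return 'unsure'
    # Ham, or a message that was never classified, goes to the inbox.
    return 'inbox'
```

Keeping the unsure messages apart is what makes retraining practical: that folder is small and every message in it is worth teaching.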

Step 4. Reading and composing emails: mutt

Of all the text-based email clients, mutt is the most popular. Israel Herraiz told me about Sup and it looks very promising. I think I’ll give it a try pretty soon. But for now, I’ll stick to Mutt for a while.

Mutt is the most complex of all the programs I mention in this post, so its configuration file is the biggest one. I hope I have added enough comments to make the options I use easy to understand.

Step 5. Sending emails: msmtp

The last step consists of sending emails. Until recently Mutt didn’t know how to talk SMTP, the protocol for sending emails, and due to popular demand that feature has been added. But in keeping with the original Mutt spirit and thus the Unix way of using one program for each task, I decided to go for msmtp, a small client for sending emails. As always, there are plenty of options in Linux land but I think msmtp does a very decent job. As usual, you can check my msmtp configuration file in Bitbucket. Remember to set the set sendmail="/usr/bin/msmtp" option in Mutt.

In these days full of spam it’s usual that your SMTP server asks you to authenticate unless you are in the same private network. As in the getmail case I didn’t want to write my password in plain text in the msmtp configuration file. Fortunately msmtp has support for the GNOME (and MacOS X) keyring so you can simply store your password in the safe keyring of your operating system and msmtp will pick it up from there. The only issue is that you need to be logged into a graphical GNOME session in order to have access to the keyring.

Anyway, to store your password in the keyring use the following command changing your username and server for your own values:

./msmtp-gnome-tool.py --set-password --username lgs --server correo.yaco.es

Unfortunately you need the source code of msmtp to use this command since it is not in the msmtp rpm package that I installed on my Fedora system. So go ahead and grab it. Once you untar it, the script should be at msmtp/scripts/msmtp-gnome-tool/msmtp-gnome-tool.py.

Bonus track 1: notifications

Mutt is great and all but one thing I miss from Evolution is a GNOME notification every time a new mail reaches my inbox. Well, with a system as flexible as the one I built, that’s not difficult to have. Just add this line to the maildrop configuration file:

xfilter "python /home/lgs/bin/email_notificator.py"

I added it after the rule that stores spam in the spam folder since I don’t want to be notified when spam arrives.

Here is the content of my email_notificator.py script:

import email
import subprocess
import sys

if __name__ == '__main__':
    # maildrop's xfilter pipes the whole message through stdin.
    data = sys.stdin.read()

    message = email.message_from_string(data)
    # Fall back to placeholders when a header is missing.
    from_ = message.get('From', 'Unknown from')
    subject = message.get('Subject', 'Unknown subject')

    cmd = ['/usr/bin/notify-send', '-u', 'critical', '-t', '5000',
           '-i', '/usr/share/icons/gnome/48x48/actions/mail_new.png',
           from_, subject]
    # There is no X session here, so point notify-send at my display.
    subprocess.call(cmd, env={'DISPLAY': ':0.0'})
    # xfilter expects the message back on stdout, unchanged.
    sys.stdout.write(data)

No rocket science here. In the following screenshot you can see the script in action:

Notification of email arriving

Bonus track 2: logs rotation

Several programs in my email ecosystem log information to different files. As always you should rotate your logs if you don’t want your hard disk to fill up really fast. Just add a logrotate configuration file like mine and add a line like this one to your crontab:

0 9 * * * /usr/sbin/logrotate /home/lgs/mail_logs/logrotate.conf --state=/home/lgs/mail_logs/logrotate.status

Bonus track 3: reading mails from my android device

One of the nice things about being a text-only program is that you can use mutt from any computer that can connect through ssh to the computer where you have your mail and mutt installed. On Android there is the fantastic ConnectBot application, which lets you do just that. See this screenshot if you don’t believe it :-)

Mutt in Android

The only gotcha of using Mutt from my Android phone is that the GNOME integration gets in my way: I can’t send emails since msmtp will look for the password in the GNOME keyring and there is no such thing when I connect from Android. One way to fix this issue is to configure another account in msmtp which doesn’t use the keyring but asks for the password instead, and switch to that account with a key combination when using Mutt from the phone.

TODO

  • Mark spam messages as already read so I don’t get distracted every time a spam message arrives in my spam folder. Then once in a while I can review them looking for false positives.
  • Addressbook integration.
  • PGP integration


Pie charts autopadding

Monday, July 26th, 2010

I’ve been hacking on the Pycha autopadding feature lately and, as most things I write here, I’ll try to summarize the tricky parts so I can understand them in a couple of months. As a side effect, I hope you enjoy this post.

In current Pycha versions there is a padding option but it is not what you expect. The current padding represents the space between the limit of the graphics area and the chart area. The labels and ticks are drawn in this space. What this means is the programmer has to manually adjust the padding in order to make the labels render correctly. When you are using pycha for drawing one or two charts in an environment under your control this is not a big issue. When you draw lots of charts from wild data sources, autopadding helps a lot.

Figure 1. Pie Chart with manual padding

In Figure 1 you can see an old style pie chart where we had to manually adjust the padding to make the labels fit. As a side effect the pie was not centered in the graphics area. Here the padding options are: {‘top’: 0, ‘right’: 10, ‘left’: 70, ‘bottom’: 0}. Imagine what happens if the dataset changes: we would probably need to change the padding.

So, what exactly is autopadding? Well, it’s the feature that lets pycha compute the labels and ticks area automatically. Now the padding that you specify is real empty space.

Figure 2. Pie chart with autopadding

For most of the charts that pycha supports this feature is not very difficult. Except for the pie charts. In this case, it took me a while to get it working and the results, while good enough, are not perfect.

In figure 2 you can see such results. As you can see the pie is smaller now and the slices are ordered in a different way. The order really matters and I will probably add an option in the future to indicate the starting angle (currently 0) so there is a small degree of control left to the programmer. Note that this time the four paddings are equal to zero.

Let’s describe the problem: given a rectangular area (width and height) and a list of tuples made of a label and an angle, compute the circumference radius and the label positions so that the pie area is maximized and the labels do not overlap the pie.

For each label, the angle associated with it is the angle that splits its pie slice into two slices of the same size. If you have the initial and final angle of each slice (as I do), computing this angle is as easy as adding both and dividing by two.
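As a quick illustration of where those angles come from, here is a sketch that turns a list of values into slices. It is not the Pycha code; the function name is made up and it assumes the first slice starts at angle 0 and that angles are measured in radians.

```python
import math


def slice_angles(values):
    """Return a (start, end, middle) tuple of angles for each value."""
    total = float(sum(values))
    angles = []
    start = 0.0
    for value in values:
        # Each slice takes a share of the full turn proportional to its value.
        end = start + 2 * math.pi * value / total
        angles.append((start, end, (start + end) / 2))
        start = end
    return angles
```

The third element of each tuple is the angle paired with the label in the rest of the algorithm.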

Step 1. So the first thing I do is compute the center of the area and the bounding boxes of the labels. Any decent graphics library has a function for that and Cairo is no different: cairo_text_extents.

Step 2. Now for each tuple in my label-angle list I draw a ray that starts at the center of the chart and goes in the direction given by the angle. Remember that a line can be defined with a point and an angle.

Figure 3. Pie Chart in debug mode showing the rays in red

Step 3. I calculate the intersection of these rays with the rectangle that defines my graphics area (drawn in blue in Figure 3 because debug mode is enabled). Clipping is not useful here since I want the coordinates of every intersection; the rays are not drawn in the final result unless debug mode is enabled. In order to make this computation easier I divide the area into four parts delimited by the two diagonals of the rectangle. Every ray is defined by an equation of the form y = mx + n where m is tan(angle), and I know which of the four sides of the rectangle the ray is going to intersect based on the angle. As every side of the rectangle is also defined by a very simple equation, all I have to do is solve a system of equations for each ray.
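For reference, the same intersection can be computed with a parametric form of the ray instead of the y = mx + n system, which avoids the special case of vertical rays where the tangent is not defined. This sketch is a swapped-in technique, not the code in Pycha, and it assumes the rectangle goes from (0, 0) to (width, height) with the ray starting at its center.

```python
import math


def ray_rectangle_intersection(width, height, angle):
    """Point where a ray from the rectangle center hits its border.

    The rectangle spans (0, 0)-(width, height); `angle` is in radians.
    """
    cx, cy = width / 2.0, height / 2.0
    dx, dy = math.cos(angle), math.sin(angle)
    eps = 1e-12
    candidates = []
    if abs(dx) > eps:
        # Distance along the ray to the side at x = width or x = 0.
        candidates.append((width - cx) / dx if dx > 0 else -cx / dx)
    if abs(dy) > eps:
        # Distance along the ray to the side at y = height or y = 0.
        candidates.append((height - cy) / dy if dy > 0 else -cy / dy)
    # The nearest of the two candidate sides is the one the ray hits.
    t = min(candidates)
    return cx + t * dx, cy + t * dy
```

Taking the smaller of the two distances plays the same role as the division by the diagonals: it tells which side of the rectangle the ray reaches first.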

Figure 4. Division of the graphics area using the two diagonals

Nice. We have now a center and a bunch of rays and intersection points. We also have bounding boxes for the labels. Now I have to place each bounding box with two constraints:

  • Its center lies on its associated ray
  • One of its sides lies on one of the sides of the big chart rectangle

Step 4. It’s easy to guess what side of the bounding box matches with which side of the chart rectangle. A little bit of trigonometry will do the rest.

Figure 5. Initial labels position

Step 5. Ok, now I have all the labels, or more correctly, their bounding boxes, positioned in the chart rectangle, as far as I can from the center. Now the maximum radius for the pie is the minimum of the distances from the bounding boxes to the center. For every bounding box there is one side that is closer to the center than the others, so the distance we want is the distance from that side to the center. Again, having divided the area into those four parts makes this step easier. In our example shown in Figure 5 the label that restricts the pie radius is the one that reads “stackedbar.py (6.9%)“.

The good news is that we have achieved one of our goals: computing the radius of our pie chart. The bad news is that this was the easy part. In the second round of our algorithm we are going to move the label bounding boxes as close as we can to the pie. Otherwise they don’t look good and the connection between the slice and its label is lost. As you may guess, all the labels can be moved closer to the center except for one: the label that gave us the radius of the pie circumference.

Step 6. So let’s try to move the labels. For this task I will divide the chart rectangle into four parts again, but this time I’ll use two perpendicular lines, one horizontal and one vertical. So I’ll get my circumference divided into four quadrants. For each label I’ll compute the intersection of its ray and our pie circumference. Remember that now we know its radius.

Figure 6. Division by two perpendicular lines that cross at the center

Step 7. For each label, place one corner of its bounding box at the point computed in step 6. For the first quadrant this corner is the bottom-left corner, for the second quadrant it’s the bottom-right corner, for the third quadrant it’s the top-right corner and for the fourth quadrant it’s the top-left corner. This is a feasible position but it doesn’t look very nice so we will try to improve it.

Step 8. For each label, draw another circumference with the same center as the pie circumference and a radius equal to the distance between the center of the bounding box and the center of the circumference. Now let’s compute the intersection between this second circumference and the ray of the same label. Voilà, that’s the point where we are going to move the center of the bounding box.
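Steps 6 to 8 can be condensed into a few lines. The following is a sketch under assumptions, not the Pycha implementation: place_label is a made-up name, and it uses mathematical axes where y grows upwards, so that the first quadrant is the top-right one as in the description above (in Cairo’s device coordinates y grows downwards, so top and bottom swap).

```python
import math


def place_label(center, radius, angle, box_width, box_height):
    """Return the center of the label bounding box after steps 6 to 8.

    Assumes mathematical axes (y grows upwards) and an angle in radians.
    """
    cx, cy = center
    ux, uy = math.cos(angle), math.sin(angle)
    # Step 6: intersection of the ray with the pie circumference.
    px, py = cx + radius * ux, cy + radius * uy
    # Step 7: pin the corner that faces the pie to that point, which
    # pushes the box away from the center on both axes.
    sign_x = 1 if ux >= 0 else -1
    sign_y = 1 if uy >= 0 else -1
    box_cx = px + sign_x * box_width / 2.0
    box_cy = py + sign_y * box_height / 2.0
    # Step 8: keep that distance to the pie center, but slide the box
    # center back onto the ray.
    distance = math.hypot(box_cx - cx, box_cy - cy)
    return cx + distance * ux, cy + distance * uy
```

The signs of the cosine and the sine select the corner, which is equivalent to the quadrant by quadrant rule of step 7.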

Figure 7. Detail of label reposition

So that’s the end of the algorithm. The labels are now positioned closer to the pie circumference but, as I said before, it’s not perfect: the labels near the top and bottom of the circumference could be closer than they are. I could have used a soft computing algorithm that moves the bounding box along its ray until it touches the pie circumference, but that would be slower.

By the way, I’m using a named branch for this feature and hopefully it will get merged into the main branch for the 0.7.0 release of pycha.

No space left is not a problem anymore

Tuesday, May 11th, 2010

Today I was working with virtual machines and when I tried to install some packages the 2 GB hard disk I configured for this virtual machine reached its limit. By the way, it’s pretty amazing a vanilla CentOS 5.4 installation needs almost 2 GB.

Anyway, this is what I did to increase the hard disk space without reinstalling the whole thing:

  1. Create a new hard disk with Virtual Machine Manager. I created it with another 2 GB and selected VirtIO as the disk type.
  2. Start the virtual machine and create a partition on the new disk. /dev/vdb1 is the new partition:
       Device Boot      Start         End      Blocks   Id  System
    /dev/vdb1               1        4063     2047720+  83  Linux
  3. Initialize a physical volume on the new partition:
    pvcreate /dev/vdb1
  4. Add this physical volume to the volume group:
    vgextend VolGroup00 /dev/vdb1
  5. Extend the logical volume:
    lvextend /dev/VolGroup00/LogVol00 -L+1.9G
    Rounding up size to full physical extent 1.91 GB
    Extending logical volume LogVol00 to 3.84 GB
    Logical volume LogVol00 successfully resized
  6. Resize the filesystem:
    resize2fs /dev/VolGroup00/LogVol00
    resize2fs 1.39 (29-May-2006)
    Filesystem at /dev/VolGroup00/LogVol00 is mounted on /; on-line resizing required
    Performing an on-line resize of /dev/VolGroup00/LogVol00 to 1007616 (4k) blocks.
    The filesystem on /dev/VolGroup00/LogVol00 is now 1007616 blocks long.

Thanks to LVM now I have 2 more gigabytes on my virtual disk and no reinstallation was needed.

Using the DNIe on Fedora 11

Wednesday, October 28th, 2009

Today I tried to use my electronic ID card (DNIe) on my Fedora 11 system and I’m sad to say that it wasn’t easy and that, once again, we are relegated to this ghetto that is free software. Why? Very simple: the DNIe driver is not free software, which ties its distribution to a very limited set of specific versions of the libraries it depends on. When I worked on DNIe support for Guadalinex v4 we already ran into this problem, and we resigned ourselves to accepting that the DNIe would work on a freshly installed Guadalinex but would probably stop working once certain system libraries were updated, something relatively frequent in any Linux distribution.

This problem doesn’t seem to exist in the Windows world, since updates are much less frequent there and the DNIe people can afford to ship a binary that will work for at least 3 or 4 years, which is how long a version of that operating system usually lasts, compared to the 6-8 months that Linux distribution releases last. It’s fair to admit that keeping up with the pace of a Linux distribution is not easy, but it’s also worth remembering that this problem would disappear if the code in question were free and the sources were distributed ready to compile against any version of the libraries involved.

Well, enough complaining; let’s get to what matters: making it work.

The first step is to download the binaries for Fedora 10, which is the most recent version available on the police website. At this point we should feel lucky to be users of one of the three supported Linux distributions: Debian, Ubuntu and Fedora. Honestly, I don’t know how people using Gentoo, Suse, Mandriva or Arch, to name some of the most popular ones, manage.

We unpack the tar file and get two rpm packages:

  • opensc-0.11.7-7.fc10.i386.rpm
  • opensc-dnie-1.4.6-2.fc10.i386.rpm

When we try to install the first one we get a dependency error:

[root@ticotico ~]# rpm -Uvh --force opensc-0.11.7-7.fc10.i386.rpm
error: Error de dependencias:
libcrypto.so.7 se necesita para opensc-0.11.7-7.fc10.i386
libltdl.so.3 se necesita para opensc-0.11.7-7.fc10.i386

What’s happening here is that the libcrypto (openssl package) and libltdl (libtool-ltdl package) libraries installed on my system are newer than those. Now we have to force the installation of opensc-0.11.7 even though we know it is worse than opensc 0.11.8, the version Fedora 11 ships, since version 0.11.7 has security vulnerabilities. The DNIe folks say they don’t release the code because doing so would be insecure, yet they don’t update their package, which forces us to use libraries with security flaws. Once again, security through obscurity wins. One driver to bind us all to the realm of shadows.

Let’s not get sidetracked. After uninstalling our beloved opensc-0.11.8 from Fedora 11 we install version 0.11.7:

[root@ticotico ~]# yum remove opensc
[root@ticotico ~]# rpm -Uvh --nodeps opensc-0.11.7-7.fc10.i386.rpm
Preparando...               ########################################### [100%]
   1:opensc                 ########################################### [100%]

Now we have to fix the dependency problem by hand. We’ll pray that the new versions of libcrypto and libltdl are backwards compatible. With the ldd command we can check whether opensc has its dependencies available:

[root@ticotico ~]# ldd /usr/bin/opensc-tool
linux-gate.so.1 =>  (0x00525000)
libopensc.so.2 => /usr/lib/libopensc.so.2 (0x00dd8000)
libcrypto.so.7 => not found
libopenct.so.1 => /usr/lib/libopenct.so.1 (0x00d19000)
libz.so.1 => /lib/libz.so.1 (0x00318000)
libltdl.so.3 => not found
libdl.so.2 => /lib/libdl.so.2 (0x002f4000)
libscconf.so.2 => /usr/lib/libscconf.so.2 (0x00496000)
libpthread.so.0 => /lib/libpthread.so.0 (0x002fb000)
libc.so.6 => /lib/libc.so.6 (0x00157000)
/lib/ld-linux.so.2 (0x0012f000)
libcrypto.so.7 => not found
libltdl.so.3 => not found

As expected, it finds neither libcrypto nor libltdl. We create symbolic links to help it out:

[root@ticotico ~]# cd /usr/lib
[root@ticotico lib]# ln -s libltdl.so.7 libltdl.so.3
[root@ticotico lib]# ln -s libcrypto.so.8 libcrypto.so.7

We try ldd again:

[root@ticotico lib]# ldd /usr/bin/opensc-tool
linux-gate.so.1 =>  (0x009d2000)
libopensc.so.2 => /usr/lib/libopensc.so.2 (0x00670000)
libcrypto.so.7 => /usr/lib/libcrypto.so.7 (0x05e96000)
libopenct.so.1 => /usr/lib/libopenct.so.1 (0x00b19000)
libz.so.1 => /lib/libz.so.1 (0x00318000)
libltdl.so.3 => /usr/lib/libltdl.so.3 (0x063ea000)
libdl.so.2 => /lib/libdl.so.2 (0x002f4000)
libscconf.so.2 => /usr/lib/libscconf.so.2 (0x00897000)
libpthread.so.0 => /lib/libpthread.so.0 (0x002fb000)
libc.so.6 => /lib/libc.so.6 (0x00157000)
/lib/ld-linux.so.2 (0x0012f000)

Now we can install the DNIe package and get it working by following the official instructions:

[root@ticotico ~]# rpm -Uvh opensc-dnie-1.4.6-2.fc10.i386.rpm
Preparando...               ########################################### [100%]
   1:opensc-dnie            ########################################### [100%]

All that’s left is to validate the certificate on the test page: http://www.dnielectronico.es/como_utilizar_el_dnie/verificar.html

Update, October 29th, 2009

After all the hassle I went through yesterday to get the DNIe working, it turns out that the digital certificates stored on it expired 4 months ago. It seems the certificates are valid for 30 months while the card itself is valid for 4 years. Don’t ask me why, I don’t understand it either.

So, after ranting about the DNIe driver, it seems only fair to praise the certificate renewal process, because here I do think the people behind the DNIe have done a great job. I went to the police station nearest my home, in Seville, and although it wasn’t open to the public yet (it was 7:40 in the morning) a rather friendly police officer asked me what I wanted and, after I explained the DNIe certificate renewal, let me in so I could do it myself at a kiosk designed for that purpose. Once at the kiosk the user experience was superb, and after I put my finger on the fingerprint reader it renewed both certificates without a hitch. It took a while, but better slow as long as it’s reliable and doesn’t crash. My satisfaction is therefore due to:

  1. Great work on the kiosk, both hardware and software.
  2. A friendly civil servant (the police officer, in this case) who asked what I needed and let me use a service even though public opening hours hadn’t started.
  3. No waiting at all, partly because of the time of day but also because no civil servant needs to be involved in this task.
  4. Savings for the public administration, since no civil servants are needed to renew these certificates.

Update, March 26th, 2010

Instructions to get it working on Fedora 12

Pycha padding present and future

Sunday, September 13th, 2009

Simon Ilyushchenko sent me a patch some weeks ago about tick labels in Pycha. The patch fixes the label position in the case where it goes out of the surface dimensions. The problem with his patch is that it is too specific and only works well for the simplest case. For example, it doesn’t solve the problem:

  • when you have rotated labels
  • when you have pie charts
  • for the y axis tick labels

And that’s why I didn’t add his patch to pycha. When developing the library I always try to make it as orthogonal and predictable as possible, so if a feature is added it should work in all places where it makes sense. Users that send me patches are not usually concerned about edge cases, but that’s my job as the library author and maintainer.

But the real point is that Simon raised a real issue with Pycha that I was worried about since the very beginning: right now the user has to adjust the padding options to make sure that tick labels and other things are kept inside the surface. Clearly this is suboptimal and for automatically computed charts without human intervention it can be a real issue.

So let me explain how pycha uses its real estate right now. First, the user tells the library how big the rendering surface is going to be: let’s call these parameters surface_width and surface_height. Then there are these padding options: padding.top, padding.bottom, padding.left and padding.right. Let’s see it in the following figure:

figure1
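
In code, the relationship between the surface, the padding and the chart area looks roughly like this. It is a minimal sketch of the arithmetic only, not Pycha’s actual implementation; the Padding and Area tuples and the chart_area function are names I made up for the illustration.

```python
from collections import namedtuple

# Hypothetical containers, not part of Pycha's API.
Padding = namedtuple("Padding", "top bottom left right")
Area = namedtuple("Area", "x y width height")


def chart_area(surface_width, surface_height, padding):
    """Return the rectangle left once the padding is removed."""
    return Area(
        x=padding.left,
        y=padding.top,
        width=surface_width - padding.left - padding.right,
        height=surface_height - padding.top - padding.bottom,
    )
```

For a 400x300 surface with a padding of 10 (top), 20 (bottom), 30 (left) and 40 (right), the chart area starts at (30, 10) and measures 330x270.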

Now we have a padding area and a chart area. The chart area is where pycha draws the chart. The axes (when used) are drawn on the boundary of the chart area and the tick labels and axis titles are drawn in the padding rectangle. Other things like the chart title and the legend are also drawn in that rectangle. You can see it in the next figure where the chart area has a blue background:

figure2

Now, I want to change this and make Pycha automatically compute all its elements so the padding is actually that, blank space. Optimally Pycha should not draw anything into the padding rectangle. Well, there may be some exceptions, see the end of this post for more information about them. So let’s define some new areas:

figure3

The goal is to make Pycha compute all these areas automatically so the chart area is reduced to accommodate all the other things the library draws in the surface without cluttering the padding area. This is a big change and it won’t happen for 0.5.2 but I’ll try to have it for the 0.6.0 release. Obviously this is a backwards incompatible change and new charts rendered with the same options will look smaller. Maybe I’ll set the default values for the padding options to zero to minimize the update impact.

The way Pycha will compute these areas is actually pretty similar to the way graphical toolkits like Gtk+, Tk, Qt, Swing or GWT do it when using layout containers where the widgets are placed. There is a layout phase where the areas’ dimensions are negotiated and calculated and then there is a rendering phase when the elements/widgets are actually drawn. For bar and line charts this may look like overkill but for other charts like pie charts it is really the only sane way to do it.
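
To give an idea of what such a layout phase could look like, here is a deliberately simple sketch. It is not the algorithm Pycha will use, just one possible scheme: every element asks for a strip of a given size on one side, strips are handed out in order, and the chart keeps whatever is left. The layout function, the Area tuple and the element names are all hypothetical.

```python
from collections import namedtuple

Area = namedtuple("Area", "x y width height")


def layout(surface_size, padding, requests):
    """Layout phase: assign an area to each element and to the chart.

    surface_size is (width, height), padding is (top, bottom, left,
    right) and requests is a list of (name, side, size) tuples.
    """
    top, bottom, left, right = padding
    remaining = Area(left, top,
                     surface_size[0] - left - right,
                     surface_size[1] - top - bottom)
    areas = {}
    for name, side, size in requests:
        x, y, w, h = remaining
        if side == "top":
            areas[name] = Area(x, y, w, size)
            remaining = Area(x, y + size, w, h - size)
        elif side == "bottom":
            areas[name] = Area(x, y + h - size, w, size)
            remaining = Area(x, y, w, h - size)
        elif side == "left":
            areas[name] = Area(x, y, size, h)
            remaining = Area(x + size, y, w - size, h)
        elif side == "right":
            areas[name] = Area(x + w - size, y, size, h)
            remaining = Area(x, y, w - size, h)
        else:
            raise ValueError("unknown side: %r" % (side,))
    # Whatever is left after all the requests is the chart area.
    areas["chart"] = remaining
    return areas
```

The rendering phase would then simply draw each element inside the area it was given, which is what keeps the padding empty.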

The legend is a special case and I’m still not sure whether to position it using ‘absolute’ coordinates or to give it its own area that is taken into account by the layout algorithm.

There is more than one way to do it

Sunday, April 19th, 2009

I wrote a method the other day. It was not a particularly difficult method and before writing it, it didn’t seem like a challenge. When I finished it, it looked quite good and simple but there was something that kept my eyes focused on one point instead of enjoying the method as a whole. It’s like when you buy a shiny, beautiful monitor, you plug it in and you discover it has a dead pixel. No matter how bright and fast your monitor is, that rotten pixel is ruining your experience.

Let me show you my broken pixel.

But first I’ll give you a small explanation of what the method does. Its class has a list of collectors, where each collector is a callable that returns a generator of photos. The method collects the photos of all collectors until a maximum is reached, then it returns the photos using an auxiliary method that does its own job postprocessing the list of photos.

Example carousel


Here is my initial version:


    def collect_photos1(self, destination):
        n_photos_collected = 0
        for collector in self.collectors:
            for photo_info in collector(destination):
                self.add_photo(photo_info)
                n_photos_collected += 1
                if n_photos_collected >= self.max:
                    return self.get_photos()

        return self.get_photos()

As you can see, no rocket science so far. It’s just a nested for loop that iterates over all the photos until we get as many as we want. The rotten pixel is the return self.get_photos() line. It is bad because it’s duplicated: once in the stop condition and once at the end of the method. There are two problems with that duplication:

  • The first one is a practical problem. What if, in the future, the get_photos aux method needs to receive a parameter? We would need to update the call to that method, but as we have two calls there is a chance that we forget to update one of them. If we had only one call, the chance of getting the update wrong would be lower.
  • The second problem is a theoretical one. We are using a nested for loop to iterate over a list of lists of photos but we do not know in advance how long those lists are or how many times the loop bodies are going to run.

My second version is basically the same as the first but instead of duplicating the return self.get_photos line we are duplicating the break condition. Now, if we add a parameter to the get_photos method we will have to update just one call. But if we want to add another condition to stop collecting photos we will have to update two if statements. In other words, this version is pretty much the same.


    def collect_photos2(self, destination):
        n_photos_collected = 0
        for collector in self.collectors:
            for photo_info in collector(destination):
                self.add_photo(photo_info)
                n_photos_collected += 1
                if n_photos_collected >= self.max:
                    break
            if n_photos_collected >= self.max:
                break

        return self.get_photos()

Then I tried to convert the for loops into while loops, since basic computer science tells us that a while loop is the choice to make when the number of iterations is unknown. So this version does not have the previous two problems and it’s easier to maintain. Nevertheless, it introduces another problem: it’s harder to read and to understand.


    def collect_photos3(self, destination):
        # Python iterators have no has_next() method, so next() with a
        # sentinel default is used to detect the end of each iterator.
        done = object()
        n_photos_collected = 0
        need_more_photos = True
        collectors_iterator = iter(self.collectors)
        collector = next(collectors_iterator, done)
        while need_more_photos and collector is not done:
            photos_iterator = iter(collector(destination))
            photo_info = next(photos_iterator, done)
            while need_more_photos and photo_info is not done:
                self.add_photo(photo_info)
                n_photos_collected += 1

                if n_photos_collected >= self.max:
                    need_more_photos = False
                else:
                    photo_info = next(photos_iterator, done)
            collector = next(collectors_iterator, done)

        return self.get_photos()

Version 4 is a crazy one. It tries to simulate the goto statement in Python. Everybody will tell you the goto statement is evil and well, most of the time, it is. But sometimes it has no good replacement, like in this case where we want to stop two nested loops with a single statement. Too bad Python does not have a goto statement :(


    def collect_photos4(self, destination):
        n_photos_collected = 0
        try:
            for collector in self.collectors:
                for photo_info in collector(destination):
                    self.add_photo(photo_info)
                    n_photos_collected += 1
                    if n_photos_collected >= self.max:
                        raise TypeError('goto')

        except TypeError:
            pass

        return self.get_photos()

Now, what if we separate the two problems this method is trying to solve: iterating over the photos and adding them to our result set? Doing so we can write a simple generator that gives us one photo at a time and then a while loop that checks that there are more photos in this generator and that we actually want them. I like this version more than any of the previous ones but still, the while loop is not very readable.


    def collect_photos5(self, destination):
        def photos_fetcher():
            for collector in self.collectors:
                for photo in collector(destination):
                    yield photo

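        # Generators have no has_next() method, so next() with a sentinel
        # default is used to look one photo ahead. Note that this fetches
        # one photo more than it stores when the maximum is reached.
        done = object()
        n_photos_collected = 0
        photos_iterator = photos_fetcher()
        photo_info = next(photos_iterator, done)
        while n_photos_collected < self.max and photo_info is not done:
            self.add_photo(photo_info)
            n_photos_collected += 1
            photo_info = next(photos_iterator, done)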

        return self.get_photos()

And then, the last version. I decided to add the stop condition logic into the generator and as it is a nested function I can put a return statement to exit from the two nested loops. Then the main loop can be a for loop without the boilerplate management code that the while loop needed.


    def collect_photos6(self, destination):
        def photos_fetcher(max_photos):
            n_photos_collected = 0
            for collector in self.collectors:
                for photo in collector(destination):
                    if n_photos_collected >= max_photos:
                        return
                    yield photo
                    n_photos_collected += 1

        for photo_info in photos_fetcher(self.max):
            self.add_photo(photo_info)

        return self.get_photos()
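
For comparison, the same separation of concerns can also be expressed with the standard library. This is not one of the versions above but a sketch using itertools: chain.from_iterable flattens the collectors lazily and islice stops after self.max photos. The PhotoCollector class and the make_collector helper are hypothetical stand-ins so the example can run on its own.

```python
from itertools import chain, islice


class PhotoCollector(object):
    """Hypothetical stand-in for the class that owns the method."""

    def __init__(self, collectors, max_photos):
        self.collectors = collectors
        self.max = max_photos
        self._photos = []

    def add_photo(self, photo_info):
        self._photos.append(photo_info)

    def get_photos(self):
        return list(self._photos)

    def collect_photos(self, destination):
        # Lazily flatten all the collectors into a single stream...
        all_photos = chain.from_iterable(
            collector(destination) for collector in self.collectors)
        # ...and take at most self.max photos from it.
        for photo_info in islice(all_photos, self.max):
            self.add_photo(photo_info)
        return self.get_photos()


def make_collector(prefix, count):
    """Build a toy collector that yields `count` fake photo names."""
    def collector(destination):
        for n in range(count):
            yield "%s/%s%d" % (destination, prefix, n)
    return collector


carousel = PhotoCollector(
    [make_collector("a", 2), make_collector("b", 5)], 4)
photos = carousel.collect_photos("seville")
```

Here the first collector runs out after two photos and the remaining two come from the second one, so photos ends up with four entries.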

What version do you like most?

Mapping inheritance to a RDBMS with storm. Autoproxy version

Tuesday, January 27th, 2009

We saw in the Proxy version that some types of queries were not possible using the lazr.delegates package. We solved that problem using Storm’s Proxy objects but we lost ease of use in the process.

This time we will try to combine both approaches to get the best of both worlds.

The interface definitions do not change:

    >>> from zope.interface import Interface, Attribute
    >>> class IPerson(Interface):
    ...     name = Attribute("Name")
    ...
    >>> class ISecretAgent(IPerson):
    ...     passcode = Attribute("Passcode")
    ...
    >>> class ITeacher(IPerson):
    ...     school = Attribute("School")

Neither does the Person class:

    >>> from zope.interface import implements
    >>> from zope.interface.verify import verifyClass
    >>> from lazr.delegates import delegates
    >>> from storm.locals import Store, Storm, Unicode, Int, Proxy, Reference
    >>> class Person(Storm):
    ...     implements(IPerson)
    ...
    ...     __storm_table__ = "person"
    ...
    ...     id = Int(allow_none=False, primary=True)
    ...     name = Unicode()
    ...     person_type = Int(allow_none=False)
    ...     _person = None
    ...
    ...     def __init__(self, store, name, person_class, **kwargs):
    ...         self.name = name
    ...         self.person_type = person_class.person_type
    ...         store.add(self)
    ...         self._person = person_class(self, **kwargs)
    ...
    ...     @property
    ...     def person(self):
    ...         if self._person is None:
    ...             assert self.id is not None
    ...             person_class = BasePerson.get_class(self.person_type)
    ...             self._person = Store.of(self).get(person_class, self.id)
    ...         return self._person
    ...
    >>> verifyClass(IPerson, Person)
    True

Now, the real magic is in the metaclass. We use it not only to register our subclasses (so the Person.person property can find them) but to automatically store the attributes we needed to set manually in our previous version: the person_id and person attributes and a proxy object for each Person attribute. It’s like we are reimplementing inheritance in Python but not really :-)

    >>> from storm.properties import PropertyPublisherMeta
    >>> class PersonType(PropertyPublisherMeta):
    ...     def __init__(self, name, bases, dict):
    ...         if hasattr(self, '__storm_table__'):
    ...             # this needs to be done before calling the superclass
    ...             # otherwise Storm will cry about not having a primary key
    ...             self.person_id = Int(allow_none=False, primary=True)
    ...
    ...         super(PersonType, self).__init__(name, bases, dict)
    ...
    ...         if not hasattr(self, '_person_types_registry'):
    ...             self._person_types_registry = {}
    ...         elif hasattr(self, '__storm_table__'):
    ...             key = len(self._person_types_registry)
    ...             self._person_types_registry[key] = self
    ...             self.person_type = key
    ...
    ...             self.person = Reference(self.person_id, Person.id)
    ...             self._add_proxy_properties()
    ...
    ...     def _add_proxy_properties(self):
    ...         for name in IPerson:
    ...             if not hasattr(self, name):
    ...                 remote_attr = getattr(Person, name)
    ...                 setattr(self, name, Proxy(self.person, remote_attr))
    ...
    ...     def get_class(self, person_type):
    ...         return self._person_types_registry[person_type]

Now our BasePerson is really simple:

    >>> class BasePerson(Storm):
    ...     __metaclass__ = PersonType
    ...
    ...     def __init__(self, person):
    ...         self.person = person

And so are the subclasses. No repetition, so it is less prone to mistakes :-)

    >>> class SecretAgent(BasePerson):
    ...     implements(ISecretAgent)
    ...
    ...     __storm_table__ = "secret_agent"
    ...     passcode = Unicode()
    ...
    ...     def __init__(self, person, passcode=None):
    ...         super(SecretAgent, self).__init__(person)
    ...         self.passcode = passcode
    ...
    >>> verifyClass(ISecretAgent, SecretAgent)
    True
    >>> class Teacher(BasePerson):
    ...     implements(ITeacher)
    ...
    ...     __storm_table__ = "teacher"
    ...     school = Unicode()
    ...
    ...     def __init__(self, person, school=None):
    ...         super(Teacher, self).__init__(person)
    ...         self.school = school
    ...
    >>> verifyClass(ITeacher, Teacher)
    True

Let’s make sure our queries work as expected:

    >>> from storm.locals import create_database
    >>> database = create_database("sqlite:")
    >>> store = Store(database)
    >>> result = store.execute("""
    ...     CREATE TABLE person (
    ...         id INTEGER PRIMARY KEY,
    ...         person_type INTEGER NOT NULL,
    ...         name TEXT NOT NULL)
    ... """)
    >>> result = store.execute("""
    ...     CREATE TABLE secret_agent (
    ...         person_id INTEGER PRIMARY KEY,
    ...         passcode TEXT)
    ... """)
    >>> result = store.execute("""
    ...     CREATE TABLE teacher (
    ...         person_id INTEGER PRIMARY KEY,
    ...         school TEXT)
    ... """)
    ...
    >>> secret_agent = Person(store, u"James Bond",
    ...                        SecretAgent, passcode=u"007")
    >>> ISecretAgent.providedBy(secret_agent.person)
    True
    >>> teacher = Person(store, u"Albus Dumbledore",
    ...                  Teacher, school=u"Hogwarts")
    >>> ITeacher.providedBy(teacher.person)
    True
    >>> store.commit()
    >>> del secret_agent
    >>> del teacher
    >>> store.rollback()
    >>> secret_agent = store.find(SecretAgent).one()
    >>> secret_agent.name, secret_agent.passcode
    (u'James Bond', u'007')
    >>> teacher = store.find(Teacher).one()
    >>> teacher.name, teacher.school
    (u'Albus Dumbledore', u'Hogwarts')
    >>> secret_agent = store.find(SecretAgent, SecretAgent.name==u'James Bond').one()
    >>> secret_agent.passcode
    u'007'
    >>> teacher = store.find(Teacher, Teacher.school==u'Hogwarts').one()
    >>> teacher.name
    u'Albus Dumbledore'

So we made it by using a metaclass that automatically generates a bunch of attributes.

Mapping inheritance to a RDBMS with storm. Proxy version

Friday, January 23rd, 2009

One problem that we had in the first version of our inheritance by composition pattern was that we could not make Storm queries using the subclasses. In other words, the following would return None:

    secret_agent = store.find(SecretAgent, SecretAgent.name==u'James Bond').one()

The reason is that the expression SecretAgent.name would resolve to a lazr.delegates Passthrough object that Storm does not know how to handle.

This time we will try to fix this problem using a manually generated version of our classes using Storm’s Proxy objects.

The interface definitions do not change:

    >>> from zope.interface import Interface, Attribute
    >>> class IPerson(Interface):
    ...     name = Attribute("Name")
    ...
    >>> class ISecretAgent(IPerson):
    ...     passcode = Attribute("Passcode")
    ...
    >>> class ITeacher(IPerson):
    ...     school = Attribute("School")

Neither does the Person class:

    >>> from zope.interface import implements
    >>> from zope.interface.verify import verifyClass
    >>> from lazr.delegates import delegates
    >>> from storm.locals import Store, Storm, Unicode, Int, Proxy, Reference
    >>> class Person(Storm):
    ...     implements(IPerson)
    ...
    ...     __storm_table__ = "person"
    ...
    ...     id = Int(allow_none=False, primary=True)
    ...     name = Unicode()
    ...     person_type = Int(allow_none=False)
    ...     _person = None
    ...
    ...     def __init__(self, store, name, person_class, **kwargs):
    ...         self.name = name
    ...         self.person_type = person_class.person_type
    ...         store.add(self)
    ...         self._person = person_class(self, **kwargs)
    ...
    ...     @property
    ...     def person(self):
    ...         if self._person is None:
    ...             assert self.id is not None
    ...             person_class = BasePerson.get_class(self.person_type)
    ...             self._person = Store.of(self).get(person_class, self.id)
    ...         return self._person
    >>> verifyClass(IPerson, Person)
    True

Nor does our custom metaclass:

    >>> from storm.properties import PropertyPublisherMeta
    >>> class PersonType(PropertyPublisherMeta):
    ...     def __init__(self, name, bases, dict):
    ...         super(PersonType, self).__init__(name, bases, dict)
    ...         if not hasattr(self, '_person_types_registry'):
    ...             self._person_types_registry = {}
    ...         elif hasattr(self, '__storm_table__'):
    ...             key = len(self._person_types_registry)
    ...             self._person_types_registry[key] = self
    ...             self.person_type = key
    ...
    ...     def get_class(self, person_type):
    ...         return self._person_types_registry[person_type]

Here is where the changes start. Our BasePerson class does not have the delegates call and does not define the person_id attribute or the person reference.

    >>> class BasePerson(Storm):
    ...     __metaclass__ = PersonType
    ...
    ...     def __init__(self, person):
    ...         self.person = person

Instead we define the person_id and person reference in each subclass and also we define each attribute of the Person base class as a proxy to the related attribute of the person reference.

    >>> class SecretAgent(BasePerson):
    ...     implements(ISecretAgent)
    ...
    ...     __storm_table__ = "secret_agent"
    ...     passcode = Unicode()
    ...     person_id = Int(allow_none=False, primary=True)
    ...     person = Reference(person_id, Person.id)
    ...     name = Proxy(person, Person.name)
    ...
    ...     def __init__(self, person, passcode=None):
    ...         super(SecretAgent, self).__init__(person)
    ...         self.passcode = passcode
    >>> verifyClass(ISecretAgent, SecretAgent)
    True

We do it again for the Teacher class:

    >>> class Teacher(BasePerson):
    ...     implements(ITeacher)
    ...
    ...     __storm_table__ = "teacher"
    ...     school = Unicode()
    ...     person_id = Int(allow_none=False, primary=True)
    ...     person = Reference(person_id, Person.id)
    ...     name = Proxy(person, Person.name)
    ...
    ...     def __init__(self, person, school=None):
    ...         super(Teacher, self).__init__(person)
    ...         self.school = school
    >>> verifyClass(ITeacher, Teacher)
    True

Time to test the database storage:

    >>> from storm.locals import create_database
    >>> database = create_database("sqlite:")
    >>> store = Store(database)
    >>> result = store.execute("""
    ...     CREATE TABLE person (
    ...         id INTEGER PRIMARY KEY,
    ...         person_type INTEGER NOT NULL,
    ...         name TEXT NOT NULL)
    ... """)
    >>> result = store.execute("""
    ...     CREATE TABLE secret_agent (
    ...         person_id INTEGER PRIMARY KEY,
    ...         passcode TEXT)
    ... """)
    >>> result = store.execute("""
    ...     CREATE TABLE teacher (
    ...         person_id INTEGER PRIMARY KEY,
    ...         school TEXT)
    ... """)
    >>> secret_agent = Person(store, u"James Bond",
    ...                        SecretAgent, passcode=u"007")
    >>> ISecretAgent.providedBy(secret_agent.person)
    True
    >>> teacher = Person(store, u"Albus Dumbledore",
    ...                  Teacher, school=u"Hogwarts")
    >>> ITeacher.providedBy(teacher.person)
    True
    >>> store.commit()

And what’s more important, all these changes should make it possible to do the query with the subclass:

    >>> del secret_agent
    >>> del teacher
    >>> store.rollback()
    >>> secret_agent = store.find(SecretAgent).one()
    >>> secret_agent.name, secret_agent.passcode
    (u'James Bond', u'007')
    >>> teacher = store.find(Teacher).one()
    >>> teacher.name, teacher.school
    (u'Albus Dumbledore', u'Hogwarts')
    >>> secret_agent = store.find(SecretAgent, SecretAgent.name==u'James Bond').one()
    >>> secret_agent.passcode
    u'007'

We have improved the power of the pattern but now it is much more verbose to write each subclass since we need to repeat a lot of things. Note that we cannot move the definition of the person_id and person attributes to the BasePerson aux class because Storm will tell us that it lacks the __storm_table__ attribute. In other words: Storm does not allow attributes in abstract classes.