prettyprinted.net

More Cygwin Goodies

This was posted March 24, 2009. It has received 0 comments and 0 pingbacks

I stumbled upon a couple of Cygwin .bashrc goodies whilst browsing Reddit today. It made me stop for a moment and remember that I too have a few goodies that I use and love every day.

By default, Cygwin runs inside a cmd.exe console window. Instead of replacing it with PuTTY as Athul suggests, I've edited c:\cygwin\cygwin.bat to open an rxvt terminal with a nice font:

@echo off
C:
chdir C:\Cygwin\bin
rxvt -bg black -fn "Lucida Console-12" -e bash --login -i

Finally a decent terminal emulator with copy and paste the way it's supposed to be!

If you're like me, you hate the default /cygdrive/c setup. To save myself from typing cd /cygdrive all the time, I always run this one-liner on fresh Cygwin installations:

for d in $(/bin/ls /cygdrive); do ln -s "/cygdrive/$d" "/$d"; done

It creates a symlink such as /c pointing to /cygdrive/c for every one of your drives. It saves me a lot of typing, every day.
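One caveat: with GNU ln, running the one-liner a second time doesn't fail. Because /c now resolves to a directory, ln -s quietly creates a link named c inside the drive root instead. If you'd rather have something you can safely re-run, here is a minimal Python sketch of the same idea that skips names that already exist. The link_drives function and its parameters are my own invention; the defaults assume the standard /cygdrive layout.

```python
import os


def link_drives(cygdrive="/cygdrive", root="/"):
    """Create root/<drive> -> cygdrive/<drive> for every drive in cygdrive.

    Unlike the ln one-liner, names that already exist under root are
    skipped, so the function is safe to run more than once.
    """
    created = []
    for drive in sorted(os.listdir(cygdrive)):
        target = os.path.join(cygdrive, drive)
        link = os.path.join(root, drive)
        if os.path.lexists(link):
            continue  # e.g. /c is already there from an earlier run
        os.symlink(target, link)
        created.append(link)
    return created
```

On a Cygwin box you would simply call link_drives() with the defaults.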

What's new in the upcoming Django 1.1?

This was posted March 15, 2009. It has received 0 comments and 0 pingbacks

With the release of Django 1.1 coming up in a month, I thought I'd give the roadmap a quick look to find out what to expect. Every must-have feature that was planned for 1.1 has been committed to trunk (not counting the two features that were considered for 1.1 but have been postponed to 1.2).

The most exciting feature of Django 1.1 is ORM aggregation. The documentation is extensive and others have written long, explanatory blog posts. It's a really cool feature; go check it out!

But what else can you expect from Django 1.1?

for...empty r9530, documentation

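As far as I can tell, this is the only template language change: an optional empty clause for the for tag, rendered when the list is empty. It resembles Python's for...else, but the semantics differ: Python's else clause runs whenever the loop finishes without a break, not only when the sequence is empty.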

{% for blog_post in month_list %}
    <li>{{ blog_post.title }}</li>
{% empty %}
    <li>Sorry, no posts!</li>
{% endfor %}

Quite handy and easier to read and write than the common if-else equivalent.
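To make the difference from Python's for...else concrete, here are two small plain-Python functions (the names are mine, purely illustrative). The first behaves like {% for %}...{% empty %}. The second uses for...else and, perhaps surprisingly, appends the "no posts" line even when there are posts, because the loop never hit a break.

```python
def render_with_empty(posts):
    """Plain-Python equivalent of {% for %}...{% empty %}.

    Assumes posts is a sequence such as a list, so its truth value
    tells us whether it has any items.
    """
    lines = ["<li>%s</li>" % title for title in posts]
    if not posts:
        lines.append("<li>Sorry, no posts!</li>")
    return lines


def render_with_for_else(posts):
    """Python's for...else: the else block runs whenever the loop ends
    without break, whether or not the body ran at all."""
    lines = []
    for title in posts:
        lines.append("<li>%s</li>" % title)
    else:
        lines.append("<li>Sorry, no posts!</li>")
    return lines
```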

Cached sessions r9727, documentation

For large sites with heavy session usage, the new cache-based session backend may be just the thing. Session reads are served from the cache whenever possible, while every write goes to both the cache and the database, so sessions survive a cache restart.

It's easy enough to enable and it may boost performance. Nice!
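In other words, the backend behaves like a write-through cache: every write goes to both the database and the cache, and reads fall back to the database only when the cache misses. Here is a toy model of that behaviour, with two dicts standing in for the cache and the database. It illustrates the idea only and is not Django's implementation.

```python
class WriteThroughSessionStore(object):
    """Toy write-through session store: dicts stand in for the cache
    and the database table."""

    def __init__(self):
        self.cache = {}
        self.db = {}

    def save(self, session_key, data):
        # Every write goes to the database and refreshes the cache.
        self.db[session_key] = data
        self.cache[session_key] = data

    def load(self, session_key):
        # Reads hit the cache first and only fall back to the database
        # on a miss (e.g. after the cache was flushed), re-caching the hit.
        if session_key in self.cache:
            return self.cache[session_key]
        if session_key in self.db:
            data = self.db[session_key]
            self.cache[session_key] = data
            return data
        return {}
```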

F() expressions in query filter r9792, documentation

In addition to the long-awaited ORM aggregation support, this is another nice-to-have ORM query feature. F-expressions can be used in queries to compare a field to another field on the same row.

It is probably best explained with an example, from the documentation:

from django.db.models import F
Entry.objects.filter(n_pingbacks__lt=F('n_comments'))

This will retrieve the blog entries where the n_pingbacks field is less than (__lt) the n_comments field. The documentation is quite good and has a few more examples.
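The key difference from an ordinary filter is that the right-hand side is another column of the same row rather than a constant. The database therefore compares the two columns row by row (roughly WHERE n_pingbacks < n_comments). Here is the same selection over in-memory objects, using a hypothetical Entry dataclass rather than the Django model:

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical in-memory stand-in for the Entry model.
    headline: str
    n_comments: int
    n_pingbacks: int


def fewer_pingbacks_than_comments(entries):
    # Each entry is compared with another field of *the same* entry,
    # which is what F('n_comments') expresses in the query.
    return [e for e in entries if e.n_pingbacks < e.n_comments]
```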

Day-of-week query filter r9818, documentation

A minor addition to the query language is the week_day lookup for date and datetime types. Once again explained better with an example:

Entry.objects.filter(pub_date__week_day=2)

One gotcha that will haunt the mailing lists and IRC channels is the numbering of the days:

Week days are indexed with day 1 being Sunday and day 7 being Saturday.

This prompted a long thread on the django-developers mailing list. Personally I'm not that happy with the outcome; I live in a country where no one would expect any day other than Monday to be the first. As mentioned in that thread, the obvious workaround for troubled minds is to use constants:

# week_day counts Sunday as 1 and Saturday as 7
SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY = range(1, 8)
Model.objects.filter(date__week_day=TUESDAY)

No confusion and more readable code. Nice.
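If you compute the number on the Python side, say from a date a user picked, watch out: Python has its own conventions. date.weekday() counts Monday as 0, and date.isoweekday() counts Monday as 1 through Sunday as 7. Neither matches week_day. Here is a small helper that converts between them (django_week_day is a name I made up):

```python
# Django's week_day lookup numbers the days 1 (Sunday) to 7 (Saturday).
SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY = range(1, 8)


def django_week_day(day):
    """Return the week_day number for a date or datetime.

    isoweekday() gives Monday=1 ... Sunday=7; taking it modulo 7 turns
    Sunday into 0, and adding 1 yields Sunday=1 ... Saturday=7.
    """
    return day.isoweekday() % 7 + 1
```

For example, Entry.objects.filter(pub_date__week_day=django_week_day(some_date)) would find entries published on the same weekday as some_date.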

Customizable django.contrib.comments r9890, documentation

I have previously shared my hack to use custom forms with django.contrib.comments. It's nice to see that the comments framework is much more customizable in the upcoming release. However, it feels a bit clumsy to create a new module just to be able to use subclassed forms. In my opinion, classes and inheritance are the preferred tools for such extensions.

The documentation is extensive and covers most use-cases.

---

These are only a few of the new features in Django 1.1. From the looks of it, it will be a fine release; as always, kudos go to the great developers.

Awesome Pandas

This was posted March 5, 2009. It has received 0 comments and 0 pingbacks

I spent an hour yesterday playing with the Flickr Pandas. You've never heard of the Flickr Pandas? Laughing Meme has an accurate description:

These are Flickr real time data APIs. We’re building streams of photos in real time. Examining the huge stream of data events that happen on Flickr, the social activity, the searching, the meta-data creation, and fishing from that stream to build 3 real time streams. We’re then exposing those streams via a near real time polling based API.

It turns out that these pandas are really good at locating interesting photos in real-time. I've made a page that shows the awesomeness of these pandas. Nothing fancy, but it shows you a collage of 50 or so photos.

On Clean Builds

This was posted Jan. 4, 2009. It has received 0 comments and 0 pingbacks

Johannes Brodwall wrote a terrific piece today about compiler warnings: keep your build clean, strive to have zero compiler warnings. Always! I pretty much agree with him on everything. Keeping my build clean has always had a high priority for me.

At work I maintain a pile of legacy code, mostly Java. Every now and then I'm so lucky that I get to bugfix or add a feature to a project that hasn't been checked out of CVS in years.

If I'm really lucky, I'll only spend a day or two getting it to build and deploy to staging. I have noticed that projects with lots and lots of compiler warnings are the hardest to deploy, while projects with clean code are usually also extremely easy to build and deploy. That is probably not a coincidence.

Compiler warnings are found during static type checking and code analysis. So where does that leave us in regard to dynamic languages? For my private projects I tend to use Python, where the term compiler warning is undefined. Python code is compiled to bytecode at run-time and the compiler is usually very quiet.

I have noticed that a lot of what the Java compiler (or is it Eclipse?) warns about is the kind of thing you'd expect to be jotted down in your coding conventions document. In Python we have the almighty PEP8, which dictates style and conventions, so source code analysis based on PEP8 could tell you something about the code.

And what do you know? Such a tool already exists! Through Google I found pep8.py, a script that runs through your Python code and warns you about PEP8 violations. For fun I ran it on one of my smaller projects and PEP8ified the code. Sadly, most of the warnings are about whitespace, whitespace and trailing whitespace:

./development_utils/shared.py:6:10: E401 multiple imports on one line
./development_utils/shared.py:8:1: E302 expected 2 blank lines, found 1
./development_utils/management/commands/createadmin.py:51:19: E201 whitespace after '['
./development_utils/management/commands/createadmin.py:61:1: W391 blank line at end of file
./development_utils/management/commands/createapp.py:32:80: E501 line too long (98 characters)
./development_utils/management/commands/createapp.py:75:33: E231 missing whitespace after ':'

Most of these should be fixed automatically by your $EDITOR, which means I have some Emacs hacking to do in the future.
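Most of these checks are simple line-by-line rules. As an illustration (a toy, not pep8.py's actual code), two of them can be written in a few lines of Python: E501 for lines over 79 characters and W291 for trailing whitespace. The sketch reports (line, column, message), mimicking the format above:

```python
MAX_LINE_LENGTH = 79  # the PEP 8 limit behind E501


def check_lines(lines):
    """Return (line_no, column, message) tuples for two PEP 8 rules.

    A toy re-implementation for illustration only; it mimics the
    E501 and W291 messages but is not pep8.py's code.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if len(line) > MAX_LINE_LENGTH:
            problems.append((number, MAX_LINE_LENGTH + 1,
                             "E501 line too long (%d characters)" % len(line)))
        stripped = line.rstrip(" \t")
        if stripped != line:
            # Column points at the first trailing whitespace character.
            problems.append((number, len(stripped) + 1,
                             "W291 trailing whitespace"))
    return problems
```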

This little exercise has raised a question: Does the fact that we don't have static analysis and compiler warnings say anything about the quality of our dynamic language code?

Avoiding Python syntax errors with git

This was posted Dec. 22, 2008. It has received 0 comments and 0 pingbacks

I recently switched source code management systems, from Bazaar to Git, mostly because of GitHub. If you don't know GitHub, check it out and give it a try. It made me understand and appreciate distributed source control management systems in a whole new way.

I never played much with Bazaar hooks, as they seemed clumsy and unnecessarily complex. Git hooks, on the other hand, are incredibly easy to set up and maintain, as they are simply executable files in the .git/hooks subdirectory.

I mostly program in Python, and even though many of my projects have unit tests, some of them don't. Once in a while a typo creeps into my code and renders the Python invalid, so I put together this surprisingly simple pre-commit hook that compiles all my Python code and aborts the commit if any syntax errors are found:

$ git commit -a -m "testing pre-commit hook"
Compiling ./setup.py ...
  File "./setup.py", line 6
    def foo()            
^
SyntaxError: invalid syntax

The shell-script that is run before I commit is only two lines:

$ cat .git/hooks/pre-commit 
#!/bin/sh
python -m compileall -q .

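The compileall module is part of the Python standard library. When executed as a script, it recursively compiles all your Python files and exits with a non-zero status if any of them fail to compile. The -q flag makes it only spit out errors. Git aborts the commit whenever the pre-commit hook exits with a non-zero status. To enable the hook, make sure the pre-commit file has the executable flag set (chmod +x .git/hooks/pre-commit).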
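One side effect is that compileall writes .pyc files next to your sources (or into __pycache__ on Python 3). It also checks the whole tree rather than just what you're committing. The following alternative sketch is not what my hook does. It compiles files in memory with the built-in compile() and reports syntax errors without writing anything. The function names are hypothetical. The hook would pass in the staged file names, for example from git diff --cached --name-only --diff-filter=ACM.

```python
import sys


def syntax_errors(paths):
    """Compile each file in memory and collect (path, SyntaxError) pairs.

    Nothing is written to disk, unlike compileall.
    """
    errors = []
    for path in paths:
        with open(path, "rb") as source_file:
            source = source_file.read()
        try:
            compile(source, path, "exec")
        except SyntaxError as exc:
            errors.append((path, exc))
    return errors


def check(paths):
    """Report errors for the .py files among paths; return an exit status."""
    errors = syntax_errors(p for p in paths if p.endswith(".py"))
    for path, exc in errors:
        sys.stderr.write("%s:%s: %s\n" % (path, exc.lineno, exc.msg))
    return 1 if errors else 0  # non-zero makes git abort the commit
```

Note that it reads the files from the working tree, so partially staged changes are checked as they exist on disk.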

Subclassing Django's CommentForm

This was posted Nov. 24, 2008. It has received 0 comments and 0 pingbacks

Django comes with batteries included and a very powerful and useful battery is the django.contrib.comments framework. However, to make it in time for the long awaited 1.0 release the comments framework was released without customization hooks. See #8630 for more details.

The long story in a short sentence is that you cannot subclass or customize CommentForm in today's version without getting your hands dirty.

I recently enabled Captcha verification to comments on this site to prevent spammers from filling my database with non-public comments. Tor Brede Vekterli has written about how to do this by using signals, but for technical reasons I'd rather not go down that road. So I landed on the next possible strategy, namely subclassing CommentForm.

My subclass is a standard Django form:

from django import forms
from django.contrib.comments.forms import CommentForm

class CaptchaCommentForm(CommentForm):
    captcha = forms.CharField(max_length=20, label='Enter this word')

    def clean_captcha(self):
        captcha = self.cleaned_data['captcha']
        # verify the captcha here, details omitted;
        # raise forms.ValidationError if verification fails
        return captcha

Since I wanted to control how this field is displayed by adding an image, I also edited my form template. In my comments/form.html template I added a check to include the captcha image:

{% for field in form %}
  {% ifequal field.name "captcha" %}
    do whatever it takes to display the custom field.
  {% else %}
    display regular comment form field
  {% endifequal %}
{% endfor %}

As I mentioned above the comments framework is not really designed to be extended or customized in this way. To make the framework use my CaptchaCommentForm instead of its own CommentForm I resorted to monkeypatching!

The framework uses a single function, comments.get_form, whenever it needs the comment form class; it then instantiates that class itself. What I wanted was for that function to return my CaptchaCommentForm class instead. Here is what I did in my urls.py:

from django.contrib import comments
from forms import CaptchaCommentForm

def override_get_form():
    # get_form returns the form class, not an instance;
    # the framework instantiates it with its own arguments
    return CaptchaCommentForm

comments.get_form = override_get_form

Thanks to the dynamic features of Python, we can simply replace the original function with our own that returns the correct class. Not pretty, but it works.
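Whether a monkeypatch like this takes effect depends on how callers reach the function. Only code that looks up comments.get_form on the module at call time sees the replacement. Anything that grabbed its own reference earlier (say, with from ... import get_form) before urls.py ran keeps calling the original. Here is a self-contained sketch with a fake module; everything in it is a stand-in, not Django code.

```python
import types

# A fake "comments" module built at run time, so the example runs
# without Django; the names mirror the post but nothing here is Django.
comments = types.ModuleType("comments")


class CommentForm(object):
    pass


class CaptchaCommentForm(CommentForm):
    pass


def original_get_form():
    return CommentForm


comments.get_form = original_get_form

# A reference grabbed before patching, like "from comments import get_form".
early_get_form = comments.get_form


def view():
    # Looks the function up on the module at call time.
    return comments.get_form()


# The monkeypatch, as done in urls.py.
comments.get_form = lambda: CaptchaCommentForm
```

After the patch, view() returns CaptchaCommentForm while early_get_form() still returns CommentForm.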

Is gst a bot?

This was posted Oct. 27, 2008. It has received 2 comments and 0 pingbacks

I recently noticed that nearly all of the links I follow in the programming section on Reddit are posted by a certain user named gst. Curious about this user, I checked out his Reddit profile only to find that he has submitted a lot of links. No surprise there!

The usual Google searches turned up amazingly few clues as to who this person is.

Apparently some people feel that he is spamming Reddit and posted a request to "be saved from gst". The discussion that followed suggested that gst is not a regular user, but a bot posting links from gerd.storm's Delicious account:

gst scrapes from gerd storm from del.icio.us. (hence the name gst for gerd storm)

However in another thread gst is asked where he gets his links from and he replies:

blogsearch.google.com. google reader. dzone.com. news.ycombinator.com. scanning google groups for python haskell lisp and scheme. following interesting links.

Some felt that such an answer implies that he is in fact human, however judging by the looks of his average comment I am inclined to say the opposite.

One thing is for sure: gst is posting a lot of links to Reddit. Averaging one every few minutes, he's probably the most active user on Reddit. One would think that such a user would keep a high profile, but my searches tonight have led me nowhere. I still cannot answer the question: is gst a bot?

If gst is a bot, I'd love to see the sources and its input. If he's 100% human I'd really love to hear how he manages such a high volume.

SlugField and Django 1.0

This was posted Oct. 7, 2008. It has received 1 comment and 0 pingbacks

I stumbled upon this non-functional fix for using prepopulated admin fields in Django 1.0. If you manage to read the code you'll notice that it really does not conform to Django-1.0 standards; it uses old-style inner Admin-classes.

The proper way to do this is of course to use the ModelAdmin classes that were introduced with the merge of the newforms-admin branch.

Declare your model as usual in models.py without the inner Admin class:

from django.db import models

class Article(models.Model):
    title = models.CharField(max_length=200)  # max_length is required; 200 is arbitrary
    slug = models.SlugField()

Then declare the ModelAdmin class in admin.py. This is where you set properties for the admin interface; it replaces the old-style inner Admin class:

from django.contrib import admin
from models import Article

class ArticleAdmin(admin.ModelAdmin):
    prepopulated_fields = {"slug": ("title",)}

admin.site.register(Article, ArticleAdmin)

This is the correct way to do it. Rip out the inner Admin-classes and create new ModelAdmin-classes. Read the new and updated documentation for more on newforms-admin.
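Keep in mind that prepopulated_fields only fills in the slug with JavaScript while you type the title in the admin. Articles created elsewhere, say from a script, still need a slug. Here is a rough standalone Python sketch of the slugify idea that you could call yourself. It is modelled loosely on Django's slugify filter, but it is not imported from Django and may differ in edge cases.

```python
import re
import unicodedata


def slugify(value):
    """Lowercase, drop accents and punctuation, hyphenate whitespace."""
    # Decompose accented characters, then drop the non-ASCII marks.
    value = unicodedata.normalize("NFKD", value)
    value = value.encode("ascii", "ignore").decode("ascii")
    # Keep word characters, whitespace and hyphens only.
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r"[-\s]+", "-", value)
```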

Redesign

This was posted Oct. 5, 2008. It has received 0 comments and 0 pingbacks

Today I spent a total of 45 minutes redesigning this site. I'm in love with minimalist designs and this new design is just that. Dead simple.

In other news: I'm fascinated by Photoswap. I will experiment with it for a few days and then perhaps jot down a few thoughts.

SSH breakin attempts

This was posted Sept. 27, 2008. It has received 0 comments and 0 pingbacks

Yesterday I read an interesting analysis of SSH attacks on a honeypot put up by the New Zealand Honeynet Alliance. Since I have a few Linux boxes on public IPs, I was of course aware of the problem, but I had never bothered to take a closer look at my logs.

Inspired by that article, I decided to take a quick look at what's happening on one of my virtual boxes. The server in question has been online for a year now, with a static IP address, and I have logs going back to the day it was brought online.

My messages file is 29MB. Let's see how much of that is related to the ssh daemon.

$ grep sshd messages > messages.sshd
$ ls -lh 
-rw-r--r-- 1 steingrd steingrd  29M 2008-09-27 17:35 messages
-rw-r--r-- 1 steingrd steingrd  24M 2008-09-27 17:36 messages.sshd

Whoa! I decided to base my analysis on the lines dealing with invalid user names, since each such line represents a possible break-in attempt. I saved those lines to messages.sshd.invaliduser and wrote a simple Python script to analyze them:

$ python sshd_analyzer.py messages.sshd.invaliduser 

break in attempts 97264
distinct users 12263
distinct ip adresses 1670

top ip addresses
[(3497, '78.109.21.33'),
 (2621, '72.9.228.152'),
 (2442, '140.112.113.24'),
 (1814, '60.248.211.30'),
 (1746, '91.190.236.11'),
 (1665, '61.135.204.133'),
 (1579, '66.0.29.75'),
 (1529, '85.125.82.114'),
 (1426, '194.94.111.40'),
 (1376, '82.159.205.86')]

top users
[(2665, 'test'),
 (2479, 'admin'),
 (1128, 'guest'),
 (999, 'oracle'),
 (928, 'user'),
 (692, 'access'),
 (641, 'account'),
 (488, 'albert'),
 (482, 'clamav'),
 (436, 'asterisk')]

Over the last year I have been attacked almost one hundred thousand times from 1670 different IP addresses. These attacks have used 12263 different user names. The output lists the top 10 user names and the top 10 attacking IP addresses.

The script takes as input the log lines about invalid user names used in the attacks. You might notice that root is missing: since root is a valid user, it never shows up in my input. However, the root account was attacked 3012 times:

$ grep "Authentication failure for root" messages.sshd | wc -l
3012

Which of course puts it at number one.
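The analyzer script itself isn't shown here. The following is a hypothetical reconstruction of how such a script might work, not the one I ran. The log-line pattern is an assumption based on OpenSSH's "Invalid user NAME from IP" message, so adjust it to what your own messages file contains. Results come back as (count, name) tuples, highest count first, matching the output above.

```python
import re
from collections import Counter

# Assumed OpenSSH log format, e.g.
#   Sep 27 17:01:02 host sshd[123]: Invalid user admin from 10.0.0.1
INVALID_USER = re.compile(r"Invalid user (\S*) from (\S+)")


def analyze(lines, top=10):
    """Return (attempts, distinct users, distinct IPs, top IPs, top users)."""
    users = Counter()
    ips = Counter()
    attempts = 0
    for line in lines:
        match = INVALID_USER.search(line)
        if not match:
            continue
        user, ip = match.groups()
        attempts += 1
        users[user] += 1
        ips[ip] += 1
    # Counter.most_common yields (name, count); flip to (count, name).
    top_ips = [(n, ip) for ip, n in ips.most_common(top)]
    top_users = [(n, user) for user, n in users.most_common(top)]
    return attempts, len(users), len(ips), top_ips, top_users
```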

My simple scripts

This was posted Sept. 25, 2008. It has received 0 comments and 0 pingbacks

If you've read this blog before, perhaps you know that I keep my Django projects in a simple and flexible directory structure. I've written not one, but two posts about it earlier.

To be honest I'm quite proud of this setup. It fixes several things that the semi-standard Django project layout does not address. Most importantly it keeps my PYTHONPATH in a single directory.

Recently I have ported several projects to Django 1.0, which was a painful experience. However, some sites remain, for example rssepisodes.com, which I haven't worked on for months. Today I decided that I had to fix a few bugs, so I eagerly cd'ed into the directory and tried to start the development server.

Now if you're like me you've completely forgotten that your PYTHONPATH contains Django-1.0 which of course is completely incompatible with the Subversion checkout that this code requires. No problemo you say, just check out the required revision. And of course that is what I did. But since my ~/python/django symlink points to ~/src/django-1.0/django I have to update that link. (~/python being my PYTHONPATH).

I must admit. I love stupid shell scripts. So I quickly created the following scripts. Now I run them whenever I need to change my django symlink.

$ cat ~/bin/use_django_7209.sh
#!/bin/bash -x
CURR=$PWD
cd $HOME/python
rm django
ln -s $HOME/src/django-svn-7209/django django
cd $CURR

And when I want to switch back:

$ cat ~/bin/use_django_1.0.sh 
#!/bin/bash -x
CURR=$PWD
cd $HOME/python
rm django
ln -s $HOME/src/django-1.0/django django
cd $CURR

Sure, it's dead simple. It's basic. It's the next step from hello, world.

But it freaking works! And it saves me 30 seconds a few times a day.
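If the number of checkouts grows, the two scripts could be folded into one that takes the checkout name as an argument. Here is a hypothetical Python version. The use_django name and the ~/src/<checkout>/django layout are assumptions based on the paths above. It builds the new link under a temporary name and renames it over the old one, so there is never a moment without a django link.

```python
import os


def use_django(checkout, pythonpath=os.path.expanduser("~/python"),
               src=os.path.expanduser("~/src")):
    """Point <pythonpath>/django at <src>/<checkout>/django."""
    target = os.path.join(src, checkout, "django")
    if not os.path.isdir(target):
        raise ValueError("no such checkout: %s" % target)
    link = os.path.join(pythonpath, "django")
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)  # leftover from an interrupted run
    os.symlink(target, tmp)
    os.replace(tmp, link)  # atomic rename over the old link on POSIX
    return target
```

Switching is then use_django("django-svn-7209") or use_django("django-1.0").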

Inner Beauty

This was posted Sept. 24, 2008. It has received 0 comments and 0 pingbacks

In my day job as a Java architect and developer I sometimes stumble upon beautiful code. Of course, more often I read through piles and piles of ugly, unmaintainable crap. I have the pleasure of maintaining and working with Java code that has evolved for 8-9 years, through various technologies and developed by external consultants with diverging styles.

Perhaps the most important application of them all was developed that way. It embodies a gruesome mix of EJBs and an astonishing amount of layers. I usually refer to it as the beautiful 14-layer architecture.

This application, weighing in at approximately 130,000 lines of code, is a mastodon. Its task is simple: provide a web application and an API for searching a gigantic database. Simple enough.

Blend in a few layers of EJB and other remote procedure call technologies just because you can, cover it with at least two layers of leaky abstractions to hide your implementation, spread it across three modules and voilà, you have created a monster.

One of many ongoing tasks for my team over the last month has been to rewrite this monster into a simple J2SE API, either from scratch or by reusing parts of the code. Our first reaction was of course to rewrite everything; we wouldn't want to touch this monster with a ten-foot pole. However, my employer and my team have also invested lots of time in the existing code, so we decided to slowly rewrite our monster into something maintainable by today's standards.

We peeled away layer after layer and ripped out tons of ServiceLocators and other EJB patterns. We used Eclipse and its reformat-code functionality over and over again. And guess what?

Beneath it all we found a beautiful little API, probably better than anything we would have cooked together ourselves.

... bytes from prettyprinted: time=5 months

This was posted Sept. 15, 2008. It has received 0 comments and 0 pingbacks

Whoa! Five months have gone by and I haven't written a single blog entry. Lots has happened, though.

I started writing a patch for django-pingback, I haven't completed it yet but I will soon. I'm already using the code on this blog.

I also wrote a Django-like template language for Java named Templatext. Maybe I'll write more about that later. I think I'm proud of the code, but I haven't looked at it in the last month, so I'm not sure. Writing it sure was lots of fun.

I upgraded all my Django sites to 1.0. That was painful. Very painful.

I finally moved this site to a new hosting provider. No pain, a pleasant experience so far.

Smells Like Django

This was posted April 8, 2008. It has received 0 comments and 0 pingbacks

Google App Engine was launched to the public today and it looks very cool. I was fortunate enough to get an invitation almost immediately after I signed up on the waiting list. The API has a strong Django smell, so this will most likely be very interesting.

Stupid environment variable! Eh. Or me.

This was posted April 1, 2008. It has received 0 comments and 0 pingbacks

Meh. For the first time ever this blog received a fair amount of visits. And what happens?

Stupid me decides to upgrade and forgets to reset an environment variable which makes Django believe it's running in a debug environment; tables, files, everything was in the "wrong" location.

The next time I'll be more careful...