What "gooli" really means

December 2nd, 2007

Apparently the nickname I’ve been called since my first week in the army has different meanings in different languages. And some are not so pleasant :)

Sarah Siegel writes:

“Channa, I don’t want to forget to ask you for a word today. How do you say, ‘Bull’ in Kannada?”
“Gooli, Ma’am.”

And the urban dictionary says:

Goolies

Noun:
1. Private parts
2. Family Jewels

“I kicked him in the goolies”

Balls of bull. Nice!

A simple lexer in Python

October 21st, 2007

I’m taking a course on building compilers at the Israeli Open University and just learned how to use flex. It occurred to me that building a simple lexical analyzer should be quite easy with Python’s re module. A typical lexical analyzer reads a stream of text input and splits it into a list of tokens. The simplest example of such a thing is the split function, which takes a sentence and returns the list of words in it.

s = "A simple lexer in Python"
s.split()
['A', 'simple', 'lexer', 'in', 'Python']

The problem becomes more complex when you need to separate the tokens you find into different kinds, words and numbers, for instance. We’ll use a well known lyric as our sample text:

s = """99 bottles of beer on the wall, 99 bottles of beer.
Take one down and pass it around, 98 bottles of beer on the wall."""

The first thing we need to do is build a regular expression that recognizes words and another one that recognizes numbers. Although there are shorter ways to build those regular expressions, I like the less obscure form:

wordsRegex = "[A-Za-z]+"
numbersRegex = "[0-9]+"

We could now use findall on the string and get all the numbers and words out of it.

import re

re.findall(wordsRegex, s)
['bottles', 'of', 'beer', 'on', 'the', 'wall', 'bottles', 'of', 'beer', 'Take', 'one', 'down', 'and', 'pass', 'it', 'around', 'bottles', 'of', 'beer', 'on', 'the', 'wall']

re.findall(numbersRegex, s)
['99', '99', '98']

But wait, you say, that isn’t what we wanted at all! We need to get the tokens in the order of their appearance in text and still get the type of each token. Something along the lines of

for tokenType, tokenText in lexer(s):
    print tokenType, tokenText

would be really nice.

In order to do that, we’ll need to combine both regular expressions into one and iterate on the result of findall examining each token to decide on its type.

regex = "(%s)|(%s)" % (wordsRegex, numbersRegex)
'([A-Za-z]+)|([0-9]+)'
re.findall(regex, s)
[('', '99'), ('bottles', ''), ('of', ''), ('beer', ''),
 ('on', ''), ('the', ''), ('wall', ''), ('', '99'),
 ('bottles', ''), ('of', ''), ('beer', ''), ('Take', ''),
 ('one', ''), ('down', ''), ('and', ''), ('pass', ''),
 ('it', ''), ('around', ''), ('', '98'), ('bottles', ''),
 ('of', ''), ('beer', ''), ('on', ''), ('the', ''), ('wall', '')]

As you can see, the result of the call to findall is a list of tuples, each containing a single match. If you look closely at the way I’ve combined the two regular expressions, you’ll see that each part is surrounded with parentheses and that there’s a pipe (|) between the expressions. The compound regular expression matches either a number or a word, and each tuple in the return value of findall contains the matches for each parenthesized part of the regexp. However, since we combined the parts using a pipe (|), only one of the parts matches each time, and the other is left as an empty string.

Using that knowledge we can now construct a simple loop that shows the token type for each of the words in the lyric:

for t in re.findall(regex, s):
    if t[0]:
        print "word", t[0]
    elif t[1]:
        print "number", t[1]

We now have most of the knowledge we need to build ourselves a lexer that will take a list of regular expressions and some text and return (or even better, generate) a list of tokens and their types. We’ll need to combine the regular expressions for each token into one big regex using pipes, scan the string, and gather the tokens and their types.

Our usage code looks like this:

definitions = [
    ("word", "[A-Za-z]+"),
    ("number", "[0-9]+"),
]

lex = Lexer(definitions)
for tokenType, tokenValue in lex.parse(s):
    print tokenType, tokenValue

And here is the code for the lexer itself:

import re

class Lexer(object):
    def __init__(self, definitions):
        # definitions is a list of (name, regex) pairs, one per token type
        self.definitions = definitions
        parts = []
        for name, part in definitions:
            # wrap each regex in a named group so the matches can be told apart
            parts.append("(?P<%s>%s)" % (name, part))
        self.regexpString = "|".join(parts)
        self.regexp = re.compile(self.regexpString, re.MULTILINE)

    def parse(self, text):
        # yield (token type, token text) pairs in order of appearance
        for match in self.regexp.finditer(text):
            for name, rexp in self.definitions:
                m = match.group(name)
                if m is not None:
                    yield (name, m)
                    break

Some notes on the implementation are in order. I’ve used the little-known (?P<name>…) syntax for naming the parenthesized groups of regular expressions. Using that syntax, the expression (?P<word>[A-Za-z]+) matches a word, and that match is accessible with match.group('word'), where match is a match object.
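To make the named-group behaviour concrete, here is a minimal sketch written for Python 3 (an assumption; the post itself targets Python 2.4). It uses the same two token patterns as above and shows that the alternative which did not take part in the match comes back as None. The lastgroup attribute is a shortcut the Lexer class does not use.

```python
import re

# Two named alternatives joined with a pipe, as in the Lexer class.
pattern = re.compile(r"(?P<word>[A-Za-z]+)|(?P<number>[0-9]+)")

match = pattern.match("bottles of beer")
word = match.group("word")      # text captured by the "word" group
number = match.group("number")  # None, because this alternative did not match
kind = match.lastgroup          # name of the group that did match

print(word, number, kind)
```

This is why the parse method can simply test each named group for None to find the token type.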

In order to speed things up a bit, I’ve compiled the regular expression when the Lexer object is created, used the finditer function instead of findall, and made parse a generator instead of a list-returning function.
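The effect of combining finditer with a generator can be seen in a small sketch. It is written for Python 3 (an assumption; the post uses Python 2), and the lex function is an illustrative stand-in for the Lexer class, not the code from the post. It relies on match.lastgroup instead of looping over the definitions.

```python
import re
from itertools import islice

def lex(definitions, text):
    """Lazily yield (token_type, token_text) pairs (illustrative helper)."""
    combined = "|".join("(?P<%s>%s)" % (name, part) for name, part in definitions)
    for match in re.compile(combined).finditer(text):
        # lastgroup is the name of the alternative that matched
        yield match.lastgroup, match.group()

definitions = [("word", "[A-Za-z]+"), ("number", "[0-9]+")]
text = "99 bottles of beer on the wall, 99 bottles of beer."

tokens = lex(definitions, text)        # nothing has been scanned yet
first_three = list(islice(tokens, 3))  # scans only as far as the third token
print(first_three)
```

Because tokens are produced on demand, a caller that stops early never pays for scanning the rest of the text.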

Using this simple lexer implementation it was quite simple to create a Python-to-HTML converter with syntax highlighting that works well enough to highlight the code of the highlighter itself!
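As a rough idea of how such a highlighter could be built on the same technique, here is a minimal sketch for Python 3. It is not the code linked from this post: the token definitions, the CSS class names and the highlight function are all hypothetical, and the patterns are far too crude for real Python source.

```python
import re
from html import escape

# Hypothetical token definitions. Order matters: the first alternative
# that matches at a given position wins.
DEFINITIONS = [
    ("comment", r"#[^\n]*"),
    ("string", r'"[^"\n]*"' + "|" + r"'[^'\n]*'"),
    ("keyword", r"\b(?:def|class|for|in|if|elif|else|return|yield|import|print)\b"),
    ("number", r"[0-9]+"),
    ("name", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("other", r"\s+|."),  # whitespace and any leftover character
]
REGEX = re.compile("|".join("(?P<%s>%s)" % d for d in DEFINITIONS))

def highlight(source):
    """Return an HTML <pre> block with a <span> around each highlighted token."""
    out = []
    for match in REGEX.finditer(source):
        text = escape(match.group())  # make the token safe to embed in HTML
        kind = match.lastgroup
        if kind in ("name", "other"):
            out.append(text)          # plain text, no markup
        else:
            out.append('<span class="%s">%s</span>' % (kind, text))
    return "<pre>%s</pre>" % "".join(out)

html = highlight('for t in tokens:  # loop\n    print "word", 42\n')
print(html)
```

The catch-all "other" definition matters here: unlike the word/number lexer, a highlighter must not drop characters that no other token pattern matches.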

The code for the lexer and syntax highlighter example is available here and on my snippets page. You can also see the result of running the syntax highlighter on itself here.

Enjoy lexing and let me know if you found this useful.

DreamHost PR stunt?

October 19th, 2007

DreamHost is a web hosting company. I’ve never hosted anything with them, but now I might. They’ve been EVICTED from their office space for drunken behaviour and other types of misconduct and they’ve blogged about it, with pictures and all (you should also read the comments, they are quite funny).

Would you host your website with a company whose offices look like this?

[image]

I don’t really know what to think about this. Many “serious” companies I’ve dealt with in the past provide crappy service although their offices are sparkling clean and they don’t do silly things like consume enough alcohol to get evicted. On the other hand, it does seem unsettling that the company you rely on to keep your data safe behaves like a college fraternity.

I did write about them, however, as others have done, and that’s got to be worth something. After all, there’s no such thing as bad press, right?

Ian’s comments on Testuff

October 19th, 2007

Wow!

I finally sat down today to write those Testuff emails I talked about and just as I was getting into the mood of doing that I spotted this post. Apparently Ian keeps track of whoever mentions his name and had quite a few things to say about our offer and our site. Thanks Ian!

Most of the comments about Testuff are dead on and we’re definitely going to address them, both on the website and in our application. Following are a couple of items I want to elaborate on.

Ian mentioned that it is unclear why he needed to download something:

It’s also a bit confusing which parts are online and why I’m downloading something. Clearing that up a bit would be useful.

We’re going to change the site to convey it better, but I do want to answer it right here for those who might have the same question. Testuff is a hybrid application with a rich GUI front end and a web-based backend. That means you have to download the client application to use it, but everything is stored online and can be shared between several people. That is similar to how services like iTunes and Chandler work.

Ian also said that we need to state clearly that Testuff integrates with existing bug trackers:

Your site makes it appear that the bugs are logged with you, though I found one random note that suggests it actually integrates with commercial bug trackers. This is a huge point, nobody wants to log bugs with you. You should prominently display the names of the bug trackers you support all over the place. That way when I come to your site I can see my bug trackers name and know you support it right away and that this improves my existing bug tracker not replaces it.

Yes, we are going to integrate with existing bug trackers, but we haven’t done that yet. That’s why there’s only a random note about it on the site and it’s not in H1 on the main page. We’re hard at work on Trac integration, with Bugzilla and FogBugz on our feature list for the coming weeks. The selling point of actually improving your existing bug tracker is a great spin. After all, everybody uses a bug tracker these days (even if it is a simple Excel sheet) and having the video records of the bugs in it could be huge!

Our original concept for building Testuff was to create something akin to TestDirector, but lighter, simpler and more useful for small companies. Using what we know about testing and QA we built a tool you could manage your testing process with - create tests, run them, record the results, and see reports about the quality of your product. That feature list seems to strike a chord with the larger companies that already have a team of testers in place and are looking for tools to improve their processes. Smaller companies and mISVs, on the other hand, which we’re eager to please, seem to have less interest in test management and are more excited by a better way to reproduce bugs.

Testuff is a young service and is a work in progress. I am very eager to hear more comments and thoughts on the subject, especially the negative ones as you learn the most from those. I promise to address each and every one.

Testuff - a test case management service

October 17th, 2007

<marketing>

I haven’t posted too much here lately and for a good reason. Arik and I have been hard at work to release the first public beta of our test management service called Testuff. Developing software is hard enough when you have plenty of resources but when you are a one- or two-man shop with limited funds it’s even harder. We’ve built Testuff to help small companies and mISVs like ourselves manage and run their software tests. We’ve based it on the SaaS model so you don’t have to install any servers, but we also made a rich desktop client for it so you could enjoy a better user experience. If you’re doing any sort of real development for actual, breathing clients, you should try it out.

</marketing>

It’s been a week since the public release and although we made some marketing efforts (like this post) we’re still not getting enough traffic to our site. Only a few people have actually downloaded and tried to use our application and I think there’s only one name on that list that I don’t know. I realize we should be doing more marketing and getting the word out to as many people as we can but I don’t seem to be able to get past my perfectionism. I’m looking at Testuff now and it is (aside from some bugs and quirks) a fine achievement. It is quite convenient, rather pretty and has some really cool features like recording the video of the application you’re testing so you could reproduce the bugs with ease. However, since I’ve been working on it for so long, I’ve gotten used to all the cool things by now and I am already cultivating a new vision in my mind: a cleaner interface, fewer features, a faster bug video recorder, an ability to email a test to your friends who could run it and report the recording of the bug directly, and so on. I’m struggling because I’ve promised my partner I’d write emails to some key figures in the micro ISV world (people like Bob Walsh, Eric Sink, Joel Spolsky, Ian Landsman and Andy Brice). But how can I describe the wonders of Testuff to them when I’m already thinking about the next version and the one after it?

Another thing I’m worried about is the fact that although every developer and QA I’ve talked to was very excited about Testuff, very few have visited the site and tried it out, not to mention started using it on their own team. Price shouldn’t be an obstacle as we’re giving it out for free right now and I don’t think there is a lack of need for a service like this. Something is amiss however and I still haven’t figured out what it is.

I’d love to hear any thoughts you may have on the subject and any advice you might have. You’ll probably need to install Testuff to do that (Ha! Gotcha!) so you’d better head on to the Testuff download page.

Building M2Crypto on Windows

September 25th, 2007

Here’s another installment in what seems to be turning into a series of compilation instructions for Windows of libraries that were born and raised on Linux.

Python has only the most basic support for secure SSL and HTTPS and if you know anything about how SSL works, you’ll know that support doesn’t provide enough security. I’ll leave the discussion of SSL, TLS, HTTPS and other related protocols and technologies to people who actually know something about it (any good links I should put here?), but the following quote from the Python documentation should put even the uninitiated on their toes:

class HTTPSConnection(host[, port, key_file, cert_file])

A subclass of HTTPConnection that uses SSL for communication with secure servers. Default port is 443. key_file is the name of a PEM formatted file that contains your private key. cert_file is a PEM formatted certificate chain file.

Warning: This does not do any certificate verification!

The red color is mine, but the warning is there (at least in Python 2.4.4 - I’ve been a bit slow to adopt 2.5, but I don’t think it has changed).

What that means is that although you might think you’re using a secure connection when you’re using HTTPSConnection, you really aren’t. At least not as secure as you thought. Although all the data transferred between you and the server will be encrypted, you won’t actually know you’re talking to the right server and will be vulnerable to a man-in-the-middle attack.
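The difference between "encrypted" and "encrypted and authenticated" can be illustrated with the standard ssl module of a current Python 3. That is an assumption on my part: the ssl module postdates the Python 2.4 discussed here, so this sketch does not apply to the httplib class quoted above. It only builds two contexts and inspects their settings, without opening any connection.

```python
import ssl

# Default client context: verifies the certificate chain and the host name.
verifying = ssl.create_default_context()

# Roughly what an unverified connection amounts to: encryption without
# knowing who is on the other end.
unverified = ssl.create_default_context()
unverified.check_hostname = False   # must be disabled before CERT_NONE
unverified.verify_mode = ssl.CERT_NONE

print(verifying.verify_mode == ssl.CERT_REQUIRED, verifying.check_hostname)
print(unverified.verify_mode == ssl.CERT_NONE, unverified.check_hostname)
```

A connection made with the second kind of context is the situation the warning describes: the traffic is encrypted, but nothing proves which server holds the other key.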

But fear not, because M2Crypto comes to the rescue. M2Crypto is a Python library based on the well known OpenSSL library which does all the right cryptographic magic in all the right ways. M2Crypto has a compatible HTTPSConnection class that should work as a drop-in replacement of the one in httplib and actually authenticate the server correctly.

Now that we’ve got all this unimportant stuff out of the way, let’s get our hands dirty with building the library on Windows.

Tools you’ll need

Here are the programs you’ll need installed before you dig in:

  1. Python 2.4 or later - might work with earlier versions, but I haven’t tested it with anything but Python 2.4.4.
  2. Microsoft Visual Studio 2003 - this is the version that Python 2.4/2.5 is built with and this is the version you need to build M2Crypto. I don’t think any other (including 2005) will work.
  3. ActivePerl 5.8.7 - that’s the version I used, but I guess any reasonable Perl will do.
  4. Command prompt - you don’t need to install it, but you’re going to be using it a lot so you’d best be familiar with it.

Building OpenSSL for Windows

The first thing we’ll need to do is build us a fresh OpenSSL DLL.

  1. Download the latest OpenSSL source package from http://www.openssl.org/source/.
  2. Unzip and untar the package somewhere and open a command prompt there.
  3. > perl Configure VC-WIN32 --prefix=c:/openssl
  4. > ms\do_masm
  5. > nmake -f ms\ntdll.mak
  6. > nmake -f ms\ntdll.mak install

If something doesn’t work, refer to the INSTALL.W32 file in the OpenSSL source package. I followed the instructions there to the letter and they worked.

Building M2Crypto for Windows

M2Crypto uses a tool called SWIG to help write the Python code that wraps the OpenSSL library that is written in C, so we’ll have to download and install it.

Let’s go.

  1. Download the latest SWIG Windows binaries from http://www.swig.org/download.html .
  2. Unzip and untar the SWIG package to some directory and add that directory to your PATH.
  3. Download the latest M2Crypto sources from http://chandlerproject.org/bin/view/Projects/MeTooCrypto.
  4. Unzip and untar the M2Crypto source somewhere and open a command prompt there.
  5. > python setup.py build_ext --openssl c:/openssl
  6. > python setup.py bdist_wininst

That last command will create a nice M2Crypto-0.18.win32-py2.4.exe file in the dist subdirectory which you can run to install M2Crypto in the Python site-packages directory.

To test your build, run python and do import M2Crypto. If you get an error that says ‘ImportError: DLL load failed with error code 182’, it’s because the M2Crypto library can’t find the OpenSSL DLLs. You’ll need to place the libeay32.dll and ssleay32.dll files somewhere Python can find them. The directory in which your script resides is a good bet.
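The import check above can also be scripted. The following is a minimal sketch for Python 3 (an assumption; the build described here targets Python 2.4, where importlib is not available), and check_import is a hypothetical helper name. It reports the ImportError message instead of a traceback, which makes a missing DLL easier to spot.

```python
import importlib

def check_import(module_name):
    """Try to import a module and return (ok, message). Illustrative helper."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as error:
        # A DLL that cannot be loaded also surfaces as an ImportError.
        return False, "import failed: %s" % error
    return True, "imported %s" % module.__name__

if __name__ == "__main__":
    ok, message = check_import("M2Crypto")
    print(message)
```

If the message mentions a DLL load failure, copy libeay32.dll and ssleay32.dll next to the script and run it again.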