<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>gooli.org &#187; Development</title>
	<atom:link href="http://www.gooli.org/blog/category/development/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.gooli.org/blog</link>
	<description>on software development and related issues</description>
	<lastBuildDate>Mon, 28 Feb 2011 07:05:59 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Unicode and permalinks</title>
		<link>http://www.gooli.org/blog/unicode-and-permalinks/</link>
		<comments>http://www.gooli.org/blog/unicode-and-permalinks/#comments</comments>
		<pubDate>Mon, 22 Sep 2008 08:11:38 +0000</pubDate>
		<dc:creator>gooli</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Testuff]]></category>

		<guid isPermaLink="false">http://www.gooli.org/blog/?p=155</guid>
		<description><![CDATA[While integrating automation scripts with Testuff, I&#8217;ve encountered an interesting Unicode-related issue I&#8217;d like to share.
The integration allows for an automated testing script to report the results of its run to the Testuff server. In order for the results to be grouped, displayed and summarized correctly, the automation script needs to tell the [...]]]></description>
			<content:encoded><![CDATA[<p>While integrating automation scripts with <a href="http://www.testuff.com/">Testuff</a>, I&#8217;ve encountered an interesting Unicode-related issue I&#8217;d like to share.</p>
<p>The integration allows an automated testing script to report the results of its run to the Testuff server. In order for the results to be grouped, displayed and summarized correctly, the automation script needs to tell the server which test it ran, and whether the test passed or failed. A long discussion emerged about the best way to uniquely identify tests.</p>
<p>After quite a bit of back and forth, we&#8217;ve settled on <a href="http://en.wikipedia.org/wiki/Permalink">permalinks</a>, those more-or-less-readable URLs that are in common use in blogs. The idea of a permalink is to take the title (of a blog post or a test) and replace any characters that aren&#8217;t numbers or letters with an underscore or a hyphen. Using this simple scheme, &#8220;Unicode and permalinks&#8221; becomes &#8220;unicode-and-permalinks&#8221;, which is quite suitable for use in a URL.</p>
<p>The implementation is a simple regular expression:</p>
<div class="ch_code_container" style="font-family: monospace;height:100%;"><span style="color: #ff7700;font-weight:bold;">def</span> to_permalink<span style="color: black;">&#40;</span><span style="color: #dc143c;">string</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #dc143c;">re</span>.<span style="color: black;">sub</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;[^a-zA-Z0-9]+&quot;</span>, <span style="color: #483d8b;">&quot;_&quot;</span>, <span style="color: #dc143c;">string</span><span style="color: black;">&#41;</span>.<span style="color: black;">lower</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></div>
<p>While this code works perfectly for the English language, it doesn&#8217;t work at all if <em>string</em> is a Unicode string containing something in Hebrew, Russian or Polish &#8211; languages that some of our customers use. And so, I set out to write code that would essentially behave like the regular expression above, but would work for letters and numbers in all the languages of the world.</p>
<p>Fortunately the Unicode standard includes a rarely used classification of characters into various categories. For each given character we can find out whether it is an uppercase letter, a lowercase letter, a number, a punctuation mark and so on. Surprisingly, Python includes a module called <a href="http://docs.python.org/lib/module-unicodedata.html">unicodedata</a> that contains all that information. The function <em>category</em> accepts a character and returns a string that tells us <a href="http://www.unicode.org/Public/4.1.0/ucd/UCD.html#General_Category_Values">what the character is</a>: &#8220;Lu&#8221; denotes an uppercase letter, &#8220;Nd&#8221; denotes a decimal digit, etc.</p>
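<p>To get a feel for those categories, here is a minimal sketch in Python 3. The sample characters are arbitrary picks, and <em>is_letter_or_number</em> is a helper name made up for this example.</p>

```python
import unicodedata

# Arbitrary sample characters: Latin, a digit, Hebrew alef, Cyrillic Zhe,
# a hyphen and a space.
samples = ["A", "z", "7", "\u05d0", "\u0416", "-", " "]

for ch in samples:
    # category() returns a two-letter code such as "Lu" or "Nd".
    print(repr(ch), unicodedata.category(ch))


def is_letter_or_number(ch):
    """True when the general category starts with L (letter) or N (number)."""
    return unicodedata.category(ch)[0] in ("L", "N")
```

<p>Only the first letter of the code matters for permalinks: anything starting with &#8220;L&#8221; or &#8220;N&#8221; is kept, whatever the script.</p>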
<p>All that remains to be done is go over the characters in the title, keep the letters and numbers, and replace all the other characters with a dash or an underscore. The regular expression at the end replaces any sequence of underscores with a single underscore to make the resulting URLs even nicer to look at.</p>
<div class="ch_code_container" style="font-family: monospace;height:100%;"><span style="color: #ff7700;font-weight:bold;">def</span> to_permalink<span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #483d8b;">&quot;&quot;</span><span style="color: #483d8b;">&quot;<br />
&nbsp; &nbsp; Converts sequences of characters that aren&#8217;t letters or numbers<br />
&nbsp; &nbsp; to a single underscore to achieve Wikipedia-like Unicode URLs.<br />
&nbsp; &nbsp; &quot;</span><span style="color: #483d8b;">&quot;&quot;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">re</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">unicodedata</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">def</span> conv<span style="color: black;">&#40;</span>c<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #dc143c;">unicodedata</span>.<span style="color: black;">category</span><span style="color: black;">&#40;</span>c<span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span> <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: black;">&#91;</span><span style="color: #483d8b;">&quot;L&quot;</span>, <span style="color: #483d8b;">&quot;N&quot;</span><span style="color: black;">&#93;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> c<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">else</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #483d8b;">&quot;_&quot;</span><br />
&nbsp; &nbsp; s2 = <span style="color: #483d8b;">&quot;&quot;</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>conv<span style="color: black;">&#40;</span>c<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> c <span style="color: #ff7700;font-weight:bold;">in</span> s<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #dc143c;">re</span>.<span style="color: black;">sub</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;_+&quot;</span>, <span style="color: #483d8b;">&quot;_&quot;</span>, s2<span style="color: black;">&#41;</span></div>
<p><em>[Update]</em> Or, as Almad correctly pointed out, you could just use the <em>re</em> module support for Unicode and be done with it in two lines, which kind of takes the air out of this post.</p>
<div class="ch_code_container" style="font-family: monospace;height:100%;"><span style="color: #ff7700;font-weight:bold;">def</span> to_permalink<span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">re</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #dc143c;">re</span>.<span style="color: #008000;">compile</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\W</span>+&quot;</span>, <span style="color: #dc143c;">re</span>.<span style="color: black;">UNICODE</span><span style="color: black;">&#41;</span>.<span style="color: black;">sub</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;_&quot;</span>, s<span style="color: black;">&#41;</span></div>
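<p>For comparison, here is a sketch of the ASCII-only and the Unicode-aware expressions side by side, written for Python 3, where patterns applied to <em>str</em> are Unicode-aware by default and the <em>re.UNICODE</em> flag is no longer needed. The function names and the sample titles are made up for this example.</p>

```python
import re


def ascii_permalink(title):
    # ASCII-only character class, as in the first version of to_permalink.
    return re.sub("[^a-zA-Z0-9]+", "_", title).lower()


def unicode_permalink(title):
    # \W matches anything that is not a letter, a digit or an underscore,
    # in any script, so existing underscores pass through untouched.
    return re.sub(r"\W+", "_", title)


english = "Unicode and permalinks"
russian = "Привет, мир!"

print(ascii_permalink(english))    # unicode_and_permalinks
print(ascii_permalink(russian))    # _
print(unicode_permalink(russian))  # Привет_мир_
```

<p>The Russian title collapses to a single underscore under the ASCII-only expression, which is exactly the failure described above.</p>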
<p>There&#8217;s one other thing to consider when dealing with Unicode permalinks. If you&#8217;re a native speaker of a language other than English, you&#8217;ve probably seen URLs in your own language on Wikipedia.</p>
<p><a href="http://gooli.org/blog/wp-content/uploads/2008/09/hebrew-url.png"><img class="alignnone size-medium wp-image-156" title="hebrew-url" src="http://gooli.org/blog/wp-content/uploads/2008/09/hebrew-url-300x23.png" alt="" width="300" height="23" /></a></p>
<p><a href="http://gooli.org/blog/wp-content/uploads/2008/09/russian-url.png"><img class="alignnone size-medium wp-image-157" title="russian-url" src="http://gooli.org/blog/wp-content/uploads/2008/09/russian-url-300x23.png" alt="" width="300" height="23" /></a></p>
<p>From the looks of it, URLs can include characters in any language. Right?</p>
<p>Wrong.</p>
<p><a href="http://tools.ietf.org/html/rfc3986">RFC3986</a> defines the syntax for URLs (actually URIs, but that&#8217;s a moot point) explicitly and states which characters are allowed in a URL. This includes <a href="http://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_in_a_URI">little more than English letters and numbers</a> from the lower half of the <a href="http://www.asciitable.com/">ASCII chart</a>.</p>
<p>If you look at the headers your browser passes when you access such a URL, you&#8217;ll see that it encodes all the characters with percent encoding, so neither the browser nor the web server is violating the standard. This is what the server saw when I navigated to the main Hebrew page of Wikipedia:</p>
<pre>GET /wiki/%D7%A2%D7%9E%D7%95%D7%93_%D7%A8%D7%90%D7%A9%D7%99 HTTP/1.1
Host: he.wikipedia.org</pre>
<p>In order to understand what this percent encoding means, you need to know a <a href="http://www.joelonsoftware.com/articles/Unicode.html">bit about Unicode</a>. Basically, the Unicode URL is encoded in UTF-8 and each byte of the UTF-8-encoded string is then percent-encoded. The browser apparently recognizes this specific encoding scheme (which isn&#8217;t documented anywhere I could find) and displays nice internationalized URLs for the user.</p>
<p>If you want to support such URLs in your server, you&#8217;ll probably need to write some code to translate the percent-encoded URLs into their actual Unicode representation.</p>
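<p>As a rough illustration, the translation can be sketched with the standard library in Python 3. It assumes the path is percent-encoded UTF-8, as in the request above; <em>decode_path</em> and <em>encode_path</em> are names made up for this example.</p>

```python
from urllib.parse import quote, unquote

# The path from the request shown above.
raw_path = "/wiki/%D7%A2%D7%9E%D7%95%D7%93_%D7%A8%D7%90%D7%A9%D7%99"


def decode_path(path):
    # Each %XX escape becomes one byte; the bytes are then decoded as UTF-8.
    # errors="strict" raises UnicodeDecodeError on input that is not UTF-8.
    return unquote(path, encoding="utf-8", errors="strict")


def encode_path(path):
    # The reverse direction: UTF-8 encode, then percent-encode every byte
    # outside the unreserved set. "/" is kept as a path separator.
    return quote(path, safe="/")


print(decode_path(raw_path))
```

<p>Encoding the decoded path again gives back the original request path, so the two functions round-trip.</p>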
]]></content:encoded>
			<wfw:commentRss>http://www.gooli.org/blog/unicode-and-permalinks/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Accessing SVN revision via a browser</title>
		<link>http://www.gooli.org/blog/accessing-svn-revision-via-a-browser/</link>
		<comments>http://www.gooli.org/blog/accessing-svn-revision-via-a-browser/#comments</comments>
		<pubDate>Sun, 10 Aug 2008 08:17:56 +0000</pubDate>
		<dc:creator>gooli</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.gooli.org/blog/accessing-svn-revision-via-a-browser/</guid>
		<description><![CDATA[Most people who use Subversion know that you can access the repository with your browser to get a readonly interface that you can use to take a cursory look at the files in there.
This is what the Python repository looks like via http://svn.python.org/projects/python:

It says Revision 65620 at the top and I&#8217;ve always wondered if you [...]]]></description>
			<content:encoded><![CDATA[<p>Most people who use Subversion know that you can access the repository with your browser to get a readonly interface that you can use to take a cursory look at the files in there.</p>
<p>This is what the Python repository looks like at <a href="http://svn.python.org/projects/python">http://svn.python.org/projects/python</a>:</p>
<p><img style="max-width: 800px;" src="http://gooli.org/blog/wp-content/uploads/2008/08/python-svn.png" /></p>
<p>It says <span style="font-style: italic;">Revision 65620</span> at the top and I&#8217;ve always wondered if you could access another revision in the same simple way. Turns out <a href="http://yuji.wordpress.com/2008/07/08/svn-access-specific-revision-from-web-browser/">there is a way</a>.</p>
<p>All you need to do is insert <span style="font-weight: bold;">!svn/bc/REVISION</span> into the URL, right after the repository root:</p>
<p><a href="http://svn.python.org/projects/%21svn/bc/5000/python">http://svn.python.org/projects/<span style="font-weight: bold;">!svn/bc/5000</span>/python</a> shows revision 5000 of the Python Subversion repository.</p>
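<p>If you need such links often, building them is easy to script. This is only a sketch in Python 3: <em>revision_url</em> is a made-up helper, and it assumes you already know where the repository root ends and the in-repository path begins.</p>

```python
def revision_url(repo_root, revision, path=""):
    """Build a URL that browses `path` as it was at `revision`."""
    root = repo_root.rstrip("/")
    return "{}/!svn/bc/{:d}/{}".format(root, revision, path.lstrip("/"))


print(revision_url("http://svn.python.org/projects", 5000, "python"))
```

<p>For the Python repository this prints the same URL as the link above.</p>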
]]></content:encoded>
			<wfw:commentRss>http://www.gooli.org/blog/accessing-svn-revision-via-a-browser/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Digging into Python&#8217;s PYC files</title>
		<link>http://www.gooli.org/blog/digging-into-pythons-pyc-files/</link>
		<comments>http://www.gooli.org/blog/digging-into-pythons-pyc-files/#comments</comments>
		<pubDate>Fri, 25 Jan 2008 12:20:23 +0000</pubDate>
		<dc:creator>gooli</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Testuff]]></category>

		<guid isPermaLink="false">http://www.gooli.org/blog/digging-into-pythons-pyc-files/</guid>
		<description><![CDATA[One of the first things we needed to do when we started working on Testuff was to figure out how we were going to update the installed desktop clients. This is one of those problems that seems to usually fall under the NIH syndrome, and like many others before me, I invented my own scheme. [...]]]></description>
			<content:encoded><![CDATA[<p>One of the first things we needed to do when we started working on <a href="http://www.testuff.com">Testuff</a> was to figure out how we were going to update the installed desktop clients. This is one of those problems that seems to usually fall under the <a href="http://en.wikipedia.org/wiki/Not_Invented_Here">NIH</a> syndrome, and like many others before me, I invented my own scheme. The gist of it is a <a href="http://download.testuff.com/release/version.xml">version.xml</a> file that sits alongside the setup file for the newest release and looks something like this:</p>
<pre>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;update-info version="0.8.0[1212]"&gt;
    &lt;update file="TestuffSetup.exe" from-version="all"/&gt;
    &lt;update file="TestuffUpdate.exe" from-version="0.7.1[1110]"/&gt;
    &lt;md5hashes&gt;
        &lt;file md5="3a23dd6eff6fd6c1d0fbfcbfb0d57221" path="async.pyc"/&gt;
        &lt;file md5="0d1ea490a18c65cec7ba8715b5ea9e69" path="atexit.pyc"/&gt;
        &lt;file md5="166723a4330a98b573119326fc689322" path="base64.pyc"/&gt;
        &lt;file md5="01c1bda049936de570ed922424c057a8" path="BeautifulSoup.pyc"/&gt;
    &lt;/md5hashes&gt;
&lt;/update-info&gt;</pre>
<p>When the Testuff client launches, it gets the version.xml file from the server and compares its version to the version attribute of the <em>update-info</em> tag. If the client&#8217;s version differs, it checks the <em>update</em> tags to see which update it should download and install. We generate two separate setup files &#8211; one (<em>TestuffUpdate.exe</em>) to update the most recent version to the new one, and another (<em>TestuffSetup.exe</em>) to update all the other (older) versions.</p>
<p>Aside from the info about which version of the client should use which update file, version.xml also contains the MD5 hashes for each file in the distribution. That might seem like a lot of wasted space and time, but it&#8217;s actually there for a very good reason. When our setup building script is creating <em>TestuffUpdate.exe</em>, it too downloads version.xml from our server. It then tries to determine which files have changed or have been added since the last version by comparing the MD5 hashes in version.xml to the hashes of the actual files that have been generated by the build. Any file that is different is added to the update so we can be sure we haven&#8217;t missed any essential component in the update.</p>
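<p>The comparison step can be sketched roughly like this in Python 3. It is not our actual build script: <em>md5_of_file</em>, <em>changed_files</em> and the <em>published_hashes</em> dictionary (path to MD5, assumed to have been parsed from version.xml already) are names made up for this example.</p>

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Hex MD5 of a file, read in chunks so large files stay cheap."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(published_hashes, build_dir):
    """Relative paths under build_dir that are new or differ from the published hashes."""
    changed = []
    for root, _dirs, files in os.walk(build_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, build_dir).replace(os.sep, "/")
            # A missing entry (new file) and a different hash both count.
            if published_hashes.get(rel) != md5_of_file(full):
                changed.append(rel)
    return sorted(changed)
```

<p>Everything returned by <em>changed_files</em> would go into the incremental update. Note that a file whose bytes differ in any way counts as changed, which is what makes the next discovery matter.</p>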
<p>Recently I discovered that our update files are much larger than they should be. We release a new version with just a couple of fixes in a single module, and the size of the update is half the size of the full install. As it turned out, most of the PYC files were marked as changed and added to the update. That didn&#8217;t seem right, especially for things like threading.pyc, which is a Python module that shouldn&#8217;t change unless you upgrade to a different version of Python, which we didn&#8217;t (still stuck at 2.4.4 I&#8217;m afraid). That got me curious enough to go digging in the, apparently undocumented, binary structure of the PYC files. The documentation of the <em>marshal</em> module, which produces most of a PYC file, has this to say:</p>
<blockquote><p>This module contains functions that can read and write Python values in a binary  format. The format is specific to Python, but independent of machine  architecture issues (e.g., you can write a Python value to a file on a PC,  transport the file to a Sun, and read it back there). <strong>Details of the format are  undocumented on purpose</strong>; it may change between Python versions (although it  rarely does).</p></blockquote>
<p>The first thing I did was compare the two threading.pyc files &#8211; the one from the current distribution and the one just generated by the build script. The result showed there was a difference in only two bytes:</p>
<pre>D:\Gooli\Dev\Temp\compare&gt;fc /b threading-old.pyc threading-new.pyc
Comparing files threading-old.pyc and threading-new.pyc
00000004: 6E CE
00000005: F7 6A</pre>
<p>Only two bytes differ, and they are right at the beginning of the file? That looks suspiciously like a version or a timestamp in the file header. Since the PYC file structure is undocumented, I went looking for the details in Python&#8217;s source code, but the answer was actually closer to home &#8211; in the <a href="http://docs.python.org/lib/compiler.html">compiler</a> package. A file called pycodegen.py in Python\Lib\compiler contains the following code:</p>
<div class="ch_code_container" style="font-family: monospace;height:100%;"><span style="color: #ff7700;font-weight:bold;">def</span> getPycHeader<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; mtime = <span style="color: #dc143c;">os</span>.<span style="color: black;">path</span>.<span style="color: black;">getmtime</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">filename</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; mtime = <span style="color: #dc143c;">struct</span>.<span style="color: black;">pack</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&#8216;&amp;lt;i&#8217;</span>, mtime<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: black;">MAGIC</span> + mtime</div>
<p>So, the PYC file header contains a magic number that identifies the Python release, followed by the modification time of the original source file as the number of seconds since the epoch. That shouldn&#8217;t be a problem &#8211; the threading module hasn&#8217;t changed and should have the same timestamp. But as we&#8217;ve seen, the PYC files were different. How can that be?</p>
<p>Acting on a hunch, I wrote a short script to read the header from the PYC file and print the embedded date:</p>
<div class="ch_code_container" style="font-family: monospace;height:100%;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">os</span>, <span style="color: #dc143c;">os</span>.<span style="color: black;">path</span><br />
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">struct</span><br />
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">time</span><br />
<br />
<span style="color: #ff7700;font-weight:bold;">def</span> print_internal_date<span style="color: black;">&#40;</span>filename<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; f = <span style="color: #008000;">open</span><span style="color: black;">&#40;</span>filename, <span style="color: #483d8b;">&quot;rb&quot;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; data = f.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">8</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; mtime = <span style="color: #dc143c;">struct</span>.<span style="color: black;">unpack</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;&lt;i&quot;</span>, data<span style="color: black;">&#91;</span><span style="color: #ff4500;">4</span>:<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #dc143c;">time</span>.<span style="color: black;">asctime</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">time</span>.<span style="color: black;">gmtime</span><span style="color: black;">&#40;</span>mtime<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><br />
<br />
print_internal_date<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;threading-old.pyc&quot;</span><span style="color: black;">&#41;</span><br />
print_internal_date<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;threading-new.pyc&quot;</span><span style="color: black;">&#41;</span></div>
<p>Which printed the following results:</p>
<pre>Mon Mar 13 22:51:26 2006
Mon Mar 13 12:51:26 2006</pre>
<p>Notice anything odd about them? They are <em>exactly</em> 10 hours apart. At first I thought I might actually be looking at two different versions of threading.py, but the chance of two edits being exactly 10 hours apart, right down to the second, is practically nil. It had to be something with time zones. I live and work in Israel, which is at GMT+2:00. The default timezone for Windows is Pacific time, which is GMT-8:00. Exactly 10 hours apart. However, no matter how I tweak the Regional Settings on my computer, all the PYC files I generate here have the same timestamp. Perhaps it has to do with the timezone you have set when you install Python. If I ever find out, I&#8217;ll let you know.</p>
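<p>The two differing bytes in the <em>fc</em> output are already enough to confirm the gap. They sit at offsets 4 and 5, so they are the low 16 bits of the little-endian timestamp, and since the upper two bytes are identical the difference can be computed from them alone. A quick check in Python 3, with variable names made up for this example:</p>

```python
# Bytes at offsets 4 and 5 of each file, taken from the fc output.
old_low = int.from_bytes(bytes([0x6E, 0xF7]), "little")
new_low = int.from_bytes(bytes([0xCE, 0x6A]), "little")

diff_seconds = old_low - new_low
print(diff_seconds, diff_seconds / 3600)  # 36000 10.0
```

<p>That is 36000 seconds, or ten hours to the second.</p>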
<p>But that wasn&#8217;t the point of this post. The point was to figure out what PYC files look like inside and we did that, at least in part &#8211; they start with a magic number that is different for each Python version (check out the comments in <a href="http://svn.python.org/view/python/tags/r244/Python/import.c?rev=52384&amp;view=markup">import.c</a>), followed by an embedded timestamp of the source code they were generated from. The rest is generated by the marshal module and can be read by it to get the code objects and the global data in the module.</p>
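<p>For readers on a current interpreter, here is a minimal sketch of the same dissection on Python 3.7 or later. It assumes a timestamp-based PYC, whose header has grown to 16 bytes: the magic number, a flags field, the source timestamp and the source size. On the Python 2.4 discussed in this post the header is only the first 8 bytes. <em>read_pyc</em> and <em>matches_this_interpreter</em> are names made up for this example.</p>

```python
import importlib.util
import marshal


def read_pyc(path):
    """Split a timestamp-based .pyc (Python 3.7+) into header fields and code object."""
    with open(path, "rb") as f:
        magic = f.read(4)                            # identifies the interpreter version
        flags = int.from_bytes(f.read(4), "little")  # 0 for timestamp-based files
        mtime = int.from_bytes(f.read(4), "little")  # source modification time
        size = int.from_bytes(f.read(4), "little")   # source size in bytes
        code = marshal.load(f)                       # the module's code object
    return magic, flags, mtime, size, code


def matches_this_interpreter(magic):
    # MAGIC_NUMBER is the value the running interpreter writes into its own PYC files.
    return magic == importlib.util.MAGIC_NUMBER
```

<p>The code object that comes back can be passed to <em>exec</em> or inspected with the <em>dis</em> module. Keep in mind that marshal data is only meant to be read by the Python version that wrote it.</p>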
<p>Another thing to be learned from this is that we really should always build the Testuff client on the same machine, which is why I&#8217;m heading to the office right now to burn a copy of the VMWare image I created with everything needed to build Testuff. We&#8217;ve got a new version with a couple of important fixes to our Mantis support to release today.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.gooli.org/blog/digging-into-pythons-pyc-files/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Win32: GetFocus across process boundaries</title>
		<link>http://www.gooli.org/blog/win32-getfocus-across-process-boundries/</link>
		<comments>http://www.gooli.org/blog/win32-getfocus-across-process-boundries/#comments</comments>
		<pubDate>Sat, 29 Dec 2007 18:10:55 +0000</pubDate>
		<dc:creator>gooli</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Tools]]></category>

		<guid isPermaLink="false">http://www.gooli.org/blog/win32-finding-the-focused-windows/</guid>
		<description><![CDATA[I&#8217;ve been using Miranda for all my chats for a few years now, since the time it was a nightmare to install and configure. The chief reason I like it is the fact that it supports Hebrew amazingly well, via the TabSRMM plugin. You can even set the right-to-left settings for the chat window on [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been using <a href="http://www.miranda-im.org/">Miranda</a> for all my chats for a few years now, since the time it was a nightmare to install and configure. The chief reason I like it is the fact that it supports Hebrew amazingly well, via the <a href="http://addons.miranda-im.org/details.php?action=viewfile&amp;id=1401">TabSRMM</a> plugin. You can even set the right-to-left settings for the chat window on a person-by-person basis.</p>
<p>However, the latest version of Miranda broke my language switching gizmo, <a href="http://www.gooli.org/blog/recaps/">Recaps</a>. Every once in a while the CapsLock key would stop switching the language in the Miranda chat window, and even restarting Recaps didn&#8217;t help. The basic thing Recaps does upon detecting that CapsLock has been pressed is send a <a href="http://msdn2.microsoft.com/en-us/library/ms632630.aspx">WM_INPUTLANGCHANGEREQUEST</a> message to the current foreground window, which is obtained via the <a href="http://msdn2.microsoft.com/en-us/library/ms633505.aspx">GetForegroundWindow</a> Win32 API function. Apparently I overlooked the fact that MSDN specifically says the aforementioned message is sent to the window that <em>has the keyboard focus</em>, which I clearly wasn&#8217;t doing. GetForegroundWindow just gets you the current top-level window, not the actual control that has the keyboard focus.</p>
<p>All I had to do was replace the GetForegroundWindow call with something that gets the window that has the focus and be done with it. And <a href="http://msdn2.microsoft.com/en-us/library/ms646294%28VS.85%29.aspx">GetFocus</a> seemed to be just what I needed. Only there was a catch:</p>
<blockquote><p>The <strong>GetFocus</strong> function retrieves the handle to the window that has the keyboard focus, <em>if the window is attached to the calling thread&#8217;s message queue</em>.</p></blockquote>
<p>That isn&#8217;t good enough of course as the whole point was finding out which window had the focus when that window belonged to a different application, running in a different process, in a different thread. There are <a href="http://www.codeguru.com/Cpp/W-P/system/processesmodules/article.php/c5767/">a few techniques</a> to inject code into a remote process, but they all require creating a separate DLL and tricking the target process into loading it either via <a href="http://msdn2.microsoft.com/en-us/library/ms644990.aspx">hooks</a> or using the <a href="http://www.codeguru.com/Cpp/W-P/system/processesmodules/article.php/c5767/">CreateRemoteThread/LoadLibrary</a> trick. I was getting ready to dive into that dark abyss when I stumbled upon the <a href="http://msdn2.microsoft.com/en-us/library/ms681956%28VS.85%29.aspx">AttachThreadInput</a> function:</p>
<blockquote><p>Windows created in different threads typically process input independently of each other. That is, they have their own input states (<em>focus</em>, active, capture windows, key state, queue status, and so on), and they are not synchronized with the input processing of other threads. By using the <strong>AttachThreadInput </strong>function, a thread can attach its input processing to another thread. This also allows threads to share their input states, so they can call the SetFocus function to set the keyboard focus to a window of a different thread. This also allows threads to get key-state information. These capabilities are not generally possible.</p></blockquote>
<p>So, all I had to do was call AttachThreadInput to connect my main UI thread to the thread responsible for the current foreground window, call GetFocus to find out which window has the focus, and then call AttachThreadInput again to give the control of the input back to the original thread.</p>
<p>Here&#8217;s the code to do that:</p>
<div class="ch_code_container" style="font-family: monospace;height:100%;">HWND RemoteGetFocus<span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><br />
<span style="color: #000000;">&#123;</span><br />
&nbsp; &nbsp; HWND hwnd = GetForegroundWindow<span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span>;<br />
&nbsp; &nbsp; DWORD remoteThreadId = GetWindowThreadProcessId<span style="color: #000000;">&#40;</span>hwnd, <span style="color: #0000ff;">NULL</span><span style="color: #000000;">&#41;</span>;<br />
&nbsp; &nbsp; DWORD currentThreadId = GetCurrentThreadId<span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span>;<br />
&nbsp; &nbsp; AttachThreadInput<span style="color: #000000;">&#40;</span>remoteThreadId, currentThreadId, <span style="color: #0000ff;">TRUE</span><span style="color: #000000;">&#41;</span>;<br />
&nbsp; &nbsp; HWND focused = GetFocus<span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span>;<br />
&nbsp; &nbsp; AttachThreadInput<span style="color: #000000;">&#40;</span>remoteThreadId, currentThreadId, <span style="color: #0000ff;">FALSE</span><span style="color: #000000;">&#41;</span>;<br />
&nbsp; &nbsp; <span style="color: #0000ff;">return</span> focused;<br />
<span style="color: #000000;">&#125;</span></div>
<p>This function will only work if called from a thread that has created a message queue, which is the case for a standard Win32 main thread that runs in a message loop.</p>
<p>Miranda and Recaps now play well together.</p>
<p>Yay!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.gooli.org/blog/win32-getfocus-across-process-boundries/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Touch typing in multiple languages &#8211; Recaps</title>
		<link>http://www.gooli.org/blog/touch-typing-in-multiple-languages-recaps/</link>
		<comments>http://www.gooli.org/blog/touch-typing-in-multiple-languages-recaps/#comments</comments>
		<pubDate>Mon, 10 Dec 2007 10:55:02 +0000</pubDate>
		<dc:creator>gooli</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Tools]]></category>

		<guid isPermaLink="false">http://www.gooli.org/blog/touch-typing-in-multiple-languages-recaps/</guid>
		<description><![CDATA[I learned touch typing a long time ago. Since I spend most of my waking hours in front of a computer typing either text or code, touch typing is something I can&#8217;t live without. Sometimes however, I am faced with a daunting task of writing an email or a document in a mix of two [...]]]></description>
			<content:encoded><![CDATA[<p>I learned touch typing a long time ago. Since I spend most of my waking hours in front of a computer typing either text or code, touch typing is something I can&#8217;t live without. Sometimes however, I am faced with a daunting task of writing an email or a document in a mix of two languages. Technical documents in Hebrew for instance, usually contain quite a lot of English terms. I can touch type in Hebrew as well as I can in English, but when the time comes to switch between languages, that weird Alt-Shift combination really kills my flow. I might be nitpicking a bit here, but I can&#8217;t tell you how many times I pressed Shift-Alt instead of Alt-Shift and wound up in the application&#8217;s menu instead of changing the current language.<a href="http://gooli.org/blog/wp-content/uploads/2007/12/image1.png"><img style="border: 0px none ; margin: 5px;" alt="image" src="http://gooli.org/blog/wp-content/uploads/2007/12/image-thumb1.png" align="right" border="0" height="198" width="299" /></a></p>
<p>Then there&#8217;s the CapsLock key. I don&#8217;t think anybody uses it nowadays, and even the touch typists seem to just HOLD THE SHIFT WITH THEIR PINKY and type what needs to be in capital letters. I wrote a small program called <a href="http://www.gooli.org/blog/recaps">Recaps</a> a while ago that converts CapsLock into a language switching key. Now I can&#8217;t live without it. I find myself instinctively hitting CapsLock to switch languages never thinking about it, even on computers I didn&#8217;t install it on. Needless to say it&#8217;s one of the first things I install on a computer I need to work on.</p>
<p>I talked to an old friend of mine last night who said he was using Recaps and spreading it around but he was missing a feature. When there were three or more languages installed on the computer, Recaps would just cycle through all of them, like Alt-Shift does. Most times however, you only use two languages at any given time, typically English and your native tongue, and only need to switch between these two.</p>
<p>Doing this in Win32 API was a bitch, but I finally got a tray icon and a small menu to work. The menu shows the list of languages currently installed on your computer with check boxes next to them. Hitting CapsLock now only cycles through the languages that are currently enabled and even saves the active languages between runs.</p>
<p>You can download source and binaries for the new 0.3 version from my <a href="http://www.gooli.org/blog/recaps">Recaps page</a>.</p>
<p>I&#8217;d love to know if anybody finds it as useful as I do.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.gooli.org/blog/touch-typing-in-multiple-languages-recaps/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>A simple lexer in Python</title>
		<link>http://www.gooli.org/blog/a-simple-lexer-in-python/</link>
		<comments>http://www.gooli.org/blog/a-simple-lexer-in-python/#comments</comments>
		<pubDate>Sat, 20 Oct 2007 22:59:54 +0000</pubDate>
		<dc:creator>gooli</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.gooli.org/blog/a-simple-lexer-in-python/</guid>
		<description><![CDATA[I&#8217;m taking a course on building compilers at the Israeli Open University and just learned how to use flex. It occurred to me that building a simple lexical analyzer should be quite easy with Python&#8217;s re module. A typical lexical analyzer reads a stream of text input and splits it into a list of tokens. [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m taking a course on building compilers at the <a href="http://www-e.openu.ac.il/">Israeli Open University</a> and just learned how to use <a href="http://dinosaur.compilertools.net/">flex</a>. It occurred to me that building a simple <a href="http://en.wikipedia.org/wiki/Lexical_analysis">lexical analyzer</a> should be quite easy with Python&#8217;s <a href="http://docs.python.org/lib/module-re.html">re module</a>. A typical lexical analyzer reads a stream of text input and splits it into a list of tokens. The simplest example of such a thing is the <i>split</i> function which takes a sentence and returns the list of words in it.</p>
<div class="ch_code_container" style="font-family: monospace;height:100%;">s = <span style="color: #483d8b;">&quot;A simple lexer in Python&quot;</span><br />
s.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
<span style="color: black;">&#91;</span><span style="color: #483d8b;">'A'</span>, <span style="color: #483d8b;">'simple'</span>, <span style="color: #483d8b;">'lexer'</span>, <span style="color: #483d8b;">'in'</span>, <span style="color: #483d8b;">'Python'</span><span style="color: black;">&#93;</span></div>
<p>The problem becomes more complex when you need to separate the tokens you find into different kinds, words and numbers, for instance. We&#8217;ll use <a href="http://99-bottles-of-beer.net/">a well known lyric</a> as our sample text:</p>
<div class="ch_code_container" style="font-family: monospace;height:100%;">s = <span style="color: #483d8b;">&quot;&quot;</span><span style="color: #483d8b;">&quot;99 bottles of beer on the wall, 99 bottles of beer.<br />
Take one down and pass it around, 98 bottles of beer on the wall.&quot;</span><span style="color: #483d8b;">&quot;&quot;</span></div>
<p>The first thing we need to do is build a regular expression that recognizes words and another one that recognizes numbers. Although there are shorter ways to build those regular expressions, I like the less obscure form:</p>
<div class="ch_code_container" style="font-family: monospace;height:100%;">wordsRegex = <span style="color: #483d8b;">&quot;[A-Za-z]+&quot;</span><br />
numbersRegex = <span style="color: #483d8b;">&quot;[0-9]+&quot;</span></div>
<p>We could now use findall on the string and get all the numbers and words out of it.</p>
<div class="ch_code_container" style="font-family: monospace;height:100%;"><span style="color: #dc143c;">re</span>.<span style="color: black;">findall</span><span style="color: black;">&#40;</span>wordsRegex, s<span style="color: black;">&#41;</span><br />
<span style="color: black;">&#91;</span><span style="color: #483d8b;">'bottles'</span>, <span style="color: #483d8b;">'of'</span>, <span style="color: #483d8b;">'beer'</span>, <span style="color: #483d8b;">'on'</span>, <span style="color: #483d8b;">'the'</span>, <span style="color: #483d8b;">'wall'</span>, <span style="color: #483d8b;">'bottles'</span>, <span style="color: #483d8b;">'of'</span>, <span style="color: #483d8b;">'beer'</span>, <span style="color: #483d8b;">'Take'</span>, <span style="color: #483d8b;">'one'</span>, <span style="color: #483d8b;">'down'</span>, <span style="color: #483d8b;">'and'</span>, <span style="color: #483d8b;">'pass'</span>, <span style="color: #483d8b;">'it'</span>, <span style="color: #483d8b;">'around'</span>, <span style="color: #483d8b;">'bottles'</span>, <span style="color: #483d8b;">'of'</span>, <span style="color: #483d8b;">'beer'</span>, <span style="color: #483d8b;">'on'</span>, <span style="color: #483d8b;">'the'</span>, <span style="color: #483d8b;">'wall'</span><span style="color: black;">&#93;</span><br />
<br />
<span style="color: #dc143c;">re</span>.<span style="color: black;">findall</span><span style="color: black;">&#40;</span>numbersRegex, s<span style="color: black;">&#41;</span><br />
<span style="color: black;">&#91;</span><span style="color: #483d8b;">'99'</span>, <span style="color: #483d8b;">'99'</span>, <span style="color: #483d8b;">'98'</span><span style="color: black;">&#93;</span></div>
<p>But wait, you say, that isn&#8217;t what we wanted at all! We need to get the tokens in the order of their appearance in text and still get the type of each token. Something along the lines of </p>
<div class="ch_code_container" style="font-family: monospace;height:100%;"><span style="color: #ff7700;font-weight:bold;">for</span> tokenType, tokenText <span style="color: #ff7700;font-weight:bold;">in</span> lexer<span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">print</span> tokenType, tokenText</div>
<p>would be really nice. </p>
<p>In order to do that, we&#8217;ll need to combine both regular expressions into one and iterate on the result of findall examining each token to decide on its type.</p>
<div class="ch_code_container" style="font-family: monospace;height:100%;">regex = <span style="color: #483d8b;">&quot;(%s)|(%s)&quot;</span> % <span style="color: black;">&#40;</span>wordsRegex, numbersRegex<span style="color: black;">&#41;</span><br />
<span style="color: #483d8b;">'([A-Za-z]+)|([0-9]+)'</span><br />
<span style="color: #dc143c;">re</span>.<span style="color: black;">findall</span><span style="color: black;">&#40;</span>regex, s<span style="color: black;">&#41;</span><br />
<span style="color: black;">&#91;</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">''</span>, <span style="color: #483d8b;">'99'</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #483d8b;">'bottles'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #483d8b;">'of'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #483d8b;">'beer'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>, <br />
<span style="color: black;">&#40;</span><span style="color: #483d8b;">'on'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #483d8b;">'the'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #483d8b;">'wall'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #483d8b;">''</span>, <span style="color: #483d8b;">'99'</span><span style="color: black;">&#41;</span>, <br />
<span style="color: black;">&#40;</span><span style="color: #483d8b;">'bottles'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #483d8b;">'of'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #483d8b;">'beer'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #483d8b;">'Take'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>,<br />
&nbsp;<span style="color: black;">&#40;</span><span style="color: #483d8b;">'one'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #483d8b;">'down'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #483d8b;">'and'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #483d8b;">'pass'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>, <br />
<span style="color: black;">&#40;</span><span style="color: #483d8b;">'it'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #483d8b;">'around'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #483d8b;">''</span>, <span style="color: #483d8b;">'98'</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #483d8b;">'bottles'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>, <br />
<span style="color: black;">&#40;</span><span style="color: #483d8b;">'of'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #483d8b;">'beer'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #483d8b;">'on'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #483d8b;">'the'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span><span style="color: #483d8b;">'wall'</span>, <span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span></div>
<p>As you can see, the result of the call to findall is a list of tuples, each containing a single match. If you look closely at the way I&#8217;ve combined the two regular expressions, you&#8217;ll see that each part is surrounded with parentheses and that there&#8217;s a pipe (|) between the expressions. The compound regular expression matches either a word or a number, and each tuple in the return value of findall contains the matches for each parenthesized part of the regexp. However, since we combined the parts using a pipe (|), only one of the parts matches each time, and the other is left as an empty string.</p>
<p>Using that knowledge we can now construct a simple loop that shows the token type for each of the words in the lyric:</p>
<div class="ch_code_container" style="font-family: monospace;height:100%;"><span style="color: #ff7700;font-weight:bold;">for</span> t <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #dc143c;">re</span>.<span style="color: black;">findall</span><span style="color: black;">&#40;</span>regex, s<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">if</span> t<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;word&quot;</span>, t<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">elif</span> t<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;number&quot;</span>, t<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span></div>
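<p>As a stepping stone, the loop above can be wrapped in a generator so that it reads like the lexer(s) call we wished for earlier. This is only a minimal sketch: it is written for Python 3 (print is a function there, unlike in the other snippets in this post), and the constant names WORD, NUMBER and REGEX are made up for the illustration.</p>

```python
import re

# Hypothetical names for this sketch; the patterns are the ones used above.
WORD = "[A-Za-z]+"
NUMBER = "[0-9]+"
REGEX = re.compile("(%s)|(%s)" % (WORD, NUMBER))


def lexer(text):
    """Yield (tokenType, tokenText) pairs in order of appearance."""
    for word, number in REGEX.findall(text):
        # Exactly one of the two groups is non-empty for each match.
        if word:
            yield "word", word
        else:
            yield "number", number


for tokenType, tokenText in lexer("99 bottles of beer"):
    print(tokenType, tokenText)
```

<p>This only handles two hard-coded token types, which is the limitation the Lexer class below is meant to remove.</p>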
<p>We now have most of the knowledge we need to build ourselves a lexer that will take a list of regular expressions and some text and return (or even better, generate) a list of tokens and their types. We&#8217;ll need to combine the regular expressions for each token into one big regex using pipes, scan the string, and gather the tokens and their types.</p>
<p>Our usage code looks like this:</p>
<div class="ch_code_container" style="font-family: monospace;height:100%;">definitions = <span style="color: black;">&#91;</span><br />
&nbsp; &nbsp; <span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;word&quot;</span>, <span style="color: #483d8b;">&quot;[A-Za-z]+&quot;</span><span style="color: black;">&#41;</span>,<br />
&nbsp; &nbsp; <span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;number&quot;</span>, <span style="color: #483d8b;">&quot;[0-9]+&quot;</span><span style="color: black;">&#41;</span>,<br />
<span style="color: black;">&#93;</span><br />
<br />
lex = Lexer<span style="color: black;">&#40;</span>definitions<span style="color: black;">&#41;</span><br />
<span style="color: #ff7700;font-weight:bold;">for</span> tokenType, tokenValue <span style="color: #ff7700;font-weight:bold;">in</span> lex.<span style="color: black;">parse</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">print</span> tokenType, tokenValue</div>
<p>And here is the code for the lexer itself:</p>
<div class="ch_code_container" style="font-family: monospace;height:100%;"><span style="color: #ff7700;font-weight:bold;">class</span> Lexer<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, definitions<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008000;">self</span>.<span style="color: black;">definitions</span> = definitions<br />
&nbsp; &nbsp; &nbsp; &nbsp; parts = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">for</span> name, part <span style="color: #ff7700;font-weight:bold;">in</span> definitions:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; parts.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;(?P&lt;%s&gt;%s)&quot;</span> % <span style="color: black;">&#40;</span>name, part<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008000;">self</span>.<span style="color: black;">regexpString</span> = <span style="color: #483d8b;">&quot;|&quot;</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span>parts<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008000;">self</span>.<span style="color: black;">regexp</span> = <span style="color: #dc143c;">re</span>.<span style="color: #008000;">compile</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">regexpString</span>, <span style="color: #dc143c;">re</span>.<span style="color: black;">MULTILINE</span><span style="color: black;">&#41;</span><br />
<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">def</span> parse<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, text<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;"># yield lexemes</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">for</span> match <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">self</span>.<span style="color: black;">regexp</span>.<span style="color: black;">finditer</span><span style="color: black;">&#40;</span>text<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; found = <span style="color: #008000;">False</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">for</span> name, rexp <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">self</span>.<span style="color: black;">definitions</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; m = match.<span style="color: black;">group</span><span style="color: black;">&#40;</span>name<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">if</span> m <span style="color: #ff7700;font-weight:bold;">is</span> <span style="color: #ff7700;font-weight:bold;">not</span> <span style="color: #008000;">None</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">yield</span> <span style="color: black;">&#40;</span>name, m<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">break</span></div>
<p>Some notes on the implementation are in order. I&#8217;ve used the little-known (?P&lt;name&gt;&#8230;) syntax for naming the parenthesized groups of regular expressions. Using that syntax the expression (?P&lt;word&gt;[A-Za-z]+) matches a word and that match is accessible with match.group('word') where match is a re.Match object.</p>
<p>In order to speed things up a bit, I&#8217;ve compiled the regular expression when the Lexer object is created, used the finditer function instead of findall, and made parse a generator instead of a list-returning function.</p>
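<p>One consequence of scanning with finditer is worth spelling out: any text that matches none of the definitions (the spaces, commas and periods in our lyric) is silently skipped. If you would rather see those gaps, something along the following lines might do. It is a rough Python 3 sketch, not part of the lexer above; the function name tokens_with_gaps and the "unknown" token type are invented for the example, and it assumes the individual patterns contain no capturing groups of their own, since it relies on match.lastindex to tell which definition matched.</p>

```python
import re


def tokens_with_gaps(definitions, text):
    """Yield (name, value) pairs; unmatched text is yielded as "unknown"."""
    names = [name for name, _ in definitions]
    # One plain capturing group per definition, joined with pipes.
    regexp = re.compile("|".join("(%s)" % part for _, part in definitions))
    pos = 0
    for match in regexp.finditer(text):
        if match.start() != pos:
            # Text between the previous match and this one matched nothing.
            yield "unknown", text[pos:match.start()]
        # lastindex is the 1-based number of the group that matched.
        yield names[match.lastindex - 1], match.group()
        pos = match.end()
    if pos != len(text):
        # Trailing text after the last match.
        yield "unknown", text[pos:]


definitions = [
    ("word", "[A-Za-z]+"),
    ("number", "[0-9]+"),
]
for tokenType, tokenValue in tokens_with_gaps(definitions, "99 bottles, ok."):
    print(tokenType, repr(tokenValue))
```

<p>A real lexer would usually add a whitespace definition and treat anything still left over as an error rather than passing it along.</p>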
<p>Using this simple lexer implementation it was quite simple to create a Python-to-HTML converter with syntax highlighting that works well enough to highlight the code of the highlighter itself!</p>
<p>The code for the lexer and the syntax highlighter example is available <a href="/snippets/pylexer.py">here</a> and on my <a href="snippets">snippets page</a>. You can also see the result of running the syntax highlighter on itself <a href="/snippets/pylexer.html">here</a>.</p>
<p>Enjoy lexing and let me know if you found this useful.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.gooli.org/blog/a-simple-lexer-in-python/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>Testuff &#8211; a test case management service</title>
		<link>http://www.gooli.org/blog/testuff-a-test-case-management-service/</link>
		<comments>http://www.gooli.org/blog/testuff-a-test-case-management-service/#comments</comments>
		<pubDate>Wed, 17 Oct 2007 13:41:24 +0000</pubDate>
		<dc:creator>gooli</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Marketing]]></category>
		<category><![CDATA[Testuff]]></category>
		<category><![CDATA[Tools]]></category>

		<guid isPermaLink="false">http://www.gooli.org/blog/testuff-a-test-case-management-service/</guid>
		<description><![CDATA[&#60;marketing&#62;
I haven&#8217;t posted too much here lately and for a good reason. Arik and I have been hard at work to release the first public beta of our test management service called Testuff. Developing software is hard enough when you have plenty of resources but when you are a one- or two-man shop with limited [...]]]></description>
			<content:encoded><![CDATA[<p>&lt;marketing&gt;</p>
<p><a href="http://gooli.org/blog/wp-content/uploads/2007/10/icon.jpg"><img src="http://gooli.org/blog/wp-content/uploads/2007/10/icon-thumb.jpg" style="border: 0px none " alt="icon" align="right" border="0" height="160" width="160" /></a>I haven&#8217;t posted too much here lately and for a good reason. Arik and I have been hard at work to release the first public beta of our test management service called <a href="http://www.testuff.com">Testuff</a>. Developing software is hard enough when you have plenty of resources but when you are a one- or two-man shop with limited funds it&#8217;s even harder. We&#8217;ve built Testuff to help small companies and <a href="http://en.wikipedia.org/wiki/Micro_ISV">mISV&#8217;s</a> like ourselves manage and run their software tests. We&#8217;ve based it on the <a href="http://en.wikipedia.org/wiki/Software_as_a_Service">SaaS</a> model so you don&#8217;t have to install any servers, but we also made a rich desktop client for it so you could enjoy a better user experience. If you&#8217;re doing any sort of real development for actual, breathing clients, you should <a href="http://www.testuff.com/download">try it out</a>.</p>
<p>&lt;/marketing&gt;</p>
<p>It&#8217;s been a week since the public release and although we made some marketing efforts (like this post) we&#8217;re still not getting enough traffic to our site. Only a few people have actually downloaded and tried to use our application and I think there&#8217;s only one name on that list that I don&#8217;t know. I realize we should be doing more marketing and getting the word out to as many people as we can but I don&#8217;t seem to be able to get past my perfectionism. I&#8217;m looking at Testuff now and it is (aside from some bugs and quirks) a fine achievement. It is quite convenient, rather pretty and has some really cool features like recording the video of the application you&#8217;re testing so you could reproduce the bugs with ease. However, since I&#8217;ve been working on it for so long, I&#8217;ve gotten used to all the cool things by now and I am already cultivating a new vision in my mind. A cleaner interface, fewer features, a faster bug video recorder, the ability to email a test to your friends who could run it and report the recording of the bug directly, and so on. I&#8217;m struggling because I&#8217;ve promised my partner I&#8217;d write emails to some key figures in the micro ISV world (people like <a href="http://47hats.com/">Bob Walsh</a>, <a href="http://www.ericsink.com/">Eric Sink</a>, <a href="http://www.joelonsoftware.com/">Joel Spolsky</a>, <a href="http://www.userscape.com/blog/">Ian Landsman</a> and <a href="http://successfulsoftware.net/">Andy Brice</a>). But how can I describe the wonders of Testuff to them when I&#8217;m already thinking about the next version and the one after it?</p>
<p>Another thing I&#8217;m worried about is the fact that although every developer and QA I&#8217;ve talked to was very excited about Testuff, very few have visited the site and tried it out, not to mention started using it on their own team. Price shouldn&#8217;t be an obstacle as we&#8217;re giving it out for free right now and I don&#8217;t think there is a lack of need for a service like this. Something is amiss however and I still haven&#8217;t figured out what it is.</p>
<p>I&#8217;d love to hear any thoughts you may have on the subject and any advice you might have. You&#8217;ll probably need to install Testuff to do that (Ha! Gotcha!) so you&#8217;d better head on to the <a href="http://www.testuff.com/download">Testuff download page</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.gooli.org/blog/testuff-a-test-case-management-service/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Building M2Crypto on Windows</title>
		<link>http://www.gooli.org/blog/building-m2crypto-on-windows/</link>
		<comments>http://www.gooli.org/blog/building-m2crypto-on-windows/#comments</comments>
		<pubDate>Tue, 25 Sep 2007 19:19:30 +0000</pubDate>
		<dc:creator>gooli</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.gooli.org/blog/building-m2crypto-on-windows/</guid>
		<description><![CDATA[Here&#8217;s another installment in what seems to be turning into a series of compilation instructions for Windows of libraries that were born and raised on Linux.
Python has only the most basic support for secure SSL and HTTPS&#160;and if you know anything about how SSL works, you&#8217;ll know that support doesn&#8217;t provide enough security. I&#8217;ll leave&#160;the [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s another installment in what seems to be turning into a series of compilation instructions for Windows of libraries that were born and raised on Linux.</p>
<p>Python has only the most basic support for secure SSL and HTTPS&nbsp;and if you know anything about how SSL works, you&#8217;ll know that support doesn&#8217;t provide enough security. I&#8217;ll leave&nbsp;the discussion of SSL, TLS, HTTPS and other related protocols and technologies to people who actually know something about it (any good links I should put here?), but the following quote from the Python documentation should put even the uninitiated on their toes:</p>
<blockquote>
<dl>
<dt><font face="Courier New" size="1">class HTTPSConnection(<em>host</em>[, <em>port</em>, <em>key_file</em>, <em>cert_file</em>])</font></dt>
<dd>A subclass of <tt>HTTPConnection</tt> that uses SSL for communication with secure servers. Default port is <code>443</code>. <var>key_file</var> is the name of a PEM formatted file that contains your private key. <var>cert_file</var> is a PEM formatted certificate chain file.
<p><font color="#ff0000"><b>Warning:</b> This does not do any certificate verification!</font></p>
</dd>
</dl>
</blockquote>
<p>The red color is mine, but the warning is there (at least in Python 2.4.4 &#8211; I&#8217;ve been slow to adopt 2.5, but I don&#8217;t think this has changed).</p>
<p>What that means is that although you might think you&#8217;re using a secure connection when you&#8217;re using HTTPSConnection, you really aren&#8217;t. At least not as secure as you thought. Although all the data transferred between you and the server will be encrypted, you won&#8217;t actually know you&#8217;re talking to the right server and will be vulnerable to a <a href="http://en.wikipedia.org/wiki/Man-in-the-middle_attack">man-in-the-middle attack</a>.</p>
<p>But fear not, because <a href="http://chandlerproject.org/bin/view/Projects/MeTooCrypto">M2Crypto</a> comes to the rescue. M2Crypto&nbsp;is a Python library based on the well-known <a href="http://www.openssl.org/">OpenSSL</a> library which does all the right cryptographic magic in all the right ways. M2Crypto has a compatible HTTPSConnection class that should work as a drop-in replacement for the one in httplib and actually authenticate the server correctly.</p>
<p>Now that we&#8217;ve got all this unimportant stuff out of the way, let&#8217;s get our hands dirty with building the library on Windows.</p>
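<p>To make "authenticate the server" a bit more concrete, here is a minimal sketch of what certificate verification looks like as a setting. It deliberately does not use M2Crypto or the Python 2.4 httplib discussed in this post. It assumes a modern Python 3 and its standard ssl module, so treat it as an illustration of the idea rather than as part of the build described below.</p>

```python
import ssl

# Assumption: Python 3 standard library, not the Python 2.4 httplib above.
# create_default_context() returns a client context with verification enabled.
context = ssl.create_default_context()

# The certificate chain must validate against trusted CA certificates...
verifies_chain = context.verify_mode == ssl.CERT_REQUIRED

# ...and the certificate must match the host name being connected to.
checks_hostname = context.check_hostname

# A connection created with this context fails the handshake when the server
# certificate cannot be validated, for example:
#   http.client.HTTPSConnection("example.org", context=context)
```

<p>Both checks matter: encryption without them still leaves you open to the man-in-the-middle scenario described above.</p>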
<h4>Tools you&#8217;ll need</h4>
<p>Here are the programs you&#8217;ll need installed before you dig in:</p>
<ol>
<li><strong>Python 2.4 or later</strong> &#8211; might work with earlier versions, but I haven&#8217;t tested it with anything but Python 2.4.4.</li>
<li><strong>Microsoft Visual Studio 2003</strong> &#8211; this is the version that Python 2.4/2.5 is built with and this is the version you need to build M2Crypto. I don&#8217;t think any other (including 2005) will work.</li>
<li><a href="http://www.activestate.com/store/activeperl/download/"><strong>ActivePerl 5.8.7</strong></a> &#8211; that&#8217;s the version I used, but I guess any reasonable Perl will do.</li>
<li>Command prompt &#8211; you don&#8217;t need to install it, but you&#8217;re going to be using it a lot so you&#8217;d best be familiar with it.</li>
</ol>
<h4>Building OpenSSL for Windows</h4>
<p>The first thing we&#8217;ll need to do is build us a fresh OpenSSL DLL.</p>
<ol>
<li>Download the latest OpenSSL source package from <a title="http://www.openssl.org/source/" href="http://www.openssl.org/source/">http://www.openssl.org/source/</a>.</li>
<li>Unzip and untar the package somewhere and open a command prompt there.</li>
<li><font face="Courier New">&gt; perl Configure VC-WIN32 &#8211;prefix=c:/openssl</font></li>
<li><font face="Courier New">&gt; ms\do_masm</font></li>
<li><font face="Courier New">&gt; nmake -f ms\ntdll.mak</font></li>
<li><font face="Courier New">&gt; nmake -f ms\ntdll.mak install</font></li>
</ol>
<p>If something doesn&#8217;t work, refer to the INSTALL.W32 file in the OpenSSL source package. I followed the instructions there to the letter and they worked.</p>
<h4>Building M2Crypto for Windows</h4>
<p>M2Crypto uses a tool called <a href="http://www.swig.org/">SWIG</a> to help write the Python code that wraps the OpenSSL library, which is written in C, so we&#8217;ll have to download and install it.</p>
<p>Let&#8217;s go.</p>
<ol>
<li>Download the latest SWIG Windows binaries from <a title="http://www.swig.org/download.html" href="http://www.swig.org/download.html">http://www.swig.org/download.html</a>.</li>
<li>Unzip and untar the SWIG package to some directory and add&nbsp;that directory to your PATH.</li>
<li>Download the latest M2Crypto sources from <a title="http://chandlerproject.org/bin/view/Projects/MeTooCrypto" href="http://chandlerproject.org/bin/view/Projects/MeTooCrypto">http://chandlerproject.org/bin/view/Projects/MeTooCrypto</a>.</li>
<li>Unzip and untar the M2Crypto source somewhere and open a command prompt there.</li>
<li><font face="Courier New">&gt; python setup.py build_ext &#8211;openssl c:/openssl</font></li>
<li><font face="Courier New">&gt; python setup.py bdist_wininst</font></li>
</ol>
<p>That last command will create a nice <font face="Courier New">M2Crypto-0.18.win32-py2.4.exe</font> file in the dist subdirectory which you can run to install M2Crypto in the Python site-packages directory.</p>
<p>To test your build, run python and do <font face="Courier New">import M2Crypto</font>. If you get an error that says &#8216;ImportError: DLL load failed with error code 182&#8217;, it&#8217;s because the M2Crypto library can&#8217;t find the OpenSSL DLLs. You&#8217;ll need to place the <font face="Courier New">libeay32.dll</font> and <font face="Courier New">ssleay32.dll</font> files somewhere python can find them. The directory in which your script resides is a good bet.</p>
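<p>If you&#8217;d rather script that check than type it at the prompt, something along these lines should do. The helper name check_import is made up for this sketch, and it is written for a current Python rather than 2.4.</p>

```python
import importlib


def check_import(module_name):
    """Return (True, "") if the module imports, else (False, error message)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # A missing OpenSSL DLL surfaces here as an ImportError as well.
        return False, str(exc)
    return True, ""


# "M2Crypto" is only importable once the installer built above has been run.
ok, error = check_import("M2Crypto")
report = "M2Crypto imported fine" if ok else "M2Crypto failed to import: " + error
```

<p>Printing report after copying the DLLs next to your script is a quick way to confirm that the fix took.</p>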
]]></content:encoded>
			<wfw:commentRss>http://www.gooli.org/blog/building-m2crypto-on-windows/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>YouTube runs on Python</title>
		<link>http://www.gooli.org/blog/youtube-runs-on-python/</link>
		<comments>http://www.gooli.org/blog/youtube-runs-on-python/#comments</comments>
		<pubDate>Thu, 23 Aug 2007 10:14:04 +0000</pubDate>
		<dc:creator>gooli</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.gooli.org/blog/youtube-runs-on-python/</guid>
		<description><![CDATA[I love Python, I really do. It has clear syntax, nice library support and I feel very productive using it.
But Python is an interpreted language and it&#8217;s slow. You don&#8217;t feel it until you do something stupid like going through all the pixels in a bitmap and converting them into grayscale. I tried [...]]]></description>
			<content:encoded><![CDATA[<p>I love <a href="http://www.python.org">Python</a>, I really do. It has clear syntax, nice library support and I feel very productive using it.</p>
<p>But Python is an interpreted language and it&#8217;s slow. You don&#8217;t feel it until you do something stupid like going through all the pixels in a bitmap and converting them into grayscale. I tried doing that once to create grayed out versions of my button images in a <a href="http://www.wxpython.org">wxPython</a> app&nbsp;I was building. I couldn&#8217;t believe how slow that was. Then again, the slowness might have been due to the calls to the <a href="http://www.wxwidgets.org">wxWidgets</a> C++ layer and back.</p>
<p>Anyway&#8230;</p>
<p>I was completely unaware of the fact that the guys at <a href="http://www.youtube.com">YouTube</a>, the mega site that serves millions of users daily, use Python. And they don&#8217;t just <em>use</em> it, their whole damn server side runs on it.</p>
<p>Here&#8217;s a lecture by Cuong Do, the engineering manager at YouTube. Apparently, he was there from the very start and he tells an interesting story of dealing with exponential growth and handling the scalability issues as they arose.</p>
<p><embed id="VideoPlayback" style="width: 400px; height: 326px" src="http://video.google.com/googleplayer.swf?docId=-6304964351441328559&amp;hl=en" type="application/x-shockwave-flash" flashvars=""> </embed>
<p>[Update: The lecture is a bit boring, but the really interesting stuff is in the Q&amp;A during the last 10 minutes or so. You should watch it.]</p>
<p>The amazing thing is that the guys who did all kinds of insanely amazing&nbsp;things to accommodate their growth rate, including hacking the <a href="http://www.lighttpd.net/">lighttpd</a>&nbsp;source code to improve the way it does multithreading, still use Python for their server side code. According to Cuong, it never was the bottleneck and the speed problems they did have were easily solved by throwing in several more servers. The speed of development in Python, on the other hand, helped them respond quickly to the changes and implement new ideas almost on the spot.</p>
<p>When I chose Python as the main language for <a href="http://www.tuzig.com">Tuzig</a>&nbsp;about two years ago, I knew very little about it. I knew only that everything I tried thus far (C++, Java and&nbsp;.NET) wasn&#8217;t going to get us quickly enough to where we were going and that Python looked very promising. I haven&#8217;t looked back since. I usually try to choose the best tool for the job, but Python seems to excel in many areas: from small one-off scripts to&nbsp;complex GUI applications (thanks to wxPython) to web applications. I can even write Python stored procedures for PostgreSQL.</p>
<p>And speedwise, if it&#8217;s good enough for YouTube, it should be good enough for me.</p>
<p>If you don&#8217;t know Python, you should take a look at <a href="http://www.diveintopython.org">DiveIntoPython</a>. It&#8217;s a great book by Mark Pilgrim that is written for experienced programmers and focuses on the things unique to Python instead of explaining, like many other books, what a for loop is. It&#8217;s also freely available on the net so you can dive right in.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.gooli.org/blog/youtube-runs-on-python/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Tracking down rogue print statements</title>
		<link>http://www.gooli.org/blog/tracking-down-rogue-print-statements/</link>
		<comments>http://www.gooli.org/blog/tracking-down-rogue-print-statements/#comments</comments>
		<pubDate>Wed, 07 Feb 2007 13:48:43 +0000</pubDate>
		<dc:creator>gooli</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.gooli.org/blog/tracking-down-rogue-print-statements/</guid>
		<description><![CDATA[
I&#8217;ve spent the last hour hunting for a rogue print somewhere in my code. I almost never use a debugger these days, relying almost exclusively on debug prints. Python makes it easy and SciTE makes that even easier. That habit has just bitten me in the ass. Like I said, I spent the last hour [...]]]></description>
			<content:encoded><![CDATA[<p>
I&#8217;ve spent the last hour hunting for a rogue <code>print</code> somewhere in my code. I almost never use a debugger these days, relying almost exclusively on debug prints. Python makes it easy and <a href="http://www.scintilla.org/SciTE.html">SciTE</a> makes that even easier. That habit has just bitten me in the ass. Like I said, I spent the last hour hunting for that rogue <code>print</code> statement I&#8217;d put somewhere and couldn&#8217;t find. Turns out it was in one of Python&#8217;s built-in modules.
</p>
<p>
I wrote a small class to help me out. This class prints the current call stack every time somebody tries to print something. When the class was ready, it took me just a few seconds to locate the evil <code>print</code>.
</p>
<div class="ch_code_container" style="font-family: monospace;height:100%;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">traceback</span><br />
<span style="color: #ff7700;font-weight:bold;">class</span> Tracer:<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, oldstream<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008000;">self</span>.<span style="color: black;">oldstream</span> = oldstream<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008000;">self</span>.<span style="color: black;">count</span> = <span style="color: #ff4500;">0</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008000;">self</span>.<span style="color: black;">lastStack</span> = <span style="color: #008000;">None</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">def</span> write<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, s<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; newStack = <span style="color: #dc143c;">traceback</span>.<span style="color: black;">format_stack</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">if</span> newStack != <span style="color: #008000;">self</span>.<span style="color: black;">lastStack</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008000;">self</span>.<span style="color: black;">oldstream</span>.<span style="color: black;">write</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;&quot;</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span>newStack<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008000;">self</span>.<span style="color: black;">lastStack</span> = newStack<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #008000;">self</span>.<span style="color: black;">oldstream</span>.<span style="color: black;">write</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span></div>
<p>
The class uses the built-in <code>traceback</code> module&#8217;s <code>format_stack</code> function to get the stack info and prints it. One thing I learned while doing that is that a <code>print</code> statement is actually translated to <i>two</i> <code>write</code>s. I don&#8217;t know if that means anything, but that&#8217;s the reason for the comparison trick in the code: the stack is written only when it differs from the one seen on the previous <code>write</code>.
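<p>The two-writes behaviour is easy to observe. The sketch below assumes Python 3, where <code>print</code> is a function rather than a statement, and the RecordingStream class is a made-up helper that just remembers each call it receives.</p>

```python
class RecordingStream:
    """Hypothetical helper that records every write() call separately."""

    def __init__(self):
        self.calls = []

    def write(self, s):
        self.calls.append(s)


stream = RecordingStream()

# A single print call with one argument...
print("hello", file=stream)

# ...reaches the stream as two writes: the text, then the line ending.
calls = stream.calls
```

<p>Without the comparison against the previous stack, each printed line would therefore be preceded by two identical stack dumps.</p>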
</p>
<p>
This is how you&#8217;d use the class:
</p>
<div class="ch_code_container" style="font-family: monospace;height:100%;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">sys</span><br />
<span style="color: #dc143c;">sys</span>.<span style="color: black;">stdout</span> = Tracer<span style="color: black;">&#40;</span><span style="color: #dc143c;">sys</span>.<span style="color: black;">stdout</span><span style="color: black;">&#41;</span><br />
<span style="color: #dc143c;">sys</span>.<span style="color: black;">stderr</span> = Tracer<span style="color: black;">&#40;</span><span style="color: #dc143c;">sys</span>.<span style="color: black;">stderr</span><span style="color: black;">&#41;</span></div>
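<p>If you only want tracing around one suspicious block and the original stream restored afterwards, a scoped variant is possible. This is a sketch under assumptions, not the code I used: it targets Python 3, redefines a trimmed-down Tracer so the example is self-contained, and relies on the standard <code>contextlib.redirect_stdout</code>.</p>

```python
import contextlib
import io
import traceback


class Tracer:
    """Python 3 re-sketch of the class above, trimmed for this example."""

    def __init__(self, oldstream):
        self.oldstream = oldstream
        self.last_stack = None

    def write(self, s):
        new_stack = traceback.format_stack()
        # Emit the stack only when it differs from the previous write's stack.
        if new_stack != self.last_stack:
            self.oldstream.write("".join(new_stack))
            self.last_stack = new_stack
        return self.oldstream.write(s)

    def flush(self):
        self.oldstream.flush()


# An in-memory stream stands in for the real stdout in this sketch.
captured = io.StringIO()

# sys.stdout is swapped for the tracer only inside the with block.
with contextlib.redirect_stdout(Tracer(captured)):
    print("rogue output")

output = captured.getvalue()
```

<p>After the block, <code>output</code> holds the stack trace followed by the printed text, and <code>sys.stdout</code> is back to whatever it was before.</p>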
<p>
Enjoy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.gooli.org/blog/tracking-down-rogue-print-statements/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

