<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Unicode and permalinks</title>
	<atom:link href="http://www.gooli.org/blog/unicode-and-permalinks/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.gooli.org/blog/unicode-and-permalinks/</link>
	<description>on software development and related issues</description>
	<lastBuildDate>Thu, 11 Mar 2010 09:00:39 -0600</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: video encoding</title>
		<link>http://www.gooli.org/blog/unicode-and-permalinks/comment-page-1/#comment-12778</link>
		<dc:creator>video encoding</dc:creator>
		<pubDate>Wed, 22 Jul 2009 23:57:47 +0000</pubDate>
		<guid isPermaLink="false">http://www.gooli.org/blog/?p=155#comment-12778</guid>
		<description>thank you, great post.

i never knew it could be done, and this could really help me in my new project.</description>
		<content:encoded><![CDATA[<p>thank you, great post.</p>
<p>i never knew it could be done, and this could really help me in my new project.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lawrence Sheed</title>
		<link>http://www.gooli.org/blog/unicode-and-permalinks/comment-page-1/#comment-8054</link>
		<dc:creator>Lawrence Sheed</dc:creator>
		<pubDate>Tue, 17 Feb 2009 17:47:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.gooli.org/blog/?p=155#comment-8054</guid>
		<description>@Love Encounter Flow.

Quote  &quot;...no rfc that would govern usage and announcement of encodings in urls&quot;

Actually, there is:  RFC 3987

Non-ASCII (and valid) characters are assumed to be UTF-8 and should be percent-encoded.

This article at the W3C http://www.w3.org/International/articles/idn-and-iri/  talks about this.

I&#039;m in the middle of a discussion about URI implementation in an open source CMS at the moment (which is how I found this link).


Lawrence / Computer Solutions Design China.</description>
		<content:encoded><![CDATA[<p>@Love Encounter Flow.</p>
<p>Quote  &#8220;&#8230;no rfc that would govern usage and announcement of encodings in urls&#8221;</p>
<p>Actually, there is:  RFC 3987</p>
<p>Non-ASCII (and valid) characters are assumed to be UTF-8 and should be percent-encoded.</p>
<p>This article at the W3C <a href="http://www.w3.org/International/articles/idn-and-iri/" rel="nofollow">http://www.w3.org/International/articles/idn-and-iri/</a>  talks about this.</p>
<p>I&#8217;m in the middle of a discussion about URI implementation in an open source CMS at the moment (which is how I found this link).</p>
<p>Lawrence / Computer Solutions Design China.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: andy mckay</title>
		<link>http://www.gooli.org/blog/unicode-and-permalinks/comment-page-1/#comment-6264</link>
		<dc:creator>andy mckay</dc:creator>
		<pubDate>Mon, 27 Oct 2008 13:12:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.gooli.org/blog/?p=155#comment-6264</guid>
		<description>Perhaps a quick peek at the Plone code might help: it takes UTF-8 and runs it through a decode to form a nice ASCII URL. There are going to be problems with it, but Plone uses this to make URLs. Here&#039;s some sample code: https://svn.plone.org/svn/plone/CMFPlone/tags/2.5.5/UnicodeNormalizer.py and a sample that does it in JSONP: http://clearwind-labs.appspot.com/</description>
		<content:encoded><![CDATA[<p>Perhaps a quick peek at the Plone code might help: it takes UTF-8 and runs it through a decode to form a nice ASCII URL. There are going to be problems with it, but Plone uses this to make URLs. Here&#8217;s some sample code: <a href="https://svn.plone.org/svn/plone/CMFPlone/tags/2.5.5/UnicodeNormalizer.py" rel="nofollow">https://svn.plone.org/svn/plone/CMFPlone/tags/2.5.5/UnicodeNormalizer.py</a> and a sample that does it in JSONP: <a href="http://clearwind-labs.appspot.com/" rel="nofollow">http://clearwind-labs.appspot.com/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: The Cave &#187; Blog Archive &#187; Unicode Permalinks</title>
		<link>http://www.gooli.org/blog/unicode-and-permalinks/comment-page-1/#comment-5731</link>
		<dc:creator>The Cave &#187; Blog Archive &#187; Unicode Permalinks</dc:creator>
		<pubDate>Sat, 27 Sep 2008 17:08:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.gooli.org/blog/?p=155#comment-5731</guid>
		<description>[...] Even better, solid information on uncommon (and poorly understood) Unicode handling in Python. [...]</description>
		<content:encoded><![CDATA[<p>[...] Even better, solid information on uncommon (and poorly understood) Unicode handling in Python. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: gooli</title>
		<link>http://www.gooli.org/blog/unicode-and-permalinks/comment-page-1/#comment-5618</link>
		<dc:creator>gooli</dc:creator>
		<pubDate>Tue, 23 Sep 2008 12:39:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.gooli.org/blog/?p=155#comment-5618</guid>
		<description>Damn! I didn&#039;t know the re module could do that.</description>
		<content:encoded><![CDATA[<p>Damn! I didn&#8217;t know the re module could do that.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Almad</title>
		<link>http://www.gooli.org/blog/unicode-and-permalinks/comment-page-1/#comment-5617</link>
		<dc:creator>Almad</dc:creator>
		<pubDate>Tue, 23 Sep 2008 12:16:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.gooli.org/blog/?p=155#comment-5617</guid>
		<description>Is it not sufficient to use re.sub(&quot;\W&quot;, &quot;_&quot;, re.UNICODE)?</description>
		<content:encoded><![CDATA[<p>Is it not sufficient to use re.sub("\W", "_", re.UNICODE)?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Love Encounter Flow</title>
		<link>http://www.gooli.org/blog/unicode-and-permalinks/comment-page-1/#comment-5590</link>
		<dc:creator>Love Encounter Flow</dc:creator>
		<pubDate>Mon, 22 Sep 2008 12:55:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.gooli.org/blog/?p=155#comment-5590</guid>
		<description>oops, that alternative percent-encoding would be %ua34f and so on, the u being an indicator that four digits are used and the character set referred to is unicode.</description>
		<content:encoded><![CDATA[<p>oops, that alternative percent-encoding would be %ua34f and so on, the u being an indicator that four digits are used and the character set referred to is unicode.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Love Encounter Flow</title>
		<link>http://www.gooli.org/blog/unicode-and-permalinks/comment-page-1/#comment-5585</link>
		<dc:creator>Love Encounter Flow</dc:creator>
		<pubDate>Mon, 22 Sep 2008 11:22:13 +0000</pubDate>
		<guid isPermaLink="false">http://www.gooli.org/blog/?p=155#comment-5585</guid>
		<description>there are even more things to be aware of in regard to unicode in urls:

(1) some browsers, including firefox 2, may choose to send *some* urls not in utf-8, but in a legacy encoding such as the system default or plain latin-1. for ffx2, this would appear to be true whenever the characters entered by the user happen to be encodable as latin-1 etc. there is no rfc that would govern usage and announcement of encodings in urls (what a joke), so you have to guess yourself when doing url decoding (i always use a routine that first tries utf-8, then latin-1 or similar as a fallback. web application frameworks surprisingly often fail in doing that for me).

(2) there is yet another percent-encoding style using unicode character ids, using four hex digits à la %a34f (see http://en.wikipedia.org/wiki/Percent-encoding#Non-standard_implementations); this has been rejected by the w3c (probably on the grounds that this would definitely make things too easy, which is presumably why the standards bodies adopted http://en.wikipedia.org/wiki/Punycode, by far the weirdest character encoding standard ever released).

(3) on the bright side, there is a healthy tendency among browser vendors to show, in the address bar, the intended, not the encoded likeness of the url typed in. the ffx2 locationbar² extension does it, google chrome does it, flock does it (the fine people over at ffx3 have sadly missed the trend so far, although your screenshots seem to indicate otherwise?). this is an important feature to get readable urls for the people of the world. hopefully, with browser vendors paying more attention to this issue, encoding problems will also become less of a burden in the future.</description>
		<content:encoded><![CDATA[<p>there are even more things to be aware of in regard to unicode in urls:</p>
<p>(1) some browsers, including firefox 2, may choose to send *some* urls not in utf-8, but in a legacy encoding such as the system default or plain latin-1. for ffx2, this would appear to be true whenever the characters entered by the user happen to be encodable as latin-1 etc. there is no rfc that would govern usage and announcement of encodings in urls (what a joke), so you have to guess yourself when doing url decoding (i always use a routine that first tries utf-8, then latin-1 or similar as a fallback. web application frameworks surprisingly often fail in doing that for me).</p>
<p>(2) there is yet another percent-encoding style using unicode character ids, using four hex digits à la %a34f (see <a href="http://en.wikipedia.org/wiki/Percent-encoding#Non-standard_implementations" rel="nofollow">http://en.wikipedia.org/wiki/Percent-encoding#Non-standard_implementations</a>); this has been rejected by the w3c (probably on the grounds that this would definitely make things too easy, which is presumably why the standards bodies adopted <a href="http://en.wikipedia.org/wiki/Punycode" rel="nofollow">http://en.wikipedia.org/wiki/Punycode</a>, by far the weirdest character encoding standard ever released).</p>
<p>(3) on the bright side, there is a healthy tendency among browser vendors to show, in the address bar, the intended, not the encoded likeness of the url typed in. the ffx2 locationbar² extension does it, google chrome does it, flock does it (the fine people over at ffx3 have sadly missed the trend so far, although your screenshots seem to indicate otherwise?). this is an important feature to get readable urls for the people of the world. hopefully, with browser vendors paying more attention to this issue, encoding problems will also become less of a burden in the future.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
