Favicons are the little icons that sites provide and that show up next to a site in your bookmark list once you add it there. I was wondering where all these icons were stored when I ran across this post, which explains that Firefox keeps them as base64-encoded strings inside the bookmarks.html file.
I wanted to get access to all those icons and wrote a small Python script to help me do that.
I used the Python built-in base64 module to handle base64 encoding and decoding and the wonderful BeautifulSoup library for parsing the bookmarks.html file.
The resulting code snippet is quite short:
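import base64
import re
from BeautifulSoup import BeautifulSoup

HEADER = "data:image/x-icon;base64,"

f = open("bookmarks.html")
page = BeautifulSoup(f)
for tag in page.findAll("a"):
    # Skip bookmarks that have no icon, or whose icon is not an x-icon data URI.
    iconData = tag.get("icon", "")
    if not iconData.startswith(HEADER):
        continue
    # Strip the header Firefox prepends, then decode the base64 payload.
    iconBinaryData = base64.decodestring(iconData[len(HEADER):])
    # Build a safe file name from the title ("untitled" is an arbitrary fallback) and save.
    iconFilename = re.sub(r"[^a-zA-Z0-9_\-.' ]", "_", tag.string or "untitled") + ".ico"
    open(iconFilename, "wb").write(iconBinaryData)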
Using BeautifulSoup, I loop over all the <a> tags in the bookmarks.html file. For each tag, I read the “icon” attribute, strip the small header that Firefox puts in front of the actual icon data, and decode the remaining base64 string with the base64 module. Finally, I build a file name from the bookmark title, replacing any character that might be unsafe in a file name with an underscore.
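The snippet above targets Python 2 and the old BeautifulSoup 3 package, and base64.decodestring was removed in Python 3.9. For anyone on Python 3, here is a rough sketch of the same steps using only the standard library's html.parser instead of BeautifulSoup. The names IconCollector, safe_filename, extract_icons and save_icons are my own and purely illustrative, and I am assuming the file is UTF-8 encoded and that only icons starting with the x-icon header are of interest; anything else is skipped.

```python
import base64
import re
from html.parser import HTMLParser

HEADER = "data:image/x-icon;base64,"


class IconCollector(HTMLParser):
    """Collects (title, icon bytes) pairs from <a> tags that carry an icon."""

    def __init__(self):
        super().__init__()
        self.icons = []        # finished (title, bytes) pairs
        self._pending = None   # decoded icon of the <a> tag currently open
        self._title = []       # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # HTMLParser lowercases tag and attribute names, so ICON= matches too.
        icon = dict(attrs).get("icon") or ""
        if icon.startswith(HEADER):
            # Strip the header, then decode the base64 payload.
            self._pending = base64.b64decode(icon[len(HEADER):])
            self._title = []

    def handle_data(self, data):
        if self._pending is not None:
            self._title.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._pending is not None:
            self.icons.append(("".join(self._title), self._pending))
            self._pending = None


def safe_filename(title):
    # Same character whitelist as the original snippet; "untitled" is an
    # arbitrary fallback for bookmarks without a title.
    return re.sub(r"[^a-zA-Z0-9_\-.' ]", "_", title or "untitled") + ".ico"


def extract_icons(html):
    """Returns a list of (file name, icon bytes) for every icon found."""
    parser = IconCollector()
    parser.feed(html)
    parser.close()
    return [(safe_filename(title), data) for title, data in parser.icons]


def save_icons(bookmarks_path):
    # Writes one .ico file per icon into the current directory.
    with open(bookmarks_path, encoding="utf-8") as f:
        for filename, data in extract_icons(f.read()):
            with open(filename, "wb") as out:
                out.write(data)
```

Calling save_icons("bookmarks.html") should then drop the icons next to the script. Note that two bookmarks with the same title end up with the same file name, so the later one overwrites the earlier one.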
Maybe somebody will find it useful someday.