Extracting bookmark icons (favicons) from Firefox
Favicons are the little icons that sites provide and that end up in your bookmark list when you add a site to it. I was wondering where all these icons were stored when I ran across this post, which explains that Firefox holds them as base64-encoded strings inside the bookmarks.html file.
I wanted to get access to all those icons and wrote a small Python script to help me do that.
I used the Python built-in base64 module to handle base64 encoding and decoding and the wonderful BeautifulSoup library for parsing the bookmarks.html file.
The resulting code snippet is quite short:
import base64
import re
from BeautifulSoup import BeautifulSoup

# Prefix that Firefox puts before the base64 data in each "icon" attribute
HEADER = "data:image/x-icon;base64,"

f = file("bookmarks.html")
page = BeautifulSoup(f)
for tag in page.findAll("a"):
    try:
        iconData = tag["icon"]
        print tag.string
        if iconData.startswith(HEADER):
            # Strip the header, decode the icon and save it under the bookmark title
            iconData = iconData[len(HEADER):]
            iconBinaryData = base64.decodestring(iconData)
            iconFilename = re.sub("[^a-zA-Z0-9_\-.' ]", "_", tag.string) + ".ico"
            file(iconFilename, "wb").write(iconBinaryData)
    except KeyError:
        # This bookmark has no "icon" attribute
        pass
Using BeautifulSoup, I loop over all the <a> tags in the bookmarks.html file. For each tag, I read the “icon” attribute, strip the small header that Firefox puts before the actual icon data, and decode the base64-encoded icon using the base64 module.
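The snippet above targets Python 2 (it relies on the print statement and the file builtin). For readers on Python 3, here is a minimal sketch of the same loop using only the standard library. It is not the original script: the names IconCollector, extract_icons, safe_filename and write_icons are made up for this illustration, and reading the file as UTF-8 is an assumption. It keeps the same image/x-icon header as above, so icons stored under any other header are skipped.

```python
import base64
import re
from html.parser import HTMLParser

# Same prefix the original snippet strips from each "icon" attribute
HEADER = "data:image/x-icon;base64,"


class IconCollector(HTMLParser):
    """Collects (title, icon bytes) pairs from <a icon="..."> tags."""

    def __init__(self):
        super().__init__()
        self.icons = []       # finished (title, bytes) pairs
        self._pending = None  # decoded icon of the <a> tag currently open
        self._title = []      # text pieces seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # HTMLParser lowercases tag and attribute names for us
        icon = dict(attrs).get("icon")
        if icon and icon.startswith(HEADER):
            self._pending = base64.b64decode(icon[len(HEADER):])
        else:
            self._pending = None
        self._title = []

    def handle_data(self, data):
        if self._pending is not None:
            self._title.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._pending is not None:
            self.icons.append(("".join(self._title).strip(), self._pending))
            self._pending = None


def extract_icons(html_text):
    """Return a list of (bookmark title, icon bytes) found in html_text."""
    parser = IconCollector()
    parser.feed(html_text)
    parser.close()
    return parser.icons


def safe_filename(title):
    """Same character whitelist as the original snippet."""
    return re.sub(r"[^a-zA-Z0-9_\-.' ]", "_", title or "untitled") + ".ico"


def write_icons(bookmarks_path="bookmarks.html"):
    """Write one .ico file per bookmark icon; returns how many were written."""
    # Assumption: the bookmarks file is UTF-8 encoded
    with open(bookmarks_path, encoding="utf-8", errors="replace") as f:
        icons = extract_icons(f.read())
    for title, data in icons:
        with open(safe_filename(title), "wb") as out:
            out.write(data)
    return len(icons)
```

Calling write_icons() from the directory that holds bookmarks.html should write the icons next to it, much like the original script does.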
Maybe somebody will find it useful someday.




12 Comments on “Extracting bookmark icons (favicons) from Firefox”
I will find it useful! …except… I don’t know what to with it. Can I execute it like a bat file, or do I need something to compile it or… as you can see I am clueless of Python. But this is exactly what I want to do: extract the bookmark icons.
You’ll need to install Python from http://www.python.org (version 2.6 should do just fine) and install BeautifulSoup from http://www.crummy.com/software/BeautifulSoup/. Then create a text file called extract.py and copy the code in the post into it, and copy the bookmarks.html file from the Firefox profile to the place where you created extract.py. Now you can run it using “python extract.py” and you should get the icons in the same directory.
Thank you! Downloading now, I’ll let you know how it goes. :)
I’m going to ask a couple of even dumber questions!…
1) I’ve installed Python and downloaded BeautifulSoup. It doesn’t seem to have an installer, so at present the extracted files are sitting in the same folder as extract.py. Is this correct?
2) I’ve created extract.py and copied bookmarks.html to the same folder, but I don’t know how to run ‘python extract.py’. I started Python, which opens a command window, and typed “Python C:\Users\Linda….extract.py”, and the syntax is invalid. Can you explain (in words of one syllable!) exactly what to do?!!
Sorry for being a pain and thanks for providing the code to enable icon extractions. :)
Regards,
Linda
Copying BeautifulSoup into the same directory as extract.py should work. By running ‘python extract.py’ I meant opening a command window (Start -> Run -> cmd), going to the directory where you have the extract.py script (using the cd command) and then typing ‘python extract.py’. That should work, let me know if it doesn’t.
I tried this using Python 3.0.1 and BeautifulSoup 3.1.0.1 and I get “invalid syntax” at line 12:
    print tag.string
                   ^
Just FYI, the caret (^) is located under the ‘g’ of tag.string in the error. I don’t know if there is any significance.
Don’t worry about it man. I uninstalled 3.0.1 and installed 2.6.2 and it worked fine. Thanks for this great script! I have been looking for something like this for a long time!
help !
Traceback (most recent call last):
  File "extract.py", line 12, in <module>
    print tag.string
  File "C:\Program Files\Python\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019' in position 53: character maps to <undefined>
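The traceback above says that the console's cp850 codec has no way to represent the character U+2019 (a curly apostrophe) that appears in one of the bookmark titles, so the print on line 12 fails. One possible workaround, sketched here under the assumption that the title only needs to be readable on screen, is to replace unencodable characters before printing. The helper name console_safe is hypothetical and not part of the original script.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the console cannot show replaced by '?'."""
    # Fall back to ASCII when the console encoding is unknown
    encoding = encoding or sys.stdout.encoding or "ascii"
    # Round-trip through the console codec; "replace" swaps in '?' on failure
    return text.encode(encoding, "replace").decode(encoding)
```

In the script, the print line would then print console_safe(tag.string) instead of tag.string. Removing the print line altogether would also avoid the error, since it only reports progress.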
Or, using Firefox, open the bookmark (actually load the page), then:
Tools > Page Info > Media
scroll through the list to the one that ends in /favicon.ico
at the bottom of the window click “save as”
save your icon!
Wonderful Script!
I’m still running Ubuntu 8.04 Hardy (just one more month till 9.10 Karmic, yeah!). I installed BeautifulSoup from Synaptic, used the script as described, and it worked perfectly. I only got a couple of errors from filenames being too long (from bad websites giving huge page titles, definitely not the script’s fault); I manually edited the bookmarks file and the script did its magic.
python 2.5.2
python-beautifulsoup 3.0.4-1build1
THANK YOU VERY MUCH !!!
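The too-long filename errors mentioned above could probably be avoided without editing the bookmarks file by truncating the title before using it as a filename. A rough sketch follows; the helper name icon_filename and the limit of 100 characters are arbitrary choices for illustration, not something taken from the original script.

```python
import re

# Assumed limit, picked to stay well below typical filename length caps
MAX_STEM = 100


def icon_filename(title, max_stem=MAX_STEM):
    """Build a filesystem-friendly .ico filename from a bookmark title."""
    # Same character whitelist as the original script; None becomes "untitled"
    stem = re.sub(r"[^a-zA-Z0-9_\-.' ]", "_", title or "untitled")
    # Cut overly long titles so the final name stays short
    return stem[:max_stem] + ".ico"
```

Note that two long titles sharing the same first 100 characters would end up with the same filename, so the later icon would overwrite the earlier one.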
Thanks for the effort you took to expand upon this post so thoroughly. I look forward to future posts.