Parser for Exported Bookmarks HTML file of Google Chrome and Mozilla in Java

In most cases, you don't really need to parse the HTML file. Chrome stores its bookmarks in a JSON file. It's a lot simpler to just read that file using a JSON parser.

The file you are interested in is located at the following path (on Linux, anyway; Google around for other operating systems):

/home/your_name/.config/google-chrome/Default/Bookmarks

JSON parsing is easy. Google around, or start with "How to parse JSON in Java".
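As a rough illustration of what reading that file involves, here is a dependency-free sketch. In real code you would normally use a JSON library. The tiny JsonReader class exists only so the example compiles with the JDK alone: it turns objects into Map, arrays into List, strings into String and numbers into Double. The walk assumes the Chrome layout shown in the sample further down: a top-level "roots" object whose entries are folders with "children", and bookmarks marked "type": "url". The names ChromeBookmarks, JsonReader and urlsFromJson are made up for this sketch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChromeBookmarks {
    static final class JsonReader { // deliberately tiny JSON reader, for illustration only
        private final String s;
        private int i = 0;
        JsonReader(String s) { this.s = s; }

        Object read() {
            skipWs();
            char c = s.charAt(i);
            if (c == '{') return readObject();
            if (c == '[') return readArray();
            if (c == '"') return readString();
            if (s.startsWith("true", i)) { i += 4; return Boolean.TRUE; }
            if (s.startsWith("false", i)) { i += 5; return Boolean.FALSE; }
            if (s.startsWith("null", i)) { i += 4; return null; }
            int start = i; // anything else is treated as a number
            while (i < s.length() && "+-0123456789.eE".indexOf(s.charAt(i)) >= 0) i++;
            return Double.parseDouble(s.substring(start, i));
        }

        private Map<String, Object> readObject() {
            Map<String, Object> map = new LinkedHashMap<>();
            expect('{');
            if (consume('}')) return map;
            do {
                String key = readString();
                expect(':');
                map.put(key, read());
            } while (consume(','));
            expect('}');
            return map;
        }

        private List<Object> readArray() {
            List<Object> list = new ArrayList<>();
            expect('[');
            if (consume(']')) return list;
            do list.add(read()); while (consume(','));
            expect(']');
            return list;
        }

        private String readString() {
            expect('"');
            StringBuilder sb = new StringBuilder();
            for (char c = s.charAt(i++); c != '"'; c = s.charAt(i++)) {
                if (c != '\\') { sb.append(c); continue; }
                char e = s.charAt(i++); // the character after the backslash
                if (e == 'u') { sb.append((char) Integer.parseInt(s.substring(i, i + 4), 16)); i += 4; }
                else sb.append("\"\\/\b\f\n\r\t".charAt("\"\\/bfnrt".indexOf(e)));
            }
            return sb.toString();
        }

        private void skipWs() {
            while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
        }
        private boolean consume(char c) { // skip whitespace, then take c if it comes next
            skipWs();
            if (i < s.length() && s.charAt(i) == c) { i++; return true; }
            return false;
        }
        private void expect(char c) {
            if (!consume(c)) throw new IllegalArgumentException("Expected '" + c + "' at " + i);
        }
    }

    // Depth-first walk: adds the "url" of every node whose "type" is "url".
    static void collectUrls(Map<?, ?> node, List<String> out) {
        if ("url".equals(node.get("type"))) out.add((String) node.get("url"));
        Object children = node.get("children");
        if (children instanceof List)
            for (Object child : (List<?>) children) collectUrls((Map<?, ?>) child, out);
    }

    // Collects the URLs under every top-level folder in "roots" (bookmark_bar, other, ...).
    static List<String> urlsFromJson(String json) {
        Map<?, ?> root = (Map<?, ?>) new JsonReader(json).read();
        List<String> urls = new ArrayList<>();
        for (Object top : ((Map<?, ?>) root.get("roots")).values())
            if (top instanceof Map) collectUrls((Map<?, ?>) top, urls);
        return urls;
    }

    public static void main(String[] args) throws IOException {
        String home = System.getProperty("user.home"); // Linux location quoted above
        byte[] bytes = Files.readAllBytes(Paths.get(home, ".config/google-chrome/Default/Bookmarks"));
        urlsFromJson(new String(bytes, StandardCharsets.UTF_8)).forEach(System.out::println);
    }
}
```

Run as is, it prints every bookmark URL from the default Chrome profile on Linux. To switch to a real JSON library, only urlsFromJson needs to change, provided the library can give you nested Map and List values.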

If you want to visualize JSON data before you start digging through it, then also have a look at http://chris.photobooks.com/json/default.htm.


Per the newer comments, the solution would be to use jsoup, an open-source Java HTML parser. Its Jsoup.connect() method accepts only HTTP or HTTPS URLs, so one option is to host the exported bookmark HTML on a local server such as Tomcat and obtain its DOM from

 http://yourip:<port>/<yourProject>/<bookmark.html>

Alternatively, Jsoup.parse(File, String charsetName) reads the file straight from disk, with no server needed. jsoup is pretty self-explanatory.
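jsoup is a third-party dependency. If you would rather not add one, the JDK itself ships an old, HTML 3.2-era parser in javax.swing.text.html.parser that is lenient enough for the simple Netscape-style bookmark file that both Chrome and Firefox export. The sketch below swaps that parser in for jsoup (it is not jsoup's API) and reads the file directly from disk. BookmarkHtmlLinks and hrefs are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class BookmarkHtmlLinks {

    /** Returns the HREF of every A tag in a bookmark export, in document order. */
    static List<String> hrefs(Reader html) {
        List<String> out = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) out.add(href.toString());
                }
            }
        };
        try {
            // ignoreCharSet = true: the Reader has already decoded the bytes, so the
            // parser should not react to a META charset declaration in the file.
            new ParserDelegator().parse(html, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the exported bookmarks HTML file
        try (Reader r = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            hrefs(r).forEach(System.out::println);
        }
    }
}
```

With jsoup itself, the core of the same loop is roughly Jsoup.parse(file, "UTF-8").select("a[href]"), reading attr("href") from each returned element.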

Other, simpler ways:

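Chrome stores its bookmarks as JSON, like the sample below. (Firefox keeps its live bookmarks in an SQLite database, places.sqlite; its automatic bookmark backups are compressed JSON.)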

The Java way: I would suggest parsing these with a JSON library. Make a reference Java object based on the structure below.
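Here is one possible shape for that reference object, with one field per key in the sample JSON below. fromMap assumes your JSON library can hand back the parsed tree as nested java.util.Map and java.util.List values; with a data-binding library such as Gson or Jackson you would more typically let the library fill such a class directly. The addedAt helper relies on Chrome's date_added being microseconds since 1601-01-01 UTC (the Windows FILETIME epoch). The class and method names are illustrative.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** One node of Chrome's bookmark tree: either a bookmark ("url") or a "folder". */
public final class BookmarkNode {
    // Seconds from 1601-01-01T00:00Z (the epoch behind date_added) to 1970-01-01T00:00Z.
    private static final long SECONDS_1601_TO_1970 = 11_644_473_600L;

    final String id;
    final String name;
    final String type;      // "url" or "folder"
    final String url;       // null for folders
    final String dateAdded; // microseconds since 1601-01-01 UTC, kept as the string Chrome writes
    final List<BookmarkNode> children;

    BookmarkNode(String id, String name, String type, String url,
                 String dateAdded, List<BookmarkNode> children) {
        this.id = id;
        this.name = name;
        this.type = type;
        this.url = url;
        this.dateAdded = dateAdded;
        this.children = Collections.unmodifiableList(children);
    }

    /** Builds a node (and its subtree) from nested Map/List values produced by a JSON parser. */
    static BookmarkNode fromMap(Map<?, ?> m) {
        List<BookmarkNode> kids = new ArrayList<>();
        Object c = m.get("children");
        if (c instanceof List) {
            for (Object child : (List<?>) c) kids.add(fromMap((Map<?, ?>) child));
        }
        return new BookmarkNode((String) m.get("id"), (String) m.get("name"),
                (String) m.get("type"), (String) m.get("url"),
                (String) m.get("date_added"), kids);
    }

    /** date_added as an Instant (assumes the microseconds-since-1601 encoding). */
    Instant addedAt() {
        return Instant.ofEpochSecond(-SECONDS_1601_TO_1970)
                .plus(Long.parseLong(dateAdded), ChronoUnit.MICROS);
    }

    /** Depth-first list of every bookmark URL at or below this node. */
    List<String> allUrls() {
        List<String> out = new ArrayList<>();
        if ("url".equals(type) && url != null) out.add(url);
        for (BookmarkNode child : children) out.addAll(child.allUrls());
        return out;
    }
}
```

For example, the first bookmark in the sample below (date_added 12939920104154671) converts to 2011-01-19T14:15:04.154671Z.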

Or simply use a Unix command prompt and run:

 grep '"url":' <bookmark file path> | cut -d'"' -f4
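The same quick-and-dirty idea translates to Java as a regular expression over the raw file text. Like the shell pipeline, this is not a real JSON parse: it assumes each URL appears literally as "url": "..." in the file, it will cut short a value that contains an escaped quote, and it leaves JSON escapes undecoded. GrepUrls is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GrepUrls {
    // Matches  "url": "<value>"  and captures <value>. A "type": "url" line does not
    // match, because there the quoted word url is followed by a comma, not a colon.
    private static final Pattern URL_FIELD = Pattern.compile("\"url\"\\s*:\\s*\"([^\"]*)\"");

    static List<String> extract(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_FIELD.matcher(text);
        while (m.find()) urls.add(m.group(1));
        return urls;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the Bookmarks file
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        extract(text).forEach(System.out::println);
    }
}
```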

However, if you are still interested in doing this with the Chrome APIs, please visit: http://developer.chrome.com/extensions/bookmarks.html

{
   "checksum": "702d8e600a3d70beccfc78e82ca7caba",
   "roots": {
  "bookmark_bar": {
     "children": [ {
        "date_added": "12939920104154671",
        "id": "3",
        "name": "Development/Tutorials/Git/git-svn - KDE TechBase",
        "type": "url",
        "url": "http://techbase.kde.org/Development/Tutorials/Git/git-svn"
     }, {
        "date_added": "12939995405838705",
        "id": "4",
        "name": "QJson - Usage",
        "type": "url",
        "url": "http://qjson.sourceforge.net/usage.html"