How to read PDF bookmarks programmatically

You can use the PDFsharp library. It is published under the MIT License so it can be used even in corporate development. Here is an untested example.

using System;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

using (PdfDocument document = PdfReader.Open("bookmarked.pdf", PdfDocumentOpenMode.Import))
{
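    // The catalog's /Outlines entry is the root of the bookmark tree; it is missing when the PDF has no bookmarks.
    PdfDictionary outline = document.Internals.Catalog.Elements.GetDictionary("/Outlines");
    if (outline != null)
    {
        PrintBookmark(outline);
    }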
}

void PrintBookmark(PdfDictionary bookmark)
{
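    // The outline root has no /Title entry, so print only real bookmark items.
    if (bookmark.Elements.ContainsKey("/Title"))
    {
        Console.WriteLine(bookmark.Elements.GetString("/Title"));
    }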
    for (PdfDictionary child = bookmark.Elements.GetDictionary("/First"); child != null; child = child.Elements.GetDictionary("/Next"))
    {
        PrintBookmark(child);
    }
}

Gotchas:

  • PDFsharp does not handle opening PDFs above version 1.6 very well. It throws: "cannot handle iref streams. The current implementation of PDFsharp cannot handle this PDF feature introduced with Acrobat 6".
  • PDFs contain several kinds of strings, and PDFsharp returns them as is, including UTF-16BE strings (ISO 32000-1:2008, section 7.9.2.1).
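
To illustrate the second gotcha, here is a minimal sketch of decoding a raw PDF text string yourself. It assumes you already have the string's raw bytes; how you get them out of PDFsharp is not shown. The class name PdfTextString and the Latin-1 fallback are my own choices: Latin-1 only approximates PDFDocEncoding, which differs for some code points.

```csharp
using System;
using System.Text;

public static class PdfTextString
{
    // Decodes the raw bytes of a PDF text string.
    // Bytes that start with the byte order mark 0xFE 0xFF are UTF-16BE.
    // Assumption: everything else is read as Latin-1, which is only an
    // approximation of PDFDocEncoding.
    public static string Decode(byte[] raw)
    {
        if (raw == null)
        {
            throw new ArgumentNullException(nameof(raw));
        }
        if (raw.Length >= 2 && raw[0] == 0xFE && raw[1] == 0xFF)
        {
            // Skip the two BOM bytes and decode the rest as big-endian UTF-16.
            return Encoding.BigEndianUnicode.GetString(raw, 2, raw.Length - 2);
        }
        return Encoding.GetEncoding("ISO-8859-1").GetString(raw);
    }
}
```

For example, PdfTextString.Decode(new byte[] { 0xFE, 0xFF, 0x00, 0x41 }) gives "A", and so does PdfTextString.Decode(new byte[] { 0x41 }).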

Try the following code, which uses iTextSharp:

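// Requires: using System.Collections.Generic; using System.Windows.Forms; using iTextSharp.text.pdf;
PdfReader pdfReader = new PdfReader(filename);

// Each bookmark is a dictionary with keys such as "Title", "Action", "Page" and, for nested bookmarks, "Kids".
// GetBookmark returns null when the document has no bookmarks.
IList<Dictionary<string, object>> bookmarks = SimpleBookmark.GetBookmark(pdfReader);

for (int i = 0; bookmarks != null && i < bookmarks.Count; i++)
{
    // Look the title up by key; dictionary order is not guaranteed.
    MessageBox.Show(bookmarks[i]["Title"].ToString());

    // Show the number of entries for bookmarks that carry more than three.
    if (bookmarks[i].Count > 3)
    {
        MessageBox.Show(bookmarks[i].Count.ToString());
    }
}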

Note: Don't forget to add the iTextSharp DLL to your project.


As the bookmarks are in a tree structure (https://en.wikipedia.org/wiki/Tree_(data_structure)), I've used some recursion here to collect all bookmarks and their children.

iTextSharp solved it for me.

dotnet add package iTextSharp

Collected all bookmarks with the following code:

using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
using iTextSharp.text.pdf;

namespace PdfManipulation
{
    class Program
    {
        static void Main(string[] args)
        {
            StringBuilder bookmarks = ExtractAllBookmarks("myPdfFile.pdf");
        }

        private static StringBuilder ExtractAllBookmarks(string pdf)
        {
            StringBuilder sb = new StringBuilder();
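            PdfReader reader = new PdfReader(pdf);
            // GetBookmark returns null when the document has no bookmarks.
            IList<Dictionary<string, object>> bookmarksTree = SimpleBookmark.GetBookmark(reader);
            reader.Close();
            if (bookmarksTree == null)
            {
                return sb;
            }
            foreach (var node in bookmarksTree)
            {
                sb.AppendLine(PercorreBookmarks(node).ToString());
            }
            // The nested AppendLine calls leave empty lines behind; strip them.
            return RemoveAllBlankLines(sb);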
        }

        private static StringBuilder RemoveAllBlankLines(StringBuilder sb)
        {
            return new StringBuilder().Append(Regex.Replace(sb.ToString(), @"^\s+$[\r\n]*", string.Empty, RegexOptions.Multiline));
        }

        private static StringBuilder PercorreBookmarks(Dictionary<string, object> bookmark)
        {
            StringBuilder sb = new StringBuilder();
            // Check for null before reading "Title", not after.
            if (bookmark == null)
            {
                return sb;
            }
            sb.AppendLine(bookmark["Title"].ToString());
            if (bookmark.ContainsKey("Kids"))
            {
                IList<Dictionary<string, object>> children = (IList<Dictionary<string, object>>) bookmark["Kids"];
                foreach (var bm in children)
                {
                    sb.AppendLine(PercorreBookmarks(bm).ToString());
                }
            }
            return sb;
        }
    }
}
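
If you also want to see the hierarchy, a variation on the recursion above can indent each title by its depth. This is only a sketch: it works on the same IList<Dictionary<string, object>> shape with "Title" and "Kids" keys that the code above reads, and the BookmarkOutline class and Format method are names I made up. Because it appends each title exactly once, there are no blank lines to strip afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BookmarkOutline
{
    // Returns one line per bookmark, indented two spaces per nesting level.
    // Assumption: nodes use the "Title" and "Kids" keys, like the
    // dictionaries handled in the code above.
    public static string Format(IList<Dictionary<string, object>> nodes, int depth = 0)
    {
        var sb = new StringBuilder();
        if (nodes == null)
        {
            return sb.ToString();
        }
        foreach (var node in nodes)
        {
            if (node.TryGetValue("Title", out object title))
            {
                sb.Append(' ', depth * 2).AppendLine(title?.ToString());
            }
            // Recurse into the children, one level deeper.
            if (node.TryGetValue("Kids", out object kids)
                && kids is IList<Dictionary<string, object>> children)
            {
                sb.Append(Format(children, depth + 1));
            }
        }
        return sb.ToString();
    }
}
```

You would call it as Console.Write(BookmarkOutline.Format(SimpleBookmark.GetBookmark(reader))); a document without bookmarks just prints nothing.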