Auto generate List of \url usages within document

  1. The following example uses hyperref (the question has mentioned "hyperlinking") and hooks into \hyper@linkurl to get the URLs.

  2. The captured URLs are written to an index file \jobname-url.idx:

    \urlentry{<hex coded URL>}{<page number>}
    

    The URLs are hex-encoded to avoid trouble with special characters.

  3. Package filecontents helps to create a style file \jobname-url.mst for makeindex. Makeindex automatically looks for a style file with the same base name as the input file and the extension .mst, so only the .idx file needs to be given as an argument to makeindex.

  4. Makeindex generates the file \jobname-url.ind:

    \begin{theurls}
    \urlitem{<hex coded URL>}{<page list>}
    ...
    \end{theurls}
    
  5. Environment theurls and \urlitem are defined appropriately to print the list of URLs. \listurlname contains the title of the section.

Remarks:

  • Makeindex takes care of the sorting and removes duplicates.
  • Hooking into \hyper@linkurl has the advantage that the URL is already normalized (e.g., % and \% are the same, a % with catcode 12/other).
  • Hex encoding has the advantage that special characters such as the percent sign, the hash sign, or characters with a special meaning for makeindex (at sign, ...) do not need special treatment.
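
The hex round trip mentioned in the remarks can be tried in isolation. The following is only a minimal sketch: it uses \EdefEscapeHex and \EdefUnescapeHex from package pdfescape, as the example file does, but the macro names \encoded and \decoded and the sample URL are made up for illustration.

```latex
\documentclass{article}
\usepackage{pdfescape}% provides \EdefEscapeHex and \EdefUnescapeHex
\begin{document}
% The at sign is makeindex's default "actual" character, so a verbatim
% copy of this URL in an .idx entry would be misinterpreted.
% \encoded and \decoded are illustrative names, not part of the answer.
\EdefEscapeHex\encoded{ftp://user@example.com/}
% \encoded now consists of hexadecimal digits only.
\EdefUnescapeHex\decoded{\encoded}
\ttfamily
Encoded: \encoded\par
Decoded: \decoded
\end{document}
```

Because the encoded form contains only hex digits, makeindex can sort and merge the entries without any quoting; the original URL is recovered only when the list is typeset.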

Example file:

\RequirePackage{filecontents}
\begin{filecontents*}{\jobname-url.mst}
% Input style specifiers
keyword "\\urlentry"
% Output style specifiers
preamble "\\begin{theurls}"
postamble "\\end{theurls}\n"
group_skip ""
headings_flag 0  
item_0 "\n\\urlitem{"
delim_0 "}{"
delim_t "}"
line_max 500
\end{filecontents*}

\documentclass{article}   
\usepackage[colorlinks]{hyperref}
\usepackage{pdfescape}

\makeatletter
\newwrite\file@url
\openout\file@url=\jobname-url.idx\relax

\newcommand*{\write@url}[1]{%
  \begingroup
    \EdefEscapeHex\@tmp{#1}%
    \protected@write\file@url{}{%
      \protect\urlentry{\@tmp}{\thepage}%
    }%
  \endgroup
}
\let\saved@hyper@linkurl\hyper@linkurl
\renewcommand*{\hyper@linkurl}[2]{%
  \write@url{#2}%
  \saved@hyper@linkurl{#1}{#2}%
}
\newcommand*{\listurlname}{List of URLs}
\newcommand*{\printurls}{%
  \InputIfFileExists{\jobname-url.ind}{}{}%
}
\newenvironment{theurls}{%
  \section*{\listurlname}%
  \@mkboth{\listurlname}{\listurlname}%
  \let\write@url\@gobble  
  \ttfamily
  \raggedright
}{%
  \par
}
\newcommand*{\urlitem}[2]{%
  \hangindent=1em
  \hangafter=1   
  \begingroup    
    \EdefUnescapeHex\@tmp{#1}%
    \expandafter\url\expandafter{\@tmp}%
  \endgroup
  \par
}
\makeatother

\usepackage[T1]{fontenc}
\usepackage[variablett]{lmodern}

\begin{document}
This file answers the
\href{http://tex.stackexchange.com/q/121977/16967}{question}
on \href{http://tex.stackexchange.com/}{\TeX.SE}.

Further examples for URLs:
\url{http://www.dante.de/}\\
\url{http://www.ctan.org/}\\
\url{mailto:[email protected]/}\\
\url{ftp://ftp.dante.de/pub/tex/}\\
\url{http://www.example.com/\%7efoo/index.html}\\
\url{http://www.example.com/%7efoo/index.html} 

\printurls
\end{document}

The following commands generate the result (Linux/bash):

$ pdflatex test

Generates test-url.mst and test-url.idx.

$ makeindex test-url

Generates test-url.ind.

$ pdflatex test

Result

Update for page numbers

Page numbers can be formatted in many ways. The following example uses a row of dots to separate each URL from its page numbers, which appear at the end of the line (similar to the index of package doc). As requested, the page numbers are prefixed with "p." if there is only one page number and with "pp." otherwise. This is implemented with package xstring by testing whether the page number list contains a comma separator or the hyphen of a range specifier.

\RequirePackage{filecontents}
\begin{filecontents*}{\jobname-url.mst}
% Input style specifiers
keyword "\\urlentry"
% Output style specifiers
preamble "\\begin{theurls}"
postamble "\n\\end{theurls}\n"
group_skip ""
headings_flag 0  
item_0 "\n\\urlitem{"
delim_0 "}{"
delim_t "}"
line_max 500
\end{filecontents*}

\documentclass{article}   
\usepackage[colorlinks]{hyperref}
\usepackage{pdfescape}
\usepackage{xstring}

\makeatletter
\newwrite\file@url
\openout\file@url=\jobname-url.idx\relax

\newcommand*{\write@url}[1]{%
  \begingroup
    \EdefEscapeHex\@tmp{#1}%
    \protected@write\file@url{}{%
      \protect\urlentry{\@tmp}{\thepage}%
    }%
  \endgroup
}
\let\saved@hyper@linkurl\hyper@linkurl
\renewcommand*{\hyper@linkurl}[2]{%
  \write@url{#2}%
  \saved@hyper@linkurl{#1}{#2}%
}
\newcommand*{\listurlname}{List of URLs}
\newcommand*{\printurls}{%
  \InputIfFileExists{\jobname-url.ind}{}{}%
}
\newenvironment{theurls}{%
  \section*{\listurlname}%
  \@mkboth{\listurlname}{\listurlname}%
  \let\write@url\@gobble  
  \ttfamily
  \raggedright
  \setlength{\parfillskip}{0pt}%
}{%
  \par
}
\newcommand*{\urlitem}[2]{%
  \hangindent=1em
  \hangafter=1   
  \begingroup    
    \EdefUnescapeHex\@tmp{#1}%
    \expandafter\url\expandafter{\@tmp}%
  \endgroup
  \urlindex@pfill
  \IfSubStr{#2}{,}{pp}{%
    \IfSubStr{#2}{-}{pp}{p}%
  }.\@\space\ignorespaces
  #2%
  \par
}
\newcommand*{\urlindex@pfill}{% from \pfill of package `doc'
  \unskip~\urlindex@dotfill
  \penalty500\strut\nobreak
  \urlindex@dotfil~\ignorespaces
}
\newcommand*{\urlindex@dotfill}{% from \dotfill of package `doc'
  \leaders\hbox to.6em{\hss .\hss}\hskip\z@ plus  1fill\relax
}
\newcommand*{\urlindex@dotfil}{% from \dotfil of package `doc'
  \leaders\hbox to.6em{\hss .\hss}\hfil
}
\makeatother

\usepackage[T1]{fontenc}
\usepackage[variablett]{lmodern}

\begin{document}
This file answers the
\href{http://tex.stackexchange.com/q/121977/16967}{question}
on \href{http://tex.stackexchange.com/}{\TeX.SE}.

Further examples for URLs:
\url{http://www.dante.de/}\\
\url{http://www.ctan.org/}\\
\url{mailto:[email protected]/}\\
\url{ftp://ftp.dante.de/pub/tex/}\\
\url{http://www.example.com/\%7efoo/index.html}\\
\url{http://www.example.com/%7efoo/index.html} 

% further pages to generate more page numbers for testing the url index
\newpage
\url{http://www.ctan.org}
\newpage
\url{http://www.ctan.org}
\url{http://tex.stackexchange.com/}

\newpage
\printurls

\end{document}

Result with page numbers
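
The choice between "p." and "pp." made inside \urlitem can also be tested on its own. The sketch below assumes makeindex's default separators (", " between pages and "--" for ranges); \pagelabel is a hypothetical helper introduced only for this illustration and is not part of the example above.

```latex
\documentclass{article}
\usepackage{xstring}
% Hypothetical helper: prints "p." or "pp." followed by the page list,
% using the same nested \IfSubStr test as \urlitem.
\newcommand*{\pagelabel}[1]{%
  \IfSubStr{#1}{,}{pp}{%      a comma means several pages
    \IfSubStr{#1}{-}{pp}{p}%  a hyphen means a page range
  }.\@\space#1%
}
\begin{document}
\pagelabel{3}\par     % single page: p. 3
\pagelabel{1, 3}\par  % page list:   pp. 1, 3
\pagelabel{2--4}      % page range:  pp. 2--4
\end{document}
```

The test looks only at the characters in the page list, so it would need adjusting if the makeindex style file changed the page or range delimiters.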


Warning: the following code works only for simple URLs, that is, URLs that do not contain special characters such as %. For a complete solution, please refer to Heiko's answer.


As Nicola mentioned in the comments, redefining \url might be an interesting idea, but some characters in the URL might cause problems. Sadly my TeX-fu isn't good enough to overcome this issue, but here's a preliminary start:

\documentclass{article}
\usepackage{url}
\usepackage{imakeidx}

\let\originalurl\url

\makeindex[name=urls, title={Links found in this document}, columns=1]

\renewcommand{\url}[1]{%
  \originalurl{#1}% print the URL as usual
  \index[urls]{\protect\originalurl{#1}}% and record it in the urls index
}

\begin{document}

Hello, make sure to visit \url{http://www.google.com} and,
of course, our own place \url{http://tex.stackexchange.com}.

By the way, \url{http://tex.stackexchange.com} is awesome!

\printindex[urls]

\end{document}

The list is then generated:

Links

Hope it helps. :)
