Detecting all pages which contain color

Ghostscript 9.05 and later include a "device" called inkcov. It calculates the ink coverage of each whole page (not of each image) as Cyan (C), Magenta (M), Yellow (Y) and Black (K) values, where 0.00000 means 0% and 1.00000 means 100%.

Example command line:

gs -o - -sDEVICE=inkcov /path/to/your.pdf

Example output:

Page 1
0.00000  0.00000  0.00000  0.02230 CMYK OK
Page 2
0.02360  0.02360  0.02360  0.02360 CMYK OK
Page 3
0.02525  0.02525  0.02525  0.00000 CMYK OK
Page 4
0.00000  0.00000  0.00000  0.01982 CMYK OK

Here pages 1 and 4 use no color (only K is non-zero), while pages 2 and 3 do. This case is particularly 'nasty' for people who want to save on color ink: on each of pages 2 and 3 the C, M and Y values are exactly equal (and on page 2 the K value matches them too), so if every single pixel is mixed from these same values, the pages may well look like ("rich") grayscale to the human eye rather than like color pages.

Ghostscript can also convert color into grayscale. Example command line:
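Reading the output by eye gets tedious for long documents. Here is a minimal sketch of how the inkcov output could be filtered for color pages, assuming it has exactly the "Page N" / "C M Y K CMYK OK" layout shown above. The function name `color_pages` is made up for this illustration, and the sample text is simply the output quoted above pasted into a string.

```python
import re

# One inkcov result line: four coverage fractions followed by "CMYK OK".
INK_LINE = re.compile(r"^\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+CMYK OK")
PAGE_LINE = re.compile(r"^\s*Page\s+(\d+)")


def color_pages(inkcov_output):
    """Return the page numbers whose C, M or Y coverage is non-zero."""
    pages = []
    current = None  # page number announced by the latest "Page N" line
    for line in inkcov_output.splitlines():
        header = PAGE_LINE.match(line)
        if header:
            current = int(header.group(1))
            continue
        result = INK_LINE.match(line)
        if result and current is not None:
            c, m, y, _k = (float(value) for value in result.groups())
            if c > 0 or m > 0 or y > 0:
                pages.append(current)
    return pages


sample = """Page 1
0.00000  0.00000  0.00000  0.02230 CMYK OK
Page 2
0.02360  0.02360  0.02360  0.02360 CMYK OK
Page 3
0.02525  0.02525  0.02525  0.00000 CMYK OK
Page 4
0.00000  0.00000  0.00000  0.01982 CMYK OK
"""

print(color_pages(sample))  # [2, 3]
```

Lines that match neither pattern (such as a startup banner) are skipped. Note that this test flags "rich" gray pages like page 2 as color too, since their C, M and Y values are non-zero.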

gs                                \
  -o grayscale.pdf                \
  -sDEVICE=pdfwrite               \
  -sColorConversionStrategy=Gray  \
  -dProcessColorModel=/DeviceGray \
  /path/to/your.pdf

Checking for the ink coverage distribution again (note how the addition of -q to the parameters slightly changes the output format):

gs -q  -o - -sDEVICE=inkcov grayscale.pdf
 0.00000  0.00000  0.00000  0.02230 CMYK OK
 0.00000  0.00000  0.00000  0.02360 CMYK OK
 0.00000  0.00000  0.00000  0.02525 CMYK OK
 0.00000  0.00000  0.00000  0.01982 CMYK OK
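With -q the "Page N" lines disappear, so a script has to infer page numbers from the order of the result lines. The following sketch does that and can serve as a quick check that the conversion left no C, M or Y ink behind. It assumes one result line per page, in page order; the function name `pages_with_color_quiet` is hypothetical.

```python
def pages_with_color_quiet(quiet_output):
    """Number the result lines from 1 and return those with C, M or Y > 0."""
    flagged = []
    page = 0
    for line in quiet_output.splitlines():
        fields = line.split()
        # A result line is four numbers followed by "CMYK" and "OK".
        if len(fields) != 6 or fields[4:] != ["CMYK", "OK"]:
            continue
        page += 1  # assumption: the n-th result line belongs to page n
        c, m, y = (float(value) for value in fields[:3])
        if c > 0 or m > 0 or y > 0:
            flagged.append(page)
    return flagged


quiet = """ 0.00000  0.00000  0.00000  0.02230 CMYK OK
 0.00000  0.00000  0.00000  0.02360 CMYK OK
 0.00000  0.00000  0.00000  0.02525 CMYK OK
 0.00000  0.00000  0.00000  0.01982 CMYK OK
"""

print(pages_with_color_quiet(quiet))  # []
```

An empty list means every page is now pure K, as expected after the grayscale conversion.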

In the general case it does indeed seem better to use an external tool to find all pages which contain color. This is the topic of the mentioned SO question "How do I know if PDF pages are color or black-and-white?". I have now written an answer to it which includes a small script for this.

However, it is much easier to get a list of all pages containing figures. Here I use the zref-abspage package to get an absolute page counter. A normal (non-immediate) \write can be used: its content is expanded only when the surrounding material is actually placed on a page, so the page counters have the correct values at that point. The end macro of the figure environment can then simply be patched to contain this code.

\documentclass{book}
\usepackage{mwe}

\usepackage{zref-abspage}% absolute page counter
\newwrite\figpages
\openout\figpages=\jobname.fpg
\makeatletter
\g@addto@macro\endfigure{%
    % Write absolute page number and page label to file
    % Do not use \immediate!
    \write\figpages{\number\value{abspage}: \thepage}%
}
\makeatother

\newcount\mycount% for example loop
\begin{document}
\frontmatter
\Blindtext

\begin{figure}
    \centering
    \includegraphics[width=.8\textwidth,height=5cm]{example-image}
    \caption{Some caption}
\end{figure}

\mainmatter
\Blindtext

\loop% keep MWE small by using a loop

\begin{figure}
    \centering
    \includegraphics[width=.8\textwidth,height=5cm]{example-image}
    \caption{Some caption}
\end{figure}

{\Blindtext}

\advance\mycount by 1
    \ifnum\mycount<20\relax
\repeat

\backmatter
\appendix
\Blindtext

\begin{figure}
    \centering
    \includegraphics[width=.8\textwidth,height=5cm]{example-image}
    \caption{Some caption}
\end{figure}

\end{document}

This generates a .fpg file (for "figure pages") in which each line holds the absolute page number, a colon, and the page label as printed:

2: ii
4: 2
5: 3
7: 5
8: 6
10: 8
11: 9
13: 11
14: 12
16: 14
18: 16
19: 17
21: 19
22: 20
24: 22
25: 23
27: 25
28: 26
30: 28
31: 29
33: 31
38: 36

The format can be changed if required.
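If the list is meant to feed a page-selection tool, the absolute page numbers are the useful part. Below is a small sketch that reads the "absolute: label" format shown above and collapses consecutive pages into ranges. The helper names `read_figure_pages` and `to_ranges` are invented for this example, and the sample string is just the first few lines of the listing above.

```python
def read_figure_pages(fpg_text):
    """Return sorted, de-duplicated absolute page numbers from .fpg content."""
    pages = set()  # a set, because two figures can land on the same page
    for line in fpg_text.splitlines():
        absolute, sep, _label = line.partition(":")
        if sep and absolute.strip().isdigit():
            pages.add(int(absolute))
    return sorted(pages)


def to_ranges(pages):
    """Collapse sorted page numbers into strings such as '4-5'."""
    ranges = []
    start = prev = None
    for page in pages:
        if start is None:
            start = prev = page
        elif page == prev + 1:
            prev = page  # still inside a consecutive run
        else:
            ranges.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = page
    if start is not None:
        ranges.append(str(start) if start == prev else f"{start}-{prev}")
    return ranges


sample = "2: ii\n4: 2\n5: 3\n7: 5\n8: 6\n10: 8\n"
print(to_ranges(read_figure_pages(sample)))  # ['2', '4-5', '7-8', '10']
```

For a real run the text would come from the .fpg file that LaTeX writes next to the main document, read with `open(...).read()`.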


There's a rather useful Python script at http://homepages.inf.ed.ac.uk/imurray2/code/hacks/pdfcolorsplit which uses pdftk to split a PDF into colour and b&w files, though it doesn't deal with the boxes around hyperref links. If you have access to the LaTeX source, why not turn off the colour in hyperref anyway? I do it like this:

\usepackage[colorlinks=true,
            linkcolor=black,
            citecolor=black,
            filecolor=black,
            urlcolor=black]{hyperref}

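IIRC just setting [colorlinks=false] doesn't get rid of the colour: the links stay clickable, but hyperref draws coloured boxes around them instead. The hidelinks option of hyperref removes both the colour and the boxes.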