bug with page numbers when using soul package

Problem

The problem appears to be that soul uses \count0 as a local scratch register. While it is, as a general rule, safe to use the even-numbered \count registers locally (see e.g. this answer), \count0 is special because it holds the page number. Using it in this way is safe only if there is no chance of a page break being triggered within the scope where it is used (as described in the same answer).

This is exactly what you see happening here. While the page break does not occur within the highlighted text itself, it is triggered by the paragraph break in the argument of \hl. Until this paragraph break, TeX was still considering putting the first bit of highlighted text on the previous page.
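
As a minimal sketch of why \count0 is so sensitive: in LaTeX the page counter is stored in \count0, so any assignment to that register shows up directly in \thepage. The document below is only an illustration and does not involve soul at all; the value 7 is arbitrary.

```latex
\documentclass{article}
\begin{document}
Page: \thepage % the page counter starts at 1

\count0=7 % \count0 is the register behind LaTeX's page counter
Page: \thepage % now reflects the value just assigned to \count0
\end{document}
```

The footer of this page is numbered from the same register, which is why a scratch assignment that is still in effect when a page is shipped out ends up in the printed page number.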

Fix

The problem is fixed by adding the following lines to your preamble.

\makeatletter %% <- make @ usable in command sequences
\newcount\SOUL@minus
\makeatother  %% <- revert @

This reserves a new count register for soul to use instead of the one it is using by default.
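
To check that the reassignment took effect, something like the following diagnostic can be used. It is only a sketch and assumes a soul version that still defines \SOUL@minus with \countdef as quoted below; the \typeout messages are my own addition. Note that the \newcount line has to come after \usepackage{soul}, since loading soul would otherwise overwrite it.

```latex
\documentclass{article}
\usepackage{soul}
\makeatletter
% Written to the log/terminal: which register \SOUL@minus currently names
\typeout{Before: \meaning\SOUL@minus}
\newcount\SOUL@minus % allocate a fresh register under the same name
\typeout{After: \meaning\SOUL@minus}
\makeatother
\begin{document}
x
\end{document}
```

With an unpatched soul, the first message should report \count0, and the second a newly allocated \count register.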

To solve the problem at the package level, the line

\countdef\SOUL@minus\z@

in soul.dtx would need to be replaced by (for instance)

\newcount\SOUL@minus

There are a few other registers being used by soul for which the registration of a new register would probably also have been more appropriate. (See e.g. this question.)

I've created an issue on the soul GitHub page, but the repository appears to be inactive.

Demonstration

The page numbers in the document below are correct with the aforementioned fix in place. If you remove the three added lines, however, the page number is reset to 0, just like in your example.

\documentclass{article}
\usepackage{soul}
\usepackage{blindtext}

\makeatletter
\newcount\SOUL@minus %% <- without this line the page number would be reset to 0
\makeatother

\begin{document}

\Blindtext[4]

\blindtext
\hl{Closing words

New paragraph!}

\end{document}

This is the bottom of the first page of the output:

[image: output]

This is what it would've been without this addition to the preamble:

[image: what could've been]


Not so much an answer as a simplification of the question to, hopefully, focus on the problem (a long, formatted comment?) and suggest two possible ways to avoid it.

The same behaviour as seen by the OP is apparent with \st instead of \hl. Using the ulem package and \sout instead of soul and \st produces an error about the end of a paragraph within the scope of \sout (! Paragraph ended before \UL@word was complete.). Removing the paragraph break from within the braces lets ulem proceed and produce a result with correct page numbers. However, with this same paragraph break removed, soul also produces the correct page numbering. Perhaps, then, soul is proceeding when the correct, or better, action would be to flag an error and stop, as ulem appears to do?

To demonstrate this, the following simpler example also produces incorrect page numbering:

\documentclass[10pt]{article}
\usepackage{soul}
\usepackage{blindtext}
\begin{document}
\blindtext[8]

\blindtext[8]
\hl{

x
}
\end{document}

whereas

...

\blindtext[8]

\hl{
x
}
\end{document}

does not.

The problem, then, appears to be that soul allows a paragraph break within its argument but fails either to process it safely or to stop and throw an error.

The solution would appear to be either

(a) modify soul to identify the problem and stop with an appropriate error message.

or

(b) do not put paragraph breaks within arguments to soul commands.
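
Option (b) can be sketched as follows, using the OP's text but with one \hl per paragraph so that no paragraph break ever falls inside a soul argument. This is an illustration of the workaround only, based on the observation above that removing the paragraph break from the argument gives correct page numbers; it does not change anything in soul itself.

```latex
\documentclass{article}
\usepackage{soul}
\usepackage{blindtext}
\begin{document}
\Blindtext[4]

\blindtext
\hl{Closing words}% the highlight ends before the paragraph break

\hl{New paragraph!}% the next paragraph gets its own \hl
\end{document}
```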

Edit: additional observations

The page numbering disruption seems to happen when the paragraph containing the soul command (whose argument includes a paragraph break) starts on a different page from the start of the requested formatting. The page on which that paragraph starts is numbered 0, and the numbering counts up from there. In the example above, deleting one of the \blindtext lines and its accompanying paragraph break makes the first page show number 0 and the subsequent pages count up correctly.

This perhaps fits with the soul documentation's description of "the complex engine, which has to read and inspect every character before it can hand it over to TeX's paragraph builder" (section 2.2) and the hint in the documentation's introduction that "there are several possibilities to emphasize parts of a paragraph" (section 1).

Various cases can be seen in the example below by setting the first \blindtext argument to 8 (page nos. 1,2,3,4), 9 (page nos. 1,0,1,2) and then 10 (page nos. 1,2,3,4).

\documentclass[10pt, letterpaper]{article}
\usepackage{soul}
\usepackage{blindtext}
\begin{document}
\blindtext[9]

\blindtext
\hl{

x
}

\blindtext[6]
\end{document}