Are there any reasons why browsers should block mixed passive content by default?

Mixed passive content (sometimes called mixed display content) is content such as images, audio, or video that cannot alter the DOM, hence "passive" in the name, as you mention yourself. It is loaded over unencrypted HTTP while the page requesting it is served over encrypted HTTPS. This makes it open to attacks that replace it in transit with inappropriate or misleading material. Think, for example, of tricking users into believing they are expected to take some action, or otherwise misdirecting them with content replaced by a man-in-the-middle (MITM). The difference from mixed active content is that the attacker cannot affect the rest of the page, only the content loaded over unencrypted HTTP.
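To make the definition concrete, here is a minimal Python sketch (not how any particular browser implements it) that scans an HTML document served over HTTPS for passive subresources (img, audio, video and source tags) whose src would be fetched over plain http://. The function name find_mixed_passive and the example.com URLs are hypothetical. The scan only looks at src attributes and ignores srcset, CSS and other ways of loading images.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Tags whose src attribute loads "passive" (display) content: it is rendered,
# but cannot run script or otherwise alter the DOM.
PASSIVE_TAGS = {"img", "audio", "video", "source"}


class MixedPassiveScanner(HTMLParser):
    """Collects passive subresources that an HTTPS page would fetch over plain HTTP."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        if tag not in PASSIVE_TAGS:
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # Resolve relative URLs against the page URL before checking the scheme.
        absolute = urljoin(self.page_url, src)
        if urlparse(absolute).scheme == "http":
            self.insecure.append(absolute)


def find_mixed_passive(page_url, html):
    """Return the http:// passive subresources of an https:// page (sketch only)."""
    if urlparse(page_url).scheme != "https":
        return []  # Only a page loaded over HTTPS can have mixed content.
    scanner = MixedPassiveScanner(page_url)
    scanner.feed(html)
    scanner.close()
    return scanner.insecure


if __name__ == "__main__":
    page = (
        '<img src="http://cdn.example.com/logo.png">'
        '<img src="/ok.png">'
        '<video src="http://media.example.com/clip.mp4"></video>'
    )
    print(find_mixed_passive("https://www.example.com/", page))
```

For the example page this prints the two http:// URLs; "/ok.png" resolves to https://www.example.com/ok.png and is therefore not mixed content. Script tags are deliberately not in PASSIVE_TAGS, since an http:// script would be mixed active content.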

Additionally, an attacker could track users by inferring their browsing activity from the HTTP-loaded content they request. Some of this content may be embedded only on specific pages, so a request for it can reveal which page the user was visiting, even though the page itself was fetched over HTTPS.
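A rough illustration of that inference, as a Python sketch under stated assumptions: the observer has crawled the site in advance and built a (hypothetical) table of which plain-HTTP resources appear on which HTTPS pages. Because each subresource request travels in plaintext, the observer sees which resources are fetched and can intersect the candidate pages. The table, URLs and function name are all made up for illustration.

```python
# Hypothetical map built by crawling the site beforehand:
# plain-HTTP resource URL -> set of HTTPS page paths that embed it.
RESOURCE_TO_PAGES = {
    "http://img.example.com/banner-loans.png": {"/loans", "/loans/apply"},
    "http://img.example.com/apply-button.png": {"/loans/apply", "/cards/apply"},
    "http://img.example.com/footer.png": {"/", "/loans", "/loans/apply", "/cards/apply"},
}


def infer_pages(observed_urls):
    """Narrow down which page the user is on from the plaintext requests seen."""
    candidates = None
    for url in observed_urls:
        pages = RESOURCE_TO_PAGES.get(url)
        if pages is None:
            continue  # Resource not in the observer's map, so it tells nothing.
        # Each observed resource keeps only the pages that embed it.
        candidates = set(pages) if candidates is None else candidates & pages
    return candidates if candidates is not None else set()


if __name__ == "__main__":
    seen = [
        "http://img.example.com/banner-loans.png",
        "http://img.example.com/apply-button.png",
    ]
    print(infer_pages(seen))
```

Here the two observed images are embedded together only on "/loans/apply", so that is the single page left; a shared resource such as the footer image narrows things down much less.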

The attacker can also read the HTTP headers sent over the unencrypted connection, redirect the requests to another server, or modify the HTTP response, including its headers and therefore any cookies it sets. The request headers include the user agent string and any cookies associated with the domain the HTTP content is served from. The attacker could alter any of this at will to make tracking the user's activity easier, or to mislead the user with false information.
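To show what "read and modify" means on the wire, here is a minimal Python sketch of an on-path observer working with raw HTTP/1.1 messages. parse_plaintext_request extracts the request target, Host, User-Agent and Cookie values; tamper_response swaps the response body and injects a tracking Set-Cookie header. Both function names, the sample messages and the cookie values are hypothetical, and the sketch assumes simple messages with a Content-Length body (no chunked encoding, no compression).

```python
def parse_plaintext_request(raw: bytes) -> dict:
    """Extract what an on-path observer can read from a plain HTTP request."""
    head, _, _body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    cookies = {}
    for pair in headers.get("cookie", "").split(";"):
        if "=" in pair:
            key, _, val = pair.strip().partition("=")
            cookies[key] = val
    return {
        "method": method,
        "target": target,
        "host": headers.get("host"),
        "user_agent": headers.get("user-agent"),
        "cookies": cookies,
    }


def tamper_response(raw: bytes, new_body: bytes, tracking_cookie: str) -> bytes:
    """Replace the body and add a Set-Cookie header (simple, non-chunked responses only)."""
    head, _, _old_body = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    # Drop the old Content-Length; a new one is added for the replacement body.
    kept = [h for h in lines[1:] if not h.lower().startswith(b"content-length:")]
    kept.append(b"Content-Length: " + str(len(new_body)).encode())
    kept.append(b"Set-Cookie: " + tracking_cookie.encode())
    return b"\r\n".join([lines[0]] + kept) + b"\r\n\r\n" + new_body


if __name__ == "__main__":
    request = (
        b"GET /images/logo.png HTTP/1.1\r\n"
        b"Host: static.example.com\r\n"
        b"User-Agent: ExampleBrowser/1.0\r\n"
        b"Cookie: session=abc123; theme=dark\r\n\r\n"
    )
    print(parse_plaintext_request(request))
    response = b"HTTP/1.1 200 OK\r\nContent-Type: image/png\r\nContent-Length: 4\r\n\r\nPNG!"
    print(tamper_response(response, b"FAKE-IMAGE", "track=u42"))
```

Nothing here requires breaking any encryption: because the subresource travels over plain HTTP, both reading and rewriting are simple byte manipulation.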

If this content is served from the same domain as the page requesting it, the protection the user assumes from opening an HTTPS page becomes even weaker. The browser attaches every cookie for that domain that lacks the Secure attribute to the plain-HTTP request, so the attacker can read those cookies straight from the request headers; for a session cookie, that is enough to hijack the user's session. The Secure attribute tells the user agent (browser) to send a cookie only over an encrypted HTTPS connection, so only cookies marked with it are kept out of these linked content requests.
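The effect of the Secure attribute can be sketched in Python as a simplified version of the browser's decision about which cookies to attach to a request. This is an illustration only: it uses exact host matching and ignores Domain/Path matching, expiry, and SameSite rules that real browsers also apply. The Cookie class, the function name and the cookie values are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Cookie:
    name: str
    value: str
    domain: str   # Host the cookie belongs to (exact host match in this sketch).
    secure: bool  # True if the Set-Cookie header carried the Secure attribute.


def cookie_header_for(url, jar):
    """Build the Cookie header value a browser would send for this URL (simplified)."""
    parts = urlparse(url)
    selected = []
    for cookie in jar:
        if cookie.domain != parts.hostname:
            continue
        if cookie.secure and parts.scheme != "https":
            continue  # Secure cookies are withheld from plain HTTP requests.
        selected.append(f"{cookie.name}={cookie.value}")
    return "; ".join(selected)


if __name__ == "__main__":
    jar = [
        Cookie("session", "abc123", "www.example.com", secure=False),
        Cookie("auth", "xyz789", "www.example.com", secure=True),
    ]
    # The HTTPS page itself receives both cookies.
    print(cookie_header_for("https://www.example.com/account", jar))
    # A mixed passive image on the same host leaks only the non-Secure one.
    print(cookie_header_for("http://www.example.com/images/logo.png", jar))
```

The first call prints "session=abc123; auth=xyz789" and the second prints "session=abc123": the cookie without the Secure attribute is exactly the one an attacker sees in the plaintext image request.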


Another possible scenario is that a MITM attacker replaces an image with one that exploits a remote code execution (RCE) vulnerability in the user's browser.

Here's an example of such a vulnerability: https://nvd.nist.gov/vuln/detail/CVE-2017-2416. Basically, serving a specially crafted image could lead to arbitrary code execution on older versions of macOS and iOS.

(The above is the gist of my answer to a different but semi-related question.)