[Crypto] Password hash contained '\x00' in middle, resulting in ValueError from bcrypt.hashpw
You should not be hashing it before passing to bcrypt, which is designed to do the hashing and key-stretching work itself.
It's choking on the hash result because it expects a redundant, low-entropy, user-entered string in ASCII (or UTF-8), not arbitrary binary data that may contain NUL bytes.
Generally speaking, hashing input that might be untrustworthy is a good way to avoid various numeric vulnerabilities (e.g. the internal SHA-512 ops in Curve25519 to sanitize things). In this case, though, you can just trust bcrypt; it's designed to work safely and well when fed a lame, user-created password (and a proper, randomly-generated salt).
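To see why a raw digest trips the NUL-byte check, here is a minimal sketch using only the standard library. It assumes a SHA-256 pre-hash (the hash function is my choice for illustration), and the helper name `first_digest_with_nul` is hypothetical. It searches a few candidate passwords for one whose raw digest contains `\x00`.

```python
import hashlib


def first_digest_with_nul(prefix: str = "password", limit: int = 10_000):
    """Return the first (candidate, digest) pair whose raw SHA-256 digest
    contains a NUL byte, or None if none is found within `limit` tries."""
    for i in range(limit):
        candidate = f"{prefix}{i}".encode("utf-8")
        digest = hashlib.sha256(candidate).digest()  # 32 raw, arbitrary bytes
        if b"\x00" in digest:
            return candidate, digest
    return None


found = first_digest_with_nul()
if found is not None:
    candidate, digest = found
    print(candidate, digest.hex())
```

Roughly one in nine 32-byte digests contains at least one zero byte, so passing raw digests to an API that rejects NUL bytes will fail for a noticeable fraction of users rather than as a rare corner case.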
On further contemplation, though, I believe that the behavior of pyca/bcrypt is actually incorrect here. You are supposed to use a cryptographic hash function to convert passwords which are more than 72 bytes (576 bits) into a size that bcrypt can accept. [This is only for pigeonholing purposes; it IS still designed to safely stretch cryptographically mediocre passwords.]
Looking at the ticket on their tracker, and the changeset which they affirmed as having "fixed" it, it appears that the authors did not actually fix their implementation's oddball behavior, but instead merely updated their documentation to recommend the following workaround:
a common approach is to hash a password with a cryptographic hash (such as sha256) and then base64 encode it to prevent NULL byte problems…
This seems sloppy and bizarre to me; it will, for instance, limit such digests' output to, at most, around 408 bits (with base85 encoding), thus truncating >72-byte passwords to just 51 bytes. I would be very interested to see the opinions of more experienced cryptographers on this.
This doesn't negate the first half of this answer, but it's something to keep in mind. Obviously, even with the workaround, the scheme is still quite secure (the best cryptographic schemes are designed to survive some slip-ups in their implementations); it just doesn't seem completely correct to be applying such arbitrary truncations to the user's password, above and beyond what the author of bcrypt itself applied.
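For reference, the workaround quoted from the documentation above can be sketched as follows. This is only an illustration using the standard library; the function name `prehash_for_bcrypt` is hypothetical, and the choice of SHA-256 follows the quoted recommendation.

```python
import base64
import hashlib


def prehash_for_bcrypt(password: str) -> bytes:
    """Hash with SHA-256, then Base64-encode so the result has no NUL bytes."""
    digest = hashlib.sha256(password.encode("utf-8")).digest()  # 32 raw bytes
    return base64.b64encode(digest)  # 44 ASCII bytes, well under 72


long_password = "correct horse battery staple" * 4  # 112 bytes, over the limit
encoded = prehash_for_bcrypt(long_password)
print(len(encoded), encoded)
```

With pyca/bcrypt installed, the encoded value would then be passed along the lines of `bcrypt.hashpw(encoded, bcrypt.gensalt())`. The same pre-hash step has to be repeated before every verification.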
You might want to consider bouncing that 72-byte limit back to your users, if you would otherwise have to truncate longer ones by a minimum of 30% to comply with pyca's implementation quirks. (I'm sure that those who care to max out their password's entropy and are hitting that 72-byte limit would, if they looked into how the sausage is made, ultimately prefer the former.)
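Bouncing the limit back to the user could look something like the sketch below. The names `BCRYPT_MAX_BYTES` and `check_password_length` are hypothetical, and it assumes passwords are encoded as UTF-8 before hashing. Note that the limit is in bytes, not characters, so non-ASCII passwords hit it sooner.

```python
BCRYPT_MAX_BYTES = 72  # bcrypt's input limit, as discussed above


def check_password_length(password: str) -> bytes:
    """Encode the password as UTF-8 and reject it if bcrypt would truncate it."""
    encoded = password.encode("utf-8")
    if len(encoded) > BCRYPT_MAX_BYTES:
        raise ValueError(
            f"password is {len(encoded)} bytes; the maximum is {BCRYPT_MAX_BYTES}"
        )
    return encoded


ok = check_password_length("a" * 72)  # exactly at the limit: accepted

try:
    check_password_length("é" * 37)  # 37 characters, but 74 bytes in UTF-8
    rejected = False
except ValueError:
    rejected = True
```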
Is there a best practice that sanitizes the hash prior to passing it to bcrypt that I am missing, or is there another type of byte-encoding that should be used with passwords?
One reasonable thing is to convert the hash to Base64 (say, with the standard base64 alphabet), and truncate to 64 characters (well below the most common input size limit for bcrypt, and not above the common threshold for LF insertion in Base64). 384 bits of entropy are plenty. That fixes the question's issue and more, including malformed UTF-8, which some bcrypt implementations may check for. It should ensure portability across implementations that process at least the first 64 characters in the standard way, even when they otherwise mishandle non-ASCII:
Versions of jBCrypt before 0.3 suffered from a bug related to character encoding that substantially reduced the entropy of hashed passwords containing non US-ASCII characters. An incorrect encoding step transparently replaced such characters by '?' prior to hashing. In the worst case of a password consisting solely of non-US-ASCII characters, this would cause its hash to be equivalent to all other such passwords of the same length.
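The Base64-and-truncate approach described above might be sketched like this. The answer does not name a hash function, so SHA-512 is an assumption here, and `prehash_truncated` is a hypothetical helper name. The input is taken as raw bytes so that malformed UTF-8 never reaches bcrypt.

```python
import base64
import hashlib


def prehash_truncated(password: bytes) -> bytes:
    """SHA-512, standard Base64, then keep the first 64 characters."""
    digest = hashlib.sha512(password).digest()  # 64 raw bytes
    encoded = base64.b64encode(digest)          # 88 ASCII characters, no line feeds
    return encoded[:64]                         # 64 chars * 6 bits = 384 bits


# Input with a NUL byte and invalid UTF-8 still yields plain ASCII output.
out = prehash_truncated(b"\xff\xfe not valid utf-8 \x00")
print(len(out), out)
```

Because 64 Base64 characters decode to exactly 48 bytes, the truncated string carries the first 384 bits of the digest with no partial group at the end.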
bcrypt is still considered a reliable password hash (when used with an appropriate cost parameter for modern hardware) but it's quite old and has a bunch of odd quirks; the 72-byte input limit is one of them. As discussed in the other answers, what PyCA's bindings are doing to work around this limit is cryptographically dubious. I would suggest you replace bcrypt with argon2, which is a current-generation password hash that does not have any of these quirks.