Opened a JPG picture with notepad, pasted all the "text" to a new notepad file, changed to .JPG and it no longer opens. Why?

Depending on the encoding used to open the file, you might see different behaviour. My Windows 7 Notepad can open a file as ANSI, UTF-8, Unicode or Unicode big endian.

I tested this with a small 2x2 pixel JPEG image created with GIMP, opening and saving the image file with ANSI encoding. Comparing the original and the saved image in a hex editor, I see that all 00 bytes (two hex digits, the NUL control character) have been converted to 20 (the space character).

Replacing every 20 with 00 again in the hex editor restores the image.
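The substitution and its reversal can be sketched in a few lines of Python. This is only a simulation of the effect observed in the hex editor, not Notepad's actual code; the function names and the sample bytes are made up for illustration. It also shows why the blind reversal is only safe when the original file contains no genuine 20 bytes.

```python
def simulate_notepad_ansi_roundtrip(data: bytes) -> bytes:
    """Mimic the observed effect: every NUL byte (0x00) becomes a space (0x20)."""
    return data.replace(b"\x00", b"\x20")


def naive_restore(data: bytes) -> bytes:
    """Blindly turn every 0x20 back into 0x00, as done by hand in the hex editor."""
    return data.replace(b"\x20", b"\x00")


# Hypothetical fragment resembling the start of a JPEG/JFIF header.
# It contains two NUL bytes and no genuine space byte.
original = bytes.fromhex("ffd8ffe000104a46494600")
damaged = simulate_notepad_ansi_roundtrip(original)
restored = naive_restore(damaged)

print(damaged.hex())          # ffd8ffe020104a46494620
print(restored == original)   # True

# A payload that already contains a real 0x20 byte cannot be restored blindly,
# because the genuine space is turned into a NUL as well.
with_space = bytes.fromhex("ff0020ff")
roundtrip = naive_restore(simulate_notepad_ansi_roundtrip(with_space))
print(roundtrip == with_space)  # False
```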

I googled it a bit and didn't find any references that explain why it does that, only a reference to a post that warns about it (google cache link, the page is not available).

If you save/open the file as UTF-8 it seems that it still converts NUL characters to spaces but it also increases the resulting file size due to conversions from single-byte characters to UTF-8 multi-byte sequences.

If you save/open the file as Unicode it seems that it still converts NUL characters to spaces but also adds a byte-order mark (BOM) to the beginning of the file.
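The size changes can be illustrated with a small Python sketch. It is an approximation under stated assumptions: `latin-1` stands in for Notepad's ANSI code page (Python's `cp1252` codec rejects a few undefined byte values), "Unicode" is taken to mean UTF-16 little endian with a BOM, and the sample bytes are hypothetical.

```python
import codecs

# Hypothetical sample: one NUL, one ASCII letter, two bytes above 0x7F.
raw = bytes([0x00, 0x41, 0xD8, 0xFF])

# Read the bytes as single-byte characters and apply the NUL -> space substitution.
text = raw.decode("latin-1").replace("\x00", " ")

# Saving as UTF-8: every character above 0x7F needs two bytes, so the file grows.
as_utf8 = text.encode("utf-8")

# Saving as "Unicode" (assumed UTF-16 LE): two bytes per character plus a BOM.
as_utf16 = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(len(raw), len(as_utf8), len(as_utf16))  # 4 6 10
print(as_utf16[:2].hex())                     # fffe
```

In both cases the bytes on disk no longer match the original image data, so undoing the NUL substitution alone would not be enough to repair the file.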


Why it fails :

Notepad substitutes a space (ASCII code 32) for characters like NUL (ASCII code 0) because the Windows API text box only accepts null-terminated strings (ASCIIZ, a char * character array). Without the substitution, the text would be cut off at the first NUL.

That happens because the Windows API is mostly written in C, where null-terminated strings are a common convention. Even in modern Windows with Unicode, the same kind of null-terminated strings is used. So Notepad simply replaces the NULs with spaces so that you can view the complete file.

So when you save the file it is corrupted.
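The truncation at the first NUL can be demonstrated without Windows, using Python's `ctypes` module, which exposes C-style character buffers. This is a sketch of the general C string convention, not of Notepad or the edit control itself; the sample data is hypothetical.

```python
import ctypes

# Hypothetical binary data with a NUL byte in the middle.
data = b"JFIF\x00\x01\x02tail"

# A C char array holding the data (ctypes appends one terminating NUL).
buf = ctypes.create_string_buffer(data)

# .value reads the buffer as a C string: it stops at the first NUL.
print(buf.value)                     # b'JFIF'

# .raw shows that the remaining bytes are still in memory, just not "seen".
print(len(buf.raw) == len(data) + 1)  # True

# Replacing NUL with a space first makes the whole content visible as a C string,
# at the price of no longer matching the original bytes.
visible = data.replace(b"\x00", b" ")
print(ctypes.create_string_buffer(visible).value == visible)  # True
```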

wikipedia-null terminated strings


How to do further research :

You can use a comparison tool such as Beyond Compare (commercial, with a trial version) to see the effect of the character replacement. Also see other binary compare tools.

hex comparison

Note: 20 in hexadecimal (0x20) = 32 in decimal.
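If no comparison tool is at hand, a byte-by-byte comparison is easy to sketch in Python. The helper names below are made up for this example, and the two byte strings are hypothetical stand-ins for the original and the Notepad-saved file; `compare_files` shows how the same check could be run on real files.

```python
def byte_differences(a: bytes, b: bytes):
    """Yield (offset, byte_in_a, byte_in_b) for every position where a and b differ.

    Only the common prefix length is compared; a length difference has to be
    checked separately.
    """
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            yield offset, x, y


def compare_files(path_a: str, path_b: str):
    """Read two files in binary mode and return the list of differing bytes."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return list(byte_differences(fa.read(), fb.read()))


# Hypothetical stand-ins for the original and the saved image data.
original = bytes.fromhex("ffd8ffe000104a46")
saved = bytes.fromhex("ffd8ffe020104a46")

diffs = list(byte_differences(original, saved))
for offset, x, y in diffs:
    print(f"offset {offset}: {x:02x} -> {y:02x}")  # offset 4: 00 -> 20

# The replacement byte is the space character: hex 20 is decimal 32.
print(int("20", 16))  # 32
```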


Why Notepad is slow on large files

It checks each character and replaces special characters with spaces. Other software does not do such in-memory conversions (at least not as primitively as Notepad). It just renders special characters differently, and it uses more advanced buffering techniques.


Looking into Notepad.exe (XP 32 bit)

(I'm assuming it's still written in C++, or at least uses a comparable linker.)

notepad

I'm using the PEiD tool (whose development stopped with the introduction of PE+/64-bit executables).

PEiD can be found bundled in the bin folder of Universal Extractor.

I extracted the notepad.ex_ file from the Windows XP ISO. Try it out: it's a CAB file, which you can extract using 7z.

Warning! Your virus scanner might detect Universal Extractor/PEiD as hack tools or viruses. If you don't trust it, don't download it!


Further info about windows API

Credits: Jason C

It's not just the text box; WM_SETTEXT in general provides no parameter for specifying the string length, and strings are always assumed to terminate at null. You could always create a custom text box with a custom message that specified the string length, but Notepad and most other programs reasonably do not. The function SetWindowText does not provide a length parameter either.


Notepad does not preserve all special / extended characters exactly as they are. I don't have a reference for this behaviour immediately at hand, but I have found this to be the case, for example, with the UNIX-style end-of-line LF, which Notepad will convert into CRLF, and with null (0x00), which it will ignore. In a binary file such as a JPG there are liable to be random occurrences of the character(s) that Notepad does not preserve. Try your experiment with a hex-aware editor and it should work then. I'll update my answer if I find a good reference and once I've tested a hex editor.

Update: I tried a few well-known programmers' editors, but only one of them worked right off the bat: HxD by Maël Hörz. I had never used HxD before but found it thanks to an answer to this Stack article, A hex viewer / editor plugin for Notepad++.

The other editors, which didn't work after a few minutes' effort, were Notepad++, Notepad2 and UltraEdit (v17.3, an older version). A couple of these had problems with the copy / paste of the first few bytes, the JPEG file signature (magic number) FF D8 FF. Maybe they would work with a little more fiddling than I have time for at present.
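A quick way to see whether a copy / paste experiment damaged the signature is to check the first three bytes. The sketch below is a minimal check written for this answer (the helper name is made up); it only looks at the magic number and says nothing about whether the rest of the file is a valid JPEG. The second half shows one plausible way a text-based round trip can break the signature, assuming the bytes are read as single-byte characters and written back as UTF-8.

```python
JPEG_SIGNATURE = bytes.fromhex("ffd8ff")


def looks_like_jpeg(data: bytes) -> bool:
    """Return True if the data starts with the JPEG magic number FF D8 FF."""
    return data.startswith(JPEG_SIGNATURE)


# Hypothetical start of a JPEG file.
header = bytes.fromhex("ffd8ffe000104a464946")
print(looks_like_jpeg(header))  # True

# Assumed text round trip: bytes read as single-byte characters (latin-1 used
# here as a stand-in) and saved as UTF-8. Each byte above 0x7F becomes two bytes.
mangled = header.decode("latin-1").encode("utf-8")
print(mangled[:6].hex())         # c3bfc398c3bf
print(looks_like_jpeg(mangled))  # False
```

To test a real file, read it in binary mode, for example `looks_like_jpeg(open(path, "rb").read(3))`, where `path` is whatever file you saved from the editor.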