Unicode characters being replaced by question marks after copy and paste on Windows

I've "suffered" from this issue for years and never knew the fix was so dead simple until Sanny mentioned "locale" in a comment above (thanks, Sanny!). Haha! Anyway, here's how to fix it if you run into the same issue:

This applies to Windows 10 (build 15002), but the steps should be similar on older (or newer) versions of Windows.

  1. Go to the Region settings in the Control Panel. There are several ways to do this; here are a few of them:
    • In the Search bar (Cortana) on the taskbar, search for "Control Panel". In the Control Panel, click on Change date, time, or number formats under Clock, Language and Region in category view, or on Region in icon list view.
    • Windows 10 only: In the Search bar again, search for "region & language settings". This will open the Region & Language page in the Settings app. Scroll down until you find Additional date, time, & region settings. You may then select Region on the Control Panel window that opens.
  2. Open the Administrative tab and click on the Change system locale button. Choose a locale that is different to your current locale. I went with Japanese. I think choosing the language you will copy-paste often would be best, though it may be the same regardless. Acknowledge the change with OK.
  3. The system will ask you to restart, which you'll need to do for the change to take effect.
  4. After restarting, test whether copy-paste now works as intended. If it does, you may repeat the steps above and switch back to the locale you actually need to use.
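If you want to confirm which code page is active before and after the change, a minimal Python sketch like the one below may help. It assumes Python is installed, and it relies on the Win32 GetACP function, which returns the ANSI code page tied to the system locale (for example 1252 for Western European or 932 for Japanese). The helper name ansi_code_page is made up for this example.

```python
import sys


def ansi_code_page():
    """Return the active Windows ANSI code page number, or None off Windows.

    Hypothetical helper for illustration; it only wraps the Win32 GetACP call.
    """
    if sys.platform != "win32":
        return None
    import ctypes  # imported here so the script still loads on other systems
    return int(ctypes.windll.kernel32.GetACP())


if __name__ == "__main__":
    print("Active ANSI code page:", ansi_code_page())
```

Run it once before step 2 and once after the restart; the number should change if the new system locale uses a different code page.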

That's it! Enjoy copy-pasting! ;)


Windows itself is Unicode-based internally (it uses UTF-16), so in principle it doesn't make sense that you have to change your locale to fix the issue.

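The ????? indicates that the text was converted to an encoding that cannot represent those characters, so each one was replaced with a question mark. That is different from text being misread as another charset, which would produce garbled characters instead. The conversion perhaps happens between the program and the clipboard.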
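To illustrate the difference, here is a minimal Python sketch. It does not touch the Windows clipboard; it only assumes that the question marks come from a lossy conversion into a legacy code page, with Windows-1252 picked as an arbitrary example.

```python
text = "日本語"

# Lossy conversion: cp1252 has no Japanese characters, so the "replace"
# error handler substitutes "?" for each one.
lossy = text.encode("cp1252", errors="replace").decode("cp1252")
print(lossy)  # ???

# Misinterpretation: UTF-8 bytes read back as cp1252 give garbled
# characters rather than question marks.
mojibake = "é".encode("utf-8").decode("cp1252")
print(mojibake)  # Ã©
```

Question marks therefore point to a lossy conversion step, while garbled characters point to the wrong charset being assumed when reading.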

But it seems that this is an actual bug - it looks like the OS treated the text as a legacy 8-bit encoding the first time, but then tried again with Unicode. The Unicode world is very complex. Unicode itself only assigns a number (a code point) to each character; UTF-8, UTF-16 and UTF-32 are different ways of storing those numbers as bytes. UTF-32 uses 4 bytes for every character and UTF-16 uses 2 or 4, so storing mostly-ASCII text that way takes two to four times the space - imagine doing that for every document you own or view. UTF-8 uses 1 to 4 bytes per character and keeps plain ASCII as single bytes, which is why it is the practical choice for files and the web. But legacy functions and documents in older encodings still have to be converted to and from Unicode. What was going on here, I surmise, is that the OS 'guessed' incorrectly that the text was in a legacy encoding and used a separate (or the default) function/class to copy and paste. Once it 'realized' the text was Unicode, it used that instead. While Unicode is the standard, many languages still have several legacy encodings for an OS to deal with, and determining which one is in use without knowing ahead of time is pretty hard.
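As a rough illustration of the storage trade-off, the Python sketch below counts how many bytes a few sample characters take in each Unicode encoding. The sample characters and the helper name encoded_sizes are chosen just for this example.

```python
def encoded_sizes(ch):
    """Return the number of bytes `ch` needs in each Unicode encoding.

    The "-le" variants are used so that no byte order mark is counted.
    """
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


for ch in ["A", "é", "א", "日", "😀"]:
    print(repr(ch), encoded_sizes(ch))
```

Plain ASCII letters take 1 byte in UTF-8 but 2 in UTF-16 and 4 in UTF-32, while an emoji takes 4 bytes in all three.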

From a computer's perspective, your character is just a pattern of 1's and 0's, and in a legacy 8-bit encoding there is no way of objectively knowing which character a byte stands for. 01000001 is an 'A' in ASCII and in almost every encoding built on it, but 11100000 could be an 'à' in Windows-1252 or an א in the Hebrew Windows-1255 code page. Unicode changed all of that - each character has its own unique code point, and UTF-8 byte sequences follow a strict pattern, which makes UTF-8 text much easier to recognize reliably.
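A small sketch of that ambiguity, in Python: the same single byte decodes to different characters depending on which legacy code page is assumed. The byte 0xE0 is an example picked for this sketch, since bytes above 0x7F are where the Windows code pages disagree; the helper is_valid_utf8 is likewise made up for illustration.

```python
raw = bytes([0b11100000])  # the single byte 0xE0

# The same byte means a different character in each legacy code page.
for codec in ("cp1252", "cp1255", "cp1251"):
    print(codec, raw.decode(codec))


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# On its own, 0xE0 is only the start of a multi-byte UTF-8 sequence,
# so strict UTF-8 decoding rejects it.
print(is_valid_utf8(raw))
print(is_valid_utf8("א".encode("utf-8")))
```

Because a lone 0xE0 is rejected as UTF-8 but accepted by every one of the code pages above, software can often tell that a byte string is not UTF-8, but it cannot tell which legacy code page was intended.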

So the misbehaving copy and paste probably has to do with legacy non-Unicode functionality - upgrade and it should solve the problem!