Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.
| Unicode Range | UTF-8 Encoded Bytes |
|---|---|
| U+0000-U+007F | 0xxxxxxx |
| U+0080-U+07FF | 110xxxxx 10xxxxxx |
| U+0800-U+FFFF | 1110xxxx 10xxxxxx 10xxxxxx |
| U+10000-U+10FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
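The UTF-8 table maps directly onto shifts and masks. The sketch below is an illustration only: the function name `utf8_encode` is hypothetical, and Python was chosen for readability. It assumes the input is a Unicode scalar value, so it rejects surrogates (U+D800-U+DFFF) even though they fall inside the U+0800-U+FFFF row. Each branch fills the `x` bits of one table row, taking the highest-order bits first.

```python
def utf8_encode(cp: int) -> bytes:
    """Hypothetical helper: encode one scalar value using the UTF-8 bit patterns."""
    # Reject values outside the code space and the surrogate range.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp <= 0x7F:
        # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

You can check the sketch against Python's built-in codec. For example, `utf8_encode(ord("€"))` should equal `"€".encode("utf-8")`, which is the three bytes `E2 82 AC`.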

| Unicode Range | Scalar Value | UTF-16 Encoded Data |
|---|---|---|
| U+0000-U+D7FF | xxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxxx |
| U+D800-U+DFFF | N/A (surrogate code points, not scalar values) | N/A |
| U+E000-U+FFFF | xxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxxx |
| U+10000-U+10FFFF | 000uuuuu xxxxxxxx xxxxxxxx | 110110wwwwxxxxxx 110111xxxxxxxxxx |

where `wwww` = `uuuuu` - 1.
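The surrogate-pair row is the only part of the UTF-16 table that is not a direct copy of the bits. The sketch below illustrates it; the name `utf16_encode` is hypothetical, and the function returns code units as integers rather than bytes. Subtracting 0x10000 leaves a 20-bit value. Its top 4 bits are `wwww` (that is, `uuuuu` - 1), and the value splits into two 10-bit halves. The high half goes after the `110110` prefix (0xD800) and the low half goes after the `110111` prefix (0xDC00).

```python
from typing import List


def utf16_encode(cp: int) -> List[int]:
    """Hypothetical helper: encode one scalar value as a list of 16-bit UTF-16 code units."""
    # Surrogate code points are not scalar values, so they cannot be encoded.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp <= 0xFFFF:
        # BMP: the code unit is the scalar value itself.
        return [cp]
    v = cp - 0x10000           # 20 bits; the top 4 bits are wwww = uuuuu - 1
    high = 0xD800 | (v >> 10)  # 110110wwwwxxxxxx
    low = 0xDC00 | (v & 0x3FF) # 110111xxxxxxxxxx
    return [high, low]
```

For example, U+1F600 becomes the pair `D83D DE00`. You can confirm this by comparing against `"\U0001F600".encode("utf-16-be")`.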

Unicode supports scripts that may violate your assumptions. For example:

Any questions?