While building and testing code meant to properly handle arbitrary UTF-8 strings, you might want to make use of some test documents that include every possible Unicode code point. You never know what garbage people, fuzzers or errors will throw at your system, so here you'll find the gamut of representable characters / code points to test with. These include control codes like NUL, EOT, XOFF, CAN (cancel) and the never-seen-used DC2, then all of 7-bit US-ASCII, and from there they explode in volume to cover the deepest recesses of Unicode.

The UTF-8_sequence_separated/*.txt files are UTF-8 encoded plaintext documents containing every Unicode code point in a given range, separated by spaces, with a newline every 50 code points to aid readability. As is recommended with UTF-8, no Byte Order Marks (BOM) are employed. Your viewer might need to be told that the files are UTF-8 for them to show properly.
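If you'd rather build a file in the same layout for a range of your own, here is a minimal Python sketch. It is not how the files above were produced; the function names `sequence_lines` and `write_sequence_file` are made up for illustration, and skipping the surrogate range is an assumption on my part (surrogates cannot be encoded as well-formed UTF-8, so they have to be left out or handled some other way).

```python
def sequence_lines(start, end, per_line=50):
    """Yield lines of space-separated code points from start to end (inclusive).

    Hypothetical helper: the layout follows the description above,
    with a line break after every `per_line` code points.
    """
    # Assumption: surrogates (U+D800..U+DFFF) are skipped because they
    # cannot be encoded as well-formed UTF-8.
    chars = [chr(cp) for cp in range(start, end + 1)
             if not 0xD800 <= cp <= 0xDFFF]
    for i in range(0, len(chars), per_line):
        yield " ".join(chars[i:i + per_line])


def write_sequence_file(path, start, end):
    """Write the range to `path` as UTF-8 without a BOM."""
    # encoding="utf-8" emits no BOM (unlike "utf-8-sig");
    # newline="\n" disables line-ending translation on write.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for line in sequence_lines(start, end):
            f.write(line + "\n")
```

For example, `write_sequence_file("latin.txt", 0x20, 0x24F)` would write the printable ASCII and Latin blocks. Keep in mind that ranges containing control codes will put raw line feeds and carriage returns into the file, so the visual line count may not match what you'd expect from the 50-per-line rule.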