On Fri, Mar 27, 2015 at 1:27 PM, Michael Norton < [email protected]> wrote:
> Easy example: what's the code for [blank space] U+020 across all language
> sets of Unicode? Is it the same ie: 100%?

I don't understand what you are asking, and I have a hunch you haven't said it in a way that anyone else understands it either.

The code point value that the Unicode Standard assigns to the normal space is U+0020, but
- not every language uses spaces
- not every language that uses spaces uses them for the same purpose as English
- there are some 30 other "space" characters in Unicode

Statistics of character frequencies vary by corpus, as others have said. Even if you "only" look "on the web", that's undefined until you specify a crawling strategy. Dynamically generated content means that there is an infinite number of "web pages". Every crawler will come up with a different set.

Maybe you are asking about statistics of character encodings? On the web? Such as, Unicode vs. Shift-JIS vs. ISO 8859-2 etc.?

markus
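
For anyone who wants to look at the other "space" characters themselves, here is a minimal sketch using Python's standard unicodedata module. It assumes that "space character" means either General_Category Zs (Space_Separator) or whatever Python's str.isspace() accepts; both are only two of several possible definitions, and the totals depend on the definition chosen and on the Unicode version bundled with the interpreter. The function names are made up for illustration.

```python
import sys
import unicodedata


def space_separators():
    """Return every code point whose General_Category is Zs (Space_Separator)."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Zs"
    ]


def whitespace_chars():
    """Return every code point that Python's str.isspace() treats as whitespace.

    This is a broader set than Zs: it also includes controls such as TAB
    and LINE FEED.
    """
    return [chr(cp) for cp in range(sys.maxunicode + 1) if chr(cp).isspace()]


if __name__ == "__main__":
    # Print each Zs character with its code point and official name.
    for ch in space_separators():
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

U+0020 is in both lists, but so are characters such as U+00A0 NO-BREAK SPACE and U+3000 IDEOGRAPHIC SPACE, so counting only U+0020 will undercount "spaces" in most corpora.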

