Hi,

Here's an interesting graph from Google showing the relative usage of different character encodings on the web:
http://googleblog.blogspot.com/2010/01/unicode-nearing-50-of-web.html
Notably, the Unicode entry lists only the UTF-8
encoding. Are the other Unicode encodings (UTF-16, UTF-32) really that rare?
I think we can use this data as a guideline when optimizing the
encoding detection code in Tika.
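As a rough illustration, here is a minimal Java sketch of how such usage data could guide detection. It is not Tika's actual API. The class name and the candidate order (UTF-8 first, then windows-1252, then ISO-8859-1 as a catch-all) are hypothetical assumptions. The idea is to try the most common encodings first and return the first one that decodes the bytes without errors:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;

class FrequencyOrderedCharsetGuesser {
    // Hypothetical candidate order: most common web encodings first.
    // This ordering is an illustrative assumption, not Tika's logic.
    private static final List<Charset> CANDIDATES = List.of(
        StandardCharsets.UTF_8,
        Charset.forName("windows-1252"),
        StandardCharsets.ISO_8859_1);

    /** Returns the first candidate that strictly decodes the input. */
    static Charset guess(byte[] input) {
        for (Charset cs : CANDIDATES) {
            // REPORT makes the decoder throw instead of substituting U+FFFD.
            CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                decoder.decode(ByteBuffer.wrap(input));
                return cs;
            } catch (CharacterCodingException e) {
                // Not valid in this charset; try the next candidate.
            }
        }
        // ISO-8859-1 maps every byte, so this is only a safety net.
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        String text = "na\u00efve caf\u00e9";
        System.out.println(guess(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println(guess(text.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

Since pure ASCII is also valid UTF-8, putting UTF-8 first costs nothing for ASCII-only content, and a strict UTF-8 check rarely succeeds by accident on Latin-1 text (here the byte 0xEF is not followed by a continuation byte, so the second call falls through to windows-1252).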
BR,
Jukka Zitting
