Two questions:

1. Is there a way to determine how prevalent Unicode is in electronic documents
(vs. documents not in Unicode)? For the Web at least, has anyone done a statistical
sampling to determine the percentage of webpages encoded in Unicode?

2. A graduate student mentioned her impression that most Cyrillic webpages
(at least for Russian, her area of interest) are still not encoded in Unicode. (She is
researching the use of certain words in Russian and wanted to know how best to run
the search.)
Again: has anyone looked at what percentage of Cyrillic Web documents are in Unicode?

With thanks,
Debbie Anderson

Deborah Anderson
Researcher, Dept. of Linguistics
UC Berkeley
Email: [EMAIL PROTECTED]
or [EMAIL PROTECTED]
Script Encoding Initiative: www.linguistics.berkeley.edu/~dwanders
 


