In a message dated 2002-01-20 16:49:17 Pacific Standard Time, 
[EMAIL PROTECTED] writes:

> The point was that a UTF-8 encoded HTML file for an English web page
> carrying say 10 gifs would have a file size one-third that for a Devanagari
> web page with the same no. of gifs...
> Therefore transmission of a Devanagari web page over a network would take
> thrice as long as that of an English web page using the same images and
> presenting the same information.

This conclusion ignores two obvious points, which Asmus already made:

(1) The 10 GIFs, each of which may well be larger than the HTML file, take 
the same amount of space regardless of the encoding of the HTML file.  The 
total number of bytes involved in transmitting a Web page includes 
everything, HTML and graphics, but the purported "factor of 3" applies only 
to the HTML.

(2) The markup in an HTML file, which comprises a significant portion of the 
file, is all ASCII.  So the "factor of 3" doesn't even apply to the entire 
HTML file, only the plain-text content portion.
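
To put rough numbers on points (1) and (2), here is a small sketch. All the sizes in it are made up for illustration, not measured from any real page, and it assumes the worst case where every character of text content takes 3 bytes in the Devanagari version.

```python
# Hypothetical sizes, chosen only to illustrate the arithmetic.
MARKUP_BYTES = 2000      # ASCII tags and attributes, same in both pages
TEXT_CHARS = 1000        # characters of visible text content
IMAGE_BYTES = 10 * 5000  # ten GIFs at an assumed 5000 bytes each


def page_bytes(markup_bytes, text_chars, bytes_per_char, image_bytes):
    """Total bytes transferred for one page: markup + text + images."""
    return markup_bytes + text_chars * bytes_per_char + image_bytes


# English text: 1 byte per character in UTF-8.
english = page_bytes(MARKUP_BYTES, TEXT_CHARS, 1, IMAGE_BYTES)
# Devanagari text: assume 3 bytes for every character (worst case).
devanagari = page_bytes(MARKUP_BYTES, TEXT_CHARS, 3, IMAGE_BYTES)

ratio = devanagari / english
print(english, devanagari, round(ratio, 3))  # 53000 55000 1.038
```

With these assumed sizes the whole page grows by about 4 percent, not 200 percent. Different assumptions will give different numbers, but the images and markup dilute the factor in every case.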

In addition, text written in Devanagari includes plenty of instances of 
U+0020 SPACE, plus CR and/or LF, each of which occupies one byte 
regardless of the encoding.
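
The effect of spaces and line ends can be checked directly. The sketch below counts how many UTF-8 bytes each character of a short sample takes. The sample string (two Devanagari words, a space, and a line feed) and the helper name `utf8_length_histogram` are just illustrative choices.

```python
# Two Devanagari words separated by U+0020 SPACE and ended by LF.
# Written with escapes so the code point count is unambiguous.
SAMPLE = (
    "\u0928\u092e\u0938\u094d\u0924\u0947"   # 6 Devanagari code points
    " "                                      # U+0020 SPACE
    "\u0926\u0941\u0928\u093f\u092f\u093e"   # 6 Devanagari code points
    "\n"                                     # LF
)


def utf8_length_histogram(text):
    """Map UTF-8 byte length -> number of characters with that length."""
    counts = {}
    for ch in text:
        n = len(ch.encode("utf-8"))
        counts[n] = counts.get(n, 0) + 1
    return counts


hist = utf8_length_histogram(SAMPLE)
total_bytes = len(SAMPLE.encode("utf-8"))
print(hist, total_bytes, len(SAMPLE))  # {3: 12, 1: 2} 38 14
```

Here 14 characters take 38 bytes, an average of about 2.7 bytes per character, not 3. Real text with digits, ASCII punctuation, and more line breaks would average lower still.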

I think before worrying about the performance and storage effect on Web pages 
due to UTF-8, it might help to do some profiling and see what the actual 
impact is.
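
One possible way to start such profiling is sketched below, using Python's standard `html.parser` module to separate text content from markup and count the UTF-8 bytes in each. The names `TextCollector` and `profile_html` are hypothetical, the tiny test document is invented, and character references such as `&amp;` are counted as markup here, which is a simplification.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the text content of an HTML document, ignoring tags."""

    def __init__(self):
        # convert_charrefs=False: character references are not passed to
        # handle_data, so they end up counted with the markup.
        super().__init__(convert_charrefs=False)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def profile_html(html):
    """Return UTF-8 byte counts for the whole file, its text, and the rest."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    total = len(html.encode("utf-8"))
    text = len("".join(collector.chunks).encode("utf-8"))
    return {"total": total, "text": text, "markup": total - text}


# Invented one-line document: 17 bytes of ASCII markup, 2 Devanagari letters.
doc = '<p class="x">\u0928\u092e</p>'
report = profile_html(doc)
print(report)  # {'total': 23, 'text': 6, 'markup': 17}
```

Running something like this over a set of real pages, and adding the image sizes from the server, would show how much of the transfer the text actually accounts for.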

-Doug Ewell
 Fullerton, California
