Chris wrote:
> There is no way to compare 2 HTML elements and know they are talking about
> the same character

That's because character identity is a hard problem. Is the emoji TIGER the same as TONY THE TIGER or as TONY THE TIGER GIVING THE VICTORY SIGN?

http://www.engadget.com/2014/04/30/you-may-be-accidentally-sending-friends-a-hairy-heart-emoji/

Note that even in Unicode, the set ẛ ᷥ ſ ṡ s S Ŝ may be considered the same character or up to seven different characters, depending on case-folding, canonicalization and accent dropping.

> Similarly, there is no way to search or index html elements. If a HTML
> document contained an image of a particular custom character, there would be
> no way to ask google or whatever to find all the documents with that
> character. Different documents would represent it differently.

You can index links to images. If two documents represent it differently, then I go back to the above; we can't know that they're the same thing.

On Tue, Jun 2, 2015 at 7:11 PM Chris <idou...@gmail.com> wrote:
> You can’t ask the entire computing universe to compress everything all the
> time.

Anytime we care about how much space text takes up, it should be compressed. It compresses very well. On the other hand, it's rare that anyone cares anymore; what's a few hundred kilobytes between friends?
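
To make the point about ẛ ᷥ ſ ṡ s S Ŝ concrete, here is a rough Python sketch using only the standard `unicodedata` module. The names `identity_key`, `count_distinct` and `CANDIDATES` are mine, made up for illustration, and the three switches are just one plausible reading of "case-folding, canonicalization and accent dropping", not a standard definition of character identity.

```python
import unicodedata

# The seven characters from the set above, written as escapes so the
# combining long s (U+1DE5) survives copy and paste.
CANDIDATES = ["\u1e9b", "\u1de5", "\u017f", "\u1e61", "s", "S", "\u015c"]

def identity_key(text, casefold=False, compat=False, strip_marks=False):
    """Return a comparison key for text (hypothetical helper).

    casefold    -- apply Unicode case folding (str.casefold)
    compat      -- use compatibility decomposition (NFKD) instead of NFD
    strip_marks -- drop combining marks, i.e. "accent dropping"
    """
    form = "NFKD" if compat else "NFD"
    out = unicodedata.normalize(form, text)
    if casefold:
        # Folding can produce new decomposable sequences, so normalize again.
        out = unicodedata.normalize(form, out.casefold())
    if strip_marks:
        out = "".join(c for c in out if not unicodedata.combining(c))
    return out

def count_distinct(**options):
    """Count how many of CANDIDATES remain distinct under the given options."""
    return len({identity_key(c, **options) for c in CANDIDATES})

if __name__ == "__main__":
    print(count_distinct())
    print(count_distinct(casefold=True))
    print(count_distinct(casefold=True, compat=True, strip_marks=True))
```

With no options all seven stay distinct, and the count drops as more folding is switched on. Which combination is "right" depends entirely on the application, which is the problem with asking whether two representations are the same character.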