Richard Cook wrote:
> Well, why stop with words, my lord? Why not just encode all sentences,
> paragraphs, pages, chapters, books, libraries, or your higher level
> unit of choice, for that matter.
> ...
> Whether you choose to associate a single glyph with your private-use
> code point, or an en
On Dec 5, 2004, at 07:02 PM, Doug Ewell wrote:
A word-based encoding for English could automatically assume spaces
where they are appropriate. The sentence:
"What means this, my lord?"
would have seven encodable elements: the five words, the comma, and the
question mark. Spaces would be automatic
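A minimal sketch of that scheme in Python (my own illustration, assuming punctuation attaches to the preceding word and a space is re-inserted before every other word; nothing here is specified anywhere):

    import re

    def encode_elements(sentence):
        # Words and punctuation are the encodable elements; spaces are implied.
        return re.findall(r"[A-Za-z']+|[^\sA-Za-z']", sentence)

    def decode_elements(elements):
        # Re-insert a space before each word except the first;
        # punctuation attaches to whatever precedes it.
        out = ""
        for e in elements:
            if out and re.match(r"[A-Za-z']", e):
                out += " "
            out += e
        return out

    elems = encode_elements("What means this, my lord?")
    print(len(elems))              # 7: five words, the comma, the question mark
    print(decode_elements(elems))  # What means this, my lord?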
Hohberger, Clive wrote:
> When I went back and recoded those same words with leading or trailing
> spaces (denoted here by "_") as: "_the", "the_", "_and", "and_", etc.
> as single bytes, I found a huge gain in efficiency in terms of the
> number of bytes to encode the same English text. Next, wh
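A back-of-the-envelope version of that experiment (a Python sketch; the one-byte token table below is hypothetical, just large enough to show where the savings come from):

    # Hypothetical one-byte codes for a few space-attached common words.
    TOKENS = {" the": 0x80, "the ": 0x81, " and": 0x82, "and ": 0x83,
              " of": 0x84, " to": 0x85}

    def tokenized_length(text):
        # Greedily replace known space-attached words with one-byte tokens;
        # every other character costs one byte (plain ASCII text assumed).
        length, i = 0, 0
        while i < len(text):
            for tok in sorted(TOKENS, key=len, reverse=True):
                if text.startswith(tok, i):
                    length += 1
                    i += len(tok)
                    break
            else:
                length += 1
                i += 1
        return length

    sample = "the cat and the dog ran to the house"
    print(len(sample.encode("ascii")), tokenized_length(sample))  # prints: 36 22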
So here is the idea: why not use the unused part (2^31 - 2^21 =
2,145,386,496) to encode all the words of all the languages as well.
You could then send any word with a few bytes. This would reduce the
bandwidth necessary to send text. (You need at most six bytes to
address all 2^31 code points, and
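For anyone checking the arithmetic, a Python sketch (the six-byte figure assumes the original UTF-8 layout that ran up to U+7FFFFFFF, not today's four-byte limit):

    # Code points addressable by 31-bit UTF-8 versus the 21-bit space
    # that covers U+0000..U+10FFFF.
    total_31_bit = 2 ** 31        # 2,147,483,648
    current_21_bit = 2 ** 21      # 2,097,152
    print(total_31_bit - current_21_bit)  # 2145386496 unused slots

    # Byte length of a code point under the original 1- to 6-byte UTF-8.
    def utf8_len(cp):
        for nbytes, limit in enumerate((0x80, 0x800, 0x10000, 0x200000,
                                        0x4000000, 0x80000000), start=1):
            if cp < limit:
                return nbytes

    print(utf8_len(0x10FFFF))    # 4 bytes: last code point Unicode defines
    print(utf8_len(0x7FFFFFFF))  # 6 bytes: top of the 31-bit space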
"Philippe Verdy" <[EMAIL PROTECTED]> writes:
> > Drop the part of the sentence before "then". A protocol could delete "the",
> > "an", etc. right
> > now. In fact, I suspect several library systems do drop "the", etc. right
> > now. Not that this
> > makes it a good idea, but that's a lousy argu
Don't misinterpret my words or arguments here: the purpose of the question
was strictly about which UTF or other transformation would be good for
interoperability and storage, and whether it would be a good idea to encode
words with standard codes.
So in my view, it is completely unneeded to cr
s represented as alphabetic strings.
Clive Hohberger
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Behalf Of D. Starner
Sent: Sunday, December 05, 2004 11:49 AM
To: [EMAIL PROTECTED]
Subject: Re: Unicode for words?
"Philippe Verdy" writes:
> Suppose that Uni
"Philippe Verdy" writes:
> Suppose that Unicode encodes the common English words "the", "an", "is",
> etc... then a protocol
> could decide that these words are not important and will filter them.
Drop the part of the sentence before "then". A protocol could delete "the",
"an", etc. right
now
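Which is easy to see: stripping such words needs no dedicated code points, since a few lines of ordinary string handling already do it on plain encoded text (a sketch, with a made-up stop list):

    # Any protocol can already drop "unimportant" words from ordinary text.
    STOP_WORDS = {"the", "an", "a", "of"}   # hypothetical stop list

    def drop_stop_words(text):
        return " ".join(w for w in text.split()
                        if w.lower() not in STOP_WORDS)

    print(drop_stop_words("The history of an idea"))  # history idea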
From: "Ray Mullan" <[EMAIL PROTECTED]>
I don't see how the one million available codepoints in the Unicode
Standard could possibly accommodate a grammatically accurate vocabulary of
all the world's languages.
You have misread the message from Tim: he wanted to use "code points" above
U+10 wi
I don't see how the one million available codepoints in the Unicode
Standard could possibly accommodate a grammatically accurate vocabulary
of all the world's languages. You're overlooking the question of which
versions of words -- 'color' or 'colour' in English for instance --
would be used in
On Dec 5, 2004, at 12:27 AM, Tim Finney wrote:
my co-worker suggested encoding entire words in Unicode.
The "word" is considerably less well-defined than the character. The
set of words is open-ended. If you'd like to see where you go when you
start trying to encode words, take a look at CJK Exte
"Tim Finney" <[EMAIL PROTECTED]> writes:
> This would reduce the
> bandwidth necessary to send text.
Would it really? Ignoring all the other details (being limited
to English, for one), would words that might take up to six bytes
in UTF-8 really compete with the normal encoding, with most words
ta
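To put rough numbers on that question (a Python sketch; the word sample is arbitrary, and the six bytes assume a word code point parked beyond U+10FFFF under the old 31-bit UTF-8):

    # A word as one hypothetical "word code point" (up to 6 bytes in
    # 31-bit UTF-8) versus the same word simply spelled out in UTF-8.
    WORD_CODE_POINT_BYTES = 6

    for word in ["the", "is", "lord", "question", "encoding"]:
        spelled = len(word.encode("utf-8"))
        print(f"{word}: spelled {spelled} bytes, as a code point "
              f"{WORD_CODE_POINT_BYTES} bytes")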
Dear All
This is off topic, so feel free to ignore it.
The other day I was telling a co-worker about Unicode and how the UTF-8
encoding system works. During the far-ranging discussions that followed
(we are public servants), my co-worker suggested encoding entire words
in Unicode.
This sounds like