stephen barncard wrote:
Why are you replacing the CRs with LFs? Don't the engine's Unicode
functions handle line endings?
-------------------------
Stephen Barncard
San Francisco
http://houseofcubes.com/disco.irev
2009/9/4 Sivakatirswami <ka...@hindu.org>
Aloha, Joe:
I'm not quite sure how your suggestion relates to the problem of line endings.
The unicode.txt file I have is being read OK in Pages on the Mac.
It also loads just fine in Rev, with the exception of the line breaks.
I'm not sure where uniEncode/uniDecode could be used to solve the line
break issue.
Sometimes on Kauai it rains for so many days (max count in my log: 63
days...) that we live in a "mud world".
Somehow my entry into Unicode feels not like a "baptism by fire"
but a "baptism by mud".
Welcome to petroglyph land... (smile)
Stephen: the engine only handles line endings for "file:" URLs and not
for "binfile:" URLs.
A note on the source: This is original Tamil done in MylaiSri, which maps
all chars against 0-127. Muthu Nedumaran of Murasu Systems in Malaysia,
IT Tamil master, font designer, systems engineer etc., wrote me a C++
program to transform the ASCII input into a Unicode.txt. I really
don't have any specs on what his program outputs. (I would love to take
that thing and turn it into an external if I knew how... that's another
story...)
But that's what I'm loading. Since he works on OS X, I thought,
on a hunch, that he was surely passing CRs from the original ASCII through as char(13).
Joe Ault: OK, we are getting somewhere:
I obviously made a blooper where I was replacing
char(13) with char(10) in the file name and not in the data.
Of course nothing happened... fixed it.
This now works fine!
on mouseUp
   answer file "Choose a unicode file to read in."
   if it is empty then exit mouseUp
   -- "binfile:" reads the raw bytes, with no line-ending translation
   put "binfile:" & it into urlName
   set the useUnicode to true
   put url urlName into tTamilUnicodeText
   -- replace in the data itself, not in the file name
   replace numtochar(13) with numtochar(10) in tTamilUnicodeText
   set the unicodeText of fld "display" to tTamilUnicodeText
end mouseUp
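
Whether a CR-to-LF replace is safe on double-byte text depends on whether it acts on single bytes or on whole 16-bit units. The Python sketch below is only a way to make the bytes visible outside Rev; it says nothing about how the engine implements replace. It assumes the data is little-endian UTF-16, which is a guess, since the converter's output format is not documented in this thread. The sample string and the helper name replace_cr_units are made up for illustration.

```python
# Assumption: UTF-16 little-endian data; the real file's format is unknown.
# Sample: two Tamil letters, a CR, U+200D (decimal 8205), one Tamil letter.
sample_text = "\u0BAE\u0BCD\r\u200D\u0B89"
data = sample_text.encode("utf-16-le")

# A one-byte replace changes every 0x0D byte, including the low byte of U+200D.
bytewise = data.replace(b"\x0d", b"\x0a")
bytewise_text = bytewise.decode("utf-16-le")


def replace_cr_units(raw: bytes) -> bytes:
    """Replace only whole 16-bit CR units (0x000D) with LF (0x000A)."""
    out = bytearray()
    for i in range(0, len(raw) - 1, 2):
        unit = raw[i:i + 2]
        out += b"\x0a\x00" if unit == b"\x0d\x00" else unit
    if len(raw) % 2:
        out += raw[-1:]  # keep a stray trailing byte untouched
    return bytes(out)


unitwise_text = replace_cr_units(data).decode("utf-16-le")

print([hex(ord(c)) for c in bytewise_text])
print([hex(ord(c)) for c in unitwise_text])
```

In the byte-wise result the fourth character is no longer U+200D, while the unit-wise result changes only the line break.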
OK so far so good. I'm getting the same line breaks from the original text.
Richard: thanks for the arcane script from Mark, which I only saw
*after* trying the above... so I did not need it.
But I will keep it as a reference, thank you.
Jim F: thanks for the tip on always encoding... since I have to move
this stuff back and forth to the web server and possibly in and out of
PostgreSQL, I will take your advice.
So, for now it works... Read on if you want to walk into the morass of
trying to actually see what you have as decimal strings:
Ken Kojima, Thanks: this now works -- well appears to, on the surface.
on mouseUp
   set the useUnicode to true
   if the selection is empty then
      answer "No Selection" with "ok"
      -- nothing to decode, so stop here
      exit mouseUp
   end if
   put the selection into tUnicode
   -- read the selection two bytes at a time and list each value in decimal
   repeat with i=1 to the num of chars of tUnicode step 2
      put chartonum(char i to i+1 of tUnicode) & cr after tOutput
   end repeat
   put tOutput
end mouseUp
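
A loop that reads two bytes at a time only gives sensible values if every pair starts on a character boundary. The Python sketch below shows, on a made-up little-endian UTF-16 sample, what the same bytes look like when read from the right offset and when read one byte off. Whether the selection in the field really is misaligned is not confirmed anywhere in this thread; the helper name code_units and the sample are assumptions for illustration.

```python
def code_units(raw: bytes, start: int = 0) -> list:
    """Read 16-bit little-endian values, two bytes at a time, from offset start."""
    return [int.from_bytes(raw[i:i + 2], "little")
            for i in range(start, len(raw) - 1, 2)]


# Hypothetical sample: two Tamil letters, two CRs, one Tamil letter.
sample = "\u0BAE\u0BCD\r\r\u0B89".encode("utf-16-le")

aligned = code_units(sample)        # pairs start on character boundaries
shifted = code_units(sample, 1)     # same bytes, read one byte off

print(aligned)
print(shifted)
```

The aligned read gives the Tamil values and 13 for each CR. The shifted read starts with 52491, a value that also shows up in the output listed further down in this post.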
But I get super irrational results (irrational to me, at least).
Tamil lives here:
U+0B80 – U+0BFF (2944–3071)
If I load the text *without* handling the line endings and select across
the last letters of one line and the beginning of the next, I get this:
[Note: the editor of this text typically puts two end-of-paragraph marks (i.e.
1 blank line) between paragraphs, block style.]
2990 - Valid Tamil Character
3021 - Valid Tamil Character
3374 - out of range: should be line break and does show as one in Pages
8205 - out of range: should be line break and does show as one in Pages,
or in the field if I do the (13) to (10) conversion
2953 - Valid Tamil Character
And if I select the same thing a second time... different results!
2990
3021
12576
2570
2992
3007
OK now... if I select another section of text where there is text/2
line breaks/text/2 line breaks/text,
I get super bizarre results back:
3377
45069 # way out of range.
48907
44555
52491
8203
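
One way to read such values is to split each decimal number back into its two bytes and also look at it with the bytes in the opposite order. The Python sketch below does only that arithmetic on the six values just listed; the helper names split_bytes and swapped are made up here, and the sketch does not establish why the field returns these bytes.

```python
def split_bytes(value: int) -> tuple:
    """Return (high byte, low byte) of a 16-bit value."""
    return value >> 8, value & 0xFF


def swapped(value: int) -> int:
    """Return the 16-bit value with its two bytes exchanged."""
    hi, lo = split_bytes(value)
    return (lo << 8) | hi


def is_tamil(value: int) -> bool:
    """True if the value lies in the Tamil block, U+0B80 to U+0BFF."""
    return 0x0B80 <= value <= 0x0BFF


for v in (3377, 45069, 48907, 44555, 52491, 8203):
    hi, lo = split_bytes(v)
    s = swapped(v)
    print(v, f"bytes 0x{hi:02X} 0x{lo:02X}", "swapped:", s,
          "Tamil" if is_tamil(s) else "")
```

Three of the "way out of range" values (48907, 44555, 52491) turn out to be 3007, 2990 and 3021 with the two bytes exchanged, and 8203 splits into 0x20 followed by 0x0B, the high byte shared by every character in the Tamil block.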
If I lengthen the selection to the left and right, I get (almost) completely
different results.
Even the characters that were in the short selection are not output
as the same values:
3015
2985
3021
3391
39437
49419
45579
51979
38155
12555
3341
2992
3007
If I put "replace numtochar(13) with numtochar(10) in tTamilUnicodeText"
back into my import script and then select across the end of the same
line, the 2 CRs and the beginning of the 4th line, I get different
results again. This time it is so far beyond my ken as to be a black box. I
don't think I will even "go there" in trying to understand what is
happening, what is wrong, and why we get something like:
2985
3021
2623
39434
49419
45579
51979
38155
44555
52491
8203
2609
45066
48907
There are more bizarre events occurring (selecting text causes characters
to switch places!). Wish me luck in creating an online editor as a revlet!
Sivakatirswami