stephen barncard wrote:
Why are you replacing the CRs with LFs? Don't the engine's Unicode
functions handle line endings?
-------------------------
Stephen Barncard
San Francisco
http://houseofcubes.com/disco.irev


2009/9/4 Sivakatirswami <ka...@hindu.org>

Aloha, Joe:

I'm not quite sure how your suggestion relates to the problem of endlines.

The unicode.txt file I have reads fine in Pages on the Mac.
It also loads just fine in Rev, with the exception of the line breaks.

I'm not sure where uniEncode/uniDecode could be used to solve the line
break issue.

Sometimes on Kauai it rains for so many days (max count in my log: 63 days...) that we live in a "mud world".

Somehow my entry into Unicode feels not like a "baptism by fire" but like a "baptism by mud".

welcome to petroglyph land... (smile)

Stephen: the engine only handles line endings for "file:*" URLs and not for "binfile:*" ones.

A note on the source: this is original Tamil done in MylaiSri, which maps all chars against 0-127. Muthu Neduraman of Marusu System in Malaysia (IT Tamil master, font designer, systems engineer, etc.) wrote me a C++ program to transform the ASCII input into a Unicode.txt file. I really don't have any specs on what his program outputs. (I would love to take that thing and turn it into an external if I knew how... but that's another story...)

But that's what I'm loading... and since he works on OS X, I thought, on a hunch, that he was surely piping CRs from the original ASCII out as char(13).

Joe Ault: OK, we are getting somewhere.
I obviously made a blooper: I was replacing
char(13) with char(10) in the filename and not in the data.

Of course nothing happened... fixed it.

This now works fine!
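on mouseUp
 answer file "Choose a unicode file to read in."
 if it is empty then exit mouseUp
 -- "binfile:" reads the raw bytes, so the engine does no line-ending translation
 put "binfile:" & it into urlName
 set the useUnicode to true
 put url urlName into tTamilUnicodeText
 -- turn CRs (13) into LFs (10) so the field shows the line breaks;
 -- this must be done on the data, not on the filename
 replace numtochar(13) with numtochar(10) in tTamilUnicodeText
 set the unicodeText of fld "display" to tTamilUnicodeText
end mouseUp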

OK so far so good. I'm getting the same line breaks from the original text.
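
One caveat on the replace above, sketched in Python rather than Revolution script: a byte-level replace touches every byte 13 in the data, not only the ones that form a CR. The sketch below assumes the file is UTF-16 (the post does not say which encoding or byte order the converter writes, so both are assumptions) and swaps CR for LF one aligned 2-byte unit at a time. The function name replace_cr_utf16 is hypothetical.

```python
def replace_cr_utf16(data: bytes, byteorder: str = "little") -> bytes:
    """Replace U+000D with U+000A, one 16-bit code unit at a time.

    Assumes `data` is UTF-16 in the given byte order (an assumption,
    not something the original post confirms).
    """
    out = bytearray()
    # Walk the buffer in aligned 2-byte steps so that a 0x0D byte which
    # is only half of another character (e.g. U+200D) is left alone.
    for i in range(0, len(data) - 1, 2):
        unit = int.from_bytes(data[i:i + 2], byteorder)
        if unit == 0x000D:
            unit = 0x000A
        out += unit.to_bytes(2, byteorder)
    # Keep a trailing odd byte, if any, unchanged.
    out += data[len(out):]
    return bytes(out)


# Two Tamil letters, a CR, a zero-width joiner (U+200D), one Tamil letter.
sample = "\u0BAE\u0BCD\r\u200D\u0B89".encode("utf-16-le")
fixed = replace_cr_utf16(sample, "little")
naive = sample.replace(b"\r", b"\n")  # byte-level replace, as in the handler
```

Here fixed decodes with the U+200D intact, while naive turns it into U+200A because its low byte happens to be 13. Characters in the Tamil block itself never contain a byte 13 in UTF-16, so whether this matters depends on what else is in the file.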

Richard: thanks for the arcane script from Mark, which I only saw *after* trying the above... so I did not need it.
but I will keep it as a reference, thank you.

Jim F: thanks for the tip on always encoding... since I have to move this stuff back and forth to the web server and possibly in and out of PostgreSQL, I will take your advice.

So, for now it works... Read on if you want to walk into the morass of trying to actually see what you have as decimal strings:

Ken Kojima, Thanks: this now works  -- well appears to, on the surface.

on mouseUp
  set the useUnicode to true
  if the selection is empty then
    answer "No Selection" with "ok"
    exit mouseUp -- nothing to convert, so stop here
  end if
  put the selection into tUnicode
  -- with useUnicode true, chartonum reads each 2-byte pair as one value
  repeat with i = 1 to the num of chars of tUnicode step 2
    put chartonum(char i to i+1 of tUnicode) & cr after tOutput
  end repeat
  put tOutput
end mouseUp

But I get super irrational results (irrational to me at least).

Tamil lives here:

U+0B80 – U+0BFF   (2944–3071)

If I load the text *without* handling the line endings and select across the last letters of one line and the beginning of the next:

[note, the editor of this text typically puts two end-of-paragraph (i.e. 1 blank line) between paragraphs, block style]

2990 - Valid Tamil Character
3021 - Valid Tamil Character
3374 - out of range: should be line break and does show as one in Pages
8205 - out of range: should be line break and does show as one in Pages, or in the field if I do the (13) to (10) conversion
2953 - Valid Tamil Character

And if I select the same thing a second time... different results!

2990
3021
12576
2570
2992
3007
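
One possible explanation for results that change between selections, offered as a guess and not confirmed anywhere in this thread: if the selection ever starts on the second byte of a 2-byte character, every pair read after that straddles two characters. The Python sketch below assumes big-endian UTF-16 data (an assumption) and reads the same bytes from offset 0 and from offset 1. The helper name code_units is hypothetical.

```python
def code_units(data: bytes, start: int = 0, byteorder: str = "big") -> list:
    """Read 16-bit values from `data`, beginning at byte offset `start`."""
    return [
        int.from_bytes(data[i:i + 2], byteorder)
        for i in range(start, len(data) - 1, 2)
    ]


TAMIL = range(0x0B80, 0x0C00)  # U+0B80..U+0BFF, i.e. 2944..3071

# Two Tamil letters, a CR, one Tamil letter (values taken from the list above).
text = "\u0BAE\u0BCD\r\u0B89"
raw = text.encode("utf-16-be")  # byte order is an assumption

aligned = code_units(raw, 0)  # pairs line up with the characters
shifted = code_units(raw, 1)  # pairs straddle two characters

print(aligned)  # [2990, 3021, 13, 2953]
print(shifted)  # [44555, 52480, 3339]
```

The aligned read gives the expected Tamil values plus 13 for the CR. The shifted read gives nothing in the Tamil range at all, even though the bytes are identical.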



OK now... if I select another section of text where the pattern is text / 2 line breaks / text / 2 line breaks / text,

I get super bizarre results back:

3377
45069 # way out of range.
48907
44555
52491
8203
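
A quick arithmetic check on the six values just listed, sketched in Python: swapping the two bytes of each number shows whether it is a valid Tamil value with its bytes in the opposite order. This is only a diagnostic under the assumption that the data is 2-byte Unicode; it does not say why the bytes would end up swapped or shifted. The helper names swap_bytes and in_tamil are hypothetical.

```python
TAMIL_FIRST, TAMIL_LAST = 0x0B80, 0x0BFF  # 2944..3071


def swap_bytes(value: int) -> int:
    """Exchange the high and low byte of a 16-bit value."""
    return ((value & 0xFF) << 8) | (value >> 8)


def in_tamil(value: int) -> bool:
    """True if the value falls inside the Tamil block."""
    return TAMIL_FIRST <= value <= TAMIL_LAST


reported = [3377, 45069, 48907, 44555, 52491, 8203]  # from the list above
for value in reported:
    swapped = swap_bytes(value)
    print(value, hex(value), "->", swapped, hex(swapped), in_tamil(swapped))
```

Three of the six (48907, 44555, 52491) swap to 3007, 2990 and 3021, which are all in the Tamil range and all appear in the earlier, sane-looking lists. That pattern is consistent with pairs being read one byte off or in the opposite byte order, but it is not proof of either.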

If I lengthen the selection, left and right, I get (almost) completely different results.

Even the characters that were in the short selection are not output with the same values:

3015
2985
3021
3391
39437
49419
45579
51979
38155
12555
3341
2992
3007


If I put  replace numtochar(13) with numtochar(10) in tTamilUnicodeText

back into my import script and then select across the end of the same line, 2 CRs, and the beginning of the 4th line, I get different results again. This time they are so far beyond my ken as to be a black box. I don't think I will even "go there" in trying to understand what is going wrong and why we get something like:

2985
3021
2623
39434
49419
45579
51979
38155
44555
52491
8203
2609
45066
48907

There are more bizarre events occurring (selecting text causes characters to switch places!). Wish me luck in creating an online editor as a revlet!

Sivakatirswami

_______________________________________________
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
