Re: Line Breaks Dropped on Importing Unicode Text

Sivakatirswami Fri, 04 Sep 2009 20:06:36 -0700

stephen barncard wrote:

Why are you replacing the CRs with LFs? Don't the engine's Unicode
functions handle line endings?
-------------------------
Stephen Barncard
San Francisco
http://houseofcubes.com/disco.irev



2009/9/4 Sivakatirswami <ka...@hindu.org>

Aloha, Joe:

I'm not quite sure how your suggestion relates to the problem of endlines.

The unicode.txt file I have is being read OK in Pages on the Mac.
It also loads just fine in Rev, with the exception of the line breaks.

I'm not sure where uniEncode/uniDecode could be used to solve the line
break issue.

Sometimes on Kauai it rains for so many days (max count on my log: 63 days...) that we live in a "mud world".

Somehow my entry into Unicode feels not like a "baptism by fire" but a "baptism by mud".


Welcome to petroglyph land... (smile)

Stephen: the engine only handles line endings for "file:*" and not "binfile:*".

A note on the source: This is original Tamil done in MylaiSri, which maps all chars against 0-127. Muthu Neduraman of Marusu System in Malaysia, IT Tamil Master, font designer, systems engineer etc., wrote me a C++ program to transform the ASCII input out to a Unicode.txt. I really don't have any specs on what his program outputs. (I would love to take that thing and turn it into an external if I knew how... that's another story...)

But that's what I'm loading... and since he works on OS X, I thought, on a hunch, that he was surely piping CRs from the original ASCII out to char(13).


Joe Ault: OK, we are getting somewhere:
I obviously made a blooper where I was replacing
char(13) with char(10) in the filename and not the data.

Of course nothing happened... fixed it:

This now works fine!

on mouseUp
 answer file "Choose a unicode file to read in."
 if it is empty then exit mouseUp
 -- binfile: reads the file as-is, with no line-ending translation
 put "binfile:" & it into urlName
 set the useUnicode to true
 put url urlName into tTamilUnicodeText
 -- turn CR into LF in the data (not the filename) so the field shows the line breaks
 replace numtochar(13) with numtochar(10) in tTamilUnicodeText
 set the unicodeText of fld "display" to tTamilUnicodeText
end mouseUp

OK so far so good. I'm getting the same line breaks from the original text.
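
For anyone who wants the CR-to-LF swap to look only at aligned two-byte pairs, here is a more cautious variant of the same idea. It is a minimal sketch, not tested against this file. It assumes the data is UTF-16 in little-endian byte order with no byte order mark, an even number of bytes, and an engine where one char of binfile data is one byte. convertUnicodeLineEndings is a made-up name, not an engine function.

```livecode
function convertUnicodeLineEndings pData
   -- ASSUMPTION: pData is UTF-16, little-endian, no byte order mark, even length.
   -- useUnicode is left at its default (false) in this handler, so
   -- charToNum and numToChar work on single bytes.
   local tResult
   repeat with i = 1 to the number of chars of pData step 2
      if charToNum(char i of pData) is 13 and charToNum(char (i + 1) of pData) is 0 then
         -- an aligned CR pair (13, 0) becomes an LF pair (10, 0)
         put numToChar(10) & numToChar(0) after tResult
      else
         -- every other pair is copied through untouched
         put char i to i + 1 of pData after tResult
      end if
   end repeat
   return tResult
end convertUnicodeLineEndings
```

It would be called as: put convertUnicodeLineEndings(url urlName) into tTamilUnicodeText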

Richard: thanks for the arcane script from Mark, which I only saw *after* trying the above... so I did not need it.

But I will keep it as a reference, thank you.

Jim F: thanks for the tip on always encoding... since I have to move this stuff back and forth to the web server and possibly in and out of PostgreSQL, I will take your advice.

So, for now it works... Read on if you want to walk into the morass of trying to actually see what you have as decimal strings:


Ken Kojima, thanks: this now works -- well, appears to, on the surface.

on mouseUp
 set the useUnicode to true
 if the selection is empty then
   answer "No Selection" with "ok"
   exit mouseUp
 end if
 put the selection into tUnicode
 -- treat each pair of bytes as one Unicode character and list its decimal value
 repeat with i=1 to the num of chars of tUnicode step 2
    put chartonum(char i to i+1 of tUnicode) & cr after tOutput
 end repeat
 put tOutput
end mouseUp

but I get super irrational results (irrational to me at least)

Tamil lives here:

U+0B80 – U+0BFF   (2944–3071)
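
For checking a list of decimal values against that range, a small helper is enough. This is a sketch: isTamilCodePoint is a made-up name, and the bounds are just the decimal block limits quoted above.

```livecode
function isTamilCodePoint pValue
   -- 2944 is U+0B80 and 3071 is U+0BFF, the limits of the Tamil block
   return pValue >= 2944 and pValue <= 3071
end isTamilCodePoint
```

For example, isTamilCodePoint(2990) is true, while isTamilCodePoint(8205) is false.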

If I load the text *without* handling the line endings and select across the last letters of one line and the beginning of the next:

[note, the editor of this text typically puts two end-of-paragraphs (i.e. 1 blank line) between paragraphs, block style]


2990 - Valid Tamil Character
3021 - Valid Tamil Character
3374 - out of range: should be line break and does show as one in Pages

8205 - out of range: should be line break and does show as one in Pages, or in the field if I do the (13) to (10) conversion

2953 - Valid Tamil Character
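
One way to poke at the out-of-range numbers is to split each one back into the two bytes it was built from. This is a minimal sketch: splitCodeUnit is a made-up name, and it assumes the low byte came first in the data (little-endian order).

```livecode
function splitCodeUnit pValue
   -- ASSUMPTION: little-endian order, so the low byte was read first.
   -- Returns "lowByte,highByte" as decimal numbers.
   return (pValue mod 256) & comma & (pValue div 256)
end splitCodeUnit
```

With this, 2990 splits into 174,11; 3374 into 46,13; and 8205 into 13,32. Bytes 46, 13 and 32 are the ASCII period, CR and space. So it may be that the selection hands back single-byte characters mixed in with the two-byte Tamil ones, which would throw a step-2 loop out of alignment. That is a guess, not something confirmed here.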

And if I select the same thing a second time... different results!

2990
3021
12576
2570
2992
3007

OK now... if I select another section of text where there is a text/2 line breaks/text/2 line breaks/text


I get super bizarre results back:

3377
45069 # way out of range.
48907
44555
52491
8203

If I lengthen the selection, left and right, I get (almost) completely different results.

Even the same characters selected in the short selection are not output as the same values:


3015
2985
3021
3391
39437
49419
45579
51979
38155
12555
3341
2992
3007


If I put replace numtochar(13) with numtochar(10) in tTamilUnicodeText

back into my import script and then select across the end of the same line and 2 CRs and the beginning of the 4th line, I get different results again. And this time, so beyond my ken as to be a black box. I don't think I will even "go there" in trying to understand what is happening, what is wrong, and why we get something like:

There are more bizarre events occurring (selecting text causes characters to switch places!). Wish me luck in creating an online editor as a revlet!


Sivakatirswami














