Thanks, Mark, for pointing me (as ever) in the right direction: away from the red 
herring and towards the hidden inner loop entailed in evaluating

line k of fff

A bit embarrassing, because I was party to the discussion some time ago about 
the slowness of lineOffset when working with Unicode text.

And thanks for the neat array trick, which I wouldn’t have thought of.

There is indeed just one line in the text that contains a Unicode character, 
and only a single one. Without that character the variable fff would evidently 
be stored internally as an ASCII (or native?) string; with it, the whole string 
is UTF-16 and we hit the lineOffset problem.

But I am still bemused. 

Is it not the case that the processing time for looping over the lines of the 
text, getting the k-th line on each iteration (for some arbitrary k), is going 
to be

Order(N^2) * C * T

where

N = the number of lines in the text

C = the (average) number of codepoints in each line

T = the (average) time taken to check each codepoint for a return character
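Spelling out where the N^2 comes from (a worked sketch, assuming each evaluation 
of "line k of fff" rescans from the start of the text with no remembered 
position): fetching line k means scanning through lines 1 to k, so the whole 
loop costs roughly

```latex
\text{total time} \;\approx\; \sum_{k=1}^{N} k\,C\,T
  \;=\; \frac{N(N+1)}{2}\,C\,T
  \;=\; O\!\left(N^{2} \cdot C \cdot T\right)
```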

Now N and C are the same whether the text is ASCII or Unicode, so any 
difference between the two must come from T.
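To check the counting part of this argument, here is a rough model in Python 
(not LiveCode; a hypothetical sketch that assumes each lookup of line k rescans 
from the start of the text with no cached offsets, and treats return as a 
linefeed). It counts the codepoints examined by the loop for an all-ASCII text 
and for the same text with one accented character. The helper names are 
invented for illustration.

```python
from typing import Optional


def make_text(n_lines: int, width: int, accent_line: Optional[int] = None) -> str:
    """Build n_lines lines of `width` codepoints, each ending in a newline.

    If accent_line (1-based) is given, that line's first character becomes
    a single non-ASCII codepoint, mimicking the one Unicode character.
    """
    lines = []
    for k in range(1, n_lines + 1):
        line = "x" * width
        if k == accent_line:
            line = "\u00e9" + line[1:]  # one accented codepoint replaces one 'x'
        lines.append(line + "\n")
    return "".join(lines)


def scan_cost_line_k(text: str, k: int) -> int:
    """Codepoints examined to reach the end of line k (1-based),
    scanning from the start and testing each one for a return."""
    returns_seen = 0
    scanned = 0
    for ch in text:
        scanned += 1
        if ch == "\n":
            returns_seen += 1
            if returns_seen == k:
                break
    return scanned


def total_scan_cost(text: str) -> int:
    """Codepoints examined by the loop 'for k = 1..N: get line k'."""
    n_lines = text.count("\n")
    return sum(scan_cost_line_k(text, k) for k in range(1, n_lines + 1))


if __name__ == "__main__":
    ascii_text = make_text(100, 40)
    unicode_text = make_text(100, 40, accent_line=50)
    print(total_scan_cost(ascii_text), total_scan_cost(unicode_text))  # 207050 207050
    print(total_scan_cost(make_text(200, 40)))  # 824100
```

Both counts come out the same, (C+1) * N(N+1)/2 with the return characters 
included, and doubling N roughly quadruples the count. So the model accounts 
only for the N^2 shape; a factor of about 100 between the two cases would have 
to come from T, i.e. from how the engine handles a UTF-16 string, which the 
model says nothing about.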

Test 1
If I get rid of the red herring by replacing the matchChunk call with a simple

put line k of fff into x; put true into found

(as if matchChunk always found a match on the first line tested), so that I am 
timing just the finding of line endings:

Time for plain ASCII:  0.008 seconds
Time for Unicode:      0.84 seconds

This would appear to mean that processing Unicode codepoints for the 
apparently simple task of checking for return takes about 100 times as long 
(0.84 / 0.008 = 105) as processing ASCII. That seems excessive, to put it mildly!

Test 2
With the original code using matchChunk, which of course has its own internal 
loop over code points, multiply by another 8 or so (it only searches the first 
few characters). It also will not always return true, so more lines must be 
processed. But again, these are the same multipliers whether the text is ASCII 
or Unicode.

Plain ASCII takes  0.07 seconds
Unicode takes     19.9 seconds

That is a multiplier of nearly 300 (19.9 / 0.07 ≈ 284). I can easily believe 
that matchChunk takes 3 times as long to process Unicode as ASCII; that is the 
sort of factor I would have expected in Test 1.

OK Mark, hit me again, I am a glutton for punishment: what is wrong with this 
analysis?

Neville Smythe




_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
