Hugh,

The condition you've chosen for deciding whether to delete the line is whether or not the line is empty.
So in method 2, replacing those lines by "" has no effect on the data.

That is, I think, an inadequate benchmark. The primary cost you would encounter in the real case with method 2 is the data copying caused by emptying the line (i.e. all subsequent content in the variable has to be copied down into place). So at the very least it gives very optimistic times for method 2.

I tried the same code (more or less - added the "filter" for method 2 to get the same results) on a data set which contains real lines of data to be deleted.

I also added method4, which tries to get the best of both worlds. It restricts the additional memory usage (by building up a second variable, but removing sections of the input variable at the same time), and also does relatively few deletions (and hence few data copies).

(full script included below)

LC7DP10, MacOSX 10.8.5 Macbook Pro

Source data:
10,000 lines of 100 chars, alternating between two values (non-random so it would be completely repeatable)
  5000 lines begin with char "a", other 5000 being with "b"

Condition for deletion
  First char of the line is "a"
   (hence 5000 lines are deleted

Method 1:
 not timed - was taking too long :-)

Method 2:
  11.51 secs

Method 3:
  0.04 secs

Method 4:
   0.09 secs

Changing the conditional test so that no lines are deleted, the times are all < 0.1 seconds


Conclusion:
  if memory is not an issue, method 3 is best
  if memory is an issue, method 4 is best

-- Alex.


on mouseUp
   local t1, t2, t3, t4
   --   put method1() into t1
   put method2() into t2
   put method3() into t3
   put method4() into t4

put the number of lines in t1 && the number of lines in t2 && the number of lines in t3 && the number of lines in t4 after fld 1
end mouseup

function make_data
   local L, L1, t
   put "a" into L
   put "b" into item 100 of L
   put "b" into L1
   put "a" into item 100 of L1
   repeat 5000 times
      put L & CR & L1 & CR after t
   end repeat
   return t
end make_data

function method1
   local tVar, tStart
   set the cursor to watch
   put make_data() into tVar
   put the long seconds into tStart
   repeat with x=the number of lines in tVar down to 1
      if char 1 of line x of tVar="a" then
         delete line x of tVar
      end if
   end repeat
   put the long seconds - tStart & CR after fld 1
   return tVar
end method1

function method2
   local tVar, L, tStart, x
   set the cursor to watch
   put make_data() into tVar
   put the long seconds into tStart
   repeat for each line L in tVar
      add 1 to x
      --      put x && L &CR after msg
      if char 1 of L="a" then
         put "" into line x of tVar
      end if
   end repeat
   filter tVar without empty
   put the long seconds - tStart & CR after fld 1
   return tVar
end  method2

function method3
   local tVar, tStart, L, tdout
   set the cursor to watch
   put make_data() into tVar
   put the long seconds into tStart
   repeat for each line L in tVar
      if char 1 of L<>"a" then
         put L &cr after tdout
      end if
   end repeat
   if last char of tdout=cr then delete last char of tdout
   put the long seconds - tStart & CR after fld 1
   return tdout
end method3

function method4
   constant K = 1000 -- probably should be a bigger number

   local tVar, tStart, L, tdout, x
   set the cursor to watch
   put make_data() into tVar
   put the long seconds into tStart

   repeat forever
      if the number of lines in tVar = 0 then exit repeat
      put 0 into x
      repeat for each line L in tVar
         if char 1 of L<>"a" then
            put L &cr after tdout
         end if
         add 1 to x
         if x >= K then exit repeat
      end repeat
      delete line 1 to K of tVar
   end repeat
   if last char of tdout=cr then delete last char of tdout
   put the long seconds - tStart & CR after fld 1
   return tdout
end method4



On 31/08/2014 10:53, FlexibleLearning.com wrote:
Some benchtesting...

Setup:
     LC7DP10, Windows 7

Source data:
     10,000 lines of random data
     100 chars per line
     3,346 empty lines

Task:
     Strip lines where a given condition is met.

Results:
     Method 1
     Operating on a single variable, 'repeat with' + delete line
     25.586 secs

Method 2
     Operating on a single variable, 'repeat for each' + filter with empty
     7.755 secs

Method 3
     Using a second variable for output, 'repeat for each' + second variable
     0.136 secs

Conclusions:
     If memory is an issue, then Method 2 is best
     If memory is not an issue, then Method 3 is best

Scripts applied:
#1:
on mouseUp
    set the cursor to watch
    put the long seconds into tStart
    put fld "Data" into tVar
    repeat with x=the number of lines in tVar down to 1
       if line x of tVar="" then
          delete line x of tVar
       end if
    end repeat
    put tVar into fld "output"
    put the long seconds - tStart into fld "timer1"
end mouseUp

#2:
on mouseUp
    set the cursor to watch
    put the long seconds into tStart
    put fld "data" into tVar
    repeat for each line L in tVar
       add 1 to x
       if L="" then
          put "" into line x of tVar
       end if
    end repeat
    put tVar into fld "output"
    put the long seconds - tStart into fld "timer2"
end mouseUp

#3:
on mouseUp
    set the cursor to watch
    put fld "Data" into tVar
    put the long seconds into tStart
    repeat for each line L in tVar
       if L<>"" then
          put L &cr after stdout
       end if
    end repeat
    if last char of stdout=cr then delete last char of stdout
    put stdout into fld "output"
    put the long seconds - tStart into fld "timer3"
end mouseUp


On 30/08/2014 08:45, FlexibleLearning.com wrote:
Peter Haworth <p...@lcsql.com> wrote

There's another situation where I use repeat with even though it's a
little
slower than repeat for and I also alter the contents of the data I'm
repeating through without any problems.

repeat with x=the number of lines in tVar down to to 1
     if <data condition on  line x of tVar> then
        delete line x of tVar
     end if
end repeat

This is an insightful observation. Nice one, Pete!

My stock method (and presumably the method you allude to above) is...

repeat for each line L in tVar
      add 1 to x
      if <data condition on  L> then put "" into line x of tVar
end repeat
filter tVar without empty

Both methods operate on a single data set and avoid putting the output
into a second variable which, for large datasets, involve an unnecessary
memory overhead..

Hugh Senior
FLCo


_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Reply via email to