Klaus Major wrote:

Does somebody know if there is a "quick" way to extract a column from a tab-delimited list (in a field or a variable)? By "quick" I mean one that doesn't oblige me to cycle through all the lines of my variable, because it can be quite long.

I've tried to use an array, but the transpose function doesn't work if the number of columns is not the same as the number of lines.
Or can I do something else with an array to achieve that goal?

Use the new "split" command!
...
put "your data here" into myvar
put 2 into my_column
## The number of the column you want to extract
split myvar by column
## turns your data into an array!
put myvar[my_column] into my_column_data
...
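
For readers who haven't used the column form yet, here is a minimal sketch of what the array looks like after the split. The three-line sample data and the variable name tSample are made up for illustration, and the default column delimiter (tab) and row delimiter (return) are assumed.

```livecode
on mouseUp
  -- Hypothetical sample: three rows, two tab-separated columns
  put "a" & tab & "1" & cr & "b" & tab & "2" & cr & "c" & tab & "3" into tSample
  split tSample by column
  -- tSample is now an array keyed by column number:
  --   tSample[1] holds the first column, one value per line (a, b, c)
  --   tSample[2] holds the second column, one value per line (1, 2, 3)
  put tSample[2]
end mouseUp
```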

Well done, Klaus. I'd forgotten that the "split" command has been extended with the "column" token. Since I have a data management library that I use in a number of apps, I decided to test it against the "repeat for each line" method I'm currently using.

It seems that even with the convenience of the new form of "split", the "repeat for each line" method is still faster, by a bit more than a factor of two. Here are the results of this morning's test:

  Split: 1101 ms (490.46 lines/ms)
  Repeat: 499 ms (1082.16 lines/ms)
  Same results?: true

(MacBook Pro 2.16GHz, OS X 10.4.11)

While the relative benchmarks favor "repeat for each", in absolute terms being able to extract a column from half a million lines per second isn't bad. :)
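
The lines/ms figures follow from the run count and the size of the source field. A quick sketch of the arithmetic, using only the numbers reported above (the handler name showThroughput and the variable tTotal are hypothetical):

```livecode
on showThroughput
  set the numberformat to "0.##"
  -- 1000 runs over a 540-line field = 540000 lines processed per test
  put 1000 * 540 into tTotal
  put "Split: " & tTotal / 1101 & " lines/ms" & cr & \
      "Repeat: " & tTotal / 499 & " lines/ms"
  -- 540000 / 1101 is about 490.46; 540000 / 499 is about 1082.16
end showThroughput
```

At roughly 490 lines/ms, the split version works out to about 490,000 lines per second.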


Here's the code - please let me know if I've missed something here which may be skewing the results:

on mouseUp
  set cursor to watch
  --
  -- Number of times to run the test:
  put 1000 into n
  --
  -- "src" contains a tab-delimited list of 540 lines:
  put fld "src" into tData
  --
  -- TEST 1: split
  put the millisecs into t
  repeat n
    put GetCol1(tData, 2) into tmp1
  end repeat
  put the millisecs - t into t1
  --
  -- TEST 2: repeat for each:
  put the millisecs into t
  repeat n
    put GetCol2(tData, 2) into tmp2
  end repeat
  put the millisecs - t into t2
  --
  -- Display results:
  put tmp1 into fld "r1"
  put tmp2 into fld "r2"
  --
  -- Display times and verify that the
  -- results are the same:
  put n * the number of lines of tData into x
  set the numberformat to "0.##"
  put "Split: " & t1 & " ms (" & x/t1 & " lines/ms)" & \
      cr & "Repeat: " & t2 & " ms (" & x/t2 & " lines/ms)" & \
      cr & "Same results?: " & (tmp1 = tmp2)
end mouseUp

--
--  TEST 1: split
--
function GetCol1 pData, pCol
  split pData by column
  return pData[pCol]
end GetCol1

--
-- TEST 2: repeat for each
--
function GetCol2 pData, pCol
  put empty into tVal
  set the itemdel to tab
  repeat for each line tLine in pData
    put item pCol of tLine &cr after tVal
  end repeat
  delete last char of tVal
  return tVal
end GetCol2
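
GetCol1 depends on the column delimiter being tab, which is the default. For data that uses a different separator, a variant along these lines might work. This is only a sketch: the name GetColDelim and the pDelim parameter are hypothetical, and it assumes the columnDelimiter property is available in your version of the engine.

```livecode
-- Hypothetical variant of GetCol1 that takes the delimiter as a parameter
function GetColDelim pData, pCol, pDelim
  -- Assumes the columnDelimiter property exists in your engine version
  set the columnDelimiter to pDelim
  split pData by column
  return pData[pCol]
end GetColDelim
```

For comma-separated data, a call such as GetColDelim(tData, 2, comma) would then return the second column.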



My test stack with a 540-line source field is at:

   go url "http://fourthworldlabs.com/getcol_test.rev"


--
 Richard Gaskin
 Managing Editor, revJournal
 _______________________________________________________
 Rev tips, tutorials and more: http://www.revJournal.com