Paul Looney wrote:
I have nothing to add directly to the chunk-vs-array discussion (Trevor's reply was very good), but I have often found it helpful to speed up compound selections by breaking them into individual ones.

For instance, suppose you have a large database of names and sexes and you want to select every female named "Jan" ("Jan" could be male or female). Select all of the Jans first (this will run much faster than the compound selection), then select all of the females from the result of the first selection (this will run faster because it is searching only the Jans - a very small list).
This double selection will run faster than a single compound selection.

Obviously this requires a known data set where one filter will eliminate a lot of records (selecting "female" first, then selecting "Jan", would be much slower in our example because, presumably, half of the list is female while only a small portion is named Jan). On many lists this can create a much bigger speed difference than the chunk-vs-array variance you noted.
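
To make the idea concrete, here is a minimal sketch in Transcript, assuming hypothetical tab-delimited lines of the form "name <tab> sex" in a variable tData:

  set the itemDelimiter to tab

  -- pass 1: test the rare value first; "Jan" matches only a small subset
  put empty into tJans
  repeat for each line tLine in tData
    if item 1 of tLine is "Jan" then put tLine & return after tJans
  end repeat

  -- pass 2: scans only the short list of Jans, so the second test is cheap
  put empty into tFemaleJans
  repeat for each line tLine in tJans
    if item 2 of tLine is "female" then put tLine & return after tFemaleJans
  end repeat

Run in the other order, pass 2 would have to scan roughly half the records, which is the caveat above.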

One of the tough challenges with this sort of benchmarking is that different methods will favor different test cases.

But with delimited rows and columns, I haven't found a way to make a two-pass search run faster than one pass, except in very specialized cases like the one you noted.
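
For comparison, the one-pass version I'm benchmarking against looks something like this (same hypothetical layout as the sketch above):

  set the itemDelimiter to tab
  put empty into tHits
  repeat for each line tLine in tData
    -- both tests happen in a single traversal of the data
    if item 1 of tLine is "Jan" and item 2 of tLine is "female" then
      put tLine & return after tHits
    end if
  end repeat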

There's a temptation to use the filter command for the first pass, but filter is only faster when testing the first few items; filtering on the 10th item is much slower, and attempting to test the 50th item in a sample data set caused Rev to hang. RegEx is a harsh mistress.
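
For anyone who wants to reproduce that, the experiment looked roughly like this. filter matches a wildcard pattern against each whole line, so reaching a later item means padding the pattern with one wildcard-and-delimiter pair per preceding item (a sketch for tab-delimited data; note that "*" can also swallow delimiters, so the patterns are approximate):

  -- fast: the target value sits in the first item
  filter tData with "Jan" & tab & "*"

  -- much slower: skip at least nine items to reach the 10th
  -- (assumes at least one item follows it)
  put empty into tPattern
  repeat 9 times
    put "*" & tab after tPattern
  end repeat
  filter tData with tPattern & "Jan" & tab & "*"

All those leading wildcards are presumably what invite the backtracking blowup once the pattern is handled as a regular expression internally.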

In my case, I don't often know in advance which item will be searched. The queries I'm running usually come from a Search dialog in which the user can specify the criteria. I could make the search function smart enough to special-case certain types of searches, using a two-pass method with the filter command for the first pass where practical, but the overhead of analyzing both the query and the data to make such determinations may cancel out the benefits, especially since my continued testing is increasingly nudging me toward multi-dimensional arrays anyway.

Even with the data bloat and the surprising overhead of moving arrays in and out of storage, with a little extra work to deal with those issues the performance of arrays seems unbeatable across the broadest range of use cases I've run thus far.
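
For the curious, the array side of my tests goes along these lines: split the data once into an array keyed by the search field, then look up by key instead of scanning (a sketch; note that split keeps only one row per key, so real data with duplicate names would need a compound key or nested arrays):

  -- build once: each line becomes an element keyed by its first item
  put tData into tArray
  split tArray by return and tab

  -- lookup is a single key access rather than a scan of every line
  put tArray["Jan"] into tJanRecord

  -- the storage round trip: recombine before writing to a field or
  -- file - this is the "moving arrays in and out of storage" overhead
  combine tArray by return and tab
  put tArray into tData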

--
 Richard Gaskin
 Fourth World
 Revolution training and consulting: http://www.fourthworld.com
 Webzine for Rev developers: http://www.revjournal.com