Kevin Miller wrote:

> On 05/08/2009 21:02, "Richard Gaskin" <ambassador at fourthworld.com> wrote:
>
>> Excellent sleuthing, Trevor.  Confirmed: with that change I'm getting
>> the same results.  Who would have thought there could be so much
>> overhead moving a custom property array into a variable array?
>
> Bear in mind that when retrieving a custom property, the engine has to look
> up whether there is a getProp handler for it. Locking messages should give
> you an improvement.

In my tests here the difference was measurable but minor, but then again I have no getProp handlers in the message path for these tests.
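
For anyone who wants to measure that themselves, here's a rough sketch of the sort of timing test involved (cMyDataArray is just a placeholder property name):

  on testPropLoad
     local tStart, tData
     -- with messages unlocked the engine checks for a getProp handler
     put the milliseconds into tStart
     put the cMyDataArray of this stack into tData
     put "unlocked:" && the milliseconds - tStart & cr into msg
     -- locking messages skips the getProp lookup
     lock messages
     put the milliseconds into tStart
     put the cMyDataArray of this stack into tData
     unlock messages
     put "locked:" && the milliseconds - tStart & cr after msg
  end testPropLoad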


> The main difference between arrays and chunks is that arrays will continue
> to access huge data sets in linear time, whereas chunks (depending on
> exactly what you're doing) will slow down as the data set gets larger.

Very true, though the difference becomes most significant only when the data is large enough that it may be a good time to consider SQLite over stack files for storage anyway. :)
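
For anyone new to the distinction, the contrast looks roughly like this, with tData holding return-delimited text and tArray holding the same records keyed by number:

  -- indexed chunk access: "line i of tData" counts return
  -- characters from the top of the value on every call, so
  -- random access slows down as tData grows
  repeat with i = 1 to the number of lines of tData
     get line i of tData
  end repeat

  -- array access: each lookup hashes straight to the element,
  -- regardless of how large the array is
  repeat with i = 1 to the number of lines of tData
     get tArray[i]
  end repeat

(For purely sequential reads, "repeat for each line" avoids the rescan; it's random access where chunks fall behind.)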


In my tests I ran three sets of data:

small: 5000 records, with 50 items in each record, with 50 chars in each item

medium: 10,000 records with 100 items in each record, with 100 chars in each item

large: 10,000 records with 100 items in each record, with 200 chars in each item

I put both the delimited and array data into the same stack for each, giving me a size for the small stack of about 27MB, medium was about 204MB, and large was over 408MB.
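
For anyone wanting to reproduce the setup, a simplified generator along these lines will produce both versions (property names here are just examples, and real test data would want varied content):

  on buildTestData pNumRecords, pNumItems, pItemLength
     local tItem, tRecord, tData, tArray
     -- build one item, then one tab-delimited record
     repeat pItemLength times
        put "x" after tItem
     end repeat
     repeat pNumItems times
        put tItem & tab after tRecord
     end repeat
     delete last char of tRecord -- trailing tab
     -- build the delimited version and the array version side by side
     repeat with i = 1 to pNumRecords
        put tRecord & cr after tData
        put tRecord into tArray[i]
     end repeat
     delete last char of tData -- trailing return
     -- store both in the same stack to compare file sizes
     set the cChunkData of this stack to tData
     set the cArrayData of this stack to tArray
  end buildTestData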

The small data set performed well with both methods, as did the medium, though the medium stack took some time to load (confirming my hunch that 100MB of data is probably a good boundary for considering SQLite over stack files; fortunately I'm using this only for document files, so it's unlikely I'll ever reach even half that).

The large data set could be created and saved, but the resulting stack could not be opened: there was no corruption warning, it just wouldn't open. Have I discovered an undocumented limitation?

The results were as we would expect: as the data grows in size, performance of the array search method scales roughly linearly, while the chunk method degrades much faster than linearly with data length, since each indexed chunk access must rescan the data.

Even so, chunk expressions consistently outperformed arrays in tests that included loading the data from properties.

When I altered the test to preload the data into variables before testing, the difference was just under an order of magnitude in favor of arrays.
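
I won't paste the whole test script here, but the two search styles being compared look, in spirit, something like this pair (simplified, not my exact code):

  -- chunk version: "line i of pData" rescans from the top on
  -- every pass, which is where the nonlinear slowdown comes from
  function findInChunks pData, pSearch
     local tResults
     repeat with i = 1 to the number of lines of pData
        if pSearch is in line i of pData then
           put line i of pData & cr after tResults
        end if
     end repeat
     return tResults
  end findInChunks

  -- array version: the same test against each element
  function findInArray @pArray, pSearch
     local tResults
     repeat for each key tKey in pArray
        if pSearch is in pArray[tKey] then
           put pArray[tKey] & cr after tResults
        end if
     end repeat
     return tResults
  end findInArray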

While this means changing my setup to preload data when documents are first opened, that one-time hit is more than compensated for by ongoing performance gains in nearly all other operations.
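
In practice that's just a one-time copy in the open handler, something like this (cDocData being a placeholder property name):

  local sDocData -- script-local holding the preloaded array

  on openStack
     -- one-time hit: copy the stored array into the script-local
     -- so every later operation works on the variable
     lock messages
     put the cDocData of this stack into sDocData
     unlock messages
     pass openStack
  end openStack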


So I'm strongly favoring arrays for this sort of thing, but it would be nice to have three enhancements in the engine to make it even better:

1. faster load time
-------------------
Can the operation that moves data from array custom props into array variables be optimized to reduce the surprising amount of time it takes? Grabbing a non-array property is nearly as fast as accessing a global; it'd be nice if accessing array props were at least a bit closer to that stellar performance.
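
For the curious, the gap is easy to see with a crude harness like this (placeholder property names; assumes the props were set beforehand):

  on compareAccessSpeed
     global gTestData
     local tStart, tData
     put the milliseconds into tStart
     repeat 1000 times -- global access: the baseline
        put gTestData into tData
     end repeat
     put "global:" && the milliseconds - tStart & cr into msg
     put the milliseconds into tStart
     repeat 1000 times -- non-array prop: nearly as fast
        put the cSimpleProp of this stack into tData
     end repeat
     put "simple prop:" && the milliseconds - tStart & cr after msg
     put the milliseconds into tStart
     repeat 1000 times -- array prop: the surprisingly slow one
        put the cArrayProp of this stack into tData
     end repeat
     put "array prop:" && the milliseconds - tStart & cr after msg
  end compareAccessSpeed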

2. operate directly on properties
---------------------------------
It would be very handy if we could use the same array syntax to work with properties that we can with variables. Before multi-dimensional arrays there was an enjoyable, learnable, and efficient parity between the syntax used for arrays in vars and in props, and I miss that parity when working with nested arrays.
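
To illustrate what I mean: one level of array syntax already works on props, but nested access means copying the whole array out. The last line below is the wished-for syntax, not anything the engine currently supports:

  -- this works today, one dimension deep:
  set the cContacts[tID] of this stack to tRecord
  get the cContacts[tID] of this stack

  -- nested elements currently require copying the whole array out:
  put the cContacts of this stack into tContacts
  put tContacts[tID]["Telephone"] into tPhone

  -- the wish (hypothetical syntax, not in the engine):
  get the cContacts[tID]["Telephone"] of this stack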


3. reduce data redundancy in keys
---------------------------------
Given that Rev's arrays are associative, every element is a name-value pair, so in addition to storing the value the engine must also store the name as its key. That's necessary because, for all the engine knows, every array may contain unique keys; but when building nested arrays in which the inner arrays are all uniform, the replicated key names just take up space.

For example, the Congress contact info I used in my original example is only 530 lines with less than 1/2k per line, so tucking that into a property in a new stack gives me a stack file of about 68k.

But when I make an array version of that data, using meaningful names for the elements (e.g., "Name", "Address", "Telephone", etc.), and store it in a property in another stack, the stack size goes up to 132k - nearly double.
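
A toy example of the shape (made-up values) shows where the duplication comes from:

  -- every inner array carries its own copy of the same three keys
  put "Jane Doe" into tContacts[1]["Name"]
  put "123 Main St" into tContacts[1]["Address"]
  put "555-0100" into tContacts[1]["Telephone"]
  put "John Roe" into tContacts[2]["Name"]
  put "456 Oak Ave" into tContacts[2]["Address"]
  put "555-0101" into tContacts[2]["Telephone"]
  -- ...times 530 records: "Name", "Address", and "Telephone"
  -- get stored once per record in the stack file
  set the cContacts of this stack to tContacts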

So I was daydreaming: what if we could tell the engine that a given array will be populated only with sub-arrays whose keys are always the same?

Imagine being able to define something like a struct: a key definition that could be assigned to a parent array, so the engine could store the data more efficiently, tucking only the values into its nifty hash without having to replicate the field names for every element.

I would imagine such a struct-like thing would have a great many uses, in addition to reducing memory and storage requirements for uniform arrays as elements of a parent array.
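
In purely imaginary syntax, the idea might read something like:

  -- not real syntax, just the daydream written down:
  define struct "ContactRecord" with keys "Name", "Address", "Telephone"
  set the elementStruct of tContacts to "ContactRecord"
  -- from here the engine would store only the values per record,
  -- resolving key names through the shared definition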

Doable?  By Tuesday, perhaps? :)

--
 Richard Gaskin
 Fourth World
 Revolution training and consulting: http://www.fourthworld.com
 Webzine for Rev developers: http://www.revjournal.com