Kevin Miller wrote:
> On 05/08/2009 21:02, "Richard Gaskin" <ambassador at fourthworld.com> wrote:
>> Excellent sleuthing, Trevor. Confirmed: with that change I'm getting
>> the same results. Who would have thought there could be so much
>> overhead moving a custom property array into a variable array?
> Bear in mind that when retrieving a custom property, the engine has to look
> up whether there is a getProp handler for it. Locking messages should give
> you an improvement.
In my tests here the difference was measurable but minor, but then again
I have no getProp handlers in the message path for these tests.
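For anyone who wants to try the same thing, the pattern is simply to
bracket the property read with lock/unlock messages so the engine skips
the getProp lookup. A minimal sketch (the handler and property names
here are mine, not from the actual test code):

  on loadDocData
    local tData
    lock messages  -- skip the getProp lookup for this read
    put the uDocData of this stack into tData
    unlock messages
    -- ...work with tData here...
  end loadDocData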
> The main difference between arrays and chunks is that arrays will continue
> to access huge data sets in linear time, whereas chunks (depending on
> exactly what you're doing) will slow down as the data set gets larger.
Very true, though the difference becomes most significant only when the
data is large enough that it may be a good time to consider SQLite over
stack files for storage anyway. :)
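To make the distinction concrete, here's the shape of the two access
patterns (variable names hypothetical):

  -- keyed array lookup: roughly constant cost per access,
  -- no matter how many records the array holds
  put tArray[tRecordNum] into tRecord

  -- chunk access: the engine must count delimiters from the top
  -- each time, so each access gets slower as the data grows
  put line tRecordNum of tData into tRecord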
In my tests I ran three sets of data:

  small:  5,000 records, 50 items per record, 50 chars per item
  medium: 10,000 records, 100 items per record, 100 chars per item
  large:  10,000 records, 100 items per record, 200 chars per item
I put both the delimited and array data into the same stack for each
set, giving stack file sizes of about 27MB for small, about 204MB for
medium, and over 408MB for large.
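For reference, a sketch of how such a data set can be generated; the
handler and property names are made up, and this is not the exact test
code:

  on buildTestData pRecords, pItems, pChars
    local tItem, tRecord, tData, tArray, i
    repeat pChars times            -- one item: pChars characters
      put "x" after tItem
    end repeat
    repeat pItems times            -- one record: pItems items
      put tItem & tab after tRecord
    end repeat
    delete last char of tRecord
    repeat with i = 1 to pRecords
      put tRecord & cr after tData    -- delimited version
      put tRecord into tArray[i]      -- array version, keyed by number
    end repeat
    delete last char of tData
    set the uDelimited of this stack to tData
    set the uArray of this stack to tArray
  end buildTestData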
The small data set performed well with both methods, as did the medium,
although the medium stack took some time to load (confirming my hunch
that 100MB of data is probably a good boundary for considering SQLite
over stack files; fortunately I'm using this just for document files,
so it's unlikely I'll ever reach even half that).
The large data set could be created and saved, but the resulting stack
could not be opened; no corruption warning, just wouldn't open. Have I
discovered an undocumented limitation?
The results were as we would expect: as the data grows in size,
performance of the array search method scales linearly, while
performance of chunk expressions degrades in logarithmic proportion to
data length.
Even so, chunk expressions consistently outperformed arrays in tests
that included loading the data from properties.
When I altered the test to preload the data into variables before
timing, the advantage reversed: arrays came out ahead by just under an
order of magnitude.
While this means changing my setup to preload data when documents are
first opened, that one-time hit is more than compensated for by the
ongoing performance gains in nearly all other operations.
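In practice the preload is straightforward: copy the property into a
script-local once when the document opens, and let every later lookup
hit the variable instead. A sketch with hypothetical names:

  local sDocData   -- script-local cache of the document array

  on openDocument pStackName
    lock messages  -- take the one-time hit when the document opens
    put the uDocData of stack pStackName into sDocData
    unlock messages
  end openDocument

  function getRecord pKey
    return sDocData[pKey]  -- later lookups never touch the property
  end getRecord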
So I'm strongly favoring arrays for this sort of thing, but it would be
nice to have three enhancements in the engine to make it even better:
1. faster load time
-------------------
Can the operation which moves data from array custom props into array
variables be optimized to reduce the surprising amount of time it takes?
Grabbing a non-array property is nearly as fast as accessing a global;
it'd be nice if accessing array props were at least a bit closer to that
stellar performance.
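To be clear, the operation in question is nothing exotic, just the
property-to-variable copy itself (property name hypothetical):

  -- this single line is where the surprising load time goes:
  put the uDocData of this stack into tArray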
2. operate directly on properties
---------------------------------
It would be very handy if we could use the same array syntax to work
with properties as we can with variables. Before multi-dimensional
arrays there was an enjoyable, learnable, and efficient parity in the
syntax used for arrays in both vars and props, and I miss that when
working with nested arrays.
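A sketch of what I mean, with the wished-for form flagged as imaginary:

  -- works today with a variable:
  put "555-0100" into tContacts["Smith"]["Telephone"]

  -- what we have to do today with a property: copy out, alter, copy back
  put the uContacts of this stack into tContacts
  put "555-0100" into tContacts["Smith"]["Telephone"]
  set the uContacts of this stack to tContacts

  -- the wished-for one-liner (imaginary syntax, not in the engine):
  -- set the uContacts["Smith"]["Telephone"] of this stack to "555-0100"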
3. reduce data redundancy in keys
---------------------------------
Given that Rev's arrays are associative, every element is a name-value
pair, so in addition to storing the value it also needs to store the
name as its key. This is necessary because, for all the engine knows,
every array may contain unique keys; but when making nested arrays in
which the inner arrays are all uniform, the replicated key names just
take up space.
For example, the Congress contact info I used in my original example is
only 530 lines with less than 1/2k per line, so tucking that into a
property in a new stack gives a stack file size of about 68k.
But when I make an array version of that data and store it in a
property in another stack, using meaningful names for the elements
(e.g., "Name", "Address", "Telephone", etc.) brings that stack size up
to 132k, more than double.
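The replication is easy to see in the code that builds such an array:
every one of the 530 inner arrays stores its own copy of the key
strings (variable names hypothetical, fields abbreviated):

  repeat with i = 1 to the number of lines of tRawData
    put line i of tRawData into tRec
    put item 1 of tRec into tCongress[i]["Name"]
    put item 2 of tRec into tCongress[i]["Address"]
    put item 3 of tRec into tCongress[i]["Telephone"]
    -- ...and so on for the remaining fields
  end repeat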
So I was daydreaming: what if we could tell the engine that a given
array is to be populated only with sub-arrays in which all keys will
always be the same?
Imagine being able to define something like a struct, a definition of an
array which could be assigned to another array so that parent array
would be able to store the data more efficiently, tucking only the data
into its nifty hash without having to replicate the field names for
every element.
I would imagine such a struct-like thing would have a great many uses,
in addition to reducing memory and storage requirements for uniform
arrays as elements of a parent array.
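Purely to make the daydream concrete, something along these lines
(entirely imaginary syntax; nothing like it exists in the engine):

  -- define the key set once...
  -- struct "contactRecord" with keys "Name", "Address", "Telephone"
  -- ...tell the parent array its elements all use it...
  -- set the elementStruct of tCongress to "contactRecord"
  -- ...and the engine stores the keys once, the values per element.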
Doable? By Tuesday, perhaps? :)
--
Richard Gaskin
Fourth World
Revolution training and consulting: http://www.fourthworld.com
Webzine for Rev developers: http://www.revjournal.com