Generally, I don't tend to like to 'jump the gun' on anything related to optimization lest it is not what it seems when running in the real world but...

I'm becoming increasingly confident that a recent foray of mine into yet another attempt to improve the speed of array access in LC9 might have actually born some quite tasty fruit. Pleasingly the code changes required are small, highly localized and quite possibly more than suitable for including in a 9.0.x maintenance release.

The current WIP PR can be found here:

  https://github.com/livecode/livecode/pull/6671

Amusingly this didn't even start as an attempt at optimization, but an attempt to see if I could do anything about heap fragmentation!

The patch itself does three things:

1) Minimizes the in-memory size of the core types - all 'book-keeping' records are now <=32 bytes on 64-bit systems

2) Makes use of tArray[x]...[z] type expressions much more efficient when both evaluating and mutating

  3) Makes use of integer indicies in arrays much more efficient

I've been running it on a number of benchmarks, the two most notable follow.

LARGE ARRAY CONSTRUCTION

The following simple code example was supplied in Bug 17434 by Richard as it appears to show a memory leak (its actually heap fragmentation, but that's another story):

        put 100 into tMax
        repeat with i = 1 to tMax
                repeat with j = 1 to tMax
                        repeat with k = 1 to tMax
                                put any item of "bob,carol,ted,alice" into 
sA[i][j][k]
                        end repeat
                end repeat
        end repeat

The code constructs an 100x100x100 matrix (rendered as a nested array, rather than a flat one).

Timings in various versions (on my 2018 MBP laptop) are:

  6.7.11: 1117ms
  9.0.1:  4020ms
  PR6671: 1017ms

This would seem to be a not too uncommon a pattern of code - judging by JBV's recent post on here (which was essentially doing the same thing).

We can modify the above slightly to make it more focussed on array performance (i.e. taking out the need to construct / copy a random portion of a string):

        put 100 into tMax
        repeat with i = 1 to tMax
                repeat with j = 1 to tMax
                        repeat with k = 1 to tMax
                                put "bob,carol,ted,alice" into sA[i][j][k]
                        end repeat
                end repeat
        end repeat

Timings for this are:

  6.7.11: 1055ms
  9.0.1:  3281ms
  PR6671: 497ms

ARRAY RECORD FILTERING

The following code example was derived by me for my recent LCG talk based on a real-world code supplied by Mark Talluto. It iterates over a sequence of arrays, creating a new one where each record only has a subset of the keys. The keys to keep are taken from a stringlist:

private function _BenchmarkArrayFilterRecords pRecords, pColumnNames
   local tFilteredRecords, tFilteredRecord
   repeat for each element tRecord in pRecords
      repeat for each item tColumnName in pColumnNames
         put tRecord[tColumnName] into tFilteredRecord[tColumnName]
      end repeat
      put tFilteredRecord into \
tFilteredRecords[the number of elements in tFilteredRecords + 1]
   end repeat
   return tFilteredRecords
end _BenchmarkArrayFilterRecords

Timings for 100000 iterations of the above are:

  6.7.11: 16872ms
  9.0.1:  8305ms
  PR6671: 4315ms

The following is the same as the above, except the keys to keep are taken from the keys of an array:

private function _BenchmarkArrayFilterRecordsSet pRecords, pColumnNameSet
   local tFilteredRecords, tFilteredRecord
   repeat for each element tRecord in pRecords
      repeat for each key tColumnName in pColumnNameSet
         put tRecord[tColumnName] into tFilteredRecord[tColumnName]
      end repeat
      put tFilteredRecord into \
tFilteredRecords[the number of elements in tFilteredRecords + 1]
   end repeat
   return tFilteredRecords
end _BenchmarkArrayFilterRecordsSet

Timings for 100000 iterations of the above are:

  6.7.11: 16508ms
  9.0.1:  6397ms
  PR6671: 3001ms

REAL WORLD CASE

Now, I'm always a little skeptical about using synthetic benchmarks for performance. However, both of the above are actually real-world examples. Furthermore, when running a rather large LCS project on an engine with PR6671, I got a 2x speed up - one particular input took 3mins to process, rather than 6mins (one phase of processing actually saw a 5x speed up!).

Anyway, thought this might make some potentially pleasing Friday afternoon news :)

Warmest Regards,

Mark.

--
Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Reply via email to