Very helpful info - thanks!

I'll see if I can dig up my old experiment code and submit a tidy version with an enhancement request.

My hope was that it might be as simple as "Aha, yes, a bigger buffer size!", but few things in life are that simple. :)

--
 Richard Gaskin
 Fourth World Systems
 Software Design and Development for the Desktop, Mobile, and the Web
 ____________________________________________________________________
 ambassa...@fourthworld.com                http://www.FourthWorld.com


Mark Waddingham wrote:
On 2016-03-22 15:24, Richard Gaskin wrote:
What is the size of the read buffer used when reading until <char>?

I'm assuming it isn't reading a single char per disk access, probably
at least using the file system's block size, no?

Well, the engine will memory map files if it can (if there is available
address space), so smaller (sub-1GB) files are essentially fully
buffered. For larger files, the engine uses the stdio FILE abstraction
and so gets buffering from that.

Given that the engine is probably already doing pretty much the same
thing, would it make sense to consider a readBufferSize global
property which would govern the size of the buffer the engine uses
when executing "read...until <char>"?

Perhaps - the read until routines could potentially be made more
efficient. For some streams, though, buffering is inappropriate unless
explicitly requested (which isn't an option at the moment). For example,
with serial port streams and process streams you don't want to read any
more than you absolutely need, as the read can block if you ask for more
data than the other end has produced. At the moment the engine favours
the 'do not read any more than absolutely necessary' approach, because
the serial, file, and process streams share the same processing code.
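
(A concrete example of the safe pattern for a process stream, as a
rough sketch with a made-up command name:

   on watchTool
      open process "some-tool" for read
      repeat forever
         read from process "some-tool" until linefeed
         if it is empty and the result is "eof" then exit repeat
         -- handle "it" (one line of the tool's output) here
      end repeat
      close process "some-tool"
   end watchTool

If the engine speculatively read a large block here instead, the read
could stall waiting for data the tool hasn't produced yet.)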

In my experiments I was surprised to find that larger buffers (>10MB)
were slower than "read...until <char>", but the sweet spot seemed to
be around 128KB. Presumably this has to do with the overhead of
allocating contiguous memory, and if you have any insights on that it
would be interesting to learn more.
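
The pattern I was testing was along these lines (a simplified sketch of
my experiment; processLine is just an illustrative stand-in for the
per-line work):

   on processBigFile pFile
      local tBuffer
      open file pFile for read
      repeat forever
         read from file pFile for 131072 chars -- 128KB chunks
         put it after tBuffer
         if the result is "eof" then exit repeat
         -- handle each complete line; the last line may be partial,
         -- so keep it in the buffer for the next pass
         repeat for each line tLine in line 1 to -2 of tBuffer
            processLine tLine
         end repeat
         put line -1 of tBuffer into tBuffer
      end repeat
      repeat for each line tLine in tBuffer -- whatever is left at eof
         processLine tLine
      end repeat
      close file pFile
   end processBigFile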

My original reasoning on this was a 'working set' argument. Modern CPUs
rely heavily on several levels of memory cache, with access getting more
expensive the further a cache is from the processor. If you use a
reasonably sized buffer to implement processing in a streaming fashion,
then the working set is essentially just that buffer, which means less
movement of blocks of memory between physical memory and the processor's
levels of cache.

However, having chatted to Fraser, he pointed out that Linux tends to
have a built-in file read-ahead of 64KB-128KB. This means that the OS
will proactively prefetch the next 64-128KB of data after it has
finished fetching the chunk you asked for. The result is that data is
being read from disk by the OS whilst your processing code is running,
meaning that things get done quicker. (In contrast, if you use a 10MB
buffer then you have to wait for 10MB to be read before you can do
anything with it, and then wait again each time the buffer is
exhausted.)

Pretty much any program will read big files in chunks, and if LC can
do so optimally with all the grace and ease of "read...until <char>",
that makes for one more strong set of use cases where choosing LC isn't
a tradeoff but an unquestionable advantage.

If you have the time to submit a report in the QC with a sample stack
that measures the time of a simple 'read until cr' type loop over some
data and compares it to the more efficient approach you found, then it
is something we (or someone else) can dig into at some point to see
what can be done to improve its performance.
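
Something as simple as this for the 'read until cr' side, plus a
matching handler timing your chunked version, would be plenty. A rough
sketch:

   on timeReadUntil pFile
      local tStart
      put the milliseconds into tStart
      open file pFile for read
      repeat forever
         read from file pFile until cr
         if it is empty and the result is "eof" then exit repeat
      end repeat
      close file pFile
      answer "read until cr:" && (the milliseconds - tStart) && "ms"
   end timeReadUntil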

As I said initially, for smaller files I'd be surprised if we could do
that much, since those files will be memory mapped; however, it may be
that there are improvements to be made for larger (non-memory-mappable)
files.

Warmest Regards,

Mark.

