On 2016-03-22 15:24, Richard Gaskin wrote:
> What is the size of the read buffer used when reading until <char>?
>
> I'm assuming it isn't reading a single char per disk access, probably
> at least using the file system's block size, no?

Well, the engine will memory map files if it can (i.e. if there is available address space), so smaller (sub-1GB) files are essentially all buffered. For larger files, the engine uses the stdio FILE abstraction, so it gets buffering from that.
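
For reference, here is the kind of loop we're talking about (an untested sketch). For a small file, all of the I/O in it ends up just walking over the memory-mapped data:

    -- untested sketch: process a text file one line at a time
    on processFileByLine pPath
       open file pPath for text read
       repeat forever
          read from file pPath until return
          if it is empty and the result is "eof" then exit repeat
          -- 'it' holds one line (with its trailing return, if any)
          -- ... do the per-line work here ...
          if the result is "eof" then exit repeat
       end repeat
       close file pPath
    end processFileByLine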

> Given that the engine is probably already doing pretty much the same
> thing, would it make sense to consider a readBufferSize global
> property which would govern the size of the buffer the engine uses
> when executing "read...until <char>"?

Perhaps - the 'read until' routines could potentially be made more efficient. However, for some streams buffering is inappropriate unless explicitly requested (which isn't an option at the moment). For example, with serial port streams and process streams you don't want to read any more than you absolutely need, because the other end can block if you ask it for more data than it has available. At the moment the engine favours the 'do not read any more than absolutely necessary' approach, as the serial, file and process stream processing code is the same.
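
To illustrate the blocking problem, here is a sketch (the command name is made up) of reading line by line from a child process:

    -- hypothetical example: "/usr/bin/some-tool" is a stand-in for any
    -- long-running child process that emits its output a line at a time
    open process "/usr/bin/some-tool" for read
    repeat forever
       read from process "/usr/bin/some-tool" until linefeed
       if the result is "eof" then exit repeat
       -- each read fetches only what is needed; speculatively asking the
       -- pipe for a large fixed-size block here could stall until the
       -- child happens to write that much more
       -- ... handle 'it' here ...
    end repeat
    close process "/usr/bin/some-tool"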

> In my experiments I was surprised to find that larger buffers (>10MB)
> were slower than "read...until <char>", but the sweet spot seemed to
> be around 128k.  Presumably this has to do with the overhead of
> allocating contiguous memory, and if you have any insights on that it
> would be interesting to learn more.

My original reasoning on this was a 'working set' argument. Modern CPUs rely heavily on several levels of memory cache, with access getting more expensive the further the cache is from the processor. If you use a reasonably sized buffer to implement processing in a streaming fashion, then the working set is essentially just that buffer, which means less movement of blocks of memory between physical memory and the processor's levels of cache.

However, having chatted to Fraser, he pointed out that Linux tends to have a file read-ahead of 64KB-128KB built in. This means that the OS will proactively prefetch the next 64-128KB of data after it has finished fetching the block you asked for. The result is that data is being read from disk by the OS while your processing code is running, so things get done more quickly. (In contrast, if you have a 10MB buffer then you have to wait for 10MB to be read before you can do anything with it, and then do so again each time the buffer is empty.)
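
That lines up with the pattern Richard found fastest: read in chunks of roughly the read-ahead size and pull the lines out of the buffer yourself. Something like this untested sketch (processLine is a stand-in for whatever per-line work you do):

    -- untested sketch: read a large file in ~128KB chunks, handling whole
    -- lines as they become available and carrying any partial line over
    on processLargeFile pPath
       local tCarry, tChunk
       open file pPath for binary read
       repeat forever
          read from file pPath for 131072  -- ~128KB, near the OS read-ahead size
          put tCarry & it into tChunk
          if it is empty then
             -- end of file: whatever is left is the final (unterminated) line
             if tChunk is not empty then processLine tChunk
             exit repeat
          end if
          -- keep any trailing partial line for the next pass
          if the last char of tChunk is linefeed then
             put empty into tCarry
          else
             put the last line of tChunk into tCarry
             delete the last line of tChunk
          end if
          repeat for each line tLine in tChunk
             processLine tLine
          end repeat
       end repeat
       close file pPath
    end processLargeFile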

> Pretty much any program will read big files in chunks, and if LC can
> do so optimally with all the grace and ease of "read...until <char>"
> it makes one more strong set of use cases where choosing LC isn't a
> tradeoff but an unquestionable advantage.

If you have the time to submit a report to the QC with a sample stack that measures the time of a simple 'read until cr' style loop over some data and compares it with the more efficient approach you found, then it is something we (or someone else) can dig into at some point to see what can be done to improve its performance.
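
Something along these lines (again an untested sketch, reusing the two handlers sketched above) would be a reasonable starting point for such a stack:

    -- untested sketch: time the plain 'read until' loop against the
    -- ~128KB chunked loop over the same file; for a fair comparison,
    -- make sure both handlers do the same per-line work
    on benchmarkReadStrategies pPath
       local tStart, tUntilTime, tChunkTime
       put the milliseconds into tStart
       processFileByLine pPath        -- the 'read until return' loop
       put the milliseconds - tStart into tUntilTime
       put the milliseconds into tStart
       processLargeFile pPath         -- the ~128KB chunked loop
       put the milliseconds - tStart into tChunkTime
       answer "read until:" && tUntilTime && "ms" & return & \
             "128KB chunks:" && tChunkTime && "ms"
    end benchmarkReadStrategies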

As I said initially, for smaller files I'd be surprised if we could do that much, since those files will be memory mapped; however, there may well be improvements that could be made for larger (non-memory-mappable) files.

Warmest Regards,

Mark.

--
Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps


_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
