On 2016-03-22 15:24, Richard Gaskin wrote:
> What is the size of the read buffer used when reading until <char>?
>
> I'm assuming it isn't reading a single char per disk access, probably
> at least using the file system's block size, no?

Well, the engine will memory map files if it can (i.e. if there is available address space), so smaller (sub-1GB) files are essentially all buffered. For larger files, the engine uses the stdio FILE abstraction, so it gets buffering from that.
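
For reference, here is the kind of loop we're talking about (an untested sketch). For a small file, all of the I/O in it ends up just walking over the memory-mapped data:

    -- untested sketch: process a text file one line at a time
    on processFileByLine pPath
       open file pPath for text read
       repeat forever
          read from file pPath until return
          if it is empty and the result is "eof" then exit repeat
          -- 'it' holds one line (with its trailing return, if any)
          -- ... do the per-line work here ...
          if the result is "eof" then exit repeat
       end repeat
       close file pPath
    end processFileByLine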

> Given that the engine is probably already doing pretty much the same
> thing, would it make sense to consider a readBufferSize global
> property which would govern the size of the buffer the engine uses
> when executing "read...until <char>"?

Perhaps - the 'read until' routines could potentially be made more efficient. However, for some streams buffering is inappropriate unless explicitly requested (which isn't an option at the moment). For example, with serial port streams and process streams you don't want to read any more than you absolutely need, because the other end can block if you ask it for more data than it has available. At the moment the engine favours the 'do not read any more than absolutely necessary' approach, as the serial, file and process stream processing code is the same.
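
To illustrate the blocking problem, here is a sketch (the command name is made up) of reading line by line from a child process:

    -- hypothetical example: "/usr/bin/some-tool" is a stand-in for any
    -- long-running child process that emits its output a line at a time
    open process "/usr/bin/some-tool" for read
    repeat forever
       read from process "/usr/bin/some-tool" until linefeed
       if the result is "eof" then exit repeat
       -- each read fetches only what is needed; speculatively asking the
       -- pipe for a large fixed-size block here could stall until the
       -- child happens to write that much more
       -- ... handle 'it' here ...
    end repeat
    close process "/usr/bin/some-tool"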

> In my experiments I was surprised to find that larger buffers (>10MB)
> were slower than "read...until <char>", but the sweet spot seemed to
> be around 128k.  Presumably this has to do with the overhead of
> allocating contiguous memory, and if you have any insights on that it
> would be interesting to learn more.

My original reasoning on this was a 'working set' argument. Modern CPUs rely heavily on several levels of memory cache, with access getting more expensive the further the cache is from the processor. If you use a reasonably sized buffer to implement processing in a streaming fashion, then the working set is essentially just that buffer, which means less movement of blocks of memory between physical memory and the processor's levels of cache.

However, having chatted to Fraser, he pointed out that Linux tends to have a file read-ahead of 64KB-128KB built in. This means that the OS will proactively prefetch the next 64-128KB of data after it has finished fetching the block you asked for. The result is that data is being read from disk by the OS while your processing code is running, so things get done more quickly. (In contrast, if you have a 10MB buffer then you have to wait for 10MB to be read before you can do anything with it, and then do so again each time the buffer is empty.)
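
That lines up with the pattern Richard found fastest: read in chunks of roughly the read-ahead size and pull the lines out of the buffer yourself. Something like this untested sketch (processLine is a stand-in for whatever per-line work you do):

    -- untested sketch: read a large file in ~128KB chunks, handling whole
    -- lines as they become available and carrying any partial line over
    on processLargeFile pPath
       local tCarry, tChunk
       open file pPath for binary read
       repeat forever
          read from file pPath for 131072  -- ~128KB, near the OS read-ahead size
          put tCarry & it into tChunk
          if it is empty then
             -- end of file: whatever is left is the final (unterminated) line
             if tChunk is not empty then processLine tChunk
             exit repeat
          end if
          -- keep any trailing partial line for the next pass
          if the last char of tChunk is linefeed then
             put empty into tCarry
          else
             put the last line of tChunk into tCarry
             delete the last line of tChunk
          end if
          repeat for each line tLine in tChunk
             processLine tLine
          end repeat
       end repeat
       close file pPath
    end processLargeFile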

> Pretty much any program will read big files in chunks, and if LC can
> do so optimally with all the grace and ease of "read...until <char>"
> it makes one more strong set of use cases where choosing LC isn't a
> tradeoff but an unquestionable advantage.

If you have the time to submit a report to the QC with a sample stack that measures the time of a simple 'read until cr' style loop over some data and compares it with the more efficient approach you found, then it is something we (or someone else) can dig into at some point to see what can be done to improve its performance.
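
Something along these lines (again an untested sketch, reusing the two handlers sketched above) would be a reasonable starting point for such a stack:

    -- untested sketch: time the plain 'read until' loop against the
    -- ~128KB chunked loop over the same file; for a fair comparison,
    -- make sure both handlers do the same per-line work
    on benchmarkReadStrategies pPath
       local tStart, tUntilTime, tChunkTime
       put the milliseconds into tStart
       processFileByLine pPath        -- the 'read until return' loop
       put the milliseconds - tStart into tUntilTime
       put the milliseconds into tStart
       processLargeFile pPath         -- the ~128KB chunked loop
       put the milliseconds - tStart into tChunkTime
       answer "read until:" && tUntilTime && "ms" & return & \
             "128KB chunks:" && tChunkTime && "ms"
    end benchmarkReadStrategies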

As I said initially, for smaller files I'd be surprised if we could do that much, since those files will be memory mapped; however, there may well be improvements that could be made for larger (non-memory-mappable) files.

Warmest Regards,

Mark.

--
Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps


_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
