On 2016-03-22 12:45, Roland Huettmann wrote:
> How to know how much we can read into memory? Is there any function to know
> this? Is there a size limit for variables?

LiveCode limits a string to 2GB of characters, but whether you can actually reach that depends on how much memory a single process can have on your system.

On 32-bit systems, you're generally limited to a contiguous block of memory of 768MB-1GB. 32-bit Windows gives a user process an address space of 3GB, which also has to include all mapped resources such as executables and shared libraries. Mac gives a user process an address space of 4GB, which also has to include all mapped resources, so there you can generally get a contiguous allocated block of up to around 1.5GB.

On 64-bit systems you should be able to have many 2GB strings (or similar) in LiveCode, although how fast they operate will obviously depend on the amount of physical RAM in the machine, with disk-paged virtual memory taking up the slack.

> It is not possible to read backwards - which could be a nice way reading a
> file in some special cases. So "read from file fName at eof until -1000"
> does not work.

Well, reading backwards in that way is equivalent to knowing how long the file is:

   read ... at -1000 until EOF

is the same as

   read ... at (fileSize - 1000) until EOF
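As a small sketch of the negative-offset form above: the handler name readTail and its parameters are made up for illustration, and it assumes the file holds at least pCount bytes.

```livecode
-- Hypothetical helper: returns the last pCount bytes of the file at pPath,
-- or empty if the file could not be opened.
function readTail pPath, pCount
   local tTail
   open file pPath for binary read
   if the result is not empty then
      return empty
   end if
   -- a negative start position is counted back from the end of the file
   read from file pPath at (0 - pCount) until EOF
   put it into tTail
   close file pPath
   return tTail
end readTail
```

Something like "put readTail(tPath, 1000) into tLastBytes" would then fetch the final 1000 bytes without the script needing to know the file size first.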

> So, the only way reading very large file is reading a chunk of data of n
> bytes (whatever is allowed in memory), processing this, and then reading the
> next chunk until the remaining part of the file is small enough to be
> read until eof.

For such a large file (38GB) your only solution is to read and parse it in chunks. MBOX files are a sequence of records, so you need a process which reads another block from the file whenever there is not enough data left in memory to find the end of the current record. That way, at any one time, you only hold enough of the file in memory to process the next record completely.

In terms of finding the size of a file in LiveCode you can use 'the detailed files'.
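A minimal sketch of that lookup, assuming each line of 'the detailed files' starts with the URL-encoded file name followed by the size in bytes. The handler name fileSizeInBytes is made up for illustration, and pPath is assumed to be a full path containing at least one folder.

```livecode
-- Hypothetical helper: returns the size in bytes of the file at pPath,
-- or empty if no file of that name is found in its folder.
function fileSizeInBytes pPath
   local tSavedFolder, tFolder, tName, tEntries, tEntry
   -- split the path into folder and file name
   set the itemDelimiter to "/"
   put item -1 of pPath into tName
   put item 1 to -2 of pPath into tFolder
   -- 'the detailed files' lists the files in the defaultFolder
   put the defaultFolder into tSavedFolder
   set the defaultFolder to tFolder
   put the detailed files into tEntries
   set the defaultFolder to tSavedFolder
   -- each line is comma-separated: encoded name, size, ...
   set the itemDelimiter to comma
   repeat for each line tEntry in tEntries
      if urlDecode(item 1 of tEntry) is tName then
         return item 2 of tEntry
      end if
   end repeat
   return empty
end fileSizeInBytes
```

The defaultFolder is restored straight after the listing so the rest of the script is not affected by the temporary change.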

It is worth pointing out that 'open file' and 'read from file' are *stream*-based in approach. From memory, the MBOX format is essentially line-based, so you should be able to write a relatively simple parsing loop with that in mind:

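open file ...
repeat forever
  read from file ... until return
  if the result is not empty then
    -- "eof" or an error message: nothing more to read
    exit repeat
  end if
  if *it is a new message boundary* then
    ... finish processing current message ...
    ... start processing new boundary ...
  else
    ... append line to current message ...
  end if
end repeat
-- the last message has no boundary after it, so finish it here
... finish processing current message ...
close file ...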
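A fuller sketch of the same loop follows, with some assumptions spelled out. It treats any line beginning with "From " as a message boundary (the usual mbox convention), and the handler names splitMbox and handleMessage are placeholders rather than anything built in. Unlike the outline, it also keeps a final line that has no trailing return.

```livecode
-- Placeholder: replace the body with whatever should happen to each message.
command handleMessage pMessage
   -- e.g. store or index pMessage here
end handleMessage

-- Sketch: walks the mbox file at pPath one line at a time, so only the
-- current message is held in memory.
command splitMbox pPath
   local tMessage, tLine, tAtEnd
   open file pPath for read
   if the result is not empty then
      return "could not open file:" && the result
   end if
   put false into tAtEnd
   repeat until tAtEnd
      read from file pPath until return
      -- check the result straight after the read: "eof" or an error ends the loop
      put (the result is not empty) into tAtEnd
      put it into tLine
      if tLine begins with "From " then
         -- assumed boundary: hand over the finished message, start a new one
         if tMessage is not empty then handleMessage tMessage
         put tLine into tMessage
      else
         put tLine after tMessage
      end if
   end repeat
   -- the last message has no boundary after it
   if tMessage is not empty then handleMessage tMessage
   close file pPath
end splitMbox
```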

Of course, one thing to bear in mind is that with a 38GB file you are never going to fit all of that into memory, so the best approach would probably be to parse your mail messages and then store them in a storage scheme which doesn't require everything to be in memory at once - e.g. an SQLite database or a more traditional DBMS, or even lots of discrete files in a filesystem in some suitable hierarchy.
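For the SQLite option, a rough sketch using the revDB library could look like this. The handler names, the table name "messages" and its columns are all made up for illustration, and error handling is kept to a minimum.

```livecode
-- Hypothetical helper: opens (or creates) the database file at pDBPath and
-- returns a connection ID, or empty if the database could not be opened.
function openMailStore pDBPath
   local tConn
   put revOpenDatabase("sqlite", pDBPath) into tConn
   if tConn is not an integer then
      -- tConn holds an error message rather than a connection ID
      return empty
   end if
   revExecuteSQL tConn, "CREATE TABLE IF NOT EXISTS messages (id INTEGER PRIMARY KEY, raw TEXT)"
   return tConn
end openMailStore

-- Hypothetical helper: stores one parsed message as a row.
command storeMessage pConn, pMessage
   local tRaw
   put pMessage into tRaw
   -- :1 is bound to the variable whose name is passed as the last argument
   revExecuteSQL pConn, "INSERT INTO messages (raw) VALUES (:1)", "tRaw"
   if the result is not an integer then
      return "insert failed:" && the result
   end if
end storeMessage
```

Each message is written out as soon as it has been parsed, so memory use stays at roughly one message at a time regardless of the size of the source file. Close the connection with revCloseDatabase once the whole file has been processed.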

Warmest Regards,

Mark.

--
Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps
