Re: Looking for parser for Email (MIME)

Mark Waddingham Tue, 22 Mar 2016 06:19:55 -0700

On 2016-03-22 12:45, Roland Huettmann wrote:

> How to know how much we can read into memory? Is there any function to know
> this? Is there a size limit for variables?

LiveCode has a limit of 2Gb characters for strings, but whether you can reach that depends on how much memory a single process can have on your system.

On 32-bit systems, you're generally limited to a 768Mb-1Gb contiguous block of memory (32-bit Windows has an address space of 3Gb for a user process which also has to include all mapped resources such as executables and shared libraries; Mac has a user process address space of 4Gb which also has to include all mapped resources so you can generally get up to around a 1.5Gb contiguous allocated memory block).

On 64-bit systems you should be able to have many 2Gb strings (or similar in LiveCode), although obviously how fast they will operate will depend on the amount of physical RAM in the machine (disk-paged virtual memory taking up the slack).

> It is not possible to read backwards - which could be a nice way of reading a
> file in some special cases. So "read from file fName at eof until -1000"
> does not work.

Well, reading backwards in that way is equivalent to knowing how long the file is:


   read ... at -1000 until EOF

is the same as

   read ... at (fileSize - 1000) until EOF

> So, the only way of reading a very large file is reading a chunk of data of n
> bytes (whatever is allowed in memory), processing this, and then reading
> the next chunk until the remaining part of the file is small enough to be
> read until eof.

For such a large file (38Gb) your only solution is to read and parse it in chunks. MBOX files are a sequence of records, so you need to use a process which reads in blocks from the file when there is not enough data left to find the current record boundary - that way you only load into memory (at any one time) enough of the file to process completely the next record.
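
A rough sketch of that block-based approach, under some assumptions: the handler names processMboxInBlocks and handleMessage are made up for illustration, the 1Mb block size is arbitrary, and a record is assumed to start at a line beginning with "From " (which is how the common MBOX variants mark a new message):

   on processMboxInBlocks pPath
      local tBuffer, tAtEnd, tBoundary, tPos
      set the caseSensitive to true
      put return & "From " into tBoundary
      put false into tAtEnd
      open file pPath for binary read
      repeat forever
         -- is there a complete record in the buffer already?
         put offset(tBoundary, tBuffer) into tPos
         if tPos > 0 then
            -- chars 1 to tPos are one whole record (up to its final return)
            handleMessage char 1 to tPos of tBuffer
            delete char 1 to tPos of tBuffer
         else if tAtEnd then
            -- nothing more to read: what is left is the last record
            if tBuffer is not empty then handleMessage tBuffer
            exit repeat
         else
            -- not enough data to find the boundary, so read another block
            read from file pPath for 1048576
            put it after tBuffer
            if the result is not empty then put true into tAtEnd
         end if
      end repeat
      close file pPath
   end processMboxInBlocks

Only the current record (plus at most one block) is held in memory at any time. Re-scanning and trimming the buffer like this is not the fastest way to do it, but it keeps the logic easy to follow.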

In terms of finding the size of a file in LiveCode you can use 'the detailed files'.
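
Something along these lines should do it (a sketch only - the function name fileSizeInBytes and its parameters are made up for illustration). Each line of 'the detailed files' describes one file in the default folder, with the URL-encoded file name as item 1 and the size in bytes as item 2:

   function fileSizeInBytes pFolder, pFileName
      local tOldFolder, tListing, tLine
      -- 'the detailed files' lists the files in the default folder
      put the defaultFolder into tOldFolder
      set the defaultFolder to pFolder
      put the detailed files into tListing
      set the defaultFolder to tOldFolder
      repeat for each line tLine in tListing
         if urlDecode(item 1 of tLine) is pFileName then
            return item 2 of tLine
         end if
      end repeat
      return empty -- no such file in pFolder
   end fileSizeInBytes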

It is worth pointing out that 'open file' and 'read from file' are *stream*-based in approach. From memory, the MBOX format is essentially line-based, so you should be able to write a relatively simple parsing loop with that in mind:


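open file ...
repeat forever
  -- read one line: 'it' holds the line, including its trailing return
  read from file ... until return
  -- 'the result' is non-empty once the end of the file is reached
  if the result is not empty then
    exit repeat
  end if
  if *it is a new message boundary* then
    ... finish processing current message ...
    ... start processing new boundary ...
  else
    ... append line to current message ...
  end if
end repeat
... finish processing the last message ...
close file ...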
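
For the '*it is a new message boundary*' test, a minimal sketch (the function name isMessageBoundary is hypothetical) relies on the fact that in the common MBOX variants each message starts with a line beginning "From " followed by the envelope sender and a date:

   function isMessageBoundary pLine
      -- the separator is "From " with a capital F and a trailing space,
      -- not the "From:" header
      set the caseSensitive to true
      return pLine begins with "From "
   end isMessageBoundary

Be aware that not every program which writes MBOX files escapes body lines that happen to start with "From " (usually as ">From "), so a stricter check might also require the previous line to be blank.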

Of course, one thing to bear in mind is that with a 38Gb file you are never going to fit all of that into memory; so the best approach would probably be to parse your mail messages and then store them into a storage scheme which doesn't require everything to appear in memory at once - e.g. an sqlite db or a more traditional dbms, or even lots of discrete files in a filesystem in some suitable hierarchy.
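
As an illustration of the sqlite option, here is a sketch using the revDB library. The handler names, the table name 'messages' and its columns are all assumptions chosen for the example, not anything your data requires:

   function openMailStore pDbPath
      local tConnection
      put revOpenDatabase("sqlite", pDbPath) into tConnection
      if tConnection is not an integer then
         return empty -- revOpenDatabase returned an error message
      end if
      revExecuteSQL tConnection, "CREATE TABLE IF NOT EXISTS messages " & \
            "(id INTEGER PRIMARY KEY, sender TEXT, subject TEXT, body TEXT)"
      return tConnection
   end openMailStore

   command storeMessage pConnection, pSender, pSubject, pBody
      -- :1, :2 and :3 are filled from the variables named after the statement
      revExecuteSQL pConnection, \
            "INSERT INTO messages (sender, subject, body) VALUES (:1, :2, :3)", \
            "pSender", "pSubject", "pBody"
   end storeMessage

You would call storeMessage each time the parsing loop finishes a message, and revCloseDatabase with the connection id once the whole file has been processed, so only one message needs to be in memory at a time.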


Warmest Regards,

Mark.

--
Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps

