This turned out to be a really long email, I realised, so I'll break it into sections.

Sorry this is off-topic. My ElfData plugin does use Unicode extensively, though (you can see for yourself at www.elfdata.com/plugin/ ), and XML is Unicode too. I just thought that, seeing as IBM people are here, and they're usually good at large-scale solutions...


__aim summary__


I'm thinking about formalising the RAM management in my ElfData (string processing) plugin, so it can process and write files larger than what's held in RAM. Let's say I want to process a 2GB file, but I want to do it with only 1MB allocated in RAM. That kind of thing.
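The basic shape of "fixed RAM, arbitrary file size" can be sketched in a few lines. This is only an illustration in Python, not ElfData's actual API; `process_in_chunks` and the 1MB figure are just stand-ins for the idea:

```python
CHUNK_SIZE = 1024 * 1024  # at most 1 MB of file data resident at a time

def process_in_chunks(src_path, dst_path, transform):
    """Stream src to dst, holding at most CHUNK_SIZE bytes of it in RAM."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break               # end of file
            dst.write(transform(chunk))
```

This only works directly for transformations that can operate chunk-by-chunk, which is part of why the problem gets interesting for parsing and editing.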


__answers please!__


My question is: does anyone know of information about this kind of problem on the internet? Or has anyone dealt with this before?

We all know programs on OS 9 that can handle huge amounts of data with little RAM. They do this by keeping only a small piece of the data in RAM at a time.


__What should my plugin do, why, and what is "its place"__


Now, I've pretty much come to the conclusion that it's not my plugin's place to do the file-system interaction or the RAM management. I just can't make it do enough for everyone, or even for myself. It's usually better to let code be good at one thing than to try to be everything to everyone.

However, I do think it's my plugin's place to offer hooks that let the RB code do the RAM management.

Currently I have a rather crude setup for managing "families", which is a consequence of splitting strings by reference (instead of by copy, which is what RB does).
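The split-by-copy vs split-by-reference distinction can be sketched like this, with Python's `memoryview` standing in for ElfData's reference slices (this is not the plugin's actual code, just the idea):

```python
data = bytearray(b"alpha,beta,gamma")

# Split by copy: each piece owns its own bytes (what RB's Split does).
copies = bytes(data).split(b",")

# Split by reference: each piece is just a view (offset + length) into
# the original buffer -- no bytes are duplicated.
view = memoryview(data)
refs = []
start = 0
for i, b in enumerate(data):
    if b == ord(","):
        refs.append(view[start:i])
        start = i + 1
refs.append(view[start:])
```

The reference version is fast and cheap, but every slice keeps the whole original buffer alive, which is exactly where the "families" bookkeeping comes from.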

I'm wondering if anyone has any advice here, or even some thoughts about how it should be done?


__Here are some design requirements:__


* I'd like to make my XML Engine capable of processing gigabytes of XML with only a few MB of RAM allocated.
* I'd like my XML Editor to do the same, except that it also needs to display a tree structure in its editor.
* Obviously, my ElfData string processing plugin will need some improvements to allow this.
* I don't want to noticeably slow down the ElfData plugin, if possible, or make it overly complex.


This has been done before! I'm sure programs like gcc and CodeWarrior compile files larger than can be stored in RAM, right? Or am I wrong?


__Speculative solutions__


Here are my thoughts on the issue: some tentative design speculations, probably incomplete and in need of refinement.

So basically, it's about resources, and resources need to be formalised. Instead of a loosely defined system of allocating data and disposing of it when there are no references left, I should formalise my access to resources. Currently it's hard to get an idea of what is allocated where: a one-byte reference to an ElfData object can stop a multi-megabyte block of RAM from being deallocated.
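That "one-byte reference pins a huge block" problem can be demonstrated concretely. Here Python's `memoryview` again stands in for an ElfData reference slice; the sizes are arbitrary:

```python
big = bytearray(4 * 1024 * 1024)   # pretend this is a huge loaded file
one_byte = memoryview(big)[0:1]    # a one-byte slice into it

del big  # the name is gone, but the 4 MB buffer cannot be freed:
         # the tiny slice still references the whole underlying block
backing = one_byte.obj             # the full buffer is still reachable
```

Reference counting alone can't fix this; something has to know about the relationship between the slice and its backing block, which is the motivation for the resource class below.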

Let's say I have a resource that's a 2GB XML file, and I want to do things with it:

1) Parse it, and validate well-formedness
2) Validate it against a DTD or schema
3) Manipulate and view it graphically
4) Edit it as a text file
5) Save it back to the hard disk

Currently I have two options: RB's way (split by copy) and my way (split by reference). The ElfData plugin can also split by copy... I just want to avoid that.


__Managing RAM storage and disposal__


Here's a hypothetical design flaw. I don't know of a real one like this (if I did, I'd fix it), but I want my app to cope even in cases where I do miss a piece here and there...

Now, let's say I store something from a file somewhere in my app: say the design flaw is copying to an internal clipboard. Now if the user closes the file, the 2GB of data will still be allocated, all because of one reference to a tiny part of it.

I'm thinking the best way to deal with that is to make some kind of "ElfDataResource" class. This would manage splitting, and keep references to ElfData objects. So each time I split, I append a reference to that object in my ElfDataResource (EDR) class, and when an ElfData object is disposed of, it tells the EDR class to remove its reference. Doing that fast enough not to interfere with splitting is already a bit of a technical problem.

When the EDR class is told to close the resource, it should update all the ElfData objects that haven't yet been disposed of, so that each contains a copy of its data instead of a reference to the original. The EDR class should also give me statistics on RAM management; that could really help.
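A minimal sketch of that idea might look like the following. Everything here is hypothetical — class and method names (`ElfDataResource`, `Slice`, `split_at`, `close`) are mine for illustration, not ElfData's API — but it captures the "registry plus copy-out-on-close" behaviour:

```python
class Slice:
    """Stands in for an ElfData object created by split-by-reference."""
    def __init__(self, resource, offset, length):
        self._resource = resource
        self.offset, self.length = offset, length
        self._copy = None            # filled in if the resource closes first

    def bytes(self):
        if self._copy is not None:   # resource already closed: use own copy
            return self._copy
        return self._resource._data[self.offset:self.offset + self.length]

    def dispose(self):
        self._resource._forget(self)  # tell the EDR to drop its reference


class ElfDataResource:
    """Tracks every live slice into one big backing buffer."""
    def __init__(self, data):
        self._data = data
        self._slices = set()          # every split registers itself here

    def split_at(self, offset, length):
        s = Slice(self, offset, length)
        self._slices.add(s)
        return s

    def _forget(self, s):
        self._slices.discard(s)

    def close(self):
        # Surviving slices get private copies, then the big buffer is freed.
        for s in self._slices:
            s._copy = bytes(s.bytes())
        self._slices.clear()
        self._data = None

    def stats(self):
        # The "statistics on RAM management" hook.
        return {"live_slices": len(self._slices)}
```

The per-split registration cost here is one set insert, which hints at why doing it "fast enough to not interfere with splitting" is the hard part in a real implementation.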

That's what I mean by "formalising" my RAM usage.

But that still doesn't deal with how to process only a tiny part of my data in RAM while keeping most of it on disk...


__Speculative: my own application's/library's practical application__


I'm not sure how best to go about this. My guess is it's more of a "database" kind of thing, i.e. fast access to disk. Let's say I wanted to use something like Valentina (rumoured to be very fast) to do most of the work, so I don't have to reinvent the wheel. That would only be for things like storing my graphical editor's interface, mind, not the text. Another idea might be a different kind of paradigm for browsing my XML, maybe something more like the file system? The Mac's file system doesn't need to hold a whole hard disk in RAM to let me navigate its files, so maybe my XML could take a similar approach? It's an idea...

Reading the data in from the hard disk without putting the whole file in RAM is another problem, and I'm not sure how to do it really... I think with XML it's not so big a problem, though. Everything is designed around tags. A tag itself is almost always small (I've never even seen a 1KB tag). The bits between tags may be very large; I'm sure there could be 1MB or more of text inside an element. Elements may contain more elements, but that's not a problem. So I guess I could break my problem down along those lines. It shouldn't be too hard.
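The "tags are small, text between them may be huge" observation can be turned into a rough streaming scanner: read fixed-size chunks, emit complete tags as soon as they close, and let the (possibly huge) text flow through without accumulating. This is purely illustrative — it ignores CDATA, comments, and quoted `>` inside attributes:

```python
def stream_xml(fileobj, chunk_size=64 * 1024):
    """Yield ("tag", bytes) and ("text", bytes) events from an XML stream,
    holding at most chunk_size bytes plus one partial tag in RAM."""
    buf = b""
    while True:
        chunk = fileobj.read(chunk_size)
        at_eof = not chunk
        buf += chunk
        while buf:
            lt = buf.find(b"<")
            if lt == -1:
                yield ("text", buf)        # no tag start: all of it is text
                buf = b""
            elif lt > 0:
                yield ("text", buf[:lt])   # flush text before the tag
                buf = buf[lt:]
            else:
                gt = buf.find(b">")
                if gt == -1:
                    break                  # tag incomplete; need more input
                yield ("tag", buf[:gt + 1])
                buf = buf[gt + 1:]
        if at_eof:
            return
```

Note that large text runs may arrive as several "text" events when they span chunk boundaries, so the consumer has to be happy concatenating them; only tags are guaranteed to arrive whole.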

Another question is how to store my XML objects... Should I put the data into a Valentina-like database, or read it directly from the file?


__Is it worth it?__


At some point I have to ask: is it really worth it? Are being fast and handling gigabyte XML files mutually compatible goals? Who is my target? What audiences are there, and which is best to choose?

Perhaps people who want to edit gigabyte XML files are dreaming? Validating multi-gigabyte files can be done as a separate code base, that's no problem. But what if there is an error? Do I expect users to edit their gigabyte XML files with a text editor, or with my graphical editor?

Maybe I should make a separate "large file mode"? That way I could concentrate on the task at hand, and not try to make one thing be two things. I can refactor my existing code to handle the parsing without too much of a problem at all; I'd just need to write another editing mode, that's all. Or maybe just suggest they use a different text editor... I'm really not sure about trying to write a text editor that can handle gigabyte files!


__Just thinking out loud__


Once again, this is really just me thinking aloud. Even if no one answers, just writing this down in a way other people can understand helps me a lot in getting my thoughts straight!

--
    Theodore H. Smith - Macintosh Consultant / Contractor.
    My website: <www.elfdata.com/>



