On 2009-Aug-14, at 4:34 am, David Green wrote:
There's a lot of scope for a nice, friendly, high-level IO view; perhaps we need an IO-working group to assemble knowledge about what is and isn't possible with different filesystems and design an interface around it all.


It won't be possible to unify every different FS function, but it can keep the basics (like "open") as consistent as possible across platforms, and provide idiomatic Perl expressions so everything at least tries to feel "perlish". Some things may simply be impossible to manage transparently (like filenames with no identifiable encoding), but even having an organised way to identify the possible problems will be helpful. (A lot of similar issues apply to databases, despite which -- or because of which -- it's so nice to have the DBI project.)

All the low-level features have to be there, of course, to open files and seek through them, etc., but Perl is not a low-level language. We already have (in P5) the magical <>, and that kind of "easy" way to deal with files should always apply (that is, be available anywhere you don't absolutely need to manipulate low-level details).

The idea is to treat files as data, because that's what we're really interested in: their contents, not all their surrounding metadata. If you're writing a file-manager, then you need lots of FS-specific detail; in fact, you probably don't care about the contents of files but only their metadata. However, it's more usual to care only about the contents and use a bit of metadata (like the name) merely as a way to get the contents. The high-level interface shouldn't be concerned with file "handles" or pointers, but simply with "files", and not even as concerned with files as with *data*.


In other words, the focus should be on a data-structure, not on the file itself. A text file is really just a way to store a string; an XML file is a way to encode and store a structured tree object.

    my Str $poem = io file://jabberwocky.txt;
    my Image::JPEG $camel = io file://dromedary.jpg;
    my XHTML::Doc $x is readonly = io http://foo.org/a%20doc;

Then you simply work with your string, or tree, or image-object. An XHTML::Doc object, for instance, would really contain nodes, but it would know how to encode or decode itself as a series of plain-text tags for reading or writing to disk. The old-fashioned way to handle this is to open the file, get a filehandle, read the filehandle into a string, take the string[s] and XHTML::Doc->parse() them. The new way would be to have the parse(Str) method be how XHTML::Doc objects do the IO::input() role, and the IO class takes care of the opening and reading behind the scenes.

Anything could mix in the basic IO role so that an object can load or save itself. Different data structures would override the necessary methods to format the data the right way. The type of IO object would determine how metadata works (i.e. whether it's a file or an HTTP stream or a directory, etc.).
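
To make the load/save idea concrete, here is a minimal sketch. It is written in Python rather than Perl 6, because the syntax above is only a proposal and cannot be run as it stands. The names IOMixin, KeyValueDoc, load, save, parse and serialize are all hypothetical, invented for illustration; they are not part of any existing IO design. The point is only that the data type supplies parse and serialize, while the mixed-in IO layer does the opening, reading and writing.

```python
import os
import tempfile


class IOMixin:
    """Hypothetical stand-in for the proposed IO role.

    It owns opening, reading and writing; the data type only has to
    provide parse() and serialize().
    """

    @classmethod
    def load(cls, path):
        # Open and read behind the scenes, then let the type decode itself.
        with open(path, encoding="utf-8") as fh:
            return cls.parse(fh.read())

    def save(self, path):
        # Let the type encode itself, then write the result out.
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(self.serialize())


class KeyValueDoc(IOMixin):
    """Toy structured document: one 'key=value' pair per line."""

    def __init__(self, pairs):
        self.pairs = dict(pairs)

    @classmethod
    def parse(cls, text):
        # Assumes every non-blank line contains an '=' sign.
        lines = [ln for ln in text.splitlines() if ln.strip()]
        return cls(ln.split("=", 1) for ln in lines)

    def serialize(self):
        return "".join(f"{k}={v}\n" for k, v in self.pairs.items())


def demo():
    """Round-trip a document through a temporary file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "settings.txt")
        KeyValueDoc({"name": "camel", "humps": "1"}).save(path)
        return KeyValueDoc.load(path).pairs
```

The caller never sees a filehandle: KeyValueDoc.load(path) hands back the structure, and a different data type would only override parse and serialize.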


Objects would handle other functionality in whatever ways make sense for that object. For example, iterating a plain-text file would split it into lines. (This suggests that maybe ordinary Strings could split into lines in @-context? Hm.) A Spreadsheet::OpenOffice object might instead provide a list of worksheets when acting like an array (or hash, with sheet names as the keys). A raw binary file of uninterpreted data would just be a simple Blob object.
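
As a rough illustration of "each object iterates in whatever way makes sense", here is a small Python sketch (Python only because the Perl 6 syntax here is still hypothetical). TextData and Workbook are made-up classes, not real modules, and plain bytes stands in for the Blob.

```python
class TextData:
    """Plain text: iterating yields lines."""

    def __init__(self, text):
        self.text = text

    def __iter__(self):
        return iter(self.text.splitlines())


class Workbook:
    """Toy spreadsheet: list-like over worksheets, hash-like by sheet name."""

    def __init__(self, sheets):
        self.sheets = dict(sheets)  # sheet name -> rows

    def __iter__(self):
        # Acting like an array: one item per worksheet.
        return iter(self.sheets.values())

    def __getitem__(self, name):
        # Acting like a hash: look a worksheet up by its name.
        return self.sheets[name]

    def keys(self):
        return list(self.sheets)


# Uninterpreted binary data needs no wrapper at all: it is just a blob.
blob = bytes([0xFF, 0xD8, 0xFF])

poem = TextData("Beware the Jabberwock, my son!\nThe jaws that bite")
book = Workbook({"Q1": [[1, 2]], "Q2": [[3, 4]]})
```

Here list(poem) gives the two lines, list(book) gives the two worksheets, and book["Q2"] fetches a sheet by name.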

Assuming that there existed a consistent definition of Table objects that could be coerced one to another, we could do things like:

    my Table::SQL $input is readonly := io dbi:Pg:employees/active;
    my Table::Spreadsheet::Excel $output := io file://Active-Emp.xls;

    $output<Sheet #1> = $input;    # read from input, write to output
    say "Data from $input.io.name() saved to $output.io.name()";
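
For comparison, here is roughly what that copy involves when done by hand today. This is a Python sketch using only the standard library, with substitutions that should be read as assumptions: an in-memory SQLite table stands in for the Pg source, and CSV text stands in for the Excel sheet. The shared "Table" representation is simply a list of column names plus a list of rows, and the function names are invented for the example.

```python
import csv
import io
import sqlite3


def read_sql_table(conn, table):
    """Return (column_names, rows) for a table.

    The table name is interpolated directly, so it must be trusted.
    """
    cur = conn.execute(f'SELECT * FROM "{table}" ORDER BY rowid')
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


def write_csv_table(columns, rows, out):
    """Write the common table form to any text stream as CSV."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, active INTEGER)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?)",
        [("Alice", 1), ("Bob", 1)],
    )
    out = io.StringIO()
    # Read from input, write to output, via the shared (columns, rows) form.
    write_csv_table(*read_sql_table(conn, "employees"), out)
    conn.close()
    return out.getvalue()
```

All of the connection, cursor and writer handling here is what the single assignment in the proposed syntax would be expected to hide.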


Now of course there are lots of things that such a broad interface isn't prepared to handle, but that's what the lower levels are for. If .io provides access to the intermediate interface, then you can start with coarse-grained control, and delve into the particulars if or when you need finer control (access to OS- or FS-specific calls). Above that, there is room for very high-level IO functionality that glosses over a lot of details, because an awful lot of code just doesn't need all the specifics.
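
The layering might look something like the following sketch (again in Python, since the Perl 6 interface is still hypothetical). Text and FileIO are made-up names: Text is the high-level "just the data" view, its .io attribute is the intermediate layer with the name and metadata, and open_raw is the escape hatch down to an ordinary handle for seeking and the like.

```python
import os
import tempfile


class FileIO:
    """Intermediate layer: name, metadata, and access to the raw handle."""

    def __init__(self, path):
        self.path = path

    @property
    def name(self):
        return os.path.basename(self.path)

    def size(self):
        return os.stat(self.path).st_size

    def open_raw(self, mode="rb"):
        # Lowest level exposed here: an ordinary file handle.
        return open(self.path, mode)


class Text:
    """High-level layer: just the contents, with .io available on request."""

    def __init__(self, path):
        self.io = FileIO(path)

    def __str__(self):
        with open(self.io.path, encoding="utf-8") as fh:
            return fh.read()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "jabberwocky.txt")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("'Twas brillig")
        poem = Text(path)
        # Drop down a level only when finer control is needed.
        with poem.io.open_raw() as raw:
            raw.seek(6)
            tail = raw.read()
        return str(poem), poem.io.name, poem.io.size(), tail
```

Most code would stop at str(poem); only the last few lines of demo need to know that a handle exists at all.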


-David
