Re: [whatwg] Archive API - proposal
On Wed, Aug 15, 2012 at 9:38 PM, Glenn Maynard gl...@zewt.org wrote: On Wed, Aug 15, 2012 at 10:10 PM, Jonas Sicking jo...@sicking.cc wrote: Though I still think that we should support reading out specific files using a filename as a key. I think a common use-case for ArchiveReader is going to be web developers wanting to download a set of resources from their own website and wanting to use a .zip file as a way to get compression and packaging. In that case they can easily either ensure to stick with ASCII filenames, or encode the names in UTF8. That's what this was for: // For convenience, add getter File? (DOMString name) to FileList, to find a file by name. This is equivalent // to iterating through files[] and comparing .name. If no match is found, return null. This could be a function // instead of a getter. var example_file2 = zipFile.files["file.txt"]; if(example_file2 == null) { console.error("file.txt not found in ZIP"); return; } I suppose a named getter isn't a great idea--you might have a filename like "length" that collides with the list's own properties--so a zipFile.files.find('file.txt') function is probably better. I definitely wouldn't want to use a getter. That runs into all sorts of problems and the syntactical wins are pretty small. One way we could support this would be to have a method which allows getting a list of meta-data about each entry. Probably together with the File object itself. So we could return an array of objects like: [ { rawName: UInt8Array, file: File object, crc32: UInt8Array }, { rawName: UInt8Array, file: File object, crc32: UInt8Array }, ... ] That way we can also leave out the crc from archive types that don't support it. This means exposing two objects per file. I'd prefer a single File-subclass object per file, with any extra metadata put on the subclass. First of all, we'd be talking about 5 vs. 6 objects per file entry: two ArrayBuffers, two ArrayBufferViews, one File and potentially one JS-object. Actually, in Gecko it's more like 8 vs. 9 objects once you start counting the C++ objects and their JS-wrappers. Second, at least in the Gecko engine, allocating the first 5 objects takes about three orders of magnitude more time than allocating the JS-object. I'm also not a fan of sticking the crc32 on the File object itself since we don't actually know that that's the correct crc32 value. But I like this approach a lot if we can make it work. The main thing I'd be worried about, apart from the IO performance above, is if we can make it work for a larger set of archive formats. Like, can we make it work for .tar and .tar.gz? I think we couldn't but we would need to verify. It wouldn't handle it very well, but the original API wouldn't, either. In both cases, the only way to find filenames in a TAR--whether it's to search for one or to construct a list--is to scan through the whole file (and decompress it all, for .tgz). Simply retrieving a list of filenames from a large .tgz would thrash the user's disk and chew CPU. I don't think there's much use in supporting .tar, anyway. Even if you want true streaming (which would be a different API anyway, since we're reading from a Blob here), ZIP can do that too, by using the local file headers instead of the central directory. The main argument that I could see is that the initial proposal allowed extracting files from a .tar.gz while only extracting up to the point of finding the file-to-be-extracted. As long as .getFileNames wasn't called. Which I'll grant isn't a huge benefit. / Jonas
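(Editor's note: the zipFile.files.find("name") convenience being debated above can be sketched over a plain array of File-like objects. The entries and the findByName helper below are illustrative only, not part of any proposal.)

```javascript
// Hypothetical sketch of the find-by-name convenience discussed above:
// scan a FileList-like array and compare each entry's .name, returning
// null when no match is found (mirroring the proposed getter semantics).
function findByName(files, name) {
  for (const f of files) {
    if (f.name === name) return f; // same comparison as iterating files[] by .name
  }
  return null; // no match: return null rather than throwing
}
```

A method like this avoids the named-getter collision problem (a file literally named "length" cannot shadow a method the way it could shadow an array property).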
Re: [whatwg] Archive API - proposal
On Thu, Aug 16, 2012 at 1:22 AM, Jonas Sicking jo...@sicking.cc wrote: First of all, we'd be talking about 5 vs. 6 objects per file entry: two ArrayBuffers, two ArrayBufferViews, one File and potentially one JS-object. Actually, in Gecko it's more like 8 vs. 9 objects once you start counting the C++ objects and their JS-wrappers. That's not what I meant. It looked like you meant passing two arrays to onsuccess, one with metadata and one with Files, so the user would have to reassociate them. Rereading I see that's not what you meant. That said, these can be methods, so the ArrayBuffers aren't allocated unless the user wants them, which I expect would be rare: interface ZipFile : File { ArrayBuffer getErrorVerificationCode(); readonly attribute DOMString errorVerificationMethod; // always CRC32 for now ArrayBuffer getRawFilename(); }; (If all we care about is CRC32, then readonly attribute unsigned long expectedCRC32 instead and drop errorVerificationMethod. I'm assuming non-CRC32 is what you had in mind by making CRC32 an ArrayBuffer instead of just an unsigned long.) I'm also not a fan of sticking the crc32 on the File object itself since we don't actually know that that's the correct crc32 value. It's the expected CRC32, not the CRC32, and should have an attribute name to that effect. It definitely doesn't belong on File itself, since it's pretty tightly specific to archive error checking; it should use a subclass. -- Glenn Maynard
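(Editor's note: for reference, the checksum ZIP stores per central-directory entry is standard CRC-32 with the IEEE reflected polynomial. A minimal bitwise implementation, illustrative only and far slower than the table-driven versions real unzippers use:)

```javascript
// Minimal bitwise CRC-32 (reflected polynomial 0xEDB88320), the
// checksum ZIP records for each entry. Illustrative sketch only;
// production code would precompute a 256-entry lookup table.
function crc32(bytes) {
  let crc = 0xFFFFFFFF;
  for (const b of bytes) {
    crc ^= b;
    for (let i = 0; i < 8; i++) {
      // -(crc & 1) is all-ones when the low bit is set, 0 otherwise,
      // so this conditionally XORs in the polynomial after the shift.
      crc = (crc >>> 1) ^ (0xEDB88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xFFFFFFFF) >>> 0; // >>> 0 keeps the result an unsigned 32-bit value
}
```

The well-known check value for the ASCII string "123456789" is 0xCBF43926, which makes a handy sanity test for any implementation.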
Re: [whatwg] Archive API - proposal
On Tue, Aug 14, 2012 at 11:20 PM, Glenn Maynard gl...@zewt.org wrote: On Tue, Jul 17, 2012 at 9:23 PM, Andrea Marchesini b...@mozilla.com wrote: // The getFilenames handler receives a list of DOMString: var handle = this.reader.getFile(this.result[i]); This interface is problematic. Since ZIP files don't have a standard encoding, filenames in ZIPs are often garbage. This API requires that filenames round-trip uniquely, or else files aren't accessible at all. Indeed, in the case of zip files, file names themselves are dangerous as handles that get passed back and forth, so it seems like a good idea to be able to extract the contents of a file inside the archive without having to address the file by name. As for the filenames, after an off-list discussion, I think the best solution is that UTF-8 is tried first but the ArchiveReader constructor takes an optional second argument that names a character encoding from the Encoding Standard. This will be known as the fallback encoding. If no fallback encoding is provided by the caller of the constructor, Windows-1252 is set as the fallback encoding. When the ArchiveReader processes a filename from the zip archive, it first tests if the byte string is a valid UTF-8 string. If it is, the byte string is interpreted as UTF-8 when converting to UTF-16. If the filename is not a valid UTF-8 string, it is decoded into UTF-16 using the fallback encoding. -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
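(Editor's note: Henri's try-UTF-8-then-fallback rule can be sketched with the Encoding Standard's TextDecoder, an API that postdates this thread. The encoding labels are the spec's; the function name and everything else here is illustrative.)

```javascript
// Sketch of the proposed filename rule: decode as UTF-8 if the bytes
// are valid UTF-8, otherwise fall back to a caller-supplied legacy
// encoding, defaulting to windows-1252 as Henri suggests.
function decodeFilename(bytes, fallback = "windows-1252") {
  try {
    // fatal: true makes the decoder throw on invalid UTF-8 instead of
    // silently emitting U+FFFD replacement characters.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    return new TextDecoder(fallback).decode(bytes);
  }
}
```

As Glenn notes downthread, the weakness of this heuristic is that some legacy-encoded byte strings happen to also be valid UTF-8 and will be misinterpreted.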
Re: [whatwg] Archive API - proposal
On Wed, Aug 15, 2012 at 7:24 AM, Andrea Marchesini b...@mozilla.com wrote: Thanks for your feedback. When I was implementing the ArchiveAPI, my idea was to have a generic Archive API and not just a ZIP API. Of course the current implementation supports just ZIP but in the future we could have support for more formats. What other sorts of archive formats were you thinking of supporting? Apart from archive-specific features like the CRC32, different formats can be read in different ways. A tarball, for instance, can't be read out-of-order easily. It's literally the files concatenated together with a header before each. The headers tell you the size of each file, so you can seek over the data, but you still have to jump across the entire file sequentially to find a particular file. (Though I suppose you could build a table once when the file's loaded.) A gzipped tarball is even worse since the entire stream is compressed, so you have to decompress it to hop around. Do you know how this compares to a JavaScript library implementation with typed arrays and whatnot? David
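(Editor's note: the "build a table once when the file's loaded" idea for TAR follows directly from the ustar header layout: name at offset 0, octal size at offset 124, everything in 512-byte blocks. The sketch below is an illustrative in-memory indexer, not a proposed API; it assumes an already-decompressed archive and ignores checksums and extension headers.)

```javascript
// Illustrative one-pass TAR indexer: scan the 512-byte headers once,
// recording each member's name, data offset, and size, so later
// lookups can seek directly instead of rescanning the whole file.
function indexTar(bytes) {
  const entries = [];
  // Decode a NUL-terminated ASCII field from the header.
  const field = (a, b) =>
    String.fromCharCode(...bytes.subarray(a, b)).replace(/\0[\s\S]*$/, "");
  let off = 0;
  while (off + 512 <= bytes.length && bytes[off] !== 0) { // a zero block ends the archive
    const name = field(off, off + 100);                   // name field: 100 bytes
    const size = parseInt(field(off + 124, off + 136), 8); // size field: 12 bytes, octal
    entries.push({ name, offset: off + 512, size });
    off += 512 + Math.ceil(size / 512) * 512;             // data is padded to 512-byte blocks
  }
  return entries;
}
```

This is the one-time cost Andrea's reply alludes to: a single sequential pass, after which lookups by name are cheap.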
Re: [whatwg] Archive API - proposal
On Wed, Aug 15, 2012 at 6:14 AM, Henri Sivonen hsivo...@iki.fi wrote: As for the filenames, after an off-list discussion, I think the best solution is that UTF-8 is tried first but the ArchiveReader constructor takes an optional second argument that names a character encoding from the Encoding Standard. This will be known as the fallback encoding. If no fallback encoding is provided by the caller of the constructor, Windows-1252 is set as the fallback encoding. When the ArchiveReader processes a filename from the zip archive, it first tests if the byte string is a valid UTF-8 string. If it is, the byte string is interpreted as UTF-8 when converting to UTF-16. If the filename is not a valid UTF-8 string, it is decoded into UTF-16 using the fallback encoding. This would misinterpret filenames as UTF-8. For example, 黴雨.jpg in a CP932 (SJIS) ZIP is also legal UTF-8. This would happen even though the user explicitly specified an encoding, and even though UTF-8 is exceptionally rare in ZIPs (all Windows ZIP software outputs filenames in the user's ACP, and many don't support UTF-8 at all). On Wed, Aug 15, 2012 at 6:17 AM, Andrea Marchesini amarches...@mozilla.comwrote: I agree. I was thinking that the default encoding for filenames is: UTF-8. If the filename is not a valid UTF-8 string, we can use the caller-supplied encoding: I hate to argue against defaulting to UTF-8, but very few ZIPs are actually UTF-8. CP1252 as a default will at least often be correct, but UTF-8 will almost never be. (The only straightforward way I know to create a ZIP with UTF-8 filenames is with a *nix commandline client, and most Windows software won't understand it.) var reader = new ArchiveReader(blob, "Windows-1252"); If this fails, this filename/file will be excluded from the results. There's no need. Decode with proper error handling, as specified in the Encoding spec: http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html. 
This will give placeholder characters (U+FFFD); even if the whole filename comes out unreadable, the file can still be read, selected from a list, shown in a thumbnail view, and so on. Lots of uses aren't dependent on filenames. It should be possible to get the CRC32 of files, which ZIP stores in the central directory. This both allows the user to perform checksum verification himself if wanted, and all the other variously useful things about being able to get a file's checksum without having to read the whole file. can we have a 'generic' archive API supporting CRC32? Do you actually have any concrete plans for other archive formats? The only others commonly used are TAR and RAR. TAR is unsuitable for non-archive use (you have to scan the whole file to construct a file list), and RAR is proprietary. You could design a checksum API that uses the algorithm for a particular format, but that's severe overdesign if it never supports anything but ZIP. I wouldn't worry about this. -- Glenn Maynard
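(Editor's note: the "decode with proper error handling" behavior Glenn points to is what the Encoding Standard's default, non-fatal decode mode does. A short sketch with TextDecoder, which postdates this thread:)

```javascript
// Sketch: non-fatal decoding per the Encoding spec replaces invalid
// byte sequences with U+FFFD instead of failing, so even a garbled
// filename still yields a usable string and the file stays addressable.
const lossy = new TextDecoder("utf-8"); // non-fatal by default
const name = lossy.decode(new Uint8Array([0x61, 0xFF, 0x62])); // 0xFF is never valid UTF-8
// name is "a\uFFFDb": one character was unrecoverable, but the file
// can still be listed, selected, and read.
```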
Re: [whatwg] Archive API - proposal
On Tue, Aug 14, 2012 at 1:20 PM, Glenn Maynard gl...@zewt.org wrote: (I've reordered my responses to give a more logical progression.) On Tue, Jul 17, 2012 at 9:23 PM, Andrea Marchesini b...@mozilla.com wrote: // The getFilenames handler receives a list of DOMString: var handle = this.reader.getFile(this.result[i]); This interface is problematic. Since ZIP files don't have a standard encoding, filenames in ZIPs are often garbage. This API requires that filenames round-trip uniquely, or else files aren't accessible at all. For example, if you have two filenames in CP932, 日 and 本, but the encoding isn't determined correctly, you may end up with two files both with a filename of ??. Either you can't open either file, or you can only open one of them. This isn't theoretical; I hit ZIP files like this in the wild regularly. Instead, I'd recommend that the primary API simply returns File objects directly from the ZIP. For example: var reader = archive.getFiles(); reader.onsuccess = function(result) { // result = [File, File, File, File...]; console.log(result[0].name); // read the file var fr = new FileReader(); fr.readAsText(result[0]); } This allows opening files without any dependency on the filename. Since File objects are by design lightweight--no decompression should happen until you actually read from the file--this isn't expensive and won't perform any extra I/O. All the information you need to expose a File object is in the central directory (filename, mtime, decompressed size). This is a good idea. It neatly solves the problem of not having to rely on filenames as keys. Though I still think that we should support reading out specific files using a filename as a key. I think a common use-case for ArchiveReader is going to be web developers wanting to download a set of resources from their own website and wanting to use a .zip file as a way to get compression and packaging. In that case they can easily either ensure to stick with ASCII filenames, or encode the names in UTF8. 
By allowing them to download a .zip file, they can also store that .zip in compressed form in IndexedDB or the FileSystem API in order to use less space on the user's device. (Additionally many times IO gets faster by using .zip files because the time saved in doing less IO is larger than the time spent decompressing. Obviously very dependent on what data is being stored). . Do you think it can be useful? . Do you see any limitation, any feature missing? It should be possible to get the CRC32 of files, which ZIP stores in the central directory. This both allows the user to perform checksum verification himself if wanted, and all the other variously useful things about being able to get a file's checksum without having to read the whole file. One way we could support this would be to have a method which allows getting a list of meta-data about each entry. Probably together with the File object itself. So we could return an array of objects like: [ { rawName: UInt8Array, file: File object, crc32: UInt8Array }, { rawName: UInt8Array, file: File object, crc32: UInt8Array }, ... ] That way we can also leave out the crc from archive types that don't support it. Though I'm not convinced that CRCs are important enough that we need to put them in the first iteration of the API. (I don't think CRC32 checks should be performed automatically, since it's too hard for that to make sense when random access is involved.) I agree with this. // The ArchiveReader object works with Blob objects: var archiveReader = new ArchiveReader(file); // Any request is asynchronous: The only operation that needs to be asynchronous is creating the ArchiveReader itself. It should parse the ZIP central record before returning a result. Once you've done that you can do the rest synchronously, because no further I/O is necessary until you actually read data from a file. This is definitely an interesting idea. 
The current API is designed around doing the IO when each individual operation is done. You are proposing to do all IO up front which allows all operations to be synchronous. I suspect that doing the IO lazily can provide better performance for some types of operations, such as only wanting to extract a single resource from an archive. But maybe the difference wouldn't be that big in most cases. But I like this approach a lot if we can make it work. The main thing I'd be worried about, apart from the IO performance above, is if we can make it work for a larger set of archive formats. Like, can we make it work for .tar and .tar.gz? I think we couldn't but we would need to verify. / Jonas
Re: [whatwg] Archive API - proposal
On Wed, Aug 15, 2012 at 10:10 PM, Jonas Sicking jo...@sicking.cc wrote: Though I still think that we should support reading out specific files using a filename as a key. I think a common use-case for ArchiveReader is going to be web developers wanting to download a set of resources from their own website and wanting to use a .zip file as a way to get compression and packaging. In that case they can easily either ensure to stick with ASCII filenames, or encode the names in UTF8. That's what this was for: // For convenience, add getter File? (DOMString name) to FileList, to find a file by name. This is equivalent // to iterating through files[] and comparing .name. If no match is found, return null. This could be a function // instead of a getter. var example_file2 = zipFile.files["file.txt"]; if(example_file2 == null) { console.error("file.txt not found in ZIP"); return; } I suppose a named getter isn't a great idea--you might have a filename like "length" that collides with the list's own properties--so a zipFile.files.find('file.txt') function is probably better. By allowing them to download a .zip file, they can also store that .zip in compressed form in IndexedDB or the FileSystem API in order to use less space on the user's device. (Additionally many times IO gets faster by using .zip files because the time saved in doing less IO is larger than the time spent decompressing. Obviously very dependent on what data is being stored). There's also the question of when decompression happens--you don't want to decompress the whole thing in advance if you can avoid it, since if the user isn't doing random access you can stream the decompression--but that's just QoI, of course. One way we could support this would be to have a method which allows getting a list of meta-data about each entry. Probably together with the File object itself. So we could return an array of objects like: [ { rawName: UInt8Array, file: File object, crc32: UInt8Array }, { rawName: UInt8Array, file: File object, crc32: UInt8Array }, ... ] That way we can also leave out the crc from archive types that don't support it. This means exposing two objects per file. I'd prefer a single File-subclass object per file, with any extra metadata put on the subclass. This is definitely an interesting idea. The current API is designed around doing the IO when each individual operation is done. You are proposing to do all IO up front which allows all operations to be synchronous. I suspect that doing the IO lazily can provide better performance for some types of operations, such as only wanting to extract a single resource from an archive. But maybe the difference wouldn't be that big in most cases. I'd expect the I/O savings to be negligible, since ZIP has a central directory at the end, allowing the whole thing to be read very quickly. I hope creating an array of File objects (even thousands of them) isn't too expensive. Even if it is, though, this could be refactored to still give a synchronous interface: store the file directory natively (in a non-File, non-GC'd way), and allow looking up and iterating that list in a way that only instantiates one File object at a time. (This would lose the FileList API compatibility with <input type="file">, though, which I think is a nice plus.) But I like this approach a lot if we can make it work. The main thing I'd be worried about, apart from the IO performance above, is if we can make it work for a larger set of archive formats. Like, can we make it work for .tar and .tar.gz? I think we couldn't but we would need to verify. It wouldn't handle it very well, but the original API wouldn't, either. In both cases, the only way to find filenames in a TAR--whether it's to search for one or to construct a list--is to scan through the whole file (and decompress it all, for .tgz). Simply retrieving a list of filenames from a large .tgz would thrash the user's disk and chew CPU. I don't think there's much use in supporting .tar, anyway. 
Even if you want true streaming (which would be a different API anyway, since we're reading from a Blob here), ZIP can do that too, by using the local file headers instead of the central directory. -- Glenn Maynard
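(Editor's note: for context on why opening a ZIP needs so little up-front I/O: the central directory is located via the End of Central Directory record, found by scanning backwards from the end of the file for its signature. An illustrative sketch:)

```javascript
// Illustrative sketch: find the ZIP End of Central Directory (EOCD)
// record by scanning backwards for its signature, stored little-endian
// as the bytes 50 4B 05 06 ("PK\x05\x06"). Real readers bound the scan
// by the maximum trailing-comment length (65535 + 22 bytes).
function findEOCD(bytes) {
  for (let i = bytes.length - 22; i >= 0; i--) { // the EOCD record is at least 22 bytes
    if (bytes[i] === 0x50 && bytes[i + 1] === 0x4b &&
        bytes[i + 2] === 0x05 && bytes[i + 3] === 0x06) {
      return i; // offset of the EOCD record
    }
  }
  return -1; // no EOCD: not a (complete) ZIP
}
```

From the EOCD a reader learns the offset and size of the central directory, so building the full file list touches only the tail of the archive.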
Re: [whatwg] Archive API - proposal
On Tue, Jul 17, 2012 at 7:23 PM, Andrea Marchesini b...@mozilla.com wrote: Hi All, I would like to propose a new JavaScript/web API that provides the ability to read the content of an archive file through DOMFile objects. I have started to work on this API because it has been requested during some Mozilla Game Meeting by game developers who often use ZIP files as a storage system. What I'm describing is a read-only and asynchronous API built on top of FileAPI ( http://dev.w3.org/2006/webapi/FileAPI/ ). Here's a draft written in WebIDL: interface ArchiveRequest : DOMRequest { // this is the ArchiveReader: readonly attribute nsIDOMArchiveReader reader; }; [Constructor(Blob blob)] interface ArchiveReader { // any method is supposed to be asynchronous // The ArchiveRequest.result is an array of strings (the filenames) ArchiveRequest getFilenames(); // The ArchiveRequest.result is a DOMFile (http://dev.w3.org/2006/webapi/FileAPI/#dfn-file) ArchiveRequest getFile(DOMString filename); }; Here's an example of how to use it: function startRead() { // Starting from an <input type="file" id="file">: var file = document.getElementById('file').files[0]; if (file.type != 'application/zip') { alert("This archive format is not supported"); return; } // The ArchiveReader object works with Blob objects: var archiveReader = new ArchiveReader(file); // Any request is asynchronous: var handler = archiveReader.getFilenames(); handler.onsuccess = getFilenamesSuccess; handler.onerror = errorHandler; // Multiple requests can run at the same time: var handler2 = archiveReader.getFile("levels/1.txt"); handler2.onsuccess = getFileSuccess; handler2.onerror = errorHandler; } // The getFilenames handler receives a list of DOMString: function getFilenamesSuccess() { for (var i = 0; i < this.result.length; ++i) { /* this.reader is the ArchiveReader: var handle = this.reader.getFile(this.result[i]); handle.onsuccess = ... */ } } // The getFile handler receives a File/Blob object (and it can be used with FileReader): function getFileSuccess() { var reader = new FileReader(); reader.readAsText(this.result); reader.onload = function(event) { // alert(event.target.result); } } function errorHandler() { // ... } I would like to receive feedback about this. In particular: . Do you think it can be useful? . Do you see any limitation, any feature missing? FWIW, this API is now available in Firefox nightly builds. It's currently on track to ship in Firefox 17. Feedback would still be greatly appreciated! / Jonas
Re: [whatwg] Archive API - proposal
(I've reordered my responses to give a more logical progression.) On Tue, Jul 17, 2012 at 9:23 PM, Andrea Marchesini b...@mozilla.com wrote: // The getFilenames handler receives a list of DOMString: var handle = this.reader.getFile(this.result[i]); This interface is problematic. Since ZIP files don't have a standard encoding, filenames in ZIPs are often garbage. This API requires that filenames round-trip uniquely, or else files aren't accessible at all. For example, if you have two filenames in CP932, 日 and 本, but the encoding isn't determined correctly, you may end up with two files both with a filename of ??. Either you can't open either file, or you can only open one of them. This isn't theoretical; I hit ZIP files like this in the wild regularly. Instead, I'd recommend that the primary API simply returns File objects directly from the ZIP. For example: var reader = archive.getFiles(); reader.onsuccess = function(result) { // result = [File, File, File, File...]; console.log(result[0].name); // read the file var fr = new FileReader(); fr.readAsText(result[0]); } This allows opening files without any dependency on the filename. Since File objects are by design lightweight--no decompression should happen until you actually read from the file--this isn't expensive and won't perform any extra I/O. All the information you need to expose a File object is in the central directory (filename, mtime, decompressed size). I would like to receive feedback about this. In particular: . Do you think it can be useful? . Do you see any limitation, any feature missing? It should be possible to get the CRC32 of files, which ZIP stores in the central directory. This both allows the user to perform checksum verification himself if wanted, and all the other variously useful things about being able to get a file's checksum without having to read the whole file. (I don't think CRC32 checks should be performed automatically, since it's too hard for that to make sense when random access is involved.) 
// The ArchiveReader object works with Blob objects: var archiveReader = new ArchiveReader(file); // Any request is asynchronous: The only operation that needs to be asynchronous is creating the ArchiveReader itself. It should parse the ZIP central record before returning a result. Once you've done that you can do the rest synchronously, because no further I/O is necessary until you actually read data from a file. This gives the following, simpler interface: var opener = new ZipOpener(file); opener.onerror = function() { console.error("Loading failed"); } opener.onsuccess = function(zipFile) { // .files is a FileList, representing each file in the archive. if(zipFile.files.length == 0) { console.error("ZIP file is empty"); return; } var example_file = zipFile.files[0]; console.log("The first filename is", example_file.name, "with an expected CRC of", example_file.expectedCRC); // Read from the file: var reader = new FileReader(); reader.readAsText(example_file); // For convenience, add getter File? (DOMString name) to FileList, to find a file by name. This is equivalent // to iterating through files[] and comparing .name. If no match is found, return null. This could be a function // instead of a getter. var example_file2 = zipFile.files["file.txt"]; if(example_file2 == null) { console.error("file.txt not found in ZIP"); return; } } (To fit expectedCRC in there, it would actually need to use a subclass of File, not File itself.) This also eliminates an error condition (no getFile error callback), and since .files looks just like HTMLInputElement.files, it can be used directly with code written for it. For example, if you have a function uploadAllFiles(files), you can pass in both an <input type="file" multiple>'s .files or a zipFile.files, and they'll both work. -- Glenn Maynard
Re: [whatwg] Archive API - proposal
On Aug 14, 2012, at 21:21, Glenn Maynard gl...@zewt.org wrote: (I've reordered my responses to give a more logical progression.) On Tue, Jul 17, 2012 at 9:23 PM, Andrea Marchesini b...@mozilla.com wrote: // The getFilenames handler receives a list of DOMString: var handle = this.reader.getFile(this.result[i]); This interface is problematic. Since ZIP files don't have a standard encoding, filenames in ZIPs are often garbage. This API requires that filenames round-trip uniquely, or else files aren't accessible at all. For example, if you have two filenames in CP932, 日 and 本, but the encoding isn't determined correctly, you may end up with two files both with a filename of ??. Either you can't open either file, or you can only open one of them. This isn't theoretical; I hit ZIP files like this in the wild regularly. Instead, I'd recommend that the primary API simply returns File objects directly from the ZIP. For example: var reader = archive.getFiles(); reader.onsuccess = function(result) { // result = [File, File, File, File...]; console.log(result[0].name); // read the file var fr = new FileReader(); fr.readAsText(result[0]); } This allows opening files without any dependency on the filename. Since File objects are by design lightweight--no decompression should happen until you actually read from the file--this isn't expensive and won't perform any extra I/O. All the information you need to expose a File object is in the central directory (filename, mtime, decompressed size). I would like to receive feedback about this. In particular: . Do you think it can be useful? . Do you see any limitation, any feature missing? It should be possible to get the CRC32 of files, which ZIP stores in the central directory. This both allows the user to perform checksum verification himself if wanted, and all the other variously useful things about being able to get a file's checksum without having to read the whole file. 
(I don't think CRC32 checks should be performed automatically, since it's too hard for that to make sense when random access is involved.) // The ArchiveReader object works with Blob objects: var archiveReader = new ArchiveReader(file); // Any request is asynchronous: The only operation that needs to be asynchronous is creating the ArchiveReader itself. It should parse the ZIP central record before returning a result. Once you've done that you can do the rest synchronously, because no further I/O is necessary until you actually read data from a file. This gives the following, simpler interface: var opener = new ZipOpener(file); opener.onerror = function() { console.error("Loading failed"); } opener.onsuccess = function(zipFile) { // .files is a FileList, representing each file in the archive. if(zipFile.files.length == 0) { console.error("ZIP file is empty"); return; } var example_file = zipFile.files[0]; console.log("The first filename is", example_file.name, "with an expected CRC of", example_file.expectedCRC); // Read from the file: var reader = new FileReader(); reader.readAsText(example_file); // For convenience, add getter File? (DOMString name) to FileList, to find a file by name. This is equivalent // to iterating through files[] and comparing .name. If no match is found, return null. This could be a function // instead of a getter. var example_file2 = zipFile.files["file.txt"]; if(example_file2 == null) { console.error("file.txt not found in ZIP"); return; } } (To fit expectedCRC in there, it would actually need to use a subclass of File, not File itself.) This also eliminates an error condition (no getFile error callback), and since .files looks just like HTMLInputElement.files, it can be used directly with code written for it. For example, if you have a function uploadAllFiles(files), you can pass in both an <input type="file" multiple>'s .files or a zipFile.files, and they'll both work. How are nested directories handled in your counter-proposal? --tobie
[whatwg] Archive API - proposal
Hi All, I would like to propose a new JavaScript/web API that provides the ability to read the content of an archive file through DOMFile objects. I have started to work on this API because it has been requested during some Mozilla Game Meeting by game developers who often use ZIP files as a storage system. What I'm describing is a read-only and asynchronous API built on top of FileAPI ( http://dev.w3.org/2006/webapi/FileAPI/ ). Here's a draft written in WebIDL: interface ArchiveRequest : DOMRequest { // this is the ArchiveReader: readonly attribute nsIDOMArchiveReader reader; }; [Constructor(Blob blob)] interface ArchiveReader { // any method is supposed to be asynchronous // The ArchiveRequest.result is an array of strings (the filenames) ArchiveRequest getFilenames(); // The ArchiveRequest.result is a DOMFile (http://dev.w3.org/2006/webapi/FileAPI/#dfn-file) ArchiveRequest getFile(DOMString filename); }; Here's an example of how to use it: function startRead() { // Starting from an <input type="file" id="file">: var file = document.getElementById('file').files[0]; if (file.type != 'application/zip') { alert("This archive format is not supported"); return; } // The ArchiveReader object works with Blob objects: var archiveReader = new ArchiveReader(file); // Any request is asynchronous: var handler = archiveReader.getFilenames(); handler.onsuccess = getFilenamesSuccess; handler.onerror = errorHandler; // Multiple requests can run at the same time: var handler2 = archiveReader.getFile("levels/1.txt"); handler2.onsuccess = getFileSuccess; handler2.onerror = errorHandler; } // The getFilenames handler receives a list of DOMString: function getFilenamesSuccess() { for (var i = 0; i < this.result.length; ++i) { /* this.reader is the ArchiveReader: var handle = this.reader.getFile(this.result[i]); handle.onsuccess = ... */ } } // The getFile handler receives a File/Blob object (and it can be used with FileReader): function getFileSuccess() { var reader = new FileReader(); reader.readAsText(this.result); reader.onload = function(event) { // alert(event.target.result); } } function errorHandler() { // ... } I would like to receive feedback about this. In particular: . Do you think it can be useful? . Do you see any limitation, any feature missing? Regards, AM
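(Editor's note: one small caveat about the file.type check in the example above: the MIME type a browser reports for a picked file is guessed from its extension, so sniffing the ZIP local-file-header signature is a more robust guard. A hedged sketch; the helper name is made up.)

```javascript
// Illustrative sketch: detect a ZIP by its local file header magic
// "PK\x03\x04" rather than trusting file.type, which browsers derive
// from the filename extension. The function name is hypothetical.
// (An empty ZIP starts with the EOCD signature "PK\x05\x06" instead,
// which this simple check would reject.)
function looksLikeZip(firstBytes) {
  return firstBytes.length >= 4 &&
         firstBytes[0] === 0x50 && firstBytes[1] === 0x4b && // "PK"
         firstBytes[2] === 0x03 && firstBytes[3] === 0x04;
}
```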