I am curious how having different types of collections (binary and xml) will affect XPath queries since the XPath can start at the root level.
-Matt

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Friday, June 28, 2002 9:27 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Binary Files
>
>
> +1
>
> I too will need to store binaries in the not so distant future. I also
> agree with the conclusions Tom has drawn.
>
> Tom, would it be possible for this solution to allow some form of an
> envelope for header type information and possibly a (proprietary) system
> id (not collection-document id)? This would be incredibly helpful. Maybe a
> 'BinaryCollectionFacade' actually has two (hidden) collections, one
> XmlCollection, one BinaryCollection. The xml collection has the envelope
> with searchable fields and references the documentId of the binary in the
> binary collection. The binary is exactly as you described....but we get
> the best of both worlds without sacrificing any performance. This way I
> can look at the info for all binaries without actually having to retrieve
> them. Some may find it more familiar if I say it's like a 'HEAD' request?
>
> Kevin Ross
> www.bredex.com
>
>
> From: Tom Bradford <[EMAIL PROTECTED]>
> To: xindice-[EMAIL PROTECTED]
> cc:
> Date: 06/28/2002 10:37 AM
> Subject: Re: Binary Files
> Please respond to xindice-dev
>
>
> On Thursday, June 27, 2002, at 02:51 PM, Francesco Bellomi wrote:
> > I would rather see binary as lower level than xml, not the other way
> > round. Xml itself needs to be ultimately encoded in some binary form
> > (such as UTF-8) to be written to the file, whereas encoding binary as
> > (base64) xml is a less-than-optimal solution for both space and time.
>
> Ok... so here is the issue with binary resources in Xindice, and
> hopefully this will allow people to think of it from an implementation
> perspective rather than a knee-jerk 'we need this' point of view.
>
> Support for binary resources would *not* be as easy as everyone says it
> would be for one very simple reason. When you mix and match tokenized
> document streams (which is how documents are represented inside of
> Xindice, and not as text) and binary streams in a single collection, you
> open up the possibility for major data corruption when people start
> reading/writing the binary image of XML documents directly (accidentally
> or not) or when you try to read/overwrite a binary resource as if it
> were a document.
>
> There are two solutions to this. The first is to have a special
> signature at the beginning of *every* tokenized stream that identifies
> it as such so that the collection manager can check individual streams
> to determine exactly what they are. This will lessen the possibility
> for data corruption, though not eliminate it completely. This is an
> expensive operation and would also require changing the tokenized format.
>
> The other option is to task a collection as either 'binary' or 'xml'. I
> am not opposed to saying that a collection can be a 'binary collection'
> or an 'xml collection', but not both. My vote would be +1 for this
> option. It wouldn't require changes to the tokenized format, and is a
> solution that I think everyone can live with.
>
> > By the way, I do currently use base64 for embedding binary in my
> > Xindice database, but it's not a good solution.
>
> FWIW, I agree with this.
>
> --
> Tom Bradford - http://www.tbradford.org
> Architect - XQRL (XQuery Engine) - http://www.xqrl.com
> Apache Xindice (XML Database) - http://xml.apache.org/xindice
> Labrador (Web Services Hub) - http://www.notdotnet.org/labrador
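
For what it's worth, the 'BinaryCollectionFacade' idea Kevin describes in the quoted thread could look roughly like the Java sketch below. It is only an illustration under assumptions: the names Envelope, put, head and get are hypothetical, the two in-memory maps merely stand in for the hidden xml and binary collections, and none of this is part of the Xindice API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: none of these types exist in Xindice.

/** Searchable header information, kept apart from the binary payload. */
final class Envelope {
    final String systemId;              // application-level id, not the document id
    final String binaryId;              // key of the payload in the binary store
    final Map<String, String> headers;  // searchable fields

    Envelope(String systemId, String binaryId, Map<String, String> headers) {
        this.systemId = systemId;
        this.binaryId = binaryId;
        this.headers = Collections.unmodifiableMap(new LinkedHashMap<>(headers));
    }
}

/** Pairs two hidden single-type stores: one for envelopes, one for raw bytes. */
final class BinaryCollectionFacade {
    // Stand-in for the hidden xml collection holding the envelopes.
    private final Map<String, Envelope> envelopes = new LinkedHashMap<>();
    // Stand-in for the hidden binary collection holding the payloads.
    private final Map<String, byte[]> binaries = new LinkedHashMap<>();
    private int nextId = 0;

    /** Stores the payload and its envelope under the given system id. */
    void put(String systemId, Map<String, String> headers, byte[] payload) {
        String binaryId = "bin-" + (nextId++);
        binaries.put(binaryId, payload.clone());
        envelopes.put(systemId, new Envelope(systemId, binaryId, headers));
    }

    /** HEAD-like lookup: returns header info without reading the payload. */
    Envelope head(String systemId) {
        return envelopes.get(systemId);
    }

    /** Full retrieval: resolves the envelope, then reads the payload. */
    byte[] get(String systemId) {
        Envelope e = envelopes.get(systemId);
        return e == null ? null : binaries.get(e.binaryId).clone();
    }
}
```

Because each underlying store only ever holds one kind of data, this arrangement stays compatible with the second option in the thread (a collection is either 'binary' or 'xml', never both), and head() plays the role of the 'HEAD' request Kevin mentions.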
