RE: Binary Files

Matt Liotta 28 Jun 2002 16:27:10 -0000

I am curious how having different types of collections (binary and xml)
will affect XPath queries since the XPath can start at the root level.


-Matt
 

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Friday, June 28, 2002 9:27 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Binary Files
> 
> 
> +1
> 
> I too will need to store binaries in the not-so-distant future.  I also
> agree with the conclusions Tom has drawn.
> 
> Tom, would it be possible for this solution to allow some form of an
> envelope for header-type information and possibly a (proprietary) system
> id (not collection-document id)?  This would be incredibly helpful.
> Maybe a 'BinaryCollectionFacade' actually has two (hidden) collections,
> one XmlCollection, one BinaryCollection.  The xml collection has the
> envelope with searchable fields and references the documentId of the
> binary in the binary collection.  The binary is exactly as you
> described... but we get the best of both worlds without sacrificing any
> performance.  This way I can look at the info for all binaries without
> actually having to retrieve them.  Some may be more familiar if I say
> it's like a 'HEAD' request?
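
The facade idea above can be pictured with a small sketch. This is only an
illustration, not Xindice code: the class name comes from the message, but
every method name (store, head, find, get), the generated document ids, and
the in-memory dicts standing in for the two hidden collections are
assumptions made for the example.

```python
class BinaryCollectionFacade:
    """Hypothetical facade over two hidden stores, one per resource kind."""

    def __init__(self):
        # Stand-in for the hidden XML collection: system id -> envelope fields.
        self._envelopes = {}
        # Stand-in for the hidden binary collection: document id -> raw bytes.
        self._binaries = {}
        # Counts binary fetches, only to show that head()/find() trigger none.
        self.binary_reads = 0

    def store(self, system_id, data, **fields):
        """Store the binary and an envelope that references it."""
        document_id = f"bin-{len(self._binaries) + 1}"
        self._binaries[document_id] = bytes(data)
        # The envelope carries searchable fields plus the binary's documentId.
        self._envelopes[system_id] = {
            **fields,
            "documentId": document_id,
            "size": len(data),
        }
        return document_id

    def head(self, system_id):
        """Return envelope info only, much like an HTTP HEAD request."""
        return dict(self._envelopes[system_id])

    def find(self, **criteria):
        """Search envelopes by field value without reading any binary."""
        return sorted(
            sid
            for sid, env in self._envelopes.items()
            if all(env.get(key) == value for key, value in criteria.items())
        )

    def get(self, system_id):
        """Follow the envelope's reference and fetch the binary itself."""
        self.binary_reads += 1
        return self._binaries[self._envelopes[system_id]["documentId"]]
```

Under these assumptions, listing or searching the envelopes never touches
the binary store; only get() does.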
> 
> Kevin Ross
> www.bredex.com
> 
> 
> 
> 
> From: Tom Bradford <[EMAIL PROTECTED]>
> To: xindice-[EMAIL PROTECTED]
> cc:
> Date: 06/28/2002 10:37 AM
> Subject: Re: Binary Files
> Please respond to xindice-dev
> 
> 
> 
> 
> 
> 
> On Thursday, June 27, 2002, at 02:51  PM, Francesco Bellomi wrote:
> > I would rather see binary as lower level than xml, not the other way
> > round. Xml itself needs to be ultimately encoded in some binary form
> > (such as UTF-8) to be written to a file, whereas encoding binary as
> > (base64) xml is a less-than-optimal solution for both space and time.
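
For a rough sense of the space cost mentioned above, here is a minimal
sketch using Python's standard base64 module. The payload and the function
name base64_size are made up for the example; the 4-bytes-out-per-3-bytes-in
ratio is a property of base64 itself.

```python
import base64


def base64_size(raw: bytes) -> int:
    """Return the number of bytes the base64 encoding of `raw` occupies."""
    return len(base64.b64encode(raw))


# Hypothetical payload: 3072 arbitrary bytes standing in for a binary resource.
raw = bytes(range(256)) * 12
encoded = base64.b64encode(raw)

# Every 3 input bytes become 4 output characters, so the stored form is
# one third larger before any XML escaping or tokenizing is applied.
ratio = len(encoded) / len(raw)
```

The time cost is the extra encode on every write and decode on every read,
which a native binary store would not pay.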
> 
> Ok... so here is the issue with binary resources in Xindice, and
> hopefully this will allow people to think of it from an implementation
> perspective rather than a knee-jerk 'we need this' point of view.
> 
> Support for binary resources would *not* be as easy as everyone says it
> would be for one very simple reason.  When you mix and match tokenized
> document streams (which is how documents are represented inside of
> Xindice, and not as text) and binary streams in a single collection, you
> open up the possibility for major data corruption when people start
> reading/writing the binary image of XML documents directly (accidentally
> or not) or when you try to read/overwrite a binary resource as if it
> were a document.
> 
> There are two solutions to this.  The first is to have a special
> signature at the beginning of *every* tokenized stream that identifies
> it as such so that the collection manager can check individual streams
> to determine exactly what they are.  This will lessen the possibility
> for data corruption, though not eliminate it completely.  Checking each
> stream is an expensive operation, and this approach would also require
> changing the tokenized format.
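
A minimal sketch of the signature approach, to show why it lessens but does
not eliminate the risk. The 4-byte SIGNATURE value and all function names
here are hypothetical; as noted above, the existing tokenized format would
have to change to carry any such marker.

```python
# Hypothetical marker; not part of any real Xindice stream format.
SIGNATURE = b"\x00XTK"


def wrap_tokenized(tokens: bytes) -> bytes:
    """Prefix a tokenized document stream with the signature."""
    return SIGNATURE + tokens


def looks_tokenized(stream: bytes) -> bool:
    """Check performed on every stream access: does it carry the marker?"""
    return stream.startswith(SIGNATURE)


def read_as_document(stream: bytes) -> bytes:
    """Refuse to interpret a stream as a document unless it is marked."""
    if not looks_tokenized(stream):
        raise ValueError("not a tokenized document stream")
    return stream[len(SIGNATURE):]


# A binary resource that happens to begin with the same bytes still passes
# the check, which is why the protection is only partial.
unlucky_binary = SIGNATURE + b"arbitrary binary payload"
```

The check also has to run on each individual stream, which is where the
extra cost comes from.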
> 
> The other option is to tag a collection as either 'binary' or 'xml'.  I
> am not opposed to saying that a collection can be a 'binary collection'
> or an 'xml collection', but not both.  My vote would be +1 for this
> option.  It wouldn't require changes to the tokenized format, and is a
> solution that I think everyone can live with.
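
The per-collection option could look roughly like the sketch below. It is an
assumption-laden illustration: CollectionKind, Collection, and the method
names are invented for the example and do not correspond to Xindice classes.

```python
from enum import Enum


class CollectionKind(Enum):
    XML = "xml"
    BINARY = "binary"


class Collection:
    """Hypothetical collection whose kind is fixed when it is created."""

    def __init__(self, name, kind):
        self.name = name
        self.kind = CollectionKind(kind)  # accepts "xml"/"binary" or a member
        self._resources = {}

    def _require(self, kind):
        # One check per collection kind, not per stream; streams stay unmarked.
        if self.kind is not kind:
            raise TypeError(f"{self.name!r} is a {self.kind.value} collection")

    def insert_document(self, key, xml_text):
        self._require(CollectionKind.XML)
        self._resources[key] = xml_text

    def insert_binary(self, key, data):
        self._require(CollectionKind.BINARY)
        self._resources[key] = bytes(data)

    def get(self, key):
        return self._resources[key]
```

Because the kind lives on the collection, the stored streams themselves need
no signature, which matches the point that the tokenized format stays
unchanged.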
> 
> > By the way, I do currently use base64 for embedding binary in my
> > Xindice database, but it's not a good solution.
> 
> FWIW, I agree with this.
> 
> --
> Tom Bradford - http://www.tbradford.org
> Architect - XQRL (XQuery Engine) - http://www.xqrl.com
> Apache Xindice (XML Database) - http://xml.apache.org/xindice
> Labrador (Web Services Hub) - http://www.notdotnet.org/labrador
> 
>
