On Mon, 22 Jul 2002, Kimbro Staken wrote:

> 
> On Monday, July 22, 2002, at 10:21  AM, Fernando Padilla wrote:
> 
> >
> > Hey guys.
> >
> > So I have a question.
> >
> > Why BTreeFiler instead of FSFiler?
> >
> > What are the benefits, are there any benchmarks, etc?
> >
> >
> > It's that I was just going over the code and I started to wonder what were
> > the hard factual benefits of BTree over straight FileSystem.  We're
> > basically implementing our own FileSystem.. wondering if that is really
> > useful, or just technically cool....
> 
> It's definitely useful, in fact it's pretty essential. Maybe not so much 
> for file storage, but definitely for indexes which is what BTrees are 
> designed for. Really the file store is an indexed store too, since file 
> lookups are done via key. How this differs from the file system is going 
> to depend on the file system being used; the file system should have its 
> own indexed structure for making file retrievals faster.
> 
> For file storage, I'd be very happy to see some test results for large 
> numbers of files that show the file system is more efficient than using a 
> BTree. However, I suspect those results will vary across platforms and 
> file system types. Most file systems that I'm aware of are generally 
> considered poor choices for storing very large numbers of files in a 
> single directory. Especially if those files are small. I believe this is 
> why things like ReiserFS were developed.
> 
> If you're feeling motivated, please run some tests and report on the 
> results. I'd certainly like to have empirical data to support or disprove 
> the approach. My belief is BTree will win out in the end, probably by a 
> large margin, but I don't have any data to support that. Anyway, that code 
> is essential for indexes, though we could certainly use a BTree 
> implementation from someone else if a truly robust Java impl exists. I'd 
> actually prefer to do it that way.

hmmm.  Yeah.  It's just sad that it's so complicated, and one huge file.  
But that has its own benefits, like we only have one file descriptor open 
per collection.  Much better than one per document.
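
To make the one-descriptor point concrete, here is a rough sketch. It is 
not the actual BTreeFiler or FSFiler code: the class name SingleFileStore 
and its methods are hypothetical, and an in-memory TreeMap stands in for 
the on-disk B-tree index. Every document in a collection is appended to 
one data file and looked up by key, so only one file handle stays open no 
matter how many documents are stored.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.TreeMap;

// Hypothetical illustration only; not Xindice's BTreeFiler or FSFiler.
class SingleFileStore implements AutoCloseable {
    // Stand-in for the B-tree index: key -> {offset, length} in the data file.
    // Assumption: a real filer would persist this index on disk as well.
    private final TreeMap<String, long[]> index = new TreeMap<>();

    // The single open file handle shared by every document in the collection.
    private final RandomAccessFile data;

    SingleFileStore(File file) throws IOException {
        this.data = new RandomAccessFile(file, "rw");
    }

    // Appends the document to the end of the file and records where it lives.
    // Re-putting a key leaves the old bytes behind as unreclaimed space.
    void put(String key, String document) throws IOException {
        byte[] bytes = document.getBytes(StandardCharsets.UTF_8);
        long offset = data.length();
        data.seek(offset);
        data.write(bytes);
        index.put(key, new long[] {offset, bytes.length});
    }

    // Looks the key up in the index, then reads exactly that byte range.
    String get(String key) throws IOException {
        long[] entry = index.get(key);
        if (entry == null) {
            return null;
        }
        byte[] bytes = new byte[(int) entry[1]];
        data.seek(entry[0]);
        data.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    int size() {
        return index.size();
    }

    @Override
    public void close() throws IOException {
        data.close();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("collection", ".dat");
        tmp.deleteOnExit();
        try (SingleFileStore store = new SingleFileStore(tmp)) {
            store.put("doc1", "<a>first</a>");
            store.put("doc2", "<b>second</b>");
            System.out.println(store.get("doc2"));
        }
    }
}
```

A one-file-per-document store would instead open (or at least create) a 
separate file for every key, which is where the directory-size concerns 
mentioned above come in. The sketch says nothing about which is faster; 
that still needs the measurements Kimbro asked for.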

Well, I'll try to look into other DB persistence implementations for 
inspiration.

Fernando

