On Mon, Nov 29, 2010 at 1:49 PM, Ian Boston <[email protected]> wrote:
>
> On 29 Nov 2010, at 12:06, Alexander Klimetschek wrote:
>
>> (as a randomly-accessible
>> binary)
>
> One of the reasons the JDBCDirectory is not fast is that most DBs don't
> support seek on blobs, and anyway, anything that is shared over a network is
> just too slow, unless a local cached version of the index is made available.
> I think that's why the Infinispan Directory does work. BTW, iirc you can
> configure Infinispan to page its cache to disk.


Indeed. Lucene needs so many random seeks that the only efficient way
(in my view) is to keep the index on local disk. Lucene 4.0 even
removes many internal caches (like FieldCache!!!) and relies completely
on file system caches. This will actually make things like sorting on
tens of millions of titles possible without going OOM.
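
To illustrate the access pattern I mean, here is a minimal sketch using
only the JDK (it is not Lucene code; the class name, the fixed 8-byte
record layout and the helper methods are assumptions made up for this
example). It writes fixed-size records to a local file and then jumps
straight to one of them with seek(), which is the kind of operation that
is cheap on a local disk backed by the OS file system cache and
expensive when each jump becomes a network or blob round trip:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class RandomSeekSketch {
    // Hypothetical layout: each record is one 8-byte long.
    static final int RECORD_SIZE = 8;

    // Writes `count` records; record i holds the value i * i.
    static void writeRecords(Path file, int count) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            for (long i = 0; i < count; i++) {
                raf.writeLong(i * i);
            }
        }
    }

    // Jumps directly to record `index` without reading what precedes it.
    static long readRecord(Path file, int index) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek((long) index * RECORD_SIZE);
            return raf.readLong();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment", ".bin");
        try {
            writeRecords(file, 1000);
            System.out.println(readRecord(file, 42)); // prints 1764
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

A store that can only hand back a whole blob would have to transfer
everything in front of the wanted offset (or the entire blob) for every
such read.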

I haven't looked at the Infinispan code yet, but of course they have a
way to flush the memory to disk or to a database. We might add
flushing to JCR, which would flush the Lucene segments into
the repository (as Alexander pointed out earlier).

>
> I tried a number of impls of remote shared Lucene indexes when I was writing
> the search engine for Sakai 2; all failed. The only solution that worked was
> one where Lucene was allowed to perform seeks on local disk or in memory
> (documents were indexed on one node in the cluster (round robin), the
> indexing nodes shipped segment updates, and all nodes searched on local
> indexes, but not in real time as Jackrabbit does).

Yes. I recently had some talks with Simon Willnauer, one of the very
few Lucene committers who know how the low-level persistence and reads
work: Lucene cannot perform well on anything other than the file system
or memory. The other day I attended a talk about Lucandra at ApacheCon
in Atlanta: Lucene in a distributed Cassandra ring... they hit
performance penalties after 100,000 Lucene docs... well, it is just not
possible (or I am too stupid) :-)

Cheers Ard

>
> Ian
>
>
>


