From: Toke Eskildsen
To: "dev@lucene.apache.org"
Sent: Fri, 22 October, 2010 14:27:45
Subject: Re: Polymorphic Index
On Fri, 2010-10-22 at 11:23 +0200, eks dev wrote:
> Both of these solutions are just a better way to do it wrong :) The real
> solution is definitely somewhere around ParallelReader usage.
The problem with parallel is with updates of documents. The IndexWriter
takes terms and queries for deletion ...
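
For reference, a minimal sketch of the term-addressed delete API being
referred to, using Lucene 3.x-era calls; the "uid" field name and helper
are illustrative, not from the thread:

  import java.io.IOException;

  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.Term;

  public final class UidDelete {
    // Deletions are addressed by term or query, never by document number,
    // and an "update" is a delete-then-add that renumbers the document.
    // That renumbering breaks the doc-for-doc alignment that parallel
    // sub-indexes have to maintain.
    public static void deleteByUid(IndexWriter writer, String uidHex)
        throws IOException {
      writer.deleteDocuments(new Term("uid", uidHex));
    }
  }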
From: Toke Eskildsen
To: "dev@lucene.apache.org"
Sent: Fri, 22 October, 2010 0:32:04
Subject: RE: Polymorphic Index

From: Mark Harwood [markharw...@yahoo.co.uk]
> Good point, Toke. Forgot about that. Of course doubling the number
> of hash algos used to 4 increases the space massively.

Maybe your hashing-idea could work even with collisions?
Using your original two-hash suggestion, we're just about sure to get
collisions ...
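
A sketch of how such a collision-tolerant lookup could work; it reuses the
hypothetical HashedUid helpers shown with Mark's original proposal further
down this newest-first listing. Both hash terms must match, and any false
positives are discarded by checking the stored UID:

  import java.io.IOException;

  import org.apache.lucene.document.Document;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.BooleanClause.Occur;
  import org.apache.lucene.search.BooleanQuery;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.ScoreDoc;
  import org.apache.lucene.search.TermQuery;
  import org.apache.lucene.search.TopDocs;

  public final class HashedUidLookup {
    // Collisions can leave several candidates even when both hash terms
    // match, so each candidate is verified against the stored UID.
    public static int findByUid(IndexSearcher searcher, byte[] uid)
        throws IOException {
      BooleanQuery q = new BooleanQuery();
      q.add(new TermQuery(new Term("uid_h1", Integer.toString(
          (HashedUid.hash1(uid) & 0x7fffffff) % HashedUid.N))), Occur.MUST);
      q.add(new TermQuery(new Term("uid_h2", Integer.toString(
          (HashedUid.hash2(uid) & 0x7fffffff) % HashedUid.N))), Occur.MUST);
      TopDocs hits = searcher.search(q, 10); // collisions allow >1 hit
      String want = HashedUid.toHex(uid);
      for (ScoreDoc sd : hits.scoreDocs) {
        Document doc = searcher.doc(sd.doc);
        if (want.equals(doc.get("uid"))) {
          return sd.doc; // verified true match
        }
      }
      return -1; // UID not present
    }
  }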
From: Mark Harwood [markharw...@yahoo.co.uk]

Good point, Toke. Forgot about that. Of course doubling the number of hash
algos used to 4 increases the space massively.

On 21 Oct 2010, at 22:51, Toke Eskildsen wrote:
> Mark Harwood [markharw...@yahoo.co.uk]:
>> Given a large range of IDs (eg your 300 million) you could constrain
>> the number of unique terms using a double-hashing technique ...
From: Paul Elschot

How about splitting the 32 byte field into, for example, 16 subfields of 2
bytes each?
Then any direct query on that field needs to be transformed into a boolean
query requiring all 16 subfield terms.
Would that work?

Regards,
Paul Elschot
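
A sketch of what the query side of this could look like with the Lucene 3.x
API; the subfield names "uid0".."uid15" and the hex encoding are
illustrative. Each 2-byte subfield holds at most 2^16 = 65536 distinct
terms, which is what keeps the term dictionaries small:

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.BooleanClause.Occur;
  import org.apache.lucene.search.BooleanQuery;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.TermQuery;

  public final class UidSubfields {
    // Split the 32-byte UID into 16 two-byte chunks and require the
    // matching term in every subfield.
    public static Query uidQuery(byte[] uid) {
      BooleanQuery query = new BooleanQuery();
      for (int i = 0; i < 16; i++) {
        int chunk = ((uid[2 * i] & 0xFF) << 8) | (uid[2 * i + 1] & 0xFF);
        query.add(new TermQuery(
            new Term("uid" + i, String.format("%04x", chunk))), Occur.MUST);
      }
      return query;
    }
  }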
From: Mark Harwood [markharw...@yahoo.co.uk]

Perhaps another way of thinking about the problem:
Given a large range of IDs (eg your 300 million) you could constrain the number
of unique terms using a double-hashing technique, e.g.:
Pick a number "n" for the max number of unique terms you'll tolerate, e.g. 1
million, and store 2 terms for every primary key ...
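
A sketch of the indexing side under stated assumptions: hash1 and hash2
stand in for any two independent hash functions (an FNV-1a and a simple
multiplicative hash here, purely illustrative), n is the 1 million from the
example, and the raw UID is additionally stored so collisions can be
detected at query time:

  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;

  public final class HashedUid {
    static final int N = 1000000; // "n": max unique terms per hash field

    // Index two hashes of the UID (mod N) instead of the UID itself, so
    // the term dictionary stays bounded at roughly 2 * N entries. The raw
    // UID is stored, unindexed, for verifying candidate hits.
    public static void addHashedUid(Document doc, byte[] uid) {
      int h1 = (hash1(uid) & 0x7fffffff) % N;
      int h2 = (hash2(uid) & 0x7fffffff) % N;
      doc.add(new Field("uid_h1", Integer.toString(h1),
          Field.Store.NO, Field.Index.NOT_ANALYZED_NO_NORMS));
      doc.add(new Field("uid_h2", Integer.toString(h2),
          Field.Store.NO, Field.Index.NOT_ANALYZED_NO_NORMS));
      doc.add(new Field("uid", toHex(uid), Field.Store.YES, Field.Index.NO));
    }

    static int hash1(byte[] b) { // FNV-1a; any decent hash would do
      int h = 0x811c9dc5;
      for (byte x : b) { h ^= (x & 0xFF); h *= 0x01000193; }
      return h;
    }

    static int hash2(byte[] b) { // a second, independent hash
      int h = 0;
      for (byte x : b) h = 31 * h + (x & 0xFF);
      return h;
    }

    static String toHex(byte[] b) {
      StringBuilder sb = new StringBuilder(b.length * 2);
      for (byte x : b) sb.append(String.format("%02x", x));
      return sb.toString();
    }
  }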
From: eks dev
To: dev@lucene.apache.org
Sent: Thu, 21 October, 2010 21:44:34
Subject: Polymorphic Index

Hi All,
I am trying to figure out a way to implement the following use case with
lucene/solr.
In order to support simple incremental updates (master) I need to index and
store a UID field on a 300Mio collection. (My UID is a 32 byte sequence.) But I
do not need it indexed (only stored) during normal ...
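
For concreteness, a minimal sketch of indexing and storing such a UID field
with the Lucene 3.x API of the time and using it as the key for incremental
updates; the field name "uid", the hex encoding, and the helper are
illustrative only:

  import java.io.IOException;

  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.Term;

  public final class UidUpsert {
    // Index the UID as a single un-analyzed term (no norms needed for an
    // ID field) and also store it so it can be read back from results.
    public static void upsert(IndexWriter writer, String uidHex, Document doc)
        throws IOException {
      doc.add(new Field("uid", uidHex,
          Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
      // Atomically replaces any existing document with the same UID.
      writer.updateDocument(new Term("uid", uidHex), doc);
    }
  }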