Re: Polymorphic Index

2010-10-22 Thread eks dev
> To: dev@lucene.apache.org > Sent: Fri, 22 October, 2010 15:26:31 > Subject: Re: Polymorphic Index > On Oct 21, 2010, at 3:44 PM, eks dev wrote: > Hi All, I am trying to figure out a way to implement the following use case with lucene/solr. …

Re: Polymorphic Index

2010-10-22 Thread eks dev
> From: Toke Eskildsen > To: "dev@lucene.apache.org" > Sent: Fri, 22 October, 2010 14:27:45 > Subject: Re: Polymorphic Index > On Fri, 2010-10-22 at 11:23 +0200, eks dev wrote: > Both of these solutions are just a better way to do it wrong :) The real solution is definitely somewhere around ParallelReader usage. …

Re: Polymorphic Index

2010-10-22 Thread Grant Ingersoll
On Oct 21, 2010, at 3:44 PM, eks dev wrote: > Hi All, I am trying to figure out a way to implement the following use case with lucene/solr. > In order to support simple incremental updates (master) I need to index and store a UID field on a 300Mio collection. (My UID is a 32 byte sequence.) …

Re: Polymorphic Index

2010-10-22 Thread Toke Eskildsen
On Fri, 2010-10-22 at 11:23 +0200, eks dev wrote: > Both of these solutions are just a better way to do it wrong :) The real solution is definitely somewhere around ParallelReader usage. The problem with ParallelReader is updates of documents: IndexWriter takes terms and queries for deletion, …
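The alignment problem Toke is pointing at can be seen in a toy model (plain Python, no Lucene; all names below are made up for illustration): parallel indexes are correlated only by document order, so a delete applied to one side shifts every later docID on that side.

```python
# Toy model of why per-document deletes break a ParallelReader-style setup:
# the two parallel "indexes" line up purely by docID (position), so removing
# a document from only one side misaligns every document after it.
main_index = ["docA", "docB", "docC"]   # e.g. the full-text fields
uid_index  = ["uidA", "uidB", "uidC"]   # e.g. the stored 32-byte UID field

# Delete the middle document by term from the main index only (what a
# delete-by-term would do to one of the two parallel indexes):
main_index.remove("docB")

# docID 1 now points at different documents in the two halves:
misaligned = [(m, u) for m, u in zip(main_index, uid_index) if m[3:] != u[3:]]
```

Running this leaves `misaligned` non-empty, which is exactly the state a ParallelReader cannot tolerate: both halves must be rebuilt in lockstep.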

Re: Polymorphic Index

2010-10-22 Thread eks dev
> To: "dev@lucene.apache.org" > Sent: Fri, 22 October, 2010 0:32:04 > Subject: RE: Polymorphic Index > From: Mark Harwood [markharw...@yahoo.co.uk] > Good point, Toke. Forgot about that. Of course doubling the number of hash algos used to 4 increases the space massively. …

RE: Polymorphic Index

2010-10-21 Thread Toke Eskildsen
From: Mark Harwood [markharw...@yahoo.co.uk] > Good point, Toke. Forgot about that. Of course doubling the number of hash algos used to 4 increases the space massively. Maybe your hashing idea could work even with collisions? Using your original two-hash suggestion, we're just about sure to get …
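A quick birthday-bound estimate (assuming Mark's n = 1 million and the thread's 300 million UIDs; the arithmetic is mine, not from the original mail) shows why collisions are indeed "just about sure" even with two hashes:

```python
# With two independent hashes each taken mod n, a pair of UIDs collides
# only if BOTH hashes agree, i.e. the combined key space is n * n.
n = 1_000_000           # buckets per hash (Mark's example "n")
m = 300_000_000         # number of UIDs in the collection
combined_space = n * n  # each UID maps to an (h1, h2) pair

# Expected number of colliding UID pairs ~ C(m, 2) / combined_space
expected_collisions = m * (m - 1) / 2 / combined_space
# ~45,000 colliding pairs: collisions are essentially certain, so a
# lookup must still verify the stored UID after matching both terms.
```

This is why the scheme only works if every hash-based match is confirmed against the stored UID value.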

Re: Polymorphic Index

2010-10-21 Thread Mark Harwood
Good point, Toke. Forgot about that. Of course doubling the number of hash algos used to 4 increases the space massively. On 21 Oct 2010, at 22:51, Toke Eskildsen wrote: > Mark Harwood [markharw...@yahoo.co.uk]: >> Given a large range of IDs (eg your 300 million) you could constrain >> the number of unique terms using a double-hashing technique …

RE: Polymorphic Index

2010-10-21 Thread Toke Eskildsen
Mark Harwood [markharw...@yahoo.co.uk]: > Given a large range of IDs (eg your 300 million) you could constrain > the number of unique terms using a double-hashing technique e.g. > Pick a number "n" for the max number of unique terms you'll tolerate > e.g. 1 million and store 2 terms for every …

Re: Polymorphic Index

2010-10-21 Thread Paul Elschot
How about splitting the 32-byte field into, for example, 16 subfields of 2 bytes each? Then any direct query on that field needs to be transformed into a boolean query requiring all 16 subfield terms. Would that work? Regards, Paul Elschot. On Thursday, 21 October 2010 21:44:34, eks dev wrote: > …
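Paul's subfield idea can be sketched in a few lines (plain Python; the field names like `uid_00` are hypothetical, and the actual Lucene BooleanQuery construction is not shown):

```python
# Sketch of splitting a 32-byte UID into 16 two-byte subfields, each
# indexed as its own term. An exact-match query then requires all 16
# (field, term) pairs as MUST clauses of one boolean query.

def uid_subfield_terms(uid: bytes) -> list[tuple[str, str]]:
    """Return (field, term) pairs for the 16 two-byte chunks of a UID."""
    assert len(uid) == 32
    return [(f"uid_{i:02d}", uid[2 * i : 2 * i + 2].hex()) for i in range(16)]

terms = uid_subfield_terms(bytes(range(32)))
# Each subfield has at most 65,536 distinct values, so its term
# dictionary stays tiny even across 300 million documents.
```

The trade-off is that every UID lookup becomes a 16-clause conjunction instead of a single term query.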

Re: Polymorphic Index

2010-10-21 Thread Mark Harwood
Perhaps another way of thinking about the problem: Given a large range of IDs (eg your 300 million) you could constrain the number of unique terms using a double-hashing technique, e.g. pick a number "n" for the max number of unique terms you'll tolerate, e.g. 1 million, and store 2 terms for every …
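Mark's double-hashing scheme can be sketched as follows (a minimal stand-in, not Lucene: MD5/SHA-1 as the two hashes, a dict as the inverted index, and the `ha_`/`hb_` term prefixes are all my assumptions, not from the thread):

```python
# Each UID is indexed as two terms, hashA(uid) mod N and hashB(uid) mod N.
# A lookup ANDs the postings of both terms, then verifies the stored UID,
# since different UIDs can share both hash values.
import hashlib

N = 1_000_000  # max unique terms tolerated per hash field

def hash_terms(uid: bytes) -> tuple[str, str]:
    a = int.from_bytes(hashlib.md5(uid).digest()[:8], "big") % N
    b = int.from_bytes(hashlib.sha1(uid).digest()[:8], "big") % N
    return (f"ha_{a}", f"hb_{b}")

# Inverted-index stand-in: term -> set of docIDs; docs holds stored UIDs.
index: dict[str, set[int]] = {}
docs = {0: b"u" * 32, 1: b"v" * 32}
for doc_id, uid in docs.items():
    for t in hash_terms(uid):
        index.setdefault(t, set()).add(doc_id)

def lookup(uid: bytes) -> set[int]:
    ta, tb = hash_terms(uid)
    candidates = index.get(ta, set()) & index.get(tb, set())
    return {d for d in candidates if docs[d] == uid}  # resolve collisions
```

The key property is that each hash field holds at most N distinct terms, regardless of how many documents share them.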

Polymorphic Index

2010-10-21 Thread eks dev
Hi All, I am trying to figure out a way to implement the following use case with lucene/solr. In order to support simple incremental updates (master) I need to index and store a UID field on a 300Mio collection. (My UID is a 32-byte sequence.) But I do not need it indexed (only stored) during norm…
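For a rough sense of the scale behind the question (raw bytes only; a real Lucene term dictionary shares prefixes and compresses, so this is only an upper-bound sketch of my own, not a figure from the post):

```python
# Back-of-the-envelope size of the UID term data: 300 million documents,
# each with a unique 32-byte UID that would be indexed as its own term.
docs = 300_000_000
uid_bytes = 32

raw_term_bytes = docs * uid_bytes  # every term is unique, nothing shared
raw_term_gb = raw_term_bytes / 10**9  # 9.6 GB of raw term bytes
```

Roughly 9.6 GB of unique term bytes is what motivates the hashing and subfield-splitting ideas later in the thread.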