Re: How best to handle a reasonable amount to data (25TB+)

2012-02-05 Thread ppp c
This sounds less like a Lucene issue than a matter of your app's logic. If you're worried about having too many docs in one index, you can split them across multiple indexes, search across all of them, and merge the results. On Mon, Feb 6, 2012 at 10:50 AM, Peter Miller < peter.mil...@objectconsulting.com.au> wrote: > Hi, > > I have
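
The "multiple indexes, search, then merge" idea from the reply above can be sketched roughly as follows. This is a plain-Java illustration, not Lucene code: `ShardMerge`, `Hit`, and `mergeTopK` are hypothetical names, and each inner list stands in for the hits one index returned for the same query.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ShardMerge {
    /** One search hit (hypothetical type): source index, doc id within it, score. */
    public static final class Hit {
        public final int shard;
        public final int doc;
        public final float score;

        public Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Merge the hit lists of several indexes into one global top-k, best score first. */
    public static List<Hit> mergeTopK(List<List<Hit>> perShard, int k) {
        Comparator<Hit> byScore = Comparator.comparingDouble(h -> h.score);
        // Min-heap on score: when the heap grows past k, the weakest hit is dropped.
        PriorityQueue<Hit> heap = new PriorityQueue<>(byScore);
        for (List<Hit> hits : perShard) {
            for (Hit h : hits) {
                heap.offer(h);
                if (heap.size() > k) {
                    heap.poll();
                }
            }
        }
        // Copy out the survivors and order them from highest to lowest score.
        List<Hit> merged = new ArrayList<>(heap);
        merged.sort(byScore.reversed());
        return merged;
    }
}
```

One caveat with this kind of merge: scores computed against separate indexes are based on each index's own term statistics, so they are only roughly comparable across indexes.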

Re: Modify Field.Index.NO to Field.Index.NOT_ANALYZED

2011-11-10 Thread ppp c
Terrible. You have made a big mistake: you have in fact made the primary key unsearchable. There is no other method, since deleteDocuments and updateDocument both need the Term to be searchable. The only way is to traverse all the docs, find the one with the matching field, and delete it. On Fri,

Re: reusing the term-frequency count while indexing

2011-10-23 Thread ppp c
Of course it can be reused. But from my point of view it's pointless, since the analysis process still has to be performed to collect things such as prox, offsets, synonyms, payloads, and so on. On Sun, Oct 23, 2011 at 11:22 PM, prasenjit mukherjee wrote: > I already have the term-frequency-count for all the t

Re: Custom Similarity

2011-10-08 Thread ppp c
That's what PhraseQuery does. Try PhraseQuery to match the overlap, I think. On Sat, Oct 8, 2011 at 3:37 PM, Joel Halbert wrote: > Hi, > > Does anyone have a modified scoring (Similarity) function they would > care to share? > > I'm searching web page documents and find the default Similarity seems