Re: "Catalog" backend for document stored fields?

2006-10-23 Thread Doron Cohen
> I'm indexing logs from a transaction-based application. > ... > millions documents per month, the size of the indices is ~35 gigs per month > (that's the lower bound). I have no choice but to 'store' each field values > (as well as indexing/tokenizing them) because I'll need to retrieve them in

Re: experiences with lingpipe

2006-10-23 Thread Otis Gospodnetic
Hi Martin, - Original Message From: Martin Braun <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Monday, October 23, 2006 4:29:03 AM Subject: experiences with lingpipe hi all, does anybody have practical experience with LingPipe's spellchecker (http://www.alias-i.com/lingpipe

Re: number of term occurrences

2006-10-23 Thread Grant Ingersoll
You can also use Term Vectors, at the cost of extra storage. Search this list for Term Vectors for info on how to implement them. On Oct 23, 2006, at 5:50 AM, beatriz ramos wrote: Hello, I'm working with Lucene. I need to get the number of occurrences of the term in the document. I had seen the

Re: experiences with lingpipe

2006-10-23 Thread Breck Baldwin
Martin Braun wrote: hi all, does anybody have practical experience with LingPipe's spellchecker (http://www.alias-i.com/lingpipe/demos/tutorial/querySpellChecker/read-me.html)? I wrote the demo and I am the company 'system tuner' so I can perhaps help out here. With Lucene's spellcheck

RE: "Catalog" backend for document stored fields?

2006-10-23 Thread Robichaud, Jean-Philippe
That may be a good idea. Is it possible to do this efficiently, like inside of the collect() call of a HitCollector? Right now, that's how my reporting tool works:
Searcher searcher = new MultiSearcher(directories[] ...);
HitCollector myHC = new MyHitCollector(searcher, ...);
searcher.search(myQ

Re: boost at query time or index time

2006-10-23 Thread Doron Cohen
> thanks, what I was looking for was this: if I do not need to boost docs, what will be the difference in a) query results, b) indexing time, and c) the time to run a query and collect results?
There is also some precision loss with index time boosting. Also see the "Score Boosting"
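The precision loss mentioned above can be illustrated with a small sketch. The message does not say where the loss comes from; the assumption here is that an index-time boost ends up in a compactly encoded per-field norm (Lucene releases of this period store that norm in a single byte). The class and method names below are hypothetical, and the quantizer is a simplified stand-in that keeps only three mantissa bits, not Lucene's actual encoding.

```java
// Hypothetical illustration, NOT Lucene's encoder: shows how keeping only a
// few mantissa bits makes nearby boost values collapse to the same number.
final class BoostPrecisionSketch {

    /** Keeps the sign bit, the 8 exponent bits and the top 3 mantissa bits of a float. */
    static float quantize(float value) {
        int bits = Float.floatToRawIntBits(value);
        // 1 sign + 8 exponent + 3 mantissa = 12 leading bits survive.
        return Float.intBitsToFloat(bits & 0xFFF00000);
    }

    public static void main(String[] args) {
        float[] boosts = {1.0f, 1.1f, 1.25f, 1.3f, 2.0f};
        for (float boost : boosts) {
            System.out.println(boost + " is stored as " + quantize(boost));
        }
    }
}
```

Under this assumption, boosts of 1.25 and 1.3 become indistinguishable after indexing, whereas a boost applied at query time is used at full float precision.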

Re: boost at query time or index time

2006-10-23 Thread Chris Lu
For your cases a, b, and c, there won't be much difference. Boost at indexing time can be more flexible. You can use one field's value to boost the document's ranking. For example, you could boost your products' ranking by their prices, or rating scores. -- Chris Lu - Instant F

Re: boost at query time or index time

2006-10-23 Thread Rupinder Singh Mazara
thanks, what I was looking for was this: if I do not need to boost docs, what will be the difference in a) query results, b) indexing time, and c) the time to run a query and collect results? Daniel Naber wrote: On Monday 23 October 2006 19:39, Rupinder Singh Mazara wrote: wher

Re: boost at query time or index time

2006-10-23 Thread Daniel Naber
On Monday 23 October 2006 19:39, Rupinder Singh Mazara wrote:
> where can i get info on how boosting terms at index time compares to boosting terms at query time?
At index time you can boost fields and/or documents. Only at query time can you boost terms. Regards Daniel -- http://www.da

Re: number of term occurrences

2006-10-23 Thread Erick Erickson
Yeah, but I haven't used the termfreq thingy enough to think of it automatically ... Besides, I'm learning that if I put a foolish answer out there, someone'll correct me. Thanks Erick On 10/23/06, Paul Elschot <[EMAIL PROTECTED]> wrote: On Monday 23 October 2006 21:16, Erick Erickson wrote: >

Re: number of term occurrences

2006-10-23 Thread Paul Elschot
On Monday 23 October 2006 21:16, Erick Erickson wrote:
> Use TermDocs.seek(Term) to get to the term. That'll position your TermDocs variable at a list, ordered by document ID, of the occurrences of a term. Then TermDocs.skipTo(doc ID) will get you to the list of terms for that document (you hav

Re: number of term occurrences

2006-10-23 Thread Erick Erickson
Use TermDocs.seek(Term) to get to the term. That'll position your TermDocs variable at a list, ordered by document ID, of the occurrences of a term. Then TermDocs.skipTo(doc ID) will get you to the list of terms for that document (you have to know what Lucene DocId you care about here). Now TermDo
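The seek-then-skip idea described above can be sketched without the Lucene library. The toy class below is hypothetical: it models the postings of a single term as parallel arrays of document IDs and frequencies, and its skipTo(), doc() and freq() methods only imitate the shape of the calls named in the message. One detail the sketch makes explicit is that a skip lands on the first document ID at or after the target, so the caller has to check that it actually arrived at the requested document.

```java
import java.util.Arrays;

// Hypothetical stand-in for the postings of ONE term; not the Lucene TermDocs class.
final class PostingsSketch {
    private final int[] docIds; // ascending document IDs that contain the term
    private final int[] freqs;  // freqs[i] = occurrences of the term in docIds[i]
    private int pos = -1;       // forward-only cursor; -1 means not positioned yet

    PostingsSketch(int[] docIds, int[] freqs) {
        this.docIds = docIds;
        this.freqs = freqs;
    }

    /** Moves forward to the first entry whose doc ID is >= target; false if there is none. */
    boolean skipTo(int target) {
        int from = Math.max(pos, 0);
        int i = Arrays.binarySearch(docIds, from, docIds.length, target);
        if (i < 0) {
            i = -i - 1; // not found: binarySearch encodes the insertion point
        }
        pos = i;
        return pos < docIds.length;
    }

    int doc() { return docIds[pos]; }

    int freq() { return freqs[pos]; }

    /** Occurrences of the term in docId, or 0 when that document does not contain it. */
    static int termFreq(PostingsSketch postings, int docId) {
        if (postings.skipTo(docId) && postings.doc() == docId) {
            return postings.freq();
        }
        return 0;
    }
}
```

Because the cursor only moves forward, lookups for several documents should be made in ascending document ID order, for example termFreq(postings, 5) followed by termFreq(postings, 9).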

Re: wildcard and span queries

2006-10-23 Thread Erick Erickson
I thought I'd update folks on the continuing saga. Many thanks to all who've contributed to my education. Here's our current resolution: it turns out that the PM will cope with restricting wildcards two ways. 1> there must be at least 3 non-wildcard characters 2> wildcards cannot appear in the fi
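The first restriction above (at least 3 non-wildcard characters) is easy to check before a term is turned into a query. The sketch below is only an illustration: the class and method names are hypothetical, it assumes that `*` and `?` are the only wildcard characters, and it ignores escaping. The second restriction is cut off in this listing, so it is not implemented here.

```java
// Hypothetical pre-check for rule 1 only; rule 2 is truncated in the listing.
final class WildcardRuleSketch {
    static final int MIN_LITERAL_CHARS = 3;

    /** True when the term has at least MIN_LITERAL_CHARS characters that are not '*' or '?'. */
    static boolean hasEnoughLiteralChars(String term) {
        int literals = 0;
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c != '*' && c != '?') {
                literals++;
            }
        }
        return literals >= MIN_LITERAL_CHARS;
    }
}
```

A caller could reject a term such as `ab*` up front and only build the wildcard query for terms such as `abc*`.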

boost at query time or index time

2006-10-23 Thread Rupinder Singh Mazara
hi all where can i get info on how boosting terms at index time compares to boosting terms at query time ? case 1 : if i have an index with all terms with the default boost value and i apply a boost value to terms at query time versus case 2: i boost individual terms at index time with a boost

experiences with lingpipe

2006-10-23 Thread Martin Braun
hi all, does anybody have practical experience with LingPipe's spellchecker (http://www.alias-i.com/lingpipe/demos/tutorial/querySpellChecker/read-me.html)? With Lucene's spellcheck contribution I am not really satisfied because the index has some (many?) misspelled words, so the did you mean clas

Re: Highlighting "really" found terms

2006-10-23 Thread Harini Raghavan
I have a requirement to highlight phrases. I came across a reference to this alternate highlighter implementation. But I am unable to see the source files for the same. Can someone please point me to it? Thanks, Harini mark harwood wrote: See here for a thread reviewing the challenges and po

Re: Don't use the same index for updating and searching

2006-10-23 Thread Hes Siemelink
No, I wasn't using NFS. It was difficult to diagnose, since we had no access to the file system of the production machine. Since it occurred on production only (live on a busy web site), we decided to circumvent the problem by making an alternative implementation that would not use mixed r

number of term occurrences

2006-10-23 Thread beatriz ramos
Hello, I'm working with Lucene. I need to get the number of occurrences of a term in a document. I have looked through the documentation and I can't find anything. Do you have any idea? Thanks.

Lucene consultants

2006-10-23 Thread Erik Hatcher
I am e-mailed almost daily about tackling Lucene consulting gigs, and I simply do not make the time to even give the time of day, what with kids, day job, daydreaming about eventually getting LIA2 done, and did I mention kids? Kids rock! I typically refer folks to Otis, and he likely say

overriding addClause()?

2006-10-23 Thread Bill Janssen
I'd like to suggest a minor change in the QueryParser.jj. I thought I'd describe it here and get some feedback before posting a patch. The issue is that I can't get my hands on some clauses (typically PhraseQuery instances) in my subclass of MultiFieldQueryParser, which I'd like to do to implemen

RE: Poor performance "race condition" in FieldSortedHitQueue

2006-10-23 Thread Oliver Hutchison
Kalpesh, Are you using sorting? If you are, then the patch attached to LUCENE-651 may help. It fixes a race condition that exists in the initialization of the FieldCache (which is used to accelerate sorting). Cheers, Ollie > -Original Message- > From: kalpesh patel [mailto:[EMAIL PROTEC
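For readers unfamiliar with this class of problem, the sketch below shows one generic way to make lazy cache initialization safe under concurrent access. It is not the LUCENE-651 patch and does not use Lucene's FieldCache; the class name, the loader function and the load counter are all hypothetical. It assumes the concern is several threads asking for the same not-yet-built entry at once, and relies on ConcurrentHashMap.computeIfAbsent, which applies the loader at most once per key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical lazily populated cache; not Lucene's FieldCache implementation.
final class LazyCacheSketch<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;                     // expensive computation per key
    private final AtomicInteger loads = new AtomicInteger(); // how often the loader actually ran

    LazyCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, computing it once even when many threads ask at the same time. */
    V get(K key) {
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    int loadCount() {
        return loads.get();
    }
}
```

With this shape, threads that request the same key while it is being built wait for the single computation instead of each repeating the expensive work.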