Lucene Search Engine Query

2008-04-01 Thread shrish garg
Hi, I am using the Lucene search engine on my website for document search. Although it works fine and finds the keywords in the documents properly, I am facing a problem during search. When I search for some keywords whose occurrence in the document is very low and

Re: Lucene Search Engine Query

2008-04-01 Thread Michael McCandless
This could be the maxFieldLength default in IndexWriter. By default, IndexWriter only indexes the first 10,000 tokens of a document. Mike. shrish garg wrote: Hi, I am using the Lucene search engine on my website for document search. Although it works fine and finds the
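To illustrate why a token limit would hide rare keywords, here is a minimal, library-free sketch. It is not Lucene's implementation: the class `FieldLengthLimitDemo` and the method `indexedTerms` are hypothetical names, and tokenization is simplified to whitespace splitting. A term that first appears after the limit never enters the set of indexed terms, so a search for it cannot match. In Lucene 2.3 itself, the limit is, to the best of my knowledge, adjusted with `IndexWriter.setMaxFieldLength(int)`.

```java
import java.util.HashSet;
import java.util.Set;

public class FieldLengthLimitDemo {
    /**
     * Hypothetical helper: returns the distinct lower-cased terms found among
     * the first maxTokens whitespace-separated tokens of the text.
     * Tokens past the limit are ignored, mimicking a per-field token cap.
     */
    public static Set<String> indexedTerms(String text, int maxTokens) {
        Set<String> terms = new HashSet<>();
        int count = 0;
        for (String token : text.trim().split("\\s+")) {
            if (count++ >= maxTokens) {
                break; // everything after the cap is dropped
            }
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase());
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Build a document with 10,000 filler tokens followed by one rare word.
        StringBuilder doc = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            doc.append("filler ");
        }
        doc.append("rareword");

        Set<String> limited = indexedTerms(doc.toString(), 10000);
        Set<String> unlimited = indexedTerms(doc.toString(), Integer.MAX_VALUE);

        System.out.println(limited.contains("rareword"));   // false
        System.out.println(unlimited.contains("rareword")); // true
    }
}
```

If the missing keywords only occur late in long documents, raising the limit and re-indexing is the first thing to try.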

Re: java.lang.IllegalArgumentException: Segment is too large

2008-04-01 Thread Michael McCandless
OK, I opened LUCENE-1254 and committed the fix to trunk and (upcoming) 2.3.2. Mike. Yonik Seeley wrote: On Mon, Mar 31, 2008 at 5:19 AM, Michael McCandless [EMAIL PROTECTED] wrote: I think we should remove those checks and allow addIndexesNoOptimize to import an index even if it has

RE: Problems about using Lucene to generate tag cloud..

2008-04-01 Thread Dominique Béjean
Maybe you can index the set of documents in a temporary index. This index needs only one field (tag). Then you can browse the terms collection of the index and get each term/frequency pair: IndexReader reader = IndexReader.open(temp_index); TermEnum terms = reader.terms();
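The idea of collecting term/frequency pairs for a tag cloud can be sketched without Lucene as follows. This is only an illustration under stated assumptions: `TagCloudSketch`, `documentFrequencies`, and `top` are hypothetical names, each document is represented as a plain list of tags, and "frequency" is taken to mean the number of documents containing the tag. In the approach described above, the temporary index and its term enumeration would supply these counts instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class TagCloudSketch {
    /** Counts, for each tag, the number of documents that contain it. */
    public static Map<String, Integer> documentFrequencies(List<List<String>> docs) {
        Map<String, Integer> freq = new HashMap<>();
        for (List<String> tags : docs) {
            // De-duplicate within a document so each document counts once per tag.
            for (String tag : new HashSet<>(tags)) {
                freq.merge(tag, 1, Integer::sum);
            }
        }
        return freq;
    }

    /** Returns the n most frequent tags, ties broken alphabetically. */
    public static List<Map.Entry<String, Integer>> top(Map<String, Integer> freq, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freq.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("lucene", "search"),
                Arrays.asList("lucene", "java"),
                Arrays.asList("java", "lucene", "lucene"));

        // Print the two most common tags with their document counts.
        for (Map.Entry<String, Integer> e : top(documentFrequencies(docs), 2)) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
        // prints:
        // lucene=3
        // java=2
    }
}
```

The counts can then be mapped to font sizes; how that scaling is done is outside what the thread describes.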

stemming in Lucene

2008-04-01 Thread Wojtek H
Hi all, Snowball stemmers are part of Lucene, but only for a few languages. We have documents in various languages and so need stemmers for many languages (in particular Polish). One of the ideas is to use ispell dictionaries. There are ispell dicts for many languages and so this solution is good

Re: Problems about using Lucene to generate tag cloud..

2008-04-01 Thread wuqi
So build an index for the dynamically generated document set, and then try to find the frequency of each term in this index... Not sure it's fast enough, but it's worth a try... Thank you Dominique! - Original Message - From: Dominique Béjean [EMAIL PROTECTED] To:

RE: Problems about using Lucene to generate tag cloud..

2008-04-01 Thread Dominique Béjean
On www.crossfeeds.com, I use this method to update hourly a tag cloud based on the titles of 20,000 RSS articles (RSS published during the last 24 hours). It takes 1 minute. -Original Message- From: wuqi [mailto:[EMAIL PROTECTED] Sent: Tuesday, 1 April 2008 14:10 To:

Re: setPositionIncrement questions

2008-04-01 Thread Erick Erickson
See Chris's reply, but for this: "So I will not want to return higher PositionIncrement for each instance of a field, just those which I'm interested in (title/headers)" I think you want PerFieldAnalyzerWrapper. Erick. On Mon, Mar 31, 2008 at 10:56 AM, Itamar Syn-Hershko [EMAIL PROTECTED] wrote:

Re: Problems about using Lucene to generate tag cloud..

2008-04-01 Thread wuqi
I registered just now; an interesting website. It seems crossfeeds generates its tag cloud offline, hourly? But I have a stricter time requirement. Users submit a query on my website, and they may get tens of thousands of search results. I need to generate a tag cloud for all these

Controlling index file name

2008-04-01 Thread 021336
We use Lucene to create simple data stores that we deploy with our application. Our application also supports auto-updating and we refresh these data stores monthly. Since Lucene computes the names for the index we end up deploying new files each time while leaving the old files to continue

intuitive explanation for what seems like odd result?

2008-04-01 Thread Donna L Gresh
I have two slightly different queries, and am filtering to return only a single unique document. The scores are very slightly different, but in the opposite way from what my (naive) reasoning would have expected. In the first case the query is text:j2ee^2.0, text:soa^2.0, text:webservic,

Re: stemming in Lucene

2008-04-01 Thread Karl Wettin
Wojtek H wrote: "Snowball stemmers are part of Lucene, but for few languages only." org.apache.lucene.analysis contains a few more stemmers. "We have documents in various languages and so need stemmers for many languages (in particular polish)." Have you seen Stempel?

Re: intuitive explanation for what seems like odd result?

2008-04-01 Thread Karl Wettin
Donna L Gresh wrote: "I have two slightly different queries," Hi Donna, I can't help you, but perhaps I would understand everything better if you also pasted in the explanations. karl

Re: intuitive explanation for what seems like odd result?

2008-04-01 Thread Donna L Gresh
Sure; here are the two explanations (below). Your question made me go look at the explanation more carefully again and (no) surprise, I discovered that I misspoke (miswrote) earlier; the two found terms are j2ee and soa, which then makes my concern much less of one, since in both cases, the

Re: Problems about using Lucene to generate tag cloud..

2008-04-01 Thread Daniel Noll
On Tuesday 01 April 2008 18:51:55 Dominique Béjean wrote: IndexReader reader = IndexReader.open(temp_index); TermEnum terms = reader.terms(); while (terms.next()) { String field = terms.term().field(); Gotcha: after calling terms() it's already pointing at

Lucene Compression

2008-04-01 Thread Sebastin
Hi All, is there any way to store strings of the following type in compressed form in a Lucene index? String str = II0264.D05|00022745|ABCDE|03/01/2008 00:23:12|00035| 9840836588| 129382152520| 04F4243B600408|04F4243B600408| |11919898456123|354943011025810L| CPTBS2I|