Real world app advice

2006-09-15 Thread Luis Rodrigo Aguado
Hi all, I have used Lucene so far for solving toy examples and making tutorial examples, but now I am facing my first real-world high-quality application. I need to manage around 50.000 docs, ranging from a few lines to a couple of pages. I also need to handle lemmas and synonyms, and h

RE: apachecon

2006-09-15 Thread Steven Parkes
I stopped procrastinating on this today. I signed up for a BOF slot at 8 on Thursday. Hopefully not against other stuff of interest. I've not done this before, but the BOF slots were filling. From my perspective, it'd be great to have people from any of the subprojects. Plenty of cross fertiliz

Re: apachecon

2006-09-15 Thread Yonik Seeley
On 8/23/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: : I was wondering if there have been any other self/semi-organized things : around Lucene in the past, like a BOF? This will be my first ApacheCon, so i can't speak to what's happened in the past -- but I'm certainly up for putting some face

Re[2]: ParallelMultiSearcher and docFreq

2006-09-15 Thread Yura Smolsky
Hello, Ronald. What I have found is that nothing except createWeight uses that docFreqs(Term[]) method... Maybe I need to parallelize it... But I don't understand something. When is MultiSearcher.createWeight() called? Only this method uses docFreqs, and this method creates HashMap of d

RE: ParallelMultiSearcher and docFreq

2006-09-15 Thread Haines, Ronald C. (LNG-DAY)
I understand...because I've experienced it. I think the answer is to 'parallelize' the docFreq process...and/or try to make use of docFreqs(Term[]). By passing an array of Terms, you can avoid the 'call per Term' per remote and just make a single docFreqs call per remote. You might have to ex
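The saving described here is in round trips: with T terms and R remotes, a call per term costs T x R trips, while one batched call per remote costs R. A minimal sketch of that arithmetic using only the JDK follows. FakeRemote and its roundTrips counter are hypothetical stand-ins for illustration and are not Lucene's Searchable or RemoteSearchable API.

```java
import java.util.List;
import java.util.Map;

final class BatchedDocFreqSketch {

    /** Hypothetical stand-in for a remote index; counts simulated network calls. */
    static final class FakeRemote {
        private final Map<String, Integer> freqs;
        int roundTrips = 0;

        FakeRemote(Map<String, Integer> freqs) {
            this.freqs = freqs;
        }

        /** One simulated round trip for a single term. */
        int docFreq(String term) {
            roundTrips++;
            return freqs.getOrDefault(term, 0);
        }

        /** One simulated round trip for the whole array of terms. */
        int[] docFreqs(String[] terms) {
            roundTrips++;
            int[] result = new int[terms.length];
            for (int i = 0; i < terms.length; i++) {
                result[i] = freqs.getOrDefault(terms[i], 0);
            }
            return result;
        }
    }

    /** Sums frequencies across remotes with one call per term per remote. */
    static int[] perTerm(List<FakeRemote> remotes, String[] terms) {
        int[] total = new int[terms.length];
        for (FakeRemote remote : remotes) {
            for (int i = 0; i < terms.length; i++) {
                total[i] += remote.docFreq(terms[i]);
            }
        }
        return total;
    }

    /** Sums frequencies across remotes with a single batched call per remote. */
    static int[] batched(List<FakeRemote> remotes, String[] terms) {
        int[] total = new int[terms.length];
        for (FakeRemote remote : remotes) {
            int[] partial = remote.docFreqs(terms);
            for (int i = 0; i < terms.length; i++) {
                total[i] += partial[i];
            }
        }
        return total;
    }
}
```

Both methods return the same totals; only the number of simulated round trips differs, which is the cost that grows when a prefix query expands to many terms.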

Re: Using wildcard at the start of teh token

2006-09-15 Thread Erick Erickson
I suspect (and practically guarantee) that if you have a large index (actually, not that large), you'll find yourself dealing with TooManyClauses exceptions. Look at the thread in this list titled "I just don't get wildcards at all" for a discussion of wildcards and applicable strategies. "The guys" explained a lot

Merging "orphaned" segments into a composite index

2006-09-15 Thread Rob Staveley (Tom)
I have had some badly behaved Lucene indexing software crash on me several times and have been left with an index directory with lots of non-composite files in it, when all I ought to be getting is the compound .cfs files plus deletable and segments. Re-indexing everything doesn't bear think

RE: Using wildcard at the start of teh token

2006-09-15 Thread Lee_Gary
I believe Lucene's QueryParser doesn't allow you to specify a leading wildcard. However, the WildcardQuery class does allow leading wildcard queries, such as "*technology". This is probably the easiest way to get around this. You do have other options that can specify a wildcard search, such as

Re: AW: Lucene Suggest ?

2006-09-15 Thread karl wettin
On Fri, 2006-09-15 at 15:31 +0200, Mark Müller wrote: > I guess terms will only be taken into the corpus when the search found > results at least once for that term (and removed if no more results were > found). > > Persisting the corpus has to be done, but should be no problem. I use ObjectIn&Out

Using wildcard at the start of teh token

2006-09-15 Thread Supriya Kumar Shyamal
Hello All, I have a question: how do I use a wildcard at the start of the query string? For example, I want to search on title with the query value "*technology". When I try to create a Lucene query using QueryParser, it throws the exception: Lexical error at line 1, column 1. Enco

Re: Lucene Suggest ?

2006-09-15 Thread Bill Taylor
Depending on the size of your index, you might want to put it in the downloaded page. I have a small index of maybe 1,500 words so I have the word list in the page. This is simpler than AJAX, but will not work for big indexes, of course. On Sep 15, 2006, at 8:02 AM, Mark Müller wrote: Hi a
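With a word list this small, suggesting completions amounts to a prefix lookup over a sorted in-memory set. The sketch below shows that lookup in Java with JDK classes only; in the setup described above the list would live in the page's own script instead. The class name, method name, and the lower-casing policy are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

final class PrefixSuggestSketch {
    // Sorted set, so all words sharing a prefix sit next to each other.
    private final TreeSet<String> words = new TreeSet<>();

    PrefixSuggestSketch(Collection<String> wordList) {
        for (String word : wordList) {
            words.add(word.toLowerCase(Locale.ROOT));
        }
    }

    /** Returns up to max words that start with what the user has typed so far. */
    List<String> suggest(String typed, int max) {
        List<String> matches = new ArrayList<>();
        String prefix = typed.toLowerCase(Locale.ROOT);
        if (prefix.isEmpty() || max <= 0) {
            return matches;
        }
        // Start at the first word >= prefix and stop at the first non-match.
        for (String word : words.tailSet(prefix, true)) {
            if (!word.startsWith(prefix) || matches.size() >= max) {
                break;
            }
            matches.add(word);
        }
        return matches;
    }
}
```

For a list of about 1,500 words this keeps each keystroke to a tree lookup plus a short scan; a larger vocabulary is where a server-side index becomes the more practical choice.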

How to get field name when a term hit documents

2006-09-15 Thread Mukesh Bhardwaj
Hi, I'm new to Lucene and currently I'm searching across all the fields. I only specify the term I want to search for, so I would like to know how to get the name of the field that matched this term in each document hit by the searcher. Please suggest a solution for this. Thanks in ad

AW: Lucene Suggest ?

2006-09-15 Thread Mark Müller
Thx for the pointer to your code. It's a smart approach, even if it is not related to Lucene only. I guess terms will only be taken into the corpus when the search found results at least once for that term (and removed if no more results were found). Persisting the corpus has to be done, but should be no

Re: Big Ducument Indexing Limit?

2006-09-15 Thread Erick Erickson
First, you really must understand analyzers and what they do. If you haven't seen the book Lucene in Action, I highly recommend it. Second, get a copy of Luke (google luke and lucene). It is a graphical tool that lets you examine an index and fire queries at it. It'll show you exactly what was ind

Re: ParallelMultiSearcher and docFreq

2006-09-15 Thread Yura Smolsky
Hello, Yura. Does anyone understand my email? Maybe my English is too bad... Thanks. YS> Here is the situation. I have ParallelMultiSearcher object YS> initialized with two or more RemoteSearchables. YS> I run PrefixQuery search on some keyword field, say "link". When I run YS> search starti

Re: Lucene Suggest ?

2006-09-15 Thread Ioan Cocan
We've done something similar at http://www.123dictionar.ro. As you type, the word is sent to the server using AJAX and if an exact match is not found, a Lucene index is searched using a FuzzyQuery search. Counts are precomputed, as data is not changing. Regards, Ioan Mark Müller wrote: Hi al

Re: Big Ducument Indexing Limit?

2006-09-15 Thread aslam bari
Thanks for the response. I have another small problem. I have some text in an XML tag like \A1;Frank\PPaul. Can Lucene not index it using SimpleAnalyzer or TextContentExtractor? Thanks... Catalin Mititelu <[EMAIL PROTECTED]> wrote: One more hint for 2) and 3): use SimpleAnalyzer on

Re: Lucene Suggest ?

2006-09-15 Thread karl wettin
On Fri, 2006-09-15 at 14:02 +0200, Mark Müller wrote: > Hi all, > I'd like to know if it is possible to have Lucene make suggestions while the > user types in the search query. > > Like in Google Suggest: http://www.google.com/webhp?complete=1&hl=en > > I just need to send with AJAX the part of the

Lucene Suggest ?

2006-09-15 Thread Mark Müller
Hi all, I'd like to know if it is possible to have Lucene make suggestions while the user types in the search query. Like in Google Suggest: http://www.google.com/webhp?complete=1&hl=en I just need to send with AJAX the part of the word the user already typed and get back the list of matching terms.

Re: Big Ducument Indexing Limit?

2006-09-15 Thread Catalin Mititelu
One more hint for 2) and 3): use SimpleAnalyzer on your XML (give up on XmlContentExtractor). In this manner you can index all "words" from the XML file in lower case (tag name, attribute name, attribute value and content). Of course, you should use the same analyzer for searching. Simon Willnauer

Re: Big Ducument Indexing Limit?

2006-09-15 Thread Simon Willnauer
On 9/15/06, aslam bari <[EMAIL PROTECTED]> wrote: Dear Mititelu, Thanks for the reply. Can you help me with some small issues related to it? 1) I am new to Lucene. Can you tell me where this DEFAULT_MAX_FIELD_LENGTH variable is available, how to set it, and for my purpose, like a 6-10MB file, how m

Re: Big Ducument Indexing Limit?

2006-09-15 Thread aslam bari
Dear Mititelu, Thanks for the reply. Can you help me with some small issues related to it? 1) I am new to Lucene. Can you tell me where this DEFAULT_MAX_FIELD_LENGTH variable is available, how to set it, and for my purpose, like a 6-10MB file, how much I should set it to? 2) How can I index all the words

Re: Big Ducument Indexing Limit?

2006-09-15 Thread Catalin Mititelu
Yes. The default limit on the number of tokens indexed per field is 10,000. Look here http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#DEFAULT_MAX_FIELD_LENGTH aslam bari <[EMAIL PROTECTED]> wrote: Dear all, I am trying to index an XML file which is 6MB in size. Does Lucene support t
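The symptom reported in this thread, some hits but not all, is what a token cap produces: terms past the limit never reach the index, so searches for them find nothing. The JDK-only sketch below simulates that effect. It does not call Lucene; the class name, the whitespace tokenizer, and the method are hypothetical, and only the 10,000 figure comes from the message above.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class FieldLengthCapSketch {
    // Mirrors the default quoted above; the real constant lives in IndexWriter.
    static final int DEFAULT_CAP = 10_000;

    /** Returns the distinct terms an index would see if it kept only the first maxTokens tokens. */
    static Set<String> indexedTerms(String text, int maxTokens) {
        Set<String> terms = new HashSet<>();
        int count = 0;
        // Naive whitespace tokenization, standing in for a real analyzer.
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (count >= maxTokens) {
                break; // everything after the cap is silently dropped
            }
            terms.add(token);
            count++;
        }
        return terms;
    }
}
```

A document of 10,000 filler tokens followed by a single distinctive word yields a term set without that word at the default cap, and with it once the cap is raised by one. A 6MB file can plausibly hold far more than 10,000 tokens, so raising the limit on the writer before indexing is the usual remedy.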

Big Ducument Indexing Limit?

2006-09-15 Thread aslam bari
Dear all, I am trying to index an XML file which is 6MB in size. Does Lucene support such big documents? What is the maximum file size Lucene can index? I ask because when I search the indexed file, I am not able to get all the results. It gives me some results but not o