RE: Phrase indexing and searching with Lucene

2009-02-19 Thread Nada Mimouni
Hello, Thank you Erick for this detailed answer, that makes things clearer in my mind. >I'm still not clear why the built-in phrase query syntax won't work. I have programmed a set of java classes (I use Lucene classes) to index and search into a collection of documents for a set of queries. T

Re: what's the best practice for getting "next page" of hits?

2009-02-19 Thread Joel Halbert
Out of interest, if the index is entirely in memory (using a RAMDir), is there any significant difference in performance between options (a) and (b) as outlined below? Rgs, Joel -Original Message- From: Ganesh Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache.org, rolaren..

Incremental search, CachingWrapperFilter and BooleanFilter

2009-02-19 Thread Konstantyn Smirnov
Hi all, I implemented an autocomplete functionality, which is pretty classical: a user types some words into an input field and sees a list of matches in a drop-down. I've done it using filters (BooleanFilter, and TermsFilter + PrefixFilter), and it's working against an index (loaded in RAM) w

Analyse TermQuery and PhraseQuery

2009-02-19 Thread Nada Mimouni
Hello, String ws = " "; String query = "The"+ws+"president"+ws+"of"+ws+"the"+ws+"USA"+ws+"is"+ws+"\"Barak Obama\""; Query q = QueryParser.parse(query, new StandardAnalyzer()); Query q = QueryParser.parse(query, new WhitespaceAnalyzer()); In this example: - could we create a query in such a fo

Index Structure

2009-02-19 Thread Seid Mohammed
I am new to Lucene and am reading the Lucene in Action book. Sometimes I understand better when someone tells me an answer than when I read a book. My question is: when indexing, what is Lucene actually doing? If I have a file called test.txt with contents " lucen is used to index files" and I apply Lucene indexing, wh

lucene index details

2009-02-19 Thread Seid Mohammed
I am new to Lucene and am reading the Lucene in Action book. Sometimes I understand better when someone tells me an answer than when I read a book. My question is: when indexing, what is Lucene actually doing? If I have a file called test.txt with contents " lucen is used to index files" and I apply Lucene indexing, wh

RE: Index Structure

2009-02-19 Thread Nada Mimouni
Hello, When indexing Lucene generates terms from your original text. To see the content and the structure of the index, use "Luke" which is a Lucene index toolbox. You can download it here : http://www.getopt.org/luke/ There is a detailed description of this tool (with pretty screen-shots) in

Filters - at what stage are they applied?

2009-02-19 Thread Joel Halbert
Hi, By way of clarification, when a filter is used with a search query, is the filter applied only to documents that matched the search query or is it applied to all documents in the index before the query is executed? Rgs, Joel

Re: Index Structure

2009-02-19 Thread Seid Mohammed
Great, I have got it. Does Luke support Unicode? I am trying Lucene in a non-English language. Thanks a lot, seid m On 2/19/09, Nada Mimouni wrote: > > Hello, > > When indexing Lucene generates terms from your original text. > > To see the content and the structure of the index, use "Luke" which is

Re: Analyse TermQuery and PhraseQuery

2009-02-19 Thread Grant Ingersoll
On Feb 19, 2009, at 5:54 AM, Nada Mimouni wrote: Hello, String ws = " "; String query = "The"+ws+"president"+ws+"of"+ws+"the"+ws+"USA"+ws+"is"+ws+"\"Barak Obama\""; Query q = QueryParser.parse(query, new StandardAnalyzer()); Query q = QueryParser.parse(query, new WhitespaceAnalyzer());

"Near" force in query server side?

2009-02-19 Thread Ian Vink
Once my app gets the query string from the user, is there a way to tell the query engine to only return documents where these words are at most 5 words apart? I can't tell the user to change their query, I have to do it server side. If so, do I have to add anything to my index to let Lucene know abo

Re: "Near" force in query server side?

2009-02-19 Thread Grant Ingersoll
You will likely need to create n-grams from the user query and from that construct a sloppy PhraseQuery. There is an n-gram Filter in the contrib/analysis package (I think it is called the ShingleFilter) On Feb 19, 2009, at 7:19 AM, Ian Vink wrote: Once my app gets the query string from t
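
A minimal sketch of the idea in the reply above, in plain Java with no Lucene dependency: it builds word bigrams from the raw user query and writes each one as a proximity phrase in Lucene query-parser syntax ("..."~N). The class and method names are hypothetical, and this is string building only, not the ShingleFilter or PhraseQuery API. Note two assumptions: slop in Lucene counts position moves, so ~5 only approximates "at most 5 words apart", and this sketch strips double quotes but does not escape other query-parser special characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rewrites a user query server side into proximity phrases.
public class ProximityQuerySketch {

    // Builds word bigrams ("shingles" of size 2) from the raw user query.
    static List<String> wordBigrams(String userQuery) {
        // Drop the user's own double quotes so they cannot break the phrase syntax.
        String[] words = userQuery.replace("\"", " ").trim().split("\\s+");
        List<String> bigrams = new ArrayList<>();
        for (int i = 0; i + 1 < words.length; i++) {
            bigrams.add(words[i] + " " + words[i + 1]);
        }
        return bigrams;
    }

    // Joins the bigrams as sloppy phrases, e.g. "a b"~5 AND "b c"~5.
    static String toProximityQuery(String userQuery, int slop) {
        List<String> bigrams = wordBigrams(userQuery);
        if (bigrams.isEmpty()) {
            // Zero or one word: there is nothing to constrain, so return it unchanged.
            return userQuery.replace("\"", " ").trim();
        }
        StringBuilder sb = new StringBuilder();
        for (String bigram : bigrams) {
            if (sb.length() > 0) {
                sb.append(" AND ");
            }
            sb.append('"').append(bigram).append("\"~").append(slop);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: "apache lucene"~5 AND "lucene search"~5
        System.out.println(toProximityQuery("apache lucene search", 5));
    }
}
```

The resulting string would then be handed to the query parser in place of the user's original text. Positions are stored for tokenized fields by default, so this approach should not need extra index fields, though that depends on how the field was indexed.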

searching a sentence or paragraph

2009-02-19 Thread Seid Mohammed
From a Lucene index, how can we search for a sentence or a paragraph that satisfies our query? Thanks a lot, seid m -- "RABI ZIDNI ILMA" - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: j

RE: searching a sentence or paragraph

2009-02-19 Thread Nada Mimouni
You need to create a TermQuery or PhraseQuery with terms in your query depending on what result you need exactly. To create PhraseQuery, try the built-in phrase processing with double quotes, e.g. "this is a phrase". See the Term section at http://lucene.apache.org/java/2_4_0/queryparsersyn

Re: Index Structure

2009-02-19 Thread Koji Sekiguchi
Seid Mohammed wrote: great, I have got it do luke support unicode? I am trying lucene in non-english language Of course. I can see Japanese terms without problems. Koji

Re: Index Structure

2009-02-19 Thread Seid Mohammed
I have tried Amharic fonts; it displays square-like characters. Maybe there is some kind of setting for it? Seid M On 2/19/09, Koji Sekiguchi wrote: > Seid Mohammed wrote: >> great, >> I have got it >> do luke support unicode? I am trying lucene in non-english language >> >> > Of course. I can

Re: stream of events never to know when it ends? how to index such things & search

2009-02-19 Thread Christian Brennsteiner
hi erick, nr of events are 107/sec in average with 400/sec peak and 20/sec low. between searchable should be less than 20 minutes. we are planning to index IN RAM only for a duration of one day MAX. per lucene process on the operating system. currently we need 500 M RAM for indexing one day (just

Re: Phrase indexing and searching with Lucene

2009-02-19 Thread Erick Erickson
It looks to me like what you're trying to do is akin to document similarity, which I haven't had to delve into. But it's been discussed on the user list a few times, so perhaps your best bet would be to search the mail archives for that topic. Best Erick On Thu, Feb 19, 2009 at 3:14 AM, Nada Mimo

Re: Lucene search performance on Sun UltraSparc T2 (T5120) servers

2009-02-19 Thread Glen Newton
I will look a little deeper into the information you supplied and comment, but will suggest this on my initial cursory review: 1 - You have 32GB of memory. Using the 64bit VM, try using a 16GB or 24GB heap; 2 - Turn-on huge pages: -XX:+UseLargePages -XX:LargePageSizeInBytes=256m 3 - Tu

Re: Index Structure

2009-02-19 Thread Koji Sekiguchi
There is no additional setting for me... Koji Seid Mohammed wrote: I have tried Amharic fonts, it displays square like character, may be there is a kind of setting for it? Seid M On 2/19/09, Koji Sekiguchi wrote: Seid Mohammed wrote: great, I have got it do luke support unicode? I

class used to create term document matrix in lucene

2009-02-19 Thread nitin gopi
Hi all Can anybody tell me which class and its methods are used to create term document matrix in lucene? Regards, Nitin

Re: what's the best practice for getting "next page" of hits?

2009-02-19 Thread Erick Erickson
The best practice is, well, "It Depends" (tm). First off, I wouldn't do any caching of results unless and until you had a reasonable certainty that you had performance issues, so would be my first choice. And if you *did* start to see performance issues, I'd look first at why the queries were expe
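
A minimal sketch of paging without any result cache, in plain Java. It assumes (this is an assumption, since the options discussed in the thread are not shown here) that the uncached approach means re-running the search for every request and slicing out the requested page from the ranked hits. The names PagingSketch and page are hypothetical, and the List stands in for whatever the search returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: picks one page out of a freshly computed, ranked hit list.
public class PagingSketch {

    // pageIndex is zero-based; returns an empty list when the page is past the end.
    static <T> List<T> page(List<T> rankedHits, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        // Use long arithmetic so large page numbers cannot overflow.
        long start = (long) pageIndex * pageSize;
        if (start >= rankedHits.size()) {
            return new ArrayList<>();
        }
        int end = (int) Math.min(start + pageSize, rankedHits.size());
        return new ArrayList<>(rankedHits.subList((int) start, end));
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("doc1", "doc2", "doc3", "doc4", "doc5");
        // prints: [doc3, doc4]
        System.out.println(page(hits, 1, 2));
    }
}
```

With this shape, "next page" is simply the same query with pageIndex + 1, and nothing has to be kept on the server between requests.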

Re: searching a sentence or paragraph

2009-02-19 Thread Seid Mohammed
Thanks Nada, it again works perfectly seid m. On 2/19/09, Nada Mimouni wrote: > > > > You need to create a TermQuery or PhraseQuery with terms in your query > depending on what result you need exactly. > > To create PhraseQuery, try the built-in phrase processing with double > quotes, e.g. > "th

Re: lucene index details

2009-02-19 Thread Erick Erickson
You have to look at Analyzers a bit here because that's what controls what is in the index. The simplest case is a WhitespaceAnalyzer that breaks the input stream up into tokens on any whitespace. So, in your example and using a WhitespaceAnalyzer, you'd get the following tokens: lucene, is, used,
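
A minimal sketch of the whitespace splitting described above, in plain Java. This is not Lucene's WhitespaceAnalyzer itself, only an imitation of its token boundaries, and the class name is hypothetical. The input is the test.txt content as it was written in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mimics splitting an input stream on whitespace.
public class WhitespaceTokensSketch {

    // Splits text on runs of whitespace and skips empty pieces.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints: [lucen, is, used, to, index, files]
        System.out.println(tokenize(" lucen is used to index files"));
    }
}
```

Each token is kept exactly as typed: no lowercasing, no stop-word removal, no punctuation handling. Other analyzers differ in exactly those steps, which is why the choice of analyzer decides what ends up in the index.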

Re: stream of events never to know when it ends? how to index such things & search

2009-02-19 Thread Erick Erickson
My indexes have been much more static than yours, so I'll defer indexing event logging recommendations to others. But as I remember, the issue of indexing log files has been discussed on the list before, a search of logfiles or log files in the searchable archive might be useful. Your problem is a

Indexer.Java problem

2009-02-19 Thread Seid Mohammed
I am using NetBeans on Windows to test Lucene. I have added all the lib files from the /lib directory to my project library. Near the end of the Indexer.java program, it states that the Field.Text method is not available. The error message is as follows ---

Re: Indexer.Java problem

2009-02-19 Thread Erick Erickson
LIA was written for a pretty early version of Lucene, if you're using a recent release you need to modify the code to be compliant with that version. Or install an older release of Lucene. Erick On Thu, Feb 19, 2009 at 10:41 AM, Seid Mohammed wrote: > I am using netbeans on windows to test luc

Re: Indexer.Java problem

2009-02-19 Thread Seid Mohammed
I had better modify it, but can you give me just a hint on how to modify it? Thanks a lot, Seid M On 2/19/09, Erick Erickson wrote: > LIA was written for a pretty early version of Lucene, if you're using a > recent > release you need to modify the code to be compliant with that version. > > Or install an ol

Re: Filters - at what stage are they applied?

2009-02-19 Thread Yonik Seeley
On Thu, Feb 19, 2009 at 6:53 AM, Joel Halbert wrote: > By way of clarification, when a filter is used with a search query, is > the filter applied only to documents that matched the search query or is > it applied to all documents in the index before the query is executed? Filters are currently a

Re: Indexer.Java problem

2009-02-19 Thread Erick Erickson
Unfortunately, not really. I haven't tried to get the LIA examples working for years... The various release notes on the Wiki, especially the 1.9 and 2.0 release notes are probably the best place to start. Best Erick On Thu, Feb 19, 2009 at 11:13 AM, Seid Mohammed wrote: > I better modify it,

RE: 2.3.2 -> 2.4.0 StandardTokenizer issue

2009-02-19 Thread Philip Puffinburger
Actually, WhitespaceTokenizer won't work. Too many person names and it won't do anything with punctuation. Something had to have changed in StandardTokenizer, and we need some of the 2.4 fixes/features, so we are kind of stuck. -Original Message- From: Philip Puffinburger [mailto:ppuf

Re: Indexer.Java problem

2009-02-19 Thread Michael McCandless
The early access version of LIA2 (accessible at http://www.manning.com/hatcher3/) has updated this example to work with recent Lucene releases (though it's still using deprecated APIs -- that'll be fixed before the book is released). Oh actually the first chapter is a free PDF on Manning'

Re: stream of events never to know when it ends? how to index such things & search

2009-02-19 Thread Christian Brennsteiner
hi erick, ram and fsdir: we will hold every day of the 30 days (in the past) in ram. we will start a separate process every 1 or 2 days which holds 1-2 days. i think that FSDir might be too slow? never tested that my goal is to search 30 days with indexes about 300-700 M / day -> 21 G (max) w