How specify the analyzer when created query with api?

2008-09-22 Thread Giannandrea Castaldi
Hi, In my webapp I'm trying to use the lucene api to build queries instead of the QueryParser but I haven't found out where to specify the Analyzer. Any help? Thanks. jean71 - To unsubscribe, e-mail: [EMAIL PROTECTED] For

RE: Multi Field search without Multifieldqueryparser

2008-09-22 Thread Dino Korah
I would think, with the current capabilities of lucene, denormalisation is the solution. Create an extra indexed but not stored field called searchable-mash which will hold the values from all fields with added words to connect the data like Male named George Bush whose occupation is President of

Re: Multi Field search without Multifieldqueryparser

2008-09-22 Thread Umesh Prasad
Hi, Having an extra indexed but unstored field is equivalent to having a bag of words. So the search results quality will be affected. Consider an Example: Text : President of USA-- Other Fields .. Text : -- Occupation: President of USA In both cases searchable-mash = BAG of WORDs, will

Re: eXist, Lucene and XQuery

2008-09-22 Thread adasal
Ah, hi Gustavo, I actually don't know this, but it seems that the implementation of XQuery in eXist places your results in an unsorted list! Or possibly it is placed in a hashmap with its own bindings representing, for instance, data types and so on? Thinking about this, I believe it is possible

StandardAnalyzer exclude numbers

2008-09-22 Thread jim
Hello Is it possible to exclude numbers using StandardAnalyzer just like SimpleAnalyzer?

Re: Exception while doing sorting

2008-09-22 Thread Ganesh - yahoo
My index crossed 5 GB and 5 million documents are indexed. My query, which includes searching and sorting, returns 4 hits. If I do the search from a standalone application, the results are returned in 12 seconds. If I perform the same from a web application running inside Tomcat, out of memory exception

Re: StandardAnalyzer exclude numbers

2008-09-22 Thread Mark Miller
[EMAIL PROTECTED] wrote: Hello Is it possible to exclude numbers using StandardAnalyzer just like SimpleAnalyzer? It's possible

Re: StandardAnalyzer exclude numbers

2008-09-22 Thread 黄成
Why not use a token filter? On Mon, Sep 22, 2008 at 8:36 PM, Mark Miller [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote: Hello Is it possible to exclude numbers using StandardAnalyzer just like SimpleAnalyzer?

Re: How specify the analyzer when created query with api?

2008-09-22 Thread Erick Erickson
What do you mean when you say trying to use the lucene api to build queries? Are you trying to use BooleanQuery? If so, you either construct specific clauses yourself (presumably by, say, tokenizing things yourself and creating TermQuerys, PhraseQuerys, etc.) which *don't* need an analyzer, or

Re: StandardAnalyzer exclude numbers

2008-09-22 Thread Mark Miller
Agreed. I am always diving into that analyzer too fast <g>. Possibly premature optimization thoughts as well. But scanning the token afterwards in a filter and breaking/skipping if you find a number will be much easier and possibly not too much slower. Depends on how involved you are/want to get I
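
The filter idea above can be sketched in plain Java. This is only an illustration of the skip-on-digit logic, not Lucene's TokenFilter API: the class name NumberTokenSkipper, both method names, and the choice to drop any token that contains a digit (rather than only all-digit tokens) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: mimics what a token filter
// placed after the tokenizer could do to exclude numbers.
final class NumberTokenSkipper {

    // Scans the token and stops at the first digit it finds.
    static boolean containsDigit(String token) {
        for (int i = 0; i < token.length(); i++) {
            if (Character.isDigit(token.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Keeps only the tokens without any digit, preserving their order.
    static List<String> dropNumericTokens(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!containsDigit(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

In a real analyzer the same check would presumably live inside a filter wrapped around the tokenizer's output, so the tokenizer grammar itself would not need to change.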

Re: Multi Field search without Multifieldqueryparser

2008-09-22 Thread Erick Erickson
One way to address Umesh's concern is to boost terms you *do* know enough about to assign to a specific field. But the observation that "That said, Best solution depends on your requirement" is right on. Best Erick On Mon, Sep 22, 2008 at 5:29 AM, Umesh Prasad [EMAIL PROTECTED] wrote: Hi,

Re: Exception while doing sorting

2008-09-22 Thread Erick Erickson
Sure, your tomcat instance is assigning some amount of memory to the JVM that your searcher is running in. Of course, now you're going to ask me how to increase that number... I have no idea but I've seen this question multiple times in the mail archive, so a search there or in the tomcat docs

Re: Exception while doing sorting

2008-09-22 Thread Dipen
@ganesh: To increase memory in Tomcat, set it in CATALINA_OPTS in the catalina.sh file. Add -Xmx1500m, which means the JVM should not use more than 1500 MB of heap, or -Xms500m, which means it should start with at least 500 MB. On Mon, Sep 22, 2008 at 5:15 PM, Ganesh - yahoo [EMAIL PROTECTED] wrote: My
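
One way to confirm that the -Xmx/-Xms settings actually reached the JVM running the web application is to log what the runtime reports. A minimal sketch, assuming a hypothetical helper class named HeapReport that is called from inside the webapp; only the standard java.lang.Runtime methods are used.

```java
// Hypothetical helper: reports the heap limits of the JVM it runs in.
final class HeapReport {

    private static final long MEGABYTE = 1024L * 1024L;

    // Upper bound the JVM will try to use for the heap (reflects -Xmx).
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / MEGABYTE;
    }

    // One-line summary suitable for a log statement at startup.
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        return "max=" + rt.maxMemory() / MEGABYTE + "MB"
                + ", total=" + rt.totalMemory() / MEGABYTE + "MB"
                + ", free=" + rt.freeMemory() / MEGABYTE + "MB";
    }
}
```

If the reported maximum is far below the configured value, the options are probably not being picked up by the script that starts Tomcat.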

Re: How specify the analyzer when created query with api?

2008-09-22 Thread Giannandrea Castaldi
On Mon, Sep 22, 2008 at 2:49 PM, Erick Erickson [EMAIL PROTECTED] wrote: What do you mean when you say trying to use the lucene api to build queries? Are you trying to use BooleanQuery? If so, you either construct specific clauses yourself (presumably by, say, tokenizing things yourself and

Re: eXist, Lucene and XQuery

2008-09-22 Thread Gustavo Corral
Thanks a lot Adam, actually I'm now returning a plain ValueSequence instead of a NodeSet and all is working great now. It seems like NodeSet needs a predefined order for all the algorithms to work in the correct way. Gustavo

RE: StandardTokenizer and Korean grouping with alphanum

2008-09-22 Thread Steven A Rowe
Hi Daniel, On 09/22/2008 at 12:49 AM, Daniel Noll wrote: I have a question about Korean tokenisation. Currently there is a rule in StandardTokenizerImpl.jflex which looks like this: ALPHANUM = ({LETTER}|{DIGIT}|{KOREAN})+ LUCENE-1126 https://issues.apache.org/jira/browse/LUCENE-1126

Re: How specify the analyzer when created query with api?

2008-09-22 Thread Erick Erickson
Right, you can't do that. TermQuerys are low-level, they don't go through analyzers, you have to do the tokenizing yourself. StandardAnalyzer, among other things, lowercases the tokens. If you haven't already got a copy of Luke, please do so as it's a wonderful tool for seeing what different
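
Since a TermQuery is not analyzed, the query text has to be normalised by hand so that it matches what was indexed. The sketch below shows that step in plain Java under loud assumptions: the class ManualTermNormalizer and its method are hypothetical, and lowercasing plus splitting on non-alphanumeric characters only approximates one part of what StandardAnalyzer does. Each resulting string would then be wrapped in its own term query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: a rough stand-in for the
// tokenizing and lowercasing an analyzer would otherwise perform.
final class ManualTermNormalizer {

    // Splits on runs of characters that are neither letters nor digits
    // and lowercases each remaining piece.
    static List<String> toTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }
}
```

A tool such as Luke remains the reliable way to check which terms really ended up in the index, because a hand-written normaliser like this one can easily drift from the index-time analyzer.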

Re: Using Hits as document space for new search

2008-09-22 Thread Chris Hostetter
: For example, in my case it's a car search form. : First of all I tell it that I want to search for BMW. The system returns a set : of results. : While viewing results, the system shows additional criteria for making : the search result more exact, and shows the count of the result set after adding

Re: How specify the analyzer when created query with api?

2008-09-22 Thread Giannandrea Castaldi
On Mon, Sep 22, 2008 at 5:42 PM, Erick Erickson [EMAIL PROTECTED] wrote: ... But isn't this the wonderful thing about tests? They make your assumptions explicit, and when your assumptions are incorrect, you find it with much less pain than in a working program where you say the most recent

Query attached words

2008-09-22 Thread Jean-Claude Antonio
Hello, If I had a file with the following content: ... object.method(); ... I would like to be able to query for any of: object, method, object.method. My guess is that I should store not only object.method, but also object and method as I cannot query *method. Any other suggestion? Kind regards,
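
The approach guessed at above, indexing the parts alongside the full token, can be sketched as follows. This is plain Java rather than a Lucene filter; the class DottedTokenExpander and its method are hypothetical names, and splitting only on "." is an assumption based on the object.method example.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: expands a dotted token into the full token plus
// each of its parts, so any of them can be matched at query time.
final class DottedTokenExpander {

    // Returns the original token first, then its dot-separated parts,
    // without duplicates and in order of first appearance.
    static Set<String> expand(String token) {
        Set<String> expanded = new LinkedHashSet<>();
        expanded.add(token);
        for (String part : token.split("\\.")) {
            if (!part.isEmpty()) {
                expanded.add(part);
            }
        }
        return expanded;
    }
}
```

Emitting all three terms at index time avoids the leading-wildcard query (*method) at the cost of a somewhat larger index.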

Re: IndexSearcher.search

2008-09-22 Thread Chris Hostetter
: We're not using TopDocCollector right now, as we're still using Hits. : Performing some operation over every result is just one use case. We also : have to deal with the user scrolling the display. Currently this works : acceptably using the same java.util.List model for both cases. Sometimes

Re: Background merge hit exception

2008-09-22 Thread Michael McCandless
OK I found one path whereby optimize would detect that the ConcurrentMergeScheduler had hit an exception while merging in a BG thread, and correctly throw an IOException back to its caller, but fail to set the root cause in that exception. I just committed it, so it should be fixed in

Re: eXist, Lucene and XQuery

2008-09-22 Thread adasal
OK, right, that figures. Adam 2008/9/22 Gustavo Corral [EMAIL PROTECTED] Thanks a lot Adam, actually I'm now returning a plain ValueSequence instead of a NodeSet and all is working great now. It seems like NodeSet needs a predefined order for all the algorithms to work in the correct way.

Re: StandardTokenizer and Korean grouping with alphanum

2008-09-22 Thread Daniel Noll
Steven A Rowe wrote: Korean has been treated differently from Chinese and Japanese since LUCENE-461 https://issues.apache.org/jira/browse/LUCENE-461. The grouping of Hangul with digits was introduced in this issue. Certainly I found LUCENE-461 during my search, and certainly grouping

Re: Exception while doing sorting

2008-09-22 Thread Ganesh - yahoo
System Specification: Processor speed: 2 GHz. RAM: 3 GB. IndexDB size: 5 GB. Total documents indexed: 5.8 million. To collect hits, I have replaced the Hits object with TopFieldDocs. This has improved the search performance. Sorting is faster on date / long field, but it is very slow on string