Hi,
In my webapp I'm trying to use the Lucene API to build queries
instead of using QueryParser, but I haven't found where to specify
the Analyzer. Any help?
Thanks.
jean71
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
I would think that, with the current capabilities of Lucene, denormalisation is
the solution. Create an extra indexed-but-not-stored field called
searchable-mash which will hold the values from all fields, with added
words to connect the data, like "Male named George Bush whose occupation is
President of
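The mash described above can be sketched in plain Java. This is a minimal, hypothetical helper (the field names and the connecting word "is" are illustrative, not from the original post); in Lucene itself you would then add the result as an unstored, tokenized field, e.g. something like `doc.add(new Field("searchable-mash", mash, Field.Store.NO, Field.Index.TOKENIZED))` in the 2.x API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MashBuilder {
    // Join all field values into one searchable string, adding a connecting
    // word so the mash reads like prose ("occupation is President").
    static String buildMash(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(e.getKey()).append(" is ").append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> f = new LinkedHashMap<String, String>();
        f.put("name", "George Bush");
        f.put("occupation", "President");
        // The mash would be indexed but not stored, alongside the real fields.
        System.out.println(buildMash(f));
    }
}
```

As the next message points out, this trades per-field precision for recall: once everything lives in one field, the index can no longer tell which original field a term came from.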
Hi,
Having an extra indexed but unstored field is equivalent to having a bag of
words, so the quality of the search results will be affected.
Consider an example:
Doc 1 -- Text: President of USA; other fields: ...
Doc 2 -- Text: ...; Occupation: President of USA
In both cases searchable-mash = the same BAG of WORDS, will
Ah, hi Gustavo,
I actually don't know this, but it seems that the implementation of XQuery
in eXist places your results in an unsorted list! Or possibly it is placed
in a hashmap with its own bindings representing, for instance, data types
and so on?
Thinking about this, I believe it is possible
Hello
Is it possible to exclude numbers using StandardAnalyzer just like
SimpleAnalyzer?
My index has crossed 5 GB, and 5 million documents are indexed.
My query, which includes searching and sorting, returns 4 hits.
If I do the search from a standalone application, the results are returned in 12
seconds. If I perform the same search from a web application running inside Tomcat,
I get an out-of-memory exception.
[EMAIL PROTECTED] wrote:
Hello
Is it possible to exclude numbers using StandardAnalyzer just like
SimpleAnalyzer?
It's possible.
Why not use a token filter?
On Mon, Sep 22, 2008 at 8:36 PM, Mark Miller [EMAIL PROTECTED] wrote:
[EMAIL PROTECTED] wrote:
Hello
Is it possible to exclude numbers using StandardAnalyzer just like
SimpleAnalyzer?
What do you mean when you say "trying to use the lucene api to build
queries"?
Are you trying to use BooleanQuery? If so, you either construct specific
clauses yourself (presumably by, say, tokenizing things yourself and
creating TermQuerys, PhraseQuerys, etc.) which *don't* need an
analyzer, or
Agreed. I am always diving into that analyzer too fast <g>. Possibly
premature-optimization thoughts as well. But scanning the token afterwards in
a filter and breaking/skipping if you find a number will be much easier,
and possibly not too much slower. Depends on how involved you are/want
to get. I
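The filter approach suggested above can be sketched without Lucene at all, since the decision is just "is this token a number?". In Lucene 2.x you would put this logic in a custom `TokenFilter` subclass wrapped around StandardAnalyzer's token stream, skipping numeric tokens in `next()`; the plain-Java version below (class and method names are mine, for illustration) shows only the filtering decision.

```java
import java.util.ArrayList;
import java.util.List;

public class NumberFilterSketch {
    // True if every character of the token is a digit, i.e. the token is a
    // plain number we want to drop. Mixed tokens like "no1" are kept.
    static boolean isNumeric(String token) {
        if (token.isEmpty()) return false;
        for (int i = 0; i < token.length(); i++) {
            if (!Character.isDigit(token.charAt(i))) return false;
        }
        return true;
    }

    // Stand-in for a TokenFilter: pass through everything except pure numbers.
    static List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        for (String t : tokens) {
            if (!isNumeric(t)) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> in = new ArrayList<String>();
        in.add("president");
        in.add("2008");
        in.add("usa");
        System.out.println(filter(in));
    }
}
```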
One way to address Umesh's concern is to boost terms
you *do* know enough about to assign to a specific field.
But the observation that "the best solution depends on your requirement"
is right on.
Best
Erick
On Mon, Sep 22, 2008 at 5:29 AM, Umesh Prasad [EMAIL PROTECTED] wrote:
Hi,
Sure, your Tomcat instance is assigning some amount of memory
to the JVM that your searcher is running in. Of course, now you're
going to ask me to increase that number... I have no idea, but
I've seen this question multiple times in the mail archive,
so try a search there or in the Tomcat docs
@ganesh:
To increase the memory available to Tomcat, you want to raise it via CATALINA_OPTS in
the catalina.sh file.
Add this: -Xmx1500m, which means the JVM should use no more than 1500 MB, and/or
-Xms500m, which means it should start with at least 500 MB.
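Concretely, the advice above amounts to something like the following fragment (a sketch; the exact values are from the post, but where you put it can vary, and many Tomcat versions prefer a separate `bin/setenv.sh` that catalina.sh sources rather than editing catalina.sh directly):

```shell
# Hypothetical snippet for $CATALINA_HOME/bin/setenv.sh (or near the top
# of catalina.sh). -Xms sets the initial heap, -Xmx the maximum heap.
CATALINA_OPTS="$CATALINA_OPTS -Xms500m -Xmx1500m"
export CATALINA_OPTS
```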
On Mon, Sep 22, 2008 at 5:15 PM, Ganesh - yahoo [EMAIL PROTECTED]wrote:
My
On Mon, Sep 22, 2008 at 2:49 PM, Erick Erickson [EMAIL PROTECTED] wrote:
What do you mean when you say trying to use the lucene api to build
queries?
Are you trying to use BooleanQuery? If so, you either construct specific
clauses yourself (presumably by, say, tokenizing things yourself and
Thanks a lot Adam,
actually I'm returning a plain ValueSequence now instead of a NodeSet, and all
is working great. It seems like NodeSet needs a predefined order for all
the algorithms to work correctly.
Gustavo
Hi Daniel,
On 09/22/2008 at 12:49 AM, Daniel Noll wrote:
I have a question about Korean tokenisation. Currently there
is a rule in StandardTokenizerImpl.jflex which looks like this:
ALPHANUM = ({LETTER}|{DIGIT}|{KOREAN})+
LUCENE-1126 https://issues.apache.org/jira/browse/LUCENE-1126
Right, you can't do that. TermQuerys are low-level; they don't
go through analyzers, so you have to do the tokenizing yourself.
StandardAnalyzer, among other things, lowercases the tokens.
If you haven't already got a copy of Luke, please do so, as it's
a wonderful tool for seeing what different
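The point above, that hand-built TermQuerys must reproduce the analyzer's normalisation, can be shown with a rough plain-Java imitation of what StandardAnalyzer does (this is a simplification of my own, not StandardAnalyzer's actual grammar): split on non-letter/non-digit runs and lowercase. Each resulting string is what you would feed into `new TermQuery(new Term(field, t))`, perhaps combined in a BooleanQuery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TermPrep {
    // Roughly mimic analyzer behaviour: split on anything that is not a
    // letter or digit, drop empties, and lowercase. A hand-built TermQuery
    // must apply the same normalisation, or "Bush" will never match the
    // indexed term "bush".
    static List<String> prepare(String text) {
        List<String> terms = new ArrayList<String>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    public static void main(String[] args) {
        // Each term would become e.g.
        //   query.add(new TermQuery(new Term("body", t)), BooleanClause.Occur.SHOULD);
        System.out.println(prepare("President of USA"));
    }
}
```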
: For example, in my case it's a car-searching form.
: First of all, I'm telling it that I want to search for BMW. The system returns a set
: of results.
: While viewing the results, the system shows additional criteria for making
: the search result more exact, and shows the count of the result set after adding
On Mon, Sep 22, 2008 at 5:42 PM, Erick Erickson [EMAIL PROTECTED] wrote:
...
But isn't this the wonderful thing about tests? They
make your assumptions explicit, and when your
assumptions are incorrect, you find it with much less
pain than in a working program where you say the most
recent
Hello,
If I had a file with the following content:
...
object.method();
...
I would like to be able to query for
object
method
object.method
My guess is that I should store not only object.method, but also
object and method as I cannot query *method.
Any other suggestion?
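The indexing scheme guessed at above (store the compound plus its parts, since a leading-wildcard query like *method is not allowed) can be sketched like this. In Lucene you would do the expansion in a custom TokenFilter, emitting the extra part-tokens at the same position (e.g. with a position increment of 0); the plain-Java version below (names are mine) just shows which tokens to emit.

```java
import java.util.ArrayList;
import java.util.List;

public class DottedTokens {
    // Emit the compound token plus each dot-separated part, so queries for
    // "object", "method", or "object.method" can all match without needing
    // a leading-wildcard search.
    static List<String> expand(String token) {
        List<String> out = new ArrayList<String>();
        out.add(token);
        if (token.indexOf('.') >= 0) {
            for (String part : token.split("\\.")) {
                if (!part.isEmpty()) out.add(part);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("object.method"));
    }
}
```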
Kind regards,
: We're not using TopDocCollector right now, as we're still using Hits.
: Performing some operation over every result is just one use case. We also
: have to deal with the user scrolling the display. Currently this works
: acceptably using the same java.util.List model for both cases. Sometimes
OK I found one path whereby optimize would detect that the
ConcurrentMergeScheduler had hit an exception while merging in a BG
thread, and correctly throw an IOException back to its caller, but
fail to set the root cause in that exception. I just committed it, so
it should be fixed in
OK, right, that figures.
Adam
2008/9/22 Gustavo Corral [EMAIL PROTECTED]
Thanks a lot Adam,
actually I'm returning a plain ValueSequence now instead of a NodeSet, and all
is working great. It seems like NodeSet needs a predefined order for all
the algorithms to work correctly.
Steven A Rowe wrote:
Korean has been treated differently from Chinese and Japanese since
LUCENE-461 https://issues.apache.org/jira/browse/LUCENE-461. The
grouping of Hangul with digits was introduced in this issue.
Certainly I found LUCENE-461 during my search, and certainly grouping
System specification:
Processor speed: 2 GHz
RAM: 3 GB
Index size: 5 GB
Total documents indexed: 5.8 million
To collect hits, I have replaced the Hits object with TopFieldDocs. This has
improved the search performance. Sorting is faster on a date / long
field, but it is very slow on a string