Wouldn't the easiest fix be to just alter the user's query string before passing
it to QueryParser (moving the semantics of your search app outside of Lucene)?
Something like:
str.replaceAll(" ET ", " AND ").replaceAll(" OU ", " OR ").replaceAll(" SAUF ", " NOT ")
Quoting Mark Miller <[EMAIL PROT
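A slightly more careful sketch of the same pre-processing idea, in plain Java with no Lucene dependency. The class and method names are made up for illustration. It assumes the French operators are written in upper case and separated by whitespace; unlike the " ET " version it also catches an operator at the very start or end of the string. It will still rewrite operators inside quoted phrases, so treat it as a starting point only.

```java
class OperatorTranslator {
    // Hypothetical helper: rewrites French boolean operators to the
    // AND/OR/NOT keywords before the string is handed to a query parser.
    // (?<!\S) and (?!\S) mean "not preceded/followed by a non-space
    // character", so only whole, whitespace-delimited words are replaced.
    static String translate(String query) {
        return query
                .replaceAll("(?<!\\S)ET(?!\\S)", "AND")
                .replaceAll("(?<!\\S)OU(?!\\S)", "OR")
                .replaceAll("(?<!\\S)SAUF(?!\\S)", "NOT");
    }

    public static void main(String[] args) {
        System.out.println(translate("chat ET chien SAUF oiseau"));
    }
}
```

The translated string would then be passed to the parser in place of the raw user input.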
: From my searches, there seems to be a FunctionQuery in Solr that can do this
: type of query. But I am using pure Lucene, and trying to port Solr code over
: (to create my own version of FunctionQuery) looks too complicated because of
: code dependency on other Solr code such as ValueSource, et
Hi,
> Well, the first question is always "are you opening/closing your
> IndexSearchers for each request on your remote machines?". This is always a
> no-no. This is also a question for your single-searcher version.
Yes, I know; each search slave (RMI server) has a single instance
of IndexSearc
Hi,
Is there a way to query all numbers that are close to a particular number
(the query), and score them by how close they are to that number?
To illustrate further, assume documents with a single field "num", where the
value of this field can only be an integer. Now, let's say there are 3
docum
Oh wouldn't we all. I want this too. Unfortunately, it's an elusive
beast at best. As I am sure you know, JavaCC generates code based on the
grammar and so it is very hard to alter the grammar after JavaCC'ing it.
If you relax the 'add' part then you might be able to do something with
QueryPars
Currently AND/OR/NOT are hardcoded into the .jj file. A patch to
make this configurable would be welcome!
Erik
On Oct 3, 2006, at 11:15 AM, Patrick Turcotte wrote:
Hi,
Is there a way to add / replace the text for the boolean operators used
by the query parser?
We would like to
I don't think it can be recovered. It's better to validate the index
files beforehand, or to make sure only one thread updates the index files
and that the index is closed properly.
Chris Lu
--
http://www.dbsight.net
Instant Lucene Search on Any Database/Application
On 10/3
: does not pour affinity information into the score - i.e. both doc1 and doc2
: in your example would get the same score, and the SpanFirstQuery would only
: allow you to limit the set of returned documents - Hoss, do you agree with
: this?
Oh ... hmmm ... I think you're right. SpanScorer scores
If I understand the question, you do not want to boost in advance a certain
doc, but rather score higher those documents containing the search term
closer to the start of the document.
There is more to define here - for instance, if doc1 has 5 words but doc2
has 1,000,000 words, would you still pr
We get this when trying to optimize the index:
Exception in thread "main" java.io.IOException: term out of order
    at org.apache.lucene.index.TermInfosWriter.add(TermInfosWriter.java:95)
    at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:305)
    at org.apache.lucene.index.SegmentM
Take a look at the SpanFirstQuery ... to do "A*" type searches inside of a
SpanQuery you'll either need to use SpanRegexQuery, or roll your own
SpanPrefixQuery out of a SpanOrQuery containing SpanTermQueries.
: I have a question about the lucene scoring. In my following example,
: how can I ensu
: Well, as always, it depends ... My first thought is that I'd index things
: in separate fields as it gives you more options. For instance, let's say
: that you have name and phone fields and decide that the name field is more
: important than the phone number. Your options for boosting "anything
Sure, anything's possible. Whether Lucene is your best bet may be another
question. But in this example, you're not using Lucene to do anything
except store the strings. By storing all the data as UN_TOKENIZED, all
you're doing is a regex match on the entire HTML text of each document. You
might
Hi,
I have a question about Lucene scoring. In the following example, how can
I ensure that doc1 gets a higher score than doc2 if I search for "A*"? In
other words, I want to boost the docs whose leading terms match.
doc1: Aterm Bterm Cterm
doc2: Bterm Aterm Cterm
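To make the desired ranking concrete, here is a small plain-Java sketch. It is not how Lucene scores documents; it only illustrates the behaviour being asked for. The class name, method name and the 1/(1 + position) formula are assumptions chosen for the example: the earlier the first term starting with the prefix appears, the higher the score.

```java
import java.util.Locale;

class LeadingTermScore {
    // Hypothetical scoring function, for illustration only.
    // Returns 1/(1 + index of the first term starting with the prefix),
    // or 0.0 if no term in the text starts with the prefix.
    static double score(String text, String prefix) {
        String[] terms = text.trim().split("\\s+");
        String p = prefix.toLowerCase(Locale.ROOT);
        for (int i = 0; i < terms.length; i++) {
            if (terms[i].toLowerCase(Locale.ROOT).startsWith(p)) {
                return 1.0 / (1 + i);
            }
        }
        return 0.0;
    }

    public static void main(String[] args) {
        System.out.println(score("Aterm Bterm Cterm", "A"));
        System.out.println(score("Bterm Aterm Cterm", "A"));
    }
}
```

With this definition doc1 ranks above doc2 for the prefix "A", because the match is in the first position in doc1 and in the second position in doc2.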
Hi,
I've been wondering if anyone has tried to compare the performance of
any 'native' Java DB as an index storage mechanism vs. Lucene's custom
implementation? I'm assuming that DB products should provide some
functionality for 'free' right out of the box (correct me if I'm wrong):
- easily manageable
Hi folks,
Does anybody have a description of the Lucene query syntax in German?
Thanks!
Bye.
/lexi
Hi,
Is there a way to add / replace the text for the boolean operators used
by the query parser?
We would like to replace (or even better, add), "AND", "OR" and "NOT" by
"ET", "OU" and "SAUF".
Is there a way to configure the QueryParser to do it?
We know we could always modify QueryParser.jj to
My crawler indexes crawled pages with this code:
Document doc = new Document();
doc.add(new Field("body", page.getHtmlData(), Store.YES, Index.UN_TOKENIZED));
doc.add(new Field("url", page.getUrl(), Store.YES, Index.UN_TOKENIZED));
doc.add(new Field("title", page.getTitle(), Store.YES, Index.TO
I should note, though, that we do this using the Lucene index, using
the TermDocs, etc.
On Oct 3, 2006, at 8:42 AM, Grant Ingersoll wrote:
We often calculate co-occurrence information as an offline task and
store it and then it is just a simple lookup at run time. You
just have to put to
On Tuesday 03 October 2006 12:14, Renzo Scheffer wrote:
> I try to get back a list of all left or right neighbours of a searchterm.
> Then I will count them to get back the Information, how often a specific
> word is used as neighbour of the searchterm. I know that the results are
> variable accor
On Tuesday 03 October 2006 14:27, Nicolas Lalevée wrote:
> On Tuesday 03 October 2006 12:06, W.H. van Atteveldt wrote:
> > I don't know what you're doing but the to: header is empty in your email
> > which is really annoying (since I rely on the to: to sort my mail)
>
> Strange. Looking to the so
We often calculate co-occurrence information as an offline task and
store it and then it is just a simple lookup at run time. You just
have to put together the appropriate loops based on the window size
that you want for any given term. Probably not efficient if your
index is changing a l
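As a rough sketch of the "appropriate loops based on the window size" idea, here is a plain-Java version that works on an already tokenized list of terms rather than on a Lucene index. The class and method names are hypothetical, and reading the terms and positions out of the index is left out entirely.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NeighbourCounter {
    // Hypothetical helper: for every occurrence of `term`, counts the
    // tokens found within `window` positions to its left and right.
    static Map<String, Integer> countNeighbours(List<String> tokens, String term, int window) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(term)) {
                continue;
            }
            int from = Math.max(0, i - window);
            int to = Math.min(tokens.size() - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (j != i) {
                    counts.merge(tokens.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("big", "fox", "runs", "big", "fox", "sleeps");
        System.out.println(countNeighbours(tokens, "fox", 1));
    }
}
```

Run offline over each document's token stream, the resulting counts could be stored and looked up at query time, as described above.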
On Tuesday 03 October 2006 12:06, W.H. van Atteveldt wrote:
> I don't know what you're doing but the to: header is empty in your email
> which is really annoying (since I rely on the to: to sort my mail)
Strange. Looking at the source of Ajani's mail, there is:
To:
And my filter worked: I put
Thank you very much, Erick. I'll think about it and run some tests.
Bye
----- Original Message -----
From: "Erick Erickson" <[EMAIL PROTECTED]>
To:
Sent: Tuesday, October 03, 2006 1:42 PM
Subject: Re: I need your opinion about working with big index and frecuently updates
Think about In
Well, as always, it depends ... My first thought is that I'd index things
in separate fields as it gives you more options. For instance, let's say
that you have name and phone fields and decide that the name field is more
important than the phone number. Your options for boosting "anything in the
Think about IndexModifier to change your index, although the documentation
does state that it's better to batch your deletions together and batch your
additions together if possible.
100 MB is not, in my experience, a very big index, so I really don't
anticipate many problems. Do note that you can
Well, the first question is always "are you opening/closing your
IndexSearchers for each request on your remote machines?". This is always a
no-no. This is also a question for your single-searcher version.
What is your performance if you only go to one server? I'd start by finding
out what happen
In my application I need to implement search across several fields.
Which is the better approach in terms of relevance scoring:
index in separate fields and search using MultiFieldQueryParser, or index
everything as one concatenated field and search using that field?
Thanks in advance.
--
regards,
Volody
Hi,
Can anyone tell me how indexing takes place in Lucene (in depth)? I
will be thankful if anyone can help me.
Thanks & Regards,
Akil Ajani
Cognizant Technology Solutions India Pvt. Ltd.
Plot # 26, Rajiv Gandhi Infotech Park, MIDC
Hinjewadi, Pune 411057
Tel: (91) (20) 40201100 e
Can anyone help me?
2006/10/3, zhu jiang <[EMAIL PROTECTED]>:
Hi all,
In some situations, reading the index files may throw a "read past EOF"
exception, after which the index can no longer be used. How can the index
files be recovered in such a situation?
--
Thanks,
Jiang
--
Thanks,
Jiang
Thanks for the detailed explanation.
John Haxby wrote:
Volodymyr Bychkoviak wrote:
The user has an input (JavaScript calendar) on the page where he can choose
a date to include in the search. Search resolution is day resolution.
If the user enters the same date at a different time of day, he will get
different
Volodymyr Bychkoviak wrote:
The user has an input (JavaScript calendar) on the page where he can choose
a date to include in the search. Search resolution is day resolution.
If the user enters the same date at a different time of day, he will get
different results (because the calendar will also set the current hour
I am trying to get a list of all left or right neighbours of a search term.
I will then count them to find out how often a specific word is used as a
neighbour of the search term. I know that the results vary according to
the Analyzer/Filter used. It's just an experiment and
fir
I don't know what you're doing but the To: header is empty in your email,
which is really annoying (since I rely on the To: header to sort my mail)
> -----Original Message-----
> From: Ajani, Akil (Cognizant) [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, 3 October 2006 10:47
> Subject: Indexing In Lucene
>
>
Hi,
Can you tell me how indexing takes place in Lucene (in depth)? If a
document has 1n indices, which algorithm does it use, and which
information retrieval model does it use?
Thanks & Regards,
Akil Ajani
Cognizant Technology Solutions India Pvt. Ltd.
Plot # 26, Rajiv Gandhi Infotech Park, MID
Ok, I'll try to explain a bit.
The user has an input (JavaScript calendar) on the page where he can choose
a date to include in the search. Search resolution is day resolution.
If the user enters the same date at a different time of day, he will get
different results (because the calendar will also set the current
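One way to avoid the time-of-day problem is to reduce the date to day resolution before it goes anywhere near the query. The following is only a sketch in plain Java; the class and method names are hypothetical, and it assumes the indexed values use the same yyyyMMdd form in GMT. Lucene's DateTools class with a day resolution should serve the same purpose if you prefer to stay within the library.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class DayResolution {
    // Hypothetical helper: drops hours, minutes and seconds by formatting
    // the date as yyyyMMdd in GMT, so any time on the same (GMT) day
    // produces the same key.
    static String toDayKey(Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        System.out.println(toDayKey(new Date()));
    }
}
```

Applying the same function at index time and at query time means the hour set by the calendar widget no longer affects which documents match.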
Hi,
I'm working with a 100 MB index. Due to application requirements, the
indexed information is frequently updated, with plenty of modifications,
deletions and additions.
I think Lucene is a very powerful searching tool once the index is already
created, but I'm not sure if update index frec