Hi,
I have the same problem.
This is useful when you try to extract the contexts (terms before and after) of
a certain term (for example).
I found a solution but it performs badly: when you try to retrieve those
contexts you have to re-tokenize the documents containing the given term (i.e.
Hi,
I'm working with a 100Mb index. Because of application requirements, the
indexed information is frequently updated, with plenty of modifications,
deletions and additions.
I think Lucene is a very powerful searching tool once the index is already
created, but I'm not sure if update index
Ok, I'll try to explain a bit.
The user has an input (a JavaScript calendar) on the page where he can choose
a date to include in the search. Search resolution is day resolution.
If the user enters the same date at a different time of day, he will get
different results (because the calendar will also set the current
Hi,
Can you tell me in depth how indexing takes place in Lucene? If a
document has 1n indices, which algorithm does it use, and which
information retrieval model does it use...
Thanks & Regards,
Akil Ajani
Cognizant Technology Solutions India Pvt. Ltd.
Plot # 26, Rajiv Gandhi Infotech Park,
I don't know what you're doing but the to: header is empty in your email
which is really annoying (since I rely on the to: to sort my mail)
-Original Message-
From: Ajani, Akil (Cognizant) [mailto:[EMAIL PROTECTED]
Sent: dinsdag 3 oktober 2006 10:47
Subject: Indexing In Lucene
I am trying to get a list of all left or right neighbours of a search term.
Then I will count them to find out how often a specific word is used as a
neighbour of the search term. I know that the results vary according to the
Analyzer/Filter used. It's just an experiment and
Volodymyr Bychkoviak wrote:
The user has an input (a JavaScript calendar) on the page where he can choose
a date to include in the search. Search resolution is day resolution.
If the user enters the same date at a different time of day, he will get
different results (because the calendar will also set the current
thanks for detailed explanation.
John Haxby wrote:
Volodymyr Bychkoviak wrote:
The user has an input (a JavaScript calendar) on the page where he can choose
a date to include in the search. Search resolution is day resolution.
If the user enters the same date at a different time of day, he will get
Can anyone help me?
2006/10/3, zhu jiang [EMAIL PROTECTED]:
Hi all,
In some situations, index files may throw a read past EOF exception, so
that the index cannot be used any more. How can the index files be
recovered in such a situation?
--
Thanks,
Jiang
--
Thanks,
Jiang
Hi,
Can anyone tell me in depth how indexing takes place in Lucene? I
would be thankful if anyone could help me.
Thanks & Regards,
Akil Ajani
Cognizant Technology Solutions India Pvt. Ltd.
Plot # 26, Rajiv Gandhi Infotech Park, MIDC
Hinjewadi, Pune 411057
Tel: (91) (20) 40201100
In my application I need to implement search across several fields.
Which is the better approach in terms of relevance scoring:
indexing in separate fields and searching with MultiFieldQueryParser, or
indexing everything as one concatenated field and searching that field?
Thanks in advance.
--
regards,
Well, the first question is always: are you opening/closing your
IndexSearchers for each request on your remote machines? This is always a
no-no. This is also a question for your single-searcher version.
What is your performance if you only go to one server? I'd start by finding
out what happens
Think about IndexModifier to change your index, although the documentation
does state that it's better to batch your deletions together and batch your
additions together if possible.
100Mb is not, in my experience, a very big index, so I really don't
anticipate many problems. Do note that you
Well, as always, it depends <G>... My first thought is that I'd index things
in separate fields as it gives you more options. For instance, let's say
that you have name and phone fields and decide that the name field is more
important than the phone number. Your options for boosting anything in the
Thank you very much Erick, I'll think about it and will do some tests.
Bye
- Original Message -
From: Erick Erickson [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Tuesday, October 03, 2006 1:42 PM
Subject: Re: I need your opinion about working with big index and frecuently
Le Mardi 03 Octobre 2006 12:06, W.H. van Atteveldt a écrit :
I don't know what you're doing but the to: header is empty in your email
which is really annoying (since I rely on the to: to sort my mail)
Strange. Looking at the source of Ajani's mail, there is:
To: java-user@lucene.apache.org
And
We often calculate co-occurrence information as an offline task and
store it and then it is just a simple lookup at run time. You just
have to put together the appropriate loops based on the window size
that you want for any given term. Probably not efficient if your
index is changing a
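The loops described above can be sketched roughly as follows, assuming the tokens of a document are already available as a list (for example, produced by the same Analyzer that was used at index time). NeighbourCounter and countNeighbours are illustrative names, not Lucene classes, and this sketch does not read from a Lucene index at all.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
final class NeighbourCounter {

    /**
     * Counts the tokens that occur within `window` positions to the left or
     * right of each occurrence of `term`.
     * Assumption: `tokens` is the already-analyzed token sequence of one document.
     */
    static Map<String, Integer> countNeighbours(List<String> tokens, String term, int window) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(term)) {
                continue; // not an occurrence of the term we are interested in
            }
            int from = Math.max(0, i - window);
            int to = Math.min(tokens.size() - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (j != i) {
                    counts.merge(tokens.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "fox", "and", "the", "lazy", "fox");
        System.out.println(countNeighbours(tokens, "fox", 1));
    }
}
```

Running this once per document as an offline job and storing the merged counts would give the simple run-time lookup mentioned above; the counts depend on the window size and on how the text was tokenized.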
Le Mardi 03 Octobre 2006 14:27, Nicolas Lalevée a écrit :
Le Mardi 03 Octobre 2006 12:06, W.H. van Atteveldt a écrit :
I don't know what you're doing but the to: header is empty in your email
which is really annoying (since I rely on the to: to sort my mail)
Strange. Looking at the source
I should note, though, that we do this using the Lucene index, using
the TermDocs, etc.
On Oct 3, 2006, at 8:42 AM, Grant Ingersoll wrote:
We often calculate co-occurrence information as an offline task and
store it and then it is just a simple lookup at run time. You
just have to put
My crawler indexes crawled pages with this code:
Document doc = new Document();
doc.add(new Field("body", page.getHtmlData(), Store.YES, Index.UN_TOKENIZED));
doc.add(new Field("url", page.getUrl(), Store.YES, Index.UN_TOKENIZED));
doc.add(new Field("title", page.getTitle(), Store.YES,
Hi,
Is there a way to add / replace the text for the boolean operators used
by the query parser?
We would like to replace (or even better, add), AND, OR and NOT by
ET, OU and SAUF.
Is there a way to configure the QueryParser to do it?
We know we could always modify QueryParser.jj to add them
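One possible workaround that avoids touching QueryParser.jj is to translate the localized operators before the query string reaches the parser. The following is only a minimal sketch in plain Java: OperatorTranslator and translate are hypothetical names, not part of Lucene, and the sketch assumes that operators are upper-case whole words separated by whitespace and that text inside double-quoted phrases must be left alone.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical pre-processing helper, not part of Lucene.
final class OperatorTranslator {

    // Localized operator -> operator understood by the stock query syntax.
    private static final Map<String, String> OPERATORS = new HashMap<>();
    static {
        OPERATORS.put("ET", "AND");
        OPERATORS.put("OU", "OR");
        OPERATORS.put("SAUF", "NOT");
    }

    /** Rewrites ET/OU/SAUF to AND/OR/NOT outside of double-quoted phrases. */
    static String translate(String query) {
        StringBuilder out = new StringBuilder();
        StringBuilder word = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i <= query.length(); i++) {
            boolean atEnd = (i == query.length());
            char c = atEnd ? ' ' : query.charAt(i);
            if (!atEnd && c == '"') {
                inQuotes = !inQuotes;   // a quoted phrase is kept as one word
                word.append(c);
            } else if (atEnd || (Character.isWhitespace(c) && !inQuotes)) {
                String w = word.toString();
                out.append(OPERATORS.getOrDefault(w, w)); // translate whole words only
                word.setLength(0);
                if (!atEnd) {
                    out.append(c);      // preserve the original whitespace
                }
            } else {
                word.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("chat ET chien SAUF \"ET OU\""));
    }
}
```

The translated string could then be handed to the QueryParser as usual. Because the original AND, OR and NOT are left untouched, this adds the French operators rather than replacing the English ones.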
Hi folks,
Does anybody have the description of Lucene query syntax in German?
Thanks!
Bye.
/lexi
Hi,
I've been wondering if anyone has tried to compare the performance of
any 'native' Java DB as an index storage mechanism vs. Lucene's custom
implementation? I'm assuming that DB products should provide some
functionality for 'free' right out of the box (correct me if I'm wrong):
- easily manageable
Hi,
I have a question about Lucene scoring. In the following example, how can
I ensure that doc1 gets a higher score than doc2 if I search for A*? In
other words, I want to boost the docs that match on their leading terms.
doc1: Aterm Bterm Cterm
doc2: Bterm Aterm Cterm
Sure, anything's possible. Whether Lucene is your best bet may be another
question <G>. But in this example, you're not using Lucene to do anything
except store the strings. By storing all the data as UN_TOKENIZED, all
you're doing is a regex match on the entire HTML text of each document. You
: Well, as always, it depends <G>... My first thought is that I'd index things
: in separate fields as it gives you more options. For instance, let's say
: that you have name and phone fields and decide that the name field is more
: important than the phone number. Your options for boosting
We get this when trying to optimize index:
Exception in thread "main" java.io.IOException: term out of order
at org.apache.lucene.index.TermInfosWriter.add(TermInfosWriter.java:95)
at
org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:305)
at
If I understand the question, you do not want to boost in advance a certain
doc, but rather score higher those documents containing the search term
closer to the start of the document.
There is more to define here - for instance, if doc1 has 5 words but doc2
has 1,000,000 words, would you still
: does not pour affinity information into the score - i.e. both doc1 and doc2
: in your example would get the same score, and the SpanFirstQuery would only
: allow you to limit the set of returned documents - Hoss, do you agree with
: this?
Oh ... hmmm ... i think you're right. SpanScorer
I don't think it can be recovered. It's better to validate the index
file beforehand, or make sure one thread is updating the index files
and close the index properly.
Chris Lu
--
http://www.dbsight.net
Instant Lucene Search on Any Database/Application
On
Currently AND/OR/NOT are hardcoded into the .jj file. A patch to
make this configurable would be welcome!
Erik
On Oct 3, 2006, at 11:15 AM, Patrick Turcotte wrote:
Hi,
Is there a way to add / replace the text for the boolean operators
used
by the query parser?
We would like
Oh wouldn't we all. I want this too. Unfortunately, it's an elusive
beast at best. As I am sure you know, JavaCC generates code based on the
grammar and so it is very hard to alter the grammar after JavaCC'ing it.
If you relax the 'add' part then you might be able to do something with
Hi,
Is there a way to query all numbers that are close to a particular number
(the query), and score them by how close they are to that number?
To illustrate further, assume documents with a single field num, whose
value can only be an integer. Now, let's say there are 3
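To make the intended ranking concrete, here is a small sketch that is independent of Lucene: a closeness score that could, for instance, be applied to candidate values after they have been retrieved. ProximityRanker, score and rank are hypothetical names, and the 1 / (1 + distance) formula is just one arbitrary choice of a function that decreases with distance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class ProximityRanker {

    /** Score in (0, 1]: 1.0 for an exact match, smaller as the distance grows. */
    static double score(int value, int target) {
        // Widen to long so the subtraction cannot overflow.
        return 1.0 / (1.0 + Math.abs((long) value - target));
    }

    /** Returns a copy of `values` ordered from closest to farthest from `target`. */
    static List<Integer> rank(List<Integer> values, int target) {
        List<Integer> ranked = new ArrayList<>(values);
        ranked.sort(Comparator.comparingDouble((Integer v) -> -score(v, target)));
        return ranked;
    }

    public static void main(String[] args) {
        System.out.println(rank(Arrays.asList(1, 10, 6), 5));
    }
}
```

Sorting in memory like this only makes sense for a modest number of candidates; for a large index the score would have to be computed inside the query itself.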
Hi,
Well, the first question is always: are you opening/closing your
IndexSearchers for each request on your remote machines? This is always a
no-no. This is also a question for your single-searcher version.
Yes, I know; each search slave (RMI server) has a single instance
of IndexSearcher
: From my searches, there seems to be a FunctionQuery in Solr that can do this
: type of query. But I am using pure Lucene, and trying to port Solr code over
: (to create my own version of FunctionQuery) looks too complicated because of
: code dependency on other Solr code such as ValueSource,