Re: get terms by positions

2006-10-03 Thread Catalin Mititelu
Hi, I have the same problem. This is useful when you try to extract the contexts (terms before and after) of a certain term (for example). I found a solution but it performs badly: when you try to retrieve those contexts you have to re-tokenize the documents containing the given term (i.e.

I need your opinion about working with big index and frecuently updates

2006-10-03 Thread Enrique Lamas
Hi, I'm working with a 100Mb index. Because of application requirements, the indexed information is frequently updated, with plenty of modifications, deletions and additions. I think Lucene is a very powerful search tool once the index has been created, but I'm not sure if update index

Re: DateTools again

2006-10-03 Thread Volodymyr Bychkoviak
OK, I'll try to explain a bit. The user has an input (a JavaScript calendar) on the page where he can choose a date to include in the search. Search resolution is day resolution. If the user enters the same date at a different time of day, he will get different results (because calendar will also set current
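
The day-resolution problem described above can be illustrated with a minimal sketch. It uses only the Java standard library, not Lucene's DateTools, and the class and method names (DayResolution, toDayKey) are hypothetical. The assumption is that both the indexed value and the query value are reduced to the same yyyyMMdd key in UTC, so the time of day set by the calendar widget no longer matters.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Lucene: reduces a timestamp to a day-level key.
class DayResolution {
    // yyyyMMdd in UTC; the time of day is discarded by the pattern.
    private static final DateTimeFormatter DAY_KEY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    public static String toDayKey(Instant instant) {
        return DAY_KEY.format(instant);
    }

    public static void main(String[] args) {
        Instant morning = Instant.parse("2006-10-03T08:15:00Z");
        Instant evening = Instant.parse("2006-10-03T21:40:00Z");
        // Both timestamps fall on the same UTC day, so both lines print the same key.
        System.out.println(toDayKey(morning));
        System.out.println(toDayKey(evening));
    }
}
```

If the same key function is applied at index time and at query time, two searches for the same day should match the same documents regardless of when the user opened the calendar.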

Indexing In Lucene

2006-10-03 Thread Ajani, Akil (Cognizant)
Hi, can you tell me how indexing takes place in Lucene (in depth)? If a document has 1n indices, which algorithm does it use, and which information retrieval model does it use... Thanks & Regards, Akil Ajani Cognizant Technology Solutions India Pvt. Ltd. Plot # 26, Rajiv Gandhi Infotech Park,

RE: Indexing In Lucene

2006-10-03 Thread W.H. van Atteveldt
I don't know what you're doing, but the To: header is empty in your email, which is really annoying (since I rely on the To: header to sort my mail). -----Original Message----- From: Ajani, Akil (Cognizant) [mailto:[EMAIL PROTECTED] Sent: Tuesday, 3 October 2006 10:47 Subject: Indexing In Lucene

AW: get terms by positions

2006-10-03 Thread Renzo Scheffer
I am trying to get a list of all left or right neighbours of a search term. I will then count them to find out how often a specific word occurs as a neighbour of the search term. I know that the results vary according to the Analyzer/Filter used. It's just an experiment and

Re: DateTools again

2006-10-03 Thread John Haxby
Volodymyr Bychkoviak wrote: The user has an input (a JavaScript calendar) on the page where he can choose a date to include in the search. Search resolution is day resolution. If the user enters the same date at a different time of day, he will get different results (because calendar will also set current

Re: DateTools again

2006-10-03 Thread Volodymyr Bychkoviak
Thanks for the detailed explanation. John Haxby wrote: Volodymyr Bychkoviak wrote: The user has an input (a JavaScript calendar) on the page where he can choose a date to include in the search. Search resolution is day resolution. If the user enters the same date at a different time of day, he will get

Re: [Lucene 2.0]How to recover index?

2006-10-03 Thread zhu jiang
Can anyone help me? 2006/10/3, zhu jiang [EMAIL PROTECTED]: Hi all, In some situations, index files may throw a "read past EOF" exception so that the index cannot be used any more. I wonder how to recover the index files in such a situation? -- Thanks, Jiang -- Thanks, Jiang

Indexing In Lucene

2006-10-03 Thread Ajani, Akil (Cognizant)
Hi, can anyone tell me how indexing takes place in Lucene (in depth)? I will be thankful if anyone can help me. Thanks & Regards, Akil Ajani Cognizant Technology Solutions India Pvt. Ltd. Plot # 26, Rajiv Gandhi Infotech Park, MIDC Hinjewadi, Pune 411057 Tel: (91) (20) 40201100

MultiFieldQueryParser vs concatenated field

2006-10-03 Thread Volodymyr Bychkoviak
In my application I need to implement search across several fields. Which is the better approach in terms of relevance scoring: index in separate fields and search using MultiFieldQueryParser, or index everything as one concatenated field and search using that field? Thanks in advance. -- regards,

Re: Searching documents on big index by using ParallelMultiSearcher is slow...

2006-10-03 Thread Erick Erickson
Well, the first question is always "are you opening/closing your IndexSearchers for each request on your remote machines?" This is always a no-no. This is also a question for your single-searcher version. What is your performance if you only go to one server? I'd start by finding out what happens

Re: I need your opinion about working with big index and frecuently updates

2006-10-03 Thread Erick Erickson
Think about IndexModifier to change your index, although the documentation does state that it's better to batch your deletions together and batch your additions together if possible. 100Mb is not, in my experience, a very big index, so I really don't anticipate many problems. Do note that you

Re: MultiFieldQueryParser vs concatenated field

2006-10-03 Thread Erick Erickson
Well, as always, it depends <G>... My first thought is that I'd index things in separate fields as it gives you more options. For instance, let's say that you have name and phone fields and decide that the name field is more important than the phone number. Your options for boosting anything in the

Re: I need your opinion about working with big index and frecuently updates

2006-10-03 Thread Enrique Lamas
Thank you very much Erick, I'll think about it and will do some tests. Bye - Original Message - From: Erick Erickson [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Tuesday, October 03, 2006 1:42 PM Subject: Re: I need your opinion about working with big index and frecuently

Re: Indexing In Lucene

2006-10-03 Thread Nicolas Lalevée
On Tuesday 03 October 2006 12:06, W.H. van Atteveldt wrote: I don't know what you're doing but the to: header is empty in your email which is really annoying (since I rely on the to: to sort my mail) Strange. Looking at the source of Ajani's mail, there is: To: java-user@lucene.apache.org And

Re: AW: get terms by positions

2006-10-03 Thread Grant Ingersoll
We often calculate co-occurrence information as an offline task and store it and then it is just a simple lookup at run time. You just have to put together the appropriate loops based on the window size that you want for any given term. Probably not efficient if your index is changing a
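
The windowed loops mentioned above might look roughly like the following sketch. It works on a plain list of already-tokenized terms rather than on a Lucene index, so it is an illustration of the counting step only; the class name NeighbourCounts and its count method are hypothetical, and how the tokens are obtained (analyzer, term positions, re-tokenizing stored text) is left open.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: counts the neighbours of a term within a fixed window.
class NeighbourCounts {
    /** Counts tokens occurring within `window` positions to the left or right of `term`. */
    public static Map<String, Integer> count(List<String> tokens, String term, int window) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(term)) {
                continue; // only positions holding the search term open a window
            }
            int from = Math.max(0, i - window);
            int to = Math.min(tokens.size() - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (j != i) {
                    counts.merge(tokens.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens =
                Arrays.asList("lucene", "index", "fast", "lucene", "index", "small");
        System.out.println(count(tokens, "index", 1));
    }
}
```

Running this once per document offline and summing the maps would give the stored co-occurrence table that is then looked up at run time.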

Re: Indexing In Lucene

2006-10-03 Thread Nicolas Lalevée
On Tuesday 03 October 2006 14:27, Nicolas Lalevée wrote: On Tuesday 03 October 2006 12:06, W.H. van Atteveldt wrote: I don't know what you're doing but the to: header is empty in your email which is really annoying (since I rely on the to: to sort my mail) Strange. Looking at the source

Re: AW: get terms by positions

2006-10-03 Thread Grant Ingersoll
I should note, though, that we do this using the Lucene index, using the TermDocs, etc. On Oct 3, 2006, at 8:42 AM, Grant Ingersoll wrote: We often calculate co-occurrence information as an offline task and store it and then it is just a simple lookup at run time. You just have to put

Re: Search in HTML code

2006-10-03 Thread John Bugger
My crawler indexes crawled pages with this code: Document doc = new Document(); doc.add(new Field("body", page.getHtmlData(), Store.YES, Index.UN_TOKENIZED)); doc.add(new Field("url", page.getUrl(), Store.YES, Index.UN_TOKENIZED)); doc.add(new Field("title", page.getTitle(), Store.YES,

QueryParser syntax French Operator

2006-10-03 Thread Patrick Turcotte
Hi, Is there a way to add / replace the text for the boolean operators used by the query parser? We would like to replace (or even better, add), AND, OR and NOT by ET, OU and SAUF. Is there a way to configure the QueryParser to do it? We know we could always modify QueryParser.jj to add them

Lucene query syntax description in German

2006-10-03 Thread Aleksei Valikov
Hi folks, Does anybody have the description of Lucene query syntax in German? Thanks! Bye. /lexi

native Java DB (eg, Derby) to store the index: performance comparision?..

2006-10-03 Thread Vladimir Olenin
Hi, I've been wondering if anyone has tried to compare the performance of any 'native' Java DB as an index storage mechanism vs Lucene's custom implementation? I'm assuming that DB products should provide some functionality for 'free' right out of the box (correct me if I'm wrong): - easily manageable

Lucene scoring question (how to boost leading terms match)

2006-10-03 Thread qaz zaq
Hi, I have a question about Lucene scoring. In the following example, how can I ensure that doc1 has a higher score than doc2 if I search for A*? In other words, I want to boost the docs that match on their leading terms. doc1: Aterm Bterm Cterm doc2: Bterm Aterm Cterm

Re: Search in HTML code

2006-10-03 Thread Erick Erickson
Sure, anything's possible. Whether Lucene is your best bet may be another question <G>. But in this example, you're not using Lucene to do anything except store the strings. By storing all the data as UN_TOKENIZED, all you're doing is a regex match on the entire HTML text of each document. You

Re: MultiFieldQueryParser vs concatenated field

2006-10-03 Thread Chris Hostetter
: Well, as always, it depends <G>... My first thought is that I'd index things : in separate fields as it gives you more options. For instance, let's say : that you have name and phone fields and decide that the name field is more : important than the phone number. Your options for boosting

java.io.IOException: term out of order -- HELP

2006-10-03 Thread Michael J. Prichard
We get this when trying to optimize the index: Exception in thread "main" java.io.IOException: term out of order at org.apache.lucene.index.TermInfosWriter.add(TermInfosWriter.java:95) at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:305) at

Re: Lucene scoring question (how to boost leading terms match)

2006-10-03 Thread Doron Cohen
If I understand the question, you do not want to boost in advance a certain doc, but rather score higher those documents containing the search term closer to the start of the document. There is more to define here - for instance, if doc1 has 5 words but doc2 has 1,000,000 words, would you still

Re: Lucene scoring question (how to boost leading terms match)

2006-10-03 Thread Chris Hostetter
: does not pour affinity information into the score - i.e. both doc1 and doc2 : in your example would get the same score, and the SpanFirstQuery would only : allow you to limit the set of returned documents - Hoss, do you agree with : this? Oh ... hmmm ... i think you're right. SpanScorer

Re: [Lucene 2.0]How to recover index?

2006-10-03 Thread Chris Lu
I don't think it can be recovered. It's better to validate the index file beforehand, or make sure one thread is updating the index files and close the index properly. Chris Lu -- http://www.dbsight.net Instant Lucene Search on Any Database/Application On

Re: QueryParser syntax French Operator

2006-10-03 Thread Erik Hatcher
Currently AND/OR/NOT are hardcoded into the .jj file. A patch to make this configurable would be welcome! Erik On Oct 3, 2006, at 11:15 AM, Patrick Turcotte wrote: Hi, Is there a way to add / replace the text for the boolean operators used by the query parser? We would like

Re: QueryParser syntax French Operator

2006-10-03 Thread Mark Miller
Oh wouldn't we all. I want this too. Unfortunately, it's an elusive beast at best. As I am sure you know, JavaCC generates code based on the grammar and so it is very hard to alter the grammar after JavaCC'ing it. If you relax the 'add' part then you might be able to do something with

Number Proximity Query

2006-10-03 Thread KEGan
Hi, is there a way to query all numbers that are close to a particular number (the query), and score them by how close they are to that number? To illustrate further, assume documents with a single field num, whose value can only be an integer. Now, let's say there are 3
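
To make the question concrete, here is a small stand-alone sketch of one possible closeness score. It is not Lucene code and not Solr's FunctionQuery; the class NumberProximity, its methods, and the formula 1 / (1 + |value - target|) are all assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration: rank integer field values by closeness to a target.
class NumberProximity {
    /** Assumed scoring formula: 1.0 for an exact match, falling towards 0 with distance. */
    public static double score(int value, int target) {
        // Widen to long before subtracting so extreme int values cannot overflow.
        return 1.0 / (1.0 + Math.abs((long) value - target));
    }

    /** Returns a copy of `values` ordered from closest to farthest from `target`. */
    public static List<Integer> rank(List<Integer> values, int target) {
        List<Integer> ranked = new ArrayList<>(values);
        ranked.sort(Comparator.comparingDouble((Integer v) -> score(v, target)).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        System.out.println(rank(Arrays.asList(10, 47, 52, 90), 50));
    }
}
```

Any monotonically decreasing function of the distance would serve the same purpose; which one suits a given application depends on how quickly the score should drop off.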

Re: Searching documents on big index by using ParallelMultiSearcher is slow...

2006-10-03 Thread Scott
Hi, Well, the first question is always "are you opening/closing your IndexSearchers for each request on your remote machines?" This is always a no-no. This is also a question for your single-searcher version. Yes, I know; each search slave (RMI server) has a single instance of IndexSearcher

Re: Number Proximity Query

2006-10-03 Thread Chris Hostetter
: From my searches, there seems to be a FunctionQuery in Solr that can do this : type of query. But I am using pure Lucene, and trying to port Solr code over : (to create my own version of FunctionQuery) looks too complicated because of : code dependency on other Solr code such as ValueSource,