Reading stop word from a file!

2006-03-15 Thread Supheakmungkol SARIN
Dear Luceners, I wonder if there is any pre-defined option to read stop words from a file? Any comment is highly appreciated. Thanks in advance & Best regards, Mungkol

fuzzy phrase query?

2006-03-15 Thread karl wettin
Is it possible to make a phrase query fuzzy? It could be a quick and not so dirty replacement for hidden Markov models and thus produce great results for spell checking and other natural language classifications.

Re: FunctionQuery example request

2006-03-15 Thread Brian Riddle
Hi Paul,
> I have implemented the DistanceComparatorSource example from Lucene In Action (my Bible) and it works great. We are now in the situation where we have nearly a million documents in our index and the performance of this implementation has degraded.
I have had the same problem w

Re: Multiple languages - possible approach

2006-03-15 Thread Otis Gospodnetic
Hi Paul, I don't have any first-hand experience with this, but your suggestion about pluggable analyzers sounds both reasonable and interesting to me. One thing you did not mention as a mechanism for figuring out which analyzer to use is language identification (like the one you can find among

Multiple languages - possible approach

2006-03-15 Thread Paul Cowan
Hi everyone, We are currently using Lucene to index correspondence between various people, who may or may not use the same language in their discussions to each other. Think an email system where participants might use the language that seems most appropriate to the thought at the time, just a

Re: closing searcher

2006-03-15 Thread Otis Gospodnetic
You are correct. Reuse IndexSearcher. Otis - Original Message From: Amol Bhutada <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Wednesday, March 15, 2006 7:14:49 PM Subject: closing searcher Hi, I am using Lucene in a J2EE-based web application. I have created one instance

closing searcher

2006-03-15 Thread Amol Bhutada
Hi, I am using Lucene in a J2EE-based web application. I have created one instance each of the reader and searcher objects and am trying to use them for all searches from different users without recreating or refreshing them. Is this fine? I am asking this because I am reading http://www.

RE: Vector Space Model <-> Probabilistic Model

2006-03-15 Thread Runde, Kevin
Hello, I recently came across this email in the Lucene user list and am interested in this article. I tried to access it from the link you provided, but couldn't find any link to access it. Do you still have an electronic copy? Thanks, Kevin Runde -Original Message- From: Malcolm [mailt

Re: MultiSearch

2006-03-15 Thread Otis Gospodnetic
The Javadoc should have all the info. If not - Lucene in Action - http://www.lucenebook.com/search?query=multisearcher If not - Lucene in Action's free code that includes code with MultiSearcher, as you can see from snippets at the above URL. Otis - Original Message From: Brian <[EMAIL

Re: FunctionQuery example request

2006-03-15 Thread Chris Hostetter
: I have implemented the DistanceComparatorSource example from Lucene In Action (my Bible) and it works great. We are now in the situation where we have nearly a million documents in our index and the performance of this implementation has degraded.
: Can someone please spare a couple of

Re: PhraseQuery and edit distance slightly confusing.

2006-03-15 Thread Dawid Weiss
Hi Doug, Yes, it should probably be called "edit-distance-like" or something. It should definitely say so in the JavaDoc because I've seen this propagate to people's articles (it was Erik Hatcher's I think, but I'm not sure). But what then would the criteria for matching at all be? Right

Re: Best design for an use case which is going to stress Lucene

2006-03-15 Thread Michael D. Curtin
This doesn't sound like a Lucene problem, at least the way you've described it. For example, Lucene can't search on any field that isn't indexed (and most of yours aren't indexed). Given that, it seems like your option (c) is the way to go. Seems like a simple RDBMS schema with 3 tables woul

FunctionQuery example request

2006-03-15 Thread Paul Lynch
Hi, I have implemented the DistanceComparatorSource example from Lucene In Action (my Bible) and it works great. We are now in the situation where we have nearly a million documents in our index and the performance of this implementation has degraded. I have downloaded and am trying to understand

Re: PhraseQuery and edit distance slightly confusing.

2006-03-15 Thread Doug Cutting
Dawid Weiss wrote:
> I get the concept implemented in PhraseQuery but isn't calling it an edit distance a little bit far-fetched?
Yes, it should probably be called "edit-distance-like" or something.
> Only the marginal elements (minimum and maximum distance from their respective query positions)

Re: Searching in paths

2006-03-15 Thread Chris Hostetter
: What about this solution:
: Split the path-like string into smaller tokens and index them as separate words, e.g.:
: #Top/World/Poland/# #Top/World/# #Top/#
I would be careful about your use of the word "token" in that sentence, but yes, indexing each of the directory-like paths as keywords and doing

MultiSearch

2006-03-15 Thread Brian
Hello Everyone, I currently have an IndexSearch working great! What I want to do now is move to a multi-index search. What's the best way to go about it? Is it a simple process? Any thoughts would be appreciated. Thanks, B

Re: Best design for an use case which is going to stress Lucene

2006-03-15 Thread Fabio Insaccanebbia
> No queries on other fields (news metadata etc) will be performed.
Do you mean that a full text search on the news text isn't required? I might be wrong, but to me this doesn't sound like typical Lucene usage. I'd go for option (c) (but not just one table :-) Bye, Fabio P.S.: how

Re: Add a module to the lucene

2006-03-15 Thread Chris Hostetter
Jason: you really don't need to send the same message 4 times in one night. You've got to give people time to sleep, and eat, and take care of other things that don't involve a computer :) : Can we add a module to lucene so that we are able to use our own similarity : measure to calculate the si

Best design for an use case which is going to stress Lucene

2006-03-15 Thread Terenzio Treccani
Hi all, I'm required to develop an application for searching over news items. There will be thousands of news items, each one will be assigned directly to a list of millions of customerIDs. The query will be done by passing a customerID and will return all news items associated to it. Furthermore,

Re: inclusive range search

2006-03-15 Thread mark harwood
> What I did is this:
>
> TermsFilter filter = new TermsFilter();
> filter.addTerm(new Term("date", "20060304 TO 20060304"));
The Term object's constructor in your example does not parse the "20060304 TO 20060304" string. A term is supposed to represent a single term exactly as it appears in y

PhraseQuery and edit distance slightly confusing.

2006-03-15 Thread Dawid Weiss
Hi there, I get the concept implemented in PhraseQuery but isn't calling it an edit distance a little bit far-fetched? Only the marginal elements (minimum and maximum distance from their respective query positions) are taken into account. Consider this example: phrase: a b c d term p

Re: inclusive range search

2006-03-15 Thread Samuru Jackson
> Try making both terms mandatory with "+"
>
> "+date:[20040101 TO 20040101] +Paris"
That was it. However, it does not exactly suit my needs. I want to create a few combo boxes to let the user create a date filter (From and To) on the search queries using a web form. Now if he chooses 20040101

Re: inclusive range search

2006-03-15 Thread Yonik Seeley
On 3/15/06, Samuru Jackson <[EMAIL PROTECTED]> wrote:
> search = "date:[20040101 TO 20040101] Paris"
> Somehow this range search does not work. I still get the same results as without the date:[..]
Try making both terms mandatory with "+"
"+date:[20040101 TO 20040101] +Paris"
http://lucene
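A minimal sketch of how the From and To values of a web form could be turned into a query string of the shape suggested in this thread, with every clause marked mandatory by "+". The class and method names are hypothetical, the field name "date" is taken from the example above, and the code only builds a string: it does not call Lucene and does not escape query syntax characters in the keywords.

```java
public class DateFilterQuerySketch {

    // Builds "+date:[from TO to] +word1 +word2 ..." from the form values.
    // Assumes from and to are already in yyyyMMdd form.
    public static String build(String from, String to, String keywords) {
        StringBuilder sb = new StringBuilder();
        sb.append("+date:[").append(from).append(" TO ").append(to).append("]");
        for (String word : keywords.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                // Each keyword becomes a mandatory clause.
                sb.append(" +").append(word);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("20040101", "20040101", "Paris"));
    }
}
```

The resulting string would then be handed to whatever query parser the application already uses.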

inclusive range search

2006-03-15 Thread Samuru Jackson
Hi! I am having trouble using the inclusive range search explained in Lucene in Action in ch. 2.5.5. What I do is add several fields to the index this way:
document.add(Field.Keyword("id", key));
document.add(Field.Keyword("type", type));
document.add(Field.Text("text",text));
document.add(Fi
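A small sketch of what an inclusive range over keyword-indexed dates amounts to, assuming the dates are stored as fixed-width yyyyMMdd strings as in the examples in this thread. Range matching compares terms as plain strings, so a range such as [20040101 TO 20040101] matches exactly the term 20040101. The class and method names are hypothetical and the code does not call Lucene.

```java
public class DateRangeSketch {

    // True when value lies in [lower, upper], compared as plain strings.
    // This only agrees with date order when all values have the same width.
    public static boolean inInclusiveRange(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(inInclusiveRange("20040101", "20040101", "20040101"));
        System.out.println(inInclusiveRange("20040102", "20040101", "20040101"));
    }
}
```

Zero-padding the month and day is what keeps string order and date order the same.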

RE: segments.new

2006-03-15 Thread Vanlerberghe, Luc
See http://issues.apache.org/jira/browse/LUCENE-481 It was for the trunk at the time, but it's not difficult to apply it to the 1.4.3 sources manually... -Original Message- From: WATHELET Thomas [mailto:[EMAIL PROTECTED] Sent: Wednesday, 15 March 2006 14:53 To: java-user@lucene.apache.org

RE: segments.new

2006-03-15 Thread WATHELET Thomas
Yes, I use Lucene 1.4.3. Could you send me the link for this patch? Thanks in advance -Original Message- From: Vanlerberghe, Luc [mailto:[EMAIL PROTECTED] Sent: Wednesday, 15 March 2006 13:38 To: java-user@lucene.apache.org Subject: RE: segments.new Are you using Lucene 1.4.3? There's a b

AW: Searching in paths

2006-03-15 Thread Mathias Lux
Hi! Another option for a term query would be an analyzer, which creates keywords from paths, building them from every neighbouring pair in the path. So you could query for paths anywhere in the hierarchy, and you don't have to start from the top level hierarchy like in the approach mentioned be
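A rough sketch of the pairing idea described above, assuming "/" as the separator: every neighbouring pair of path elements becomes one keyword. The class and method names are hypothetical, and wiring the result into an actual analyzer is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class PathPairsSketch {

    // Returns one keyword per neighbouring pair of path elements.
    public static List<String> neighbouringPairs(String path) {
        // Collect the non-empty elements so leading or doubled slashes are ignored.
        List<String> parts = new ArrayList<>();
        for (String part : path.split("/")) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < parts.size(); i++) {
            pairs.add(parts.get(i) + "/" + parts.get(i + 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(neighbouringPairs("Top/World/Poland/Abc"));
    }
}
```

With keywords like these, a query for World/Poland can match without knowing that the path starts at Top.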

"docs out of order" while merging

2006-03-15 Thread Paulo Silveira
I've just gotten a "docs out of order" error. I have a database that is indexed every time an update occurs. The index was OK for the last 3 weeks, and now, after the system threw an exception because of a write lock that was not released (and I deleted it), I am receiving this: Can anyone help? Full stack

Re: Searching in paths

2006-03-15 Thread kieran
Sorry, that should have read:
Query query1 = null;
if (!"".equals(cat)) { // compare string contents; cat != "" only compares references
    Term term = new Term("parentPath", cat);
    query1 = new TermQuery(term);
    Hits hits = is.search(query1);
}
("parentPath" substituted for "category").
kieran wrote: Alternatively, you could examine each path, and index each of its

Re: Searching in paths

2006-03-15 Thread kieran
Alternatively, you could examine each path, and index each of its "parent" paths (perhaps in a field named "parentPath"). i.e. Top/World/Poland/Abc would result in the following three values being indexed: Top Top/World Top/World/Poland You can then use a TermQuery instead of a PrefixQuery. F
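The expansion described above can be sketched in plain Java, assuming "/" as the separator. The class and method names are hypothetical; each returned value would be added to the "parentPath" field when the document is indexed.

```java
import java.util.ArrayList;
import java.util.List;

public class ParentPathsSketch {

    // Returns every proper prefix of the path that ends at a "/" boundary.
    public static List<String> parentPaths(String path) {
        List<String> parents = new ArrayList<>();
        int slash = path.indexOf('/');
        while (slash >= 0) {
            if (slash > 0) {
                // Skip the empty prefix produced by a leading slash.
                parents.add(path.substring(0, slash));
            }
            slash = path.indexOf('/', slash + 1);
        }
        return parents;
    }

    public static void main(String[] args) {
        // Prints the three parent paths of the example from this message.
        System.out.println(parentPaths("Top/World/Poland/Abc"));
    }
}
```

The trade-off is a larger index in exchange for an exact term lookup at search time.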

RE: segments.new

2006-03-15 Thread WATHELET Thomas
Ok thanks -Original Message- From: Vanlerberghe, Luc [mailto:[EMAIL PROTECTED] Sent: Wednesday, 15 March 2006 13:38 To: java-user@lucene.apache.org Subject: RE: segments.new Are you using Lucene 1.4.3? There's a bug report in JIRA (LUCENE-481) with a patch that solves this. On Windows, f

Re: Searching in paths

2006-03-15 Thread Java Programmer
Replying to myself, I hate this :( What about this solution: split the path-like string into smaller tokens and index them as separate words, e.g.: #Top/World/Poland/# #Top/World/# #Top/# So if I ask about the word #Top/# I will get all the results for this category, without making so many boolean queries. Is the

RE: segments.new

2006-03-15 Thread Vanlerberghe, Luc
Are you using Lucene 1.4.3? There's a bug report in JIRA (LUCENE-481) with a patch that solves this. On Windows, files cannot be deleted while they are open and before the patch, calling getCurrent or isCurrent in one process could block another one from updating the segments file. The patch in

Re: Searching in paths

2006-03-15 Thread Java Programmer
On 3/14/06, Mordo, Aviran (EXP N-NANNATEK) <[EMAIL PROTECTED]> wrote:
> You need to index the field as a keyword, or use an analyzer that will not strip the / from the string
>
> Aviran
> http://www.aviransplace.com
Field is indexed as Keyword, I was using StandardAnalyzer(), but currently I try

RE: Setting the COMMIT lock timeout.

2006-03-15 Thread Jim Bedford-roberts
Yes - this 1.4 bug is what induced us to upgrade to 1.9! So, finding the same problem in a different guise in 1.9 is quite an unfortunate coincidence! -Original Message- From: Daniel Naber [mailto:[EMAIL PROTECTED] Sent: 14 March 2006 19:38 To: java-user@lucene.apache.org Subject: Re: S

Re: segments.new

2006-03-15 Thread Patrick Kimber
Hi Thomas I have been getting similar errors and am trying to investigate the cause. My current thinking is that it is caused by my virus checker opening the files. The error only occurs on Windows. When I run the same test on Linux I do not get the error. Not much help I know... but at least yo

Add a module to the lucene

2006-03-15 Thread jason
Hi, Can we add a module to Lucene so that we are able to use our own similarity measure to calculate the similarity between documents and queries? As Lucene has defined its own measure, there is little we can do with it. Considering the documents and queries represented as vectors, we only need one clas

segments.new

2006-03-15 Thread WATHELET Thomas
Hi, I have trouble with the indexing process: sometimes I get an error saying that the file segments.new can't be renamed or deleted, or something like that. What is happening?

Re: lucene query analysis

2006-03-15 Thread Nadav Har'El
"Raghavendra Prabhu" <[EMAIL PROTECTED]> wrote on 15/03/2006 08:37:25 AM: > Hi > > The problem which i am facing is that the query is Case Sensitive > > If i type in BIG letters i am not able to see answers and if i type in > small letters i am able to see results > > Is there anything by which i