Re: "Starts with" query?

2006-01-05 Thread Paul Smith
one thing you may not have thought about yet that may affect your decision: sorting in lucene requires the field be indexed but untokenized. so if you want to support sorting on the conceptual "title", you'll still need a version of your title field that's untokenized, which can then be u

Re: "Starts with" query?

2006-01-05 Thread Chris Hostetter
: Thanks Chris, I had thought of that one, but unfortunately the title : could be quite long, and there are literally millions of documents. : Isn't each title going to be included as one "term" in the index : dictionary? If so, won't the index get ridiculously large and slow? It depends on your

Re: "Starts with" query?

2006-01-05 Thread Yonik Seeley
That's deprecated now of course... so you want MultiPhraseQuery. -Yonik On 1/5/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > Check out PhrasePrefixQuery. > > -Yonik > > > On 1/5/06, Paul Smith <[EMAIL PROTECTED]> wrote: > > first off response to my own post, I meant PhraseQuery instead. > > > > B

Re: "Starts with" query?

2006-01-05 Thread Yonik Seeley
Check out PhrasePrefixQuery. -Yonik On 1/5/06, Paul Smith <[EMAIL PROTECTED]> wrote: > first off response to my own post, I meant PhraseQuery instead. > > But, since we're only tokenizing this field, and not storing the > entire contents of the field, I'm not sure this is ever going to > work, i

Re: "Starts with" query?

2006-01-05 Thread Paul Smith
first off response to my own post, I meant PhraseQuery instead. But, since we're only tokenizing this field, and not storing the entire contents of the field, I'm not sure this is ever going to work, is it? I notice that if I have a title "auto update", then the phrase query trick works i

Re: "Starts with" query?

2006-01-05 Thread Paul Smith
2) index a magic token at the start of the title and include that in a phrase query: "_START_ the quick" Ok, I've gone and chosen "0start0" as my start token, because our analyzer is stripping _. Now, second dumb question of the day, given the search for starts with "The qui*", that has t

Re: "Starts with" query?

2006-01-05 Thread Paul Smith
1) also index the field untokenized and use a straight prefix query See my reply to Chris, not sure I can afford the index size increment. 2) index a magic token at the start of the title and include that in a phrase query: "_START_ the quick" Ah, that's clever. 3) use a SpanFirst quer

Re: "Starts with" query?

2006-01-05 Thread Paul Smith
On 06/01/2006, at 9:33 AM, Chris Hostetter wrote: : Think SQL of " where title like 'The quick%' ". I solved this problem by having a variation of my field that was not tokenized, and did PrefixQueries on that field (so in your case, leave your title field alone for generic matches, and

Re: "Starts with" query?

2006-01-05 Thread Yonik Seeley
Off the top of my head: 1) also index the field untokenized and use a straight prefix query 2) index a magic token at the start of the title and include that in a phrase query: "_START_ the quick" 3) use a SpanFirst query (but you have to make the Java Query object yourself) -Yonik On 1/5/06,
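Option 2 in the list above (the magic start token) can be modelled without Lucene at all. The sketch below is plain Java, not Lucene API: it treats a title as a list of lowercased tokens, lets a phrase match at any position with the last query token acting as a prefix, and shows that prepending a sentinel to both the indexed title and the query restricts the match to position 0. The class name, the method names, and the whitespace tokenizer are illustrative assumptions; in a real index the sentinel would be added by the analyzer and the match done with a phrase-style query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Plain-Java model of the "magic start token" idea; none of this is Lucene API.
final class StartTokenSketch {
    // Sentinel that is assumed never to appear as a real (lowercased) token.
    static final String START = "_START_";

    // Hypothetical tokenizer: lowercase and split on whitespace.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (!trimmed.isEmpty()) {
            out.addAll(Arrays.asList(trimmed.split("\\s+")));
        }
        return out;
    }

    // Same tokens with the sentinel prepended at position 0.
    static List<String> anchored(String text) {
        List<String> out = new ArrayList<>();
        out.add(START);
        out.addAll(tokens(text));
        return out;
    }

    // Phrase match at any offset; the last query token is treated as a prefix.
    static boolean phraseMatches(List<String> doc, List<String> query) {
        if (query.isEmpty()) {
            return true;
        }
        for (int start = 0; start + query.size() <= doc.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < query.size() && ok; i++) {
                String d = doc.get(start + i);
                String q = query.get(i);
                ok = (i == query.size() - 1) ? d.startsWith(q) : d.equals(q);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    // "Starts with": the sentinel only exists at position 0, so the phrase can only match there.
    static boolean startsWith(String title, String query) {
        return phraseMatches(anchored(title), anchored(query));
    }

    public static void main(String[] args) {
        System.out.println(startsWith("The quick brown fox", "The qui"));
        System.out.println(startsWith("Not the quick brown fox", "The qui"));
    }
}
```

Without the sentinel, `phraseMatches(tokens("Not the quick brown fox"), tokens("The qui"))` still succeeds, which is the ordinary phrase behaviour the thread is trying to avoid.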

Re: "Starts with" query?

2006-01-05 Thread Chris Hostetter
: Think SQL of " where title like 'The quick%' ". I solved this problem by having a variation of my field that was not tokenized, and did PrefixQueries on that field (so in your case, leave your title field alone for generic matches, and have a titleUntokenized field for PrefixMatches). Ano

"Starts with" query?

2006-01-05 Thread Paul Smith
I'm throwing myself at the mercy of the lucene community, I'm a bit brain dead today after looking after a screaming 3 month old baby for 4 hours last night... We have a 'title' field indexed as Field.Text(...), which works nicely, and has lots of good searching. However, this application

Handling fractional field range queries

2006-01-05 Thread Urvashi Gadi
Hi All, Any pointers on how to handle range queries if the data type is double or float? Best, Urvashi - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: Opening (and building) the lucene source in eclipse

2006-01-05 Thread Erik Hatcher
On Jan 5, 2006, at 2:03 PM, Colin Young wrote: Just curious, but when one is desiring to make use of stuff in the contrib is there any particular reason to compile it into the Lucene jar, or include it in the project that's making use of it, or is it really just up to the preferences of the

Re: multi-field query parser with AND operator?

2006-01-05 Thread Daniel Naber
On Donnerstag 05 Januar 2006 03:31, Bill Janssen wrote: > I've got some code developed for Lucene 1.4.1, that works around the > problem of having both (1) multiple default fields, and (2) the AND > operator for query elements.  In 1.4.1, MultiFieldQueryParser > effectively only allowed the OR o

Re: WordNet alternatives

2006-01-05 Thread Daniel Naber
On Donnerstag 05 Januar 2006 16:09, Yilmazel, Sibel wrote: > Are there any WordNet dictionary alternatives that anyone had a chance > to look into for Lucene? Maybe you could specify your question: what exactly are you trying to do with WordNet and why are you looking for alternatives? Regards

RE: Opening (and building) the lucene source in eclipse

2006-01-05 Thread Colin Young
I did manage to get things built with ant this morning (I had some time and space on the train) including the DBDJE support, although I wasn't aware any of it had been checked into contrib. I'll have to grab that tonight.   I had tried building a normal Eclipse project, but was having some proble

Re: span / position increment issue

2006-01-05 Thread Paul Elschot
On Thursday 05 January 2006 19:33, Marc Hadfield wrote: > > Thanks Erik, Hoss - > > I will try MultiPhraseQuery and report back. > > Back in an email thread with Doug he mentioned SpanQuery would work, and > in a fashion it does, but I can't differentiate between terms at the > same position a

Re: searching and indexing simultaneously...

2006-01-05 Thread Otis Gospodnetic
As far as I know, the best information about various Lucene concurrency rules, locks and locking is in the Lucene book (e.g. http://www.lucenebook.com/search?query=concurrency+rules ). The short story is: use 1 IndexSearcher. Get a new one when you detect index change. Use 1 IndexWriter. If
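The "one searcher, replace it when the index changes" rule above can be sketched as a small holder class. This is plain Java under stated assumptions, not Lucene API: `currentVersion` stands in for whatever mechanism reports that the index has changed, and `opener` stands in for whatever opens a fresh searcher; both names, and the class itself, are hypothetical. Closing the old searcher once in-flight searches finish is deliberately left out.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Keeps one shared searcher and swaps it only when the reported index version changes.
final class SharedSearcherHolder<S> {
    private final LongSupplier currentVersion; // hypothetical: reports the index version
    private final Supplier<S> opener;          // hypothetical: opens a new searcher
    private S searcher;
    private long openedAtVersion;

    SharedSearcherHolder(LongSupplier currentVersion, Supplier<S> opener) {
        this.currentVersion = currentVersion;
        this.opener = opener;
        // Read the version before opening, so a concurrent change causes at worst one extra reopen.
        this.openedAtVersion = currentVersion.getAsLong();
        this.searcher = opener.get();
    }

    // Returns the shared searcher, reopening it first if the index version has moved.
    synchronized S get() {
        long version = currentVersion.getAsLong();
        if (version != openedAtVersion) {
            searcher = opener.get(); // the old searcher is not closed in this sketch
            openedAtVersion = version;
        }
        return searcher;
    }

    public static void main(String[] args) {
        AtomicLong version = new AtomicLong(1);
        SharedSearcherHolder<String> holder =
            new SharedSearcherHolder<>(version::get, () -> "searcher@v" + version.get());
        System.out.println(holder.get());
        version.incrementAndGet(); // simulate the single writer committing a change
        System.out.println(holder.get());
    }
}
```

All search threads would call `get()` on the same holder, so the number of open searchers stays at one between index changes.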

Re: span / position increment issue

2006-01-05 Thread Marc Hadfield
Thanks Erik, Hoss - I will try MultiPhraseQuery and report back. Back in an email thread with Doug he mentioned SpanQuery would work, and in a fashion it does, but I can't differentiate between terms at the same position and contiguous positions. The problem gets worse if I want to test fo

Re: WordNet alternatives

2006-01-05 Thread Otis Gospodnetic
A better place to ask this may be some kind of a computational linguistics mailing list. I'm not aware of WordNet alternatives for the English language, but I know there are WordNet cousins for other languages (e.g. BalkaNet for several languages spoken on the Balkan peninsula). Otis - Ori

RE: searching and indexing simultaneously...

2006-01-05 Thread Ramana Jelda
Nice contribution by Luc. Thanks, Jelda -Original Message- From: Vanlerberghe, Luc [mailto:[EMAIL PROTECTED] Sent: Thursday, January 05, 2006 5:11 PM To: java-user@lucene.apache.org Subject: RE: searching and indexing simultaneously... One reader/searcher per server. My configuration

RE: searching and indexing simultaneously...

2006-01-05 Thread Vanlerberghe, Luc
One reader/searcher per server. My configuration uses - one Lucene index in a shared location, - one server that uses either a single IndexReader or a single IndexWriter to delete or add documents - several servers that read/search the index. The 'search' servers each have a single IndexReader op

Re: Opening (and building) the lucene source in eclipse

2006-01-05 Thread Yonik Seeley
On 1/5/06, Erik Hatcher <[EMAIL PROTECTED]> wrote: > my recommendation is to simply set up > a normal Eclipse project. I use IntelliJ, and that's how I do it (a normal IntelliJ project). -Yonik

RE: searching and indexing simultaneously...

2006-01-05 Thread John Powers
But it's best to only have one reader/searcher, correct? -Original Message- From: Ramana Jelda [mailto:[EMAIL PROTECTED] Sent: Thursday, January 05, 2006 9:08 AM To: java-user@lucene.apache.org Subject: RE: searching and indexing simultaneously... Hi, You are right. There can be multipl

WordNet alternatives

2006-01-05 Thread Yilmazel, Sibel
Hello, Are there any WordNet dictionary alternatives that anyone had a chance to look into for Lucene? Thanks, Sibel

RE: searching and indexing simultaneously...

2006-01-05 Thread Ramana Jelda
Hi, You are right. There can be multiple indexreaders but only one indexwriter is advised. No, we can not use two indexwriters simultaneously. Jelda -Original Message- From: K.A.Hussain Ali [mailto:[EMAIL PROTECTED] Sent: Thursday, April 06, 2006 5:01 PM To: java-user@lucene.apache.o

searching and indexing simultaneously...

2006-01-05 Thread K.A.Hussain Ali
Hi all. I am a newbie to Lucene. Does Lucene provide any way to do indexing, searching and deleting simultaneously? I hope we could do searching and indexing, which means there can be multiple IndexReaders and only one IndexWriter accessing the index. Could we have two IndexWriters working simult

Re: Opening (and building) the lucene source in eclipse

2006-01-05 Thread Erik Hatcher
Colin, I've not used Eclipse in a long while, and I've not used the new Ant project support it has. It sounds like Eclipse has issues with Ant imports as you've surmised, and my recommendation is to simply set up a normal Eclipse project. The source tree to Lucene is extremely basic...

Re: span / position increment issue

2006-01-05 Thread Erik Hatcher
Marc, SpanNearQuery isn't capable of performing the proximity to within only a single position in the manner you've described. A slop of 0 means the terms must be contiguous with no gaps, which also allows for matches in the same position as in your first example. I think MultiPhraseQuer

Re: Lucene and Regex - ?

2006-01-05 Thread Erik Hatcher
Dmitry, RegexQuery is similar in behavior to Lucene's built-in WildcardQuery, except rather than accepting only ? and * as wildcard characters it leverages the full expression capability of whatever underlying regular expression engine is selected. SpanRegexQuery is a "span" version of Re
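The difference described above, wildcards limited to ? and * versus a full regular expression, can be illustrated over a plain list of terms. The sketch below uses only `java.util.regex` and is not the RegexQuery or WildcardQuery implementation; the class name, method names, and sample terms are assumptions for illustration. It translates a wildcard pattern into an equivalent regex, then filters the same terms with both that and a hand-written regex that a wildcard cannot express.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java comparison of wildcard-style and regex-style term matching; not Lucene API.
final class TermPatternSketch {
    // Translate a wildcard pattern (? = exactly one char, * = any run of chars) into a regex.
    static Pattern wildcardToRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                sb.append('.');
            } else if (c == '*') {
                sb.append(".*");
            } else {
                sb.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    // Keep the terms that the pattern matches in full, preserving input order.
    static List<String> matchingTerms(List<String> terms, Pattern pattern) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            if (pattern.matcher(term).matches()) {
                out.add(term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("quick", "quack", "quiet", "queue");
        // Wildcard: any single character in the third position.
        System.out.println(matchingTerms(terms, wildcardToRegex("qu?ck")));
        // Regex: "qu" followed by exactly two vowels, which ? and * alone cannot express.
        System.out.println(matchingTerms(terms, Pattern.compile("qu[aeiou]{2}.*")));
    }
}
```

Either way each candidate term is tested against the whole pattern; the regex form simply accepts a richer pattern language.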

Re: http://www.textmining.org/ is "hacked"

2006-01-05 Thread Patrick Kimber
Gui How about using the OpenOffice API? http://weblogs.java.net/blog/tchangu/archive/2005/12/open_office_jav_1.html http://api.openoffice.org/DevelopersGuide/DevelopersGuide.html Patrick On 24/11/05, Guilherme Barile <[EMAIL PROTECTED]> wrote: > I have some issues with textmining extracting tex