adding documents while optimizing?

2009-11-19 Thread vsevel
Hi, is it a good idea/possible to continue writing events to an index while optimizing it, in two different threads, in the same process, using the same writer? thanks, vince -- View this message in context: http://old.nabble.com/adding-documents-while-optimizing--tp26421269p26421269.html

Re: adding documents while optimizing?

2009-11-19 Thread Michael McCandless
It's fine to do so, and you should see good concurrency (and if you don't, please report back!). But, note that the semantics of optimize() are that it only ensures that the segments present when the call started are merged down to one. Any newly created segments (e.g. flushing th
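A minimal sketch of the pattern being asked about, against the 2.9-era API (the field name, document count, and RAMDirectory are made up for illustration, not taken from the thread):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class ConcurrentOptimize {
    public static void main(String[] args) throws Exception {
        final IndexWriter writer = new IndexWriter(new RAMDirectory(),
                new StandardAnalyzer(Version.LUCENE_29),
                IndexWriter.MaxFieldLength.UNLIMITED);

        // Thread 1: keeps adding documents through the shared writer.
        Thread adder = new Thread() {
            public void run() {
                try {
                    for (int i = 0; i < 1000; i++) {
                        Document doc = new Document();
                        doc.add(new Field("id", Integer.toString(i),
                                Field.Store.YES, Field.Index.NOT_ANALYZED));
                        writer.addDocument(doc);
                    }
                } catch (Exception e) { e.printStackTrace(); }
            }
        };
        adder.start();

        // Thread 2 (main): optimize concurrently. Per Michael's note, only
        // the segments that existed when optimize() started are guaranteed
        // to be merged down to one; newly flushed segments may remain.
        writer.optimize();
        adder.join();
        writer.close();
    }
}
```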

Re: Phrase query with terms at same location

2009-11-19 Thread Erick Erickson
If I'm reading this right, your tokenizer creates two tokens, one "report" and one "_n"... If so, I suspect this will create some "interesting" behaviors. For instance, if you put two tokens in place, are you going to double the slop when you don't care about part of speech? Is every word going

Re: Finding the highest term in a field

2009-11-19 Thread Yonik Seeley
On Thu, Nov 19, 2009 at 1:04 AM, Daniel Noll wrote: > I take it the existing numeric fields can't already do stuff like > this? Nope, it's a fundamental limitation of the current TermEnums. -Yonik http://www.lucidimagination.com --

Re: Phrase query with terms at same location

2009-11-19 Thread Christopher Tignor
Thanks, Erick - Indeed every word will have a part of speech token, but is this how the slop actually works? My understanding was that if I have two tokens in the same location then each will not affect searches involving the other in terms of the slop, as slop indicates the number of words *between* s

RE: Finding the highest term in a field

2009-11-19 Thread Uwe Schindler
Hi Daniel, hi Yonik, With NumericFields it would be possible to get to the actual last position in the TermEnum faster. It would be possible to iterate first over the lowest precision terms until the end is reached. By that you know the prefix of the last term. You can then place the TermEnum on t
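For reference, this is the plain linear TermEnum walk that Uwe's low-precision seeking trick improves on, sketched against the 2.9 API (the field name is an assumption; terms come back in lexicographic order, so "highest" here means lexicographically last):

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

public class HighestTerm {
    // Brute-force scan: walk every term of the field and keep the last one.
    // Uwe's NumericField suggestion avoids the full walk by first iterating
    // the lowest-precision terms to learn the prefix of the last term.
    public static String highestTerm(IndexReader reader, String field)
            throws Exception {
        TermEnum terms = reader.terms(new Term(field, ""));
        String highest = null;
        try {
            do {
                Term t = terms.term();
                if (t == null || !t.field().equals(field)) break;
                highest = t.text();
            } while (terms.next());
        } finally {
            terms.close();
        }
        return highest;
    }
}
```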

Re: Phrase query with terms at same location

2009-11-19 Thread Erick Erickson
Ahhh, I should have followed the link. I was interpreting your first note as emitting two tokens NOT at the same offset. My mistake, ignore my nonsense about unexpected consequences. Your original assumption is correct, zero offsets are pretty transparent. What do you really want to do here? Mark'

best way to iterate through all docs from a query

2009-11-19 Thread it99
What is the best way to iterate across all the documents in a search result? Previously I was using the deprecated Hits object but changed the implementation as recommended in the javadocs to ScoreDoc. I've tried the following but I've seen warnings about performance. Seems the first time I query some

RE: best way to iterate through all docs from a query

2009-11-19 Thread Uwe Schindler
Simply create your own Collector (or HitCollector for Lucene < 2.9). Nothing more to do. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: it99 [mailto:deswiatlow...@syrres.com] > Sent: Thursday, November 19
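A minimal sketch of such a Collector for the 2.9 API, collecting every matching doc id and ignoring scores (the class name is invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Gathers the absolute id of every document that matches the query.
public class AllDocsCollector extends Collector {
    private final List<Integer> docs = new ArrayList<Integer>();
    private int docBase;

    public void setScorer(Scorer scorer) {}          // scores not needed
    public void setNextReader(IndexReader reader, int docBase) {
        this.docBase = docBase;                      // per-segment doc base
    }
    public void collect(int doc) {
        docs.add(docBase + doc);                     // absolute doc id
    }
    public boolean acceptsDocsOutOfOrder() {
        return true;                                 // order doesn't matter here
    }

    public List<Integer> getDocs() { return docs; }
}
```

Used as `searcher.search(query, collector)`; because no top-N heap or scoring is involved, this avoids the performance caveats attached to pulling a huge TopDocs.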

Re: Phrase query with terms at same location

2009-11-19 Thread Christopher Tignor
Thanks again for this. I would like to able to do several things with this data if possible. As per Mark's post, I'd like to be able to query for phrases like "He _v"~1 (where _v is my verb part of speech token) to recover string like: "He later apologized". This already in fact seems to be worki

Re: best way to iterate through all docs from a query

2009-11-19 Thread Ian Lea
First queries are often slow and subsequent ones faster. Search the list for warming - I think there was something on it in the last couple of days. Or read the "When measuring performance, disregard the first query" bit of http://wiki.apache.org/lucene-java/ImproveSearchingSpeed A good number t
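A small sketch of the "disregard the first query" advice: fire a throwaway query before timing anything (the warm-up query and result sizes here are arbitrary choices, not from the wiki page):

```java
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

public class WarmThenTime {
    // The first search pays one-time costs (loading norms, field caches,
    // populating the OS file cache), so warm the searcher before measuring.
    public static TopDocs warmAndSearch(IndexSearcher searcher, Query realQuery)
            throws Exception {
        searcher.search(new MatchAllDocsQuery(), 10);   // warm-up; result ignored
        long start = System.currentTimeMillis();
        TopDocs hits = searcher.search(realQuery, 100);
        System.out.println("query took "
                + (System.currentTimeMillis() - start) + " ms");
        return hits;
    }
}
```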

Re: Phrase query with terms at same location

2009-11-19 Thread Erick Erickson
Hmmm, you're beyond what I've tried to do, so all I can do is speculate. But I don't believe that two terms on top of each other are considered when calculating slop. But I really don't know for sure, so I'd create a couple of unit tests to verify. You're right, the combinatorial explosion with pu

RE: Keep URLs intact and not tokenized by the StandardTokenizer

2009-11-19 Thread Steven A Rowe
Hi Sudha, In the past, I've built regexes to recognize URLs using the information here: http://www.foad.org/~abigail/Perl/url2.html The above, however, is currently a dead link. Here's the Internet Archive's WayBack Machine's cache of this page from August 2007:

Re: Keep URLs intact and not tokenized by the StandardTokenizer

2009-11-19 Thread Sudha Verma
Thanks. I was hoping Lucene would already have a solution for this, since it seems like it would be a common problem. I am new to the Lucene API. If I were to implement something from scratch, are my options to extend the Tokenizer to support an http regex and then pass the text to StandardTokenizer.
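One workaround that avoids touching the tokenizer grammar at all, assuming the URLs can be extracted into their own field upstream: route that field to KeywordAnalyzer so each URL survives as a single token. The "url" field name is hypothetical:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public class UrlFieldAnalyzer {
    // "url" fields keep each value as one intact token; all other fields
    // go through StandardAnalyzer as usual.
    public static Analyzer build() {
        PerFieldAnalyzerWrapper wrapper =
            new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_29));
        wrapper.addAnalyzer("url", new KeywordAnalyzer());
        return wrapper;
    }
}
```

This sidesteps the regex/grammar question entirely, at the cost of needing the URLs separated out before indexing; Renaud's modified flex grammar (next message) handles URLs inline instead.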

SpanQuery for Terms at same position

2009-11-19 Thread Christopher Tignor
Hello, I would like to search for all documents that contain both "plan" and "_v" (my part of speech token for verb) at the same position. I have tokenized the documents accordingly so these tokens exist at the same location. I can achieve this programmatically using PhraseQueries by adding the Terms e
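The PhraseQuery approach the poster describes can be sketched like this: adding both terms at the same explicit position requires them to occupy the same token position in the index (the "contents" field name is an assumption):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;

public class SamePositionQuery {
    // Both terms are added at position 0, so the query only matches
    // documents where "plan" and "_v" were indexed at the same position
    // (i.e. where "plan" is tagged as a verb).
    public static PhraseQuery plansThatAreVerbs() {
        PhraseQuery q = new PhraseQuery();
        q.add(new Term("contents", "plan"), 0);
        q.add(new Term("contents", "_v"), 0);
        return q;
    }
}
```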

RE: Keep URLs intact and not tokenized by the StandardTokenizer

2009-11-19 Thread Delbru, Renaud
Hi, Some time ago, I had to modify and extend the Lucene StandardTokenizer grammar (flex file) so that it preserves URIs (based on RFC3986). I have extracted the files from my project and published the source code on github [1] under the Apache License 2.0, if it can help. [1] http://github.co

ChainedFilter in Lucene 2.9

2009-11-19 Thread Michel Nadeau
Hi ! Can someone tell me what is replacing ChainedFilter in Lucene 2.9? I used to do it like this - h = searcher.search(q, cluCF, cluSort); Where cluCF is a ChainedFilter declared like this - Filter cluCF = new ChainedFilter(cluFilters, ChainedFilter.AND); cluFilters is a Filter[] containing

Re: ChainedFilter in Lucene 2.9

2009-11-19 Thread Robert Muir
Hi, you can find this in 'lucene-misc' contrib jar file http://lucene.apache.org/java/2_9_1/api/contrib-misc/org/apache/lucene/misc/ChainedFilter.html On Thu, Nov 19, 2009 at 11:27 PM, Michel Nadeau wrote: > Hi ! > > Can someone tell me what is replacing ChainedFilter in Lucene 2.9? > > I used
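A sketch of the usage with the contrib class on the classpath, mirroring the `cluCF` AND-combination from the question (the field names and filter contents here are invented for illustration):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.misc.ChainedFilter;   // from the contrib lucene-misc jar
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.TermQuery;

public class ChainedFilterExample {
    // AND-combine two filters; pass the result to
    // searcher.search(q, filter, sort) exactly as before.
    public static Filter build() {
        Filter f1 = new QueryWrapperFilter(
                new TermQuery(new Term("type", "cluster")));
        Filter f2 = new QueryWrapperFilter(
                new TermQuery(new Term("active", "true")));
        return new ChainedFilter(new Filter[] { f1, f2 }, ChainedFilter.AND);
    }
}
```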