adding documents while optimizing?

2009-11-19 Thread vsevel
Hi, is it a good idea/possible to continue writing events to an index while optimizing it, in two different threads, in the same process, using the same writer? thanks, vince -- View this message in context: http://old.nabble.com/adding-documents-while-optimizing--tp26421269p26421269.html

Re: adding documents while optimizing?

2009-11-19 Thread Michael McCandless
It's fine to do so, and you should see good concurrency (and if you don't, please report back!). But, note that the semantics of optimize() are that it only ensures that the segments present when the call started are merged down to one. Any newly created segments (e.g. flushing th
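A minimal sketch of the pattern being asked about, against the 2.9-era API (the field name, document count, and RAMDirectory are made up for illustration, not taken from the thread):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class ConcurrentOptimize {
    public static void main(String[] args) throws Exception {
        final IndexWriter writer = new IndexWriter(new RAMDirectory(),
                new StandardAnalyzer(Version.LUCENE_29),
                IndexWriter.MaxFieldLength.UNLIMITED);

        // Thread 1: keeps adding documents through the shared writer.
        Thread adder = new Thread() {
            public void run() {
                try {
                    for (int i = 0; i < 1000; i++) {
                        Document doc = new Document();
                        doc.add(new Field("id", Integer.toString(i),
                                Field.Store.YES, Field.Index.NOT_ANALYZED));
                        writer.addDocument(doc);
                    }
                } catch (Exception e) { e.printStackTrace(); }
            }
        };
        adder.start();

        // Thread 2 (main): optimize concurrently. Per Michael's note, only
        // the segments that existed when optimize() started are guaranteed
        // to be merged down to one; newly flushed segments may remain.
        writer.optimize();
        adder.join();
        writer.close();
    }
}
```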

Re: Phrase query with terms at same location

2009-11-19 Thread Erick Erickson
If I'm reading this right, your tokenizer creates two tokens, one "report" and one "_n"... If so, I suspect this will create some "interesting" behaviors. For instance, if you put two tokens in place, are you going to double the slop when you don't care about part of speech? Is every word going

Re: Finding the highest term in a field

2009-11-19 Thread Yonik Seeley
On Thu, Nov 19, 2009 at 1:04 AM, Daniel Noll wrote: > I take it the existing numeric fields can't already do stuff like > this? Nope, it's a fundamental limitation of the current TermEnums. -Yonik http://www.lucidimagination.com --

Re: Phrase query with terms at same location

2009-11-19 Thread Christopher Tignor
Thanks, Erick - Indeed every word will have a part of speech token, but is this how the slop actually works? My understanding was that if I have two tokens in the same location then each will not affect searches involving the other in terms of the slop, as slop indicates the number of words *between* s

RE: Finding the highest term in a field

2009-11-19 Thread Uwe Schindler
Hi Daniel, hi Yonik, With NumericFields it would be possible to get to the actual last position in the TermEnum faster. It would be possible to iterate first over the lowest precision terms until the end is reached. By that you know the prefix of the last term. You can then place the TermEnum on t
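For reference, this is the plain linear TermEnum walk that Uwe's low-precision seeking trick improves on, sketched against the 2.9 API (the field name is an assumption; terms come back in lexicographic order, so "highest" here means lexicographically last):

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

public class HighestTerm {
    // Brute-force scan: walk every term of the field and keep the last one.
    // Uwe's NumericField suggestion avoids the full walk by first iterating
    // the lowest-precision terms to learn the prefix of the last term.
    public static String highestTerm(IndexReader reader, String field)
            throws Exception {
        TermEnum terms = reader.terms(new Term(field, ""));
        String highest = null;
        try {
            do {
                Term t = terms.term();
                if (t == null || !t.field().equals(field)) break;
                highest = t.text();
            } while (terms.next());
        } finally {
            terms.close();
        }
        return highest;
    }
}
```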

Re: Phrase query with terms at same location

2009-11-19 Thread Erick Erickson
Ahhh, I should have followed the link. I was interpreting your first note as emitting two tokens NOT at the same offset. My mistake, ignore my nonsense about unexpected consequences. Your original assumption is correct, zero offsets are pretty transparent. What do you really want to do here? Mark'

best way to iterate through all docs from a query

2009-11-19 Thread it99
What is the best way to iterate across all the documents in a search result? Previously I was using the deprecated Hits object but changed the implementation as recommended in the javadocs to ScoreDoc. I've tried the following but I've seen warnings about performance. Seems the first time I query some

RE: best way to iterate through all docs from a query

2009-11-19 Thread Uwe Schindler
Simply create your own Collector (or HitCollector for Lucene < 2.9). Nothing more to do. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: it99 [mailto:deswiatlow...@syrres.com] > Sent: Thursday, November 19
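A minimal sketch of such a Collector for the 2.9 API, collecting every matching doc id and ignoring scores (the class name is invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Gathers the absolute id of every document that matches the query.
public class AllDocsCollector extends Collector {
    private final List<Integer> docs = new ArrayList<Integer>();
    private int docBase;

    public void setScorer(Scorer scorer) {}          // scores not needed
    public void setNextReader(IndexReader reader, int docBase) {
        this.docBase = docBase;                      // per-segment doc base
    }
    public void collect(int doc) {
        docs.add(docBase + doc);                     // absolute doc id
    }
    public boolean acceptsDocsOutOfOrder() {
        return true;                                 // order doesn't matter here
    }

    public List<Integer> getDocs() { return docs; }
}
```

Used as `searcher.search(query, collector)`; because no top-N heap or scoring is involved, this avoids the performance caveats attached to pulling a huge TopDocs.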

Re: Phrase query with terms at same location

2009-11-19 Thread Christopher Tignor
Thanks again for this. I would like to able to do several things with this data if possible. As per Mark's post, I'd like to be able to query for phrases like "He _v"~1 (where _v is my verb part of speech token) to recover string like: "He later apologized". This already in fact seems to be worki

Re: best way to iterate through all docs from a query

2009-11-19 Thread Ian Lea
First queries are often slow and subsequent ones faster. Search the list for warming - I think there was something on it in the last couple of days. Or read the "When measuring performance, disregard the first query" bit of http://wiki.apache.org/lucene-java/ImproveSearchingSpeed A good number t
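A small sketch of the "disregard the first query" advice: fire a throwaway query before timing anything (the warm-up query and result sizes here are arbitrary choices, not from the wiki page):

```java
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

public class WarmThenTime {
    // The first search pays one-time costs (loading norms, field caches,
    // populating the OS file cache), so warm the searcher before measuring.
    public static TopDocs warmAndSearch(IndexSearcher searcher, Query realQuery)
            throws Exception {
        searcher.search(new MatchAllDocsQuery(), 10);   // warm-up; result ignored
        long start = System.currentTimeMillis();
        TopDocs hits = searcher.search(realQuery, 100);
        System.out.println("query took "
                + (System.currentTimeMillis() - start) + " ms");
        return hits;
    }
}
```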

Re: Phrase query with terms at same location

2009-11-19 Thread Erick Erickson
Hmmm, you're beyond what I've tried to do, so all I can do is speculate. But I don't believe that two terms on top of each other are considered when calculating slop. But I really don't know for sure, so I'd create a couple of unit tests to verify. You're right, the combinatorial explosion with pu

RE: Keep URLs intact and not tokenized by the StandardTokenizer

2009-11-19 Thread Steven A Rowe
Hi Sudha, In the past, I've built regexes to recognize URLs using the information here: http://www.foad.org/~abigail/Perl/url2.html The above, however, is currently a dead link. Here's the Internet Archive's WayBack Machine's cache of this page from August 2007:

Re: Keep URLs intact and not tokenized by the StandardTokenizer

2009-11-19 Thread Sudha Verma
Thanks. I was hoping Lucene would already have a solution for this, since it seems like it would be a common problem. I am new to the Lucene API. If I were to implement something from scratch, are my options to extend the Tokenizer to support an http regex and then pass the text to StandardTokenizer.
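One workaround that avoids touching the tokenizer grammar at all, assuming the URLs can be extracted into their own field upstream: route that field to KeywordAnalyzer so each URL survives as a single token. The "url" field name is hypothetical:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public class UrlFieldAnalyzer {
    // "url" fields keep each value as one intact token; all other fields
    // go through StandardAnalyzer as usual.
    public static Analyzer build() {
        PerFieldAnalyzerWrapper wrapper =
            new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_29));
        wrapper.addAnalyzer("url", new KeywordAnalyzer());
        return wrapper;
    }
}
```

This sidesteps the regex/grammar question entirely, at the cost of needing the URLs separated out before indexing; Renaud's modified flex grammar (next message) handles URLs inline instead.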

SpanQuery for Terms at same position

2009-11-19 Thread Christopher Tignor
Hello, I would like to search for all documents that contain both "plan" and "_v" (my part of speech token for verb) at the same position. I have tokenized the documents accordingly so these tokens exist at the same location. I can achieve this programmatically using PhraseQueries by adding the Terms e
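The PhraseQuery approach the poster describes can be sketched like this: adding both terms at the same explicit position requires them to occupy the same token position in the index (the "contents" field name is an assumption):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;

public class SamePositionQuery {
    // Both terms are added at position 0, so the query only matches
    // documents where "plan" and "_v" were indexed at the same position
    // (i.e. where "plan" is tagged as a verb).
    public static PhraseQuery plansThatAreVerbs() {
        PhraseQuery q = new PhraseQuery();
        q.add(new Term("contents", "plan"), 0);
        q.add(new Term("contents", "_v"), 0);
        return q;
    }
}
```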

RE: Keep URLs intact and not tokenized by the StandardTokenizer

2009-11-19 Thread Delbru, Renaud
Hi, Some time ago, I had to modify and extend the Lucene StandardTokenizer grammar (flex file) so that it preserves URIs (based on RFC3986). I have extracted the files from my project and published the source code on github [1] under the Apache License 2.0, if it can help. [1] http://github.co

ChainedFilter in Lucene 2.9

2009-11-19 Thread Michel Nadeau
Hi ! Can someone tell me what is replacing ChainedFilter in Lucene 2.9? I used to do it like this - h = searcher.search(q, cluCF, cluSort); Where cluCF is a ChainedFilter declared like this - Filter cluCF = new ChainedFilter(cluFilters, ChainedFilter.AND); cluFilters is a Filter[] containing

Re: ChainedFilter in Lucene 2.9

2009-11-19 Thread Robert Muir
Hi, you can find this in 'lucene-misc' contrib jar file http://lucene.apache.org/java/2_9_1/api/contrib-misc/org/apache/lucene/misc/ChainedFilter.html On Thu, Nov 19, 2009 at 11:27 PM, Michel Nadeau wrote: > Hi ! > > Can someone tell me what is replacing ChainedFilter in Lucene 2.9? > > I used
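A sketch of the usage with the contrib class on the classpath, mirroring the `cluCF` AND-combination from the question (the field names and filter contents here are invented for illustration):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.misc.ChainedFilter;   // from the contrib lucene-misc jar
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.TermQuery;

public class ChainedFilterExample {
    // AND-combine two filters; pass the result to
    // searcher.search(q, filter, sort) exactly as before.
    public static Filter build() {
        Filter f1 = new QueryWrapperFilter(
                new TermQuery(new Term("type", "cluster")));
        Filter f2 = new QueryWrapperFilter(
                new TermQuery(new Term("active", "true")));
        return new ChainedFilter(new Filter[] { f1, f2 }, ChainedFilter.AND);
    }
}
```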