Re: Finding city fuzzily but most accurate is most relevant

2012-01-21 Thread Rene Hackl-Sommer
Hi Marc, So, I guess I'm looking for a near phrase query with wild card. Any suggestions on this? Do you rely on the Lucene Query Syntax or can you build queries via the API? If the latter, take a look at SpanNearQuery and SpanRegexQuery. Here's an article that can get you started on

Re: Time detection

2011-11-01 Thread Rene Hackl-Sommer
Hi, I am not aware of an existing analyzer that's doing this. There is the UIMA based tool Heideltime that is doing multilingual extraction of temporal expressions (http://dbs.ifi.uni-heidelberg.de/index.php?id=129). It might get you started. Cheers, Rene Am 01.11.2011 11:58, schrieb

Re: reusing the term-frequency count while indexing

2011-10-25 Thread Rene Hackl-Sommer
Use term boosts? solr^3 rocks^2 apache http://lucene.apache.org/java/3_4_0/queryparsersyntax.html#Boosting%20a%20Term On 25.10.2011 11:19, prasenjit mukherjee wrote: During search time I get the following input (only for 1 field) = solr:3 rocks:2 apache:1. For this I have to create the
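The mapping from the term:count input to the boosted query string suggested in this reply could look roughly like the sketch below. It is only an illustration: the function name is hypothetical, it assumes whitespace-separated term:count pairs, and it does not escape query parser special characters.

```python
def counts_to_boosted_query(spec: str) -> str:
    """Turn 'term:count' pairs into a query string that uses term boosts."""
    parts = []
    for pair in spec.split():
        term, _, count = pair.partition(":")
        weight = int(count) if count else 1
        # A boost of 1 is the default, so the suffix is left out in that case.
        parts.append(term if weight == 1 else f"{term}^{weight}")
    return " ".join(parts)


print(counts_to_boosted_query("solr:3 rocks:2 apache:1"))
```

The resulting string can then be handed to the query parser instead of repeating each term as many times as its count.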

Re: AutomatonQuery Caching

2011-07-12 Thread Rene Hackl-Sommer
This sounds plausible, even if manually cleaning the Java cache has no effect. Probably a JDK/JRE mismatch somewhere, just have to find the spot. Thanks, Rene On 12.07.2011 19:22, Robert Muir wrote: On Tue, Jul 12, 2011 at 10:42 AM, René Hackl <rene.a.ha...@gmx.de> wrote: Hi, I am running

Lucene 3.3: Self referring deprecation use insteads in LowerCaseTokenizer

2011-07-05 Thread Rene Hackl-Sommer
Hi, I just noticed that the deprecation "use ... instead" notes in LowerCaseTokenizer (Lucene 3.3) refer to themselves instead of the new constructors with (Version...). E.g. @deprecated use {@link #LowerCaseTokenizer(Reader)} instead. should be #LowerCaseTokenizer(Version, Reader). Same for the two

Re: Indexing lists of IDs

2010-04-14 Thread Rene Hackl-Sommer
Hi Kristjan, which Tokenizer and Filters are you using for the ID field? Rene On 14.04.2010 21:15, Kristjan Siimson wrote: Hello, I have a document for which I'd like to index an array of indexes. For example, there is a product that belongs to categories with IDs 12, 15, 16, 145, 148. I'd

Re: Increase number of available positions?

2010-03-18 Thread Rene Hackl-Sommer
Hi Steve, I'm not sure what's wrong with the above (have you tried each of the two nested SpanNot clauses independently?), but here's another thing to try: Your query works. And as it turns out, if I don't commit the same embarrassing lower case / upper case inconsistency over and over

Re: Increase number of available positions?

2010-03-17 Thread Rene Hackl-Sommer
Hi, I was looking at SpanNotQuery to see if I could make do without the position increment gaps. A search requirement that's causing me some trouble to implement is when two terms are supposed to be on the same L_2, yet on different L_3's (L_3's are hierarchically below L_2). With the

Re: Increase number of available positions?

2010-03-16 Thread Rene Hackl-Sommer
Hi Guys, Thanks for the input! I am now going to put in some work to see how things fare. Should I post the question about substituting int with long on lucene-dev again, if the need arises? Thanks again, Rene On 15.03.2010 23:04, Steven A Rowe wrote: Hi Rene, Have you seen

Re: Deleting documents without deleting them

2010-03-16 Thread Rene Hackl-Sommer
Hi Daniel, Unless you have only a few documents and a small index, I don't think never calling optimize is something you should rely upon. What if you reindexed the documents you are deleting, adding a field excludeFromSearch with the value true? This would imply that either
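One way the excludeFromSearch idea could be applied at search time is to wrap every user query so that flagged documents are prohibited. The sketch below is an assumption about how that might be done with the classic query syntax (+ for required, - for prohibited); the helper name is hypothetical, and the message itself does not say whether a query clause or a filter was intended.

```python
def exclude_flagged(user_query: str, field: str = "excludeFromSearch") -> str:
    """Require the user's query and prohibit documents flagged as excluded."""
    # '+(...)' makes the original query mandatory, '-field:true' drops flagged docs.
    return f"+({user_query}) -{field}:true"


print(exclude_flagged("solr rocks"))
```

The flagged documents stay in the index, so their terms still contribute to the index statistics.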

Re: Deleting documents without deleting them

2010-03-16 Thread Rene Hackl-Sommer
I cannot comment on the marked-as-deleted documents, but for the approach I outlined: this might impact the scores. I prefer to say 'impact' instead of 'skew', because to me 'skew' would imply that the original scores are some kind of ideal state which is distorted. I don't think this is

Increase number of available positions?

2010-03-15 Thread Rene Hackl-Sommer
Hello, I am working on a use case that is very demanding regarding the number of token positions. For one special field in the index, I need to represent different hierarchy levels, like this: MyField Level_1 Level_2 Level_3 Please note that I need to do this with Lucene, not an XML search

Re: Increase number of available positions?

2010-03-15 Thread Rene Hackl-Sommer
Is your entire corpus a single document? Because I'm having trouble imagining a single document where this would be a problem, unless your increment gap is huge. The term positions are relative to a single document... It is getting pretty huge, yes (see below). The term positions are also

Re: Increase number of available positions?

2010-03-15 Thread Rene Hackl-Sommer
could say Level_2 has to be 65, but I don't know that beforehand of course. Or am I overlooking something here? On 03/15/2010 at 9:59 AM, Rene Hackl-Sommer wrote: Search in MyField: Terms T1 and T2 on Level_2 and T3, T4, and T5 on Level_3, which should both be in the same Level_1. I

Re: Increase number of available positions?

2010-03-15 Thread Rene Hackl-Sommer
Hi Erick, What about indexing the triplets with a small increment gap between? That is: ... gets indexed as: level1-1/level2-1/level3-1 +gap 100 level1-1/level2-1/level3-2 +gap 100 level1-1/level2-2/level3-3 +gap 100 level1-1/level2-2/level3-4 If I understand this correctly, the field
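To get a feeling for how quickly such gaps consume positions, the gap scheme quoted above can be modelled roughly as follows. This is a simplified sketch, not Lucene's actual indexing code: the function name is hypothetical, each triplet is treated as one value made of one or more tokens, and the limit is assumed to be the Java int maximum because positions are stored as ints.

```python
JAVA_INT_MAX = 2**31 - 1  # assumed upper bound for a token position


def assign_positions(values, gap=100):
    """Assign a position to each token, adding `gap` between consecutive values."""
    assigned = []
    pos = -1
    for i, tokens in enumerate(values):
        if i > 0:
            pos += gap  # position increment gap between two values
        for token in tokens:
            pos += 1
            if pos > JAVA_INT_MAX:
                raise OverflowError("position no longer fits into a Java int")
            assigned.append((token, pos))
    return assigned


triplets = [
    ["level1-1/level2-1/level3-1"],
    ["level1-1/level2-1/level3-2"],
    ["level1-1/level2-2/level3-3"],
    ["level1-1/level2-2/level3-4"],
]
for token, position in assign_positions(triplets):
    print(position, token)
```

With one token per triplet and a gap of 100, every additional triplet in this model costs 101 positions.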

SpanQueries in Luke

2010-03-04 Thread Rene Hackl-Sommer
Hi, I would like to submit SpanQueries in Luke. AFAIK this isn't doable out of the box. What would be the way to go? Replace the built-in QueryParser by e.g. the xml-query-parser from the contrib section? Thanks, Rene

Re: SpanQueries in Luke

2010-03-04 Thread Rene Hackl-Sommer
Hi Andrzej, Thanks! I'll keep my eyes open for that. FWIW, implementing this by replacing the QueryParser with the CoreParser worked fine. Thanks again, Rene On 04.03.2010 16:22, Andrzej Bialecki wrote: On 2010-03-04 14:13, Rene Hackl-Sommer wrote: Hi, I would like to submit