Re: query.extractTerms(..) on rewritten queries

2014-10-07 Thread Christian Reuschling
terms. > > Uwe > > - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de eMail: > u...@thetaphi.de > >> -Original Message- From: Christian Reuschling >> [mailto:reuschl...@dfki.uni-kl.de] >> Sent: Monday,

query.extractTerms(..) on rewritten queries

2014-10-06 Thread Christian Reuschling
Hi, I am currently migrating to Lucene 4. In the past, I used a trick to get the index-specific terms for a given (wildcard) query (see below). But it doesn't work anymore: String queryString = "n*"; // gives no result // String queryString = "nöä"; /

BooleanWeight.scorer() gives a TermScorer

2014-08-07 Thread Christian Reuschling
Hello, I am trying to get the scorer for a result document, for further computation. List<AtomicReaderContext> leafContexts = indexReader.leaves(); int n = ReaderUtil.subIndex(scoreDoc.doc, leafContexts); AtomicReaderContext ctx = leafContexts.get(n); Scorer scorer = weight.sc
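The leaf lookup in the snippet above can be illustrated without Lucene on the classpath. The following is only a sketch of the idea behind ReaderUtil.subIndex: the class LeafLookup, its methods, and the docBases array (the first global document id of each leaf, ascending, with no empty leaves) are hypothetical names and assumptions, not Lucene API. It also shows the leaf-relative id, since a per-leaf scorer in Lucene 4 works with ids relative to its leaf.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: maps a global doc id to its leaf.
final class LeafLookup {

    // docBases[i] is the first global doc id of leaf i (ascending, docBases[0] == 0).
    // Assumes no two leaves share the same base, i.e. no empty leaves.
    static int subIndex(int globalDoc, int[] docBases) {
        int pos = Arrays.binarySearch(docBases, globalDoc);
        // Exact hit: the doc is the first one of that leaf.
        // Miss: binarySearch returns -(insertionPoint) - 1; the leaf is insertionPoint - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    // Converts a global doc id into the id relative to its leaf.
    static int localDoc(int globalDoc, int[] docBases) {
        return globalDoc - docBases[subIndex(globalDoc, docBases)];
    }
}
```

With leaves starting at 0, 10 and 25, global document 12 falls into leaf 1 with local id 2. With the real API, the corresponding local id would be scoreDoc.doc - ctx.docBase.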

Re: Migration Lucene 3=>4: IndexSearcher.setDefaultFieldSortScoring(..)

2014-07-22 Thread Christian Reuschling
nt,%20org.apache.lucene.search.Sort,%20boolean,%20boolean)> > > Steve > > On Jul 18, 2014, at 10:17 AM, Christian Reuschling > wrote: > >> We currently migrate one project to Lucene 4 and noticed that the method >> IndexSearcher.setDefaultFieldSortScoring(..) disappeare

Migration Lucene 3=>4: IndexSearcher.setDefaultFieldSortScoring(..)

2014-07-18 Thread Christian Reuschling
We are currently migrating a project to Lucene 4 and noticed that the method IndexSearcher.setDefaultFieldSortScoring(..) disappeared in Lucene 4.0. We can't find anything about this in the migration guide. Further, it was never deprecated in Lucene 3,

searching multiple remote indices

2014-06-18 Thread Christian Reuschling
e an exotic case. Or is it? Thanks from the whole DFKI Lucene crew! Christian - -- __ Christian Reuschling, Dipl.-Ing.(BA) Software Engineer Knowledge Management Department German Research Center for Artif

transparently access a remote index: new alternative to old RemoteSearchable / Searcher interfaces

2014-06-04 Thread Christian Reuschling
I remember that there was a general Searcher interface, with the standard IndexSearcher as a subclass, plus a subclass that enabled RMI-based remote access to an index. If you used Searcher in your codebase, the code was independent from ac

create a Filter/DocIdSet from a number of documents

2014-03-12 Thread Christian Reuschling
I have a small set of document numbers as a query result, collected with a non-scoring collector. Now I want to run fast successive queries restricted to this set of document numbers, as part of a customized Similarity implementation (modifie

tf/idf similarity with modified document similarity

2014-03-06 Thread Christian Reuschling
Hello, what is the best way to score documents as the default similarity does, except that the document frequency is calculated per query against the matching result document set, not statically against the whole corpus? I didn't find a good and pe
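To make the question concrete, here is a small sketch of how the idf factor changes when the counts come from the result set instead of the whole corpus. It assumes the classic idf of Lucene's default tf/idf similarity, 1 + ln(numDocs / (docFreq + 1)); the class ResultSetIdf and the example numbers are hypothetical, and nothing here shows how to plug the value into a Similarity implementation.

```java
// Hypothetical helper, not Lucene API: the classic idf formula as a plain function.
final class ResultSetIdf {

    // docFreq: number of documents containing the term, within the chosen scope.
    // numDocs: total number of documents in that scope (corpus or result set).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }
}
```

For example, a term found in 1,000 of 1,000,000 corpus documents gets a high idf, while the same term found in 150 of 200 result documents gets a much lower one, because it no longer discriminates within the result set.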

Re: FuzzySuggester EXACT_FIRST criteria

2013-11-20 Thread Christian Reuschling
e end (as we say in Germany ;) ). Don't know how to proceed further, as the deeper code starts to become very complex. Thanks a lot! Christian Reuschling On 15.11.2013 18:49, Michael McCandless wrote: > Hmm, I'm not sure offhand why that change gives you no results. >

Re: FuzzySuggester EXACT_FIRST criteria

2013-11-14 Thread Christian Reuschling
ichael McCandless wrote: > On Wed, Nov 13, 2013 at 12:04 PM, Christian Reuschling > wrote: >> We started to implement a named entity recognition on the base of >> AnalyzingSuggester, which >> offers the great support for Synonyms, Stopwords, etc. For

FuzzySuggester EXACT_FIRST criteria

2013-11-13 Thread Christian Reuschling
We started to implement named entity recognition on the basis of AnalyzingSuggester, which offers great support for synonyms, stopwords, etc. For this, we slightly modified AnalyzingSuggester.lookup() to only return the exactFirst hits (considering the exactFirst code block only, skipping th

Re: Empty numeric field

2012-02-15 Thread Christian Reuschling
e fields have no "equal length" or >> something like that, especially numeric fields are tokenized and contain of >> several tokens separately indexed. So what do you mean with equal length? >> Why must this "length" be identical? >> > >> > The o

Re: Empty numeric field

2012-02-15 Thread Christian Reuschling
" be identical? > > The only suggestion is to index a "fake" placeholder value (like -1, > infinity, NaN). If you only need it in the "stored" fields, just store it but > don't index it. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63,

Re: Numeric field min max values

2011-11-08 Thread Christian Reuschling
y lower-precision terms used by NumericField to allow fast >> NumericRangeQuery. You have to filter those values by looking at the first >> few bits, which contains the precision. >> >> - >> Uwe Schindler >> H.-H.-Meier-Allee 63, D-28213 Bremen >> http://ww

Re: Numeric field min max values

2011-11-07 Thread Christian Reuschling
value (what you are seeing, I presume) to int or long or whatever. >> Maybe that will help. >> >> >> -- >> Ian. >> >> >> On Wed, Nov 2, 2011 at 7:19 PM, Christian Reuschling >> wrote: >>> Hi, >>> >>> maybe it is an easy

Re: Numeric field min max values

2011-11-03 Thread Christian Reuschling
. > Maybe that will help. > > > -- > Ian. > > > On Wed, Nov 2, 2011 at 7:19 PM, Christian Reuschling > wrote: >> Hi, >> >> maybe it is an easy question - I searched over the lucene-user >> archive, but sadly didn't found an answer :( >> &g

Numeric field min max values

2011-11-02 Thread Christian Reuschling
Hi, maybe it is an easy question - I searched the lucene-user archive, but sadly didn't find an answer :( I am currently changing our field logic from string to numeric fields. Until now, I managed to find the min-max values of a field by iterating over the field with a TermEnum (termEnum = rea

Filter for searching in result lists with 2.9

2009-10-16 Thread Christian Reuschling
Hi guys, in our app we offer the possibility to search inside a set of documents that is the result list of a former search. Thus, someone can narrow down a search according to different criteria. For this, we implemented a simple Filter that simply gets a TopDocs object and creates a bitSet out
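The idea of turning a former result list into a bit set can be sketched with plain java.util.BitSet. This is only an illustration under assumptions: the class ResultScope and its methods are hypothetical, the document ids are taken to be top-level ids (as they would come from the hits of a TopDocs), and any per-segment handling inside a real Filter is left out.

```java
import java.util.BitSet;

// Hypothetical helper, not Lucene API: keeps a former result list as a bit set.
final class ResultScope {

    // Sets one bit per document id of the former result list.
    static BitSet fromDocIds(int[] docIds, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int id : docIds) {
            if (id < 0 || id >= maxDoc) {
                throw new IllegalArgumentException("doc id out of range: " + id);
            }
            bits.set(id);
        }
        return bits;
    }

    // Restricts the hits of a later search to the former result list.
    static BitSet narrow(BitSet scope, BitSet laterHits) {
        BitSet result = (BitSet) scope.clone(); // leave the scope itself untouched
        result.and(laterHits);
        return result;
    }
}
```

Narrowing the scope {1, 3, 5} with later hits {3, 5, 6} leaves {3, 5}. Repeating the step shrinks the result further with each search.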

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-16 Thread Christian Reuschling
Hello Michael, I would also prefer B - it also shortens the time until our applications benefit from new Lucene features. It forces our lazy programmers (I am one, of course ;) ) to deal with them - and reduces the effort to change to a major release afterwards. Maybe some minimum time waiting bef

How to sort and get document scores afterwards

2009-10-15 Thread Christian Reuschling
Hi, our application enables sorting the result lists according to field values, currently all represented as Strings (we plan to also migrate to the new numeric type capabilities of Lucene 2.9 at a later time). For this, the documents are sorted, e.g., by author, which works fine w

Re: Reverse stemmer?

2009-10-08 Thread Christian Reuschling
Hi, looking up the different terms with a common stem can be useful in different scenarios - so I don't want to judge whether someone needs it or not. E.g., if you have multilingual documents in your index, it is straightforward to determine the language of the documents in order to

Re: Search By Phrase Not Working

2009-10-08 Thread Christian Reuschling
Hi, I saw similar behaviour. On a self-built index of the German Wikipedia I searched for the phrase "blaue blume". I got 2 results. When I searched for +"blaue blume" "vogel" I got 59 results...strange. I found out that when I create a plain BooleanQuery with just the phrase "blaue blume" give

Re: How to normalize Lucene score?

2009-08-17 Thread Christian Reuschling
Hi Prashant, we let the scores converge to 1, although they never reach it, so that ratings also stay correct for higher Lucene scores, which are more or less open-ended: normalizedScore = 1 - [ 1 / (1 + luceneScore) ] Best, Christian On Sun, 16 Aug 2009 19:04:44 +0530 prashant ul
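The formula above is small enough to try out directly. The following sketch implements exactly that expression; the class and method names are hypothetical, and rejecting negative scores is an added assumption, since the formula is only meant for non-negative Lucene scores.

```java
// Hypothetical helper: maps an open-ended, non-negative score into [0, 1).
final class ScoreNormalizer {

    static double normalize(double luceneScore) {
        if (luceneScore < 0) {
            throw new IllegalArgumentException("score must not be negative: " + luceneScore);
        }
        // Same as luceneScore / (1 + luceneScore): grows with the score, never reaches 1.
        return 1.0 - 1.0 / (1.0 + luceneScore);
    }
}
```

A score of 0 maps to 0, a score of 1 to 0.5, and a score of 3 to 0.75. The ordering of the hits is preserved because the mapping is strictly increasing.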

Re: ParallelMultiSearcher and idf

2009-08-04 Thread Christian Reuschling
, NLP, NER, IR > > > > - Original Message > > From: Christian Reuschling > > To: java-user@lucene.apache.org > > Sent: Tuesday, August 4, 2009 5:50:16 AM > > Subject: ParallelMultiSearcher and idf > > > > Hello, > > > > when se

ParallelMultiSearcher and idf

2009-08-04 Thread Christian Reuschling
Hello, when searching over multiple indices, we create one IndexReader for each index and wrap them in a MultiReader, which we use to create the IndexSearcher. This is fine for searching multiple indices on one machine, but if the indices are distributed over the (intra)net, this scenar

Determining index term count

2009-01-07 Thread Christian Reuschling
Is there a fast way to determine the total number of terms in an index? So far I have only found the approach of walking through the term enumeration, i.e. TermEnum termEnum4TermCount = reader.terms(); int iTermCount = 0; while (termEnum4TermCount.next()) iTermCount++; termEnum4TermCount.close();

Re: 1:n queries again

2008-11-13 Thread Christian Reuschling
> > This is correct if I'm reading it right. Perhaps what's needed here > is a statement of the problem you're trying to solve, because I'm > having trouble understanding the underlying use-cases.. > > Best > Erick > > > On Wed, Nov 12, 2008 at 10:

Re: 1:n queries again

2008-11-12 Thread Christian Reuschling
t. > > Of course I may have completely mis-read your problem, but I'm sure you'll > let us know if that's the case . > > > BTW, if this isn't a typo, you probably need SpanQuery since you can > specify order not being important: > attName:"st

Re: 1:n queries again

2008-11-12 Thread Christian Reuschling
term2 term3 term4" For the 1:n behaviour, you need some kind of logical 'grouping' within one dataset, whereby a query 'term1 term4' should NOT match, while 'term1 term2' must match. Stefan Trcek wrote: > On Wednesday 12 November 2008 14:58:53 Christian Reuschling

1:n queries again

2008-11-12 Thread Christian Reuschling
would be a standard BooleanQuery, but only applied inside the range of the delimiters. Is this somehow possible, or do I have to write my own Query implementation, and if so, what would be the best way to do it? Thanks in advance, Christian Reuschling

term offsets wrong depending on analyzer

2008-11-07 Thread Christian Reuschling
p a little, greetings Christian Reuschling package org.dynaq; import org.apache.lucene.analysis.KeywordAnalyzer; import org.apache.lucene.analysis.PerFieldAnalyzerWrapper; import org.apache.lucene.analysis.WhitespaceAnalyzer; import org.apache.lucene.document.Docume

which version of lucene do you recommend

2008-09-09 Thread Christian Reuschling
In the past, I had really good experiences with the svn versions of Lucene - I never had problems, and everything felt stable. Currently, I get unexpected exceptions from time to time: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs vs 0 length in bytes of _3g6n.fdx

yet again: getting the minimum and maximum value of a field

2008-06-25 Thread Christian Reuschling
Hello people, I'm sorry if I have sent this message twice - my gmail interface merges the mails in the 'sent' folder with incoming mails from my address - strange, but I can't say if the mail was sent - I only see it in the sent folder (with only one label on it, which brings me to send it again

yet again: getting the minimum and maximum value of a field

2008-06-25 Thread Christian Reuschling
Hello people, yes, there have been several threads about this topic, but sadly I have to revive it, I'm sorry. The first I found was a discussion from May 2005: http://mail-archives.apache.org/mod_mbox/lucene-java-user/200505.mbox/[EMAIL PROTECTED] There, the final solution suggested by Hoss wa

Refreshing IndexReaders for our desktop searching app

2008-05-28 Thread Christian Reuschling
Hello out there, we have implemented an open source desktop search app based on Lucene: http://sourceforge.net/projects/dynaq Development is ongoing, and we are currently experimenting with the file-lock-based writer (/reader) synchronization capabilities of Lucene, in order to waste

Refreshing IndexReaders for our desktop searching app

2008-05-28 Thread Christian Reuschling
Hello out there, we have implemented an open source desktop search app based on Lucene: http://sourceforge.net/projects/dynaq Development is ongoing, and we are currently experimenting with the file-lock-based writer (/reader) synchronization capabilities of Lucene, in order to waste

Refreshing IndexReaders for our desktop searching app

2008-05-27 Thread Christian Reuschling
Hello out there, we have implemented an open source desktop search app based on Lucene: http://sourceforge.net/projects/dynaq Development is ongoing, and we are currently experimenting with the file-lock-based writer (/reader) synchronization capabilities of Lucene, in order to waste

Re: SoundEx

2006-01-18 Thread Christian Reuschling
Yes, look at the 'contributions' link on the Lucene homepage. The 'Phonetix' project provides implementations of Soundex, Metaphone and Double Metaphone. Simply use their analyzer. I am not sure what the behaviour is in the case of wildcards. Does anyone have an answer? Regards, Christian Steven Pan