terms.
>
> Uwe
>
> - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de eMail:
> u...@thetaphi.de
>
>> -Original Message- From: Christian Reuschling
>> [mailto:reuschl...@dfki.uni-kl.de]
>> Sent: Monday,
Hi,
I am currently migrating to Lucene 4. In the past, I used a trick to get the
index-specific terms for a given (wildcard) query (see below), but it doesn't
work anymore:
String queryString = "n*"; // gives no result
// String queryString = "nöä"; /
Hello,
I am trying to get the scorer for a result document, for further computation.
List leafContexts = indexReader.leaves();
int n = ReaderUtil.subIndex(scoreDoc.doc, leafContexts);
AtomicReaderContext ctx = leafContexts.get(n);
Scorer scorer = weight.sc
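ReaderUtil.subIndex resolves a global docID to the leaf (segment) whose docBase range contains it. A minimal pure-Java sketch of that lookup, without Lucene on the classpath (the class and array names here are hypothetical, not Lucene's actual code):

```java
import java.util.Arrays;

public class SubIndexSketch {
    // Given the sorted start offsets (docBase) of each leaf reader,
    // return the index of the leaf that contains the global docID.
    public static int subIndex(int docId, int[] leafDocStarts) {
        int idx = Arrays.binarySearch(leafDocStarts, docId);
        // An exact hit means docId is the first doc of that leaf;
        // otherwise binarySearch returns (-(insertionPoint) - 1), and the
        // containing leaf is the one just before the insertion point.
        return idx >= 0 ? idx : -idx - 2;
    }

    public static void main(String[] args) {
        int[] starts = {0, 100, 250}; // three leaves with 100, 150, ... docs
        System.out.println(subIndex(0, starts));   // 0
        System.out.println(subIndex(99, starts));  // 0
        System.out.println(subIndex(100, starts)); // 1
        System.out.println(subIndex(300, starts)); // 2
    }
}
```

Once the leaf index is known, the leaf's AtomicReaderContext can be fetched from indexReader.leaves() as shown in the snippet, and the docID local to that leaf is the global docID minus the leaf's docBase.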
nt,%20org.apache.lucene.search.Sort,%20boolean,%20boolean)>
>
> Steve
>
> On Jul 18, 2014, at 10:17 AM, Christian Reuschling
> wrote:
>
>> We are currently migrating one project to Lucene 4 and noticed that the method
>> IndexSearcher.setDefaultFieldSortScoring(..) disappeare
We are currently migrating one project to Lucene 4 and noticed that the method
IndexSearcher.setDefaultFieldSortScoring(..) disappeared in Lucene 4.0. We
can't find anything
about this in the migration guide. Further, it was never deprecated in Lucene
3,
e
an exotic case. Or
is it?
Thanks from the whole DFKI Lucene crew!
Christian
- --
__
Christian Reuschling, Dipl.-Ing.(BA)
Software Engineer
Knowledge Management Department
German Research Center for Artificial Intelligence (DFKI)
I remember that there was a general Searcher interface, with the standard
IndexSearcher as
subclass, plus some subclass that enabled RMI-based remote access to an index.
If you used Searcher in your codebase, the code was independent of
ac
I have a small set of document numbers as a query result collected with some
non-scoring collector.
Now, I want to send high-performance successive queries only within this document
number scope, as part
of a customized Similarity implementation (modifie
Hello,
what is the best method to score documents similarly to the default similarity,
but with the document frequency calculated per query against the matching
result document set, not statically against the whole corpus.
Didn't find a good and pe
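The idea described above - counting df only over the documents in the result set rather than the whole corpus - can be sketched in plain Java. The data layout (a map from doc ID to the terms it contains) is hypothetical and stands in for the inverted index; Lucene's Similarity API is not used here:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class ResultSetDf {
    // Document frequency of a term counted only over the given result
    // doc IDs, rather than over the whole corpus.
    public static int dfInResultSet(String term,
                                    Set<Integer> resultDocs,
                                    Map<Integer, Set<String>> termsPerDoc) {
        int df = 0;
        for (int docId : resultDocs) {
            Set<String> terms = termsPerDoc.get(docId);
            if (terms != null && terms.contains(term)) {
                df++;
            }
        }
        return df;
    }

    public static void main(String[] args) {
        Map<Integer, Set<String>> index = new HashMap<>();
        index.put(1, Set.of("lucene", "search"));
        index.put(2, Set.of("lucene"));
        index.put(3, Set.of("search"));
        // df("lucene") over result docs {1, 3} is 1, not the corpus-wide 2.
        System.out.println(dfInResultSet("lucene", Set.of(1, 3), index));
    }
}
```

The resulting per-result-set df would then feed an idf term in place of the corpus-wide statistic; doing this efficiently inside a custom Similarity is the hard part the thread is asking about.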
e end (as we say in Germany ;) ). I don't know how to
proceed further, as the
deeper code starts to become very complex.
Thanks a lot!
Christian Reuschling
On 15.11.2013 18:49, Michael McCandless wrote:
> Hmm, I'm not sure offhand why that change gives you no results.
>
ichael McCandless wrote:
> On Wed, Nov 13, 2013 at 12:04 PM, Christian Reuschling
> wrote:
>> We started to implement a named entity recognition on the basis of
>> AnalyzingSuggester, which
>> offers great support for Synonyms, Stopwords, etc. For
We started to implement a named entity recognition on the basis of
AnalyzingSuggester, which offers
great support for Synonyms, Stopwords, etc.
For this, we slightly modified AnalyzingSuggester.lookup() to only return the
exactFirst hits
(considering the exactFirst code block only, skipping th
e fields have no "equal length" or
>> something like that, especially numeric fields are tokenized and consist of
>> several tokens indexed separately. So what do you mean by equal length?
>> Why must this "length" be identical?
>> >
>> > The o
" be identical?
>
> The only suggestion is to index a "fake" placeholder value (like -1,
> infinity, NaN). If you only need it in the "stored" fields, just store it but
> don't index it.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63,
y lower-precision terms used by NumericField to allow fast
>> NumericRangeQuery. You have to filter those values by looking at the first
>> few bits, which contain the precision.
>>
>> -
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://ww
value (what you are seeing, I presume) to int or long or whatever.
>> Maybe that will help.
>>
>>
>> --
>> Ian.
>>
>>
>> On Wed, Nov 2, 2011 at 7:19 PM, Christian Reuschling
>> wrote:
>>> Hi,
>>>
>>> maybe it is an easy
.
> Maybe that will help.
>
>
> --
> Ian.
>
>
> On Wed, Nov 2, 2011 at 7:19 PM, Christian Reuschling
> wrote:
>> Hi,
>>
>> maybe it is an easy question - I searched over the lucene-user
>> archive, but sadly didn't find an answer :(
>>
>
Hi,
maybe it is an easy question - I searched over the lucene-user
archive, but sadly didn't find an answer :(
I am currently changing our field logic from string to numeric fields.
Until now, I managed to find the min-max values of a field by
iterating over the field with a TermEnum
(termEnum = rea
Hi guys,
in our app we provide the possibility to search inside a set of documents, which
is the result list of a former search. Thus, someone can narrow down a search
according to different criteria.
For this, we implemented a simple Filter that simply gets a TopDocs object and
creates a bitSet out
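The Filter described above essentially turns a previous result list into a bit set of allowed doc IDs. A minimal sketch with java.util.BitSet - Lucene's own Filter/DocIdSet classes are omitted, and the plain docIds array stands in for the TopDocs object:

```java
import java.util.BitSet;

public class ResultScopeBitSet {
    // Build a bit set from the doc IDs of a former search result, so a
    // follow-up search can be restricted to exactly those documents.
    public static BitSet fromDocIds(int[] docIds, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int docId : docIds) {
            bits.set(docId);
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet scope = fromDocIds(new int[]{3, 7, 42}, 100);
        System.out.println(scope.get(7));  // true: doc 7 was in the result
        System.out.println(scope.get(8));  // false: doc 8 was not
    }
}
```

In an actual Lucene Filter, the bit set's get(doc) check is what the getDocIdSet implementation would expose per segment; note that after a merge or re-open, doc IDs can change, so such a scope is only valid against the reader it was built from.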
Hello Michael,
I also would prefer B - it also shortens the time until we benefit from new
Lucene features in our applications.
It forces our lazy programmers (me, of course ;) ) to deal with them - and
reduces the effort of changing to a major release afterwards.
Maybe some minimum waiting time bef
Hi,
our application enables sorting the result lists by field values,
currently all represented as Strings (we plan to also migrate to the new
numeric type capabilities of Lucene 2.9 at a later time).
For this, the documents are sorted e.g. by author, which
works fine w
Hi,
looking up the different terms with a common stem can be useful in different
scenarios - so I don't want to judge it whether someone needs it or not.
E.g., in the case you have multilingual documents in your index, it is straight
forward to determine the language of the documents in order to
Hi,
I had similar behaviour. On a self-built index of the German Wikipedia I searched
for the phrase "blaue blume". I got 2 results. When I searched for +"blaue
blume" "vogel" I got 59 results... strange.
I found out that when I create a plain BooleanQuery with just the phrase "blaue
blume" give
Hi Prashant,
we let the scores converge to 1 - whereby they will never reach one - so that
ratings stay correct with respect to higher Lucene scores, which are more
or less open-ended:
normalizedScore = 1 - [ 1 / (1+luceneScore) ]
best
Christian
On Sun, 16 Aug 2009 19:04:44 +0530
prashant ul
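The normalization formula above maps Lucene's open-ended raw scores into [0, 1); a direct sketch of it:

```java
public class ScoreNormalizer {
    // normalizedScore = 1 - 1 / (1 + luceneScore): a raw score of 0 maps
    // to 0, and the result approaches (but never reaches) 1 as the raw
    // score grows, so higher raw scores always rank higher.
    public static double normalize(double luceneScore) {
        return 1.0 - 1.0 / (1.0 + luceneScore);
    }

    public static void main(String[] args) {
        System.out.println(normalize(0.0));    // 0.0
        System.out.println(normalize(1.0));    // 0.5
        System.out.println(normalize(1000.0)); // close to 1, never 1
    }
}
```

The mapping is strictly monotonic, so it preserves the ranking produced by Lucene while giving a bounded value suitable for display or thresholding.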
, NLP, NER, IR
>
>
>
> - Original Message
> > From: Christian Reuschling
> > To: java-user@lucene.apache.org
> > Sent: Tuesday, August 4, 2009 5:50:16 AM
> > Subject: ParallelMultiSearcher and idf
> >
> > Hello,
> >
> > when se
Hello,
when searching over multiple indices, we create one IndexReader for each index
and wrap them into a MultiReader, which we use for IndexSearcher creation.
This is fine for searching multiple indices on one machine, but if the
indices are distributed over the (intra)net, this scenar
Is there a fast way to determine the total number of terms inside an index?
Currently the only way I found is to walk through the TermEnum, i.e.
TermEnum termEnum4TermCount = reader.terms();
int iTermCount = 0;
while (termEnum4TermCount.next())
    iTermCount++;
termEnum4TermCount.close();
>
> This is correct if I'm reading it right. Perhaps what's needed here
> is a statement of the problem you're trying to solve, because I'm
> having trouble understanding the underlying use-cases..
>
> Best
> Erick
>
>
> On Wed, Nov 12, 2008 at 10:
t.
>
> Of course I may have completely mis-read your problem, but I'm sure you'll
> let us know if that's the case .
>
>
> BTW, if this isn't a typo, you probably need SpanQuery since you can
> specify order not being important:
> attName:"st
term2 term3 term4"
For the 1:n behaviour, you need some kind of logical 'grouping' of one
dataset, whereby a query 'term1 term4' should NOT match, while 'term1 term2'
must match.
Stefan Trcek schrieb:
> On Wednesday 12 November 2008 14:58:53 Christian Reuschling
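The grouping constraint described above - all query terms must fall inside one and the same group of a dataset - can be sketched in plain Java (the class and method names are hypothetical, and groups-of-term-sets stands in for whatever index structure would actually carry the grouping):

```java
import java.util.List;
import java.util.Set;

public class GroupedMatch {
    // A dataset matches a query only if some single group contains all
    // of the query's terms (the 1:n 'grouping' behaviour): terms from
    // different groups must not be combined into a match.
    public static boolean matches(List<Set<String>> groups, Set<String> queryTerms) {
        for (Set<String> group : groups) {
            if (group.containsAll(queryTerms)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Set<String>> groups = List.of(
            Set.of("term1", "term2"),
            Set.of("term3", "term4"));
        System.out.println(matches(groups, Set.of("term1", "term2"))); // true
        System.out.println(matches(groups, Set.of("term1", "term4"))); // false
    }
}
```

This reproduces the example from the mail: 'term1 term2' matches because both terms share a group, while 'term1 term4' does not. In Lucene terms, one common way to get this effect is to index each group as its own document (or field instance) so that a BooleanQuery is evaluated per group.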
would be a standard BooleanQuery, but only
applied inside the range of the delimiters. Is this somehow possible, or do I
have to write my own Query implementation - and what would be the best way in
this case?
Thanks in advance
Christian Reuschling
p a little, greetings
Christian Reuschling
package org.dynaq;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Docume
in the past, I had really good experiences with the svn versions of Lucene -
I never had problems, and everything felt stable.
Currently, I get unexpected exceptions from time to time:
java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs vs 0 length
in bytes of _3g6n.fdx
Hello people,
I'm sorry if I have sent this message twice - my gmail interface merges the
mails in the 'sent' folder with incoming mails from my address - strange, but
I can't say if the mail was sent - I only see it in the sent folder (with
only one label on it, which brings me to send it again
Hello people,
yes, there were several threads about this topic, but I sadly have to respawn
it, I'm sorry.
The first I found was a discussion from May 2005:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200505.mbox/[EMAIL
PROTECTED]
There the final solution suggestion from Hoss wa
Hello out there,
We have implemented an open source desktop search app based on Lucene:
http://sourceforge.net/projects/dynaq
Development always goes on, and currently we are experimenting with the
file-lock based writer (/reader) synchronization capabilities of Lucene, in
order to waste
yes, look at the 'contributions' link on the Lucene homepage.
The 'Phonetix' project provides an implementation of soundex,
metaphone and double-metaphone. Simply use their analyzer. I am
not sure what the behaviour is in the case of wildcards. Does
anyone have an answer?
regards
Christian
Steven Pan