Thanks. My mapping is:
cm,cmoen,Christian Moen
On Sun, Jul 31, 2022 at 12:08 PM Michael McCandless <
luc...@mikemccandless.com> wrote:
> Hello Lucene users, contributors and developers,
>
> If you have used Lucene's Jira and you have a GitHub account as well,
> please
rcaseFilter before your SynonymFilter, which means that the
entities in your SynonymMap need to be all lowercase or they won’t be matched.
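Stripped of Lucene's TokenStream machinery, the pitfall is just case-sensitive map lookup. A minimal illustration of the principle (class name and entries are hypothetical, not Lucene API):

```java
import java.util.HashMap;
import java.util.Map;

// If tokens are lowercased before the synonym lookup (as a LowerCaseFilter
// would do), the lookup keys must be lowercase too, or nothing matches.
public class SynonymLookup {
    private final Map<String, String> synonyms = new HashMap<>();

    public void add(String entity, String expansion) {
        // store the key lowercase, matching what the token stream will emit
        synonyms.put(entity.toLowerCase(), expansion);
    }

    public String expand(String token) {
        // normalize the incoming token the same way before the lookup
        return synonyms.getOrDefault(token.toLowerCase(), token);
    }
}
```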
Alan Woodward
www.flax.co.uk
> On 25 Jul 2017, at 07:52, Christian Kaufhold
> wrote:
>
> Hi,
>
> I am not able to add synonyms to the lu
aderWrapper.wrap(reader).terms("content");
TermsEnum iterator = terms.iterator(TermsEnum.EMPTY);
BytesRef byteRef;
while ((byteRef = iterator.next()) != null) {
    String term = byteRef.utf8ToString();
    System.out.println("
terms.
>
> Uwe
>
> - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de eMail:
> u...@thetaphi.de
>
>> -Original Message- From: Christian Reuschling
>> [mailto:reuschl...@dfki.uni-kl.de]
>> Sent: Monday,
:
Searcher.rewrite:
/**Expert: called to re-write queries into primitive queries.**/
Query.extractTerms:
/**Expert: adds all terms occurring in this query to the terms set. Only works
if this query is in
its {@link #rewrite rewritten} form.**/
Thanks in advance!
Christian
-BEGIN PGP S
very similar to
CachingTokenFilter.
On the first call to incrementToken() it builds a cache and goes through al
tokens for the first pass.
The following calls to incrementToken() build the second pass.
In the second pass I can use information collected in the first pass.
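The two-pass pattern described above can be sketched without Lucene's TokenStream API; the generic class below is a hypothetical stand-in for what CachingTokenFilter does internally:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of the two-pass idea: the first pass drains the source into a
// cache (where whole-stream statistics could be collected), and the second
// pass replays the cached tokens with that information available.
public class TwoPassStream<T> {
    private final List<T> cache = new ArrayList<>();

    public TwoPassStream(Iterator<T> source) {
        // first pass: consume everything once, caching each token
        while (source.hasNext()) {
            cache.add(source.next());
        }
    }

    // information only known once the first pass is complete
    public int tokenCount() {
        return cache.size();
    }

    // second pass: iterate the cached tokens again
    public Iterator<T> secondPass() {
        return cache.iterator();
    }
}
```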
Christian
2014-08-24 13:50 G
n use this information to filter the tokens.
Or is there a better solution to do this?
Thanks,
Christian
it is the result doc with the smallest doc number, all
other result
documents are from a different subIndex (n).
For DisjunctionSumScorer, my code works just fine. What is the reason that a
BooleanWeight can
return a TermScorer? And can I force the weight not to?
best regards
Christian
perfect - thanks a lot Steve
On 18.07.2014 16:46, Steve Rowe wrote:
> Hi Christian,
>
> I found an entry about this in the 4.0-ALPHA 'Changes in backwards
> compatibility policy'
> section of Lucene's CHANGES.txt (html ve
we achieve
this now?
best
Christian
e
an exotic case. Or
is it?
Thanks from the whole DFKI Lucene crew!
Christian
- --
__
Christian Reuschling, Dipl.-Ing.(BA)
Software Engineer
Knowledge Management Department
German Research Center for Artif
I remember that there was a general Searcher interface, with the standard
IndexSearcher as
subclass, plus some subclass that enabled RMI-based remote access to an index.
If you used Searcher in your codebase, the code was independent of the
ac
out of my
simple document number list. I know it sounds trivial - what is it I can't see?
:)
Thanks so much!
Christian
d and performant solution yet.
Thank you!
Christian
queezing work done by
Robert and Uwe.
Best,
Christian
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
saying regarding unk.def -- it’s to my
knowledge used as-is from the above sources when the binary .dat files are
made. (See lucene/analysis/kuromoji/src/tools in the Lucene code tree.)
Perhaps I’m missing something. Could you clarify how you think things should
be done?
Many thanks,
Christian
e end (as we say in Germany ;) ). Don't know how to
proceed further, as the
deeper code starts to become very complex.
Thanks a lot!
Christian Reuschling
On 15.11.2013 18:49, Michael McCandless wrote:
> Hmm, I'm not sure offhand why that change gives you no results.
>
ichael McCandless wrote:
> On Wed, Nov 13, 2013 at 12:04 PM, Christian Reuschling
> wrote:
>> We started to implement a named entity recognition on the base of
>> AnalyzingSuggester, which
>> offers the great support for Synonyms, Stopwords, etc. For
We started to implement named entity recognition on the basis of
AnalyzingSuggester, which offers
great support for synonyms, stopwords, etc.
For this, we slightly modified AnalyzingSuggester.lookup() to only return the
exactFirst hits
(considering the exactFirst code block only, skipping th
e fields have no "equal length" or
>> something like that; numeric fields in particular are tokenized and consist
>> of several tokens indexed separately. So what do you mean by equal length?
>> Why must this "length" be identical?
>> >
>> > The o
" be identical?
>
> The only suggestion is to index a "fake" placeholder value (like -1,
> infinity, NaN). If you only need it in the "stored" fields, just store it but
> don't index it.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63,
y lower-precision terms used by NumericField to allow fast
>> NumericRangeQuery. You have to filter those values by looking at the first
>> few bits, which contain the precision.
>>
>> -
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://ww
- but all documents have a correct timestamp.
I also noticed that Luke shows the same values, even when
the correct decoder is selected. Luke also offers to
'browse term docs', and reports that every document is a '0' - term
document.
Does anyone have an idea?
be
.
> Maybe that will help.
>
>
> --
> Ian.
>
>
> On Wed, Nov 2, 2011 at 7:19 PM, Christian Reuschling
> wrote:
>> Hi,
>>
>> maybe it is an easy question - I searched the lucene-user
>> archive, but sadly didn't find an answer :(
>>
&g
Hi,
maybe it is an easy question - I searched the lucene-user
archive, but sadly didn't find an answer :(
I am currently changing our field logic from string to numeric fields.
Until now, I managed to find the min-max values of a field by
iterating over the field with a TermEnum
(termEnum = rea
g something or is there a different approach for finding
keywords and keywordPhrases in a text?
Thanks for your help!
Christian
--
View this message in context:
http://lucene.472066.n3.nabble.com/Indexing-and-searching-phrases-tp1084545p1084545.html
Sent from the Lucene
, by recognizing the
reader-specific docbase given in the method
'public void setNextReader(IndexReader reader, int docBase)'
But inside Filter, I don't have such a nice method. What is the trick?
Thanks for all potential hints
Christian
signature.asc
Description: PGP signature
Hello Michael,
I would also prefer B - it shortens the time until new
Lucene features benefit our applications.
It forces our lazy programmers (myself included ;) ) to deal with them - and
reduces the effort of changing to a major release afterwards.
Maybe some minimum waiting time bef
Hi,
our application enables sorting the result lists according to field values,
currently all represented as Strings (we plan to also migrate to the new
numeric type capabilities of Lucene 2.9 at a later time)
For this, the documents will be sorted e.g. according to the author, which
works fine w
Hi,
looking up the different terms with a common stem can be useful in different
scenarios - so I don't want to judge whether someone needs it or not.
E.g., if you have multilingual documents in your index, it is straightforward
to determine the language of the documents in order to
Hi,
I saw similar behaviour. On a self-built index of the German Wikipedia I
searched for the phrase "blaue blume" and got 2 results. When I searched for
+"blaue blume" "vogel" I got 59 results... strange.
I found out that when I create a plain BooleanQuery with just the phrase "blaue
blume" give
Anshum,
> You could get the hits in a collector and pass the sort to the
> collector as it would be the collect function that handles the
> sorting.
>
> searcherObject.search(query,collector);
>
> Hope that gives you some headway. :)
Not quite (yet?) ;-)
What do you mean by passing the Sort t
Uwe,
> You are using TopDocs incorrectly. Normally you do *not* use
> Integer.MAX_VALUE as the number of documents, but the upper bound of your
> pagination window. So if the user wants to display documents 90 to 100, just
> set the number to 100 docs. If the user then goes to docs 100 to 110, just
> reexecute t
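A sketch of that windowing, with the hits reduced to a plain list of doc ids (the helper class is illustrative, not Lucene API): to display hits [from, to), ask the searcher for the top `to` hits and keep only the tail.

```java
import java.util.List;

// Pagination as described above: search again with numHits equal to the
// window's upper bound, then slice out just the requested page.
// topDocs stands in for the doc ids of a search's top `to` hits.
public class Pager {
    public static List<Integer> page(List<Integer> topDocs, int from, int to) {
        int end = Math.min(to, topDocs.size());
        int start = Math.min(from, end);
        return topDocs.subList(start, end);
    }
}
```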
Hello everybody,
I'm looking at quite an interesting challenge right now, so I
hope that somebody out there will be able to assist me.
What I'm trying to do is return search results both sorted and
paginated. So far I haven't been able to come up with a working solution.
Pagination without so
e
covers when it sees Field.Store.NO & Field.Index.NOT_ANALYZED. We will have
millions of entries.
Thanks,
Christian
Hi Prashant,
we let the scores converge towards 1 - whereby they never actually reach it -
so that ratings stay correct even for higher Lucene scores, which are more
or less open-ended:
normalizedScore = 1 - [ 1 / (1+luceneScore) ]
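The formula above can be sketched as a tiny helper (the class name is hypothetical, not from the original mail):

```java
// Maps an open-ended, non-negative Lucene score into [0, 1):
// strictly increasing, and never actually reaching 1.
public class ScoreNorm {
    public static double normalize(double luceneScore) {
        return 1.0 - 1.0 / (1.0 + luceneScore);
    }
}
```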
best
Christian
On Sun, 16 Aug 2009 19:04:44 +0530
prashant
turns out the index is being built with lower-case terms which is why we
aren't getting hits the way we expect. When I change my search terms to
lower case I see more of what I expect.
Gonna keep working on this and post updates.
On Wed, Aug 12, 2009 at 12:46 PM, Christian Bongiorno <
                boostNumber);
            }
            booleanQuery.add(termQuery, BooleanClause.Occur.SHOULD);
        }
    }
    LOG.warn("Boolean query: " + booleanQuery.toString());
    return booleanQuery;
}
return null;
}
--
Christian Bongiorno
Gospodnetic wrote:
> Hi Christian,
>
> You didn't mention Solr, so I'm not sure if you are aware of it. Maybe Solr
> meets your needs?
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA
Hello,
when searching over multiple indices, we create one IndexReader for each index,
and wrap them in a MultiReader, which we use to create the IndexSearcher.
This is fine for searching multiple indices on one machine, but if the
indices are distributed over the (intra)net, this scenar
fields
> > and searching that concatenated field with and/or (except that MFQ does
> > interesting things with boosting).
> >
> > But if you know exactly what terms you require in which field, the
> > standard query parser is fine. i.e. +material:leather +gender:fem
u and Paul have recommended. Once done, then I would need a
MultiFieldQuery? Forgive me but the queries confuse me.
Rebuilding my index will take some time, but I appreciate everyone's help
Christian
On Mon, May 4, 2009 at 11:40 AM, Erick Erickson wrote:
> H, tricky. Let's see
what I need. I very clearly know my fields and values and that should
give me enormous leverage when querying if I could build a query to do that
Christian
--
Christian Bongiorno
could use this mechanism to ensure that. Simply choose
> an IncrementGap greater than the maximum number of terms in
> an event description, then when you want to search in the
> description field, just use a proximity less than the IncrementGap.
> It may not apply at all for you, but
ls
> depends upon whether you can rebuild your index so we'll defer
> that part
>
> You could also think about updating the document when new events
> were added, but since an update is really a delete/add under the
> covers you'd have to either gather enough i
ocument inside the index (since this
consumes so much space)? e.g. extracting the keywords that were stored
for the item?
any hints appreciated.
regards chris
--
Christian Brennsteiner
Salzburg / Austria / Europe
();
Thanks for all answers!
Christian
not
completely in RAM.
regards chris
On Mon, Dec 22, 2008 at 4:41 AM, Otis Gospodnetic
wrote:
> Christian
>
> You can certainly purge old documents on a daily basis in order to keep the
> corpus from growing, but note that 3M*90=270M 2K docs may be a bit too much
> for a singl
hi *,
I am searching for a fulltext index capable of the following requirements:
index every day 3,000,000 new records with a validity of N days (e.g.
90 days expiration)
== 34.7 / s
one record is e.g. a URL and can be up to 2 k big
http://example.com/somedir/some.html
lucene should use "/" as
is fantastic :)
When I think about standard 1:n queries, all of you are right - an 'AND'
behaviour is needed there -
so the span queries are adequate, with the positionIncrementGap trick.
Thank you guys, your answers really helped me a lot!
Christian
Erick Erickson schrieb:
> No
t.
>
> Of course I may have completely mis-read your problem, but I'm sure you'll
> let us know if that's the case .
>
>
> BTW, if this isn't a typo, you probably need SpanQuery since you can
> specify order not being important:
> attName:"st
term2 term3 term4"
For the 1:n behaviour, you need some kind of logical 'grouping' of one
dataset, whereby a query 'term1 term4' should NOT match, while 'term1 term2'
must match.
Stefan Trcek schrieb:
> On Wednesday 12 November 2008 14:58:53 Christian Reuschling
would be a standard BooleanQuery, but only
applied inside the range of the delimiters. Is this somehow possible, or do I
have to write my own Query implementation - and what would be the best way in
this case?
Thanks in advance
Christian Reuschling
p a little, greetings
Christian Reuschling
package org.dynaq;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Docume
day I switched to the new svn
build, but no
change. Can you recommend the svn version, or would you say I should switch
back to the
release?
Thanks in advance
Christian
Hello people,
I'm sorry if I have sent this message twice - my gmail interface merges the
mails in the 'Sent' folder with incoming mails from my address - strange, but
I can't say if the mail was sent - I only see it in the Sent folder (with
only one label on it, which brings me to send it again
Hello people,
yes, there were several threads about this topic, but I sadly have to respawn
it, I'm sorry.
The first I found was a discussion from May 2005:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200505.mbox/[EMAIL
PROTECTED]
There the final solution suggestion from Hoss wa
l cases.
If it is critical - is there a common - or uncommon but good - solution for
this, that
I have forgotten?
Christian
Thanks. You were right, in a different spot of the code somebody
hard-coded mime types without
including charsets in there.
Christian
Grant Ingersoll wrote:
Lucene knows nothing about mime types, so this is likely a problem
somewhere else in the chain. Have a look at the stack trace to see
.
--
Christian Pich, Ph.D.
University of Oregon
Zebrafish Information Network
Phone: 541-346-1581
Email: [EMAIL PROTECTED]
Web: http://zfin.org
filter (for indexing) with google, so this is my guess of how to do
it.
My question is: is this the correct pattern for using a filter,
or where should it be placed?
Thank you in advance for any comments,
Christian
19:01 schrieb Daniel Naber:
On Friday 12 October 2007 15:48, Christian Aschoff wrote:
indexWriter = new IndexWriter(MiscConstants.luceneDir,
new GermanAnalyzer(), create);
[...]
NO_NORMS is not the problem, GermanAnalyzer is. Try
StandardAnalyzer on the
field you get the suggestions from.
Re
ome kind of 'unstemmed' index just
for the creation of the SpellChecker's index?
Regards,
Christian Aschoff
---
Dipl. Ing. (FH) Christian Aschoff
Büro:
Universität Ulm
Kommunikations- und Informationszentrum
Abt. Informationssysteme
Raum O26/5403
Albert-Einstein-Allee 11
89081 Ulm
Tel. 07
Hi,
how would I efficiently retrieve the names of all possible fields present in an
index?
One way would be to iterate over all terms and extract the field names, but it
doesn't
look like this method is efficient for large indices.
Murphy for president!
HC
-
Is there an enhancement/plugin to Lucene which would allow
queries like
myNumericalField > 100
I know that usually one has to index such fields as text with
the property a > b => lex(text(a)) > lex(text(b)) and devise
the text(n) transformation appropriately.
What I'm looking for is an enhance
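The lex(text(n)) transformation described above is usually done by zero-padding to a fixed width, so that string order matches numeric order. A minimal sketch for non-negative longs (helper name is illustrative; this is the classic workaround, not a Lucene API):

```java
// Zero-pad non-negative longs to a fixed width so that lexicographic
// (string) comparison agrees with numeric comparison, i.e.
// a > b => lex(encode(a)) > lex(encode(b)).
public class LexNumber {
    public static String encode(long n) {
        if (n < 0) throw new IllegalArgumentException("non-negative values only");
        return String.format("%019d", n); // 19 digits covers all longs >= 0
    }
}
```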
If mergeFactor is set to 2 and no optimize() is ever done on the index,
what is the impact on
1) the number of open files during indexing
2) the number of open files during searching
3) the search speed
4) the indexing speed
??
HC
---
yes, look at the 'contributions' link on the Lucene homepage.
The 'Phonetix' project provides implementations of Soundex,
Metaphone and Double Metaphone. Simply use their analyzer. I am
not sure what the behaviour is in the case of wildcards. Does
anyone have an answer?
regards
Thanks for the hint.
Cheers
Chris
-Original Message-
From: sergiu gordea [mailto:[EMAIL PROTECTED]
Sent: 05 July 2005 10:02
To: java-user@lucene.apache.org
Subject: Re: free text search with numbers
Hi Christian,
That syntax is not entirely correct.
Search in the mailing list for
ssage-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: 05 July 2005 11:59
To: java-user@lucene.apache.org
Subject: Re: free text search with numbers
On Jul 5, 2005, at 2:26 AM, BOUDOT Christian wrote:
> :-) I changed the main lines and compiled the QueryParser.java
> after that I
ering with a copy of Lucene's
source code you can run the Ant target "javacc" and you must have
JavaCC installed per the build instructions.
Erik
On Jul 4, 2005, at 11:38 AM, BOUDOT Christian wrote:
> I have found in the QueryParser.jj those lines of comments:
>
--
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: 04 July 2005 16:15
To: java-user@lucene.apache.org
Subject: Re: free text search with numbers
On Jul 4, 2005, at 9:02 AM, BOUDOT Christian wrote:
> Hi,
>
> I modified the analyzer (it is now vegetarian and won't eat numbers
ik Hatcher [mailto:[EMAIL PROTECTED]
Sent: 01 July 2005 15:11
To: java-user@lucene.apache.org
Subject: Re: free text search with numbers
On Jul 1, 2005, at 8:06 AM, BOUDOT Christian wrote:
> It is the first time that I implement a search with Lucene, so
> please don't
> laugh if
Thanks for the link.
Cheers
Chris
-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: 01 July 2005 15:11
To: java-user@lucene.apache.org
Subject: Re: free text search with numbers
On Jul 1, 2005, at 8:06 AM, BOUDOT Christian wrote:
> It is the first time tha
Hi folks,
This is the first time I have implemented a search with Lucene, so please don't
laugh if my question seems trivial.
When I enter some text in my free text search, the query gets built correctly,
but when I enter numbers (as strings) the query parser seems to ignore them.
What am I doing wrong?