Hello Michael,
For the case of normalizing ü to ue, take a look at the German normalizer [1].
Regards,
Markus
[1]
https://lucene.apache.org/core/7_6_0/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html
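Concretely, the filter folds both spellings onto the same form, so "ü" and "ue" match at search time. Here is a self-contained sketch of those folding rules as the Javadoc describes them (an illustration only, not the actual filter; lowercase input assumed):

```java
// Sketch of the character folding GermanNormalizationFilter applies
// (per its Javadoc): ä/ö/ü -> a/o/u, ß -> ss, ae/oe -> a/o, and
// ue -> u unless the 'u' follows a vowel or 'q'.
public class GermanFoldSketch {
    static boolean isVowel(char c) {
        return "aeiouäöü".indexOf(c) >= 0;
    }

    static String fold(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            boolean eNext = i + 1 < s.length() && s.charAt(i + 1) == 'e';
            if (c == 'ä') { out.append('a'); i++; }
            else if (c == 'ö') { out.append('o'); i++; }
            else if (c == 'ü') { out.append('u'); i++; }
            else if (c == 'ß') { out.append("ss"); i++; }
            else if ((c == 'a' || c == 'o') && eNext) { out.append(c); i += 2; }
            else if (c == 'u' && eNext
                     && !(i > 0 && (isVowel(s.charAt(i - 1)) || s.charAt(i - 1) == 'q'))) {
                out.append('u'); i += 2;
            }
            else { out.append(c); i++; }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // both spellings fold to the same term: "muller"
        System.out.println(fold("müller"));
        System.out.println(fold("mueller"));
    }
}
```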
-Original message-
> From:Ralf Heyde
> Sent:
> Subject: Re: 8.0.0 ClassCastException in ValueSource
>
> Hi Markus,
>
> Thanks for reporting this. It looks like a side-effect of the Scorable
> refactoring; can you open a JIRA issue?
>
> On Wed, Mar 20, 2019 at 5:01 PM Markus Jelsma
> wrote:
> >
> > Hello,
> >
Hello,
Upgraded to Lucene and Solr 8.0 and ran all our unit tests; this one popped up:
Caused by: java.lang.ClassCastException:
org.apache.lucene.queries.function.ValueSource$ScoreAndDoc cannot be cast to
org.apache.lucene.search.Scorer
at
Hello,
I think I tracked it further down to LUCENE-8589 or SOLR-12243. When I leave
Solr's edismax pf parameter empty, everything runs fast. When all fields are
configured for pf, the node dies.
I am now unsure whether I am on the right list, or if I should move to Solr's.
Please let me
Hello,
While working on SOLR-12743, using 7.6 on two nodes and 7.2.1 on the remaining
four, we stumbled upon a situation where the 7.6 nodes quickly succumb when a
'Query-of-Death' is issued; 7.2.1 up to 7.5 are all unaffected (tested and
confirmed).
Following Smiley's suggestion I used
regards
>
>
>
> On 10/15/18 3:28 PM, Markus Jelsma wrote:
> > Hello Baris,
> >
> > Check out the filter factory and the map parser for a more low level
> > example:
> >
Hello Baris,
Check out the filter factory and the map parser for a more low level example:
https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/SynonymGraphFilterFactory.java
Hi Egorlex,
Set the tokenSeparator to "" and ShingleFilter will concatenate all shingles
without whitespace. Keep in mind, this will greatly increase the size of the
index so it might not be a good idea to concatenate all pairs of words.
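A minimal stdlib sketch (not the Lucene filter itself) of what the empty tokenSeparator yields for bigram shingles; the tokens are made-up example data:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of what ShingleFilter with shingle size 2 and tokenSeparator=""
// emits as bigrams; the real filter also keeps the original unigrams
// unless outputUnigrams is disabled.
public class ConcatShingles {
    static List<String> bigrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + tokens.get(i + 1)); // empty separator
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [similarissues, issueshere]
        System.out.println(bigrams(List.of("similar", "issues", "here")));
    }
}
```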
If you are looking to find "similarissues" with
12238
>
> Patch and Pull Request is attached but it has not been reviewed yet.
> Give it a look, and then we can continue the discussion here!
> let me know if you feel your requirement is different !
>
> Cheers
>
> On Wed, May 23, 2018 at 11:41 AM, Markus
Hello,
To support payloads we rewrite SynonymQuery to a pair of SpanTerm queries which
we then can wrap in the PayloadScoreQuery. This is not the right way to do this
because if both clauses match, both are also scored. We could try to rewrite
SynonymQuery to a SpanOrQuery but I suppose that
Hello,
First, apologies for the weird subject line, and apologies for cross-posting,
but last week it got no replies on the Solr user mailing list.
We index many languages and search over all those languages at once, but boost
the language of the user's preference. To differentiate between
?
>
> Send a pull request. :)
>
> Uwe
>
> On 16 September 2017 at 12:42:30 CEST, Markus Jelsma
> <markus.jel...@openindex.io> wrote:
> >Hello Uwe,
> >
> >Thanks for getting rid of the compounds. The dictionary can be smaller,
> >it still has
Hello Uwe,
Thanks for getting rid of the compounds. The dictionary can be smaller, it
still has about 1500 duplicates. It is also unsorted.
Regards,
Markus
-Original message-
> From:Uwe Schindler
> Sent: Saturday 16th September 2017 12:16
> To:
re/org/apache/lucene/analysis/tokenattributes/TypeAttribute.html
> [2] :
> https://lucene.apache.org/core/6_5_0/analyzers-common/org/apache/lucene/analysis/core/TypeTokenFilter.html
>
> On Wed, 14 Jun 2017 at 23:33, Markus Jelsma
> <markus.jel...@openindex.io> wrote:
an a number in there you have to
> provide your own decoders and the like to make sense of your
> payload
>
> Best,
> Erick (Erickson, not Hatcher)
>
> On Wed, Jun 14, 2017 at 2:22 PM, Markus Jelsma
> <markus.jel...@openindex.io> wrote:
> > Hello Erik,
> >
June 2017 23:03
> To: java-user@lucene.apache.org
> Subject: Re: Using POS payloads for chunking
>
> Markus - how are you encoding payloads as bitsets and using them for scoring?
> Curious to see how folks are leveraging them.
>
> Erik
>
> > On Jun 14, 2
Hello,
We use POS-tagging too, and encode them as payload bitsets for scoring, which
is, as far as I know, the only possibility with payloads.
So, instead of encoding them as payloads, why not index your treebanks POS-tags
as tokens on the same position, like synonyms. If you do that, you can
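The synonym-style stacking suggested above can be sketched like this (made-up words and tags; a position increment of 0 is what stacks a token on the previous position):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of indexing POS tags as extra tokens at the same position as their
// word, the way synonyms are stacked. Each emitted token is encoded as
// "term:positionIncrement" for readability.
public class PosAsSynonyms {
    static List<String> interleave(List<String> words, List<String> tags) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            out.add(words.get(i) + ":1"); // the word advances the position
            out.add(tags.get(i) + ":0");  // the tag stacks on the same position
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [drink:1, NN:0, water:1, NN:0]
        System.out.println(interleave(List.of("drink", "water"), List.of("NN", "NN")));
    }
}
```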
Ok, we decided not to implement PositionLengthAttribute for now because either
it is badly applied (how could one even misapply that attribute?) or Solr's
QueryBuilder has a weird way of dealing with it or.. well.
Thanks,
Markus
-Original message-
> From:Markus Jelsma
Hello again, apologies for cross-posting and having to get back to this
unsolved problem.
Initially I thought this was a problem with, or in, Lucene. Maybe it is not, so
is this a problem in Solr? Has anyone here seen this problem before?
Many thanks,
Markus
-Original message-
>
Hello,
We have a decompounder and recently implemented the PositionLengthAttribute in
it and set it to 2 for a two-word compound such as drinkwater (drinking water
in Dutch). The decompounder runs both at index- and query-time on Solr 6.5.0.
The problem is, q=content_nl:drinkwater no longer
Hello - you are on the wrong list; this is the Lucene java-user list, not the
Solr user mailing list. But this is what you are looking for:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
https://wiki.apache.org/solr/ExtractingRequestHandler
First is
Yes, they should be the same unless the field is indexed with shingles; in that
case order matters.
Markus
-Original message-
> From:Julius Kravjar
> Sent: Monday 16th January 2017 18:20
> To: java-user@lucene.apache.org
> Subject: question
>
> May I have
Hello - I noticed something peculiar running Lucene/Solr 6.3.0.
The plural vaccinatieprogramma's should have a startOffset of 0 and an endOffset
of 21 when passed through WordDelimiterFilter and/or stemmers, but it doesn't,
slightly messing up highlighted terms.
wdf = new
Hi - I seem to be having trouble correctly executing a range query on a date
field.
The following Solr document is indexed via a unit test followed by a commit:
view
test_key
2013-01-09T17:11:40Z
I can retrieve the document simply by wrapping term queries in a boolean query
Hello - upgrading one of our libraries to 6.2.0 failed due to LUCENE-7318. This
is fixed nicely on 6.2.1, many thanks for that!
Upgrading to 6.2.1, however, still raises compile errors. I haven't seen any
notice of this in CHANGES.txt or its API changes section for both 6.2.x
versions. Any
; H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: Markus Jelsma [mailto:markus.jel...@openindex.io]
> > Sent: Wednesday, August 31, 2016 11:08 AM
> > To: java-user@lucene.apache.org
>
Hello - I'm upgrading a project that uses Lucene to 6.2.0 and get the compile
error that LowerCaseFilter does not exist. And, so it seems, the JavaDoc is
gone too. I've checked CHANGES.txt and there is no mention of it, not even in
the API changes section.
Any ideas?
Thanks,
Markus
are developed for indices in which stop
> words are eliminated.
> Therefore, most of the term-weighting models have problems scoring common
> terms.
> By the way, the DFI model does a decent job when handling common terms.
>
> Ahmet
>
>
>
> On Tuesday, April 19, 2016 4:48 PM,
Hello,
I just made a Solr query parser for BlendedTermQuery on Lucene 6.0 using BM25
similarity and I have a very simple unit test to see if something is working at
all. But to my surprise, one of the results has a negative score, caused by a
negative IDF because docFreq is higher than
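For reference, the BM25Similarity Javadoc gives the idf as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)), which drops below zero once docFreq exceeds docCount, something a blended docFreq pooled across fields can produce. A quick self-contained check with made-up numbers:

```java
// Quick check of the BM25 idf formula as described in BM25Similarity's
// Javadoc: it turns negative once docFreq exceeds docCount, which an
// inflated cross-field docFreq can cause. The numbers below are made up.
public class Bm25IdfCheck {
    static double idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        System.out.println(idf(10, 100));  // ordinary term: positive
        System.out.println(idf(120, 100)); // docFreq > docCount: negative
    }
}
```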
Hi - if you don't want specific words passed through a stemmer, you need to
supply a CharArraySet with exclusions as the second argument to its constructor.
Markus
-Original message-
> From:Dwaipayan Roy
> Sent: Monday 14th March 2016 15:31
> To:
issue, or a Lucene or
> JVM bug?
>
> LUCENE-6970
>
> On Thu, Jan 21, 2016 at 4:07 PM, Markus Jelsma <markus.jel...@openindex.io>
> wrote:
>
> > Hi - we get the above issue as well sometimes. I've noticed Lucene-dev
> > mails on this issue [1] but i
mentation needs to rewrite to a BoostQuery. You can do that by
> prepending the following to your rewrite(IndexReader) implementation:
>
> if (getBoost() != 1f) { return super.rewrite(reader); }
>
>
> Le jeu. 17 déc. 2015 à 13:23, Markus Jelsma <markus.jel...@openindex.io&g
Hi,
Apologies for the cross post. We have a class overriding
SpanPositionRangeQuery. It is similar to a SpanFirst query but it is capable of
adjusting the boost value with regard to distance. With the 5.4 upgrade the
unit tests suddenly threw the following exception:
Query class
Hi,
Apologies for cross-posting; I got no response on the Solr list.
We have a development environment running trunk but have custom analyzers and
token filters built on 4.6.1. Now the constructors have changed somewhat and
stuff breaks. Here's a consumer trying to get a TokenStream from an
Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Thursday, January 30, 2014 10:50 AM
To: java-user@lucene.apache.org
Subject: LUCENE-5388 AbstractMethodError
Hi,
Apologies for cross-posting; I got no response on the Solr list.
We have a development
://www.thetaphi.de
eMail: u...@thetaphi.de
-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Thursday, January 30, 2014 12:52 PM
To: java-user@lucene.apache.org
Subject: RE: LUCENE-5388 AbstractMethodError
Hi Uwe,
The bug occurred only
Hi,
I know it is recommended to disable the coordination factor when using models
other than the default TFIDFSimilarity. Out of curiosity I'd like to know the
motivation behind it, but it is not explained anywhere, not even in
LUCENE-2959, the patches, wiki, PDFs or whatever. So, anyone here
Hi,
This has likely been discussed before but I couldn't find it. Why are most
token filters final, or most or all of their members private and/or final? It
is impossible to customize token filters by extending them; instead we need to
copy code around. How do you customize, for example, some bits
code that
uses Directory for replication.
- Mark
On Nov 2, 2012, at 6:53 AM, Markus Jelsma markus.jel...@openindex.io wrote:
Hi,
For what it's worth, we have seen similar issues with Lucene/Solr from this
week's trunk. The issue manifests itself when it wants to replicate
Hi,
For what it's worth, we have seen similar issues with Lucene/Solr from this
week's trunk. The issue manifests itself when it wants to replicate. The servers
have not been taken offline and did not crash when this happened.
2012-10-30 16:12:51,061 WARN [solr.handler.ReplicationHandler] -
No, this is not using NFS but ext3 on SSD.
Thanks
-Original message-
From:Michael McCandless luc...@mikemccandless.com
Sent: Fri 02-Nov-2012 16:22
To: java-user@lucene.apache.org
Subject: Re: "read past EOF" when merge
On Fri, Nov 2, 2012 at 6:53 AM, Markus Jelsma
Matthijs li...@selckin.be
Sent: Thu 04-Oct-2012 15:55
To: java-user@lucene.apache.org
Subject: Re: Highlighter IOOBE with modified
HyphenationCompoundWordTokenFilter
And to include the code
On Thu, Oct 4, 2012 at 3:52 PM, Markus Jelsma
markus.jel...@openindex.io wrote:
I forgot to add
Hi,
I've modified the HyphenationCompoundWordTokenFilter to emit fewer subtokens
because the original filter can emit all kinds of subtokens that have a very
different meaning on their own. I've modified it so no overlapping subtokens
are emitted and no subtokens are emitted that can be found
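The non-overlapping idea can be sketched as a greedy, longest-match dictionary split (a simplification: the real filter works from hyphenation points, and the dictionary and compound here are made up):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Greedily split a compound into the longest dictionary words left to right,
// so no emitted subtoken overlaps another. If no clean split exists, emit
// nothing rather than partial, possibly misleading subtokens.
public class NonOverlappingSplit {
    static List<String> split(String compound, Set<String> dict) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        while (start < compound.length()) {
            int bestEnd = -1;
            for (int end = compound.length(); end > start; end--) {
                if (dict.contains(compound.substring(start, end))) {
                    bestEnd = end; // longest match wins
                    break;
                }
            }
            if (bestEnd < 0) return List.of(); // no clean split
            parts.add(compound.substring(start, bestEnd));
            start = bestEnd;
        }
        return parts;
    }

    public static void main(String[] args) {
        // prints [drink, water]
        System.out.println(split("drinkwater", Set.of("drink", "water")));
    }
}
```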
I forgot to add that this is with today's build of trunk.
-Original message-
From:Markus Jelsma markus.jel...@openindex.io
Sent: Thu 04-Oct-2012 15:42
To: java-user@lucene.apache.org
Subject: Highlighter IOOBE with modified HyphenationCompoundWordTokenFilter
Hi,
I've modified
You should ask on the Droids list but there's some activity in Jira. And did
you consider Apache Nutch?
On Tuesday 23 August 2011 10:17:50 Li Li wrote:
hi all
I am interested in vertical crawlers. But it seems this project is not
very active. Its last update was 11/16/2009
[X] ASF Mirrors (linked in our release announcements or via the Lucene
website)
[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[X] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors them internally or via a
downstream