Re: Re: Solr Config XML DTD's

2011-05-01 Thread Michael Sokolov
My first post too - but if I can offer a suggestion - there are more modern XML validation technologies available than DTD. I would heartily recommend RelaxNG/Compact notation (see http://relaxng.org/compact-tutorial-20030326.html) - you can generate Relax from a DTD, but it is more

Re: Solr Config XML DTD's

2011-05-04 Thread Michael Sokolov
I'm not sure you will find anyone wanting to put in this effort now, but another suggestion for a general approach might be: (1) very basic static analysis to catch what you can - this should be a pretty minimal effort, given what can reasonably be achieved; (2) throw runtime errors as Hoss

XmlCharFilter

2011-06-14 Thread Michael Sokolov
I work with a lot of XML data sources and have needed to implement an analysis chain for Solr/Lucene that accepts XML. In the course of doing that, I found I needed something very much like HTMLCharFilter, but that does standard XML parsing (understands XML entities defined in an internal or

Re: pro coding style

2012-12-01 Thread Michael Sokolov
On 12/1/2012 7:59 AM, Per Steffensen wrote: It is all about information - git has it, SVN doesn't. And my logical sense tells me that it has to be git and not github! :-) Now tell me that I am stupid :-) This kind of information (merge tracking) has been in svn since 1.5 (see

Re: [jira] [Created] (LUCENE-8319) A Time-limiting collector that works with CollectorManagers

2018-05-18 Thread Michael Sokolov
Would it make sense to change TimeExceededException so it extends CollectionTerminatedException? On Wed, May 16, 2018 at 4:29 PM, Tony Xu (JIRA) wrote: > Tony Xu created LUCENE-8319: > --- > > Summary: A Time-limiting collector that

Re: SynonymGraphFilter followed by StopFilter

2018-07-26 Thread Michael Sokolov
> In general I’d avoid index-time synonyms in lucene because synonyms can create graphs (eg if a single term gets expanded to several terms), and we can’t index graphs correctly. I wonder what it would take to address this. I guess the blast radius of adding a token "width" could be pretty

Re: Synonyms + autoGeneratePhraseQueries

2018-07-26 Thread Michael Sokolov
Did you mean q=oow in your example? As written, I don't see how there is a problem. On Thu, Jul 26, 2018 at 8:41 AM Andrea Gazzarini wrote: > Hi, still fighting with synonyms, I have another question. > I'm not understanding the role, and the effect, of the > "autoGeneratePhraseQueries"

Re: [jira] [Commented] (LUCENE-2562) Make Luke a Lucene/Solr Module

2018-08-16 Thread Michael Sokolov
Oh! Nice -- I'll have a look. I had started tinkering with my own, but it would be nice if it already existed thanks! On Thu, Aug 16, 2018 at 10:42 AM Tomoko Uchida (JIRA) wrote: > > [ >

benchmark drop for PrimaryKey

2018-08-23 Thread Michael Sokolov
I happened to stumble across this chart https://home.apache.org/~mikemccand/lucenebench/PKLookup.html showing a pretty drastic drop in this benchmark on 5/13. I looked at the commits between the previous run and this one and did some investigation, trying to do some git bisect to find the problem

LUCENE-765

2018-08-23 Thread Michael Sokolov
Can I interest someone in reviewing my patch for https://issues.apache.org/jira/browse/LUCENE-765? It's additional javadoc for in the index package I was rooting around for some low-impact helpful thing to do here, and found this on a list of "newdev" issues. It's fairly high-level but should be

Re: benchmark drop for PrimaryKey

2018-08-23 Thread Michael Sokolov
run(): - idFieldPostingsFormat='Lucene50', + idFieldPostingsFormat='FST50', On Thu, Aug 23, 2018 at 5:52 PM Michael Sokolov wrote: > OK thanks. I guess this benchmark must be run on a large-enough index that > it doesn

Re: benchmark drop for PrimaryKey

2018-08-23 Thread Michael Sokolov
OK thanks. I guess this benchmark must be run on a large-enough index that it doesn't fit entirely in RAM already anyway? When I ran it locally using the vanilla benchmark instructions, I believe the generated index was quite small (wikimedium10k). At any rate, I don't have any specific use case

Re: benchmark drop for PrimaryKey

2018-08-24 Thread Michael Sokolov
the fact that the default > codec changed. However, I did not add backward-codecs.jar to the classpath, > you should rebuild the index that you use for benchmarking so that it uses > the Lucene80 codec instead of Lucene70. > > On Fri, Aug 24, 2018 at 02:03, Michael Sokolov > wrote: >

Re: javadoc linting on JDK10+

2018-08-29 Thread Michael Sokolov
Michael Sokolov wrote: > I am trying to run ant precommit (on master) and it fails for me with this > message: > > -ecj-javadoc-lint-unsupported: > > BUILD FAILED > /home/ > ANT.AMAZON.COM/sokolovm/workspace/lbench/lucene_baseline/lucene/common-build.xml:2076: > Lin

javadoc linting on JDK10+

2018-08-29 Thread Michael Sokolov
I am trying to run ant precommit (on master) and it fails for me with this message: -ecj-javadoc-lint-unsupported: BUILD FAILED /home/ ANT.AMAZON.COM/sokolovm/workspace/lbench/lucene_baseline/lucene/common-build.xml:2076: Linting documentation with ECJ is not supported on this Java version

Closing a JIRA issue

2018-08-29 Thread Michael Sokolov
This old issue was still assigned to me: https://issues.apache.org/jira/browse/LUCENE-3318. I had worked on it seven years ago, but it is no longer relevant today, and I'd like to close it, but I don't see any UI affordance for doing that in JIRA. Am I missing permissions? Is the issue in some

Re: Closing a JIRA issue

2018-08-31 Thread Michael Sokolov
t; > So, if you do not see it, the permissions may be in play. I will leave > the issue as is, to let the discrepancy to be figured out. > > Regards, > Alex. > > On 29 August 2018 at 15:56, Michael Sokolov wrote: > > This old issue was still assigned to me: > > http

Re: [jira] [Created] (LUCENE-8389) Could not limit Lucene's memory consumption

2018-07-06 Thread Michael Sokolov
You should really try asking on an Atlassian support forum since Jira is their project and they support it. This bug database is for tracking issues about Lucene itself. Also please note that Lucene 3 is many years old now, and no longer receiving bug fixes. The current version is 7, soon to be 8,

Re: [jira] [Reopened] (LUCENE-8389) Could not limit Lucene's memory consumption

2018-07-09 Thread Michael Sokolov
Can you run a mirror instance and swap traffic, performing reindexing on an online system, and then bring it online when complete? On Sun, Jul 8, 2018, 7:46 PM changchun huang (JIRA) wrote: > > [ >

WordDelimiterFilter javadocs are off base

2018-04-04 Thread Michael Sokolov
The javadocs for both WDF and WDGF include a pretty detailed discussion about the proper use of the "combinations" parameter, but no such parameter exists. I don't know the history here, but it sounds as if the docs might be referring to some previous incarnation of this filter, perhaps in the

Re: [jira] [Commented] (LUCENE-8240) Support different analysis per field instance

2018-04-05 Thread Michael Sokolov
Ok that was actually my first implementation. It was a lot messier. I'll follow up with details when I get back to a keyboard On Thu, Apr 5, 2018, 9:09 AM Adrien Grand (JIRA) wrote: > > [ >

Re: [jira] [Commented] (LUCENE-8248) Rename MergePolicyWrapper to FilterMergePolicy and override all of MergePolicy

2018-04-13 Thread Michael Sokolov
yes, thanks! On Fri, Apr 13, 2018 at 7:05 PM, Michael McCandless (JIRA) wrote: > > [ https://issues.apache.org/jira/browse/LUCENE-8248?page= > com.atlassian.jira.plugin.system.issuetabpanels:comment- > tabpanel=16438060#comment-16438060 ] > > Michael McCandless commented on

Re: [jira] [Commented] (LUCENE-8248) Make MergePolicy.setMaxCFSSegmentSizeMB final

2018-04-10 Thread Michael Sokolov
Ah true that would be messy! I'll update the patch. On Tue, Apr 10, 2018 at 7:26 PM, Michael McCandless (JIRA) wrote: > > [ https://issues.apache.org/jira/browse/LUCENE-8248?page= > com.atlassian.jira.plugin.system.issuetabpanels:comment- >

Re: [jira] [Commented] (LUCENE-8273) Add a BypassingTokenFilter

2018-04-24 Thread Michael Sokolov
+1 On Tue, Apr 24, 2018 at 9:58 AM, Alan Woodward (JIRA) wrote: > > [ https://issues.apache.org/jira/browse/LUCENE-8273?page= > com.atlassian.jira.plugin.system.issuetabpanels:comment- > tabpanel=16449897#comment-16449897 ] > > Alan Woodward commented on LUCENE-8273: >

Re: [jira] [Comment Edited] (LUCENE-8159) Add a copy constructor in AutomatonQuery to copy directly the compiled automaton

2018-03-04 Thread Michael Sokolov
Perhaps Robert is a fan of Object.clone() On Feb 28, 2018 9:59 AM, "Bruno Roustant (JIRA)" wrote: > > [ https://issues.apache.org/jira/browse/LUCENE-8159?page= > com.atlassian.jira.plugin.system.issuetabpanels:comment- > tabpanel=16380407#comment-16380407 ] > > Bruno

Re: [jira] [Commented] (LUCENE-8516) Make WordDelimiterGraphFilter a Tokenizer

2018-09-30 Thread Michael Sokolov
My current usage of this filter requires it to be a filter, since I need to precede it with other filters. I think the idea of not touching offsets preserves more flexibility, and since the offsets are already unreliable, we wouldn't be losing much. On Sun, Sep 30, 2018, 11:32 AM Alan Woodward

Re: [jira] [Commented] (LUCENE-8509) NGramTokenizer, TrimFilter and WordDelimiterGraphFilter in combination can produce backwards offsets

2018-10-26 Thread Michael Sokolov
The current situation is that it is impossible to apply offsets correctly in a TokenFilter. It seems to work OK most of the time, but truly correct behavior relies on prior components in the chain not having altered the length of tokens, which some of them occasionally do. For complete correctness

Re: [jira] [Commented] (LUCENE-8509) NGramTokenizer, TrimFilter and WordDelimiterGraphFilter in combination can produce backwards offsets

2018-10-26 Thread Michael Sokolov
In case it wasn't clear, I am +1 for Alan's plan. We can always restore offset-alterations here if at some future date we figure out how to do it correctly. On Fri, Oct 26, 2018 at 6:08 AM Michael Sokolov wrote: > The current situation is that it is impossible to apply offsets correc

Re: [jira] [Commented] (LUCENE-8548) Reevaluate scripts boundary break in Nori's tokenizer

2018-10-26 Thread Michael Sokolov
I agree w/Robert let's not reinvent solutions that are solved elsewhere. In an ideal world, wouldn't you want to be able to delegate tokenization of latin script portions to StandardTokenizer? I know that's not possible today, and I wouldn't derail the work here to try to make it happen since it

Re: Does ConcurrentMergeScheduler actually do smaller merges first?

2018-10-10 Thread Michael Sokolov
If maxMergeCount was 2, you could get into a situation with three large merges I think; the largest would be paused, but the others could still take > 10 mins to complete. Are you sure that your observation is at odds with what the document says the scheduler is doing? On Wed, Oct 10, 2018 at

Re: Closing a JIRA issue

2018-08-31 Thread Michael Sokolov
you that role, Michael, please see if you see > the Resolve button now. > > Cassandra > > On Fri, Aug 31, 2018 at 11:09 AM Uwe Schindler wrote: > >> Hi, >> >> When back in office, I will check the project roles of Lucene and Solr >> Jira projects. >>

Re: Congratulations to the new Lucene/Solr PMC chair, Cassandra Targett

2018-12-31 Thread Michael Sokolov
Heavy is the head that wears the crown - congrats and thank you! And here's to a peaceful transition of power in the new year :) On Mon, Dec 31, 2018 at 1:39 PM Dawid Weiss wrote: > > Congratulations, Cassandra! > > On Mon, Dec 31, 2018 at 7:04 PM Gus Heck wrote: > > > > Congratulations :) > >

Re: [jira] [Commented] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-16 Thread Michael Sokolov
I used the wikimedium2m data set for the second set of tests (the first test was on a tiny index - 10k docs) -- at least I think I did! I am kind of new to the benchmarking game. I ran the benchmarks with `python src/python/localrun.py -source wikimedium2m`, and I can see that the index dir is 861M.

Re: Unicode Quotes in query parser

2019-01-21 Thread Michael Sokolov
I think this is probably better to discuss on solr-user, or maybe solr-dev, since it is dismax parser you are talking about, which really lives in Solr. However, my 2c - this seems somewhat dubious. Maybe people want to include those in their terms? Also, it leads to a kind of slippery slope:

Re: Unicode Quotes in query parser

2019-01-22 Thread Michael Sokolov
U+3000). >> >> I’m not sure if quotes are normalized. I did some searching around >> without success. That might come under character folding. There was a >> draft, now withdrawn, for standard character folding. I’d probably start >> there for a Unicode folding char filter. >

Re: SynonymQuery / Query Expansion Strategies Discussion

2018-11-20 Thread Michael Sokolov
This is a great idea. It would also be compelling to modify the term frequency using this deboosting so that stacked indexed terms can be weighted according to their closeness to the original term. On Tue, Nov 20, 2018, 2:19 PM jim ferenczi Sorry for the late reply, > > > So perhaps one way

Re: [GitHub] lucene-solr issue #500: LUCENE-8517: do not wrap FixedShingleFilter with con...

2018-11-19 Thread Michael Sokolov
Oh! got it - We run our tests and other release machinery etc against a single JDK, and it is currently Java 8. I will precommit with Java 8 then. Presumably at some future date JDK11 becomes the system of record? Historically how long have we waited after a new Java release before shifting over?

Re: [DISCUSS] Opening old indices for reading

2019-01-24 Thread Michael Sokolov
+1 it makes sense to me; real world problems sometimes require messy solutions. I guess the alternative is everybody develops their own suite of tools and it is hard to share. Some caution is warranted though I think; even with misc/experimental caveats, these tools will only be useful if people

Re: BadApple for today Woweeeeeee!!!!!! Nothing reported!!!!

2019-03-19 Thread Michael Sokolov
Oh that is great! It's work just to *keep* things passing. Climbing out from under a big pile of failures as folks have been doing here is extra hard, so thank you! On Mon, Mar 18, 2019 at 1:02 PM Erick Erickson wrote: > There are still annoying failing tests, but apparently nothing annotated >

Re: [jira] [Commented] (SOLR-13233) SpellCheckCollator ignores stacked tokens

2019-02-09 Thread Michael Sokolov
Why does SpellCheckCollator want to ignore tokens with incorrect offsets? On Fri, Feb 8, 2019 at 10:35 AM Alan Woodward (JIRA) wrote: > > [ > https://issues.apache.org/jira/browse/SOLR-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763700#comment-16763700 > ]

Re: [GitHub] msokolov opened a new pull request #562: Don't create a LeafCollector when the Scorer for the leaf is null

2019-02-04 Thread Michael Sokolov
This PR proposes a small change a co-worker found. We can avoid creating a leaf collector for a leaf that matches no terms, which we can tell if the scorer for it is null. One test was relying on the exact sequence of collectors, enforcing that every one was created with no gaps in their

Re: Welcome Namgyu Kim as Lucene/Solr committer

2019-06-05 Thread Michael Sokolov
Namgyu! Welcome Mike On Mon, Jun 3, 2019 at 1:52 PM Adrien Grand wrote: > > Hi all, > > Please join me in welcoming Namgyu Kim as Lucene/ Solr committer! > > Kim has been helping address technical debt and fixing bugs in the > last year, including a cleanup to our DutchAnalyzer[0] and >

Re: [jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-06-08 Thread Michael Sokolov
I think this is the same pro-rated idea from LUCENE-8681; when the documents are randomly distributed among segments, the prediction can be quite accurate. In the case of a time series index though (eg, or any index where the distribution among segments is correlated with the rank), then this

Re: Lucene / Solr Gradle Build Update

2019-06-08 Thread Michael Sokolov
Please don't stop now! Many thanks for doing the work. Faster builds will answer for any grumbling/transition pains I expect On Sat, Jun 8, 2019 at 9:58 AM Gus Heck wrote: > > Also looking forward to it. :) especially if it speeds things up. Moving > forward with it in 9x and not 8 sounds good

Re: ReleaseWizard tool

2019-06-01 Thread Michael Sokolov
I'm not sure what the proper way to use fix version is. Suppose you back port a fix to multiple branches? Should fixVersion list all of them? Just pick one? On Wed, May 29, 2019, 6:00 PM Jan Høydahl wrote: > My releaseWizard tool is getting more complete as the 7.7.2 release > progresses. Will

Re: Use of JIRA fixVersion

2019-06-01 Thread Michael Sokolov
The main use I've had for this field: as a user, I want to know whether this bug or feature has been fixed or is available in the version I am using, and if not, which version I would need to upgrade to in order to get it. For this use case I think it's important to list versions on each branch it

Re: [jira] [Reopened] (LUCENE-8781) Explore FST direct array arc encoding

2019-06-11 Thread Michael Sokolov
Oh that sounds possibly bad. Thanks for reporting. I'll try to take a look soon, but am about to travel ... Giving a talk at Berlin buzzwords! So it could be some time before I can really dig into it. On Mon, Jun 10, 2019, 5:37 PM David Smiley (JIRA) wrote: > > [ >

Re: [JENKINS] Lucene-Solr-Tests-8.x - Build # 259 - Failure

2019-06-22 Thread Michael Sokolov
I think it's OK to wait until tomorrow - it must be late in JP now! On Sat, Jun 22, 2019 at 11:53 AM Tomoko Uchida wrote: > > Sorry... I'll fix it soon. > > Tomoko > > On Sun, Jun 23, 2019 at 0:46, Apache Jenkins Server wrote: > > > > Build: https://builds.apache.org/job/Lucene-Solr-Tests-8.x/259/ > > > > All

Re: Welcome Munendra S N as Lucene/Solr committer

2019-06-21 Thread Michael Sokolov
Welcome Munendra  On Fri, Jun 21, 2019, 3:50 PM MUNENDRA S N wrote: > Thanks Ishan, and thank you everyone for this opportunity > I came across Lucene/Solr when I joined as Software Engineer at Unbxd. I > have been working with Lucene/Solr from past 3 years and started > contributing from past

Re: We need developers

2019-05-12 Thread Michael Sokolov
Please take this conversation elsewhere. It's not an appropriate use of this list, which is dedicated to discussion of Lucene development. On Sat, May 11, 2019, 4:32 PM Milind Thombre wrote: > Hi Matias! > > If India is OK, I may have someone for you! > > Regards > Milind K Thombre >

Re: Welcome Michael Sokolov as Lucene/ Solr committer

2019-05-13 Thread Michael Sokolov
eryone, > > Please join me in welcoming Michael Sokolov as Lucene/ Solr committer! > > Many of you probably know Mike as he's been around for quite a while > -- answering questions, reviewing patches, providing insight and > actively working on new code. > > Congratulations

Re: Welcome Michael Sokolov as Lucene/ Solr committer

2019-05-13 Thread Michael Sokolov
Alas, I had to resort to Google Translate; Thank you! On Mon, May 13, 2019 at 4:52 PM Martin Gainty wrote: > > Удачи Майкл! [Good luck, Michael!] > > > > From: Erick Erickson > Sent: Monday, May 13, 2019 4:11 PM > To: dev@lucene.apache.org > Subject:

Re: Welcome Michael Sokolov as Lucene/ Solr committer

2019-05-13 Thread Michael Sokolov
kson >> Sent: Monday, May 13, 2019 4:11 PM >> To: dev@lucene.apache.org >> Subject: Re: Welcome Michael Sokolov as Lucene/ Solr committer >> >> Welcome Michael! >> >> > On May 13, 2019, at 2:48 PM, Dawid Weiss wrote: >> > >> >> I am pr

Re: Welcome Michael Sokolov as Lucene/ Solr committer

2019-05-13 Thread Michael Sokolov
earch Developer >> http://www.linkedin.com/in/davidwsmiley >> >> >> On Mon, May 13, 2019 at 3:12 PM Dawid Weiss wrote: >>> >>> Hello everyone, >>> >>> Please join me in welcoming Michael Sokolov as Lucene/ Solr committer! >>> >>

Re: Welcome Michael Sokolov as Lucene/ Solr committer

2019-05-13 Thread Michael Sokolov
3:12 PM Dawid Weiss wrote: >> >> Hello everyone, >> >> Please join me in welcoming Michael Sokolov as Lucene/ Solr committer! >> >> Many of you probably know Mike as he's been around for quite a while >> -- answering questions, reviewing patches, providi

Re: Welcome Michael Sokolov as Lucene/ Solr committer

2019-05-13 Thread Michael Sokolov
Thanks, Yonik! On Mon, May 13, 2019 at 5:21 PM Yonik Seeley wrote: > > Congrats Mike! > -Yonik > > > On Mon, May 13, 2019 at 3:23 PM Michael Sokolov wrote: >> >> Thanks Dawid, and thank you to everyone who voted to grant me access >> to this awesome projec

Re: Welcome Michael Sokolov as Lucene/ Solr committer

2019-05-13 Thread Michael Sokolov
> >> On Mon, May 13, 2019 at 4:52 PM Martin Gainty wrote: >>> >>> Удачи Майкл! >>> >>> >>> >>> From: Erick Erickson >>> Sent: Monday, May 13, 2019 4:11 PM >>> To: dev@lucene.apache.org >

Re: Welcome Michael Sokolov as Lucene/ Solr committer

2019-05-13 Thread Michael Sokolov
Thanks, Ishan! On Mon, May 13, 2019 at 3:26 PM Ishan Chattopadhyaya wrote: > > Congratulations and welcome, Michael! > > On Tue, May 14, 2019 at 12:53 AM Michael Sokolov wrote: > > > > Thanks Dawid, and thank you to everyone who voted to grant me access > > to t

Re: Welcome Michael Sokolov as Lucene/ Solr committer

2019-05-13 Thread Michael Sokolov
am pretty sure my first interaction with the Apache Solr/Lucene >> >> community was back in 2012, >> > >> > Yeah... I really don't know how it happened you haven't been >> > invited earlier. Everyone just kind of assumed you >> > have

Re: Welcome Michael Sokolov as Lucene/ Solr committer

2019-05-13 Thread Michael Sokolov
Ha! Thanks, Erik On Mon, May 13, 2019 at 3:28 PM Erik Hatcher wrote: > > Welcome, Michael! It’s about time :) > > > On May 13, 2019, at 15:11, Dawid Weiss wrote: > > > > Hello everyone, > > > > Please join me in welcoming Michael Sokolov as Lucene/ Solr

Re: Welcome Michael Sokolov as Lucene/ Solr committer

2019-05-13 Thread Michael Sokolov
Thanks, Tomás On Mon, May 13, 2019 at 5:57 PM Tomás Fernández Löbbe wrote: > > Welcome Michael! > > On Mon, May 13, 2019 at 2:21 PM Yonik Seeley wrote: >> >> Congrats Mike! >> -Yonik >> >> >> On Mon, May 13, 2019 at 3:23 PM Michael Sokolov wrote:

Re: [VOTE] Release Lucene/Solr 8.1.1 RC1

2019-05-22 Thread Michael Sokolov
+1 SUCCESS! [0:58:08.789057] On Wed, May 22, 2019 at 1:53 PM Andrzej Białecki wrote: > > Please vote for release candidate 1 for Lucene/Solr 8.1.1. > > The artifacts can be downloaded from: >

Re: Improve performance of FST Arc traversal

2019-04-26 Thread Michael Sokolov
Yes I think so too. Apparently this can be a ram hog if you have zillions of fields? I plan to post an issue with more details over the weekend when I will have the time to write up the results I saw. On Fri, Apr 26, 2019, 10:31 AM David Smiley wrote: > Maybe what you propose would allow

deprecations

2019-07-04 Thread Michael Sokolov
I'm curious what the process for dealing with deprecations (and their annoying compiler warnings) has been in the past? I see we have a large number of these stemming from @Deprecation of RAMOutputStream, RAMInputStream, RAMDirectory, etc, as well as various legacy DocValues classes, and probably

Re: deprecations

2019-07-04 Thread Michael Sokolov
ry has been removed already on master? > > On Thu, Jul 4, 2019, 14:56 Michael Sokolov wrote: >> >> I'm curious what the process for dealing with deprecations (and their >> annoying compiler warnings) has been in the past? I see we have a >> large number of th

Re: deprecations

2019-07-04 Thread Michael Sokolov
ed, we use forbiddenapis > > ("jdk-deprecated" signature). > > > > Uwe > > > > - > > Uwe Schindler > > Achterdiek 19, D-28357 Bremen > > https://www.thetaphi.de > > eMail: u...@thetaphi.de > > > >> -Original Message- >

Re: New feature idea - Backwards (FST) dictionary for approximate string search

2019-07-10 Thread Michael Sokolov
cations of the >>> technique. It sounds like efficient approximate (ie with some edits) >>> substring search is the main idea? I don't believe such a query exists >>> today in Lucene (nor any Suggester as far as I know). It sounds as if >>> this would be useful

Re: Lucene/Solr 8.2.0

2019-07-15 Thread Michael Sokolov
Hmm that's possible, although the jump is bigger than anything I observed while testing. I assume these charts are building off of apache/master, or something close to that? If so, then the timing is off a bit. LUCENE-8781 was pushed quite a while before that, and then

Re: significant lucene benchmark regression: JDK11?

2019-04-25 Thread Michael Sokolov
Strangely LatLonShape seems to move in the opposite direction, or was that due to a known functional change? On Thu, Apr 25, 2019 at 3:33 PM Robert Muir wrote: > > looks to me like the default garbage collector may play a part in > this? look at JIT/gc times > >

Improve performance of FST Arc traversal

2019-04-25 Thread Michael Sokolov
I've been experimenting with a new FST encoding, and the performance gains are exciting on FST-intensive benchmarks like for the suggesters and for PKLookup in luceneutil. In our production system we see some gains in regular search performance as well, although these are modest since FST lookup

Re: Improve performance of FST Arc traversal

2019-04-25 Thread Michael Sokolov
Hi Dawid, The heuristic I used was to encode using the direct-array approach when more than 1/4 of the array indices would be filled (i.e. (max-label - min-label) / num-labels < 4), and otherwise to use the existing packed array encoding. I only applied the direct encoding when we previously would
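
The density test described in this message can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name, method name and constant below are hypothetical stand-ins, not Lucene's FST code, and the `labelRange > 0` guard mirrors the commented-out condition quoted in a later "Lucene/Solr 8.2.0" message in this listing. The message is truncated before it finishes describing when the direct encoding was applied, so that extra condition is not modeled here.

```java
// Sketch only: a stand-in for the decision described in the message,
// not the actual Lucene FST implementation.
final class DirectArcHeuristic {
    // Assumed constant, matching the "< 4" threshold quoted in the message.
    static final int DIRECT_ARC_LOAD_FACTOR = 4;

    /**
     * Returns true when a node's arc labels are dense enough that more than
     * about 1/4 of a label-indexed array would be filled, i.e.
     * (maxLabel - minLabel) / numArcs < 4, written without division.
     */
    static boolean useDirectArray(int minLabel, int maxLabel, int numArcs) {
        int labelRange = maxLabel - minLabel;
        return labelRange > 0 && labelRange < DIRECT_ARC_LOAD_FACTOR * numArcs;
    }

    public static void main(String[] args) {
        // 4 arcs spread over labels 'a'..'h': range 7 < 16, dense enough.
        System.out.println(useDirectArray('a', 'h', 4));
        // 3 arcs spread over labels 0..255: range 255 >= 12, too sparse.
        System.out.println(useDirectArray(0, 255, 3));
    }
}
```

Comparing `labelRange` against `4 * numArcs` rather than dividing avoids integer-division rounding at the threshold.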

Re: Lucene/Solr 8.2.0

2019-07-15 Thread Michael Sokolov
an writeDirectly = false; // labelRange > 0 && labelRange > < Builder.DIRECT_ARC_LOAD_FACTOR * nodeIn.numArcs; > >//System.out.println("write int @pos=" + (fixedArrayStart-4) + > " numArcs=" + nodeIn.numArcs); >// create the header > > On

Re: Lucene/Solr 8.2.0

2019-07-15 Thread Michael Sokolov
AM Ignacio Vera wrote: > > The change to Lucene 8.2.0 snapshot was done on July 10th. Previous to that > the Lucene version was 8.1.0. > > On Mon, Jul 15, 2019 at 12:53 PM Michael Sokolov wrote: >> >> Hmm that's possible, although the jump is bigger than anything I &

Re: [GitHub] [lucene-solr] atris commented on issue #815: LUCENE-8213: Introduce Asynchronous Caching in LRUQueryCache

2019-08-16 Thread Michael Sokolov
OK; I guess I was confusing taskRepeatCount (within a JVM), but you can also have jvmCount On Tue, Aug 13, 2019 at 6:16 AM GitBox wrote: > > atris commented on issue #815: LUCENE-8213: Introduce Asynchronous Caching in > LRUQueryCache > URL:

Re: Failing tests

2019-08-22 Thread Michael Sokolov
Merry Christmas! On Thu, Aug 22, 2019, 9:44 AM Erick Erickson wrote: > Just for yucks, I grepped the e-mails I’ve been sending out for the number > of failing tests in the most recent 4 of Hoss’s rollups, see below. > > The drop in the last few weeks is dramatic, hope it’s a trend….. > >

precommit fail or is it me?

2019-09-06 Thread Michael Sokolov
Is anybody else seeing this error: ...workspace/lucene/lucene_baseline/build.xml:117: The following error occurred while executing this line: .../workspace/lucene/lucene_baseline/lucene/build.xml:90: The following error occurred while executing this line:

Re: precommit fail or is it me?

2019-09-06 Thread Michael Sokolov
ontinuation\*` report on your system? > > does `ant clean clean-jars` help? > > what does `git clean --dry-run -dx` say after you try to run ant clean > clean-jars? > > (it won't delete anything with --dry-run, but it might tell you if you have > unexpected stuff) > > : Dat

Re: precommit fail or is it me?

2019-09-06 Thread Michael Sokolov
I replied to a separate thread - seems that I had dangling symlinks left over from removing my Maven repo at some point in the past. I hadn't used this particular folder in a while ... By the way, `find -L solr -type l -exec rm -fr {} \;` will remove broken symlinks. On Fri, Sep 6, 2019 at 10:15
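
For anyone who would rather inspect dangling links before deleting them, here is a rough Java equivalent of the `find` command in this message. It is a sketch, not part of the Lucene build: the class and method names are hypothetical, and unlike the shell command it only lists the broken links and leaves removal to the caller.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch only: lists symbolic links under a directory whose targets are missing.
final class BrokenSymlinks {
    static List<Path> find(Path root) throws IOException {
        // Files.walk does not follow links by default, so each link is
        // visited as a link rather than as whatever it points to.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isSymbolicLink)
                // Files.exists follows the link, so it is false for a dangling one.
                .filter(p -> !Files.exists(p))
                .collect(Collectors.toList());
        }
    }
}
```

Calling `BrokenSymlinks.find(Path.of("solr"))` would return the links that the shell command would remove, assuming the same starting directory.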

Direct I/O

2019-09-16 Thread Michael Sokolov
https://bugs.openjdk.java.net/browse/JDK-8189192 makes it appear that Direct I/O is (or may be?) available now in JDKs since JDK10. Should we try using that API in NativeUnixDirectory in order to avoid JNI calls?

Are the FSDirectory javadocs inconsistent?

2019-09-07 Thread Michael Sokolov
In the class level javadoc it says, about NIOFSDirectory, "...on all other platforms [than Windows] this is the preferred choice." Later it recommends calling #open() in order to follow recommendations for your platform. Javadocs for open(), on the other hand, say "Currently this returns {@link

Re: Welcome Atri Sharma as Lucene/Solr committer

2019-09-18 Thread Michael Sokolov
Welcome Atri! On Wed, Sep 18, 2019, 3:12 AM Adrien Grand wrote: > Hi all, > > Please join me in welcoming Atri Sharma as Lucene/ Solr committer! > > If you are following activity on Lucene, this name will likely sound > familiar to you: Atri has been very busy trying to improve Lucene over >

Re: Separate dev mailing list for automated mails?

2019-08-07 Thread Michael Sokolov
big +1 -- I'm also curious why the subject lines of many automated emails (from Jira?) start with [CREATED] even though they are generated by comments or other kinds of updates (not creating a new issue). Overall, I think we have way too much comment spam. In particular Github comments are so

Re: patch review for github PRs

2019-07-20 Thread Michael Sokolov
reCommit-SOLR-Build > . > > > [1] > https://issues.apache.org/jira/browse/SOLR-10912?focusedCommentId=16380775=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16380775 > <https://issues.apache.org/jira/browse/SOLR-10912?focusedCommentId=16380775=com.atlassian.ji

Re: [VOTE] Release Lucene/Solr 8.2.0 RC1

2019-07-22 Thread Michael Sokolov
I keep getting ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] Maybe the system I am trying to run on is lacking a CA cert that is needed? On Mon, Jul 22, 2019 at 1:40 PM Kevin Risden wrote: > > +1 > > SUCCESS! [1:37:46.956358] > > Kevin Risden > > > On Mon, Jul 22, 2019 at 9:41 AM Atri Sharma

Re: [JENKINS] Lucene-Solr-NightlyTests-8.2 - Build # 5 - Unstable

2019-07-17 Thread Michael Sokolov
Ah, never mind - I found the link in the email, doh On Wed, Jul 17, 2019 at 9:26 AM Michael Sokolov wrote: > > I believe I checked in a fix for this, and saw an email from another > recent 8.2 jenkins build job that seems to have had only a single > failure (something di

patch review for github PRs

2019-07-19 Thread Michael Sokolov
Is there a way to have Yetus build a PR as we can do with patch files? - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Lucene/Solr 8.2.0

2019-07-15 Thread Michael Sokolov
e a mitigation for worst-case scenarii in 8.2 > or should we revert from branch_8_2 to keep the release process going > and work on this for 8.3? > > On Mon, Jul 15, 2019 at 5:12 PM Michael Sokolov wrote: > > > > Thanks for the nice test, Adrien. Yes, the tradeoff of di

Re: Lucene/Solr 8.2.0

2019-07-15 Thread Michael Sokolov
rollback and having a 8.3 as soon as we nail this down (even if that >> is days or 1-2 weeks after 8.2). >> >> On Mon, 15 Jul, 2019, 9:22 PM Michael Sokolov, wrote: >>> >>> I guess whether we roll back depends on timing. I think we are close >>> t

Re: [JENKINS] Lucene-Solr-NightlyTests-8.2 - Build # 5 - Unstable

2019-07-17 Thread Michael Sokolov
I believe I checked in a fix for this, and saw an email from another recent 8.2 jenkins build job that seems to have had only a single failure (something different from this Kuromoji one). I guess this nightly job started before I committed my fix, but I'd like to check the status of all the

Re: New feature idea - Backwards (FST) dictionary for approximate string search

2019-07-06 Thread Michael Sokolov
Juan, that sounds intriguing. I skimmed the paper trying to understand possible applications of the technique. It sounds like efficient approximate (ie with some edits) substring search is the main idea? I don't believe such a query exists today in Lucene (nor any Suggester as far as I know). It

Re: gradle module/project structure

2019-11-14 Thread Michael Sokolov
expanding paths as well): > > > > > > ./gradlew -p lucene/analysis test > > > > > > the difference being it'll try to run the 'test' task in any of the > > > submodules under that folder. > > > > > > Also, this will show you all available tasks a

Re: Anyone interested in the Gradle build, please comment on SOLR-13915

2019-11-16 Thread Michael Sokolov
I would start by looking at this: https://docs.gradle.org/current/userguide/application_plugin.html? On Sat, Nov 16, 2019 at 5:02 PM Erick Erickson wrote: > > Hmmm, I’ll have to start looking at this then. There may be two separate > issues here: > > 1> for developers, a convenient way to just

kicking tires of gradle build

2019-11-12 Thread Michael Sokolov
Hi I am playing around with the gradle build. Overall looks great! Thanks to everyone who has been pushing this forward. I have a few questions; maybe just gradle noob questions, since I haven't used it much (except as part of Android Studio, where all the details are kind of taken care of for

Re: kicking tires of gradle build

2019-11-12 Thread Michael Sokolov
ah as soon as I sent, I realized that failed test output goes to the console, so all is well on that front. There are no dumb questions, right? On Tue, Nov 12, 2019 at 9:11 AM Michael Sokolov wrote: > > Hi I am playing around with the gradle build. Overall looks great! > Thanks to eve

Re: kicking tires of gradle build

2019-11-12 Thread Michael Sokolov
AM Dawid Weiss wrote: > > Also, I wrote a short help for running tests under Gradle branch -- if > it's of some help to you, please read it! > > https://github.com/apache/lucene-solr/blob/jira/SOLR-13452_gradle_7/buildSrc/common/help-text/testHelp.txt > > On Tue, Nov 12, 20

Re: BadApple

2019-11-18 Thread Michael Sokolov
I ran TestFstDirectAddressing.testDeDupeTails > 100 times with no failure, both on master and on branch_8x (on an Ubuntu w/a couple of different JDKs). Can you tell which branch/jvm/os is seeing the failures for that test? I couldn't tell from the attached report. On Mon, Nov 18, 2019 at 11:28 AM

Re: BadApple

2019-11-18 Thread Michael Sokolov
Perhaps the failures were reported prior to the recent fix? https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=359864c On Mon, Nov 18, 2019 at 5:15 PM Michael Sokolov wrote: > > I ran TestFstDirectAddressing.testDeDupeTails > 100 times with no > failure, both on master and

Re: Lucene/Solr 8.4

2019-11-22 Thread Michael Sokolov
+1 from me - does this mean you (Adrien) are volunteering to be RM? On Fri, Nov 22, 2019 at 9:01 AM Erick Erickson wrote: > > +1 > > > On Nov 22, 2019, at 5:10 AM, Ignacio Vera wrote: > > > > +1 > > > > On Fri, Nov 22, 2019 at 10:56 AM jim ferenczi > > wrote: > > +1 > > > > Le ven. 22 nov.

Re: Welcome Bruno Roustant as Lucene/Solr committer

2019-11-23 Thread Michael Sokolov
Welcome Bruno! On Sat, Nov 23, 2019, 3:46 AM Dawid Weiss wrote: > Hi and welcome, Bruno. > Dawid > > On Sat, Nov 23, 2019 at 9:29 AM Adrien Grand wrote: > > > > Hi all, > > > > Please join me in welcoming Bruno Roustant as the latest Lucene/Solr > committer! > > > > It didn't take many JIRA

Re: GitHub UI: only want "squash" merge button

2019-10-25 Thread Michael Sokolov
Looks good except are we sure we do not want rebase? On Fri, Oct 25, 2019, 5:51 PM David Smiley wrote: > Infra informed me this is now self-service via adding a ".asf.yaml" file. > I looked a the docs and I see this is also an opportunity to set the GitHub > project's description, homepage
