Hi,
Robert and I worked a bit on our analyzers again. Starting now, on main
and the 9.x branch, the well-known integration test "TestRandomChains"
(complemented by "TestAllAnalyzersHaveFactories") has been lifted to the next
stage: it uses the Java Module System (yeaaah) to discover all TokenFilters,
nixFileSystemProvider.java:369)
> at java.base/java.nio.file.Files.getFileStore(Files.java:1492)
> at org.apache.solr.handler.IndexFetcher.getUsableSpace(IndexFetcher.java:1046)
>
>
>
>
> On Sun, Oct 18, 2020, 11:06 AM Michael Sokolov wrote:
>>
>> I ran a full test
java:1046)
On Sun, Oct 18, 2020, 11:06 AM Michael Sokolov wrote:
> I ran a full test suite this morning and got 51 test failures, all in
> solr.core. It looks like the same has been happening in Jenkins for
> the last couple of days
> https://ci-builds.apache.org/job/Lucene/job/Lu
I ran a full test suite this morning and got 51 test failures, all in
solr.core. It looks like the same has been happening in Jenkins for
the last couple of days
https://ci-builds.apache.org/job/Lucene/job/Lucene-Solr-Check-master/.
Based on timing, the only commit that seemed likely was
LUCENE
Sometime in the last day or so we’ve had a bad uptick in test failures, see:
http://fucit.org/solr-jenkins-reports/failure-report.html, especially for the
last 24 hours.
Fail rate Test
153% CdcrVersionReplicationTest
147% TestReplicationHandlerDiskOverflow
146
r affects them
(cherry picked from commit c7822c393e6affa77c233f9e8e9bf9d8aeb12578)
(cherry picked from commit 0291db44bc8e092f7cb2f577f0ac8ab6fa6a5fd7)
> Sporadic Auth + Cloud test failures, probably due to lag in nodes reloading
>
mmit 0291db44bc8e092f7cb2f577f0ac8ab6fa6a5fd7 in lucene-solr's branch
refs/heads/master from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=0291db4 ]
SOLR-13464: fix javadoc typo that precommit somehow missed?
> Sporadic Auth + Cloud test failures, probably d
r affects them
> Sporadic Auth + Cloud test failures, probably due to lag in nodes reloading
> security config
> ---
>
> Key: SOLR-13464
> URL: https://issues.
om commit 878d332a0bd7374190a85a23d3a6241d930289f3)
> Sporadic Auth + Cloud test failures, probably due to lag in nodes reloading
> security config
> ---
>
> Key: SOLR-13464
> URL: https://is
mmit 878d332a0bd7374190a85a23d3a6241d930289f3 in lucene-solr's branch
refs/heads/master from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=878d332 ]
Harden BasicAuthIntegrationTest w/work around for SOLR-13464
> Sporadic Auth + Cloud test failures, probabl
're moved.
Hey, but when I do get moved, if all goes well there _may_ be a fiber
connection, won't I be fast then.
> Clean up any test failures resulting from defaulting to async logging
> -
>
>
sSysoutChecks or run with -Dtests.verbose=true
[junit4]>at
__randomizedtesting.SeedInfo.seed([6B010C34AB756A8C]:0)
[junit4]>at java.base/java.lang.Thread.run(Thread.java:835)
{noformat}
> Cle
transport http2 module to 9.4.19 (which
contains the fix). The tests passed, and Hoss's command for running the failed
test also passed.
[~hossman] I'm planning to commit this patch to branch_8x and master as well, to
see the results of the Jenkins runs. Does that make sense?
> suspicious
ased:
https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.19.v20190610
> suspicious test failures caused by jetty TimeoutException related to using
> HTTP2
>
>
>
sion 9.4.19 will contain the fix. But I'm
not sure when it will be released.
{quote}
is there any work around we can do in solr code in the meantime?
{quote}
The only idea I can come up with is copying the fixed Jetty classes (around 5 or
6 of them) into Solr and using them there.
> suspicious test failur
sion of jetty will have this fix
and/or when it will be publicly available?
is there any work around we can do in solr code in the meantime? (it seemed
pretty low level based on the jetty commit, but i figured i'd ask)
> suspicious test failures caused by jetty TimeoutException rela
ate {{GET /admin/auth...}} to every (live?) node in the
cluster, and include map of nodeName => enabled.version for every node ...
maybe?)
: >
: > Thoughts?
: >
: >> Sporadic Auth + Cloud test failures, probably due to lag in nodes
reloading security config
: >
uld potentially take things even a step further, and add something like
> a {{verify.cluster.version=true|false}} option to SecurityConfHandlerZk, that
> would federate {{GET /admin/auth...}} to every (live?) node in the cluster,
> and include map of nodeName => enabled.version fo
some attention got
paid to this issue and it didn't get lost in the process.
Given that I'm pretty sure it's a function of the test framework and it's
become quite rare even in the test situations, I don't think it needs to be a
blocker.
> Clean up any test failures
but has no Fix Version.??
> Clean up any test failures resulting from defaulting to async logging
> -
>
> Key: SOLR-13268
> URL: https://issues.apache.org/jira/browse/SOLR-13268
>
ntially take things even a step further, and add something like a
{{verify.cluster.version=true|false}} option to SecurityConfHandlerZk, that
would federate {{GET /admin/auth...}} to every (live?) node in the cluster, and
include map of nodeName => enabled.version for every node ...
>at
__randomizedtesting.SeedInfo.seed([C9DDA6254F404BB5:75B3D037EB13C8CF]:0)
[junit4]>at
org.apache.solr.security.BasicAuthIntegrationTest.testBasicAuth(BasicAuthIntegrationTest.java:204)
Hoss Man created SOLR-13464:
---
Summary: Sporadic Auth + Cloud test failures, probably due to lag
in nodes reloading security config
Key: SOLR-13464
URL: https://issues.apache.org/jira/browse/SOLR-13464
cene test case directly. Although you then *still* see these
fails and find out that Hadoop tests redefine thread leak stuff so that they
only use the hadoop stuff and not the base solr thread leak stuff. With all of
that addressed, these go away.
> Clean up any test failures resulting from de
this is a real bug in the system,
existing in 8.0.
> suspicious test failures caused by jetty TimeoutException related to using
> HTTP2
>
>
> Key: SOLR-13413
> URL: htt
[
https://issues.apache.org/jira/browse/SOLR-13413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gus Heck updated SOLR-13413:
Affects Version/s: 8.0
> suspicious test failures caused by jetty TimeoutException related to us
down!
> suspicious test failures caused by jetty TimeoutException related to using
> HTTP2
>
>
> Key: SOLR-13413
> URL: https://issues.apache.org/
.com/eclipse/jetty.project/issues/3605
> suspicious test failures caused by jetty TimeoutException related to using
> HTTP2
>
>
> Key: SOLR-13413
> URL: https://issues.
[
https://issues.apache.org/jira/browse/SOLR-13413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cao Manh Dat reassigned SOLR-13413:
---
Assignee: Cao Manh Dat
> suspicious test failures caused by jetty TimeoutException rela
mmit f08ddbc713b8fa528307c6c1c48e2522e7c220f8 in lucene-solr's branch
refs/heads/branch_8x from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f08ddbc ]
SOLR-13268: Clean up any test failures resulting from defaulting to async
logging
(cherry picked fr
mmit 48dc020ddaf0b0911012b4d9b77d859b2af3d3ae in lucene-solr's branch
refs/heads/master from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=48dc020 ]
SOLR-13268: Clean up any test failures resulting from defaulting to async
logging
> Clean up any test failur
l
files. Does it need to be synchronous? I've changed it to async in the push I'm
about to do, but I can change it back.
> Clean up any test failures resulting from defaulting to async logging
> -
>
mmit c5339888749200a95e4177ff6ec9413ab83861e5 in lucene-solr's branch
refs/heads/master from Cao Manh Dat
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c533988 ]
SOLR-13413: Adding debug log for HttpConnection
> suspicious test failures caused by jetty TimeoutException rela
Hoss Man created SOLR-13413:
---
Summary: suspicious test failures caused by jetty TimeoutException
related to using HTTP2
Key: SOLR-13413
URL: https://issues.apache.org/jira/browse/SOLR-13413
Project: Solr
is happening too late and/or I need to move
the bit that (apparently) reroutes the logging. But that's wild speculation and
I may be totally off base.
[~janhoy] You've been into some of the logging, any clue?
> Clean up any test
.7.0 http://yetus.apache.org |
This message was automatically generated.
> Clean up any test failures resulting from defaulting to async logging
> -
>
> Key: SOLR-13268
>
mmit 3ac07b8dfb00322a291c61c480f7410afd5e4ee9 in lucene-solr's branch
refs/heads/master from erick
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3ac07b8 ]
SOLR-13268: Patch that flushes when shutting down
> Clean up any test failures resulting from defaulting to a
mmit 0894e39cebbc349968bee024b82792f1d433bbd7 in lucene-solr's branch
refs/heads/branch_8x from erick
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=0894e39 ]
SOLR-13268: Patch that flushes when shutting down
> Clean up any test failures resulting from defaulting to a
ase, but that's certainly open for discussion.
> Clean up any test failures resulting from defaulting to async logging
> -
>
> Key: SOLR-13268
> URL: https://issue
mmit a87ba09e11a9bce4e5d97e3f713afe261112efc5 in lucene-solr's branch
refs/heads/branch_8x from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a87ba09 ]
SOLR-13268: Clean up any test failures resulting from defaulting to async
logging. Put TestXmlQParser back
(cher
mmit b893548d97f8b04b40dfbebd79bd860603b92c63 in lucene-solr's branch
refs/heads/master from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b893548 ]
SOLR-13268: Clean up any test failures resulting from defaulting to async
logging. Put TestXmlQParser back
>
h the test. And besides, I've lost it
> Clean up any test failures resulting from defaulting to async logging
> -
>
> Key: SOLR-13268
> URL: https://issues.apache.org/j
wondered if it shouldn't just be removed.
{quote}
Could you share details of what the reproducible failure is? I'd be curious
what it is. And yeah if it's not too much trouble to fix then it likely would
be worthwhile fixing IMHO.
> Clean up any test failures resulting
mmit 132f5433103a121d400f071c5739a9af318f4807 in lucene-solr's branch
refs/heads/branch_8x from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=132f543 ]
SOLR-13268: Clean up any test failures resulting from defaulting to async
logging. Fix precommit test to not in
mmit 3a1603dab349fa828217fe1146e73c6a5f1a4fcf in lucene-solr's branch
refs/heads/master from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3a1603d ]
SOLR-13268: Clean up any test failures resulting from defaulting to async
logging. Fix precommit test to not in
http://yetus.apache.org |
This message was automatically generated.
> Clean up any test failures resulting from defaulting to async logging
> -
>
> Key: SOLR-13268
> URL: https://iss
mmit 277251c20297ece755267f9bf70013b563854bb9 in lucene-solr's branch
refs/heads/branch_8x from erick
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=277251c ]
SOLR-13268: Clean up any test failures resulting from defaulting to async
logging
(cherry picked from commit 9272c29)
>
mmit 9272c295392a6e95394655b7abec8d75518c19d4 in lucene-solr's branch
refs/heads/master from erick
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9272c29 ]
SOLR-13268: Clean up any test failures resulting from defaulting to async
logging
> Clean up any test failures re
pen for a while.
[~cpoerschke] I went ahead and removed TestXmlQParser, let me know if we need
to add it back.
> Clean up any test failures resulting from defaulting to async logging
> -
>
>
[
https://issues.apache.org/jira/browse/SOLR-13268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Erick Erickson updated SOLR-13268:
--
Attachment: SOLR-13268.patch
> Clean up any test failures resulting from defaulting to as
ook like it was intended to do anything solr-ish.
It relates to this JIRA, I have a reproducible seed on this test class. I can
fix it without too much trouble, but when I looked at this I wondered if it
shouldn't just be removed.
> Clean up any test failures resulting
ttps://builds.apache.org/job/PreCommit-SOLR-Build/330/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |
This message was automatically generated.
> Clean up any test failures resulting from defaulting to async
d the tests that currently subclass LuceneTestCase. Comments?
> Clean up any test failures resulting from defaulting to async logging
> -
>
> Key: SOLR-13268
> URL: https://issues.apache.org/
the
logger. Should not need to call it anywhere else. SolrTestCaseJ4 could extend
SolrTestCase and it would get the shutdown for free?
> Clean up any test failures resulting from defaulting to async
the same gremlin, the "smoking gun" is "at
com.lmax.disruptor". I should have a patch later today that I think addresses
all of these, the question will be whether it's the right approach.
> Clean up any test failures resul
ead leaked from SUITE
scope at org.apache.solr.request.TestFacetMethods:
[junit4]>1) Thread[id=13, name=Log4j2-TF-1-AsyncLoggerConfig-1,
state=TIMED_WAITING, group=TGRP-TestFacetMethods]
[junit4]> at sun.misc.Unsafe.park(Native Method)
{quote}
> Clean up an
s out that somehow when I copy/pasted, there was a high-ascii byte 'c2'
after "-Dtests.assert" that was invisible that I was only able to see in a hex
editor... About drove me nuts.
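
An invisible byte like this can also be found without a hex editor. The following is a minimal sketch, not anything from the Solr build: the class name `NonAsciiScan` is hypothetical, and the pasted string assumes the stray character was a no-break space (U+00A0, encoded in UTF-8 as `c2 a0`), which the message above does not confirm.

```java
import java.nio.charset.StandardCharsets;

public class NonAsciiScan {
    /** Returns the index of the first char outside printable ASCII, or -1 if there is none. */
    public static int firstNonAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c > 0x7E) {
                return i;
            }
        }
        return -1;
    }

    /** Formats the UTF-8 bytes of s as space-separated lowercase hex. */
    public static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: the pasted argument ended in a no-break space (U+00A0).
        String pasted = "-Dtests.assert\u00A0";
        int at = firstNonAscii(pasted);
        System.out.println("first non-ASCII index: " + at);
        System.out.println("offending bytes: " + utf8Hex(pasted.substring(at)));
    }
}
```

Running a suspicious command line through a check like this before pasting it into a shell makes the stray character visible right away.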
> Cl
in a hex
editor... About drove me nuts.
> Clean up any test failures resulting from defaulting to async logging
> -
>
> Key: SOLR-13268
> URL: https://issues.apache.org/j
the LogManager.
> Clean up any test failures resulting from defaulting to async logging
> -
>
> Key: SOLR-13268
> URL: https://issues.apache.org/jira/browse/SOLR-13268
> Project:
fail almost all the time, with or without
the seed. I've cleaned my ivy cache on the off chance there's something weird
there and am verifying. So it's apparently not something peculiar to the seed or
randomization.
> Clean up any test failures resulting from defau
m wondering if there's some weirdness with the slf4j....
> Clean up any test failures resulting from defaulting to async logging
> -
>
> Key: SOLR-13268
> URL: https://issues
's still there. I can still work with the seeds
that reliably fail of course.
> Clean up any test failures resulting from defaulting to async logging
> -
>
> Key: SOLR-13268
> URL: h
ly either on the public Jenkins servers. I've
still seen a few failures locally based on the master branch.
> Clean up any test failures resulting from defaulting to async logging
> -
>
> Key: SOLR-1
the last day, however the seeds still reproduce.
[~krisden] I've been fiddling around with my mail filters, so wanted to check
if you've seen any errors reported. I see other test errors so I'm guessing
it's just luck but wanted to check.
> Clean up any test failures re
f" pop out.
On the other other hand, I wonder if proper shutdown in general (and I'm
grasping at straws) might account for some other test failures due to leaked
objects. Doubtful, but I can always hope.
I do want to look at the reproducing seeds and see if I can understand _why_
there'
roperly.
> Clean up any test failures resulting from defaulting to async logging
> -
>
> Key: SOLR-13268
> URL: https://issues.apache.org/jira/browse/SOLR-13268
>
blem like you had said previously. Since I
think tests are the only place the JVM is not shut down between uses of the
LogManager.
> Clean up any test failures resulting from defaulting to async logging
> -
>
>
nous logging, which I also think is yucky.
If we do something like <1>, I'd like to have it become a precommit failure if
possible to keep it from creeping back.
I suppose in any extreme cases we can "somehow" allow a test case to subclass
LuceneTestCase, but I can'
or.java:125)
[junit4] > at java.lang.Thread.run(Thread.java:748)
[junit4] > at __randomizedtesting.SeedInfo.seed([1385A6FD46415A7F]:0)
[junit4] Completed [1/1 (1!)] in 26.39s, 6 tests, 2 errors <<<
ntly is how is the LogManager getting
shut down in the synchronous case? I don't see any LogManager.shutdown() calls
and yet everything behaves properly. Maybe only the async case spins up
background threads and that's why the shutdown has to be handled correctly?
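
The guess above can be illustrated without Log4j at all. This is a minimal sketch in plain Java under stated assumptions: the class `BackgroundThreadLeak` and the thread name are hypothetical, and the idle worker only stands in for an async logger's background thread. It shows why code that starts such a thread needs an explicit stop before a leak check runs, while purely synchronous code has nothing to stop.

```java
public class BackgroundThreadLeak {
    /** Starts a worker that idles until interrupted, standing in for an async logger thread. */
    public static Thread startWorker(String name) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(50); // idle, waiting for work that never arrives
                }
            } catch (InterruptedException e) {
                // The interrupt is the shutdown signal: fall through and let the thread exit.
            }
        }, name);
        t.setDaemon(true); // do not keep the JVM alive if nobody stops the worker
        t.start();
        return t;
    }

    /** Signals the worker to stop and waits up to five seconds; returns true if it exited. */
    public static boolean stopWorker(Thread t) {
        t.interrupt();
        try {
            t.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        Thread worker = startWorker("hypothetical-async-logger");
        // Without stopWorker(), a leak checker would still see this thread after the test.
        System.out.println("alive before shutdown: " + worker.isAlive());
        System.out.println("stopped cleanly: " + stopWorker(worker));
    }
}
```

Whether Log4j's synchronous path really avoids background threads is the open question in the message; the sketch only shows the general mechanism.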
> Clean up any t
Clean up any test failures resulting from defaulting to async logging
> -
>
> Key: SOLR-13268
> URL: https://issues.apache.org/jira/browse/SOLR-13268
> Project: Solr
lean up any test failures resulting from defaulting to async logging
> -
>
> Key: SOLR-13268
> URL: https://issues.apache.org/jira/browse/SOLR-13268
> Project: Solr
>
sensitive to a logger being defined (or not). But I
admit I have no idea _how_ to make that happen.
But that would give the Lucene devs heartburn; Lucene doesn't use logging.
H2. Is it really that onerous to require that test cases extend
SolrTestCaseJ4 rather than LuceneTestCase?
> Clea
s is if the FIRST test in the test JVM
doesn't have "LoggerFactory.getLogger(", then there will be a failure. All the
failures I've seen lately have been very early in the test runs.
> Clean up any test failures resulting from
think adding "private static final Logger log =
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());" to tests is the
correct answer but it definitely is an interesting finding.
> Clean up any test fail
't be able to look at this at least for the
rest of the day.
The shutdown bits changed some tests from always failing to passing, but I
suspect that it was just masking a problem. Especially since the tests pass
with that bit commented out. So I'll probably back those all out...
>
d [1/844 (1!)] on J2 in 38.09s, 4 tests, 2 errors <<<
FAILURES!
{code}
> Clean up any test failures resulting from defaulting to async logging
> -
>
> Key: SOLR-13268
>
com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
[junit4]> at java.lang.Thread.run(Thread.java:748)
[junit4]>at
__randomizedtesting.SeedInfo.seed([3EF531368FC72D9F]:0)
[junit4] Completed [8/844 (1!)] on J1 in 24.65s, 1 test, 2 errors <<
ic start/stop handling.
> Clean up any test failures resulting from defaulting to async logging
> -
>
> Key: SOLR-13268
> URL: https://issues.apache.org/jira/browse/SOLR-13268
I'm not
sure I see any _practical_ (as opposed to a test issue) consequences. And
certainly users can enable it on their own.
Or leave the config for async logging commented out (the opposite of now). Not
sure I like that one, but....
Thoughts?
> Clean up any test fail
the shutdown and test class changes
removed in https://github.com/apache/lucene-solr/pull/586. Tests pass with the
cleanup as well.
> Clean up any test failures resulting from defaulting to async logging
> -
>
>
og4j thread leak
On JDK 11 in ~30 runs - 4 failures due to Log4j thread leak
Much better than JDK 8/11 on Apache master branch which was failing almost
every other run.
> Clean up any test failures resulting from defaulting to async
mmit a108b4f730c5404ec7f798fb9d01da1ff0587070 in lucene-solr's branch
refs/heads/branch_8x from erick
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a108b4f ]
SOLR-13268: Clean up any test failures resulting from SOLR-12055 (async
logging). Kevin's upgrades
(cherry pick
[
https://issues.apache.org/jira/browse/SOLR-13268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Erick Erickson updated SOLR-13268:
--
Summary: Clean up any test failures resulting from defaulting to async
logging (was: Clean up
mmit fe5a96a284cf2f75b25114c20299d5bb4de843a3 in lucene-solr's branch
refs/heads/master from erick
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fe5a96a ]
SOLR-13268: Clean up any test failures resulting from SOLR-12055 (async
logging). Kevin's upgrades
> Clean up
ternoon, so if this does horrible things anyone who
cares to should feel free to revert.
> Clean up any test failures resulting from SOLR-12055 (async logging)
>
>
> Key: SOLR-13268
> URL
[
https://issues.apache.org/jira/browse/SOLR-13268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Erick Erickson updated SOLR-13268:
--
Attachment: SOLR-13268.patch
> Clean up any test failures resulting from SOLR-12055 (as
the additions to StartupLoggingUtils
in particular is cruft.
All bets are off of course if we still have problems, I'll un-comment the
shutdown in StartupLoggingUtils in that case and we'll see.
Wish I could get this stuff to fail locally. I beasted 1,000 times with a bunch
12 in parall
hether my shutdown in StartupLoggingUtils is necessary
in jdk8 with these upgrades, mostly out of curiosity. I'm not going to be able
to push anything tonight, maybe tomorrow morning.
> Clean up any test failures resulting from SOLR-120
rd Apache
license from https://logging.apache.org/log4j/2.0/license.html? If so I'll put
this in and push this all sometime tonight...
> Clean up any test failures resulting from SOLR-12055 (async logging)
>
>
>
ense/notice.
I haven't seen any log4j thread leaks on JDK 8 in ~15 runs. I saw two log4j
thread leak failures out of 12 runs so far for JDK 11.
> Clean up any test failures resulting from SOLR-12055 (a
m getting a precommit failure that there's no
license for log4j-web-LICENSE.. Is this just a copy of the standard Apache
license from https://logging.apache.org/log4j/2.0/license.html? If so I'll put
this in and push this all sometime tonight...
> Clean up any test failures res
nges and run the test suites and if all that
works do you want to push it or shall I?
And thanks a million for helping here...
> Clean up any test failures resulting from SOLR-12055 (async logging)
>
>
>
ttps://github.com/apache/lucene-solr/pull/586] to try the changes I mentioned
above.
* Upgrade log4j2 from 2.11.0 to 2.11.2
* Upgrade lmax from 3.4.0 to 3.4.2
* Add log4j-web dependency based on LOG4J2-1259 and
https://logging.apache.org/log4j/2.0/manual/webapp.html
> Clean up any test f
ross LOG4J2-1259. Near the end it
mentioned log4j-web. https://logging.apache.org/log4j/2.0/manual/webapp.html
> Clean up any test failures resulting from SOLR-12055 (async logging)
>
>
>
ier.waitFor(ProcessingSequenceBarrier.java:56)
[junit4]> at
app//com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
[junit4]> at
java.base@11.0.2/java.lang.Thread.run(Thread.java:834)
[junit4]>at
__randomizedtestin
at
__randomizedtesting.SeedInfo.seed([466988C6E1830B46]:0)
[junit4] Completed [6/844 (1!)] on J0 in 21.23s, 7 tests, 2 errors <<<
FAILURES!
{code}
> Clean up any test failures resulting from SOLR-12055 (async logging)
>
ttps://github.com/LMAX-Exchange/disruptor/releases
Upgrading LMAX Disruptor to 3.4.2 was in the log4j 2.11.1 release notes. We
could also upgrade log4j2 2.11.0 -> 2.11.2.
Not sure if these would help but probably wouldn't hurt either.
> Clean up any test failures resulting from SOLR-12055
a:56)
[junit4]> at
com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
[junit4]> at java.lang.Thread.run(Thread.java:748)
[junit4]>at
__randomizedtesting.SeedInfo.seed([44A484596B614BAD]:0)
[junit4] Completed [4/844 (1!)] on J0 in 27.
Erick Erickson created SOLR-13268:
-
Summary: Clean up any test failures resulting from SOLR-12055
(async logging)
Key: SOLR-13268
URL: https://issues.apache.org/jira/browse/SOLR-13268
Project: Solr