[ 
https://issues.apache.org/jira/browse/LUCENE-6576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-6576.
---------------------------------
    Resolution: Invalid

Bad RAM chip after all.

> possible index corruption with java 8u45
> ----------------------------------------
>
>                 Key: LUCENE-6576
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6576
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>
> Recently, I've experienced sporadic corruptions when trying to index 
> Wikipedia in the benchmark. I know [~mikemccand] hit similar problems in the 
> nightly benchmark, and he also has an older CPU (see below for more on this).
> I am using this Python script (compliments of Mike) to index Wikipedia in a 
> loop, tweaked for lots of threads and heavy merging so it fails faster 
> (http://pastebin.com/jwpdELDe). I get corruptions constantly, though sometimes 
> it takes a few iterations.
> The errors look like this, where the bytes we write "seem to be fine" but the 
> CRC32 itself is maybe computed incorrectly at *write time*:
> {quote}
> Exception in thread "Thread-0" java.lang.RuntimeException: 
> org.apache.lucene.index.CorruptIndexException: checksum failed (hardware 
> problem?) : expected=e2b2d8f5 actual=a04da0c 
> (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/data/corrumption_playground/index/_1p_Lucene50_0.tim")))
>       at perf.IndexThreads$IndexThread.run(IndexThreads.java:402)
> Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed 
> (hardware problem?) : expected=e2b2d8f5 actual=a04da0c 
> (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/data/corrumption_playground/index/_1p_Lucene50_0.tim")))
> {quote}
> This happens with different file extensions (.tip, .tim, .pos, .doc, .dvd, 
> ...). Whenever one of these corrupted files was included in a commit point, 
> I've run "the rest of CheckIndex" minus the CRC32 check and it always passes, 
> but that is no guarantee that a bad write-time checksum is what is happening.
> I think the bugs may, for some reason, be easier to reproduce on my CPU, 
> maybe because it's older and only has AVX1, or for some other reason:
> {quote}
> model         : 42
> model name    : Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
> flags         : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp 
> lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc 
> aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 
> cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx 
> lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept 
> vpid
> {quote}
> Other notes:
> * Does not need multiple threads. I used them only to make the "test" fail 
> faster. It will fail sometimes with maxBufferedDocs + SerialMergeScheduler + 
> 1 thread, which is deterministic.
> * Have not tested JDK9 in any way; this might be some already-fixed bug.
> * I've run numerous hardware tests: memory, disk, etc. 
> * I've run the tests with two different SSD drives: both fail.
> First step: clean up this script and make it so it can be reproduced on other 
> hardware. I can try on my laptop as well.
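
The checksum failure quoted above compares a stored value ("expected") against a CRC32 recomputed from the bytes read back ("actual"). The following is a minimal Python sketch of that idea, not Lucene's on-disk format: the 4-byte footer layout and the helper names `write_with_footer`, `verify`, and `flip_bit` are assumptions made for illustration. It shows that a single flipped bit, such as a faulty RAM chip might cause, trips the check whether it lands in the data or in the stored checksum, and that in the second case the data bytes themselves are still intact.

```python
import struct
import zlib
from typing import Tuple


def write_with_footer(payload: bytes) -> bytes:
    """Append the CRC32 of the payload as a 4-byte big-endian footer (illustrative layout)."""
    return payload + struct.pack(">I", zlib.crc32(payload) & 0xFFFFFFFF)


def verify(blob: bytes) -> Tuple[bool, int, int]:
    """Return (ok, expected, actual): the stored footer value vs. the recomputed CRC32."""
    payload, footer = blob[:-4], blob[-4:]
    (expected,) = struct.unpack(">I", footer)
    actual = zlib.crc32(payload) & 0xFFFFFFFF
    return expected == actual, expected, actual


def flip_bit(blob: bytes, byte_index: int, bit: int) -> bytes:
    """Simulate a single-bit memory error at the given byte and bit position."""
    damaged = bytearray(blob)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


good = write_with_footer(b"term dictionary bytes" * 1000)
bad_payload = flip_bit(good, 10, 3)             # bit flipped in the data
bad_footer = flip_bit(good, len(good) - 1, 0)   # bit flipped in the checksum itself

for name, blob in [("good", good), ("bad payload", bad_payload), ("bad footer", bad_footer)]:
    ok, expected, actual = verify(blob)
    print(f"{name}: ok={ok} expected={expected:x} actual={actual:x}")
```

In the "bad footer" case the payload is untouched, so any structural check that ignores the checksum would still pass, which is consistent with "the rest of CheckIndex" passing in the report. The checksum alone cannot tell the two cases apart.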



