Failed CI verify (was Re: [VOTE] Apache Accumulo 1.6.1 RC1)

2014-09-26 Thread Josh Elser
Welp, after 8 hrs of memtest86+ with no errors, followed by 4B CI (~11hrs) with 2 tservers + random manual `kill -9`'ing (same characteristics as the first run), I just had a clean verify. REFERENCED=4034576211 UNREFERENCED=1000161 I did update to a newer version of 2.6.0-SNAPSHOT and updated

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-26 Thread William Slacum
Ah, that's a thought to think about. The conclusion I came to was made specifically because the vote had ended, so idk if it would've helped. Of course, actually participating on my end would've been the best course of action. On Fri, Sep 26, 2014 at 8:12 AM, Christopher wrote: > No, not after the

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-26 Thread Christopher
No, not after the vote closes. I was trying to say that the concerns you expressed might have had greatest impact if they were expressed with a -1 while the vote was open. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Fri, Sep 26, 2014 at 12:40 AM, William Slacum < wilhelm.von.cl...@

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-25 Thread William Slacum
Can you do that after the vote closed? Corey did some good stuff in documenting our release process, so I'm confident these releases can be iterated on faster now, which would mitigate this situation. On Thu, Sep 25, 2014 at 9:31 PM, Christopher wrote: > Sorry, reply was to Bill. I know GMail do

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-25 Thread Christopher
Sorry, reply was to Bill. I know GMail doesn't thread well, so top-posting is problematic. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Thu, Sep 25, 2014 at 9:28 PM, Corey Nolet wrote: > Christopher, are you referring to Keith's last comment or Bill Slacum's? > > On Thu, Sep 25, 2

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-25 Thread Corey Nolet
Christopher, are you referring to Keith's last comment or Bill Slacum's? On Thu, Sep 25, 2014 at 9:13 PM, Christopher wrote: > That seems like a reason to vote -1 (and perhaps to encourage others to do > so also). I'm not sure this can be helped so long as people have different > criteria for th

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-25 Thread Christopher
That seems like a reason to vote -1 (and perhaps to encourage others to do so also). I'm not sure this can be helped so long as people have different criteria for their vote, though. If we can fix those issues, I'm ready to vote on a 1.6.2 :) -- Christopher L Tubbs II http://gravatar.com/ctubbsii

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-25 Thread Keith Turner
I ran 24 hr of random walk against 1.6.1. I saw ACCUMULO-3169, ACCUMULO-3170, and ACCUMULO-3171. I feel like these are not new in 1.6.1, but have not investigated in depth yet. On Fri, Sep 19, 2014 at 10:49 PM, Corey Nolet wrote: > Devs, > > Please consider the following candidate for Apache Ac

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-25 Thread William Slacum
I'm a little concerned we had two +1's that mention failures. The one time when we're supposed to have a clean run through, we have 50% of the participators noticing failure. It doesn't instill much confidence in me. On Thu, Sep 25, 2014 at 2:18 PM, Josh Elser wrote: > Please make a ticket for i

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-25 Thread Josh Elser
Please make a ticket for it and supply the MAC directories for the test and the failsafe output. It doesn't fail for me. It's possible that there is some edge case that you and Bill are hitting that I'm not. Corey Nolet wrote: I'm seeing the behavior under Mac OS X and Fedora 19 and they hav

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-25 Thread Corey Nolet
I'm seeing the behavior under Mac OS X and Fedora 19 and they have been consistently failing for me. I'm thinking ACCUMULO-3073. Since others are able to get it to pass, I did not think it should fail the vote solely on that but I do think it needs attention, quickly. On Thu, Sep 25, 2014 at 10:43

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-25 Thread Bill Havanki
I haven't had an opportunity to try it again since my +1, but prior to that it has been consistently failing. - I tried extending the timeout on the test, but it would still time out. - I see the behavior on Mac OS X and under CentOS. (I wonder if it's a JVM thing?) On Wed, Sep 24, 2014 at 9:06 P

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-24 Thread Corey Nolet
Vote passes with 4 +1's and no -1's. Bill, were you able to get the IT to run yet? I'm still having timeouts on my end as well. On Wed, Sep 24, 2014 at 1:41 PM, Josh Elser wrote: > The crux of it is that both of the errors in the CRC were single bit > "variants". > > y instead of 9 and p inst

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-24 Thread Josh Elser
The crux of it is that both of the errors in the CRC were single bit "variants". y instead of 9 and p instead of 0 Both of these cases are a '1' in the most significant bit of the byte instead of a '0'. We recognized these because y and p are outside of the hex range. Fixing both of these fi
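The single-bit observation can be checked mechanically. The following is a minimal Python sketch, not part of Accumulo: the function names are hypothetical, and the two sample fields are copied from the corrupt values quoted elsewhere in this thread. For every character outside the lowercase hex range it tries each of the eight possible single-bit flips and reports the ones that land back on a hex digit. For both `y` and `p` the only such flip is the 0x40 bit (bit 6, the top bit of 7-bit ASCII).

```python
HEX_DIGITS = set("0123456789abcdef")

def single_bit_repairs(ch):
    """Return (bit_index, repaired_char) for each one-bit flip of ch that yields a hex digit."""
    repairs = []
    for bit in range(8):
        candidate = chr(ord(ch) ^ (1 << bit))
        if candidate in HEX_DIGITS:
            repairs.append((bit, candidate))
    return repairs

def find_suspects(field):
    """Return (position, char, repairs) for every non-hex character in field."""
    return [
        (pos, ch, single_bit_repairs(ch))
        for pos, ch in enumerate(field)
        if ch not in HEX_DIGITS
    ]

if __name__ == "__main__":
    # Sample fields taken from the corrupt values quoted in the thread.
    for field in ("0477e231ey65", "p000872d60eb"):
        print(field, find_suspects(field))
```

A character with exactly one candidate repair, as in both cases here, is what makes the "single flipped bit" reading plausible; it does not by itself say where the flip happened (memory, disk, or network).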

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-24 Thread Corey Nolet
Bill, I've been having that same IT issue and said the same thing "It's not happening to others". I lifted the timeout completely and it never finished. On Wed, Sep 24, 2014 at 1:13 PM, Mike Drob wrote: > Any chance the IRC chats can make it onto the ML for posterity? > > Mike > > On Wed, Sep

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-24 Thread Mike Drob
Any chance the IRC chats can make it onto the ML for posterity? Mike On Wed, Sep 24, 2014 at 12:04 PM, Keith Turner wrote: > On Wed, Sep 24, 2014 at 12:44 PM, Russ Weeks > wrote: > > > Interesting that "y" (0x79) and "9" (0x39) are one bit "away" from each > > other. I blame cosmic rays! > > >

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-24 Thread Keith Turner
+1 I did a little more poking at 1.6.1 and it looks good. Thanks Corey and Josh for putting these releases together. Sigs and hashes for bin.tar.gz and src.tar.gz look good Successfully ran mutslam[1] against 1.6.1 rc1, using staging repo. Ci w/ Agitation verify ran successfully. Env hadoop 2.

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-24 Thread Keith Turner
On Wed, Sep 24, 2014 at 12:44 PM, Russ Weeks wrote: > Interesting that "y" (0x79) and "9" (0x39) are one bit "away" from each > other. I blame cosmic rays! > It is interesting, and that's only half of the story. It's been interesting chatting w/ Josh about this on irc and hearing about his findin

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-24 Thread Bill Havanki
+1 - MD5 and SHA1 checksums verified - signatures verified - unit tests pass - integration tests pass except one (see below) - recursive MD5 of all files pass - two one-hour CI runs (one with agitation, one without) verified I consistently get a timeout failure from DeleteTableDuringSplitIT, but
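For readers who want to repeat the checksum step, here is a minimal Python sketch of verifying an MD5 or SHA1 digest of a downloaded artifact. It is an illustration only: the helper names are hypothetical, and the expected digest is assumed to be a plain hex string that you have already extracted from the published checksum file (the exact layout of those files is not shown in this thread).

```python
import hashlib

def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in chunks so large tarballs do not need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_expected(path, expected_hex, algorithm="md5"):
    """Compare the computed digest with an expected hex string, ignoring case and padding."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

Checksums only detect accidental corruption of the download; the signature check listed separately in the vote is what ties the artifact to the release manager's key.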

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-24 Thread Russ Weeks
Interesting that "y" (0x79) and "9" (0x39) are one bit "away" from each other. I blame cosmic rays! On Wed, Sep 24, 2014 at 9:05 AM, Josh Elser wrote: > >>> The offending keys are: >>> >>> 389a85668b6ebf8e 2ff6:4a78 [] 1411499115242 >>> >>> 3a10885b-d481-4d00-be00-0477e231ey65:8576b169:

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-24 Thread Josh Elser
Keith Turner wrote: 7e56b58a0c7df128 5fa0:6249 [] 1411499311578 > > 3a10885b-d481-4d00-be00-0477e231e965:p000872d60eb:499fa72752d82a7c:5c5f19e8 > > which both happened a little after 3:00pm eastern (I stopped CI around > 3:30pm eastern). I don't see anything immediately wrong in the tserv

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-24 Thread Josh Elser
The offending keys are: 389a85668b6ebf8e 2ff6:4a78 [] 1411499115242 3a10885b-d481-4d00-be00-0477e231ey65:8576b169:0cd98965c9ccc1d0:ba15529e The careful eye will notice that the UUID in the first component of the value has a different suffix than the next corrupt key/value (ends with

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-24 Thread Josh Elser
Accumulo server logs are ~118M. The CI logs are tiny. Hadoop logs I have are around 126M (although probably relevant bits are much smaller). Sadly, I didn't run with archived WALs, so I'm not sure how useful the server logs are on their own. Sean Busbey wrote: Josh, how big are all the logs?

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-24 Thread Keith Turner
Rescinding my -1 vote. Josh helped me figure this one out on IRC. There used to be a tag named 1.6.1-rc1. That no longer exists at apache repo, now there is a branch called 1.6.1-rc. I had both the old tag and the new branch, git was taking the tag. git fetch --prune did not remove the tag, I h

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-24 Thread Keith Turner
-1 GOOD Sigs and hashes for bin.tar.gz and src.tar.gz look good Successfully ran mutslam[1] against 1.6.1 rc1, using staging repo. Ci w/ Agitation verify ran successfully. Env hadoop 2.3.0, ZK 3.4.5, Centos 6, 20 node EC2. 17B ingested. [1]: https://github.com/keith-turner/mutslam BAD I am

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-24 Thread Keith Turner
I had a bit more positive experience :) org.apache.accumulo.test.continuous.ContinuousVerify$Counts REFERENCED=16960663560 UNREFERENCED=16491955 Ci w/ Agitation verify ran successfully. Env hadoop 2.3.0, ZK 3.4.5, Centos 6, 20 node EC2. 17B ingested. On

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-24 Thread Keith Turner
On Wed, Sep 24, 2014 at 12:43 AM, Josh Elser wrote: > Well, color me shocked -- the verify found some bad data. It looks > like two keys have bad checksums (which I assume is what created the > UNDEFINEDs, too?). > > CORRUPT 2 > oh wow, I have never seen that happen since I added the checksums t

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-24 Thread Sean Busbey
Josh, how big are all the logs? On Tue, Sep 23, 2014 at 9:43 PM, Josh Elser wrote: > Well, color me shocked -- the verify found some bad data. It looks > like two keys have bad checksums (which I assume is what created the > UNDEFINEDs, too?). > > CORRUPT 2 > REFERENCED 219908 > UNDEFINED 2

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-23 Thread Josh Elser
Well, color me shocked -- the verify found some bad data. It looks like two keys have bad checksums (which I assume is what created the UNDEFINEDs, too?). CORRUPT 2 REFERENCED 219908 UNDEFINED 2 UNREFERENCED 874770 I ran two tabletservers on my desktop, turned on hflush instead of hsync, swit
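To see why a bad checksum could also surface as an UNDEFINED, it may help to model the verify counters in a few lines. The sketch below is a simplified in-memory model, not the actual ContinuousVerify job: it assumes each ingested entry points at most to one previous row, and that an entry whose checksum fails is skipped entirely, so it counts as CORRUPT and is no longer "defined". The function and argument names are hypothetical.

```python
def verify_counts(nodes, corrupt=()):
    """Classify rows of a linked-list style data set.

    nodes:   dict mapping row -> previous row it points to (or None)
    corrupt: rows whose checksum failed; assumed to be skipped by the verifier
    """
    corrupt = set(corrupt) & set(nodes)
    defined = set(nodes) - corrupt
    # Only entries that passed the checksum contribute references.
    referenced = {nodes[row] for row in defined if nodes[row] is not None}
    return {
        "CORRUPT": len(corrupt),
        "REFERENCED": len(defined & referenced),
        "UNDEFINED": len(referenced - defined),
        "UNREFERENCED": len(defined - referenced),
    }

if __name__ == "__main__":
    chain = {"a": None, "b": "a", "c": "b", "d": "c"}
    print(verify_counts(chain))                 # clean run
    print(verify_counts(chain, corrupt=["b"]))  # one entry fails its checksum
```

Under these assumptions, each corrupt entry that some other entry points to produces exactly one UNDEFINED, which would be consistent with the report of two CORRUPT and two UNDEFINED. A clean run has UNDEFINED equal to zero, with only REFERENCED and UNREFERENCED counts.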

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-23 Thread Josh Elser
+1 * Verified checksums+sigs * Built from source tarball and ran all unit+functional tests against Apache Hadoop 2.5.1 and 2.6.0-SNAPSHOT * Ingested 2B records w/ CI + clean verify with single tserver (Apache Hadoop 2.6.0-SNAPSHOT + Apache ZooKeeper 3.4.5) * Ingested ~2.5B records w/ CI with 2 tse

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-23 Thread Eric Newton
> > (which needs to be signed) > It is signed... I forgot I have to add trust: $ gpg --update-trustdb Thanks Corey! On Tue, Sep 23, 2014 at 9:19 AM, Eric Newton wrote: > +1 > Verified signature (which needs to be signed) > Verified ingest performance (on a single node) > Looked over the user

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-23 Thread Eric Newton
+1 Verified signature (which needs to be signed) Verified ingest performance (on a single node) Looked over the user's manual from the binary tarball Ran all unit and integration tests On Fri, Sep 19, 2014 at 10:49 PM, Corey Nolet wrote: > Devs, > > Please consider the following candidate for A

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-22 Thread Corey Nolet
Yeah I'll push it tonight. On Mon, Sep 22, 2014 at 4:28 PM, Josh Elser wrote: > This appears to have been a snafu (related to the push-screwup). I'll try > to restore if I have the branch locally, but you might have to re-push your > branch, Corey (or anyone else who has the SHA1 listed in his o

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-22 Thread Josh Elser
This appears to have been a snafu (related to the push-screwup). I'll try to restore if I have the branch locally, but you might have to re-push your branch, Corey (or anyone else who has the SHA1 listed in his original VOTE email). On 9/22/14, 1:26 PM, Josh Elser wrote: Corey, I don't see th

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-22 Thread Josh Elser
Corey, I don't see the branch. Did you forget to push? On 9/19/14, 10:49 PM, Corey Nolet wrote: Devs, Please consider the following candidate for Apache Accumulo 1.6.1 Branch: 1.6.1-rc1 SHA1: 88c5473b3b49d797d3dabebd12fe517e9b248ba2 Staging Repository: *https://repository.apache.org/content/re

Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-22 Thread Keith Turner
I started running CI w/ agitation on 20 EC2 nodes against 1.6.1 RC1 On Fri, Sep 19, 2014 at 10:49 PM, Corey Nolet wrote: > Devs, > > Please consider the following candidate for Apache Accumulo 1.6.1 > > Branch: 1.6.1-rc1 > SHA1: 88c5473b3b49d797d3dabebd12fe517e9b248ba2 > Staging Repository: > *

[VOTE] Apache Accumulo 1.6.1 RC1

2014-09-19 Thread Corey Nolet
Devs, Please consider the following candidate for Apache Accumulo 1.6.1 Branch: 1.6.1-rc1 SHA1: 88c5473b3b49d797d3dabebd12fe517e9b248ba2 Staging Repository: *https://repository.apache.org/content/repositories/orgapacheaccumulo-1017/