Rise of the Dragon

2016-03-19 Thread Colin P. McCabe
Who is creating all these dragon JIRAs? Colin

Re: CHANGES.txt is gone from trunk, branch-2, branch-2.8

2016-03-08 Thread Colin P. McCabe
+1 Thanks, Andrew. This will avoid so many spurious conflicts when cherry-picking changes, and so much wasted time on commit. best, Colin On Thu, Mar 3, 2016 at 9:11 PM, Andrew Wang wrote: > Hi all, > > With the inclusion of HADOOP-12651 going back to branch-2.8,

Re: Looking to a Hadoop 3 release

2016-02-22 Thread Colin P. McCabe
ther / when it can be released in 2.9. I think we > should rather concentrate our EC dev efforts to harden key features under > the follow-on umbrella HDFS-8031 and make it solid for a 3.0 release. > > Sincerely, > Zhe > > On Mon, Feb 22, 2016 at 9:25 AM Colin P. McCabe <c

Re: Looking to a Hadoop 3 release

2016-02-22 Thread Colin P. McCabe
+1 for a release of 3.0. There are a lot of significant, compatibility-breaking, but necessary changes in this release... we've touched on some of them in this thread. +1 for a parallel release of 2.8 as well. I think we are pretty close to this, barring a dozen or so blockers. best, Colin On

Re: Hadoop encryption module as Apache Chimera incubator project

2016-02-02 Thread Colin P. McCabe
It's great to see interest in improving this functionality. I think Chimera could be successful as an Apache project. I don't have a strong opinion one way or the other as to whether it belongs as part of Hadoop or separate. I do think there will be some challenges splitting this functionality

Re: Jenkins stability and patching

2015-11-23 Thread Colin P. McCabe
I agree that our tests are in a bad state. It would help if we could maintain a list of "flaky tests" somewhere in git and have Yetus consider the flakiness of a test before -1ing a patch. Right now, we pretty much all have that list in our heads, and we're not applying it very consistently.
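The flaky-test list suggested above could be sketched roughly as follows. This is only an illustration of the idea, not how Yetus behaves: the class `FlakyTestFilter`, its methods, and the assumption that the list is a plain set of test class names loaded from a file in git are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: discount known-flaky failures before voting on a patch. */
class FlakyTestFilter {

    /** Returns the failed tests that are NOT on the known-flaky list. */
    static List<String> unexplainedFailures(List<String> failedTests, Set<String> knownFlaky) {
        List<String> unexplained = new ArrayList<>();
        for (String test : failedTests) {
            if (!knownFlaky.contains(test)) {
                unexplained.add(test);
            }
        }
        return unexplained;
    }

    /** The patch gets a -1 only if at least one failure is not known to be flaky. */
    static boolean shouldVoteMinusOne(List<String> failedTests, Set<String> knownFlaky) {
        return !unexplainedFailures(failedTests, knownFlaky).isEmpty();
    }
}
```

With a list containing only a made-up `TestExampleFlaky`, a run that fails just that test would not be voted down, while a run that also fails `TestExampleStable` would be. Keeping the list in git, as proposed, would make it reviewable instead of living in committers' heads.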

Re: Jenkins stability and patching

2015-11-23 Thread Colin P. McCabe
On Mon, Nov 23, 2015 at 1:53 PM, Colin P. McCabe <cmcc...@apache.org> wrote: > I agree that our tests are in a bad state. It would help if we could > maintain a list of "flaky tests" somewhere in git and have Yetus > consider the flakiness of a test before -1ing a patch

Re: hadoop-hdfs-client splitoff is going to break code

2015-10-19 Thread Colin P. McCabe
Thanks for being proactive here, Steve. I think this is a good example of why this change should have been done in a branch rather than directly in trunk. regards, Colin On Wed, Oct 14, 2015 at 10:36 AM, Steve Loughran wrote: > just an FYI, the split

Re: [DISCUSS] Looking to a 2.8.0 release

2015-10-05 Thread Colin P. McCabe
I think it makes sense to have a 2.8 release since there are a tremendous number of JIRAs in 2.8 that are not in 2.7. Doing a 3.x release seems like something we should consider separately since it would not have the same compatibility guarantees as a 2.8 release. There's a pretty big delta

Re: INotify stability

2015-09-22 Thread Colin P. McCabe
Hi Mohammad, Like ATM said, HDFS-8965 is an important fix in this area. We have found that it prevents cases where INotify tries to read invalid sequences of bytes (sometimes because the edit log was truncated or corrupted; other times because it is in the middle of a write). HDFS-8964 fixes the

Re: Even after HDFS-2856 JSVC References are require..?

2015-09-14 Thread Colin P. McCabe
Has anyone measured the overhead of running SASL on DataTransferProtocol? I would expect it to be non-zero compared with simply running on a low port. The CPU overhead especially could regress performance on a typical Hadoop cluster. best, Colin On Thu, Sep 10, 2015 at 9:55 AM, Chris Nauroth

Re: How to Tag a request using optional field

2015-06-15 Thread Colin P. McCabe
On Thu, Jun 4, 2015 at 2:46 PM, Rahul Shrivastava rhshr...@gmail.com wrote: Hi, Suppose I write a Java client to create a directory on HDFS. Is there a way to tag this request and get the tagged information on the NameNode via DFSInotifyEventInputStream or otherwise? In short, is there a way

Re: DISCUSS: is the order in FS.listStatus() required to be sorted?

2015-06-15 Thread Colin P. McCabe
On Mon, Jun 1, 2015 at 3:21 AM, Steve Loughran ste...@hortonworks.com wrote: HADOOP-12009 (https://issues.apache.org/jira/browse/HADOOP-12009) patches the FS javadoc and contract tests to say the order you get things back from a listStatus() isn't guaranteed to be alphanumerically sorted

Re: Jenkins precommit-*-build

2015-05-05 Thread Colin P. McCabe
Thanks, Allen. This has long been a thorn in our side, and it's really good to see someone work on it. cheers, Colin On Tue, May 5, 2015 at 2:59 PM, Allen Wittenauer a...@altiscale.com wrote: TL;DR: Heads up: I’m going to hack on these scripts to fix the race conditions.

Re: [RESULT][VOTE] Release Apache Hadoop 2.7.0 RC0

2015-04-23 Thread Colin P. McCabe
Sorry for the late reply. It seems like the consensus is that we should push these fixes to 2.7.1. That works for me. HADOOP-11802 should be in there soon, hopefully the rest will follow quickly. best, Colin On Wed, Apr 22, 2015 at 4:27 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com

Re: Hadoop - Major releases

2015-03-17 Thread Colin P. McCabe
Thanks, Andrew and Joep. +1 for maintaining wire and API compatibility, but moving to JDK8 in 3.0. best, Colin On Mon, Mar 16, 2015 at 3:22 PM, Andrew Wang andrew.w...@cloudera.com wrote: I took the liberty of adding line breaks to Joep's mail. Thanks for the great feedback Joep. The goal

Re: upstream jenkins build broken?

2015-03-16 Thread Colin P. McCabe
I'm guessing that's the problem. Been that way since 9:32 UTC on March 5th. -- Aaron T. Myers Software Engineer, Cloudera On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe cmcc...@apache.org wrote: Hi all, A very quick (and not thorough) survey shows that I can't

upstream jenkins build broken?

2015-03-10 Thread Colin P. McCabe
Hi all, A very quick (and not thorough) survey shows that I can't find any jenkins jobs that succeeded from the last 24 hours. Most of them seem to be failing with some variant of this message: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

2015-03-10 Thread Colin P. McCabe
on hadoop-3.x right? So, I don't see the difference? Arun From: Colin P. McCabe cmcc...@apache.org Sent: Monday, March 09, 2015 3:05 PM To: hdfs-dev@hadoop.apache.org Cc: mapreduce-...@hadoop.apache.org; common-...@hadoop.apache.org; yarn

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

2015-03-10 Thread Colin P. McCabe
Er, that should read as Allen commented C. On Tue, Mar 10, 2015 at 11:55 AM, Colin P. McCabe cmcc...@apache.org wrote: Hi Arun, Not all changes which are incompatible can be fixed-- sometimes an incompatibility is a necessary part of a change. For example, taking a really old library

Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

2015-03-09 Thread Colin P. McCabe
Java 7 will be end-of-lifed in April 2015. I think it would be unwise to plan a new Hadoop release against a version of Java that is almost obsolete and (soon) no longer receiving security updates. I think people will be willing to roll out a new version of Java for Hadoop 3.x. Similarly, the

Re: DISCUSSION: Patch commit criteria.

2015-03-02 Thread Colin P. McCabe
I agree with Andrew and Konst here. I don't think the language is unclear in the rule, either... consensus with a minimum of one +1 clearly indicates that _other people_ are involved, not just one person. I would also mention that we created the branch committer role specifically to make it

Re: TimSort bug and its workaround

2015-03-02 Thread Colin P. McCabe
Thanks for bringing this up. If you can find any place where an array might realistically be larger than 67 million elements, then I guess file a JIRA for it. Also this array needs to be of objects, not of primitives (quicksort is used for those in jdk7, apparently). I can't think of any such

Re: Erratic Jenkins behavior

2015-02-18 Thread Colin P. McCabe
On 2/12/15, 2:00 PM, Colin P. McCabe cmcc...@apache.org wrote: We could potentially use different .m2 directories for each executor. I think this has been brought up in the past as well. I'm not sure how maven handles concurrent access to the .m2

Re: Theory question: good values for FileStatus.getBlockSize()

2015-02-17 Thread Colin P. McCabe
In the past, "block size" and "size of block N" were completely separate concepts in HDFS. The former was often referred to as "default block size" or "preferred block size" or some such thing. Basically it was the point at which we'd call it a day and move on to the next block, whenever any block got

Re: max concurrent connection to HDFS name node

2015-02-17 Thread Colin P. McCabe
Hi Demai, Nearly all input and output stream operations will talk directly to the DN without involving the NN. The NameNode is involved in metadata operations such as renaming or opening files, not in reading data. Hope this helps. best, Colin On Thu, Feb 12, 2015 at 4:21 PM, Demai Ni

Re: Datanode synchronization is horrible. I’m thinking we can use ReentrantReadWriteLock for synchronization. What do you guys think?

2015-02-17 Thread Colin P. McCabe
In general, the DN does not perform reads from files under a big lock. We only need the lock for protecting the replica map and some of the block state. This lock hasn't really been a big problem in the past and I would hesitate to add complexity here (although I haven't thought about it that
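For reference, the read-write locking proposed in the subject line could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the DataNode's actual code: the class `GuardedReplicaMap`, the `Long` block IDs, and the `String` replica state are hypothetical stand-ins for the real replica map and block state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch of a replica map guarded by a read-write lock. */
class GuardedReplicaMap {
    private final Map<Long, String> replicas = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Lookups take the shared read lock, so many readers can proceed at once. */
    String get(long blockId) {
        lock.readLock().lock();
        try {
            return replicas.get(blockId);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Updates take the exclusive write lock and block readers and other writers. */
    void put(long blockId, String state) {
        lock.writeLock().lock();
        try {
            replicas.put(blockId, state);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Size is a read-only query, so the read lock is sufficient. */
    int size() {
        lock.readLock().lock();
        try {
            return replicas.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Since file reads are not performed under the lock in the first place, a change like this would only help concurrent map lookups, which is consistent with the hesitation about added complexity expressed above.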

Re: Erratic Jenkins behavior

2015-02-12 Thread Colin P. McCabe
, Colin P. McCabe (cmcc...@apache.org) wrote: I'm sorry, I don't have any insight into this. With regard to HADOOP-11084, I thought that $BUILD_URL would be unique for each concurrent build, which would prevent build artifacts from getting mixed up between jobs. Based

Re: Erratic Jenkins behavior

2015-02-09 Thread Colin P. McCabe
I'm sorry, I don't have any insight into this. With regard to HADOOP-11084, I thought that $BUILD_URL would be unique for each concurrent build, which would prevent build artifacts from getting mixed up between jobs. Based on the value of PATCHPROCESS that Kihwal posted, perhaps this is not the

Re: NFSv3 Filesystem Connector

2015-01-14 Thread Colin P. McCabe
Hi Niels, I agree that direct-attached storage seems more economical for many users. As an HDFS developer, I certainly have a dog in this fight as well :) But we should be respectful towards people trying to contribute code to Hadoop and evaluate the code on its own merits. It is up to our

Re: HDFS 2.6.0 upgrade ends with missing blocks

2015-01-08 Thread Colin P. McCabe
Hi dlmarion, In general, any upgrade process we do will consume disk space, because it's creating hardlinks and a new current directory, and so forth. So upgrading when disk space is very low is a bad idea in any scenario. It's certainly a good idea to free up some space before doing the