Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-17 Thread Nigel Daley
Eric, Arun, I'd like to explicitly clarify one aspect of this branch and what you mean by 'release' -- it can have many meanings. Are you asking to actually create an Apache release from this branch (binary & source)? Or, as I was assuming, simply commit all this code to this branch and leave

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-17 Thread Doug Cutting
On 01/12/2011 11:07 PM, Arun C Murthy wrote: Thus, I think a jumbo patch should suffice. It will also ensure this can be done quickly so that the community can then concentrate on 0.22 and beyond. However, I will (manually) ensure all relevant jiras are referenced in the CHANGES.txt and Release Not

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-17 Thread Nigel Daley
On Jan 17, 2011, at 12:11 PM, Doug Cutting wrote: > On 01/12/2011 11:07 PM, Arun C Murthy wrote: >> Thus, I think a jumbo patch should suffice. It will also ensure this can >> be done quickly so that the community can then concentrate on 0.22 and beyond. >> >> However, I will (manually) ensure all

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-17 Thread Eric Baldeschwieler
Hi Folks, We are very interested in sharing what we are doing with the community. I think we can separate this into multiple stages. 1) To Doug's point - Yes, absolutely, we want folks to review this. The patch is now available. Let's work together to get it formatted as folks like in subver

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-17 Thread Eric Baldeschwieler
Hi Stack, I feel your pain. We're running a 700 node HBase cluster containing a HUGE collection of all web pages. Both versions of append were started by engineers working at Yahoo and we've put A LOT of investment into both. I really, really want to see the append issue solved for HBase!!

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-17 Thread Doug Cutting
On 01/17/2011 02:56 PM, Eric Baldeschwieler wrote: 1) To Doug's point - Yes, absolutely, we want folks to review this. The patch is now available. Let's work together to get it formatted as folks like in subversion and reviewed. Where there are issues, let's work to resolve them. With luck folk

Manage a cluster where not all machines are always available

2011-01-17 Thread Charles Gonçalves
Hi Guys, I'm running a series of Pig scripts on a cluster of a dozen machines. The problem is that those machines belong to a lab at my university, and sometimes not all of them are available for my use. What is the best approach to manage the configuration and the data on HDFS in this environmen

RE: Manage a cluster where not all machines are always available

2011-01-17 Thread Segel, Mike
Charles, If I understand you correctly you want to trim the cluster down to only those machines that you control... Ok... Do you care about the data that is currently on the cluster? (Is all of the data yours, or replaceable?) Can you easily copy the data off the cluster on to plain old unix

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-17 Thread Chris Douglas
On Mon, Jan 17, 2011 at 12:11 PM, Doug Cutting wrote: > We would not release this until each change in it has been reviewed by the > community, right?  Otherwise we may end up with changes in a 0.20 release > that don't get approved when they're contributed to trunk and cause trunk to > regress.  

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-17 Thread Todd Papaioannou
That's only true if you plan to pull forward the changes wholesale into .21, .22 and beyond. And that is not what is being proposed. If the plan is to just land an updated and more stable version of .20 that is completely backwards compatible, then this can be done within that code line without

how to know which partition data the reduce receive

2011-01-17 Thread lei liu
There is a job that has three reduce tasks, that is, the map output data is divided into three partitions. How can I know which partition each reduce task receives?

Re: how to know which partition data the reduce receive

2011-01-17 Thread Harsh J
The emitted partition number == the reducer's ID. 2011/1/18 lei liu: > There is job that has three reduce tasks that is the map output data > are divided > into three partition, how can I know every reduce receive which partition? > -- Harsh J www.harshj.com
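
To make the answer above concrete, here is a minimal sketch of how a key ends up at a given reduce task. It assumes the job uses default hash partitioning rather than a custom Partitioner, and it reproduces only the arithmetic in plain Java without any Hadoop dependency. The class name PartitionSketch and the method partitionFor are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; not part of Hadoop itself.
class PartitionSketch {

    // Assumed to mirror default hash partitioning: clear the sign bit so the
    // value is never negative, then take the remainder by the reducer count.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 3; // three reduce tasks, as in the question
        List<String> keys = Arrays.asList("a", "b", "c");
        for (String key : keys) {
            // The partition number printed here is the index of the reduce
            // task that would receive every record carrying this key.
            System.out.println(key + " -> reduce task " + partitionFor(key, numReduceTasks));
        }
    }
}
```

With three reduce tasks the partition numbers are 0, 1 and 2, and every record whose key maps to partition N is read by reduce task N. A job that installs its own Partitioner replaces this arithmetic, but the partition-to-reducer mapping stays the same.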

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-17 Thread Jeff Hammerbacher
Hey, We had this exact same discussion about the 0.20-append branch a few weeks ago. A few organizations have tested that code at scale and feel strongly that it's stable. We decided not to release it because it does not meet the Apache guidelines for a release. The Apache process has its pros and

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-17 Thread Arun C Murthy
On 1/17/11 12:11 PM, "Doug Cutting" wrote: We would not release this until each change in it has been reviewed by the community, right? Otherwise we may end up with changes in a 0.20 release that don't get approved when they're contributed to trunk and cause trunk to regress. So I don't yet s

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-17 Thread Jeff Hammerbacher
> > Apache Hadoop hasn't had a stable, updated release in a while. > That's what 0.22 is for? However, it does remedy the critical problem - a stable, updated Apache > Hadoop release. > Again, isn't that what 0.22 is for? > An appeal: Let's use a bit of common sense and get the project moving

Re: [DISCUSS] Move project split down a level

2011-01-17 Thread Nigel Daley
Just to be clear, the proposal currently being discussed is NOT a full undo of the split -- it might be better described as a tweak or a bug fix to the (on-going) project split. If someone would like to start a discussion on a complete undo of the project split, please do so under a different th

Re: [DISCUSS] Move project split down a level

2011-01-17 Thread Nigel Daley
Good questions. Keep them coming! I'll compile a list so we can start an FAQ on this. > # Is project split a goal for hadoop in the future (even though we are not > ready yet?). If it is, then putting projects back together might result in > tight dependencies between the project. How do we av

Re: [DISCUSS] Move project split down a level

2011-01-17 Thread Nigel Daley
On Jan 14, 2011, at 11:53 AM, Tsz Wo (Nicholas), Sze wrote: > This is a kind of an incompatible change: all the developers, QAs, release > engineers and users have to change their local settings and scripts for this > change. I have a hard time believing this as I suspect the very small set

Re: [DISCUSS] Move project split down a level

2011-01-17 Thread Tom White
On Mon, Jan 17, 2011 at 9:04 PM, Nigel Daley wrote: > Good questions.  Keep them coming!  I'll compile a list so we can start an > FAQ on this. > >> # Is project split a goal for hadoop in the future (even though we are not >> ready yet?). If it is, then putting projects back together might resu

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-17 Thread Chris Douglas
On Mon, Jan 17, 2011 at 8:30 PM, Jeff Hammerbacher wrote: > We had this exact same discussion about the 0.20-append branch a few weeks > ago. A few organizations have tested that code at scale and feel strongly > that it's stable. We decided not to release it because it does not meet the > Apache

Re: [DISCUSS] Move project split down a level

2011-01-17 Thread Eric Baldeschwieler
On Jan 17, 2011, at 9:13 PM, Nigel Daley wrote: > > On Jan 14, 2011, at 11:53 AM, Tsz Wo (Nicholas), Sze wrote: >> ... >> Why do we want to enforce the releases as a unit, given that the long term >> target is to release these 3 projects independently? > > Because that long term view is curr

Re: [DISCUSS] Move project split down a level

2011-01-17 Thread Konstantin Boudnik
On Mon, Jan 17, 2011 at 21:40, Eric Baldeschwieler wrote: > > On Jan 17, 2011, at 9:13 PM, Nigel Daley wrote: > >> >> On Jan 14, 2011, at 11:53 AM, Tsz Wo (Nicholas), Sze wrote: >>> > ... >>> Why do we want to enforce the releases as a unit, given that the long term >>> target is to release these

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-17 Thread Arun C Murthy
Bringing 'organizations' into this discussion is very disingenuous. Doug, credit to him, was the first person to propose this release: http://www.mail-archive.com/general@hadoop.apache.org/msg01427.html I have supported the append-release: http://www.mail-archive.com/general@hadoop.apache.org/ms

Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset

2011-01-17 Thread Arun C Murthy
On Jan 17, 2011, at 8:40 PM, Jeff Hammerbacher wrote: Apache Hadoop hasn't had a stable, updated release in a while. That's what 0.22 is for? Every single Hadoop release in the recent past, and I have worked on pretty much every single Hadoop release since forever, has taken at least

Re: [DISCUSS] Move project split down a level

2011-01-17 Thread Nigel Daley
Wiki FAQ started here: http://wiki.apache.org/hadoop/ProjectSplit Todd, left a note there for you to add in a link to a git-history-fixer-script Jira when the time comes. If folks find more docs that will need updating, please add them to the list at the end of the FAQ. We still need to fill i

Re: [DISCUSS] Move project split down a level

2011-01-17 Thread Eric Baldeschwieler
Nigel proposes that in this release (as in previous releases), everything should be packaged together. Our in-house experience at Yahoo is that this makes a lot of sense. It is how we find it most effective to operate. The project split has introduced a lot of complexity with no return. Do y

Re: [DISCUSS] Move project split down a level

2011-01-17 Thread Konstantin Boudnik
Packaging everything together makes sense for the unit/functional level of tests. Since Hadoop in its current shape doesn't have any other kinds of tests (apart from the limited system tests implemented within the Herriot framework), there are no objections per se. And you know my take on Hadoop's stack testing for