Re: Maximum number of files in Hadoop v2

2016-06-04 Thread J. Rottinghuis
Hi Ascot, No, especially with the Block ID based datanode layout ( https://issues.apache.org/jira/browse/HDFS-6482) this should no longer be true on HDFS. If you do plan to have millions of files per datanode, you'd do well to familiarize yourself with https://issues.apache.org/jira/browse/HDFS-87
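
For context, a rough Java sketch of the idea behind the HDFS-6482 layout (modeled on the datanode code, not copied from it): a block's directory is derived from two 5-bit slices of its ID, so directories stay bounded at 32 x 32 per volume and no lookup table is needed to locate a replica on disk.

    import java.io.File;

    public class BlockDirSketch {
        // Two 5-bit slices of the block ID select one of 32 x 32
        // subdirectories, so a block's on-disk location is computable
        // from its ID alone.
        static File idToBlockDir(File finalizedDir, long blockId) {
            int d1 = (int) ((blockId >> 16) & 0x1F);
            int d2 = (int) ((blockId >> 8) & 0x1F);
            return new File(finalizedDir, "subdir" + d1 + File.separator + "subdir" + d2);
        }

        public static void main(String[] args) {
            // Hypothetical volume path and block ID, for illustration only.
            System.out.println(idToBlockDir(new File("/data/1/current/finalized"), 1073741825L));
        }
    }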

Re: Reliability of Hadoop

2016-05-27 Thread J. Rottinghuis
We run several clusters of thousands of nodes (as do many companies), our largest one has over 10K nodes. Disks, machines, memory, and network fail all the time. The larger the scale, the higher the odds that some machine is bad in a given day. On the other hand, scale helps. If a single node our o
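
A back-of-the-envelope sketch of the scale argument, with assumed numbers (the failure rate below is illustrative, not a measured figure): even a modest per-node daily failure probability makes at least one failure per day a near-certainty at 10K nodes.

    public class FailureOdds {
        public static void main(String[] args) {
            double perNodeDaily = 0.001;  // assumed 0.1% chance a node fails on a given day
            int nodes = 10_000;
            // P(at least one failure) = 1 - P(no node fails)
            double atLeastOne = 1.0 - Math.pow(1.0 - perNodeDaily, nodes);
            System.out.printf("P(at least one node fails today) = %.5f%n", atLeastOne);
            // Prints ~0.99995: failures are the norm, so the software must tolerate them.
        }
    }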

Re: WELCOME to user@hadoop.apache.org

2015-06-07 Thread J. Rottinghuis
On each node you can configure how much memory is available for containers to run. On the other hand, for each application you can configure how large containers should be. For MR apps, you can separately set mappers, reducers, and the app master itself. Yarn will determine through scheduling rules
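
As a minimal sketch of the two sides of that configuration (the memory values are illustrative, not recommendations): the node-side budget is set per node via yarn.nodemanager.resource.memory-mb in yarn-site.xml, while an MR application sets its own container sizes, e.g. programmatically:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class MemoryConfigSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Per-application container sizes for an MR job:
            conf.set("mapreduce.map.memory.mb", "2048");           // each mapper container
            conf.set("mapreduce.reduce.memory.mb", "4096");        // each reducer container
            conf.set("yarn.app.mapreduce.am.resource.mb", "1536"); // the app master itself
            Job job = Job.getInstance(conf, "memory-config-sketch");
            System.out.println(job.getConfiguration().get("mapreduce.map.memory.mb"));
        }
    }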

Re: test lzo problem in hadoop

2013-08-01 Thread J. Rottinghuis
In Hadoop 2.0 some of the classes have changed from abstract classes to interfaces. You'll have to compile again. In addition, you need to use a version of hadoop-lzo that is compatible with Hadoop 2.0 (Yarn). See: https://github.com/twitter/hadoop-lzo/issues/56 and the announcement of a newer
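
For illustration, a minimal sketch of wiring the rebuilt codec in (assuming the Hadoop 2.0-compatible hadoop-lzo jar and native libs are already on the classpath; the class names are from the twitter/hadoop-lzo project):

    import org.apache.hadoop.conf.Configuration;

    public class LzoConfigSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Register the LZO codecs alongside the built-in ones.
            conf.set("io.compression.codecs",
                "org.apache.hadoop.io.compress.DefaultCodec,"
                + "com.hadoop.compression.lzo.LzoCodec,"
                + "com.hadoop.compression.lzo.LzopCodec");
            conf.set("io.compression.codec.lzo.class",
                "com.hadoop.compression.lzo.LzoCodec");
        }
    }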

Re: Question about writing HDFS files

2013-05-17 Thread J. Rottinghuis
Yes. Joep On Fri, May 17, 2013 at 6:38 AM, John Lilley wrote: > Right, sorry for the ambiguity, I was talking about HDFS writes only. > > So my application doesn't need to do anything to signal that it is writing > from inside vs. outside of the Hadoop cluster, it figures that out from IP > or
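
To make the point concrete, a minimal write sketch (the path is hypothetical): the client code is identical on a datanode and on an edge machine; the namenode works out replica placement from where the connection comes from.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteSketch {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Nothing here signals "inside" vs. "outside" the cluster; if this
            // runs on a datanode, the first replica simply lands locally.
            try (FSDataOutputStream out = fs.create(new Path("/tmp/example.txt"))) {
                out.writeUTF("hello hdfs");
            }
        }
    }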

Re: concurrency

2012-10-12 Thread J. Rottinghuis
Hi Harsh, Morning Koert, If Koert's problem is similar to what I have been thinking about, where we want to consolidate and re-compress older datasets, then the _SUCCESS file does not really help. _SUCCESS helps to tell if a new dataset is completely written. However, what is needed here is to replace an ex
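
One common workaround, sketched below as a hypothetical helper: write the re-compressed copy to a scratch directory, then swap it in by renames. Each HDFS rename is atomic, but the swap as a whole is not, which is exactly the gap being discussed.

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SwapDatasetSketch {
        // Retire the current dataset dir, promote the rewritten copy, then
        // clean up. Readers can still catch the brief window between renames.
        static void swap(FileSystem fs, Path current, Path rewritten, Path retired)
                throws Exception {
            if (!fs.rename(current, retired)) {
                throw new IllegalStateException("could not retire " + current);
            }
            if (!fs.rename(rewritten, current)) {
                throw new IllegalStateException("could not promote " + rewritten);
            }
            fs.delete(retired, true);  // recursive delete of the old copy
        }
    }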

Re: where to download hadoop-1.0.0

2012-10-05 Thread J. Rottinghuis
Any release in the 1.0.x line should be equally compatible, so is there any reason not to use the latest in that line? Cheers, Joep On Fri, Oct 5, 2012 at 12:06 PM, wrote: > Hello, > > I try to use hbase-0.92.1 which is compatible with hadoop-1.0.0. However, > I do not see this version of hado