Datanode disk configuration

2014-11-12 Thread Brian C. Huffman
All, I'm setting up a 4-node Hadoop 2.5.1 cluster. Each node has the following drives:
  1 - 500GB drive (OS disk)
  1 - 500GB drive
  1 - 2 TB drive
  1 - 3 TB drive
In past experience I've had lots of issues with non-uniform drive sizes for HDFS, but unfortunately it wasn't an option to get all
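The kind of imbalance that non-uniform drives can cause is easy to see with a small model. The sketch below is only an illustration, not DataNode code: it assumes blocks are handed to the data volumes in simple round-robin order, uses the three non-OS drive sizes from the message above, and picks a hypothetical 128 MB block size and 1.2 TB of written data. The function name `round_robin_fill` and all figures other than the drive sizes are assumptions, and replication and reserved space are ignored.

```python
def round_robin_fill(capacities_mb, block_mb, total_mb):
    """Place fixed-size blocks on volumes in round-robin order.

    Returns the number of MB used on each volume. A volume that cannot
    hold another block is skipped; writing stops when every volume is full.
    """
    used = [0] * len(capacities_mb)
    n = len(capacities_mb)
    next_volume = 0
    written = 0
    while written + block_mb <= total_mb:
        for offset in range(n):
            j = (next_volume + offset) % n
            if used[j] + block_mb <= capacities_mb[j]:
                used[j] += block_mb
                written += block_mb
                next_volume = (j + 1) % n
                break
        else:
            # No volume has room for another block.
            break
    return used


# Data drives from the message: 500 GB, 2 TB, 3 TB (decimal units, in MB).
capacities = [500_000, 2_000_000, 3_000_000]
used = round_robin_fill(capacities, block_mb=128, total_mb=1_200_000)

for cap, u in zip(capacities, used):
    print(f"{cap // 1000:>5} GB drive: {u // 1000} GB used ({100 * u / cap:.1f}% full)")
```

Under these assumptions each drive receives the same amount of data, so the smallest drive approaches full while the larger ones are still mostly empty.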

Re: Datanode disk configuration

2014-11-12 Thread daemeon reiydelle
I would consider a JBOD with a 16-64 MB stride. This would be a good choice where one or more steps (e.g. MR) will be I/O bound. Otherwise one or more tasks will be hit with the slow read/write times of having large amounts of data behind a single spindle. On Nov 12, 2014 8:37 AM, Brian C. Huffman

Re: Datanode disk configuration

2014-11-12 Thread Brian C. Huffman
That will make the volume balancing easy, but couldn't it hurt performance? My understanding is that there would be three write threads pointing to the 3TB disk and 2 threads pointing to the 2TB disk. Would it be better from a performance perspective to include the 500GB drive in the

Re: Datanode disk configuration

2014-11-12 Thread daemeon reiydelle
Yes. That is why you should consider striping across RAID 0 (JBOD)
*...“The race is not to the swift, nor the battle to the strong, but to those who can see it coming and jump aside.” - Hunter Thompson
Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872*
On Wed, Nov

Re: ext4 on a hadoop cluster datanodes

2014-11-12 Thread Brian C. Huffman
Would this set of ext4 parameters be ok for a 500GB HDFS data drive? Thanks, Brian

On 10/06/2014 06:09 PM, Travis wrote:
> For filesystem creation, we use the following with mkfs.ext4:
>
>   mkfs.ext4 -T largefile -m 1 -O dir_index,extent,sparse_super -L $HDFS_LABEL /dev/${DEV}1
>
> By default, mkfs

Sr.Technical Architect/Technical Architect/Sr. Hadoop /Big Data Developer for CA, GA, NJ, NY, AZ Locations_(FTE)Full Time Employment

2014-11-12 Thread Amendra Singh Gangwar
Hi, Please let me know if you are available for this FTE position for CA, GA, NJ, NY with good travel. Please forward latest resume along with Salary Expectations, Work Authorization Minimum joining time required. Job Descriptions: Positions : Technical Architect/Sr. Technical

ingesting unstructured data into Hadoop, problem creating tables using Hive

2014-11-12 Thread David Novogrodsky
I am trying to ingest unstructured data into Hive so it can be queried. I am trying to follow the steps in Tutorial Exercise 3, but I am having some problems: the created table has no data in it. Here is a sample of the unstructured data: 560)211-5250 437)810-5830 04:35 21 May 2014 17:26:39

Re: Sr.Technical Architect/Technical Architect/Sr. Hadoop /Big Data Developer for CA, GA, NJ, NY, AZ Locations_(FTE)Full Time Employment

2014-11-12 Thread daemeon reiydelle
no relo On Wed, Nov 12, 2014 at 11:31 AM, Amendra Singh Gangwar

Re: ext4 on a hadoop cluster datanodes

2014-11-12 Thread Travis
On Wed, Nov 12, 2014 at 10:47 AM, Brian C. Huffman bhuff...@etinternational.com wrote: Would this set of ext4 parameters be ok for a 500GB HDFS data drive? Probably? It would depend on how many blocks you end up having stored from HDFS on each specific ext4 mount point. The largefile
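Travis's point about the number of stored blocks can be checked with rough arithmetic. The sketch below is an estimate under stated assumptions, not a measurement: it assumes the `largefile` usage type allocates one inode per 1 MiB (the value in the stock mke2fs.conf; a local mke2fs.conf may differ), that each HDFS block replica occupies two files on disk (the block file and its .meta checksum file), and a 128 MiB HDFS block size. It ignores filesystem overhead, the 1% reserved by `-m 1`, and inodes used by directories. The function names are illustrative only.

```python
MIB = 1024 * 1024


def inode_count(capacity_bytes, bytes_per_inode):
    """Approximate number of inodes mkfs creates for a given inode ratio."""
    return capacity_bytes // bytes_per_inode


def max_hdfs_blocks(capacity_bytes, bytes_per_inode, files_per_block=2):
    """Upper bound on stored HDFS blocks before the filesystem runs out of inodes."""
    return inode_count(capacity_bytes, bytes_per_inode) // files_per_block


def blocks_if_full(capacity_bytes, hdfs_block_bytes):
    """Number of full-size HDFS blocks that fit in the raw capacity."""
    return capacity_bytes // hdfs_block_bytes


capacity = 500 * 10**9          # 500 GB drive, decimal bytes
largefile_ratio = 1 * MIB       # assumed inode_ratio for -T largefile
hdfs_block = 128 * MIB          # assumed HDFS block size

inodes = inode_count(capacity, largefile_ratio)
inode_limit = max_hdfs_blocks(capacity, largefile_ratio)
full_blocks = blocks_if_full(capacity, hdfs_block)

print(f"inodes available:             {inodes}")
print(f"blocks before inodes run out: {inode_limit}")
print(f"full-size blocks that fit:    {full_blocks}")
# Average block size below which inodes, not space, become the limit.
print(f"break-even block size:        {capacity / inode_limit / MIB:.1f} MiB")
```

With these numbers the inode budget only becomes the limiting factor if the drive ends up holding mostly very small blocks, which matches the caveat that the answer depends on how many blocks land on each mount point.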

Adding extra commands to MR AM

2014-11-12 Thread Yehia Elshater
Hi All, Is there a way to add extra commands to be run while launching an instance of the MR ApplicationMaster? Thanks, Yehia

Re: Sr.Technical Architect/Technical Architect/Sr. Hadoop /Big Data Developer for CA, GA, NJ, NY, AZ Locations_(FTE)Full Time Employment

2014-11-12 Thread mark charts
Hello. I am interested. Attached are my cover letter and my resume. Mark Charts On Wednesday, November 12, 2014 2:45 PM, Amendra Singh Gangwar amen...@exlarate.com wrote: Hi, Please let me know if you are available for this FTE position for CA, GA, NJ, NY with good travel.

Re: Sr.Technical Architect/Technical Architect/Sr. Hadoop /Big Data Developer for CA, GA, NJ, NY, AZ Locations_(FTE)Full Time Employment

2014-11-12 Thread Larry McCay
Everyone should be aware that replying to this mail results in sending your papers to everyone on the list. On Wed, Nov 12, 2014 at 8:17 PM, mark charts mcha...@yahoo.com wrote: Hello. I am interested. Attached are my cover letter and my resume. Mark Charts On Wednesday, November

Re: Sr.Technical Architect/Technical Architect/Sr. Hadoop /Big Data Developer for CA, GA, NJ, NY, AZ Locations_(FTE)Full Time Employment

2014-11-12 Thread Stephen Boesch
Please refrain from job postings on this mailing list. 2014-11-12 20:53 GMT-08:00 Larry McCay lmc...@hortonworks.com: Everyone should be aware that replying to this mail results in sending your papers to everyone on the list On Wed, Nov 12, 2014 at 8:17 PM, mark charts mcha...@yahoo.com

Both hadoop fsck and dfsadmin can not detect missing replica in time?

2014-11-12 Thread sam liu
Hi Experts, In my HDFS, there is a file named /tmp/test.txt consisting of 1 block with 2 replicas. The block id is blk_1073742304_1480 and the 2 replicas reside on datanode1 and datanode2. Today I manually removed the block file on datanode2: