Sharing my experience of building and installing hadoop on Windows 8

2014-07-09 Thread Tai Zu
Hi Guys, It took me a while to build and get Hadoop 2.4.0 working on my Windows 8 machine. Thus I would like to share it. See my blog post for details: http://zutai.blogspot.com/2014/06/build-install-and-run-hadoop-24-240-on.html . Regards, Zutai

Re: Huge text file for Hadoop Mapreduce

2014-07-09 Thread Stanley Shi
You can get the Wikipedia data from its website; it's pretty big. Regards, *Stanley Shi,* On Tue, Jul 8, 2014 at 1:35 PM, Du Lam delim123...@gmail.com wrote: Configuration conf = getConf(); conf.setLong("mapreduce.input.fileinputformat.split.maxsize", 1000); // you can set this to some
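
For reference, a minimal sketch of capping the input split size in a driver, assuming a standard Configuration/Job setup (the 1000-byte value quoted above is only the poster's illustration; a realistic cap would be far larger):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    // Assumed driver fragment: cap each input split at ~128 MB so a huge
    // text file is spread across many map tasks.
    Configuration conf = new Configuration();
    conf.setLong("mapreduce.input.fileinputformat.split.maxsize", 128L * 1024 * 1024);
    Job job = Job.getInstance(conf, "huge-text-file-job");
    // FileInputFormat.setMaxInputSplitSize(job, bytes) is an equivalent helper.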

Re: How do I recover the namenode?

2014-07-09 Thread Stanley Shi
This is not recommended. You only back up the fsimage file; the data blocks, which are stored on the datanodes, are not backed up. The risk is that if you removed some file after your fsimage backup, the data blocks belonging to that file will be removed from all datanodes; in this case, your namenode

Re: Managed File Transfer

2014-07-09 Thread Stanley Shi
There's a DistCp utility for this kind of purpose; there's also Spring XD, but I am not sure if you want to use it. Regards, *Stanley Shi,* On Mon, Jul 7, 2014 at 10:02 PM, Mohan Radhakrishnan radhakrishnan.mo...@gmail.com wrote: Hi, We used a commercial FT and scheduler
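
As a rough illustration of the DistCp suggestion (hostnames and paths below are made up; the real source and target URIs depend on the clusters involved):

    # Hypothetical example: copy a directory between two clusters with DistCp
    hadoop distcp hdfs://source-nn:8020/data/incoming hdfs://target-nn:8020/data/incoming
    # add -update or -overwrite to control how existing files at the target are handled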

Re: The number of simultaneous map tasks is unexpected.

2014-07-09 Thread Adam Kawa
Hi Tomek, You have 9.26GB across 4 nodes, which is 2.315GB on average. What is your value of yarn.nodemanager.resource.memory-mb? You consume 1GB of RAM per container (8 containers running = 8GB of memory used). My idea is that, after running 8 containers (1 AM + 7 map tasks), you have only 315MB of
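
For context, that property lives in yarn-site.xml on each NodeManager; a sketch with illustrative values roughly matching the numbers above (not the poster's actual settings):

    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>2315</value> <!-- RAM this NodeManager may hand out to containers -->
    </property>
    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>1024</value> <!-- smallest container the scheduler will grant -->
    </property>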

ControlledJob.java:submit Job in state RUNNING instead of DEFINE - can someone shed some light on this error for me ;O)

2014-07-09 Thread Chris MacKenzie
Hi, I'm using ControlledJob and my code is: ControlledJob doConcordance = new ControlledJob( this.doParallelConcordance(), null); ... control.addJob(doConcordance); control.addJob(viableSubequenceMaxLength);
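
For readers unfamiliar with the API, a minimal JobControl sketch under the usual pattern; the "Job in state RUNNING instead of DEFINE" error typically means the wrapped Job was already submitted (e.g. via submit() or waitForCompletion()) before being handed to ControlledJob. The runPipeline() wrapper is hypothetical; concordanceJob stands in for whatever unsubmitted Job this.doParallelConcordance() returns:

    import java.util.List;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

    public class ConcordanceDriver {
        public static void runPipeline(Job concordanceJob) throws Exception {
            // concordanceJob must not have been submitted yet (it must still be in DEFINE state).
            ControlledJob doConcordance = new ControlledJob(concordanceJob, (List<ControlledJob>) null);

            JobControl control = new JobControl("concordance-pipeline");
            control.addJob(doConcordance);

            Thread runner = new Thread(control);   // JobControl implements Runnable
            runner.start();
            while (!control.allFinished()) {       // poll until every controlled job completes or fails
                Thread.sleep(1000);
            }
            control.stop();
        }
    }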

Re: listing a 530k files directory

2014-07-09 Thread Adam Kawa
You can try snakebite https://github.com/spotify/snakebite. $ snakebite ls -R path I just ran it to list 705K files and it went fine. 2014-05-30 20:42 GMT+02:00 Harsh J ha...@cloudera.com: The HADOOP_OPTS gets overridden by HADOOP_CLIENT_OPTS for FsShell utilities. The right way to extend
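
To make Harsh's point concrete, a sketch of the environment-variable route for listing a very large directory with FsShell (the heap size and path are arbitrary):

    # Give the FsShell client JVM more heap before a huge recursive listing
    export HADOOP_CLIENT_OPTS="-Xmx2g"
    hadoop fs -ls -R /path/with/530k/files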

unsubscribe

2014-07-09 Thread Kartashov, Andy
NOTICE: This e-mail message and any attachments are confidential, subject to copyright and may be privileged. Any unauthorized use, copying or disclosure is prohibited. If you are not the intended recipient, please delete and contact the sender immediately. Please consider the environment

Re: unsubscribe

2014-07-09 Thread Ted Yu
See http://hadoop.apache.org/mailing_lists.html#User Cheers 2014-07-09 7:28 GMT-07:00 Kartashov, Andy andy.kartas...@mpac.ca: NOTICE: This e-mail message and any attachments are confidential, subject to copyright and may be privileged. Any unauthorized use, copying or disclosure is

rhadoop install

2014-07-09 Thread Raj Hadoop
hi all, I am trying to find documentation relevant to 'rhadoop' on cdh4. If there is anyone in the group who has experience with 'rhadoop', can you provide me some details, like 1) installation procedure of rhadoop on cdh4.4. regards, raj

Re: rhadoop install

2014-07-09 Thread Ted Yu
Looks like you may get a better answer from the cdh mailing list. Cheers On Jul 9, 2014, at 7:53 AM, Raj Hadoop hadoop...@yahoo.com wrote: hi all, I am trying to find documentation relevant to 'rhadoop' on cdh4. If there is anyone in the group who has experience with 'rhadoop' can you provide

Re: issue about remove yarn jobs history logs

2014-07-09 Thread Adam Kawa
Have you restarted your Job History Server? 2014-05-30 4:56 GMT+02:00 ch huang justlo...@gmail.com: hi, maillist: i want to remove job history logs, and i configured the following info in yarn-site.xml, but it seems to have no effect. why? (i use CDH4.4 yarn, i configured it on each datanode, and
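
Since the original configuration snippet is cut off, here is a sketch of properties commonly used for this kind of log cleanup (retention values are illustrative, and on CDH4.4 the exact property set may differ):

    <!-- yarn-site.xml: retention of aggregated container logs -->
    <property>
      <name>yarn.log-aggregation-enable</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.log-aggregation.retain-seconds</name>
      <value>604800</value> <!-- keep aggregated logs for 7 days -->
    </property>

    <!-- mapred-site.xml: how long the Job History Server keeps finished jobs -->
    <property>
      <name>mapreduce.jobhistory.max-age-ms</name>
      <value>604800000</value> <!-- 7 days -->
    </property>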

Need to evaluate a cluster

2014-07-09 Thread YIMEN YIMGA Gael
Hello Dear, I estimated the number of nodes for a cluster that is fed 720GB of data/day. My estimate came to 367 datanodes after a year. I'm a bit worried by that number of datanodes. The assumptions I used are the following: - Daily supply (feed): 720GB -

Re: debugging class path issues with containers.

2014-07-09 Thread Adam Kawa
You might need to set *yarn.application.classpath* in yarn-site.xml: <property> <name>yarn.application.classpath</name>
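
The property value is cut off above; for orientation, a typical Hadoop 2.x entry looks roughly like the following (the exact paths depend on the distribution and are not the poster's actual value):

    <property>
      <name>yarn.application.classpath</name>
      <value>
        $HADOOP_CONF_DIR,
        $HADOOP_COMMON_HOME/share/hadoop/common/*,
        $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
        $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
        $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
        $HADOOP_YARN_HOME/share/hadoop/yarn/*,
        $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*,
        $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,
        $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*
      </value>
    </property>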

Re: Need to evaluate a cluster

2014-07-09 Thread Mirko Kämpf
Hello, if I follow your numbers I see one missing fact: *What is the number of HDDs per DataNode*? Let's assume you use machines with 6 x 3TB HDDs per box; you would need about 60 DataNodes per year (0.75 TB per day x 3 for replication x 1.3 for overhead / ( nr of HDDs per node x capacity per HDD
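
Making the arithmetic behind that estimate explicit (using the stated assumptions of 0.75 TB/day, 3x replication, a 1.3 overhead factor and 6 x 3TB disks per node):

    0.75 TB/day x 365 days x 3 (replication) x 1.3 (overhead) ≈ 1,068 TB of raw capacity per year
    1,068 TB / (6 HDDs x 3 TB per node) ≈ 59.3, i.e. roughly 60 DataNodes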

Re: Managed File Transfer

2014-07-09 Thread Mohan Radhakrishnan
I am a beginner. But this seems to be similar to what I intend. The data source will be external FTP or S3 storage. Spark Streaming can read data from HDFS http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html ,Flume http://flume.apache.org/, Kafka

Re: Need to evaluate a cluster

2014-07-09 Thread Olivier Renault
Is your data already compressed? If it's not, you can safely assume a compression ratio of 5. Olivier On 9 Jul 2014 17:10, Mirko Kämpf mirko.kae...@gmail.com wrote: Hello, if I follow your numbers I see one missing fact: *What is the number of HDDs per DataNode*? Let's assume you use

Re: Need to evaluate a cluster

2014-07-09 Thread Oner Ak.
367 nodes sounds quite high for that amount of data per day. You might need 367 disks, but do your nodes have more than one disk? You may also take into account the compression factor that you are likely to use for the data on the cluster. Oner On 9 Jul 2014 19:00, YIMEN YIMGA Gael

Re: Copy hdfs block from one data node to another

2014-07-09 Thread Yehia Elshater
Hi Chris, Actually I need this functionality for my research, basically for fault tolerance. I can calculate a failure probability for some data nodes after a certain unit of time, so I need to copy all the blocks residing on these nodes to other nodes. Thanks Yehia On 7 July 2014 20:45,

how to access configuration properties on a remote Hadoop cluster

2014-07-09 Thread Geoff Thompson
Hello, Is there a way to query the Resource Manager for configuration properties from an external client process other than using the web interface? Our background: We run a YARN application by running a Client on an external machine that may access one of many remote Hadoop clusters. The

Re: how to access configuration properties on a remote Hadoop cluster

2014-07-09 Thread Adam Kawa
Instead of Resource-Manager-WebApp-Address/conf, if you have the application id and job id, you can query the Resource Manager for the configuration of this particular application. You can use the HTTP and Java APIs for that. 2014-07-09 21:42 GMT+02:00 Geoff Thompson ge...@bearpeak.com: Hello, Is
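
As one concrete illustration of the /conf route mentioned above (the hostname is an assumption and 8088 is only the default ResourceManager web port): that servlet emits standard Hadoop configuration XML, so a client-side Configuration can load it directly.

    import java.net.URL;
    import org.apache.hadoop.conf.Configuration;

    public class RemoteConfProbe {
        public static void main(String[] args) throws Exception {
            // Pull the ResourceManager's effective configuration over HTTP.
            Configuration remote = new Configuration(false); // skip local default resources
            remote.addResource(new URL("http://rm-host.example.com:8088/conf"));
            System.out.println(remote.get("yarn.nodemanager.resource.memory-mb"));
        }
    }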

Pegasus

2014-07-09 Thread Deep Pradhan
Hi, I am using Pegasus. Can someone help me with this error? When I run the list command in the UI, after giving a demo command (demo adds the graph catstar, but I get an error afterwards), I get the following PEGASUS list === GRAPH LIST === 14/07/09 14:45:22 WARN util.NativeCodeLoader: Unable to

Re: Copy hdfs block from one data node to another

2014-07-09 Thread sudhakara st
You can get info about all blocks stored on a particular data node, i.e. the block report. But you have to handle the move at the block level, not at the file or start/end byte level. On Thu, Jul 10, 2014 at 2:49 AM, Chris Mawata chris.maw...@gmail.com wrote: Haven't looked at the source but the thing you are
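
For anyone who first wants to see which blocks live on which datanodes before attempting such a move, a sketch using fsck (the path is illustrative):

    # List every block under /data together with the datanodes currently holding it
    hdfs fsck /data -files -blocks -locations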

Re: Copy hdfs block from one data node to another

2014-07-09 Thread Arpit Agarwal
The balancer does something similar. It uses DataTransferProtocol.replaceBlock. On Wed, Jul 9, 2014 at 9:20 PM, sudhakara st sudhakara...@gmail.com wrote: You can get info about all blocks stored in perticuler data node, i,e block report. But you to handle, move in block level not in file or

Muliple map writing into same hdfs file

2014-07-09 Thread rab ra
hello I have one use-case that spans multiple map tasks in a hadoop environment. I use hadoop 1.2.1 with 6 task nodes. Each map task writes its output into a file stored in hdfs. This file is shared across all the map tasks. They all compute their output, but some of them are