Transfer large file 50Gb with DistCp from s3 to cluster

2012-09-04 Thread Soulghost
Hello guys, I have a problem using DistCp to transfer a large file from S3 to an HDFS cluster. Whenever I try to make the copy, I only see processing work and memory usage on one of the nodes, not on all of them. I don't know if this is the proper behaviour or if it is a configuration

Re: Transfer large file 50Gb with DistCp from s3 to cluster

2012-09-04 Thread Kai Voigt
Hi, my guess is that you run hadoop distcp on one of the datanodes... In that case, that node will get the first replica of each block. You should see copies on other nodes as well, but that one node will get a replica of all the blocks. Kai On 04.09.2012 at 22:07, Soulghost wrote

Re: Transfer large file 50Gb with DistCp from s3 to cluster

2012-09-04 Thread Mischa Tuffield
Hello, You could try this jar, which I found a link to on one of the Amazon pages: s3cmd get s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.0.1/s3distcp.jar s3dist.jar. It copies via MapReduce to S3 and back. If your cluster has N reducers available, you can: hadoop jar s3distcp.jar
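For reference, a minimal sketch of the two approaches mentioned in this thread (bucket names and HDFS paths below are placeholders, not taken from the original posts):

    # plain DistCp pull from S3 into HDFS
    hadoop distcp s3n://my-bucket/big/file-50g hdfs:///user/hadoop/file-50g

    # S3DistCp, which runs the copy as a MapReduce job so the work is spread across the cluster
    hadoop jar s3distcp.jar --src s3n://my-bucket/big/ --dest hdfs:///user/hadoop/big/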

why hbase doesn't provide Encryption

2012-09-04 Thread Farrokh Shahriari
Hello, I just wanna know why HBase doesn't provide encryption? Tnx

Hadoop Streaming: Does not recognise PATH and CLASSPATH defined

2012-09-04 Thread Andy Xue
Hi: I wish to use Hadoop streaming to run a program which requires specific PATH and CLASSPATH variables. I have set these two variables in both /etc/profile and ~/.bashrc on all slaves (and restarted these slaves). However, when I run the hadoop streaming job, the program generates error

Re: Hadoop Streaming: Does not recognise PATH and CLASSPATH defined

2012-09-04 Thread Narasingu Ramesh
Hi Andy, Please specify the environment variables in .bashrc (with gedit, for example). You can set the JAVA_HOME environment variable there, and in the configuration files hadoop-site.xml, hadoop-core.xml, and hadoop-default.xml you can specify which Hadoop version you are using; then close .bashrc

Re: how to execute different tasks on data nodes(simultaneously in hadoop).

2012-09-04 Thread Narasingu Ramesh
Hi Users, Hadoop distributes all the data into HDFS, and the MapReduce tasks work on it together. Which piece goes to which data node, and how it is processed, is all managed by the framework; each task has its own JVM on each data node. A JVM can handle a huge amount of data to process across all

Re: Hadoop Streaming: Does not recognise PATH and CLASSPATH defined

2012-09-04 Thread Joshi, Rekha
Hi Andy, If you are referring to HADOOP_CLASSPATH, that is an environment variable on your cluster, or it can be set via the config xml. But if you need your own environment variables for streaming, you may use -cmdenv PATH= on your streaming command. Or if you have specific jars for the streaming process, -libjars
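To illustrate Rekha's suggestion, a hedged sketch of a streaming invocation (the jar path, environment values, and mapper/reducer names are placeholders, not from the thread):

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
      -libjars /opt/myprog/lib/extra.jar \
      -cmdenv PATH=/opt/myprog/bin:/usr/bin \
      -cmdenv CLASSPATH=/opt/myprog/lib/myprog.jar \
      -input /data/in -output /data/out \
      -mapper my_mapper.sh -reducer my_reducer.sh

The -cmdenv flags export the variables into the environment of each streaming task, which is usually what the program needs rather than settings in /etc/profile or ~/.bashrc on the slaves.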

Re: knowing the nodes on which reduce tasks will run

2012-09-04 Thread Narasingu Ramesh
Hi Abhay, The NameNode has the addresses of all the data nodes. MapReduce does all the data processing. First the data set is put into the HDFS filesystem, and then the hadoop jar is run. The map tasks handle the input files, whose output is shuffled, sorted, and grouped together. Once the map tasks are completed, then

Re: Hadoop Streaming: Does not recognise PATH and CLASSPATH defined

2012-09-04 Thread Narasingu Ramesh
Hi Rekha, What I meant is that first he has to install Java, then set JAVA_HOME to its directory (pwd, the present working directory), and then set the classpath for the Hadoop installation. Thanks Regards, Ramesh.Narasingu On Tue, Sep 4, 2012 at 1:36 PM, Joshi, Rekha rekha_jo...@intuit.com wrote:

Re: Error using hadoop in non-distributed mode

2012-09-04 Thread Narasingu Ramesh
Hi Pat, Please specify the correct input file location. Thanks Regards, Ramesh.Narasingu On Mon, Sep 3, 2012 at 9:28 PM, Pat Ferrel p...@occamsmachete.com wrote: Using hadoop with mahout in a local filesystem/non-hdfs config for debugging purposes inside Intellij IDEA. When I run one

Re: best way to join?

2012-09-04 Thread Björn-Elmar Macek
Hi Dexter, I think what you want is a clustering of points based on the Euclidean distance, or density-based clustering ( http://en.wikipedia.org/wiki/Cluster_analysis ). I bet some are implemented quite well in Mahout already: afaik this is the data-mining framework based on Hadoop.

SNN

2012-09-04 Thread surfer
Hi When I start my cluster (with start-dfs.sh), secondary namenodes are created on all the machines in conf/slaves. I set conf/masters to a single different machine (along with dfs.http.address pointing to the nameserver), but it seems to be ignored. Any hint of what I'm doing wrong? thanks giovanni
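For context, a sketch of what the two host files are expected to look like, assuming stock Apache Hadoop 1.x where start-dfs.sh launches the secondary namenode on the hosts listed in conf/masters (hostnames below are placeholders):

    $ cat conf/masters
    snn-host.example.com
    $ cat conf/slaves
    datanode1.example.com
    datanode2.example.com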

Re: Hadoop Streaming: Does not recognise PATH and CLASSPATH defined

2012-09-04 Thread Narasingu Ramesh
Hi Andy, Please try it once; otherwise you can download Hadoop 1.0.2 again, it is stable. You can set all the environment variables in .bashrc (with vi or gedit): set the JAVA_HOME and HADOOP_HOME environment variables and also set PATH. Thanks Regards, Ramesh.Narasingu On Tue,
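What Ramesh describes, as a minimal .bashrc sketch (the install paths are placeholders for wherever Java and Hadoop actually live):

    export JAVA_HOME=/usr/lib/jvm/java-6-sun
    export HADOOP_HOME=/usr/local/hadoop
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin

Note, though, that variables set this way apply to interactive shells; for variables that the streaming tasks themselves need, see the -cmdenv suggestion earlier in this thread.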

Re: Integrating hadoop with java UI application deployed on tomcat

2012-09-04 Thread Visioner Sadak
also getting one more error: org.apache.hadoop.ipc.RemoteException: Server IPC version 5 cannot communicate with client version 4 On Tue, Sep 4, 2012 at 2:44 PM, Visioner Sadak visioner.sa...@gmail.com wrote: Thanks Shobha, tried adding the conf folder to Tomcat's classpath, still getting the same

Re: Integrating hadoop with java UI application deployed on tomcat

2012-09-04 Thread Bejoy KS
Hi You are running Tomcat on a Windows machine and trying to connect to a remote hadoop cluster from there. Your core-site.xml has <name>fs.default.name</name> <value>hdfs://localhost:9000</value> but it is localhost here. (I assume you are not running hadoop on this Windows environment for some testing)
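Bejoy's point, written out as config (the hostname is a placeholder for the actual remote namenode): the client-side core-site.xml should point at the remote cluster rather than at localhost:

    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode-host.example.com:9000</value>
    </property>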

Re: SNN

2012-09-04 Thread Michel Segel
Which distro? Saw this happen, way back when with a Cloudera release. Check your config files too... Sent from a remote device. Please excuse any typos... Mike Segel On Sep 4, 2012, at 3:22 AM, surfer sur...@crs4.it wrote: Hi When I start my cluster (with start-dfs.sh), secondary

Re: Integrating hadoop with java UI application deployed on tomcat

2012-09-04 Thread Visioner Sadak
Thanks Bejoy, actually my hadoop is also on Windows (I have installed it in pseudo-distributed mode for testing), it's not a remote cluster On Tue, Sep 4, 2012 at 3:38 PM, Bejoy KS bejoy.had...@gmail.com wrote: Hi You are running Tomcat on a Windows machine and trying to connect to a

Exception while running a Hadoop example on a standalone install on Windows 7

2012-09-04 Thread Udayini Pendyala
Hi, Following is a description of what I am trying to do and the steps I followed. GOAL: a) Install Hadoop 1.0.3 b) Hadoop in a standalone (or local) mode c) OS: Windows 7 STEPS FOLLOWED: 1. I followed instructions from:

Re: SNN

2012-09-04 Thread surfer
On 09/04/2012 12:58 PM, Michel Segel wrote: Which distro? Saw this happen, way back when with a Cloudera release. Check your config files too... Sent from a remote device. Please excuse any typos... Mike Segel thanks for your answer the config files are these:

Re: SNN

2012-09-04 Thread Terry Healy
Can you please show contents of masters and slaves config files? On 09/04/2012 09:15 AM, surfer wrote: On 09/04/2012 12:58 PM, Michel Segel wrote: Which distro? Saw this happen, way back when with a Cloudera release. Check your config files too... Sent from a remote device. Please

Data loss on EMR cluster running Hadoop and Hive

2012-09-04 Thread Max Hansmire
I ran into an issue yesterday where one of the blocks on HDFS seems to have gone away. I would appreciate any help that you can provide. I am running Hadoop on Amazon's Elastic Map Reduce (EMR). I am running hadoop version 0.20.205 and hive version 0.8.1. I have a hive table that is written out

Re: SNN

2012-09-04 Thread Michael Segel
The other question you have to look at is the underlying start and stop script to see what is being passed on to them. I thought there was a parameter that would overload the defaults where you specified the slaves and master files, but I could be wrong. Since this is raw Apache, I don't

RE: Pi Estimator failing to print output after finishing the job sucessfully

2012-09-04 Thread Jeffrey Buell
You didn't do anything wrong, this is just a bug in the Pi application. The application _should_ be able to divide two numbers and not require an exact decimal result. Everything you need to know is in the first line of the error message. Try it with 100 maps and 10 billion samples per map,
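The run Jeffrey suggests, as a command line (assuming the stock hadoop-examples.jar the thread is already using; the first argument is the number of maps, the second the samples per map):

    hadoop jar hadoop-examples.jar pi 100 10000000000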

Re: Pi Estimator failing to print output after finishing the job sucessfully

2012-09-04 Thread Michael Segel
You blew out the stack? Or rather your number was too 'big'/'long'? On Sep 4, 2012, at 11:10 AM, Gaurav Dasgupta gdsay...@gmail.com wrote: Hi All, I am running the Pi Estimator from hadoop-examples.jar in my 11 node CDH3u4 cluster. Initially I ran the job for 10 maps and 100

could only be replicated to 0 nodes, instead of 1

2012-09-04 Thread Keith Wiley
I've been running up against the good old fashioned replicated to 0 nodes gremlin quite a bit recently. My system (a set of processes interacting with hadoop, and of course hadoop itself) runs for a while (a day or so) and then I get plagued with these errors. This is a very simple system, a

Re: Data loss on EMR cluster running Hadoop and Hive

2012-09-04 Thread Michael Segel
Next time, try reading and writing to S3 directly from your hive job. Not sure why the block was bad... What did the AWS folks have to say? -Mike On Sep 4, 2012, at 11:30 AM, Max Hansmire hansm...@gmail.com wrote: I ran into an issue yesterday where one of the blocks on HDFS seems to have

Re: could only be replicated to 0: TL;DR

2012-09-04 Thread Keith Wiley
If the datanode is definitely not running out of space, and the overall system has basically been working leading up to the replicated to 0 nodes error (which proves the configuration and permissions are all basically correct), then what other explanations are there for why hdfs would suddenly

Re: could only be replicated to 0 nodes, instead of 1

2012-09-04 Thread Suresh Srinivas
- A datanode is typically kept free with up to 5 free blocks (HDFS block size) of space. - Disk space is used by mapreduce jobs to store temporary shuffle spills also. This is what dfs.datanode.du.reserved is used to configure. The configuration is available in hdfs-site.xml. If you have not
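A hedged example of the hdfs-site.xml setting Suresh mentions (the value is illustrative, in bytes, and not a recommendation for any particular cluster):

    <property>
      <name>dfs.datanode.du.reserved</name>
      <value>10737418240</value> <!-- keep roughly 10 GB per volume free for non-DFS use -->
    </property>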

Can't get out of safemode

2012-09-04 Thread Keith Wiley
Observe:
    ~/ $ hd fs -put test /test
    put: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create file/test. Name node is in safe mode.
    ~/ $ hadoop dfsadmin -safemode leave
    Safe mode is OFF
    ~/ $ hadoop dfsadmin -safemode get
    Safe mode is ON
    ~/ $ hadoop dfsadmin -safemode leave

Re: Data loss on EMR cluster running Hadoop and Hive

2012-09-04 Thread Max Hansmire
Especially since I am reading from the file using a Map-Reduce job in the next step, I am not sure that it makes sense in terms of performance to put the file on S3. I have not tested, but my suspicion is that the local disk reads on HDFS would outperform reading and writing the file to S3.

Re: Can't get out of safemode

2012-09-04 Thread Serge Blazhiyevskyy
Can you look in the name node logs and post the last few lines? On 9/4/12 10:07 AM, Keith Wiley kwi...@keithwiley.com wrote: Observe: ~/ $ hd fs -put test /test put: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create file/test. Name node is in safe mode. ~/ $ hadoop dfsadmin
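One quick way to grab those lines (the log directory and filename pattern are assumptions; they vary with the install and the user running the daemon):

    tail -n 50 $HADOOP_LOG_DIR/hadoop-*-namenode-*.log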

secondary namenode

2012-09-04 Thread derrick thomas
Hi When I start my cluster with start-dfs.sh the secondary namenodes are created in the slaves machines. I set conf/masters to a different single machine (along with the assignment of dfs.http.address to the nameserver:50070) but it is apparently ignored. hadoop version: 1.0.3 1 machine with JT

Re: could only be replicated to 0 nodes, instead of 1

2012-09-04 Thread Keith Wiley
On Sep 4, 2012, at 10:05 , Suresh Srinivas wrote: When these errors are thrown, please send the namenode web UI information. It has storage related information in the cluster summary. That will help debug. Sure thing. Thanks. Here's what I currently see. It looks like the problem isn't

Re: could only be replicated to 0 nodes, instead of 1

2012-09-04 Thread Suresh Srinivas
Keith, Assuming that you were seeing the problem when you captured the namenode webUI info, it is not related to what I suspect. This might be a good question for CDH forums given this is not an Apache release. Regards, Suresh On Tue, Sep 4, 2012 at 10:20 AM, Keith Wiley kwi...@keithwiley.com

Re: Exception while running a Hadoop example on a standalone install on Windows 7

2012-09-04 Thread Bejoy Ks
Hi Udayani By default hadoop works well on Linux and Linux-based OSes. Since you are on Windows, you need to install and configure ssh using Cygwin before you start the hadoop daemons. On Tue, Sep 4, 2012 at 6:16 PM, Udayini Pendyala udayini_pendy...@yahoo.com wrote: Hi, Following is a

Re: Yarn defaults for local directories

2012-09-04 Thread Andy Isaacson
On Mon, Sep 3, 2012 at 5:09 AM, Hemanth Yamijala yhema...@gmail.com wrote: Is there a reason why Yarn's directory paths are not defaulting to be relative to hadoop.tmp.dir. For e.g. yarn.nodemanager.local-dirs defaults to /tmp/nm-local-dir. Could it be ${hadoop.tmp.dir}/nm-local-dir instead ?
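The change Hemanth is proposing, written out as a yarn-site.xml entry (this reflects the question's suggestion, not the shipped default):

    <property>
      <name>yarn.nodemanager.local-dirs</name>
      <value>${hadoop.tmp.dir}/nm-local-dir</value>
    </property>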

Re: could only be replicated to 0: TL;DR

2012-09-04 Thread Harsh J
Hi Keith, See http://search-hadoop.com/m/z9oYUIhhUg and the method isGoodTarget under http://search-hadoop.com/c/Hadoop:/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java||isGoodTarget On Tue, Sep 4, 2012 at 10:24 PM,

Re: questions about SequenceFile

2012-09-04 Thread Harsh J
Hi Young, Note that the SequenceFile.Writer#sync method != HDFS sync(); it's just a method that writes a sync marker (a set of bytes representing an end point for one or more records, kinda like a newline in text files but not for every record). I don't think sync() would affect much. Although,

Re: could only be replicated to 0 nodes, instead of 1

2012-09-04 Thread Harsh J
Keith, The NameNode has a resource-checker thread in it by design to help prevent cases of on-disk metadata corruption in event of filled up dfs.namenode.name.dir disks, etc.. By default, an NN will lock itself up if the free disk space (among its configured metadata mounts) reaches a value 100
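The threshold Harsh describes is configurable; a hedged hdfs-site.xml sketch (the property name and 100 MB default are to the best of my recollection, so verify against your release):

    <property>
      <name>dfs.namenode.resource.du.reserved</name>
      <value>104857600</value> <!-- free bytes below which the NN locks itself into safe mode -->
    </property>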

Re: Error using hadoop in non-distributed mode

2012-09-04 Thread Hemanth Yamijala
Hi, The path /tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/snip is a location used by the tasktracker process for the 'DistributedCache' - a mechanism to distribute files to all tasks running in a map reduce job. (

Re: Exception while running a Hadoop example on a standalone install on Windows 7

2012-09-04 Thread Hemanth Yamijala
Though I agree with others that it would probably be easier to get Hadoop up and running on Unix based systems, couldn't help notice that this path: \tmp \hadoop-upendyal\mapred\staging\upendyal-1075683580\.staging seems to have a space in the first component i.e '\tmp ' and not '\tmp'. Is that
