Hdfs questions

2013-09-10 Thread xeon
Hi, I am having some difficulty copying data between 2 HDFS filesystems in Amazon EC2. I want to try the distcp2 command to see if I can. - Where is the distcp2 command in YARN? - Is it possible to copy data between HDFS filesystems using SSL? - Has anyone copied data between HDFS filesystems in 2 di…
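In Hadoop 2.x the regular hadoop distcp command already invokes the DistCp v2 implementation and runs it as a YARN job. A minimal sketch of an invocation between two clusters (hostnames and ports are hypothetical):

    # Copy between two clusters; DistCp v2 is what `hadoop distcp` runs in 2.x.
    hadoop distcp hdfs://nn-source:8020/data hdfs://nn-target:8020/data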

Securing the Secondary Name Node

2013-09-10 Thread Christopher Penney
Hi, After hosting an insecure Hadoop environment for early testing, I'm transitioning to something more secure that would (hopefully) more or less mirror what a production environment might look like. I've integrated our Hadoop cluster into our Kerberos realm and everything is working OK except fo…

Re: UnsupportedOperationException occurs with Hadoop-2.1.0-beta jar files

2013-09-10 Thread Vinayakumar B
Yes. The Protobuf 2.5 jars require all Protobuf code in the JVM to be generated and compiled with 2.5; they do not support code compiled against older versions. Even though there will not be any compilation issues with 2.4-generated code, an exception will be thrown at runtime. So upgrade all your code to 2.5 and generat…
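A minimal sketch of regenerating and rebuilding against Protobuf 2.5 (the .proto path and output directory are hypothetical):

    # Confirm the compiler version, regenerate the Java sources, then rebuild
    # so no 2.4-generated classes remain on the classpath.
    protoc --version        # should report libprotoc 2.5.0
    protoc --java_out=src/main/java src/main/proto/my_messages.proto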

Re: hadoop cares about /etc/hosts ?

2013-09-10 Thread Vinayakumar B
Ensure that for each IP there is only one hostname configured in the /etc/hosts file. If you configure multiple different hostnames for the same IP, the OS will choose the first one when resolving the hostname from the IP, and similarly when resolving the IP from a hostname. Regards, Vinayakumar B On Sep 10, 2013 9:27 AM, "Chris Embree"…
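An illustrative /etc/hosts layout following that advice (addresses are hypothetical, one canonical hostname per IP so forward and reverse lookups agree):

    127.0.0.1      localhost
    192.168.6.10   master
    192.168.6.11   slave1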

Re: modify hdfs block size

2013-09-10 Thread Vinayakumar B
You can change it to any size that is a multiple of 512 bytes, which is the default bytesPerChecksum. But setting it to smaller values leads to heavy load on the cluster, and setting it to a very high value will not distribute the data. So 64 MB (or 128 MB in the latest trunk) is recommended as optimal. It's up to you to de…

Concatenate multiple sequence files into 1 big sequence file

2013-09-10 Thread Jerry Lam
Hi Hadoop users, I have been trying to concatenate multiple sequence files into one. Since the total size of the sequence files is quite big (1 TB), I won't use MapReduce because it would require 1 TB on the reducer host to hold the temporary data. I ended up doing what has been suggested in this threa…

Re: Concatenate multiple sequence files into 1 big sequence file

2013-09-10 Thread Adam Muise
Jerry, It might not help with this particular file, but you might consider the approach used at BlackBerry when dealing with your data. They block-compress into small Avro files and then concatenate those into large Avro files without decompressing. Check out the boom file format here: https://git…

Re: Concatenate multiple sequence files into 1 big sequence file

2013-09-10 Thread John Meagher
Here's a great tool for exactly what you're looking for: https://github.com/edwardcapriolo/filecrush On Tue, Sep 10, 2013 at 11:07 AM, Jerry Lam wrote: > Hi Hadoop users, > > I have been trying to concatenate multiple sequence files into one. > Since the total size of the sequence files is quite b…

Re: Concatenate multiple sequence files into 1 big sequence file

2013-09-10 Thread Jay Vyas
IIRC, sequence files can be concatenated as-is and read as one large file, but maybe I'm forgetting something.

Re: Hadoop on IPv6

2013-09-10 Thread Aji Janis
Harsh, We have the config params that you mentioned, but to be clear, are you saying that the suggestions Adam made won't work because of these configurations? Thanks, Aji On Mon, Sep 9, 2013 at 4:55 PM, Harsh J wrote: > For Hadoop JVMs, generally speaking, need to have the following in > yo…

Re: Concatenate multiple sequence files into 1 big sequence file

2013-09-10 Thread Jerry Lam
Hi guys, Thank you for all the advice here. I really appreciate it. I read through the code in filecrush and found out that it is doing exactly what I'm currently doing. The logic resides in CrushReducer.java, with the following lines doing the concatenation: while (reader.next(key, value))…

can the parameters dfs.block.size and dfs.replication be different from one file to the other

2013-09-10 Thread Jun Li
Hi, I am trying to evaluate MapReduce with different settings. I wonder whether the following two HDFS parameters, *dfs.block.size and *dfs.replication, can be set at the time I load the file into HDFS (that is, as a client-side setting), or whether these are system parameter settings that can…

whether dfs.domain.socket.path is supported in Hadoop 1.1?

2013-09-10 Thread Jun Li
Hi, In the link http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml, the explanation for dfs.domain.socket.path is: "Optional. This is a path to a UNIX domain socket that will be used for communication between the DataNode and local HDFS clients. If the st…

Re: Hadoop on IPv6

2013-09-10 Thread Aji Janis
Thank you for the clarification, Adam. On Tue, Sep 10, 2013 at 12:34 PM, Adam Muise wrote: > Harsh is giving you a best practice for JVMs using IPv4 in general. As > what I am suggesting is IPv4-only connections to the Hadoop daemons and > clients on the cluster and gateway, you would not have i…

Re: Hdfs questions

2013-09-10 Thread Peyman Mohajerian
In Amazon, the best (and I think cheapest) approach is to first copy to S3; there is a command in EMR to facilitate that, and if you aren't using EMR you may still be able to install it. On Tue, Sep 10, 2013 at 6:20 AM, xeon wrote: > Hi, > > I am having some difficulty in copy data between 2 HDFS file…
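A minimal sketch of staging through S3 with plain distcp (bucket name, credentials, and hostnames are hypothetical; EMR's own copy tool can be used instead):

    # Push from the source cluster into S3, then pull into the target cluster.
    # The s3n credentials can also live in core-site.xml instead of -D flags.
    hadoop distcp -Dfs.s3n.awsAccessKeyId=AKIA... -Dfs.s3n.awsSecretAccessKey=SECRET \
        hdfs://nn-source:8020/data s3n://my-bucket/data
    hadoop distcp -Dfs.s3n.awsAccessKeyId=AKIA... -Dfs.s3n.awsSecretAccessKey=SECRET \
        s3n://my-bucket/data hdfs://nn-target:8020/data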

Re: can the parameters dfs.block.size and dfs.replication be different from one file to the other

2013-09-10 Thread Jun Li
Hello Shahab, Thanks for the reply. Typically, to invoke the HDFS client, I use "bin/hadoop dfs ...". But the command you used, "hadoop fs ...", makes me wonder whether this is the Hadoop 2.* client command. Could you clarify whether "-D fs.local.block.size" is supported in Hadoop 1.1.…

Re: can the parameters dfs.block.size and dfs.replication be different from one file to the other

2013-09-10 Thread Shahab Yunus
"can be set at the time I load the file to the HDFS (that is, it is the client side setting)? " I don't think you can do this while reading. These are done at the time of writing. You can do it like this (the example is for CLI as evident): hadoop fs -D fs.local.block.size=134217728 -put local_na

Re: HTTP ERROR 500 when call application master proxy URL in Hadoop 2.1.0-beta

2013-09-10 Thread Jian Fang
Ok, it seems there is a JIRA for this issue: https://issues.apache.org/jira/browse/YARN-800 On Mon, Sep 9, 2013 at 3:39 PM, Jian Fang wrote: > Hi, > > I need to use the web services in the application master, for example, > > curl > http://10.6.179.230:9026/proxy/application_1378761541170_0003/ws/v1/…

Re: whether dfs.domain.socket.path is supported in Hadoop 1.1?

2013-09-10 Thread Harsh J
HDFS-347 introduced this feature, and it is currently only available from 2.1.x onwards. On Wed, Sep 11, 2013 at 12:00 AM, Jun Li wrote: > Hi, > > In the link, > http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml, > > the explanation: > dfs.domain.socket.path > >…
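A hedged sketch of the hdfs-site.xml entries typically used with this feature on 2.1.x+ (the socket path is a hypothetical example):

    <property>
      <name>dfs.client.read.shortcircuit</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.domain.socket.path</name>
      <value>/var/lib/hadoop-hdfs/dn_socket</value>
    </property>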

Hadoop Metrics Issue in ganglia.

2013-09-10 Thread orahad bigdata
Hi All, I'm facing an issue with showing Hadoop metrics in Ganglia. Although I have installed Ganglia on my master/slave nodes and I'm able to see all the default metrics in the Ganglia UI from all the nodes, I'm not able to see Hadoop metrics in the metrics section. Versions: Hadoop 1.1.1, Ganglia 3…
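A sketch of the conf/hadoop-metrics2.properties entries commonly used to publish metrics to Ganglia 3.1.x (the gmond host/port is hypothetical; the daemons need a restart after editing):

    *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
    *.period=10
    namenode.sink.ganglia.servers=gmond-host:8649
    datanode.sink.ganglia.servers=gmond-host:8649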

Import data from MySql to HBase using Sqoop2

2013-09-10 Thread Dhanasekaran Anbalagan
Hi Guys, How to import a MySQL table into an HBase table? I am using Sqoop2; when I try to import a table it doesn't show HBase as a storage option. Schema name: sqoop:000> create job --xid 12 --type import . . . . Boundary query: Output configuration Storage type: * 0 : HDFS* Choose: Please guide me. How to do th…

Re: Hadoop on IPv6

2013-09-10 Thread Adam Muise
Harsh is giving you a best practice for JVMs using IPv4 in general. Since what I am suggesting is IPv4-only connections to the Hadoop daemons and clients on the cluster and gateway, you would not have issues heeding his advice. Obviously you would not set "-Djava.net.preferIPv4Stack=true" to any java…
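A minimal sketch of where that flag usually goes so every Hadoop daemon and client JVM picks it up (conf/hadoop-env.sh):

    # Prefer the IPv4 stack for all Hadoop JVMs started from this configuration.
    export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"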

Re: hadoop cares about /etc/hosts ?

2013-09-10 Thread Cipher Chen
So for the first *wrong* /etc/hosts file, the sequence would be: find hdfs://master:54310; find master -> 192.168.6.10 (*but it already got the IP here*); find 192.168.6.10 -> localhost; find localhost -> 127.0.0.1. The other thing: when I 'ping master', I get a reply from '192.168.6.10' instead of 12…
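Quick checks for this kind of problem (hostname taken from the thread); once /etc/hosts is cleaned up, both should report the same non-loopback address:

    getent hosts master
    ping -c 1 master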