RE: modify hdfs block size

2013-09-10 Thread Brahma Reddy Battula
You can change the block size of existing files with a command like hadoop distcp -Ddfs.block.size=$[256*1024*1024] /path/to/inputdata /path/to/inputdata-with-largeblocks. After this command completes, you can remove the original data.
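For reference, a minimal sketch (assuming the Hadoop Java FileSystem API; the target directory is the one from the command above) of checking the block size recorded for the copied files:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CheckBlockSize {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Print the per-file block size that HDFS recorded for each copied file.
            for (FileStatus status : fs.listStatus(new Path("/path/to/inputdata-with-largeblocks"))) {
                System.out.println(status.getPath() + " blockSize=" + status.getBlockSize());
            }
        }
    }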

Hdfs questions

2013-09-10 Thread xeon
Hi, I am having some difficulty copying data between 2 HDFS filesystems in Amazon EC2. I want to try the distcp2 command to see if I can. - Where is the distcp2 command in YARN? - Is it possible to copy data between HDFS using SSL? - Has anyone copied data between HDFS filesystems in 2

Securing the Secondary Name Node

2013-09-10 Thread Christopher Penney
Hi, After hosting an insecure Hadoop environment for early testing I'm transitioning to something more secure that would (hopefully) more or less mirror what a production environment might look like. I've integrated our Hadoop cluster into our Kerberos realm and everything is working ok except

Re: UnsupportedOperationException occurs with Hadoop-2.1.0-beta jar files

2013-09-10 Thread Vinayakumar B
Yes. Protobuf 2.5 jars want every piece of protobuf-generated code in the JVM to be generated and compiled with 2.5; old compiled code is not supported. Even though there will not be any compilation issues with 2.4-generated code, an exception will be thrown at runtime. So upgrade all your code to 2.5 and

Re: modify hdfs block size

2013-09-10 Thread Vinayakumar B
You can change it to any size that is a multiple of 512 bytes, which is the default bytesPerChecksum. But setting it to smaller values leads to heavy load on the cluster, and setting it to a very high value will not distribute the data. So 64MB (or 128MB in the latest trunk) is recommended as optimal. It's up to you to
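As an illustration of the per-file nature of the setting, a rough sketch (illustrative path and sizes; assumes the Java FileSystem API) of creating a single file with its own block size, which must be a multiple of bytesPerChecksum (512 bytes by default):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CreateWithBlockSize {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            long blockSize = 128L * 1024 * 1024;   // 128 MB, a multiple of the 512-byte checksum chunk
            short replication = 3;
            int bufferSize = conf.getInt("io.file.buffer.size", 4096);
            // This overload fixes the block size for this file only; other files keep the cluster default.
            FSDataOutputStream out = fs.create(new Path("/tmp/largeblock.dat"), true, bufferSize, replication, blockSize);
            out.writeBytes("example\n");
            out.close();
        }
    }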

Concatenate multiple sequence files into 1 big sequence file

2013-09-10 Thread Jerry Lam
Hi Hadoop users, I have been trying to concatenate multiple sequence files into one. Since the total size of the sequence files is quite big (1TB), I won't use MapReduce because it requires 1TB on the reducer host to hold the temporary data. I ended up doing what has been suggested in this
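For anyone attempting the same thing, a rough single-process sketch (Hadoop 1.x-era SequenceFile API; paths are illustrative) that streams records from every input file into one writer, so no reducer-side temporary space is needed, at the cost of being single-threaded:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    public class ConcatSequenceFiles {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path inputDir = new Path("/data/seqfiles");   // illustrative input directory
            Path output = new Path("/data/merged.seq");   // illustrative output file
            SequenceFile.Writer writer = null;
            for (FileStatus status : fs.listStatus(inputDir)) {
                SequenceFile.Reader reader = new SequenceFile.Reader(fs, status.getPath(), conf);
                Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
                Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
                if (writer == null) {
                    // Create the output with the same key/value classes as the inputs.
                    writer = SequenceFile.createWriter(fs, conf, output, reader.getKeyClass(), reader.getValueClass());
                }
                while (reader.next(key, value)) {
                    writer.append(key, value);
                }
                reader.close();
            }
            if (writer != null) {
                writer.close();
            }
        }
    }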

Re: Concatenate multiple sequence files into 1 big sequence file

2013-09-10 Thread Adam Muise
Jerry, It might not help with this particular file, but you might consider the approach used at Blackberry when dealing with your data. They block-compressed the data into small Avro files and then concatenated them into large Avro files without decompressing. Check out the boom file format here:

Re: Concatenate multiple sequence files into 1 big sequence file

2013-09-10 Thread John Meagher
Here's a great tool for exactly what you're looking for: https://github.com/edwardcapriolo/filecrush

Re: Concatenate multiple sequence files into 1 big sequence file

2013-09-10 Thread Jay Vyas
IIRC sequence files can be concatenated as-is and read as one large file, but maybe I'm forgetting something.

Re: Hadoop on IPv6

2013-09-10 Thread Aji Janis
Thank you for the clarification, Adam. On Tue, Sep 10, 2013 at 12:34 PM, Adam Muise amu...@hortonworks.com wrote: Harsh is giving you a best practice for JVMs using IPv4 in general. What I am suggesting is IPv4-only connections to the Hadoop daemons and clients on the cluster and gateway,
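For context, the JVM flag usually cited for this (an assumption on my part about which best practice is meant) is java.net.preferIPv4Stack, typically passed through HADOOP_OPTS in hadoop-env.sh; a trivial check that the flag actually reached the JVM:

    public class PreferIPv4Check {
        public static void main(String[] args) {
            // Expected to print "true" when -Djava.net.preferIPv4Stack=true was passed, e.g. via HADOOP_OPTS.
            System.out.println("java.net.preferIPv4Stack = " + System.getProperty("java.net.preferIPv4Stack"));
        }
    }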

Re: can the parameters dfs.block.size and dfs.replication be different from one file to the other

2013-09-10 Thread Jun Li
Hello Shahab, Thanks for the reply. Typically, to invoke the HDFS client, I use bin/hadoop dfs. But the hadoop fs command that you used makes me wonder whether that is the Hadoop 2.* client command. Could you clarify for me whether -D fs.local.block.size is supported in Hadoop 1.1 or

Re: can the parameters dfs.block.size and dfs.replication be different from one file to the other

2013-09-10 Thread Shahab Yunus
can be set at the time I load the file into HDFS (that is, it is a client-side setting)? I don't think you can do this while reading; these are set at the time of writing. You can do it like this (the example is for the CLI, as evident): hadoop fs -D fs.local.block.size=134217728 -put
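A rough Java analogue of the same write-time idea (property names are the 1.x ones, dfs.block.size and dfs.replication; 2.x renames the first to dfs.blocksize), set on the client Configuration so they apply only to files this client writes:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteWithClientSettings {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Client-side defaults for files created by this process; existing files are not affected.
            conf.setLong("dfs.block.size", 134217728L);  // 128 MB
            conf.setInt("dfs.replication", 2);
            FileSystem fs = FileSystem.get(conf);
            FSDataOutputStream out = fs.create(new Path("/tmp/example.dat"));
            out.writeBytes("block size and replication are fixed at write time\n");
            out.close();
        }
    }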

Re: HTTP ERROR 500 when call application master proxy URL in Hadoop 2.1.0-beta

2013-09-10 Thread Jian Fang
Ok, it seems there is a JIRA for this issue: https://issues.apache.org/jira/browse/YARN-800 On Mon, Sep 9, 2013 at 3:39 PM, Jian Fang jian.fang.subscr...@gmail.com wrote: Hi, I need to use the web services in the application master, for example, curl

Re: whether dfs.domain.socket.path is supported in Hadoop 1.1?

2013-09-10 Thread Harsh J
HDFS-347 introduced this feature, and it is currently only available in 2.1.x onwards. On Wed, Sep 11, 2013 at 12:00 AM, Jun Li jltz922...@gmail.com wrote: Hi, In the link, http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml, the explanation:
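For anyone on 2.1.x or later, a minimal client-side sketch (the socket path is illustrative; the DataNodes must be configured with the same dfs.domain.socket.path for short-circuit reads to work):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class ShortCircuitClient {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // HDFS-347-style short-circuit local reads (2.x): client and DataNode share a UNIX domain socket.
            conf.setBoolean("dfs.client.read.shortcircuit", true);
            conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket"); // illustrative path
            FileSystem fs = FileSystem.get(conf);
            System.out.println("Connected to " + fs.getUri());
        }
    }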

Hadoop Metrics Issue in ganglia.

2013-09-10 Thread orahad bigdata
Hi All, I'm facing an issue showing Hadoop metrics in Ganglia. Though I have installed Ganglia on my master/slave nodes and I'm able to see all the default metrics on the Ganglia UI from all the nodes, I'm not able to see Hadoop metrics in the metrics section. Versions: Hadoop 1.1.1, Ganglia

Import data from MySql to HBase using Sqoop2

2013-09-10 Thread Dhanasekaran Anbalagan
Hi Guys, How do I import a MySQL table into HBase? I am using Sqoop2, and when I try to import a table it doesn't show HBase as a storage option. Schema name: sqoop:000 create job --xid 12 --type import . . . . Boundary query: Output configuration Storage type: * 0 : HDFS* Choose: Please guide me. How to do

Re: hadoop cares about /etc/hosts ?

2013-09-10 Thread Cipher Chen
So for the first *wrong* /etc/hosts file, the sequence would be: resolve hdfs://master:54310, then master - 192.168.6.10 (*but it already got the IP here*), then 192.168.6.10 - localhost, then localhost - 127.0.0.1. The other thing: when I 'ping master', I get a reply from '192.168.6.10' instead of