Re: the balance of datanodes

2013-05-02 Thread Harsh J
With replication factor 1, what you see is expected if you also did your writes from a node that runs a DN (135.224.99.69 in your case - you ran the data load there). This is because of the HDFS write optimization where, if it finds a local DN to write to, it will write there. That fact, coupled
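For reference, one way to confirm which datanode each block landed on, and to redistribute existing blocks afterwards (the path and threshold below are only examples):

  hadoop fsck /path/to/loaded/data -files -blocks -locations
  hadoop balancer -threshold 10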

datanode's write_block_op_avg_time value

2013-05-02 Thread Jack
Hi, I checked the datanode's write_block_op_avg_time on my cluster. It turns out the value of write_block_op_avg_time is about 2ms. Is that normal? The replication factor is 3. Regards, Jack

Re: datanode's write_block_op_avg_time value

2013-05-02 Thread Harsh J
What are you finding alarming w.r.t. your cluster? The metric is simple: When did the write start, and when did it finally end, for a single block? The difference is the writeBlockOp time. The average is over a varied collection, which is what you're looking at. Are your jobs I/O bound? If so,
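For reference, the underlying counters can also be read straight from a datanode's JMX servlet; the default DN HTTP port (50075) and exact metric names vary by version and metrics sink, so treat this as a sketch:

  curl -s http://datanode-host:50075/jmx | grep -i writeBlockOp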

How to restore HDFS data

2013-05-02 Thread 段洪义
A datanode failed, and then I ran 'hadoop fsck / -delete' and lost a lot of data. Later, those datanodes recovered. Is there any way to restore the data deleted by the fsck command?

Re: How to restore HDFS data

2013-05-02 Thread Harsh J
If you've already run -delete (-move is a better choice if you know the DN fault is temporary), then the missing blocks are already deleted from the namespace permanently. There is no way to recover the data since the blocks have also been invalidated by now. An HDFS Snapshots feature will arrive in near
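For reference, the two fsck modes discussed above (illustrative only; -move relocates the affected files' surviving blocks to /lost+found rather than deleting them, which keeps a recovery path open if the datanodes come back):

  hadoop fsck / -move     # move corrupt files to /lost+found
  hadoop fsck / -delete   # remove corrupt files from the namespace permanently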

Error writing file (Invalid argument)

2013-05-02 Thread Jean-Marc Spaggiari
Hi, I'm facing the issue below with Hadoop. Configuration: - 1 WAS node; - Replication factor set to 1; - Short Circuit activated. Exception: 2013-05-02 14:02:41,063 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock

Saving data in db instead of hdfs

2013-05-02 Thread Chengi Liu
Hi, I am using the Hadoop streaming API (Python) for some processing. I want the data to be processed via Hadoop, but I want to pipe the output to a DB instead of HDFS. How do I do this? Thanks

HIVE question

2013-05-02 Thread KayVajj
I'm running CDH 4.1.2 and Hive version 0.9.0. I have a Hive Oozie action which is trying to load a file xyz.json into a Hive table defined with a SerDe. The table already has a file named xyz.json. The Oozie Hive action fails with the following error: Hive history

Re: Why could not find finished jobs in yarn.resourcemanager.webapp.address?

2013-05-02 Thread Sandy Ryza
This shouldn't be asked on the dev lists, so putting mapreduce-dev and hdfs-dev in the bcc. Have you made sure you're not using the local job runner? Did you restart the resourcemanager after running the job? -Sandy

Re: Why could not find finished jobs in yarn.resourcemanager.webapp.address?

2013-05-02 Thread sam liu
I did not restart the resourcemanager after running the job, and just launched a sample job directly using the command 'hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.3-alpha.jar pi 2 30'

Re: Why could not find finished jobs in yarn.resourcemanager.webapp.address?

2013-05-02 Thread Sandy Ryza
In your yarn-site.xml, do you have mapreduce.framework.name set to YARN?

Re: Why could not find finished jobs in yarn.resourcemanager.webapp.address?

2013-05-02 Thread sam liu
That was the cause. After setting mapreduce.framework.name to YARN in mapred-site.xml, the MR job can now be found in the 'FINISHED Applications' tab. Thanks very much!
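For reference, the relevant mapred-site.xml entry would look roughly like this (the accepted value is typically lowercase 'yarn'):

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>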

Re: Saving data in db instead of hdfs

2013-05-02 Thread Mirko Kämpf
Hi, just use Sqoop to push the data from HDFS to a database via JDBC. Intro to Sqoop: http://blog.cloudera.com/blog/2009/06/introducing-sqoop/ Or even use Hive-JDBC to connect to your result data from outside the hadoop cluster. You can also create your own OutputFormat (with Java API), which
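For reference, a minimal Sqoop export invocation along the lines Mirko describes; the JDBC connection string, credentials, table name and HDFS path are placeholders:

  sqoop export \
    --connect jdbc:mysql://dbhost/mydb \
    --username dbuser --password dbpass \
    --table results \
    --export-dir /user/hadoop/streaming-output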

Re: Saving data in db instead of hdfs

2013-05-02 Thread Ahmed Radwan
You can use the DBOutputFormat to directly write your job output to a DB, see: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/db/DBOutputFormat.html I'd also recommend looking into sqoop (http://sqoop.apache.org/) for more capabilities.
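For reference, a minimal, untested sketch of wiring a (Java, old mapred API) job's output to a table with the DBOutputFormat linked above; note this applies to Java jobs rather than streaming, and the driver class, JDBC URL, credentials, table and column names are all placeholders:

  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;
  import java.sql.PreparedStatement;
  import java.sql.ResultSet;
  import java.sql.SQLException;

  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.io.Writable;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.lib.db.DBConfiguration;
  import org.apache.hadoop.mapred.lib.db.DBOutputFormat;
  import org.apache.hadoop.mapred.lib.db.DBWritable;

  public class DbOutputSketch {

    // The reduce output *key* type; DBOutputFormat turns each key into one INSERT.
    public static class ResultRecord implements Writable, DBWritable {
      private String word;
      private int count;

      public ResultRecord() {}

      public ResultRecord(String word, int count) {
        this.word = word;
        this.count = count;
      }

      // DBWritable: bind the fields to the prepared INSERT, in column order.
      public void write(PreparedStatement stmt) throws SQLException {
        stmt.setString(1, word);
        stmt.setInt(2, count);
      }

      public void readFields(ResultSet rs) throws SQLException {
        word = rs.getString(1);
        count = rs.getInt(2);
      }

      // Writable: only exercised if this type also crosses the shuffle.
      public void write(DataOutput out) throws IOException {
        out.writeUTF(word);
        out.writeInt(count);
      }

      public void readFields(DataInput in) throws IOException {
        word = in.readUTF();
        count = in.readInt();
      }
    }

    public static void configure(JobConf conf) {
      // JDBC connection details (placeholders).
      DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
          "jdbc:mysql://dbhost:3306/mydb", "dbuser", "dbpass");
      // Target table and its columns; one row is inserted per output key.
      DBOutputFormat.setOutput(conf, "word_counts", "word", "count");
      conf.setOutputFormat(DBOutputFormat.class);
      conf.setOutputKeyClass(ResultRecord.class);
      conf.setOutputValueClass(NullWritable.class);
    }
  }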