With replication factor 1, what you see is expected if you also did
your writes from a node that runs a DN (135.224.99.69 in your case;
you ran the data load there).
This is because of the HDFS write optimization: if the client finds a local
DN to write to, it will write there. That fact, coupled
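To confirm where the single replica of each block landed, you can inspect block locations with fsck; a hedged sketch (the path here is hypothetical, substitute your own file):

```shell
# Show which DataNodes hold the blocks of a given file
hdfs fsck /user/jack/data.txt -files -blocks -locations
```

With replication factor 1 and a local DN, every block should report the writer node's address.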
Hi,
I checked the DataNodes' write_block_op_avg_time on my cluster. It turns
out the value of write_block_op_avg_time is about 2 ms. Is that
normal? The replication factor is 3.
Regards,
Jack
What are you finding alarming w.r.t. your cluster?
The metric is simple: When did the write start, and when did it finally
end, for a single block? The difference is the writeBlockOp time. The
average is over a varied collection, which is what you're looking at.
Are your jobs I/O bound? If so,
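For intuition, the metric described above is just total op time divided by op count over the collection window; a minimal sketch with made-up per-block durations (values are illustrative, not from any real cluster):

```shell
# Hypothetical per-block writeBlockOp durations, in milliseconds
durations="2 3 1 2 2"
total=0; count=0
for d in $durations; do
  total=$((total + d))    # end-time minus start-time for one block write
  count=$((count + 1))
done
avg=$((total / count))    # integer average, as a rough illustration
echo "write_block_op_avg_time ~ ${avg} ms"
```

A low single-digit average simply means individual block writes are completing quickly.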
After a DataNode error, I ran `hadoop fsck / -delete` and lost a lot of data.
Later, these DataNodes recovered. Is there any way to restore the data
that the fsck command deleted?
If you've already run -delete (-move is a better choice if you know the DN
fault is temporary), then the missing blocks are already deleted from the
namespace permanently. There is no way to recover the data, since the blocks
have also been invalidated by now.
An HDFS Snapshots feature will arrive in the near
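As the reply notes, -move relocates files with corrupt blocks to /lost+found instead of deleting them, which is safer when a DN outage may be temporary. A hedged sketch of a more cautious sequence:

```shell
# First, list files with missing/corrupt blocks without acting on them
hdfs fsck / -list-corruptfileblocks

# Then move affected files to /lost+found rather than deleting them outright
hdfs fsck / -move
```

If the DataNodes come back before anything is deleted, the blocks are simply re-reported and no data is lost.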
Hi,
I'm facing the issue below with Hadoop.
Configuration:
- 1 WAS node;
- Replication factor set to 1;
- Short-circuit reads activated.
Exception:
2013-05-02 14:02:41,063 INFO org.apache.hadoop.hdfs.server.
datanode.DataNode: opWriteBlock
Hi,
I am using the Hadoop streaming API (Python) for some processing.
I want the data to be processed via Hadoop, but I want to pipe the output
to a DB instead of HDFS.
How do I do this?
Thanks
I'm running CDH 4.1.2 and Hive version 0.9.0.
I have a Hive Oozie action which is trying to load a file xyz.json into a
Hive table defined with a SerDe.
The table already has a file named xyz.json.
The Oozie Hive action fails with the following error
Hive history
This shouldn't be asked on the dev lists, so putting mapreduce-dev and
hdfs-dev in the bcc. Have you made sure you're not using the local job
runner? Did you restart the resourcemanager after running the job?
-Sandy
On Thu, May 2, 2013 at 6:31 PM, sam liu samliuhad...@gmail.com wrote:
Can
I did not restart the ResourceManager after running the job; I just launched
a sample job directly using the command 'hadoop jar
share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.3-alpha.jar pi 2 30'
2013/5/3 Sandy Ryza sandy.r...@cloudera.com
This shouldn't be asked on the dev lists, so putting
In your yarn-site.xml, do you have mapreduce.framework.name set to YARN?
On Thu, May 2, 2013 at 6:43 PM, sam liu samliuhad...@gmail.com wrote:
I did not restart resourcemanager after running the job, and just launched
a sample job directly using command 'hadoop jar
That's the cause. After setting mapreduce.framework.name to YARN in
mapred-site.xml, the MR job can now be found in the 'FINISHED Applications'
tab. Thanks very much!
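For reference, the fix described above corresponds to a mapred-site.xml fragment like this (note the property value itself is lowercase "yarn"):

```xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```

Without it, jobs fall back to the local job runner and never reach the ResourceManager, which is why they don't appear in the applications tabs.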
2013/5/3 Sandy Ryza sandy.r...@cloudera.com
In your yarn-site.xml, do you have mapreduce.framework.name set to YARN?
On
Hi,
just use Sqoop to push the data from HDFS to a database via JDBC.
Intro to Sqoop:
http://blog.cloudera.com/blog/2009/06/introducing-sqoop/
Or even use Hive JDBC to connect to your result data from outside the
Hadoop cluster.
You can also create your own OutputFormat (with Java API), which
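A hedged sketch of the Sqoop route suggested above (the JDBC connection string, table name, directory, and user are placeholders, not from the original thread):

```shell
# Export job output from HDFS into a relational table via JDBC
sqoop export \
  --connect jdbc:mysql://dbhost/mydb \
  --table results \
  --export-dir /user/hadoop/job-output \
  --username dbuser -P
```

The export directory should contain the delimited output files your streaming job wrote.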
You can use the DBOutputFormat to directly write your job output to a
DB, see:
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/db/DBOutputFormat.html
I'd also recommend looking into Sqoop (http://sqoop.apache.org/) for
more capabilities.
On Thu, May 2, 2013 at 2:03 PM,