There are many instances of getFileBlockLocations in hadoop/fs. Can you explain
which one is the main one?
It must be combined with a method of logically splitting the input data along
block boundaries, and of launching tasks on worker nodes that are close to
the data splits.
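As a rough illustration of that combination, here is a minimal sketch (this is not FileInputFormat itself, and the file path is hypothetical) of how per-block locations from the public FileSystem API can be turned into splits tagged with the hosts holding each block:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class SplitLocalitySketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/mahmood/mytext.txt");  // hypothetical path
        FileStatus status = fs.getFileStatus(file);
        // One BlockLocation per block range, each carrying the hosts of its replicas.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            // A logical split aligned to the block boundary; the hosts hint tells the
            // scheduler where the replicas live, so tasks can be launched close to the data.
            FileSplit split = new FileSplit(file, block.getOffset(), block.getLength(),
                                            block.getHosts());
            System.out.println(split + "  hosts=" + java.util.Arrays.toString(block.getHosts()));
        }
    }
}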
Is this a user level
hadoop fsck mytext.txt -files -locations -blocks
I expect something like a tag attached to each block (say block X) that shows
the positions of the replicas of X. The method you mentioned is a user-level
task. Am I right?
Regards,
Mahmood
I recently upgraded from 1.0.4 to 1.1.2. Now, however, my HDFS won't start
up. There appears to be something wrong in the edits file.
Obviously I can roll back to a previous checkpoint; however, it appears
checkpointing has been failing for some time and my last checkpoint is
over a month old.
Looking in the source, it appears that in HDFS the Namenode supports
getting this info directly via the client, and ultimately communicates
block locations to the DFSClient, which is used by the
DistributedFileSystem.
/**
* @see ClientProtocol#getBlockLocations(String, long, long)
*/
Hi Agarwal,
I once had similar questions and have done some experiments. Here is my
experience:
1. For some applications over MR, like HBase and Hive, which do not need to
submit additional files to HDFS, file:/// could work well without any
problem (according to my tests).
2. For simple MR
Thanks a lot, John.
*Lokesh Chandra Basu*
B. Tech
Computer Science and Engineering
Indian Institute of Technology, Roorkee
India(GMT +5hr 30min)
On Mon, Jun 3, 2013 at 10:29 PM, John Lilley john.lil...@redpoint.net wrote:
I had asked a similar question recently:
First, follow
ClientProtocol namenode = DFSClient.createNamenode(conf);
HdfsFileStatus hfs = namenode.getFileInfo(your_hdfs_file_name);
LocatedBlocks lbs = namenode.getBlockLocations(your_hdfs_file_name, 0, hfs.getLen());
for (LocatedBlock lb : lbs.getLocatedBlocks()) {
    DatanodeInfo[] info = lb.getLocations();  // datanodes holding replicas of this block
}
hadoop fsck path -files -blocks -locations -racks
Replace path with the real path :)
From: 一凡 李 [mailto:zhuazhua_...@yahoo.com.cn]
Sent: Tuesday, June 04, 2013 12:49 PM
To: user@hadoop.apache.org
Subject: how to locate the replicas of a file in HDFS?
Hi,
Could you tell me how to
Try this command:
hadoop fsck <file path> -files -blocks
On Tue, Jun 4, 2013 at 3:41 PM, zangxiangyu zangxian...@qiyi.com wrote:
hadoop fsck path -files -blocks -locations -racks
Replace path with the real path :)
From: 一凡 李 [mailto:zhuazhua_...@yahoo.com.cn]
Hi again,
unfortunately my problem is not solved.
I downloaded Hadoop v. 1.1.2a and made a basic configuration as suggested in
[1].
No security, no ACLs, default scheduler ... The files are attached.
I still have the same error message. I also tried another Java version (6u45
instead of 7u21).
Hi Matteo,
Are you able to add more space to your test machines? Also, what does the pi
example say (hadoop jar hadoop-examples pi 10 10)?
- Alex
On Jun 4, 2013, at 4:34 PM, Lanati, Matteo matteo.lan...@lrz.de wrote:
Hi Alex,
you gave me the right perspective ... pi works ;-). It's finally satisfying
to see it working.
The job finished without problems.
I'll try some other test programs such as grep, to check that there are no
problems with input files.
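(For reference, the stock grep example in the examples jar takes an input directory, an output directory, and a regex; something along these lines, with the jar name and paths as placeholders:)

hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'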
Thanks,
Matteo
On Jun 4, 2013, at 5:43 PM,
Hi,
I have the following reducer class
public static class TokenCounterReducer
    extends Reducer<Text, Text, Text, Text> {
  public void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    // String[] fields = s.split("\t", -1);
When you use the HDFS client interface to read a file, it automatically figures
out which datanodes to contact for reading which blocks. There isn't really a
"main" block. However, I have read that the first location listed for each
block is the recommended one to read from for an outside client.
http://hadoop.apache.org/docs/current/hdfs_user_guide.html
--
Uri Laserson, PhD
Data Scientist, Cloudera
Twitter/GitHub: @laserson
+1 617 910 0447
laser...@cloudera.com
The part-m-0, part-m-1 file names are Hadoop naming conventions. To
use custom output file names, use the MultipleOutputs class.
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
With MultipleOutputs the file name may be customized as
If you need to save the JSON as it is, then you could implement OutputFormat
to create your own custom output format that will allow you to write the data
as you wish.
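A minimal sketch of the MultipleOutputs route, assuming the newer org.apache.hadoop.mapreduce API (the old mapred.lib class linked above has a slightly different interface); the reducer name, types, and the "myjson" base name are all hypothetical:

import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class JsonReducer extends Reducer<Text, Text, NullWritable, Text> {
    private MultipleOutputs<NullWritable, Text> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<NullWritable, Text>(context);
    }

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // Writes to files named myjson-r-00000, myjson-r-00001, ... instead of part-r-*.
            mos.write(NullWritable.get(), value, "myjson");
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();  // flush the extra outputs
    }
}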
Warm Regards,
Tariq
cloudfront.blogspot.com
On Tue, Jun 4, 2013 at 11:39 PM, Chengi Liu chengi.liu...@gmail.com wrote:
Hi,
I
Have you tried something like this (I do not have a PC here to check this
code)?
context.write(NullWritable.get(), new Text(jsn.toString()));
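Put into a complete reducer, that suggestion would look roughly like this (a sketch only; how the JSON string jsn is actually built is an assumption, shown here as a placeholder):

import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TokenCounterReducer extends Reducer<Text, Text, NullWritable, Text> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // Build the JSON string however the job requires (placeholder here), then
            // emit it as the value with a NullWritable key so only the JSON is written.
            String jsn = "{\"key\":\"" + key + "\",\"value\":\"" + value + "\"}";
            context.write(NullWritable.get(), new Text(jsn));
        }
    }
}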
On Jun 4, 2013 8:10 PM, Chengi Liu chengi.liu...@gmail.com wrote:
Yes...This should do the trick.
Warm Regards,
Tariq
cloudfront.blogspot.com
On Wed, Jun 5, 2013 at 1:38 AM, Niels Basjes ni...@basjes.nl wrote:
Answered my own question. The Eclipse installs that come with CentOS 6 (or via
yum) seem to have this problem. A direct download of Eclipse for Java EE works
fine.
John
From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Monday, June 03, 2013 5:49 PM
To: user@hadoop.apache.org; Deepak Vohra
Chengi,
You can also see this for pointers:
http://java.dzone.com/articles/hadoop-practice
Regards,
Shahab
On Tue, Jun 4, 2013 at 4:15 PM, Mohammad Tariq donta...@gmail.com wrote:
Going by what I have read, I think it's a general-purpose hook in the YARN
architecture for running arbitrary services inside the NodeManagers. Hadoop
uses it for the shuffle service. Other YARN-based applications might use it as well.
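(For reference, that hook is wired up per NodeManager in yarn-site.xml, roughly as below; note the service name differs across versions, e.g. mapreduce.shuffle in early Hadoop 2.x versus mapreduce_shuffle in later releases:)

<!-- yarn-site.xml: register the MapReduce shuffle as a NodeManager auxiliary service.
     Values shown are for early Hadoop 2.x; later releases use mapreduce_shuffle. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>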
Thanks,
Rahul
On Wed, Jun 5, 2013 at 4:00 AM, John Lilley john.lil...@redpoint.net wrote:
Hi,
This is a very basic, fundamental question.
Does the time on all nodes need to be synced?
I've never even thought about timekeeping in a Hadoop cluster, but recently
my servers went out of sync. I know HBase requires time to be synced because
of its timestamp handling. But I wonder whether any of