Can we execute the above command anywhere, or do I need to execute it in any
particular directory?
Thanks
On Thu, Mar 27, 2014 at 11:41 PM, divye sheth wrote:
> I believe you are using Hadoop 2. In order to get the mapred working you
> need to set the HADOOP_MAPRED_HOME path in either your /etc
You can execute this command on any machine where you have set the
HADOOP_MAPRED_HOME
Thanks
Divye Sheth
On Fri, Mar 28, 2014 at 12:31 PM, Avinash Kujur wrote:
> we can execute the above command anywhere or do i need to execute it in
> any particular directory?
>
> thanks
>
>
> On Thu, Mar 27,
I am not getting where to set HADOOP_MAPRED_HOME and how to set it.
Thanks
On Fri, Mar 28, 2014 at 12:06 AM, divye sheth wrote:
> You can execute this command on any machine where you have set the
> HADOOP_MAPRED_HOME
>
> Thanks
> Divye Sheth
>
>
> On Fri, Mar 28, 2014 at 12:31 PM, Avinash Kujur
Yes, use
http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.Path,
long, long)
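For example, a minimal sketch against that API (the file path below is just a
placeholder, and imports from org.apache.hadoop.conf and org.apache.hadoop.fs
are assumed):
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path p = new Path("/data/file.txt");                    // hypothetical file
    FileStatus st = fs.getFileStatus(p);
    // one BlockLocation per block in the requested byte range
    BlockLocation[] blocks = fs.getFileBlockLocations(p, 0, st.getLen());
    for (BlockLocation b : blocks) {
        System.out.println(b.getOffset() + "+" + b.getLength()
            + " on " + java.util.Arrays.toString(b.getHosts()));
    }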
On Fri, Mar 28, 2014 at 7:33 AM, Libo Yu wrote:
> Hi all,
>
> "hadoop path fsck -files -block -locations" can list locations for all
> blocks in th
Please also indicate your exact Hadoop version in use.
On Fri, Mar 28, 2014 at 9:04 AM, haihong lu wrote:
> dear all:
>
> I had a problem today, when i executed the command "mapred job
> -list" on a slave, an error came out. show the message as below:
>
> 14/03/28 11:18:47 INFO Config
Hi Avinash,
You can execute the export command on any one machine in the cluster for now.
Once you have executed it, i.e. export
HADOOP_MAPRED_HOME=/path/to/your/hadoop/installation, you can then run the
mapred job -list command from that very same machine.
Thanks
Divye Sheth
There's a big chance that your map output is being copied to your reducers;
this could take quite some time if you have a lot of data. It could be
resolved by:
1) having more reducers
2) adjusting the slowstart parameter so that the copying can start while the
map tasks are still running (see the sketch below)
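For example, a rough sketch with the new-API Job (the property name is the
Hadoop 2 / MRv2 one; the job name and values are just placeholders, and the
usual org.apache.hadoop.conf / org.apache.hadoop.mapreduce imports are assumed):
    Configuration conf = new Configuration();
    // let reducers start fetching map output once 50% of the maps are done
    conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.5f);
    Job job = Job.getInstance(conf, "my-job");   // hypothetical job name
    job.setNumReduceTasks(10);                   // 1) more reducers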
Regards,
I have a program that does a map-reduce job and then reads the result
of the job.
I learned that HDFS is not strongly consistent. When is it safe to read the result?
As long as output/_SUCCESS exists?
_SUCCESS implies that the job has successfully terminated, so this seems like
a reasonable criterion.
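For example, a minimal sketch of checking for the marker file through the
FileSystem API (the output path is hypothetical; imports from
org.apache.hadoop.conf and org.apache.hadoop.fs are assumed):
    FileSystem fs = FileSystem.get(new Configuration());
    Path success = new Path("/user/me/output/_SUCCESS");  // hypothetical output dir
    if (fs.exists(success)) {
        // the job wrote its marker file, safe to start reading the output
    }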
Regards, Dieter
2014-03-28 9:33 GMT+01:00 Li Li :
> I have a program that do some map-reduce job and then read the result
> of the job.
> I learned that hdfs is not strong consistent. when it's s
Thanks. Is the following code safe?
int exitCode = ToolRunner.run(new Configuration(), new MyApp(), args);
if (exitCode == 0) {
    // safe to read result
}
On Fri, Mar 28, 2014 at 4:36 PM, Dieter De Witte wrote:
> _SUCCES implies that the job has succesfully terminated, so this seems like
> a reasonable criterion.
>
> Regards, Dieter
>
>
>
How do I run the data node block scanner on a data node in a cluster from a remote
machine?
By default the data node runs the block scanner every 504 hours. This is the
default value of dfs.datanode.scan.period.hours. If I want to run the data
node block scanner on demand, then one way is to configure the property of
dfs.
To ensure data I/O integrity, Hadoop uses a CRC-32 mechanism to generate
checksums for the data stored on HDFS. But suppose I have a data node machine
that does not have ECC (error correcting code) memory; will HDFS
be able to generate checksums for data blocks when read/wri
Hello Reena,
No, there isn't a programmatic way to invoke the block scanner. Note
though that the property to control its period is DN-local, so you can
change it on the DNs and do a rolling DN restart to make it take effect
without requiring HDFS downtime.
On Fri, Mar 28, 2014 at 3:07 PM, reena upa
While the HDFS functionality of computing, storing and validating
checksums for block files does not specifically _require_ ECC, you do
_want_ ECC to avoid frequent checksum failures.
This is noted in Tom's book as well, in the chapter that discusses
setting up your own cluster:
"ECC memory is str
I was going through this link
http://stackoverflow.com/questions/9406477/data-integrity-in-hdfs-which-data-nodes-verifies-the-checksum
. It's written that in recent versions of Hadoop only the last data node
verifies the checksum, as the write happens in a pipeline fashion.
Now I have a question:
Hi,
How can I be the assignee for a particular issue?
I can't see any option for becoming the assignee on the page.
Thanks.
no doubt
Sent from my iPhone 6
> On Mar 23, 2014, at 17:37, Fengyun RAO wrote:
>
> What does this exception mean? I googled a lot, all the results tell me it's
> because the time is not synchronized between datanode and namenode.
> However, I checked all the servers, that the ntpd service is o
Hey,
I did look at HDFS for master x slave filesystem replication.
Is there any way to do master x master?
I just have 1 TB of files on a server and I want to replicate them to another
server, with real-time sync.
Thanks!
Hi All,
I have created a wiki on github:
https://github.com/ercoppa/HadoopDiagrams/wiki
This is an effort to provide updated documentation of how the internals
of Hadoop work. The main idea is to help the user understand the "big
picture" without removing too many internal details. You can f
If you're spitballing options, you might also look at Pattern:
http://www.cascading.org/projects/pattern/
It has some nuances, so be sure to spend the time to vet your specific use case
(i.e. what you're actually doing in R and what you want to accomplish by
leveraging data in Hadoop).
From: Sri [mailto:ha
Do you mean replication between two different Hadoop clusters, or do you just need
data to be replicated between two different nodes?
Sent from my iPhone
> On Mar 28, 2014, at 8:10 AM, Victor Belizário
> wrote:
>
> Hey,
>
> I did look in HDFS for replication in filesystem master x slave.
>
> Have
What is your compression format: gzip, lzo or snappy?
For lzo final output:
FileOutputFormat.setCompressOutput(conf, true);
FileOutputFormat.setOutputCompressorClass(conf, LzoCodec.class);
In addition, to make LZO splittable, you need to create an LZO index file.
On Thu, Mar 27, 2014 at 8:57 PM, Kim
Have you looked into the FileSystem API? This is Hadoop v2.2.0:
http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/fs/FileSystem.html
org.apache.hadoop.fs.RemoteIterator does not exist in
http://hadoop.apache.org/docs/r1.2.0/api/org/apache/hadoop/fs/FileSystem.html
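For example, a minimal sketch of listing files recursively with the 2.2.0 API
(the path is just a placeholder; imports from org.apache.hadoop.conf and
org.apache.hadoop.fs are assumed):
    FileSystem fs = FileSystem.get(new Configuration());
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(new Path("/data"), true); // true = recursive
    while (it.hasNext()) {
        LocatedFileStatus status = it.next();
        System.out.println(status.getPath() + " (" + status.getLen() + " bytes)");
    }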
How about adding
ipc.client.connect.max.retries.on.timeouts = 2 (default is 45)? It indicates
the number of retries a client will make on socket timeout to establish a
server connection.
Does that help?
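For example, a rough sketch of lowering it on the client-side Configuration
before creating the FileSystem (the value 2 is just for illustration; imports
from org.apache.hadoop.conf and org.apache.hadoop.fs are assumed):
    Configuration conf = new Configuration();
    conf.setInt("ipc.client.connect.max.retries.on.timeouts", 2); // default is 45
    FileSystem fs = FileSystem.get(conf); // client RPC connections now give up much sooner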
On Thu, Mar 27, 2014 at 4:23 PM, John Lilley wrote:
> It seems to take a very long time to timeo
Hi Avin,
You need to be added as a sub-project contributor; then you can be an
assignee. You can find how to become a contributor on the wiki.
On Fri, Mar 28, 2014 at 6:50 PM, Avinash Kujur wrote:
> hi,
>
> how can i be assignee fro a particular issue?
> i can't see any option for being assi
Very helpful indeed Emilio, thanks!
On Fri, Mar 28, 2014 at 12:58 PM, Emilio Coppa wrote:
> Hi All,
>
> I have created a wiki on github:
>
> https://github.com/ercoppa/HadoopDiagrams/wiki
>
> This is an effort to provide an updated documentation of how the internals
> of Hadoop work. The main
If the job completes without any failures, exitCode should be 0 and it is safe
to read the result.
public class MyApp extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        // Configuration processed by ToolRunner
        Configuration conf = getConf();
        // ... set up and submit the job here, return non-zero on failure ...
        return 0;
    }
}
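For reference, the usual entry point that goes with this (the standard
ToolRunner pattern) looks like:
    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new Configuration(), new MyApp(), args);
        System.exit(exitCode);
    }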
Hi Victor,
if by replication you mean copying from one cluster to the other, you can use the
distcp command.
Cheers.
On 28 Mar 2014, at 16:30, Serge Blazhievsky wrote:
> You mean replication between two different hadoop cluster or you just need
> data to be replicated between two different nodes?
None of that.
I checked the input file's SequenceFile header and it says
"org.apache.hadoop.io.compress.zlib.BuiltInZlibDeflater"
Kim
On Fri, Mar 28, 2014 at 10:34 AM, Hardik Pandya wrote:
> what is your compression format gzip, lzo or snappy
>
> for lzo final output
>
> FileOutputFormat.s
Hi Reena,
the pipeline is per block. If you have half of your file in data node A only,
that means the pipeline had only one node (node A, in this case, probably
because replication factor is set to 1) and then, data node A has the checksums
for its block. The same applies to data node B.
Al
Hello experts,
I am really new to Hadoop. Is it possible to find out, for a Pig or Hive
query, the under-the-hood map-reduce algorithm?
Thanks
You can use the ILLUSTRATE and EXPLAIN commands to see the execution plan, if
that is what you mean by 'under the hood algorithm':
http://pig.apache.org/docs/r0.11.1/test.html
Regards,
Shahab
On Fri, Mar 28, 2014 at 5:51 PM, Spark Storm wrote:
> hello experts,
>
> am really new to hadoop - Is it possible
Hi Everybody,
I am trying to get my first Hadoop cluster started using Amazon EC2. I
tried quite a few times and searched the web for solutions, yet I still
cannot get it up. I hope somebody can help out here.
Here is what I did based on the Apache Whirr Quick Guide (
http://whirr.apache.
Hi Max,
Not sure if you have already, but you might also want to look into
Apache Ambari [1] for provisioning, managing, and monitoring Hadoop
clusters.
Many have successfully deployed Hadoop clusters on EC2 using Ambari.
[1] http://ambari.apache.org/
Yusaku
On Fri, Mar 28, 2014 at 7:07 PM, Max