Re: R on hadoop

2014-03-27 Thread Sri
Try the open-source h2o.ai - a CRAN-style package that allows fast, scalable R on Hadoop in-memory. One can invoke single-threaded R from the h2o package, and the runtime on clusters is Java (not R!) - so you get better memory management. http://docs.0xdata.com/deployment/hadoop.html

Job froze for hours because of an unresponsive disk on one of the task trackers

2014-03-27 Thread Krishna Rao
Hi, we have a daily Hive script that usually takes a few hours to run. The other day I noticed one of the jobs was taking in excess of a few hours. Digging into it, I saw that there were 3 attempts to launch a job on a single node: Task Id Start Time Finish Time Error

Re: Job froze for hours because of an unresponsive disk on one of the task trackers

2014-03-27 Thread Dieter De Witte
The IDs of the tasks are different, so the node got killed after failing on 3 different(!) reduce tasks. Reduce task 48 will probably have been resubmitted to another node. 2014-03-27 10:22 GMT+01:00 Krishna Rao krishnanj...@gmail.com: Hi, we have a daily Hive script that usually takes a

Re: Job froze for hours because of an unresponsive disk on one of the task trackers

2014-03-27 Thread Krishna Rao
I noticed, but none of the jobs ended up being re-submitted! And all 3 of those jobs failed on the same node. All we know is that the disk on that node became unresponsive. On 27 March 2014 09:33, Dieter De Witte drdwi...@gmail.com wrote: The ids of the tasks are different so the node got

[hadoop-user] subject missing

2014-03-27 Thread Andrew Holway
Hi, All the other mailing lists that I am part of usually put [centos] or [foobar] in the subject field. Is there a way to set that up on this mailing list? Thanks, Andrew

ipc.Client: Retrying connect to server

2014-03-27 Thread Mahmood Naderan
Hi, I don't know what mistake I made, but now I get this error: INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS) INFO ipc.Client: Retrying connect

RE: ipc.Client: Retrying connect to server

2014-03-27 Thread John Lilley
Does netstat -an | grep LISTEN show these ports being listened on? Can you stat hdfs from the command line, e.g.: hdfs dfsadmin -report; hdfs fsck /; hdfs dfs -ls / Also, check out /var/log/hadoop or /var/log/hdfs for more details. john From: Mahmood Naderan [mailto:nt_mahm...@yahoo.com] Sent:

RE: Hadoop Takes 6GB Memory to run one mapper

2014-03-27 Thread John Lilley
Could you have a pmem-vs-vmem issue as in: http://stackoverflow.com/questions/8017500/specifying-memory-limits-with-hadoop john From: praveenesh kumar [mailto:praveen...@gmail.com] Sent: Tuesday, March 25, 2014 7:38 AM To: user@hadoop.apache.org Subject: Re: Hadoop Takes 6GB Memory to run one

RE: Hadoop Takes 6GB Memory to run one mapper

2014-03-27 Thread John Lilley
This discussion may also be relevant to your question: http://stackoverflow.com/questions/21005643/container-is-running-beyond-memory-limits Do you actually need to specify that -Xmx6000m for java heap or could it be one of the other issues discussed? John From: John Lilley
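
A minimal sketch (values are illustrative, assuming Hadoop 2.x on YARN and the new mapreduce API, not necessarily the original poster's setup) of the properties usually involved when a container is killed for exceeding physical or virtual memory limits; the JVM heap in the *.java.opts settings has to stay comfortably below the container size in *.memory.mb:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MemoryLimitsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Physical memory requested per map/reduce container (MB).
        conf.setInt("mapreduce.map.memory.mb", 2048);
        conf.setInt("mapreduce.reduce.memory.mb", 4096);

        // JVM heap must stay below the container size, otherwise YARN
        // kills the container for exceeding its pmem limit.
        conf.set("mapreduce.map.java.opts", "-Xmx1536m");
        conf.set("mapreduce.reduce.java.opts", "-Xmx3072m");

        // NodeManager-side setting, shown only for reference:
        // yarn.nodemanager.vmem-pmem-ratio (default 2.1) controls the
        // virtual-memory allowance relative to pmem.

        Job job = Job.getInstance(conf, "memory-limits-example");
        // ... set mapper, input/output, then job.waitForCompletion(true);
    }
}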

Re: ipc.Client: Retrying connect to server

2014-03-27 Thread Mahmood Naderan
One more thing. I ran hadoop namenode and it says that the namenode has not been formatted!! But I ran classification commands some days ago, and the size of the data dir is nearly 60 GB, containing my data. So why does it say that the namenode has not been formatted? Please see the output $ hadoop

Re: Hadoop 2.2.0 Distributed Cache

2014-03-27 Thread Jonathan Poon
Hi Stanley, Sorry about the confusion, but I'm trying to read a txt file into my Mapper function. I am trying to copy the file using the -files option when submitting the Hadoop job. I try to obtain the filename using the following lines of code in my Mapper: URI[] localPaths =

Maps stuck on Pending

2014-03-27 Thread Clay McDonald
Hi all, I have a job running with 1750 maps and 1 reduce and the status has been the same for the last two hours. Any thoughts? Thanks, Clay

Re: Maps stuck on Pending

2014-03-27 Thread Serge Blazhievsky
Next step would be to look in the logs under userlog directory for that job Sent from my iPhone On Mar 27, 2014, at 11:08 AM, Clay McDonald stuart.mcdon...@bateswhite.com wrote: Hi all, I have a job running with 1750 maps and 1 reduce and the status has been the same for the last two

Re: Hadoop 2.2.0 Distributed Cache

2014-03-27 Thread Jonathan Poon
Hi Serge, I'm using the -files option through the hadoop cli. The following lines of code work: Path[] localPaths = context.getLocalCacheFiles(); String configFilename = localPaths[0].toString(); However, context.getLocalCacheFiles() is deprecated. What is the correct equivalent function in
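
A minimal sketch, not necessarily what was recommended on the list, of the non-deprecated route in the Hadoop 2.x mapreduce API: register the file on the client side with Job.addCacheFile() (the generic -files option does the same when the job is launched via ToolRunner) and read it back in the task with context.getCacheFiles(); the handling of the symlinked file name is an assumption about how -files localizes resources:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheFileMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // URIs of everything registered with -files / Job.addCacheFile().
        URI[] cacheFiles = context.getCacheFiles();
        if (cacheFiles != null && cacheFiles.length > 0) {
            // Files shipped with -files are symlinked into the task's working
            // directory under their base name, so open them by that name.
            String name = new Path(cacheFiles[0].getPath()).getName();
            try (BufferedReader reader = new BufferedReader(new FileReader(name))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // ... load the side data into memory ...
                }
            }
        }
    }
}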

Why is HDFS_BYTES_WRITTEN much larger than HDFS_BYTES_READ in this case?

2014-03-27 Thread Kim Chew
I have a simple M/R job using a Mapper only, thus no reducer. The mapper reads a timestamp from the value, generates a path to the output file, and writes the key and value to the output file. The input file is a sequence file, not compressed, stored in HDFS, with a size of 162.68 MB. Output

Re: Why is HDFS_BYTES_WRITTEN much larger than HDFS_BYTES_READ in this case?

2014-03-27 Thread Thomas Bentsen
Have you checked the content of the files you write? /th On Thu, 2014-03-27 at 11:43 -0700, Kim Chew wrote: I have a simple M/R job using Mapper only thus no reducer. The mapper read a timestamp from the value, generate a path to the output file and writes the key and value to the output

Re: Why is HDFS_BYTES_WRITTEN much larger than HDFS_BYTES_READ in this case?

2014-03-27 Thread Kim Chew
Yea, gonna do that. 8-) Kim On Thu, Mar 27, 2014 at 12:30 PM, Thomas Bentsen t...@bentzn.com wrote: Have you checked the content of the files you write? /th On Thu, 2014-03-27 at 11:43 -0700, Kim Chew wrote: I have a simple M/R job using Mapper only thus no reducer. The mapper read a

RE: Maps stuck on Pending

2014-03-27 Thread Clay McDonald
Thanks Serge, looks like I need to add memory to my datanodes. Clay McDonald Cell: 202.560.4101 Direct: 202.747.5962 -Original Message- From: Serge Blazhievsky [mailto:hadoop...@gmail.com] Sent: Thursday, March 27, 2014 2:16 PM To: user@hadoop.apache.org Cc: user@hadoop.apache.org

reducing HDFS FS connection timeouts

2014-03-27 Thread John Lilley
It seems to take a very long time to time out a connection to an invalid NN URI. Our application is interactive, so the defaults of taking many minutes don't work well. I've tried setting: conf.set("ipc.client.connect.max.retries", "2"); conf.set("ipc.client.connect.timeout", "7000"); before calling
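
A minimal sketch (property values and the NameNode URI are illustrative) of tightening the IPC client retry settings before opening the FileSystem, so that a bad NN URI fails fast instead of retrying for many minutes:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class FastFailConnect {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Fewer connection attempts and a shorter per-attempt timeout (ms).
        conf.setInt("ipc.client.connect.max.retries", 2);
        conf.setInt("ipc.client.connect.timeout", 7000);
        // Retries used when the failure is specifically a connect timeout.
        conf.setInt("ipc.client.connect.max.retries.on.timeouts", 2);

        // "hdfs://bad-host:8020" stands in for the possibly-invalid NN URI.
        FileSystem fs = FileSystem.get(URI.create("hdfs://bad-host:8020"), conf);
        System.out.println(fs.getUri());
    }
}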

Re: Why is HDFS_BYTES_WRITTEN much larger than HDFS_BYTES_READ in this case?

2014-03-27 Thread Kim Chew
I am also wondering: if, say, I have two identical timestamps, they are going to be written to the same file. Does MultipleOutputs handle appending? Thanks. Kim On Thu, Mar 27, 2014 at 12:30 PM, Thomas Bentsen t...@bentzn.com wrote: Have you checked the content of the files you write? /th

Re: Hadoop 2.2.0 Distributed Cache

2014-03-27 Thread Azuryy
-files is used by Hive, not MR, so it cannot be recognized by your MR job. Sent from my iPhone5s On Mar 28, 2014, at 2:31, Jonathan Poon jkp...@ucdavis.edu wrote: Hi Serge, I'm using the -files option through the hadoop cli. The following lines of code works Path[] localPaths =

Re: Why is HDFS_BYTES_WRITTEN much larger than HDFS_BYTES_READ in this case?

2014-03-27 Thread Kim Chew
Thanks folks. I was not aware my input data file had been compressed. FileOutputFormat.setCompressOutput() is set to true when the file is written. 8-( Kim On Thu, Mar 27, 2014 at 5:46 PM, Mostafa Ead mostafa.g@gmail.com wrote: The following might answer you partially: Input key is not
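
A minimal sketch (codec choice is illustrative, and whether it fits the original job is an assumption) of enabling compressed sequence-file output with the standard Hadoop 2.x mapreduce API, which keeps HDFS_BYTES_WRITTEN comparable to a compressed input:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class CompressedOutputJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "compressed-output");
        job.setOutputFormatClass(SequenceFileOutputFormat.class);

        // Compress the output and pick a codec; block compression is the
        // usual choice for sequence files.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, DefaultCodec.class);
        SequenceFileOutputFormat.setOutputCompressionType(job,
                SequenceFile.CompressionType.BLOCK);
        // ... mapper, input/output paths, then job.waitForCompletion(true);
    }
}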

block location

2014-03-27 Thread Libo Yu
Hi all, The hadoop blocks of a region may be moved to different region servers (e.g. by hadoop rebalancer). In such a scenario, is there a way to fix that and move all the blocks to the region server that hosts the region? Should hadoop rebalancer never be used when hbase is in use? Thanks.

Re: block location

2014-03-27 Thread Bryan Beaudreault
Correct, do not use the hadoop balancer on a cluster serving HBase data. Let HBase handle it at the region level. To recover locality, you will need to run major compactions of the affected regions or tables. In hbase shell: major_compact 'tablename' On Thu, Mar 27, 2014 at 9:33 PM, Libo Yu

RE: block location

2014-03-27 Thread Libo Yu
Thanks for the answer. If I run a major compaction on a region that has only a single HFile whose blocks may reside on different data nodes, will the major compaction still occur in this case and recover the data locality? Thanks. From: bbeaudrea...@hubspot.com Date: Thu, 27 Mar 2014 21:44:49 -0400

How to get locations of blocks programmatically?‏

2014-03-27 Thread Libo Yu
Hi all, hadoop fsck path -files -blocks -locations can list the locations of all blocks under the path. Is it possible to list all blocks and the block locations for a given path programmatically? Thanks, Libo

Re: How to get locations of blocks programmatically?

2014-03-27 Thread Wangda Tan
Hi Libo, DFSClient.getBlockLocations, is this what you want? Regards, Wangda Tan On Fri, Mar 28, 2014 at 10:03 AM, Libo Yu yu_l...@hotmail.com wrote: Hi all, hadoop path fsck -files -block -locations can list locations for all blocks in the path. Is it possible to list all blocks and the
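
A minimal sketch (the HDFS path is illustrative) that lists block locations for every file under a path using the public FileSystem API rather than going through DFSClient directly:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path dir = new Path("/user/libo/data");   // illustrative path

        for (FileStatus status : fs.listStatus(dir)) {
            if (status.isFile()) {
                // One BlockLocation per block in the file's byte range.
                BlockLocation[] blocks =
                        fs.getFileBlockLocations(status, 0, status.getLen());
                for (BlockLocation block : blocks) {
                    System.out.printf("%s offset=%d hosts=%s%n",
                            status.getPath(), block.getOffset(),
                            Arrays.toString(block.getHosts()));
                }
            }
        }
    }
}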

RE: How to get locations of blocks programmatically?

2014-03-27 Thread Libo Yu
Thanks, it is useful. If I know the block id, how do I find out its location? From: wheele...@gmail.com Date: Fri, 28 Mar 2014 10:12:10 +0800 Subject: Re: How to get locations of blocks programmatically? To: user@hadoop.apache.org Hi Libo, DFSClient.getBlockLocations, is this what you want?

mapred job -list error

2014-03-27 Thread haihong lu
Dear all: I had a problem today. When I executed the command mapred job -list on a slave, an error came out. The message is shown below: 14/03/28 11:18:47 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id 14/03/28 11:18:47 INFO

HADOOP_MAPRED_HOME not found!

2014-03-27 Thread Avinash Kujur
Hi, when I try to execute this command: hadoop job -history ~/1 it gives an error like: DEPRECATED: Use of this script to execute mapred command is deprecated. Instead use the mapred command for it. HADOOP_MAPRED_HOME not found! Where can I get HADOOP_MAPRED_HOME from? thanks.

Re: HADOOP_MAPRED_HOME not found!

2014-03-27 Thread divye sheth
Which version of Hadoop are you using? AFAIK the Hadoop mapred home is the directory where Hadoop is installed, or in other words untarred. Thanks Divye Sheth On Mar 28, 2014 10:43 AM, Avinash Kujur avin...@gmail.com wrote: hi, when i am trying to execute this command: hadoop job -history ~/1

Re: HADOOP_MAPRED_HOME not found!

2014-03-27 Thread Avinash Kujur
Yes, it is there. Then why is executing that command throwing such an error? Do I need to change anything in this hadoop file? #!/usr/bin/env bash # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this