Hi all,
I use hadoop-0.21.0 distribution. I have a large number of small files (KB).
Is there any efficient way of handling it in hadoop?
I have heard that solutions to this problem include:
1. HAR (Hadoop archives)
2. Concatenating the files (cat)
I would like to know if there are any other approaches.
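The HAR option mentioned above can be sketched with the `hadoop archive` tool (the paths below are illustrative, not from the thread):

```shell
# Pack a directory of small files into a single Hadoop archive (HAR).
# /user/me/smallfiles and /user/me/archives are illustrative paths.
hadoop archive -archiveName files.har -p /user/me smallfiles /user/me/archives

# The archive's contents are then addressable through the har:// scheme,
# so MapReduce jobs can read it without opening thousands of tiny files:
hadoop fs -ls har:///user/me/archives/files.har
```

Note this reduces NameNode metadata pressure but does not merge the data; a SequenceFile of (filename, contents) pairs is the other commonly cited approach.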
Hi,
every time we start our Hadoop cluster (using Cloudera's distribution), this
message appears:
2011-09-13 04:35:05,207 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode extension entered.
The reported blocks 8995 has reached the threshold 0.9990 of total blocks 9005. Safe mode will be turned
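The numbers in that log line are just the NameNode's safe-mode arithmetic: safe mode lifts (after a short extension period) once the count of reported blocks reaches threshold * total, truncated to an integer. A quick check of the figures from the log (the truncation to an integer is an assumption about this version's behavior):

```shell
# int(9005 * 0.9990) = 8995, so 8995 reported blocks meets the threshold,
# which is why the NameNode logs "has reached the threshold" and enters
# the safe mode extension before leaving safe mode.
awk 'BEGIN {
  needed = int(9005 * 0.9990)
  print "needed:", needed
  if (8995 >= needed) print "reached"; else print "not reached"
}'
```

If a cluster instead stays stuck in safe mode because blocks are genuinely missing, `hadoop dfsadmin -safemode get` shows the current state and `hadoop dfsadmin -safemode leave` forces it off.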
Hi Naveen,

> I use hadoop-0.21.0 distribution. I have a large number of small files (KB).

Word of warning, 0.21 is not a stable release. The recommended version
is in the 0.20.x range.

> Is there any efficient way of handling it in hadoop?
> I have heard that solution for that problem is using:
Hi,
This probably belongs on mapreduce-user as opposed to common-user. I
have BCC'ed the common-user group.
Generally it's a best practice to ship the scripts with the job. Like so:
hadoop jar
/usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar
-input
I believe it defaults to submitting the job to the default queue if you don't
specify one. You don't have the default queue defined in your list of
mapred.queue.names, so add -Dmapred.job.queue.name=myqueue1 (or another
queue you have defined) to the wordcount command like:
bin/hadoop jar
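Putting both pieces of advice together, a full streaming invocation might look like the following (the script names and input/output paths are illustrative; `-file` is the streaming flag that ships local scripts with the job, and generic `-D` options must precede the streaming-specific ones):

```shell
# Ship mapper.py and reducer.py with the streaming job (-file), and
# route the job to a queue that is actually defined on the cluster.
# Script names and /user/me/... paths are illustrative.
bin/hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar \
    -Dmapred.job.queue.name=myqueue1 \
    -input /user/me/input \
    -output /user/me/output \
    -mapper mapper.py \
    -reducer reducer.py \
    -file mapper.py \
    -file reducer.py
```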
Hi,
I am using the latest Cloudera distribution, and with that I am able to use
the latest Hadoop API, which I believe is 0.21, for such things as
import org.apache.hadoop.mapreduce.Reducer;
So I am using mapreduce, not mapred, and everything works fine.
However, in a small streaming job,
I am sure that if you ask on the provider's specific list you'll get a better
answer than from the common Hadoop list ;)
Cos
On Wed, Sep 14, 2011 at 09:48PM, Mark Kerzner wrote:
> Hi,
> I am using the latest Cloudera distribution, and with that I am able to use
> the latest Hadoop API, which I believe is 0.21.
I am sorry, you are right.
Mark
On Wed, Sep 14, 2011 at 9:52 PM, Konstantin Boudnik c...@apache.org wrote:
> I am sure that if you ask on the provider's specific list you'll get a better
> answer than from the common Hadoop list ;)
> Cos
> On Wed, Sep 14, 2011 at 09:48PM, Mark Kerzner wrote:
> > Hi,
> > I am
Hey, thanks Joey for that information. I will work on what you said.
Regards
Naveen Mahale
On Wed, Sep 14, 2011 at 5:32 PM, Joey Echeverria j...@cloudera.com wrote:
> Hi Naveen,
>
> > I use hadoop-0.21.0 distribution. I have a large number of small files
> > (KB).
>
> Word of warning, 0.21 is not a stable release. The recommended version
> is in the 0.20.x range.
On 09/15/2011 08:18 AM, Mark Kerzner wrote:
> Hi,
> I am using the latest Cloudera distribution, and with that I am able to use
> the latest Hadoop API, which I believe is 0.21, for such things as
> import org.apache.hadoop.mapreduce.Reducer;
> So I am using mapreduce, not mapred, and everything works fine.
Thank you, Prashant, it seems so. I already verified this by refactoring the
code to use the 0.20 API as well as the 0.21 API in two different packages, and
streaming happily works with 0.20.
Mark
On Wed, Sep 14, 2011 at 11:46 PM, Prashant prashan...@imaginea.com wrote:
> On 09/15/2011 08:18 AM, Mark Kerzner wrote: