Maybe just a silly guess, did you close your Writer?
Yong
Date: Thu, 14 Nov 2013 12:47:13 +0530
Subject: Re: Folder not created using Hadoop Mapreduce code
From: unmeshab...@gmail.com
To: user@hadoop.apache.org
@rab ra: yes, using FileSystem's mkdir() we can create folders, and we can also
create i
.
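A minimal sketch tying these two threads together (the paths below are placeholders, not from the list): create the folder with the FileSystem API, and make sure every writer is closed, since an unclosed writer is a common reason output never appears:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path dir = new Path("/tmp/mr-output-check");        // placeholder folder
        fs.mkdirs(dir);                                      // creates the folder if it is missing
        FSDataOutputStream out = fs.create(new Path(dir, "sample.txt"));
        try {
            out.writeBytes("hello\n");
        } finally {
            out.close();   // without close(), the data may never become visible in HDFS
        }
    }
}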
Date: Tue, 29 Oct 2013 08:57:32 +0100
Subject: Re: Why the reducer's input group count is higher than my
GroupComparator implementation
From: drdwi...@gmail.com
To: user@hadoop.apache.org
Did you overwrite the partitioner as well?
2013/10/29 java8964 java8964
Hi, I have a stran
than 11.
Date: Tue, 29 Oct 2013 08:57:32 +0100
Subject: Re: Why the reducer's input group count is higher than my
GroupComparator implementation
From: drdwi...@gmail.com
To: user@hadoop.apache.org
Did you overwrite the partitioner as well?
2013/10/29 java8964 java8964
Hi, I have
Hi, I have a strange question related to my secondary sort implementation in
the MR job. Currently I need to support a 2nd sort in one of my MR jobs. I
implemented my custom WritableComparable like the following:
public class MyPartitionKey implements WritableComparable {
    String type;
    long id1;
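For the group-count question above, the usual culprit is that the partitioner still hashes the whole composite key. A hedged sketch (assuming type and id1 are the grouping fields, the remaining fields exist only for sorting, and Text is the map output value type) of a partitioner and grouping comparator that agree with each other:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Partitioner;

// Partition only on the grouping fields, never on the sort-only fields.
public class NaturalKeyPartitioner extends Partitioner<MyPartitionKey, Text> {
    @Override
    public int getPartition(MyPartitionKey key, Text value, int numPartitions) {
        int hash = key.type.hashCode() * 31 + (int) (key.id1 ^ (key.id1 >>> 32));
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }
}

// Group on exactly the same fields the partitioner uses.
public class NaturalKeyGroupingComparator extends WritableComparator {
    public NaturalKeyGroupingComparator() {
        super(MyPartitionKey.class, true);
    }
    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        MyPartitionKey k1 = (MyPartitionKey) a;
        MyPartitionKey k2 = (MyPartitionKey) b;
        int cmp = k1.type.compareTo(k2.type);
        return cmp != 0 ? cmp : Long.compare(k1.id1, k2.id1);
    }
}

// In the driver:
//   job.setPartitionerClass(NaturalKeyPartitioner.class);
//   job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class);

If the partitioner looks at more fields than the grouping comparator, rows that belong to the same group can land on different reducers, which is exactly what makes the reducer-side group count come out higher than expected.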
has url
"hdfs://machine.domain:8080" and data folder "/tmp/myfolder", what should I
specify as the output path for MR job?
Thanks
On Thursday, October 24, 2013 5:31 PM, java8964 java8964
wrote:
Just specify the output location using the URI to another cluster. As long
as the
Just specify the output location using the URI to another cluster. As long as
the network is accessible, you should be fine.
Yong
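A hedged sketch of the answer to the follow-up above, using the URI and folder the poster mentioned (note that an MR output directory must not already exist when the job starts):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Fully qualified URI: the other cluster's namenode plus the target folder.
// The directory must not exist yet when the job is submitted.
FileOutputFormat.setOutputPath(job, new Path("hdfs://machine.domain:8080/tmp/myfolder"));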
Date: Thu, 24 Oct 2013 15:28:27 -0700
From: myx...@yahoo.com
Subject: Mapreduce outputs to a different cluster?
To: user@hadoop.apache.org
The scenario is: I run mapr
snappy on hadoop 1.1.1
What's the output of ldd on that lib? Does it link properly? You should compile
the natives for your platform, as the packaged ones may not link properly.
On Sat, Oct 5, 2013 at 2:37 AM, java8964 java8964
wrote:
I kind of read the hadoop 1.1.1 source code for this,
I kind of read the Hadoop 1.1.1 source code for this; it is very strange to me
now.
From the error, it looks like the runtime JVM cannot find the native method
org/apache/hadoop/io/compress/snappy/SnappyCompressor.compressBytesDirect()I.
That is my guess from the error message, but from the log,
Hi,
I am using Hadoop 1.1.1. I want to test Snappy compression with
Hadoop, but I have some problems making it work in my Linux environment.
I am using opensuse 12.3 x86_64.
First, when I tried to enable snappy in hadoop 1.1.1 by:
conf.setBoolean("mapred.compress.map.outp
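For reference, a hedged sketch of the stock Hadoop 1.x properties involved (the native libhadoop/libsnappy still have to be loadable for the codec to work):

conf.setBoolean("mapred.compress.map.output", true);
conf.set("mapred.map.output.compression.codec",
         "org.apache.hadoop.io.compress.SnappyCodec");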
Hi, I have a question related to how the mappers are generated for the input files
from HDFS. I understand the split and block concepts in HDFS, but my
original understanding is that one mapper will only process data from one
file in HDFS, no matter how small this file is. Is that correct?
T
I am also thinking about this for my current project, so here I share some of
my thoughts, but maybe some of them are not correct.
1) In my previous projects years ago, we stored a lot of data as plain text, as
at that time, people thought Big Data could store all the data, no need to
worry abou
I don't know exactly what you are trying to do, but it seems like memory is
your bottleneck, and you think you have enough CPU resources, so you want to
use multi-threading to utilize the CPU?
You can start multiple threads in your mapper if you think your mapper logic
is very CPU intensive.
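One hedged option, if the map logic is thread-safe, is Hadoop's built-in MultithreadedMapper wrapper, which runs several instances of your mapper inside one task JVM (MyCpuHeavyMapper below is a placeholder for your own class):

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

// In the driver: the framework mapper is MultithreadedMapper, your logic is the delegate.
// The delegate mapper must be thread-safe, since its map() runs concurrently.
job.setMapperClass(MultithreadedMapper.class);
MultithreadedMapper.setMapperClass(job, MyCpuHeavyMapper.class);
MultithreadedMapper.setNumberOfThreads(job, 8);   // tune to the number of spare cores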
Just curious, any reason you don't want to use the DFSDataInputStream?
Yong
Date: Thu, 26 Sep 2013 16:46:00 +0200
Subject: Extending DFSInputStream class
From: tmp5...@gmail.com
To: user@hadoop.apache.org
Hi
I would like to wrap DFSInputStream by extension. However it seems that the
DFSInputStr
Hi, I have a question related to sequence files. I wonder why I should use them,
and under what kind of circumstances?
Let's say I have a csv file; I can store that directly in HDFS. But if I do
know that the first 2 fields are some kind of key, and most of the MR jobs will
query on that key, will it make
Or you can do the calculation in the reducer's close() method, even though I am not
sure you can get the mapper's count in the reducer.
But even if you can't, here is what you can do:
1) Save the JobConf reference in your Mapper's configure() method.
2) Store the MAP_INPUT_RECORDS counter in the configuration object as
Hi, I currently have a project to process data using MR. I have some
thoughts about it, and am looking for advice if anyone has any feedback.
Currently in this project, I have a lot of event data related to email tracking
coming into HDFS. So the events are the data for email trackin
Did you do a hadoop version upgrade before this error happened?
Yong
Date: Wed, 11 Sep 2013 16:57:54 +0800
From: heya...@jiandan100.cn
To: user@hadoop.apache.org
CC: user-unsubscr...@hadoop.apache.org
Subject: help!!!,what is happened with my project?
Hi:
Today when I
The error doesn't mean the file doesn't exist in HDFS; it refers to the local
disk. If you read the error stack trace:
at
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:581)
it indicates the error happened on the local file system.
If you try to copy data from an existing
Well, the reducers normally take much longer than the mapper stage,
because the copy/shuffle/sort all happen at this time, and they are the hard
part.
But before we simply say it is a part of life, you need to dig more into your
MR jobs to find out if you can make them faster.
You are the
The method getPartition() needs to return a non-negative number. Simply using the
hashCode() method is not enough.
See the Hadoop HashPartitioner implementation:
return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
When I first read this code, I always wondered why not use Math.abs? Is ( & I
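The reason, as far as I understand it: Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE (two's complement has no positive counterpart), so a key whose hashCode() is exactly Integer.MIN_VALUE would still yield a negative partition, while the bit mask simply clears the sign bit. A tiny sketch:

// Math.abs does not help for the one pathological hash code:
System.out.println(Math.abs(Integer.MIN_VALUE));            // prints -2147483648, still negative
System.out.println(Integer.MIN_VALUE & Integer.MAX_VALUE);  // prints 0, safely non-negative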
What's wrong with using an old Unix pipe?
hadoop fs -cat /user/input/foo.txt | head -100 > local_file
Date: Thu, 29 Aug 2013 13:50:37 -0700
Subject: Re: copy files from hdfs to local fs
From: chengi.liu...@gmail.com
To: user@hadoop.apache.org
tail will work as well.. ??? but i want to extract just (sa
I am not sure the original suggestion will work for your case.
My understanding is that you want to use some API that only exists in slf4j version
1.6.4, but this library already exists with a different version in your Hadoop
environment, which is quite possible.
To change the Maven build of the appli
As Harsh said, sometimes you want to do a 2nd sort, but MR can only sort
by key, not by value.
A lot of the time, you want the reducer output sorted by a field, but only within
a group, kind of like a 'windowing sort' in relational DB SQL. For
example, if you have data about
lave nodes, it works fine. I
am not able to figure out how to fix this and the reason for the error. I do
not understand why it complains that the input directory is not present. As
far as I know, slave nodes get a map, and the map method contains the contents of the
input file. This should be fine f
If you don't plan to use HDFS, what kind of shared file system are you going
to use between the cluster nodes? NFS? For what you want to do, even though it doesn't
make too much sense, you first need to solve the shared file system
problem.
Second, if you want to process the files file by file, inste
Hi,
This is a 4 node hadoop cluster running on CentOS 6.3 with Oracle JDK (64bit)
1.6.0_43. Each node has 32G memory, with max 8 mapper tasks and 4 reducer tasks
being set. The hadoop version is 1.0.4.
This is set up on Datastax DSE 3.0.2, which is using Cassandra CFS as the underlying
DFS, instead o
I am also interested in your research. Can you share some insight on the
following questions?
1) When you use a CompressionCodec, can the encrypted file be split? From my
understanding, there is no encryption method that allows the file to be decrypted
individually by block, right? For example, if I have a 1G file
Can someone share some idea of what the Hadoop source code of class
org.apache.hadoop.io.compress.BlockDecompressorStream, method rawReadInt(), is
trying to do here?
There is a comment in the code that this method shouldn't return a negative
number, but my test file contains the following b
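For what it's worth, a helper named rawReadInt in a decompressor stream usually just reassembles a big-endian 4-byte length header from the underlying stream; a sketch of the typical shape (not copied from the Hadoop source) is:

import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Read four bytes and build a big-endian int; a negative result would mean the
// high bit of the stored length was set, i.e. the stream is likely corrupt.
static int rawReadInt(InputStream in) throws IOException {
    int b1 = in.read();
    int b2 = in.read();
    int b3 = in.read();
    int b4 = in.read();
    if ((b1 | b2 | b3 | b4) < 0) {
        throw new EOFException();
    }
    return (b1 << 24) + (b2 << 16) + (b3 << 8) + b4;
}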
Hi, Davie:
I am not sure I understand this suggestion. Why would a smaller block size help
with this performance issue?
From what the original question describes, it looks like the performance problem
is due to there being a lot of small files, and each file will run in its
own mapper.
As Hadoop nee
I don't think you can get a list of all input files in the mapper, but what you
can get is the current file's information.
From the context object, you can get the InputSplit, which should
give you all the information you want about the current input file.
http://hadoop.apache.org/docs/r2.0
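A small sketch of that (new MapReduce API; the class name and key/value types below are placeholders, and it only works when the input format produces FileSplits, e.g. TextInputFormat):

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class FileAwareMapper extends Mapper<LongWritable, Text, Text, Text> {
    private Path currentFile;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        FileSplit split = (FileSplit) context.getInputSplit();
        currentFile = split.getPath();        // the file this mapper is reading
        long start = split.getStart();        // byte offset of this split in the file
        long length = split.getLength();      // split length in bytes
    }
}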
Hi, Chris:
Here is my understanding of the file split and data block.
HDFS will store your file in multiple data blocks; each block will be 64M or
128M depending on your setting. Of course, the file could contain multiple records.
So the boundary of a record won't match the block boundary (i
e can convert
any existing Writable into an encrypted form.
Dave
From: java8964 java8964 [mailto:java8...@hotmail.com]
Sent: Sunday, February 10, 2013 3:50 AM
To: user@hadoop.apache.org
Subject: Question related to Decompressor interface
Hi, currently I am researching options for encry
Our cluster on cdh3u4 has the same problem. I think it is caused by some bugs
in the JobTracker. I believe Cloudera knows about this issue.
After upgrading to cdh3u5, we haven't faced this issue yet, but I am not sure if
it is confirmed to be fixed in CDH3u5.
Yong
> Date: Mon, 4 Feb 2013 15:21:18 -08
What range did you give for mapred.task.profile.maps? And are you sure your mapper
will invoke the methods you expect in the traces?
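For reference, a hedged sketch of the 0.20.x-era settings (property names as I recall them; the %s in the params is replaced by the per-task profile output file):

conf.setBoolean("mapred.task.profile", true);
conf.set("mapred.task.profile.maps", "0-2");     // profile map task IDs 0, 1 and 2
conf.set("mapred.task.profile.params",
    "-agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s");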
Yong
Date: Wed, 6 Feb 2013 23:50:08 +0200
Subject: Profiling the Mapper using hprof on Hadoop 0.20.205
From: yaron.go...@gmail.com
To: user@hadoop.apache.org
Hi, I wish
Ted's comments on performance are spot on.
Regards
Bertrand
On Thu, Oct 4, 2012 at 9:02 PM, java8964 java8964 wrote:
I did the cumulative sum in the HIVE UDF, as one o
I did the cumulative sum in a Hive UDF, as one of the projects for my employer.
1) You need to decide the grouping elements for your cumulative sum. For example,
an account, a department, etc. In the mapper, combine this information as your
emit key.
2) If you don't have any grouping requirement, yo
Hi,
During my development of ETLs on the Hadoop platform, there is one question I want
to ask: why doesn't Hadoop provide a round-robin partitioner?
From my experience, it is a very powerful option for cases with a small, limited
set of distinct key values, and it balances the ETL resources. Here is what I want to say:
1
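To make the idea concrete, a minimal sketch (not an existing Hadoop class, just an illustration) of what such a round-robin partitioner could look like; it deliberately ignores the key, so it only fits jobs that do not rely on equal keys meeting in the same reducer:

import org.apache.hadoop.mapreduce.Partitioner;

public class RoundRobinPartitioner<K, V> extends Partitioner<K, V> {
    private int next = 0;

    @Override
    public int getPartition(K key, V value, int numPartitions) {
        // Spread records evenly across reducers, one after another,
        // regardless of the key's hash code.
        int partition = next;
        next = (next + 1) % numPartitions;
        return partition;
    }
}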