You should still call setNumReduceTasks() on your job; there is just no such
max reducer count in YARN any more.
Setting the reducer count is more of an art than a science.
I think there is only one rule about it: don't set the reducer count larger
than the reducer input group count.
Set the reducer nu
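For example (a minimal sketch with the new mapreduce API; the count 10 is just a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReducerCountExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "example");
        // Keep this at or below the number of distinct reducer input groups.
        job.setNumReduceTasks(10);
    }
}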
Don't be confused by 6.03 MB/s.
The relationship between mappers and reducers is an M-to-N relationship: a
mapper can send its data to all reducers, and one reducer can receive
its input from all mappers.
There could be a lot of reasons why you think the reduce copy phase is too
In MR1, the max reducer count is a static value set in mapred-site.xml. That
is the value you get from the API.
In YARN, there is no such static value any more, so you can set any value
you like; it is up to the RM to decide at runtime how many reducer tasks are
available or can be granted to y
Avro data should be in its own binary format. Why did you get something
that looks like JSON?
What output format class do you use?
Yong
Date: Fri, 3 Oct 2014 17:34:35 +0800
Subject: avro mapreduce output
From: delim123...@gmail.com
To: user@hadoop.apache.org
Hi,
In mapreduce with reduce output format of
so, you need to dig into
the source data for that block, and think about why it would cause an OOM.
I am not sure about this. Is there a hint in the logs to figure it out?
3) Did you give a reasonable heap size to the mapper? What is it?
9 GB (too small??)
Best regards,
Blanca
From: java8964 [mai
I don't have any experience with MongoDB, but here are my 2 cents.
Your code is not efficient: it uses "+=" on String, and you could have
reused the Text object in your mapper. Text is a mutable class, so reuse one
instance and avoid creating it again and again with "new Text()" in the mapper.
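For example, a rough sketch of the reuse pattern (the field names and the appended text are made up for illustration):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ReuseTextMapper extends Mapper<LongWritable, Text, Text, Text> {
    // Reuse the same Text instances across map() calls instead of new Text() each time.
    private final Text outKey = new Text();
    private final Text outValue = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Build the output with StringBuilder rather than String "+=".
        StringBuilder sb = new StringBuilder();
        sb.append(value.toString()).append("|processed");
        outKey.set(value);           // overwrite the reused instance
        outValue.set(sb.toString());
        context.write(outKey, outValue);
    }
}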
Georgi:
I think you misunderstood the original answer.
If you already use the Avro format, then the file will be splittable. If you
want to add compression on top of that, feel free to go ahead.
If you read the Avro DataFileWriter API:
http://avro.apache.org/docs/1.7.6/api/java/org/apache/avro/file
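The codec is applied per Avro block, so the file stays splittable. A rough sketch of writing a compressed Avro file (Snappy chosen just as an example):

import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroCompressedWriter {
    public static void write(Schema schema, File out) throws Exception {
        DataFileWriter<GenericRecord> writer =
                new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema));
        // Compression is applied per Avro block, so the file stays splittable.
        writer.setCodec(CodecFactory.snappyCodec());
        writer.create(schema, out);
        // ... append GenericRecord instances here ...
        writer.close();
    }
}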
Why do you say so? Does it cause a bug in your case? If so, can you explain the
problem you are facing?
Yong
From: qixiangm...@hotmail.com
To: user@hadoop.apache.org
Subject: is the HDFS BlockReceiver.PacketResponder source code wrong ?
Date: Tue, 23 Sep 2014 07:37:41 +
In org.apache.had
in at this moment is the folder (on the local filesystem) for the data node
dir. I am thinking about doing some local reads, so the very first step is to
know where to read the data.
Demai
On Tue, Sep 9, 2014 at 11:13 AM, java8964 wrote:
The configuration in fact depends on the XML files. I am not sure what kind of
cluster configuration variables/values you are looking for.
Remember, the cluster is made of a set of computers, and in hadoop there are
the HDFS XML, the MapReduce XML and even the YARN XML config files.
The mapred and yarn XML files are job related. Without
If you want to use NLineInputFormat, and also want each individual file to be
processed in a map task that prefers to run on the same task node as the data
node, you need to implement and control that kind of logic yourself.
Extend NLineInputFormat, override the getSplits() method, read the l
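A rough sketch of that kind of override (the locality logic here is just a placeholder; it re-hosts each split on the nodes holding the listing file's blocks, and you would replace that with the locations of the data file each line points to):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

public class LocalityAwareNLineInputFormat extends NLineInputFormat {
    @Override
    public List<InputSplit> getSplits(JobContext job) throws IOException {
        List<InputSplit> splits = new ArrayList<>();
        FileSystem fs = FileSystem.get(job.getConfiguration());
        for (InputSplit split : super.getSplits(job)) {
            FileSplit fileSplit = (FileSplit) split;
            // Placeholder locality: use the block locations of the listing file itself.
            // Replace this with the locations of the data file named on each line.
            FileStatus status = fs.getFileStatus(fileSplit.getPath());
            BlockLocation[] blocks = fs.getFileBlockLocations(
                    status, fileSplit.getStart(), fileSplit.getLength());
            String[] hosts = blocks.length > 0 ? blocks[0].getHosts() : new String[0];
            splits.add(new FileSplit(fileSplit.getPath(), fileSplit.getStart(),
                    fileSplit.getLength(), hosts));
        }
        return splits;
    }
}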
> > block.
> >
> > If the FS in use has its advantages it's better to implement a proper
> > interface to it making use of them, than to rely on the LFS by mounting it.
> > This is what we do with HDFS.
> >
> > On Aug 15, 2014 8:52 PM, "java8964"
I believe Calvin mentioned before that this parallel file system is mounted
into the local file system.
In this case, will Hadoop just use java.io.File via the local file system,
treat them as local files and not split the files?
I just want to know the logic of how hadoop handles local files.
One suggest
Are you sure user 'Alex' belongs to the 'hadoop' group? Why not run the command
'id alex' to prove it? And can you confirm on the namenode that 'Alex' belongs
to the 'hadoop' group?
Yong
Date: Thu, 24 Jul 2014 17:11:06 +0800
Subject: issue about run MR job use system user
From: justlo...@gmail.com
To: user
Your understanding is almost correct, but not the part you highlighted.
HDFS is not designed for write performance, but the client doesn't have to
wait for the acknowledgment of previous packets before sending the next ones.
This webpage describes it clearly, and I hope it is help
Why do you say so? What problem did you get from this code?
Yong
From: yu_l...@hotmail.com
To: user@hadoop.apache.org
Subject: Is it a bug from CombineFileInputFormat?
Date: Mon, 12 May 2014 22:10:44 -0400
Hi,
This is a private static inner class from CombineFileInputFormat.java.
When "locations" l
Your first understanding is not correct. Where did you get that interpretation
from the book?
About the number of spilled records: every record output by the mapper will be
spilled at least once, so in the ideal scenario these two numbers should be
equal. If they are not, and the spilled number is much larger than
There are several issues that could come together; since you know your data,
we can only guess here:
1) The mapred.child.java.opts=-Xmx2g setting only works IF you didn't set
"mapred.map.child.java.opts" or "mapred.reduce.child.java.opts"; otherwise, the
latter will override the "mapred.child.java.opt
You may consider Avro's "SpecificRecord" or "GenericRecord".
Yong
Date: Fri, 7 Mar 2014 10:29:49 +0800
Subject: Re: MapReduce: How to output multiplt Avro files?
From: raofeng...@gmail.com
To: user@hadoop.apache.org; ha...@cloudera.com
thanks, Harsh.
any idea on how to build a common map output
anth...@mattas.net
On Wed, Mar 5, 2014 at 8:47 AM, java8964 wrote:
Are you doing this on a standalone box? How large are your test files, and how
long did the jobs of each type take?
Yong
> From: anth...@mattas.net
> Subject: Benchmarking Hive Changes
> Date: Tue, 4 Mar 2014 21:31:42 -0500
> To: user@hadoop.apache.org
>
> I’ve been trying to benchmark some of the Hi
Using the union schema is correct; it should be able to support multi-schema
input.
One question: why do you call setInputKeySchema? Does your job load the Avro
data as the key passed to the following Mapper?
Yong
Date: Thu, 27 Feb 2014 16:13:34 +0530
Subject: Multiple inputs for different avro inpu
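For reference, a minimal sketch of wiring a union schema as the input key schema with the avro-mapred (new API) classes; the schema variables are hypothetical:

import java.util.Arrays;
import org.apache.avro.Schema;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.hadoop.mapreduce.Job;

public class UnionInputSetup {
    public static void configure(Job job, Schema schemaA, Schema schemaB) {
        // A union schema lets one job read Avro files written with either schema.
        Schema union = Schema.createUnion(Arrays.asList(schemaA, schemaB));
        AvroJob.setInputKeySchema(job, union);
        job.setInputFormatClass(AvroKeyInputFormat.class);
    }
}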
If the file is big enough and you want to split it for parallel processing,
then maybe one option could be that in your mapper, you can always get the full
file path from the InputSplit, then open it (the file path, which means you
can read from the beginning), read the first 4 lines, and
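A rough sketch of that idea in the mapper's setup() (assumes a FileSplit-based input format; the names are made up):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class HeaderAwareMapper extends Mapper<LongWritable, Text, Text, Text> {
    private String[] headerLines = new String[4];

    @Override
    protected void setup(Context context) throws IOException {
        // Get the full path of the file this split belongs to.
        Path file = ((FileSplit) context.getInputSplit()).getPath();
        FileSystem fs = file.getFileSystem(context.getConfiguration());
        // Re-open the file from the beginning and read the first 4 lines.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(file)))) {
            for (int i = 0; i < headerLines.length; i++) {
                headerLines[i] = reader.readLine();
            }
        }
    }
}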
See my reply to another email today for a similar question:
"RE: Can the file storage in HDFS be customized?" Thanks
Yong
From: sugandha@gmail.com
Date: Tue, 25 Feb 2014 11:40:13 +0530
Subject: Reading a file in a customized way
To: user@hadoop.apache.org
Hello,
Irrespective of the file blocks
Hi, Naolekar:
The blocks in HDFS just store bytes. HDFS has no idea, nor does it care, what
kind of data or how many polygons are in a block. It just stores 128 MB of
bytes (if your default block size is set to 128 MB).
It is your InputFormat/RecordReader that reads these bytes in and deserializes
them into key/value pairs.
Hi, Brian:
I hope I understand your question correctly. Here is my view of what the
Seekable interface provides.
The Seekable interface also defines the "seek(long pos)" method, which allows
the client to seek to a specified position in the underlying InputStream.
In the RecordReader, it will ge
Where did you compile your libhadoop.so.1.0.0?
It looks more like you compiled libhadoop.so.1.0.0 in an environment with
glibc 2.14 but tried to use it in an environment that only has glibc 2.12.
If you are using a hadoop you compiled yourself, then it is best to compile it
in an environment matching wi
Hi, Ted:
Our environment uses a distribution from a vendor, so it is not easy to just
patch it myself.
But I can explore the option and see if the vendor is willing to patch it in
the next release.
Before I do that, I just want to make sure that patching the code is the ONLY
solution.
I read the source c
"values":"string"}
}} ]}
And I tried creating the corresponding classes by using the Avro tool and the
plugin, but there are a few errors in the generated Java code. What could be
the issue?
1) Error: The method deepCopy(Schema, List>) is
undefined for the type GenericData
Just as Harsh pointed out, as long as the underlying DFS provides all the
required DFS APIs for Hadoop, DistCp should work. One thing is that all the
required libraries (including any conf files) need to be in the classpath, if
they are not available in the runtime cluster.
Same as the S3 file syste
In Avro, you need to think about a schema to match your data. Avro's schema is
very flexible and should be able to store all kinds of data.
If you have a JSON string, you have 2 options for generating the Avro schema
for it:
1) Use "type": "string" to store the whole JSON string in Avro. This will b
Hi, Ognen:
I noticed you were asking this question before under a different subject line.
I think you need to tell us where you mean the unbalanced space: is it on HDFS
or the local disk?
1) HDFS is independent of MR; they are not related to each other. 2) Without
MR1 or MR2 (YARN), HDFS should wo
Or you can implement your own InputSplit and InputFormat, with which you can
control which nodes tasks are sent to, and how many per node.
You can find some detailed examples in the book "Professional Hadoop
Solutions", Chapter 4.
Yong
> Subject: Re: Force one mapper per machine (not core)?
> From: kwi.
You need to be clearer about how you process the files.
I think the important question is what kind of InputFormat and OutputFormat you
are using in your case.
If you are using the defaults, on Linux, I believe TextInputFormat and
TextOutputFormat will both convert the byte array to tex
y on multi nodes in
your cluster? 2) If you don't use a bzip2 file as input, do you have the same
problem with other file types, like plain text files?
Yong
From: ken.willi...@windlogics.com
To: user@hadoop.apache.org
Subject: RE: Streaming jobs getting poor locality
Date: Thu, 23 Jan 2014 16:0
I believe Hadoop can figure out the codec from the file name extension, and the
bzip2 codec is supported in Hadoop as a Java implementation, which is also a
SplittableCompressionCodec.
So 5 GB of bzip2 files generating about 45 mappers is very reasonable, assuming
128 MB/block (5 GB / 128 MB is roughly 40 splits).
The question is why ONLY one no
Can't you use Hadoop's "NLineInputFormat"?
If you generate a 100-line text file, by default one line will trigger one
mapper task.
As long as you have 100 task slots available, you will get 100 mappers running
concurrently.
You want perfect control over the mapper count? NLineInputFormat is designed fo
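For example (a minimal sketch; the control-file path is hypothetical):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

public class OneMapperPerLine {
    public static void configure(Job job) throws Exception {
        job.setInputFormatClass(NLineInputFormat.class);
        // One line of the control file per map task (the default), so a
        // 100-line file yields 100 mappers.
        NLineInputFormat.setNumLinesPerSplit(job, 1);
        NLineInputFormat.addInputPath(job, new Path("/user/hypothetical/control.txt"));
    }
}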
I read this blog, and have the following questions:
What is the relationship between "mapreduce.map.memory.mb" and
"mapreduce.map.java.opts"?
In the blog, it gives the following settings as an example:
For our example cluster, we have the minimum RAM for a Container
(yarn.scheduler.minimum-allocatio
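My understanding of how the two settings relate, as a sketch (the values are only examples following the usual rule of thumb of keeping the heap around 80% of the container size):

import org.apache.hadoop.conf.Configuration;

public class MapMemoryExample {
    public static Configuration configure() {
        Configuration conf = new Configuration();
        // Container size requested from YARN for each map task (MB).
        conf.setInt("mapreduce.map.memory.mb", 2048);
        // JVM heap for the map task; keep it below the container size
        // so there is headroom for non-heap memory.
        conf.set("mapreduce.map.java.opts", "-Xmx1638m");
        return conf;
    }
}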
like this?
hadoop MyJob -input /foo -output output
Kim
On Fri, Jan 10, 2014 at 8:04 AM, java8964 wrote:
Yes.
Hadoop is very flexible about the underlying storage system. It is in your
control how to utilize the cluster's resources, including CPU, memory, IO and
network bandwidth.
Check out Hadoop's NLineInputFormat; it may be the right choice for your case.
You can put all the metadata of your files (da
Or even easier like this:
hadoop fs -dus /path
> From: j...@hortonworks.com
> Date: Wed, 8 Jan 2014 17:07:21 -0800
> Subject: Re: how to caculate a HDFS directory size ?
> To: user@hadoop.apache.org
>
> You may want to check the fs shell command COUNT and DU.
>
> On Wed, Jan 8, 2014 at 4:57 PM,
If you really confirmed that libsnappy.so.1 is in the correct location, is
loaded into the Java library path and works in your test program, but it still
didn't work in MR, there is one other possibility which puzzled me before.
How did you get the libhadoop.so in your hadoop environment? D
ce never crosses files, but since HDFS splits files into blocks, it
may cross blocks, which makes it difficult to write an MR job. I don't quite
understand what you mean by "WholeFileInputFormat". Actually, I have no idea
how to deal with dependence across blocks.
2013/12/31 java89
What's wrong with downloading it from the official Apache website?
http://archive.apache.org/dist/hadoop/core/hadoop-1.1.2/
Yong
Date: Mon, 30 Dec 2013 11:42:25 -0500
Subject: Unable to access the link
From: navaz@gmail.com
To: user@hadoop.apache.org
Hi
I am using the instructions below to set up h
I don't know any examples of IIS log files, but from what you described, it
looks like analyzing one line of log data depends on some previous lines' data.
You should be clearer about what this dependence is and what you are trying
to do.
Just based on your questions, you still have different o
The best way, I am thinking, is to try the following:
1) Use the ant command line to generate the Eclipse project files from the
hadoop 1.2.1 source folder with "ant eclipse".
2) After that, you can use "Import Project" in IntelliJ for an "Eclipse"
project, which will handle all the paths correctly in IntelliJ for you
You need to store your data in a "column-based" format; check out Hive's
RCFile and its InputFormat option.
Yong
Date: Mon, 23 Dec 2013 21:37:23 +0800
Subject: Any method to get input splits by column?
From: samliuhad...@gmail.com
To: user@hadoop.apache.org
Hi,
By default, MR inputformat clas
I believe the "-fs local" should be removed too. The reason is that even if you
have a dedicated JobTracker after removing "-jt local", with "-fs local" I
believe all the mappers will run sequentially.
"-fs local" will force the MapReduce job to run in "local" mode, which is
really a test mo
I don't think a file in HDFS can be written to concurrently. Process B won't be
able to write to the file (but it can read) until it is CLOSED by process A.
Yong
Date: Fri, 20 Dec 2013 15:55:00 +0800
Subject: Re: Why other process can't see the change after calling hdfsHFlush
unless hdfsCloseFile is cal
If it is not killed by the OOM killer, maybe the JVM just did a core dump for
whatever reason. Search for a core dump of the process in /var/log/messages, or
a core dump file on your system.
From: stuck...@umd.edu
To: user@hadoop.apache.org; user@hadoop.apache.org
Subject: Re: Yarn -- one of the daemo
If the thread is killed, I don't know of a way you can get the lease and
close the file on behalf of the killed thread, unless your other threads hold
a reference to the file writer and close it.
I don't know of any command line tool that can do that.
Yong
From: xell...@outlook.com
To: use
The HDFS client that opens a file for writing is granted a lease for the file;
no other client can write to the file. The writing client periodically renews
the lease by sending a heartbeat to the NameNode. When the file is closed, the
lease is revoked. The lease duration is bound by a soft limi
One of the important things is that my input files are very small, each file
less than 10 MB, and I have a huge number of files.
On Thu, Dec 12, 2013 at 9:58 AM, java8964 wrote:
Assume the block size is 128 MB and each of your mappers finishes within half a
minute; then there is not too much logic in your m
the job with all 15 reducers, and I do not know whether increasing the reducer
count from 15 to 30, with each reducer allocated 6 GB of memory, will speed up
the job or not. The job runs on my production env; it has run for nearly 1 week
and still has not finished.
On Wed, Dec 11, 2013 at 9:50 PM, java8964 wrote:
The whole job completion time depends on a lot of factors. Are you sure the
reducer part is the bottleneck?
It also depends on how many reducer input groups your MR job has.
If you only have 20 reducer groups, even if you bump your reducer count to 40,
the reducer phase won
ntrun/build-main.xml
Might it be a missing dependency? Do you know how I can check that the plugin
actually exists using Maven?
Thanks!
On 4 December 2013 20:23, java8964 wrote:
Can you try JDK 1.6?
I just did a Hadoop 2.2.0 GA release build myself a few days ago. From my
experience, JDK 1.7 does not wor
I do:
~/hadoop-2.2.0-maven$ cmake --version
cmake version 2.8.2
On 4 December 2013 19:51, java8964 wrote:
Do you have 'cmake' in your environment?
Yong
Date: Wed, 4 Dec 2013 17:20:03 +0100
Subject: Ant BuildException error building Hadoop 2.2.0
From: silvi.ca...@gmail.com
To: user@hadoop.apache.org
Hello everyone,
I've been having trouble building Hadoop 2.2.0 using Maven 3.1.1; this is part
of th
Maybe just a silly guess: did you close your Writer?
Yong
Date: Thu, 14 Nov 2013 12:47:13 +0530
Subject: Re: Folder not created using Hadoop Mapreduce code
From: unmeshab...@gmail.com
To: user@hadoop.apache.org
@rab ra: yes, using the filesystem's mkdir() we can create folders and we can
also create i
Date: Tue, 29 Oct 2013 08:57:32 +0100
Subject: Re: Why the reducer's input group count is higher than my
GroupComparator implementation
From: drdwi...@gmail.com
To: user@hadoop.apache.org
Did you overwrite the partitioner as well?
2013/10/29 java8964 java8964
Hi, I have a strange question related to my secondary sort implementation in
an MR job. Currently I need to support a secondary sort in one of my MR jobs.
I implemented my custom WritableComparable like the following:
public class MyPartitionKey implements WritableComparable {
String type;
long id1;
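For reference, the usual companion pieces for a secondary sort partition and group on the natural key only; a rough sketch, assuming MyPartitionKey exposes its natural-key fields type and id1:

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Partitioner;

// Partition on the natural key only, so all records of a group go to one reducer.
class NaturalKeyPartitioner extends Partitioner<MyPartitionKey, Object> {
    @Override
    public int getPartition(MyPartitionKey key, Object value, int numPartitions) {
        int hash = (key.type + key.id1).hashCode();
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }
}

// Group on the natural key only, so the reducer sees one call per (type, id1).
class NaturalKeyGroupingComparator extends WritableComparator {
    protected NaturalKeyGroupingComparator() {
        super(MyPartitionKey.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        MyPartitionKey k1 = (MyPartitionKey) a;
        MyPartitionKey k2 = (MyPartitionKey) b;
        int cmp = k1.type.compareTo(k2.type);
        return cmp != 0 ? cmp : Long.compare(k1.id1, k2.id1);
    }
}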
has url
"hdfs://machine.domain:8080" and data folder "/tmp/myfolder", what should I
specify as the output path for the MR job?
Thanks
On Thursday, October 24, 2013 5:31 PM, java8964 java8964
wrote:
Just specify the output location using the URI of the other cluster. As long as
the network is accessible, you should be fine.
Yong
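For example, something like this (an untested sketch; the NameNode URI and output folder here are hypothetical, use whatever the remote cluster exposes):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RemoteOutputSetup {
    public static void configure(Job job) {
        // Fully qualified URI pointing at the other cluster's NameNode.
        FileOutputFormat.setOutputPath(job,
                new Path("hdfs://machine.domain:8020/tmp/myfolder/output"));
    }
}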
Date: Thu, 24 Oct 2013 15:28:27 -0700
From: myx...@yahoo.com
Subject: Mapreduce outputs to a different cluster?
To: user@hadoop.apache.org
The scenario is: I run mapr
snappy on hadoop 1.1.1
What's the output of ldd on that lib? Does it link properly? You should compile
the natives for your platform, as the packaged ones may not link properly.
On Sat, Oct 5, 2013 at 2:37 AM, java8964 java8964
wrote:
I kind of read the hadoop 1.1.1 source code for this, and it is very strange to
me now.
From the error, it looks like the runtime JVM cannot find the native method
org/apache/hadoop/io/compress/snappy/SnappyCompressor.compressBytesDirect()I.
That is my guess from the error message, but from the log,
Hi,
I am using hadoop 1.1.1. I want to test snappy compression with hadoop, but I
have some problems making it work in my Linux environment.
I am using openSUSE 12.3 x86_64.
First, I tried to enable snappy in hadoop 1.1.1 by:
conf.setBoolean("mapred.compress.map.outp
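For reference, a sketch of enabling snappy map-output compression with the MR1 JobConf API (the truncated line above presumably continues along these lines; the property names are the old MR1 ones):

import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapred.JobConf;

public class SnappyMapOutput {
    public static void configure(JobConf conf) {
        // Compress intermediate map output with Snappy (old MR1 property names).
        conf.setBoolean("mapred.compress.map.output", true);
        conf.setClass("mapred.map.output.compression.codec",
                SnappyCodec.class, org.apache.hadoop.io.compress.CompressionCodec.class);
    }
}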
Hi, I have a question related to how mappers are generated for the input files
from HDFS. I understand the split and block concepts in HDFS, but my original
understanding is that one mapper will only process data from one
file in HDFS, no matter how small the file is. Is that correct?
T
I am also thinking about this for my current project, so here I share some of
my thoughts, though maybe some of them are not correct.
1) In my previous projects years ago, we stored a lot of data as plain text;
at that time, people thought big data could store all the data, no need to
worry abou
I don't know exactly what you are trying to do, but it seems like memory is
your bottleneck, and you think you have enough CPU resources, so you want to
use multi-threading to utilize the CPU?
You can start multiple threads in your mapper, if you think your mapper logic
is very CPU intensive
Just curious, is there any reason you don't want to use DFSDataInputStream?
Yong
Date: Thu, 26 Sep 2013 16:46:00 +0200
Subject: Extending DFSInputStream class
From: tmp5...@gmail.com
To: user@hadoop.apache.org
Hi
I would like to wrap DFSInputStream by extending it. However, it seems that the
DFSInputStr
Hi, I have a question related to sequence files. I wonder why, and under what
kind of circumstances, I should use them.
Let's say I have a CSV file; I can store that directly in HDFS. But if I do
know that the first 2 fields are some kind of key, and most MR jobs will
query on that key, will it make
Or you do the calculation in the reducer's close() method, even though I am not
sure you can get the mapper's counter in the reducer.
But even if you can't, here is what you can do:
1) Save the JobConf reference in your mapper's configure() method.
2) Store the MAP_INPUT_RECORDS counter in the configuration object as
Hi, I currently have a project to process data using MR. I have some
thoughts about it, and am looking for advice if anyone has any feedback.
Currently in this project, I have a lot of event data related to email tracking
coming into HDFS. So the events are the data for email trackin
Did you do a hadoop version upgrade before this error happened?
Yong
Date: Wed, 11 Sep 2013 16:57:54 +0800
From: heya...@jiandan100.cn
To: user@hadoop.apache.org
CC: user-unsubscr...@hadoop.apache.org
Subject: help!!!,what is happened with my project?
Hi:
Today when I
The error doesn't mean the file does not exist in HDFS; it refers to the local
disk. If you read the error stack trace:
at
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:581)
it indicates the error happened on the local file system.
If you try to copy data from an existing
Well, the reducer stage will normally take much longer than the mapper stage,
because the copy/shuffle/sort all happen at this time, and they are the hard
part.
But before we simply say it is part of life, you need to dig more into your
MR jobs to find out if you can make them faster.
You are the
The method getPartition() needs to return a non-negative number. Simply using
the hashCode() method is not enough.
See the Hadoop HashPartitioner implementation:
return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
When I first read this code, I always wondered why it does not use Math.abs. Is ( &
I
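The reason, as far as I can tell, is overflow: Math.abs(Integer.MIN_VALUE) is still negative, while the bit mask simply clears the sign bit. A quick check:

public class PartitionMaskDemo {
    public static void main(String[] args) {
        int h = Integer.MIN_VALUE;                   // a possible hashCode()
        System.out.println(Math.abs(h));             // -2147483648, still negative!
        System.out.println(h & Integer.MAX_VALUE);   // 0, always non-negative
        System.out.println((h & Integer.MAX_VALUE) % 10); // safe partition index
    }
}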
What's wrong with using the old Unix pipe?
hadoop fs -cat /user/input/foo.txt | head -100 > local_file
Date: Thu, 29 Aug 2013 13:50:37 -0700
Subject: Re: copy files from hdfs to local fs
From: chengi.liu...@gmail.com
To: user@hadoop.apache.org
tail will work as well... ??? but I want to extract just (sa
I am not sure the original suggestion will work for your case.
My understanding is that you want to use some API that only exists in slf4j
version 1.6.4, but this library already exists in your hadoop environment with
a different version, which is quite possible.
To change the Maven build of the appli
As Harsh said, sometimes you want to do a secondary sort, but in MR the data
can only be sorted by key, not by value.
A lot of the time, you want the reducer output sorted by a field, but only
within a group, kind of like a windowed sort in relational DB SQL. For
example, if you have data about
lave nodes, it works fine. I
am not able to figure out how to fix this or the reason for the error. I do
not understand why it complains that the input directory is not present. As
far as I know, slave nodes get a map task and the map method receives the
contents of the input file. This should be fine f
If you don't plan to use HDFS, what kind of shared file system are you going
to use across the cluster? NFS? For what you want to do, even though it doesn't
make too much sense, you first need to solve the problem of the shared file
system.
Second, if you want to process the files one by one, inste
Hi,
This is a 4-node hadoop cluster running on CentOS 6.3 with Oracle JDK (64-bit)
1.6.0_43. Each node has 32 GB of memory, with a max of 8 mapper tasks and 4
reducer tasks set. The hadoop version is 1.0.4.
This is set up on DataStax DSE 3.0.2, which uses Cassandra CFS as the underlying
DFS instead o
I am also interested in your research. Can you share some insight on the
following questions?
1) When you use a CompressionCodec, can the encrypted file be split? From my
understanding, there is no encryption scheme that allows the file to be
decrypted block by block, right? For example, if I have a 1 GB file
Can someone share some idea of what the Hadoop source code of class
org.apache.hadoop.io.compress.BlockDecompressorStream, method rawReadInt(), is
trying to do here?
There is a comment in the code that this method shouldn't return a negative
number, but my testing file contains the following b
Hi, Davie:
I am not sure I understand this suggestion. Why would a smaller block size help
this performance issue?
From what the original question was about, it looks like the performance
problem is due to there being a lot of small files, and each file will run in
its own mapper.
As hadoop nee
I don't think you can get the list of all input files in the mapper, but what
you can get is the current file's information.
From the context object reference, you can call getInputSplit(), which should
give you all the information you want about the current input file.
http://hadoop.apache.org/docs/r2.0
Hi, Chris:
Here is my understanding of file splits and data blocks.
HDFS will store your file in multiple data blocks; each block will be 64 MB or
128 MB depending on your setting. Of course, the file could contain many
records, so record boundaries won't match block boundaries (i
e can convert
any existing Writable into an encrypted form. Dave From: java8964 java8964
[mailto:java8...@hotmail.com]
Sent: Sunday, February 10, 2013 3:50 AM
To: user@hadoop.apache.org
Subject: Question related to Decompressor interface
Hi, currently I am researching options for encry
Our cluster on CDH3u4 has the same problem. I think it is caused by some bugs
in the JobTracker. I believe Cloudera knows about this issue.
After upgrading to CDH3u5, we haven't faced this issue yet, but I am not sure
if it is confirmed as fixed in CDH3u5.
Yong
> Date: Mon, 4 Feb 2013 15:21:18 -08
What range did you give for mapred.task.profile.maps? And are you sure your
mapper will invoke the methods you expect in the traces?
Yong
Date: Wed, 6 Feb 2013 23:50:08 +0200
Subject: Profiling the Mapper using hprof on Hadoop 0.20.205
From: yaron.go...@gmail.com
To: user@hadoop.apache.org
Hi, I wish
Ted's comments on performance are
spot on.
Regards
Bertrand
On Thu, Oct 4, 2012 at 9:02 PM,
java8964 java8964 wrote:
I did a cumulative sum in a Hive UDF, as one of the projects for my employer.
1) You need to decide the grouping elements for your cumulative sum, for
example an account, a department, etc. In the mapper, combine this information
into your emit key.
2) If you don't have any grouping requirement, yo
Hi,
During my development of ETLs on the hadoop platform, there is one question I
want to ask: why doesn't hadoop provide a round-robin partitioner?
From my experience, it is a very powerful option for the case of a small,
limited set of distinct key values, and it balances the ETL resources. Here is
what I want to say:
1
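To make the idea concrete, a rough sketch of such a partitioner (note it deliberately ignores the key, so it gives up the usual same-key-to-same-reducer guarantee in exchange for balance):

import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical round-robin partitioner: spreads records evenly across reducers
// regardless of key, trading the same-key-same-reducer guarantee for balance.
public class RoundRobinPartitioner<K, V> extends Partitioner<K, V> {
    private int next = 0;

    @Override
    public int getPartition(K key, V value, int numPartitions) {
        int partition = next % numPartitions;
        next = (next + 1) % numPartitions;
        return partition;
    }
}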