Zhenyu,
It's a bit complicated and involves some layers of
indirection: CombineFileRecordReader is a sort of shell RecordReader that
delegates the actual work of reading records to a child record reader,
the class named in its third constructor parameter. Instructing it to use
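For illustration, here is a minimal sketch of how that wiring typically looks against the newer mapreduce API; the class names MyCombineInputFormat and MyChildRecordReader are hypothetical placeholders, not from this thread:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader;
    import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;
    import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

    public class MyCombineInputFormat extends CombineFileInputFormat<LongWritable, Text> {

      @Override
      public RecordReader<LongWritable, Text> createRecordReader(
          InputSplit split, TaskAttemptContext context) throws IOException {
        // The "shell" reader: the third argument names the child RecordReader
        // class that does the actual reading; one child instance is created
        // per file chunk inside the combined split.
        return new CombineFileRecordReader<LongWritable, Text>(
            (CombineFileSplit) split, context, MyChildRecordReader.class);
      }

      // Hypothetical child reader: presents chunk 'index' of the combined
      // split as an ordinary FileSplit and delegates to LineRecordReader.
      // The (split, context, index) constructor is the signature that
      // CombineFileRecordReader instantiates via reflection.
      public static class MyChildRecordReader extends RecordReader<LongWritable, Text> {
        private final LineRecordReader delegate = new LineRecordReader();
        private final CombineFileSplit split;
        private final int index;

        public MyChildRecordReader(CombineFileSplit split, TaskAttemptContext context,
            Integer index) {
          this.split = split;
          this.index = index;
        }

        @Override
        public void initialize(InputSplit ignored, TaskAttemptContext context)
            throws IOException, InterruptedException {
          // Initialize the delegate on just this reader's chunk of the split.
          delegate.initialize(new FileSplit(split.getPath(index),
              split.getOffset(index), split.getLength(index), null), context);
        }

        @Override
        public boolean nextKeyValue() throws IOException, InterruptedException {
          return delegate.nextKeyValue();
        }

        @Override
        public LongWritable getCurrentKey() { return delegate.getCurrentKey(); }

        @Override
        public Text getCurrentValue() { return delegate.getCurrentValue(); }

        @Override
        public float getProgress() throws IOException { return delegate.getProgress(); }

        @Override
        public void close() throws IOException { delegate.close(); }
      }
    }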
Hi folks,
we proudly present the Berlin Buzzwords talks and presentations.
As promised, there are tracks specific to the three tags: search, store,
and scale. We have a fantastic mixture of developers and users of open
source software projects that make scaling data processing possible
today.
There
Hi folks :)
I have one big file... I read it with FileInputFormat, which generates only
one task, and of course this doesn't get distributed across the cluster
nodes.
Should I use another input format class, or do I have a bug in my
implementation? The desired behavior is one task per line.
Thanks.
What's the format of this file? gzip can't be split.
On Mon, May 10, 2010 at 5:21 AM, Pierre ANCELOT pierre...@gmail.com wrote:
> Hi folks :)
> I have one big file... I read it with FileInputFormat, which generates only
> one task, and of course this doesn't get distributed across the cluster
Simple and pure raw ASCII text. One line == one treatment to do.
On Mon, May 10, 2010 at 2:52 PM, Jeff Zhang zjf...@gmail.com wrote:
> What's the format of this file? gzip can't be split.
The idea is, I want to share the lines of the file equally between the nodes...
On Mon, May 10, 2010 at 3:05 PM, Pierre ANCELOT pierre...@gmail.com wrote:
> Simple and pure raw ASCII text. One line == one treatment to do.
NLineInputFormat seems a good fit for your need; a sketch follows below.
On Mon, May 10, 2010 at 6:05 AM, Pierre ANCELOT pierre...@gmail.com wrote:
> Simple and pure raw ASCII text. One line == one treatment to do.
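For illustration, a minimal sketch of wiring NLineInputFormat into a job driver with the old mapred API (a later message in this thread notes it had not yet been ported to the new mapreduce API); the class name and path arguments here are made up:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.NLineInputFormat;

    public class OneTaskPerLine {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(OneTaskPerLine.class);
        conf.setJobName("one-task-per-line");
        // Each map task receives exactly N consecutive input lines;
        // N = 1 gives the one-task-per-line behavior asked for above.
        conf.setInputFormat(NLineInputFormat.class);
        conf.setInt("mapred.line.input.format.linespermap", 1);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        // Mapper and reducer omitted; set them as usual for the real job.
        JobClient.runJob(conf);
      }
    }

Note that one map task per line also means full task startup overhead per line, so for short per-line treatments a larger N usually scales better.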
Hi,
I've been trying all morning to post a Hadoop question to this list but
can't get it through the spam filter. I'm at a loss.
Does anyone have any idea what might trigger it? What can I do to avoid
getting tagged?
Thanks,
/ Oscar
Try sending plaintext email instead of rich text; the spam scoring for
HTML email is overly aggressive on the Apache listservs.
-Todd
On Mon, May 10, 2010 at 11:14 AM, Oscar Gothberg
oscar.gothb...@gmail.com wrote:
> Hi,
> I've been trying all morning to post a Hadoop question to this list but
> can't
Hi,
I keep having jobs fail at the very end, with the map 100% complete and
the reduce 100% complete, due to a NotReplicatedYetException on the
_temporary subdirectory of the job output directory.
It doesn't happen 100% of the time, so it's not trivially reproducible,
but it happens often enough (10-20% of
Thanks a lot! Didn't even notice that Gmail would default to HTML format.
/ Oscar
On Mon, May 10, 2010 at 11:15 AM, Todd Lipcon t...@cloudera.com wrote:
> Try sending plaintext email instead of rich text; the spam scoring for
> HTML email is overly aggressive on the Apache listservs.
>
> -Todd
If you're curious, I found out this morning that NLineInputFormat has not
yet been ported to the new mapreduce API. (It might be in trunk.) So using
NLineInputFormat forces you into the older mapred API.
Edward
On Mon, May 10, 2010 at 12:35 PM, Ted Yu yuzhih...@gmail.com wrote:
> NLineInputFormat seems a good fit for your need.
Hello Everyone,
Is there a patch available for HADOOP-4584 that can be used on 0.20.2?
The link https://issues.apache.org/jira/browse/HADOOP-4584 seems to indicate
that a patch is available for the 0.21 version, but that version has not
been released yet.
Block reports are taking several minutes on our cluster
Hi Everyone, thanks for your time. What's the best way to repartition one
table into 3 partitions using a replication factor of 3? We have anywhere
between 100 and 150 TB in this table. I would like to avoid copying the
data over. Any suggestions?
From what I understand, Hadoop/Hive is file
My team and I were working with sequence files using the
LuceneDocumentWrapper. But when I try to get the value, I get a
NoSuchMethodException from ReflectionUtils, caused by it trying to call a
default constructor that doesn't exist for that class.
So my
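For context, ReflectionUtils.newInstance creates value objects through their no-argument constructor, so the usual fix is to give the wrapper class one. A hypothetical sketch (this is not the actual LuceneDocumentWrapper source):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    // Hypothetical wrapper illustrating the requirement: any value class
    // read back from a SequenceFile needs a no-arg constructor, because
    // ReflectionUtils.newInstance instantiates it reflectively.
    public class DocumentWrapper implements Writable {
      private String payload;            // stand-in for the wrapped document

      public DocumentWrapper() { }       // required by ReflectionUtils

      public DocumentWrapper(String payload) { this.payload = payload; }

      @Override
      public void write(DataOutput out) throws IOException {
        out.writeUTF(payload);
      }

      @Override
      public void readFields(DataInput in) throws IOException {
        payload = in.readUTF();
      }
    }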
Matias,
Hive partitions map to subdirectories in HDFS. You can do a 'mv' if you're
lucky enough to have each partition in a distinct HDFS file that could be
moved to the right partition subdirectory. Otherwise, you can run a
MapReduce job to collate your data into separate files per partition.
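For the lucky case, a sketch of the 'mv' approach through the HDFS API; the table and partition paths here are hypothetical examples:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class MovePartitionFile {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Move an existing file into the subdirectory Hive expects for the
        // partition (here dt=2010-05-10). The partition itself still has to
        // be registered with Hive, e.g. via ALTER TABLE ... ADD PARTITION.
        Path src = new Path("/user/hive/warehouse/mytable/part-00000");
        Path dst = new Path("/user/hive/warehouse/mytable/dt=2010-05-10/part-00000");
        if (!fs.rename(src, dst)) {
          System.err.println("rename failed: " + src + " -> " + dst);
        }
      }
    }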
Actually, would you have a case where no splitting is needed? Just curious.
It seems that you would either use LZO or not use any compression at all.
H
- Original Message
From: Alex Baranov alex.barano...@gmail.com
To: common-user@hadoop.apache.org
Sent: Mon, May 10, 2010 4:27:11 PM
Subject:
I meant splitting a very huge file to distribute it over multiple map tasks.
Alex.
http://sematext.com
On Tue, May 11, 2010 at 6:13 AM, himanshu chandola
himanshu_cool...@yahoo.com wrote:
> Actually, would you have a case where no splitting is needed? Just curious.
> It seems that you would either use LZO or not use any compression at all.