I updated http://wiki.apache.org/hadoop/UsingLzoCompression to specifically
mention this potential issue so that other people can avoid this problem.
Feel free to add more to it.
On Thu, Jul 8, 2010 at 8:26 PM, bmdevelopment wrote:
> Thanks everyone.
>
> Yes, using the Google Code version refer
Hi,
Create the "Job" after you create the configuration. Like:

Path p = new Path("hdfs://localhost:9100/user/denimLive/denim/DCache/Orders.txt");
DistributedCache.addCacheFile(p.toUri(), conf);
Job job = new Job(conf, "Driver");

If you create the "Job" before adding the cache file to the configuration, the
job will not see the file: new Job(conf) makes its own copy of the
configuration, so changes made to conf afterwards are not picked up.
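(For completeness, a minimal sketch of the mapper side, assuming the 0.20-era
DistributedCache API; the class name, key/value types, and lookup logic here
are made up:)

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper that loads the cached file once, in setup().
public class OrdersMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void setup(Context context) throws IOException {
    // Returns the task-local paths of the files added with addCacheFile().
    Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
    if (cached != null && cached.length > 0) {
      BufferedReader reader = new BufferedReader(new FileReader(cached[0].toString()));
      try {
        String line;
        while ((line = reader.readLine()) != null) {
          // ... load each line into an in-memory lookup structure ...
        }
      } finally {
        reader.close();
      }
    }
  }
}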
Thanks everyone.
Yes, using the Google Code version referenced on the wiki:
http://wiki.apache.org/hadoop/UsingLzoCompression
I will try the latest version and see if that fixes the problem.
http://github.com/kevinweil/hadoop-lzo
Thanks
On Fri, Jul 9, 2010 at 3:22 AM, Todd Lipcon wrote:
> On T
Hi Alan,
Is the content of the original file ASCII text? Then you should be using Text
in the signature. By default 'hadoop fs -text ...' will just call toString() on
the object. You get the object itself in the map() method and can do whatever
you want with it. If Text or BytesWritable does
not work for
Hi Alex,
I'm not sure what you mean. I already set my mapper's signature to:

public class MyMapper extends Mapper {
    ...
    public void map(Text key, BytesWritable value, Context context) {
        ...
    }
}

In my map() method the contents of value is the text from the original file
and the value.
Hello all,
As a new user of Hadoop, I am having some problems understanding a few
things. I am writing a program to load a file into the distributed cache and
read this file in each mapper. In my driver program, I have added the file to
my distributed cache using:
Path p=new
Yes, I can get the partition number using
jobconf.getInt("mapred.task.partition", 0), but how can I give the output file
of each reducer a custom name using just this partition number?
From: Ted Yu
To: mapreduce-user@hadoop.apache.org
Sent: Thu, July 8, 2010 6:22:54 PM
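(For reference, a sketch of two equivalent ways to read the partition number
inside a reducer with the new API:)

// Inside the reducer: the partition number of this reduce task.
int p1 = context.getConfiguration().getInt("mapred.task.partition", 0);
int p2 = context.getTaskAttemptID().getTaskID().getId();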
Hi Alan,
SequenceFiles keep track of the key and value type, so you should be able to
use the Writables in the signature. Though it looks like you're using the
new API, and I admit that I'm not an expert with the new API. Have you
tried using the Writables in the signature?
Alex
On Thu, Jul 8,
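(Something like the following, as a sketch with the new API — the class name
and output types here are arbitrary:)

import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Reads the (Text, BytesWritable) records straight out of the SequenceFile.
public class SeqMapper extends Mapper<Text, BytesWritable, Text, Text> {
  @Override
  protected void map(Text key, BytesWritable value, Context context)
      throws IOException, InterruptedException {
    // BytesWritable's backing array may be padded, so respect getLength().
    String contents = new String(value.getBytes(), 0, value.getLength(), "UTF-8");
    context.write(key, new Text(contents));
  }
}

In the driver, pair it with
job.setInputFormatClass(org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.class);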
On Thu, Jul 8, 2010 at 10:38 AM, Ted Yu wrote:
> Todd fixed a bug where LZO header or block header data may fall on read
> boundary:
>
> http://github.com/toddlipcon/hadoop-lzo/commit/f3bc3f8d003bb8e24f254b25bca2053f731cdd58
>
> I am wondering if that is related to the issue you saw.
>
> I don't t
Todd fixed a bug where LZO header or block header data may fall on read
boundary:
http://github.com/toddlipcon/hadoop-lzo/commit/f3bc3f8d003bb8e24f254b25bca2053f731cdd58
I am wondering if that is related to the issue you saw.
On Wed, Jul 7, 2010 at 11:49 PM, bmdevelopment wrote:
> A little more
Please take a look at the getUniqueName() method in
src/mapred/org/apache/hadoop/mapred/FileOutputFormat.java
It retrieves "mapred.task.partition".
On Thu, Jul 8, 2010 at 2:13 AM, Denim Live wrote:
> Hi Everyone,
> I am having some problem with naming the output file of each reduce task
> with the pa
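(To make that concrete: a minimal, untested sketch with the new API that
subclasses TextOutputFormat and overrides getDefaultWorkFile() so each
reducer's file carries its partition number — the class name and the "result-"
prefix are made up:)

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class PartitionNamedOutputFormat<K, V> extends TextOutputFormat<K, V> {
  @Override
  public Path getDefaultWorkFile(TaskAttemptContext context, String extension)
      throws IOException {
    // The partition number of this reduce task.
    int partition = context.getConfiguration().getInt("mapred.task.partition", 0);
    FileOutputCommitter committer = (FileOutputCommitter) getOutputCommitter(context);
    return new Path(committer.getWorkPath(), "result-" + partition + extension);
  }
}

Set it on the job with job.setOutputFormatClass(PartitionNamedOutputFormat.class).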
To get around the small-file problem (I have thousands of 2MB log files) I wrote
a class to convert all my log files into a single SequenceFile in
(Text key, BytesWritable value) format. That works fine. I can run this:
hadoop fs -text /my.seq | grep peemt114.log | head -1
10/07/08 15:02:
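(A minimal sketch of such a converter under those assumptions — the class name
SmallFilePacker and the argument layout are hypothetical:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Packs every file under args[0] into one SequenceFile at args[1],
// keyed by file name, with the raw bytes as the value.
public class SmallFilePacker {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path(args[1]), Text.class, BytesWritable.class);
    try {
      for (FileStatus stat : fs.listStatus(new Path(args[0]))) {
        if (stat.isDir()) continue;  // skip subdirectories
        byte[] buf = new byte[(int) stat.getLen()];
        FSDataInputStream in = fs.open(stat.getPath());
        try {
          in.readFully(buf);
        } finally {
          in.close();
        }
        writer.append(new Text(stat.getPath().getName()), new BytesWritable(buf));
      }
    } finally {
      writer.close();
    }
  }
}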
You can import the web logs into HDFS, and then use Pig or Hive to do data
analysis.
See
http://hadoop.apache.org/pig/
http://hadoop.apache.org/hive/
On Thu, Jul 8, 2010 at 5:55 PM, Tim Jones wrote:
> Hi,
>
>
> I want to be able to discover the 10 most popular routes through our web
> site
> t
Hi,
I want to be able to discover the 10 most popular routes through our web site
that lead a visitor to register with us.
I am already logging page view data but don't seem to be able to find the best
solution to query it. (Each Visitor has an ID, each Visitor makes multiple
Visits, each w
Hi Everyone,
I am having some problems with naming the output file of each reduce task with
the partition number. First of all, how can I get the partition number within
each reduce? Second, how am I going to name the output file with that partition
number?
I have looked at the MultipleTextOut
Thanks Alex, it worked.
From: Alexandros Konstantinakis - Karmis
To: mapreduce-user@hadoop.apache.org
Sent: Thu, July 8, 2010 9:10:26 AM
Subject: Re: Finding the time it took for a mapreduce job to get executed
Through the web GUI. It reports both total time (in
On 07/08/2010 10:51 AM, Denim Live wrote:
Hi folks,
I want to determine the exact time it took for my mapreduce job to get
executed, for analysis purposes. How can I calculate it?
Thanks
Through the web GUI. It reports both total time (on the job execution
page), but you can also get the m
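(If you want the number programmatically, one simple option is to time the
blocking call in the driver — a sketch, assuming job is the
org.apache.hadoop.mapreduce.Job being submitted:)

// Time the call that runs the job to completion.
long start = System.currentTimeMillis();
boolean success = job.waitForCompletion(true);
long elapsedMs = System.currentTimeMillis() - start;
System.out.println("Job " + (success ? "succeeded" : "failed") + " in " + elapsedMs + " ms");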
Hi folks,
I want to determine the exact time it took for my mapreduce job to get
executed, for analysis purposes. How can I calculate it?
Thanks