I just started playing with 0.20.0. I see that the mapred package is
deprecated in favor of the mapreduce package. Is there any
migration documentation for the new API (i.e., something more touristy
than Javadoc)? All the website docs and Wiki examples are on the old API.
Sorry if this is on t
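A minimal sketch of what a job looks like against the new org.apache.hadoop.mapreduce API in 0.20.0 (class and job names here are made up for illustration, not taken from any official migration guide):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative only: a map-only job using the 0.20 "mapreduce" package.
public class NewApiExample {

  // The new Mapper is a class rather than an interface, and map() takes a
  // Context in place of the old OutputCollector/Reporter pair.
  public static class MyMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(value, new IntWritable(1));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "new-api-example");  // Job replaces JobConf/JobClient
    job.setJarByClass(NewApiExample.class);
    job.setMapperClass(MyMapper.class);
    job.setNumReduceTasks(0);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}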
, only successfully completed tasks have the files
> moved up.
>
> I don't recall if the FileOutputCommitter class appeared in 0.18
>
>
> On Wed, Jun 3, 2009 at 6:43 PM, Ian Soboroff wrote:
>
>> Ok, help. I am trying to create local task outputs in my reduce job,
If your case is like mine, where you have lots of .gz files and you
don't want splits in the middle of those files, you can use the code I
just sent in the thread about traversing subdirectories. In brief, your
RecordReader could do something like:
public static class MyRecordReader
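A rough sketch of the same idea, assuming the 0.18/0.19-era mapred API; rather than doing anything inside the RecordReader, this version just marks .gz files as non-splittable at the InputFormat level (the class name is made up):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Illustrative sketch: never split .gz inputs, so each gzipped file is
// handed to a single map task intact.
public class WholeGzipTextInputFormat extends TextInputFormat {
  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return !file.getName().endsWith(".gz");
  }
}

As far as I recall, TextInputFormat already refuses to split files it recognizes as compressed, so this mostly matters for custom FileInputFormat subclasses.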
Here's how I solved the problem using a custom InputFormat... the key
part is in listStatus(), where we traverse the directory tree. Since
HDFS doesn't have links this code is probably safe, but if you have a
filesystem with cycles you will get trapped.
Ian
import java.io.IOException;
import ja
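A minimal sketch of that listStatus() traversal, assuming the 0.19-era org.apache.hadoop.mapred.FileInputFormat where listStatus() is the overridable hook (the class name is illustrative):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class RecursiveTextInputFormat extends TextInputFormat {

  @Override
  protected FileStatus[] listStatus(JobConf job) throws IOException {
    // super.listStatus() returns the files plus the first-level
    // subdirectories of each input path; recurse into the directories.
    List<FileStatus> files = new ArrayList<FileStatus>();
    for (FileStatus status : super.listStatus(job)) {
      addRecursively(status, files, job);
    }
    return files.toArray(new FileStatus[files.size()]);
  }

  // Depth-first walk; safe on HDFS (no links), but could loop forever on a
  // filesystem with cycles. Hidden names ("_logs", dot-files) are not
  // filtered out in this sketch.
  private void addRecursively(FileStatus status, List<FileStatus> files,
      JobConf job) throws IOException {
    if (status.isDir()) {
      FileSystem fs = status.getPath().getFileSystem(job);
      for (FileStatus child : fs.listStatus(status.getPath())) {
        addRecursively(child, files, job);
      }
    } else {
      files.add(status);
    }
  }
}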
sending on the command
> line?
> - Aaron
>
> On Wed, Jun 3, 2009 at 5:46 PM, Ian Soboroff wrote:
>
> If, after I call getConf to get the conf object, I manually add the key/
> value pair, it's there when I need it. So it feels like ToolRunner isn't
Ok, help. I am trying to create local task outputs in my reduce job,
and they get created, then go poof when the job's done.
My first take was to use FileOutputFormat.getWorkOutputPath, and
create directories in there for my outputs (which are Lucene
indexes). Exasperated, I then wrote a
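A hedged sketch of the work-path approach (the directory naming is illustrative): put the per-task index under FileOutputFormat.getWorkOutputPath(conf), since FileOutputCommitter only promotes what is under that path when the task commits.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

// Sketch: place per-task side output (e.g. a Lucene index directory) under
// the task's work output path. FileOutputCommitter moves the contents of
// that path into ${mapred.output.dir} only for tasks that commit.
public class SideOutputPaths {

  // Per-task directory under the work output path; the task-partition
  // suffix keeps reducers from colliding when their output is moved up
  // into the job output directory.
  public static Path taskIndexDir(JobConf conf) throws IOException {
    Path work = FileOutputFormat.getWorkOutputPath(conf);
    return new Path(work, "index-" + conf.get("mapred.task.partition"));
  }
}

Side files written into the task's local working directory instead get cleaned up with the task, which is one way output can "go poof."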
If, after I call getConf to get the conf object, I manually add the key/
value pair, it's there when I need it. So it feels like ToolRunner
isn't parsing my args for some reason.
Ian
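For reference, a sketch of the Tool/ToolRunner pattern that lets GenericOptionsParser pick up -D options (not the code from this thread; the class name is made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// ToolRunner runs GenericOptionsParser over the command line, strips the
// -D key=value options into the Configuration, and passes only the
// remaining args to run().
public class MyJob extends Configured implements Tool {

  public int run(String[] args) throws Exception {
    // Build the JobConf from getConf(), NOT from "new JobConf()" --
    // otherwise the -D values ToolRunner parsed are lost.
    JobConf conf = new JobConf(getConf(), MyJob.class);
    // ... set input/output formats, mapper, reducer, and paths from args ...
    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    int rc = ToolRunner.run(new Configuration(), new MyJob(), args);
    System.exit(rc);
  }
}

The usual symptom of building the JobConf some other way (e.g. new JobConf(MyJob.class) on a fresh Configuration) is exactly this: the -D values get parsed but never make it into the conf the job actually uses.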
On Jun 3, 2009, at 8:45 PM, Ian Soboroff wrote:
Yes, and I get the JobConf via 'JobConf jo
PM, Aaron Kimball wrote:
Are you running your program via ToolRunner.run()? How do you
instantiate the JobConf object?
- Aaron
On Wed, Jun 3, 2009 at 10:19 AM, Ian Soboroff
wrote:
I'm backporting some code I wrote for 0.19.1 to 0.18.3 (long story),
and I'm finding that when I run a
I'm backporting some code I wrote for 0.19.1 to 0.18.3 (long story),
and I'm finding that when I run a job and try to pass options with -D
on the command line, that the option values aren't showing up in my
JobConf. I logged all the key/value pairs in the JobConf, and the
option I passed t
Brian Bockelman writes:
> Despite my trying, I've never been able to come even close to pegging
> the CPUs on our NN.
>
> I'd recommend going for the fastest dual-cores which are affordable --
> latency is king.
Clue?
Surely the latencies in Hadoop that dominate are not cured with faster
proce
Simon Lewis writes:
> On 3 Apr 2009, at 15:11, Ian Soboroff wrote:
>> Steve Loughran writes:
>>
>>> I think from your perspective it makes sense as it stops anyone
>>> getting
>>> itchy fingers and doing their own RPMs.
>>
>> Um, what's
Steve Loughran writes:
> -RPM and deb packaging would be nice
Indeed. The best thing would be to have the hadoop build system output
them, for some sensible subset of systems.
> -the jdk requirements are too harsh as it should run on openjdk's JRE
> or jrockit; no need for sun only. Too bad th
Steve Loughran writes:
> I think from your perspective it makes sense as it stops anyone getting
> itchy fingers and doing their own RPMs.
Um, what's wrong with that?
Ian
faction.com/cloudera/topics/should_we_release_host_rpms_for_all_releases
>
> We could even skip the branding on the "devel" releases :-)
>
> Cheers,
> Christophe
>
> On Thu, Apr 2, 2009 at 12:46 PM, Ian Soboroff wrote:
>>
>> I created a JIRA (https://issues.
I created a JIRA (https://issues.apache.org/jira/browse/HADOOP-5615)
with a spec file for building a 0.19.1 RPM.
I like the idea of Cloudera's RPM file very much. In particular, it has
nifty /etc/init.d scripts and RPM is nice for managing updates.
However, it's for an older, patched version of
Or if you have a node blow a motherboard but the disks are fine...
Ian
On Mar 30, 2009, at 10:03 PM, Mike Andrews wrote:
I tried swapping two hot-swap SATA drives between two nodes in a
cluster, but it didn't work: after restart, one of the datanodes shut
down since namenode said it reported a
inal
> results to local file system and then copy to HDFS. In contrib/index,
> the intermediate results are in memory and not written to HDFS.
>
> Hope it clarifies things.
>
> Cheers,
> Ning
>
>
> On Mon, Mar 16, 2009 at 2:57 PM, Ian Soboroff wrote:
>>
>>
I understand why you would index in the reduce phase, because the anchor
text gets shuffled to be next to the document. However, when you index
in the map phase, don't you just have to reindex later?
The main point to the OP is that HDFS is a bad FS for writing Lucene
indexes because of how Luce
Amandeep Khurana writes:
> Is it possible to write a map reduce job using multiple input files?
>
> For example:
> File 1 has data like - Name, Number
> File 2 has data like - Number, Address
>
> Using these, I want to create a third file which has something like - Name,
> Address
>
> How can a m
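One hedged sketch of an answer, assuming the mapred-API MultipleInputs helper that appeared around 0.19 and a simple comma-separated layout (all names here are made up): do a reduce-side join keyed on Number, with one mapper per file tagging its values and a reducer that pairs them up.

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.MultipleInputs;

public class JoinByNumber {

  // File 1: "Name,Number" -> emit (Number, "N:Name")
  public static class NameMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text line,
        OutputCollector<Text, Text> out, Reporter r) throws IOException {
      String[] f = line.toString().split(",", 2);
      out.collect(new Text(f[1].trim()), new Text("N:" + f[0].trim()));
    }
  }

  // File 2: "Number,Address" -> emit (Number, "A:Address")
  public static class AddressMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text line,
        OutputCollector<Text, Text> out, Reporter r) throws IOException {
      String[] f = line.toString().split(",", 2);
      out.collect(new Text(f[0].trim()), new Text("A:" + f[1].trim()));
    }
  }

  // All values for one Number arrive together; pair names with addresses.
  public static class JoinReducer extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text number, Iterator<Text> vals,
        OutputCollector<Text, Text> out, Reporter r) throws IOException {
      List<String> names = new ArrayList<String>();
      List<String> addrs = new ArrayList<String>();
      while (vals.hasNext()) {
        String v = vals.next().toString();
        if (v.startsWith("N:")) names.add(v.substring(2));
        else addrs.add(v.substring(2));
      }
      for (String n : names)
        for (String a : addrs)
          out.collect(new Text(n), new Text(a));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(JoinByNumber.class);
    conf.setJobName("join-by-number");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    conf.setReducerClass(JoinReducer.class);
    MultipleInputs.addInputPath(conf, new Path(args[0]),
        TextInputFormat.class, NameMapper.class);
    MultipleInputs.addInputPath(conf, new Path(args[1]),
        TextInputFormat.class, AddressMapper.class);
    FileOutputFormat.setOutputPath(conf, new Path(args[2]));
    JobClient.runJob(conf);
  }
}

If one of the two files is small, a map-side join with the small file loaded from the DistributedCache avoids the shuffle, but the reduce-side version above makes the fewest assumptions.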
I would love to see someplace a complete list of the ports that the
various Hadoop daemons expect to have open. Does anyone have that?
Ian
On Feb 4, 2009, at 1:16 PM, shefali pawar wrote:
Hi,
I will have to check. I can do that tomorrow in college. But if that
is the case, what should I
y of doing
things. It would probably be better if FileInputFormat optionally
supported recursive file enumeration. (It would be incompatible and
thus cannot be the default mode.)
Please file an issue in Jira for this and attach your patch.
Thanks,
Doug
Ian Soboroff wrote:
Is there
Is there a reason FileInputFormat only traverses the first level of
directories in its InputPaths? (i.e., given an InputPath of 'foo', it
will get foo/* but not foo/bar/*).
I wrote a full depth-first traversal in my custom InputFormat which I
can offer as a patch. But to do it I had to du
So staring at these logs a bit more and reading hadoop-default.xml and
thinking a bit, it seems to me that for some reason my slave
tasktrackers are having trouble sending heartbeats back to the
master. I'm not sure why this is. It is happening during the shuffle
phase of the reduce setup
On Feb 2, 2009, at 11:38 PM, Sagar Naik wrote:
Can u post the output from
hadoop-argus--jobtracker.out
Sure:
Exception closing file /user/soboroff/output/_logs/history/rogue_1233597148110_job_200902021252_0002_soboroff_index
java.io.IOException: Filesystem closed
at org.apache.hado
I hope someone can help me out. I'm getting started with Hadoop,
have written the first part of my project (a custom InputFormat), and am
now using that to test out my cluster setup.
I'm running 0.19.0. I have five dual-core Linux workstations with most
of a 250GB disk available for playing, an