If you use the new API, you can access the MapContext object in the setup
method of the mapper. Then, you can get the input split with
MapContext#getInputSplit(), cast it to FileSplit and obtain the path of the
file the current split is part of through the FileSplit#getPath() method.
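A minimal sketch of what that advice describes (new API; the key/value types and the emitted output are illustrative, not taken from the thread):

```java
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class PathAwareMapper extends Mapper<LongWritable, Text, Text, Text> {
    private Path inputFile;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // getInputSplit() returns an InputSplit; for file-based input
        // formats this is a FileSplit, which exposes the file's path.
        FileSplit split = (FileSplit) context.getInputSplit();
        inputFile = split.getPath();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Illustrative use only: tag each record with its source file name.
        context.write(new Text(inputFile.getName()), value);
    }
}
```

The cast is safe only when the job's input format is file-based; other input formats return different `InputSplit` subclasses.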
All records of
but may arrive soon.
On Wed, Dec 5, 2012 at 11:23 PM, Sigurd Spieckermann
wrote:
Hi guys,
I have been wondering if there's a way (hack'ish would be okay too) to tell
Hadoop that two files shall be stored together at the same location(s). It
would benefit map-side join performance if it could be done somehow because
all map tasks would be able to read data from a local copy. Do
same key that are in the in-memory buffer before the spill and it
should be at least a few per spill in my case. This is confusing...
2012/11/7 Sigurd Spieckermann
> Hm, maybe I need some clarification on what the combiner exactly does.
> From what I understand from "Hadoop - The
*without* compression. It seems they aren't compressed, but that's strange
because I definitely enabled compression the way I described.
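For reference, "the way I described" is not shown in this excerpt; a hedged sketch of the usual Hadoop 1.x era settings for compressing both the intermediate map output and a SequenceFile job output (property names from that era; the codec choice is a placeholder):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.DefaultCodec;

public class CompressionConfig {
    static Configuration configure() {
        Configuration conf = new Configuration();
        // Intermediate map output: compressed between map and reduce.
        conf.setBoolean("mapred.compress.map.output", true);
        conf.setClass("mapred.map.output.compression.codec",
                DefaultCodec.class, CompressionCodec.class);
        // Final SequenceFile output: BLOCK compresses runs of records
        // together and usually gives much better ratios than RECORD.
        conf.setBoolean("mapred.output.compress", true);
        conf.set("mapred.output.compression.type", "BLOCK");
        return conf;
    }
}
```

Note that, as the reply below suggests, counters such as "Map output bytes" are commonly understood to report pre-compression sizes, so they cannot confirm whether compression took effect.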
2012/11/7 Sigurd Spieckermann
> OK, just wanted to confirm. Maybe there is another problem then. I just
> looked at the task logs and there were ~20
> may be counting decompressed values of the records written, and not
> post-compressed ones.
>
> On Wed, Nov 7, 2012 at 6:02 PM, Sigurd Spieckermann
> wrote:
> > Hi guys,
> >
> > I've encountered a situation where the ratio between "Map output bytes"
. There is zero use of DistributedCache - the only decisions are
>> made based on the expression (i.e. to select which form of joining
>> record reader to use).
>>
>> Enhancements to this may be accepted though, so feel free to file some
>> JIRAs if you have someth
specify the
> smaller data set using some hints in query.
>
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -Original Message-
> From: Sigurd Spieckermann
> Date: Mon, 22 Oct 2012 22:29:15
> To:
> Reply-To: user@hadoop.apache.org
>
Hi guys,
I've been trying to figure out whether a map-side join using the
join-package does anything clever regarding data locality with respect
to at least one of the partitions to join. To be more specific, if I
want to join two datasets and some partition of dataset A is larger than
the co
Hi,
I've just noticed that the join-package only exists in the old map-reduce
API. Is there a particular reason why it's not in the new API? (deprecated
maybe?) If so, what is the approach to take in order to perform this
map-side join strategy with the new API?
Thanks,
Sigurd
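For context, the old-API join package is driven by a join expression; a hedged sketch of how such a map-side join is usually wired up (the paths and the "inner" join type are placeholders, and both inputs must be sorted and identically partitioned):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;

public class MapSideJoinConfig {
    static JobConf configure() {
        JobConf conf = new JobConf();
        conf.setInputFormat(CompositeInputFormat.class);
        // compose() builds the "mapred.join.expr" expression; "inner"
        // could also be "outer" or "override". Paths are hypothetical.
        conf.set("mapred.join.expr", CompositeInputFormat.compose(
                "inner", KeyValueTextInputFormat.class,
                new Path("/data/a"), new Path("/data/b")));
        // The mapper then receives the join key plus a TupleWritable
        // holding the matching values from each input.
        return conf;
    }
}
```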
each map task
takes one split and the combiner operates only on the key-value pairs
within one split. That's why the combiner has no effect in my case.
Is there a way to combine the mapper outputs of multiple splits
before they are sent off to the reducer?
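To make the scope concrete: the combiner is wired per job but applied per map task (at spill/merge time), so it never sees pairs produced from a different split; cross-split merging only happens at the reducer. A hedged sketch of the wiring, using a trivial line-counting mapper and the stock `IntSumReducer` as both combiner and reducer (as in the thread, where both are the same class):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class CombinerWiring {
    public static class CountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(value, ONE);  // count identical lines
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "combiner-demo");
        job.setJarByClass(CombinerWiring.class);
        job.setMapperClass(CountMapper.class);
        // The combiner runs on one map task's buffered output only;
        // identical keys from other splits are merged at the reducer.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```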
2012/9/25 Sigurd Spieckermann
2012/9/25 Sigurd Spieckermann
> Maybe one more note: the combiner and the reducer class are the same and
> in the reduce-phas
ed to appear in alphabetical order.". That may
> just be what is biting you, since standalone mode uses LFS.
>
> On Mon, Sep 17, 2012 at 6:45 PM, Sigurd Spieckermann
> wrote:
> > I've tracked down the problem to only occur in standalone mode. In
> > pseudo-distributed
is problem before and knows a solution?
Thanks,
Sigurd
2012/9/17 Sigurd Spieckermann
> I'm experiencing a strange problem right now. I'm writing part-files to
> the HDFS providing initial data and (which should actually not make a
> difference anyway) write them in ascending o
", they
are in the order part-1, part-0, part-2, part-3 etc. How is
that possible? Why aren't they shown in natural order? Also the map-side
join package considers them in this order which causes problems.
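Since `FileSystem#listStatus` makes no ordering guarantee (notably on the local filesystem used in standalone mode, as the reply below points out), one defensive fix is to sort the listed part files explicitly before handing them to order-sensitive code such as the map-side join package. A minimal pure-Java sketch of the idea:

```java
import java.util.Arrays;

public class PartFileOrder {
    // Lexicographic sort; safe for Hadoop's zero-padded part-NNNNN names
    // (and for single-digit suffixes like the part-0..part-3 above).
    static String[] sorted(String[] names) {
        String[] copy = names.clone();
        Arrays.sort(copy);
        return copy;
    }
}
```

In a real job one would sort the `FileStatus[]` from `listStatus()` by path the same way before building the join expression.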
2012/9/10 Sigurd Spieckermann
> OK, interesting. Just to confirm: is
omatically in memory,
> but on the local disk of the TaskTracker your task runs on. You can
> choose to then use a LocalFileSystem impl. to read it out from there,
> which would end up being (slightly) faster than your same approach
> applied to MapFiles on HDFS.
>
> On Mon, Sep 10
>
> On Mon, Sep 10, 2012 at 3:27 PM, Sigurd Spieckermann <
> sigurd.spieckerm...@gmail.com> wrote:
>
>> Hi,
>>
>> I would like to perform a map-side join of two large datasets where
>> dataset A consists of m*n elements and dataset B consists of n ele
Hi,
I would like to perform a map-side join of two large datasets where dataset
A consists of m*n elements and dataset B consists of n elements. For the
join, every element in dataset B needs to be accessed m times. Each mapper
would join one element from A with the corresponding element from B.
E
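The index arithmetic behind "every element in dataset B needs to be accessed m times" can be made explicit. A sketch, assuming A's m*n elements are laid out so that element k joins with B[k mod n] (the layout is an assumption; only the arithmetic is the point):

```java
public class JoinIndex {
    // Which element of B (size n) does A's element k join with?
    // For k in [0, m*n), each B index is returned exactly m times.
    static int partnerInB(int k, int n) {
        return k % n;
    }
}
```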
Hi guys,
I am trying to implement a block matrix-vector multiplication algorithm
with Hadoop according to the schematics from
http://i.stanford.edu/~ullman/mmds/ch5.pdf page 162. My matrix is going to
be sparse and the vector dense which is exactly what is required in
PageRank as well. The vector
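The block scheme from MMDS ch. 5 can be sketched sequentially: each "map" step multiplies one matrix block by the matching vector slice, and the "reduce" step sums the partial results per output stripe. A plain-Java stand-in for that dataflow (square n-by-n matrix assumed for brevity; a real implementation would store the sparse blocks as key-value records):

```java
public class BlockMatVec {
    // y = M * v computed block-by-block; b is the block side length.
    static double[] multiply(double[][] m, double[] v, int b) {
        int n = v.length;
        double[] y = new double[n];
        for (int bi = 0; bi < n; bi += b) {          // output stripe (reduce key)
            for (int bj = 0; bj < n; bj += b) {      // vector slice (map input)
                for (int i = bi; i < Math.min(bi + b, n); i++) {
                    double partial = 0.0;
                    for (int j = bj; j < Math.min(bj + b, n); j++) {
                        partial += m[i][j] * v[j];   // one block's contribution
                    }
                    y[i] += partial;                 // "reducer": sum partials
                }
            }
        }
        return y;
    }
}
```

The payoff in the distributed setting is that each map task only needs one matrix block and one vector slice in memory, which is what makes the scheme attractive for PageRank-sized vectors.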