Yes. There's always a workaround.
Say you have input1, which is tab-separated text with 3 attributes,
and you have another input2 in a sequence file with another 6
attributes. So yes, you could run 2 map-only jobs on them to bring them
to a homogeneous format, with a join key indicating which part they came from.
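The two map-only passes described above can be sketched outside Hadoop; record shapes, keys, and class names here are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "two map-only jobs" normalization trick: each side is
// mapped to the same (joinKey, tag + value) shape, where the leading tag
// records which input the record came from.
public class TagJoinSketch {
    // Map-only pass over input1: tab-separated, join key in column 0.
    static String[] mapInput1(String line) {
        String[] cols = line.split("\t", 2);
        return new String[] { cols[0], "A\t" + cols[1] };
    }
    // Map-only pass over input2: pretend it was read from a sequence file
    // as a (key, value) pair; we only re-tag the value.
    static String[] mapInput2(String key, String value) {
        return new String[] { key, "B\t" + value };
    }
    public static void main(String[] args) {
        List<String[]> unified = new ArrayList<>();
        unified.add(mapInput1("k1\tx\ty"));
        unified.add(mapInput2("k1", "p\tq\tr"));
        // A downstream reduce can group by join key and split the records
        // back apart by their leading tag.
        for (String[] kv : unified) System.out.println(kv[0] + " -> " + kv[1]);
    }
}
```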
You can't mix and match old and new APIs in general, no.
It's better to use new APIs unless it would make the implementation really
hard or really slow.
The new APIs lack MultipleInputs as of 0.20.x. That doesn't mean you can't
have multiple inputs: you can add several input paths, as Shannon says.
A job's input path is always potentially multiple paths; you don't need
MultipleInputs to specify that. What you need MultipleInputs for is
to be able to specify different input file formats and assign
different mappers to handle them.
If all your input is formatted homogeneously, both record-structure-wise
and file-format-wise, you don't need MultipleInputs at all.
As I said, and as I think Shannon's reply confirms in part, you can
sometimes weasel your way out of this, but this is not how this
API is intended to be used. To begin with, the old and new APIs have never
been intended to be used together (so you are already breaking interop
guarantees with any of the future releases).
Isn't this just a matter of making multiple calls to
FileInputFormat.addInputPath(...) (to adhere to the new APIs)?
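For reference, a minimal new-API driver along those lines might look like the following. This is only a sketch: the paths, job name, and driver class are made up, and it assumes Hadoop 0.20.x on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultiPathDriver {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "multi-path-example");
        job.setJarByClass(MultiPathDriver.class);
        // Multiple calls simply accumulate input paths; all of them are
        // read with the job's single InputFormat and Mapper. Per-path
        // formats/mappers are what (old-API) MultipleInputs adds.
        FileInputFormat.addInputPath(job, new Path("/data/input1"));
        FileInputFormat.addInputPath(job, new Path("/data/input2"));
        FileOutputFormat.setOutputPath(job, new Path("/data/out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```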
On 5/28/11 5:54 PM, Dmitriy Lyubimov wrote:
> I don't see how you can use the deprecated MultipleInputs, as, if I am not
> missing anything, its signature is tied to old API types, such as
> JobConf, which you of course won't have as you define a new API job.
I don't see how you can use the deprecated MultipleInputs, as, if I am not
missing anything, its signature is tied to old API types, such as
JobConf, which you of course won't have as you define a new API job.
On Sat, May 28, 2011 at 3:43 PM, Dhruv Kumar wrote:
> Isabel and Dmitry,
>
> Thank you for your input on this.
As far as I understand, the problem isn't adding multiple inputs; you
can do it exactly as the documentation you linked shows. The problem
(which is what we're trying to solve in MAHOUT-537) is how to tell,
within the Mapper/Reducer itself, from which input path the current data
are taken; there's no built-in way to do that.
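One common workaround, sketched here with made-up class and tag names (and not necessarily what MAHOUT-537 settles on): cast the mapper's InputSplit to FileSplit in setup() and branch on the path.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class SourceAwareMapper extends Mapper<LongWritable, Text, Text, Text> {
    private boolean fromInput1;

    @Override
    protected void setup(Context context) {
        // Each map task reads one split of one file, so the split's path
        // identifies the input, e.g. /data/input1/part-00000.
        FileSplit split = (FileSplit) context.getInputSplit();
        fromInput1 = split.getPath().toString().contains("input1");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Parse `value` according to its source, then emit tagged records.
        String tag = fromInput1 ? "A" : "B";
        context.write(new Text(tag), value);
    }
}
```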
Isabel and Dmitry,
Thank you for your input on this. I've noticed that Mahout's code uses the
new mapreduce package, so I have been following the new APIs. This was also
suggested by Sean w.r.t Mahout-294.
Multiple inputs is a requirement for my project and I was planning on using
the old mapred.
Dhruv,
Just a warning before you lock yourself into the new APIs:
Yes, the new APIs are preferable, but it is not always possible to use them,
because 0.20.2 lacks _a lot_ in terms of bare necessities in the new-API
realm (multiple inputs/outputs come to mind at once).
I think I did weasel my way out of this once.
On 18.05.2011 Dhruv Kumar wrote:
> For the GSoC project which version of Hadoop's API should I follow?
Try to use the new M/R APIs where possible - we had the same discussion in an
earlier thread on spectral clustering; in addition, Sean just opened an issue
concerning upgrading to a newer Hadoop version.
Good man.
On Mon, May 23, 2011 at 3:45 PM, Hector Yee wrote:
> FYI the ICLA has been filed.
>
>
FYI the ICLA has been filed.
On Wed, May 18, 2011 at 3:27 AM, Ted Dunning wrote:
> Hector,
>
> An in-core variant or a sequential on-disk variant is a great starting
> point
> and focussing on the kernelized ranker is also a good place to start.
>
> It would help if you can provide lots of visibility early in the process.
On Wed, May 18, 2011 at 6:38 AM, Sean Owen wrote:
> I think it first has to finish embracing MapReduce! The code base already
> uses 2.5 different versions of Hadoop. It would be better to clean up the
> modest clutter of approaches we already have before thinking about
> extending
> it.
>
For the
I just completed and submitted an online passive-aggressive classifier as my
test case (MAHOUT-702). I believe I've followed the how-to, except I couldn't
find a CHANGES.txt to write my changes in.
On Wed, May 18, 2011 at 6:27 PM, Ted Dunning wrote:
> Hector,
>
> An in-core variant or a sequential on-disk variant is a great starting point
Well, this much I think is uncontroversial.
On Wed, May 18, 2011 at 3:38 AM, Sean Owen wrote:
> And I do think we need to focus on cleanup now rather than later. For
> example I will shortly suggest deprecating M/R jobs that use Hadoop 0.19
> APIs in the name of moving forward.
>
I think it first has to finish embracing MapReduce! The code base already
uses 2.5 different versions of Hadoop. It would be better to clean up the
modest clutter of approaches we already have before thinking about extending
it.
The good news is there's a fair bit of time before any other particular
framework comes along.
Hector,
An in-core variant or a sequential on-disk variant is a great starting point
and focussing on the kernelized ranker is also a good place to start.
It would help if you can provide lots of visibility early in the process.
If the JIRA process of attaching a diff becomes cumbersome, then you
This is a theme that is going to come up over and over.
I think that, strategically, Mahout is going to have to embrace the MapReduce
nextGen work so that we can have flexible computation models. We already
need this with all the large-scale SVD work. We could very much use it for
the SGD stuff.
https://cwiki.apache.org/confluence/display/MAHOUT/How+To+Contribute
On May 18, 2011, at 1:17 AM, Hector Yee wrote:
> Re: boosting scalability, I've implemented it on thousands of machines, but
> not with mapreduce, rather with direct RPC calls. The gradient computation
> tends to be iterative, so one way to do it is to have each iteration run per
> mapreduce.
Re: boosting scalability, I've implemented it on thousands of machines, but
not with mapreduce, rather with direct RPC calls. The gradient computation
tends to be iterative, so one way to do it is to have each iteration run per
mapreduce.
Compute gradients in the mapper, gather them in the reducer, and apply the
update before starting the next iteration.
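That loop can be sketched in plain Java (no Hadoop: the shard lists stand in for mapper inputs, and the summation stands in for the reducer; the model and names are illustrative, not the poster's actual code):

```java
import java.util.Arrays;
import java.util.List;

// One gradient-descent iteration per "MapReduce" round: mappers compute
// partial gradients over their data shards, the reducer sums them, and
// the driver applies the update before launching the next round.
public class IterativeGradientSketch {
    // Fit w in y = w * x by least squares; shard holds (x, y) pairs.
    static double partialGradient(double w, List<double[]> shard) {
        double g = 0;
        for (double[] p : shard) g += 2 * (w * p[0] - p[1]) * p[0];
        return g;
    }
    public static double fit(List<List<double[]>> shards, int rounds, double lr) {
        double w = 0;
        for (int i = 0; i < rounds; i++) {
            // "map" each shard, then "reduce" by summing partial gradients
            double g = 0;
            for (List<double[]> shard : shards) g += partialGradient(w, shard);
            w -= lr * g;  // driver-side update between rounds
        }
        return w;
    }
    public static void main(String[] args) {
        List<List<double[]>> shards = Arrays.asList(
            Arrays.asList(new double[] {1, 2}, new double[] {2, 4}),
            Arrays.asList(new double[] {3, 6}));
        System.out.printf("w = %.4f%n", fit(shards, 100, 0.01)); // prints w = 2.0000
    }
}
```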
On Tue, May 17, 2011 at 5:26 PM, Hector Yee wrote:
> I have some proposed contributions and I wonder if they will be useful in
> Mahout (otherwise I will just commit it in a new open source project in
> github).
>
These generally sound pretty good.
> - Sparse autoencoder (think of it as somet
Hello,
Some background on myself - I was at Google for the last 5 years, working on
the self-driving car, image search, and YouTube in machine learning
(http://www.linkedin.com/in/yeehector).
I have some proposed contributions and I wonder if they will be useful in
Mahout (otherwise I will just commit it in a new open source project in
github).