[
https://issues.apache.org/jira/browse/MAHOUT-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618271#comment-13618271
]
Ted Dunning commented on MAHOUT-1183:
-
Dave,
Thanks and all for picking out
Relative to Dan's recent mention of SOM as possible new project, here are
slides from KDD Cup 2012 in which Steffen Rendle describes how he did using
a very straightforward implementation of Factorization Machines [1,2].
FMs are interesting in the context of Mahout because they can be used in a
w
[
https://issues.apache.org/jira/browse/MAHOUT-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ted Dunning updated MAHOUT-1182:
Resolution: Fixed
Status: Resolved (was: Patch Available)
Committed in r1462882
(this
SOM doesn't have to be constrained to two dimensions.
That said, there are bunches of non-linear embedding methods that are more
current than SOMs. SOMs were part of the neural plausibility movement of
the late 80's which more recently can be seen as an approach toward modern
formulations of st
aimnphidmmapikej...@mahout.apache.org
> > To reject:
> >dev-reject-1364573050.63309.haimnphidmmapikej...@mahout.apache.org
> > To give a reason to reject:
> > %%% Start comment
> > %%% End comment
> >
> >
> >
> > -- Forwarded message
Pity that they don't bother to contribute back to Mahout itself.
On Fri, Mar 29, 2013 at 11:28 AM, Sean Owen wrote:
> Not sure if people saw this from Josh at Cloudera:
> http://blog.cloudera.com/blog/2013/03/cloudera_ml_data_science_tools/
> https://github.com/cloudera/ml
>
> This is a nice sho
I think (casually, informally) that the guarantees are preserved by simple
ordering arguments. The argument goes that the threshold on partitions
will grow less than if the partitions were handled sequentially. Thus the
partial sketches should have at least as much fidelity as if the same
segme
um clusters: 20; maxDistance: 1.029701
> >
> >
> > On Thu, Mar 28, 2013 at 6:45 PM, Dan Filimon <
> dangeorge.fili...@gmail.com>wrote:
> >
> >> You know what's even more odd? When I used Mahout's KMeans, everything
> >> was assigned to one sin
Cool!
We need the help.
On Thu, Mar 28, 2013 at 8:08 PM, Ray wrote:
> I'll focus for now on contributing to documentation, possibly some
> patches. See how the contribution process works, gain a little confidence
> there first. (I do have a background in neural networks.)
>
It should be possible to view a Lucene index as a matrix. This would
require that we standardize on a way to convert documents to rows. There
are many choices, the discussion of which should be deferred to the actual
work on the project, but there are a few obvious constraints:
a) it should be p
r 19 [2]: 98.733778
> Num clusters: 20; maxDistance: 762.326896
>
> On Thu, Mar 28, 2013 at 10:32 AM, Ted Dunning
> wrote:
> > I will have to think on this a bit.
> >
> > It should be possible to dump the sketches coming from each mapper and
> look
> > at t
e distances array that needs fixing.
> In fact, an entire new copy is needed if we're to be able to safely
> iterate and reindex.
>
> I'm shelving this for now. That ok?
>
> On Wed, Mar 27, 2013 at 4:35 AM, Ted Dunning
> wrote:
> > Another option is to make th
[
https://issues.apache.org/jira/browse/MAHOUT-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615384#comment-13615384
]
Ted Dunning commented on MAHOUT-1025:
-
Saikat,
Your outline is just
browse/MAHOUT-1164<https://issues.apache.org/jira/browse/MAHOUT-1164>
>
>
>
>
> On 03/27/2013 12:37 AM, Ted Dunning wrote:
>
>> Can you post a list of those patches?
>>
>> I haven't been tracking carefully and unless I have a moment when the
>> email
[
https://issues.apache.org/jira/browse/MAHOUT-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615019#comment-13615019
]
Ted Dunning commented on MAHOUT-1164:
-
Marty,
If you like this, I can commit
[
https://issues.apache.org/jira/browse/MAHOUT-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ted Dunning updated MAHOUT-1164:
Attachment: MAHOUT-1164.patch
Another round of revision. Cleaned up unused variables
[
https://issues.apache.org/jira/browse/MAHOUT-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ted Dunning updated MAHOUT-1164:
Attachment: MAHOUT-1164.patch
I factored large string constants out into resource files. More
stian Schelter wrote:
>
> > Totally agree on that. The impact of making Mahout more usable is much
> > higher than that of adding a new algorithm.
> >
> > On 27.03.2013 05:41, Ted Dunning wrote:
> >> It is critically important.
> >>
> >
Here are some ideas:
- reform and simplify the clustering APIs. All of our main-line
clustering systems should work identically and have good and simple
diagnostics.
- simplify the connection to Lucene for clustering and classification.
On Tue, Mar 26, 2013 at 8:44 PM, Dan Filimon wrote:
> O
I can help peripherally, but my travel schedule is heinous and would
prevent full-on mentoring.
On Tue, Mar 26, 2013 at 8:44 PM, Dan Filimon wrote:
> Okay, we just need to add JIRA issues that have the tags "gsoc2013"
> and "mentors" and we're good.
>
> The deadline for the ideas is March 29 and
[
https://issues.apache.org/jira/browse/MAHOUT-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614969#comment-13614969
]
Ted Dunning commented on MAHOUT-1176:
-
Let's keep it more terse than eve
I meant that the committer should add the line.
On Wed, Mar 27, 2013 at 7:14 AM, Mike Percy wrote:
> On Tue, Mar 26, 2013 at 9:54 PM, Ted Dunning
> wrote:
>
> > We should try to make sure to have the update of the CHANGELOG part of
> the
> > patch.
> >
>
Fine idea.
We should try to make sure to have the update of the CHANGELOG part of the
patch.
On Wed, Mar 27, 2013 at 1:38 AM, Mike Percy wrote:
> If others on the list agree then I think this is a fine idea.
>
> Regards,
> Mike
>
>
> On Tue, Mar 26, 2013 at 3:04 PM, Sebastian Schelter
> wrote:
g on consistent data format and command line option support. It's not
> glamorous but it's important.
>
>
> On 3/26/2013 8:26 PM, Ted Dunning wrote:
>
>> Gokhan,
>>
>> I think that the general drift of your recommendation is an excellent
>> sugg
Another option is to make the iterator take a reference to the array as it
exists and then during merging always create a new array.
A second option is to just let the iterator get a bit confused (don't like
the smell there).
On Tue, Mar 26, 2013 at 10:59 PM, Dan Filimon
wrote:
> Ted, everyone,
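
The first option above (the iterator keeps a reference to the array as it exists, and merging always builds a new array) can be sketched roughly as follows. This is only an illustration in Python, not the Mahout code under discussion; `DistanceStore` and its methods are hypothetical names invented for the example.

```python
class DistanceStore:
    """Hypothetical container whose iterators keep reading the array they started on."""

    def __init__(self, values):
        self._values = list(values)

    def __iter__(self):
        # Capture a reference to the array as it exists right now.
        snapshot = self._values
        return iter(snapshot)

    def merge(self, other_values):
        # Never mutate the live array in place: build a new one and rebind.
        # Iterators created earlier still hold the old array.
        self._values = sorted(self._values + list(other_values))


store = DistanceStore([1.0, 3.0, 5.0])
it = iter(store)
first = next(it)            # start iterating
store.merge([2.0, 4.0])     # merge happens mid-iteration
seen = [first] + list(it)   # iterator finishes over the pre-merge array
after = list(store)         # a fresh iterator sees the merged array
```

The cost is one extra array allocation per merge; in exchange, an iterator that is already running never observes a half-updated array.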
Gokhan,
I think that the general drift of your recommendation is an excellent
suggestion and it is something that we have wrestled with a lot over time.
The recommendations side of the house has more coherence in this matter
than other parts largely because there was a clear flow early on.
Now,
Yowza these are really good comments.
Where have you guys been?
To answer just one of the questions, my own criterion for voting up a
committer is that they
a) will work with other contributors positively
b) won't break things too often (breaking the build sometimes is probably a
good sign)
[
https://issues.apache.org/jira/browse/MAHOUT-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ted Dunning resolved MAHOUT-1174.
-
Resolution: Fixed
Checked in updated links.
> Lanczos code and javadocs sho
[
https://issues.apache.org/jira/browse/MAHOUT-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613289#comment-13613289
]
Ted Dunning commented on MAHOUT-1173:
-
For grins and general knowledge, Jen
[
https://issues.apache.org/jira/browse/MAHOUT-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ted Dunning reopened MAHOUT-1174:
-
Assignee: Ted Dunning
Fixing this now to point to the correct link.
Thanks for spotting that
[
https://issues.apache.org/jira/browse/MAHOUT-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613274#comment-13613274
]
Ted Dunning commented on MAHOUT-1173:
-
The problem before was that it produced
I am still a fan of GSOC, but there is no chance I have enough time to help
(although my working with Dan recently is a bit of a counterexample)
On Mon, Mar 25, 2013 at 11:12 PM, Grant Ingersoll wrote:
>
> On Mar 25, 2013, at 4:24 PM, Isabel Drost-Fromm wrote:
>
> > Also, do we have any voluntee
[
https://issues.apache.org/jira/browse/MAHOUT-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ted Dunning resolved MAHOUT-1174.
-
Resolution: Fixed
Committed javadoc fixes. Also added Preconditions check + assert for null at
Ted Dunning created MAHOUT-1174:
---
Summary: Lanczos code and javadocs should refer users to the SSVD
stuff
Key: MAHOUT-1174
URL: https://issues.apache.org/jira/browse/MAHOUT-1174
Project: Mahout
his refers to the cleanups I've done in the last days. In the
> future, I will create a Jira for each and attach a patch.
>
> On 25.03.2013 16:31, Ted Dunning wrote:
> > I would like it if all changes to the code be accompanied by a JIRA that
> > describes the problem bei
I would like all changes to the code to be accompanied by a JIRA that
describes the problem being solved, and the commit messages associated
with the fix to reference the JIRA.
Switching to apache git would make this easier.
On Mon, Mar 25, 2013 at 1:08 PM, Isabel Drost wrote:
> > As non-committer I'd contribute more to Mahout, had github be primary
> > source. Now, when I contribute a pull request, it gets merged to Apache
> git
> > server by committer, and I don't ge
Sebastian,
I was the one who turned off checkstyle.
Here is my (minimal) rationale.
http://mail-archives.apache.org/mod_mbox/mahout-dev/201212.mbox/%3CCAJwFCa3y6m8xiRW%3DcqTG29fOH5TWeTO%2Bi1YYFzv%3DQSMSk0SinQ%40mail.gmail.com%3E
http://mail-archives.apache.org/mod_mbox/mahout-dev/201206.mbox/browser
There seems to be a discussion of issues with Jenkins in there.
Unfortunately mail-archives seems very flaky at the moment and I can't see
the actual messages.
On Sun, Mar 24, 2013 at 10:37 PM, Ted Dunning wrote:
I believe that it was removed because it was making the build unstable.
Probably worth trolling back through the email archives.
On Sun, Mar 24, 2013 at 5:47 PM, Sebastian Schelter wrote:
> Why is checkstyle removed from our pom? Is there a particular reason for
> that?
>
> I would suggest to reint
Saikat,
This sounds fairly interesting.
Are you talking about a non-commercial or commercial interest in doing this?
I ask because a non-commercial interest would probably mean that you would
be willing to donate more of your code but would have less time to spare.
A commercial interest would p
[
https://issues.apache.org/jira/browse/MAHOUT-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ted Dunning resolved MAHOUT-1171.
-
Resolution: Fixed
PMD is slightly better than before, at 194 medium warnings versus 199.
Build
I created MAHOUT-1171 to track fixes. I just committed a hundred or so
changes that should get most of these.
On Sun, Mar 24, 2013 at 12:23 PM, Sebastian Schelter <
ssc.o...@googlemail.com> wrote:
> Guess I'm responsible for those warnings, let me have a look.
>
> On 2
[
https://issues.apache.org/jira/browse/MAHOUT-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612079#comment-13612079
]
Ted Dunning commented on MAHOUT-1171:
-
Committed r1460328 with many simple
[
https://issues.apache.org/jira/browse/MAHOUT-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ted Dunning reassigned MAHOUT-1171:
---
Assignee: Ted Dunning
> PMD regression
> --
>
>
[
https://issues.apache.org/jira/browse/MAHOUT-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612066#comment-13612066
]
Ted Dunning commented on MAHOUT-1171:
-
Build #1920 [1] showed a sharply incre
Ted Dunning created MAHOUT-1171:
---
Summary: PMD regression
Key: MAHOUT-1171
URL: https://issues.apache.org/jira/browse/MAHOUT-1171
Project: Mahout
Issue Type: Bug
Reporter: Ted
Build #1920 [1] showed a sharply increased number of PMD warnings recently.
The report that shows the new warnings [2] indicates that the new warnings
seem to be primarily unused imports and other simple issues that should be
fixable by using IDE inspections. IntelliJ, for instance, would bitch
Do
mvn compile
These missing files will be created. The eclipse maven plugins aren't
smart enough to understand the source file creation step that is included
in the compilation. That means that you can mostly trust it, but not
always. The first time is a good example of what "not always"
Indeed. Dan and I have discussed this. The space that he starts in is
TF-IDF weighted and the projection is random, so it should preserve much of
the metric in the original. Based on the experience that we had with
SSVD, using a properly learned projection would definitely give modest
improveme
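
The claim that a random projection preserves much of the original metric can be checked with a small experiment. The sketch below is an assumption-laden illustration, not Dan's code: it uses dense random vectors in place of TF-IDF rows, a Gaussian projection matrix, and arbitrary dimensions (500 down to 200).

```python
import math
import random


def random_projection_matrix(in_dim, out_dim, rng):
    """Gaussian matrix scaled so squared lengths are preserved in expectation."""
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(matrix, vector):
    """Multiply the projection matrix by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


rng = random.Random(42)
in_dim, out_dim = 500, 200          # illustrative sizes only
x = [rng.random() for _ in range(in_dim)]
y = [rng.random() for _ in range(in_dim)]
R = random_projection_matrix(in_dim, out_dim, rng)

# Ratio of projected distance to original distance; values near 1.0 mean
# the metric survived the projection.
ratio = distance(project(R, x), project(R, y)) / distance(x, y)
```

With 200 output dimensions the ratio typically lands within several percent of 1.0, which is the sense in which the random projection "preserves much of the metric".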
Looks like a great idea.
We are very weak RTC. Some things are pretty obviously good ideas and low
risk so we wind up doing something like CTR.
On Wed, Mar 20, 2013 at 8:47 PM, Benson Margulies wrote:
> Anyone have any objections?
>
> Are we still formally RTC?
>
Grant was going to do the commit, I think.
On Wed, Mar 20, 2013 at 10:32 AM, Benson Margulies wrote:
> I'm a bit rusty. People want a patch and a jira, or just the trivial commit
> :-)
>
>
> On Wed, Mar 20, 2013 at 6:52 AM, Ted Dunning
> wrote:
>
> > Sounds good
AM
> Subject: Re: lucene 4.2.0?
>
> I wouldn't think so. Go for it.
>
> On Mar 19, 2013, at 6:35 AM, Ted Dunning wrote:
>
> > Shouldn't affect compatibility with >= 4.0, should it?
> >
> > On Tue, Mar 19, 2013 at 3:16 AM, Benson Margulies >wrote:
> >
> >> Any objection?
> >>
>
Shouldn't affect compatibility with >= 4.0, should it?
On Tue, Mar 19, 2013 at 3:16 AM, Benson Margulies wrote:
> Any objection?
>
t
> variables.
>
>
> On Fri, Mar 15, 2013 at 3:04 AM, Ted Dunning
> wrote:
>
> > Is this a dense dataset or sparse? If sparse, what is the average sparsity?
> >
> > On Thu, Mar 14, 2013 at 11:26 AM, Ying Liao wrote:
> >
> > > I have a training set with 9
[
https://issues.apache.org/jira/browse/MAHOUT-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604477#comment-13604477
]
Ted Dunning commented on MAHOUT-1166:
-
There is a FileBasedMatrix implementa
Is this a dense dataset or sparse? If sparse, what is the average sparsity?
On Thu, Mar 14, 2013 at 11:26 AM, Ying Liao wrote:
> I have a training set with 9M records and 1M independent variables. Any
> other tool can process the dataset?
>
>
> On Thu, Mar 14, 2013 at 11:33 AM, Danny Busch wrote:
>
I was unable to answer this off the cuff in direct email.
Anybody else remember the answer?
On Wed, Mar 13, 2013 at 12:44 PM, Dan Filimon
wrote:
> I'm trying to add the new StreamingKMeans job as
> (o.a.m.clustering.streaming.mapreduce.StreamingKMeansDriver [1; not
> yet a JIRA issue).
>
> I've
> http://openreview.net/iclr2013
>
> table 2, compared to the result you mentioned
>
> http://arxiv.org/pdf/1301.4171v1.pdf
>
>
> On Thu, Mar 14, 2013 at 11:44 AM, Ted Dunning
> wrote:
>
> > Yeah we have had little pull on these techniques beyond the simplest
>
Yeah we have had little pull on these techniques beyond the simplest
case of logistic regression.
Would you guys be willing to sign up for maintaining the code that might
result?
The thing that might move the needle would be a replication of this
architecture:
http://deeplearning.net/2012/12
Stick around!
We would love to see the fruits of this.
On Wed, Mar 13, 2013 at 1:01 AM, Nick Pentreath wrote:
> The main point of interest in this context is that I intend to build a
> minimal first-cut machine learning library for Spark. This is likely to
> involve porting / using parts of Mah
On Tue, Mar 12, 2013 at 3:05 PM, Josh Wills wrote:
>
> First, I wanted to say that I think that there are lots of problems that
> can be handled well in MapReduce (the recent k-means streaming stuff being
> a prime example), even if they could be performed even faster using an
> in-memory mo
[
https://issues.apache.org/jira/browse/MAHOUT-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600518#comment-13600518
]
Ted Dunning commented on MAHOUT-668:
I don't think so. The searcher interf
Indeed. We have considered switching in the past, but the momentum never
developed.
On Tue, Mar 12, 2013 at 8:59 AM, Andy Schlaikjer <
andrew.schlaik...@gmail.com> wrote:
> For sake of argument, fastutil has fast entry sets / iterators for most
> specializations of maps:
>
>
> http://fastutil.di
pread unevenly enough.
>
>
> On Mon, Mar 11, 2013 at 11:59 PM, Ted Dunning wrote:
>
>> Yarn by itself won't fix this problem. Yarn + Spark would fix it. But,
>> then again, so would Mesos + Spark or AmigaOS + Spark.
>>
>> Should we open several additional mo
Yay for PIG.
I am still hoping that Drill does well and that the PIG folk build a syntax
facade for it so that I can write PIG programs that run really fast.
On Mon, Mar 11, 2013 at 5:46 PM, Jake Mannix wrote:
> On Mon, Mar 11, 2013 at 4:59 PM, Ted Dunning
> wrote:
>
> > Yarn
On Mon, Mar 11, 2013 at 5:58 PM, Jake Mannix wrote:
> On Mon, Mar 11, 2013 at 5:44 PM, Jake Mannix
> wrote:
>
> >
> >
> >
> > On Mon, Mar 11, 2013 at 5:14 PM, Ted Dunning >wrote:
> >
> >> [mvn compile|test|package] will do the trick.
> >
On Mon, Mar 11, 2013 at 5:44 PM, Jake Mannix wrote:
> On Mon, Mar 11, 2013 at 5:14 PM, Ted Dunning
> wrote:
>
> > [mvn compile|test|package] will do the trick.
> >...
> > Not that it matters much since the compile is so fast.
> >
>
> Ok, I'll
r 11, 2013 at 4:42 PM, Jake Mannix wrote:
> On Mon, Mar 11, 2013 at 4:21 PM, Ted Dunning
> wrote:
>
> > It is part of math now since we had zero pull for it separate from math.
> >
>
> I see the code templates living in math, yes, but how to build it?
>
>
> >
pis in Yarn etc. hadoop native stuff, but isn't really
> what would solve iterative structured and interconnected stuff?
>
>
> >
> > On 11.03.2013 21:16, Ted Dunning wrote:
> > > Kinda sorta..
> > >
> > > You can defeat most of the sort if you want
[
https://issues.apache.org/jira/browse/MAHOUT-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13599512#comment-13599512
]
Ted Dunning commented on MAHOUT-1130:
-
Sebastian,
I see no patch here. Is it
Why not (b) if (b) implies Giraph (which seems to have some momentum) or
Spark (which has its own momentum and was originally designed to support
machine learning anyway)?
Also, why not (b) if we agree now that it is an experiment that we will
cut away if it leads to a mess?
On Mon, Mar 11, 201
It is part of math now since we had zero pull for it separate from math.
What did you need?
On Mon, Mar 11, 2013 at 1:43 PM, Jake Mannix wrote:
> Question which I ought to know the answer to, but don't: if we want to make
> changes to mahout-collections, what's the build process / maven target
Kinda sorta..
You can defeat most of the sort if you want to just hash things to buckets.
On Mon, Mar 11, 2013 at 12:01 PM, Dmitriy Lyubimov wrote:
> Sort component adds log to
> the asymptotic complexity, whereas it is clear that any streaming merge
> algorithm just wouldn't need to do sort and
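
The point about hashing to buckets instead of sorting can be illustrated outside of Hadoop. The sketch below is plain Python with made-up data and hypothetical function names; it only shows that grouping by key does not need a total order, so the log factor that sorting adds can be avoided.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def group_by_sorting(pairs):
    """Group values by key the shuffle-sort way: O(n log n) comparisons."""
    ordered = sorted(pairs, key=itemgetter(0))
    return {k: [v for _, v in grp]
            for k, grp in groupby(ordered, key=itemgetter(0))}


def group_by_hashing(pairs):
    """Group values by key with a hash table: expected O(n), no ordering."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[key].append(value)
    return dict(buckets)


pairs = [("b", 2), ("a", 1), ("b", 3), ("c", 4), ("a", 5)]
by_sort = group_by_sorting(pairs)
by_hash = group_by_hashing(pairs)
```

Both functions produce the same groups; the hashed version simply gives up the guarantee that keys arrive in sorted order, which many aggregation jobs never use.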
Nice!
On Mon, Mar 11, 2013 at 10:48 AM, Jake Mannix wrote:
> is currently on the wiki, please look it over, as the board meeting is this
> wednesday, I believe, so I need to send it over soon (it was due on
> saturday).
>
> https://cwiki.apache.org/confluence/display/MAHOUT/Monthly+Progress
>
>
erent point on a tradeoff curve, optimizing for a different type of
> problem.
>
Yes. Yes.
For instance, stochastic svd and streaming k-means both radically change
the game when it comes to map-reduce.
But the real issue has to do with whether scaling is truly linear or not.
>
>
The big cost in map-reduce iteration isn't just startup. It is that the
input has to be read from disk and the output written to same. Were it to
stay in memory, things would be vastly faster.
Also, startup costs are still pretty significant. Even on MapR, one of the
major problems in setting t
Thanks for the patch.
Sent from my iPhone
On Mar 8, 2013, at 1:22 AM, "Adam Bozanich (JIRA)" wrote:
>
> [
> https://issues.apache.org/jira/browse/MAHOUT-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> ]
>
> Adam Bozanich updated MAHOUT-1157:
> -
[
https://issues.apache.org/jira/browse/MAHOUT-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596800#comment-13596800
]
Ted Dunning commented on MAHOUT-1155:
-
In this patch fragment:
+ @Over
See Giraph.
On Thu, Mar 7, 2013 at 6:01 PM, Andy Twigg wrote:
> That sounds like a horrid amount of work to do something simple. Is there a
> hadoop implementation of a master-workers problem you can point me to?
> On Mar 7, 2013 9:57 PM, "Ted Dunning" wrote:
>
> >
On Thu, Mar 7, 2013 at 6:25 AM, Andy Twigg wrote:
> ... Right now what we have is a
> single-machine procedure for scanning through some data, building a
> set of histograms, combining histograms and then expanding the tree.
> The next step is to decide the best way to distribute this. I'm not an
[
https://issues.apache.org/jira/browse/MAHOUT-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594848#comment-13594848
]
Ted Dunning commented on MAHOUT-1151:
-
Regarding the patch, it is mostly quite
I think it might be worth committing in steps.
The standalone clustering and utility code has almost no impact on existing
Mahout code (what small impacts there were on Vector and friends were
committed some time ago). These can be committed sooner.
Integration with the map-reduce and command li
making this consistent would be very helpful.
On Thu, Feb 28, 2013 at 9:33 AM, Marty Kube wrote:
> Hey,
>
> I've been looking at consuming ARFF files for random forest classification.
>
> If you look at the partial implementation example page one is asked to
> download an ARFF file, edit the ARFF
ect.org
> >>>>>
> >>>>> I think this framework is a nice fit for the problem.
> >>>>> If the input data fits into the "total cluster memory" you benefit
> from
> >>>>> the caching of the RDD's.
> >>>>
If non-MR means map-only job with communicating mappers and a state store,
I am down with that.
What did you mean?
On Tue, Feb 19, 2013 at 5:53 PM, Marty Kube <
martyk...@beavercreekconsulting.com> wrote:
>
> Right now I'd lean towards the planet model, or maybe a non-MR
> implementation. Anyon
[
https://issues.apache.org/jira/browse/MAHOUT-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13579708#comment-13579708
]
Ted Dunning commented on MAHOUT-1148:
-
The difference was just a matter of rewri
M, deneche abdelhakim wrote:
>
>> On Fri, Feb 15, 2013 at 1:06 AM, Marty Kube <
>> martykube@**beavercreekconsulting.com>
>> wrote:
>>
>> On 01/28/2013 02:33 PM, Ted Dunning wrote:
>>>
>>> I think I was suggesting something weaker.
>>
Exactly.
On Fri, Feb 15, 2013 at 5:37 PM, Marty Kube wrote:
> Even if you are not doing map reduce exactly, hadoop does give you a nice
> infrastructure for running jobs across a lot of hosts.
>
> On 02/15/2013 04:00 PM, Ted Dunning wrote:
>
>> Remember that Hadoop != map-r
Remember that Hadoop != map-reduce.
If there is another style that we need to use, that isn't such a bad thing.
On Fri, Feb 15, 2013 at 7:42 AM, Andy Twigg wrote:
> I am having a hard time convincing myself that doing it on hadoop is
> the best idea (and like I said, it's not like there are ot
I think that file: is the right way to access the local file system.
On Wed, Feb 13, 2013 at 4:14 AM, Sean Owen wrote:
> Hmm I think it will work if you use "file:///..." URIs? I haven't tried in
> a long time though.
>
>
> On Wed, Feb 13, 2013 at 12:12 PM, Dan Filimon
> wrote:
>
> > I see. Well
The way to fix this would be to make the fancy source generation plugin be
more clever. It hasn't been worthwhile to date.
On Tue, Feb 12, 2013 at 6:57 AM, Dan Filimon wrote:
> Is there any way of building Mahout without regenerating the math sources?
> It seems that every time I run 'mvn compil
[
https://issues.apache.org/jira/browse/MAHOUT-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571796#comment-13571796
]
Ted Dunning commented on MAHOUT-1148:
-
Well, Jake predicted a speedup. But thi
Your email has been hacked, it appears.
On Tue, Feb 5, 2013 at 1:04 AM, Elena Smirnova wrote:
> http://www.flysteve.it/k5ulzw.php?s=ot
>
>
[
https://issues.apache.org/jira/browse/MAHOUT-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571563#comment-13571563
]
Ted Dunning edited comment on MAHOUT-1148 at 2/5/13 6:4
[
https://issues.apache.org/jira/browse/MAHOUT-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ted Dunning resolved MAHOUT-1148.
-
Resolution: Fixed
OK. That was a silly last minute change that went with the wrong polarity
[
https://issues.apache.org/jira/browse/MAHOUT-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571542#comment-13571542
]
Ted Dunning commented on MAHOUT-1148:
-
Suneel,
Jenkins agrees with you.
I
[
https://issues.apache.org/jira/browse/MAHOUT-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ted Dunning reopened MAHOUT-1148:
-
Test failing
> QR Decomposition is too s
[
https://issues.apache.org/jira/browse/MAHOUT-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ted Dunning resolved MAHOUT-1148.
-
Resolution: Fixed
I committed the new QR decomposition. This is a bit bold to do, but I think
; On Sun, Feb 3, 2013 at 9:45 PM, Ted Dunning wrote:
> > I think that getting it into the existing API would be very nice to have,
> > but not absolutely critical.
> >
> > If extending the release by, say, 2-3 weeks would solve the problem I
> would
> > recommend e