Sebastian, if you're interested I'd be glad to walk you through the main ideas,
point you to the code and tell you how to run it.
Testing it on more data would be very helpful to the project.
But it makes hard cluster assignments.
On Mar 28, 2013, at 2:23, Ted Dunning ted.dunn...@gmail.com
It makes hard cluster assignments, but that would be helpful in two ways:
a) it will help you diagnose data issues
b) it can produce good starting points for fuzzy k-means.
On Thu, Mar 28, 2013 at 7:19 AM, Dan Filimon dangeorge.fili...@gmail.com wrote:
Sebastian, if you're interested I'd be glad
I will have to think on this a bit.
It should be possible to dump the sketches coming from each mapper and look
at them for compatibility.
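Something like this would do it, assuming the per-mapper sketches land in plain
SequenceFiles (the key/value types below are read generically; the real ones will
differ):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

// Rough sketch: dump each part-m-* file (one per mapper) so the sketches
// can be eyeballed for compatibility. File paths are passed on the command line.
public class DumpSketches {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    for (String arg : args) {
      Path path = new Path(arg);
      SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
      Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
      while (reader.next(key, value)) {
        System.out.println(path.getName() + "\t" + key + "\t" + value);
      }
      reader.close();
    }
  }
}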
Are the mappers seeing only docs from a single news group? That might
produce some interesting and odd results.
What happens with the sequential version
Thank you.
Splitting the files leads to multiple MR tasks!
Changing only the MR settings of Hadoop did not help. In the future it
would be nice if the drivers would scale themselves and split the
data according to the dataset size and the number of available MR slots.
Cheers
Sebastian
Am
This is really a Hadoop-level thing. I am not sure I have ever
successfully induced M/R to run multiple mappers on less than one
block of data, even with a low max split size. Reducers you can
control.
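For reference, the knobs I mean look roughly like this (a sketch against the
Hadoop 1.x mapreduce API; the split size is just an example value):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// Sketch of the relevant job settings.
public class SplitSettingsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "split-settings-example");
    // Ask for smaller input splits; Hadoop may still hand a whole block to one mapper.
    FileInputFormat.setMaxInputSplitSize(job, 16L * 1024 * 1024);  // 16 MB
    // The number of reducers, on the other hand, is directly under your control.
    job.setNumReduceTasks(8);
  }
}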
On Thu, Mar 28, 2013 at 9:04 AM, Sebastian Briesemeister
This is a longstanding Hadoop issue.
Your suggestion is interesting, but only a few cases would benefit. The
problem is that splitting involves reading from a very small number of
nodes and thus is not much better than just running the program with few
mappers. If the data is large enough to
It would also be very hard to do automatically, as clusters are shared
and a framework cannot know how much of the shared resources (available
map slots) it can take.
On 28.03.2013 10:07, Sean Owen wrote:
This is really a Hadoop-level thing. I am not sure I have ever
successfully induced M/R to
Sebastian,
For CPU-bound problems like matrix factorization with ALS, we have
recently seen good results with multithreaded mappers, where we had the
users specify the number of cores to use per mapper.
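The ALS jobs have their own implementation, but the general idea is along the
lines of Hadoop's stock MultithreadedMapper (the mapper class and thread count
below are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

public class MultithreadedMapperSketch {
  // Placeholder for the real, thread-safe, CPU-bound mapper.
  public static class MyCpuBoundMapper extends Mapper<LongWritable, Text, Text, Text> {
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "cpu-bound-example");
    // Several copies of the real mapper run concurrently inside one task JVM.
    job.setMapperClass(MultithreadedMapper.class);
    MultithreadedMapper.setMapperClass(job, MyCpuBoundMapper.class);
    // The thread count is what we let users set, e.g. one thread per core.
    MultithreadedMapper.setNumberOfThreads(job, 4);
  }
}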
On 28.03.2013 10:20, Ted Dunning wrote:
This is a longstanding Hadoop issue.
Your
Hello,
I am new to Mahout and wanted some information about the Mahout project structure
(all the directories, what they contain, and how I can use them); basically I am
interested in frequent itemset mining. Besides this, I have executed the command
#mahout fpg -i accidents.dat -o patterns -k
In my case, each map process requires a lot of memory and I would like
to distribute this consumption across multiple nodes.
However, I still get out-of-memory exceptions even if I split the input
file into several very small input files. I thought the mapper would
consider only one file at a time.
Thanks Ted. I contacted them to find out about pricing, but I am sure it will
be expensive.
It seems that since there are no open source solutions for this, my best option is
either Matlab or to purchase something from a company like NeuroSolutions or
Skytree (there are a few others that fit
Hi all,
I am working on mahout-0.7 recommendations and ran the following command from
the command line:
./bin/mahout recommenditembased --input UserData.csv --output output/
--similarityClassname SIMILARITY_PEARSON_CORRELATION --numRecommendations 10
got the recommendations for every user.
I deployed
The Hadoop-based implementation samples down users with more than 1000
interactions by default; that could be the reason for the differences
that you are seeing.
On 28.03.2013 15:09, ch raju wrote:
Hi all,
I am working on mahout-0.7 recommendations, ran following command from
the command
Are the suggestions completely different, or somewhat different?
What about the neighborhoods?
On Thu, Mar 28, 2013 at 10:09 AM, ch raju ch.raju...@gmail.com wrote:
Hi all,
I am working on mahout-0.7 recommendations, ran following command from
the command line
./bin/mahout
From what I've seen, even if the mapper does throw an out-of-memory
exception, Hadoop will restart it, increasing the memory.
There are ways to configure the mapper/reducer JVMs to use more memory by
default through the Configuration, although I don't recall the exact
options. It's probably
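Something along these lines, though the exact property names may be off (this
assumes the old mapred.* names from Hadoop 1.x, and the heap sizes are just
examples):

import org.apache.hadoop.conf.Configuration;

public class ChildHeapSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Give the child (mapper/reducer) JVMs a bigger heap; 2 GB is just an example.
    conf.set("mapred.child.java.opts", "-Xmx2048m");
    // Some Hadoop versions also allow separate settings per task type.
    conf.set("mapred.map.child.java.opts", "-Xmx2048m");
    conf.set("mapred.reduce.child.java.opts", "-Xmx1024m");
  }
}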
I tried to increase the heap space, but it wasn't enough.
It seems the problem is not the number of mappers. I will start another
thread for this problem with some more details.
Cheers
Sebastian
On 28.03.2013 16:41, Dan Filimon wrote:
From what I've seen, even if the mapper does throw an
Dear all,
I have a large dataset consisting of ~50,000 documents with a dimensionality
of 90,000. I split the created input vectors into smaller files in order to run
a single mapper task on each of the files.
However, even with very small files containing only 50 documents, I run
into heap space problems.
I
Don't know if this will help with your heap issues (or if you've already tried
it), but increasing mapred.child.java.opts in mapred-site.xml resolved
some heap issues I was having. I was clustering 67,000 small text docs into ~180
clusters and was seeing mapper heap issues until I made
I tried increasing the child heap size, but as I mentioned, even 4GB
wasn't enough.
I am also not sure whether the block size has some influence on the
memory, but I assume this is not the case since such a design would be
really bad.
Any other ideas?
On 28.03.2013 17:40, Chris wrote
Yeah, the recommendations are completely different; out of 10, only one
suggestion matched.
Which neighborhoods are you asking about? I am new to this and didn't
understand.
Thanks and regards,
Raju
On Thu, Mar 28, 2013 at 8:25 PM, Koobas koo...@gmail.com wrote:
Are the suggestions completely
Why does Mahout not have what this person wants?
Is it not really in the scope of Mahout? Used to be in Mahout, gone
now? Needed someone to put it there in the first place?
Ray.
On 03/28/2013 07:02 AM, Michael Michael wrote:
Thanks Ted. I contacted them to find out about pricing, but I