date:20130328

Re: How to improve clustering?

2013-03-28 Thread Dan Filimon

Sebastian, if you're interested I'd be glad to walk you through the main ideas, point you to the code and tell you how to run it. Testing it on more data would be very helpful to the project. But, it makes hard cluster assignments. On Mar 28, 2013, at 2:23, Ted Dunning ted.dunn...@gmail.com

Re: How to improve clustering?

2013-03-28 Thread Ted Dunning

It makes hard cluster assignments, but that would be helpful in two ways: a) it will help you diagnose data issues b) it can produce good starting points for fuzzy k-means. On Thu, Mar 28, 2013 at 7:19 AM, Dan Filimon dangeorge.fili...@gmail.com wrote: Sebastian, if you're interested I'd be glad
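To illustrate the difference between hard assignments and the fuzzy memberships they can seed, here is a minimal sketch assuming the textbook fuzzy c-means membership formula with Euclidean distance. The function names and the fuzziness parameter `m` are illustrative and are not taken from Mahout's code.

```python
import math

def hard_assign(point, centers):
    """Return the index of the nearest center (a hard assignment)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def fuzzy_memberships(point, centers, m=2.0):
    """Textbook fuzzy c-means memberships of `point`; the values sum to 1.

    `m` > 1 is the fuzziness exponent (an assumed default, not Mahout's).
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]

# Centers as they might come out of a hard clustering pass.
centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)
nearest = hard_assign(point, centers)
memberships = fuzzy_memberships(point, centers)
```

The hard assignment picks a single cluster, while the memberships spread the point's weight across all clusters in inverse relation to distance.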

Re: Clustering 20newsgroups with StreamingKMeans [was How to improve clustering?]

2013-03-28 Thread Ted Dunning

I will have to think on this a bit. It should be possible to dump the sketches coming from each mapper and look at them for compatibility. Are the mappers seeing only docs from a single news group? That might produce some interesting and odd results. What happens with the sequential version

Re: Number of Clustering MR-Jobs

2013-03-28 Thread Sebastian Briesemeister

Thank you. Splitting the files leads to multiple MR-tasks! Only changing the MR settings of hadoop did not help. In the future it would be nice if the drivers would scale themselves and would split the data according to the dataset size and the number of available MR-slots. Cheers Sebastian Am
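A rough sketch of why splitting the input into several files raises the number of map tasks, assuming a file-based input format in which a split never spans two files. The function name and the `split_size` parameter are hypothetical, and the model ignores details such as Hadoop's tolerance for a slightly oversized last split.

```python
import math

def estimate_map_tasks(file_sizes, split_size):
    """Estimate map tasks: every non-empty file yields at least one split."""
    return sum(
        max(1, math.ceil(size / split_size))
        for size in file_sizes
        if size > 0
    )

MB = 1024 * 1024
one_file = estimate_map_tasks([40 * MB], split_size=64 * MB)
ten_files = estimate_map_tasks([4 * MB] * 10, split_size=64 * MB)
```

Under these assumptions, the same 40 MB of data gives one map task as a single file and ten map tasks as ten files.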

Re: Number of Clustering MR-Jobs

2013-03-28 Thread Sean Owen

This is really a Hadoop-level thing. I am not sure I have ever successfully induced M/R to run multiple mappers on less than one block of data, even with a low max split size. Reducers you can control. On Thu, Mar 28, 2013 at 9:04 AM, Sebastian Briesemeister

Re: Number of Clustering MR-Jobs

2013-03-28 Thread Ted Dunning

This is a longstanding Hadoop issue. Your suggestion is interesting, but only a few cases would benefit. The problem is that splitting involves reading from a very small number of nodes and thus is not much better than just running the program with few mappers. If the data is large enough to

Re: Number of Clustering MR-Jobs

2013-03-28 Thread Sebastian Schelter

It would also be very hard to do automatically, as clusters are shared and a framework cannot know how much of the shared resources (available map slots) it can take. On 28.03.2013 10:07, Sean Owen wrote: This is really a Hadoop-level thing. I am not sure I have ever successfully induced M/R to

Re: Number of Clustering MR-Jobs

2013-03-28 Thread Sebastian Schelter

Sebastian, For CPU-bound problems like matrix factorization with ALS, we have recently seen good results with multithreaded mappers, where we had the users specify the number of cores to use per mapper. On 28.03.2013 10:20, Ted Dunning wrote: This is a longstanding Hadoop issue. Your

mahout_structure and FPGrowth

2013-03-28 Thread vsaxena

Hello, I am new to Mahout. I wanted information about the Mahout project structure (all the directories, what they contain, how I can use them); basically I am interested in frequent item mining. Besides this, I have executed the command #mahout fpg -i accidents.dat -o patterns -k

Re: Number of Clustering MR-Jobs

2013-03-28 Thread Sebastian Briesemeister

In my case, each map process requires a lot of memory and I would like to distribute this consumption across multiple nodes. However, I still get out of memory exceptions even if I split the input file into several very small input files. I thought the mapper would consider only one file at a time

RE: classifier for non-linear relationships

2013-03-28 Thread Michael Michael

Thanks Ted. I contacted them to find out about pricing, but I am sure it will be expensive. It seems that since there are no open source solutions for this, my best option is either MATLAB or to purchase something from a company like neurosolutions or skytree (there are a few others that fit

Regarding ItemBased Recommendation Results

2013-03-28 Thread ch raju

Hi all, I am working on mahout-0.7 recommendations and ran the following command from the command line: ./bin/mahout recommenditembased --input UserData.csv --output output/ --similarityClassname SIMILARITY_PEARSON_CORRELATION --numRecommendations 10 and got the recommendations for every user. I deployed

Re: Regarding ItemBased Recommendation Results

2013-03-28 Thread Sebastian Schelter

The Hadoop-based implementation samples down users with more than 1000 interactions by default; that could be the reason for the differences that you are seeing. On 28.03.2013 15:09, ch raju wrote: Hi all, I am working on mahout-0.7 recommendations, ran following command from the command
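A minimal sketch of what sampling down a heavy user could look like, assuming simple uniform random sampling without replacement. The function name, the `max_per_user` parameter, and the sampling scheme are illustrative assumptions and do not reproduce Mahout's actual implementation.

```python
import random

def sample_down(interactions, max_per_user=1000, seed=None):
    """Keep at most `max_per_user` interactions, chosen uniformly at random."""
    items = list(interactions)
    if len(items) <= max_per_user:
        return items
    return random.Random(seed).sample(items, max_per_user)

light_user = list(range(200))
heavy_user = list(range(5000))
kept_light = sample_down(light_user)
kept_heavy = sample_down(heavy_user, seed=42)
```

A user under the cap is left untouched, while a heavy user contributes only a random subset, so a run that samples and a run that does not can compute similarities from different data.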

Re: Regarding ItemBased Recommendation Results

2013-03-28 Thread Koobas

Are the suggestions completely different, or somewhat different? What about the neighborhoods? On Thu, Mar 28, 2013 at 10:09 AM, ch raju ch.raju...@gmail.com wrote: Hi all, I am working on mahout-0.7 recommendations, ran following command from the command line ./bin/mahout

Re: Number of Clustering MR-Jobs

2013-03-28 Thread Dan Filimon

From what I've seen, even if the mapper does throw an out of memory exception, Hadoop will restart it increasing the memory. There are ways to configure the mapper/reducer JVMs to use more memory by default through the Configuration although I don't recall the exact options. It's probably

Re: Number of Clustering MR-Jobs

2013-03-28 Thread Sebastian Briesemeister

I tried to increase the heap space, but it wasn't enough. It seems the problem is not the number of mappers. I will start another thread for this problem with some more details. Cheers Sebastian On 28.03.2013 16:41, Dan Filimon wrote: From what I've seen, even if the mapper does throw an

Fuzzy Clustering accumulates lots of memory

2013-03-28 Thread Sebastian Briesemeister

Dear all, I have a large dataset consisting of ~50,000 documents and a dimension of 90,000. I split the created input vectors into smaller files to run a single mapper task on each of the files. However, even with very small files containing only 50 documents, I run into heap space problems. I
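As a back-of-the-envelope check on the numbers above, the sketch below estimates the memory for dense double-precision vectors of dimension 90,000. The message does not state the number of clusters or whether the vectors are sparse, so the cluster count of 1,000 is purely hypothetical, and dense storage is an assumption rather than a diagnosis from the thread.

```python
BYTES_PER_DOUBLE = 8

def dense_vector_bytes(dimension, count=1):
    """Raw bytes for `count` dense double vectors, ignoring object overhead."""
    return dimension * BYTES_PER_DOUBLE * count

per_vector = dense_vector_bytes(90_000)                      # one dense vector
hypothetical_centers = dense_vector_bytes(90_000, count=1_000)  # assumed 1,000 clusters
```

Under these assumptions a single dense vector takes 720,000 bytes, and 1,000 such vectors take about 720 MB regardless of how few documents each input file holds.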

Re: Fuzzy Clustering accumulates lots of memory

2013-03-28 Thread Chris Harrington

Don't know if this will help with your heap issues (or if you've already tried it) but increasing the mapred.child.java.opts in the mapred-site.xml resolved some heap issues I was having. I was clustering 67000 small text docs into ~180 clusters and was seeing mapper heap issues until I made

Re: Fuzzy Clustering accumulates lots of memory

2013-03-28 Thread Sebastian Briesemeister

I tried increasing the child heap size. But as I mentioned even 4GB wasn't enough. I am also not sure whether the block size has some influence on the memory, but I assume this is not the case since such a design would be really bad. Any other ideas? Am 28.03.2013 17:40, schrieb Chris

Re: Regarding ItemBased Recommendation Results

2013-03-28 Thread ch raju

Yeah, the recommendations are completely different; out of 10, only one suggestion matched. Which neighborhoods are you asking about? I am new to this and didn't understand. Thanks and regards, Raju On Thu, Mar 28, 2013 at 8:25 PM, Koobas koo...@gmail.com wrote: Are the suggestions completely

Re: classifier for non-linear relationships

2013-03-28 Thread Ray

Why does Mahout not have what this person wants? Is it not really in the scope of Mahout? Used to be in Mahout, gone now? Needed someone to put it there in the first place? Ray. On 03/28/2013 07:02 AM, Michael Michael wrote: Thanks Ted. I contacted them to find out about pricing, but I


21 matches
