Re: Streaming K-means

2015-06-02 Thread Marko Dinic
is why I'm considering this Streaming approach now. Do you think it is worth giving a shot? I really need a scalable solution. Best regards, Marko On Tue 02 Jun 2015 12:03:40 AM CEST, Ted Dunning wrote: The streaming k-means works by building a sketch of th
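Ted's description is cut off above, so this is only a rough illustration of what "building a sketch" can mean, not Mahout's StreamingKMeans code. It is a minimal Java sketch assuming Euclidean distance and an online-facility-location style rule: each point merges into its nearest sketch centroid unless a random draw, with probability proportional to weight × distance / cutoff, opens a new centroid. When the sketch exceeds its size budget, the cutoff is raised (the factor 1.5 is an arbitrary assumption) and the existing centroids are re-streamed. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative streaming k-means sketch; a hypothetical class, not Mahout's StreamingKMeans. */
public class StreamingSketch {
    /** A weighted centroid summarising the points merged into it. */
    static final class Centroid {
        final double[] center;
        double weight;
        Centroid(double[] point, double weight) { this.center = point.clone(); this.weight = weight; }

        /** Moves the centroid to the weighted mean of itself and the new point. */
        void absorb(double[] point, double w) {
            double total = weight + w;
            for (int i = 0; i < center.length; i++) center[i] = (center[i] * weight + point[i] * w) / total;
            weight = total;
        }
    }

    private final List<Centroid> centroids = new ArrayList<>();
    private final int maxCentroids;
    private double distanceCutoff;
    private final Random random;

    StreamingSketch(int maxCentroids, double initialCutoff, long seed) {
        this.maxCentroids = maxCentroids;
        this.distanceCutoff = initialCutoff;
        this.random = new Random(seed);
    }

    static double distance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return Math.sqrt(s);
    }

    /** Feeds one weighted point and keeps the sketch at most maxCentroids large. */
    void offer(double[] point, double weight) {
        place(point, weight);
        while (centroids.size() > maxCentroids) {
            distanceCutoff *= 1.5;                       // growth factor is an arbitrary assumption
            List<Centroid> old = new ArrayList<>(centroids);
            centroids.clear();
            for (Centroid c : old) place(c.center, c.weight); // re-stream with the larger cutoff
        }
    }

    /** Merges into the nearest centroid, or opens a new one with probability ~ weight * distance / cutoff. */
    private void place(double[] point, double weight) {
        Centroid nearest = null;
        double best = Double.POSITIVE_INFINITY;
        for (Centroid c : centroids) {
            double d = distance(point, c.center);
            if (d < best) { best = d; nearest = c; }
        }
        if (nearest == null || random.nextDouble() < weight * best / distanceCutoff) {
            centroids.add(new Centroid(point, weight));
        } else {
            nearest.absorb(point, weight);
        }
    }

    int size() { return centroids.size(); }

    double totalWeight() {
        double w = 0;
        for (Centroid c : centroids) w += c.weight;
        return w;
    }

    public static void main(String[] args) {
        StreamingSketch sketch = new StreamingSketch(20, 1.0, 42L);
        Random data = new Random(7);
        double[][] means = {{0, 0}, {10, 10}, {20, 0}};
        for (int i = 0; i < 1000; i++) {
            double[] m = means[i % 3];
            sketch.offer(new double[]{m[0] + data.nextGaussian(), m[1] + data.nextGaussian()}, 1.0);
        }
        System.out.println(sketch.size() + " centroids, total weight " + sketch.totalWeight());
    }
}
```

The sketch keeps more weighted centroids than the final k; a second, in-memory clustering pass over those centroids (the ball k-means step in Mahout's pipeline) then produces the k clusters.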

Streaming K-means

2015-06-01 Thread Marko Dinic
igger problems than K-means because it's not scalable, but it can be useful in some cases (e.g., it allows more sophisticated distance measures). What is your opinion about implementing this? Best regards, Marko

K-means implementation

2015-01-23 Thread Marko Dinic
Hello everyone, I was digging through the K-means implementation on Hadoop and I'm a bit confused about one thing, so I wanted to check. To calculate the distance from a point to all centroids, the centroids need to be accessible from every mapper. So it seemed logical to me to put the centroids (sequenceF
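To make the data flow in the question concrete: in one K-means iteration every mapper needs the complete current centroid list to assign its points, but only per-cluster partial sums and counts have to travel to the reducers. A plain-Java sketch of that flow, with no Hadoop classes involved; `mapPartition` and `reduce` are hypothetical names standing in for the mapper and reducer.

```java
import java.util.Arrays;
import java.util.List;

public class KMeansIteration {
    /** Index of the centroid closest to p (squared Euclidean distance). */
    static int nearest(double[] p, double[][] centroids) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double d = 0;
            for (int i = 0; i < p.length; i++) { double diff = p[i] - centroids[c][i]; d += diff * diff; }
            if (d < bestDist) { bestDist = d; best = c; }
        }
        return best;
    }

    /** "Map" side: needs ALL centroids, but emits only per-cluster sums plus a count in the last slot. */
    static double[][] mapPartition(List<double[]> points, double[][] centroids) {
        int k = centroids.length, dim = centroids[0].length;
        double[][] partial = new double[k][dim + 1];
        for (double[] p : points) {
            int c = nearest(p, centroids);
            for (int i = 0; i < dim; i++) partial[c][i] += p[i];
            partial[c][dim] += 1;
        }
        return partial;
    }

    /** "Reduce" side: adds the partial sums of all partitions and divides by the counts. */
    static double[][] reduce(List<double[][]> partials, double[][] oldCentroids) {
        int k = oldCentroids.length, dim = oldCentroids[0].length;
        double[][] total = new double[k][dim + 1];
        for (double[][] part : partials)
            for (int c = 0; c < k; c++)
                for (int i = 0; i <= dim; i++) total[c][i] += part[c][i];
        double[][] updated = new double[k][];
        for (int c = 0; c < k; c++) {
            if (total[c][dim] == 0) { updated[c] = oldCentroids[c].clone(); continue; } // empty cluster keeps its centroid
            updated[c] = new double[dim];
            for (int i = 0; i < dim; i++) updated[c][i] = total[c][i] / total[c][dim];
        }
        return updated;
    }

    public static void main(String[] args) {
        double[][] centroids = {{0.0}, {10.0}};
        List<double[]> split1 = List.of(new double[]{1.0}, new double[]{2.0});
        List<double[]> split2 = List.of(new double[]{9.0}, new double[]{12.0});
        double[][] next = reduce(List.of(mapPartition(split1, centroids), mapPartition(split2, centroids)), centroids);
        System.out.println(Arrays.deepToString(next)); // [[1.5], [10.5]]
    }
}
```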

Re: DTW distance measure and K-medoids, Hierarchical clustering

2015-01-15 Thread marko . dinic
th the implementation, if it doesn't sound that crazy. I wish you all the best, Marko Quoting Ted Dunning: On Thu, Jan 15, 2015 at 3:50 AM, Marko Dinic wrote: Thank you for your answer. Maybe I gave the wrong picture of my data by using a sinusoid as an example; my time series are

Re: DTW distance measure and K-medoids, Hierarchical clustering

2015-01-15 Thread Marko Dinic
calculations, how much time could I expect for such an algorithm in the case of, for example, 10,000 signals with 300 points each? How can I even estimate that? Thanks for your effort, if you have time to answer. Regards, Marko On Thu 15 Jan 2015 05:25:55 AM CET, Anand Avati wrote: Perhaps you could think o
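One way to get an order-of-magnitude answer, assuming full (unconstrained) DTW and a complete pairwise distance matrix as K-medoids or hierarchical clustering would need: 10,000 signals give n(n-1)/2 = 49,995,000 pairs, and each comparison of two 300-point series fills a 300 × 300 table, about 4.5 × 10^12 cell updates in total. The throughput of 10^8 cells per second per core below is purely an assumed figure; replace it with a measurement on your own hardware.

```java
public class DtwCostEstimate {
    /** Number of unordered pairs among n series. */
    static long pairCount(long n) { return n * (n - 1) / 2; }

    /** Cells in one full DTW table for series of lengths a and b. */
    static long cellsPerPair(long a, long b) { return a * b; }

    public static void main(String[] args) {
        long n = 10_000, len = 300;
        long pairs = pairCount(n);                    // 49,995,000
        long cells = pairs * cellsPerPair(len, len);  // 4,499,550,000,000
        double cellsPerSecond = 1e8;                  // ASSUMPTION: measure this yourself
        double hours = cells / cellsPerSecond / 3600.0;
        System.out.printf("pairs=%d cells=%d single-core hours=%.1f%n", pairs, cells, hours);
    }
}
```

Because the pairs are independent, wall-clock time divides roughly by the number of cores, and restricting the warping path to a band of width w around the diagonal (Sakoe-Chiba) cuts the per-pair cost from 300 × 300 to about 300 × (2w + 1).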

Re: DTW distance measure and K-medoids, Hierarchical clustering

2015-01-10 Thread Marko Dinic
scalable solution for my problem. I tried to fit it into what's already implemented in Mahout (for clustering), but it's not so obvious to me. I'm open to suggestions; I'm still new to all of this. Thanks, Marko On Sat 10 Jan 2015 07:32:33 AM CET, Ted Dunning wrote: Why is i

Re: DTW distance measure and K-medoids, Hierarchical clustering

2015-01-09 Thread Marko Dinic
about the scalability? I would greatly appreciate your answer, thanks. On Thu 08 Jan 2015 08:19:18 PM CET, Ted Dunning wrote: On Thu, Jan 8, 2015 at 7:00 AM, Marko Dinic wrote: 1) Is there an implementation of DTW (Dynamic Time Warping) in Mahout that could be used as a distance measure for cluster
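Independently of whether Mahout ships such a measure, the classic DTW recurrence itself is short. A minimal plain-Java sketch with |x - y| as the local cost and no warping-window constraint; it is not a Mahout `DistanceMeasure`, and adapting it to that interface is not shown.

```java
import java.util.Arrays;

public class Dtw {
    /** Classic dynamic time warping distance with |x - y| as the local cost. */
    static double distance(double[] a, double[] b) {
        int n = a.length, m = b.length;
        double[][] d = new double[n + 1][m + 1];
        for (double[] row : d) Arrays.fill(row, Double.POSITIVE_INFINITY);
        d[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double cost = Math.abs(a[i - 1] - b[j - 1]);
                // extend the cheapest of: diagonal match, step in a only, step in b only
                d[i][j] = cost + Math.min(d[i - 1][j - 1], Math.min(d[i - 1][j], d[i][j - 1]));
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3};
        double[] y = {0, 0, 1, 2, 3}; // same shape, delayed by one step
        System.out.println(distance(x, y)); // 0.0
    }
}
```

The two example series have different lengths, so a point-by-point Euclidean distance is not even defined for them, while DTW aligns the repeated leading zero and reports zero distance.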

DTW distance measure and K-medoids, Hierarchical clustering

2015-01-08 Thread Marko Dinic
Hello everyone. I have a couple of questions. 1) Is there an implementation of DTW (Dynamic Time Warping) in Mahout that could be used as a distance measure for clustering? 2) Why isn't there an implementation of K-medoids in Mahout? I'm guessing that it could not be implemented efficiently
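On the K-medoids question: one simple variant (alternating assignment and medoid update, not the full PAM swap search) needs nothing but a distance function, which is why it pairs naturally with DTW. Its medoid update compares every member of a cluster with every other member, i.e. O(m²) distance calls for a cluster of size m, which hints at the scalability concern. A minimal sketch; initialising with the first k items is a naive assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

public class KMedoids {
    /** Returns the indices of k medoids; initialised with the first k items (naive assumption). */
    static <T> int[] cluster(List<T> items, int k, ToDoubleBiFunction<T, T> dist, int maxIter) {
        int n = items.size();
        int[] medoids = new int[k];
        for (int c = 0; c < k; c++) medoids[c] = c;
        int[] assign = new int[n];
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: each item joins its closest medoid.
            for (int i = 0; i < n; i++) {
                double best = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double d = dist.applyAsDouble(items.get(i), items.get(medoids[c]));
                    if (d < best) { best = d; assign[i] = c; }
                }
            }
            // Update step: the member with the smallest summed distance to its cluster, O(m^2).
            boolean changed = false;
            for (int c = 0; c < k; c++) {
                List<Integer> members = new ArrayList<>();
                for (int i = 0; i < n; i++) if (assign[i] == c) members.add(i);
                int bestMedoid = medoids[c];
                double bestCost = Double.POSITIVE_INFINITY;
                for (int cand : members) {
                    double cost = 0;
                    for (int other : members) cost += dist.applyAsDouble(items.get(cand), items.get(other));
                    if (cost < bestCost) { bestCost = cost; bestMedoid = cand; }
                }
                if (bestMedoid != medoids[c]) { medoids[c] = bestMedoid; changed = true; }
            }
            if (!changed) break;
        }
        return medoids;
    }

    public static void main(String[] args) {
        List<Double> values = List.of(1.0, 1.2, 0.8, 10.0, 10.5, 9.5);
        ToDoubleBiFunction<Double, Double> absDiff = (a, b) -> Math.abs(a - b);
        int[] medoids = cluster(values, 2, absDiff, 10);
        System.out.println(Arrays.toString(medoids)); // [0, 3]
    }
}
```

With DTW, `absDiff` would simply be replaced by a DTW function over `double[]` series.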

Re: How can I include mahout 0.9 with hadoop 2.3 in my project?

2014-12-15 Thread Marko Dinic
Hello, sorry for bumping in like this, but I have a very similar question: can I use Mahout 0.9 with Hadoop 0.20.2? Thanks On Mon 15 Dec 2014 10:09:56 AM CET, jyotiranjan panda wrote: Hi, mahout-0.9 is compatible with hadoop-1.2.1 Regards Jyoti Ranjan Panda On Mon, Dec 15, 2014 at 2:33 PM, Le

Re: Mahout 0.9 on Hadoop 0.20.2

2014-10-28 Thread Marko Dinić
since Hadoop is installed on the cluster? I have never deployed to a cluster, so I'm really confused. Any help would be great, or any reference like the previous one? Regards, Marko On Tuesday, 28 October 2014, 17:12:59 CET, Chandramani Tiwary wrote: Hi Marko, Nothing special needs to b

Re: Mahout 0.9 on Hadoop 0.20.2

2014-10-28 Thread Marko Dinić
xpect failures in that case? Regards, Marko On Tuesday, 28 October 2014, 16:48:03 CET, Chandramani Tiwary wrote: Hi Marko, You can configure Mahout 0.9 over Hadoop 0.20.2, but the Hadoop dependencies might lead to failures quite a few times. One example, if I remember correctly, is that Hadoop 0

Mahout 0.9 on Hadoop 0.20.2

2014-10-28 Thread Marko Dinić
Hello, I have a Hadoop cluster on which Hadoop 0.20.2 is installed. Is there a way to use Mahout 0.9 on that cluster? I understand that Mahout 0.9 is based on Hadoop 1.2.1, but because of this constraint I cannot install another version of Hadoop on it. Thanks, Marko

Re: Streaming K Means exception without any reason

2014-10-09 Thread Marko Dinić
o how many points? Is it possible to share your dataset to troubleshoot? On Thu, Oct 9, 2014 at 9:18 AM, Marko Dinić wrote: Suneel, Thank you for your answer; this was rather strange to me. The number of points is 942. I have multiple runs; in each run I have a loop in which the number of cluste

Re: Streaming K Means exception without any reason

2014-10-09 Thread Marko Dinić
Here is the dataset. On Thursday, 09 October 2014, 16:53:25 CEST, Marko Dinić wrote: Yes, it is small, but it is just a sample, so the dataset will probably be much bigger. So you think that this was the problem? Will this problem be avoided with a larger dataset? I think that there were

Re: Streaming K Means exception without any reason

2014-10-09 Thread Marko Dinić
share your dataset to troubleshoot? On Thu, Oct 9, 2014 at 9:18 AM, Marko Dinić wrote: Suneel, Thank you for your answer; this was rather strange to me. The number of points is 942. I have multiple runs; in each run I have a loop in which the number of clusters is increased in each iteration

Re: Streaming K Means exception without any reason

2014-10-09 Thread Marko Dinić
take a look at this. On Thu, Oct 9, 2014 at 5:39 AM, Marko Dinić wrote: Hello everyone, I'm using Mahout Streaming K Means multiple times in a loop, each time on the same input data and always with a different output path. Concretely, I'm increasing the number of clusters in each iteration.

Streaming K Means exception without any reason

2014-10-09 Thread Marko Dinić
Hello everyone, I'm using Mahout Streaming K Means multiple times in a loop, each time on the same input data and always with a different output path. Concretely, I'm increasing the number of clusters in each iteration. Currently it runs on a single machine. A couple of times (maybe 3 out of 20 runs) I

Use of streaming K Means

2014-10-02 Thread Marko Dinić
that is performed after the streaming step. The question that arises is when to run the Ball K Means step, since the data arrives all the time. Should I even consider this, or should I go for a lambda architecture? Any help would be great. Thanks, Marko
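For context on the step in question: it clusters the weighted centroids produced by the streaming step rather than the raw points. A minimal sketch of weighted Lloyd iterations with "ball" trimming, in the spirit of Mahout's BallKMeans but not its code: only sketch points closer to their centroid than `trimFraction` times the distance from that centroid to its nearest other centroid move it. The trim fraction 0.9, the caller-supplied initial centers, and all names are assumptions.

```java
import java.util.Arrays;

public class WeightedBallKMeans {
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return Math.sqrt(s);
    }

    /** Weighted Lloyd iterations over sketch points; points[i] carries weight weights[i]. */
    static double[][] cluster(double[][] points, double[] weights, double[][] initial,
                              double trimFraction, int iterations) {
        int k = initial.length, dim = points[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) centers[c] = initial[c].clone();
        for (int it = 0; it < iterations; it++) {
            // Ball radius: distance from each center to its closest other center.
            double[] ball = new double[k];
            for (int c = 0; c < k; c++) {
                ball[c] = Double.POSITIVE_INFINITY;
                for (int o = 0; o < k; o++) if (o != c) ball[c] = Math.min(ball[c], dist(centers[c], centers[o]));
            }
            double[][] sum = new double[k][dim];
            double[] w = new double[k];
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                double bestDist = dist(points[i], centers[0]);
                for (int c = 1; c < k; c++) {
                    double d = dist(points[i], centers[c]);
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (bestDist < trimFraction * ball[best]) { // only points inside the ball contribute
                    for (int d = 0; d < dim; d++) sum[best][d] += weights[i] * points[i][d];
                    w[best] += weights[i];
                }
            }
            for (int c = 0; c < k; c++)
                if (w[c] > 0) for (int d = 0; d < dim; d++) centers[c][d] = sum[c][d] / w[c];
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][] sketch = {{0}, {1}, {2}, {10}, {11}, {12}};
        double[] weights = {1, 2, 1, 1, 2, 1};
        double[][] centers = cluster(sketch, weights, new double[][]{{0}, {12}}, 0.9, 5);
        System.out.println(Arrays.deepToString(centers)); // [[1.0], [11.0]]
    }
}
```

Because the input here is the sketch, its cost depends on the number of sketch centroids rather than on the number of raw points.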

Re: Streaming K Means

2014-10-02 Thread Marko Dinić
ster-reuters.sh that you have provided; what is it used for? Thanks, Marko On Monday, 29 September 2014, 20:00:33 CEST, Suneel Marthi wrote: This was answered earlier with the details you are looking for; repeating here again: See http://stackoverflow.com/questions/17272296/how-to-use-mah

Streaming K Means

2014-09-29 Thread Marko
Hello everyone, I have previously asked a question about Streaming K Means examples and got the answer that not many are available. Can anyone give me an example of how to call Streaming K Means clustering for a dataset, and how to get the results? What are the results, are they the s

Re: word weights using BM25

2014-09-24 Thread Marko
Hello everyone, I'm very sorry to bump in like this. I have been added to the mailing list (I think), but it seems that I'm somehow unable to ask a question; that is, I asked a question full times and got no answer. I hope this way will work. I'm new to Mahout and I've been struggling with Stre

KMeans for clustering individual points

2014-09-08 Thread Marko
Hello, I know that Mahout is used for batch processing, but I am interested in whether, and how, I can use its KMeans for clustering individual points. Let's say that we have the following situation: * Global clustering, which performs batch processing on all data and gives centroids as a result * One p

Streaming K Means

2014-09-04 Thread Marko
Configuration configuration = new Configuration();
configuration.set("--estimatedNumMapClusters", "18");
configuration.set("-k", "6");
configuration.set("--distanceMeasure", "org.apache.mahout.common.distance.
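The keys in the snippet above (`--estimatedNumMapClusters`, `-k`, `--distanceMeasure`) look like command-line options rather than `Configuration` properties, so setting them with `configuration.set(...)` is likely to have no effect. A sketch of collecting them as an argument array instead. It assumes the driver parses these options from `String[] args`; the `ToolRunner` call is left as a comment because it needs Mahout and Hadoop on the classpath, and the paths and the `SquaredEuclideanDistanceMeasure` choice are placeholders, not values from the original message.

```java
import java.util.ArrayList;
import java.util.List;

public class StreamingKMeansArgs {
    /** Builds the option list; input/output paths and the distance measure are placeholders. */
    static String[] buildArgs(String input, String output, int k, int estimatedMapClusters,
                              String distanceMeasure) {
        List<String> args = new ArrayList<>();
        args.add("--input");
        args.add(input);
        args.add("--output");
        args.add(output);
        args.add("-k");
        args.add(Integer.toString(k));
        args.add("--estimatedNumMapClusters");
        args.add(Integer.toString(estimatedMapClusters));
        args.add("--distanceMeasure");
        args.add(distanceMeasure);
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        String[] args = buildArgs("/path/to/vectors", "/path/to/output", 6, 18,
                "org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure");
        System.out.println(String.join(" ", args));
        // Assumed invocation (requires Mahout 0.9 and Hadoop on the classpath):
        // ToolRunner.run(new Configuration(), new StreamingKMeansDriver(), args);
    }
}
```

Building a fresh array per run also makes it easy to vary `-k` and the output path inside a loop.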

Re: question on writing a customized item similarity function

2011-09-30 Thread Marko Ciric
item description content), for example for product recommendation, how can I customize the similarity function? As far as I understand, the current Mahout similarity function is based on user ratings only. Has anyone had experience writing a custom item-based similarity
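One way to base item similarity on description content rather than ratings is cosine similarity between the term-frequency vectors of the two descriptions. A minimal standalone sketch with deliberately naive tokenisation; in Mahout this logic would presumably sit behind a custom `ItemSimilarity` implementation, which is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

public class DescriptionSimilarity {
    /** Term frequencies of a lower-cased description split on non-alphanumeric characters. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    /** Cosine similarity of two term-frequency vectors; 0 if either is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            na += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        for (int v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    public static void main(String[] args) {
        double s = cosine(termFrequencies("Red cotton shirt"), termFrequencies("Blue cotton shirt"));
        System.out.println(s); // two of three terms shared: 2/3
    }
}
```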

Re: Ehcache and Mahout

2011-09-30 Thread Marko Ciric
http://ehcache.org/ For iterative MapReduce applications running on a NoSQL data store, it should provide a good performance boost by providing an in-memory object cache (I think). Any comments? -- -- Marko Ćirić ciric.ma...@gmail.com

Re: Article on Mahout recommenders and Cassandra

2011-08-16 Thread Marko Ciric
g Cassandra and/or the non-distributed recommenders. Sean

Re: Advice request

2011-08-08 Thread Marko Ciric
You could also introduce clustering and build clusters from pages that have a lot of similar words. If your page data doesn't change too often, you could select the most similar pages from within a cluster and recommend them to a user. On Aug 8, 2011 6:08 PM, "Marko Ciric" wrote:
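The suggestion can be sketched as follows: given precomputed cluster labels for pages (from any clustering run), rank only the pages in the same cluster as a page the user liked, by word overlap. The Jaccard measure, the map-based data layout, and the names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ClusterRecommender {
    /** |A ∩ B| / |A ∪ B| over the word sets of two pages. */
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) return 0.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        return (double) inter.size() / union.size();
    }

    /** Top-n pages from the liked page's cluster, most similar first. */
    static List<String> recommend(String likedPage, Map<String, Set<String>> words,
                                  Map<String, Integer> clusterOf, int n) {
        int cluster = clusterOf.get(likedPage);
        Set<String> likedWords = words.get(likedPage);
        List<String> candidates = new ArrayList<>();
        for (String page : words.keySet()) {
            if (!page.equals(likedPage) && clusterOf.get(page) == cluster) candidates.add(page);
        }
        candidates.sort(Comparator.comparingDouble((String p) -> jaccard(likedWords, words.get(p))).reversed());
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> words = Map.of(
                "a", Set.of("java", "mahout", "clustering"),
                "b", Set.of("java", "mahout", "hadoop"),
                "c", Set.of("java", "cooking"),
                "d", Set.of("recipes", "cooking"));
        Map<String, Integer> clusterOf = Map.of("a", 0, "b", 0, "c", 0, "d", 1);
        System.out.println(recommend("a", words, clusterOf, 2)); // [b, c]
    }
}
```

Restricting candidates to one cluster keeps the similarity computation small, which matters most when page data changes rarely and clusters can be reused.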

Re: Advice request

2011-08-08 Thread Marko Ciric
You might want to use TanimotoCoefficientSimilarity if your data set isn't large. On Jul 27, 2011 10:51 AM, "Sean Owen" wrote: Sounds good. In that case, the surprise-n-coincidence counterpart you are probably looking for is LogLikelihoodSimilarity, which implements ItemSimilarity. Use it wi
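For boolean data both measures work from co-occurrence counts. A minimal sketch of the two formulas as standalone functions (not the Mahout classes named above): Tanimoto as |A ∩ B| / |A ∪ B| over the sets of users who have each item, and the log-likelihood ratio (G²) over the 2×2 contingency table, computed with unnormalised entropies as Dunning's test is usually written.

```java
import java.util.HashSet;
import java.util.Set;

public class CooccurrenceSimilarity {
    /** |A ∩ B| / |A ∪ B| for the sets of users who have items A and B. */
    static double tanimoto(Set<Long> usersA, Set<Long> usersB) {
        Set<Long> union = new HashSet<>(usersA);
        union.addAll(usersB);
        if (union.isEmpty()) return 0.0;
        Set<Long> inter = new HashSet<>(usersA);
        inter.retainAll(usersB);
        return (double) inter.size() / union.size();
    }

    private static double xLogX(long x) { return x == 0 ? 0.0 : x * Math.log(x); }

    /** Unnormalised Shannon entropy: N log N - sum(k log k). */
    private static double entropy(long... counts) {
        long sum = 0;
        double parts = 0;
        for (long k : counts) { parts += xLogX(k); sum += k; }
        return xLogX(sum) - parts;
    }

    /** G^2 for k11 = users with both items, k12 = A only, k21 = B only, k22 = neither. */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double colEntropy = entropy(k11 + k21, k12 + k22);
        double matEntropy = entropy(k11, k12, k21, k22);
        return Math.max(0.0, 2.0 * (rowEntropy + colEntropy - matEntropy)); // clamp round-off
    }

    public static void main(String[] args) {
        System.out.println(tanimoto(Set.of(1L, 2L, 3L), Set.of(2L, 3L, 4L))); // 0.5
        System.out.println(logLikelihoodRatio(10, 10, 10, 10));  // independent: ~0
        System.out.println(logLikelihoodRatio(100, 0, 0, 100));  // strongly associated: 400 ln 2
    }
}
```

Tanimoto only looks at the users who have at least one of the items, whereas the log-likelihood ratio also uses k22, the users who have neither, which is what lets it judge whether a co-occurrence is surprising.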

Re: Mahout Binary Recommender Evaluator

2011-07-28 Thread Marko Ciric
Correction: I didn't mean to re-implement the existing functionality, but there should be an easy way to connect AUC with the Taste evaluators. On 28 July 2011 12:57, Marko Ciric wrote: I think it wouldn't be a big problem to reimplement it, though it would have to have a sort o

Re: Mahout Binary Recommender Evaluator

2011-07-28 Thread Marko Ciric
l, we do have numerous ways to compute AUC. I don't think that they are integrated into the recommendation evaluation framework yet. Would you like to take on the application of suitable glue? On Mon, Jul 25, 2011 at 1:00 PM, Marko Ciric wrote:

AUC

2011-07-25 Thread Marko Ciric
Hi guys, I'm wondering whether any resources or tutorials are available (and where) about calculating AUC when working with boolean preference data models?
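With boolean preferences, one common way to define AUC for a single user is the probability that a randomly chosen relevant (held-out) item is scored higher than a randomly chosen non-relevant item, with ties counting half. A minimal sketch of that pairwise definition, assuming you already have the recommender's scores for a set of candidate items and know which of them were held out as relevant; the map-based inputs are illustrative.

```java
import java.util.Map;
import java.util.Set;

public class BooleanAuc {
    /** Fraction of (relevant, non-relevant) pairs ordered correctly by score; ties count 0.5. */
    static double auc(Map<Long, Double> scores, Set<Long> relevant) {
        double correct = 0;
        long pairs = 0;
        for (Map.Entry<Long, Double> pos : scores.entrySet()) {
            if (!relevant.contains(pos.getKey())) continue;
            for (Map.Entry<Long, Double> neg : scores.entrySet()) {
                if (relevant.contains(neg.getKey())) continue;
                pairs++;
                int cmp = Double.compare(pos.getValue(), neg.getValue());
                if (cmp > 0) correct += 1.0;
                else if (cmp == 0) correct += 0.5;
            }
        }
        return pairs == 0 ? Double.NaN : correct / pairs; // undefined without both kinds of items
    }

    public static void main(String[] args) {
        Map<Long, Double> scores = Map.of(1L, 0.9, 2L, 0.8, 3L, 0.3, 4L, 0.1);
        System.out.println(auc(scores, Set.of(1L, 3L))); // 0.75
    }
}
```

Averaging this per-user value over users gives one overall number; the double loop is quadratic and only meant for illustration.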

Re: Mahout Binary Recommender Evaluator

2011-07-25 Thread Marko Ciric
On Mon, Jul 25, 2011 at 3:16 AM, Marko Ciric wrote: The better way to do it is to implement an evaluator which accepts the collection of items that are relevant.

Re: Mahout Binary Recommender Evaluator

2011-07-25 Thread Marko Ciric
difficulty is including it in a clean way. Up for a patch? Finally, I believe the documentation page has some mistakes in the last code excerpt: evaluator.evaluate(builder, myModel, null, 3, RecommenderIRStatusEvaluator.CHOOSE_THRESHOLD, 1.0); should be evaluator.evaluate(builder, null, myModel, null, 3, GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0); OK, will look at that.

Re: Evaluating boolean preference data sets

2011-07-21 Thread Marko Ciric
The evaluation could also be done per user, running it manually once for each user, or simply by defining a matrix with the relevant items for each user. On Jul 21, 2011 4:18 PM, "Marko Ciric" wrote: Yes, there should exist an evaluation that allows you to pass whic
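Letting the caller supply the relevant items per user, as suggested, could look like this sketch: a map from user ID to that user's relevant items plays the role of the "matrix with relevant items for each user", each user's top-N list is scored against it, and the results are averaged over users. None of these names are Taste APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PerUserPrecision {
    /** Mean over users of |top-N ∩ relevant| / |top-N|, with caller-supplied relevant items. */
    static double meanPrecisionAtN(Map<Long, List<Long>> recommendations,
                                   Map<Long, Set<Long>> relevantByUser, int n) {
        double total = 0;
        int users = 0;
        for (Map.Entry<Long, Set<Long>> e : relevantByUser.entrySet()) {
            List<Long> recs = recommendations.getOrDefault(e.getKey(), List.of());
            List<Long> top = recs.subList(0, Math.min(n, recs.size()));
            long hits = top.stream().filter(e.getValue()::contains).count();
            total += top.isEmpty() ? 0.0 : (double) hits / top.size(); // no recommendations counts as 0
            users++;
        }
        return users == 0 ? Double.NaN : total / users;
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> recs = Map.of(1L, List.of(10L, 11L, 12L), 2L, List.of(20L, 21L, 22L));
        Map<Long, Set<Long>> relevant = Map.of(1L, Set.of(10L, 12L), 2L, Set.of(99L));
        System.out.println(meanPrecisionAtN(recs, relevant, 2)); // (0.5 + 0.0) / 2 = 0.25
    }
}
```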

Re: Evaluating boolean preference data sets

2011-07-21 Thread Marko Ciric
tings. It has to pick random items as "relevant", for starters. It's another reason your idea is good, to let the user specify those relevant items. On Thu, Jul 21, 2011 at 1:49 PM, Marko Ciric wrote: Hi guys, I wonder if

Evaluating boolean preference data sets

2011-07-21 Thread Marko Ciric
items, the precision and recall would have the same value. Is this OK, or is it a bug, given that precision = intersection / num_recommended_items (where num_recommended_items is almost always "at") and recall = intersection / num_relevant_items (also "at" as the previously mention
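The observation follows from the two formulas: they share the numerator, so whenever both the number of recommended items and the number of relevant items equal "at", precision and recall are necessarily identical. A small sketch computing both for one user; the item IDs are made up.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PrecisionRecallAt {
    static int intersection(List<Long> recommended, Set<Long> relevant) {
        Set<Long> hits = new HashSet<>(recommended);
        hits.retainAll(relevant);
        return hits.size();
    }

    /** intersection / num_recommended_items */
    static double precision(List<Long> recommended, Set<Long> relevant) {
        return recommended.isEmpty() ? 0.0 : (double) intersection(recommended, relevant) / recommended.size();
    }

    /** intersection / num_relevant_items */
    static double recall(List<Long> recommended, Set<Long> relevant) {
        return relevant.isEmpty() ? 0.0 : (double) intersection(recommended, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        List<Long> top3 = List.of(1L, 2L, 3L);          // at = 3
        Set<Long> threeRelevant = Set.of(2L, 3L, 7L);   // also 3 relevant items
        Set<Long> fiveRelevant = Set.of(2L, 3L, 7L, 8L, 9L);
        System.out.println(precision(top3, threeRelevant) + " " + recall(top3, threeRelevant)); // equal
        System.out.println(precision(top3, fiveRelevant) + " " + recall(top3, fiveRelevant));   // 2/3 vs 2/5
    }
}
```

The two values only diverge once the relevant set is chosen independently of "at", which is the point of letting the caller specify it.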

Re: Connection Pooling

2011-07-21 Thread Marko Ciric
M, Vitali Mogilevsky wrote: Hey, I had the same slowness problem while using the MySQL data model. A little research, looking into MySQL's query log, revealed that user-user recommendation just floods the database with thousands and thousands of requests, and that's on a small database. For now I'm dumping the database into a file and using FileDataModel, which works much faster.
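The workaround of dumping the table to a file mainly needs the rows written in the plain delimited layout that FileDataModel reads, one `userID,itemID,preference` line per preference. A minimal sketch of the writing side, assuming the rows have already been fetched (the JDBC query is omitted) and using a hypothetical `Pref` holder class.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class PreferenceDump {
    /** Hypothetical holder for one row of the preferences table. */
    static final class Pref {
        final long userId;
        final long itemId;
        final float value;
        Pref(long userId, long itemId, float value) {
            this.userId = userId;
            this.itemId = itemId;
            this.value = value;
        }
    }

    /** One "userID,itemID,preference" line. */
    static String toLine(Pref p) {
        return p.userId + "," + p.itemId + "," + p.value;
    }

    /** Writes all rows, one per line, overwriting the file. */
    static void write(List<Pref> prefs, Path file) throws IOException {
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (Pref p : prefs) {
                out.write(toLine(p));
                out.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("prefs", ".csv");
        write(List.of(new Pref(1, 101, 5.0f), new Pref(1, 102, 3.5f), new Pref(2, 101, 4.0f)), file);
        System.out.print(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }
}
```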

Re: Exclude by RuleSet

2011-07-04 Thread Marko Ciric
rences? Thanks! On 04.07.2011 12:39, Marko Ciric wrote: Hi Em, If I understood correctly what you're asking, you could implement a new CandidateItemsStrategy class. If you look at that interface, there's this method ge

Re: Exclude by RuleSet

2011-07-04 Thread Marko Ciric
Hi Em, If I understood correctly what you're asking, you could implement a new CandidateItemsStrategy class. If you look at that interface, there's the method getCandidateItems(long userID, DataModel dataModel), which has all the parameters you need in order to filter out items that belong to the unwanted
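Whatever the exact interface signature in a given Mahout version, the filtering inside such a custom strategy amounts to dropping candidate IDs that match the unwanted rule. A standalone sketch of just that step, with the rule expressed as a `Predicate<Long>` backed by a hypothetical category lookup; the surrounding Mahout class is not shown.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

public class RuleFilter {
    /** Keeps only the candidates that the exclusion rule does not match. */
    static Set<Long> filterCandidates(Set<Long> candidates, Predicate<Long> excluded) {
        Set<Long> kept = new TreeSet<>();
        for (long id : candidates) {
            if (!excluded.test(id)) kept.add(id);
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<Long, String> categoryOf = Map.of(1L, "books", 2L, "clearance", 3L, "music"); // hypothetical lookup
        Set<String> unwanted = Set.of("clearance");
        Predicate<Long> rule = id -> unwanted.contains(categoryOf.getOrDefault(id, ""));
        System.out.println(filterCandidates(Set.of(1L, 2L, 3L), rule)); // [1, 3]
    }
}
```

`getOrDefault` guards against items missing from the lookup, since `Set.of(...).contains(null)` would throw.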

Re: Hybrid RecSys — ways to do it

2011-06-27 Thread Marko Ciric
quality or satisfaction indicator and a per-user current model indicator, then you might be able to use these as a feature for an interesting "if it ain't broke, don't fix it" stacking model. On Thu, Jun 9, 2011 at 3:51 PM, Marko Ciric wrote:

Re: Mahout and Colt

2011-06-23 Thread Marko Ciric
framework. But recently there has been talk about switching all of this to use fastutil (?) On Thu, Jun 23, 2011 at 2:25 PM, Marko Ciric wrote: How similar are Mahout collections (like FastMap) to Colt (cern.colt)?

Mahout and Colt

2011-06-23 Thread Marko Ciric
How similar are Mahout collections (like FastMap) to Colt (cern.colt)?

Re: Which is more effective?

2011-06-22 Thread Marko Ciric
I have used the SGD classifiers for content-based recommendation. It works out reasonably, but the interaction variables can get kind of expensive. Doing it again, I t

Which is more effective?

2011-06-21 Thread Marko Ciric
the experience with comparing performance/accuracy of those? Thanks

Re: Hybrid RecSys — ways to do it

2011-06-09 Thread Marko Ciric
eatures is required first, if I'm correct. What features should be used when the recommended items (that need to be classified) are the result of different recommenders that use different similarity calculations (only a "brand" recommender uses an item feature here, and the CF and top-40 recommenders
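When candidates come from several recommenders that score on different scales (CF, the brand recommender, a top-40 list), one simple baseline to compare a classifier or stacking model against is to normalise each recommender's scores and blend them with weights. A minimal sketch; min-max normalisation and the fixed weights are assumptions, and in practice the weights would be tuned or learned.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HybridBlend {
    /** Min-max normalises one recommender's scores into [0, 1]. */
    static Map<Long, Double> normalise(Map<Long, Double> scores) {
        Map<Long, Double> out = new HashMap<>();
        if (scores.isEmpty()) return out;
        double min = Collections.min(scores.values()), max = Collections.max(scores.values());
        for (Map.Entry<Long, Double> e : scores.entrySet()) {
            out.put(e.getKey(), max == min ? 1.0 : (e.getValue() - min) / (max - min));
        }
        return out;
    }

    /** Weighted sum of normalised scores; an item a recommender did not return gets 0 from it. */
    static List<Long> blend(List<Map<Long, Double>> perRecommender, double[] weights, int n) {
        Map<Long, Double> combined = new HashMap<>();
        for (int r = 0; r < perRecommender.size(); r++) {
            for (Map.Entry<Long, Double> e : normalise(perRecommender.get(r)).entrySet()) {
                combined.merge(e.getKey(), weights[r] * e.getValue(), Double::sum);
            }
        }
        List<Long> items = new ArrayList<>(combined.keySet());
        items.sort((a, b) -> Double.compare(combined.get(b), combined.get(a))); // highest score first
        return new ArrayList<>(items.subList(0, Math.min(n, items.size())));
    }

    public static void main(String[] args) {
        Map<Long, Double> cf = Map.of(1L, 4.5, 2L, 3.0, 3L, 1.0);
        Map<Long, Double> brand = Map.of(2L, 1.0, 4L, 0.5);
        Map<Long, Double> top40 = Map.of(3L, 900.0, 4L, 1200.0, 5L, 300.0);
        System.out.println(blend(List.of(cf, brand, top40), new double[]{0.6, 0.3, 0.1}, 3)); // [2, 1, 4]
    }
}
```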

Content-based recommending with Taste

2011-02-25 Thread Marko Ciric
e existing recommender evaluators to evaluate my content-based recommender. Any hints?