is why I'm
considering this Streaming approach now.
Do you think it is worth giving a shot? I'm really
striving for a scalable solution.
Best regards,
Marko
On Tue 02 Jun 2015 12:03:40 AM CEST, Ted Dunning wrote:
The streaming k-means works by building a sketch of th
igger problems than K-means because it's not
scalable, but can be useful in some cases (e.g. it allows more
sophisticated distance measures).
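The sketch-building pass described above can be illustrated very roughly in plain Java. This is a hypothetical one-dimensional toy (the class names and the fixed threshold are mine, and the periodic sketch-collapse step of the real algorithm is omitted):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-pass sketch: each incoming point either merges into the
// nearest sketch centroid or, if it is far away, becomes a new centroid.
public final class StreamingSketch {
    static final class Centroid {
        double mean; long weight;
        Centroid(double v) { mean = v; weight = 1; }
    }

    private final List<Centroid> sketch = new ArrayList<>();
    private final double threshold;

    StreamingSketch(double threshold) { this.threshold = threshold; }

    void add(double point) {
        Centroid nearest = null;
        double best = Double.POSITIVE_INFINITY;
        for (Centroid c : sketch) {
            double d = Math.abs(c.mean - point);
            if (d < best) { best = d; nearest = c; }
        }
        if (nearest == null || best > threshold) {
            sketch.add(new Centroid(point));   // far away: start a new centroid
        } else {
            nearest.weight++;                  // close: fold into a running mean
            nearest.mean += (point - nearest.mean) / nearest.weight;
        }
    }

    int size() { return sketch.size(); }
}
```

The real streaming k-means also grows the threshold and collapses the sketch when it exceeds a size budget, which is what keeps memory bounded.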
What is your opinion about implementation of this?
Best regards,
Marko
Hello everyone,
I was digging through the K-means implementation on Hadoop and I'm a bit
confused about one thing, so I wanted to check.
To calculate the distance from a point to all centroids, the centroids need to
be accessible from every mapper.
So it seemed logical to me to put the centroids (sequenceF
th the implementation,
if it doesn't sound that crazy.
I wish you all the best,
Marko
Quoting Ted Dunning :
On Thu, Jan 15, 2015 at 3:50 AM, Marko Dinic
wrote:
Thank you for your answer. Maybe I gave a wrong picture of my data when
giving a sinusoid as an example; my time series are
calculations,
how much time could I expect for such an algorithm in the case of 10,000
signals with 300 points, for example? How can I even estimate that?
Thanks for your effort, if you have time to answer.
Regards,
Marko
On Thu 15 Jan 2015 05:25:55 AM CET, Anand Avati wrote:
Perhaps you could think o
scalable solution for my problem. I tried to
fit it into what's already implemented in Mahout (for clustering), but
it's not so obvious to me.
I'm open to suggestions, I'm still new to all of this.
Thanks,
Marko
On Sat 10 Jan 2015 07:32:33 AM CET, Ted Dunning wrote:
Why is i
about the scalability?
I would highly appreciate your answer, thanks.
On Thu 08 Jan 2015 08:19:18 PM CET, Ted Dunning wrote:
On Thu, Jan 8, 2015 at 7:00 AM, Marko Dinic
wrote:
1) Is there an implementation of DTW (Dynamic Time Warping) in Mahout that
could be used as a distance measure for cluster
Hello everyone.
I have a couple of questions.
1) Is there an implementation of DTW (Dynamic Time Warping) in Mahout
that could be used as a distance measure for clustering?
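For reference, the classic DTW recurrence itself is compact. Here is a hypothetical stand-alone sketch (class name is mine; it is not tied to Mahout's DistanceMeasure interface) computing the O(n*m) dynamic-programming alignment cost:

```java
import java.util.Arrays;

// Hypothetical minimal DTW sketch: d[i][j] is the cheapest cost of aligning
// the first i points of a with the first j points of b.
public final class DtwSketch {
    public static double distance(double[] a, double[] b) {
        int n = a.length, m = b.length;
        double[][] d = new double[n + 1][m + 1];
        for (double[] row : d) Arrays.fill(row, Double.POSITIVE_INFINITY);
        d[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double cost = Math.abs(a[i - 1] - b[j - 1]);
                // extend the cheapest of the three allowed predecessor alignments
                d[i][j] = cost + Math.min(d[i - 1][j - 1],
                                          Math.min(d[i - 1][j], d[i][j - 1]));
            }
        }
        return d[n][m];
    }
}
```

Note that DTW is not a metric (it violates the triangle inequality), which matters for clustering algorithms that assume one.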
2) Why isn't there an implementation of K-mediods in Mahout? I'm
guessing that it could not be implemented efficiently
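One reason k-medoids is hard to scale is the medoid-update step: every candidate medoid must be compared against every other member of its cluster, which is O(n^2) per cluster and needs all pairwise distances at hand. A hypothetical sketch of just that step (names are mine):

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical medoid-update sketch: the medoid is the cluster member that
// minimizes the summed distance to all other members.
public final class MedoidSketch {
    public static <T> T medoid(List<T> cluster, ToDoubleBiFunction<T, T> dist) {
        T best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (T candidate : cluster) {
            double cost = 0.0;
            for (T other : cluster) {
                cost += dist.applyAsDouble(candidate, other);
            }
            if (cost < bestCost) { bestCost = cost; best = candidate; }
        }
        return best;
    }
}
```

By contrast, k-means only needs a running mean per cluster, which maps cleanly onto a combiner/reducer.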
Hello,
Sorry for bumping like this, but I have a very similar question: can I
use Mahout 0.9 with Hadoop 0.20.2?
Thanks
On Mon 15 Dec 2014 10:09:56 AM CET, jyotiranjan panda wrote:
Hi,
mahout-0.9 is compatible with hadoop-1.2.1
Regards
Jyoti Ranjan Panda
On Mon, Dec 15, 2014 at 2:33 PM, Le
since Hadoop is installed on the cluster?
I have never done a deployment to a cluster, so I'm really confused. Any
help would be great, as would any reference like the previous one.
Regards,
Marko
On Tue, 28 Oct 2014 17:12:59 CET, Chandramani Tiwary wrote:
Hi Marko,
Nothing special needs to b
xpect failures in case of it?
Regards,
Marko
On Tue, 28 Oct 2014 16:48:03 CET, Chandramani Tiwary wrote:
Hi Marko,
You can configure Mahout 0.9 over Hadoop 0.20.2 but the Hadoop dependencies
might lead to failures quite a few times. One example, if I remember
correctly, is that Hadoop 0
Hello,
I have a Hadoop cluster on which Hadoop 0.20.2 is installed. Is there a
way to use Mahout 0.9 on that cluster?
I understand that Mahout 0.9 is based on Hadoop 1.2.1, but I have this
constraint, so I cannot install another version of Hadoop on it.
Thanks,
Marko
o how many points?
Is it possible to share your dataset to troubleshoot?
On Thu, Oct 9, 2014 at 9:18 AM, Marko Dinić
wrote:
Suneel,
Thank you for your answer, this was rather strange to me.
The number of points is 942. I have multiple runs, in each run I have a
loop in which number of cluste
Here is the dataset.
On Thu, 09 Oct 2014 16:53:25 CEST, Marko Dinić wrote:
Yes it is small, but it is just a sample, so the dataset will probably
be much bigger. So you think that this was the problem? Will this
problem be avoided in the case of a larger dataset?
I think that there were
share your dataset to troubleshoot?
On Thu, Oct 9, 2014 at 9:18 AM, Marko Dinić
wrote:
Suneel,
Thank you for your answer, this was rather strange to me.
The number of points is 942. I have multiple runs, and in each run I have a
loop in which the number of clusters is increased in each iteration
take a look at this.
On Thu, Oct 9, 2014 at 5:39 AM, Marko Dinić
wrote:
Hello everyone,
I'm using Mahout Streaming K Means multiple times in a loop, every time
with the same input data, and the output path is always different. Concretely,
I'm increasing the number of clusters in each iteration.
Hello everyone,
I'm using Mahout Streaming K Means multiple times in a loop, every time
with the same input data, and the output path is always different. Concretely,
I'm increasing the number of clusters in each iteration. Currently it is run
on a single machine.
A couple of times (maybe 3 out of 20 runs) I
that is performed after the streaming step. The question that
arises is: when should the Ball K Means step be done, since the data arrives
all the time...
Should I even consider this, or should I go for lambda architecture?
Any help would be great.
Thanks,
Marko
ster-reuters.sh
that you have provided: what is it used for?
Thanks,
Marko
On Mon, 29 Sep 2014 20:00:33 CEST, Suneel Marthi wrote:
This was replied to earlier with the details you are looking for; repeating
here again:
See
http://stackoverflow.com/questions/17272296/how-to-use-mah
Hello everyone,
I have previously asked a question about Streaming K Means examples, and
got an answer that there are not many available.
Can anyone give me an example of how to call Streaming K Means clustering
on a dataset, and how to get the results?
What are the results, are they the s
Hello everyone,
I'm very sorry to bump in like this. I have been added to the mailing list
(I think), but it seems that I'm somehow unable to ask a question; that
is, I asked a question a few times and got no answer. I hope this way
will work.
I'm new to Mahout and I've been struggling with Stre
Hello,
I know that Mahout is used for batch processing, but I am interested in
whether, and how, I can use its KMeans for clustering individual points.
Let's say that we have the following situation:
* Global clustering, that performs batch processing on all data and
gives centroids as result
* One p
Configuration configuration = new Configuration();
configuration.set("--estimatedNumMapClusters", "18");
configuration.set("-k", "6");
configuration.set("--distanceMeasure",
"org.apache.mahout.common.distance.
item
> > description content) for example for product recommendation, how can I
> > customize the similarity function ? As far as I understand, the current
> > mahout similarity function is based on user rating only. Any one had
> > experience writing a custom item based similarity
> http://ehcache.org/
> > >
> > > For iterative MapReduce applications running on a NoSQL data store, it
> > > should provide a good performance boost by providing an in-memory
> object
> > > cache (I think). Any comments?
> >
>
--
--
Marko Ćirić
ciric.ma...@gmail.com
g Cassandra
> > and/or the non-distributed recommenders.
> >
> > Sean
> >
>
--
--
Marko Ćirić
ciric.ma...@gmail.com
You could also introduce clustering and build clusters from pages that have
a lot of similar words. If your pages data doesn't change too often, you
could select most similar pages from within a cluster and recommend it to a
user..
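For the "a lot of similar words" part, one hypothetical way to score page pairs is cosine similarity over raw term counts (names are mine; no Mahout API and no TF-IDF weighting):

```java
import java.util.Map;

// Hypothetical sketch: cosine similarity between two term-frequency maps.
public final class PageSimilaritySketch {
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += (double) e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());       // shared terms contribute to the dot product
            if (other != null) dot += (double) e.getValue() * other;
        }
        for (int v : b.values()) normB += (double) v * v;
        return (normA == 0 || normB == 0)
                ? 0.0
                : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```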
On Aug 8, 2011 6:08 PM, "Marko Ciric" wrote:
>
You might want to use TanimotoCoefficientSimilarity if your data set isn't
large.
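For reference, the Tanimoto (Jaccard) coefficient over the sets of users associated with two items reduces to |A ∩ B| / |A ∪ B|. A hypothetical stand-alone sketch (class name is mine, not Mahout's TanimotoCoefficientSimilarity):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical Tanimoto sketch: intersection size over union size of the
// user-ID sets that interacted with each item.
public final class TanimotoSketch {
    public static double similarity(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return union == 0 ? 0.0 : (double) intersection.size() / union;
    }
}
```

The "isn't large" caveat matters because with sparse overlap the coefficient gets noisy; that is part of why log-likelihood is suggested next.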
On Jul 27, 2011 10:51 AM, "Sean Owen" wrote:
> Sounds good. In that case, the surprise-n-coincidence counterpart you are
> probably looking for is LogLikelihoodSimilarity, which implements
> ItemSimilarity. Use it wi
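The log-likelihood ratio test underneath can be sketched in the entropy formulation. This follows the general shape of Mahout's LogLikelihood utility, but exact class details may differ by version, so treat it as an illustration: k11 is the count of users who liked both items, k12/k21 liked only one, k22 liked neither.

```java
// Hypothetical LLR sketch in the entropy formulation:
// LLR = 2 * (H(row sums) + H(column sums) - H(cells)).
public final class LlrSketch {
    static double xLogX(long x) { return x == 0 ? 0.0 : x * Math.log(x); }

    static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long c : counts) { result += xLogX(c); sum += c; }
        return xLogX(sum) - result;
    }

    public static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // guard against tiny negative values from floating-point rounding
        if (rowEntropy + columnEntropy < matrixEntropy) return 0.0;
        return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
    }
}
```

Independent counts score near zero; strong co-occurrence scores high, without needing ratings at all.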
Correction: I didn't mean to re-implement the existing functionality, but
there should be an easy way to connect AUC with the Taste evaluators.
On 28 July 2011 12:57, Marko Ciric wrote:
> I think it wouldn't be a big problem to reimplement it, though it would
> have to have a sort o
l, we do have numerous ways to compute AUC. I don't think that they are
> integrated into the recommendation evaluation framework yet. Would you
> like
> to take on the application of suitable glue?
>
>
> On Mon, Jul 25, 2011 at 1:00 PM, Marko Ciric
> wrote:
>
> >
Hi guys,
I'm wondering if any resources or tutorials are available (and where) about
calculating AUC when working with boolean-preference data models?
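With boolean preferences there are no predicted ratings to threshold, so one common reading of AUC is rank-based: the probability that a held-out "liked" item is scored above a random non-liked item. A hypothetical sketch (names are mine; this is not a Taste evaluator):

```java
// Hypothetical rank-based AUC sketch over recommender scores: count the
// positive/negative pairs where the positive wins, with ties worth half.
public final class AucSketch {
    public static double auc(double[] positiveScores, double[] negativeScores) {
        long wins = 0, ties = 0;
        for (double p : positiveScores) {
            for (double n : negativeScores) {
                if (p > n) wins++;
                else if (p == n) ties++;
            }
        }
        long pairs = (long) positiveScores.length * negativeScores.length;
        return pairs == 0 ? 0.0 : (wins + 0.5 * ties) / pairs;
    }
}
```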
--
--
Marko Ćirić
ciric.ma...@gmail.com
On Mon, Jul 25, 2011 at 3:16 AM, Marko Ciric
> wrote:
>
> > The better way to do it is to implement an evaluator which accepts the
> > collection of items that are relevant.
> >
>
--
--
Marko Ćirić
ciric.ma...@gmail.com
difficulty is including
> it in a clean way. Up for a patch?
>
>
>
> >
> > Finally, I believe the documentation page has some mistakes in the last
> code
> > excerpt:
> >
> > evaluator.evaluate(builder, myModel, null, 3,
> > RecommenderIRStatusEvaluator.CHOOSE_THRESHOLD,
> > 1.0);
> >
> > should be
> > evaluator.evaluate(builder, null, myModel, null, 3,
> > GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
> >
> >
> > OK will look at that.
>
--
--
Marko Ćirić
ciric.ma...@gmail.com
Also, the evaluation could be done per user, by manually running it
multiple times, once for each user. Or simply by defining a matrix with
relevant items for each user.
On Jul 21, 2011 4:18 PM, "Marko Ciric" wrote:
> Yes, there should exist an evaluation that allows you to pass whic
tings.
> It has to pick random items as "relevant", for starters. It's another
> reason
> your idea is good, to let the user specify those relevant items.
>
> On Thu, Jul 21, 2011 at 1:49 PM, Marko Ciric
> wrote:
>
> > Hi guys,
> >
> > I wonder if
items, the precision and
recall would have the same value. Is this OK or is it a bug, given that
precision = intersection / num_recommended_items (where
num_recommended_items is almost always "at")
recall = intersection / num_relevant_items (also "at" as the previously
mention
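The two formulas in question can be sketched directly (hypothetical helper, not Mahout's evaluator): when the number of relevant items per user happens to equal "at", the two denominators coincide, and precision@at equals recall@at by construction.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical precision@at / recall@at sketch:
// precision = |top-at ∩ relevant| / at, recall = |top-at ∩ relevant| / |relevant|.
public final class IrStatsSketch {
    public static double precisionAt(List<Long> recommended, Set<Long> relevant, int at) {
        return (double) hits(recommended, relevant, at) / at;
    }

    public static double recallAt(List<Long> recommended, Set<Long> relevant, int at) {
        return relevant.isEmpty() ? 0.0
                : (double) hits(recommended, relevant, at) / relevant.size();
    }

    private static long hits(List<Long> recommended, Set<Long> relevant, int at) {
        Set<Long> top = new HashSet<>(recommended.subList(0, Math.min(at, recommended.size())));
        top.retainAll(relevant);
        return top.size();
    }
}
```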
M, Vitali Mogilevsky
> > >> > wrote:
> > >> >
> > >> >> Hey,
> > >> >> I got the same problem of slowness while using the MySQL data model.
> > >> >> Some research and a look into MySQL's query log revealed that
> > >> >> user-user recommendation just floods the database with thousands and
> > >> >> thousands of requests, and that's on a small database.
> > >> >> For now I'm dumping the database into a file and using the file data
> > >> >> model, which works much faster.
> > >> >>
> > >> >>
> > >> >
> > >>
> > >
> >
>
--
--
Marko Ćirić
ciric.ma...@gmail.com
rences?
>
> Thanks!
>
>
> On 04.07.2011 12:39, Marko Ciric wrote:
> >
> > Hi Em,
> >
> > If I understood correctly what you're asking, you could implement a new
> > CandidateItemsStrategy class. If you look at that interface, there's a
> > method ge
Hi Em,
If I understood correctly what you're asking, you could implement a new
CandidateItemsStrategy class. If you look at that interface, there's a
method getCandidateItems(long userID, DataModel dataModel) that has all
the parameters you need in order to filter out items that belong to the
unwanted
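Stripped of the Mahout types, the filtering idea looks like this hypothetical sketch (the category maps and all names are made up for illustration):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical candidate-filtering sketch: given each item's category and a
// per-user set of unwanted categories, keep only the items the strategy
// should hand to the recommender.
public final class CandidateFilterSketch {
    public static Set<Long> candidates(long userId,
                                       Map<Long, String> itemCategory,
                                       Map<Long, Set<String>> unwantedByUser) {
        Set<String> unwanted = unwantedByUser.getOrDefault(userId, Set.of());
        Set<Long> result = new HashSet<>();
        for (Map.Entry<Long, String> e : itemCategory.entrySet()) {
            if (!unwanted.contains(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```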
quality or satisfaction indicator and a
> per-user current model indicator, then you might be able to use these
> as a feature for an interesting "if it ain't broke, don't fix it"
> stacking model.
>
> On Thu, Jun 9, 2011 at 3:51 PM, Marko Ciric wrote:
> &
framework.
But recently there has been talk about switching all of this to use fastutil
(?)
On Thu, Jun 23, 2011 at 2:25 PM, Marko Ciric wrote:
How similar are Mahout collections (like FastMap) to Colt (cern.colt)?
--
Marko Ćirić
ciric.ma...@gmail.com
How similar are Mahout collections (like FastMap) to Colt (cern.colt)?
--
Marko Ćirić
ciric.ma...@gmail.com
> >>>
> >>>> I have used the SGD classifiers for content based recommendation. It
> >>> works
> >>>> out reasonably but the interaction variables can get kind of
> expensive.
> >>>>
> >>>> Doing it again, I t
the experience with comparing performance/accuracy of those?
Thanks
--
Marko Ćirić
ciric.ma...@gmail.com
eatures is required
first if I'm correct. What features to use when the recommended items (that
need to be classified) are a result of different recommenders that use
different similarity calculation (only a "brand" recommender is using an
item feature here and CF and top-40 recommenders
e existing recommender
evaluators to evaluate my content-based recommender. Any hints?
--
--
Marko Ćirić
ciric.ma...@gmail.com