Re: [ANNOUNCE] Andrew Musselman, New Mahout PMC Chair

2018-07-19 Thread Sebastian Schelter
Congrats! 2018-07-19 9:31 GMT+02:00 Peng Zhang : > Congrats Andrew! > > On Thu, Jul 19, 2018 at 04:01 Andrew Musselman > > wrote: > > > Thanks Andy, looking forward to it! Thank you too for your support and > > dedication the past two years; here's to continued progress! > > > > Best > > Andrew

Re: Does mahout 0.5 fit hadoop-0.20.2?

2014-06-25 Thread Sebastian Schelter
Please use a recent version of mahout. 0.4 and 0.5 are totally outdated. -s On 06/25/2014 09:05 AM, seabiscuit08 wrote: > Hi everyone, i am new in mahout. > Our hadoop cluster is hadoop-0.20.2 ,i try out mahout-distribution-0.4 lda > function, and it works well. But It can't inference new documen

Re: divide a vector (sum) by a double, error

2014-06-16 Thread Sebastian Schelter
It's also not a good idea to put the vectors into a HashSet; I don't think we have equals and hashCode correctly implemented for that. On 16.06.2014 18:21, "Ted Dunning" wrote: > Patrice, > > This sounds like a classpath problem more than code error. Are you sure > that you can run any program th

Re: Performance issues in Mahout recommendations

2014-06-06 Thread Sebastian Schelter
thought of moving to Mahout. However, it seems like, for now, it's better to go with the single machine implementation. Thanks for your suggestions, Warunika On Fri, Jun 6, 2014 at 3:36 PM, Sebastian Schelter wrote: 1M ratings take up something like 20 megabytes. This is a datasize whe

Re: Performance issues in Mahout recommendations

2014-06-06 Thread Sebastian Schelter
same performance level. What is the average running time for the Mahout distributed recommendation job on 1 million ratings? Does it usually take more than 1 minute? Thanks in advance, Warunika On Fri, Jun 6, 2014 at 2:42 PM, Sebastian Schelter wrote: You should not use Hadoop for such a

Re: Performance issues in Mahout recommendations

2014-06-06 Thread Sebastian Schelter
You should not use Hadoop for such a tiny dataset. Use the GenericItemBasedRecommender on a single machine in Java. --sebastian On 06/06/2014 11:10 AM, Warunika Ranaweera wrote: Hi, I am using Mahout's recommenditembased algorithm on a data set with nearly 10,000 (implicit) user ratings. This

Re: Indicator Matrix and Mahout + Solr recommender

2014-05-27 Thread Sebastian Schelter
I have added the threshold merely as a way to increase the performance of RowSimilarityJob. If a threshold is given, some item pairs don't need to be looked at. A simple example is if you use cooccurrence count as the similarity measure, and set a threshold of n cooccurrences, then any pair contain
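One way to read the threshold remark above: with cooccurrence count as the similarity measure, a pair can never cooccur more often than its rarer member occurs, so any row with fewer than n non-zero entries can be skipped before pairs are formed. The sketch below illustrates that pruning on a tiny in-memory boolean matrix. The class and method names are hypothetical, and this is not RowSimilarityJob's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from Mahout's RowSimilarityJob.
final class CooccurrenceThresholdSketch {

  /** Number of non-zero entries in a row (e.g. users who interacted with an item). */
  static int occurrences(boolean[] row) {
    int count = 0;
    for (boolean value : row) {
      if (value) {
        count++;
      }
    }
    return count;
  }

  /** Number of columns in which both rows are non-zero (rows assumed equally long). */
  static int cooccurrences(boolean[] a, boolean[] b) {
    int count = 0;
    for (int i = 0; i < a.length; i++) {
      if (a[i] && b[i]) {
        count++;
      }
    }
    return count;
  }

  /** Returns the row pairs {i, j} whose cooccurrence count reaches the threshold. */
  static List<int[]> similarPairs(boolean[][] rows, int threshold) {
    int[] counts = new int[rows.length];
    for (int i = 0; i < rows.length; i++) {
      counts[i] = occurrences(rows[i]);
    }
    List<int[]> pairs = new ArrayList<>();
    for (int i = 0; i < rows.length; i++) {
      if (counts[i] < threshold) {
        continue; // no pair containing row i can reach the threshold
      }
      for (int j = i + 1; j < rows.length; j++) {
        if (counts[j] < threshold) {
          continue; // same argument for row j
        }
        if (cooccurrences(rows[i], rows[j]) >= threshold) {
          pairs.add(new int[] {i, j});
        }
      }
    }
    return pairs;
  }

  public static void main(String[] args) {
    boolean[][] rows = {
      {true, true, true, false},
      {true, true, false, false},
      {false, false, false, true}
    };
    for (int[] pair : similarPairs(rows, 2)) {
      System.out.println(pair[0] + "," + pair[1]);
    }
  }
}
```

The pruning only saves work when many rows fall below the threshold; the full pairwise comparison still runs for the rows that remain.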

Re: Calculation of Kappa value

2014-05-25 Thread Sebastian Schelter
Could be a bug introduced by a recent modification of the Confusion matrix. Are you using trunk? Can you provide a patch that fixes the issue? Best, Sebastian On 05/25/2014 05:05 PM, Michael Christopher wrote: Hi With the Confusion Matrix below I get a Kappa value of 0,8023. Actually it shou

Re: Setting mahout heapsize for rowsimilarity job

2014-05-23 Thread Sebastian Schelter
I don't think you should use RowSimilarity job for that case, if you only have 6 columns. Can you tell us a little bit about the data and what problem you are trying to solve? --sebastian On 05/23/2014 09:03 PM, Suneel Marthi wrote: I had seen this issue too with RSJ until 0.8. Switch to

Re: Theory behind LogisticRegression in Mahout

2014-05-23 Thread Sebastian Schelter
We should add these links to the LR page on the website. --s On 05/23/2014 03:20 PM, Ted Dunning wrote: Ahh... my error then. Happily, Dmitriy and others have provided the requisite links. On Thu, May 22, 2014 at 11:50 PM, namit maheshwari < namitmaheshwa...@gmail.com> wrote: No I didnt fin

Re: Running KMeans with the new spark bindings

2014-05-22 Thread Sebastian Schelter
Hi Meethu, K-Means has not been ported to the spark bindings DSL yet. You have to use the MapReduce implementation. Best, Sebastian On 05/22/2014 11:05 AM, MEETHU MATHEW wrote: Hi, I am a beginner in MAHOUT. I have run kmeans clustering for various datasets. Can anyone tell me how to run

Re: Mahout recommendation in implicit feedback situation

2014-05-05 Thread Sebastian Schelter
ere is the error in this code? Thank you. On 05/03/14 16:42, Sebastian Schelter wrote: You should try the org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender which has been built to handle such data. Best, Sebastian On 05/03/2014 04:34 PM, Alessandro Sugli

Re: Fwd: Mahout Naive Bayes CSV Classification

2014-05-04 Thread Sebastian Schelter
Hi Jossef, You have to vectorize and normalize your data. The input for naive bayes is a sequencefile containing a Text object as key (your label) and a VectorWritable that holds a vector with the data. Instructions to run NaiveBayes can be found here: https://mahout.apache.org/users/classif

Re: Mahout recommendation in implicit feedback situation

2014-05-03 Thread Sebastian Schelter
mmendation appropriately." On 05/03/14 16:25, Sebastian Schelter wrote: Hi Alessandro, what result do you expect and what do you get? Can you give a concrete example? --sebastian On 05/03/2014 12:11 PM, Alessandro Suglia wrote: Good morning, I've tried to create a recommender system using M

Re: Mahout recommendation in implicit feedback situation

2014-05-03 Thread Sebastian Schelter
Hi Alessandro, what result do you expect and what do you get? Can you give a concrete example? --sebastian On 05/03/2014 12:11 PM, Alessandro Suglia wrote: Good morning, I've tried to create a recommender system using Mahout in an implicit feedback situation. What I'm trying to do is explai

Re: Future of Frequent Pattern Mining

2014-05-01 Thread Sebastian Schelter
d it who promised to maintain it and has not been heard from. On Mon, Apr 28, 2014 at 2:19 AM, Sebastian Schelter wrote: Hi, I'm resending this mail to also include the users list. To wrap up: We currently have a discussion whether our frequent pattern mining package should stay in the code

Re: Future of Frequent Pattern Mining

2014-04-28 Thread Sebastian Schelter
pr 28, 2014 at 2:19 AM, Sebastian Schelter wrote: Hi, I'm resending this mail to also include the users list. To wrap up: We currently have a discussion whether our frequent pattern mining package should stay in the codebase. The original author suggested to remove the original implementation

Re: Reading the wiki

2014-04-27 Thread Sebastian Schelter
apease the browsers under https handshake. I am not sure what would be associated with that, I am not sure if mathjax is solely static content or it is an actual server doing something. On Sun, Apr 27, 2014 at 12:41 AM, Sebastian Schelter wrote: What if we store a copy of the js file on our site and

Re: Future of Frequent Pattern Mining

2014-04-27 Thread Sebastian Schelter
M, Michael Wechner wrote: what is the alternative and if one would still want to use the "frequent pattern mining code" in the future, how would this be possible otherwise? Thanks Michael On 28.04.14 08:19, Sebastian Schelter wrote: Hi, I'm resending this mail to also include t

Future of Frequent Pattern Mining

2014-04-27 Thread Sebastian Schelter
Hi, I'm resending this mail to also include the users list. To wrap up: We currently have a discussion whether our frequent pattern mining package should stay in the codebase. The original author suggested to remove the original implementation and maybe retain the FPGrowth2 implementation. I

Re: Reading the wiki

2014-04-27 Thread Sebastian Schelter
What if we store a copy of the js file on our site and also serve it via https? On 04/27/2014 05:34 AM, Pat Ferrel wrote: Often CMSs have a way to configure https access to be used only for password or other secure areas of the site. No idea if the Apache CMS does this but worth asking. If th

Welcome Pat Ferrel as new committer on Mahout

2014-04-24 Thread Sebastian Schelter
Hi, this is to announce that the Project Management Committee (PMC) for Apache Mahout has asked Pat Ferrel to become committer and we are pleased to announce that he has accepted. Being a committer enables easier contribution to the project since in addition to posting patches on JIRA it als

Re: org.apache.mahout.math.IndexException

2014-04-20 Thread Sebastian Schelter
patch before. I found the following page http://mahout.apache.org/developers/patch-check-list.html is that description enough for applying a patch? On Sat, Apr 19, 2014 at 2:23 AM, Sebastian Schelter wrote: Mario, could you check whether the patch from https://issues.apache.org/ jira/browse

Re: Spark Mahout with a CLI?

2014-04-20 Thread Sebastian Schelter
I'll create a jira ticket for this, as I have a little time to work on it. On 04/16/2014 08:15 PM, Pat Ferrel wrote: bug in the pseudo code, should use columnIds: val hashedCrossIndicatorMatrix = new HashedSparseMatrix(indicatorMatrices(1), hashedDrms(0).columnIds(), hashedDrms(1).columnI

Re: org.apache.mahout.math.IndexException

2014-04-18 Thread Sebastian Schelter
Mario, could you check whether the patch from https://issues.apache.org/jira/browse/MAHOUT-1517 fixes your problem? Best, Sebastian On 04/18/2014 11:03 PM, Mario Levitin wrote: In my dataset ID's are strings so I use MemoryIDMigrator. This migrator produces large longs. I'm not doing any tra

Re: org.apache.mahout.math.IndexException

2014-04-18 Thread Sebastian Schelter
Hi Mario, this is indeed a bug. The problem is that the CF code (taste) uses long ids, while our math library internally uses int keys. I'll open a jira and post a patch that will hopefully help you. --sebastian On 04/18/2014 11:03 PM, Mario Levitin wrote: In my dataset ID's are strings so I
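To make the long-id versus int-key mismatch concrete, the sketch below shows what a plain narrowing cast does to large long ids (such as those an id migrator can produce) and contrasts it with an explicit dense index. This is an illustration under assumptions, not the patch mentioned in the thread; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; this is not the patch mentioned in the thread.
final class LongIdToIntKeySketch {

  /** Narrowing cast: keeps only the low 32 bits, so large ids can turn negative or collide. */
  static int naiveKey(long id) {
    return (int) id;
  }

  private final Map<Long, Integer> indexById = new HashMap<>();

  /** Assigns each distinct long id the next free non-negative int index. */
  int denseKey(long id) {
    Integer existing = indexById.get(id);
    if (existing != null) {
      return existing;
    }
    int next = indexById.size();
    indexById.put(id, next);
    return next;
  }

  public static void main(String[] args) {
    long largeId = 3_000_000_000L;
    System.out.println("naive key: " + naiveKey(largeId)); // negative, unusable as a vector index
    LongIdToIntKeySketch index = new LongIdToIntKeySketch();
    System.out.println("dense key: " + index.denseKey(largeId));
  }
}
```

A dense index costs memory for the map, but it guarantees keys in the range 0 to (number of ids - 1), which is what an int-indexed vector needs.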

Re: Installation on Ubuntu

2014-04-18 Thread Sebastian Schelter
You can, but I'm not sure how much we can help you. Give it a try :) On 04/18/2014 10:11 PM, Christopher Eugene wrote: sorry I thought I replied to it :). I can ask predictionio related questions on the list too? On Fri, Apr 18, 2014 at 11:06 PM, Sebastian Schelter wrote: Please rep

Re: Installation on Ubuntu

2014-04-18 Thread Sebastian Schelter
That is wrong, but you could use a server such as PredictionIO (which uses Mahout internally) with PHP. --sebastian On 04/18/2014 09:49 PM, Christopher Eugene wrote: @sebastian I have version 1.7. @Andrew I plan on using mahout with php since I heard that there is a new API or am I wrong? On

Re: Installation on Ubuntu

2014-04-18 Thread Sebastian Schelter
Which version do you use? It shouldn't be a problem with Oracle Java. --sebastian On 04/18/2014 09:39 PM, Christopher Eugene wrote: Hello, I want to install mahout on Ubuntu 14.04. I had previously tried in vain to install on 13.10. Could the version of Java be the problem? I am compiling from

Re: Performance Issue using item-based approach!

2014-04-18 Thread Sebastian Schelter
dateItemsStrategy.html candidateItemsStrategy,MostSimilarItemsCandidateItemsStrategy< https://builds.apache.org/job/mahout-quality/javadoc/org/apache/mahout/cf/taste/recommender/MostSimilarItemsCandidateItemsStrategy.html mostSimilarItemsCandidateItemsStrategy) Am 17.04.2014 um 12:41 schrieb Sebastian Schelte

Re: simple idea for improving mahout docs over the next month?

2014-04-18 Thread Sebastian Schelter
t all the information from a mailing list search but i think a rolling FAQ would much more (1) be likely evolve into real documentation and (2) be more easily refined . Is that a little convincing ? If not i guess we can table the idea/// just a thought. On Thu, Apr 17, 2014 at 1:38 AM, Seba

Re: Is there any website documentation repository or tool for Apache Mahout?

2014-04-17 Thread Sebastian Schelter
The templates for the individual pages are in the svn under site/ in markdown format. You can use an online markdown editor to approximately see what they look like. We don't have a better solution yet, unfortunately. --sebastian On 17.04.2014 20:09, "Andrew Musselman" wrote: > The content of t

Re: Performance Issue using item-based approach!

2014-04-17 Thread Sebastian Schelter
arItemsCandidateItemsStrategy<https://builds.apache.org/job/mahout-quality/javadoc/org/apache/mahout/cf/taste/recommender/MostSimilarItemsCandidateItemsStrategy.html> mostSimilarItemsCandidateItemsStrategy) On 17.04.2014 at 12:41, Sebastian Schelter wrote: Hi Najum, I think I found t

Re: Performance Issue using item-based approach!

2014-04-17 Thread Sebastian Schelter
hrieb Sebastian Schelter : Hi Najum, I think I found the problem. Remember: Two items are similar whenever at least one user interacted with both of them ("the items co-occur"). In the movielens dataset this is true for almost all pairs of items, unfortunately. From 3076 items, more

Re: Performance Issue using item-based approach!

2014-04-17 Thread Sebastian Schelter
almost same results. Although what I also don't understand is, why am I getting different RecommendItems? That really frustrates me… You can find the Java file in the attachment. Greetings from Germany, Najum On 17.04.2014 at 11:44, Sebastian Schelter <s...@apache.org> wrote: Yes, ju

Re: Performance Issue using item-based approach!

2014-04-17 Thread Sebastian Schelter
Yes, just to make sure the problem is in the mahout code and not in the surrounding environment. On 04/17/2014 11:43 AM, Najum Ali wrote: @Sebastian What do u mean with a standalone recommender? A simple offline java main program? On 17.04.2014 at 11:41, Sebastian Schelter wrote: Could

Re: Performance Issue using item-based approach!

2014-04-17 Thread Sebastian Schelter
og are also item-based using pre computed similarity The last log is the userbased recommender using pearson Look at the huge time difference! On 17.04.2014 at 11:23, Sebastian Schelter <s...@apache.org> wrote: Najum, this is really strange, feeding an ItemBased Recommender

Re: Performance Issue using item-based approach!

2014-04-17 Thread Sebastian Schelter
Najum, this is really strange, feeding an ItemBased Recommender with precomputed similarities should give you superfast recommendations. Are you sure that the precomputation is done only once and not in every request? --sebastian On 04/17/2014 11:17 AM, Najum Ali wrote: Hi guys, I have c

Re: simple idea for improving mahout docs over the next month?

2014-04-17 Thread Sebastian Schelter
Hi Najum, please write a new mail to ask a question and don't reply to an unrelated thread --> https://people.apache.org/~hossman/#threadhijack If you write a new mail, I'm sure we can help you with your recommender problem. Can you give us a few more details, such as the similarity that you

Re: simple idea for improving mahout docs over the next month?

2014-04-16 Thread Sebastian Schelter
Hi Jay, I'm not sure what the benefit of this approach is. People can already post their questions to the mailing list and get answers here; why would a Google Doc be helpful? --sebastian On 04/16/2014 09:31 PM, Jay Vyas wrote: hi mahout... i finally thought of a really easy way of ad-hoc im

Documentation, Documentation, Documentation

2014-04-13 Thread Sebastian Schelter
Hi, this is another reminder that we still have to finish our documentation improvements! The website looks shiny now and there have been lots of discussions about new directions but we still have some work to do in cleaning up webpages. We should especially make sure that the examples work.

Re: PreferenceArray userID uniqeness?

2014-04-11 Thread Sebastian Schelter
Yes, it's a unique identifier for a user. --sebastian On 04/11/2014 04:41 PM, Mike Summers wrote: Does the userId of a preferenceArray need to be unique across all entries in a FastByIDMap? I'm comparing two types of objects that contain the same set of traits however it's possible that the use

Re: Can any one help

2014-04-08 Thread Sebastian Schelter
It seems there is a problem with your hdfs, how did you configure that? --sebastian On 04/08/2014 07:23 PM, Neetha wrote: Hi, I am trying to run Mahout -kmeans clustering on hadoop, but I am getting this error, hduser3@ubuntu:/usr/local/hadoop-1.0.1/mahout3$ bin/mahout seqdirectory \-i maho

Re: Best practice for partial cartesian product

2014-04-08 Thread Sebastian Schelter
response. Could you or anyone point me to the mahout classes where this is being solved? thank you guys reinis On 08.04.2014 10:27, Sebastian Schelter wrote: I don't know a good name for that. The problem is that a quadratic number of pairs needs to be emitted here. In our collabor

Re: Best practice for partial cartesian product

2014-04-08 Thread Sebastian Schelter
I don't know a good name for that. The problem is that a quadratic number of pairs needs to be emitted here. In our collaborative filtering code, we solve this through downsampling. --sebastian On 04/08/2014 10:08 AM, Reinis Vicups wrote: Hi, this is not mahout question directly, but I figu
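To illustrate why downsampling helps with the quadratic blow-up: a group of n elements produces n(n-1)/2 pairs, so capping the group size before emitting pairs bounds the work per group. The sketch below is a minimal illustration; the cap of 500 and all names are assumptions, and Mahout's actual sampling strategy may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of capping group size before emitting pairs.
final class PairDownsamplingSketch {

  /** Number of unordered pairs that n elements produce. */
  static long pairCount(int n) {
    return (long) n * (n - 1) / 2;
  }

  /** Returns at most maxItems elements, chosen uniformly at random without replacement. */
  static <T> List<T> downsample(List<T> items, int maxItems, Random random) {
    List<T> copy = new ArrayList<>(items);
    if (copy.size() <= maxItems) {
      return copy; // small groups are kept as they are
    }
    Collections.shuffle(copy, random);
    return new ArrayList<>(copy.subList(0, maxItems));
  }

  public static void main(String[] args) {
    List<Integer> interactions = new ArrayList<>();
    for (int i = 0; i < 10_000; i++) {
      interactions.add(i);
    }
    List<Integer> sampled = downsample(interactions, 500, new Random(42));
    System.out.println("pairs before: " + pairCount(interactions.size()));
    System.out.println("pairs after:  " + pairCount(sampled.size()));
  }
}
```

The trade-off is accuracy: pairs involving the dropped elements are never seen, which matters least for very large groups where each single element carries little information.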

Re: Solr+Mahout Recommender Demo Site

2014-04-06 Thread Sebastian Schelter
The top 3 recommendations "based on videos you liked" are very good! Nice job. On 04/06/2014 07:26 PM, Pat Ferrel wrote: After having integrated several versions of the Mahout and Myrrix recommenders at fairly large scale. I was interested in solving three problems that these did not directl

Re: Number of features for ALS

2014-03-30 Thread Sebastian Schelter
ke too many features, it doesn't much hurt so you should always take as many as you can compute. On Thu, Mar 27, 2014 at 6:33 AM, Sebastian Schelter < s...@apache.org> wrote: Hi, does anyone know of a principled approach of choosing the number of features for ALS (other than c

Re: (help!) Can someone scan this

2014-03-29 Thread Sebastian Schelter
Jay, which version of Mahout are you using? Have you tried to explicitly set the temp path? --sebastian On 03/29/2014 01:52 AM, Jay Vyas wrote: Hi again mahout: Im wrapping a distributed recommender like this: https://raw.githubusercontent.com/jayunit100/bigpetstore/master/src/main/java/or

Re: The 3 distributed recommenders

2014-03-28 Thread Sebastian Schelter
Hi Jay, there's not much documentation unfortunately. We're in the process of creating that however. We removed the pseudo-distributed recommender, mainly because nobody ever used it. There are two research papers that could help you with understanding the other two distributed recommenders:

Number of features for ALS

2014-03-27 Thread Sebastian Schelter
Hi, does anyone know of a principled approach of choosing the number of features for ALS (other than cross-validation?) --sebastian

Re: Does Recommender System Overview Demo work?

2014-03-24 Thread Sebastian Schelter
nd am not sure if it was there in 0.8. I vaguely remember removing it in 0.9 based on a conversation with Manuel on user@. Manuel, if u could chime in here. On Monday, March 24, 2014 9:44 AM, Sebastian Schelter wrote: The webapp in Mahout does not offer much functionality. If you'd l

Re: Does Recommender System Overview Demo work?

2014-03-24 Thread Sebastian Schelter
inkedin.com/in/bhargavgolla> | Website <http://www.bhargavgolla.com/> On Mon, Mar 24, 2014 at 2:12 AM, Sebastian Schelter wrote: Hi Bhargav, you are right, the content on the page is outdated and contains some errors. I've created a jira ticket to fix this [1]. Thank you fo

Re: Does Recommender System Overview Demo work?

2014-03-23 Thread Sebastian Schelter
Hi Bhargav, you are right, the content on the page is outdated and contains some errors. I've created a jira ticket to fix this [1]. Thank you for reporting the problem! [1] https://issues.apache.org/jira/browse/MAHOUT-1485 On 03/24/2014 04:41 AM, Bhargav Golla wrote: Hi I was wondering i

Re: Problem with K-Means clustering on Amazon EMR

2014-03-23 Thread Sebastian Schelter
t when creating filesystem instances by using the two argument get(...). it's time to update it filesystem 2.0 Apis. Can you file a Jira for this ? If not I will :) On Mar 16, 2014, at 12:37 PM, Sebastian Schelter wrote: I've also encountered a similar error once. It's really just the

Re: Documentation, documentation, documentation

2014-03-22 Thread Sebastian Schelter
, "Sebastian Schelter" wrote: Hi, It's great to see a lot of work being spent on cleaning up the website. I think we have already done a great job here, but there are still a few more pages that need work. I created a jira issue for every single page that needs some work, would b

Documentation, documentation, documentation

2014-03-22 Thread Sebastian Schelter
Hi, It's great to see a lot of work being spent on cleaning up the website. I think we have already done a great job here, but there are still a few more pages that need work. I created a jira issue for every single page that needs some work, would be awesome if we could find enough voluntee

Re: Problem with K-Means clustering on Amazon EMR

2014-03-16 Thread Sebastian Schelter
I've also encountered a similar error once. It's really just the FileSystem.get call that needs to be modified. I think it's a good idea to walk through the codebase and refactor this where necessary. --sebastian On 03/16/2014 05:16 PM, Andrew Musselman wrote: Another wild guess, I've had iss

Re: Website, urgent help needed

2014-03-13 Thread Sebastian Schelter
to appear to be stupid asking how to update the documentation (my bad - not anyone else). Now I know that it was not possible unless I was a committer. Who should I send my scripts to, or how should I proceed with a current form of the page? SCott On 3/12/14, 5:02 AM, "Sebastian Schelter"

Re: verbose output

2014-03-13 Thread Sebastian Schelter
To my knowledge, there is no such flag for mahout. You can check hadoop's logs for further information however. On 03/13/2014 10:21 AM, Mahmood Naderan wrote: Hi, Is there any verbosity flag for hadoop and mahout commands? I can not find such thing in the command line. Regards, Mahmood

Re: Compiling Mahout with maven in Eclipse

2014-03-13 Thread Sebastian Schelter
Are you executing Maven in the topmost directory? On 03/13/2014 10:09 AM, Kevin Moulart wrote: I did, but then it fails because of these missing files : https://gist.github.com/kmoulart/9524828 Kévin Moulart 2014-03-13 9:57 GMT+01:00 Sebastian Schelter : Maven should generate the classes

Re: Compiling Mahout with maven in Eclipse

2014-03-13 Thread Sebastian Schelter
01:00 Sebastian Schelter : Those are autogenerated. On 03/13/2014 09:05 AM, Kevin Moulart wrote: Ok it does compile with maven in eclipse as well, but still, many imports are not recognized in the sources : - import org.apache.mahout.math.function.IntObjectProcedure; - i

Re: Compiling Mahout with maven in Eclipse

2014-03-13 Thread Sebastian Schelter
Those are autogenerated. On 03/13/2014 09:05 AM, Kevin Moulart wrote: Ok it does compile with maven in eclipse as well, but still, many imports are not recognized in the sources : - import org.apache.mahout.math.function.IntObjectProcedure; - import org.apache.mahout.math.map.OpenIntLongHashMap

Re: Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
commiter. Who should I send my scripts to, or how should I proceed with a current form of the page? SCott On 3/12/14, 5:02 AM, "Sebastian Schelter" wrote: Hi Pavan, Awesome that you're willing to help. The documentation are the pages listed under "Clustering" in

Re: Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
will in-turn help me understand things better. Do we already have a Jira ticket for organizing the cleaning up of documentation ? Just want to be sure, that I am not stepping on pages some else has already updated. Thanks Regards, Pramit On Wed, Mar 12, 2014 at 3:07 AM, Sebastian Schelter wrote

Re: Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
ll check and remove errors. or better let me know how to proceed. Pavan On Mar 12, 2014 12:35 PM, "Sebastian Schelter" wrote: Hi, As you've probably noticed, I've put in a lot of effort over the last days to kickstart cleaning up our website. I've thrown out a lot

Re: Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
can start looking on the page myself. Manoj On Wed, Mar 12, 2014 at 12:33 PM, Sebastian Schelter wrote: Hi, As you've probably noticed, I've put in a lot of effort over the last days to kickstart cleaning up our website. I've thrown out a lot of stuff and have been startled by the

Re: Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
i ll help with clustering algorithms documentation. do send me old documentation and i will check and remove errors. or better let me know how to proceed. Pavan On Mar 12, 2014 12:35 PM, "Sebastian Schelter" wrote: Hi, As you've probably noticed, I've put in a lot of effort ov

Re: Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
lanning to keep it in the new web, I can help pointing them out again. Thanks a lot for your effort. On Wed, Mar 12, 2014 at 7:03 AM, Sebastian Schelter wrote: Hi, As you've probably noticed, I've put in a lot of effort over the last days to kickstart cleaning up our website. I'

Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
Hi, As you've probably noticed, I've put in a lot of effort over the last days to kickstart cleaning up our website. I've thrown out a lot of stuff and have been startled by the amount of outdated and incorrect information on our website, as well as links pointing to nowhere. I think our lack

Re: Problem with FileSystem in Kmeans

2014-03-11 Thread Sebastian Schelter
Hi Bikash, Have you tried adding hdfs:// to your input path? Maybe that helps. --sebastian On 03/11/2014 11:22 AM, Bikash Gupta wrote: Hi, I am running Kmeans in cluster where I am setting the configuration of fs.hdfs.impl and fs.file.impl before hand as mentioned below conf.set("fs.hdfs.imp

Re: Few questions about SVM configuration in Mahout

2014-03-10 Thread Sebastian Schelter
Hi Quentin, Mahout does not have SVMs. Best, Sebastian On 03/10/2014 10:38 AM, Quentin-Gabriel Thurier wrote: Hi all, Just few questions about the configuration of an SVM in Mahout : - Is it possible to do a multi-class classification ? - Which kernels are already available (linear, polynomi

Re: Heap space

2014-03-09 Thread Sebastian Schelter
I usually use trial and error. Start with some very large value and do a binary search :) --sebastian On 03/09/2014 01:30 PM, Mahmood Naderan wrote: Excuse me, I added the -Xmx option and restarted the hadoop services using sbin/stop-all.sh && sbin/start-all.sh however still I get heap size erro
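The binary search suggested above can be sketched as follows. It assumes that a run which succeeds with some heap size also succeeds with any larger one, and it stands in a predicate for "launch the job with -Xmx set to this value and see whether it finishes". The names, the bounds and the 1500 MB figure in the example are all hypothetical.

```java
import java.util.function.IntPredicate;

// Hypothetical illustration; the predicate stands in for actually running the job.
final class HeapSizeSearchSketch {

  /**
   * Finds the smallest heap size in MB that works, given one size known to fail
   * and one (very large) size known to work. Assumes success is monotonic in heap size.
   */
  static int smallestWorkingHeapMb(int knownFailingMb, int knownWorkingMb, IntPredicate runSucceeds) {
    int lo = knownFailingMb; // invariant: the run fails with lo
    int hi = knownWorkingMb; // invariant: the run succeeds with hi
    while (hi - lo > 1) {
      int mid = lo + (hi - lo) / 2;
      if (runSucceeds.test(mid)) {
        hi = mid;
      } else {
        lo = mid;
      }
    }
    return hi;
  }

  public static void main(String[] args) {
    // Pretend the job needs at least 1500 MB of heap.
    IntPredicate fakeJob = heapMb -> heapMb >= 1500;
    System.out.println(smallestWorkingHeapMb(0, 8192, fakeJob) + " MB");
  }
}
```

Each probe is a full job run, so in practice one would stop once the interval is small enough rather than search down to a single megabyte.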

Re: [blog post] Comparing Document Classification Functions of Lucene and Mahout

2014-03-09 Thread Sebastian Schelter
Hi Koji, I've added a link to your article to our website: https://mahout.apache.org/general/books-tutorials-and-talks.html On 03/07/2014 03:29 AM, Koji Sekiguchi wrote: > Hello, > > I just posted an article on Comparing Document Classification Functions > of Lucene and Mahout. > > http://sole

Re: Welcome Andrew Musselman as new comitter

2014-03-08 Thread Sebastian Schelter
14 22:56, Frank Scholten wrote: Congratulations Andrew! On Fri, Mar 7, 2014 at 6:12 PM, Sebastian Schelter wrote: Hi, this is to announce that the Project Management Committee (PMC) for Apache Mahout has asked Andrew Musselman to become committer and we are pleased to announce that h

Welcome Andrew Musselman as new comitter

2014-03-07 Thread Sebastian Schelter
Hi, this is to announce that the Project Management Committee (PMC) for Apache Mahout has asked Andrew Musselman to become committer and we are pleased to announce that he has accepted. Being a committer enables easier contribution to the project since in addition to posting patches on JIRA

Re: Rework our website

2014-03-06 Thread Sebastian Schelter
urged? On Thursday, March 6, 2014 9:07 AM, Sebastian Schelter wrote: Thank you very much! Could you create a jira ticket and post the links there? That would be awesome, then we can track that this stuff gets fixed. Best, Sebastian On 03/06/2014 02:58 PM, Kevin Moulart wrote: Hi I also prefe

Re: Rework our website

2014-03-06 Thread Sebastian Schelter
hat's just the ones I found in 2 minutes on the quickstart page. Best Regards, Kevin 2014-03-05 23:43 GMT+01:00 Sebastian Schelter : At the moment, only committers can change the website unfortunately. If you have a text to add, I'm happy to work it in and add your name to our contribut

Re: Rework our website

2014-03-05 Thread Sebastian Schelter
5, 2014, at 4:11 AM, Sebastian Schelter wrote: Hi everyone, In our latest discussion, I argued that the lack (and errors) of documentation on our website is one of the main pain points of Mahout atm. To be honest, I'm also not very happy with the design, especially fonts and spacing make it super h

Re: Recommend items not rated by any user

2014-03-05 Thread Sebastian Schelter
AllUnknownItemsCandidateItemsStrategy. On Wed, Mar 5, 2014 at 6:46 PM, Sebastian Schelter wrote: So both strategies seems to be effectively the same, I don't know what the implementers had in mind when designing AllSimilarItemsCandidateItemsStrategy. It can take a long time to estimate preferences for all items a user do

Re: Recommend items not rated by any user

2014-03-05 Thread Sebastian Schelter
gy returns all items that have not been rated by the user and the similarity metric returns a non-NaN similarity value that is with at least one of the items preferred by the user. Tevfik On Wed, Mar 5, 2014 at 2:30 PM, Sebastian Schelter wrote: On 03/05/2014 01:23 PM, Juan José Ramos wrote:

Re: Recommend items not rated by any user

2014-03-05 Thread Sebastian Schelter
knownItems simply returns all items that the user has not interacted with yet. These are two different things, although they might overlap in some scenarios. Best, Sebastian Thanks. On Wed, Mar 5, 2014 at 10:23 AM, Sebastian Schelter wrote: Hi Juan, that is a good catch. CandidateIte

Rework our website

2014-03-05 Thread Sebastian Schelter
Hi everyone, In our latest discussion, I argued that the lack (and errors) of documentation on our website is one of the main pain points of Mahout atm. To be honest, I'm also not very happy with the design, especially fonts and spacing make it super hard to read long articles. This also prev

Re: Recommend items not rated by any user

2014-03-05 Thread Sebastian Schelter
Hi Juan, that is a good catch. CandidateItemsStrategy is the right place to implement this. Maybe we should simply extend its interface to add a parameter that says whether to keep or remove the current users items? We could even do this in the abstract base class then. --sebastian On 03/05

Re: how to recommend users already consumed items

2014-03-04 Thread Sebastian Schelter
ted a Jira issue already. I only use the non-hadoop part of Mahout recommender algorithms. May be I can create a patch for that part. However, I have not done it before, and don't know how to proceed. On Wed, Mar 5, 2014 at 1:01 AM, Sebastian Schelter wrote: Would you be willing to set

Re: how to recommend users already consumed items

2014-03-04 Thread Sebastian Schelter
Would you be willing to set up a jira issue and create a patch for this? --sebastian On 03/04/2014 11:58 PM, Mario Levitin wrote: I think we should introduce a new parameter for the recommend() method in the Recommender interface that tells whether already known items should be recommended or

Re: how to recommend users already consumed items

2014-03-04 Thread Sebastian Schelter
I think we should introduce a new parameter for the recommend() method in the Recommender interface that tells whether already known items should be recommended or not. What do you think? Best, Sebastian On 03/04/2014 05:32 PM, Pat Ferrel wrote: I’d suggest a command line option if you want

Re: Mahout-232-0.8.patch using

2014-03-04 Thread Sebastian Schelter
I think you should rather choose a different library that already offers an SVM than trying to revive a 4 year old patch. --sebastian On 03/04/2014 08:51 AM, Amol Kakade wrote: Hi, I am new user of Mahout and want to run sample SVM algorithm with Mahout. Can you please list me steps to use Mah

Re: Mahout-232-0.8.patch using

2014-03-03 Thread Sebastian Schelter
Hi Amol, SVMs are not integrated into Mahout. I'd suggest you try our logistic regression classifier instead. Best, Sebastian On 03/04/2014 08:51 AM, Amol Kakade wrote: Hi, I am a new user of Mahout and want to run a sample SVM algorithm with Mahout. Can you please list the steps to use Mahout-232-

Re: Issue updating a FileDataModel

2014-03-03 Thread Sebastian Schelter
in time. Would adding the new preferences to new files or appending to the existing one make any difference, or does everything depend on the time elapsed between two calls to recommender.refresh(null)? Many thanks. On Mon, Mar 3, 2014 at 1:18 PM, Sebastian Schelter wrote: Hi Juan

Re: classification in standalone application in Apache Mahout 0.9

2014-03-03 Thread Sebastian Schelter
4-03-03 15:11 GMT+01:00 Sebastian Schelter : If you don't want to call a shell, I assume you don't want to use a Hadoop cluster, right? In that case, you should rather try Mahout's logistic regression classifier, which is tuned for usage on a single machine. --sebastian On 03/03

Re: classification in standalone application in Apache Mahout 0.9

2014-03-03 Thread Sebastian Schelter
If you don't want to call a shell, I assume you don't want to use a Hadoop cluster, right? In that case, you should rather try Mahout's logistic regression classifier, which is tuned for usage on a single machine. --sebastian On 03/03/2014 03:07 PM, Hollow Quincy wrote: I am looking for simp

Re: Issue updating a FileDataModel

2014-03-03 Thread Sebastian Schelter
Hi Juan, IIRC, FileDataModel has a parameter that determines how much time must have elapsed since the last modification of the underlying file. You can also directly append new data to the original file. If you want to have a DataModel that can be concurrently updated, I suggest yo

Re: parallelALS and RMSE TEST

2014-03-01 Thread Sebastian Schelter
The output of parallelALS is two matrices, U and M, whose product is an approximation of your input matrix. The matrices are written as sequence files with an IntWritable as key (the index of the row in the matrix) and a VectorWritable as value, which holds the contents of the row vector. --s
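To make the relationship between the factor matrices and an RMSE test concrete, here is a minimal sketch in plain Java. It assumes the rows of U and M have already been loaded from the sequence files into arrays, that row i of U holds the features of user i and row j of M the features of item j, and that array positions stand in for the IntWritable keys. The class and method names are hypothetical and not part of Mahout.

```java
class AlsPredictionSketch {

    // Approximated entry of the input matrix: the dot product of one row of U
    // (user features) with one row of M (item features).
    static double predict(double[] userFeatures, double[] itemFeatures) {
        double sum = 0.0;
        for (int k = 0; k < userFeatures.length; k++) {
            sum += userFeatures[k] * itemFeatures[k];
        }
        return sum;
    }

    // Root mean squared error over held-out ratings.
    // pairs[n] = {userIndex, itemIndex}; ratings[n] is the observed value.
    static double rmse(double[][] u, double[][] m, int[][] pairs, double[] ratings) {
        double squaredError = 0.0;
        for (int n = 0; n < pairs.length; n++) {
            double error = ratings[n] - predict(u[pairs[n][0]], m[pairs[n][1]]);
            squaredError += error * error;
        }
        return Math.sqrt(squaredError / pairs.length);
    }

    public static void main(String[] args) {
        double[][] u = {{1.0, 2.0}};             // one user, two features
        double[][] m = {{3.0, 4.0}, {0.0, 1.0}}; // two items, two features
        int[][] testPairs = {{0, 0}, {0, 1}};
        double[] testRatings = {10.0, 2.0};

        System.out.println(predict(u[0], m[0]));
        System.out.println(rmse(u, m, testPairs, testRatings));
    }
}
```

The number of features per row must match between U and M; it corresponds to the rank chosen for the factorization.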

Re: Load output of rowsimilarity to memory

2014-02-25 Thread Sebastian Schelter
24, 2014 at 9:27 PM, Sebastian Schelter wrote: I overlooked that you're interested in document similarities. Sorry again :) Another way would be to read the output of RowSimilarityJob with an o.a.m.common.iterator.sequencefile.SequenceFileDirIterable. You create a list of instances of o.a.m

Re: Load output of rowsimilarity to memory

2014-02-25 Thread Sebastian Schelter
difference would be that I will write the output to a file that can be later used to create a FileItemSimilarity. I think that would be a very nice feature to have in the API. Thanks again. On Mon, Feb 24, 2014 at 9:27 PM, Sebastian Schelter wrote: I overlooked that you're interested in doc

Re: Load output of rowsimilarity to memory

2014-02-24 Thread Sebastian Schelter
item. In order to use ItemSimilarityJob for this purpose, what should be the input I need to provide? Would it be the output of seq2sparse? Thanks again. On Mon, Feb 24, 2014 at 8:54 PM, Sebastian Schelter wrote: You're right, my bad. If you don't use RowSimilar

Re: Load output of rowsimilarity to memory

2014-02-24 Thread Sebastian Schelter
t that the output of RowSimilarityJob can be loaded by the FileItemSimilarity after doing the appropriate parsing. Is that correct, or is there actually a way to load the raw output of RowSimilarityJob into FileItemSimilarity? Thanks. On Mon, Feb 24, 2014 at 7:41 PM, Sebastian Schelter wrote: The

Re: Load output of rowsimilarity to memory

2014-02-24 Thread Sebastian Schelter
The output of RowSimilarityJob can be loaded by the FileItemSimilarity. --sebastian On 02/24/2014 08:31 PM, Juan José Ramos wrote: Is there a way to reproduce this process: https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line inside Java

Re: Use Naïve Bayes on a large CSV

2014-02-24 Thread Sebastian Schelter
NaiveBayes expects a SequenceFile as input. The key is the class label as Text; the value is the features as a VectorWritable. --sebastian On 02/24/2014 11:51 AM, Kevin Moulart wrote: Hi again, I finally set my mind on going through Java to make a sequence file for the naive bayes, but I still

Re: Mahout with SQL SERVER

2014-02-23 Thread Sebastian Schelter
You can give o.a.m.cf.taste.impl.model.jdbc.GenericJDBCDataModel a try. If that doesn't work, you need to create a custom implementation of AbstractJDBCDataModel, which shouldn't be too hard. --sebastian On 02/23/2014 06:11 PM, Ahmed Kamal wrote: Dear All, I just have a question. I chose to
