Re: Why I am getting different precision using 32 vs 64 bit

2012-03-26 Thread Sean Owen
This is no useful detail at all. What algorithm are you even running?? On Mon, Mar 26, 2012 at 11:29 PM, ziad kamel wrote: >  Dear developers , > > I run some recommendations on mahout of 32 and 64 bit machines (Ubuntu) . I > found out that on 32 bit I am getting higher precision . Any reason for

Re: cluster-based recommendation algorithm

2012-03-26 Thread Sean Owen
Can it be implemented? Sure, but what you see is what is available. If you want a different clustering approach you would have to implement it. The algorithm there is not k-means. On Mon, Mar 26, 2012 at 8:49 PM, Ahmed Abdeen Hamed wrote: > Hello, > > This might sound trivial but I have to ask be

Re: Mahout beginner questions...

2012-03-26 Thread Sean Owen
An SQL database doesn't have much role to play in this kind of system, and that's no criticism of RDBMSes. The algorithms operate on very simple, nearly unstructured data and are essentially read-only. So the complexity of keys and transactions is just overhead. The simple, non-distributed impleme

Re: Mahout beginner questions...

2012-03-26 Thread Sean Owen
o my memory in order that the online part > will calculate his part. Maybe I'm wrong here, and I don't necessarily need > to load the entire intermediate file (similarity results) into the memory?! > > > -Original Message- > From: Sean Owen [mailto:sro...@gmail.

Re: Mahout beginner questions...

2012-03-26 Thread Sean Owen
I'm sure he's referring to the off-line model-building bit, not an online component. On Mon, Mar 26, 2012 at 9:27 AM, Razon, Oren wrote: > By saying: "At Veoh, we built our models from several billion interactions > on a tiny cluster " you meant that you used the distributed code on your > clust

Re: Mahout beginner questions...

2012-03-25 Thread Sean Owen
I'm wrong but a good way to boost up speed could be to use > > caching recommender, meaning computing the recommendations in advance > > (refresh it every X min\hours) and always recommend using the most > updated > > recommendations, right?! > > > > -Origi

Re: Mahout beginner questions...

2012-03-25 Thread Sean Owen
> Correct me if I'm wrong but a good way to boost up speed could be to use > caching recommender, meaning computing the recommendations in advance > (refresh it every X min\hours) and always recommend using the most updated > recommendations, right?! > > -Original Message-

Re: Mahout beginner questions...

2012-03-25 Thread Sean Owen
It is memory. You will need a pretty large heap to put 100M data in memory -- probably 4GB, if not a little more (so the machine would need 8GB+ RAM). You can go bigger if you have more memory but that size seems about the biggest to reasonably assume people have. Of course more data slows things

Re: Significant - serendipity in recommending

2012-03-25 Thread Sean Owen
Au contraire, you can do exactly this with an IDRescorer. Divide by (the log of) an item's occurrences for example to penalize popular items. I don't recommend this. Stuff like the log-likelihood metric is already in a sense accounting for things that are just generally popular and normalizing th
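The arithmetic behind that rescoring idea can be sketched in a few lines. This is a minimal illustration in Python, not Mahout code: in Mahout the same logic would sit inside an IDRescorer's rescore method. The function names and the +1 inside the logarithm are assumptions added here (the +1 only avoids dividing by log(1) = 0 for an item seen once), and the reply itself advises against the approach.

```python
import math


def penalize_popularity(original_score, occurrences):
    """Divide a score by the log of the item's occurrence count.

    The +1 inside the log is an assumption of this sketch, added so that
    an item seen exactly once does not divide by log(1) == 0.
    """
    return original_score / math.log(1.0 + occurrences)


def rescore_all(scores, occurrence_counts):
    """Apply the penalty to every item and return (item, score) best-first."""
    rescored = {
        item: penalize_popularity(score, occurrence_counts[item])
        for item, score in scores.items()
    }
    return sorted(rescored.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical data: "a" scores higher but is far more popular than "b".
ranked = rescore_all({"a": 4.0, "b": 3.0}, {"a": 1000, "b": 2})
```

With these made-up numbers the rarely seen item "b" overtakes the popular item "a" after rescoring, which is the intended effect of the penalty.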

Re: Significant - serendipity in recommending

2012-03-24 Thread Sean Owen
Define "significant"? On Sat, Mar 24, 2012 at 1:38 PM, ziad kamel wrote: > Dear developers, > > How can I know that the recommendations I get from Mahout is significant ? > Is there a way to know that there is serendipity in recommending using > certain recommender than other ? > > Thanks >

Re: HadoopUtil

2012-03-24 Thread Sean Owen
Why are you posting to Mahout lists, 3 times, if you are asking about Hadoop? Etiquette foul. On Mar 24, 2012 10:41 AM, "Bahadır Yılmaz" wrote: > Hi everyone, > i have a problem with HadoopUtil.overwriteOutput(outPath).In intellij > idea,i am using maven project and overwriteOutput() written

Re: Merging similarities from two different approaches

2012-03-23 Thread Sean Owen
On Fri, Mar 23, 2012 at 8:33 PM, Ahmed Abdeen Hamed wrote: > As for merging the scores, I need an OR rule, which translates to the > addition. If I used AND that will make the likelihood smaller because the > probabilities will be multiplied. This will restrict the clusters to items > that appears

Re: Merging similarities from two different approaches

2012-03-22 Thread Sean Owen
Yes, but you can't use it as both things at once. I meant that you swap them at the broadest level -- at your original input. So all "items" are really users and vice versa. At the least you need two separate implementations, encapsulating two different notions of similarity. Similarity is item-it

Re: Merging similarities from two different approaches

2012-03-22 Thread Sean Owen
ize, > > numItems - prefs1Size - prefs2Size + intersectionSize); > > // merging the distance and the loglikelihood similarity > > return ExperimentParams.LOGLIKELIHOOD_WEIGHT*(1.0 - 1.0 / (1.0 + > logLikelihood)) + (ExperimentParams.PROXIMITY_WEIGHT * proximity); > > >
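The merge in the quoted fragment can be restated as a small sketch. It is written in Python purely to show the arithmetic; the function and parameter names are made up here, the weights stand in for the poster's ExperimentParams constants (whose values are not given), and proximity is assumed to already lie in [0, 1] so that both terms are on a comparable scale.

```python
def squash_log_likelihood(log_likelihood):
    """Map a non-negative log-likelihood ratio into [0, 1) via 1 - 1/(1 + x)."""
    return 1.0 - 1.0 / (1.0 + log_likelihood)


def merge_similarities(log_likelihood, proximity, llr_weight, proximity_weight):
    """Weighted sum of the squashed log-likelihood and a proximity score.

    Assumes proximity is already in [0, 1]; the weights are placeholders.
    """
    return (llr_weight * squash_log_likelihood(log_likelihood)
            + proximity_weight * proximity)


merged = merge_similarities(3.0, 0.5, 0.5, 0.5)
```

Choosing weights that sum to 1 keeps the merged value in [0, 1), which makes it easier to compare against either input on its own.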

Re: Merging similarities from two different approaches

2012-03-22 Thread Sean Owen
What do you mean that you have a user-item association from a log-likelihood metric? Combining two values is easy in the sense that you can average them or something, but only if they are in the same "units". Log likelihood may be viewed as a probability. The distance function you derive from it -

Re: How to add classes into mahout-score-0.5-job.jar?

2012-03-22 Thread Sean Owen
It is wherever you compiled your own classes -- it's up to you. SIMILARITY_EUCLEDEAN_DISTANCE is not a class. You should use 0.6 anyway. While you may find you have to make minor modifications if following the book, it's 99% compatible. On Thu, Mar 22, 2012 at 8:07 PM, jeanbabyxu wrote: > From C

Re: Error Running mahout-core-0.5-job.jar

2012-03-22 Thread Sean Owen
Yes. This prevents accidental overwrite, and mimics how Hadoop/HDFS generally act. On Thu, Mar 22, 2012 at 6:58 PM, jeanbabyxu wrote: > I was able to manually clear out the output directory by using > > bin/hadoop dfs -rmr output. > > But do we have to remove all content in the output directory m

Re: Error Running mahout-core-0.5-job.jar

2012-03-22 Thread Sean Owen
That pretty much means what it says = delete temp. On Thu, Mar 22, 2012 at 6:06 PM, jeanbabyxu wrote: > Thanks so much tianwild for pointing out the typo. Now it's running but I got > a different error msg: > > Exception in thread "main" > org.apache.hadoop.mapred.FileAlreadyExistsException: Outp

Re: Mahout beginner questions...

2012-03-22 Thread Sean Owen
> > BTW... another question, it seem that a good solution to the recommender > scalability will be to use model based recommenders. > Saying this, I wonder why there is such few model based recommenders, > especially considering the fact that Mahout contain several data mining >

Re: Mahout beginner questions...

2012-03-22 Thread Sean Owen
1. These are the JDBC-related classes. For example see MySQLJDBCDiffStorage or MySQLJDBCDataModel in integration/ 2. The distributed and non-distributed code are quite separate. At this scale I don't think you can use the non-distributed code to a meaningful degree. For example you could pre-compu

Re: Error Running mahout-core-0.5-job.jar

2012-03-21 Thread Sean Owen
It's -Dmapred.output.dir=output not --Dmapred.output.dir=output (one dash), but, that's not even the problem. I don't think you can specify -D options this way, as they are JVM arguments. You need to configure these in Hadoop's config files. This is not specific to Mahout. On Wed, Mar 21, 2012 at

Re: multiple Database-based data with Mahout

2012-03-20 Thread Sean Owen
No there is not such support right now. The most useful piece of code would be a DataModel implementation that combines the data in several other DataModels. That would easily let you read from several databases. The hard part there is merging data sets (what if two DBs have data for one user-ite

Re: MongoDBDataModel in memory ?

2012-03-20 Thread Sean Owen
If you don't need Hadoop then this is pretty simple. You can just write a nested loop that computes all pairs off an ItemSimilarity implementation. If I recall rightly GenericItemSimilarity will do that for you off an existing ItemSimilarity and then has the results in memory as a new ItemSimilari
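The nested loop over all pairs might look roughly like the following. This is a language-neutral sketch in Python, not the GenericItemSimilarity implementation; the function names and the toy similarity used in the example are assumptions made for illustration.

```python
from itertools import combinations


def precompute_item_similarities(item_ids, similarity):
    """Compute similarity(a, b) once for every unordered pair of items."""
    table = {}
    for a, b in combinations(sorted(item_ids), 2):
        table[(a, b)] = similarity(a, b)
    return table


def lookup(table, a, b):
    """Read a precomputed value; an item is taken to be fully similar to itself."""
    if a == b:
        return 1.0
    key = (a, b) if a < b else (b, a)
    return table[key]


# Toy similarity for illustration only: closer IDs count as more similar.
toy_similarity = lambda a, b: 1.0 / (1.0 + abs(a - b))
table = precompute_item_similarities([3, 1, 2], toy_similarity)
```

The table holds n(n-1)/2 entries for n items, so this only stays practical in memory while the item count is modest.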

Re: Edit Distance

2012-03-19 Thread Sean Owen
No I don't think that really comes into play in any of the ML algorithms here. At least I do not recall seeing it. On Mon, Mar 19, 2012 at 3:44 PM, Ahmed Abdeen Hamed wrote: > Hello, > > Does Mahout have support for Edit Distance between two Strings? I looked on > the web but can't find anything

Re: MongoDBDataModel in memory ?

2012-03-18 Thread Sean Owen
Yep it's all in memory -- it would be too slow to access it out of Mongo. The purpose is just making it easy to read and re-read data into Mongo, and facilitate updates. If the data is too big to fit in memory you should look first at pruning your data -- can sampling 10% of it still give you good

Re: Export to MongoDB

2012-03-17 Thread Sean Owen
What do you mean by indexed here? On Sat, Mar 17, 2012 at 10:56 PM, Pat Ferrel wrote: > I need to digest some mahout files and merge them into a MongoDB database. > Since digesting would be a lot easier if the mahout keys were indexed I > wonder if a "seqdumper --format json or mongodb" might be

Re: ClassNotFoundException while using RecommenderJob

2012-03-15 Thread Sean Owen
After 'mvn package' you should see a file ending in 'job.jar' under target/ This is the jar file to use with Hadoop. On Thu, Mar 15, 2012 at 10:56 AM, Janina wrote: > These are great news! I was not quite sure if the item based recommender is > fully distributes, but this helps! Thanks! > > I hav

Re: ClassNotFoundException while using RecommenderJob

2012-03-15 Thread Sean Owen
ndations on a Hadoop Cluster? I have > read that only the clustering and classification parts of mahout are really > able to be distributed on a hadoop cluster. > > 2012/3/15 Sean Owen > >> You shouldn't have to add anything to your jar, if you use the >> supplied '

Re: ClassNotFoundException while using RecommenderJob

2012-03-15 Thread Sean Owen
You shouldn't have to add anything to your jar, if you use the supplied 'job' file which contains all transitive dependencies. If you do add your own jars, I think you need to unpack and repack them, not put them into the overall jar as a jar file, even with a MANIFEST.MF entry. I am not sure that

Re: The recommendation algorithm behind org.apache.mahout.cf.taste.hadoop.item.RecommenderJob

2012-03-13 Thread Sean Owen
Yes it's item-based only. --similarityClassname chooses the metric but it is item-based. On Tue, Mar 13, 2012 at 11:53 PM, Rich wrote: > Hi, > I have been digging into Mahout on Hadoop for the pas few days. > I was wondering the recommendation > algorithm that is used in RecommenderJob.java. For

Re: Injecting content into item-item CF

2012-03-13 Thread Sean Owen
commender might > > define item (movie) similarity as a function of movie attributes like genre, > director, > > actor, and year of release. Using such an implementation within a > traditional item" > > > This is the part that I am trying to understand and have a solu

Re: Injecting content into item-item CF

2012-03-13 Thread Sean Owen
OK, you have some users. You have some items, and those items have attributes. Nothing here connects users to items though, so how can any process estimate any additional user-item connections? You could compute item-item similarities, but that doesn't resolve this. Sorry I am really confused --

Re: Injecting content into item-item CF

2012-03-13 Thread Sean Owen
Before I answer, I want to make sure we're on the same page. You are definitely describing a search problem. Was my guess at how you are also adding in something recommender-related accurate? Otherwise we may be talking past each other again. On Tue, Mar 13, 2012 at 5:35 PM, Ahmed Abdeen Hamed w

Re: Injecting content into item-item CF

2012-03-13 Thread Sean Owen
nly be available after the user enters the query. > > My question now is: is there a way to compute these similarities offline? > > Thanks very much, > > -Ahmed > > > > > > On Tue, Mar 6, 2012 at 5:14 PM, Sean Owen wrote: >> >> Sure, you just write you

Re: Item Recommendations - Time based

2012-03-12 Thread Sean Owen
(It's out there as TanimotoCoefficientSimilarity -- not named JaccardSimilarity or anything.) On Mon, Mar 12, 2012 at 10:59 PM, Ted Dunning wrote: > I would generally recommend using the LLR similarity. > > But if you have an itch, scratch it.  I do think we have a tanimoto > similarity already,
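For reference, the Tanimoto (Jaccard) coefficient of two sets is the size of their intersection divided by the size of their union. A minimal Python sketch follows, assuming each item is represented by the set of user IDs that interacted with it; returning 0.0 for two empty sets is a convention chosen here, not something stated in the thread.

```python
def tanimoto(users_a, users_b):
    """Size of the intersection over size of the union of two sets."""
    union_size = len(users_a | users_b)
    if union_size == 0:
        return 0.0  # convention chosen for this sketch
    return len(users_a & users_b) / union_size


similarity = tanimoto({1, 2, 3}, {2, 3, 4})
```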

Re: Cluster-based recommenders

2012-03-12 Thread Sean Owen
You can set a threshold rather than a count -- that's about as much as that bit of code does in this regard. On Mon, Mar 12, 2012 at 10:18 PM, Ahmed Abdeen Hamed wrote: > I have a question about the TreeClusteringRecommender: > > Is there a way that you can estimate the number of clusters rather

Re: Item Recommendations - Time based

2012-03-12 Thread Sean Owen
OK if that's the case, put the pre-computed values in a GenericItemSimilarity and you're done. Hadoop most certainly does not help you compute anything 'on the fly'. It might help you precompute. Don't worry about distribution until you're sure you have a big scale problem, and that usually takes

Re: Item Recommendations - Time based

2012-03-12 Thread Sean Owen
Similarity computations need to be very fast. I don't know if you can pre-compute them since they're time-dependent and I assume need to use up-to-the-second information. You'll need to store something in memory to make this fast enough. That can make scale a problem, but, I am also guessing you c

Re: Cluster-based recommenders

2012-03-12 Thread Sean Owen
Sure -- to do this, you simply flip your items and users. Feed item IDs as user IDs and vice versa. Then you have a system that recommends users to items, really. And you can use clustering if you like, to do that. In fact you can use any algorithm. Sean On Mon, Mar 12, 2012 at 1:56 PM, Ahmed Abd
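The flip is a plain preprocessing step on the input. A small Python sketch, assuming the input is a list of (user ID, item ID, preference value) triples, which is an assumption about the data layout rather than anything Mahout-specific:

```python
def flip_users_and_items(preferences):
    """Swap the user and item columns of (user_id, item_id, value) triples."""
    return [(item_id, user_id, value) for user_id, item_id, value in preferences]


prefs = [(1, 101, 5.0), (1, 102, 3.0), (2, 101, 4.0)]
flipped = flip_users_and_items(prefs)
```

Feeding the flipped triples to any recommender then yields users recommended to items, and flipping a second time restores the original data.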

Re: Item Recommendations - Time based

2012-03-12 Thread Sean Owen
You can implement your own custom ItemSimilarity that computes this metric, or anything else you can imagine. In fact there is already a bit of API in DataModel for storing and retrieving timestamps too, so this should be easy. It's probably a bit easier said than done given the exact logic you're

Re: Trouble with deriving popular items from mahout

2012-03-11 Thread Sean Owen
No, it's so easy you can do it in about 20 lines of code so I don't think it really warrants a software component. On Sun, Mar 11, 2012 at 12:39 PM, mahout user wrote: > Thanks Sean Owen, > >   is it any class available with mahout for doing this stuff? >

Re: Trouble with deriving popular items from mahout

2012-03-11 Thread Sean Owen
This isn't a recommender problem -- it's simpler. It sounds like you just want to count the most frequently occurring items, and pairs of items. That's just a question of counting. On Sun, Mar 11, 2012 at 12:32 PM, mahout user wrote: > Hello group, > > I am new to mahout..I am developing recommen
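Counting of this kind needs no recommender at all. A short Python sketch, assuming the input is a list of baskets (one list of item IDs per user or order), which is a guess at the data shape since the original question is cut off:

```python
from collections import Counter
from itertools import combinations


def most_common_items_and_pairs(baskets, top_n=3):
    """Count how often each item, and each unordered pair of items, occurs."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        unique_items = sorted(set(basket))  # ignore repeats within one basket
        item_counts.update(unique_items)
        pair_counts.update(combinations(unique_items, 2))
    return item_counts.most_common(top_n), pair_counts.most_common(top_n)


baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"]]
top_items, top_pairs = most_common_items_and_pairs(baskets)
```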

Re: User-item similarity and time-based recommendations

2012-03-10 Thread Sean Owen
atching problem. Thanks for the answers and the effort > > On Sat, Mar 10, 2012 at 9:38 PM, Sean Owen wrote: > >> It sounds like you have substantially a search problem. You know the >> user's attributes, you know the items' attributes, and are just >> finding

Re: User-item similarity and time-based recommendations

2012-03-10 Thread Sean Owen
and filtering using IDRescorer > should suffice. > > Since I'll probably want to integrate User-based recommendation as well at > a later point - is there any existing Recommender implementation which > blends both item-based and user-based recommendations? > > On Sat, Mar 10, 2

Re: User-item similarity and time-based recommendations

2012-03-10 Thread Sean Owen
der to find the items that most closely match this perfect > item. Do you think that would be a feasible solution? > > On Sat, Mar 10, 2012 at 6:25 PM, Sean Owen wrote: > >> If by #3 you mean you have preferences for many users, this is of >> course the standard input for a

Re: User-item similarity and time-based recommendations

2012-03-10 Thread Sean Owen
If by #3 you mean you have preferences for many users, this is of course the standard input for a recommender, yes. If you also have some user-user similarity info beyond that, you can implement UserSimilarity and use GenericUserBasedRecommender to incorporate that. If you want to boost items acco

Re: R: Using recommenders with String identifiers

2012-03-09 Thread Sean Owen
In this case, the code in question is the non-distributed code rather than Hadoop. But yes I agree it will make a perhaps bigger difference on Hadoop. All of the Hadoop stuff uses integer keys. On Fri, Mar 9, 2012 at 2:10 AM, Paritosh Ranjan wrote: > Are these identifiers used as keys for mappers

Re: How/where to run DisplayKMeans example

2012-03-09 Thread Sean Owen
This means you are running on a headless machine without a monitor. The program needs to show a window with graphics but can't. On Mar 9, 2012 6:48 AM, "rahul raghavendhra" wrote: > hi Lance, > i tried as u said, but now i got a new exception > > Exception in thread "main" java.lang.InternalError:

Re: Using recommenders with String identifiers

2012-03-08 Thread Sean Owen
No. It used to work this way, but was removed just because you get much better memory and performance using longs. It would be a lot of surgery to undo this. The best answer is to use longs. If you must use strings, IDMigrator does the trick quite well. On Thu, Mar 8, 2012 at 1:27 PM, Claudia Gri

Re: why log-likelihood similarity is faster than Tanimoto coefficient

2012-03-08 Thread Sean Owen
I don't expect they are different in speed. Both do about exactly the same thing and finish with a simple computation. On Thu, Mar 8, 2012 at 9:52 AM, Ayad Al-Qershi wrote: > Dear All, > > can anyone tell me why running the recommender job with log-likelihood > similarity performs better (faster)

Re: packaging a recommender as a war file

2012-03-07 Thread Sean Owen
RecommenderService.jws is a JWS file, which is one standard for making SOAP-based web services. RecommenderServlet is a 'raw' servlet wrapper. Both are just wrappers around a Recommender that expose it over HTTP. Neither is quite REST-ful; both are JavaEE, yes. You can do anything you want here an

Re: DistributedRowMatrix - FileNotFoundException

2012-03-07 Thread Sean Owen
DistributedRowMatrix operates on IntWritable,VectorWritable in a sequence file, and it looks like you're feeding text. No, it doesn't accept some text-based format. On Wed, Mar 7, 2012 at 8:41 PM, PEDRO MANUEL JIMENEZ RODRIGUEZ wrote: > > Sorry but I can't understand how to do it. > > I have sing

Re: override mapreduce compression?

2012-03-07 Thread Sean Owen
The client can override cluster defaults unless the cluster marks them "final". On Wed, Mar 7, 2012 at 9:02 PM, Dmitriy Lyubimov wrote: > Aren't hadoop site.xml settings on the driver's client usually > overshadow whatever it is on the cluster? Or you don't have the privs > to change that either?

Re: packaging a recommender as a war file

2012-03-07 Thread Sean Owen
Yes this doesn't exist as a push-button solution anymore. There is no target that builds a .war. However it's pretty easy to resurrect the script from 0.5, or, simply configure your IDE to build a .war with the Mahout .jar, your .jar, and a one-liner web.xml that configures RecommenderServlet. Sea

Re: override mapreduce compression?

2012-03-06 Thread Sean Owen
wrote: > Why should it not be compressed in the first place? > > Here is the header of one of the reducer parts that was written into > /mahout/kmeans/clusters-5-final > > SEQ org.apache.hadoop.io.Text+org.apache.mahout.clustering.kmeans.Cluster > )org.apache.hadoop.io.compress.Sna

Re: override mapreduce compression?

2012-03-06 Thread Sean Owen
l try > adding this param to the HADOOP_OPTS and in the longterm probably come up > with a cleaner way to do this. Thanks! > > -Luke > > On 3/6/12 6:24 PM, "Sean Owen" wrote: > > >-D arguments are to the JVM so need to be set in HADOOP_OPTS (as I > >r

Re: override mapreduce compression?

2012-03-06 Thread Sean Owen
o have to install native snappy which is > why I'm trying to override this param).  Passing -Dkey=value on the mahout > command line does not seem to have any effect on the mapreduce job > configuration from what I can tell.  Any ideas? > > -Luke > > On 3/6/12 3:48 PM, "

Re: Injecting content into item-item CF

2012-03-06 Thread Sean Owen
Sure, you just write your own ItemSimilarity implementation based on the content, whatever that may be. What you do there is mostly up to you; there's not a framework for this. On Tue, Mar 6, 2012 at 10:09 PM, Ahmed Abdeen Hamed wrote: > Hello friends, > > Is there an example on how you can injec

Re: override mapreduce compression?

2012-03-06 Thread Sean Owen
Mapper compression? -Dmapreduce.map.output.compress=false. I think the key was mapred.output.compress in Hadoop 0.20.0. I am not sure if there is reducer compression built-in, but, I could have missed it. On Tue, Mar 6, 2012 at 9:40 PM, Luke Forehand wrote: > Hello, > > Is there a way to run the

Re: DistributedRowMatrix - FileNotFoundException

2012-03-06 Thread Sean Owen
Your input is still text though, and I assume you're trying to use TextInputFormat. You can't do this as it expects an IntWritable, and that means it expects input as a sequence file, via SequenceFileInputFormat. On Tue, Mar 6, 2012 at 7:21 PM, PEDRO MANUEL JIMENEZ RODRIGUEZ wrote: > > Thanks for

Re: Washing machines - Mahout algorithm advice

2012-03-03 Thread Sean Owen
I answered on SO: The only thing I can think of that sounds like this problem is PageRank. It's computed by a sort of iterative simulation. Each page has some influence (color) which flows via its links (socks it's washed with) and at some point the page influence reaches a steady state (final colo
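The iterative simulation can be sketched as a standard PageRank power iteration. This Python version is a generic textbook form, not anything taken from Mahout or from the original answer; the damping factor of 0.85, the fixed iteration count, and the uniform handling of nodes without outgoing links are all assumptions of the sketch.

```python
def page_rank(links, damping=0.85, iterations=100):
    """Iterate influence along links until it settles near a steady state.

    links maps each node to the list of nodes it points at.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's influence evenly among its targets.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its influence uniformly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


ranks = page_rank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this toy graph "c" receives influence from both "a" and "b", so it settles with the largest share, while "b", which nothing links to, keeps only the baseline amount.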

Re: Item Recommender Does not read Filedatamodel

2012-02-29 Thread Sean Owen
Caused by: java.lang.IllegalArgumentException: Bad line: 444,25414 This is your problem. On Wed, Feb 29, 2012 at 12:21 PM, VIGNESH PRAJAPATI wrote: > Hello Mahout Group, > > When i am going to rum my ItemBased Recommender on below given Dataset > structure.It gives me this errors. I cant under

Re: Cassandra Data Model

2012-02-29 Thread Sean Owen
data in Cassandra to Hadoop. > > -srinivas > > > > > On Wed, Feb 29, 2012 at 10:30 AM, Sean Owen wrote: > > > That is for non distributed recomenders, not using Hadoop. For anything > > else using Hadoop you use Cassandra by using it as an input to Hadoop. It > > is no

Re: Cassandra Data Model

2012-02-29 Thread Sean Owen
e states support for Cassandra Data model, which I am > guessing is support for mapping on to Columns, SuperColumns etc or am I > mistaken ? > > > -srinivas > > > > On Wed, Feb 29, 2012 at 9:23 AM, Sean Owen wrote: > > > CassandraDataModel is not related to

Re: Cassandra Data Model

2012-02-29 Thread Sean Owen
CassandraDataModel is not related to HMM. Maybe you could be more specific here. On Feb 29, 2012 4:43 AM, "Srinivas Krishnan" wrote: > I am currently designing my Data Model for a small cassandra cluster and > wanted to incorporate the HMM model from Mahout. I could not find much > documentation

Re: Mahout sample datasets for Recommender, classifier and clustering

2012-02-28 Thread Sean Owen
Oh it's very easy: tr ";" "," < in.csv | tr -d '"' > out.csv Or something close. On Feb 28, 2012 7:31 PM, "VIGNESH PRAJAPATI" wrote: > Hello Daniel Glauser , > > Thanks for your suggestion, but I have 2,00,000 raws in my Csv file.so > its require great modification. for solution,I want anot

Re: problem:while running RecommenderJob over Hadoop

2012-02-28 Thread Sean Owen
his I > didn't get any idea. > > > sean owen: > Your job file is corrupt or missing. Verify its there and try rebuilding. > > >> I am newbie to mahout. >> can any body  help me out to solve the following error.? >> >> When ever i try to run Recom

Re: problem:while running RecommenderJob over Hadoop

2012-02-28 Thread Sean Owen
Your job file is corrupt or missing. Verify it's there and try rebuilding. On Feb 28, 2012 7:54 AM, "manish dunani" wrote: > I am newbie to mahout. > can any body help me out to solve the following error.? > > When ever i try to run RecommenderJob over apache hadoop i got the > following error:(R

Re: Documentation error for GenericUserPreferenceArray?

2012-02-27 Thread Sean Owen
Definitely a typo in the second passage. I'll fix when I get home unless someone beats me to it. On Feb 27, 2012 3:35 PM, "Don Smith" wrote: > The documentation for GenericUserPreferenceArray says "Like {@link > GenericItemPreferenceArray} but stores preferences for one user (all user > IDs the sa

Re: Update Mahout Wiki with latest Mahout Versions

2012-02-16 Thread Sean Owen
Hmm. I updated it in SVN and thought our fancy new svnpubsub system was supposed to push that for us. I'll ask if there's something else we need to do. On Thu, Feb 16, 2012 at 5:17 PM, Suneel Marthi wrote: > Could someone update the Mahout wiki - http://mahout.apache.org with the > correct relea

Re: Mahout Hosting Provider

2012-02-16 Thread Sean Owen
No, it's a library that you run where you like. There's no hosting for it per se but yeah you could run on Amazon. On Thu, Feb 16, 2012 at 8:30 AM, VIGNESH PRAJAPATI wrote: > Hi Folks, > >  I am new to mahout.I want to know that is there any mahout hosting > provider for Apache Mahout except amaz

Re: Support of HBase

2012-02-16 Thread Sean Owen
I think this thread is talking about at least 4 different things. 1. There is no "HBaseDataModel" for non-distributed code, that uses the HBase driver presumably, but could be like there is CassandraDataModel. That's what I was talking about. 2. You could use a JDBC driver for HBase with JDBCDataM

Re: Support of HBase

2012-02-15 Thread Sean Owen
In the non-distributed bit of the code? no... though I put together a Cassandra-backed implementation and a Mongo-backed one was contributed. It can't be hard to write. These don't do much in the sense that the data gets loaded into memory anyway. It just facilitates reading it off storage. On W

Re: new to mahout

2012-02-14 Thread Sean Owen
True, those are more for the non-distributed implementation, but you could still re-use them as you roll your own translation code in this context too. On Tue, Feb 14, 2012 at 1:49 PM, Manuel Blechschmidt wrote: > Hi Ayad, > you can use the different IDMigrators implementations which are provided

Re: new to mahout

2012-02-14 Thread Sean Owen
Yes, you need to maintain some mapping from strings to numbers. Create and store the mapping, and use it to pre-process the input into numeric ID form, then translate the output after it's done back into strings as you need. You will have to write this yourself. On Tue, Feb 14, 2012 at 11:42 AM, A
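One way to keep such a mapping is a small two-way table built during preprocessing. The Python sketch below is only an illustration of the idea; the class name and the choice of sequential numbering are assumptions, and it is unrelated to Mahout's own IDMigrator classes.

```python
class IdMapping:
    """Two-way mapping between string IDs and sequential numeric IDs."""

    def __init__(self):
        self._to_number = {}
        self._to_string = []

    def to_number(self, string_id):
        """Return the numeric ID for a string, assigning a new one if unseen."""
        if string_id not in self._to_number:
            self._to_number[string_id] = len(self._to_string)
            self._to_string.append(string_id)
        return self._to_number[string_id]

    def to_string(self, numeric_id):
        """Translate a numeric ID from the output back into its string."""
        return self._to_string[numeric_id]


mapping = IdMapping()
encoded = [mapping.to_number(s) for s in ["alice", "bob", "alice"]]
decoded = [mapping.to_string(n) for n in encoded]
```

The mapping has to be stored alongside the numeric input so that the job's output can be translated back into strings afterwards.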

Re: Saving UserSimilarity after computation

2012-02-14 Thread Sean Owen
No, because most similarities are not fully computed, only the similarities that are needed are recorded. You could iterate through and write all of the user-user similarities to disk, but you'd have to do that yourself. You could then re-read them into GenericUserSimilarity but this would take a l

Re: GenericBooleanPrefDataModel MapReduce

2012-02-13 Thread Sean Owen
These classes are entirely different than what you'd use with MapReduce / Hadoop, so no there is no direct way to use them on Hadoop. (Well, refer to the discussion of pseudo-distributed recommenders in Chapter 6, but that's kind of faking it.) Instead look to RecommenderJob, and (as I recall) it

Re: Google predictions API

2012-02-13 Thread Sean Owen
PS I should name the ones I know... TheFilter, Directed Edge, Strands, Recsys. On Mon, Feb 13, 2012 at 4:43 PM, Sean Owen wrote: > As a simple API? No, though this is exactly what I am working on. > There are several vendors providing something like a hosted service, > though I have h

Re: Google predictions API

2012-02-13 Thread Sean Owen
ive alternatives that you may be > aware of? > > > Sent from Yahoo! Mail on Android > > ------ > * From: * Sean Owen ; > * To: * ; > * Subject: * Re: Google predictions API > * Sent: * Mon, Feb 13, 2012 2:41:42 PM > > I have not used it,

Re: Google predictions API

2012-02-13 Thread Sean Owen
I have not used it, but looks great as a concept -- hosted machine learning is the wave of the future. The reason I haven't seriously considered it or recommended it for production is that it seems limited in the scale you're allowed. https://developers.google.com/prediction/docs/pricing Training

Re: Apache Mahout 0.6 Released

2012-02-10 Thread Sean Owen
No user-based approach there, no. On Fri, Feb 10, 2012 at 5:00 PM, Ahmed Abdeen Hamed wrote: > Thank you for the wonderful work! > > Does the new release support built-in MapReduce for the User-based > Recommenders?

Re: help using Recommender against a MySQL database table

2012-02-07 Thread Sean Owen
This isn't related to Mahout per se, but an error from the JDBC driver (which may be in turn an error from the server). Are you using a connection pool? if not, do so for performance, but limit its size to not overwhelm the server. On Tue, Feb 7, 2012 at 5:06 PM, David Donohue wrote: > Hi!  I am

Re: Precision Recall -- Without RecommenderBuilder

2012-02-06 Thread Sean Owen
It doesn't take a Recommender, but a RecommenderBuilder, a thing which makes your Recommender for a given DataModel -- because it will run your recommender on a test data set it creates. It has to accept your Recommender in some form or else how does it know what to test? So I don't know what it wo

Re: Item-based Recommendation Engine Performance for E-Commerce

2012-02-05 Thread Sean Owen
echschmidt < > manuel.blechschm...@gmx.de> > wrote: > > Hello Varad, > > > > On 22.01.2012, at 10:47, Sean Owen wrote: > > > >> If you are always reading from the database it is never going to be > >> anywhere near fast. You have to put it in memory,

Re: Moving from Standalone to HDFS and then Hadoop?

2012-02-03 Thread Sean Owen
Feb 3, 2012 at 10:51 PM, praveenesh kumar wrote: > @Sean - Then how can we access remote HDFS files ? > > On Sat, Feb 4, 2012 at 4:18 AM, Sean Owen wrote: > > > This isn't going to work -- it may happen to work locally, but you can't > > access remote HDFS files li

Re: Moving from Standalone to HDFS and then Hadoop?

2012-02-03 Thread Sean Owen
This isn't going to work -- it may happen to work locally, but you can't access remote HDFS files like this. On Fri, Feb 3, 2012 at 10:39 PM, praveenesh kumar wrote: > Yes, I agree. > I also faced the same thing :-) > > After specifying your config files in conf.addResource(), you can specify > y

Re: Moving from Standalone to HDFS and then Hadoop?

2012-02-03 Thread Sean Owen
r mahout code can get the HDFS filesystem >> information and can get the files by specifying HDFS file path. >> Hope that will help. >> >> Thanks, >> Praveenesh >> >> On Sat, Feb 4, 2012 at 2:59 AM, Sean Owen wrote: >> >> If you are running Had

Re: Moving from Standalone to HDFS and then Hadoop?

2012-02-03 Thread Sean Owen
(Yes, this is configuring HDFS, rather than Mahout.) On Fri, Feb 3, 2012 at 10:11 PM, praveenesh kumar wrote: > AFAIK, In order to load data from HDFS to Mahout, you need to add your > hadoop config files in your hadoop configuration object > using conf.addResource(../../core-site.xml) hdfs-site

Re: Moving from Standalone to HDFS and then Hadoop?

2012-02-03 Thread Sean Owen
If you are running Hadoop-based stuff, the data not only can be, but *has* to be in something like HDFS, rather than a local file, to be accessible to Hadoop. (Well... I think you'll find you can actually get away with using "file:///" URLs locally with Hadoop but this is not the real way to use th

Re: Parallel ALS-WR on very large matrix -- crashing (I think)

2012-02-02 Thread Sean Owen
I have seen this happen in "normal" operation when the sorting on the mapper is taking a long long time, because the output is large. You can tell it to increase the timeout. If this is what is happening, you won't have a chance to update a counter as a keep-alive ping, but yes that is generally r

Re: mahout jar in classpath

2012-01-30 Thread Sean Owen
DenseVector is definitely in math. You can use 'jar tf' to see it. There must be something else going on in your javac command. On Mon, Jan 30, 2012 at 9:52 PM, Daniel Quach wrote: > I am trying to figure out which mahout jar file to include in the > classpath if I want to use a DenseVector. > >

Re: KnnItemBasedRecommender/AbstractItemSimilarity Question

2012-01-29 Thread Sean Owen
You mean that there are two items with near identical data, and one shows up in recs and the other doesn't? I can make a general guess, and it comes down to the fact that your similarity data isn't "transitive". This comes up in a minor and a major way. The minor way is just theoretical: these met

Re: Startup cost for instantiation of recommenders

2012-01-29 Thread Sean Owen
(This is all not Hadoop-based, I presume.) It varies from implementation to implementation; some implementations have a lot of precomputation at startup and some don't. Most do a fair bit of caching so they respond slowly and then more quickly as the caches fill. Anything's possible, but this is
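
A rough illustration of the "slow at first, then faster as the caches fill" behaviour described above. This is only a sketch and not Mahout's own code: the class name `WarmingCache`, the `misses` counter and the `expensive` function are all made up for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongFunction;

// Hypothetical helper, not part of Mahout: memoizes an expensive per-ID computation.
public class WarmingCache<V> {
    private final Map<Long, V> cache = new ConcurrentHashMap<>();
    private final LongFunction<V> expensive;   // stands in for a slow recommendation call
    private final AtomicInteger misses = new AtomicInteger();

    public WarmingCache(LongFunction<V> expensive) {
        this.expensive = expensive;
    }

    // The first call for an ID pays the full cost; later calls are served from memory.
    public V get(long id) {
        return cache.computeIfAbsent(id, k -> {
            misses.incrementAndGet();
            return expensive.apply(k);
        });
    }

    // Number of times the expensive function actually ran.
    public int misses() {
        return misses.get();
    }
}
```

Asking for the same ID twice runs the expensive function only once, which is why the first requests after startup look slow and later ones do not.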

Re: Add on to itemsimilarity

2012-01-28 Thread Sean Owen
real preference) but instead measures > > behavior that is a very complicated outcome of social pressures, > > expectations and the internal mental state. So using ratings boils down > to > > using one kind of behavior to estimate mental state that then is > > hypothesized to r

Re: custom modifications to include implicit feedback in neighbourhood model

2012-01-28 Thread Sean Owen
You almost certainly need to modify a DataModel too to expose this extra information. But that and your proposed modification seem like all you need to worry about. The rest should work as-is. I would use code from Subversion, not just 0.5. On Sat, Jan 28, 2012 at 4:42 PM, gj wrote: > Hi, > > I

Re: Add on to itemsimilarity

2012-01-28 Thread Sean Owen
It means *something* that a user clicked on one item and not 10,000 others. You will learn things like that Star Wars and Star Trek are somehow related from this data. I don't think that clicks are a bad input per se. I agree that it's not obvious how to meaningfully translate user actions into a

Re: MongoDBDatamodel for parallel recommendations.

2012-01-27 Thread Sean Owen
Surely, would you create a patch with what you have in mind and stick it in JIRA? On Fri, Jan 27, 2012 at 11:25 AM, Danny wrote: > Hi, > > in MongoDBDatamodel every id of an item is mapped to an internal id. These > id's > are stored in the MONGO_MAP_COLLECTION. The name of the collection can no

Re: Add on to itemsimilarity

2012-01-27 Thread Sean Owen
I think it would be surprising behavior for a recommender to return data it already knows; I just think the implicit contract is to return only predictions. That's how real-world recommender systems appear to behave, to the end user; Amazon doesn't show you books you have already read, even if inde
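
The "return only predictions" contract mentioned above could be sketched roughly like this. It is a minimal illustration under assumptions, not how any particular Mahout recommender is written: `KnownItemFilter`, `topUnknown` and its parameters are hypothetical names, and the ranked candidate list is assumed to come from some scoring step that is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not a Mahout class.
public class KnownItemFilter {
    // Walks candidates in ranked order and keeps the first howMany items
    // that the user has not already expressed a preference for.
    public static List<Long> topUnknown(List<Long> ranked, Set<Long> known, int howMany) {
        List<Long> out = new ArrayList<>();
        for (Long id : ranked) {
            if (out.size() >= howMany) {
                break;
            }
            if (!known.contains(id)) {
                out.add(id);
            }
        }
        return out;
    }
}
```

If a caller does want already-known items back, the simplest route in a sketch like this is to pass an empty `known` set rather than change the filter itself.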

Re: Add on to itemsimilarity

2012-01-26 Thread Sean Owen
elegate, I cannot. That > makes me sad. > > I hope this helps some of you, and I would appreciate some feedback on > whether what I'm doing is even a good idea, and how to go about it. > > Thanks, > > Anatoliy > > > On 01/25/2012 09:36 PM, Sean Owen wrote: >

Re: Add on to itemsimilarity

2012-01-25 Thread Sean Owen
I am not sure that fits in to an item-based recommender since this is data that is not about your 'items'. You might use it to influence a user similarity metric in a user-based computation. Or better, don't try to use this data yet and see where you get with the simple implementation. Sean On
