Thanks Sebastian.
Although I got the FileDataModel updating correctly after following your
advice, everything seems to point that I will need to use a database to
back my dataModel.
On Mon, Mar 3, 2014 at 3:47 PM, Sebastian Schelter s...@apache.org wrote:
Hi Juan,
IIRC then FileDataModel has a parameter that determines how much time
must have been spent since the last modification of the underlying file.
You can also directly append new data to the original file.
If you want to have a DataModel that can be concurrently updated, I suggest
calls to recommender.refresh(null)?
Many thanks.
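The reload throttle Sebastian describes (FileDataModel only re-reads the file once it has changed and a minimum interval has passed) can be sketched in plain Java. This is an illustration of the idea, not Mahout code; the class and method names are made up:

```java
// Sketch of FileDataModel's reload gating: reload only when the file has
// been modified since the last load AND a minimum interval has elapsed.
class ReloadGate {
    private final long minReloadIntervalMs;
    private long lastLoadMillis = 0L;

    ReloadGate(long minReloadIntervalMs) {
        this.minReloadIntervalMs = minReloadIntervalMs;
    }

    // True when the file changed after the last load and enough time has
    // passed to make another reload worthwhile.
    boolean shouldReload(long fileLastModified, long nowMillis) {
        return fileLastModified > lastLoadMillis
            && nowMillis - lastLoadMillis >= minReloadIntervalMs;
    }

    void markLoaded(long nowMillis) {
        lastLoadMillis = nowMillis;
    }
}
```

A caller would check `shouldReload(file.lastModified(), System.currentTimeMillis())` before calling `refresh()`, then `markLoaded(...)` after a successful load.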
I am having issues refreshing my recommender, in particular with the
DataModel.
I am using a FileDataModel and a GenericItemBasedRecommender that also has
a CachingItemSimilarity wrapping a FileItemSimilarity. But for the test I
am running I am making things even simpler.
By the time I
A follow-up question.
I worked on this for quite a while and got stuck. It is difficult for me to
figure out which classes I need to extend.
I suppose they are: FileDataModel, Preference, PreferenceArray. Am I correct?
Thanks!
Jia
On Fri, May 17, 2013 at 1:20 AM, Manuel Blechschmidt
Hi,
I want to build a recommendation model based on Mahout. My dataset
is in the format of
userID, itemID, rating timestamp tag1 tag2 tag3. Thus, I think I need to
extend the FileDataModel.
I looked into *JesterDataModel* as an example. However, I have a problem
with the logic flow. In its *buildModel()* method, an empty map data is
first constructed. It is then thrown into processFile. I
Yes, to integrate any new data everything must be reloaded.
On Mar 2, 2013 6:34 AM, Nadia Najjar ned...@gmail.com wrote:
I am using a FileDataModel and remove and insert preferences before
estimating preferences. Do I need to rebuild the recommender after these
methods are called for it to be reflected in the prediction?
the loading time by tuning those.
Best,
Sebastian
On 11.11.2012 11:53, Onur Kuru wrote:
Hi all,
If I use FileDataModel, it takes about 5 secs to build the data model
with 1m movielens data but it takes about 25 secs if I use
ReloadFromJDBCDataModel.
I know the former uses file and the latter
, and then
it's in memory.
: Creating FileDataModel for file datasets\mydb.csv
Feb 29, 2012 10:38:08 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Reading file info...
[WARNING]
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
There is a Dictionary class that might help.
Do you have some code to contribute?
On Thu, Aug 11, 2011 at 7:30 PM, Charles McBrearty ctm...@gmail.com wrote:
After actually having implemented the import/export conversions it
makes a little more sense why you didn't want to put this in
Hi,
I am taking a look at running some of the recommender examples from Mahout
in Action on a data set that I have that uses strings as the ItemIDs, and it
looks to me like the suggested way to do this is to subclass FileDataModel
and then use FileIdMigrator to manage the String - Long mapping.
This seems like a lot of complication to deal with what I would imagine is
a pretty common use case. Is there something that I'm missing here
You don't need to rekey those tables.
You can use hashes of the strings. Or you can build a dictionary to use at
the import/export points.
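Ted's first option, hashing the string IDs down to longs, can be sketched with the JDK alone. Mahout's ID migrators do something similar with an MD5 digest, but the exact scheme below is illustrative, not the library's implementation:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class StringHasher {
    // Maps a string ID to a long by taking the first 8 bytes of an MD5
    // digest. Deterministic, so the same string always yields the same
    // long; collisions are possible but rare for modest catalogs.
    static long toLongId(String id) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                .digest(id.getBytes(StandardCharsets.UTF_8));
            long h = 0L;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);  // MD5 is always available
        }
    }
}
```

The drawback, as the thread notes, is that hashing is one-way: to print string IDs back out you still need a dictionary kept at the import/export boundary.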
On Thu, Aug 11, 2011 at 3:27 PM, Charles McBrearty ctm...@gmail.com wrote:
In any event, your suggestion to switch to numeric IDs is a non-starter.
This
The issue is that actually supporting strings through the whole process
kills performance.
Interning the strings to be consecutively assigned integers helps
ginormously.
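The interning Ted describes, assigning consecutive integers to strings at import time and mapping them back at export time, might look like the following sketch; the names are made up for illustration and are not Mahout API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Two-way dictionary used only at the import/export points: strings are
// assigned consecutive longs (0, 1, 2, ...) in first-seen order, so the
// recommender itself works purely on dense numeric IDs.
class IdDictionary {
    private final Map<String, Long> toLong = new HashMap<>();
    private final List<String> toStringIds = new ArrayList<>();

    long toLongId(String id) {
        Long existing = toLong.get(id);
        if (existing != null) {
            return existing;
        }
        long next = toStringIds.size();
        toLong.put(id, next);
        toStringIds.add(id);
        return next;
    }

    String toStringId(long id) {
        return toStringIds.get((int) id);
    }
}
```

Consecutive IDs also help memory layout: FastByIDMap-style structures stay dense instead of scattering hash values across the long range.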
On Wed, Aug 10, 2011 at 5:02 PM, Charles McBrearty ctm...@gmail.com wrote:
This seems like a lot of complication to deal
I've read the source for FileDataModel and it suggested using a JDBC
backed implementation for larger datasets so I decided to upgrade our
recommendation system to use MySQLJDBCDataModel with
MySQLJDBCInMemoryItemSimilarity.
I've found that the JDBC backed versions performance is actually
Yes, this is trading memory for speed. If you can fit everything in memory,
then you should. FileDataModel is in memory.
MySQLJDBCDataModel is not in memory and queries the DB every time. This is
pretty slow, though by caching item-item similarity as you do, a lot of the
load is removed. However if you want to go all in memory, use
Yes. Both are just fine to use in production. For speed and avoiding abuse
of the database, I'd load into memory and tell it to periodically reload.
But that too is a bit of a choice between how often you want to consume new
data and how much work you want to do to recompute new values.
I wouldn't use the in memory JDBC solution.
I was wondering do most people choose the JDBC backed solutions or the
File backed?
A look at a recent blog post of mine might be helpful with
choosing the appropriate data access strategies for your recommender
setup. It covers a very common use case in great detail:
http://ssc.io/deploying-a-massively-scalable-recommender-system-with-apache-mahout/
--sebastian
May I ask why you choose to go with
AllSimilarItemsCandidateItemsStrategy over the default
PreferredItemsNeighborhoodCandidateItemsStrategy?
If the item similarities are already precomputed there's no sense in
fetching them from the data model, you can just use the already
precomputed set of possibly similar items as no other items can be
recommended anyway and it's faster to fetch them from a similarity
implementation that holds
Hi,
This is my second time trying to post this - the first time did not seem to
work; my apologies if this ends up being a duplicate post.
I'm having an issue with FileDataModel. In particular, suppose you have a main
data file (say, /tmp/data.lst) and two incremental files (say, /tmp/data.1
lines of change.
It would be a nice feature to have built into the API, for sure. You could
use getPreferencesFromUser to determine which users have the appropriate
level of options.
On Fri, Oct 8, 2010 at 6:27 PM, Sean Owen sro...@gmail.com wrote:
There's nothing built-in. Yeah I'd view that as a step
Maybe, my hunch is that it will affect so much in the code as to be hard to
support. It is rare you want to filter the data in different ways repeatedly
I think. And if you're filtering one way probably better to not have it in
memory.
On Oct 8, 2010 6:43 PM, Steven Bourke sbou...@gmail.com wrote:
In the past I've extended the FileDataModel (if I recall correctly) that did
this exact filtering that ChrisS was asking for. It worked well.
What do you mean by this? I'm not clear yet.
On Sun, Aug 15, 2010 at 1:09 PM, Tamas Jambor jambo...@gmail.com wrote:
Hi,
One more possible bug, in FileDataModel, there is nothing to make sure that
the superclass - AbstractDataModel gets the value for maxPreference and
minPreference.
Tamas
DataModel model = new FileDataModel(new File("./data/test.txt"));
// just to make sure it loads the model
model.getNumItems();
System.out.println(model.getMaxPreference());
this prints out NaN, because maxPreference/minPreference are only
calculated when it creates the inner DataModel
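What Tamas expects (min/max tracked while the file is parsed, then pushed up to AbstractDataModel so getMaxPreference() is not NaN) amounts to something like the following sketch; it is illustrative, not the actual Mahout code:

```java
// Tracks the min/max preference values seen while parsing a ratings file,
// starting from NaN exactly as AbstractDataModel does before any value
// has been recorded.
class PreferenceRange {
    double min = Double.NaN;
    double max = Double.NaN;

    void observe(double value) {
        if (Double.isNaN(min) || value < min) {
            min = value;
        }
        if (Double.isNaN(max) || value > max) {
            max = value;
        }
    }
}
```

If the loader called `observe(rating)` per line and then handed `min`/`max` to the superclass setters, the NaN above would disappear.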