Thanks Manuel, that's very helpful. So you're saying I can just use MemoryIDMigrator, even after my preferences have been created with UUID values? Or should I create my preferences using the MemoryIDMigrator?
- Matt

On Wed, Aug 1, 2012 at 8:49 PM, Manuel Blechschmidt <manuel.blechschm...@gmx.de> wrote:
> Hello Matt,
>
> On 01.08.2012, at 22:40, Matt Mitchell wrote:
>
>> Thanks Sean! That all makes sense. Would you mind recommending a
>> hashing function for this? Is there something in Mahout I could use?
>
> The following class uses a string-to-long mapping based on a
> MemoryIDMigrator:
>
> https://github.com/ManuelB/facebook-recommender-demo/blob/master/src/main/java/de/apaxo/bedcon/FacebookRecommender.java
>
> Internally, Mahout uses part of the MD5 hash, which can, for example, be
> expressed directly in SQL:
>
> cast(conv(substring(md5([column name]), 1, 16),16,10) as signed)
>
> Javadoc can be found here:
> https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/model/IDMigrator.html
>
> /Manuel
>
>>
>> - Matt
>>
>> On Wed, Aug 1, 2012 at 4:34 PM, Sean Owen <sro...@gmail.com> wrote:
>>> Yep, just hash to a long, from UUID or String or whatever. The occasional
>>> collision does not cause a real problem. If you mix the tastes of two users
>>> or items once in a billion times, the overall results will hardly be
>>> different.
>>>
>>> You have to maintain the reverse mapping, of course. Look at the IDMigrator
>>> class for a little help there.
>>>
>>> You can rewrite it to use UUID or String, but believe me, it will be an
>>> immense amount of change and make things much slower. It used to work this
>>> way for recommenders in about 2006, and the Object overhead and GC pressure
>>> were by far the bottleneck. That's why it's all long now.
>>>
>>> On Wed, Aug 1, 2012 at 9:29 PM, Matt Mitchell <goodie...@gmail.com> wrote:
>>>
>>>> Question about dealing with UUIDs as Mahout user IDs. I'm considering
>>>> these ways to deal with the values:
>>>>
>>>> 1. use getLeastSignificantBits
>>>> 2. re-map to a database auto-increment number (this would take a very
>>>> long time to do?)
>>>> 3. customize Mahout so that it accepts UUIDs as user IDs
>>>>
>>>> Any feedback here? If I went with #3 (seems the safest), how would I do
>>>> this, and what are the consequences?
>>>>
>>>> The user count is in the millions.
>>>>
>>>> Thanks!
>>>>
>
> --
> Manuel Blechschmidt
> M.Sc. IT Systems Engineering
> Dortustr. 57
> 14467 Potsdam
> Mobil: 0173/6322621
> Twitter: http://twitter.com/Manuel_B
>
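A minimal Java sketch of the approach Sean and Manuel describe, under stated assumptions: the class `UuidIdMapper` and its methods are hypothetical (not Mahout's API), and the hash (the first 8 bytes of the MD5 digest of the UTF-8 string, read big-endian) is modeled on Manuel's description of "parts of the md5 hashes" rather than copied from Mahout's source. Because the long ID depends only on the string, it can be computed for preferences that already store UUID values; the reverse map just needs to see each ID once.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Hypothetical helper, not part of Mahout: hashes string IDs (e.g. UUIDs)
 * to longs via an MD5 prefix and keeps a long -> string reverse mapping.
 */
public class UuidIdMapper {

  // Reverse mapping; Sean notes this must be maintained by the caller.
  private final Map<Long, String> reverse = new HashMap<>();

  /** Deterministic: the same string always yields the same long. */
  public static long toLongId(String id) {
    try {
      byte[] md5 = MessageDigest.getInstance("MD5")
          .digest(id.getBytes(StandardCharsets.UTF_8));
      long hash = 0L;
      // Fold the first 8 digest bytes, big-endian, into one 64-bit value.
      for (int i = 0; i < 8; i++) {
        hash = (hash << 8) | (md5[i] & 0xFFL);
      }
      return hash;
    } catch (NoSuchAlgorithmException e) {
      // Every Java SE implementation is required to provide MD5.
      throw new IllegalStateException(e);
    }
  }

  /** Hashes an existing string ID and records how to map it back. */
  public long register(String id) {
    long longId = toLongId(id);
    // On a (rare) collision the later string overwrites the earlier one.
    reverse.put(longId, id);
    return longId;
  }

  /** Returns the original string, or null if the long ID was never registered. */
  public String toStringId(long longId) {
    return reverse.get(longId);
  }

  public static void main(String[] args) {
    UuidIdMapper mapper = new UuidIdMapper();
    String uuid = UUID.randomUUID().toString();
    long longId = mapper.register(uuid);
    System.out.println(uuid + " -> " + longId + " -> " + mapper.toStringId(longId));
  }
}
```

The first 16 hex characters of an MD5 digest encode its first 8 bytes, which is the same prefix Manuel's SQL expression extracts. Whether the SQL `cast(... as signed)` produces the identical signed value depends on the database's conversion rules, so check that before mixing IDs computed in SQL with IDs computed in Java.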