Re: FileDataModel / FileIDMigrator

2011-08-12 Thread Ted Dunning
There is a Dictionary class that might help. Do you have some code to contribute? On Thu, Aug 11, 2011 at 7:30 PM, Charles McBrearty ctm...@gmail.com wrote: After actually having implemented the import/export conversions it makes a little more sense why you didn't want to put this in

Re: FileDataModel / FileIDMigrator

2011-08-11 Thread Sebastian Schelter
If you bring your data into the expected input format, then there's no need to subclass. You can just use those classes. --sebastian On 11.08.2011 at 02:02, Charles McBrearty ctm...@gmail.com wrote: Hi, I am taking a look at running some of the recommender examples from Mahout in Action on a data

Re: FileDataModel / FileIDMigrator

2011-08-11 Thread Sean Owen
Yes, it's just that it's much slower and takes up much more memory. You are strongly encouraged to use numeric IDs and not bother with this adapter at all. It's not a question of interning strings (and the IDs need not be consecutive) but of avoiding strings entirely. On Thu, Aug 11, 2011 at 1:02 AM,

Re: FileDataModel / FileIDMigrator

2011-08-11 Thread Charles McBrearty
If using Strings internally as IDs costs too much from a performance perspective, that's totally fine, and I wasn't trying to pick that fight. It sounds like there isn't much appetite for String wrappers, however. In any event, your suggestion to switch to numeric IDs is a non-starter. This

Re: FileDataModel / FileIDMigrator

2011-08-11 Thread Sean Owen
You don't have to use these numeric IDs elsewhere in your system. For example, if you have an additional column with a unique numeric ID, this ought to work fine: you can just have it reference that column while you use your real key elsewhere. That is, you can map to/from numeric IDs only for

Re: FileDataModel / FileIDMigrator

2011-08-11 Thread Ted Dunning
You don't need to rekey those tables. You can use hashes of the strings. Or you can build a dictionary to use at the import/export points. On Thu, Aug 11, 2011 at 3:27 PM, Charles McBrearty ctm...@gmail.com wrote: In any event, your suggestion to switch to numeric IDs is a non-starter. This
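
A minimal sketch of the "hashes of the strings" option mentioned above, using only the JDK. The class name `StringIdHasher` and the method `toLongId` are hypothetical and are not part of Mahout; the choice of MD5 truncated to 8 bytes is an assumption made for illustration, not something the thread specifies.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not a Mahout class): derives a numeric ID from a string key.
final class StringIdHasher {
    private StringIdHasher() {}

    // Returns the first 8 bytes of the MD5 digest of the UTF-8 bytes, read as a big-endian long.
    static long toLongId(String stringId) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(stringId.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    public static void main(String[] args) {
        // The same string always yields the same numeric ID.
        System.out.println(toLongId("item-abc") == toLongId("item-abc"));
    }
}
```

A hash is one-way, so the export side still needs some record of which string produced which number, and distinct strings can in principle collide, although that is very unlikely with 64 bits.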

FileDataModel / FileIDMigrator

2011-08-10 Thread Charles McBrearty
Hi, I am taking a look at running some of the recommender examples from Mahout in Action on a data set that I have that uses strings as the ItemIDs, and it looks to me like the suggested way to do this is to subclass FileDataModel and then use FileIDMigrator to manage the String - Long

Re: FileDataModel / FileIDMigrator

2011-08-10 Thread Ted Dunning
The issue is that actually supporting strings through the whole process kills performance. Interning the strings to be consecutively assigned integers helps ginormously. On Wed, Aug 10, 2011 at 5:02 PM, Charles McBrearty ctm...@gmail.com wrote: This seems like a lot of complication to deal
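
A rough sketch of what interning strings to consecutively assigned integers could look like, using only the JDK. The class `IdDictionary` and its methods `intern` and `lookup` are hypothetical names chosen for illustration; this is not the Dictionary class mentioned elsewhere in the thread and not a Mahout API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-way dictionary: string key <-> consecutive numeric ID starting at 0.
final class IdDictionary {
    private final Map<String, Long> ids = new HashMap<>();
    private final List<String> strings = new ArrayList<>();

    // Import side: returns the existing ID for a string, or assigns the next free one.
    long intern(String stringId) {
        Long existing = ids.get(stringId);
        if (existing != null) {
            return existing;
        }
        long next = strings.size();
        ids.put(stringId, next);
        strings.add(stringId);
        return next;
    }

    // Export side: maps a numeric ID back to the original string.
    String lookup(long numericId) {
        return strings.get(Math.toIntExact(numericId));
    }

    public static void main(String[] args) {
        IdDictionary dictionary = new IdDictionary();
        long apple = dictionary.intern("apple");
        long banana = dictionary.intern("banana");
        System.out.println(apple + " " + banana + " " + dictionary.lookup(banana));
    }
}
```

The idea is that strings are translated once when data is imported and once when results are exported, so everything in between works on numeric IDs only.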