There is a Dictionary class that might help.
Do you have some code to contribute?
On Thu, Aug 11, 2011 at 7:30 PM, Charles McBrearty ctm...@gmail.com wrote:
After actually having implemented the import/export conversions, it
makes a little more sense why you didn't want to put this in
If you bring your data into the expected input format, then there's no need to
subclass. You can just use those classes.
--sebastian
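For reference, Mahout's FileDataModel reads plain-text lines of the form userID,itemID,preference with numeric (long) IDs. Below is a minimal sketch of the import-side conversion, assuming string-keyed input records. The class, record, and method names are hypothetical, and the string-to-long mapping is passed in as a function so that any scheme (hashing, a dictionary, an existing numeric column) can be plugged in:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

// Hypothetical helper: turns string-keyed ratings into "userID,itemID,preference" lines.
final class RatingsToFileDataModelFormat {

    // One input record keyed by strings (assumed shape of the original data).
    record StringRating(String userKey, String itemKey, float preference) {}

    // Maps each key through the supplied functions and emits one CSV line per rating.
    static List<String> toCsvLines(List<StringRating> ratings,
                                   ToLongFunction<String> userIds,
                                   ToLongFunction<String> itemIds) {
        List<String> lines = new ArrayList<>();
        for (StringRating r : ratings) {
            lines.add(userIds.applyAsLong(r.userKey()) + ","
                    + itemIds.applyAsLong(r.itemKey()) + ","
                    + r.preference());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<StringRating> ratings = List.of(
                new StringRating("alice", "book-42", 4.5f),
                new StringRating("bob", "book-7", 3.0f));
        // Hypothetical fixed mappings; a hash- or dictionary-based mapper could be used instead.
        Map<String, Long> users = Map.of("alice", 1L, "bob", 2L);
        Map<String, Long> items = Map.of("book-42", 42L, "book-7", 7L);
        toCsvLines(ratings, users::get, items::get).forEach(System.out::println);
    }
}
```

The resulting lines can be written to a file and loaded with FileDataModel directly, with no subclassing; the same mapping is then applied in reverse when exporting recommendations.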
On 11.08.2011 02:02, Charles McBrearty ctm...@gmail.com wrote:
Hi,
I am taking a look at running some of the recommender examples from Mahout
in Action on a data
Yes, it's just that it's much slower and takes up much more memory. You are
strongly encouraged to use numeric IDs and not bother with this adapter at
all. It's not a question of interning strings, and the IDs need not be
consecutive; the point is to avoid strings entirely.
On Thu, Aug 11, 2011 at 1:02 AM,
If using Strings internally as IDs costs too much from a performance
perspective, that's totally fine, and I wasn't trying to pick that fight. It
sounds like there isn't much appetite for String wrappers, however.
In any event, your suggestion to switch to numeric IDs is a non-starter. This
You don't have to use these numeric IDs elsewhere in your system. For
example, if you have an additional column with a unique numeric ID, this
ought to work fine: you can just have it reference that column while you use
your real key elsewhere.
That is, you can map to/from numeric IDs only for
You don't need to rekey those tables.
You can use hashes of the strings. Or you can build a dictionary to use at
the import/export points.
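A rough sketch of the hashing option, assuming a 64-bit ID taken from the first 8 bytes of each string's MD5 digest. The class and method names are hypothetical and are not Mahout API. Because a hash can't be inverted, it still keeps a reverse map for the export side, and it fails loudly on a (very unlikely) collision rather than silently merging two keys:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical hash-based String <-> long mapper (a sketch, not Mahout's own migrator).
final class HashedIds {

    // Reverse lookup so long IDs in recommendations can be turned back into strings.
    private final Map<Long, String> reverse = new HashMap<>();

    // Import direction: hash the key, remember it, and detect collisions.
    long toLongId(String key) {
        long id = md5Prefix(key);
        String previous = reverse.putIfAbsent(id, key);
        if (previous != null && !previous.equals(key)) {
            throw new IllegalStateException("hash collision: " + previous + " vs " + key);
        }
        return id;
    }

    // Export direction: returns null for IDs never seen on import.
    String toStringId(long id) {
        return reverse.get(id);
    }

    // First 8 bytes of the MD5 digest, read as a big-endian long.
    static long md5Prefix(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        HashedIds items = new HashedIds();
        long id = items.toLongId("book-42");
        System.out.println(id + " -> " + items.toStringId(id));
    }
}
```

The appeal of hashing is that the same string always yields the same ID without any shared state; the reverse map is only needed at the export point.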
On Thu, Aug 11, 2011 at 3:27 PM, Charles McBrearty ctm...@gmail.com wrote:
In any event, your suggestion to switch to numeric IDs is a non-starter. This
Hi,
I am taking a look at running some of the recommender examples from Mahout in
Action on a data set of mine that uses strings as the item IDs, and it looks
to me like the suggested way to do this is to subclass FileDataModel and then
use FileIDMigrator to manage the String - Long
The issue is that actually supporting strings through the whole process
kills performance.
Interning the strings as consecutively assigned integers helps
ginormously.
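A minimal sketch of that interning approach: a two-way dictionary that hands out 0, 1, 2, ... in first-seen order. IdDictionary is a hypothetical name and is not the Dictionary class mentioned at the top of the thread:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-way dictionary assigning consecutive long IDs to string keys.
final class IdDictionary {

    private final Map<String, Long> byKey = new HashMap<>(); // key -> ID (import side)
    private final List<String> keys = new ArrayList<>();     // ID -> key (export side)

    // Returns the existing ID for key, or assigns the next consecutive one.
    long intern(String key) {
        Long existing = byKey.get(key);
        if (existing != null) {
            return existing;
        }
        long id = keys.size();
        byKey.put(key, id);
        keys.add(key);
        return id;
    }

    // Translates a numeric ID back to the original string key.
    String lookup(long id) {
        return keys.get(Math.toIntExact(id));
    }

    int size() {
        return keys.size();
    }

    public static void main(String[] args) {
        IdDictionary users = new IdDictionary();
        System.out.println(users.intern("alice")); // 0
        System.out.println(users.intern("bob"));   // 1
        System.out.println(users.intern("alice")); // 0 again
        System.out.println(users.lookup(1));       // bob
    }
}
```

Unlike hashing, this cannot collide, but the dictionary has to be saved alongside the converted data so the same key gets the same ID across runs.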
On Wed, Aug 10, 2011 at 5:02 PM, Charles McBrearty ctm...@gmail.com wrote:
This seems like a lot of complication to deal