Are these identifiers used as keys for mappers somewhere?
If so, the sort phase of MapReduce will be much faster with longs:
comparing two long keys takes less time than comparing two Strings
(there are fewer bytes to compare), and more records fit in memory
during the sort (because each key is smaller).
I was once processing 1 billion records, and just changing the keys from
String to Long improved performance by 20%.
Ignore this if that is not the case.
On 08-03-2012 19:23, Manuel Blechschmidt wrote:
Hello Claudia,
the reason longs are used is pure efficiency. When you have a lot of items
and a lot of users and you use Strings as identifiers, you need a lot of
memory just to store them. Furthermore, operations like equals or hash codes
take longer.
A long takes 8 bytes (64 bits), while a UUID string (e.g.
936DA01F-9ABD-4D9D-80C7-02AF85C822A8) encoded as UTF-16 takes 72 bytes. That
means a UUID would consume 9x the memory that a long takes.
/Manuel
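The size comparison can be checked with a few lines of Java. This is only a sketch: the class name IdSizeSketch and the helper utf16Bytes are made up for illustration, and it counts only the character data, not the per-object overhead a JVM adds to each String.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, used only to illustrate the arithmetic.
final class IdSizeSketch {

    // Number of bytes the characters of an ID occupy when encoded as UTF-16.
    // UTF_16BE is used so that no byte-order mark is added to the count.
    static int utf16Bytes(String id) {
        return id.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String uuid = "936DA01F-9ABD-4D9D-80C7-02AF85C822A8";
        int stringBytes = utf16Bytes(uuid); // 36 characters * 2 bytes each
        int longBytes = Long.BYTES;         // a Java long is always 8 bytes
        System.out.println(stringBytes + " bytes vs " + longBytes
                + " bytes, ratio " + (stringBytes / longBytes));
        // prints "72 bytes vs 8 bytes, ratio 9"
    }
}
```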
On 08.03.2012, at 14:27, Claudia Grieco wrote:
Do you think it's worth the work to change the internal code of Mahout in
order to use string identifiers?
Thanks
Claudia
-----Original Message-----
From: Manuel Blechschmidt [mailto:[email protected]]
Sent: Monday, 5 March 2012 11:28
To: [email protected]
Subject: Re: Using recommenders with String identifiers
Hi Claudia,
you have to use an IDMigrator.
The following project shows an example:
https://github.com/ManuelB/facebook-recommender-demo
https://github.com/ManuelB/facebook-recommender-demo/blob/master/src/main/java/de/apaxo/bedcon/FacebookRecommender.java
Good luck
Manuel
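For readers who want the idea without opening the linked project: an ID migrator maps each string ID to a long and keeps enough information to map it back. The sketch below illustrates that idea with plain JDK classes only. It is not Mahout's IDMigrator; the class StringIdMapper and its method names are hypothetical, and the choice of hashing with MD5 and keeping the reverse mapping in a HashMap is an assumption made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the idea behind an ID migrator; not a Mahout class.
final class StringIdMapper {

    // Reverse mapping, so recommendations can be reported with the original string IDs.
    private final Map<Long, String> longToString = new HashMap<>();

    // Derives a long from the first 8 bytes of the MD5 digest of the string ID
    // and remembers the reverse mapping.
    long toLongId(String stringId) {
        long id = hash(stringId);
        longToString.put(id, stringId);
        return id;
    }

    // Returns the original string, or null if this long was never produced by toLongId.
    String toStringId(long longId) {
        return longToString.get(longId);
    }

    private static long hash(String s) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(s.getBytes(StandardCharsets.UTF_8));
            // Read the first 8 of the 16 digest bytes as one big-endian long.
            return ByteBuffer.wrap(digest, 0, Long.BYTES).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        StringIdMapper mapper = new StringIdMapper();
        long userId = mapper.toLongId("user-abc");
        System.out.println(userId + " -> " + mapper.toStringId(userId));
    }
}
```

The same string always hashes to the same long, so the forward mapping needs no storage; only the reverse lookup does. Two different strings could in principle collide on the same 64-bit value, which this sketch does not detect.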
On 05.03.2012, at 09:53, Claudia Grieco wrote:
Hi guys,
I'd like to use Mahout to implement a recommender, but I'm encountering a
problem:
IDs of items and users are represented in Mahout as long integers, while my
data comes from an external database that uses strings to identify items and
users.
Any suggestions as to how I can fix this problem?
Thanks a lot
Claudia
--
Manuel Blechschmidt
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B