Re: Which database should I use with Mahout

Sean Owen Sun, 19 May 2013 11:27:27 -0700

I think everyone is agreeing that it is essential to only access
information in memory at run-time, yes, whatever that info may be.
I don't think the original question was about Hadoop, but, the answer
is the same: Hadoop mappers are just reading the input serially. There
is no advantage to a relational database or NoSQL database; they're
just overkill. HDFS is sufficient, and probably even best of these at
allowing fast serial access to the data.
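
A minimal sketch of "read it all once, serially, then serve from
memory" follows. It is plain Java, not Mahout's API. The class name
InMemoryPreferences and the "userID,itemID,preference" line format are
assumptions made for illustration; in Mahout itself, FileDataModel
plays this role.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not a Mahout class.
class InMemoryPreferences {
    // userID -> (itemID -> preference value), held entirely on the heap
    private final Map<Long, Map<Long, Float>> byUser = new HashMap<>();

    // Reads "userID,itemID,preference" lines once, front to back.
    // The source can be a file, an HDFS stream, or a database dump;
    // after this single pass it is never consulted again.
    static InMemoryPreferences load(Reader source) throws IOException {
        InMemoryPreferences prefs = new InMemoryPreferences();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = line.split(",");
                long user = Long.parseLong(fields[0].trim());
                long item = Long.parseLong(fields[1].trim());
                float value = Float.parseFloat(fields[2].trim());
                prefs.byUser
                     .computeIfAbsent(user, k -> new HashMap<>())
                     .put(item, value);
            }
        }
        return prefs;
    }

    // Answered from memory; returns null if no preference is known.
    Float preference(long user, long item) {
        Map<Long, Float> items = byUser.get(user);
        return items == null ? null : items.get(item);
    }

    int userCount() {
        return byUser.size();
    }
}
```

Because load() takes any Reader, the storage behind it can change
without touching the code that serves recommendations.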


On Sun, May 19, 2013 at 11:19 AM, Tevfik Aytekin
<tevfik.ayte...@gmail.com> wrote:
> Hi Manuel,
> But if one uses matrix factorization and stores the user and item
> factors in memory then there will be no database access during
> recommendation.
> I thought that the original question was where to store the data and
> how to give it to Hadoop.
>
> On Sun, May 19, 2013 at 9:01 PM, Manuel Blechschmidt
> <manuel.blechschm...@gmx.de> wrote:
>> Hi Tevfik,
>> one request to the recommender could become more than 1000 queries to the
>> database, depending on which recommender you use and the number of
>> preferences for the given user.
>>
>> The problem is not whether you use SQL, NoSQL, or any other query language.
>> The problem is the latency of the answers.
>>
>> An average TCP packet round trip in the same data center takes 500 µs. A main
>> memory reference takes 0.1 µs. This means that the main memory of your Java
>> process can be accessed 5000 times faster than any other process, such as a
>> database, connected via TCP/IP.
>>
>> http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html
>>
>> Here you can see a screenshot that shows that database communication is by 
>> far (99%) the slowest component of a recommender request:
>>
>> https://source.apaxo.de/MahoutDatabaseLowPerformance.png
>>
>> If you do not want to cache your data in your Java process, you can use a
>> fully in-memory database technology like SAP HANA
>> http://www.saphana.com/welcome or EXASOL http://www.exasol.com/
>>
>> However, if you are using one of these, you do not need Mahout anymore.
>>
>> An architecture of a Mahout system can be seen here:
>> https://github.com/ManuelB/facebook-recommender-demo/blob/master/docs/RecommenderArchitecture.png
>>
>> Hope that helps
>>     Manuel
>>
>> On 19.05.2013 at 19:20, Sean Owen wrote:
>>
>>> I'm first saying that you really don't want to use the database as a
>>> data model directly. It is far too slow.
>>> Instead you want to use a data model implementation that reads all of
>>> the data, once, serially, into memory. And in that case, it makes no
>>> difference where the data is being read from, because it is read just
>>> once, serially. A file is just as fine as a fancy database. In fact
>>> it's probably easier and faster.
>>>
>>> On Sun, May 19, 2013 at 10:14 AM, Tevfik Aytekin
>>> <tevfik.ayte...@gmail.com> wrote:
>>>> Thanks Sean, but I did not understand your answer. Can you please
>>>> explain it again?
>>>>
>>>>
>>>> On Sun, May 19, 2013 at 8:00 PM, Sean Owen <sro...@gmail.com> wrote:
>>>>> It doesn't matter, in the sense that it is never going to be fast
>>>>> enough for real-time at any reasonable scale if actually run off a
>>>>> database directly. One operation results in thousands of queries. It's
>>>>> going to read data into memory anyway and cache it there. So, whatever
>>>>> is easiest for you. The simplest solution is a file.
>>>>>
>>>>> On Sun, May 19, 2013 at 9:52 AM, Ahmet Ylmaz
>>>>> <ahmetyilmazefe...@yahoo.com> wrote:
>>>>>> Hi,
>>>>>> I would like to use Mahout to make recommendations on my web site. Since
>>>>>> the data is hopefully going to be big, I plan to use the Hadoop
>>>>>> implementations of the recommender algorithms.
>>>>>>
>>>>>> I'm currently storing the data in MySQL. Should I continue with it or
>>>>>> should I switch to a NoSQL database such as MongoDB or something else?
>>>>>>
>>>>>> Thanks
>>>>>> Ahmet
>>
>> --
>> Manuel Blechschmidt
>> M.Sc. IT Systems Engineering
>> Dortustr. 57
>> 14467 Potsdam
>> Mobil: 0173/6322621
>> Twitter: http://twitter.com/Manuel_B
>>
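
The latency figures quoted in the thread (500 µs per network round trip,
0.1 µs per main memory reference, roughly 1000 lookups per request) can
be checked with a few lines of arithmetic. This is only a rough sketch:
the class and method names are made up for illustration, and the
constants are the thread's figures, not measurements.

```java
// Hypothetical helper; the constants are the figures quoted in the thread.
class LatencyEstimate {
    static final long NETWORK_ROUND_TRIP_NS = 500_000L; // 500 µs
    static final long MEMORY_REFERENCE_NS = 100L;       // 0.1 µs

    // How many memory references fit into one network round trip.
    static long slowdownFactor() {
        return NETWORK_ROUND_TRIP_NS / MEMORY_REFERENCE_NS;
    }

    // Total time in microseconds for a number of sequential lookups.
    static long totalMicros(long lookups, long nanosPerLookup) {
        return lookups * nanosPerLookup / 1_000L;
    }

    public static void main(String[] args) {
        long lookups = 1000L; // assumed lookups per recommender request
        System.out.println("memory is faster by a factor of "
                + slowdownFactor());
        System.out.println("database-backed request: "
                + totalMicros(lookups, NETWORK_ROUND_TRIP_NS) + " µs");
        System.out.println("in-memory request:       "
                + totalMicros(lookups, MEMORY_REFERENCE_NS) + " µs");
    }
}
```

Under these assumptions, 1000 sequential database round trips cost about
half a second per request, while the same lookups in memory cost about a
tenth of a millisecond.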
