I think everyone agrees that, at run time, it is essential to access only information held in memory, whatever that information may be. I don't think the original question was about Hadoop, but the answer is the same: Hadoop mappers just read the input serially. A relational or NoSQL database offers no advantage here; either is overkill. HDFS is sufficient, and is probably the best of these options for fast serial access to the data.
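To make the "read everything once, serially, into memory" idea concrete, here is a minimal sketch in plain Java. The class name `InMemoryPreferences` and the `userID,itemID,value` line layout are assumptions made for illustration; this is not Mahout code. In Mahout itself, a file-backed data model such as `FileDataModel` plays this role.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class, not part of Mahout.
public class InMemoryPreferences {
    // userID -> (itemID -> preference value), held entirely in memory
    private final Map<Long, Map<Long, Float>> prefs = new HashMap<>();

    // Reads "userID,itemID,value" lines (assumed layout) once, from start to end.
    public static InMemoryPreferences load(BufferedReader reader) {
        InMemoryPreferences model = new InMemoryPreferences();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] parts = line.split(",");
                long userId = Long.parseLong(parts[0].trim());
                long itemId = Long.parseLong(parts[1].trim());
                float value = Float.parseFloat(parts[2].trim());
                model.prefs.computeIfAbsent(userId, k -> new HashMap<>()).put(itemId, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return model;
    }

    // Pure in-memory lookup; returns null if the preference is unknown.
    public Float getPreference(long userId, long itemId) {
        Map<Long, Float> items = prefs.get(userId);
        return items == null ? null : items.get(itemId);
    }

    public int numUsers() {
        return prefs.size();
    }

    public static void main(String[] args) {
        String data = "1,10,4.0\n1,11,3.5\n2,10,5.0\n";
        InMemoryPreferences model = load(new BufferedReader(new StringReader(data)));
        System.out.println(model.getPreference(1, 11));
        System.out.println(model.numUsers());
    }
}
```

Because the source is consumed exactly once through a `BufferedReader`, it makes no difference to the lookups whether the reader wraps a local file, an HDFS stream, or a database export.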
On Sun, May 19, 2013 at 11:19 AM, Tevfik Aytekin <tevfik.ayte...@gmail.com> wrote:
> Hi Manuel,
> But if one uses matrix factorization and stores the user and item
> factors in memory then there will be no database access during
> recommendation.
> I thought that the original question was where to store the data and
> how to give it to Hadoop.
>
> On Sun, May 19, 2013 at 9:01 PM, Manuel Blechschmidt
> <manuel.blechschm...@gmx.de> wrote:
>> Hi Tevfik,
>> one request to the recommender could become more than 1000 queries to the
>> database, depending on which recommender you use and the number of
>> preferences for the given user.
>>
>> The problem is not whether you are using SQL, NoSQL, or any other query
>> language. The problem is the latency of the answers.
>>
>> An average TCP packet round trip in the same data center takes 500 µs. A
>> main memory reference takes 0.1 µs. This means that the main memory of your
>> Java process can be accessed 5000 times faster than any other process, such
>> as a database, connected via TCP/IP.
>>
>> http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html
>>
>> Here you can see a screenshot showing that database communication is by
>> far (99%) the slowest component of a recommender request:
>>
>> https://source.apaxo.de/MahoutDatabaseLowPerformance.png
>>
>> If you do not want to cache your data in your Java process, you can use a
>> fully in-memory database technology like SAP HANA
>> http://www.saphana.com/welcome or EXASOL http://www.exasol.com/
>>
>> Nevertheless, if you are using these you do not need Mahout anymore.
>>
>> An architecture of a Mahout system can be seen here:
>> https://github.com/ManuelB/facebook-recommender-demo/blob/master/docs/RecommenderArchitecture.png
>>
>> Hope that helps
>> Manuel
>>
>> On 19.05.2013 at 19:20, Sean Owen wrote:
>>
>>> I'm first saying that you really don't want to use the database as a
>>> data model directly. It is far too slow.
>>> Instead you want to use a data model implementation that reads all of
>>> the data, once, serially, into memory. And in that case, it makes no
>>> difference where the data is being read from, because it is read just
>>> once, serially. A file is just as fine as a fancy database. In fact
>>> it's probably easier and faster.
>>>
>>> On Sun, May 19, 2013 at 10:14 AM, Tevfik Aytekin
>>> <tevfik.ayte...@gmail.com> wrote:
>>>> Thanks Sean, but I could not get your answer. Can you please explain it
>>>> again?
>>>>
>>>>
>>>> On Sun, May 19, 2013 at 8:00 PM, Sean Owen <sro...@gmail.com> wrote:
>>>>> It doesn't matter, in the sense that it is never going to be fast
>>>>> enough for real-time at any reasonable scale if actually run off a
>>>>> database directly. One operation results in thousands of queries. It's
>>>>> going to read data into memory anyway and cache it there. So, whatever
>>>>> is easiest for you. The simplest solution is a file.
>>>>>
>>>>> On Sun, May 19, 2013 at 9:52 AM, Ahmet Yılmaz
>>>>> <ahmetyilmazefe...@yahoo.com> wrote:
>>>>>> Hi,
>>>>>> I would like to use Mahout to make recommendations on my web site. Since
>>>>>> the data is going to be big, hopefully, I plan to use Hadoop
>>>>>> implementations of the recommender algorithms.
>>>>>>
>>>>>> I'm currently storing the data in MySQL. Should I continue with it or
>>>>>> should I switch to a NoSQL database such as MongoDB or something else?
>>>>>>
>>>>>> Thanks
>>>>>> Ahmet
>>
>> --
>> Manuel Blechschmidt
>> M.Sc. IT Systems Engineering
>> Dortustr. 57
>> 14467 Potsdam
>> Mobile: 0173/6322621
>> Twitter: http://twitter.com/Manuel_B
>>
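
As a rough check of the figures quoted above (500 µs per TCP round trip, 0.1 µs per main memory reference, on the order of 1000 lookups per recommendation request), the arithmetic can be sketched as follows. The class and method names are hypothetical, and the model deliberately ignores query execution time, batching, and connection overhead.

```java
// Hypothetical back-of-the-envelope helper; the constants are the figures quoted in the thread.
public class LatencyEstimate {
    static final double TCP_ROUND_TRIP_MICROS = 500.0;
    static final double MEMORY_REFERENCE_MICROS = 0.1;

    // Total time in milliseconds for `lookups` sequential lookups.
    static double totalMillis(int lookups, double microsPerLookup) {
        return lookups * microsPerLookup / 1000.0;
    }

    // How many times slower a TCP round trip is than a memory reference.
    static double slowdownFactor() {
        return TCP_ROUND_TRIP_MICROS / MEMORY_REFERENCE_MICROS;
    }

    public static void main(String[] args) {
        int lookups = 1000; // "more than 1000 queries" per request, per the thread
        System.out.printf("database over TCP: %.1f ms%n", totalMillis(lookups, TCP_ROUND_TRIP_MICROS));
        System.out.printf("in-process memory: %.1f ms%n", totalMillis(lookups, MEMORY_REFERENCE_MICROS));
        System.out.printf("slowdown factor:   %.0f%n", slowdownFactor());
    }
}
```

Under these assumptions, 1000 sequential database round trips cost about half a second per request, while the same number of in-memory lookups costs about a tenth of a millisecond.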