Yes, that's the only point of direct JDBC integration in the project. For the Hadoop-based bits, and that's most of Mahout, the question is really: does Hadoop integrate with JDBC? The code is essentially a bunch of Hadoop jobs and is not tied directly to any data store.
A relational database is not a common data source for Hadoop. Not that it couldn't be; it's just that Hadoop operates by sequentially scanning very large volumes (up to petabytes) of potentially unstructured data, and a relational database would be expensive overkill for just storing huge blobs.

Still, I would not be surprised if you could find an InputFormat implementation for Hadoop that reads from JDBC. Breaking news: I found DBInputFormat in Hadoop!
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/db/DBInputFormat.html

So, the high-level answer is that you would tweak the Mahout implementation to use DBInputFormat if you wanted to read data out of a database.

On Thu, Nov 24, 2011 at 3:02 PM, Sturm, Martin <martin.st...@uc4.com> wrote:
> The only JDBC related classes in Mahout are in the
> org.apache.mahout.cf.taste.* package to load a JDBCDataModel used for
> mining preferences.
> But I need Dataset and Instance which are used for building a decision
> tree (in org.apache.mahout.classifier.df.data). Can I somehow get these
> directly from the DB also with a standalone application without a Hadoop?
>
> -----Original Message-----
> From: Shern Shiou Tan [mailto:shernshiou....@mnc.com.my]
> Sent: Thursday, 24 November 2011 15:53
> To: user@mahout.apache.org
> Subject: Re: Load Dataset and Instances from database
>
> Yes, if not mistaken mahout support JDBC based database using JDBC driver.
> ShernShiou
> On 11/24/2011 10:34 PM, Sturm, Martin wrote:
> > Hello,
> > I am relatively new to Mahout, so it is possible that this is a quite
> > obvious question: Has Mahout any database access support besides the in the
> > recommender-related package?
> > I want to use a decision forest classification and need the data from an
> > Oracle or MS SQL database. Is there any way to access them so that I
> > automatically get the Dataset and a list of Instance?
> >
> > Thanks in advance,
> > Martin
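
For the standalone case in the quoted question (no Hadoop), one option is to pull the rows over plain JDBC and turn each one into a comma-separated text line, which is the kind of input the decision forest data classes are usually fed. Below is a minimal sketch of that idea, not Mahout code: the class name JdbcRows and both method names are hypothetical, the "?" marker for NULL values is an assumption about how missing values are written, and values that themselves contain commas are not handled.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Mahout): turns JDBC rows into comma-separated text lines. */
final class JdbcRows {

    private JdbcRows() {}

    /**
     * Joins the values of one row with commas.
     * NULL becomes "?" (assumed missing-value marker); values are trimmed.
     * Values containing commas are NOT escaped in this sketch.
     */
    static String toCsvLine(List<?> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            Object v = values.get(i);
            sb.append(v == null ? "?" : v.toString().trim());
        }
        return sb.toString();
    }

    /** Runs the query and returns one comma-separated line per row, columns in SELECT order. */
    static String[] readRows(Connection conn, String query) throws SQLException {
        List<String> lines = new ArrayList<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(query)) {
            int columns = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                List<Object> row = new ArrayList<>(columns);
                for (int c = 1; c <= columns; c++) {
                    row.add(rs.getObject(c)); // JDBC columns are 1-based
                }
                lines.add(toCsvLine(row));
            }
        }
        return lines.toArray(new String[0]);
    }
}
```

You would open the Connection with the Oracle or MS SQL JDBC driver as usual and pass a SELECT that lists the attributes and the label in a fixed order. The resulting String[] can then be handed to whatever loader accepts text rows. I believe the loader in org.apache.mahout.classifier.df.data works from such rows, but check the API of the Mahout version you are on before relying on that.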