I was afraid of that answer, but thanks anyway. Since I only want to try it out "standalone", I was hoping this would be possible without any Hadoop stuff. Are there any tutorials or examples available that show how to load a Dataset? I do not even know what files are expected here. CSV?
-----Original Message-----
From: Sean Owen [mailto:[email protected]]
Sent: Thursday, 24 November 2011 18:30
To: [email protected]
Subject: Re: Load Dataset and Instances from database

Yes, that's the only point of direct JDBC integration in the project.

For the Hadoop-based bits, and that's most of Mahout, the question is really: does Hadoop integrate with JDBC? The code is really a bunch of Hadoop jobs and is not tied directly to a data store.

A relational database is not a common data source for Hadoop. Not that it couldn't be; it's just that Hadoop operates by sequentially accessing petabytes of potentially unstructured data. A relational database would be expensive overkill for just storing huge blobs.

I would not be surprised if you can find an InputFormat implementation for Hadoop that reads from JDBC. Breaking news: I found DBInputFormat in Hadoop!

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/db/DBInputFormat.html

So, the high-level answer is that you would tweak the Mahout implementation to use DBInputFormat if you wanted to read stuff out of a database.

On Thu, Nov 24, 2011 at 3:02 PM, Sturm, Martin <[email protected]> wrote:
> The only JDBC-related classes in Mahout are in the
> org.apache.mahout.cf.taste.* package to load a JDBCDataModel used for
> mining preferences.
> But I need Dataset and Instance, which are used for building a decision
> tree (in org.apache.mahout.classifier.df.data). Can I somehow get
> these directly from the DB, also with a standalone application without
> Hadoop?
>
> -----Original Message-----
> From: Shern Shiou Tan [mailto:[email protected]]
> Sent: Thursday, 24 November 2011 15:53
> To: [email protected]
> Subject: Re: Load Dataset and Instances from database
>
> Yes, if I'm not mistaken, Mahout supports JDBC-based databases using a JDBC driver.
> ShernShiou
>
> On 11/24/2011 10:34 PM, Sturm, Martin wrote:
> > Hello,
> >
> > I am relatively new to Mahout, so it is possible that this is a quite
> > obvious question: Does Mahout have any database access support besides
> > that in the recommender-related package?
> >
> > I want to use a decision forest classification and need the data from an
> > Oracle or MS SQL database. Is there any way to access them so that I
> > automatically get the Dataset and a list of Instance?
> >
> > Thanks in advance,
> > Martin
> >
> > UC4 Senactive Software GmbH, Hauptstrasse 3C, 3012 Wolfsgraben mit
> > einer weiteren Betriebsstaette in /with an office at
> > Prinz-Eugen-Straße 72, 1040 Wien Firmenbuchnummer/Commercial Register
> > No. 261186y Firmenbuchgericht/Commercial Register Court: Landesgericht
> > St. Poelten This email (including any attachments) may contain
> > information which is privileged, confidential, or protected. If you
> > are not the intended recipient, note that any disclosure, copying,
> > distribution, or use of the contents of this message and attached
> > files is prohibited. If you have received this email in error, please
> > notify the sender and delete this email and any attached files.
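
A rough sketch of one standalone route for the question in this thread, with no Hadoop job involved: read the rows over plain JDBC and turn each row into one comma-separated line. The class and method names below (JdbcRowsToCsv, toCsvLine, readAll) are made up for illustration and are not part of Mahout. It is an assumption, to be checked against your Mahout version, that the DataLoader class in org.apache.mahout.classifier.df.data accepts such lines as a String[] and that "?" marks a missing value.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mahout: flattens JDBC rows into
// comma-separated text lines, one line per row.
final class JdbcRowsToCsv {

    // Joins one row into a single comma-separated line.
    // Assumption: "?" is the missing-value token the consumer expects.
    // Values that themselves contain commas are NOT escaped here.
    static String toCsvLine(Object[] row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(row[i] == null ? "?" : row[i].toString().trim());
        }
        return sb.toString();
    }

    // Reads every remaining row of an open ResultSet into CSV lines.
    // The caller owns the Connection, Statement, and ResultSet.
    static String[] readAll(ResultSet rs) throws SQLException {
        int cols = rs.getMetaData().getColumnCount();
        List<String> lines = new ArrayList<>();
        while (rs.next()) {
            Object[] row = new Object[cols];
            for (int c = 0; c < cols; c++) {
                row[c] = rs.getObject(c + 1); // JDBC columns are 1-based
            }
            lines.add(toCsvLine(row));
        }
        return lines.toArray(new String[0]);
    }
}
```

With a ResultSet from any JDBC driver (Oracle, MS SQL, ...), readAll(rs) returns the lines in memory. This only suits data that fits in memory; for larger tables the DBInputFormat route mentioned above is the more natural fit.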
