I was afraid of that answer, but thanks anyway.
Since I only want to try it out standalone, I was hoping that this would be
possible without any Hadoop setup. Are there any tutorials or examples
available that show how to load a Dataset? I do not even know what kind of
files are expected here. CSV?

-----Original Message-----
From: Sean Owen [mailto:[email protected]]
Sent: Thursday, 24 November 2011 18:30
To: [email protected]
Subject: Re: Load Dataset and Instances from database

Yes, that's the only point of direct JDBC integration in the project.

For the Hadoop-based bits, and that's most of Mahout, the question is really
whether Hadoop integrates with JDBC, since the code is really a bunch of Hadoop
jobs and is not tied directly to a data store.

A relational database is not a common data source for Hadoop. Not that it 
couldn't be, it's just that Hadoop operates by sequentially accessing petabytes 
of potentially unstructured data. A relational database would be expensive 
overkill for just storing huge blobs.

I would not be surprised if you can find an InputFormat implementation for 
Hadoop that reads from JDBC.

Breaking news: I found DBInputFormat in Hadoop!
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/db/DBInputFormat.html

So, the high-level answer is that you would tweak the Mahout implementation to 
use DBInputFormat if you wanted to read stuff out of a database.
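
As a rough sketch for the standalone case: whichever route is used, each
database row has to end up as one line of delimited text. The helper below is
purely illustrative and is not Mahout or Hadoop code. The class name RowsToCsv,
the comma separator, and the "?" placeholder for NULL values are all
assumptions made for this example; in practice the Object[] rows would be
filled from java.sql.ResultSet.getObject(...) calls.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Mahout or Hadoop. */
final class RowsToCsv {

    /**
     * Joins the column values of one row with commas.
     * A null value is written as "?" (a placeholder chosen for this sketch).
     * Values that themselves contain commas are not escaped here.
     */
    static String toCsvLine(Object[] row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            Object value = row[i];
            sb.append(value == null ? "?" : value.toString().trim());
        }
        return sb.toString();
    }

    /** Converts all rows, keeping their order, into an array of text lines. */
    static String[] toCsvLines(List<Object[]> rows) {
        String[] lines = new String[rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            lines[i] = toCsvLine(rows.get(i));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for rows read from a JDBC ResultSet.
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {5.1, 3.5, "setosa"});
        rows.add(new Object[] {6.2, null, "virginica"});
        for (String line : toCsvLines(rows)) {
            System.out.println(line);
        }
    }
}
```

The resulting lines can be written to a file or kept in memory, so nothing in
this step depends on Hadoop being installed.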



On Thu, Nov 24, 2011 at 3:02 PM, Sturm, Martin <[email protected]> wrote:

> The only JDBC related classes in Mahout are in the
> org.apache.mahout.cf.taste.* package to load a JDBCDataModel used for
> mining preferences.
> But I need Dataset and Instance, which are used for building a decision
> tree (in org.apache.mahout.classifier.df.data). Can I somehow get
> these directly from the DB as well, with a standalone application and
> without Hadoop?
>
> -----Original Message-----
> From: Shern Shiou Tan [mailto:[email protected]]
> Sent: Thursday, 24 November 2011 15:53
> To: [email protected]
> Subject: Re: Load Dataset and Instances from database
>
> Yes, if I am not mistaken, Mahout supports JDBC-based databases using a
> JDBC driver.
> ShernShiou
> On 11/24/2011 10:34 PM, Sturm, Martin wrote:
> > Hello,
> > I am relatively new to Mahout, so it is possible that this is a quite
> > obvious question: Does Mahout have any database access support besides
> > that in the recommender-related package?
> > I want to use a decision forest classification and need the data from an
> > Oracle or MS SQL database. Is there any way to access them so that I
> > automatically get the Dataset and a list of Instance?
> >
> > Thanks in advance,
> > Martin
> >
>
>
> UC4 Senactive Software GmbH, Hauptstrasse 3C, 3012 Wolfsgraben mit
> einer weiteren Betriebsstaette in /with an office at
> Prinz-Eugen-Straße 72, 1040 Wien Firmenbuchnummer/Commercial Register
> No. 261186y Firmenbuchgericht/Commercial Register Court: Landesgericht
> St. Poelten This email (including any attachments) may contain
> information which is privileged, confidential, or protected. If you
> are not the intended recipient, note that any disclosure, copying,
> distribution, or use of the contents of this message and attached
> files is prohibited. If you have received this email in error, please
> notify the sender and delete this email and any attached files.
>
