Those were all from 'mapreduce', not 'mapred' packages.
This seems like it's an issue with DBOutputFormat...

org.apache.hadoop.mapreduce.lib.db.DBOutputFormat.getRecordWriter(DBOutputFormat.java:180)

On 9/19/11 1:41 AM, "Steinmaurer Thomas" <[email protected]> wrote:

>Hi Doug,
>
>looked at your example and this looks pretty much like what we have done
>in our proof-of-concept implementation writing back to another HBase
>table by using a TableReducer. This works fine. We want to change that
>so that the final result is written to Oracle.
>
>When doing that, we end up with the following exception in the reduce
>step (see also my post "MR-Job: Exception in DBOutputFormat"):
>
>java.io.IOException
>    at org.apache.hadoop.mapreduce.lib.db.DBOutputFormat.getRecordWriter(DBOutputFormat.java:180)
>    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:559)
>    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
>    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:396)
>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>    at org.apache.hadoop.mapred.Child.main(Child.java:264)
>
>Your examples are very welcome, because they are based on the mapreduce
>package, right? Pretty much all examples out there are based on mapred,
>which is AFAIK the "old" way to write MR jobs.
>
>Regards,
>Thomas
>
>-----Original Message-----
>From: Doug Meil [mailto:[email protected]]
>Sent: Freitag, 16. September 2011 21:42
>To: [email protected]
>Subject: Re: Writing MR-Job: Something like OracleReducer, JDBCReducer
>...
>
>Chris, agreed... There are situations where reducers aren't required,
>and situations where they are useful. We have both kinds of jobs.
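For anyone hitting the same stack trace: in the Hadoop source of that era, DBOutputFormat.getRecordWriter wraps any exception from creating the JDBC connection in a bare IOException (often with no message, as in the trace above), so the usual suspects are a missing Oracle driver jar on the task classpath or a job that never called DBConfiguration.configureDB. A minimal driver-side sketch — the URL, credentials, table, and column names below are hypothetical placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;

public class OracleSinkDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Connection info must be set before the Job copies the Configuration.
    DBConfiguration.configureDB(conf,
        "oracle.jdbc.driver.OracleDriver",     // driver jar must be on the task classpath
        "jdbc:oracle:thin:@dbhost:1521:orcl",  // hypothetical connection string
        "dbuser", "dbpass");
    Job job = new Job(conf, "hbase-to-oracle");
    // Target table and its columns; this also sets DBOutputFormat as the
    // output format and fixes the number of '?' placeholders in the
    // INSERT statement it generates.
    DBOutputFormat.setOutput(job, "AGG_RESULTS", "ROW_KEY", "TOTAL");
    // ... set mapper/reducer and the HBase input side as usual, then submit ...
  }
}
```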
>
>For others following the thread, I updated the book recently with more
>MR examples (read-only, read-write, read-summary):
>
>http://hbase.apache.org/book.html#mapreduce.example
>
>As to the question that started this thread...
>
>re: "Store aggregated data in Oracle."
>
>To me, that sounds like the "read-summary" example with JDBC-Oracle in
>the reduce step.
>
>On 9/16/11 2:58 PM, "Chris Tarnas" <[email protected]> wrote:
>
>>If only I could make NY in Nov :)
>>
>>We extract large numbers of DNA sequence reads from HBase, run them
>>through M/R pipelines to analyze and aggregate, and then we load the
>>results back in. Definitely specialized usage, but I could see other
>>perfectly valid uses for reducers with HBase.
>>
>>-chris
>>
>>On Sep 16, 2011, at 11:43 AM, Michael Segel wrote:
>>
>>> Sonal,
>>>
>>> You do realize that HBase is a "database", right? ;-)
>>>
>>> So again, why do you need a reducer? ;-)
>>>
>>> Using your example...
>>> "Again, there will be many cases where one may want a reducer, say
>>> trying to count the occurrence of words in a particular column."
>>>
>>> You can do this one of two ways:
>>> 1) Dynamic counters in Hadoop.
>>> 2) Use a temp table and auto-increment the value in a column which
>>> contains the word count. (Fat row where the rowkey is doc_id and the
>>> column is the word, or the rowkey is doc_id|word.)
>>>
>>> I'm sorry, but if you go through all of your examples of why you would
>>> want to use a reducer, you end up finding that writing to an HBase
>>> table would be faster than a reduce job.
>>> (Again, we haven't done an exhaustive search, but in all of the HBase
>>> jobs we've run... no reducers were necessary.)
>>>
>>> The point I'm trying to make is that you want to avoid using a
>>> reducer whenever possible, and if you think about your problem... you
>>> can probably come up with a solution that avoids the reducer.
>>>
>>> HTH
>>>
>>> -Mike
>>> PS.
>>> I haven't looked at *all* of the potential use cases of HBase,
>>> which is why I don't want to say you'll never need a reducer. I will
>>> say that, based on what we've done at my client's site, we try very
>>> hard to avoid reducers.
>>> [Note: I'm sure I'm going to get hammered on this when I head to NY
>>> in Nov. :-) ]
>>>
>>>> Date: Fri, 16 Sep 2011 23:00:49 +0530
>>>> Subject: Re: Writing MR-Job: Something like OracleReducer,
>>>> JDBCReducer ...
>>>> From: [email protected]
>>>> To: [email protected]
>>>>
>>>> Hi Michael,
>>>>
>>>> Yes, thanks, I understand the fact that reducers can be expensive
>>>> with all the shuffling and the sorting, and you may not need them
>>>> always. At the same time, there are many cases where reducers are
>>>> useful, like secondary sorting. In many cases, one can have multiple
>>>> map phases and not have a reduce phase at all. Again, there will be
>>>> many cases where one may want a reducer, say trying to count the
>>>> occurrence of words in a particular column.
>>>>
>>>> With this thought chain, I do not feel ready to say that when
>>>> dealing with HBase, I really don't want to use a reducer. Please
>>>> correct me if I am wrong.
>>>>
>>>> Thanks again.
>>>>
>>>> Best Regards,
>>>> Sonal
>>>> Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
>>>> Nube Technologies <http://www.nubetech.co>
>>>>
>>>> <http://in.linkedin.com/in/sonalgoyal>
>>>>
>>>> On Fri, Sep 16, 2011 at 10:35 PM, Michael Segel
>>>> <[email protected]> wrote:
>>>>
>>>>> Sonal,
>>>>>
>>>>> Just because you have a m/r job doesn't mean that you need to
>>>>> reduce anything. You can have a job that contains only a mapper,
>>>>> or your job runner can have a series of map jobs in serial.
>>>>>
>>>>> Most, if not all, of the map/reduce jobs where we pull data from
>>>>> HBase don't require a reducer.
>>>>>
>>>>> To give you a simple example...
>>>>> if I want to determine the table schema where I am storing some
>>>>> sort of structured data, I just write a m/r job which opens a
>>>>> table and scans it, counting the occurrence of each column name
>>>>> via dynamic counters.
>>>>>
>>>>> There is no need for a reducer.
>>>>>
>>>>> Does that help?
>>>>>
>>>>>> Date: Fri, 16 Sep 2011 21:41:01 +0530
>>>>>> Subject: Re: Writing MR-Job: Something like OracleReducer,
>>>>>> JDBCReducer ...
>>>>>> From: [email protected]
>>>>>> To: [email protected]
>>>>>>
>>>>>> Michel,
>>>>>>
>>>>>> Sorry, can you please help me understand what you mean when you
>>>>>> say that when dealing with HBase, you really don't want to use a
>>>>>> reducer? Here, HBase is being used as the input to the MR job.
>>>>>>
>>>>>> Thanks
>>>>>> Sonal
>>>>>>
>>>>>> On Fri, Sep 16, 2011 at 2:35 PM, Michel Segel
>>>>>> <[email protected] wrote:
>>>>>>
>>>>>>> I think you need to get a little bit more information.
>>>>>>> Reducers are expensive.
>>>>>>> When Thomas says that he is aggregating data, what exactly does
>>>>>>> he mean?
>>>>>>> When dealing w/ HBase, you really don't want to use a reducer.
>>>>>>>
>>>>>>> You may want to run two map jobs, and it could be that just
>>>>>>> dumping the output via jdbc makes the most sense.
>>>>>>>
>>>>>>> We are starting to see a lot of questions where the OP isn't
>>>>>>> providing enough information, so the recommendation could be
>>>>>>> wrong...
>>>>>>>
>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>
>>>>>>> Mike Segel
>>>>>>>
>>>>>>> On Sep 16, 2011, at 2:22 AM, Sonal Goyal <[email protected]> wrote:
>>>>>>>
>>>>>>>> There is a DBOutputFormat class in the
>>>>>>>> org.apache.hadoop.mapreduce.lib.db
>>>>>>>> package, you could use that. Or you could write to the hdfs and
>>>>>>>> then use something like HIHO[1] to export to the db.
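If you do go the DBOutputFormat route, the reducer's output key type must implement DBWritable, whose write(PreparedStatement) fills the '?' placeholders of the generated INSERT in the same order as the field names passed to DBOutputFormat.setOutput. A rough sketch, with a hypothetical AGG_RESULTS(ROW_KEY, TOTAL) table:

```java
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

// Hypothetical record matching an AGG_RESULTS(ROW_KEY, TOTAL) table.
public class AggRecord implements DBWritable {
  private String rowKey;
  private long total;

  public AggRecord() { }  // no-arg constructor for the framework
  public AggRecord(String rowKey, long total) {
    this.rowKey = rowKey;
    this.total = total;
  }

  // Fills the INSERT's '?' placeholders, in setOutput(...) field order.
  public void write(PreparedStatement stmt) throws SQLException {
    stmt.setString(1, rowKey);
    stmt.setLong(2, total);
  }

  // Only needed when reading from the database; unused for a pure sink.
  public void readFields(ResultSet rs) throws SQLException { }
}
```

The reducer would then emit something like `context.write(new AggRecord(rowKey, sum), NullWritable.get())`.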
>>>>>>>> I have been working extensively in this area; you can write to
>>>>>>>> me directly if you need any help.
>>>>>>>>
>>>>>>>> 1. https://github.com/sonalgoyal/hiho
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Sonal
>>>>>>>> Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
>>>>>>>> Nube Technologies <http://www.nubetech.co>
>>>>>>>>
>>>>>>>> <http://in.linkedin.com/in/sonalgoyal>
>>>>>>>>
>>>>>>>> On Fri, Sep 16, 2011 at 10:55 AM, Steinmaurer Thomas <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> we are writing a MR job to process HBase data and store the
>>>>>>>>> aggregated data in Oracle. How would you do that in a MR job?
>>>>>>>>>
>>>>>>>>> Currently, for test purposes, we write the result into an HBase
>>>>>>>>> table again by using a TableReducer. Is there something like an
>>>>>>>>> OracleReducer, RelationalReducer, JDBCReducer or whatever? Or
>>>>>>>>> should one simply use plain JDBC code in the reduce step?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> Thomas
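To answer the original question directly: neither Hadoop nor HBase ships an OracleReducer or JDBCReducer, so the realistic options are DBOutputFormat or plain JDBC in the reduce step. A hedged sketch of the latter — connection string, table, and key/value types are hypothetical, and error handling is kept minimal — opening the connection once in setup(), batching inserts, and committing in cleanup():

```java
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class OracleWriteReducer
    extends Reducer<Text, LongWritable, NullWritable, NullWritable> {

  private Connection conn;
  private PreparedStatement stmt;

  @Override
  protected void setup(Context context) throws IOException {
    try {
      Class.forName("oracle.jdbc.driver.OracleDriver");
      conn = DriverManager.getConnection(
          "jdbc:oracle:thin:@dbhost:1521:orcl", "dbuser", "dbpass");
      conn.setAutoCommit(false);  // commit once, in cleanup()
      stmt = conn.prepareStatement(
          "INSERT INTO AGG_RESULTS (ROW_KEY, TOTAL) VALUES (?, ?)");
    } catch (Exception e) {
      throw new IOException("could not open JDBC connection", e);
    }
  }

  @Override
  protected void reduce(Text key, Iterable<LongWritable> values,
      Context context) throws IOException {
    long sum = 0;
    for (LongWritable v : values) {
      sum += v.get();
    }
    try {
      stmt.setString(1, key.toString());
      stmt.setLong(2, sum);
      stmt.addBatch();  // buffer rows instead of one round trip per key
    } catch (SQLException e) {
      throw new IOException(e);
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    try {
      stmt.executeBatch();
      conn.commit();
      conn.close();
    } catch (SQLException e) {
      throw new IOException(e);
    }
  }
}
```

One design caveat: a failed-and-retried reduce task will re-run its inserts, so the target table or the statement should tolerate duplicates (e.g. MERGE instead of INSERT).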
