Those were all from 'mapreduce', not 'mapred' packages.
This seems like it's an issue with DBOutputFormat...

org.apache.hadoop.mapreduce.lib.db.DBOutputFormat.getRecordWriter(DBOutputFormat.java:180)

On 9/19/11 1:41 AM, "Steinmaurer Thomas" <[email protected]> wrote:

>Hi Doug,
>
>looked at your example and this looks pretty much like what we have done
>in our proof-of-concept implementation writing back to another HBase
>table by using a TableReducer. This works fine. We want to change that
>so that the final result is written to Oracle.
>
>When doing that, we end up with the following exception in the reduce
>step (see also my post "MR-Job: Exception in DBOutputFormat"):
>
>java.io.IOException
>    at org.apache.hadoop.mapreduce.lib.db.DBOutputFormat.getRecordWriter(DBOutputFormat.java:180)
>    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:559)
>    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
>    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:396)
>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>    at org.apache.hadoop.mapred.Child.main(Child.java:264)
>
>Your examples are very welcome, because they are based on the mapreduce
>package, right? Pretty much all examples out there are based on mapred,
>which is AFAIK the "old" way to write MR jobs.
>
>Regards,
>Thomas
>
>-----Original Message-----
>From: Doug Meil [mailto:[email protected]]
>Sent: Freitag, 16. September 2011 21:42
>To: [email protected]
>Subject: Re: Writing MR-Job: Something like OracleReducer, JDBCReducer
>...
>
>Chris, agreed... There are situations where reducers aren't required,
>and situations where they are useful. We have both kinds of jobs.
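For anyone hitting the same stack trace: in the Hadoop source of that era, DBOutputFormat.getRecordWriter wraps any exception from creating the JDBC connection in a bare IOException (often with no message, as in the trace above), so the usual suspects are a missing Oracle driver jar on the task classpath or a job that never called DBConfiguration.configureDB. A minimal driver-side sketch — the URL, credentials, table, and column names below are hypothetical placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;

public class OracleSinkDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Connection info must be set before the Job copies the Configuration.
    DBConfiguration.configureDB(conf,
        "oracle.jdbc.driver.OracleDriver",     // driver jar must be on the task classpath
        "jdbc:oracle:thin:@dbhost:1521:orcl",  // hypothetical connection string
        "dbuser", "dbpass");
    Job job = new Job(conf, "hbase-to-oracle");
    // Target table and its columns; this also sets DBOutputFormat as the
    // output format and fixes the number of '?' placeholders in the
    // INSERT statement it generates.
    DBOutputFormat.setOutput(job, "AGG_RESULTS", "ROW_KEY", "TOTAL");
    // ... set mapper/reducer and the HBase input side as usual, then submit ...
  }
}
```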
>
>For others following the thread, I updated the book recently with more
>MR examples (read-only, read-write, read-summary):
>
>http://hbase.apache.org/book.html#mapreduce.example
>
>As to the question that started this thread...
>
>re: "Store aggregated data in Oracle."
>
>To me, that sounds like the "read-summary" example with JDBC-Oracle in
>the reduce step.
>
>On 9/16/11 2:58 PM, "Chris Tarnas" <[email protected]> wrote:
>
>>If only I could make NY in Nov :)
>>
>>We extract large numbers of DNA sequence reads from HBase, run them
>>through M/R pipelines to analyze and aggregate, and then we load the
>>results back in. Definitely specialized usage, but I could see other
>>perfectly valid uses for reducers with HBase.
>>
>>-chris
>>
>>On Sep 16, 2011, at 11:43 AM, Michael Segel wrote:
>>
>>> Sonal,
>>>
>>> You do realize that HBase is a "database", right? ;-)
>>>
>>> So again, why do you need a reducer? ;-)
>>>
>>> Using your example...
>>> "Again, there will be many cases where one may want a reducer, say
>>> trying to count the occurrence of words in a particular column."
>>>
>>> You can do this one of two ways:
>>> 1) Dynamic counters in Hadoop.
>>> 2) Use a temp table and auto-increment the value in a column which
>>> contains the word count. (Fat row where the rowkey is doc_id and the
>>> column is the word, or the rowkey is doc_id|word.)
>>>
>>> I'm sorry, but if you go through all of your examples of why you would
>>> want to use a reducer, you end up finding that writing to an HBase
>>> table would be faster than a reduce job.
>>> (Again, we haven't done an exhaustive search, but in all of the HBase
>>> jobs we've run... no reducers were necessary.)
>>>
>>> The point I'm trying to make is that you want to avoid using a
>>> reducer whenever possible, and if you think about your problem... you
>>> can probably come up with a solution that avoids the reducer.
>>>
>>> HTH
>>>
>>> -Mike
>>> PS.
>>> I haven't looked at *all* of the potential use cases of HBase,
>>> which is why I don't want to say you'll never need a reducer. I will
>>> say that, based on what we've done at my client's site, we try very
>>> hard to avoid reducers.
>>> [Note: I'm sure I'm going to get hammered on this when I head to NY
>>> in Nov. :-) ]
>>>
>>>> Date: Fri, 16 Sep 2011 23:00:49 +0530
>>>> Subject: Re: Writing MR-Job: Something like OracleReducer,
>>>> JDBCReducer ...
>>>> From: [email protected]
>>>> To: [email protected]
>>>>
>>>> Hi Michael,
>>>>
>>>> Yes, thanks, I understand the fact that reducers can be expensive
>>>> with all the shuffling and the sorting, and you may not need them
>>>> always. At the same time, there are many cases where reducers are
>>>> useful, like secondary sorting. In many cases, one can have multiple
>>>> map phases and not have a reduce phase at all. Again, there will be
>>>> many cases where one may want a reducer, say trying to count the
>>>> occurrence of words in a particular column.
>>>>
>>>> With this thought chain, I do not feel ready to say that when
>>>> dealing with HBase, I really don't want to use a reducer. Please
>>>> correct me if I am wrong.
>>>>
>>>> Thanks again.
>>>>
>>>> Best Regards,
>>>> Sonal
>>>> Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
>>>> Nube Technologies <http://www.nubetech.co>
>>>>
>>>> <http://in.linkedin.com/in/sonalgoyal>
>>>>
>>>> On Fri, Sep 16, 2011 at 10:35 PM, Michael Segel
>>>> <[email protected]> wrote:
>>>>
>>>>> Sonal,
>>>>>
>>>>> Just because you have a m/r job doesn't mean that you need to
>>>>> reduce anything. You can have a job that contains only a mapper,
>>>>> or your job runner can have a series of map jobs in serial.
>>>>>
>>>>> Most, if not all, of the map/reduce jobs where we pull data from
>>>>> HBase don't require a reducer.
>>>>>
>>>>> To give you a simple example...
>>>>> if I want to determine the table schema where I am storing some
>>>>> sort of structured data, I just write a m/r job which opens a
>>>>> table and scans it, counting the occurrence of each column name
>>>>> via dynamic counters.
>>>>>
>>>>> There is no need for a reducer.
>>>>>
>>>>> Does that help?
>>>>>
>>>>>> Date: Fri, 16 Sep 2011 21:41:01 +0530
>>>>>> Subject: Re: Writing MR-Job: Something like OracleReducer,
>>>>>> JDBCReducer ...
>>>>>> From: [email protected]
>>>>>> To: [email protected]
>>>>>>
>>>>>> Michel,
>>>>>>
>>>>>> Sorry, can you please help me understand what you mean when you
>>>>>> say that when dealing with HBase, you really don't want to use a
>>>>>> reducer? Here, HBase is being used as the input to the MR job.
>>>>>>
>>>>>> Thanks
>>>>>> Sonal
>>>>>>
>>>>>> On Fri, Sep 16, 2011 at 2:35 PM, Michel Segel
>>>>>> <[email protected] wrote:
>>>>>>
>>>>>>> I think you need to get a little bit more information.
>>>>>>> Reducers are expensive.
>>>>>>> When Thomas says that he is aggregating data, what exactly does
>>>>>>> he mean?
>>>>>>> When dealing w/ HBase, you really don't want to use a reducer.
>>>>>>>
>>>>>>> You may want to run two map jobs, and it could be that just
>>>>>>> dumping the output via jdbc makes the most sense.
>>>>>>>
>>>>>>> We are starting to see a lot of questions where the OP isn't
>>>>>>> providing enough information, so the recommendation could be
>>>>>>> wrong...
>>>>>>>
>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>
>>>>>>> Mike Segel
>>>>>>>
>>>>>>> On Sep 16, 2011, at 2:22 AM, Sonal Goyal <[email protected]> wrote:
>>>>>>>
>>>>>>>> There is a DBOutputFormat class in the
>>>>>>>> org.apache.hadoop.mapreduce.lib.db
>>>>>>>> package, you could use that. Or you could write to the hdfs and
>>>>>>>> then use something like HIHO[1] to export to the db.
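If you do go the DBOutputFormat route, the reducer's output key type must implement DBWritable, whose write(PreparedStatement) fills the '?' placeholders of the generated INSERT in the same order as the field names passed to DBOutputFormat.setOutput. A rough sketch, with a hypothetical AGG_RESULTS(ROW_KEY, TOTAL) table:

```java
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

// Hypothetical record matching an AGG_RESULTS(ROW_KEY, TOTAL) table.
public class AggRecord implements DBWritable {
  private String rowKey;
  private long total;

  public AggRecord() { }  // no-arg constructor for the framework
  public AggRecord(String rowKey, long total) {
    this.rowKey = rowKey;
    this.total = total;
  }

  // Fills the INSERT's '?' placeholders, in setOutput(...) field order.
  public void write(PreparedStatement stmt) throws SQLException {
    stmt.setString(1, rowKey);
    stmt.setLong(2, total);
  }

  // Only needed when reading from the database; unused for a pure sink.
  public void readFields(ResultSet rs) throws SQLException { }
}
```

The reducer would then emit something like `context.write(new AggRecord(rowKey, sum), NullWritable.get())`.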
>>>>>>>> I have been working extensively in this area; you can write to
>>>>>>>> me directly if you need any help.
>>>>>>>>
>>>>>>>> 1. https://github.com/sonalgoyal/hiho
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Sonal
>>>>>>>> Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
>>>>>>>> Nube Technologies <http://www.nubetech.co>
>>>>>>>>
>>>>>>>> <http://in.linkedin.com/in/sonalgoyal>
>>>>>>>>
>>>>>>>> On Fri, Sep 16, 2011 at 10:55 AM, Steinmaurer Thomas <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> we are writing a MR job to process HBase data and store the
>>>>>>>>> aggregated data in Oracle. How would you do that in a MR job?
>>>>>>>>>
>>>>>>>>> Currently, for test purposes, we write the result into an HBase
>>>>>>>>> table again by using a TableReducer. Is there something like an
>>>>>>>>> OracleReducer, RelationalReducer, JDBCReducer or whatever? Or
>>>>>>>>> should one simply use plain JDBC code in the reduce step?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> Thomas
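To answer the original question directly: neither Hadoop nor HBase ships an OracleReducer or JDBCReducer, so the realistic options are DBOutputFormat or plain JDBC in the reduce step. A hedged sketch of the latter — connection string, table, and key/value types are hypothetical, and error handling is kept minimal — opening the connection once in setup(), batching inserts, and committing in cleanup():

```java
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class OracleWriteReducer
    extends Reducer<Text, LongWritable, NullWritable, NullWritable> {

  private Connection conn;
  private PreparedStatement stmt;

  @Override
  protected void setup(Context context) throws IOException {
    try {
      Class.forName("oracle.jdbc.driver.OracleDriver");
      conn = DriverManager.getConnection(
          "jdbc:oracle:thin:@dbhost:1521:orcl", "dbuser", "dbpass");
      conn.setAutoCommit(false);  // commit once, in cleanup()
      stmt = conn.prepareStatement(
          "INSERT INTO AGG_RESULTS (ROW_KEY, TOTAL) VALUES (?, ?)");
    } catch (Exception e) {
      throw new IOException("could not open JDBC connection", e);
    }
  }

  @Override
  protected void reduce(Text key, Iterable<LongWritable> values,
      Context context) throws IOException {
    long sum = 0;
    for (LongWritable v : values) {
      sum += v.get();
    }
    try {
      stmt.setString(1, key.toString());
      stmt.setLong(2, sum);
      stmt.addBatch();  // buffer rows instead of one round trip per key
    } catch (SQLException e) {
      throw new IOException(e);
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    try {
      stmt.executeBatch();
      conn.commit();
      conn.close();
    } catch (SQLException e) {
      throw new IOException(e);
    }
  }
}
```

One design caveat: a failed-and-retried reduce task will re-run its inserts, so the target table or the statement should tolerate duplicates (e.g. MERGE instead of INSERT).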
