Re: Some general questions about DBInputFormat

2012-09-12 Thread Yaron Gonen
Hi again Nick, DBInputFormat does use Connection.TRANSACTION_SERIALIZABLE, but this is a per-connection attribute. Since every mapper has its own connection, and every connection is opened at a different time, every connection sees a different snapshot of the DB, and this can cause, for example, two mapper
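
The per-connection snapshot problem described above can be illustrated with a small in-memory simulation. This is only a sketch, not DBInputFormat's code: the class SnapshotSkewDemo and the helpers openSnapshot and readSplit are hypothetical, a list stands in for the table, and the sketch assumes each mapper reads its split by row offset (as in an ORDER BY ... LIMIT ... OFFSET query) against whatever the table looks like when its own connection opens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SnapshotSkewDemo {
    // Hypothetical stand-in for a mapper's connection: it copies the table
    // at the moment it "opens", mimicking a per-connection snapshot.
    static List<Integer> openSnapshot(List<Integer> table) {
        return new ArrayList<>(table);
    }

    // Hypothetical stand-in for a split query that selects rows by offset
    // (assumed shape: ORDER BY id LIMIT length OFFSET start).
    static List<Integer> readSplit(List<Integer> snapshot, int start, int length) {
        int from = Math.min(start, snapshot.size());
        int to = Math.min(start + length, snapshot.size());
        return new ArrayList<>(snapshot.subList(from, to));
    }

    // Simulates two mappers whose connections open at different times.
    static List<Integer> run() {
        List<Integer> table = new ArrayList<>(Arrays.asList(1, 2, 3, 4));

        // Mapper A connects first and reads the first two rows.
        List<Integer> seen = new ArrayList<>(readSplit(openSnapshot(table), 0, 2));

        // A concurrent writer inserts a row with id 0 before mapper B connects.
        table.add(0, 0);

        // Mapper B connects later and reads rows 2..3 of its own snapshot.
        seen.addAll(readSplit(openSnapshot(table), 2, 2));
        return seen;
    }

    public static void main(String[] args) {
        // Row 2 is read by both mappers and row 4 is never read.
        System.out.println(run()); // prints [1, 2, 2, 3]
    }
}
```

In this sketch each snapshot is internally consistent, yet the job as a whole reads one row twice and skips another, because the two snapshots disagree about which rows sit at which offsets.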

Re: Some general questions about DBInputFormat

2012-09-11 Thread Nick Jones
Hi Yaron, I haven't looked at or used it in a while, but I seem to remember that each mapper's SQL request was wrapped in a transaction to prevent the number of rows from changing. DBInputFormat uses Connection.TRANSACTION_SERIALIZABLE from java.sql.Connection to prevent changes in the number of rows

Re: Some general questions about DBInputFormat

2012-09-11 Thread Yaron Gonen
Thanks for the fast response. Nick, regarding locking a table: as far as I understood from the code, each mapper opens its own connection to the DB. I didn't see any code in which the job creates a transaction and passes it to the mapper. Did I miss something? Again, thanks! On Tue, Sep 11, 2012

Re: Some general questions about DBInputFormat

2012-09-11 Thread Nick Jones
Hi Yaron, replies inline below. On 09/11/2012 07:41 AM, Yaron Gonen wrote: Hi, After reviewing the class's (not very complicated) code, I have some questions I hope someone can answer: * (more general question) Are there many use-cases for using DBInputFormat? Do most Hadoop jobs take t

Re: Some general questions about DBInputFormat

2012-09-11 Thread Bejoy KS
ions >allowed for that db. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Yaron Gonen Date: Tue, 11 Sep 2012 15:41:26 To: Reply-To: user@hadoop.apache.org Subject: Some general questions about DBInputFormat Hi, After reviewing the

Some general questions about DBInputFormat

2012-09-11 Thread Yaron Gonen
Hi, After reviewing the class's (not very complicated) code, I have some questions I hope someone can answer: - (more general question) Are there many use-cases for using DBInputFormat? Do most Hadoop jobs take their input from files or DBs? - What happens when the database is updated dur