Hi again Nick,
DBInputFormat does use Connection.TRANSACTION_SERIALIZABLE, but this is a
per-connection attribute. Since every mapper has its own connection, and every
connection is opened at a different time, every connection sees a different
snapshot of the DB, and this can cause, for example, two mapper
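
To make the per-connection point concrete, here is a toy Java model. It is not DBInputFormat's actual code, only a sketch under two assumptions: each mapper pages through the table with a LIMIT/OFFSET style query ordered by id, and another client inserts a row between the moments the two mapper connections open. The class SnapshotSplitDemo and its methods are hypothetical names used only for illustration, and plain lists stand in for database snapshots.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of two mappers paging over one table through separate connections. */
class SnapshotSplitDemo {

    /** Emulates "SELECT id ... ORDER BY id LIMIT length OFFSET start" on one snapshot. */
    static List<Integer> readSplit(List<Integer> snapshot, int start, int length) {
        int from = Math.min(start, snapshot.size());
        int to = Math.min(start + length, snapshot.size());
        return new ArrayList<>(snapshot.subList(from, to));
    }

    /** Returns every row id the two mappers read, in read order. */
    static List<Integer> run() {
        // Table contents (sorted ids) when the job computes its two splits of 2 rows each.
        List<Integer> table = new ArrayList<>(List.of(10, 20, 30, 40));

        // Mapper 1 opens its connection first and sees this snapshot.
        List<Integer> snapshot1 = new ArrayList<>(table);

        // Assumption: another client inserts id 5 before mapper 2 connects.
        table.add(0, 5);

        // Mapper 2 opens its own connection later and sees a different snapshot.
        List<Integer> snapshot2 = new ArrayList<>(table);

        List<Integer> seen = new ArrayList<>();
        seen.addAll(readSplit(snapshot1, 0, 2)); // split 1: offset 0, length 2
        seen.addAll(readSplit(snapshot2, 2, 2)); // split 2: offset 2, length 2
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Running main prints [10, 20, 20, 30]: row 20 is read by both mappers, while rows 5 and 40 are read by neither. Each connection is internally consistent, but the two snapshots do not agree with each other.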
Hi Yaron,
I haven't looked at/used it in a while, but I seem to remember that each
mapper's SQL request was wrapped in a transaction to prevent the number
of rows changing. DBInputFormat uses
Connection.TRANSACTION_SERIALIZABLE from java.sql.Connection to prevent
changes in the number of rows
Thanks for the fast response.
Nick, regarding locking a table: as far as I understood from the code, each
mapper opens its own connection to the DB. I didn't see any code in which
the job creates a transaction and passes it to the mappers. Did I
miss something?
Again, thanks!
On Tue, Sep 11, 2012
Hi Yaron
Replies inline below.
On 09/11/2012 07:41 AM, Yaron Gonen wrote:
Hi,
After reviewing the class's (not very complicated) code, I have some
questions I hope someone can answer:
* (more general question) Are there many use-cases for using
DBInputFormat? Do most Hadoop jobs take t
ions
>allowed for that db.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-----Original Message-----
From: Yaron Gonen
Date: Tue, 11 Sep 2012 15:41:26
To:
Reply-To: user@hadoop.apache.org
Subject: Some general questions about DBInputFormat
Hi,
After reviewing the class's (not very complicated) code, I have some
questions I hope someone can answer:
- (more general question) Are there many use-cases for using
DBInputFormat? Do most Hadoop jobs take their input from files or DBs?
- What happens when the database is updated dur