Thanks Bejoy. We have considered adding reduce jobs to Sqoop to further partition the output files. See [SQOOP-137] for more details.
[SQOOP-137] https://issues.cloudera.org/browse/SQOOP-137

Thanks,
Arvind

On Tue, Aug 9, 2011 at 10:05 AM, <[email protected]> wrote:
> Moving the discussion to the Apache Sqoop mailing list. Please continue it here.
>
> Regards
> Bejoy K S
>
> -----Original Message-----
> From: [email protected]
> Date: Tue, 9 Aug 2011 16:54:44
> To: <[email protected]>
> Reply-To: [email protected]
> Subject: Re: [sqoop-user] Re: Sqoop export not having reduce jobs, even table is partitioned
>
> Yes, Sqoop imports and exports are entirely parallel, map-only processes.
> No reduce operation is required in these scenarios. You are not doing any
> aggregation while performing imports and exports, so reducers hardly come
> into play.
> As for Sqoop with a reduce job, I don't have a clue. Are you looking for
> some specific implementation? If so, please share more details.
>
> Regards
> Bejoy K S
>
> -----Original Message-----
> From: Sonal <[email protected]>
> Date: Tue, 9 Aug 2011 07:52:55
> To: Sqoop Users<[email protected]>
> Reply-To: [email protected]
> Subject: [sqoop-user] Re: Sqoop export not having reduce jobs, even table is partitioned
>
> Hi,
>
> Thanks for the reply.
> So is it that Sqoop is just parallel processing, even if you have a primary
> key/unique index/partition on the table?
>
> Is there any case in which Sqoop can make use of a reduce job?
> Is there any way we can set the batch size/fetch size in Sqoop?
>
> Thanks & Regards,
> Sonal Kumar
>
>
> On Aug 9, 7:44 pm, [email protected] wrote:
>> Hi Sonal
>> AFAIK Sqoop import and export jobs kick off map tasks alone; both
>> are map-only jobs.
>> In imports, the data set to be imported is distributed equally across the
>> mappers, and each mapper is responsible for firing its corresponding SQL
>> query and fetching data to HDFS. No reduce operation is required here, as
>> it is just parallel processing (parallel fetching of data) happening under
>> the hood.
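The parallel-fetch scheme described in the thread can be sketched in a few lines. This is an illustrative model only, not Sqoop's actual code: the table name, column name, and helper functions below are invented for the example. The idea is that the tool finds the MIN/MAX of the split column, carves that range into one slice per mapper, and each mapper runs its own bounded SELECT in parallel, which is why no reduce phase is needed.

```python
# Hypothetical sketch of map-only import splitting (not Sqoop's real code).
# Sqoop-style tools query SELECT MIN(col), MAX(col) on the split column,
# then hand each mapper a contiguous slice of that key range.

def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive key range [lo, hi] into num_mappers slices."""
    span = hi - lo + 1
    base, extra = divmod(span, num_mappers)
    ranges, start = [], lo
    for i in range(num_mappers):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges

def mapper_query(table, col, lo, hi):
    # Each mapper independently fires a bounded query like this one;
    # the fetched rows go straight to HDFS with no aggregation step.
    return f"SELECT * FROM {table} WHERE {col} >= {lo} AND {col} <= {hi}"

if __name__ == "__main__":
    # e.g. -m 4 over a key range 1..1000 yields four non-overlapping slices
    for lo, hi in split_ranges(1, 1000, 4):
        print(mapper_query("SALES", "id", lo, hi))
```

Because every row lands in exactly one slice and the slices are independent, the mappers never need to exchange or merge data, so the job runs with zero reducers.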
>> A similar case applies for Sqoop export as well: parallel inserts
>> happening under the hood. For parallel processing, map tasks alone are
>> fine; no reduce operation is needed.
>>
>> Regards
>> Bejoy K S
>>
>> -----Original Message-----
>> From: Sonal <[email protected]>
>> Date: Tue, 9 Aug 2011 04:02:10
>> To: Sqoop Users<[email protected]>
>> Reply-To: [email protected]
>> Subject: [sqoop-user] Sqoop export not having reduce jobs, even table is partitioned
>>
>> Hi,
>>
>> I am trying to load the data into the db using sqoop export with the
>> following command:
>> sqoop export --connect jdbc:oracle:thin:@adc2190481.us.oracle.com:45773:dooptry --username sh --password sh --export-dir $ORACLE_HOME/work/SALES_input --table SALES_OLH_RANGE -m 4
>>
>> It is able to insert the data, but it runs map jobs only:
>> 11/08/09 03:57:42 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
>> 11/08/09 03:57:42 INFO tool.CodeGenTool: Beginning code generation
>> 11/08/09 03:57:42 INFO manager.OracleManager: Time zone has been set to GMT
>> 11/08/09 03:57:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM SALES_OLH_RANGE t
>> 11/08/09 03:57:42 WARN manager.SqlManager: SQLException closing ResultSet: java.sql.SQLException: Could not commit with auto-commit set on
>> 11/08/09 03:57:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM SALES_OLH_RANGE t
>> 11/08/09 03:57:42 WARN manager.SqlManager: SQLException closing ResultSet: java.sql.SQLException: Could not commit with auto-commit set on
>> 11/08/09 03:57:42 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/hadoop
>> 11/08/09 03:57:42 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop/hadoop-0.20.2+737-core.jar
>> Note: /net/adc2190481/scratch/sonkumar/view_storage/sonkumar_hadooptry/work/./SALES_OLH_RANGE.java uses or overrides a deprecated API.
>> Note: Recompile with -Xlint:deprecation for details.
>> 11/08/09 03:57:43 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop/compile/SALES_OLH_RANGE.jar
>> 11/08/09 03:57:43 INFO mapreduce.ExportJobBase: Beginning export of SALES_OLH_RANGE
>> 11/08/09 03:57:44 INFO manager.OracleManager: Time zone has been set to GMT
>> 11/08/09 03:57:44 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM SALES_OLH_RANGE t
>> 11/08/09 03:57:44 WARN manager.SqlManager: SQLException closing ResultSet: java.sql.SQLException: Could not commit with auto-commit set on
>> 11/08/09 03:57:44 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
>> 11/08/09 03:57:44 INFO input.FileInputFormat: Total input paths to process : 1
>> 11/08/09 03:57:44 INFO input.FileInputFormat: Total input paths to process : 1
>> 11/08/09 03:57:44 INFO mapred.JobClient: Running job: job_local_0001
>> 11/08/09 03:57:45 INFO mapred.JobClient: map 0% reduce 0%
>> 11/08/09 03:57:50 INFO mapred.LocalJobRunner:
>> 11/08/09 03:57:51 INFO mapred.JobClient: map 24% reduce 0%
>> 11/08/09 03:57:53 INFO mapred.LocalJobRunner:
>> 11/08/09 03:57:54 INFO mapred.JobClient: map 41% reduce 0%
>> 11/08/09 03:57:56 INFO mapred.LocalJobRunner:
>> 11/08/09 03:57:57 INFO mapred.JobClient: map 58% reduce 0%
>> 11/08/09 03:57:59 INFO mapred.LocalJobRunner:
>> 11/08/09 03:58:00 INFO mapred.JobClient: map 75% reduce 0%
>> 11/08/09 03:58:02 INFO mapred.LocalJobRunner:
>> 11/08/09 03:58:02 INFO mapred.JobClient: map 92% reduce 0%
>> 11/08/09 03:58:03 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
>> 11/08/09 03:58:03 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
>> 11/08/09 03:58:03 INFO mapred.LocalJobRunner:
>> 11/08/09 03:58:03 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
>> 11/08/09 03:58:03 WARN mapred.FileOutputCommitter: Output path is null in cleanup
>> 11/08/09 03:58:04 INFO mapred.JobClient: map 100% reduce 0%
>> 11/08/09 03:58:04 INFO mapred.JobClient: Job complete: job_local_0001
>> 11/08/09 03:58:04 INFO mapred.JobClient: Counters: 6
>> 11/08/09 03:58:04 INFO mapred.JobClient:   FileSystemCounters
>> 11/08/09 03:58:04 INFO mapred.JobClient:     FILE_BYTES_READ=41209592
>> 11/08/09 03:58:04 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=309754
>> 11/08/09 03:58:04 INFO mapred.JobClient:   Map-Reduce Framework
>> 11/08/09 03:58:04 INFO mapred.JobClient:     Map input records=918843
>> 11/08/09 03:58:04 INFO mapred.JobClient:     Spilled Records=0
>> 11/08/09 03:58:04 INFO mapred.JobClient:     SPLIT_RAW_BYTES=154
>> 11/08/09 03:58:04 INFO mapred.JobClient:     Map output records=918843
>> 11/08/09 03:58:04 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 20.3677 seconds (0 bytes/sec)
>> 11/08/09 03:58:04 INFO mapreduce.ExportJobBase: Exported 918843 records.
>>
>> Why are the reduce jobs not coming up? Do I have to pass some other option
>> as well?
>>
>> A quick reply will be appreciated.
>>
>> Thanks & Regards,
>> Sonal Kumar
>>
>> --
>> NOTE: The mailing list [email protected] is deprecated in favor of
>> the Apache Sqoop mailing list [email protected]. Please subscribe
>> to it by sending an email to [email protected].
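The export side described in the thread, parallel inserts with no aggregation, can likewise be sketched. The snippet below is a hypothetical model of what a single export mapper does with its slice of the input files: it issues batched INSERT statements against the target table, which also touches on the batch-size question raised above. `sqlite3` stands in for the real JDBC connection, and `export_partition` and the table schema are invented names for illustration.

```python
# Illustrative sketch only (stdlib sqlite3 stands in for the JDBC target):
# each export mapper reads its slice of the input and issues batched
# INSERTs. Since nothing is aggregated, the job needs no reduce phase.

import sqlite3

def export_partition(conn, rows, batch_size=100):
    """Insert rows in batches, the way one export mapper would."""
    cur = conn.cursor()
    for i in range(0, len(rows), batch_size):
        # executemany sends one batch of parameterized INSERTs
        cur.executemany(
            "INSERT INTO sales (id, amount) VALUES (?, ?)",
            rows[i:i + batch_size],
        )
        conn.commit()  # one transaction per batch
    return len(rows)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
    exported = export_partition(conn, [(i, i * 1.5) for i in range(250)])
    print("exported", exported, "records")
```

With `-m 4`, four such mappers would each run this loop over their own input split concurrently; since the splits are disjoint, their inserts never conflict and no reducer is involved.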
