Thanks Bejoy. We have considered adding reduce jobs to Sqoop to further partition the output files. See [SQOOP-137] for more details.
[SQOOP-137] https://issues.cloudera.org/browse/SQOOP-137

Thanks,
Arvind

On Tue, Aug 9, 2011 at 10:05 AM, <[email protected]> wrote:
> Moving the discussion to the Apache Sqoop mailing list. Please continue it here.
>
> Regards
> Bejoy K S
>
> -----Original Message-----
> From: [email protected]
> Date: Tue, 9 Aug 2011 16:54:44
> To: <[email protected]>
> Reply-To: [email protected]
> Subject: Re: [sqoop-user] Re: Sqoop export not having reduce jobs, even table is partitioned
>
> Yes, Sqoop imports and exports are entirely parallel, map-only processes.
> No reduce operation is required in these scenarios. You are not doing any
> aggregation while performing imports and exports, so reducers hardly come
> into play.
> As for Sqoop with a reduce job, I don't have a clue. Are you looking for
> some specific implementation? If so, please share more details.
>
> Regards
> Bejoy K S
>
> -----Original Message-----
> From: Sonal <[email protected]>
> Date: Tue, 9 Aug 2011 07:52:55
> To: Sqoop Users<[email protected]>
> Reply-To: [email protected]
> Subject: [sqoop-user] Re: Sqoop export not having reduce jobs, even table is partitioned
>
> Hi,
>
> Thanks for the reply.
> So is it that Sqoop is just parallel processing, even if you have a primary
> key/unique index/partition on the table?
>
> Is there any case in which Sqoop can make use of a reduce job?
> Is there any way we can set the batch size/fetch size in Sqoop?
>
> Thanks & Regards,
> Sonal Kumar
>
>
> On Aug 9, 7:44 pm, [email protected] wrote:
>> Hi Sonal
>> AFAIK Sqoop import and export jobs kick off map tasks alone; both
>> are map-only jobs.
>> In imports, the data set to be imported is distributed equally across the
>> mappers, and each mapper is responsible for firing its corresponding SQL
>> query and fetching data to HDFS. No reduce operation is required here, as
>> it is just parallel processing (parallel fetching of data) happening under
>> the hood.
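The parallel-fetch scheme described in the thread can be sketched in a few lines. This is an illustrative model only, not Sqoop's actual code: the table name, column name, and helper functions below are invented for the example. The idea is that the tool finds the MIN/MAX of the split column, carves that range into one slice per mapper, and each mapper runs its own bounded SELECT in parallel, which is why no reduce phase is needed.

```python
# Hypothetical sketch of map-only import splitting (not Sqoop's real code).
# Sqoop-style tools query SELECT MIN(col), MAX(col) on the split column,
# then hand each mapper a contiguous slice of that key range.

def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive key range [lo, hi] into num_mappers slices."""
    span = hi - lo + 1
    base, extra = divmod(span, num_mappers)
    ranges, start = [], lo
    for i in range(num_mappers):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges

def mapper_query(table, col, lo, hi):
    # Each mapper independently fires a bounded query like this one;
    # the fetched rows go straight to HDFS with no aggregation step.
    return f"SELECT * FROM {table} WHERE {col} >= {lo} AND {col} <= {hi}"

if __name__ == "__main__":
    # e.g. -m 4 over a key range 1..1000 yields four non-overlapping slices
    for lo, hi in split_ranges(1, 1000, 4):
        print(mapper_query("SALES", "id", lo, hi))
```

Because every row lands in exactly one slice and the slices are independent, the mappers never need to exchange or merge data, so the job runs with zero reducers.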
>> A similar case applies for Sqoop export as well: parallel inserts
>> happening under the hood. For parallel processing, map tasks alone are
>> fine; no reduce operation is needed.
>>
>> Regards
>> Bejoy K S
>>
>> -----Original Message-----
>> From: Sonal <[email protected]>
>> Date: Tue, 9 Aug 2011 04:02:10
>> To: Sqoop Users<[email protected]>
>> Reply-To: [email protected]
>> Subject: [sqoop-user] Sqoop export not having reduce jobs, even table is partitioned
>>
>> Hi,
>>
>> I am trying to load the data into the db using sqoop export with the
>> following command:
>> sqoop export --connect jdbc:oracle:thin:@adc2190481.us.oracle.com:45773:dooptry --username sh --password sh --export-dir $ORACLE_HOME/work/SALES_input --table SALES_OLH_RANGE -m 4
>>
>> It is able to insert the data, but it runs map jobs only:
>> 11/08/09 03:57:42 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
>> 11/08/09 03:57:42 INFO tool.CodeGenTool: Beginning code generation
>> 11/08/09 03:57:42 INFO manager.OracleManager: Time zone has been set to GMT
>> 11/08/09 03:57:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM SALES_OLH_RANGE t
>> 11/08/09 03:57:42 WARN manager.SqlManager: SQLException closing ResultSet: java.sql.SQLException: Could not commit with auto-commit set on
>> 11/08/09 03:57:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM SALES_OLH_RANGE t
>> 11/08/09 03:57:42 WARN manager.SqlManager: SQLException closing ResultSet: java.sql.SQLException: Could not commit with auto-commit set on
>> 11/08/09 03:57:42 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/hadoop
>> 11/08/09 03:57:42 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop/hadoop-0.20.2+737-core.jar
>> Note: /net/adc2190481/scratch/sonkumar/view_storage/sonkumar_hadooptry/work/./SALES_OLH_RANGE.java uses or overrides a deprecated API.
>> Note: Recompile with -Xlint:deprecation for details.
>> 11/08/09 03:57:43 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop/compile/SALES_OLH_RANGE.jar
>> 11/08/09 03:57:43 INFO mapreduce.ExportJobBase: Beginning export of SALES_OLH_RANGE
>> 11/08/09 03:57:44 INFO manager.OracleManager: Time zone has been set to GMT
>> 11/08/09 03:57:44 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM SALES_OLH_RANGE t
>> 11/08/09 03:57:44 WARN manager.SqlManager: SQLException closing ResultSet: java.sql.SQLException: Could not commit with auto-commit set on
>> 11/08/09 03:57:44 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
>> 11/08/09 03:57:44 INFO input.FileInputFormat: Total input paths to process : 1
>> 11/08/09 03:57:44 INFO input.FileInputFormat: Total input paths to process : 1
>> 11/08/09 03:57:44 INFO mapred.JobClient: Running job: job_local_0001
>> 11/08/09 03:57:45 INFO mapred.JobClient: map 0% reduce 0%
>> 11/08/09 03:57:50 INFO mapred.LocalJobRunner:
>> 11/08/09 03:57:51 INFO mapred.JobClient: map 24% reduce 0%
>> 11/08/09 03:57:53 INFO mapred.LocalJobRunner:
>> 11/08/09 03:57:54 INFO mapred.JobClient: map 41% reduce 0%
>> 11/08/09 03:57:56 INFO mapred.LocalJobRunner:
>> 11/08/09 03:57:57 INFO mapred.JobClient: map 58% reduce 0%
>> 11/08/09 03:57:59 INFO mapred.LocalJobRunner:
>> 11/08/09 03:58:00 INFO mapred.JobClient: map 75% reduce 0%
>> 11/08/09 03:58:02 INFO mapred.LocalJobRunner:
>> 11/08/09 03:58:02 INFO mapred.JobClient: map 92% reduce 0%
>> 11/08/09 03:58:03 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
>> 11/08/09 03:58:03 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
>> 11/08/09 03:58:03 INFO mapred.LocalJobRunner:
>> 11/08/09 03:58:03 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
>> 11/08/09 03:58:03 WARN mapred.FileOutputCommitter: Output path is null in cleanup
>> 11/08/09 03:58:04 INFO mapred.JobClient: map 100% reduce 0%
>> 11/08/09 03:58:04 INFO mapred.JobClient: Job complete: job_local_0001
>> 11/08/09 03:58:04 INFO mapred.JobClient: Counters: 6
>> 11/08/09 03:58:04 INFO mapred.JobClient:   FileSystemCounters
>> 11/08/09 03:58:04 INFO mapred.JobClient:     FILE_BYTES_READ=41209592
>> 11/08/09 03:58:04 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=309754
>> 11/08/09 03:58:04 INFO mapred.JobClient:   Map-Reduce Framework
>> 11/08/09 03:58:04 INFO mapred.JobClient:     Map input records=918843
>> 11/08/09 03:58:04 INFO mapred.JobClient:     Spilled Records=0
>> 11/08/09 03:58:04 INFO mapred.JobClient:     SPLIT_RAW_BYTES=154
>> 11/08/09 03:58:04 INFO mapred.JobClient:     Map output records=918843
>> 11/08/09 03:58:04 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 20.3677 seconds (0 bytes/sec)
>> 11/08/09 03:58:04 INFO mapreduce.ExportJobBase: Exported 918843 records.
>>
>> Why are the reduce jobs not coming up? Do I have to pass some other option
>> as well?
>>
>> A quick reply will be appreciated.
>>
>> Thanks & Regards,
>> Sonal Kumar
>>
>> --
>> NOTE: The mailing list [email protected] is deprecated in favor of
>> the Apache Sqoop mailing list [email protected]. Please subscribe
>> to it by sending an email to [email protected].
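The export side described in the thread, parallel inserts with no aggregation, can likewise be sketched. The snippet below is a hypothetical model of what a single export mapper does with its slice of the input files: it issues batched INSERT statements against the target table, which also touches on the batch-size question raised above. `sqlite3` stands in for the real JDBC connection, and `export_partition` and the table schema are invented names for illustration.

```python
# Illustrative sketch only (stdlib sqlite3 stands in for the JDBC target):
# each export mapper reads its slice of the input and issues batched
# INSERTs. Since nothing is aggregated, the job needs no reduce phase.

import sqlite3

def export_partition(conn, rows, batch_size=100):
    """Insert rows in batches, the way one export mapper would."""
    cur = conn.cursor()
    for i in range(0, len(rows), batch_size):
        # executemany sends one batch of parameterized INSERTs
        cur.executemany(
            "INSERT INTO sales (id, amount) VALUES (?, ?)",
            rows[i:i + batch_size],
        )
        conn.commit()  # one transaction per batch
    return len(rows)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
    exported = export_partition(conn, [(i, i * 1.5) for i in range(250)])
    print("exported", exported, "records")
```

With `-m 4`, four such mappers would each run this loop over their own input split concurrently; since the splits are disjoint, their inserts never conflict and no reducer is involved.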
