Hi Jarcec,

I am running the command from the CLI of a cluster node. It appears to run a local MR job that writes the results to /tmp before sending them to S3:
[..]
[hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper: Beginning mysqldump fast path import
[hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper: Performing import of table image from database some_db
[hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper: Converting data to use specified delimiters.
[hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper: (For the fastest possible import, use
[hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper: --mysql-delimiters to specify the same field
[hostaddress] out: 13/04/02 01:52:49 INFO mapreduce.MySQLDumpMapper: delimiters as are used by mysqldump.)
[hostaddress] out: 13/04/02 01:52:54 INFO mapred.LocalJobRunner:
[hostaddress] out: 13/04/02 01:52:55 INFO mapred.JobClient:  map 100% reduce 0%
[hostaddress] out: 13/04/02 01:52:57 INFO mapred.LocalJobRunner:
[..]
[hostaddress] out: 13/04/02 01:53:03 INFO mapred.LocalJobRunner:
[hostaddress] out: 13/04/02 01:54:42 INFO mapreduce.MySQLDumpMapper: Transfer loop complete.
[hostaddress] out: 13/04/02 01:54:42 INFO mapreduce.MySQLDumpMapper: Transferred 668.9657 MB in 113.0105 seconds (5.9195 MB/sec)
[hostaddress] out: 13/04/02 01:54:42 INFO mapred.LocalJobRunner:
[hostaddress] out: 13/04/02 01:54:42 INFO s3native.NativeS3FileSystem: OutputStream for key 'some_table/_temporary/_attempt_local555455791_0001_m_000000_0/part-m-00000' closed. Now beginning upload
[hostaddress] out: 13/04/02 01:54:42 INFO mapred.LocalJobRunner:
[hostaddress] out: 13/04/02 01:54:45 INFO mapred.LocalJobRunner:
[hostaddress] out: 13/04/02 01:55:31 INFO s3native.NativeS3FileSystem: OutputStream for key 'some_table/_temporary/_attempt_local555455791_0001_m_000000_0/part-m-00000' upload complete
[hostaddress] out: 13/04/02 01:55:31 INFO mapred.Task: Task:attempt_local555455791_0001_m_000000_0 is done. And is in the process of commiting
[hostaddress] out: 13/04/02 01:55:31 INFO mapred.LocalJobRunner:
[hostaddress] out: 13/04/02 01:55:31 INFO mapred.Task: Task attempt_local555455791_0001_m_000000_0 is allowed to commit now
[hostaddress] out: 13/04/02 01:55:36 INFO mapred.LocalJobRunner:
[hostaddress] out: 13/04/02 01:56:03 WARN output.FileOutputCommitter: Failed to delete the temporary output directory of task: attempt_local555455791_0001_m_000000_0 - s3n://secret@bucketsomewhere/some_table/_temporary/_attempt_local555455791_0001_m_000000_0
[hostaddress] out: 13/04/02 01:56:03 INFO output.FileOutputCommitter: Saved output of task 'attempt_local555455791_0001_m_000000_0' to s3n://secret@bucketsomewhere/some_table
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.LocalJobRunner:
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.Task: Task 'attempt_local555455791_0001_m_000000_0' done.
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.LocalJobRunner: Finishing task: attempt_local555455791_0001_m_000000_0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.LocalJobRunner: Map task executor complete.
[hostaddress] out: 13/04/02 01:56:03 INFO s3native.NativeS3FileSystem: OutputStream for key 'some_table/_SUCCESS' writing to tempfile '/tmp/hadoop-jenkins/s3/output-1400873345908825433.tmp'
[hostaddress] out: 13/04/02 01:56:03 INFO s3native.NativeS3FileSystem: OutputStream for key 'some_table/_SUCCESS' closed. Now beginning upload
[hostaddress] out: 13/04/02 01:56:03 INFO s3native.NativeS3FileSystem: OutputStream for key 'some_table/_SUCCESS' upload complete
[...deleting cached jars...]
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient: Job complete: job_local555455791_0001
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient: Counters: 23
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:   File System Counters
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     FILE: Number of bytes read=6471451
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     FILE: Number of bytes written=6623109
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     FILE: Number of read operations=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     FILE: Number of large read operations=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     FILE: Number of write operations=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     HDFS: Number of bytes read=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     HDFS: Number of bytes written=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     HDFS: Number of read operations=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     HDFS: Number of large read operations=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     HDFS: Number of write operations=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     S3N: Number of bytes read=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     S3N: Number of bytes written=773081963
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     S3N: Number of read operations=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     S3N: Number of large read operations=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     S3N: Number of write operations=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:   Map-Reduce Framework
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     Map input records=1
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     Map output records=14324124
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     Input split bytes=87
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     Spilled Records=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     CPU time spent (ms)=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
[hostaddress] out: 13/04/02 01:56:03 INFO mapred.JobClient:     Total committed heap usage (bytes)=142147584
[hostaddress] out: 13/04/02 01:56:03 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 201.4515 seconds (0 bytes/sec)
[hostaddress] out: 13/04/02 01:56:03 INFO mapreduce.ImportJobBase: Retrieved 14324124 records.

On Thu, Mar 28, 2013 at 9:49 PM, Jarek Jarcec Cecho <[email protected]> wrote:

> Hi Christian,
> would you mind describing a bit more the behaviour you're observing?
>
> Sqoop should be touching /tmp only on the machine where you've executed it,
> for generating and compiling code (<1 MB!). The data transfer itself is done
> on your Hadoop cluster from within a mapreduce job, and the output is stored
> directly in your destination folder. I'm not familiar with the S3 file
> system implementation, but could it be the S3 library that is storing the
> data in /tmp?
>
> Jarcec
>
> On Thu, Mar 28, 2013 at 03:54:11PM +0000, Christian Prokopp wrote:
> > Thanks for the idea, Alex. I considered this, but it would mean changing
> > my cluster setup for sqoop (a last-resort solution). I'd much rather
> > point sqoop to existing large disks.
> >
> > Cheers,
> > Christian
> >
> >
> > On Thu, Mar 28, 2013 at 3:50 PM, Alexander Alten-Lorenz <[email protected]> wrote:
> >
> > > You could mount a bigger disk on /tmp, or symlink /tmp to another
> > > directory that has enough space.
> > >
> > > Best
> > > - Alex
> > >
> > > On Mar 28, 2013, at 4:35 PM, Christian Prokopp <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am using sqoop to copy data from MySQL to S3:
> > > >
> > > > (Sqoop 1.4.2-cdh4.2.0)
> > > > $ sqoop import --connect jdbc:mysql://server:port/db --username user --password pass --table tablename --target-dir s3n://xyz@somewhere/a/b/c --fields-terminated-by='\001' -m 1 --direct
> > > >
> > > > My problem is that sqoop temporarily stores the data in /tmp, which is
> > > > not big enough for the data. I am unable to find a configuration option
> > > > to point sqoop to a bigger partition/disk. Any suggestions?
> > > >
> > > > Cheers,
> > > > Christian
> > >
> > > --
> > > Alexander Alten-Lorenz
> > > http://mapredit.blogspot.com
> > > German Hadoop LinkedIn Group: http://goo.gl/N8pCF
> >
> >
> > --
> > Best regards,
> >
> > Christian Prokopp
> > Data Scientist, PhD
> > Rangespan Ltd. <http://www.rangespan.com/>

--
Best regards,

Christian Prokopp
Data Scientist, PhD
Rangespan Ltd. <http://www.rangespan.com/>
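P.S. The tempfile line in the log above suggests what is filling the disk: /tmp/hadoop-jenkins/s3/ matches the default of the Hadoop property fs.s3.buffer.dir (${hadoop.tmp.dir}/s3), which NativeS3FileSystem uses to buffer each output file on local disk before uploading it to S3. If that is indeed the cause, a minimal sketch of a workaround (untested here; /data/tmp/s3 is a made-up path on a bigger disk) would be to override the property on the sqoop command line:

$ sqoop import -D fs.s3.buffer.dir=/data/tmp/s3 \
    --connect jdbc:mysql://server:port/db --username user --password pass \
    --table tablename --target-dir s3n://xyz@somewhere/a/b/c \
    --fields-terminated-by='\001' -m 1 --direct

Note that the generic -D option must come before the tool-specific arguments. Alternatively, the property could be set once in core-site.xml on the node where sqoop is launched, since the log shows the map task executing there via LocalJobRunner.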
