Thanks Gwen for the information.
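If I'm reading this right: since 'user_id' is already the PRIMARY KEY, the
upsert should key off it with no extra unique index needed. For the archives,
here is roughly what a batched MySQL upsert would look like with
records.per.statement in play (the column name 'uniques' is just a placeholder
for our second integer column, and the exact statement shape is my guess, not
copied from Sqoop's output):

    -- one INSERT carries up to sqoop.export.records.per.statement rows;
    -- ON DUPLICATE KEY UPDATE matches on the PRIMARY KEY or any UNIQUE key
    INSERT INTO tbl (user_id, uniques)
    VALUES
      (101, 42),
      (102, 17),
      (103, 8)
    ON DUPLICATE KEY UPDATE uniques = VALUES(uniques);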
On Thu, Jun 26, 2014 at 7:22 PM, Gwen Shapira <[email protected]> wrote:

> If you did not see it yet, check out this patch:
> https://issues.apache.org/jira/browse/SQOOP-1341
>
> It may solve your issue.
>
> If user_id is a primary key, you don't need to add a unique key. Primary
> keys are always unique.
>
> Gwen
>
> On Tue, Jun 24, 2014 at 11:22 AM, Buntu Dev <[email protected]> wrote:
>
>> Thanks Gwen for the response.
>>
>> We tried with 4 mappers, 100 records.per.statement and 100
>> statements.per.transaction, and got much lower throughput:
>>
>> [ExportJobBase] - Transferred 788.9209 KB in 623.7671 seconds (1.2648 KB/sec)
>> [ExportJobBase] - Exported 63787 records.
>>
>> With 1 mapper it was too slow and we had to kill the job.
>>
>> When we tried with 100/100/100, notice that for the same number of records
>> the amount of data transferred is 5.9855 MB vs. 788 KB in the previous run:
>>
>> [ExportJobBase] - Transferred 5.9855 MB in 47.9844 seconds (127.7323 KB/sec)
>> [ExportJobBase] - Exported 63787 records.
>>
>> Do we need to add a unique key on 'user_id' (it is already the primary
>> key)? The INSERT statement constructed by Sqoop seems fine.
>>
>> On Mon, Jun 23, 2014 at 2:54 PM, Gwen Shapira <[email protected]> wrote:
>>
>>> You are using a very high number of mappers for a very low amount of data
>>> (50 MB or less) and getting very low throughput (less than 1 MB/s).
>>>
>>> Can you try the same jobs with just 1 mapper? And with 4 mappers?
>>>
>>> Gwen
>>>
>>> On Mon, Jun 23, 2014 at 2:32 PM, Buntu Dev <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are using Sqoop (v1.4.4) export to export the uniques per user_id
>>>> into a MySQL table with two integer columns and 'user_id' as the
>>>> unique key, with these options:
>>>>
>>>> sqoop export \
>>>> -Dsqoop.export.records.per.statement=1000 \
>>>> -Dsqoop.export.statements.per.transaction=1 \
>>>> --connect "jdbc:mysql://host/db" \
>>>> --username user \
>>>> --password pwd \
>>>> --table tbl \
>>>> --batch \
>>>> --relaxed-isolation \
>>>> --update-mode allowinsert \
>>>> --update-key user_id \
>>>> --export-dir output/dir/ \
>>>> --input-fields-terminated-by '\t' \
>>>> --input-lines-terminated-by '\n' \
>>>> --num-mappers=200
>>>>
>>>> Are options such as batching, records per statement, or statements per
>>>> transaction applicable in the case of MySQL upserts?
>>>>
>>>> Also, we are noticing that for smaller jobs the throughput of the
>>>> export job is higher compared to the larger jobs:
>>>>
>>>> large job:
>>>> [ExportJobBase] - Transferred 37.3672 MB in 838.2908 seconds (45.6452 KB/sec)
>>>> [ExportJobBase] - Exported 3025677 records.
>>>>
>>>> small job:
>>>> [ExportJobBase] - Transferred 12.0951 MB in 40.9846 seconds (302.1965 KB/sec)
>>>> [ExportJobBase] - Exported 88042 records.
>>>>
>>>> Bumping up the mappers to 400 shows similar behavior as well:
>>>>
>>>> large job:
>>>> [ExportJobBase] - Transferred 49.6578 MB in 638.6147 seconds (79.6249 KB/sec)
>>>> [ExportJobBase] - Exported 3243995 records.
>>>>
>>>> small job:
>>>> [ExportJobBase] - Transferred 24.4653 MB in 59.1785 seconds (423.3366 KB/sec)
>>>> [ExportJobBase] - Exported 139181 records.
>>>>
>>>> Attempting to remove the --batch option or to increase the number of
>>>> statements per transaction causes "lock wait timeout exceeded" exceptions.
>>>>
>>>> Please let me know if there is anything obvious we might be missing.
>>>>
>>>> Thanks!
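P.S. To double-check on the MySQL side that 'user_id' really is covered by a
PRIMARY KEY or UNIQUE index before relying on the upsert, either of these
should show it:

    SHOW CREATE TABLE tbl;   -- prints the full DDL, including key definitions
    SHOW INDEX FROM tbl;     -- lists indexes; Non_unique = 0 means unique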
