What is the cluster size, and how many salt buckets does the table have?

Are you using compression? SNAPPY is recommended.




From: Bulvik, Noam [mailto:[email protected]]
Sent: Wednesday, January 07, 2015 7:00 PM
To: [email protected]
Subject: RE: high CPU when using bulk loading

Only when doing bulk loading, and only during the map phase.

-----Original Message-----
From: Puneet Kumar Ojha [[email protected]]
Received: Wednesday, 07 Jan 2015, 15:03
To: [email protected]
Subject: RE: high CPU when using bulk loading
Is the CPU usage at 100% all the time, or only while doing bulk loading?



From: Bulvik, Noam [mailto:[email protected]]
Sent: Wednesday, January 07, 2015 6:26 PM
To: [email protected]
Subject: high CPU when using bulk loading


Hi,

We are tuning our system for bulk loading. We managed to load ~250M records 
per hour (~96G of raw input CSV data) on a cluster with 8 nodes. We use the MR 
bulk loading tool with a pre-split table and a salted key.

What we currently see is that while the mappers are working we have 100% CPU 
usage across the cluster. It was our impression that the mappers would be I/O 
bound rather than CPU intensive.
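
As a rough sanity check on that impression, the figures above can be turned into per-node rates. This is only a back-of-the-envelope sketch: it assumes "96G" means GiB, that load is spread evenly over the 8 nodes, and the function and key names are made up for illustration.

```python
def per_node_load(records_per_hour, gib_per_hour, nodes):
    """Convert hourly cluster totals into per-second and per-node rates."""
    seconds_per_hour = 3600
    records_per_sec = records_per_hour / seconds_per_hour
    # Assumption: "G" is read as GiB, so 1 G = 1024 MiB.
    mib_per_sec = gib_per_hour * 1024 / seconds_per_hour
    bytes_per_record = gib_per_hour * 1024**3 / records_per_hour
    return {
        "records_per_sec": records_per_sec,
        "records_per_sec_per_node": records_per_sec / nodes,
        "mib_per_sec": mib_per_sec,
        "mib_per_sec_per_node": mib_per_sec / nodes,
        "bytes_per_record": bytes_per_record,
    }


# Figures quoted in this message: ~250M records/hour, ~96G raw CSV, 8 nodes.
stats = per_node_load(250_000_000, 96, 8)
for name, value in stats.items():
    print(f"{name}: {value:,.1f}")
```

Under these assumptions each node reads only a few MiB of raw CSV per second while handling several thousand records per second, so the per-record work in the mappers, rather than raw input I/O, may well be what saturates the CPU.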

Any idea what else we can tune or check?


Regards

Noam


