Re: high CPU when using bulk loading

2015-01-07 Thread Gabriel Reid
Hi Noam, It doesn't sound all that surprising that you're CPU-bound on a batch import job like this if you consider everything that is going on within the mappers. Let's say you're importing data for a table with 20 columns. For each line of input data, the following is then occurring within the

Re: RE: high CPU when using bulk loading

2015-01-07 Thread Wangwenli
: user@phoenix.apache.org Subject: RE: high CPU when using bulk loading Only when doing bulk loading and only during the mapping phase -----Original Message----- From: Puneet Kumar Ojha [puneet.ku...@pubmatic.com] Received: Wednesday, 07 Jan 2015, 15:03 To: user@phoeni

RE: high CPU when using bulk loading

2015-01-07 Thread Puneet Kumar Ojha
What is the cluster size and the number of salted buckets? Are you using compression? SNAPPY is recommended. From: Bulvik, Noam [mailto:noam.bul...@teoco.com] Sent: Wednesday, January 07, 2015 7:00 PM To: user@phoenix.apache.org Subject: RE: high CPU when using bulk loading Only when doing bulk

RE: high CPU when using bulk loading

2015-01-07 Thread Bulvik, Noam
Only when doing bulk loading and only during the mapping phase -----Original Message----- From: Puneet Kumar Ojha [puneet.ku...@pubmatic.com] Received: Wednesday, 07 Jan 2015, 15:03 To: user@phoenix.apache.org [user@phoenix.apache.org] Subject: RE: high CPU when using bulk loading Is the CPU usage 100

RE: high CPU when using bulk loading

2015-01-07 Thread Puneet Kumar Ojha
Is the CPU usage 100% all the time, or only while doing bulk loading? From: Bulvik, Noam [mailto:noam.bul...@teoco.com] Sent: Wednesday, January 07, 2015 6:26 PM To: user@phoenix.apache.org Subject: high CPU when using bulk loading Hi, We are tuning our system for bulk loading. We managed to

high CPU when using bulk loading

2015-01-07 Thread Bulvik, Noam
Hi, We are tuning our system for bulk loading. We managed to load ~250M records per hour (~96 GB of raw input CSV data) on a cluster with 8 nodes. We use the MR bulk loading tool with a pre-split table and a salted key. What we currently see is that while the mappers are working we have 100% CPU usage ac