Hi Noam,
It doesn't sound all that surprising that you're CPU-bound on a batch
import job like this if you consider everything that is going on
within the mappers.
Let's say you're importing data for a table with 20 columns. For each
line of input data, the following is then occurring within the
To: user@phoenix.apache.org
Subject: RE: high CPU when using bulk loading
Only when doing bulk loading and only during mapping phase
-Original Message-
From: Puneet Kumar Ojha [puneet.ku...@pubmatic.com]
Received: Wednesday, 07 Jan 2015, 15:03
To: user@phoeni
What is the cluster size, and how many salted buckets are you using?
Are you using compression? SNAPPY is recommended.
From: Bulvik, Noam [mailto:noam.bul...@teoco.com]
Sent: Wednesday, January 07, 2015 7:00 PM
To: user@phoenix.apache.org
Subject: RE: high CPU when using bulk loading
Only when doing bulk loading and only during mapping phase
-Original Message-
From: Puneet Kumar Ojha [puneet.ku...@pubmatic.com]
Received: Wednesday, 07 Jan 2015, 15:03
To: user@phoenix.apache.org [user@phoenix.apache.org]
Subject: RE: high CPU when using bulk loading
Is the CPU usage at 100% all the time, or only while doing bulk loading?
From: Bulvik, Noam [mailto:noam.bul...@teoco.com]
Sent: Wednesday, January 07, 2015 6:26 PM
To: user@phoenix.apache.org
Subject: high CPU when using bulk loading
Hi,
We are tuning our system for bulk loading. We managed to load ~250M records
per hour (~96 GB of raw input CSV data) on a cluster with 8 nodes. We use the
MR bulk loading tool with a pre-split table and salted keys.
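
For a rough sense of scale, the figures above can be turned into per-second and per-node rates. This is only a back-of-envelope sketch: it assumes "96G" means 96 * 10**9 bytes and that the load is spread evenly across the 8 nodes, neither of which is stated in the message. The function and variable names are illustrative.

```python
# Back-of-envelope throughput from the figures quoted in the message.
# Assumptions (not stated in the thread): "96G" means 96 * 10**9 bytes,
# and the work is spread evenly across the 8 nodes.
RECORDS_PER_HOUR = 250_000_000
RAW_BYTES_PER_HOUR = 96 * 10**9
NODES = 8
SECONDS_PER_HOUR = 3600


def throughput(records_per_hour, raw_bytes_per_hour, nodes):
    """Convert hourly totals into per-second and per-node rates."""
    records_per_sec = records_per_hour / SECONDS_PER_HOUR
    mb_per_sec = raw_bytes_per_hour / SECONDS_PER_HOUR / 1e6
    return {
        "records_per_sec": records_per_sec,
        "records_per_sec_per_node": records_per_sec / nodes,
        "mb_per_sec": mb_per_sec,
        "mb_per_sec_per_node": mb_per_sec / nodes,
        "bytes_per_record": raw_bytes_per_hour / records_per_hour,
    }


stats = throughput(RECORDS_PER_HOUR, RAW_BYTES_PER_HOUR, NODES)
for name, value in stats.items():
    print(f"{name}: {value:,.1f}")
```

Under those assumptions this comes to roughly 69,000 records/s overall, or about 8,700 records/s and 3.3 MB/s of raw input per node, at around 384 bytes per record.
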
What we currently see is that while Mappers are working we have 100% CPU usage
ac