Which version of Phoenix are you using? There were several bugs related to
local indexes and CSV bulk load in 4.7 and 4.8, I believe. Another problem I
remember is the RAM size for the reducers. It may sound ridiculous, but
giving them less memory may help.
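As a rough sketch only: since CsvBulkLoadTool already accepts generic -D
options (as in your umask setting), the reducer memory could be capped along
these lines. The numbers below are placeholders to illustrate the idea, not
recommended values for your cluster:

# Same bulk load command, with smaller reducer containers/heap (placeholder values)
HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/usr/lib/hbase/conf \
hadoop jar /usr/lib/phoenix/phoenix-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  -Dfs.permissions.umask-mode=000 \
  -Dmapreduce.reduce.memory.mb=4096 \
  -Dmapreduce.reduce.java.opts=-Xmx3276m \
  --table TABLE_SNAPSHOT --input /user/table/*.csv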

Thanks,
Sergey

On Fri, Jun 2, 2017 at 11:13 AM, cmbendre <chaitanya.ben...@zeotap.com>
wrote:

> Hi,
>
> I need some help in understanding how CsvBulkLoadTool works. I am trying to
> load ~200 GB of data (100 files of 2 GB each) from HDFS to Phoenix with
> 1 master and 4 region servers. The region servers have 32 GB RAM and
> 16 cores each. Total HDFS disk space is 4 TB.
>
> The table is salted with 16 buckets, so there are 4 regions per region
> server. There are 400 columns and more than 30 local indexes.
>
> Here is the command I am using:
>
> HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/usr/lib/hbase/conf
> hadoop jar /usr/lib/phoenix/phoenix-client.jar
> org.apache.phoenix.mapreduce.CsvBulkLoadTool -Dfs.permissions.umask-mode=000
> --table TABLE_SNAPSHOT --input /user/table/*.csv
>
> The job proceeds normally but gets stuck in the reduce phase at around
> 90%. I also observed that it initially uses the full resources of the
> cluster, but near completion it uses much less (about 10 percent of the
> RAM and cores).
>
> What exactly is happening behind the scenes? How can I tune it to run
> faster? I am using HBase + HDFS deployed on YARN on AWS.
>
> Any help is appreciated.
>
> Thanks
> Chaitanya
>
