Which version of Phoenix are you using? There were several bugs related to local indexes and CSV bulk load in 4.7 and 4.8, I believe. Another problem I remember is the RAM size for the reducers. It may sound ridiculous, but using less may help.
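For example, you could try passing the standard MapReduce memory properties to the tool on the command line, something along these lines (the values are only illustrative, and I am assuming your YARN setup honors the per-job mapreduce.reduce.* properties):

# same bulk load command, but with explicit, smaller reducer containers (example values)
HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/usr/lib/hbase/conf \
hadoop jar /usr/lib/phoenix/phoenix-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  -Dfs.permissions.umask-mode=000 \
  -Dmapreduce.reduce.memory.mb=4096 \
  -Dmapreduce.reduce.java.opts=-Xmx3276m \
  --table TABLE_SNAPSHOT \
  --input /user/table/*.csv

Smaller reducer containers let YARN schedule more reducers in parallel, which can also help when the job sits mostly idle near the end of the reduce phase.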
Thanks,
Sergey

On Fri, Jun 2, 2017 at 11:13 AM, cmbendre <chaitanya.ben...@zeotap.com> wrote:
> Hi,
>
> I need some help in understanding how CsvBulkLoadTool works. I am trying to
> load ~200 GB of data (100 files of 2 GB each) from HDFS into Phoenix with
> 1 master and 4 region servers. The region servers have 32 GB RAM and 16
> cores each. Total HDFS disk space is 4 TB.
>
> The table is salted with 16 buckets, so there are 4 regions per region
> server. There are 400 columns and more than 30 local indexes.
>
> Here is the command I am using:
>
> HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/usr/lib/hbase/conf \
> hadoop jar /usr/lib/phoenix/phoenix-client.jar \
>   org.apache.phoenix.mapreduce.CsvBulkLoadTool \
>   -Dfs.permissions.umask-mode=000 \
>   --table TABLE_SNAPSHOT --input /user/table/*.csv
>
> The job proceeds normally but gets stuck in the reduce phase at around 90%.
> I also observed that it initially used the full resources of the cluster,
> but near completion it uses far fewer resources (about 10 percent of the
> RAM and cores).
>
> What exactly is happening behind the scenes? How can I tune it to work
> faster? I am using HBase + HDFS deployed on YARN on AWS.
>
> Any help is appreciated.
>
> Thanks
> Chaitanya
>
>
> --
> View this message in context: http://apache-phoenix-user-list.1124778.n5.nabble.com/Large-CSV-bulk-load-stuck-tp3622.html
> Sent from the Apache Phoenix User List mailing list archive at Nabble.com.
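P.S. If you are not sure which Phoenix version you are on, the sqlline banner prints it when you connect. Assuming the usual layout on your nodes (the path and quorum may differ on your install):

/usr/lib/phoenix/bin/sqlline.py <zookeeper-quorum>
# the connect banner prints something like: Connected to: Phoenix (version 4.x.y)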