I have written a threaded program to batch upsert data into Phoenix; I am
using Phoenix for its secondary index capability. I am getting extremely
slow write performance.
The EXPLAIN plan for the query looks like the following:

0: jdbc:phoenix:localhost:2181:/hbase> EXPLAIN select count(1) from VBQL_PHOENIX_TRANSCRIPT5;
+--------------------------------------------------------------------------------------------------------------+-----------------+------+
| PLAN                                                                                                         | EST_BYTES_READ  | EST_ |
+--------------------------------------------------------------------------------------------------------------+-----------------+------+
| CLIENT 100-CHUNK 6838539 ROWS 314572800 BYTES PARALLEL 100-WAY FULL SCAN OVER VBQL_PHOENIX_TRANSCRIPT_INDEX5 | 314572800       | 6838 |
| SERVER FILTER BY FIRST KEY ONLY                                                                              | 314572800       | 6838 |
| SERVER AGGREGATE INTO SINGLE ROW                                                                             | 314572800       | 6838 |
+--------------------------------------------------------------------------------------------------------------+-----------------+------+
I have a secondary index; its EXPLAIN plan looks like this:

0: jdbc:phoenix:localhost:2181:/hbase> EXPLAIN select count(1) from VBQL_PHOENIX_TRANSCRIPT_INDEX5;
+--------------------------------------------------------------------------------------------------------------+-----------------+------+
| PLAN                                                                                                         | EST_BYTES_READ  | EST_ |
+--------------------------------------------------------------------------------------------------------------+-----------------+------+
| CLIENT 100-CHUNK 6838539 ROWS 314572800 BYTES PARALLEL 100-WAY FULL SCAN OVER VBQL_PHOENIX_TRANSCRIPT_INDEX5 | 314572800       | 6838 |
| SERVER FILTER BY FIRST KEY ONLY                                                                              | 314572800       | 6838 |
| SERVER AGGREGATE INTO SINGLE ROW                                                                             | 314572800       | 6838 |
+--------------------------------------------------------------------------------------------------------------+-----------------+------+
The problem is that it is very hard to scale. I tried adding more nodes to
the HBase cluster, and I also tried adding more threads to the client
program, but it does not scale beyond ~6K rows per minute, which is VERY
slow. Any help is greatly appreciated.

Tuning settings used in HBase:
<property>
  <name>index.writer.threads.max</name>
  <value>30</value>
</property>
<property>
  <name>index.builder.threads.max</name>
  <value>30</value>
</property>
<property>
  <name>phoenix.query.threadPoolSize</name>
  <value>256</value>
</property>
<property>
  <name>index.builder.threads.keepalivetime</name>
  <value>90000</value>
</property>
<property>
  <name>phoenix.query.timeoutMs</name>
  <value>90000</value>
</property>
The rest of the settings are defaults.
Source code (simple multithreaded; each write needs to be saved
individually, so we cannot use batch writes):
public void generateAndInsertDataToHbase(int iterationCount) {
    try {
        int iterations = 0;
        Instant start = Instant.now();
        ExecutorService executorService = Executors.newCachedThreadPool();
        while (true) {
            if (iterations == iterationCount) {
                verifyRowCountAndExit(start, executorService);
            }
            Future<Void> future = executorService.submit(new Callable<Void>() {
                public Void call() throws Exception {
                    List<VbqlBean> vbqlList = VbqlUtils.generateRandomAccountVbql(accountCategoryMap);
                    if (vbqlList.size() > 0) {
                        addVbqlAccountToHbaseAsBatch(vbqlList, connStr);
                    }
                    return null;
                }
            });
            future.get();
            iterations++;
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
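One thing worth noting about the loop above: because future.get() is called immediately after each submit(), the loop blocks until every task completes before submitting the next one, so only one task is ever in flight regardless of the pool size. The following self-contained sketch (names and the 5 ms "upsert" sleep are illustrative, not from my real program) measures the observed concurrency of that submit-then-get pattern:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class SubmitGetDemo {
    // Returns the maximum number of tasks observed running at once when
    // each submit() is immediately followed by future.get(), mirroring the
    // loop above. get() blocks until the task finishes, so the pool never
    // runs more than one task at a time.
    public static int maxConcurrency(int iterations) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        try {
            for (int i = 0; i < iterations; i++) {
                Future<Void> future = pool.submit(() -> {
                    int now = inFlight.incrementAndGet();
                    maxSeen.accumulateAndGet(now, Math::max);
                    Thread.sleep(5);          // stand-in for one per-row upsert
                    inFlight.decrementAndGet();
                    return null;
                });
                future.get();                 // blocks: serializes the loop
            }
        } finally {
            pool.shutdown();
        }
        return maxSeen.get();
    }
}
```

So even with newCachedThreadPool(), the observed concurrency of this pattern is 1.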
public void addVbqlAccountToHbaseAsBatch(List<VbqlBean> vbqlList, String connStr) throws Exception {
    Connection connection = null;
    try {
        connection = DBUtils.initSingleConnection(connStr);
        // connection.setAutoCommit(false);
        if (vbqlList.size() > 0) {
            for (VbqlBean vbqlBean : vbqlList) {
                DBUtils.executeSingleVbqlQuery(connStr, vbqlBean, tableName);
            }
        }
        connection.commit();
    } finally {
        if (connection != null) try { connection.close(); } catch (SQLException ignore) {}
    }
}
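For context, each call to DBUtils.executeSingleVbqlQuery boils down to one prepared UPSERT for a single row, roughly like the sketch below. The table and column names here are hypothetical placeholders (the real schema and the DBUtils internals are not shown in this post):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;

public class UpsertSketch {
    // Builds the parameterized single-row UPSERT. Column names are
    // hypothetical placeholders, not the real VBQL schema.
    public static String buildUpsertSql(String tableName) {
        return "UPSERT INTO " + tableName + " (ID, ACCOUNT, PAYLOAD) VALUES (?, ?, ?)";
    }

    // One row per statement, matching the "writes must be saved
    // individually" constraint described above.
    public static void upsertOne(Connection conn, String tableName,
                                 String id, String account, String payload) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(buildUpsertSql(tableName))) {
            ps.setString(1, id);
            ps.setString(2, account);
            ps.setString(3, payload);
            ps.executeUpdate();   // Phoenix buffers the mutation client-side
        }
        conn.commit();            // flushed to the server on commit
    }
}
```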
--
Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/