I think this is more an issue of your 78 salt buckets than of the width
of your table. Each chunk, running in parallel, is spilling its incremental
counts to disk, and each spill means another open temp file on the client.
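The spilling happens in SpoolingResultIterator: once a parallel scan's
buffered results cross an in-memory threshold, it rolls over to a temp file
(that's the DeferredFileOutputStream in your stack trace). If you want to
trade client heap for fewer open files, the relevant client-side properties
are the documented Phoenix tuning knobs below; the values shown are just the
defaults, not a recommendation:

    <!-- hbase-site.xml on the client issuing the query -->
    <property>
      <!-- Bytes of results buffered in memory per parallel scan
           before spooling to disk (default 20 MB) -->
      <name>phoenix.query.spoolThresholdBytes</name>
      <value>20971520</value>
    </property>
    <property>
      <!-- Local directory where spool files are created
           (defaults to java.io.tmpdir) -->
      <name>phoenix.spool.directory</name>
      <value>${java.io.tmpdir}</value>
    </property>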
I'd check the ulimit settings on the node from which you run this query,
and try increasing the allowed number of open files, before digging into
this one in more depth :)
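Something along these lines on the client node (a sketch assuming a Linux
host; <pid> and 65536 are placeholders, not recommendations):

    # Per-process open-file limit for the user running the Phoenix client
    ulimit -n

    # How many descriptors the client JVM currently holds open
    # (substitute its real pid)
    lsof -p <pid> | wc -l

    # To raise the limit persistently, add entries like these to
    # /etc/security/limits.conf, then start a fresh login session:
    #   myuser  soft  nofile  65536
    #   myuser  hard  nofile  65536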
On 6/16/17 2:31 PM, Michael Young wrote:
We are running a 13-node HBase cluster. One table uses 78 SALT_BUCKETS,
which seems to work reasonably well for both reads and writes. This table
has 130 columns, with a PK composed of 30 columns (a fairly wide table).
However, after adding several new tables we are seeing errors about too
many open files when running a full table scan.
Caused by: org.apache.phoenix.exception.PhoenixIOException: Too many open files
    at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:111)
    at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(SpoolingResultIterator.java:152)
    at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(SpoolingResultIterator.java:84)
    at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(SpoolingResultIterator.java:63)
    at org.apache.phoenix.iterate.SpoolingResultIterator$SpoolingResultIteratorFactory.newIterator(SpoolingResultIterator.java:79)
    at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:112)
    at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:103)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:183)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Too many open files
    at java.io.UnixFileSystem.createFileExclusively(Native Method)
    at java.io.File.createTempFile(File.java:2024)
    at org.apache.phoenix.shaded.org.apache.commons.io.output.DeferredFileOutputStream.thresholdReached(DeferredFileOutputStream.java:176)
    at org.apache.phoenix.iterate.SpoolingResultIterator$1.thresholdReached(SpoolingResultIterator.java:116)
    at org.apache.phoenix.shaded.org.apache.commons.io.output.ThresholdingOutputStream.checkThreshold(ThresholdingOutputStream.java:224)
    at org.apache.phoenix.shaded.org.apache.commons.io.output.ThresholdingOutputStream.write(ThresholdingOutputStream.java:92)
    at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
    at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:273)
    at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:253)
    at org.apache.phoenix.util.TupleUtil.write(TupleUtil.java:149)
    at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(SpoolingResultIterator.java:127)
    ... 10 more
When running an explain plan:
explain select count(1) from MYBIGTABLE

+------------------------------------------------------------------------------------------------------------------+
|                                                       PLAN                                                        |
+------------------------------------------------------------------------------------------------------------------+
| CLIENT 8728-CHUNK 674830174 ROWS 2721056772632 BYTES PARALLEL 78-WAY FULL SCAN OVER ATT.PRE_ENG_CONVERSION_OLAP   |
|     ROW TIMESTAMP FILTER [0, 9223372036854775807)                                                                 |
|     SERVER FILTER BY FIRST KEY ONLY                                                                               |
|     SERVER AGGREGATE INTO SINGLE ROW                                                                              |
+------------------------------------------------------------------------------------------------------------------+
It has a lot of chunks. Normally this query would return at least some
result after running for a few minutes. With appropriate filters in the
WHERE clause, the queries run fine.
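(For illustration only, a bounded query of roughly this shape behaves well;
the column names below are hypothetical placeholders, not our real schema:

    select count(1) from MYBIGTABLE
    where TENANT_ID = 'acme'
      and EVENT_DATE >= TO_DATE('2017-06-01');

A filter on the leading PK columns restricts each salt bucket to a small key
range instead of touching all 8728 chunks.)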
Any suggestions on how to avoid this error and get better performance
from the table scans? We realize we don't need to run full table scans
regularly; we're just trying to better understand best practices for
Phoenix on HBase.
Thank you,
Michael