Hi Navin,
Thank you for the detailed information. Very helpful.
I may be confused about what "ECS" stands for in your case. I had assumed it is
the Amazon Elastic Container Service. However, I'm struggling to understand how
that ECS provides an S3 interface. Is it, instead, the Dell EMC Elastic Cloud
Storage layer from Isilon? [1]
The stack trace shows that the delay/problem occurs when communicating with the
S3 endpoint. Assuming my sources match the version you are using, the problem
occurs when Drill tries to open the Parquet file footer:
    private ParquetMetadata readFooter(Configuration conf, Path path,
        ParquetReaderConfig readerConfig) throws IOException {
      // Error is in the following line: HadoopInputFile.fromPath() calls
      // FileSystem.getFileStatus(), which is where the S3A client times out
      try (ParquetFileReader reader =
          ParquetFileReader.open(HadoopInputFile.fromPath(path,
              readerConfig.addCountersToConf(conf)), readerConfig.toReadOptions())) {
        return reader.getFooter();
      }
    }
We can see from the code above, and from the stack trace, that Drill is
blissfully ignorant of the fact that the S3 API is connecting to ECS. That is,
Drill does nothing differently for the ECS S3 case than it does for the Amazon
S3 case or the HDFS case. In all cases, it calls Parquet's
HadoopInputFile.fromPath(), which goes through the Hadoop file system client
(here, S3AFileSystem.getFileStatus()).
Given this, my suspicion is that there is a problem with the Dell ECS
implementation of the S3 API. A previous note suggested that you check this
outside of Drill.
1. Use the HDFS client to download a Parquet (or any) file from ECS.
2. Use an S3 client to download the same file from ECS.
Do the above repeatedly in a loop to determine if the operations are stable
under load.
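A minimal sketch of such a loop, in Java since that is what Drill and the HDFS
client use. LoadTest and ReadOp are hypothetical names, not part of Drill or
Hadoop. The stub in main() only sleeps; for a real test you would replace it
with a read of one ECS file. Running several threads at once matters, since the
"Timeout waiting for connection from pool" error in your stack trace suggests
trouble appears when many requests are in flight together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class LoadTest {

    /** One read attempt, e.g. downloading one file from ECS (hypothetical interface). */
    @FunctionalInterface
    public interface ReadOp {
        void read() throws Exception;
    }

    /** Summary of a run: reads attempted, reads that threw, slowest successful read. */
    public static final class Result {
        public final int attempts;
        public final int failures;
        public final long maxMillis;

        Result(int attempts, int failures, long maxMillis) {
            this.attempts = attempts;
            this.failures = failures;
            this.maxMillis = maxMillis;
        }

        @Override
        public String toString() {
            return attempts + " attempts, " + failures + " failures, slowest success "
                + maxMillis + " ms";
        }
    }

    /** Runs op `iterations` times, spread over `threads` concurrent workers. */
    public static Result run(ReadOp op, int threads, int iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Long>> results = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            // Each task times one read and returns its duration in milliseconds
            results.add(pool.submit(() -> {
                long start = System.nanoTime();
                op.read();
                return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            }));
        }
        pool.shutdown();

        int failures = 0;
        long maxMillis = 0;
        for (Future<Long> f : results) {
            try {
                maxMillis = Math.max(maxMillis, f.get());
            } catch (ExecutionException e) {
                // The read threw; report the underlying cause (e.g. a pool timeout)
                failures++;
                System.err.println("Read failed: " + e.getCause());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and count the read as not completed
                Thread.currentThread().interrupt();
                failures++;
            }
        }
        return new Result(iterations, failures, maxMillis);
    }

    public static void main(String[] args) {
        // Stub standing in for a real ECS download: just sleeps for 5 ms
        System.out.println(run(() -> Thread.sleep(5), 4, 20));
    }
}
```

For step 1, the body of read() might open the file with Hadoop's
FileSystem.open() and read the stream to the end; for step 2, it might call the
AWS SDK's getObject() on the same bucket and key. Raising the thread count brings
the test closer to the concurrent load Drill's scans put on the client.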
There is also a Parquet client tool that lets you inspect Parquet files. [2] I
think (but am not certain) that it uses the HDFS client API as well. Try using
that client to inspect your Parquet files. Again, run the operations in a loop
to test load. Does that tool hit the same issues?
If the problem is somehow related to Dell's implementation of the S3 API, then
there is little Drill can do to fix it. On the other hand, if the Dell
implementation requires certain properties or settings to work well, then we can
figure out how to configure that in HDFS so that Drill can pick up those
settings. Information about Dell's S3 implementation is at [3].
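As an illustration of where such settings would go (a sketch, not a tested
recommendation): Drill's file storage plugin has a "config" map whose entries are
passed to the Hadoop configuration, so S3A properties can be set in the S3 plugin
definition. The endpoint and the values below are hypothetical.
fs.s3a.connection.maximum sets the size of the S3A HTTP connection pool, the pool
named in the "Timeout waiting for connection from pool" error, and
fs.s3a.path.style.access may be needed if ECS does not support virtual-host-style
bucket addressing. The existing "workspaces" and "formats" sections stay as they
are and are omitted here.

```json
{
  "type": "file",
  "connection": "s3a://test-bucket",
  "config": {
    "fs.s3a.endpoint": "https://ecs.example.com",
    "fs.s3a.path.style.access": "true",
    "fs.s3a.connection.maximum": "100"
  },
  "enabled": true
}
```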
Please let us know if the above suggestions are off the mark; all we have to go
on is the information that you've kindly shared. Perhaps there are other key
facts we do not yet know.
Thanks,
- Paul
[1] http://doc.isilon.com/ECS/3.1/DataAccessGuide/index.html#ecs_c_docs_landing_page_content.html
[2] https://github.com/apache/parquet-mr/tree/master/parquet-cli
[3] https://www.emc.com/techpubs/api/ecs/v2-2-0-0/S3ObjectOperations_ba672412ac371bb6cf4e69291344510e_overview.htm
On Saturday, March 28, 2020, 1:39:00 AM PDT, Navin Bhawsar wrote:
Thanks Paul.
To add more details, we are comparing Drill performance using the two storage
options below:
1. dfs plugin pointing to a single-node HDFS cluster
2. S3 plugin pointing to an ECS bucket, no HDFS
In both storage options we have data stored in Parquet files. For example, in
this query we are querying a directory with 19 Parquet files, close to 2 GB in
total; the same set of files is on S3 and HDFS.
Drillbits are running on 2 Unix machines (6 cores, 32 GB each). One machine
runs the single-node HDFS cluster, ZooKeeper, and a Drillbit; the other machine
runs a Drillbit.
On both HDFS and S3 storage we have created the Parquet metadata file;
additionally, we have created statistics for dfs. Based on our analysis so far,
dfs performs better than S3. The same query that completes in 2.121 s on dfs
times out on S3.
Looking at the plan, the "Parquet row group scan" takes most of the time (99%).
The stack trace shows the error "Unable to execute HTTP request: Timeout waiting
for connection from pool":
(org.apache.drill.common.exceptions.ExecutionSetupException)
java.io.InterruptedIOException: getFileStatus on
s3a://test-bucket/TestDir/Test_1.parquet: com.amazonaws.SdkClientException:
Unable to execute HTTP request: Timeout waiting for connection from pool
org.apache.drill.exec.store.parquet.AbstractParquetScanBatchCreator.getBatch():261
org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch():42
org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch():36
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():163
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():114
org.apache.drill.exec.physical.impl.ImplCreator.getExec():90
org.apache.drill.exec.work.fragment.FragmentExecutor.run():292
org.apache.drill.common.SelfCleaningRunnable.run():38
.......():0
Caused By (java.lang.Exception) getFileStatus on
s3a://test-bucket/TestDir/Test_1.parquet: com.amazonaws.SdkClientException:
Unable to execute HTTP request: Timeout waiting for connection from pool
org.apache.hadoop.fs.s3a.S3AUtils.translateInterruptedException():352
org.apache.hadoop.fs.s3a.S3AUtils.translateException():177
org.apache.hadoop.fs.s3a.S3AUtils.translateException():151
org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus():2242
org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus():2204
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus():2143
org.apache.parquet.hadoop.util.HadoopInputFile.fromPath():39
org.apache.drill.exec.store.parquet.AbstractParquetScanBatchCreator.readFooter():353
org.apache.drill.exec.store.parquet.AbstractParquetScanBatchCreator.getBatch():149
org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch():42
org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch():36
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():163
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():114
org.apache.drill.exec.physical.impl.ImplCreator.getExec():90
org.apache.drill.exec.work.fragment.FragmentExecutor.run():292
org.apache.drill.common.SelfCleaningRunnable.run():38
.......():0
Thanks & Regards,
Navin
On Sat, 28 Mar 2020, 09:27 Paul Rogers wrote:
Hi Navin,
You had mentioned your ECS solution in an earlier note. What are you using to
access data in your container? Is your ECS container running HDFS? Or, do you
have some other API?
Do you have Drill running in a container on ECS, or is that where your data is
located? It would be helpful if you could perhaps describe your setup in a bit
more detail so we can offer suggestions about where to look for an issue.
By the way: the query profile is often a good place to start. You'll find profiles
in the Drill Web Console. Looking at each operator you can see how much memory
was used and how long things took. Specifically, look at the time taken by the
scan: is the slowness due to reading the data, or is some other part of the
query taking the time?
When you get the error, what is the stack trace? Is the error coming from some
particular HDFS client? In some particular operation?
Thanks,
- Paul
On Friday, March 27, 2020, 6:59:42 AM PDT, Navin Bhawsar wrote:
Hi,
We are facing a performance issue where an Apache Drill query on ECS times out
with the error "ConnectionPoolTimeoutException: Timeout waiting for
connection from pool".
However, the same query works fine on single-node HDFS, with an execution time
of 2.1 s (planning = 0.483 s).
Parquet file size <1.5 GB
Total parquet files scanned = 8( total 19 in directory)
Apache drill version 1.17
JDK 1.8.0_74
Total rows returned from query =71000
There are 2 Drillbits running in distributed mode.
13 GB default allocated per drill bit.
Any ideas why ECS performance is so bad compared with HDFS for Drill?
Please advise if Drill provides options to optimize ECS querying.
Please let me know if you need more details.
Thanks & Regards,
Navin