Re: ECS parquet files query timing out

Paul Rogers Sat, 28 Mar 2020 13:54:09 -0700

Hi Navin,

Thank you for the detailed information. Very helpful.


I may be confused about what "ECS" stands for in your case. I had assumed it is 
the Amazon Elastic Container Service. However, I'm struggling to understand how 
that ECS provides an S3 interface. Is it instead the Dell EMC Elastic Cloud 
Storage layer from Isilon? [1]


The stack trace shows that the delay/problem occurs when communicating with the 
S3 endpoint. Assuming my sources match the version you are using, the problem 
occurs when Drill tries to open the Parquet file footer:

  private ParquetMetadata readFooter(Configuration conf, Path path,
      ParquetReaderConfig readerConfig) throws IOException {
    // Error is in the following line
    try (ParquetFileReader reader = ParquetFileReader.open(
        HadoopInputFile.fromPath(path, readerConfig.addCountersToConf(conf)),
        readerConfig.toReadOptions())) {

We can see from the code above, and from the stack trace, that Drill is 
blissfully ignorant of the fact that the S3 API is connecting to ECS. That is, 
Drill does nothing differently for the ECS S3 case than it does for the Amazon 
S3 case or the HDFS case. In all cases, it calls Parquet's 
HadoopInputFile.fromPath() function, which goes through the HDFS client API.

Given this, my suspicion is that there is a problem with the Dell ECS 
implementation of the S3 API. A previous note suggested that you check this 
outside of Drill.

1. Use the HDFS client to download a Parquet (or any) file from ECS.

2. Use an S3 client to download the same file from ECS.


Do the above repeatedly in a loop to determine if the operations are stable 
under load.
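
A minimal sketch of such a loop, using only the JDK so it runs on its own. The 
LoadLoop class, its run() method, and the thread and call counts are 
hypothetical names chosen for illustration; they are not part of Drill or 
Hadoop. The operation under test is a placeholder. With hadoop-client on the 
classpath, it could be replaced by the same getFileStatus() call that appears 
in the stack trace, as noted in the comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper (not part of Drill or Hadoop): repeats one operation concurrently. */
final class LoadLoop {

  /** Totals across all threads, plus the slowest single call in milliseconds. */
  static final class Result {
    final long successes;
    final long failures;
    final long maxMillis;

    Result(long successes, long failures, long maxMillis) {
      this.successes = successes;
      this.failures = failures;
      this.maxMillis = maxMillis;
    }
  }

  /** Runs op callsPerThread times on each of the given number of threads. */
  static Result run(Callable<?> op, int threads, int callsPerThread) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    Callable<long[]> worker = () -> {
      long ok = 0, failed = 0, max = 0;
      for (int i = 0; i < callsPerThread; i++) {
        long start = System.nanoTime();
        try {
          op.call();
          ok++;
        } catch (Exception e) {
          failed++; // for example, a timeout raised by the storage client
        }
        max = Math.max(max, (System.nanoTime() - start) / 1_000_000);
      }
      return new long[] {ok, failed, max};
    };
    List<Future<long[]>> futures = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      futures.add(pool.submit(worker));
    }
    pool.shutdown(); // workers already submitted still run to completion
    long totalOk = 0, totalFailed = 0, slowest = 0;
    try {
      for (Future<long[]> f : futures) {
        long[] r = f.get();
        totalOk += r[0];
        totalFailed += r[1];
        slowest = Math.max(slowest, r[2]);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for workers", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    }
    return new Result(totalOk, totalFailed, slowest);
  }

  public static void main(String[] args) {
    // Placeholder so the sketch runs without extra dependencies. With
    // hadoop-client available, the operation could instead be (assumption):
    //   FileSystem fs = FileSystem.get(URI.create("s3a://test-bucket/"), conf);
    //   op = () -> fs.getFileStatus(new Path("s3a://test-bucket/TestDir/Test_1.parquet"));
    Callable<Object> op = () -> null;
    Result r = run(op, 8, 50);
    System.out.println("ok=" + r.successes + " failed=" + r.failures
        + " slowest=" + r.maxMillis + "ms");
  }
}
```

If failures only appear once the thread count goes up, that points at 
connection handling under load rather than at any single file.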

There is also a Parquet client tool that lets you inspect Parquet files. [2] I 
think (but am not certain) that it uses the HDFS client API as well. Try using 
that client to inspect your Parquet files. Again, run the operations in a loop 
to test load. Does that tool hit the same issues?

If the problem is somehow related to Dell's implementation of the S3 API, then 
there is little Drill can do to fix it. On the other hand, if the Dell 
implementation requires certain properties or settings to work well, then we can 
figure out how to configure that in HDFS so that Drill can pick up those 
settings. Information about Dell's S3 implementation is at [3].
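
As an illustration of the kind of setting meant here, the fragment below shows 
two standard Hadoop S3A properties in core-site.xml form. The endpoint URL and 
the pool size are illustrative assumptions, not values taken from this thread, 
and whether a larger connection pool helps with ECS is something to test, not 
an established fix. The error message does mention the connection pool, and 
fs.s3a.connection.maximum is the S3A property that sizes it.

```xml
<!-- core-site.xml fragment; every value here is an illustrative assumption -->
<property>
  <name>fs.s3a.endpoint</name>
  <!-- hypothetical ECS S3 endpoint; replace with the real one -->
  <value>https://ecs.example.com:9021</value>
</property>
<property>
  <name>fs.s3a.connection.maximum</name>
  <!-- size of the S3A HTTP connection pool; 100 is an arbitrary test value -->
  <value>100</value>
</property>
```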

Please let us know if the above suggestions are off the mark; all we have to go 
on is the information which you've kindly shared. Perhaps there are other key 
facts we do not yet know.


Thanks,
- Paul


[1] 
http://doc.isilon.com/ECS/3.1/DataAccessGuide/index.html#ecs_c_docs_landing_page_content.html

[2] https://github.com/apache/parquet-mr/tree/master/parquet-cli

[3] 
https://www.emc.com/techpubs/api/ecs/v2-2-0-0/S3ObjectOperations_ba672412ac371bb6cf4e69291344510e_overview.htm


 

    On Saturday, March 28, 2020, 1:39:00 AM PDT, Navin Bhawsar 
<[email protected]> wrote:  
 
Thanks Paul.

To add more details, we are comparing Drill performance using the two storage 
options below:

1. dfs plugin pointing to a single-node HDFS cluster
2. S3 plugin pointing to an ECS bucket, no HDFS

In both storage options the data is stored in Parquet files. For example, in 
this query we are querying a directory with 19 Parquet files, close to 2 GB in 
total; the same set is on S3 and HDFS.

Drillbits are running on 2 Unix machines (6 cores, 32 GB each). One of the 
machines runs the single-node HDFS cluster + ZooKeeper + a Drillbit. The other 
machine runs a Drillbit.

On both HDFS and S3 storage we have created a Parquet metadata file; 
additionally, we have statistics created for dfs. Based on analysis so far, dfs 
performs better than S3. The same query that completes in 2.121s on dfs times 
out on S3.

Looking at the plan, "parquet row group scan" takes most of the time (99%). The 
stack trace shows the error "Unable to execute HTTP request: Timeout waiting 
for connection from pool":

  (org.apache.drill.common.exceptions.ExecutionSetupException) java.io.InterruptedIOException: getFileStatus on s3a://test-bucket/TestDir/Test_1.parquet: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
    org.apache.drill.exec.store.parquet.AbstractParquetScanBatchCreator.getBatch():261
    org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch():42
    org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch():36
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():163
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():114
    org.apache.drill.exec.physical.impl.ImplCreator.getExec():90
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():292
    org.apache.drill.common.SelfCleaningRunnable.run():38
    .......():0
  Caused By (java.lang.Exception) getFileStatus on s3a://test-bucket/TestDir/Test_1.parquet: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
    org.apache.hadoop.fs.s3a.S3AUtils.translateInterruptedException():352
    org.apache.hadoop.fs.s3a.S3AUtils.translateException():177
    org.apache.hadoop.fs.s3a.S3AUtils.translateException():151
    org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus():2242
    org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus():2204
    org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus():2143
    org.apache.parquet.hadoop.util.HadoopInputFile.fromPath():39
    org.apache.drill.exec.store.parquet.AbstractParquetScanBatchCreator.readFooter():353
    org.apache.drill.exec.store.parquet.AbstractParquetScanBatchCreator.getBatch():149
    org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch():42
    org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch():36
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():163
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():114
    org.apache.drill.exec.physical.impl.ImplCreator.getExec():90
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():292
    org.apache.drill.common.SelfCleaningRunnable.run():38
    .......():0

Thanks & Regards,
Navin

On Sat, 28 Mar 2020, 09:27 Paul Rogers, <[email protected]> wrote:

Hi Navin,

You had mentioned your ECS solution in an earlier note. What are you using to 
access data in your container? Is your ECS container running HDFS? Or, do you 
have some other API?

Do you have Drill running in a container on ECS, or is that where your data is 
located? It would be helpful if you could perhaps describe your setup in a bit 
more detail so we can offer suggestions about where to look for an issue.

By the way: the query profile is often a good place to start. You'll find them 
in the Drill Web Console. Looking at each operator you can see how much memory 
was used and how long things took. Specifically, look at the time taken by the 
scan: is the slowness due to reading the data, or is some other part of the 
query taking the time?

When you get the error, what is the stack trace? Is the error coming from some 
particular HDFS client? In some particular operation?


Thanks,
- Paul

 

    On Friday, March 27, 2020, 6:59:42 AM PDT, Navin Bhawsar 
<[email protected]> wrote:  
 
 Hi,

We are facing a performance issue where an Apache Drill query on ECS times out
with the error below: "ConnectionPoolTimeoutException: Timeout waiting for
connection from pool"

However, the same query works fine on a single-node HDFS with an execution time
of 2.1 sec (planning = 0.483s).

Parquet file size <1.5 GB
Total parquet files scanned = 8( total 19 in directory)
Apache drill version 1.17
JDK 1.8.0_74
Total rows returned from query =71000

There are 2 drillbits running in distributed mode .
13 GB default allocated per drill bit.

Any ideas why ECS performance is so bad compared with HDFS for Drill?
Please advise if Drill provides options to optimize ECS querying.

Please let me know if you need more details.

Thanks & Regards,
Navin
