Error reading Parquet file from Azure Blob Storage using Apache Drill

Nathan Thu, 07 May 2020 05:34:00 -0700

Hey there,

I trust you are well.  I’m currently working on a POC to connect our end-user 
application to Azure Blob Storage.  I’ve been experimenting with using Apache 
Drill to connect to Blob storage and read a Parquet file.  I've added 
azure-storage-8.6.3.jar and hadoop-azure-3.2.1.jar to my installation 
(I’ve also tried the combinations of jar files suggested here: 
https://drill.apache.org/docs/azure-blob-storage-plugin/).


I'm able to read a JSON file stored in Blob storage (see first screenshot 
below). However, when I try to read the Parquet file I get the following error:

ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query: SELECT * 
FROM az.default../CLTYP/CLTYP_2020_04_29_09_57.parquet LIMIT 100 [30038]Query 
execution error. Details:[ SYSTEM ERROR: StorageException: The requested 
operation is not allowed in the current state of the entity.
Please, refer to logs for more information.

I then downloaded the Parquet file to my laptop and was able to explore it 
without any issues (see second screenshot below).

I'm new to Drill and not sure how to proceed, or why the JSON read works 
while the Parquet read doesn't. I spent some time searching for the specific 
error I'm seeing, but without any luck. Any assistance on this would be greatly 
appreciated.

I'm running: Apache Drill 1.17.0 on Windows 10 with MapR Drill ODBC Driver 
version: 1.3.22.1055

JSON File read from BLOB storage – No Error



Parquet file read after first being saved to disk rather than read directly 
from storage – No error
