Hi,

I am exploring Apache Drill to analyse data from our Snowplow pipeline,
which is stored in an S3 bucket. The files are in Thrift format and
LZO-compressed. Instead of taking the traditional Snowplow route of loading
all this data into Redshift, I am working on an alternative approach that
analyses it with Apache Drill.

I am able to connect to the S3 bucket from Drill, but I cannot figure out how
to read these Thrift (LZO-compressed) files. Apache Drill’s documentation
says that it supports the Parquet format, which in turn supports Thrift, but
at the bottom of the page it also mentions implementing custom storage
readers and writers for Thrift
(https://drill.apache.org/docs/parquet-format/#data-description-language-support).
So, is the standard Drill distribution capable of reading Thrift files on S3?

Cheers
Nitish
