Hi, I am exploring Apache Drill to analyse data from our Snowplow pipeline, which is stored in an S3 bucket. The files are in Thrift format and LZO-compressed. Instead of taking the traditional Snowplow route of loading all this data into Redshift, I am working on an alternative approach that analyses it with Apache Drill.
I am able to connect to the S3 bucket from Drill, but I cannot figure out how to read these Thrift (LZO-compressed) files. Apache Drill's documentation says that it supports the Parquet format, which in turn supports Thrift, but at the bottom of the page it also mentions implementing custom storage readers and writers for Thrift (https://drill.apache.org/docs/parquet-format/#data-description-language-support). So, is the standard Drill distribution capable of reading Thrift files on S3?

Cheers,
Nitish