Re: [Drill-Questions] Speed difference between GZ and BZ2

2016-08-05 Thread Shankar Mane
Yes, I went through the benchmarks and started testing this. I have tested it using Hadoop Map-Reduce, and there BZ2 worked faster than GZ. As far as I know, GZ is non-splittable and BZ2 is splittable. Hadoop MR takes advantage of this splittable property and launches multiple mappers and
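A rough sketch of why splittability can change the Map-Reduce result. This uses a simplified model (an assumption, not Hadoop's exact logic): a splittable file gets about one input split, and so one mapper, per split-size chunk, while a non-splittable gzip stream gets exactly one. Real Hadoop also applies a small slop factor and min/max split settings, which are ignored here. The 2 GiB file size is hypothetical, and both codecs are given the same compressed size for simplicity.

```python
import math


def estimate_map_tasks(file_size_bytes: int, split_size_bytes: int, splittable: bool) -> int:
    """Simplified estimate of how many mappers read one compressed file."""
    if file_size_bytes <= 0:
        return 0
    if not splittable:
        # gzip: a single reader must decompress the stream from the beginning.
        return 1
    # bzip2: independent compressed blocks let each split be decoded separately.
    return math.ceil(file_size_bytes / split_size_bytes)


if __name__ == "__main__":
    MiB = 1024 * 1024
    compressed = 2 * 1024 * MiB  # hypothetical 2 GiB compressed input file
    split = 128 * MiB            # Hadoop 2.x default HDFS block size
    print("gzip mappers :", estimate_map_tasks(compressed, split, splittable=False))
    print("bzip2 mappers:", estimate_map_tasks(compressed, split, splittable=True))
```

Under this model the gzip file is read by 1 mapper and the bzip2 file by 16, so bzip2 can finish first overall even though each of its decompressors is slower.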

Re: [Drill-Questions] Speed difference between GZ and BZ2

2016-08-05 Thread Khurram Faraaz
Shankar, this is expected behavior: bzip2 decompression is four to twelve times slower than gzip decompression. You can find the numbers in this comparison benchmark: http://tukaani.org/lzma/benchmarks.html On Thu, Aug 4, 2016 at 5:13 PM, Shankar Mane
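A minimal local check of single-stream decompression speed, using Python's standard `gzip` and `bz2` modules. The sample data is hypothetical newline-delimited JSON (the field names `channelid` and `serverTime` are borrowed from the query later in this thread; `payload` is invented). Synthetic data compresses differently from real logs, so the ratio printed here will not necessarily match the linked benchmark.

```python
import bz2
import gzip
import json
import time


def make_sample(n_lines: int) -> bytes:
    # Hypothetical one-JSON-object-per-line data, similar in shape to the thread's input.
    rows = (
        json.dumps({"channelid": i % 50, "serverTime": 1470000000 + i, "payload": "x" * 200})
        for i in range(n_lines)
    )
    return ("\n".join(rows) + "\n").encode("utf-8")


def time_decompress(decompress, blob: bytes, repeats: int = 3) -> float:
    # Best-of-N wall-clock seconds for one full decompression of the blob.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        decompress(blob)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    raw = make_sample(50_000)
    gz_blob = gzip.compress(raw)
    bz_blob = bz2.compress(raw)
    # Sanity check: both codecs round-trip the data exactly.
    assert gzip.decompress(gz_blob) == raw and bz2.decompress(bz_blob) == raw
    t_gz = time_decompress(gzip.decompress, gz_blob)
    t_bz = time_decompress(bz2.decompress, bz_blob)
    print(f"gzip : {len(gz_blob):>9} bytes, {t_gz * 1000:.1f} ms")
    print(f"bzip2: {len(bz_blob):>9} bytes, {t_bz * 1000:.1f} ms")
    print(f"bzip2/gzip decompression time ratio: {t_bz / t_gz:.1f}x")
```

This measures one decompressor per file, which is the situation when a single reader processes each file; it says nothing about parallel reads of split bzip2 input.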

Re: [Drill-Questions] Speed difference between GZ and BZ2

2016-08-04 Thread Khurram Faraaz
OK, so query planning took less than one second for both aggregate queries. It looks like most of the time is being spent in query execution. On Thu, Aug 4, 2016 at 5:13 PM, Shankar Mane wrote:

Re: [Drill-Questions] Speed difference between GZ and BZ2

2016-08-04 Thread Shankar Mane
Please find the query plans for both queries. FYI: I don't see any planning difference between these two queries except the cost.

Query on GZ:
0: jdbc:drill:> explain plan for select channelid, count(serverTime) from

Re: [Drill-Questions] Speed difference between GZ and BZ2

2016-08-04 Thread Khurram Faraaz
Can you please run EXPLAIN PLAN on the two aggregate queries? That way we can see where most of the time is being spent: in the query planning phase or in query execution. Please share the query plans and the time taken by those EXPLAIN PLAN statements.

Re: [Drill-Questions] Speed difference between GZ and BZ2

2016-08-01 Thread Shankar Mane
It is plain JSON (one JSON object per line). Each JSON message is ~4 KB, and there are ~5 million messages. store.parquet.compression = snappy (I don't think this parameter gets used, as I am only running SELECT queries.) On Mon, Aug 1, 2016 at 3:27 PM, Khurram Faraaz wrote: > What
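A quick worked estimate of the uncompressed input size implied by the numbers above. Assumptions: "~4kb" is read as 4 KiB, "~5 Millions" as exactly 5,000,000, and newline bytes are ignored.

```python
KIB = 1024
GIB = 1024 ** 3


def estimated_raw_bytes(message_bytes: int, message_count: int) -> int:
    """Uncompressed size of the JSON input, ignoring the newline separators."""
    return message_bytes * message_count


if __name__ == "__main__":
    # Assumption: 4 KiB per message, 5,000,000 messages.
    total = estimated_raw_bytes(4 * KIB, 5_000_000)
    print(f"{total:,} bytes = {total / GIB:.1f} GiB")  # 20,480,000,000 bytes = 19.1 GiB
```

So roughly 19 GiB of JSON has to be decompressed per query. If the input is a single gzip file, all of it goes through one decompressor, which is where the per-codec decompression speed matters most.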