Re: [Drill-Questions] Speed difference between GZ and BZ2

2016-08-05 Thread Shankar Mane
Yes, I went through the benchmarks and started testing this. I tested it using Hadoop MapReduce, and BZ2 appeared to work faster than GZ. As far as I know, GZ is non-splittable and BZ2 is splittable. Hadoop MR takes advantage of this splittable property and launches multiple mappers and

Re: [Drill-Questions] Speed difference between GZ and BZ2

2016-08-05 Thread Khurram Faraaz
Shankar, this is expected behavior: bzip2 decompression is four to twelve times slower than gzip decompression. You can look at the comparison benchmark here for numbers: http://tukaani.org/lzma/benchmarks.html On Thu, Aug 4, 2016 at 5:13 PM, Shankar Mane wrote: > Please find
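
A rough way to check the decompression gap locally is a minimal sketch like the one below. It uses only Python's standard `gzip` and `bz2` modules on synthetic newline-delimited JSON; the record layout, the record count, and the helper names `make_payload` and `time_decompress` are assumptions for illustration, not the data or tooling from this thread. It measures single-threaded decompression only, so it says nothing about Drill's execution or about splittable reads in Hadoop.

```python
import bz2
import gzip
import json
import time


def make_payload(n_records=2000):
    """Build synthetic newline-delimited JSON (hypothetical stand-in data)."""
    lines = [
        json.dumps({
            "channelid": i % 50,
            "serverTime": 1470000000 + i,
            "message": "event-%d " % i * 20,
        })
        for i in range(n_records)
    ]
    return ("\n".join(lines) + "\n").encode("utf-8")


def time_decompress(decompress, blob, repeats=5):
    """Return the best wall-clock time (seconds) over several decompress runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        decompress(blob)
        best = min(best, time.perf_counter() - start)
    return best


raw = make_payload()
gz_blob = gzip.compress(raw)
bz_blob = bz2.compress(raw)

gz_time = time_decompress(gzip.decompress, gz_blob)
bz_time = time_decompress(bz2.decompress, bz_blob)

print("raw bytes:   %d" % len(raw))
print("gzip bytes:  %d, decompress: %.6f s" % (len(gz_blob), gz_time))
print("bzip2 bytes: %d, decompress: %.6f s" % (len(bz_blob), bz_time))
```

The exact ratio depends on the data, the compression level, and the machine, so the printed timings should be read as indicative only.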

Re: [Drill-Questions] Speed difference between GZ and BZ2

2016-08-04 Thread Khurram Faraaz
OK, so query planning took less than one second for both aggregate queries. It looks like most of the time is being spent in query execution. On Thu, Aug 4, 2016 at 5:13 PM, Shankar Mane wrote: > Please find the query plan for both queries. FYI: I am not seeing > any planning difference between

Re: [Drill-Questions] Speed difference between GZ and BZ2

2016-08-04 Thread Shankar Mane
Please find the query plan for both queries. FYI: I am not seeing any planning difference between these 2 queries except Cost. / Query on GZ / 0: jdbc:drill:> explain plan for select channelid, count(serverTime) from dfs.`/t

Re: [Drill-Questions] Speed difference between GZ and BZ2

2016-08-04 Thread Khurram Faraaz
Can you please run an explain plan on the two aggregate queries? That way we can see where most of the time is being spent: in the query planning phase or in query execution. Please share the query plans and the time taken for those explain plan statements. On Mon,

Re: [Drill-Questions] Speed difference between GZ and BZ2

2016-08-01 Thread Shankar Mane
It is plain JSON (one JSON message per line). Each JSON message is ~4 KB; number of JSON messages = ~5 million. store.parquet.compression = snappy (I don't think this parameter gets used, as I am only running select queries.) On Mon, Aug 1, 2016 at 3:27 PM, Khurram Faraaz wrote: > What is the data format with

Re: [Drill-Questions] Speed difference between GZ and BZ2

2016-08-01 Thread Khurram Faraaz
What is the data format within those .gz and .bz2 files? Is it Parquet, JSON, or plain text (CSV)? Also, what was the config parameter `store.parquet.compression` set to when you ran your test? - Khurram On Sun, Jul 31, 2016 at 11:17 PM, Shankar Mane wrote: > Awaiting for response.. > > O

Re: [Drill-Questions] Speed difference between GZ and BZ2

2016-07-31 Thread Shankar Mane
Awaiting a response. On 30-Jul-2016 3:20 PM, "Shankar Mane" wrote: > > I am comparing query speed between GZ and BZ2. > > Below are the two files and their sizes (both files contain the same data): > kafka_3_25-Jul-2016-12a.json.gz = 1.8G > kafka_3_25-Jul-2016-12a.json.bz2 = 1.1G > > > > Results: