On Thu, Jul 23, 2015 at 2:19 PM, Juergen Kneissl <her...@gmx.net> wrote:
> On 07/23/15 22:04, Jason Altekruse wrote:
> > I'm very glad to hear that it exceeded your expectations. An important
> > point I would like to add: when you unzipped the file, you likely allowed
> > Drill to read not only on both nodes, but also on multiple threads on
> > each node. When the file was compressed, only a single thread was
> > reading and processing it.
>
> Also, bzip2 does not work out of the box in Drill. Parallelization does
> not seem possible.
>
> So, when compression is needed, it seems Parquet is required, or further
> tests are needed on how to calculate a query plan for a compressed file
> (if this is even possible at all).
>
> Anyway, thanks for the help. Using uncompressed CSV did the trick for my
> first problem.

Parquet would help a bit with compression.

Another alternative is to put uncompressed CSV on a file system that does transparent compression. The MapR distribution supports that, for instance. I am sure that there are others. If you use such a file system, Drill wouldn't know that the file is anything but ordinary CSV.

With Parquet, transparent compression should have much less impact.
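
To illustrate the point about parallel reads, here is a small standalone Python sketch. It is not Drill code and says nothing about how Drill plans queries. The function names (`split_ranges`, `count_rows_in_range`, `can_start_at`) are made up for this example. It shows that plain CSV can be cut into byte ranges that independent readers process separately, while a gzip stream cannot be decoded starting from an arbitrary offset.

```python
import gzip
import zlib


def split_ranges(size, parts):
    """Cut [0, size) into at most `parts` contiguous byte ranges."""
    step = max(1, -(-size // parts))  # ceiling division, never zero
    return [(start, min(start + step, size)) for start in range(0, size, step)]


def count_rows_in_range(data, start, end):
    """Count the lines whose first byte falls inside [start, end).

    A reader that lands in the middle of a line skips ahead to the next
    newline, because that partial line belongs to the previous range.
    """
    pos = start
    if start > 0 and data[start - 1:start] != b"\n":
        nl = data.find(b"\n", start)
        if nl == -1:
            return 0
        pos = nl + 1
    count = 0
    while pos < end and pos < len(data):
        nl = data.find(b"\n", pos)
        count += 1
        if nl == -1:  # last line has no trailing newline
            break
        pos = nl + 1
    return count


def can_start_at(compressed, offset):
    """Report whether gzip decoding succeeds when started at `offset`."""
    try:
        # wbits=31 tells zlib to expect a gzip header at the start of the input.
        zlib.decompressobj(31).decompress(compressed[offset:])
        return True
    except zlib.error:
        return False


if __name__ == "__main__":
    csv_bytes = "".join(f"{i},row{i}\n" for i in range(1000)).encode()

    # Each range could be handed to a different thread or node.
    ranges = split_ranges(len(csv_bytes), 4)
    print([count_rows_in_range(csv_bytes, s, e) for s, e in ranges])

    # The same data as gzip is only readable from the beginning.
    gz = gzip.compress(csv_bytes)
    print(can_start_at(gz, 0), can_start_at(gz, len(gz) // 2))
```

The per-range row counts add up to the total row count, which is what lets several readers share one uncompressed file. The gzip check fails in the middle of the stream, so a single reader has to work through the whole file from the start.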