Re: Which Hadoop File Format Should I Use?

2018-02-07 Thread Aman Sinha
The multi-level indexing feature in Carbondata seems very interesting. It will allow persisting OLAP cubes with efficient access, providing virtually the same capability that specialized OLAP engines offer. The ORC format also provides indexing, but apparently not multi-level indexing.

Re: Which Hadoop File Format Should I Use?

2018-02-07 Thread Ted Dunning
Carbondata does look very cool, but I haven't seen any significant user adoption, which means that I haven't heard very many war stories.
On Wed, Feb 7, 2018 at 11:58 AM, Saurabh Mahapatra <saurabhmahapatr...@gmail.com> wrote:
> ...
> The Carbondata project looks quite promising.
>
> Any

Which Hadoop File Format Should I Use?

2018-02-07 Thread Saurabh Mahapatra
This was originally shared with me by Kuna Khatua, and it is a good read:
https://www.jowanza.com/blog/which-hadoop-file-format-should-i-use
The Carbondata project looks quite promising. Any thoughts on which file format you prefer?
Thanks,
Saurabh

Re: PCAP files with Apache Drill and Sergeant R

2018-02-07 Thread Ted Dunning
On Wed, Feb 7, 2018 at 10:18 AM, Bob Rudis wrote:
> ...
> I just wish I had time to PR into the project to have it not totally bork
> on imperfect packets, support more PCAP formats and add in/port some helper
> UDF decoders.
>
That is super frustrating. I just helped John Omernik

Re: PCAP files with Apache Drill and Sergeant R

2018-02-07 Thread Bob Rudis
Thank you :-) And, I've poked at PCAPs with Drill & sergeant to great effect (not on S3, but that — as you said — should work fine, too). I just wish I had time to PR into the project to have it not totally bork on imperfect packets, support more PCAP formats and add in/port some helper UDF

Re: PCAP files with Apache Drill and Sergeant R

2018-02-07 Thread Ted Dunning
On Tue, Feb 6, 2018 at 1:08 AM, Arjun kr wrote:
> ...
> I don't have any clue about using Drill with 'R Sergeant library' library.
> Hopefully, others can throw any lights on this question.
>
I just looked this up and in their own words: Jul 17, 2017 - *sergeant*: Tools

Re: Drill Questions

2018-02-07 Thread Joel Pfaff
Hello,
I think there is some confusion around the term "repartitioning", since it can be understood in two different ways:
* changing the number of partitions independently of the content
* regrouping data with the same value in a given column in the same partitions (potentially changing the number