The multi-level indexing feature in Carbondata seems very interesting. It
will allow persisting OLAP cubes and provide efficient access, virtually
providing the capability that specialized OLAP engines provide. The ORC
format also provides indexing, but apparently not multi-level indexing.
Carbondata does look very cool, but I haven't seen any significant user
adoption, which means that I haven't heard many war stories.
On Wed, Feb 7, 2018 at 11:58 AM, Saurabh Mahapatra <
saurabhmahapatr...@gmail.com> wrote:
> ...
> The Carbondata project looks quite promising.
>
> Any
Originally shared with me by Kuna Khatua, and it is a good read:
https://www.jowanza.com/blog/which-hadoop-file-format-should-i-use
The Carbondata project looks quite promising.
Any thoughts on what file format you prefer?
Thanks,
Saurabh
On Wed, Feb 7, 2018 at 10:18 AM, Bob Rudis wrote:
> ...
> I just wish I had time to PR into the project to have it not totally bork
> on imperfect packets, support more PCAP formats and add in/port some helper
> UDF decoders.
>
That is super frustrating. I just helped John Omernik
Thank you :-)
And, I've poked at PCAPs with Drill & sergeant to great effect (not on S3, but
that — as you said — should work fine, too).
I just wish I had time to PR into the project to have it not totally bork on
imperfect packets, support more PCAP formats and add in/port some helper UDF
On Tue, Feb 6, 2018 at 1:08 AM, Arjun kr wrote:
> ...
> I don't have any clue about using Drill with the R 'sergeant' library.
> Hopefully, others can shed some light on this question.
>
I just looked this up and in their own words:
Jul 17, 2017 - *sergeant*: Tools
Hello,
I think there is some confusion around the term "repartitioning", since it
can be understood in two different ways:
* changing the number of partitions independently from the content
* regrouping data with the same value in a given column into the same
partitions (potentially changing the number