The patch we use in production is for 1.5. We're porting the patch to master
(and downstream to 2.0, which is presently very similar) with the intention of
submitting a PR "soon". We'll push it here when it's ready:
https://github.com/VideoAmp/spark-public.
Regarding benchmarking, we have a
2016-06-29 23:22 GMT+02:00 Michael Allman :
> I'm sorry I don't have any concrete advice for you, but I hope this helps
> shed some light on the current support in Spark for projection pushdown.
>
> Michael
Michael,
Thanks for the answer. This resolves one of my questions.
Hi Maciej,
In Spark, projection pushdown is currently limited to top-level columns
(StructFields). VideoAmp has very large Parquet-based tables (many billions of
records accumulated per day) with deeply nested schemas (four or five levels),
and we've spent a considerable amount of time
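To make the top-level limitation concrete, here is a small simulation (not Spark's actual reader; the function and data are invented for illustration) of pruning that only operates on top-level columns. Requesting one nested subfield still forces the whole enclosing struct to be materialized:

```python
# Hypothetical sketch (names are illustrative, not Spark APIs): a reader
# that prunes only whole top-level columns must still materialize an
# entire struct to serve a single one of its subfields.

def read_with_top_level_pruning(rows, needed_paths):
    """Keep a top-level column iff any requested path starts with it."""
    top_level = {path.split(".")[0] for path in needed_paths}
    return [{col: val for col, val in row.items() if col in top_level}
            for row in rows]

rows = [
    {"id": 1, "event": {"ts": 100, "payload": {"url": "/a", "bytes": 512}}},
    {"id": 2, "event": {"ts": 200, "payload": {"url": "/b", "bytes": 1024}}},
]

# We only need event.ts, but top-level pruning keeps the whole `event`
# struct, payload and all -- extra I/O on deeply nested tables.
pruned = read_with_top_level_pruning(rows, ["event.ts"])
print(pruned[0])
```

With four or five levels of nesting, the unused payload carried along this way can dominate the scan cost, which is what makes nested-field pushdown worth patching in.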
Hi,
Has anyone measured the performance of Spark 2.0 vs Spark 1.6?
I ran some tests on a Parquet file with many nested columns (about 30 GB in
400 partitions), and Spark 2.0 is sometimes 2x slower.
I tested the following queries:
1) select count(*) where id > some_id
In this query we have PPD and performance
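The pruning that query 1 relies on can be sketched as follows. This is an illustrative model, not Spark's implementation: the partition layout, statistics, and function name are invented, but the idea is the same one predicate pushdown (PPD) exploits with Parquet row-group min/max stats:

```python
# Illustrative sketch of how PPD plus per-partition min/max statistics can
# answer `select count(*) where id > some_id` while skipping partitions
# whose stats rule them out entirely.

partitions = [
    {"min_id": 0,   "max_id": 99,  "ids": list(range(0, 100))},
    {"min_id": 100, "max_id": 199, "ids": list(range(100, 200))},
    {"min_id": 200, "max_id": 299, "ids": list(range(200, 300))},
]

def count_ids_greater_than(some_id):
    total, scanned = 0, 0
    for part in partitions:
        if part["max_id"] <= some_id:
            continue                      # stats prove no match: skip I/O
        scanned += 1
        if part["min_id"] > some_id:
            total += len(part["ids"])     # stats prove all match: no scan
        else:
            total += sum(1 for i in part["ids"] if i > some_id)
    return total, scanned

count, scanned = count_ids_greater_than(150)
print(count, scanned)   # 149 matches, only 2 of 3 partitions touched
```

A 2x regression on such a query would suggest the slowdown is in scan or decode cost rather than in pruning itself, since the skipped partitions never reach the executor.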
Is it traditional bitmap indexing? I would not recommend it for big data.
You could use bloom filters and min/max indexes in memory, which look to be
more appropriate. If you want to use bitmap indexes, then you would have to do
it as you say; however, bitmap indexes may consume a
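The two in-memory structures suggested above can be sketched in a few lines. This is a toy implementation (bit width, hash count, and data are arbitrary): the Bloom filter answers equality probes with no false negatives, while the min/max bounds prune range predicates:

```python
# Toy sketch: a Bloom filter for equality lookups plus min/max bounds for
# range pruning. Parameters are arbitrary, for illustration only.
import hashlib

class BloomFilter:
    def __init__(self, nbits=1024, nhashes=3):
        self.nbits, self.nhashes, self.bits = nbits, nhashes, 0

    def _positions(self, item):
        for seed in range(self.nhashes):
            h = hashlib.sha256(f"{seed}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.nbits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):   # False means definitely absent
        return all(self.bits >> pos & 1 for pos in self._positions(item))

values = [3, 17, 42, 99]
bf = BloomFilter()
for v in values:
    bf.add(v)
lo, hi = min(values), max(values)

print(bf.might_contain(42))   # True: present values are never missed
print(lo <= 1000 <= hi)       # False: range pruning rules 1000 out
```

Both structures are a few bytes to kilobytes per segment regardless of row count, which is why they tend to scale better than per-value bitmaps.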
Hi All,
I am a CSE undergraduate, and for our final-year project we are planning to
construct a cluster-based, bit-oriented analytic platform (storage engine)
that provides fast query performance for OLAP, using novel bitmap indexing
techniques when and where appropriate.
For
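For reference, the core bitmap-indexing idea behind such a platform can be sketched very compactly. This is a minimal illustration (names and data are invented): one bit per row for each distinct column value, so a multi-dimensional OLAP filter becomes a bitwise AND:

```python
# Minimal bitmap-index sketch: one bitmap (an int used as a bitset) per
# distinct value, so conjunctive filters reduce to bitwise ANDs.

def build_bitmap_index(rows, column):
    index = {}
    for row_id, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << row_id)
    return index

rows = [
    {"region": "us", "device": "mobile"},
    {"region": "eu", "device": "mobile"},
    {"region": "us", "device": "desktop"},
    {"region": "us", "device": "mobile"},
]

by_region = build_bitmap_index(rows, "region")
by_device = build_bitmap_index(rows, "device")

# "region = us AND device = mobile" is a single AND of two bitmaps.
matches = by_region["us"] & by_device["mobile"]
row_ids = [i for i in range(len(rows)) if matches >> i & 1]
print(row_ids)   # rows 0 and 3
```

The space concern raised above is visible even here: the index holds one bitmap per distinct value per column, so high-cardinality columns blow up unless the bitmaps are compressed (e.g. run-length encoded).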