Circling back on this. Did you get a chance to take another look?
Thanks,
Aniket
On Sun, Feb 8, 2015, 2:53 AM Aniket Bhatnagar aniket.bhatna...@gmail.com
wrote:
Unfortunately this is not going to happen for 1.3 (a snapshot release has
already been cut). We need to figure out how we are going to do cardinality
estimation before implementing this. If we need to do this in the future, I
think we can do it in a way that doesn't break existing APIs. Given I think
this
Thanks for looking into this. If this is true, isn't this an issue today? The
default implementation of sizeInBytes is 1 + broadcast threshold. So, if
catalyst's cardinality estimation estimates even a small filter
selectivity, it will result in broadcasting the relation. Therefore,
shouldn't the
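To make the concern above concrete, here is an illustrative sketch in plain Python (not Spark's actual planner code; all names and numbers are hypothetical) of how a default size estimate of "broadcast threshold + 1" interacts with a broadcast decision once a filter-selectivity factor is applied:

```python
# Hypothetical model of the broadcast decision discussed above.
# The constant mirrors the idea of spark.sql.autoBroadcastJoinThreshold,
# but the value and function names here are made up for illustration.

BROADCAST_THRESHOLD = 10 * 1024 * 1024  # 10 MB, assumed threshold


def default_size_in_bytes() -> int:
    # The default is deliberately just over the threshold, so a
    # relation of unknown size is never chosen for broadcast.
    return BROADCAST_THRESHOLD + 1


def should_broadcast(size_estimate: int) -> bool:
    return size_estimate <= BROADCAST_THRESHOLD


# Without selectivity applied, the default keeps the relation out of
# broadcast joins.
assert not should_broadcast(default_size_in_bytes())

# But if a cardinality estimator scales that default by even a modest
# filter selectivity, the estimate drops below the threshold and the
# relation would be broadcast -- the issue raised in the email above.
selectivity = 0.9
filtered_estimate = int(default_size_in_bytes() * selectivity)
assert should_broadcast(filtered_estimate)
```

The sketch only shows why multiplying a sentinel value ("threshold + 1") by a selectivity estimate defeats its purpose; it does not claim this is how Spark's planner is actually structured.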
We thought about this today after seeing this email. I actually built a
patch for this (adding filter/column information to data source stats
estimation), but ultimately dropped it due to the potential problems the
change could cause. The main problem I see is that column
pruning/predicate pushdowns are
Hi Spark SQL committers

I have started experimenting with the data sources API and I was wondering
if it makes sense to move the method sizeInBytes from BaseRelation to the
Scan interfaces. This is because a relation may be able to leverage filter
push down to estimate size, potentially making a very
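The proposal above can be sketched as follows. This is a simplified model in plain Python (the real Spark data sources API is in Scala, and the class names below only echo its identifiers); the point is that a scan-level object knows the pushed-down filters and so can report a tighter size estimate than the bare relation:

```python
# Hypothetical sketch of moving the size estimate from the relation to
# the scan. The selectivity value is a stand-in for whatever statistics
# the data source could derive from its pushed-down filters.

class BaseRelation:
    """Relation-level estimate: cannot see pushed filters."""

    def __init__(self, total_bytes: int):
        self.total_bytes = total_bytes

    def size_in_bytes(self) -> int:
        return self.total_bytes


class PrunedFilteredScan:
    """Scan-level estimate: can use pushed filters to tighten it."""

    def __init__(self, relation: BaseRelation, selectivity: float):
        # In a real source, selectivity would come from source-side
        # statistics; here it is just a constructor argument.
        self.relation = relation
        self.selectivity = selectivity

    def size_in_bytes(self) -> int:
        return int(self.relation.size_in_bytes() * self.selectivity)


rel = BaseRelation(total_bytes=1_000_000)
scan = PrunedFilteredScan(rel, selectivity=0.01)

# The scan-level estimate is far tighter than the relation-level one.
assert rel.size_in_bytes() == 1_000_000
assert scan.size_in_bytes() == 10_000
```

Again, this is only a sketch of the idea in the proposal, not a claim about how Spark's actual interfaces are or should be shaped.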