Can you use the approximate count distinct?
On Sun, Nov 20, 2016 at 11:51 PM, Xinyu Zhang wrote:
>
> MapWithState is also very useful.
> I want to calculate UV in real time, but "distinct count" and "multiple
> streaming aggregations" are not supported.
> Is there any method to calculate real-t
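
To illustrate the approximate distinct count suggested above, here is a
minimal plain-Python sketch of why such estimators suit streaming UV counts:
the state is small, duplicates do not change it, and per-batch states can be
merged. It uses a K-minimum-values estimator, an assumption chosen for
brevity; it is not Spark's implementation (Spark's approximate distinct count
is based on HyperLogLog++). The names KMVSketch, k, add, merge and estimate
are hypothetical.

```python
import hashlib
import heapq


class KMVSketch:
    """K-minimum-values sketch (illustrative, not Spark's algorithm).

    Keeps the k smallest normalized hash values seen so far. If the k-th
    smallest value is x, roughly (k - 1) / x distinct items were observed.
    """

    def __init__(self, k=1024):
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        self._heap = []        # negated hashes, so _heap[0] is the largest kept hash
        self._members = set()  # hashes currently kept, used to ignore duplicates

    @staticmethod
    def _hash(item):
        # Map an item to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2 ** 64

    def _add_hash(self, h):
        if h in self._members:
            return  # already counted; repeated visitors do not change the state
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest kept one: swap them.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def add(self, item):
        self._add_hash(self._hash(item))

    def merge(self, other):
        # Merging two sketches gives the sketch of the union of their inputs.
        for h in other._members:
            self._add_hash(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return len(self._heap)  # fewer than k distinct hashes: count is exact
        return int((self.k - 1) / -self._heap[0])
```

With k = 1024 the relative standard error is on the order of a few percent;
a larger k trades memory for accuracy.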
MapWithState is also very useful.
I want to calculate UV in real time, but "distinct count" and "multiple
streaming aggregations" are not supported.
Is there any method to calculate real-time UV in the current version?
At 2016-11-19 06:01:45, "Michael Armbrust" wrote:
Doing this generally
FYI, mirroring from Apache's official git repo to GitHub has been broken
since Sat Nov 19, and as a result GitHub is now stale. Merged pull requests
won't show up on GitHub until ASF infra fixes the issue.
The estimator should perform data cleaning tasks. This means some rows will
be dropped, some columns dropped, some columns added, some values replaced
in existing columns. It should also store the mean or min for some numeric
columns as a NaN replacement.
However,
override def transformSchema(sch
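
The estimator described above (learn statistics during fitting, then drop
columns and replace NaN values during transformation) can be sketched in
plain Python as follows. This is only an illustration of the fit/transform
split under simplifying assumptions: rows are dicts, only the mean is stored,
and the names MeanImputer, MeanImputerModel, numeric_cols and drop_cols are
hypothetical. It does not use or reproduce the Spark ML API.

```python
import math


def _is_missing(value):
    # Treat None and float NaN as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


class MeanImputer:
    """Unfitted estimator: fit() learns per-column means from training rows."""

    def __init__(self, numeric_cols, drop_cols=()):
        self.numeric_cols = list(numeric_cols)
        self.drop_cols = set(drop_cols)

    def fit(self, rows):
        means = {}
        for col in self.numeric_cols:
            if col in self.drop_cols:
                continue  # no need to learn a statistic for a dropped column
            values = [r[col] for r in rows if not _is_missing(r.get(col))]
            means[col] = sum(values) / len(values) if values else 0.0
        return MeanImputerModel(means, self.drop_cols)


class MeanImputerModel:
    """Fitted model: transform() applies the stored means to new rows."""

    def __init__(self, means, drop_cols):
        self.means = dict(means)
        self.drop_cols = set(drop_cols)

    def transform(self, rows):
        out = []
        for r in rows:
            # Drop unwanted columns, then fill missing numeric values.
            cleaned = {k: v for k, v in r.items() if k not in self.drop_cols}
            for col, mean in self.means.items():
                if _is_missing(cleaned.get(col)):
                    cleaned[col] = mean
            out.append(cleaned)
        return out
```

The point of the split is that the replacement values are computed once from
the training data and stored in the model, so new data is cleaned with the
same statistics.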
Was anyone able to find a solution or recommended conf for this? I am running
into the same "java.lang.OutOfMemoryError: Direct buffer memory" but during
snappy compression.
Thanks,
Aniket
On Tue, Sep 23, 2014 at 7:04 PM Aaron Davidson [via Apache Spark Developers
List] wrote:
> This may be relat
Hi Michael,
Thanks a lot for your prompt answer. I greatly appreciate it.
Having said that, I think we might be...cough...cough...wrong :)
I think the "issue" is in QueryPlan.sameResult [1] as its scaladoc says:
* Since its likely undecidable to generally determine if two given
plans will pr