date:20161120

Re: Re: Multiple streaming aggregations in structured streaming

2016-11-20 Thread Reynold Xin

Can you use the approximate count distinct? On Sun, Nov 20, 2016 at 11:51 PM, Xinyu Zhang wrote: > > MapWithState is also very useful. > I want to calculate UV in real time, but "distinct count" and "multiple > streaming aggregations" are not supported. > Is there any method to calculate real-t

Re:Re: Multiple streaming aggregations in structured streaming

2016-11-20 Thread Xinyu Zhang

MapWithState is also very useful. I want to calculate UV in real time, but "distinct count" and "multiple streaming aggregations" are not supported. Is there any method to calculate real-time UV in the current version? At 2016-11-19 06:01:45, "Michael Armbrust" wrote: Doing this generally

github mirroring is broken

2016-11-20 Thread Reynold Xin

FYI: GitHub mirroring from Apache's official git repo to GitHub has been broken since Sat Nov 19, and as a result GitHub is now stale. Merged pull requests won't show up in GitHub until ASF infra fixes the issue.

Re: Develop custom Estimator / Transformer for pipeline

2016-11-20 Thread Georg Heiler

The estimator should perform data cleaning tasks. This means some rows will be dropped, some columns dropped, some columns added, some values replaced in existing columns. It should also store the mean or min for some numeric columns as a NaN replacement. However, override def transformSchema(sch

Re: OutOfMemoryError on parquet SnappyDecompressor

2016-11-20 Thread Aniket

Was anyone able to find a solution or recommended conf for this? I am running into the same "java.lang.OutOfMemoryError: Direct buffer memory" but during snappy compression. Thanks, Aniket On Tue, Sep 23, 2014 at 7:04 PM Aaron Davidson [via Apache Spark Developers List] wrote: > This may be relat

Re: Analyzing and reusing cached Datasets

2016-11-20 Thread Jacek Laskowski

Hi Michael, Thanks a lot for your prompt answer. I greatly appreciate it. Having said that, I think we might be...cough...cough...wrong :) I think the "issue" is in QueryPlan.sameResult [1] as its scaladoc says: * Since its likely undecidable to generally determine if two given plans will pr

6 matches
