Confusion about the Spark to_date function

2020-11-04 Thread 杨仲鲍
Code (preview truncated):
```scala
object Suit {
  case class Data(node: String, root: String)

  def apply[A](xs: A*): List[A] = xs.toList

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local")
      .appName("MoneyBackTest")
      .getOrCreate()
    import
```
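The preview cuts off before the actual to_date call, but for context, here is a minimal sketch of the two common to_date forms (with and without an explicit format). The column name, sample values, and format string are illustrative assumptions, not the original poster's code:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_date

val spark = SparkSession.builder()
  .master("local")
  .appName("ToDateSketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical input: one ISO-formatted string and one day/month/year string.
val df = Seq("2020-11-04", "04/11/2020").toDF("raw")

df.select(
  // Without a format argument, to_date expects yyyy-MM-dd and returns null otherwise.
  to_date($"raw").as("default_format"),
  // With an explicit pattern, only strings matching that pattern parse successfully.
  to_date($"raw", "dd/MM/yyyy").as("explicit_format")
).show()
```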

Re: Using two WriteStreams in same spark structured streaming job

2020-11-04 Thread act_coder
If I use the foreach function, then I may need to use a custom Kafka stream writer, right? And I might not be able to use the default writeStream.format("kafka") method?
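For reference, a custom foreach sink along those lines could look roughly like the sketch below, assuming the kafka-clients producer API; the broker address, topic, and the "key"/"value" column names are placeholder assumptions:

```scala
import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.sql.{ForeachWriter, Row}

// Hypothetical foreach sink that pushes each row to Kafka.
// Assumes the streaming dataframe exposes string "key" and "value" columns.
class KafkaForeachWriter(bootstrapServers: String, topic: String) extends ForeachWriter[Row] {
  private var producer: KafkaProducer[String, String] = _

  override def open(partitionId: Long, epochId: Long): Boolean = {
    val props = new Properties()
    props.put("bootstrap.servers", bootstrapServers)
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    producer = new KafkaProducer[String, String](props)
    true // returning true tells Spark to process this partition/epoch
  }

  override def process(row: Row): Unit =
    producer.send(new ProducerRecord(topic, row.getAs[String]("key"), row.getAs[String]("value")))

  override def close(errorOrNull: Throwable): Unit =
    if (producer != null) producer.close()
}

// Usage (placeholder broker and topic):
// streamingDf.writeStream.foreach(new KafkaForeachWriter("broker:9092", "my-topic")).start()
```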

Re: Using two WriteStreams in same spark structured streaming job

2020-11-04 Thread lec ssmi
You can use the *foreach* sink to achieve the logic you want. act_coder wrote on Wed, Nov 4, 2020 at 9:56 PM: > I have a scenario where I would like to save the same streaming dataframe > to two different streaming sinks. > I have created a streaming dataframe which I need to send to both a Kafka > topic and

Re: Spark reading from cassandra

2020-11-04 Thread Russell Spitzer
A where clause with a PK restriction should be identified by the Connector and transformed into a single request. This should still be much slower than doing the request directly but still much much faster than a full scan. On Wed, Nov 4, 2020 at 12:51 PM Russell Spitzer wrote: > Yes, the

Re: Spark reading from cassandra

2020-11-04 Thread Russell Spitzer
Yes, the "Allow filtering" part isn't actually important other than for letting the query run in the first place. A where clause that utilizes a clustering column restriction will perform much better than a full scan, column pruning as well can be extremely beneficial. On Wed, Nov 4, 2020 at

Spark reading from cassandra

2020-11-04 Thread Amit Sharma
Hi, I have a question: when we are reading from Cassandra, should we use only the partition key in the where clause from a performance perspective, or does it not matter from the Spark perspective because it always allows filtering? Thanks, Amit
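As context for Russell's point about predicate pushdown, here is a minimal read sketch using the spark-cassandra-connector DataFrame source; the keyspace, table, and partition key column are placeholder assumptions:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("CassandraReadSketch")
  .getOrCreate()
import spark.implicits._

// Placeholder keyspace/table; "user_id" stands in for the partition key column.
val events = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "my_keyspace", "table" -> "events"))
  .load()

// A predicate on the partition key can be pushed down by the connector,
// turning the read into a targeted request instead of a full table scan.
events.filter($"user_id" === "user-123").show()
```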

Re: Best way to emit custom metrics to Prometheus in spark structured streaming

2020-11-04 Thread meetwes
So I tried it again in standalone mode (spark-shell) and the df.observe() functionality works. I tried sum, count, conditional aggregations using 'when', etc., and all of this works in spark-shell. But with spark-on-k8s in cluster mode, only using lit() as the aggregation column works. No other
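For reference, here is a minimal sketch of the Dataset.observe pattern being described (Spark 3.0+), with a StreamingQueryListener that reads the observed values once per micro-batch; the rate source, column expressions, and metric names are illustrative assumptions, and the println stands in for whatever Prometheus client would export the values:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

val spark = SparkSession.builder()
  .appName("ObserveSketch")
  .getOrCreate()
import spark.implicits._

// Placeholder source: the built-in rate source produces (timestamp, value) rows.
val input = spark.readStream.format("rate").load()

// Attach named observed metrics to the streaming dataframe.
val observed = input.observe(
  "my_metrics",
  count(lit(1)).as("row_count"),
  sum(when($"value" % 2 === 0, 1).otherwise(0)).as("even_count"))

// Pick up the observed values once per micro-batch.
spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
  override def onQueryProgress(event: QueryProgressEvent): Unit =
    Option(event.progress.observedMetrics.get("my_metrics")).foreach { row =>
      println(s"row_count=${row.getAs[Long]("row_count")}, even_count=${row.getAs[Long]("even_count")}")
    }
})

observed.writeStream.format("console").start()
```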

Re: Best way to emit custom metrics to Prometheus in spark structured streaming

2020-11-04 Thread meetwes
Hi, thanks for the reply. I tried it out today but I am unable to get it to work in cluster mode. The aggregation result is always 0. It works fine in standalone mode with spark-shell, but with Spark on Kubernetes in cluster mode it doesn't.

Using two WriteStreams in same spark structured streaming job

2020-11-04 Thread act_coder
I have a scenario where I would like to save the same streaming dataframe to two different streaming sinks. I have created a streaming dataframe which I need to send to both a Kafka topic and Delta Lake. I thought of using foreachBatch, but it looks like it doesn't support multiple streaming sinks.
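For context, foreachBatch hands each micro-batch over as a regular DataFrame, so the usual batch writers can write it to several sinks from one streaming query. A minimal sketch, with the broker, topic, Delta path, and checkpoint location as placeholder assumptions:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder()
  .appName("TwoSinksSketch")
  .getOrCreate()

// Placeholder source; in practice this would be the existing streaming dataframe.
val streamingDf: DataFrame = spark.readStream.format("rate").load()

val query = streamingDf.writeStream
  .foreachBatch { (batchDf: DataFrame, batchId: Long) =>
    // Cache the micro-batch so the source is not recomputed for each sink.
    batchDf.persist()

    // Sink 1: Kafka (the batch Kafka writer needs a string or binary "value" column).
    batchDf.selectExpr("CAST(value AS STRING) AS value")
      .write
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("topic", "my-topic")
      .save()

    // Sink 2: Delta Lake (placeholder path).
    batchDf.write
      .format("delta")
      .mode("append")
      .save("/tmp/delta/events")

    batchDf.unpersist()
  }
  .option("checkpointLocation", "/tmp/checkpoints/two-sinks")
  .start()
```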