Code
```scala
import org.apache.spark.sql.SparkSession

object Suit {
  case class Data(node: String, root: String)
  def apply[A](xs: A*): List[A] = xs.toList

  def main(args: Array[String]): Unit = {
    // Local SparkSession for this test application
    val spark = SparkSession.builder()
      .master("local")
      .appName("MoneyBackTest")
      .getOrCreate()
    import spark.implicits._
  }
}
```
If I use the foreach function, then I may need to use a custom Kafka stream
writer, right?
And I might not be able to use the default writeStream.format("kafka") method?
You can use the *foreach* sink to achieve the logic you want.

act_coder wrote on Wed, Nov 4, 2020 at 9:56 PM:
> I have a scenario where I would like to save the same streaming dataframe
> to two different streaming sinks.
>
> I have created a streaming dataframe which I need to send to both a Kafka
> topic and Delta Lake.
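For illustration, here is a rough sketch of what a foreach-sink Kafka writer could look like; the broker address, topic name, and the row-to-string serialization below are hypothetical, and the built-in writeStream.format("kafka") sink remains the simpler option when plain Kafka output is all that is needed.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.sql.{ForeachWriter, Row}

// A minimal ForeachWriter that publishes each row to Kafka.
// Broker address and topic name are placeholders.
class KafkaForeachWriter(brokers: String, topic: String) extends ForeachWriter[Row] {
  private var producer: KafkaProducer[String, String] = _

  override def open(partitionId: Long, epochId: Long): Boolean = {
    val props = new Properties()
    props.put("bootstrap.servers", brokers)
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    producer = new KafkaProducer[String, String](props)
    true
  }

  override def process(row: Row): Unit = {
    // Assumes the row can be flattened into a simple string payload
    producer.send(new ProducerRecord[String, String](topic, row.mkString(",")))
  }

  override def close(errorOrNull: Throwable): Unit = {
    if (producer != null) producer.close()
  }
}

// Usage with a (hypothetical) streaming DataFrame:
// streamingDf.writeStream
//   .foreach(new KafkaForeachWriter("localhost:9092", "my-topic"))
//   .start()
```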
A where clause with a PK restriction should be identified by the Connector
and transformed into a single request. This will still be much slower than
issuing the request directly, but much, much faster than a full scan.
On Wed, Nov 4, 2020 at 12:51 PM Russell Spitzer wrote:
Yes, the "Allow filtering" part isn't actually important other than for
letting the query run in the first place. A where clause that uses a
clustering column restriction will perform much better than a full scan,
and column pruning can be extremely beneficial as well.
On Wed, Nov 4, 2020, Amit wrote:
Hi, I have a question: when reading from Cassandra, should we use only the
partition key in the where clause from a performance perspective, or does it
not matter from Spark's perspective because it always allows filtering?
Thanks
Amit
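As an illustrative sketch of the filter pushdown discussed above, assuming the spark-cassandra-connector is on the classpath; the keyspace, table, and column names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Hypothetical table: keyspace "ks", table "events",
// partition key "user_id", clustering column "event_time".
val events = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "events"))
  .load()

// A partition-key restriction can be pushed down by the connector and
// served as a direct single-partition request instead of a full scan.
val onePartition = events.where("user_id = 42")

// A clustering-column restriction plus column pruning narrows the work further.
val pruned = events
  .where("user_id = 42 AND event_time >= '2020-11-01'")
  .select("user_id", "event_time", "value")
```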
So I tried it again in standalone mode (spark-shell) and the df.observe()
functionality works. I tried sum, count, conditional aggregations using
'when', etc., and all of this works in spark-shell. But with spark-on-k8s in
cluster mode, only using lit() as the aggregation column works; no other
aggregation produces a result.
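For reference, a minimal sketch of the kind of observe() usage described here (Spark 3.0+); the DataFrame, column, and metric names are hypothetical:

```scala
import org.apache.spark.sql.functions.{col, count, lit, sum, when}

// df is an existing batch or streaming DataFrame with an "amount" column.
val observed = df.observe(
  "my_metrics",
  count(lit(1)).as("rows"),
  sum(when(col("amount") > 100, 1).otherwise(0)).as("large_amounts")
)

// For streaming queries, the named metrics are reported per micro-batch
// through StreamingQueryProgress.observedMetrics (via a StreamingQueryListener).
```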
Hi, thanks for the reply. I tried it out today but I am unable to get it to
work in cluster mode. The aggregation result is always 0. It works fine in
standalone mode with spark-shell, but with Spark on Kubernetes in cluster
mode it doesn't.
I have a scenario where I would like to save the same streaming dataframe to
two different streaming sinks.
I have created a streaming dataframe which I need to send to both a Kafka
topic and Delta Lake.
I thought of using foreachBatch, but it looks like it doesn't support
multiple streaming sinks.
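For context, the Structured Streaming guide describes foreachBatch as a way to write the output of one streaming query to multiple locations: the batch function receives each micro-batch as a regular DataFrame, and each write inside it is an ordinary batch write rather than a separate streaming sink. A rough sketch under that pattern; the broker address, topic, Delta path, and checkpoint location below are hypothetical:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, struct, to_json}

// Write each micro-batch to both Kafka and Delta Lake.
// Declared as a typed function value to avoid the Scala 2.12
// overload ambiguity when calling foreachBatch with a lambda.
val writeToBothSinks: (DataFrame, Long) => Unit = (batchDf, batchId) => {
  batchDf.persist() // avoid recomputing the micro-batch for each sink

  // Kafka expects a string/binary "value" column
  batchDf
    .select(to_json(struct(col("*"))).as("value"))
    .write
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("topic", "my-topic")
    .save()

  // Delta Lake
  batchDf.write
    .format("delta")
    .mode("append")
    .save("/tmp/delta/events")

  batchDf.unpersist()
}

streamingDf.writeStream
  .foreachBatch(writeToBothSinks)
  .option("checkpointLocation", "/tmp/checkpoints/two-sinks")
  .start()
```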