Pyspark: Issue using sql in foreachBatch sink

2020-07-28, muru
In a PySpark Structured Streaming job, trying to use SQL instead of the SQL functions in a foreachBatch sink throws AttributeError: 'JavaMember' object has no attribute 'format'. However, the same thing works in the Scala API. Please note that I tested on Spark 2.4.5/2.4.6 and 3.0.0 and got the same exception. Is it a
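The post is cut off before its code, so the following is an assumption about what was tried, not a quote. One well-known way to get exactly this message on Spark 2.4/3.0 is this. A streaming query runs each micro-batch in a cloned SparkSession, so a temp view registered with batch_df.createOrReplaceTempView(...) is visible only to that cloned session, while batch_df.sql_ctx in PySpark still points at the outer session (presumably why the Scala version, which calls batchDF.sparkSession.sql(...), works). Reaching the cloned session through the private batch_df._jdf.sparkSession() does work, but its sql(...) returns a raw py4j object rather than a PySpark DataFrame. On that object .write is a py4j JavaMember, so .write.format(...) fails with the AttributeError above. The minimal sketch below wraps the result back into a PySpark DataFrame. The helper names, the view name "updates", the query, the output format, and the path are all hypothetical.

```python
# Sketch for PySpark 2.4 / 3.0. It relies on the private attribute `_jdf`,
# so treat it as a workaround, not a supported API.


def sql_in_batch(batch_df, query):
    """Run `query` in the SparkSession that owns `batch_df` (the streaming
    query's cloned session) and return a PySpark DataFrame."""
    # Imported lazily; only needed once a batch is actually processed.
    from pyspark.sql import DataFrame

    # This returns a py4j JavaObject, not a PySpark DataFrame. Calling
    # .write.format(...) directly on it is what raises
    # "'JavaMember' object has no attribute 'format'".
    jdf = batch_df._jdf.sparkSession().sql(query)
    # Wrap it so the usual PySpark DataFrame API (write, show, ...) is available.
    return DataFrame(jdf, batch_df.sql_ctx)


def process_batch(batch_df, batch_id):
    # Hypothetical view name, query, format and output path.
    batch_df.createOrReplaceTempView("updates")
    result = sql_in_batch(batch_df, "SELECT * FROM updates")
    result.write.format("parquet").mode("append").save("/tmp/output")


# Usage with a hypothetical streaming DataFrame `stream_df`:
# stream_df.writeStream.foreachBatch(process_batch).start()
```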

Re: how to copy from one cassandra cluster to another

2020-07-28, Russell Spitzer
You do not need one Spark session per cluster.
Spark SQL with DataSource V1: http://www.russellspitzer.com/2016/02/16/Multiple-Clusters-SparkSql-Cassandra/
DataSource V2 would require making two catalog references and then copying between them: https://github.com/datastax/spark-cassandra-connector/bl
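The DataSource V2 route could look roughly like the sketch below. It assumes Spark 3.0 with Spark Cassandra Connector 3.x, whose catalog class is com.datastax.spark.connector.datasource.CassandraCatalog. The per-catalog key spark.sql.catalog.<name>.spark.cassandra.connection.host is an assumption to verify against the connector's documentation. The catalog names, hosts, keyspace and helper names are hypothetical. One SparkSession holds both catalogs, and the copy is a single INSERT ... SELECT between them.

```python
from typing import Dict


def cassandra_catalog_conf(catalog: str, host: str) -> Dict[str, str]:
    """Spark conf entries that register one Cassandra cluster as a DSv2 catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "com.datastax.spark.connector.datasource.CassandraCatalog",
        # Assumed: settings under the catalog prefix apply only to this catalog.
        f"{prefix}.spark.cassandra.connection.host": host,
    }


def copy_table_sql(src: str, dst: str) -> str:
    """SQL that appends every row of `src` to the existing table `dst`.
    SELECT * assumes both tables have the same column order."""
    return f"INSERT INTO {dst} SELECT * FROM {src}"


def copy_between_clusters(spark, host1: str, host2: str) -> None:
    # Register both clusters on the same session before either catalog is used.
    for catalog, host in (("cluster1", host1), ("cluster2", host2)):
        for key, value in cassandra_catalog_conf(catalog, host).items():
            spark.conf.set(key, value)
    # Hypothetical keyspace "ks"; the table names follow the original question.
    spark.sql(copy_table_sql("cluster1.ks.table_a", "cluster2.ks.table_b"))
```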

how spark collects non-match results after performing broadcast left outer join

2020-07-28, farshaddp
Does anybody know how Spark collects the non-matching results after performing a broadcast hash left outer join? Suppose we have 4 nodes: 1 driver and 3 executors. We broadcast the left table. After the left outer join is performed in each executor, how does Spark recognize which records have not been matched,
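Two points, based on my reading of the Spark 2.4/3.0 planner rather than on this thread. A broadcast hash join for a left outer join broadcasts the right side, so each executor streams its own left rows and emits the unmatched ones with nulls locally; nothing has to be collected. When the preserved (left) side really is broadcast, as in the question, Spark uses a broadcast nested loop join instead. In that join, each partition records which broadcast rows it matched, the per-partition bitsets are OR-ed together, and the broadcast rows that no partition matched are emitted with nulls. The plain-Python simulation below follows that second path in simplified form: lists stand in for partitions and Python sets stand in for bitsets. The function names and sample rows are hypothetical.

```python
from functools import reduce
from typing import Any, Callable, List, Optional, Sequence, Set, Tuple

Row = Tuple[Any, ...]
Cond = Callable[[Row, Row], bool]
Pair = Tuple[Row, Optional[Row]]


def join_partition(
    broadcast_left: Sequence[Row], right_part: Sequence[Row], cond: Cond
) -> Tuple[List[Pair], Set[int]]:
    """One executor task: join its right rows against every broadcast left row,
    remembering which left rows (by index) found at least one match."""
    joined: List[Pair] = []
    matched: Set[int] = set()
    for r in right_part:
        for i, l in enumerate(broadcast_left):
            if cond(l, r):
                joined.append((l, r))
                matched.add(i)
    return joined, matched


def broadcast_left_outer_join(
    left: Sequence[Row], right_parts: Sequence[Sequence[Row]], cond: Cond
) -> List[Pair]:
    tasks = [join_partition(left, part, cond) for part in right_parts]
    joined = [pair for rows, _ in tasks for pair in rows]
    # Combine the per-task match sets (Spark ORs bitsets); a left row is
    # unmatched only if no task matched it.
    matched_anywhere = reduce(lambda a, b: a | b, (m for _, m in tasks), set())
    unmatched = [(l, None) for i, l in enumerate(left) if i not in matched_anywhere]
    return joined + unmatched


left = [(1, "a"), (2, "b"), (3, "c")]
right_parts = [[(1, "x")], [(3, "y"), (1, "z")], [(4, "w")]]  # three "executors"
result = broadcast_left_outer_join(left, right_parts, lambda l, r: l[0] == r[0])
print(result)
# [((1, 'a'), (1, 'x')), ((3, 'c'), (3, 'y')), ((1, 'a'), (1, 'z')), ((2, 'b'), None)]
```

Row (2, "b") is unmatched in every partition, so only the combined match set can show that it needs a null-padded output row.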

how to copy from one cassandra cluster to another

2020-07-28, Amit Sharma
Hi, I have table A in Cassandra cluster-1 in one data center and table B in cluster-2 in another data center. I want to copy the data from one cluster to the other using Spark. The problem I faced is that I cannot create two Spark sessions, since we need one Spark session per cluster. Plea

Is possible to give options when reading semistructured files using SQL Syntax?

2020-07-28, Daniel de Oliveira Mantovani
Is it possible to pass options when reading semi-structured files using SQL syntax, as in the example below?
SELECT * FROM csv.`file.csv`
For example, what if I want header=true? Is that possible?
Thanks,
Daniel Mantovani
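As far as I know, the path-based form SELECT * FROM csv.`file.csv` has no place for reader options in Spark 2.4/3.0. A temporary view declared with CREATE TEMPORARY VIEW ... USING ... OPTIONS (...) does accept them, passing the same options that spark.read takes (header, sep, inferSchema and so on) to the data source, and the view can then be queried with plain SQL. The helper below only assembles that statement. Its name and the view name "people" are hypothetical.

```python
def temp_view_ddl(view: str, path: str, source: str = "csv", **options: str) -> str:
    """Build a Spark SQL statement that exposes `path` as a temporary view,
    passing reader options (e.g. header='true') through OPTIONS (...)."""

    def quote(value: str) -> str:
        # Spark SQL string literals accept backslash escapes.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    all_options = {"path": path, **options}
    rendered = ", ".join(f"{key} {quote(val)}" for key, val in all_options.items())
    return f"CREATE OR REPLACE TEMPORARY VIEW {view} USING {source} OPTIONS ({rendered})"


ddl = temp_view_ddl("people", "file.csv", header="true")
print(ddl)
# CREATE OR REPLACE TEMPORARY VIEW people USING csv OPTIONS (path 'file.csv', header 'true')

# With a live SparkSession (not created here):
# spark.sql(ddl)
# spark.sql("SELECT * FROM people").show()
```

The statement can also be typed directly into a SQL shell; the helper only shows which parts are the view name, the data source and the options.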