Re: Spark Small file issue

2020-06-23 Thread German SM
Hi, when reducing the number of partitions it is better to use coalesce, because it avoids a full shuffle of the data: dataframe.coalesce(1) On Tue, 23 Jun 2020 at 23:54, Hichki wrote: > Hello Team, > > I am new to the Spark environment. I have converted a Hive query to Spark Scala. > Now I am loading data and
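
For readers following the thread, the coalesce suggestion can be sketched roughly as below. This is only an illustration under assumptions: the object name `CoalesceSketch`, the helper `mergePartitions`, the sample data, and the output path are all hypothetical and do not come from the original message. Note that `coalesce(1)` funnels all data through a single task, so it tends to suit small outputs; for large data, a higher target partition count (or `repartition`, which does shuffle) may be the safer choice.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CoalesceSketch {
  // Hypothetical helper: merge existing partitions down to `target`
  // without a full shuffle. coalesce only combines partitions; it cannot
  // increase their number.
  def mergePartitions(df: DataFrame, target: Int): DataFrame =
    df.coalesce(target)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would get its
    // master from spark-submit.
    val spark = SparkSession.builder()
      .appName("coalesce-sketch")
      .master("local[4]")
      .getOrCreate()

    // Sample data deliberately spread over many partitions, which would
    // otherwise produce one small output file per partition.
    val df = spark.range(0, 1000).toDF("id").repartition(8)
    val merged = mergePartitions(df, 1)

    println(s"partitions before: ${df.rdd.getNumPartitions}")
    println(s"partitions after:  ${merged.rdd.getNumPartitions}")

    // Hypothetical output path: with one partition, one data file is written.
    merged.write.mode("overwrite").parquet("/tmp/coalesce_sketch_output")

    spark.stop()
  }
}
```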

Re: [spark-structured-streaming] [kafka] consume topics from multiple Kafka clusters

2020-06-09 Thread German SM
Hello, I've never tried that. Does this not work? val df_cluster1 = spark .read .format("kafka") .option("kafka.bootstrap.servers", "cluster1_host:cluster1_port") .option("subscribe", "topic1") val df_cluster2 = spark .read .format("kafka") .option("kafka.bootstrap.servers",
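
Since the preview above is cut off, here is a minimal sketch of how the idea might look in full, not the author's actual code. It assumes the `spark-sql-kafka-0-10` connector is on the classpath, uses `readStream` because the thread is about Structured Streaming (the batch `spark.read` form takes the same options), and ends each reader with `.load()` so that it yields a DataFrame. The helper `kafkaStream`, the second cluster's address, and the topic name `topic2` are hypothetical placeholders; the host and port strings must be replaced with real broker addresses.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object MultiClusterKafkaSketch {
  // Hypothetical helper: one streaming DataFrame per Kafka cluster.
  // Each reader carries its own bootstrap servers, so the sources are
  // independent of each other.
  def kafkaStream(spark: SparkSession, bootstrapServers: String, topic: String): DataFrame =
    spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", bootstrapServers)
      .option("subscribe", topic)
      .load()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("multi-cluster-kafka-sketch")
      .getOrCreate()

    // Placeholder addresses; "topic2" and the second cluster are assumptions.
    val dfCluster1 = kafkaStream(spark, "cluster1_host:cluster1_port", "topic1")
    val dfCluster2 = kafkaStream(spark, "cluster2_host:cluster2_port", "topic2")

    // Every Kafka source exposes the same schema, so the two streams can
    // be unioned and processed as a single DataFrame.
    val combined = dfCluster1.union(dfCluster2)
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value", "topic")

    // Console sink for inspection only.
    val query = combined.writeStream
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

Keeping the `topic` column in the projection makes it possible to tell downstream which source a record came from, provided the topic names differ between clusters.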