Reading data slows down when Spark 3.0 uses multiple CPU cores

2020-11-08 Thread 1650996069
Hello, I recently ran into a confusing problem when using Spark 3.0. I used the TPCx-BB dataset (200 GB) and executed Query #5 from it. The SQL reads about 65.7 GB of table data. Query #5 is as

Reading data slows down when Spark 3.0 uses multiple CPU cores

2020-11-08 Thread 叶新
Hello, I recently ran into a confusing problem when using Spark 3.0. I used the TPCx-BB dataset (200 GB) and executed Query #5 from it. The SQL reads about 65.7 GB of table data. Query #5 is as

Re: Using two WriteStreams in same spark structured streaming job

2020-11-08 Thread Kevin Pis
Do you mean sparkSession.streams.awaitAnyTermination()? May I see your code? Otherwise, you can look at the following. My demo code: val hourDevice = beginTimeDevice.groupBy($"subsId", $"eventBeginHour", $"serviceType").agg("duration" -> "sum").withColumnRenamed("sum(duration)",

Out of memory issue

2020-11-08 Thread Amit Sharma
Hi, I am using a 16-node Spark cluster with the config below: 1. Executor memory 8 GB 2. 5 cores per executor 3. Driver memory 12 GB. We have a streaming job. We usually do not see a problem, but sometimes we get an executor-1 heap memory exception. I do not understand, if the data size is the same and this job