I agree with what is stated. This is the gist of my understanding, having
tested it.
When working with Spark Structured Streaming, each streaming query runs in
its own separate Spark session to ensure isolation and avoid conflicts
between different queries.
So here I have:
def process_data(self, df
hm.
In your logic here
def process_micro_batch(micro_batch_df, batchId):
    micro_batch_df.createOrReplaceTempView("temp_view")
    df = spark.sql("select * from temp_view")
    return df
Is this function called, and if so, do you check whether micro_batch_df
contains rows -> if len(micro_batch_df
Hi everyone! I followed this guide
https://dev.to/mvillarrealb/creating-a-spark-standalone-cluster-with-docker-and-docker-compose-2021-update-6l4
to create a Spark cluster on an Ubuntu server with Docker. However, when I
try to submit my PySpark code to the master, the jobs are registered in the
S
Hi
I just wanted to check whether there is a way to create custom logs in Spark.
I want to write selective/custom log messages to S3, running spark-submit
on EMR.
I would not want all the Spark-generated logs ... I would just need the
log messages that are logged as part of the Spark application.
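One common way to get this kind of selective logging (a sketch, not from this thread: the logger name, bucket, and key below are all illustrative) is to attach a dedicated Python logger to an in-memory buffer, log only your own application messages to it, and upload the buffer to S3 at the end of the job. Spark's own log4j output never flows through this logger, so only your messages land in the file.

```python
import io
import logging

# Dedicated logger for application messages only ("my_app" is a made-up name).
buf = io.StringIO()
app_logger = logging.getLogger("my_app")
app_logger.setLevel(logging.INFO)
app_logger.propagate = False  # keep this output out of the root/Spark loggers

handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
app_logger.addHandler(handler)

# Selective messages logged as part of the Spark application:
app_logger.info("processed batch 42")
app_logger.warning("skipped 3 malformed rows")

# At the end of the job, upload just these messages to S3
# (boto3 is assumed to be available on the EMR cluster; bucket/key are examples):
# import boto3
# boto3.client("s3").put_object(
#     Bucket="my-bucket", Key="logs/app.log", Body=buf.getvalue()
# )
```

This keeps the selective log entirely under your control; Spark's executor and driver logs still go wherever EMR normally sends them.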
Hi,
The streaming query clones the Spark session - when you create a temp view
from a DataFrame, the temp view is created under the cloned session. You will
need to use micro_batch_df.sparkSession to access the cloned session.
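A minimal sketch of that fix, assuming a foreachBatch sink (the function and view names are just illustrative):

```python
# foreachBatch handler: Spark passes in the micro-batch DataFrame and batch id.
def process_micro_batch(micro_batch_df, batch_id):
    micro_batch_df.createOrReplaceTempView("temp_view")
    # Query through the cloned session the view was registered under,
    # not the original `spark` session.
    return micro_batch_df.sparkSession.sql("select * from temp_view")

# Wiring it up on a streaming DataFrame `df` (sketch):
# df.writeStream.foreachBatch(process_micro_batch).start()
```

Querying via `spark.sql(...)` in the handler fails (or sees nothing) because the temp view lives in the cloned session, not the one that started the query.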
Thanks,
Jungtaek Lim (HeartSaVioR)
On Wed, Jan 31, 2024 at 3:29 PM Karthick
Based on this blog post
https://sergei-ivanov.medium.com/why-you-should-not-use-randomsplit-in-pyspark-to-split-data-into-train-and-test-58576d539a36
, I noticed a recommendation against using randomSplit for data splitting,
due to data sorting. Is the information provided in the blog accurate? I