Re: mission statement : unified

2020-10-18 Thread Sonal Goyal
My thought is that Spark supports analytics for structured and unstructured data, batch as well as real time. This was pretty revolutionary when Spark first came out. That's where the unified term came from I think. Even after all these years, Spark remains the trusted framework for enterprise anal

Re: mission statement : unified

2020-10-18 Thread Gourav Sengupta
Hi, I think that it is just a marketing statement. But with SPARK 3.x, now that you are seeing that SPARK is no more than just another distributed data processing engine, they are trying to join data pre-processing into ML pipelines directly. I may call that unified. But you get the same with sev

Re: Count distinct and driver memory

2020-10-18 Thread Gourav Sengupta
Hi, 6 billion rows is quite small; I can do it on my laptop with around 4 GB RAM. Which version of Spark are you using, and how much effective memory do you have per executor? Regards, Gourav Sengupta On Mon, Oct 19, 2020 at 4:24 AM Lalwani, Jayesh wrote: > I have a Dataframe with aro

Count distinct and driver memory

2020-10-18 Thread Lalwani, Jayesh
I have a Dataframe with around 6 billion rows and about 20 columns. First, I want to write this dataframe out to Parquet. Then, out of the 20 columns, I have 3 columns of interest, and I want to find how many distinct values of those columns there are in the file. I don’t need the actual di

Re: Spark Streaming Job is stucked

2020-10-18 Thread Artemis User
If it was running fine before and has stopped working now, one thing I can think of is that your disk may be full. Checking your disk space and cleaning up your old log files might help... On 10/18/20 12:06 PM, rajat kumar wrote: Hello Everyone, My spark streaming job is running too slow, it is having bat

mission statement : unified

2020-10-18 Thread Hulio andres
Apache Spark's mission statement is: "Apache Spark™ is a unified analytics engine for large-scale data processing." What does the word "unified" refer to?

Spark Streaming Job is stucked

2020-10-18 Thread rajat kumar
Hello Everyone, My spark streaming job is running too slow, it is having batch time of 15 seconds and each batch takes 20-22 seconds to complete. It was fine until the first week of October, but it suddenly started behaving this way. I know changing the batch time can help, but other than that, any idea what can be