Re: Keeping track of how long something has been in a queue

2020-09-04 Thread Hamish Whittal
Sorry, I moved a paragraph. (2) If Ms green.th was first seen at 13:04:04, then at 13:04:05, and finally at 13:04:17, she's been in the queue for 13 seconds (ignoring the ms).
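
A minimal sketch of that max-minus-min idea in PySpark, assuming the events are already parsed into a DataFrame called events with uid and last_activity_time columns (the column names come from the schema in the original post; everything else is illustrative):

from pyspark.sql import functions as F

# events is assumed to hold the parsed activity records.
time_in_queue = (
    events
    .groupBy("uid")
    .agg(
        F.min("last_activity_time").alias("first_seen"),
        F.max("last_activity_time").alias("last_seen"),
    )
    # Casting a timestamp to long gives epoch seconds, so the difference
    # is the whole-second time in queue (13 for the example above).
    .withColumn(
        "seconds_in_queue",
        F.col("last_seen").cast("long") - F.col("first_seen").cast("long"),
    )
)

In a streaming job this would run as a stateful aggregation rather than a one-off batch query, but the grouping logic is the same.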

Keeping track of how long something has been in a queue

2020-09-04 Thread Hamish Whittal
Hi folks, I have a stream coming from Kafka. It has this schema: { "id": 4, "account_id": 1070998, "uid": "green.th", "last_activity_time": "2020-09-03 13:04:04.520129" } Another event arrives a few milliseconds/seconds later: { "id": 9, "account_id": 1070998, "uid":
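
A sketch of one way to get those events into a streaming DataFrame with Structured Streaming; the broker address and topic name are placeholders, and last_activity_time is read as a string and cast afterwards:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.appName("queue-tracking").getOrCreate()

# Field names follow the JSON schema shown above.
event_schema = StructType([
    StructField("id", IntegerType()),
    StructField("account_id", IntegerType()),
    StructField("uid", StringType()),
    StructField("last_activity_time", StringType()),
])

# "broker:9092" and "activity" are placeholder values, not from the thread.
# The job needs the spark-sql-kafka-0-10 package on the classpath.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "activity")
    .load()
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .withColumn("last_activity_time", F.to_timestamp("last_activity_time"))
)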

Re: Spark Streaming Checkpointing

2020-09-04 Thread András Kolbert
Hi Gábor, Thanks for your reply on this! Internally that's what is used at the company I work at - it hasn't been changed, mainly due to compatibility with the currently deployed Java applications. Hence I am attempting to make the most of this version :) András On Fri, 4 Sep 2020, 14:09 Gabor Somog

Re: Iterating all columns in a pyspark dataframe

2020-09-04 Thread Sean Owen
Do you need to iterate anything? You can always write a function of all columns, df.columns. You can operate on a whole Row at a time too. On Fri, Sep 4, 2020 at 2:11 AM Devi P.V wrote: > > Hi all, > What is the best approach for iterating all columns in a pyspark dataframe? I > want to apply som
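
A small sketch of the single-pass approach Sean describes, building one select over df.columns instead of a per-column for loop; the condition itself is made up for illustration:

from pyspark.sql import functions as F

# Apply the same (illustrative) condition to every column in one select:
# replace negative values with null, keeping the original column names.
result = df.select([
    F.when(F.col(c) < 0, None).otherwise(F.col(c)).alias(c)
    for c in df.columns
])

Because this is a single transformation over all columns, Spark builds one plan instead of chaining a withColumn per loop iteration.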

Re: Spark Streaming Checkpointing

2020-09-04 Thread Gabor Somogyi
Hi Andras, A general suggestion is to use Structured Streaming instead of DStreams because it provides several things out of the box (stateful streaming, etc...). Kafka 0.8 is super old and deprecated (no security...). Do you have a specific reason to use that? BR, G On Thu, Sep 3, 2020 at 11:4
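
A minimal Structured Streaming sketch of the alternative G is suggesting, where checkpointing is just a checkpointLocation option on the sink; the broker, topic, and checkpoint path are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("structured-checkpoint-demo").getOrCreate()

# Placeholder broker/topic/path, not taken from the thread.
lines = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
    .selectExpr("CAST(value AS STRING) AS value")
)

counts = lines.groupBy("value").count()

# The engine persists offsets and state under checkpointLocation;
# there are no manual DStream-style checkpoint() calls.
query = (
    counts.writeStream
    .outputMode("complete")
    .format("console")
    .option("checkpointLocation", "/tmp/chk/structured-demo")
    .start()
)
query.awaitTermination()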

Iterating all columns in a pyspark dataframe

2020-09-04 Thread Devi P.V
Hi all, What is the best approach for iterating all columns in a pyspark dataframe? I want to apply some conditions on all columns in the dataframe. Currently I am using a for loop for iteration. Is it a good practice while using Spark? I am using Spark 3.0. Please advise. Thanks, Devi