Sorry, I moved a paragraph:
(2) If Ms green.th was first seen at 13:04:04, then at 13:04:05, and finally
at 13:04:17, she has been in the queue for 13 seconds (ignoring the ms).
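The queue-time arithmetic above can be checked with plain Python; the fractional seconds of the last timestamp are assumed here, since the example ignores them anyway:

```python
# Compute how long green.th has been in the queue, from first to last
# sighting, ignoring the fractional seconds as in the example above.
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S.%f"
first_seen = datetime.strptime("2020-09-03 13:04:04.520129", fmt)
last_seen = datetime.strptime("2020-09-03 13:04:17.520129", fmt)  # microseconds assumed

queue_seconds = int((last_seen - first_seen).total_seconds())  # 13 seconds
```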
Hi folks,
I have a stream coming from Kafka.
It has this schema:
{
"id": 4,
"account_id": 1070998,
"uid": "green.th",
"last_activity_time": "2020-09-03 13:04:04.520129"
}
Another event arrives a few milliseconds/seconds later:
{
"id": 9,
"account_id": 1070998,
"uid":
Hi Gábor,
Thanks for your reply on this!
That version is used internally at the company I work at; it hasn't been
upgraded, mainly for compatibility with the currently deployed Java
applications. Hence I am attempting to make the most of this version :)
András
On Fri, 4 Sep 2020, 14:09 Gabor Somog
Do you need to iterate at all? You can always write a function over
all columns via df.columns. You can also operate on a whole Row at a time.
On Fri, Sep 4, 2020 at 2:11 AM Devi P.V wrote:
>
> Hi all,
> What is the best approach for iterating all columns in a pyspark dataframe? I
> want to apply som
Hi Andras,
A general suggestion is to use Structured Streaming instead of DStreams,
because it provides several things out of the box (stateful streaming,
etc.).
Kafka 0.8 is super old and deprecated (no security features, for
example). Do you have a specific reason to stay on that version?
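A minimal sketch of the Structured Streaming alternative suggested above, reading the JSON events shown earlier from Kafka; the broker address and topic name are hypothetical, and the spark-sql-kafka-0-10 package must be on the classpath:

```python
# Hypothetical Structured Streaming job: read the activity events from
# Kafka and parse the JSON payload shown earlier in the thread.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("activity-stream").getOrCreate()

events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "activity")                   # hypothetical topic
    .load())

# Kafka values arrive as bytes; parse them against the event schema.
schema = "id BIGINT, account_id BIGINT, uid STRING, last_activity_time STRING"
parsed = (events
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*"))

query = (parsed.writeStream
    .format("console")
    .outputMode("append")
    .start())
```

With the stream in this form, the per-uid "time in queue" logic becomes a stateful aggregation that Structured Streaming manages for you, rather than something tracked by hand over DStream batches.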
BR,
G
On Thu, Sep 3, 2020 at 11:4
Hi all,
What is the best approach for iterating over all columns in a pyspark
dataframe? I want to apply some conditions to all columns in the dataframe.
Currently I am using a for loop for iteration. Is that good practice with
Spark? I am using Spark 3.0.
Please advise.
Thanks,
Devi