Re: Spark2: Deciphering saving text file name

2019-04-09 Thread Jason Nerothin
Hi Subash, Short answer: It’s effectively random. Longer answer: In general, the DataFrameWriter expects to receive data from multiple partitions. Let’s say you were writing to ORC instead of text. In this case, even when you specify the output path, the writer creates a directory at the
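
As a rough illustration of the point above (the writer produces a directory rather than a single named file), here is a minimal plain-Python sketch that picks the data files out of such a directory. It does not call Spark. The layout it assumes (files prefixed with `part-`, a `_SUCCESS` marker, hidden `.crc` checksum files) and the helper names `find_part_files` and `make_fake_output_dir` are assumptions made for the example, not something stated in the thread.

```python
import tempfile
from pathlib import Path


def find_part_files(output_dir):
    """Return the data files in a Spark-style output directory, sorted by name.

    Assumption: data files start with "part-". Marker files such as
    _SUCCESS and hidden checksum files (".part-....crc") are skipped.
    """
    return sorted(
        p for p in Path(output_dir).iterdir()
        if p.is_file() and p.name.startswith("part-")
    )


def make_fake_output_dir():
    """Build a throwaway directory that mimics the assumed layout (hypothetical names)."""
    root = Path(tempfile.mkdtemp())
    for name in (
        "_SUCCESS",
        "part-00000-aaaa.txt",
        ".part-00000-aaaa.txt.crc",
        "part-00001-bbbb.txt",
    ):
        (root / name).write_text("")
    return root
```

Because the generated part of each file name is not something the caller chooses, code that needs a fixed file name would typically locate the part files this way and then rename or concatenate them in a separate step.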

Refresh parquet metadata on Spark Thrift Server

2019-04-09 Thread Tomasz Krol
Hey guys, do you know of any way to refresh Parquet tables that clears cached metadata for all users in Spark Thrift Server? Or can I somehow stop metadata caching altogether for Parquet tables? It seems spark.sql.parquet.cacheMetadata doesn't work anymore. Thanks, Tom -- Tomasz Krol

Re: Structured streaming flatMapGroupWithState results out of order messages when reading from Kafka

2019-04-09 Thread Jason Nerothin
I had that identical problem. Here’s what I came up with: https://github.com/ubiquibit-inc/sensor-failure On Tue, Apr 9, 2019 at 04:37 Akila Wajirasena wrote: > Hi > > I have a Kafka topic which is already loaded with data. I use a stateful > structured streaming pipeline using

Structured streaming flatMapGroupWithState results out of order messages when reading from Kafka

2019-04-09 Thread Akila Wajirasena
Hi, I have a Kafka topic which is already loaded with data. I use a stateful structured streaming pipeline using flatMapGroupsWithState to consume the data in Kafka in a streaming manner. However, when I set the shuffle partition count > 1, I get some out-of-order messages in each of my GroupState.
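
One common way to cope with the situation described above is to stop relying on arrival order and instead sort each group's events by an event timestamp before folding them into the state. The following is a minimal plain-Python sketch of that idea only. It does not use the Spark API, it is not taken from either poster's code, and the `(key, timestamp, value)` tuple shape and the function name `fold_in_event_time_order` are assumptions made for the example.

```python
from collections import defaultdict
from operator import itemgetter


def fold_in_event_time_order(events):
    """Group (key, timestamp, value) tuples by key and order each group by timestamp.

    Arrival order is ignored; only the timestamp carried by each event
    decides the order in which values appear in the result.
    """
    grouped = defaultdict(list)
    for key, timestamp, value in events:
        grouped[key].append((timestamp, value))
    return {
        key: [value for _, value in sorted(items, key=itemgetter(0))]
        for key, items in grouped.items()
    }


# Events for key "a" arrive out of order (timestamp 2 before timestamp 1).
arrived = [("a", 2, "x2"), ("b", 1, "y1"), ("a", 1, "x1")]
ordered = fold_in_event_time_order(arrived)
```

In a streaming job the same sorting step would sit inside the per-group state function, applied to the batch of events handed to it for that key. Events that arrive late in a later batch would still need separate handling, for example by buffering in state.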