Hi Subash,
Short answer: It’s effectively random.
Longer answer: In general, the DataFrameWriter expects to receive data
from multiple partitions. Say you were writing to ORC instead of text:
in that case, even when you specify the output path, the writer creates a
directory at the specified path
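For illustration, here is a minimal sketch of that behavior (assuming a local SparkSession; the output path "out/sample-orc" is hypothetical). Even though the path looks like a single file, Spark creates a directory there containing one part-* file per partition, with effectively arbitrary part numbering:

```scala
import org.apache.spark.sql.SparkSession

object OrcWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("orc-write-sketch")
      .getOrCreate()
    import spark.implicits._

    // "out/sample-orc" becomes a directory, not a file: it will hold
    // one part-* file per partition (here, three), plus a _SUCCESS marker.
    Seq(1, 2, 3).toDF("n")
      .repartition(3)
      .write
      .mode("overwrite")
      .orc("out/sample-orc")

    spark.stop()
  }
}
```

The same directory-of-part-files layout applies to text, Parquet, and the other file-based sinks.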
Hey Guys,
Do you know of any way to refresh Parquet tables that clears the
cached metadata for all users of the Spark Thrift Server? Or can I somehow
disable metadata caching for Parquet tables entirely? It seems
spark.sql.parquet.cacheMetadata doesn't work anymore.
Thanks
Tom
--
Tomasz Krol
patr
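For reference, a minimal sketch of a per-table refresh (the table name "my_parquet_table" is a placeholder; to affect the Thrift Server's shared session state, the equivalent SQL statement REFRESH TABLE my_parquet_table would be issued through its JDBC interface rather than a separate application):

```scala
import org.apache.spark.sql.SparkSession

object RefreshSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .enableHiveSupport() // assumes a Hive metastore is configured
      .getOrCreate()

    // Invalidates the cached file listing and schema metadata for one
    // table; subsequent queries re-scan the underlying files.
    spark.catalog.refreshTable("my_parquet_table")

    spark.stop()
  }
}
```

This refreshes a single table on demand; I'm not aware of a supported global off-switch for the caching in recent versions.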
I had that identical problem. Here’s what I came up with:
https://github.com/ubiquibit-inc/sensor-failure
On Tue, Apr 9, 2019 at 04:37 Akila Wajirasena wrote:
> Hi
>
> I have a Kafka topic which is already loaded with data. I use a stateful
> structured streaming pipeline using flatMapGroupWi
Hi
I have a Kafka topic which is already loaded with data. I use a stateful
structured streaming pipeline using flatMapGroupsWithState to consume the
data in Kafka in a streaming manner.
However, when I set the shuffle partition count > 1, I get some out-of-order
messages into each of my GroupState. I