*Honestly, I don't know how to do this in Scala.* I tried something like
this...
*.mapGroupsWithState(GroupStateTimeout.ProcessingTimeTimeout())(new StateUpdater(myAcc))*
StateUpdater is similar to what Zhang has provided, but it's NOT
compiling because I need to return a 'Dataset'.
Here's th…
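The message is cut off above, but here is a hedged sketch of a version that should compile. In Scala, `mapGroupsWithState` takes a `(K, Iterator[V], GroupState[S]) => U` function and itself returns the `Dataset`, so the function object returns one output row per group rather than a `Dataset`. The `Event`/`SessionInfo`/`SessionUpdate` types below are placeholders, not from this thread:

```scala
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}
import org.apache.spark.util.LongAccumulator

// Placeholder input/state/output types, for illustration only.
case class Event(id: String, count: Long)
case class SessionInfo(total: Long)
case class SessionUpdate(id: String, total: Long)

// A class works as the state-update function because Scala's
// mapGroupsWithState accepts any (K, Iterator[V], GroupState[S]) => U.
class StateUpdater(acc: LongAccumulator)
    extends ((String, Iterator[Event], GroupState[SessionInfo]) => SessionUpdate)
    with Serializable {
  def apply(key: String, events: Iterator[Event],
            state: GroupState[SessionInfo]): SessionUpdate = {
    val prev = state.getOption.getOrElse(SessionInfo(0L))
    val updated = SessionInfo(prev.total + events.map(_.count).sum)
    state.update(updated)
    acc.add(1) // executor-side increment; merged back on the driver
    SessionUpdate(key, updated.total)
  }
}

// Usage, given a KeyValueGroupedDataset[String, Event] named `grouped`
// and spark.implicits._ in scope for the encoders:
// grouped.mapGroupsWithState(GroupStateTimeout.ProcessingTimeTimeout())(
//   new StateUpdater(myAcc))
```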
Ya, I had asked this question before. No one responded. By the way, what's
your actual name, "Something Something", if you don't mind me asking?
On Tue, Jun 9, 2020 at 12:27 AM Something Something <
mailinglist...@gmail.com> wrote:
What is scary is this interface is marked as "experimental"
```java
@Experimental
@InterfaceStability.Evolving
public interface MapGroupsWithStateFunction<K, V, S, R> extends Serializable {
  R call(K key, Iterator<V> values, GroupState<S> state) throws Exception;
}
```
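For what it's worth, the experimental interface is small enough to implement directly. A sketch of wiring it in from Scala via the Java-style overload on KeyValueGroupedDataset, which takes explicit encoders instead of implicits (reusing the placeholder types from the sketch above):

```scala
import org.apache.spark.api.java.function.MapGroupsWithStateFunction
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

// K=String, V=Event, S=SessionInfo, R=SessionUpdate in the interface's terms.
val updater = new MapGroupsWithStateFunction[String, Event, SessionInfo, SessionUpdate] {
  override def call(key: String, values: java.util.Iterator[Event],
                    state: GroupState[SessionInfo]): SessionUpdate = {
    var total = state.getOption.map(_.total).getOrElse(0L)
    while (values.hasNext) total += values.next().count
    state.update(SessionInfo(total))
    SessionUpdate(key, total)
  }
}

// The Java-flavored overload takes the encoders explicitly:
// grouped.mapGroupsWithState(updater, Encoders.product[SessionInfo],
//   Encoders.product[SessionUpdate], GroupStateTimeout.ProcessingTimeTimeout())
```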
On Mon, Jun 8, 2020 at 11:54 AM Something Something <
mailinglist...@gmail.com> wrote:
Right, this is exactly how I have it right now. The problem is that in
cluster mode 'myAcc' does NOT get distributed. Try it out in cluster mode
and you will see what I mean.
I think the way Zhang is using it will work. Will try and revert.
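For context, a minimal sketch of the driver-registered accumulator pattern under discussion (the name is illustrative). An accumulator registered through the SparkContext is serialized out to the executors, and their increments are merged back into the driver-side copy:

```scala
import org.apache.spark.util.LongAccumulator

// Register on the driver; Spark ships copies to the executors and merges
// their partial counts back as tasks complete.
val myAcc: LongAccumulator = spark.sparkContext.longAccumulator("stateUpdates")

// Executors only ever call add() on it, e.g. inside the state function:
//   myAcc.add(1)

// The merged value is readable on the driver, e.g. from a
// StreamingQueryListener.onQueryProgress callback:
//   println(s"state updates so far: ${myAcc.value}")
```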
On Mon, Jun 8, 2020 at 10:58 AM Srinivas V wrote:
You don’t need to have a separate class. I created one because there is a
lot of code and logic in my case.
For you to quickly test you can use Zhang’s Scala code in this chain.
Pasting it below for your quick reference:
```scala
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = {}
  override def onQueryProgress(event: QueryProgressEvent): Unit = {}
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = {}
})
```
May I ask how you handle multiple partitions? Can't two files end up with
the same name with this approach, or am I missing something?
BR,
Panos
I've found that Anaconda encapsulates modules and dependencies nicely, and
you can deploy all the needed .so files by shipping a whole conda
environment.
I've used this method with success:
https://community.cloudera.com/t5/Community-Articles/Running-PySpark-with-Conda-Env/ta-p/24755
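For reference, the rough shape of that approach (environment name, packages, and paths below are illustrative, not prescriptive): build a conda environment, archive it, ship the archive with the job, and point the Python interpreter settings at the unpacked copy.

```bash
# 1. Build a conda env containing the Python deps (and their .so files),
#    then archive it (names and paths are illustrative).
conda create -y -n pyspark_env python=3.7 numpy pandas
cd /opt/anaconda/envs && zip -r ~/pyspark_env.zip pyspark_env

# 2. Ship the archive; YARN unpacks it into each container's working dir
#    under the alias after '#', so the interpreter path is relative.
spark-submit \
  --master yarn --deploy-mode cluster \
  --archives ~/pyspark_env.zip#env \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./env/pyspark_env/bin/python \
  --conf spark.executorEnv.PYSPARK_PYTHON=./env/pyspark_env/bin/python \
  my_job.py
```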
Yes, I think so.
Stefan Panayotov, PhD
spanayo...@outlook.com
spanayo...@comcast.net
spanayo...@gmail.com
-Original Message-
From: ilaimalka
Sent: Monday, June 8, 2020 9:17 AM
To: user@spark.apache.org
Subject: we control spark file names before we write them - should we
opensource it?
Hi, as part of our work we needed more control over the names of the files
written out by Spark. E.g., instead of "part-...csv.gz" we want to get
something like "15988891_1748330679_20200507124153.tsv.gz", where the
first number is hardcoded, the second one is the value from partitionBy,
and the third…
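The message is truncated above, but for the naming scheme it describes, one common workaround is to let Spark write normally and then rename the part files, since DataFrameWriter does not expose output file names. A sketch, with the directory, prefix, and timestamp purely illustrative; the extra per-file index is my addition, and it also avoids the name collisions Panos asked about:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Post-write rename: Spark wrote partitionBy subdirs like key=1748330679/,
// each holding part files; move each part file to the desired name.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val outDir = new Path("/data/out") // illustrative output directory
fs.globStatus(new Path(outDir, "*/part-*")).zipWithIndex.foreach {
  case (status, idx) =>
    // Recover the partitionBy value from the "key=value" directory name.
    val partValue = status.getPath.getParent.getName.split("=").last
    // The index keeps names unique even when one partition value is
    // written out as several part files.
    val target =
      new Path(outDir, s"15988891_${partValue}_${idx}_20200507124153.tsv.gz")
    fs.rename(status.getPath, target)
}
```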