Re: Dynamic metric names

2019-05-06 Thread Sergey Zhemzhitsky
…it may require a code change to support your needs. Thanks, Saisai. Sergey Zhemzhitsky wrote on Sat, May 4, 2019 at 4:44 PM: > Hello Spark Users! > Just wondering whether it is possible to register a metric source without metrics known in advance…

Dynamic metric names

2019-05-04 Thread Sergey Zhemzhitsky
Hello Spark Users! Just wondering whether it is possible to register a metric source without metrics known in advance and add the metrics themselves to this source later on? It seems that currently MetricsSystem puts all the metrics from the source's MetricRegistry into a shared MetricRegistry of…
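The workaround usually discussed for this question is to keep a live reference to the source's own registry so metrics added later are still visible. Below is a minimal, Spark-free sketch of that pattern; the class names `MetricSource` and `MetricsSystem` here are illustrative stand-ins, not Spark's actual API.

```python
# A metric "source" whose registry can gain metrics after the source
# has already been registered with the metrics system.

class MetricSource:
    def __init__(self, name):
        self.name = name
        self.registry = {}          # metric name -> zero-arg gauge function

    def gauge(self, metric_name, fn):
        """Register a gauge at any time, even after the source is attached."""
        self.registry[metric_name] = fn

class MetricsSystem:
    def __init__(self):
        self.sources = {}

    def register(self, source):
        # Keep a reference to the source itself rather than copying its
        # metrics once at registration time; later additions stay visible.
        self.sources[source.name] = source

    def snapshot(self):
        return {
            f"{name}.{metric}": fn()
            for name, src in self.sources.items()
            for metric, fn in src.registry.items()
        }

system = MetricsSystem()
source = MetricSource("myJob")
system.register(source)                    # registered with no metrics yet

source.gauge("recordsSeen", lambda: 42)    # added later, dynamically
print(system.snapshot())                   # {'myJob.recordsSeen': 42}
```

The thread's point is that Spark's real MetricsSystem copies metrics into a shared registry at registration time, which is exactly what this sketch avoids.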

Re: Accumulator guarantees

2018-05-10 Thread Sergey Zhemzhitsky
…/a6fc300e91273230e7134ac6db95ccb4436c6f8f/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L36 [3] https://github.com/apache/spark/blob/3990daaf3b6ca2c5a9f7790030096262efb12cb2/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1204 On Thu, May 10, 2018 at 10:24 PM, Sergey Zhemzhitsky <sz…

Accumulator guarantees

2018-05-10 Thread Sergey Zhemzhitsky
Hi there, Although Spark's docs state that there is a guarantee that - accumulators in actions will only be updated once - accumulators in transformations may be updated multiple times ... I'm wondering whether the same holds for transformations in the last stage of the job, or whether there is a…
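The distinction the docs draw can be simulated without a cluster: under task retries, accumulator updates from transformations are simply re-applied, while a scheduler can deduplicate updates per (stage, partition) for result tasks. The sketch below is an illustration of that guarantee, not Spark's actual scheduler code.

```python
# Simulate a stage of three task runs where partition 1 is retried,
# comparing naive accumulation (transformations) with per-partition
# deduplication (the exactly-once guarantee for actions).

class Accumulator:
    def __init__(self):
        self.value = 0
    def add(self, n):
        self.value += n

def run_task(acc, partition, seen, dedupe):
    key = ("stage-0", partition)
    if dedupe and key in seen:
        return                      # update already counted: drop the retry
    seen.add(key)
    acc.add(1)

naive, seen_naive = Accumulator(), set()
exact, seen_exact = Accumulator(), set()

for partition in [0, 1, 1]:         # partition 1 runs twice (a retry)
    run_task(naive, partition, seen_naive, dedupe=False)
    run_task(exact, partition, seen_exact, dedupe=True)

print(naive.value)  # 3: the retried partition is counted twice
print(exact.value)  # 2: each partition is counted exactly once
```

The question in this thread is essentially whether the last stage's transformations fall on the deduplicated side of this line.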

Re: AccumulatorV2 vs AccumulableParam (V1)

2018-05-04 Thread Sergey Zhemzhitsky
…if flexibility is more important to them. We can keep improving accumulator v2 without breaking backward compatibility. Thanks, Wenchen. On Thu, May 3, 2018 at 6:20 AM, Sergey Zhemzhitsky <szh.s...@gmail.com> wrote: > Hello guys, …

AccumulatorV2 vs AccumulableParam (V1)

2018-05-02 Thread Sergey Zhemzhitsky
Hello guys, I've started migrating my Spark jobs that use Accumulators V1 to AccumulatorV2 and ran into the following issues: 1. LegacyAccumulatorWrapper now requires the result type of AccumulableParam to implement equals. Otherwise the AccumulableParam, automatically wrapped into…
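The equals requirement makes sense once you see how a legacy-style wrapper might implement `isZero`: by comparing the current value against a freshly built zero element, which silently misbehaves for types with only reference equality. The sketch below is a hypothetical illustration of that failure mode; the class names do not match Spark's actual implementation.

```python
# Why a wrapper around a legacy accumulable param needs value equality
# on the accumulated type.

class Box:                       # no __eq__: reference equality only
    def __init__(self, n):
        self.n = n

class EqBox(Box):                # value equality, as the wrapper expects
    def __eq__(self, other):
        return isinstance(other, Box) and self.n == other.n

class SumParam:                  # legacy-style accumulable param
    def zero(self):
        return Box(0)
    def add(self, a, b):
        return Box(a.n + b.n)

class LegacyWrapper:
    def __init__(self, param, initial):
        self.param = param
        self.value = initial
    def is_zero(self):
        # relies on value equality with a freshly built zero element
        return self.value == self.param.zero()

print(LegacyWrapper(SumParam(), Box(0)).is_zero())    # False: broken!
print(LegacyWrapper(SumParam(), EqBox(0)).is_zero())  # True
```

Without `__eq__`, a logically-zero accumulator never compares equal to `zero()`, so the wrapper can misreport its state.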

Re: DataFrames :: Corrupted Data

2018-03-28 Thread Sergey Zhemzhitsky
…the job completes successfully. On Wed, Mar 28, 2018 at 10:31 PM, Jörn Franke <jornfra...@gmail.com> wrote: > Encoding issue of the data? E.g. Spark uses UTF-8, but the source encoding is different? > On 28. Mar 2018, at 20:25, Sergey Zhemzhitsky <szh.s...@gmail.com> wrote…

DataFrames :: Corrupted Data

2018-03-28 Thread Sergey Zhemzhitsky
Hello guys, I'm using Spark 2.2.0, and from time to time my job fails, printing the following errors into the log: scala.MatchError: profiles.total^@^@f2-a733-9304fda722ac^@^@^@^@profiles.10361.10005^@^@^@^@.total^@^@0075^@^@^@^@ scala.MatchError: pr^?files.10056.10040 (of class java.lang.String)

Best way of shipping self-contained pyspark jobs with 3rd-party dependencies

2017-12-08 Thread Sergey Zhemzhitsky
Hi PySparkers, What is currently the best way of shipping self-contained pyspark jobs with 3rd-party dependencies? There are some open JIRA issues [1], [2] as well as corresponding PRs [3], [4] and articles [5], [6], [7] regarding setting up the python environment with conda and virtualenv…
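One approach commonly described in articles like those linked is to pack the whole environment into an archive and ship it with the job. The commands below are a hedged sketch of that workflow using conda-pack (which postdates some of the linked material); the environment name, archive alias, and cluster settings are placeholders, and the exact flags depend on your Spark and cluster versions.

```shell
# Build and pack a self-contained environment (assumes conda and
# conda-pack are installed; package list is illustrative).
conda create -y -n myjob python=3.6 numpy pandas
conda pack -n myjob -o myjob_env.tar.gz

# Ship the packed env alongside the job; executors unpack the archive
# under the alias "environment" and use its interpreter.
spark-submit \
  --master yarn --deploy-mode cluster \
  --archives myjob_env.tar.gz#environment \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./environment/bin/python \
  my_job.py
```

The alternative discussed in the linked JIRAs/PRs is first-class support for creating such environments on the executors themselves, rather than packing them up front.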

Best way of shipping self-contained pyspark jobs with 3rd-party dependencies

2017-12-07 Thread Sergey Zhemzhitsky
Hi PySparkers, What is currently the best way of shipping self-contained pyspark jobs with 3rd-party dependencies? There are some open JIRA issues [1], [2] as well as corresponding PRs [3], [4] and articles [5], [6], regarding setting up the python environment with conda and virtualenv…

What is the purpose of having RDD.context and RDD.sparkContext at the same time?

2017-06-27 Thread Sergey Zhemzhitsky
Hello spark gurus, Could you please shed some light on the purpose of having two identical functions in RDD, RDD.context [1] and RDD.sparkContext [2]? RDD.context seems to be used more frequently across the source code. [1]
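The situation described is the classic "two public accessors, one object" pattern, typically kept for source compatibility. Here is a Spark-free Python sketch of it; the names mirror the RDD methods in question but the classes are illustrative, not Spark's.

```python
# Two accessors returning the same underlying object, one of them a
# historical alias kept so existing callers do not break.

class SparkContext:
    pass

class RDD:
    def __init__(self, sc):
        self._sc = sc

    @property
    def sparkContext(self):
        return self._sc

    @property
    def context(self):
        # historical alias; both accessors return the same context
        return self._sc

sc = SparkContext()
rdd = RDD(sc)
print(rdd.context is rdd.sparkContext)   # True
```

Removing either accessor would be a breaking API change, which is the usual reason both survive side by side.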

Re: Is GraphX really deprecated?

2017-05-16 Thread Sergey Zhemzhitsky
…lace, respectively. Jacek. On 13 May 2017 3:00 p.m., "Sergey Zhemzhitsky" <szh.s...@gmail.com> wrote: > Hello Spark users, > I just would like to know whether the GraphX component should be considered deprecated and no longer actively maintained and should not…

Is GraphX really deprecated?

2017-05-13 Thread Sergey Zhemzhitsky
Hello Spark users, I just would like to know whether the GraphX component should be considered deprecated and no longer actively maintained, and whether it should be avoided in favour of other graph-processing frameworks when starting new graph-processing projects on top of Spark. I'm asking…