to assign fixed JMX ports to driver and executors.
>
> @Alex,
> Is there any difference in fetching data via JMX or using banzaicloud jar.
>
>
> On Fri, 13 Sep 2019 at 10:47, Alex Landa wrote:
>
>> Hi,
>> We are starting to use https://github.com/banzaicloud/spark-m
Hi,
We are starting to use https://github.com/banzaicloud/spark-metrics .
Keep in mind that their solution targets Spark on K8s; to make it work for
Spark on YARN you have to copy the spark-metrics dependencies into the
Spark jars folder on all the Spark machines (it took me a while to figure
that out).
Thanks,
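On the fixed JMX ports mentioned above: exposing JMX on a fixed port is typically done through the extra Java options. A sketch, not a verified recipe; the port numbers, master host, main class, and jar name below are all placeholders, and a fixed executor port will clash if two executors land on the same host:

```shell
# Sketch: expose JMX on fixed ports for driver and executors.
# Ports, class, and jar are placeholders; pick ports open in your environment.
# Caveat: a fixed executor port clashes when two executors share a host.
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=9010 \
-Dcom.sun.management.jmxremote.rmi.port=9010 \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false" \
  --conf "spark.executor.extraJavaOptions=-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=9011 \
-Dcom.sun.management.jmxremote.rmi.port=9011 \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false" \
  --class com.example.Main \
  your-app.jar
```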
Hi,
It sounds similar to what we do in our application.
We don't serialize every row; instead we first group the rows into the
desired representation and then apply protobuf serialization using map and
a lambda.
I suggest not serializing the entire DataFrame into a single protobuf
message since
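A minimal sketch of that group-then-serialize idea in plain Scala. The `Row` case class and `serializeGroup` helper are hypothetical stand-ins: in the real application the serializer would be a generated protobuf builder plus `toByteArray`, and the grouping would be `groupByKey(...).mapGroups(...)` on a Dataset rather than `groupBy` on a Seq:

```scala
// Sketch: group rows into the desired representation first, then serialize
// each group as its own message, instead of one message for everything.
case class Row(userId: String, value: Int)

// Placeholder for a generated protobuf builder, e.g.
// UserValues.newBuilder().setUserId(id).addAllValues(...).build().toByteArray
def serializeGroup(userId: String, values: Seq[Int]): Array[Byte] =
  s"$userId:${values.mkString(",")}".getBytes("UTF-8")

val rows = Seq(Row("a", 1), Row("a", 2), Row("b", 3))

// On a Dataset this would be groupByKey(_.userId).mapGroups(serializeGroup)
val messages: Map[String, Array[Byte]] =
  rows.groupBy(_.userId).map { case (id, rs) =>
    id -> serializeGroup(id, rs.map(_.value))
  }
```

Each group becomes one reasonably sized message, so no single message has to hold the whole DataFrame.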
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> 1. https://issues.apache.org/jira/browse/SPARK-26606
>
> On Tue, Aug 20, 2019 at 3:43 AM Alex Landa wrote:
>
>> Hi,
>>
>> We are using Spark Standalone 2.4.0 in production and publishing our
>> Scala app using cluster mode.
Hi,
We are using Spark Standalone 2.4.0 in production and publishing our Scala
app using cluster mode.
I noticed that extra Java options passed to the driver don't actually take
effect.
A submit example:
spark-submit --deploy-mode cluster --master spark://:7077
--driver-memory 512mb --conf
"spark.driver.ext
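For what it's worth, a workaround sometimes suggested for driver JVM flags not taking effect is to pass them with `--driver-java-options` on the command line rather than via a `--conf` property. Whether that helps in standalone cluster mode here is an assumption; the host, flags, class, and jar below are placeholders:

```shell
# Sketch (placeholders throughout): hand driver JVM flags to spark-submit
# directly via --driver-java-options instead of spark.driver.extraJavaOptions.
spark-submit \
  --deploy-mode cluster \
  --master spark://master-host:7077 \
  --driver-memory 512m \
  --driver-java-options "-XX:+UseG1GC -Dmy.app.flag=value" \
  --class com.example.Main \
  your-app.jar
```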
>
>
> On Sun, Jul 21, 2019 at 12:19 AM Alex Landa wrote:
>
>> Thanks,
>> I looked into these options, the cleaner periodic interval is set to 30
>> min by default.
>> The blocking option for shuffle -
>> spark.cleaner.referenceTracking.blocking.shuffle
> spark.cleaner.referenceTracking.blocking.shuffle
>
> Regards
> Prathmesh Ranaut
>
> On Jul 21, 2019, at 11:31 AM, Alex Landa wrote:
>
> Hi,
>
> We are running a long running Spark application ( which executes lots of
> quick jobs using our scheduler ) on Spark stand-alone
Hi,
We are running a long-running Spark application (which executes lots of
quick jobs using our scheduler) on a Spark standalone 2.4.0 cluster.
We see that old shuffle files (a week old, for example) are not deleted
during the execution of the application, which leads to an out-of-disk-space
error
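Pulling the two cleaner settings from this thread together: shuffle files are removed by the ContextCleaner once the driver GC collects the shuffle references, so a long-running driver with a quiet heap can hold them for a long time. A sketch of making the cleaner more aggressive, using the periodic-GC interval (30 min by default, as noted above) and the blocking shuffle option; the interval value, class, and jar are placeholders:

```shell
# Sketch (values are placeholders): tighten the ContextCleaner for a
# long-running driver. periodicGC.interval forces a periodic driver GC so
# stale shuffle references get collected; blocking.shuffle makes the
# cleaning thread block until shuffle remove requests complete.
spark-submit \
  --conf spark.cleaner.periodicGC.interval=10min \
  --conf spark.cleaner.referenceTracking.blocking.shuffle=true \
  --class com.example.Main \
  your-app.jar
```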