I am currently using Spark 2.3.0. Will try it with 2.3.1
On Tue, Jul 3, 2018 at 3:12 PM, Shixiong(Ryan) Zhu
wrote:
> Which version are you using? There is a known issue regarding this, and it
> should be fixed in 2.3.1. See
> https://issues.apache.org/jira/browse/SPARK-23623 for details.
>
> Best R
Which version are you using? There is a known issue regarding this, and it
should be fixed in 2.3.1. See
https://issues.apache.org/jira/browse/SPARK-23623 for details.
Best Regards,
Ryan
On Mon, Jul 2, 2018 at 3:56 AM, kant kodali wrote:
> Hi All,
>
> I get the below error quite often when I do an
Could you please describe the version of Spark and how you ran your app? If
you don't mind sharing a minimal app which can reproduce this, it would be
really great.
- Jungtaek Lim (HeartSaVioR)
On Mon, 2 Jul 2018 at 7:56 PM kant kodali wrote:
> Hi All,
>
> I get the below error quite often w
Hi,
We have two Spark streaming jobs, one using DStreams and the other using
Structured Streaming. I have observed that the number of records per
micro-batch (per trigger in the case of Structured Streaming) is not the same
between the two jobs. The Structured Streaming job has higher numbers
compared to
Thanks
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
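One way to check the per-trigger numbers on the Structured Streaming side is to read the query's progress reports. Below is a minimal sketch, not the original poster's code; the rate source and app name are placeholders for whatever source the real job reads from:

from pyspark.sql import SparkSession
import time

spark = SparkSession.builder.appName("progress-check").getOrCreate()

# Placeholder source; substitute the Kafka/file source the real job uses.
stream = spark.readStream.format("rate").load()
query = stream.writeStream.format("console").start()

# Log the number of records in each trigger from the latest progress report.
while query.isActive:
    p = query.lastProgress
    if p is not None:
        print("batch", p["batchId"], "numInputRows", p["numInputRows"])
    time.sleep(10)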
I'm still validating my results, but my solution for the moment looks like
the below. I'm presently dealing with one-hot encoded values, so all the
numbers in my array are 1:
from pyspark.sql import functions as F
from pyspark.ml.linalg import SparseVector, VectorUDT

def udfMaker(feature_len):
    # Build a UDF that maps a list of indices to a one-hot SparseVector.
    return F.udf(lambda x: SparseVector(feature_len, sorted(x), [1.0] * len(x)), VectorUDT())
in
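For reference, a small usage sketch (the DataFrame and column names here are made up, not from the original mail): given a column holding lists of category indices, the generated UDF produces a fixed-length one-hot SparseVector:

# Hypothetical DataFrame with a column of category-index lists.
df = spark.createDataFrame([([0, 2],), ([1],)], ["indices"])
to_vec = udfMaker(feature_len=4)
df.withColumn("features", to_vec("indices")).show(truncate=False)
# [0, 2] -> (4,[0,2],[1.0,1.0])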
Thanks a ton!
On Tue, Jul 3, 2018 at 6:26 PM, Vadim Semenov wrote:
> As with typical `JAVA_OPTS`, you need to pass them as a single parameter:
>
> --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:-ResizePLAB"
>
> Also, you have an extra space in the parameter; there should be no space
> after the col
You can't change the executor/driver cores/memory on the fly once
you've already started a Spark Context.
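As a rough illustration (the app name and values below are placeholders), executor resources have to be fixed before the context starts, e.g. in the session builder or as spark-submit flags:

from pyspark.sql import SparkSession

# These take effect only when the application starts; they cannot be
# changed for an already-running SparkContext.
spark = (SparkSession.builder
         .appName("example")
         .config("spark.executor.cores", "5")
         .config("spark.executor.memory", "8g")
         .getOrCreate())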
On Tue, Jul 3, 2018 at 4:30 AM Aakash Basu wrote:
>
> We aren't using Oozie or anything similar. Moreover, the end-to-end job will
> be exactly the same, but the data will be extremely different (num
As with typical `JAVA_OPTS`, you need to pass them as a single parameter:
--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:-ResizePLAB"
Also, you have an extra space in the parameter; there should be no space
after the colon.
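The same setting expressed programmatically, as a sketch (the app name is a placeholder; both flags go into one configuration value, separated by a space):

from pyspark.sql import SparkSession

# No space after the colon in -XX:-ResizePLAB.
spark = (SparkSession.builder
         .appName("g1gc-example")
         .config("spark.executor.extraJavaOptions", "-XX:+UseG1GC -XX:-ResizePLAB")
         .getOrCreate())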
On Tue, Jul 3, 2018 at 3:01 AM Aakash Basu wrote:
>
> Hi,
>
> I used
Hello Dear Spark User / Dev,
I would like to pass a Python user-defined function to a Spark job developed
in Scala, and have the return value of that function available to the DF /
Dataset API.
Can someone please guide me on the best approach to do this?
The Python function would be mostly transfor
We aren't using Oozie or anything similar. Moreover, the end-to-end job will be
exactly the same, but the data will be extremely different (number of
continuous and categorical columns, vertical size, horizontal size, etc.);
hence, if there were a calculation of the parameters to arrive
at a conc
Don’t do this inside your job. Create different jobs for the different types of
work and orchestrate them using Oozie or similar.
> On 3. Jul 2018, at 09:34, Aakash Basu wrote:
>
> Hi,
>
> Cluster - 5 node (1 Driver and 4 workers)
> Driver Config: 16 cores, 32 GB RAM
> Worker Config: 8 cores, 16 GB RAM
Hi,
Cluster - 5 node (1 Driver and 4 workers)
Driver Config: 16 cores, 32 GB RAM
Worker Config: 8 cores, 16 GB RAM
I'm using the below parameters, of which I know the first chunk is
cluster-dependent and the second chunk is data/code-dependent.
--num-executors 4
--executor-cores 5
--executor-me
Hi,
I used the below in the Spark Submit for using G1GC -
--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC"
Now, I want to use *-XX: -ResizePLAB* of G1GC to avoid performance
degradation caused by a large number of thread communications.
How do I do it? I tried submitting in th