Hi,
I am using Spark 2.3.2 and I am running into data locality issues: even after
setting spark.locality.wait.rack=200, the locality level is always RACK_LOCAL.
Can someone help me with this?
Thank you
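For reference, a minimal sketch of how these locality settings could be passed, assuming the job builds its own SparkSession (the app name and values below are illustrative, not taken from the job above). Note that Spark time settings given without a unit are read as milliseconds, so a bare 200 means 200 ms, and raising the waits only helps if executors with better locality actually exist:

import org.apache.spark.sql.SparkSession

// Sketch only: locality waits take a duration; "200" alone is 200 ms, "200s" is 200 seconds.
val spark = SparkSession.builder()
  .appName("locality-wait-sketch")            // hypothetical app name
  .config("spark.locality.wait", "30s")       // parent default for all levels
  .config("spark.locality.wait.process", "30s")
  .config("spark.locality.wait.node", "30s")
  .config("spark.locality.wait.rack", "30s")
  .getOrCreate()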
On Thu, Jul 12, 2018 at 10:23 AM Arun Mahadevan wrote:
> Yes, ForeachWriter [1] could be an option if you want to write to different
> sinks. You can put your custom logic to split the data into different sinks.
>
> The drawback here is that you cannot plug in existing sinks like Kafka and
> you ne
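For illustration, a rough sketch of that ForeachWriter approach (using the Spark 2.3-era open(partitionId, version) signature; the socket source and the two println "sinks" are placeholders for whatever real clients you would open in open() and close in close()):

import org.apache.spark.sql.{ForeachWriter, SparkSession}

val spark = SparkSession.builder().appName("foreach-split-sketch").getOrCreate()
import spark.implicits._

// Placeholder source: lines from a local socket.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()
  .as[String]

val query = lines.writeStream
  .foreach(new ForeachWriter[String] {
    override def open(partitionId: Long, version: Long): Boolean = true

    override def process(value: String): Unit = {
      // Custom routing logic goes here: each record is sent to one "sink".
      if (value.startsWith("a")) println(s"sink-A: $value")
      else println(s"sink-B: $value")
    }

    override def close(errorOrNull: Throwable): Unit = ()
  })
  .start()

query.awaitTermination()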
https://issues.apache.org/jira/browse/KAFKA-4208), is it supported in Spark?
If yes, can anyone point me to an example?
- Karthik
For posterity, I found the root cause and filed a JIRA:
https://issues.apache.org/jira/browse/SPARK-21960. I plan to open a pull
request with the minor fix.
From: Karthik Palaniappan
Sent: Friday, September 1, 2017 9:49 AM
To: Akhil Das
Cc: user@spark.apache.org
Any ideas @Tathagata? I'd be happy to contribute a patch if you can point me in
the right direction.
From: Karthik Palaniappan
Sent: Friday, August 25, 2017 9:15 AM
To: Akhil Das
Cc: user@spark.apache.org; t...@databricks.com
Subject: RE: [Spark Stre
I definitely agree that dynamic allocation is useful, that's why I asked the
question :p
More specifically, does Spark plan to solve the problems with DRA for
Structured Streaming mentioned in that Cloudera article?
If folks can give me pointers on where to start, I'd be happy to implement
s
explicitly set it to 0 after
hitting that error.
Setting executor cores > 1 seems like reasonable advice in general, but that
shouldn’t be my issue here, right?
From: Akhil Das <ak...@hacked.work>
Sent: Thursday, August 24, 2017 2:34 AM
To: Karthik Palaniappan
I ran the HdfsWordCount example using this command:
spark-submit run-example \
--conf spark.streaming.dynamicAllocation.enabled=true \
--conf spark.executor.instances=0 \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.master=yarn \
--conf spark.submit.deployMode=client \
https://issues.apache.org/jira/browse/SPARK-12133. Is that actually
a supported feature? Or was that just an experiment? I had trouble getting this
to work, but I'll follow up in a different thread.
Also, does Structured Streaming have its own dynamic allocation algorithm?
Thanks,
Karthik Palaniappan
:18:24 ERROR client.TransportResponseHandler: Still have 1
requests outstanding when connection from /10.0.2.15:54561 is closed
Please advise.
Sincerely,
Karthik
me.
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/linalg/MatrixUDT.scala
Can anyone help me with this?
Really appreciate your help.
Thanks
Karthik Vadla
We used Storm for ETL; we are now thinking Spark might be advantageous
since some ML work is also coming our way.
- Karthik
On Tue, Aug 2, 2016 at 1:10 PM, Rohit L wrote:
> Does anyone use Spark for ETL?
>
> On Tue, Aug 2, 2016 at 1:24 PM, Sonal Goyal wrote:
>
>> Hi Rohit,
reload this RDD
every, say, 10 minutes.
Is this possible?
Apologies if this has been asked before.
Cheers,
Karthik
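A rough sketch of one way to do such a periodic reload, assuming the RDD comes from a path that can simply be re-read (the path, interval, and background-thread timer are all illustrative):

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("rdd-reload-sketch"))

// Hypothetical lookup data that should be refreshed periodically.
@volatile var lookup: RDD[String] = sc.textFile("/data/lookup").cache()

val reloader = new Thread(new Runnable {
  override def run(): Unit = {
    while (true) {
      Thread.sleep(10 * 60 * 1000L)  // every 10 minutes
      val fresh = sc.textFile("/data/lookup").cache()
      fresh.count()                  // materialize before swapping in
      val old = lookup
      lookup = fresh
      old.unpersist()
    }
  }
})
reloader.setDaemon(true)
reloader.start()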
creating the SparkContext and setting the
property should work.
Thanks
Karthik
On Monday, November 9, 2015, Akhil Das wrote:
> You can set it in your conf/spark-defaults.conf file, or you will have to
> set it before you create the SparkContext.
>
> Thanks
> Best Regards
>
>
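A minimal sketch of that suggestion, with the property set on the SparkConf before the StreamingContext is created (the app name, batch interval, and 2g value are illustrative):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Set spark.driver.maxResultSize before the StreamingContext is created,
// as suggested above.
val conf = new SparkConf()
  .setAppName("max-result-size-sketch")
  .set("spark.driver.maxResultSize", "2g")

val ssc = new StreamingContext(conf, Seconds(10))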
Has anyone had issues setting the spark.driver.maxResultSize value?
On Friday, October 30, 2015, karthik kadiyam
wrote:
> Hi Shahid,
>
> I played around with the Spark driver memory too. In the conf file it was set
> to "--driver-memory 20G" first. When I changed the spark
> I guess you need to increase the Spark driver memory as well. But that should
> be set in conf files.
> Let me know if that resolves it.
> On Oct 30, 2015 7:33 AM, "karthik kadiyam"
> wrote:
>
>> Hi,
>>
>> In my Spark Streaming job I had the following setting
>
Hi,
In my Spark Streaming job I had the following setting:
this.jsc.getConf().set("spark.driver.maxResultSize", "0");
and I got the following error in the job:
User class threw exception: Job aborted due to stage failure: Total size of
serialized results of 120 tasks (1082.2 MB) is bigger than spark.driver.maxResultSize
Any ideas or suggestions?
Thanks,
Karthik.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Querying-on-multiple-Hive-stores-using-Apache-Spark-tp24765p24797.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
I have a Spark application which successfully connects to Hive and queries
Hive tables using the Spark engine.
To build this, I just added hive-site.xml to the classpath of the application,
and Spark reads the hive-site.xml to connect to its metastore. This
method was suggested on Spark's mailing list.
*Problem Statement:*
While querying a partitioned table using Spark SQL (version 1.4.0), an
access-denied exception is observed on partitions the user doesn't belong
to (user permissions are controlled using HDFS ACLs). The same query works
correctly in Hive.
*Use case:* /To address multitenancy/