We scanned three versions of Spark: 3.0.0, 3.1.3, and 3.2.1.
On Tue, 26 Apr, 2022, 18:46 Bjørn Jørgensen wrote:
> What version of spark is it that you have scanned?
>
>
>
> On Tue, 26 Apr 2022 at 12:48, HARSH TAKKAR wrote:
>
>> Hello,
>>
>> Please let me know if th
-14379
CVE-2019-12086
CVE-2018-7489
CVE-2018-5968
CVE-2018-14719
CVE-2018-14718
CVE-2018-12022
CVE-2018-11307
CVE-2017-7525
CVE-2017-17485
CVE-2017-15095
Kind Regards
Harsh Takkar
On 2021/09/02 06:00:26, Harsh Sharma wrote:
> Please find the replies below:
>
> > Do you know when in your application lifecycle it happens? Spark SQL or
> > Structured Streaming?
>
> Ans: it is Spark SQL.
>
> > Do you use broadcast variables?
>
> Ans: yes, we
[01/09/21 11:55:51,861 WARN pool-1-thread-1](Client) Exception encountered
while connecting to the server : java.lang.NullPointerException
[01/09/21 11:55:51,862 WARN pool-1-thread-1](Client) Exception encountered
while connecting to the server : java.lang.NullPointerException
[01/09/21
> Follow me on https://twitter.com/jaceklaskowski
>
> On Mon, Aug 30, 2021 at 3:26 PM Harsh Sharma
> wrote:
>
> > We are facing an issue in production where we are getting frequent
> >
> > Still
We are facing an issue in production where we frequently get "Still have 1
request outstanding when connection with the hostname was closed" and
"connection reset by peer" errors, as well as warnings such as "failed to
remove cache rdd" and "failed to remove broadcast variable".
Please help us how to
Hi Team,
We are upgrading our Cloudera parcels from 5.x to 6.x, and hence have
upgraded the Spark version from 1.6 to 2.4. While executing a Spark program
we are getting the below error:
Please help us resolve this in Cloudera parcels. There is a suggestion to
install Spark gateway roles.
[Stage 284:>(199 + 1) / 200][Stage 292:> (1 + 3) / 200]
[Stage 284:>(199 + 1) / 200][Stage 292:> (2 + 3) / 200]
[Stage 292:> (2 + 4) /
200][14/06/21 10:46:17,006 WARN
Hello Sean,
Thanks for the advice. Can you please point me to an example where I can
find a custom wrapper for Python?
Kind Regards
Harsh Takkar
On Tue, 16 Feb, 2021, 8:25 pm Sean Owen wrote:
> You won't be able to use it in python if it is implemented in Java - needs
> a python wrapp
in the classpath using "spark.jars"
Can you please help if I am missing something?
Kind Regards
Harsh Takkar
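A minimal sketch of wiring an extra jar onto the classpath via spark.jars
(the path below is hypothetical; the property takes a comma-separated list
and has the same effect as passing --jars to spark-submit):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("CustomJarExample")
  // Hypothetical path; the jar is shipped to the driver and all executors.
  .config("spark.jars", "/opt/libs/custom-udfs.jar")
  .getOrCreate()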
As per the solution, if we are closing and restarting the query, then what
happens to the state which is maintained in memory? Will it be retained?
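For context, a minimal sketch of the usual answer: the in-memory state of a
stateful query survives a stop/restart only when the query runs with a
checkpoint location, from which the state store is recovered (source, sink
and paths below are hypothetical):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("StateRecovery").getOrCreate()

// Hypothetical socket source; any streaming source behaves the same way.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

val counts = lines.groupBy("value").count() // stateful aggregation

val query = counts.writeStream
  .outputMode("complete")
  // State is persisted under this path; restarting the same query with the
  // same checkpoint location recovers the aggregation state.
  .option("checkpointLocation", "/tmp/ckpt/word-counts")
  .format("console")
  .start()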
Hi Kun,
You can use the following Spark property while launching the app, instead
of manually enabling it in the code:
spark.sql.catalogImplementation=hive
Kind Regards
Harsh
On Tue, May 26, 2020 at 9:55 PM Kun Huang (COSMOS) wrote:
>
> Hi Spark experts,
>
> I am seeking for
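A minimal sketch of the suggestion above: setting the property on the
builder is the programmatic equivalent of enableHiveSupport(), and the same
key can instead be passed at launch with
spark-submit --conf spark.sql.catalogImplementation=hive:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("HiveCatalog")
  // Equivalent to calling enableHiveSupport() on the builder.
  .config("spark.sql.catalogImplementation", "hive")
  .getOrCreate()

spark.sql("SHOW DATABASES").show()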
Hi
How can we deserialise Avro records read from Kafka in Spark 2.3.0 in an
optimised manner? I can see that native support for Avro was added in
2.4.x.
Currently I am using the following library, which is very slow:
com.twitter : bijection-avro_2.11
Kind Regards
Harsh Takkar
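One way to cut the per-record overhead without bijection is to decode
inside mapPartitions, building the Avro reader once per partition and
reusing the decoder across records. A sketch, with a hypothetical writer
schema:

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.{BinaryDecoder, DecoderFactory}
import org.apache.spark.rdd.RDD

// Hypothetical writer schema for the Kafka payload.
val schemaJson =
  """{"type":"record","name":"Event","fields":[{"name":"id","type":"long"}]}"""

def decode(raw: RDD[Array[Byte]]): RDD[GenericRecord] =
  raw.mapPartitions { iter =>
    // Parse the schema and build the reader once per partition, not per record.
    val schema = new Schema.Parser().parse(schemaJson)
    val reader = new GenericDatumReader[GenericRecord](schema)
    var decoder: BinaryDecoder = null
    iter.map { bytes =>
      decoder = DecoderFactory.get().binaryDecoder(bytes, decoder)
      reader.read(null, decoder)
    }
  }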
Hi,
Is it possible to read a 7z compressed file in Spark?
Kind Regards
Harsh Takkar
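Spark has no built-in 7z codec and 7z is not splittable, so one workaround
is to read whole archives with binaryFiles and unpack them with
commons-compress. A sketch under those assumptions (paths hypothetical; a
SparkContext named sc is assumed; SevenZFile needs seekable input, hence
the local temp file):

import java.io.{File, FileOutputStream}
import org.apache.commons.compress.archivers.sevenz.SevenZFile

val texts = sc.binaryFiles("hdfs:///data/archives/*.7z").flatMap { case (_, pds) =>
  // Spool the whole archive to a local temp file on the executor.
  val tmp = File.createTempFile("spark-7z", ".7z")
  val out = new FileOutputStream(tmp)
  out.write(pds.toArray())
  out.close()
  val sevenZ = new SevenZFile(tmp)
  val entries = Iterator
    .continually(sevenZ.getNextEntry)
    .takeWhile(_ != null)
    .filterNot(_.isDirectory)
    .map { entry =>
      // A single read is a simplification; very large entries may need a loop.
      val buf = new Array[Byte](entry.getSize.toInt)
      sevenZ.read(buf)
      (entry.getName, new String(buf, "UTF-8"))
    }
    .toList // materialise before closing the archive
  sevenZ.close()
  tmp.delete()
  entries
}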
Hi
10
Time taken: 0.356 seconds, Fetched: 2 row(s)
hive> describe longpartition;
OK
b string
a bigint
# Partition Information
# col_name data_type comment
a bigint
On Mon, Dec 16, 2019 at 11:05 AM SB M wrote:
> spark version 2
Please share the Spark version you are using.
On Fri, 13 Dec, 2019, 4:02 PM SB M wrote:
> Hi All,
> I am trying to create a dynamic partition with an external table on the
> hive metastore using Spark SQL.
>
> When I am trying to create a partition column with data type bigint, the
> partition is not
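For reference, a hedged sketch of the kind of DDL under discussion - an
external table partitioned by a bigint column, written with dynamic
partitioning (table name, location and data are hypothetical; an active
SparkSession named spark is assumed):

import spark.implicits._

spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS longpartition (b STRING)
  PARTITIONED BY (a BIGINT)
  STORED AS PARQUET
  LOCATION '/tmp/warehouse/longpartition'
""")
// Let partition values come from the data rather than a static clause.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
Seq(("x", 1L), ("y", 2L)).toDF("b", "a")
  .write.mode("append").insertInto("longpartition")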
Please refer to the following documentation on how to write data into Hive
in HDP 3.1:
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html
Harsh
On Fri, 9 Aug, 2019, 10:21 PM Mich Talebzadeh wrote:
rate, it will increase the consumption of records for processing in each
batch.
However, I feel 10 is far too low a number for a 32-partition Kafka topic.
Regards
Harsh
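For reference, a sketch of the two settings under discussion (the value is
hypothetical; with 32 partitions and a 5-second batch, 1000 records per
partition per second caps a batch at 160,000 records):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Let Spark adapt the ingestion rate to the observed processing rate.
  .set("spark.streaming.backpressure.enabled", "true")
  // Per-partition, per-second upper bound on records pulled from Kafka.
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")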
Happy New Year
On Wed 2 Jan, 2019, 08:33 JF Chen wrote:
> I have set spark.streaming.backpressure.enabled to true
Hi
You can call the Java program directly through PySpark.
The following code will help:
sc._jvm..
Harsh Takkar
On Sun, Aug 12, 2018 at 9:27 PM amit kumar singh
wrote:
> Hi /team,
>
> The way we call a Java program to execute a stored procedure -
> is there any way we
Hi
You can access your Java packages in PySpark using the following:
obj = sc._jvm.yourPackage.className()
Kind Regards
Harsh Takkar
On Wed, Jul 18, 2018 at 4:00 AM Mohit Jaggi wrote:
> Thanks 0xF0F0F0 and Ashutosh for the pointers.
>
> Holden,
> I am trying to look into sparklingml
Hi,
Is there a way to load a model saved using the sklearn lib in PySpark or
Scala Spark for prediction?
Thanks
Hi
I have a dataframe with a field of type array which is of large size. When
I try to save the data to a parquet file and read it again, the array
field comes out as an empty array.
Please help
Harsh
How do I get the status of such long-running jobs so that I can do the
further tasks on my remote machine after the job completion? Livy is one
choice but I want to do it without that, if possible.
*Thanks!*
Harsh Choudhary
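One Livy-free option is the monitoring REST API that the driver UI exposes
while the application runs (the history server serves the same API for
finished applications). A sketch with a hypothetical host:

import scala.io.Source

// Port 4040 is the default UI port of a running driver; the history server
// defaults to 18080.
val apps = Source.fromURL("http://driver-host:4040/api/v1/applications").mkString
println(apps) // JSON list of applications with their status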
Hi
Does Random Forest in Spark ML support multi-label classification in
Scala?
I found that sklearn provides sklearn.ensemble.RandomForestClassifier in
Python; do we have similar functionality in Scala?
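As far as I know, Spark ML's RandomForestClassifier handles multiclass but
not multilabel targets directly; one common workaround is binary relevance,
fitting one binary model per label. A sketch with hypothetical column names:

import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.sql.DataFrame

// One binary classifier per label column; `train` must contain a "features"
// vector column plus one 0/1 column per label.
def fitPerLabel(train: DataFrame, labelCols: Seq[String]) =
  labelCols.map { label =>
    val rf = new RandomForestClassifier()
      .setFeaturesCol("features")
      .setLabelCol(label)
      .setPredictionCol(s"prediction_$label")
      .setRawPredictionCol(s"raw_$label")
      .setProbabilityCol(s"prob_$label")
    label -> rf.fit(train)
  }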
Hi
I am using the Cloudera (CDH 5.11.0) setup, which has Hive version 1.1.0,
but when I build Spark with Hive and Thrift support it packs Hive version
1.6.0.
Please let me know how I can build Spark with Hive 1.1.0.
The command I am using to build:
./dev/make-distribution.sh --name
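For reference, a typical full invocation with Hive support looks like the
following (profile flags from the Spark build docs); whether hive.version
can be pinned back to 1.1.0 depends on the Spark release, so this is a
sketch rather than a confirmed recipe:

./dev/make-distribution.sh --name custom-hive --tgz -Phive -Phive-thriftserver -Pyarn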
Hi
I am facing the same issue while launching the application inside a Docker
container.
Kind Regards
Harsh
>> Are the multiple rows being written dupes (do they all have the same
>> fields/values)?
>> Hth
>>
>>
>> On Oct 17, 2017 1:08 PM, "Harsh Choudhary" <shry.ha...@gmail.com> wrote:
>>
>>> This is the code -
>>> hdfs_path=
>>> if(hd
val updatelambdaReq: InvokeRequest = new InvokeRequest()
updatelambdaReq.setFunctionName(updateFunctionName)
updatelambdaReq.setPayload(updatedLambdaJson.toString())
println("Calling lambda to add log")
val updateLambdaResult =
byteBufferToString(lambdaClient.invoke(updat
of code in workers? If that is so, then how can I solve it so that it only
writes once?
*Thanks!*
*Cheers!*
Harsh Choudhary
;
> On Mon, Sep 18, 2017 at 1:56 AM, HARSH TAKKAR <takkarha...@gmail.com>
> wrote:
> > Hi
> >
> > Changing spark version if my last resort, is there any other workaround
> for
> > this problem.
> >
> >
> > On Mon, Sep 18, 2017 at 11:43 AM pandees
On Sep 17, 2017, at 11:08 PM, Anastasios Zouzias <zouz...@gmail.com>
> wrote:
>
> Hi,
>
> I had a similar issue using 2.1.0 but not with Kafka. Updating to 2.1.1
> solved my issue. Can you try with 2.1.1 as well and report back?
>
> Best,
> Anastasios
>
> Am 17
s in your application?
>
> On Sun, Sep 17, 2017 at 7:48 AM, HARSH TAKKAR <takkarha...@gmail.com>
> wrote:
>
>>
>> Hi
>>
>> I am using spark 2.1.0 with scala 2.11.8, and while iterating over the
>> partitions of each rdd in a dStream formed using KafkaUtils,
l.ms:"1000",
session.timeout.ms:"3",
Spark:
spark.streaming.backpressure.enabled=true
spark.streaming.kafka.maxRatePerPartition=200
Exception in task 0.2 in stage 3236.0 (TID 77795)
java.util.ConcurrentModificationException: KafkaConsumer is not safe for
multi-threaded access
--
Kind Regards
Harsh
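This error typically means two threads on one executor touched the same
cached KafkaConsumer (spark-streaming-kafka-0-10). One documented
workaround is to disable the consumer cache, trading some performance for
safety - a minimal sketch:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.streaming.kafka.consumer.cache.enabled", "false")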
Hi,
I have just started using Spark session with Hive enabled, but I am facing
some issues while updating the Hive warehouse directory after Spark
session creation.
Use case: I want to read data from Hive on one cluster and write to Hive
on another cluster.
Please suggest if this can be done.
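A session is tied to a single metastore, so one hedged workaround is to
read from the local Hive catalog and write files directly to the remote
cluster's HDFS, registering the table on the remote side separately (names
and paths hypothetical; an active SparkSession named spark is assumed):

val df = spark.table("source_db.events")
df.write
  .mode("overwrite")
  .parquet("hdfs://remote-nn:8020/warehouse/events_parquet")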
Hi
I want to read an HDFS directory which contains parquet files. How can I
stream data from this directory using the streaming context
(ssc.fileStream)?
Harsh
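ssc.fileStream needs an InputFormat and is awkward for Parquet; in Spark
2.x the Structured Streaming file source is usually the simpler route, with
the schema supplied up front. A sketch with a hypothetical schema and path:

import org.apache.spark.sql.types.{LongType, StringType, StructType}

// The file source requires an explicit schema for streaming reads.
val schema = new StructType()
  .add("id", LongType)
  .add("name", StringType)

val parquetStream = spark.readStream
  .schema(schema)
  .parquet("hdfs:///data/incoming") // picks up new files as they appear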
Hi
I can see that the exception is caused by the following; can you check
where in your code you are using this path:
Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does
not exist:
hdfs://testcluster:8020/experiments/vol/spark_chomp_data/bak/restaurants-bak/latest
On Wed, 17 Aug
How many CPU cores are on that machine? Read http://qr.ae/8Uv3Xq
You can also confirm the above by running the pmap utility on your process
and most of the virtual memory would be under 'anon'.
On Fri, 13 May 2016 09:11 jone wrote:
> The virtual memory is 9G When i run
Hi
Please help.
On Sat, 7 May 2016, 11:43 p.m. HARSH TAKKAR, <takkarha...@gmail.com> wrote:
> Hi Ted
>
> Following is my use case.
>
> I have a prediction algorithm where I need to update some records to
> predict the target.
>
> For eg.
> I have an eq. Y = mX
L allows you to leverage existing code.
>
> If you can share some more of your use case, that would help other people
> provide suggestions.
>
> Thanks
>
> On May 6, 2016, at 6:57 PM, HARSH TAKKAR <takkarha...@gmail.com> wrote:
>
> Hi Ted
>
> I am aware that rdd are im
> On Fri, May 6, 2016 at 5:25 AM, HARSH TAKKAR <takkarha...@gmail.com>
> wrote:
>
>> Hi
>>
>> Is there a way I can modify an RDD in a for-each loop?
>>
>> Basically, I have a use case in which I need to perform multiple
>> iterations over the data and modify a few values in each iteration.
>>
>>
>> Please help.
>>
>
>
Hi
Is there a way I can modify an RDD in a for-each loop?
Basically, I have a use case in which I need to perform multiple
iterations over the data and modify a few values in each iteration.
Please help.
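RDDs are immutable, so "modifying" in a loop means deriving a new RDD each
pass. A sketch with a hypothetical update rule (a SparkContext named sc is
assumed):

var data = sc.parallelize(1 to 100).map(_.toDouble)
for (i <- 1 to 5) {
  data = data.map(v => v * 0.9 + 1.0) // hypothetical per-iteration update
  data.cache() // keep each iteration's result in memory
}
// For long loops, checkpointing occasionally keeps the lineage manageable.
println(data.sum())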
of application.
What else do I need to do so that the app runs fresh?
Harsh Rathi
You should be able to cast the object type to the real underlying type
(GenericRecord (if generic, which is so by default), or the actual type
class (if specific)). The underlying implementation of KafkaAvroDecoder
seems to use either one of those depending on a config switch:
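In that spirit, a minimal sketch of the generic-mode cast (the variable and
field name are hypothetical):

import org.apache.avro.generic.GenericRecord

// `decodedValue` stands in for whatever the decoder returned in generic mode.
def readId(decodedValue: AnyRef): Long = {
  val record = decodedValue.asInstanceOf[GenericRecord]
  record.get("id").asInstanceOf[Long] // field access by name
}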
dha...@manthan.com> wrote:
> Your logs are getting archived in your logs bucket in S3.
>
>
> http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-debugging.html
>
> Regards
> Sab
>
> On Mon, Feb 22, 2016 at 12:14 PM, HARSH TAKKAR <tak
Hi
I am using an EMR cluster for running my Spark jobs, but after the job
finishes the logs disappear.
I have added a log4j.properties in my jar, but all the logs still redirect
to the EMR resource manager, which vanishes after the job completes. Is
there a way I could redirect the logs to a location in
can it be injected using autowiring
> to other classes.
>
> 2. What is the best way to submit jobs to Spark: using the API or using
> the shell script?
>
> Looking forward to your help,
>
>
> Kind Regards
> Harsh
>
Hi
Can anyone please reply to this?
On Mon, 1 Feb 2016, 4:28 p.m. HARSH TAKKAR <takkarha...@gmail.com> wrote:
> Hi
>>
>> I am new to Apache Spark and big data analytics; before starting to code
>> on Spark data frames and RDDs, I just wanted to confirm follo
ite your code using Scala/Python using the Spark shell
> or a notebook like IPython or Zeppelin, or if you have written an
> application using Scala/Java using the Spark API, you can create a jar
> and run it using spark-submit.
>
> *From:* HARSH TAKKAR [mailto:takkarha...@gmail.com]
>
General note: The /root is a protected local directory, meaning that if
your program spawns as a non-root user, it will never be able to access the
file.
On Sat, Dec 12, 2015 at 12:21 AM Zhan Zhang wrote:
> As Sean mentioned, you cannot referring to the local file in
Do you have all your hive jars listed in the classpath.txt /
SPARK_DIST_CLASSPATH env., specifically the hive-exec jar? Is the location
of that jar also the same on all the distributed hosts?
Passing an explicit executor classpath string may also help overcome this
(replace HIVE_BASE_DIR to the
Are you certain you are providing Spark with the right Hive configuration?
Is there a valid HIVE_CONF_DIR defined in your spark-env.sh, with a
hive-site.xml detailing the location/etc. of the metastore service and/or
DB?
Without a valid metastore config, Hive may switch to using a local
I couldn't spot it anywhere on the web so it doesn't look to be contributed
yet, but note that the HDFS APIs are already available per
https://issues.apache.org/jira/browse/HDFS-6634 (you can see the test case
for an implementation guideline in Java:
Are you sure you do not have any messages preceding the trace, such as one
quoting which class is found to be missing? That'd be helpful to see and
would suggest what may (exactly) be going wrong. It appears similar to
https://issues.apache.org/jira/browse/SPARK-8368, but I cannot tell for
certain cause
You could take a look at Livy also:
https://github.com/cloudera/livy#welcome-to-livy-the-rest-spark-server
On Fri, Dec 11, 2015 at 8:17 AM Andrew Or wrote:
> Hello,
>
> The hidden API was implemented for use internally and there are no plans
> to make it public at this
While the DataFrame lookups can identify that anonymous column name,
Spark SQL does not appear to do so. You should use an alias instead:
val rows = Seq (("X", 1), ("Y", 5), ("Z", 4))
val rdd = sc.parallelize(rows)
val dataFrame = rdd.toDF("user","clicks")
val sumDf =
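A hedged reconstruction of how the truncated aliased aggregation presumably
continued (Spark 1.x-era API; the view name is hypothetical):

import org.apache.spark.sql.functions.sum

val sumDf = dataFrame.groupBy("user").agg(sum("clicks").alias("clicks_sum"))
sumDf.registerTempTable("user_clicks")
sqlContext.sql("SELECT user, clicks_sum FROM user_clicks").show()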
> and then calling getRowID() in the lambda, because the function gets sent
to the executor right?
Yes, that is correct (vs. a one-time evaluation, as was with your
assignment earlier).
On Thu, Dec 10, 2015 at 3:34 AM Pinela wrote:
> Hey Bryan,
>
> Thanks for the answer ;) I