Re: Vulnerabilities in htrace-core4-4.1.0-incubating.jar jar used in spark.

2022-05-01 Thread HARSH TAKKAR
We scanned 3 versions of Spark: 3.0.0, 3.1.3 and 3.2.1. On Tue, 26 Apr 2022, 18:46 Bjørn Jørgensen wrote: > What version of spark is it that you have scanned? > > > > On Tue, 26 Apr 2022 at 12:48, HARSH TAKKAR wrote: > >> Hello, >> >> Please let me know if th

Vulnerabilities in htrace-core4-4.1.0-incubating.jar jar used in spark.

2022-04-26 Thread HARSH TAKKAR
-14379 CVE-2019-12086 CVE-2018-7489 CVE-2018-5968 CVE-2018-14719 CVE-2018-14718 CVE-2018-12022 CVE-2018-11307 CVE-2017-7525 CVE-2017-17485 CVE-2017-15095 Kind Regards Harsh Takkar

Unsubscribe

2021-11-17 Thread HARSH TAKKAR
Unsubscribe

Re: Connection reset by peer : failed to remove cache rdd

2021-09-02 Thread Harsh Sharma
On 2021/09/02 06:00:26, Harsh Sharma wrote: > Please find reply: > Do you know when in your application lifecycle it happens? Spark SQL or > > Structured Streaming? > > ans: it's Spark SQL > > Do you use broadcast variables? > > ans: yes we

Spark Phoenix Connection Exception while loading from Phoenix tables

2021-09-02 Thread Harsh Sharma
[01/09/21 11:55:51,861 WARN pool-1-thread-1](Client) Exception encountered while connecting to the server : java.lang.NullPointerException [01/09/21 11:55:51,862 WARN pool-1-thread-1](Client) Exception encountered while connecting to the server : java.lang.NullPointerException [01/09/21

Re: Connection reset by peer : failed to remove cache rdd

2021-09-02 Thread Harsh Sharma
ila.pl/> > Follow me on https://twitter.com/jaceklaskowski > > <https://twitter.com/jaceklaskowski> > > > On Mon, Aug 30, 2021 at 3:26 PM Harsh Sharma > wrote: > > > We are facing issue in production where we are getting frequent > > > > Still

Re: Connection reset by peer : failed to remove cache rdd

2021-09-01 Thread Harsh Sharma
; Follow me on https://twitter.com/jaceklaskowski > > <https://twitter.com/jaceklaskowski> > > > On Mon, Aug 30, 2021 at 3:26 PM Harsh Sharma > wrote: > > > We are facing issue in production where we are getting frequent > > > > Still have 1 requ

Connection reset by peer : failed to remove cache rdd

2021-08-30 Thread Harsh Sharma
We are facing an issue in production where we are frequently getting "Still have 1 request outstanding when connection with the hostname was closed" and "connection reset by peer" errors, as well as warnings: "failed to remove cache rdd" or "failed to remove broadcast variable". Please help us how to

Spark Issues while upgrade to 2.4 from 1.6 in Parcels

2021-08-11 Thread Harsh Sharma
hi Team, we are upgrading our Cloudera parcels from 5.x to 6.x, hence we have upgraded the Spark version from 1.6 to 2.4. While executing a Spark program we are getting the below error. Please help us resolve this in Cloudera parcels. There are suggestions to install Spark gateway roles

Cloudera Parcel : spark issues after upgrade 1.6 to 2.4

2021-07-30 Thread Harsh Sharma
hi Team, we are upgrading our Cloudera parcels from 5.x to 6.x, hence we have upgraded the Spark version from 1.6 to 2.4. While executing a Spark program we are getting the below error. Please help us resolve this in Cloudera parcels. There are suggestions to install Spark gateway roles

Re: Connection Reset by Peer : failed to remove cached rdd

2021-07-30 Thread Harsh Sharma
[Stage 284:>(199 + 1) / 200][Stage 292:> (1 + 3) / 200] [Stage 284:>(199 + 1) / 200][Stage 292:> (2 + 3) / 200] [Stage 292:> (2 + 4) / 200][14/06/21 10:46:17,006 WARN

Re: Using Custom Scala Spark ML Estimator in PySpark

2021-02-16 Thread HARSH TAKKAR
Hello Sean, Thanks for the advice, can you please point me to an example where i can find a custom wrapper for python. Kind Regards Harsh Takkar On Tue, 16 Feb, 2021, 8:25 pm Sean Owen, wrote: > You won't be able to use it in python if it is implemented in Java - needs > a python wrapp

Using Custom Scala Spark ML Estimator in PySpark

2021-02-15 Thread HARSH TAKKAR
in the class pass using "spark.jars" Can you please help, if i am missing something. Kind Regards Harsh Takkar

Re: Spark structured streaming: periodically refresh static data frame

2020-09-17 Thread Harsh
As per the solution, if we are closing and starting the query, then what happens to the state which is maintained in memory? Will that be retained? -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To

Re: How to enable hive support on an existing Spark session?

2020-05-27 Thread HARSH TAKKAR
Hi Kun, You can use the following spark property while launching the app instead of manually enabling it in the code: spark.sql.catalogImplementation=hive Kind Regards Harsh On Tue, May 26, 2020 at 9:55 PM Kun Huang (COSMOS) wrote: > > Hi Spark experts, > > I am seeking for
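The property mentioned in this reply can also be passed at submit time rather than set inside the application. A minimal sketch (the application file name is a placeholder, not from the thread):

```shell
# Enable the Hive catalog without touching application code.
# "your_app.py" is a placeholder for your actual application.
spark-submit \
  --conf spark.sql.catalogImplementation=hive \
  your_app.py
```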

Structured Streaming using Kafka Avro Record in 2.3.0

2020-04-28 Thread HARSH TAKKAR
Hi How can we deserialise an avro record read from kafka in spark 2.3.0 in an optimised manner? I can see that native support for avro was added in 2.4.x. Currently I am using the following library, which is very slow: com.twitter:bijection-avro_2.11 Kind Regards Harsh Takkar

Reading 7z file in spark

2020-01-13 Thread HARSH TAKKAR
Hi, Is it possible to read a 7z compressed file in Spark? Kind Regards Harsh Takkar

Re: Hive External Table Partiton Data Type.

2019-12-16 Thread HARSH TAKKAR
Hi
    10
    Time taken: 0.356 seconds, Fetched: 2 row(s)
    hive> describe longpartition;
    OK
    b            string
    a            bigint
    # Partition Information
    # col_name   data_type   comment
    a            bigint
On Mon, Dec 16, 2019 at 11:05 AM SB M wrote: > spark version 2

Re: Hive External Table Partiton Data Type.

2019-12-15 Thread HARSH TAKKAR
Please share the spark version you are using. On Fri, 13 Dec 2019, 4:02 PM SB M, wrote: > Hi All, > Am trying to create a dynamic partition with external table on hive > metastore using spark sql. > > when am trying to create a partition column data type as bigint, partition > is not

Re: Unable to write data from Spark into a Hive Managed table

2019-08-20 Thread HARSH TAKKAR
Please refere to the following documentation on how to write data into hive in hdp3.1 https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html Harsh On Fri, 9 Aug, 2019, 10:21 PM Mich Talebzadeh, wrote

Re: Back pressure not working on streaming

2019-01-01 Thread HARSH TAKKAR
rate it will increase the consumption of records for processing in each batch. However, I feel 10 is way too low a number for a 32-partition kafka topic. Regards Harsh Happy New Year On Wed 2 Jan, 2019, 08:33 JF Chen I have set spark.streaming.backpressure.enabled to true
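The knobs discussed in this thread are plain Spark configs. A hedged submit-time sketch (values and the application file name are illustrative, not a recommendation from the thread):

```shell
# Backpressure plus a per-partition Kafka rate cap; tune values per workload.
spark-submit \
  --conf spark.streaming.backpressure.enabled=true \
  --conf spark.streaming.backpressure.initialRate=1000 \
  --conf spark.streaming.kafka.maxRatePerPartition=200 \
  your_streaming_app.py
```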

Re: executing stored procedure through spark

2018-08-13 Thread HARSH TAKKAR
Hi You can call the java program directly through pyspark. Following is the code that will help: sc._jvm.. Harsh Takkar On Sun, Aug 12, 2018 at 9:27 PM amit kumar singh wrote: > Hi team, > > The way we call a java program to execute a stored procedure > is there any way we

Re: Pyspark access to scala/java libraries

2018-07-18 Thread HARSH TAKKAR
Hi You can access your java packages using the following in pySpark: obj = sc._jvm.yourPackage.className() Kind Regards Harsh Takkar On Wed, Jul 18, 2018 at 4:00 AM Mohit Jaggi wrote: > Thanks 0xF0F0F0 and Ashutosh for the pointers. > > Holden, > I am trying to look into sparklingml

Sklearn model in pyspark prediction

2018-05-15 Thread HARSH TAKKAR
Hi, Is there a way to load a model saved using the sklearn lib in pyspark/scala spark for prediction? Thanks
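A common pattern for this (not from the thread itself) is to serialize the fitted model once and deserialize it inside each partition. A stdlib-only sketch with a stand-in dict in place of a real sklearn estimator, since neither sklearn nor pyspark is assumed here; in Spark the bytes would travel via sc.broadcast and the function via mapPartitions:

```python
import pickle

# Stand-in for a fitted sklearn estimator (here: y = 2x + 1).
model = {"coef": 2.0, "intercept": 1.0}
payload = pickle.dumps(model)  # in Spark: broadcast these bytes to executors

def predict_partition(rows, payload=payload):
    m = pickle.loads(payload)  # unpickle once per partition, not per row
    for x in rows:
        yield m["coef"] * x + m["intercept"]

print(list(predict_partition([0.0, 1.0, 2.0])))  # [1.0, 3.0, 5.0]
```

The same shape works with a real estimator: pickle it after fitting, broadcast, and call `predict` inside the partition function.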

Data of ArrayType field getting truncated when saving to parquet

2018-01-31 Thread HARSH TAKKAR
Hi I have a dataframe with a field of array type which is of large size. When I am trying to save the data to a parquet file and read it again, the array field comes out as an empty array. Please help Harsh

Long running Spark Job Status on Remote Submission

2017-11-20 Thread Harsh Choudhary
. How do I get the status of such long-running jobs so that I can do the further tasks on my remote machine after the job completion? Livy is one choice but I want to do it without that, if possible. *Thanks!* Harsh Choudhary

Does Random Forest in spark ML supports multi label classification in scala

2017-11-07 Thread HARSH TAKKAR
Hi Does Random Forest in Spark ML support multi-label classification in scala? I found that sklearn provides sklearn.ensemble.RandomForestClassifier in python; do we have similar functionality in scala?

Building Spark with hive 1.1.0

2017-11-06 Thread HARSH TAKKAR
Hi I am using the cloudera (cdh5.11.0) setup, which has hive version 1.1.0, but when I build spark with hive and thrift support it packs hive version 1.6.0. Please let me know how I can build spark with hive 1.1.0. Command I am using to build: ./dev/make-distribution.sh --name

Re: Spark job's application tracking URL not accessible from docker container

2017-10-31 Thread Harsh
Hi I am facing the same issue while launching the application inside a docker container. Kind Regards Harsh -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To unsubscribe e-mail: user-unsubscr

Re: Database insert happening two times

2017-10-17 Thread Harsh Choudhary
.are the multiple rows being written dupes (they have all same >> fields/values)? >> Hth >> >> >> On Oct 17, 2017 1:08 PM, "Harsh Choudhary" <shry.ha...@gmail.com> wrote: >> >>> This is the code - >>> hdfs_path= >>> if(hd

Re: Database insert happening two times

2017-10-17 Thread Harsh Choudhary
val updatelambdaReq:InvokeRequest = new InvokeRequest(); updatelambdaReq.setFunctionName(updateFunctionName); updatelambdaReq.setPayload(updatedLambdaJson.toString()); System.out.println("Calling lambda to add log"); val updateLambdaResult = byteBufferToString(lambdaClient.invoke(updat

Database insert happening two times

2017-10-17 Thread Harsh Choudhary
of code in workers? If it is so then how can I solve it so that it only writes once. *Thanks!* *Cheers!* Harsh Choudhary

Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-19 Thread HARSH TAKKAR
; > On Mon, Sep 18, 2017 at 1:56 AM, HARSH TAKKAR <takkarha...@gmail.com> > wrote: > > Hi > > > > Changing spark version if my last resort, is there any other workaround > for > > this problem. > > > > > > On Mon, Sep 18, 2017 at 11:43 AM pandees

Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-18 Thread HARSH TAKKAR
n Sep 17, 2017, at 11:08 PM, Anastasios Zouzias <zouz...@gmail.com> > wrote: > > Hi, > > I had a similar issue using 2.1.0 but not with Kafka. Updating to 2.1.1 > solved my issue. Can you try with 2.1.1 as well and report back? > > Best, > Anastasios > > Am 17

Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-17 Thread HARSH TAKKAR
s in your application? > > On Sun, Sep 17, 2017 at 7:48 AM, HARSH TAKKAR <takkarha...@gmail.com> > wrote: > >> >> Hi >> >> I am using spark 2.1.0 with scala 2.11.8, and while iterating over the >> partitions of each rdd in a dStream formed using KafkaUtils,

ConcurrentModificationException using Kafka Direct Stream

2017-09-17 Thread HARSH TAKKAR
l.ms:"1000", session.timeout.ms:"3", Spark: spark.streaming.backpressure.enabled=true spark.streaming.kafka.maxRatePerPartition=200 Exception in task 0.2 in stage 3236.0 (TID 77795) java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access -- Kind Regards Harsh
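A workaround often cited for this exception (outside this thread, for the Kafka 0-10 integration) is to disable the cached Kafka consumer so that no cached instance is shared across threads. A hedged config sketch; the application file name is a placeholder:

```shell
# KafkaConsumer is not thread-safe; disabling the consumer cache avoids
# concurrent access to a cached instance, at some performance cost.
spark-submit \
  --conf spark.streaming.kafka.consumer.cache.enabled=false \
  your_streaming_app.py
```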

update hive metastore in spark session at runtime

2017-09-01 Thread HARSH TAKKAR
Hi, I have just started using spark session with hive enabled, but I am facing an issue while updating the hive warehouse directory post spark session creation. Usecase: I want to read data from hive on one cluster and write to hive on another cluster. Please suggest if this can be done?

Reading parquet file in stream

2017-08-16 Thread HARSH TAKKAR
Hi I want to read an hdfs directory which contains parquet files; how can I stream data from this directory using the streaming context (ssc.fileStream)? Harsh

Re: Getting a TreeNode Exception while saving into Hadoop

2016-08-17 Thread HARSH TAKKAR
Hi I can see that the exception is caused by the following; can you check where in your code you are using this path. Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://testcluster:8020/experiments/vol/spark_chomp_data/bak/restaurants-bak/latest On Wed, 17 Aug

Re: High virtual memory consumption on spark-submit client.

2016-05-12 Thread Harsh J
How many CPU cores are on that machine? Read http://qr.ae/8Uv3Xq You can also confirm the above by running the pmap utility on your process and most of the virtual memory would be under 'anon'. On Fri, 13 May 2016 09:11 jone, wrote: > The virtual memory is 9G When i run

Re: Updating Values Inside Foreach Rdd loop

2016-05-09 Thread HARSH TAKKAR
Hi Please help. On Sat, 7 May 2016, 11:43 p.m. HARSH TAKKAR, <takkarha...@gmail.com> wrote: > Hi Ted > > Following is my use case. > > I have a prediction algorithm where i need to update some records to > predict the target. > > For eg. > I have an eq. Y= mX

Re: Updating Values Inside Foreach Rdd loop

2016-05-07 Thread HARSH TAKKAR
L allows you to leverage existing code. > > If you can share some more of your use case, that would help other people > provide suggestions. > > Thanks > > On May 6, 2016, at 6:57 PM, HARSH TAKKAR <takkarha...@gmail.com> wrote: > > Hi Ted > > I am aware that rdd are im

Re: Updating Values Inside Foreach Rdd loop

2016-05-06 Thread HARSH TAKKAR
> On Fri, May 6, 2016 at 5:25 AM, HARSH TAKKAR <takkarha...@gmail.com> > wrote: > >> Hi >> >> Is there a way i can modify a RDD, in for-each loop, >> >> Basically, i have a use case in which i need to perform multiple >> iteration over data and modify few values in each iteration. >> >> >> Please help. >> > >

Updating Values Inside Foreach Rdd loop

2016-05-06 Thread HARSH TAKKAR
Hi Is there a way I can modify an RDD in a for-each loop? Basically, I have a use case in which I need to perform multiple iterations over the data and modify a few values in each iteration. Please help.
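As the replies in this thread note, RDDs are immutable, so the usual answer is to rebind to a new transformed RDD each iteration rather than mutate in place. A stdlib-only sketch with a plain list standing in for the RDD:

```python
# A plain list stands in for an RDD; in Spark each pass would be
# rdd = rdd.map(lambda x: x * 0.5 + 1), producing a brand-new RDD.
data = [1.0, 2.0, 3.0]
for _ in range(3):
    # Each iteration yields a *new* dataset; nothing is modified in place.
    data = [x * 0.5 + 1 for x in data]
print(data)  # [1.875, 2.0, 2.125]
```

In Spark, long lineage chains from many iterations are typically broken up with `cache()` or checkpointing.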

Inconsistent performance across multiple iterations of same application

2016-03-02 Thread Harsh Rathi
of application. What else do I need to do so that the app runs as fresh? Harsh Rathi

Re: Access fields by name/index from Avro data read from Kafka through Spark Streaming

2016-02-25 Thread Harsh J
You should be able to cast the object type to the real underlying type (GenericRecord (if generic, which is so by default), or the actual type class (if specific)). The underlying implementation of KafkaAvroDecoder seems to use either one of those depending on a config switch:

Re: [Please Help] Log redirection on EMR

2016-02-23 Thread HARSH TAKKAR
dha...@manthan.com> wrote: > Your logs are getting archived in your logs bucket in S3. > > > http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-debugging.html > > Regards > Sab > > On Mon, Feb 22, 2016 at 12:14 PM, HARSH TAKKAR <tak

[Please Help] Log redirection on EMR

2016-02-21 Thread HARSH TAKKAR
Hi I am using an EMR cluster for running my spark jobs, but after the job finishes the logs disappear. I have added a log4j.properties in my jar, but all the logs still redirect to the EMR resource manager, which vanishes after the job completes. Is there a way I could redirect the logs to a location in

Using Java spring injection with spark

2016-02-01 Thread HARSH TAKKAR
can it be injected using autowiring > to other classes. > > 2. what is the best way to submit jobs to spark , using the api or using > the shell script? > > Looking forward for your help, > > > Kind Regards > Harsh >

Re: Using Java spring injection with spark

2016-02-01 Thread HARSH TAKKAR
Hi Please can anyone reply on this. On Mon, 1 Feb 2016, 4:28 p.m. HARSH TAKKAR <takkarha...@gmail.com> wrote: > Hi >> >> I am new to apache spark and big data analytics, before starting to code >> on spark data frames and rdd, i just wanted to confirm follo

Re: Using Java spring injection with spark

2016-02-01 Thread HARSH TAKKAR
ite your code using Scala/ Python using the spark shell > or a notebook like Ipython, zeppelin or if you have written a application > using Scala/Java using the Spark API you can create a jar and run it using > spark-submit. > > *From:* HARSH TAKKAR [mailto:takkarha...@gmail.com] >

Re: how to access local file from Spark sc.textFile("file:///path to/myfile")

2015-12-11 Thread Harsh J
General note: The /root is a protected local directory, meaning that if your program spawns as a non-root user, it will never be able to access the file. On Sat, Dec 12, 2015 at 12:21 AM Zhan Zhang wrote: > As Sean mentioned, you cannot referring to the local file in

Re: Classpath problem trying to use DataFrames

2015-12-11 Thread Harsh J
Do you have all your hive jars listed in the classpath.txt / SPARK_DIST_CLASSPATH env., specifically the hive-exec jar? Is the location of that jar also the same on all the distributed hosts? Passing an explicit executor classpath string may also help overcome this (replace HIVE_BASE_DIR to the

Re: Unable to acces hive table (created through hive context) in hive console

2015-12-10 Thread Harsh J
Are you certain you are providing Spark with the right Hive configuration? Is there a valid HIVE_CONF_DIR defined in your spark-env.sh, with a hive-site.xml detailing the location/etc. of the metastore service and/or DB? Without a valid metastore config, Hive may switch to using a local

Re: INotifyDStream - where to find it?

2015-12-10 Thread Harsh J
I couldn't spot it anywhere on the web so it doesn't look to be contributed yet, but note that the HDFS APIs are already available per https://issues.apache.org/jira/browse/HDFS-6634 (you can see the test case for an implementation guideline in Java:

Re: Can't filter

2015-12-10 Thread Harsh J
Are you sure you do not have any messages preceding the trace, such as one quoting which class is found to be missing? That'd be helpful to see and suggest what may (exactly) be going wrong. It appear similar to https://issues.apache.org/jira/browse/SPARK-8368, but I cannot tell for certain cause

Re: Spark job submission REST API

2015-12-10 Thread Harsh J
You could take a look at Livy also: https://github.com/cloudera/livy#welcome-to-livy-the-rest-spark-server On Fri, Dec 11, 2015 at 8:17 AM Andrew Or wrote: > Hello, > > The hidden API was implemented for use internally and there are no plans > to make it public at this

Re: how to reference aggregate columns

2015-12-09 Thread Harsh J
While the DataFrame lookups can identify that anonymous column name, SparkSql does not appear to do so. You should use an alias instead: val rows = Seq (("X", 1), ("Y", 5), ("Z", 4)) val rdd = sc.parallelize(rows) val dataFrame = rdd.toDF("user","clicks") val sumDf =
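The advice in this reply, giving the aggregate an explicit alias so SQL can reference it by name, is not Spark-specific. A stdlib sqlite3 stand-in (table and column names are illustrative, not from the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user TEXT, n INTEGER)")
conn.executemany("INSERT INTO clicks VALUES (?, ?)",
                 [("X", 1), ("Y", 5), ("Z", 4)])
# Without "AS total" the aggregate gets an auto-generated name,
# which is awkward (or impossible) to reference later in the query.
rows = conn.execute(
    "SELECT user, SUM(n) AS total FROM clicks "
    "GROUP BY user ORDER BY total DESC"
).fetchall()
print(rows)  # [('Y', 5), ('Z', 4), ('X', 1)]
```

In Spark the equivalent is `sum($"clicks").alias("total")` (or `AS total` in Spark SQL), after which `total` can be used in `ORDER BY` or a later `select`.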

Re: SparkStreaming variable scope

2015-12-09 Thread Harsh J
> and then calling getRowID() in the lambda, because the function gets sent to the executor right? Yes, that is correct (vs. a one time evaluation, as was with your assignment earlier). On Thu, Dec 10, 2015 at 3:34 AM Pinela wrote: > Hey Bryan, > > Thank for the answer ;) I