Error: Vignette re-building failed. Execution halted

2020-06-24 Thread Anwar AliKhan
./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes Minor error: the SparkR test failed. I don't use R, so it doesn't affect me. ** installing help indices ** building package indices ** install…

Re: Error: Vignette re-building failed. Execution halted

2020-06-24 Thread Hyukjin Kwon
Looks like you haven't installed the 'e1071' package. On Wed, Jun 24, 2020 at 6:49 PM, Anwar AliKhan wrote: > ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr > -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes > > > > minor error SparkR test f…

Re: Spark Small file issue

2020-06-24 Thread Bobby Evans
First, you need to be careful with coalesce. It will impact upstream processing, so if you are doing a lot of computation in the last stage before the repartition, then coalesce will make the problem worse, because all of that computation will happen in a single thread instead of being spread out. M…
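A minimal PySpark sketch of the distinction being described, with made-up paths and column names: coalesce narrows the plan and pulls the expensive upstream stage into fewer tasks, while repartition inserts a shuffle so the upstream work keeps its parallelism.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("small-files-demo").getOrCreate()

df = spark.read.parquet("/data/input")                    # hypothetical input path
heavy = df.withColumn("score", F.length("payload") * 2)   # stands in for expensive per-row work

# coalesce(1): no shuffle is added, so the expensive stage above is
# squeezed into a single task along with the write.
heavy.coalesce(1).write.mode("overwrite").parquet("/data/out_coalesce")

# repartition(1): a shuffle boundary is inserted, so the expensive stage
# keeps its original parallelism and only the final write runs as one task.
heavy.repartition(1).write.mode("overwrite").parquet("/data/out_repartition")
```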

Re: Spark Small file issue

2020-06-24 Thread Koert Kuipers
I second that. We have gotten bitten too many times by coalesce impacting upstream stages in unintended ways, so I avoid coalesce on write altogether. I prefer to use repartition (and take the shuffle hit) before writing (especially if you are writing out partitioned data), or if possible use adaptive quer…
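A sketch of the two alternatives mentioned, assuming Spark 3.0+ and illustrative table/column names: repartition by the partition column before a partitioned write, and/or let adaptive query execution coalesce small shuffle partitions.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("repartition-before-write")
    # Adaptive query execution can merge small shuffle partitions (Spark 3.0+).
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .getOrCreate()
)

df = spark.read.parquet("/data/events")   # hypothetical input

# Shuffle by the partition column before a partitioned write, so each output
# directory is produced by a small number of tasks rather than getting one
# small file from every upstream task.
(df.repartition("event_date")
   .write.mode("overwrite")
   .partitionBy("event_date")
   .parquet("/data/events_compacted"))
```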

LynxKite is now open-source

2020-06-24 Thread Daniel Darabos
Hi, LynxKite is a graph analytics application built on Apache Spark. (From very early on, like Spark 0.9.) We have talked about it on occasion at Spark Summits. So I wanted to let you know that it's now open-source! https://github.com/lynxkite/lynxkite You should totally check it out if you work…

High Availability for spark streaming application running in kubernetes

2020-06-24 Thread Shenson Joseph
Hello, I have a Spark streaming application running in Kubernetes, and we use the Spark Operator to submit Spark jobs. Any suggestions on: 1. How to handle high availability for Spark streaming applications. 2. What would be the best approach to handle high availability of checkpoint data if we don't us…
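One common piece of this, sketched below assuming Structured Streaming with a Kafka source and placeholder paths/topic names: keep the checkpoint on durable shared storage (HDFS, S3, a persistent volume) rather than on a pod's local disk, so a restarted driver can resume from the same offsets.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("streaming-ha-demo").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")   # placeholder brokers
    .option("subscribe", "events")                     # placeholder topic
    .load()
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3a://my-bucket/events/")                            # durable sink (placeholder)
    .option("checkpointLocation", "s3a://my-bucket/checkpoints/events/")  # survives pod restarts
    .start()
)
query.awaitTermination()
```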

Re: Where are all the jars gone ?

2020-06-24 Thread ArtemisDev
If you are using Maven to manage your jar dependencies, the jar files are located in the Maven repository in your home directory, usually the .m2 directory. Hope this helps. -ND On 6/23/20 3:21 PM, Anwar AliKhan wrote: Hi, I prefer to do most of my projects in Python and for that I…

Re: Error: Vignette re-building failed. Execution halted

2020-06-24 Thread Anwar AliKhan
THANKS! It appears that was the last dependency for the build: sudo apt-get install -y r-cran-e1071. Shout out to ZOOM https://zoomadmin.com/HowToInstall/UbuntuPackage/r-cran-e1071 again; like they say, "It’s Super Easy!" The knitr package was the previous missing dependency, which I was a…

Re: Where are all the jars gone ?

2020-06-24 Thread Anwar AliKhan
THANKS. It appears the directory containing the jars has been moved between the download version and the source build. In the download version it is just below the parent directory and is called jars (level 1). In the git source version it is 4 levels down, in the directory /spark/assembly/target/scala-2.12/jars…

Re: Where are all the jars gone ?

2020-06-24 Thread Jeff Evans
If I'm understanding this correctly, you are building Spark from source and using the built artifacts (jars) in some other project. Correct? If so, then why are you concerning yourself with the directory structure that Spark, internally, uses when building its artifacts? It should be a black box

Re: Where are all the jars gone ?

2020-06-24 Thread Anwar AliKhan
I am using the method described on this page for Scala development in Eclipse: https://data-flair.training/blogs/create-spark-scala-project/ In the middle of the page you will find “you will see lots of errors due to missing libraries. viii. Add Spark Libraries”. Now that I have my own buil…

Arrow RecordBatches to Spark Dataframe

2020-06-24 Thread Tanveer Ahmad - EWI
Hi all, I have a small question; perhaps you can help me. In this code snippet…
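The snippet itself did not make it into the digest. As a general illustration of the conversion named in the subject, here is a hedged sketch that goes Arrow RecordBatch → pyarrow.Table → pandas → Spark DataFrame, assuming PyArrow is installed and Spark 3.0+ with Arrow-based conversion enabled; the column names are made up.

```python
import pyarrow as pa
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("arrow-to-spark-demo").getOrCreate()
# Use Arrow to speed up the pandas -> Spark conversion (Spark 3.0+ config name).
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

# Pretend these RecordBatches arrived from elsewhere (IPC files, Flight, etc.).
batch = pa.RecordBatch.from_arrays(
    [pa.array([1, 2, 3]), pa.array([0.1, 0.2, 0.3])],
    names=["id", "value"],
)
table = pa.Table.from_batches([batch])

# Going through pandas is the straightforward route into a Spark DataFrame.
df = spark.createDataFrame(table.to_pandas())
df.show()
```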

[Structured spak streaming] How does cassandra connector readstream deals with deleted record

2020-06-24 Thread Rahul Kumar
Hello everyone, I was wondering how the Cassandra Spark connector deals with deleted/updated records during a readStream operation. If a record was already fetched into Spark memory and then gets updated or deleted in the database, does the change get reflected in the streaming join? Thanks, Rahul
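Not an answer to the semantics question, but a minimal sketch of the kind of stream-static join being described, assuming the DataStax Spark Cassandra Connector and placeholder keyspace/table/topic names; whether later updates or deletes on the Cassandra side become visible depends on how the static side is (re)read for each micro-batch, which is exactly what is being asked.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cassandra-stream-static-demo").getOrCreate()

# Static side: a batch read from Cassandra via the DataStax connector.
# Keyspace/table are placeholders; the table is assumed to have customer_id.
customers = (
    spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(keyspace="shop", table="customers")
    .load()
)

# Streaming side: orders arriving on a Kafka topic (placeholder brokers/topic).
orders = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("subscribe", "orders")
    .load()
    .selectExpr("CAST(key AS STRING) AS customer_id", "value")
)

# Stream-static left join: each micro-batch joins against the static plan above.
joined = orders.join(customers, on="customer_id", how="left")

query = (
    joined.writeStream
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/orders-join")
    .start()
)
query.awaitTermination()
```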