Re: convert java dataframe to pyspark dataframe

2021-03-31 Thread Aditya Singh
Thanks a lot Khalid for replying. I have one question though. The approach you showed requires the Python side to know the data types of the dataframe's columns beforehand. Can we implement a generic approach where this info is not required and we just have the java dataframe as input on p

Re: convert java dataframe to pyspark dataframe

2021-03-31 Thread Khalid Mammadov
I think what you want to achieve is what PySpark is actually doing in its API under the hood. So, specifically you need to look at PySpark's implementation of the DataFrame, SparkSession and SparkContext APIs. Under the hood, that is what is happening: it starts a py4j gateway and delegates all Spark o
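
A minimal sketch of the idea described above, not taken from the thread: PySpark's own DataFrame class is a thin wrapper around a py4j handle to the JVM DataFrame, so a handle obtained from Java code can be wrapped directly. The schema is then read from the JVM object, so the Python side should not need to know the column types in advance. The helper name wrap_java_dataframe is hypothetical, spark._wrapped is a private attribute that only some PySpark releases have, and the sketch assumes jdf lives in the same JVM as the session.

```python
def wrap_java_dataframe(jdf, spark):
    """Wrap a py4j handle to a JVM DataFrame in a PySpark DataFrame.

    Assumption: `jdf` is a py4j object that refers to a Dataset<Row>
    living in the same JVM that backs `spark`.
    """
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.sql import DataFrame

    # Older PySpark releases expect an SQLContext here, exposed as the
    # private attribute `spark._wrapped`; newer ones accept the SparkSession.
    sql_ctx = getattr(spark, "_wrapped", spark)
    return DataFrame(jdf, sql_ctx)
```

A call might look like wrap_java_dataframe(spark._jvm.com.example.MyJob.build(spark._jsparkSession), spark), where com.example.MyJob.build is a made-up Java method that returns a DataFrame, and spark._jvm and spark._jsparkSession are private PySpark attributes that may change between versions.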

Re: Spark thrift server ldap

2021-03-31 Thread Pavel Solomin
Hello! Can anyone give some attention to this question about LDAP, please? My colleague and I have faced similar issues trying to set up LDAP for ThriftServer, and actually ended up patching Hive Service code and shading Spark's LdapAuthenticationProviderImpl.java with our own custom jar, passed

Re: Ubuntu 18.04: Docker: start-master.sh: command not found

2021-03-31 Thread GUINKO Ferdinand
I have exited from the container, and logged in using: sudo docker run -it -p8080:8080 ubuntu Then I tried to launch the standalone Spark master server by running: start-master.sh and got the following message: bash: start-master.sh: command not found. So I started the process of setting the environme

Re: Introducing Gallia: a Scala+Spark library for data manipulation

2021-03-31 Thread galliaproject
I posted another update on the scala mailing list: https://users.scala-lang.org/t/introducing-gallia-a-library-for-data-manipulation/7112/11 It notably pertains to: - A full *RDD*-powered example: https://github.com/galliaproject/gallia-genemania-spark#description (via EMR) - New license (*BSL*):

Re: Source.getBatch and schema vs qe.analyzed.schema?

2021-03-31 Thread Bartosz Konieczny
Hi Jacek, An interesting question! I don't know the exact answer and will be happy to learn by the way :) Below you can find my understanding for these 2 things, hoping it helps a little. For me, we can distinguish 2 different source categories. The first of them is a source with some fixed schem

Re: convert java dataframe to pyspark dataframe

2021-03-31 Thread Aditya Singh
Thanks a lot, this was really helpful. On Wed, 31 Mar 2021 at 4:13 PM, Khalid Mammadov wrote: > I think what you want to achieve is what PySpark is actually doing in its > API under the hood. > > So, specifically you need to look at PySpark's implementation of > DataFrame, SparkSession and Spar

Re: Ubuntu 18.04: Docker: start-master.sh: command not found

2021-03-31 Thread JB Data31
Find the *start-master.sh* file with the following command, executed as root: *find / -name "start-master.sh" -print*. Check that the directory found is in $PATH. As a first step, go to the directory where *start-master.sh* is with the *cd* command and run it with *./start-master.sh*. @*JB*Δ
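
The two checks suggested above (locate the script, then see whether its directory is on the search path) can also be sketched in Python. This is only an illustration: find_script and on_path are hypothetical helper names, and walking the whole filesystem from / can be slow.

```python
import os


def find_script(name, root="/"):
    """Return every directory under `root` that contains a file called `name`."""
    hits = []
    # os.walk silently skips directories it is not allowed to read.
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            hits.append(dirpath)
    return hits


def on_path(directory, path=None):
    """Return True if `directory` is one of the entries of the PATH string."""
    if path is None:
        path = os.environ.get("PATH", "")
    target = os.path.realpath(directory)
    entries = [p for p in path.split(os.pathsep) if p]
    return any(os.path.realpath(p) == target for p in entries)
```

For example, [(d, on_path(d)) for d in find_script("start-master.sh")] would list each directory that holds the script together with whether the shell would search it.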