Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

2022-02-23 Thread Dennis Suhari
Currently we are trying AnalyticsZoo and Ray. Sent from my iPhone > On 23.02.2022 at 04:53, Bitfox wrote: > > tensorflow itself can implement the distributed computing via a parameter > server. Why did you want spark here? > > regards. > >> On Wed, Feb 23, 2022 at 11:27 AM

Re: Is a Hive installation necessary for Spark SQL?

2021-04-25 Thread Dennis Suhari
Hi, you can also load other data sources without Hive by using spark.read.format to read them into a Spark DataFrame. From there you can combine the results using the DataFrame API. The use case for Hive is to have a common abstraction layer when you want to do data tagging, access management under
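A minimal sketch of the approach described in this reply, assuming PySpark and an already created SparkSession. The function name, file paths, and join column are hypothetical and only illustrate reading two sources through `spark.read.format(...)` and combining them with a DataFrame join, without any Hive table involved.

```python
def load_and_combine(spark, csv_path, parquet_path, key):
    """Read two files without Hive and join them on a shared column.

    `spark` is an existing SparkSession; all other names are illustrative.
    """
    # Read a CSV file; the header option uses the first row as column names.
    left = spark.read.format("csv").option("header", "true").load(csv_path)
    # Read a Parquet file; its schema is stored in the file itself.
    right = spark.read.format("parquet").load(parquet_path)
    # Combine both sources with the DataFrame API instead of a Hive table.
    return left.join(right, on=key, how="inner")


# Usage sketch (hypothetical paths and column name):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("no-hive-example").getOrCreate()
#   combined = load_and_combine(
#       spark, "/data/customers.csv", "/data/orders.parquet", "customer_id"
#   )
#   combined.show()
```

The session is passed in as an argument so the same function works in a notebook, a spark-submit job, or a unit test.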

Re: Library to read health care EDI files in pyspark

2021-01-06 Thread Dennis Suhari
Hi, I haven't heard about this, but maybe first use something like https://github.com/nerdocs/pydifact and then convert to a Spark DataFrame? Br, Dennis. Sent from my iPhone > On 06.01.2021 at 21:36, Ramki Ram wrote: > > Hi Team, > > I want to know the way to transform 834 edi

Re: How to apply ranger policies on Spark

2020-11-23 Thread Dennis Suhari
Hi Joyan, Spark uses its own metastore. To use Ranger you need to use the Hive Metastore. For this you need to point Spark to the Hive Metastore and use HiveContext in your Spark code. Br, Dennis. Sent from my iPhone > On 23.11.2020 at 19:04, joyan sil wrote: > > Hi, > > We have ranger

Re: How to submit a job via REST API?

2020-11-23 Thread Dennis Suhari
Hi Yang, I am using Livy Server for submitting jobs. Br, Dennis. Sent from my iPhone > On 24.11.2020 at 03:34, Zhou Yang wrote: > > Dear experts, > > I found a convenient way to submit job via Rest API at >
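For readers unfamiliar with Livy, a rough sketch of what a batch submission over its REST API can look like, using only the Python standard library. Livy accepts a JSON body on `POST /batches` whose `file` field points to the application. The host name, application path, arguments, and Spark setting below are hypothetical placeholders, and this is not necessarily how the poster's setup works.

```python
import json
import urllib.request


def build_batch_request(livy_url, app_file, args=None, conf=None):
    """Build (but do not send) a POST /batches request for a Livy server."""
    payload = {"file": app_file}  # path to the application to run
    if args:
        payload["args"] = list(args)  # command-line arguments for the job
    if conf:
        payload["conf"] = dict(conf)  # Spark configuration properties
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_batch(request):
    """Send the request and return Livy's JSON response as a dict."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical host and application path, for illustration only.
request = build_batch_request(
    "http://livy.example.com:8998",
    "hdfs:///apps/example_job.py",
    args=["2020-11-23"],
    conf={"spark.executor.memory": "2g"},
)
# Calling submit_batch(request) would send it to a running Livy server.
```

Building the request separately from sending it makes the payload easy to inspect or log before anything reaches the cluster.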

Pyspark Framework for Apache Atlas (especially Tagging)

2020-10-20 Thread Dennis Suhari
Hi Spark Community, does anybody know a PySpark framework that integrates with Apache Atlas? I want to trigger tagging etc. through my PySpark DataFrame operations. Atlas has an API which I could use, so I could write my own framework. But before I do this I wanted to ask whether knows

Re: IDE suitable for Spark

2020-04-07 Thread Dennis Suhari
We are using PyCharm and RStudio, respectively, with Spark libraries to submit Spark jobs. Sent from my iPhone > On 07.04.2020 at 18:10, yeikel valdes wrote: > > Zeppelin is not an IDE but a notebook. It is helpful to experiment but it is > missing a lot of the features that we expect

Re: Optimising multiple hive table join and query in spark

2020-03-15 Thread Dennis Suhari
Hi, I am also using Spark on Hive Metastore. The performance is much better, especially for larger datasets. I have the feeling that the performance is better if I load the data into DataFrames and do a join instead of doing a direct join within Spark SQL. But I can't explain it yet. Any body

Best practise local vs distributed python

2019-10-08 Thread Dennis Suhari
Hi, is there any "best practice" rule of thumb for when to use local Python instead of distributed Python on Spark (data size, massive computation etc.)? I mean, Spark can also generate overhead, and sometimes local processing is faster. Br, Dennis

Memory Limits error

2019-08-15 Thread Dennis Suhari
Hi community, I am using Spark on YARN. When submitting a job, after a long time I get an error message and a retry. It happens when I want to store the DataFrame to a table. spark_df.write.option("path", "/nlb_datalake/golden_zone/webhose/sentiment").saveAsTable("news_summary_test",

Re: Spark and Oozie

2019-08-05 Thread Dennis Suhari
he >> reasons and change jobs priorities in YARN scheduling configuration. >> >> Alternatively check the Apache Airflow project which is a good alternative >> to Oozie. >> >> Regards, >> Bartek >> >>> On Fri, Jul 19, 2019, 09:09 Dennis Suh

Spark and Oozie

2019-07-19 Thread Dennis Suhari
Dear experts, I am using Spark for processing data from HDFS (Hadoop). These Spark applications are data pipelines, data wrangling and machine learning applications. Spark submits its jobs using YARN, and this works well. For scheduling I am now trying to use Apache Oozie, but I am