from:"Dennis Suhari"

Re: [E] COMMERCIAL BULK: Re: TensorFlow on Spark

2022-02-23 Thread Dennis Suhari

Currently we are trying AnalyticsZoo and Ray.

Sent from my iPhone

> On 23.02.2022 at 04:53, Bitfox wrote:
>
> tensorflow itself can implement the distributed computing via a parameter server. Why did you want spark here?
>
> regards.
>
>> On Wed, Feb 23, 2022 at 11:27 AM

Re: Is a Hive installation necessary for Spark SQL?

2021-04-25 Thread Dennis Suhari

Hi, you can also load other data sources without Hive by using spark.read.format(...) to read them into a Spark DataFrame. From there you can combine the results using the DataFrame API. The use case for Hive is to have a common abstraction layer when you want to do data tagging, access management under
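The approach described in this reply can be sketched as follows. This is only an illustration, not code from the thread: the function name `load_and_join`, the choice of CSV and Parquet as sources, and the inner join are assumptions. The function expects an existing SparkSession to be passed in as `spark`.

```python
def load_and_join(spark, csv_path, parquet_path, key):
    """Read two non-Hive sources into DataFrames and join them on `key`.

    `spark` is an existing SparkSession; the paths and the key column
    are placeholders chosen for illustration.
    """
    # spark.read.format(...) selects the data source directly,
    # so no Hive installation or Hive table is needed to read the files.
    left = spark.read.format("csv").option("header", "true").load(csv_path)
    right = spark.read.format("parquet").load(parquet_path)

    # Combine the results in the DataFrame API.
    return left.join(right, on=key, how="inner")
```

With PySpark installed, a session could be obtained with `SparkSession.builder.getOrCreate()` and passed in as `spark`.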

Re: Library to read health care EDI files in pyspark

2021-01-06 Thread Dennis Suhari

Hi, I haven’t heard about this, but maybe first use something like https://github.com/nerdocs/pydifact and then convert to a Spark DataFrame?

Br, Dennis

Sent from my iPhone

> On 06.01.2021 at 21:36, Ramki Ram wrote:
>
> Hi Team,
>
> I want to know the way to transform 834 edi

Re: How to apply ranger policies on Spark

2020-11-23 Thread Dennis Suhari

Hi Joyan, Spark uses its own metastore. To use Ranger you need to use the Hive Metastore. For this, you need to point Spark to the Hive Metastore and use HiveContext in your Spark code.

Br, Dennis

Sent from my iPhone

> On 23.11.2020 at 19:04, joyan sil wrote:
>
> Hi,
>
> We have ranger

Re: How to submit a job via REST API?

2020-11-23 Thread Dennis Suhari

Hi Yang, I am using Livy Server for submitting jobs.

Br, Dennis

Sent from my iPhone

> On 24.11.2020 at 03:34, Zhou Yang wrote:
>
> Dear experts,
>
> I found a convenient way to submit job via Rest API at
>
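As a rough sketch of what a submission through Livy could look like: Livy exposes a `POST /batches` endpoint that takes a JSON body with a `file` field and optional `args`. The helper below only builds the request and does not send it. The helper name, the host `livy-host`, and the application path are hypothetical and not taken from the thread.

```python
import json
import urllib.request


def build_livy_batch_request(livy_url, app_file, args=None):
    """Build (but do not send) a POST /batches request for a Livy server."""
    # "file" is the application to run; "args" are its command-line arguments.
    payload = {"file": app_file}
    if args:
        payload["args"] = list(args)

    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server address and job file, for illustration only.
req = build_livy_batch_request(
    "http://livy-host:8998", "hdfs:///jobs/etl.py", args=["2020-11-23"]
)
```

Passing `req` to `urllib.request.urlopen` would send it to the server. Authentication and error handling are left out here.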

Pyspark Framework for Apache Atlas (especially Tagging)

2020-10-20 Thread Dennis Suhari

Hi Spark Community, does anybody know a PySpark framework that integrates with Apache Atlas? I want to trigger tagging etc. through my PySpark DataFrame operations. Atlas has an API which I could use, so I could write my own framework. But before I do this I wanted to ask whether anybody knows

Re: IDE suitable for Spark

2020-04-07 Thread Dennis Suhari

We are using PyCharm or RStudio with Spark libraries to submit Spark jobs.

Sent from my iPhone

> On 07.04.2020 at 18:10, yeikel valdes wrote:
>
> Zeppelin is not an IDE but a notebook. It is helpful to experiment but it is missing a lot of the features that we expect

Re: Optimising multiple hive table join and query in spark

2020-03-15 Thread Dennis Suhari

Hi, I am also using Spark on the Hive Metastore. The performance is much better, especially for larger datasets. I have the feeling that the performance is better if I load the data into DataFrames and do a join there instead of doing a direct join within Spark SQL, but I can’t explain it yet. Any body

Best practise local vs distributed python

2019-10-08 Thread Dennis Suhari

Hi, is there any "best practice" rule of thumb for when to use local Python instead of distributed Python on Spark (data size, massive computation etc.)? I mean, Spark can also generate overhead, and sometimes local processing is faster.

Br, Dennis

Memory Limits error

2019-08-15 Thread Dennis Suhari

Hi community, I am using Spark on YARN. When I submit a job, after a long time I get an error message and a retry. It happens when I want to store the DataFrame to a table.

spark_df.write.option("path", "/nlb_datalake/golden_zone/webhose/sentiment").saveAsTable("news_summary_test",

Re: Spark and Oozie

2019-08-05 Thread Dennis Suhari

he
>> reasons and change jobs priorities in YARN scheduling configuration.
>>
>> Alternatively check the Apache Airflow project which is a good alternative to Oozie.
>>
>> Regards,
>> Bartek
>>
>>> On Fri, Jul 19, 2019, 09:09 Dennis Suh

Spark and Oozie

2019-07-19 Thread Dennis Suhari

Dear experts, I am using Spark for processing data from HDFS (Hadoop). These Spark applications are data pipelines, data wrangling and machine learning applications. Spark submits its jobs using YARN, and this works well. For scheduling I am now trying to use Apache Oozie, but I am

12 matches
