Currently we are trying Analytics Zoo and Ray.
Sent from my iPhone
> On 23.02.2022 at 04:53, Bitfox wrote:
>
>
> TensorFlow itself can implement distributed computing via a parameter
> server. Why do you want Spark here?
>
> regards.
>
>> On Wed, Feb 23, 2022 at 11:27 AM
Hi,
you can also load other data sources without Hive by using spark.read.format into a
Spark DataFrame. From there you can combine the results in the DataFrame
world.
The use case of Hive is to have a common abstraction layer when you want to do
data tagging, access management under
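For illustration, a minimal sketch of that approach (file paths, formats, and the join column are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("multi-source-join").getOrCreate()

# Load two non-Hive sources into DataFrames (paths are hypothetical).
orders = spark.read.format("csv").option("header", "true").load("/data/orders.csv")
customers = spark.read.format("parquet").load("/data/customers.parquet")

# Combine the results in the DataFrame world.
combined = orders.join(customers, on="customer_id", how="inner")
combined.show()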
Hi,
I haven't heard about this, but maybe first use something like
https://github.com/nerdocs/pydifact
and then convert to a Spark DataFrame?
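A rough sketch of that idea (the input file is hypothetical, and the pydifact attribute names should be checked against the project's README):

from pydifact.segmentcollection import Interchange
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Parse the EDI interchange (file path is hypothetical).
interchange = Interchange.from_file("/data/834_sample.edi")

# Flatten each segment into (tag, elements) rows; attribute names follow
# pydifact's documented Segment API, but verify against your version.
rows = [(seg.tag, str(seg.elements)) for seg in interchange.segments]

df = spark.createDataFrame(rows, ["tag", "elements"])
df.show(truncate=False)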
Br,
Dennis
Sent from my iPhone
> On 06.01.2021 at 21:36, Ramki Ram wrote:
>
>
> Hi Team,
>
> I want to know the way to transform 834 EDI
Hi Joyan,
Spark uses its own metastore. To use Ranger, you need the Hive Metastore.
For this you need to point Spark to the Hive Metastore and use HiveContext in
your Spark code.
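In newer Spark versions, a SparkSession with Hive support plays the role of HiveContext; a minimal sketch (the metastore URI is a placeholder for your environment):

from pyspark.sql import SparkSession

# Point Spark at the Hive Metastore; the thrift URI is a placeholder.
spark = (SparkSession.builder
         .appName("ranger-hive-metastore")
         .config("hive.metastore.uris", "thrift://metastore-host:9083")
         .enableHiveSupport()
         .getOrCreate())

# Tables resolved through the Hive Metastore are now subject to its policies.
spark.sql("SHOW DATABASES").show()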
Br,
Dennis
Sent from my iPhone
> On 23.11.2020 at 19:04, joyan sil wrote:
>
>
> Hi,
>
> We have ranger
Hi Yang,
I am using Livy Server for submitting jobs.
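For reference, a batch submission against Livy's REST endpoint looks roughly like this (host, port, and application path are placeholders):

import json
import requests

# Livy's batch endpoint; host and application path are placeholders.
livy_url = "http://livy-host:8998/batches"
payload = {
    "file": "hdfs:///jobs/my_job.py",   # hypothetical PySpark script
    "args": ["--date", "2020-11-24"],
    "conf": {"spark.executor.memory": "4g"},
}

resp = requests.post(livy_url, data=json.dumps(payload),
                     headers={"Content-Type": "application/json"})
print(resp.json())  # returns the batch id and state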
Br,
Dennis
Sent from my iPhone
> On 24.11.2020 at 03:34, Zhou Yang wrote:
>
>
> Dear experts,
>
> I found a convenient way to submit jobs via REST API at
>
Hi Spark Community, does somebody know a PySpark framework that integrates
with Apache Atlas? I want to trigger tagging etc. through my PySpark DataFrame
operations. Atlas has an API which I could use, so I could write my own
framework. But before I do this I wanted to ask whether somebody knows one.
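For context, tagging an entity through Atlas' v2 REST API could look roughly like this (host, credentials, classification name, and GUID are placeholders; verify the endpoint against your Atlas version):

import requests

# Attach a classification (tag) to an entity via the Atlas v2 REST API.
# Host, credentials, and the entity GUID are placeholders.
atlas = "http://atlas-host:21000/api/atlas/v2"
guid = "00000000-0000-0000-0000-000000000000"

resp = requests.post(
    f"{atlas}/entity/guid/{guid}/classifications",
    json=[{"typeName": "PII"}],
    auth=("admin", "admin"),
)
resp.raise_for_status()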
We are using PyCharm and RStudio with Spark libraries to submit Spark jobs.
Sent from my iPhone
> On 07.04.2020 at 18:10, yeikel valdes wrote:
>
>
>
> Zeppelin is not an IDE but a notebook. It is helpful to experiment but it is
> missing a lot of the features that we expect
Hi,
I am also using Spark on the Hive Metastore. The performance is much better,
especially for larger datasets. I have the feeling that performance is better if
I load the data into DataFrames and do the join there instead of doing the join
directly in Spark SQL, but I can't explain it yet.
Anybody
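For clarity, the two variants meant here look like this (table and column names are made up); comparing the physical plans via explain() would show whether they really differ:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Variant 1: join directly in Spark SQL against Hive tables.
sql_join = spark.sql("""
    SELECT a.id, a.value, b.label
    FROM db.table_a a
    JOIN db.table_b b ON a.id = b.id
""")

# Variant 2: load into DataFrames first, then join.
df_a = spark.table("db.table_a")
df_b = spark.table("db.table_b")
df_join = df_a.join(df_b, on="id").select("id", "value", "label")

# In principle both should compile to the same plan.
sql_join.explain()
df_join.explain()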
Hi,
is there any "best practice" rule of thumb for when to use local Python instead of
distributed Python on Spark (data size, massive computation, etc.)? I mean,
Spark can also generate overhead, and sometimes local processing is faster.
Br,
Dennis
Hi community,
I am using Spark on YARN. When I submit a job, after a long time I get an error
message and a retry.
It happens when I want to store the DataFrame to a table:
spark_df.write.option("path",
"/nlb_datalake/golden_zone/webhose/sentiment").saveAsTable("news_summary_test",
he
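The snippet above is cut off in the archive; a complete call typically looks like the following sketch, where format and mode are assumptions rather than the original arguments:

# A complete form of the truncated call above; "format" and "mode"
# are assumptions, not the original's arguments.
(spark_df.write
 .option("path", "/nlb_datalake/golden_zone/webhose/sentiment")
 .format("parquet")
 .mode("overwrite")
 .saveAsTable("news_summary_test"))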
>> reasons and change job priorities in the YARN scheduling configuration.
>>
>> Alternatively, check the Apache Airflow project, which is a good alternative
>> to Oozie.
>>
>> Regards,
>> Bartek
>>
>>> On Fri, Jul 19, 2019, 09:09 Dennis Suh
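To illustrate the Airflow suggestion, a minimal DAG using the SparkSubmitOperator might look like this (DAG id, schedule, and application path are placeholders; the import path depends on the Airflow version):

from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

# Minimal daily DAG; id, schedule, and application path are placeholders.
with DAG(
    dag_id="spark_pipeline",
    start_date=datetime(2019, 7, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    submit = SparkSubmitOperator(
        task_id="submit_spark_job",
        application="/jobs/pipeline.py",   # hypothetical PySpark script
        conn_id="spark_default",
        conf={"spark.master": "yarn"},
    )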
Dear experts,
I am using Spark for processing data from HDFS (Hadoop). These Spark
applications are data pipelines, data wrangling, and machine learning
applications. Thus Spark submits its jobs using YARN.
This also works well. For scheduling I am now trying to use Apache Oozie, but I
am