Looking for a developer to help us with a small ETL project using Spark and Kubernetes

2019-07-18 Thread Information Technologies
Hello, we are looking for a developer to help us with a small ETL project using Spark and Kubernetes. Here are some of the requirements: 1. We need a REST API to run and schedule jobs. We would prefer this be done in Node.js, but it can be done in Java. The REST API will not be available to the

Re: Usage of PyArrow in Spark

2019-07-18 Thread Bryan Cutler
It would be possible to use Arrow with regular Python UDFs and avoid Pandas, and there would probably be some performance improvement. The difficult part will be ensuring that the data remains consistent in the conversions between Arrow and Python; timestamps, for example, are a bit tricky. Given that we
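The timestamp pitfall mentioned above can be sketched with plain Python datetimes (the epoch value below is an arbitrary illustration, not a value from the thread):

```python
from datetime import datetime, timezone

# Arrow stores a timestamp as an int64 count of units since the epoch,
# optionally tagged with a timezone; Python uses datetime objects that
# may be naive or tz-aware. A correct round trip must keep these aligned.
epoch_us = 1_563_400_000_000_000  # microseconds since 1970-01-01 (arbitrary example)

# Arrow-style value -> tz-aware Python datetime
aware = datetime.fromtimestamp(epoch_us / 1_000_000, tz=timezone.utc)

# Dropping tzinfo (an easy mistake during conversion) yields a naive
# datetime whose wall-clock fields look unchanged but which has silently
# lost the timezone the Arrow value carried.
naive = aware.replace(tzinfo=None)

# The tz-aware value round-trips back to the same int64 exactly
back = int(aware.timestamp() * 1_000_000)
print(back == epoch_us)
print(naive.tzinfo is None)
```

This is the kind of consistency check the conversion layer would need to get right for every timestamp unit and timezone combination.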

Re: spark standalone mode problem about executor add and removed again and again!

2019-07-18 Thread Riccardo Ferrari
I would also check firewall rules. Is communication allowed on all the required port ranges and hosts? On Thu, Jul 18, 2019 at 3:56 AM Amit Sharma wrote: > Do you have dynamic resource allocation enabled? > > On Wednesday, July 17, 2019, zenglong chen wrote: >> Hi all, >> My
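As a starting point for that firewall check, the otherwise-random ports Spark uses can be pinned so rules can allow them explicitly. The property names below are standard Spark configuration; the port numbers are placeholders, not recommendations:

```properties
# conf/spark-defaults.conf -- pin ports that are random by default
spark.driver.port        40000
spark.blockManager.port  40010
spark.port.maxRetries    16
```

In standalone mode the master additionally listens on 7077 by default (web UI on 8080), and workers must be able to reach the driver's port as well as each other's block manager ports in both directions.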

Binding spark workers to a network interface

2019-07-18 Thread Supun Kamburugamuve
Hi all, Is there a configuration to force Spark to use a specific network interface for communication? The machines we are using have three network interfaces, and we would like to bind Spark to a specific one. Best, Supun
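Spark has no setting that names an interface directly; the usual approach is to bind by the IP address assigned to the desired interface. A sketch, with a placeholder address standing in for that interface's IP:

```shell
# conf/spark-env.sh on every node: bind Spark to the IP of the desired
# interface (10.0.1.5 is a placeholder for that interface's address)
SPARK_LOCAL_IP=10.0.1.5
```

For the driver specifically, `spark.driver.bindAddress` controls the local bind address while `spark.driver.host` controls the address advertised to executors.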

unsubscribe

2019-07-18 Thread Joevu
unsubscribe

Re: Usage of PyArrow in Spark

2019-07-18 Thread Abdeali Kothari
I was thinking of implementing that, but quickly realized that a Spark -> Pandas -> Python conversion causes errors. A quick example is "None" in numeric data types: Pandas supports only NaN, while Spark supports both NULL and NaN. This is just one of the issues I ran into. I'm not sure about
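The NULL-versus-NaN mismatch described above can be seen with Pandas alone (no Spark needed); this is a minimal sketch assuming a classic float column, not the newer nullable dtypes:

```python
import pandas as pd

# Spark distinguishes NULL from NaN in numeric columns; a classic pandas
# float Series cannot: a None placed in it is silently coerced to NaN,
# so the two become indistinguishable after conversion.
s = pd.Series([1.0, None, float("nan")])

print(s.isna().tolist())  # the None slot and the NaN slot both read as missing
print(s.dtype)            # float64 -- no representation for a true NULL remains
```

So a Spark column containing both NULLs and NaNs cannot round-trip through a plain Pandas float column without losing that distinction.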