Sounds good.
Should we add another paragraph after this paragraph in configuration.md to
explain executor env as well? I will be happy to upload a simple patch.
> Note: When running Spark on YARN in cluster mode, environment variables
> need to be set using the spark.yarn.appMasterEnv.[EnvironmentVariableName]
> property in your conf/spark-defaults.conf file.
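A minimal spark-defaults.conf sketch of both settings (FOO and bar are
made-up placeholders; spark.executorEnv.[EnvironmentVariableName] is the
documented executor-side counterpart that such a paragraph would cover):

    # Set FOO in the YARN Application Master's environment (cluster mode).
    spark.yarn.appMasterEnv.FOO  bar
    # Set FOO in the environment of every executor.
    spark.executorEnv.FOO        bar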
Because spark-env.sh is something that makes sense only on the gateway
machine (where the app is being submitted from).
On Wed, Jan 3, 2018 at 6:46 PM, John Zhuge wrote:
> Thanks Jacek and Marcelo!
>
> Any reason it is not sourced? Any security consideration?
Thanks Jacek and Marcelo!
Any reason it is not sourced? Any security consideration?
On Wed, Jan 3, 2018 at 9:59 AM, Marcelo Vanzin wrote:
> On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge wrote:
> > I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is
> > spark-env.sh sourced when starting the Spark AM container or the executor
> > container?
1. It is all the result data in that trigger. Note that it takes a
DataFrame which is a purely logical representation of data and has no
association with partitions, etc. which are physical representations.
2. If you want to limit the amount of data that is processed in a trigger,
then you should use the rate-limit options supported by the source, e.g.
maxFilesPerTrigger for the file source or maxOffsetsPerTrigger for the Kafka
source.
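A short Scala sketch of point 2, assuming a JSON file source (the paths and
the 10-file cap are made-up; maxFilesPerTrigger is the file source's
documented rate-limit option):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("rate-limit-demo").getOrCreate()

    // Cap each trigger at 10 input files, so the sink's addBatch receives a
    // DataFrame covering only that bounded slice of the input.
    val stream = spark.readStream
      .format("json")
      .option("maxFilesPerTrigger", "10")
      .load("/tmp/input")  // hypothetical input directory

    stream.writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/ckpt")  // hypothetical checkpoint dir
      .start()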
On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge wrote:
> I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is
> spark-env.sh sourced when starting the Spark AM container or the executor
> container?
No, it's not.
--
Marcelo
--
Hi:
The documentation for Sink.addBatch is as follows:
  /**
   * Adds a batch of data to this sink. The data for a given `batchId` is
   * deterministic and if this method is called more than once with the same
   * batchId (which will happen in the case of failures), then `data` should
   * only be added once.
   */
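To make that contract concrete, a hedged Scala sketch of a custom Sink that
honors the "added once" wording by remembering the last committed batchId
(the class and its bookkeeping are illustrative, not Spark's own code; Sink
is an internal API in these versions):

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.execution.streaming.Sink

    class IdempotentPrintSink extends Sink {
      // Highest batchId already written; a replayed batch is skipped.
      @volatile private var lastCommittedBatchId: Long = -1L

      override def addBatch(batchId: Long, data: DataFrame): Unit = {
        if (batchId <= lastCommittedBatchId) return  // failure replay: add only once
        // Only consume `data`; the doc warns against applying further operators.
        data.collect().foreach(println)
        lastCommittedBatchId = batchId
      }
    }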
Hi,
My understanding is that the AM with the driver (in cluster deploy mode) and
the executors are plain Java processes whose settings are assembled one by
one when a Spark application is submitted for execution and a
ContainerLaunchContext is created for launching each YARN container. See
https://github.com/apache/sp
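To illustrate that point, a rough Scala sketch (an assumed shape, not
Spark's actual code path; only the Hadoop YARN types are real) of how such
per-container environments get packed into a ContainerLaunchContext rather
than read from spark-env.sh:

    import scala.collection.JavaConverters._
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext
    import org.apache.hadoop.yarn.util.Records

    // Build the AM container's launch context from spark.yarn.appMasterEnv.*
    // entries; the container later starts with exactly this environment.
    def buildAmLaunchContext(conf: Map[String, String]): ContainerLaunchContext = {
      val ctx = Records.newRecord(classOf[ContainerLaunchContext])
      val env = conf.collect {
        case (k, v) if k.startsWith("spark.yarn.appMasterEnv.") =>
          k.stripPrefix("spark.yarn.appMasterEnv.") -> v
      }
      ctx.setEnvironment(env.asJava)
      ctx
    }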