Spark - Hadoop custom filesystem service loading

2019-03-18 Thread Jhon Anderson Cardenas Diaz
Hi everyone, On Spark 2.2.0, if you wanted to create a custom file system implementation, you just created an extension of org.apache.hadoop.fs.FileSystem and put the canonical name of the custom class in the file src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem. Once you
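For context, a minimal sketch of that mechanism is below; the class, package, and scheme names are hypothetical, and the delegation to the local filesystem is only there to keep the example self-contained.

    // Hypothetical custom filesystem: delegates everything to the local FS
    // and registers the scheme "myfs". All names are placeholders.
    package com.example.fs;

    import java.io.IOException;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FilterFileSystem;
    import org.apache.hadoop.fs.RawLocalFileSystem;

    public class MyCustomFileSystem extends FilterFileSystem {
        public MyCustomFileSystem() {
            super(new RawLocalFileSystem());       // delegate all operations
        }

        @Override
        public String getScheme() {
            return "myfs";                         // used for myfs:///path URIs
        }

        @Override
        public void initialize(URI name, Configuration conf) throws IOException {
            super.initialize(name, conf);          // normal FilterFileSystem setup
        }
    }

The service file then contains one line with the implementation's canonical name, e.g.:

    # src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem
    com.example.fs.MyCustomFileSystem

Hadoop's FileSystem class discovers such entries through the JDK ServiceLoader and maps each implementation's getScheme() value to its class.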

Re: Spark is only using one worker machine when more are available

2018-04-12 Thread Jhon Anderson Cardenas Diaz
, dt, connProp);
> jdbcDF.createOrReplaceTempView(tableInfo.tmp_table_name);
> }
> }
>
> // Then run a query and write the result set to mysql
> Dataset result = ss.sql(this.sql);
> result.explain(true);
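For readability, here is a self-contained sketch of the pattern that quoted snippet appears to follow (registering JDBC tables as temp views, then running a SQL query); the URL, credentials, and table names are placeholders, not from the original thread.

    // Hypothetical sketch of the quoted pattern; connection details are fake.
    import java.util.Properties;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class JdbcQueryExample {
        public static void main(String[] args) {
            SparkSession ss = SparkSession.builder().appName("jdbc-query").getOrCreate();

            Properties connProp = new Properties();
            connProp.put("user", "spark");                   // placeholder credentials
            connProp.put("password", "secret");
            String url = "jdbc:mysql://db-host:3306/mydb";   // placeholder URL

            // Register each source table as a temporary view.
            Dataset<Row> jdbcDF = ss.read().jdbc(url, "source_table", connProp);
            jdbcDF.createOrReplaceTempView("tmp_source_table");

            // Run the query and inspect the plan, as in the quoted snippet.
            Dataset<Row> result = ss.sql("SELECT * FROM tmp_source_table WHERE id > 100");
            result.explain(true);

            // Write the result set back to MySQL.
            result.write().jdbc(url, "result_table", connProp);
        }
    }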

Re: Spark is only using one worker machine when more are available

2018-04-11 Thread Jhon Anderson Cardenas Diaz
Hi, could you please share the environment variable values that you are setting when you run the jobs, the Spark version, etc. (more details)? By the way, you should take a look at SPARK_WORKER_INSTANCES and SPARK_WORKER_CORES if you are using Spark 2.0.0
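For reference, in a Spark standalone deployment those variables are typically set in conf/spark-env.sh on each worker machine; the values below are illustrative only.

    # conf/spark-env.sh -- illustrative values only
    export SPARK_WORKER_INSTANCES=2   # worker processes started per machine
    export SPARK_WORKER_CORES=4       # cores each worker offers to executors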

How to create security filter for Spark UI in Spark on YARN

2018-01-09 Thread Jhon Anderson Cardenas Diaz
*Environment*: AWS EMR, yarn cluster. *Description*: I am trying to use a Java servlet filter to protect access to the Spark UI by using the property spark.ui.filters; the problem is that when Spark is running in YARN mode, that property is always overridden by Hadoop with the filter
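For context, a minimal sketch of the kind of servlet filter spark.ui.filters expects is below; the class name, package, and header name are hypothetical, and the shared-secret check is only one possible policy.

    // Hypothetical filter guarding the Spark UI with a shared-secret header.
    // Class, package, and header names are placeholders.
    package com.example.ui;

    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class SimpleAuthFilter implements Filter {
        private String secret;

        @Override
        public void init(FilterConfig conf) {
            secret = conf.getInitParameter("secret");   // supplied via filter params
        }

        @Override
        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            HttpServletRequest httpReq = (HttpServletRequest) req;
            if (secret != null && secret.equals(httpReq.getHeader("X-UI-Secret"))) {
                chain.doFilter(req, res);               // authorized: continue to the UI
            } else {
                ((HttpServletResponse) res).sendError(HttpServletResponse.SC_FORBIDDEN);
            }
        }

        @Override
        public void destroy() { }
    }

The Spark configuration docs describe wiring such a filter up roughly like this (values are placeholders):

    spark.ui.filters=com.example.ui.SimpleAuthFilter
    spark.com.example.ui.SimpleAuthFilter.param.secret=some-shared-secret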

Spark UI stdout/stderr links point to executors internal address

2018-01-09 Thread Jhon Anderson Cardenas Diaz
*Environment:* AWS EMR, yarn cluster. *Description:* On the Spark UI, in the Environment and Executors tabs, the stdout and stderr links point to the internal addresses of the executors. This would imply exposing the executors so that the links can be accessed. Shouldn't those links be pointed to
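As a side note not from the original thread: if YARN log aggregation is enabled on the cluster, the same container stdout/stderr can usually be retrieved with the yarn CLI instead of following the internal links (typically after the application finishes); the application id below is a placeholder.

    # Placeholder application id -- substitute the real one from the RM UI
    yarn logs -applicationId application_1234567890123_0001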