Hello,

I am running pio train on an edge node of a distributed 8-node Spark cluster with a 3-node HBase cluster. When I run "pio train", the job runs, but it runs on local Spark and is not submitted to the cluster. If I run "pio train --master spark://localhost:7077" or "pio train --master yarn-cluster", I get the error below:

File does not exist: hdfs://mdc2vra179.federated.fds:8020/user/da_mcom_milan/.sparkStaging/application_1489598450058_0024/template-scala-parallel-universal-recommendation-assembly-0.5.0-deps.jar
java.io.FileNotFoundException: File does not exist: hdfs://mdc2vra179.federated.fds:8020/user/da_mcom_milan/.sparkStaging/application_1489598450058_0024/template-scala-parallel-universal-recommendation-assembly-0.5.0-deps.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1427)
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1419)

mdc2vra179 is my HBase cluster node, which also runs the NameNode. I am not sure why Spark expects a jar file on the HBase/NameNode host.

$PIO_HOME/conf/pio-env.sh:

SPARK_HOME=$PIO_HOME/vendors/spark
HBASE_CONF_DIR=$PIO_HOME/vendors/hbase/conf
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS
PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=pros-prod
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=mdc2vra176
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase

Thanks,
Malay
