Hi Wojciech,

I also ran into many problems while setting up YARN with PredictionIO. This may be a case where YARN is trying to find the pio.log file on the HDFS cluster. You can try "--master yarn --deploy-mode client". You need to pass this configuration to pio train after the double dash, e.g.:

pio train -- --master yarn --deploy-mode client
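If it helps, here is a small sketch of that invocation that only prints the command, so you can check it before starting a training run. The variable name SPARK_ARGS is just for illustration and is not something PIO reads; the Spark flags themselves are the standard spark-submit ones, and I am assuming you run this from the engine directory.

```shell
# Everything before the bare "--" belongs to pio; everything after it
# is handed to spark-submit unchanged.
# SPARK_ARGS is a hypothetical helper variable, not a PIO setting.
SPARK_ARGS="--master yarn --deploy-mode client"

# Print the full command so it can be checked before launching anything.
echo "pio train -- $SPARK_ARGS"

# Uncomment to actually start training (run from the engine directory):
# pio train -- $SPARK_ARGS
```

In client mode the driver stays on the machine where you run pio, so it should write its log there instead of on a YARN node.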
Thanks and Regards
Ambuj Sharma
Sunrise may late, But Morning is sure.....
Team ML
Betaout

On Wed, May 23, 2018 at 4:53 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

> Actually you might search the archives for “yarn” because I don’t recall how the setup works off hand.
>
> Archives here: https://lists.apache.org/list.html?user@predictionio.apache.org
>
> Also check the Spark Yarn requirements and remember that `pio train … -- various Spark params` allows you to pass arbitrary Spark params exactly as you would to spark-submit on the pio command line. The double dash separates PIO and Spark params.
>
>
> From: Pat Ferrel <p...@occamsmachete.com>
> Reply: user@predictionio.apache.org <user@predictionio.apache.org>
> Date: May 22, 2018 at 4:07:38 PM
> To: user@predictionio.apache.org <user@predictionio.apache.org>, Wojciech Kowalski <wojci...@tomandco.co.uk>
> Subject: RE: Problem with training in yarn cluster
>
> What is the command line for `pio train …` Specifically are you using yarn-cluster mode? This causes the driver code, which is a PIO process, to be executed on an executor. Special setup is required for this.
>
>
> From: Wojciech Kowalski <wojci...@tomandco.co.uk>
> Reply: user@predictionio.apache.org <user@predictionio.apache.org>
> Date: May 22, 2018 at 2:28:43 PM
> To: user@predictionio.apache.org <user@predictionio.apache.org>
> Subject: RE: Problem with training in yarn cluster
>
> Hello,
>
> Actually I have another error in logs that is actually preventing train as well:
>
> [INFO] [RecommendationEngine$]
>
>                _   _             __  __ _
>      /\       | | (_)           |  \/  | |
>     /  \   ___| |_ _  ___  _ __ | \  / | |
>    / /\ \ / __| __| |/ _ \| '_ \| |\/| | |
>   / ____ \ (__| |_| | (_) | | | | |  | | |____
>  /_/    \_\___|\__|_|\___/|_| |_|_|  |_|______|
>
>
> [INFO] [Engine] Extracting datasource params...
> [INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
> [INFO] [Engine] Datasource params: (,DataSourceParams(shop_live,List(purchase, basket-add, wishlist-add, view),None,None))
> [INFO] [Engine] Extracting preparator params...
> [INFO] [Engine] Preparator params: (,Empty)
> [INFO] [Engine] Extracting serving params...
> [INFO] [Engine] Serving params: (,Empty)
> [INFO] [log] Logging initialized @6774ms
> [INFO] [Server] jetty-9.2.z-SNAPSHOT
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1798eb08{/jobs,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@47c4c3cd{/jobs/json,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3e080dea{/jobs/job,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@c75847b{/jobs/job/json,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5ce5ee56{/stages,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@3dde94ac{/stages/json,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4347b9a0{/stages/stage,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@63b1bbef{/stages/stage/json,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@10556e91{/stages/pool,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@5967f3c3{/stages/pool/json,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2793dbf6{/storage,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@49936228{/storage/json,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7289bc6d{/storage/rdd,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@1496b014{/storage/rdd/json,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2de3951b{/environment,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7f3330ad{/environment/json,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@40e681f2{/executors,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@61519fea{/executors/json,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@502b9596{/executors/threadDump,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@367b7166{/executors/threadDump/json,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@42669f4a{/static,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@2f25f623{/,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@23ae4174{/api,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4e33e426{/jobs/job/kill,null,AVAILABLE,@Spark}
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@38d9ae65{/stages/stage/kill,null,AVAILABLE,@Spark}
> [INFO] [ServerConnector] Started Spark@17239b3{HTTP/1.1}{0.0.0.0:47948}
> [INFO] [Server] Started @7040ms
> [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@16cffbe4{/metrics/json,null,AVAILABLE,@Spark}
> [WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
> [ERROR] [ApplicationMaster] Uncaught exception:
>
> Thanks,
> Wojciech
>
> *From: *Wojciech Kowalski <wojci...@tomandco.co.uk>
> *Sent: *22 May 2018 23:20
> *To: *user@predictionio.apache.org
> *Subject: *Problem with training in yarn cluster
>
> Hello, I am trying to setup distributed cluster with separate all services but i have problem while running train:
>
> log4j:ERROR setFile(null,true) call failed.
> java.io.FileNotFoundException: /pio/pio.log (No such file or directory)
>     at java.io.FileOutputStream.open0(Native Method)
>     at java.io.FileOutputStream.open(FileOutputStream.java:270)
>     at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>     at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
>     at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
>     at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
>     at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
>     at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
>     at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
>     at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
>     at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
>     at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
>     at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
>     at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
>     at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
>     at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
>     at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:117)
>     at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:102)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$.initializeLogIfNecessary(ApplicationMaster.scala:738)
>     at org.apache.spark.internal.Logging$class.log(Logging.scala:46)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$.log(ApplicationMaster.scala:738)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:753)
>     at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
>
>
> setup:
> hbase
> Hadoop
> Hdfs
> Spark cluster with yarn
>
> Training in cluster mode
>
> I assume spark worker is trying to save log to /pio/pio.log on worker machine instead of pio host. How can I set pio destination to hdfs path ?
>
> Or any other advice ?
>
> Thanks,
> Wojciech