It looks like this Scala function is the source of that jars list: https://github.com/apache/predictionio/blob/develop/tools/src/main/scala/org/apache/predictionio/tools/Common.scala#L81
On Fri, Mar 9, 2018 at 17:42 Mars Hall <mars.h...@salesforce.com> wrote:

> Where does the classpath in spark-submit originate? Is compute-classpath.sh not the source?
>
> As noted previously, my stable-ordering fix in compute-classpath.sh no longer seems to be effective either.
>
> It looks like some tracing of classpath assembly through the Spark command runner is required:
> https://github.com/apache/predictionio/blob/develop/tools/src/main/scala/org/apache/predictionio/tools/Runner.scala#L185
>
> Unless someone with more knowledge of these internals could weigh in… Donald? 😬😊
>
> On Fri, Mar 9, 2018 at 15:44 Shane Johnson <sh...@liftiq.com> wrote:
>
>> One additional item that you mentioned earlier is that we would need to remove or skip the aws-java-sdk.jar that is already in the CLASSPATH. Do you think this has an impact? I did not write anything to skip or remove the existing aws-java-sdk.jar.
>>
>>> aws-java-sdk.jar is already in the CLASSPATH though, so the script will need to skip or remove it first.
>>
>> *Shane Johnson | LIFT IQ*
>> *Founder | CEO*
>>
>> *www.liftiq.com <http://www.liftiq.com/>* or *sh...@liftiq.com <sh...@liftiq.com>*
>> mobile: (801) 360-3350
>> LinkedIn <https://www.linkedin.com/in/shanewjohnson/> | Twitter <https://twitter.com/SWaldenJ> | Facebook <https://www.facebook.com/shane.johnson.71653>
>>
>> On Fri, Mar 9, 2018 at 4:41 PM, Shane Johnson <sh...@liftiq.com> wrote:
>>
>>> Now that I am able to deploy, I reset the buildpack to ...#debug-custom-dist and redeployed. Here is the build log. The URL does point to the correct distribution with the edited compute-classpath.sh file.
>>>
>>> -----> JVM Common app detected
>>> -----> Installing JDK 1.8... done
>>> -----> PredictionIO app detected
>>> -----> Install core components
>>> + PredictionIO (https://s3-us-west-1.amazonaws.com/predictionio/0.12.0-incubating/apache-predictionio-0.12.0-incubating-bin.tar.gz)
>>> + Spark (spark-2.1.1-bin-hadoop2.7)
>>> -----> Install supplemental components
>>> + PostgreSQL (JDBC)
>>> + S3 HDFS (AWS SDK)
>>> + S3 HDFS (Hadoop-AWS)
>>> Writing default 'core-site.xml.erb'
>>> + local Maven repo from buildpack (contents)
>>> -----> Configure PredictionIO
>>> Writing default 'pio-env.sh'
>>> Writing default 'spark-defaults.conf.erb'
>>> + Maven repo from buildpack (build.sbt entry)
>>> Set-up environment via '.profile.d/' scripts
>>> -----> Install JVM (heroku/jvm-common)
>>> -----> PredictionIO engine
>>> Quietly logging. (Set `PIO_VERBOSE=true` for detailed build log.)
>>> [INFO] [Engine$] Using command '/tmp/build_67e7942abed821fccc839c9a79faf0eb/lift-iq-score-e92ed3de9212d04972e0e67e68b5407489e0c8d0/PredictionIO-dist/sbt/sbt' at /tmp/build_67e7942abed821fccc839c9a79faf0eb/lift-iq-score-e92ed3de9212d04972e0e67e68b5407489e0c8d0 to build.
>>> [INFO] [Engine$] If the path above is incorrect, this process will fail.
>>> [INFO] [Engine$] Uber JAR disabled. Making sure lib/pio-assembly-0.12.0-incubating.jar is absent.
>>> [INFO] [Engine$] Going to run: /tmp/build_67e7942abed821fccc839c9a79faf0eb/lift-iq-score-e92ed3de9212d04972e0e67e68b5407489e0c8d0/PredictionIO-dist/sbt/sbt package assemblyPackageDependency in /tmp/build_67e7942abed821fccc839c9a79faf0eb/lift-iq-score-e92ed3de9212d04972e0e67e68b5407489e0c8d0
>>> [INFO] [Engine$] Compilation finished successfully.
>>> [INFO] [Engine$] Looking for an engine...
>>> [INFO] [Engine$] Found template-scala-parallel-liftscoring_2.11-0.1-SNAPSHOT.jar
>>> [INFO] [Engine$] Found template-scala-parallel-liftscoring-assembly-0.1-SNAPSHOT-deps.jar
>>> [INFO] [Engine$] Build finished successfully.
>>> [INFO] [Pio$] Your engine is ready for training.
>>> Using default Procfile for engine
>>> -----> Discovering process types
>>> Procfile declares types -> release, train, web
>>> -----> Compressing...
>>> Done: 376.7M
>>>
>>> The release log is below. I am not seeing */app/PredictionIO-dist/lib/spark/aws-java-sdk.jar* show up at the beginning of the CLASSPATH; this is what we should see, correct? As an FYI, I was also manipulating compute-classpath.sh locally, and I observed that adding a line right before `echo "$CLASSPATH"` did not change what was in the logged spark-submit command. This is what I was testing locally:
>>>
>>> CLASSPATH="*/Users/shanejohnson/Desktop/Apps/liftiq_platform/lift-score/PredictionIO-dist/lib/spark/pio-data-s3-assembly-0.12.0-incubating.jar*:$CLASSPATH"
>>> echo "$CLASSPATH"
>>>
>>> I did not see any change in the spark-submit command by adding this when building and deploying locally.
>>> Release log with the new buildpack ...#debug-custom-dist:
>>>
>>> Running train on release…
>>> Picked up JAVA_TOOL_OPTIONS: -Xmx12g -Dfile.encoding=UTF-8
>>> [INFO] [Runner$] Submission command: /app/PredictionIO-dist/vendors/spark-hadoop/bin/spark-submit --driver-memory 13g --class org.apache.predictionio.workflow.CreateWorkflow --jars file:/app/PredictionIO-dist/lib/postgresql_jdbc.jar,file:/app/target/scala-2.11/template-scala-parallel-liftscoring-assembly-0.1-SNAPSHOT-deps.jar,file:/app/target/scala-2.11/template-scala-parallel-liftscoring_2.11-0.1-SNAPSHOT.jar,file:/app/PredictionIO-dist/lib/spark/._pio-data-hbase-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-localfs-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/._pio-data-s3-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/._pio-data-localfs-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-jdbc-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-elasticsearch-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/._pio-data-elasticsearch-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/._pio-data-hdfs-assembly-0.12.0-incubating.jar,*file:/app/PredictionIO-dist/lib/spark/pio-data-s3-assembly-0.12.0-incubating.jar*,file:/app/PredictionIO-dist/lib/spark/pio-data-hbase-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/._pio-data-jdbc-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/hadoop-aws.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-hdfs-assembly-0.12.0-incubating.jar,*file:/app/PredictionIO-dist/lib/spark/aws-java-sdk.jar* --files file:/app/PredictionIO-dist/conf/log4j.properties,file:/app/PredictionIO-dist/conf/core-site.xml --driver-class-path /app/PredictionIO-dist/conf:/app/PredictionIO-dist/conf:/app/PredictionIO-dist/lib/postgresql_jdbc.jar:/app/PredictionIO-dist/conf --driver-java-options -Dpio.log.dir=/app file:/app/PredictionIO-dist/lib/pio-assembly-0.12.0-incubating.jar --engine-id org.template.liftscoring.LiftScoringEngine --engine-version 0c35eebf403cf91fe77a64921d76aa1ca6411d20 --engine-variant file:/app/engine.json --verbosity 0 --json-extractor Both --env PIO_ENV_LOADED=1,PIO_EVENTSERVER_APP_NAME=classi,PIO_STORAGE_SOURCES_PGSQL_INDEX=enabled,PIO_S3_AWS_ACCESS_KEY_ID=[redacted],PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/app/.pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost,PIO_S3_BUCKET_NAME=lift-model-devmaster,PIO_EVENTSERVER_ACCESS_KEY=[redacted],PIO_HOME=/app/PredictionIO-dist,PIO_FS_ENGINESDIR=/app/.pio_store/engines,PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://ec2-52-70-46-243.compute-1.amazonaws.com:5432/dbvbo86hohutvb?sslmode=require,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL,PIO_SPARK_OPTS=--driver-memory 13g,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_PGSQL_PASSWORD=[redacted],PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/app/PredictionIO-dist/vendors/elasticsearch,PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc,PIO_FS_TMPDIR=/app/.pio_store/tmp,PIO_STORAGE_SOURCES_PGSQL_USERNAME=[redacted],PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http,PIO_S3_AWS_SECRET_ACCESS_KEY=[redacted],PIO_TRAIN_SPARK_OPTS=--driver-memory 13g,PIO_STORAGE_SOURCES_PGSQL_CONNECTIONS=8,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL,PIO_CONF_DIR=/app/PredictionIO-dist/conf,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200,PIO_STORAGE_SOURCES_PGSQL_PARTITIONS=4,PIO_S3_AWS_REGION=us-east-1
>>> Picked up JAVA_TOOL_OPTIONS: -Xmx12g -Dfile.encoding=UTF-8
>>> Picked up JAVA_TOOL_OPTIONS: -Xmx12g -Dfile.encoding=UTF-8
>>> [INFO] [Engine] Extracting datasource params...
>>> [INFO] [Engine] Datasource params: (,DataSourceParams(Some(5)))
>>> [INFO] [Engine] Extracting preparator params...
>>> [WARN] [WorkflowUtils$] Non-empty parameters supplied to org.template.liftscoring.Preparator, but its constructor does not accept any arguments. Stubbing with empty parameters.
>>> [INFO] [Engine] Preparator params: (,Empty)
>>>
>>> On Fri, Mar 9, 2018 at 11:17 AM, Mars Hall <mars.h...@salesforce.com> wrote:
>>>
>>>> I'm lost as to how such direct manipulation of CLASSPATH is not appearing in the logged spark-submit command. What could cause this!?
>>>>
>>>> I just pushed a version of the buildpack which should help debug. Assuming only a single buildpack is assigned to the app, here's how to set it:
>>>>
>>>> heroku buildpacks:set https://github.com/heroku/predictionio-buildpack#debug-custom-dist
>>>>
>>>> Then redeploy the engine and check the build log for the line:
>>>>
>>>> + PredictionIO ($URL)
>>>>
>>>> Please confirm that it is the URL of your custom PredictionIO dist.
>>>> On Fri, Mar 9, 2018 at 2:47 PM, Shane Johnson <sh...@liftiq.com> wrote:
>>>>
>>>>> Thanks Donald and Mars,
>>>>>
>>>>> I created a new distribution (https://s3-us-west-1.amazonaws.com/predictionio/0.12.0-incubating/apache-predictionio-0.12.0-incubating-bin.tar.gz) with the added CLASSPATH code and pointed to it with the PREDICTIONIO_DIST_URL variable within the engine app in Heroku.
>>>>>
>>>>> CLASSPATH="/app/PredictionIO-dist/lib/spark/aws-java-sdk.jar:$CLASSPATH"
>>>>> echo "$CLASSPATH"
>>>>>
>>>>> It didn't seem to force the aws-java-sdk to load first as I reviewed the release logs. Should aws-java-sdk.jar show up as the first file within the --jars section when CLASSPATH="/app/PredictionIO-dist/lib/spark/aws-java-sdk.jar:$CLASSPATH" is added?
>>>>>
>>>>> I'm still getting the NoSuchMethodError when the *aws-java-sdk.jar* loads after the *pio-data-s3-assembly-0.12.0-incubating.jar*. Do you have other suggestions to try? I was also testing locally to change the order of the --jars, but changes to compute-classpath.sh didn't seem to change the order of the jars in the logs.
>>>>>
>>>>> Running train on release…
>>>>> Picked up JAVA_TOOL_OPTIONS: -Xmx12g -Dfile.encoding=UTF-8
>>>>> [INFO] [Runner$] Submission command: /app/PredictionIO-dist/vendors/spark-hadoop/bin/spark-submit --driver-memory 13g --class org.apache.predictionio.workflow.CreateWorkflow --jars file:/app/PredictionIO-dist/lib/postgresql_jdbc.jar,file:/app/target/scala-2.11/template-scala-parallel-liftscoring-assembly-0.1-SNAPSHOT-deps.jar,file:/app/target/scala-2.11/template-scala-parallel-liftscoring_2.11-0.1-SNAPSHOT.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-hdfs-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-localfs-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-elasticsearch-assembly-0.12.0-incubating.jar,file:/app/PredictionIO-dist/lib/spark/hadoop-aws.jar,file:/app/PredictionIO-dist/lib/spark/pio-data-hbase-assembly-0.12.0-incubating.jar,*file:/app/PredictionIO-dist/lib/spark/pio-data-s3-assembly-0.12.0-incubating.jar*,file:/app/PredictionIO-dist/lib/spark/pio-data-jdbc-assembly-0.12.0-incubating.jar,*file:/app/PredictionIO-dist/lib/spark/aws-java-sdk.jar* --files file:/app/PredictionIO-dist/conf/log4j.properties,file:/app/PredictionIO-dist/conf/core-site.xml --driver-class-path /app/PredictionIO-dist/conf:/app/PredictionIO-dist/conf:/app/PredictionIO-dist/lib/postgresql_jdbc.jar:/app/PredictionIO-dist/conf --driver-java-options -Dpio.log.dir=/app file:/app/PredictionIO-dist/lib/pio-assembly-0.12.0-incubating.jar --engine-id org.template.liftscoring.LiftScoringEngine --engine-version 0c35eebf403cf91fe77a64921d76aa1ca6411d20 --engine-variant file:/app/engine.json --verbosity 0 --json-extractor Both --env
>>>>>
>>>>> Error:
>>>>>
>>>>> Exception in thread "main" java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager.<init>(Lcom/amazonaws/services/s3/AmazonS3;Ljava/util/concurrent/ThreadPoolExecutor;)V
>>>>> at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:287)
>>>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
>>>>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
>>>>>
>>>>> On Wed, Mar 7, 2018 at 1:01 PM, Mars Hall <mars.h...@salesforce.com> wrote:
>>>>>
>>>>>> Shane,
>>>>>>
>>>>>> On Wed, Mar 7, 2018 at 4:49 AM, Shane Johnson <sh...@liftiq.com> wrote:
>>>>>>
>>>>>>> Re: adding a line to ensure a jar is loaded first. Is this what you are referring to... (line at the bottom in red)?
>>>>>>
>>>>>> I believe the code would need to look like this to affect the output classpath as intended:
>>>>>>
>>>>>>> CLASSPATH="/app/PredictionIO-dist/lib/spark/aws-java-sdk.jar:$CLASSPATH"
>>>>>>> echo "$CLASSPATH"
>>>>>>
>>>>>> aws-java-sdk.jar is already in the CLASSPATH though, so the script will need to skip or remove it first.
>>>>>> --
>>>>>> *Mars Hall
>>>>>> 415-818-7039
>>>>>> Customer Facing Architect
>>>>>> Salesforce Platform / Heroku
>>>>>> San Francisco, California
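The skip-or-remove step Mars describes, combined with the prepend, could be sketched in compute-classpath.sh roughly like this. This is a sketch, not code from the thread: the helper name is hypothetical, and the jar path is the one assumed throughout the discussion.

```shell
# Force a jar to load first: drop any existing occurrence of it from the
# colon-separated classpath, then prepend it so it wins ordering conflicts.
prepend_jar_first() {
  local jar="$1" classpath="$2"
  local base
  base=$(basename "$jar")
  # Split on ':', filter out entries ending in this jar's filename, rejoin.
  local cleaned
  cleaned=$(printf '%s' "$classpath" | tr ':' '\n' | grep -v "/$base\$" | paste -sd ':' -)
  # Emit the jar first; append the rest only if anything survived the filter.
  printf '%s\n' "$jar${cleaned:+:$cleaned}"
}

# Usage at the end of compute-classpath.sh (path assumed from the thread):
CLASSPATH=$(prepend_jar_first "/app/PredictionIO-dist/lib/spark/aws-java-sdk.jar" "$CLASSPATH")
echo "$CLASSPATH"
```

Note that this only reorders the CLASSPATH the script emits; per the later messages in the thread, the --jars list in spark-submit is assembled separately in the Scala tooling (Common.scala), so this fix may not change that ordering.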