Hi,

I had similar errors when the event data was not fully present as expected during the Spark operations. Double-check that your training data format is compatible with what's described in the DataSource.
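
For example, something along these lines (a rough sketch, not the template's actual code; "MyTestApp" and the "rate"/"buy" event names are placeholders for whatever your engine.json declares) can be run from the DataSource to print a few stored events and compare them by eye against what readTraining parses:

  import org.apache.predictionio.data.store.PEventStore
  import org.apache.spark.SparkContext

  // Pull a small sample of stored events so that entityType, event name
  // and properties can be checked against the DataSource's expectations.
  def sampleEvents(sc: SparkContext): Unit = {
    val events = PEventStore.find(
      appName = "MyTestApp",                  // placeholder app name
      eventNames = Some(List("rate", "buy"))  // placeholder event names
    )(sc)
    events.take(10).foreach(println)
  }

If the sample is empty, or the properties differ from what the DataSource reads, that mismatch is the first thing to fix.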

Thanks,
Natu

On Fri, Dec 2, 2016 at 1:47 PM, TJ <[email protected]> wrote:

> Thank you for the prompt reply.
>
> Interestingly, when I import the test events I get a confirmation that 1501 events have been imported, but when checking these events I only see perhaps 40 or so.
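>
> A rough way to check where the imported events went might be to count them per event name (just a sketch; "MyTestApp" is the app name from the log below, so adjust as needed):
>
>   import org.apache.predictionio.data.store.PEventStore
>   import org.apache.spark.SparkContext
>
>   // Count stored events grouped by event name; events imported under
>   // names the engine never queries would explain the missing ones.
>   def countByEventName(sc: SparkContext): Unit = {
>     PEventStore.find(appName = "MyTestApp")(sc)
>       .map(_.event)
>       .countByValue()
>       .foreach { case (name, n) => println(s"$name: $n") }
>   }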
>
>
> [INFO] [Console$] Using existing engine manifest JSON at /MyTest/manifest.json
> [INFO] [Runner$] Submission command: /home/aml/apache-predictionio-0.10.0-incubating/PredictionIO-0.10.0-incubating/vendors/spark-1.5.1-bin-hadoop2.6/bin/spark-submit --class org.apache.predictionio.workflow.CreateWorkflow --jars file:/MyTest/target/scala-2.10/template-scala-parallel-recommendation_2.10-0.1-SNAPSHOT.jar,file:/MyTest/target/scala-2.10/template-scala-parallel-recommendation-assembly-0.1-SNAPSHOT-deps.jar --files file:/home/aml/apache-predictionio-0.10.0-incubating/PredictionIO-0.10.0-incubating/conf/log4j.properties --driver-class-path /home/aml/apache-predictionio-0.10.0-incubating/PredictionIO-0.10.0-incubating/conf:/home/aml/apache-predictionio-0.10.0-incubating/PredictionIO-0.10.0-incubating/lib/postgresql-9.4-1204.jdbc41.jar:/home/aml/apache-predictionio-0.10.0-incubating/PredictionIO-0.10.0-incubating/lib/mysql-connector-java-5.1.37.jar file:/home/aml/apache-predictionio-0.10.0-incubating/PredictionIO-0.10.0-incubating/lib/pio-assembly-0.10.0-incubating.jar --engine-id 9W0iETr9CtdIIKbR0NeFIgnIbBkU7lld --engine-version 7c44394624c51f53e02b09d30efd749a6979b4ac --engine-variant file:/MyTest/engine.json --verbosity 0 --json-extractor Both --env PIO_ENV_LOADED=1,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/root/.pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost,PIO_STORAGE_SOURCES_HBASE_HOME=/home/aml/apache-predictionio-0.10.0-incubating/PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0,PIO_HOME=/home/aml/apache-predictionio-0.10.0-incubating/PredictionIO-0.10.0-incubating,PIO_FS_ENGINESDIR=/root/.pio_store/engines,PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/home/aml/apache-predictionio-0.10.0-incubating/PredictionIO-0.10.0-incubating/vendors/elasticsearch-1.4.4,PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc,PIO_FS_TMPDIR=/root/.pio_store/tmp,PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL,PIO_CONF_DIR=/home/aml/apache-predictionio-0.10.0-incubating/PredictionIO-0.10.0-incubating/conf,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
> [INFO] [Engine] Extracting datasource params...
> [INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
> [INFO] [Engine] Datasource params: (,DataSourceParams(MyTestApp,None))
> [INFO] [Engine] Extracting preparator params...
> [INFO] [Engine] Preparator params: (,Empty)
> [INFO] [Engine] Extracting serving params...
> [INFO] [Engine] Serving params: (,Empty)
> [INFO] [Remoting] Starting remoting
> [INFO] [Remoting] Remoting started; listening on addresses :[akka.tcp://[email protected]:52419]
> [WARN] [MetricsSystem] Using default name DAGScheduler for source because spark.app.id is not set.
> [INFO] [Engine$] EngineWorkflow.train
> [INFO] [Engine$] DataSource: com.iqchef.DataSource@5dfe23e8
> [INFO] [Engine$] Preparator: com.iqchef.Preparator@1989e8c6
> [INFO] [Engine$] AlgorithmList: List(com.iqchef.ALSAlgorithm@67d32a54)
> [INFO] [Engine$] Data sanity check is on.
> [INFO] [Engine$] com.iqchef.TrainingData does not support data sanity check. Skipping check.
> [INFO] [Engine$] com.iqchef.PreparedData does not support data sanity check. Skipping check.
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.reflect.InvocationTargetException
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:67)
> org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:60)
> org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$setConf(TorrentBroadcast.scala:73)
> org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:80)
> org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
> org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63)
> org.apache.spark.SparkContext.broadcast(SparkContext.scala:1327)
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:861)
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:772)
> org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:757)
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1466)
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
> org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:871)
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:772)
> at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:757)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1466)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1822)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1835)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1848)
> at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1298)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
> at org.apache.spark.rdd.RDD.take(RDD.scala:1272)
> at com.iqchef.ALSAlgorithm.train(ALSAlgorithm.scala:35)
> at com.iqchef.ALSAlgorithm.train(ALSAlgorithm.scala:22)
> at org.apache.predictionio.controller.PAlgorithm.trainBase(PAlgorithm.scala:50)
> at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:692)
> at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:692)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at org.apache.predictionio.controller.Engine$.train(Engine.scala:692)
> at org.apache.predictionio.controller.Engine.train(Engine.scala:177)
> at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
> at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)
> at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:67)
> at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:60)
> at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$setConf(TorrentBroadcast.scala:73)
> at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:80)
> at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
> at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63)
> at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1327)
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:861)
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:772)
> at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:757)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1466)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> Caused by: java.lang.IllegalArgumentException: java.lang.UnsatisfiedLinkError: /tmp/snappy-unknown-831510ee-61fb-4c34-b3cb-12befb6e94df-libsnappyjava.so: /tmp/snappy-unknown-831510ee-61fb-4c34-b3cb-12befb6e94df-libsnappyjava.so: failed to map segment from shared object: Operation not permitted
> at org.apache.spark.io.SnappyCompressionCodec.<init>(CompressionCodec.scala:151)
> ... 18 more
> Caused by: java.lang.UnsatisfiedLinkError: /tmp/snappy-unknown-831510ee-61fb-4c34-b3cb-12befb6e94df-libsnappyjava.so: /tmp/snappy-unknown-831510ee-61fb-4c34-b3cb-12befb6e94df-libsnappyjava.so: failed to map segment from shared object: Operation not permitted
> at java.lang.ClassLoader$NativeLibrary.load(Native Method)
> at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
> at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
> at java.lang.Runtime.load0(Runtime.java:809)
> at java.lang.System.load(System.java:1086)
> at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:166)
> at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:145)
> at org.xerial.snappy.Snappy.<clinit>(Snappy.java:47)
> at org.apache.spark.io.SnappyCompressionCodec.<init>(CompressionCodec.scala:149)
> ... 18 more
> root@server1 [/MyTest]#
>
> On 2 Dec 2016, at 12:21, Harsh Mathur <[email protected]> wrote:
>
> Can you paste the exception? Also, make sure there is at least one event of the primary event type.
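>
> For instance, a guard like this at the top of readTraining (a sketch; "rate" stands in for whatever the primary event actually is, and dsp.appName is the appName field of DataSourceParams) fails fast with a clear message instead of dying later in train():
>
>   import org.apache.predictionio.data.store.PEventStore
>
>   // Abort early if no primary events exist in the event store.
>   val primary = PEventStore.find(
>     appName = dsp.appName,
>     eventNames = Some(List("rate"))  // assumed primary event name
>   )(sc)
>   require(primary.take(1).nonEmpty, "No primary events found; import data first.")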
>
> Regards
> Harsh Mathur
> [email protected]
>
> *“Perseverance is the hard work you do after you get tired of doing the
> hard work you already did."*
>
> On Fri, Dec 2, 2016 at 4:41 PM, TJ <[email protected]> wrote:
>
>> Hi,
>>
>> I am a complete newbie with this software, and after nearly a week of trying to install PredictionIO (incubating) and its dependencies I have finally got to the stage where ‘pio status’ runs successfully.
>>
>> However, I now get an error when running ‘pio train’
>>
>>
>>
>> [INFO] [Engine] Datasource params: (,DataSourceParams(MyTestApp,None))
>> [INFO] [Engine] Extracting preparator params...
>> [INFO] [Engine] Preparator params: (,Empty)
>> [INFO] [Engine] Extracting serving params...
>> [INFO] [Engine] Serving params: (,Empty)
>> [INFO] [Remoting] Starting remoting
>> [INFO] [Remoting] Remoting started; listening on addresses :[akka.tcp://[email protected]:33115]
>> [WARN] [MetricsSystem] Using default name DAGScheduler for source because spark.app.id is not set.
>> [INFO] [Engine$] EngineWorkflow.train
>> [INFO] [Engine$] DataSource: com.iqchef.DataSource@5dfe23e8
>> [INFO] [Engine$] Preparator: com.iqchef.Preparator@1989e8c6
>> [INFO] [Engine$] AlgorithmList: List(com.iqchef.ALSAlgorithm@67d32a54)
>> [INFO] [Engine$] Data sanity check is on.
>> [INFO] [Engine$] com.iqchef.TrainingData does not support data sanity check. Skipping check.
>> [INFO] [Engine$] com.iqchef.PreparedData does not support data sanity check. Skipping check.
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.reflect.InvocationTargetException
>>
>>
>> Could somebody point me in the right direction as to how to fix this
>> problem?
>>
>> My pio status result is as follows
>>
>> root@server1 [/MyTest]# pio status
>> [INFO] [Console$] Inspecting PredictionIO...
>> [INFO] [Console$] PredictionIO 0.10.0-incubating is installed at /home/aml/apache-predictionio-0.10.0-incubating/PredictionIO-0.10.0-incubating
>> [INFO] [Console$] Inspecting Apache Spark...
>> [INFO] [Console$] Apache Spark is installed at /home/aml/apache-predictionio-0.10.0-incubating/PredictionIO-0.10.0-incubating/vendors/spark-1.5.1-bin-hadoop2.6
>> [INFO] [Console$] Apache Spark 1.5.1 detected (meets minimum requirement of 1.3.0)
>> [INFO] [Console$] Inspecting storage backend connections...
>> [INFO] [Storage$] Verifying Meta Data Backend (Source: PGSQL)...
>> [INFO] [Storage$] Verifying Model Data Backend (Source: PGSQL)...
>> [INFO] [Storage$] Verifying Event Data Backend (Source: PGSQL)...
>> [INFO] [Storage$] Test writing to Event Store (App Id 0)...
>> [INFO] [Console$] (sleeping 5 seconds for all messages to show up...)
>> [INFO] [Console$] Your system is all ready to go.
>>
>> Many thanks in advance.
>>
>> /Mike
>>
>
>
>
