This is probably a timing issue in the integration test, which has to wait for `pio deploy` to finish before the queries can be made. If it doesn’t finish the queries will fail. By the time the rest of the test quits the model has been deployed so you can run queries. In the integration-test script increase the delay after `pio deploy…` and see if it passes then.
This is probably an integrtion-test script problem not a problem in the system On Oct 6, 2017, at 4:21 AM, Noelia Osés Fernández <no...@vicomtech.org> wrote: Pat, I have run the integration test for the handmade example out of curiosity. Strangely enough things go more or less as expected apart from the fact that I get a message saying: ... [INFO] [CoreWorkflow$] Updating engine instance [INFO] [CoreWorkflow$] Training completed successfully. Model will remain deployed after this test Waiting 30 seconds for the server to start nohup: redirecting stderr to stdout % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 8000: Connection refused So the integration test does not manage to get the recommendations even though the model trained and deployed successfully. However, as soon as the integration test finishes, on the same terminal, I can get the recommendations by doing the following: $ curl -H "Content-Type: application/json" -d ' > { > "user": "u1" > }' http://localhost:8000/queries.json <http://localhost:8000/queries.json> {"itemScores":[{"item":"Nexus","score":0.057719700038433075},{"item":"Surface","score":0.0}]} Isn't this odd? Can you guess what's going on? Thank you very much for all your support! noelia On 5 October 2017 at 19:22, Pat Ferrel <p...@occamsmachete.com <mailto:p...@occamsmachete.com>> wrote: Ok, that config should work. Does the integration test pass? The data you are using is extremely small and though it does look like it has cooccurrences, they may not meet minimum “big-data” thresholds used by default. Try adding more data or use the handmade example data, rename purchase to view and discard the existing view data if you wish. The error is very odd and I’ve never seen it. If the integration test works I can only surmise it's your data. On Oct 5, 2017, at 12:02 AM, Noelia Osés Fernández <no...@vicomtech.org <mailto:no...@vicomtech.org>> wrote: SPARK: spark-1.6.3-bin-hadoop2.6 PIO: 0.11.0-incubating Scala: whatever gets installed when installing PIO 0.11.0-incubating, I haven't installed Scala separately UR: ActionML's UR v0.6.0 I suppose as that's the last version mentioned in the readme file. I have attached the UR zip file I downloaded from the actionml github account. Thank you for your help!! On 4 October 2017 at 17:20, Pat Ferrel <p...@occamsmachete.com <mailto:p...@occamsmachete.com>> wrote: What version of Scala. Spark, PIO, and UR are you using? On Oct 4, 2017, at 6:10 AM, Noelia Osés Fernández <no...@vicomtech.org <mailto:no...@vicomtech.org>> wrote: Hi all, I'm still trying to create a very simple app to learn to use PredictionIO and still having trouble. I have done pio build no problem. But when I do pio train I get a very long error message related to serialisation (error message copied below). pio status reports system is all ready to go. The app I'm trying to build is very simple, it only has 'view' events. Here's the engine.json: =========================================================== { "comment":" This config file uses default settings for all but the required values see README.md for docs", "id": "default", "description": "Default settings", "engineFactory": "com.actionml.RecommendationEngine", "datasource": { "params" : { "name": "tiny_app_data.csv", "appName": "TinyApp", "eventNames": ["view"] } }, "algorithms": [ { "comment": "simplest setup where all values are default, popularity based backfill, must add eventsNames", "name": "ur", "params": { "appName": "TinyApp", "indexName": "urindex", "typeName": "items", "comment": "must have data for the first event or the model will not build, other events are optional", "eventNames": ["view"] } } ] } =========================================================== The data I'm using is: "u1","i1" "u2","i1" "u2","i2" "u3","i2" "u3","i3" "u4","i4" meaning user u viewed item i. The data has been added to the database with the following python code: =========================================================== """ Import sample data for recommendation engine """ import predictionio import argparse import random RATE_ACTIONS_DELIMITER = "," SEED = 1 def import_events(client, file): f = open(file, 'r') random.seed(SEED) count = 0 print "Importing data..." items = [] users = [] f = open(file, 'r') for line in f: data = line.rstrip('\r\n').split(RATE_ACTIONS_DELIMITER) users.append(data[0]) items.append(data[1]) client.create_event( event="view", entity_type="user", entity_id=data[0], target_entity_type="item", target_entity_id=data[1] ) print "Event: " + "view" + " entity_id: " + data[0] + " target_entity_id: " + data[1] count += 1 f.close() users = set(users) items = set(items) print "All users: " + str(users) print "All items: " + str(items) for item in items: client.create_event( event="$set", entity_type="item", entity_id=item ) count += 1 print "%s events are imported." % count if __name__ == '__main__': parser = argparse.ArgumentParser( description="Import sample data for recommendation engine") parser.add_argument('--access_key', default='invald_access_key') parser.add_argument('--url', default="http://localhost:7070 <http://localhost:7070/>") parser.add_argument('--file', default="./data/tiny_app_data.csv") args = parser.parse_args() print args client = predictionio.EventClient( access_key=args.access_key, url=args.url, threads=5, qsize=500) import_events(client, args.file) =========================================================== My pio_env.sh is the following: =========================================================== #!/usr/bin/env bash # # Copy this file as pio-env.sh and edit it for your site's configuration. # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information regarding copyright ownership. # The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 <http://www.apache.org/licenses/LICENSE-2.0> # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # # PredictionIO Main Configuration # # This section controls core behavior of PredictionIO. It is very likely that # you need to change these to fit your site. # SPARK_HOME: Apache Spark is a hard dependency and must be configured. # SPARK_HOME=$PIO_HOME/vendors/spark-2.0.2-bin-hadoop2.7 SPARK_HOME=$PIO_HOME/vendors/spark-1.6.3-bin-hadoop2.6 POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-42.1.4.jar MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.41.jar # ES_CONF_DIR: You must configure this if you have advanced configuration for # your Elasticsearch setup. # ES_CONF_DIR=/opt/elasticsearch #ES_CONF_DIR=$PIO_HOME/vendors/elasticsearch-1.7.6 # HADOOP_CONF_DIR: You must configure this if you intend to run PredictionIO # with Hadoop 2. # HADOOP_CONF_DIR=/opt/hadoop # HBASE_CONF_DIR: You must configure this if you intend to run PredictionIO # with HBase on a remote cluster. # HBASE_CONF_DIR=$PIO_HOME/vendors/hbase-1.0.0/conf # Filesystem paths where PredictionIO uses as block storage. PIO_FS_BASEDIR=$HOME/.pio_store PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp # PredictionIO Storage Configuration # # This section controls programs that make use of PredictionIO's built-in # storage facilities. Default values are shown below. # # For more information on storage configuration please refer to # http://predictionio.incubator.apache.org/system/anotherdatastore/ <http://predictionio.incubator.apache.org/system/anotherdatastore/> # Storage Repositories # Default is to use PostgreSQL PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS # Storage Data Sources # PostgreSQL Default Settings # Please change "pio" to your database name in PIO_STORAGE_SOURCES_PGSQL_URL # Please change PIO_STORAGE_SOURCES_PGSQL_USERNAME and # PIO_STORAGE_SOURCES_PGSQL_PASSWORD accordingly PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio <> PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio # MySQL Example # PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc # PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://localhost/pio <> # PIO_STORAGE_SOURCES_MYSQL_USERNAME=pio # PIO_STORAGE_SOURCES_MYSQL_PASSWORD=pio # Elasticsearch Example # PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost # PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200 # PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-5.2.1 # Elasticsearch 1.x Example PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=myprojectES PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300 PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-1.7.6 # Local File System Example PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models # HBase Example PIO_STORAGE_SOURCES_HBASE_TYPE=hbase PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase-1.2.6 =========================================================== Error message: =========================================================== [ERROR] [TaskSetManager] Task 2.0 in stage 10.0 (TID 24) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector Serialization stack: - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {3:1.0,2:1.0}) - field (class: scala.Tuple2, name: _2, type: class java.lang.Object) - object (class scala.Tuple2, (2,{3:1.0,2:1.0})); not retrying [ERROR] [TaskSetManager] Task 3.0 in stage 10.0 (TID 25) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector Serialization stack: - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {0:1.0,3:1.0}) - field (class: scala.Tuple2, name: _2, type: class java.lang.Object) - object (class scala.Tuple2, (3,{0:1.0,3:1.0})); not retrying [ERROR] [TaskSetManager] Task 1.0 in stage 10.0 (TID 23) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector Serialization stack: - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {1:1.0}) - field (class: scala.Tuple2, name: _2, type: class java.lang.Object) - object (class scala.Tuple2, (1,{1:1.0})); not retrying [ERROR] [TaskSetManager] Task 0.0 in stage 10.0 (TID 22) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector Serialization stack: - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {0:1.0}) - field (class: scala.Tuple2, name: _2, type: class java.lang.Object) - object (class scala.Tuple2, (0,{0:1.0})); not retrying Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 2.0 in stage 10.0 (TID 24) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector Serialization stack: - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {3:1.0,2:1.0}) - field (class: scala.Tuple2, name: _2, type: class java.lang.Object) - object (class scala.Tuple2, (2,{3:1.0,2:1.0})) at org.apache.spark.scheduler.DAGScheduler.org <http://org.apache.spark.scheduler.dagscheduler.org/>$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952) at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1088) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) at org.apache.spark.rdd.RDD.fold(RDD.scala:1082) at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.com <http://s.drm.checkpointeddrmspark.com/>puteNRow(CheckpointedDrmSpark.scala:188) at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.nrow$lzycompute(CheckpointedDrmSpark.scala:55) at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.nrow(CheckpointedDrmSpark.scala:55) at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.newRowCardinality(CheckpointedDrmSpark.scala:219) at com.actionml.IndexedDatasetSpark$.apply(Preparator.scala:213) at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:71) at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:49) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.immutable.List.foreach(List.scala:318) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.AbstractTraversable.map(Traversable.scala:105) at com.actionml.Preparator.prepare(Preparator.scala:49) at com.actionml.Preparator.prepare(Preparator.scala:32) at org.apache.predictionio.controller.PPreparator.prepareBase(PPreparator.scala:37) at org.apache.predictionio.controller.Engine$.train(Engine.scala:671) at org.apache.predictionio.controller.Engine.train(Engine.scala:177) at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67) at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250) at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) =========================================================== Thank you all for your help. Best regards, noelia -- <http://www.vicomtech.org/> Noelia Osés Fernández, PhD Senior Researcher | Investigadora Senior no...@vicomtech.org <mailto:no...@vicomtech.org> +[34] 943 30 92 30 Data Intelligence for Energy and Industrial Processes | Inteligencia de Datos para Energía y Procesos Industriales <https://www.linkedin.com/company/vicomtech> <https://www.youtube.com/user/VICOMTech> <https://twitter.com/@Vicomtech_IK4> member of: <http://www.graphicsmedia.net/> <http://www.ik4.es/> Legal Notice - Privacy policy <http://www.vicomtech.org/en/proteccion-datos> -- <http://www.vicomtech.org/> Noelia Osés Fernández, PhD Senior Researcher | Investigadora Senior no...@vicomtech.org <mailto:no...@vicomtech.org> +[34] 943 30 92 30 Data Intelligence for Energy and Industrial Processes | Inteligencia de Datos para Energía y Procesos Industriales <https://www.linkedin.com/company/vicomtech> <https://www.youtube.com/user/VICOMTech> <https://twitter.com/@Vicomtech_IK4> member of: <http://www.graphicsmedia.net/> <http://www.ik4.es/> Legal Notice - Privacy policy <http://www.vicomtech.org/en/proteccion-datos> -- You received this message because you are subscribed to the Google Groups "actionml-user" group. To unsubscribe from this group and stop receiving emails from it, send an email to actionml-user+unsubscr...@googlegroups.com <mailto:actionml-user+unsubscr...@googlegroups.com>. To post to this group, send email to actionml-u...@googlegroups.com <mailto:actionml-u...@googlegroups.com>. To view this discussion on the web visit https://groups.google.com/d/msgid/actionml-user/CAMyseftewAGvt2_XPsRQrDvmFVti4sZLFkZZc_ygpB8k%2Bmjq4A%40mail.gmail.com <https://groups.google.com/d/msgid/actionml-user/CAMyseftewAGvt2_XPsRQrDvmFVti4sZLFkZZc_ygpB8k%2Bmjq4A%40mail.gmail.com?utm_medium=email&utm_source=footer>. For more options, visit https://groups.google.com/d/optout <https://groups.google.com/d/optout>.