Pat,
I have run the integration test for the handmade example out of curiosity.
Strangely enough, things go more or less as expected except that I get a
message saying:
[INFO] [CoreWorkflow$] Updating engine instance
[INFO] [CoreWorkflow$] Training completed successfully.
Model will remain deployed after this test
Waiting 30 seconds for the server to start
nohup: redirecting stderr to stdout
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to localhost port 8000: Connection refused
So the integration test does not manage to get the recommendations, even
though the model trained and deployed successfully. However, as soon as the
integration test finishes, on the same terminal, I can get the
recommendations by doing the following:
$ curl -H "Content-Type: application/json" -d '
> {
> "user": "u1"
> }' http://localhost:8000/queries.json
{"itemScores":[{"item":"Nexus","score":0.057719700038433075},{"item":"Surface","score":0.0}]}
Isn't this odd? Can you guess what's going on?
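The symptom (the query fails during the test but succeeds immediately afterwards from the same terminal) would fit a race where the test's curl fires before the deployed engine is actually listening on port 8000. A minimal sketch of a readiness poll that could replace the fixed 30-second sleep, assuming Python 3 is available; `wait_for_server` and its defaults are my own illustration, not part of the integration test:

```python
import time
import urllib.error
import urllib.request

def wait_for_server(url, timeout=30.0, interval=1.0):
    """Poll `url` until it answers or `timeout` seconds elapse.

    Returns True as soon as the server responds (any HTTP status
    counts as "up"), False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            urllib.request.urlopen(url, timeout=interval)
            return True
        except urllib.error.HTTPError:
            return True  # server is up; it just rejected this request
        except (urllib.error.URLError, OSError):
            time.sleep(interval)  # not listening yet; retry shortly
    return False
```

Gating the test's curl on something like `wait_for_server("http://localhost:8000/")` instead of a fixed sleep would rule startup timing in or out as the cause.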
Thank you very much for all your support!
noelia
On 5 October 2017 at 19:22, Pat Ferrel <[email protected]> wrote:
> Ok, that config should work. Does the integration test pass?
>
> The data you are using is extremely small and though it does look like it
> has cooccurrences, they may not meet minimum “big-data” thresholds used by
> default. Try adding more data or use the handmade example data, rename
> purchase to view and discard the existing view data if you wish.
>
> The error is very odd and I’ve never seen it. If the integration test
> works I can only surmise it's your data.
>
>
> On Oct 5, 2017, at 12:02 AM, Noelia Osés Fernández <[email protected]>
> wrote:
>
> SPARK: spark-1.6.3-bin-hadoop2.6
>
> PIO: 0.11.0-incubating
>
> Scala: whatever gets installed when installing PIO 0.11.0-incubating, I
> haven't installed Scala separately
>
> UR: ActionML's UR v0.6.0, I suppose, as that's the last version mentioned
> in the README file. I have attached the UR zip file I downloaded from the
> actionml GitHub account.
>
> Thank you for your help!!
>
> On 4 October 2017 at 17:20, Pat Ferrel <[email protected]> wrote:
>
>> What versions of Scala, Spark, PIO, and UR are you using?
>>
>>
>> On Oct 4, 2017, at 6:10 AM, Noelia Osés Fernández <[email protected]>
>> wrote:
>>
>> Hi all,
>>
>> I'm still trying to create a very simple app to learn to use PredictionIO
>> and I'm still having trouble. "pio build" runs with no problem, but when I
>> run "pio train" I get a very long error message related to serialisation
>> (copied below).
>>
>> pio status reports system is all ready to go.
>>
>> The app I'm trying to build is very simple, it only has 'view' events.
>> Here's the engine.json:
>>
>> *===========================================================*
>> {
>>   "comment": "This config file uses default settings for all but the required values see README.md for docs",
>>   "id": "default",
>>   "description": "Default settings",
>>   "engineFactory": "com.actionml.RecommendationEngine",
>>   "datasource": {
>>     "params": {
>>       "name": "tiny_app_data.csv",
>>       "appName": "TinyApp",
>>       "eventNames": ["view"]
>>     }
>>   },
>>   "algorithms": [
>>     {
>>       "comment": "simplest setup where all values are default, popularity based backfill, must add eventNames",
>>       "name": "ur",
>>       "params": {
>>         "appName": "TinyApp",
>>         "indexName": "urindex",
>>         "typeName": "items",
>>         "comment": "must have data for the first event or the model will not build, other events are optional",
>>         "eventNames": ["view"]
>>       }
>>     }
>>   ]
>> }
>> *===========================================================*
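One thing worth noting: the engine.json above has no sparkConf section. Mahout's RandomAccessSparseVector is not Java-serializable, so UR configurations normally enable Kryo with Mahout's registrator. A sketch of that block, with values recalled from the UR examples and therefore to be treated as assumptions rather than verified settings:

```json
"sparkConf": {
  "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
  "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
  "spark.kryo.referenceTracking": "false",
  "spark.kryoserializer.buffer": "300m",
  "es.index.auto.create": "true"
}
```

If the "not serializable: RandomAccessSparseVector" error persists, comparing this section against the engine.json shipped with the UR examples would be the first thing to check.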
>>
>> The data I'm using is:
>>
>> "u1","i1"
>> "u2","i1"
>> "u2","i2"
>> "u3","i2"
>> "u3","i3"
>> "u4","i4"
>>
>> meaning user u viewed item i.
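For reference, each CSV row is sent as one "view" event; the JSON body the SDK posts to the EventServer looks roughly like this (a sketch based on the PredictionIO event API; the eventTime field is filled in by the client automatically):

```json
{
  "event": "view",
  "entityType": "user",
  "entityId": "u1",
  "targetEntityType": "item",
  "targetEntityId": "i1"
}
```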
>>
>> The data has been added to the database with the following python code:
>>
>> *===========================================================*
>> """
>> Import sample data for recommendation engine
>> """
>>
>> import predictionio
>> import argparse
>> import random
>>
>> RATE_ACTIONS_DELIMITER = ","
>> SEED = 1
>>
>>
>> def import_events(client, file):
>>     random.seed(SEED)
>>     count = 0
>>     print "Importing data..."
>>
>>     items = []
>>     users = []
>>     f = open(file, 'r')
>>     for line in f:
>>         data = line.rstrip('\r\n').split(RATE_ACTIONS_DELIMITER)
>>         users.append(data[0])
>>         items.append(data[1])
>>         client.create_event(
>>             event="view",
>>             entity_type="user",
>>             entity_id=data[0],
>>             target_entity_type="item",
>>             target_entity_id=data[1]
>>         )
>>         print "Event: view entity_id: " + data[0] + " target_entity_id: " + data[1]
>>         count += 1
>>     f.close()
>>
>>     users = set(users)
>>     items = set(items)
>>     print "All users: " + str(users)
>>     print "All items: " + str(items)
>>     for item in items:
>>         client.create_event(
>>             event="$set",
>>             entity_type="item",
>>             entity_id=item
>>         )
>>         count += 1
>>
>>     print "%s events are imported." % count
>>
>>
>> if __name__ == '__main__':
>>     parser = argparse.ArgumentParser(
>>         description="Import sample data for recommendation engine")
>>     parser.add_argument('--access_key', default='invald_access_key')
>>     parser.add_argument('--url', default="http://localhost:7070")
>>     parser.add_argument('--file', default="./data/tiny_app_data.csv")
>>
>>     args = parser.parse_args()
>>     print args
>>
>>     client = predictionio.EventClient(
>>         access_key=args.access_key,
>>         url=args.url,
>>         threads=5,
>>         qsize=500)
>>     import_events(client, args.file)
>> *===========================================================*
>>
>> My pio-env.sh is the following:
>>
>> *===========================================================*
>> #!/usr/bin/env bash
>> #
>> # Copy this file as pio-env.sh and edit it for your site's configuration.
>> #
>> # Licensed to the Apache Software Foundation (ASF) under one or more
>> # contributor license agreements. See the NOTICE file distributed with
>> # this work for additional information regarding copyright ownership.
>> # The ASF licenses this file to You under the Apache License, Version 2.0
>> # (the "License"); you may not use this file except in compliance with
>> # the License. You may obtain a copy of the License at
>> #
>> # http://www.apache.org/licenses/LICENSE-2.0
>> #
>> # Unless required by applicable law or agreed to in writing, software
>> # distributed under the License is distributed on an "AS IS" BASIS,
>> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> # See the License for the specific language governing permissions and
>> # limitations under the License.
>> #
>>
>> # PredictionIO Main Configuration
>> #
>> # This section controls core behavior of PredictionIO. It is very likely
>> # that you need to change these to fit your site.
>>
>> # SPARK_HOME: Apache Spark is a hard dependency and must be configured.
>> # SPARK_HOME=$PIO_HOME/vendors/spark-2.0.2-bin-hadoop2.7
>> SPARK_HOME=$PIO_HOME/vendors/spark-1.6.3-bin-hadoop2.6
>>
>> POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-42.1.4.jar
>> MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.41.jar
>>
>> # ES_CONF_DIR: You must configure this if you have advanced configuration
>> # for your Elasticsearch setup.
>> # ES_CONF_DIR=/opt/elasticsearch
>> #ES_CONF_DIR=$PIO_HOME/vendors/elasticsearch-1.7.6
>>
>> # HADOOP_CONF_DIR: You must configure this if you intend to run
>> # PredictionIO with Hadoop 2.
>> # HADOOP_CONF_DIR=/opt/hadoop
>>
>> # HBASE_CONF_DIR: You must configure this if you intend to run
>> # PredictionIO with HBase on a remote cluster.
>> # HBASE_CONF_DIR=$PIO_HOME/vendors/hbase-1.0.0/conf
>>
>> # Filesystem paths that PredictionIO uses as block storage.
>> PIO_FS_BASEDIR=$HOME/.pio_store
>> PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
>> PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp
>>
>> # PredictionIO Storage Configuration
>> #
>> # This section controls programs that make use of PredictionIO's built-in
>> # storage facilities. Default values are shown below.
>> #
>> # For more information on storage configuration please refer to
>> # http://predictionio.incubator.apache.org/system/anotherdatastore/
>>
>> # Storage Repositories
>>
>> # Default is to use PostgreSQL
>> PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
>> PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
>>
>> PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
>> PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
>>
>> PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
>> PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS
>>
>> # Storage Data Sources
>>
>> # PostgreSQL Default Settings
>> # Please change "pio" to your database name in PIO_STORAGE_SOURCES_PGSQL_URL
>> # Please change PIO_STORAGE_SOURCES_PGSQL_USERNAME and
>> # PIO_STORAGE_SOURCES_PGSQL_PASSWORD accordingly
>> PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
>> PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
>> PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
>> PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio
>>
>> # MySQL Example
>> # PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc
>> # PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://localhost/pio
>> # PIO_STORAGE_SOURCES_MYSQL_USERNAME=pio
>> # PIO_STORAGE_SOURCES_MYSQL_PASSWORD=pio
>>
>> # Elasticsearch Example
>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200
>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http
>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-5.2.1
>> # Elasticsearch 1.x Example
>> PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
>> PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=myprojectES
>> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
>> PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
>> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-1.7.6
>>
>> # Local File System Example
>> PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
>> PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models
>>
>> # HBase Example
>> PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
>> PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase-1.2.6
>>
>>
>> *===========================================================*
>> *Error message:*
>>
>> *===========================================================*
>> [ERROR] [TaskSetManager] Task 2.0 in stage 10.0 (TID 24) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
>> Serialization stack:
>> - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {3:1.0,2:1.0})
>> - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>> - object (class scala.Tuple2, (2,{3:1.0,2:1.0})); not retrying
>> [ERROR] [TaskSetManager] Task 3.0 in stage 10.0 (TID 25) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
>> Serialization stack:
>> - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {0:1.0,3:1.0})
>> - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>> - object (class scala.Tuple2, (3,{0:1.0,3:1.0})); not retrying
>> [ERROR] [TaskSetManager] Task 1.0 in stage 10.0 (TID 23) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
>> Serialization stack:
>> - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {1:1.0})
>> - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>> - object (class scala.Tuple2, (1,{1:1.0})); not retrying
>> [ERROR] [TaskSetManager] Task 0.0 in stage 10.0 (TID 22) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
>> Serialization stack:
>> - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {0:1.0})
>> - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>> - object (class scala.Tuple2, (0,{0:1.0})); not retrying
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 2.0 in stage 10.0 (TID 24) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
>> Serialization stack:
>> - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector, value: {3:1.0,2:1.0})
>> - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>> - object (class scala.Tuple2, (2,{3:1.0,2:1.0}))
>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>> at scala.Option.foreach(Option.scala:236)
>> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
>> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
>> at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1088)
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>> at org.apache.spark.rdd.RDD.fold(RDD.scala:1082)
>> at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.computeNRow(CheckpointedDrmSpark.scala:188)
>> at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.nrow$lzycompute(CheckpointedDrmSpark.scala:55)
>> at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.nrow(CheckpointedDrmSpark.scala:55)
>> at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.newRowCardinality(CheckpointedDrmSpark.scala:219)
>> at com.actionml.IndexedDatasetSpark$.apply(Preparator.scala:213)
>> at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:71)
>> at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:49)
>> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>> at scala.collection.immutable.List.foreach(List.scala:318)
>> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>> at com.actionml.Preparator.prepare(Preparator.scala:49)
>> at com.actionml.Preparator.prepare(Preparator.scala:32)
>> at org.apache.predictionio.controller.PPreparator.prepareBase(PPreparator.scala:37)
>> at org.apache.predictionio.controller.Engine$.train(Engine.scala:671)
>> at org.apache.predictionio.controller.Engine.train(Engine.scala:177)
>> at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
>> at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)
>> at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>> *===========================================================*
>> Thank you all for your help.
>>
>> Best regards,
>> noelia
>>
>>
>
>
--
<http://www.vicomtech.org>
Noelia Osés Fernández, PhD
Senior Researcher |
Investigadora Senior
[email protected]
+[34] 943 30 92 30
Data Intelligence for Energy and
Industrial Processes | Inteligencia
de Datos para Energía y Procesos
Industriales
<https://www.linkedin.com/company/vicomtech>
<https://www.youtube.com/user/VICOMTech>
<https://twitter.com/@Vicomtech_IK4>
member of: <http://www.graphicsmedia.net/> <http://www.ik4.es>
Legal Notice - Privacy policy <http://www.vicomtech.org/en/proteccion-datos>