Re: [ERROR] [TaskSetManager] Task 2.0 in stage 10.0 had a not serializable result

Pat Ferrel Sun, 15 Oct 2017 10:10:49 -0700

This is probably a timing issue in the integration test, which has to wait for 
`pio deploy` to finish before the queries can be made. If it doesn’t finish the 
queries will fail. By the time the rest of the test quits the model has been 
deployed so you can run queries. In the integration-test script increase the 
delay after `pio deploy…` and see if it passes then.


This is probably an integrtion-test script problem not a problem in the system



On Oct 6, 2017, at 4:21 AM, Noelia Osés Fernández <[email protected]> wrote:

Pat,

I have run the integration test for the handmade example out of curiosity. 
Strangely enough things go more or less as expected apart from the fact that I 
get a message saying:

...
[INFO] [CoreWorkflow$] Updating engine instance
[INFO] [CoreWorkflow$] Training completed successfully.
Model will remain deployed after this test
Waiting 30 seconds for the server to start
nohup: redirecting stderr to stdout
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     
0curl: (7) Failed to connect to localhost port 8000: Connection refused

So the integration test does not manage to get the recommendations even though 
the model trained and deployed successfully. However, as soon as the 
integration test finishes, on the same terminal, I can get the recommendations 
by doing the following:

$ curl -H "Content-Type: application/json" -d '
> {
>     "user": "u1"
> }' http://localhost:8000/queries.json <http://localhost:8000/queries.json>
{"itemScores":[{"item":"Nexus","score":0.057719700038433075},{"item":"Surface","score":0.0}]}

Isn't this odd? Can you guess what's going on?

Thank you very much for all your support!
noelia



On 5 October 2017 at 19:22, Pat Ferrel <[email protected] 
<mailto:[email protected]>> wrote:
Ok, that config should work. Does the integration test pass?

The data you are using is extremely small and though it does look like it has 
cooccurrences, they may not meet minimum “big-data” thresholds used by default. 
Try adding more data or use the handmade example data, rename purchase to view 
and discard the existing view data if you wish.

The error is very odd and I’ve never seen it. If the integration test works I 
can only surmise it's your data.


On Oct 5, 2017, at 12:02 AM, Noelia Osés Fernández <[email protected] 
<mailto:[email protected]>> wrote:

SPARK: spark-1.6.3-bin-hadoop2.6

PIO: 0.11.0-incubating

Scala: whatever gets installed when installing PIO 0.11.0-incubating, I haven't 
installed Scala separately

UR: ActionML's UR v0.6.0 I suppose as that's the last version mentioned in the 
readme file. I have attached the UR zip file I downloaded from the actionml 
github account.

Thank you for your help!!

On 4 October 2017 at 17:20, Pat Ferrel <[email protected] 
<mailto:[email protected]>> wrote:
What version of Scala. Spark, PIO, and UR are you using?


On Oct 4, 2017, at 6:10 AM, Noelia Osés Fernández <[email protected] 
<mailto:[email protected]>> wrote:

Hi all,

I'm still trying to create a very simple app to learn to use PredictionIO and 
still having trouble. I have done pio build no problem. But when I do pio train 
I get a very long error message related to serialisation (error message copied 
below).

pio status reports system is all ready to go.

The app I'm trying to build is very simple, it only has 'view' events. Here's 
the engine.json:

===========================================================
{
  "comment":" This config file uses default settings for all but the required 
values see README.md for docs",
  "id": "default",
  "description": "Default settings",
  "engineFactory": "com.actionml.RecommendationEngine",
  "datasource": {
    "params" : {
      "name": "tiny_app_data.csv",
      "appName": "TinyApp",
      "eventNames": ["view"]
    }
  },
  "algorithms": [
    {
      "comment": "simplest setup where all values are default, popularity based 
backfill, must add eventsNames",
      "name": "ur",
      "params": {
        "appName": "TinyApp",
        "indexName": "urindex",
        "typeName": "items",
        "comment": "must have data for the first event or the model will not 
build, other events are optional",
        "eventNames": ["view"]
      }
    }
  ]
}
===========================================================

The data I'm using is:

"u1","i1"
"u2","i1"
"u2","i2"
"u3","i2"
"u3","i3"
"u4","i4"

meaning user u viewed item i.

The data has been added to the database with the following python code:

===========================================================
"""
Import sample data for recommendation engine
"""

import predictionio
import argparse
import random

RATE_ACTIONS_DELIMITER = ","
SEED = 1


def import_events(client, file):
  f = open(file, 'r')
  random.seed(SEED)
  count = 0
  print "Importing data..."

  items = []
  users = []
  f = open(file, 'r')
  for line in f:
    data = line.rstrip('\r\n').split(RATE_ACTIONS_DELIMITER)
    users.append(data[0])
    items.append(data[1])
    client.create_event(
      event="view",
      entity_type="user",
      entity_id=data[0],
      target_entity_type="item",
      target_entity_id=data[1]
    )
    print "Event: " + "view" + " entity_id: " + data[0] + " target_entity_id: " 
+ data[1]
    count += 1
  f.close()

  users = set(users)
  items = set(items)
  print "All users: " + str(users)
  print "All items: " + str(items)
  for item in items:
    client.create_event(
      event="$set",
      entity_type="item",
      entity_id=item
    )
    count += 1


  print "%s events are imported." % count


if __name__ == '__main__':
  parser = argparse.ArgumentParser(
    description="Import sample data for recommendation engine")
  parser.add_argument('--access_key', default='invald_access_key')
  parser.add_argument('--url', default="http://localhost:7070 
<http://localhost:7070/>")
  parser.add_argument('--file', default="./data/tiny_app_data.csv")

  args = parser.parse_args()
  print args

  client = predictionio.EventClient(
    access_key=args.access_key,
    url=args.url,
    threads=5,
    qsize=500)
  import_events(client, args.file)
===========================================================

My pio_env.sh is the following:

===========================================================
#!/usr/bin/env bash
#
# Copy this file as pio-env.sh and edit it for your site's configuration.
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0 
<http://www.apache.org/licenses/LICENSE-2.0>
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# PredictionIO Main Configuration
#
# This section controls core behavior of PredictionIO. It is very likely that
# you need to change these to fit your site.

# SPARK_HOME: Apache Spark is a hard dependency and must be configured.
# SPARK_HOME=$PIO_HOME/vendors/spark-2.0.2-bin-hadoop2.7
SPARK_HOME=$PIO_HOME/vendors/spark-1.6.3-bin-hadoop2.6

POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-42.1.4.jar
MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.41.jar

# ES_CONF_DIR: You must configure this if you have advanced configuration for
#              your Elasticsearch setup.
# ES_CONF_DIR=/opt/elasticsearch
#ES_CONF_DIR=$PIO_HOME/vendors/elasticsearch-1.7.6

# HADOOP_CONF_DIR: You must configure this if you intend to run PredictionIO
#                  with Hadoop 2.
# HADOOP_CONF_DIR=/opt/hadoop

# HBASE_CONF_DIR: You must configure this if you intend to run PredictionIO
#                 with HBase on a remote cluster.
# HBASE_CONF_DIR=$PIO_HOME/vendors/hbase-1.0.0/conf

# Filesystem paths where PredictionIO uses as block storage.
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp

# PredictionIO Storage Configuration
#
# This section controls programs that make use of PredictionIO's built-in
# storage facilities. Default values are shown below.
#
# For more information on storage configuration please refer to
# http://predictionio.incubator.apache.org/system/anotherdatastore/ 
<http://predictionio.incubator.apache.org/system/anotherdatastore/>

# Storage Repositories

# Default is to use PostgreSQL
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS

# Storage Data Sources

# PostgreSQL Default Settings
# Please change "pio" to your database name in PIO_STORAGE_SOURCES_PGSQL_URL
# Please change PIO_STORAGE_SOURCES_PGSQL_USERNAME and
# PIO_STORAGE_SOURCES_PGSQL_PASSWORD accordingly
PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio <>
PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio

# MySQL Example
# PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc
# PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://localhost/pio <>
# PIO_STORAGE_SOURCES_MYSQL_USERNAME=pio
# PIO_STORAGE_SOURCES_MYSQL_PASSWORD=pio

# Elasticsearch Example
# PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
# PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
# PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200
# PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http
# PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-5.2.1
# Elasticsearch 1.x Example
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=myprojectES
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-1.7.6

# Local File System Example
PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models

# HBase Example
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase-1.2.6
===========================================================

Error message:

===========================================================
[ERROR] [TaskSetManager] Task 2.0 in stage 10.0 (TID 24) had a not serializable 
result: org.apache.mahout.math.RandomAccessSparseVector
Serialization stack:
    - object not serializable (class: 
org.apache.mahout.math.RandomAccessSparseVector, value: {3:1.0,2:1.0})
    - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
    - object (class scala.Tuple2, (2,{3:1.0,2:1.0})); not retrying
[ERROR] [TaskSetManager] Task 3.0 in stage 10.0 (TID 25) had a not serializable 
result: org.apache.mahout.math.RandomAccessSparseVector
Serialization stack:
    - object not serializable (class: 
org.apache.mahout.math.RandomAccessSparseVector, value: {0:1.0,3:1.0})
    - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
    - object (class scala.Tuple2, (3,{0:1.0,3:1.0})); not retrying
[ERROR] [TaskSetManager] Task 1.0 in stage 10.0 (TID 23) had a not serializable 
result: org.apache.mahout.math.RandomAccessSparseVector
Serialization stack:
    - object not serializable (class: 
org.apache.mahout.math.RandomAccessSparseVector, value: {1:1.0})
    - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
    - object (class scala.Tuple2, (1,{1:1.0})); not retrying
[ERROR] [TaskSetManager] Task 0.0 in stage 10.0 (TID 22) had a not serializable 
result: org.apache.mahout.math.RandomAccessSparseVector
Serialization stack:
    - object not serializable (class: 
org.apache.mahout.math.RandomAccessSparseVector, value: {0:1.0})
    - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
    - object (class scala.Tuple2, (0,{0:1.0})); not retrying
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to 
stage failure: Task 2.0 in stage 10.0 (TID 24) had a not serializable result: 
org.apache.mahout.math.RandomAccessSparseVector
Serialization stack:
    - object not serializable (class: 
org.apache.mahout.math.RandomAccessSparseVector, value: {3:1.0,2:1.0})
    - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
    - object (class scala.Tuple2, (2,{3:1.0,2:1.0}))
    at org.apache.spark.scheduler.DAGScheduler.org 
<http://org.apache.spark.scheduler.dagscheduler.org/>$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
    at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
    at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
    at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
    at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at scala.Option.foreach(Option.scala:236)
    at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
    at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
    at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
    at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
    at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1088)
    at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.fold(RDD.scala:1082)
    at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.com 
<http://s.drm.checkpointeddrmspark.com/>puteNRow(CheckpointedDrmSpark.scala:188)
    at 
org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.nrow$lzycompute(CheckpointedDrmSpark.scala:55)
    at 
org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.nrow(CheckpointedDrmSpark.scala:55)
    at 
org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.newRowCardinality(CheckpointedDrmSpark.scala:219)
    at com.actionml.IndexedDatasetSpark$.apply(Preparator.scala:213)
    at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:71)
    at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:49)
    at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at com.actionml.Preparator.prepare(Preparator.scala:49)
    at com.actionml.Preparator.prepare(Preparator.scala:32)
    at 
org.apache.predictionio.controller.PPreparator.prepareBase(PPreparator.scala:37)
    at org.apache.predictionio.controller.Engine$.train(Engine.scala:671)
    at org.apache.predictionio.controller.Engine.train(Engine.scala:177)
    at 
org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
    at 
org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)
    at 
org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
===========================================================

Thank you all for your help.

Best regards,
noelia




-- 
 <http://www.vicomtech.org/>

Noelia Osés Fernández, PhD
Senior Researcher |
Investigadora Senior

[email protected] <mailto:[email protected]>
+[34] 943 30 92 30
Data Intelligence for Energy and
Industrial Processes | Inteligencia
de Datos para Energía y Procesos
Industriales

 <https://www.linkedin.com/company/vicomtech>  
<https://www.youtube.com/user/VICOMTech>  <https://twitter.com/@Vicomtech_IK4>

member of:  <http://www.graphicsmedia.net/>     <http://www.ik4.es/>

Legal Notice - Privacy policy <http://www.vicomtech.org/en/proteccion-datos>





-- 
 <http://www.vicomtech.org/>

Noelia Osés Fernández, PhD
Senior Researcher |
Investigadora Senior

[email protected] <mailto:[email protected]>
+[34] 943 30 92 30
Data Intelligence for Energy and
Industrial Processes | Inteligencia
de Datos para Energía y Procesos
Industriales

 <https://www.linkedin.com/company/vicomtech>  
<https://www.youtube.com/user/VICOMTech>  <https://twitter.com/@Vicomtech_IK4>

member of:  <http://www.graphicsmedia.net/>     <http://www.ik4.es/>

Legal Notice - Privacy policy <http://www.vicomtech.org/en/proteccion-datos>

-- 
You received this message because you are subscribed to the Google Groups 
"actionml-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected] 
<mailto:[email protected]>.
To post to this group, send email to [email protected] 
<mailto:[email protected]>.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/actionml-user/CAMyseftewAGvt2_XPsRQrDvmFVti4sZLFkZZc_ygpB8k%2Bmjq4A%40mail.gmail.com
 
<https://groups.google.com/d/msgid/actionml-user/CAMyseftewAGvt2_XPsRQrDvmFVti4sZLFkZZc_ygpB8k%2Bmjq4A%40mail.gmail.com?utm_medium=email&utm_source=footer>.
For more options, visit https://groups.google.com/d/optout 
<https://groups.google.com/d/optout>.

Re: [ERROR] [TaskSetManager] Task 2.0 in stage 10.0 had a not serializable result

Reply via email to