Pat, you mentioned the problem could be that the data I was using was too
small, so I'm now using the attached data file (4 users and 100 items). But
I'm still getting the same error. Sorry, I forgot to mention earlier that I
had already increased the dataset.

The reason I want to make it work with a very small dataset is that I want
to be able to follow the calculations. I want to understand what the UR is
doing and see the impact of changing this or that, here or there... I find
that easier to achieve with a small example in which I know exactly what's
happening. I want to build trust in my understanding of the UR before I move
on to applying it to a real problem. If I'm not confident that I know how to
use it, how can I tell my client, with any degree of confidence, that the
results I'm getting are good?
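
To make concrete the kind of trace I'm after, this is roughly the hand
calculation I'd like to be able to follow (plain cooccurrence counting only;
I know the UR's CCO also applies a log-likelihood test on top of this, so
this is just a toy illustration, not what the UR actually computes):

# toy cooccurrence trace over the original 6-event dataset
from collections import defaultdict

views = [("u1", "i1"), ("u2", "i1"), ("u2", "i2"),
         ("u3", "i2"), ("u3", "i3"), ("u4", "i4")]

# which items each user viewed
items_by_user = defaultdict(set)
for user, item in views:
    items_by_user[user].add(item)

# raw item-item cooccurrence: how many users viewed both items
cooccur = defaultdict(int)
for items in items_by_user.values():
    for a in items:
        for b in items:
            if a != b:
                cooccur[(a, b)] += 1

for (a, b), count in sorted(cooccur.items()):
    print "%s viewed together with %s by %d user(s)" % (a, b, count)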

On 16 October 2017 at 20:44, Pat Ferrel <p...@occamsmachete.com> wrote:

> So all setup is the same for the integration-test and your modified test
> *except the data*?
>
> The error looks like a setup problem, because the serialization should
> happen with either test. But if the only difference really is the data,
> then toss it and use either real data or the integration-test data. Why are
> you trying to synthesize fake data if it causes the error?
>
> BTW, the data you included below in this thread would never create internal
> IDs as high as 94 in the vector. You must have switched to a new dataset?
>
> I would get a dump of your data using `pio export` and make sure it's what
> you thought it was. You claim to have only 4 user ids and 4 item ids, but
> the serialized vector thinks you have at least 94 user or item ids.
> Something doesn't add up.
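>
> For example (just a sketch, assuming the export is the usual one-JSON-event-
> per-line dump with entityId/targetEntityId fields, and a made-up export
> path), something like this would count the distinct ids:
>
> # rough sanity check of a `pio export` dump
> import glob
> import json
>
> users, items = set(), set()
> for path in glob.glob("/tmp/tinyapp-export/part-*"):  # hypothetical path
>     for line in open(path):
>         event = json.loads(line)
>         if event.get("event") == "view":
>             users.add(event["entityId"])
>             items.add(event["targetEntityId"])
>
> print "distinct users: %d %s" % (len(users), sorted(users))
> print "distinct items: %d" % len(items)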
>
>
> On Oct 16, 2017, at 4:43 AM, Noelia Osés Fernández <no...@vicomtech.org>
> wrote:
>
> Pat, you are absolutely right! I increased the sleep time and now the
> integration test for handmade works perfectly.
>
> However, the integration test adapted to run with my tiny app runs into
> the same problem I've been having with this app:
>
> [ERROR] [TaskSetManager] Task 1.0 in stage 10.0 (TID 23) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
> Serialization stack:
>     - object not serializable (class: org.apache.mahout.math.RandomAccessSparseVector,
> value: {66:1.0,29:1.0,70:1.0,91:1.0,58:1.0,37:1.0,13:1.0,8:1.0,94:1.0,30:1.0,57:1.0,22:1.0,20:1.0,35:1.0,97:1.0,60:1.0,27:1.0,72:1.0,3:1.0,34:1.0,77:1.0,46:1.0,81:1.0,86:1.0,43:1.0})
>     - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>     - object (class scala.Tuple2, (1,{66:1.0,29:1.0,70:1.0,91:1.0,58:1.0,37:1.0,13:1.0,8:1.0,94:1.0,30:1.0,57:1.0,22:1.0,20:1.0,35:1.0,97:1.0,60:1.0,27:1.0,72:1.0,3:1.0,34:1.0,77:1.0,46:1.0,81:1.0,86:1.0,43:1.0})); not retrying
> [ERROR] [TaskSetManager] Task 2.0 in stage 10.0 (TID 24) had a not serializable result: org.apache.mahout.math.RandomAccessSparseVector
> Serialization stack:
>
> ...
>
> Any ideas?
>
> On 15 October 2017 at 19:09, Pat Ferrel <p...@occamsmachete.com> wrote:
>
>> This is probably a timing issue in the integration test, which has to
>> wait for `pio deploy` to finish before the queries can be made. If it
>> doesn’t finish the queries will fail. By the time the rest of the test
>> quits the model has been deployed so you can run queries. In the
>> integration-test script increase the delay after `pio deploy…` and see if
>> it passes then.
>>
>> This is probably an integration-test script problem, not a problem in the
>> system.
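>>
>> As an illustration of the idea only (not what the shipped script does), the
>> fixed sleep could be replaced by polling the deployed engine until it
>> answers, along these lines:
>>
>> # wait for the deployed engine instead of sleeping a fixed time
>> import json
>> import time
>> import urllib2
>>
>> query = json.dumps({"user": "u1"})
>> deadline = time.time() + 120  # give `pio deploy` up to two minutes
>> while time.time() < deadline:
>>     try:
>>         req = urllib2.Request("http://localhost:8000/queries.json", query,
>>                               {"Content-Type": "application/json"})
>>         print urllib2.urlopen(req).read()
>>         break
>>     except Exception:
>>         time.sleep(5)  # server not up yet, retry
>> else:
>>     print "engine never came up"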
>>
>>
>>
>> On Oct 6, 2017, at 4:21 AM, Noelia Osés Fernández <no...@vicomtech.org>
>> wrote:
>>
>> Pat,
>>
>> I have run the integration test for the handmade example out of
>> curiosity. Strangely enough, things go more or less as expected, apart from
>> the fact that I get a message saying:
>>
>>
>> ...
>> [INFO] [CoreWorkflow$] Updating engine instance
>> [INFO] [CoreWorkflow$] Training completed successfully.
>> Model will remain deployed after this test
>> Waiting 30 seconds for the server to start
>> nohup: redirecting stderr to stdout
>>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>>                                  Dload  Upload   Total   Spent    Left  Speed
>>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
>> curl: (7) Failed to connect to localhost port 8000: Connection refused
>> So the integration test does not manage to get the recommendations even
>> though the model trained and deployed successfully. However, as soon as the
>> integration test finishes, on the same terminal, I can get the
>> recommendations by doing the following:
>>
>> $ curl -H "Content-Type: application/json" -d '
>> > {
>> >     "user": "u1"
>> > }' http://localhost:8000/queries.json
>> {"itemScores":[{"item":"Nexus","score":0.057719700038433075}
>> ,{"item":"Surface","score":0.0}]}
>>
>> Isn't this odd? Can you guess what's going on?
>>
>> Thank you very much for all your support!
>> noelia
>>
>>
>>
>> On 5 October 2017 at 19:22, Pat Ferrel <p...@occamsmachete.com> wrote:
>>
>>> Ok, that config should work. Does the integration test pass?
>>>
>>> The data you are using is extremely small, and though it does look like
>>> it has cooccurrences, they may not meet the minimum “big-data” thresholds
>>> used by default. Try adding more data, or use the handmade example data:
>>> rename purchase to view and discard the existing view data if you wish.
>>>
>>> The error is very odd and I’ve never seen it. If the integration test
>>> works I can only surmise it's your data.
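>>>
>>> To get a feel for why such tiny counts rarely clear those thresholds, here
>>> is a rough hand calculation of a log-likelihood-ratio-style score for one
>>> item pair from the small dataset (standard G^2 formula for a 2x2 table;
>>> the UR itself uses Mahout's CCO code, so treat this only as intuition):
>>>
>>> import math
>>>
>>> def g2(k11, k12, k21, k22):
>>>     # 2x2 table: k11 = users who viewed both items, k12/k21 = only one
>>>     # of them, k22 = neither; compares observed counts to independence
>>>     n = float(k11 + k12 + k21 + k22)
>>>     total = 0.0
>>>     for observed, row, col in [(k11, k11 + k12, k11 + k21),
>>>                                (k12, k11 + k12, k12 + k22),
>>>                                (k21, k21 + k22, k11 + k21),
>>>                                (k22, k21 + k22, k12 + k22)]:
>>>         if observed > 0:
>>>             total += observed * math.log(observed / (row * col / n))
>>>     return 2.0 * total
>>>
>>> # i1 and i2 in the 6-event dataset: 1 user viewed both, 1 only i1,
>>> # 1 only i2, 1 neither -- exactly what independence predicts
>>> print "LLR(i1, i2) = %.3f" % g2(1, 1, 1, 1)   # prints 0.000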
>>>
>>>
>>> On Oct 5, 2017, at 12:02 AM, Noelia Osés Fernández <no...@vicomtech.org>
>>> wrote:
>>>
>>> SPARK: spark-1.6.3-bin-hadoop2.6
>>>
>>> PIO: 0.11.0-incubating
>>>
>>> Scala: whatever gets installed when installing PIO 0.11.0-incubating, I
>>> haven't installed Scala separately
>>>
>>> UR: ActionML's UR v0.6.0, I suppose, as that's the last version mentioned
>>> in the README file. I have attached the UR zip file I downloaded from the
>>> ActionML GitHub account.
>>>
>>> Thank you for your help!!
>>>
>>> On 4 October 2017 at 17:20, Pat Ferrel <p...@occamsmachete.com> wrote:
>>>
>>>> What versions of Scala, Spark, PIO, and UR are you using?
>>>>
>>>>
>>>> On Oct 4, 2017, at 6:10 AM, Noelia Osés Fernández <no...@vicomtech.org>
>>>> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I'm still trying to create a very simple app in order to learn to use
>>>> PredictionIO, and I'm still having trouble. I have done `pio build` with
>>>> no problem, but when I do `pio train` I get a very long error message
>>>> related to serialisation (copied below).
>>>>
>>>> `pio status` reports the system is all ready to go.
>>>>
>>>> The app I'm trying to build is very simple, it only has 'view' events.
>>>> Here's the engine.json:
>>>>
>>>> *===========================================================*
>>>> {
>>>>   "comment":" This config file uses default settings for all but the
>>>> required values see README.md for docs",
>>>>   "id": "default",
>>>>   "description": "Default settings",
>>>>   "engineFactory": "com.actionml.RecommendationEngine",
>>>>   "datasource": {
>>>>     "params" : {
>>>>       "name": "tiny_app_data.csv",
>>>>       "appName": "TinyApp",
>>>>       "eventNames": ["view"]
>>>>     }
>>>>   },
>>>>   "algorithms": [
>>>>     {
>>>>       "comment": "simplest setup where all values are default,
>>>> popularity based backfill, must add eventsNames",
>>>>       "name": "ur",
>>>>       "params": {
>>>>         "appName": "TinyApp",
>>>>         "indexName": "urindex",
>>>>         "typeName": "items",
>>>>         "comment": "must have data for the first event or the model
>>>> will not build, other events are optional",
>>>>         "eventNames": ["view"]
>>>>       }
>>>>     }
>>>>   ]
>>>> }
>>>> *===========================================================*
>>>>
>>>> The data I'm using is:
>>>>
>>>> "u1","i1"
>>>> "u2","i1"
>>>> "u2","i2"
>>>> "u3","i2"
>>>> "u3","i3"
>>>> "u4","i4"
>>>>
>>>> meaning user u viewed item i.
>>>>
>>>> The data has been added to the database with the following Python code:
>>>>
>>>> *===========================================================*
>>>> """
>>>> Import sample data for recommendation engine
>>>> """
>>>>
>>>> import predictionio
>>>> import argparse
>>>> import random
>>>>
>>>> RATE_ACTIONS_DELIMITER = ","
>>>> SEED = 1
>>>>
>>>>
>>>> def import_events(client, file):
>>>>   random.seed(SEED)
>>>>   count = 0
>>>>   print "Importing data..."
>>>>
>>>>   items = []
>>>>   users = []
>>>>   f = open(file, 'r')
>>>>   for line in f:
>>>>     data = line.rstrip('\r\n').split(RATE_ACTIONS_DELIMITER)
>>>>     users.append(data[0])
>>>>     items.append(data[1])
>>>>     client.create_event(
>>>>       event="view",
>>>>       entity_type="user",
>>>>       entity_id=data[0],
>>>>       target_entity_type="item",
>>>>       target_entity_id=data[1]
>>>>     )
>>>>     print "Event: " + "view" + " entity_id: " + data[0] + "
>>>> target_entity_id: " + data[1]
>>>>     count += 1
>>>>   f.close()
>>>>
>>>>   users = set(users)
>>>>   items = set(items)
>>>>   print "All users: " + str(users)
>>>>   print "All items: " + str(items)
>>>>   for item in items:
>>>>     client.create_event(
>>>>       event="$set",
>>>>       entity_type="item",
>>>>       entity_id=item
>>>>     )
>>>>     count += 1
>>>>
>>>>
>>>>   print "%s events are imported." % count
>>>>
>>>>
>>>> if __name__ == '__main__':
>>>>   parser = argparse.ArgumentParser(
>>>>     description="Import sample data for recommendation engine")
>>>>   parser.add_argument('--access_key', default='invald_access_key')
>>>>   parser.add_argument('--url', default="http://localhost:7070")
>>>>   parser.add_argument('--file', default="./data/tiny_app_data.csv")
>>>>
>>>>   args = parser.parse_args()
>>>>   print args
>>>>
>>>>   client = predictionio.EventClient(
>>>>     access_key=args.access_key,
>>>>     url=args.url,
>>>>     threads=5,
>>>>     qsize=500)
>>>>   import_events(client, args.file)
>>>> *===========================================================*
>>>>
>>>> My pio-env.sh is the following:
>>>>
>>>> *===========================================================*
>>>> #!/usr/bin/env bash
>>>> #
>>>> # Copy this file as pio-env.sh and edit it for your site's
>>>> configuration.
>>>> #
>>>> # Licensed to the Apache Software Foundation (ASF) under one or more
>>>> # contributor license agreements.  See the NOTICE file distributed with
>>>> # this work for additional information regarding copyright ownership.
>>>> # The ASF licenses this file to You under the Apache License, Version
>>>> 2.0
>>>> # (the "License"); you may not use this file except in compliance with
>>>> # the License.  You may obtain a copy of the License at
>>>> #
>>>> #    http://www.apache.org/licenses/LICENSE-2.0
>>>> #
>>>> # Unless required by applicable law or agreed to in writing, software
>>>> # distributed under the License is distributed on an "AS IS" BASIS,
>>>> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
>>>> implied.
>>>> # See the License for the specific language governing permissions and
>>>> # limitations under the License.
>>>> #
>>>>
>>>> # PredictionIO Main Configuration
>>>> #
>>>> # This section controls core behavior of PredictionIO. It is very
>>>> likely that
>>>> # you need to change these to fit your site.
>>>>
>>>> # SPARK_HOME: Apache Spark is a hard dependency and must be configured.
>>>> # SPARK_HOME=$PIO_HOME/vendors/spark-2.0.2-bin-hadoop2.7
>>>> SPARK_HOME=$PIO_HOME/vendors/spark-1.6.3-bin-hadoop2.6
>>>>
>>>> POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-42.1.4.jar
>>>> MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.41.jar
>>>>
>>>> # ES_CONF_DIR: You must configure this if you have advanced
>>>> configuration for
>>>> #              your Elasticsearch setup.
>>>> # ES_CONF_DIR=/opt/elasticsearch
>>>> #ES_CONF_DIR=$PIO_HOME/vendors/elasticsearch-1.7.6
>>>>
>>>> # HADOOP_CONF_DIR: You must configure this if you intend to run
>>>> PredictionIO
>>>> #                  with Hadoop 2.
>>>> # HADOOP_CONF_DIR=/opt/hadoop
>>>>
>>>> # HBASE_CONF_DIR: You must configure this if you intend to run
>>>> PredictionIO
>>>> #                 with HBase on a remote cluster.
>>>> # HBASE_CONF_DIR=$PIO_HOME/vendors/hbase-1.0.0/conf
>>>>
>>>> # Filesystem paths where PredictionIO uses as block storage.
>>>> PIO_FS_BASEDIR=$HOME/.pio_store
>>>> PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
>>>> PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp
>>>>
>>>> # PredictionIO Storage Configuration
>>>> #
>>>> # This section controls programs that make use of PredictionIO's
>>>> built-in
>>>> # storage facilities. Default values are shown below.
>>>> #
>>>> # For more information on storage configuration please refer to
>>>> # http://predictionio.incubator.apache.org/system/anotherdatastore/
>>>>
>>>> # Storage Repositories
>>>>
>>>> # Default is to use PostgreSQL
>>>> PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
>>>> PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
>>>>
>>>> PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
>>>> PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
>>>>
>>>> PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
>>>> PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS
>>>>
>>>> # Storage Data Sources
>>>>
>>>> # PostgreSQL Default Settings
>>>> # Please change "pio" to your database name in
>>>> PIO_STORAGE_SOURCES_PGSQL_URL
>>>> # Please change PIO_STORAGE_SOURCES_PGSQL_USERNAME and
>>>> # PIO_STORAGE_SOURCES_PGSQL_PASSWORD accordingly
>>>> PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
>>>> PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
>>>> PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
>>>> PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio
>>>>
>>>> # MySQL Example
>>>> # PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc
>>>> # PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://localhost/pio
>>>> # PIO_STORAGE_SOURCES_MYSQL_USERNAME=pio
>>>> # PIO_STORAGE_SOURCES_MYSQL_PASSWORD=pio
>>>>
>>>> # Elasticsearch Example
>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200
>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http
>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-5.2.1
>>>> # Elasticsearch 1.x Example
>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=myprojectES
>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-1.7.6
>>>>
>>>> # Local File System Example
>>>> PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
>>>> PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models
>>>>
>>>> # HBase Example
>>>> PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
>>>> PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase-1.2.6
>>>>
>>>>
>>>> *===========================================================*
>>>> *Error message:*
>>>>
>>>> *===========================================================*
>>>> [ERROR] [TaskSetManager] Task 2.0 in stage 10.0 (TID 24) had a not
>>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>> Serialization stack:
>>>>     - object not serializable (class: 
>>>> org.apache.mahout.math.RandomAccessSparseVector,
>>>> value: {3:1.0,2:1.0})
>>>>     - field (class: scala.Tuple2, name: _2, type: class
>>>> java.lang.Object)
>>>>     - object (class scala.Tuple2, (2,{3:1.0,2:1.0})); not retrying
>>>> [ERROR] [TaskSetManager] Task 3.0 in stage 10.0 (TID 25) had a not
>>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>> Serialization stack:
>>>>     - object not serializable (class: 
>>>> org.apache.mahout.math.RandomAccessSparseVector,
>>>> value: {0:1.0,3:1.0})
>>>>     - field (class: scala.Tuple2, name: _2, type: class
>>>> java.lang.Object)
>>>>     - object (class scala.Tuple2, (3,{0:1.0,3:1.0})); not retrying
>>>> [ERROR] [TaskSetManager] Task 1.0 in stage 10.0 (TID 23) had a not
>>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>> Serialization stack:
>>>>     - object not serializable (class: 
>>>> org.apache.mahout.math.RandomAccessSparseVector,
>>>> value: {1:1.0})
>>>>     - field (class: scala.Tuple2, name: _2, type: class
>>>> java.lang.Object)
>>>>     - object (class scala.Tuple2, (1,{1:1.0})); not retrying
>>>> [ERROR] [TaskSetManager] Task 0.0 in stage 10.0 (TID 22) had a not
>>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>> Serialization stack:
>>>>     - object not serializable (class: 
>>>> org.apache.mahout.math.RandomAccessSparseVector,
>>>> value: {0:1.0})
>>>>     - field (class: scala.Tuple2, name: _2, type: class
>>>> java.lang.Object)
>>>>     - object (class scala.Tuple2, (0,{0:1.0})); not retrying
>>>> Exception in thread "main" org.apache.spark.SparkException: Job
>>>> aborted due to stage failure: Task 2.0 in stage 10.0 (TID 24) had a not
>>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector
>>>> Serialization stack:
>>>>     - object not serializable (class: 
>>>> org.apache.mahout.math.RandomAccessSparseVector,
>>>> value: {3:1.0,2:1.0})
>>>>     - field (class: scala.Tuple2, name: _2, type: class
>>>> java.lang.Object)
>>>>     - object (class scala.Tuple2, (2,{3:1.0,2:1.0}))
>>>>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$
>>>> 1.apply(DAGScheduler.scala:1419)
>>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$
>>>> 1.apply(DAGScheduler.scala:1418)
>>>>     at scala.collection.mutable.ResizableArray$class.foreach(Resiza
>>>> bleArray.scala:59)
>>>>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.sca
>>>> la:47)
>>>>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGSchedu
>>>> ler.scala:1418)
>>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskS
>>>> etFailed$1.apply(DAGScheduler.scala:799)
>>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskS
>>>> etFailed$1.apply(DAGScheduler.scala:799)
>>>>     at scala.Option.foreach(Option.scala:236)
>>>>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(
>>>> DAGScheduler.scala:799)
>>>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOn
>>>> Receive(DAGScheduler.scala:1640)
>>>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onRe
>>>> ceive(DAGScheduler.scala:1599)
>>>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onRe
>>>> ceive(DAGScheduler.scala:1588)
>>>>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>>>     at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.
>>>> scala:620)
>>>>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
>>>>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
>>>>     at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1088)
>>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperati
>>>> onScope.scala:150)
>>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperati
>>>> onScope.scala:111)
>>>>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>>>>     at org.apache.spark.rdd.RDD.fold(RDD.scala:1082)
>>>>     at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.computeNRow(CheckpointedDrmSpark.scala:188)
>>>>     at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.nro
>>>> w$lzycompute(CheckpointedDrmSpark.scala:55)
>>>>     at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.nro
>>>> w(CheckpointedDrmSpark.scala:55)
>>>>     at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.new
>>>> RowCardinality(CheckpointedDrmSpark.scala:219)
>>>>     at com.actionml.IndexedDatasetSpark$.apply(Preparator.scala:213)
>>>>     at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:71)
>>>>     at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:49)
>>>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(Traver
>>>> sableLike.scala:244)
>>>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(Traver
>>>> sableLike.scala:244)
>>>>     at scala.collection.immutable.List.foreach(List.scala:318)
>>>>     at scala.collection.TraversableLike$class.map(TraversableLike.s
>>>> cala:244)
>>>>     at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>>>>     at com.actionml.Preparator.prepare(Preparator.scala:49)
>>>>     at com.actionml.Preparator.prepare(Preparator.scala:32)
>>>>     at org.apache.predictionio.controller.PPreparator.prepareBase(P
>>>> Preparator.scala:37)
>>>>     at org.apache.predictionio.controller.Engine$.train(Engine.scal
>>>> a:671)
>>>>     at org.apache.predictionio.controller.Engine.train(Engine.scala
>>>> :177)
>>>>     at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(Core
>>>> Workflow.scala:67)
>>>>     at org.apache.predictionio.workflow.CreateWorkflow$.main(Create
>>>> Workflow.scala:250)
>>>>     at org.apache.predictionio.workflow.CreateWorkflow.main(CreateW
>>>> orkflow.scala)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
>>>> ssorImpl.java:62)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
>>>> thodAccessorImpl.java:43)
>>>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>>>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy
>>>> $SparkSubmit$$runMain(SparkSubmit.scala:731)
>>>>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit
>>>> .scala:181)
>>>>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scal
>>>> a:206)
>>>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>>>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>
>>>> *===========================================================*
>>>> Thank you all for your help.
>>>>
>>>> Best regards,
>>>> noelia
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>
>


Attached data file (4 users and 100 items):

"u1","i1"
"u2","i1"
"u2","i2"
"u3","i2"
"u3","i3"
"u4","i4"
"u1","i5"
"u2","i5"
"u2","i6"
"u3","i6"
"u3","i7"
"u4","i8"
"u1","i9"
"u2","i9"
"u2","i10"
"u3","i10"
"u3","i11"
"u4","i12"
"u1","i13"
"u2","i13"
"u2","i14"
"u3","i14"
"u3","i15"
"u4","i16"
"u1","i17"
"u2","i17"
"u2","i18"
"u3","i18"
"u3","i19"
"u4","i20"
"u1","i21"
"u2","i21"
"u2","i22"
"u3","i22"
"u3","i23"
"u4","i24"
"u1","i25"
"u2","i25"
"u2","i26"
"u3","i26"
"u3","i27"
"u4","i28"
"u1","i29"
"u2","i29"
"u2","i30"
"u3","i30"
"u3","i31"
"u4","i32"
"u1","i33"
"u2","i33"
"u2","i34"
"u3","i34"
"u3","i35"
"u4","i36"
"u1","i37"
"u2","i37"
"u2","i38"
"u3","i38"
"u3","i39"
"u4","i40"
"u1","i41"
"u2","i41"
"u2","i42"
"u3","i42"
"u3","i43"
"u4","i44"
"u1","i45"
"u2","i45"
"u2","i46"
"u3","i46"
"u3","i47"
"u4","i48"
"u1","i49"
"u2","i49"
"u2","i50"
"u3","i50"
"u3","i51"
"u4","i52"
"u1","i53"
"u2","i53"
"u2","i54"
"u3","i54"
"u3","i55"
"u4","i56"
"u1","i57"
"u2","i57"
"u2","i58"
"u3","i58"
"u3","i59"
"u4","i60"
"u1","i61"
"u2","i61"
"u2","i62"
"u3","i62"
"u3","i63"
"u4","i64"
"u1","i65"
"u2","i65"
"u2","i66"
"u3","i66"
"u3","i67"
"u4","i68"
"u1","i69"
"u2","i69"
"u2","i70"
"u3","i70"
"u3","i71"
"u4","i72"
"u1","i73"
"u2","i73"
"u2","i74"
"u3","i74"
"u3","i75"
"u4","i76"
"u1","i77"
"u2","i77"
"u2","i78"
"u3","i78"
"u3","i79"
"u4","i80"
"u1","i81"
"u2","i81"
"u2","i82"
"u3","i82"
"u3","i83"
"u4","i84"
"u1","i85"
"u2","i85"
"u2","i86"
"u3","i86"
"u3","i87"
"u4","i88"
"u1","i89"
"u2","i89"
"u2","i90"
"u3","i90"
"u3","i91"
"u4","i92"
"u1","i93"
"u2","i93"
"u2","i94"
"u3","i94"
"u3","i95"
"u4","i96"
"u1","i97"
"u2","i97"
"u2","i98"
"u3","i98"
"u3","i99"
"u4","i100"

Attachment: engine.json
Description: application/json

Attachment: integration-test
Description: Binary data

"""
Import sample data for recommendation engine
"""

import predictionio
import argparse
import random

RATE_ACTIONS_DELIMITER = ","
SEED = 1


def import_events(client, file):
  f = open(file, 'r')
  random.seed(SEED)
  count = 0
  print "Importing data..."

  items = []
  users = []
  f = open(file, 'r')
  for line in f:
    data = line.rstrip('\r\n').split(RATE_ACTIONS_DELIMITER)
    users.append(data[0])
    items.append(data[1])
    client.create_event(
      event="view",
      entity_type="user",
      entity_id=data[0],
      target_entity_type="item",
      target_entity_id=data[1]
    )
    print "Event: " + "view" + " entity_id: " + data[0] + " target_entity_id: " + data[1]
    count += 1
  f.close()

  users = set(users)
  items = set(items)
  print "All users: " + str(users)
  print "All items: " + str(items)
  for item in items:
    client.create_event(
      event="$set",
      entity_type="item",
      entity_id=item
    )
    count += 1


  print "%s events are imported." % count


if __name__ == '__main__':
  parser = argparse.ArgumentParser(
    description="Import sample data for recommendation engine")
  parser.add_argument('--access_key', default='invald_access_key')
  parser.add_argument('--url', default="http://localhost:7070";)
  parser.add_argument('--file', default="./data/tiny_app_data.csv")

  args = parser.parse_args()
  print args

  client = predictionio.EventClient(
    access_key=args.access_key,
    url=args.url,
    threads=5,
    qsize=500)
  import_events(client, args.file)
