Pat, you mentioned the problem could be that the data I was using was too small. I'm now using the attached data file (4 users and 100 items), but I'm still getting the same error. I'm sorry, I forgot to mention that I had already increased the dataset.
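One thing I notice while re-reading my engine.json (copied further down in this thread) is that it has no sparkConf section at all. From what I understand of the UR README, the template engine.json normally registers Mahout's Kryo serializers there, roughly like the sketch below. The exact property values are only what I believe the template uses, so please correct me if this is unrelated, but the not-serializable RandomAccessSparseVector error makes me wonder whether this is the missing piece:

"sparkConf": {
  "comment": "sketch based on what I believe the UR template uses; not verified for my setup",
  "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
  "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
  "spark.kryo.referenceTracking": "false",
  "spark.kryoserializer.buffer": "300m",
  "es.index.auto.create": "true"
}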
The reason why I want to make it work with a very small dataset is because I want to be able to follow the calculations. I want to understand what the UR is doing and understand the impact of changing this or that, here or there... I find that easier to achieve with a small example in which I know exactly what's happening. I want to build my trust on my understanding of the UR before I move on to applying it to a real problem. If I'm not confident that I know how to use it, how can I tell my client that the results I'm getting are good with any degree of confidence? On 16 October 2017 at 20:44, Pat Ferrel <p...@occamsmachete.com> wrote: > So all setup is the same for the integration-test and your modified test > *except the data*? > > The error looks like a setup problem because the serialization should > happen with either test. But if the only difference really is the data, > then toss it and use either real data or the integration test data, why are > you trying to synthesize fake data if it causes the error? > > BTW the data you include below in this thread would never create internal > IDs as high as 94 in the vector. You must have switched to a new dataset??? > > I would get a dump of your data using `pio export` and make sure it’s what > you thought it was. You claim to have only 4 user ids and 4 item ids but > the serialized vector thinks you have at least 94 of user or item ids. > Something doesn’t add up. > > > On Oct 16, 2017, at 4:43 AM, Noelia Osés Fernández <no...@vicomtech.org> > wrote: > > Pat, you are absolutely right! I increased the sleep time and now the > integration test for handmade works perfectly. > > However, the integration test adapted to run with my tiny app runs into > the same problem I've been having with this app: > > [ERROR] [TaskSetManager] Task 1.0 in stage 10.0 (TID 23) had a not > serializable result: org.apache.mahout.math.RandomAccessSparseVector > Serialization stack: > - object not serializable (class: > org.apache.mahout.math.RandomAccessSparseVector, > value: {66:1.0,29:1.0,70:1.0,91:1.0,58:1.0,37:1.0,13:1.0,8:1.0,94: > 1.0,30:1.0,57:1.0,22:1.0,20:1.0,35:1.0,97:1.0,60:1.0,27:1.0, > 72:1.0,3:1.0,34:1.0,77:1.0,46:1.0,81:1.0,86:1.0,43:1.0}) > - field (class: scala.Tuple2, name: _2, type: class java.lang.Object) > - object (class scala.Tuple2, (1,{66:1.0,29:1.0,70:1.0,91:1. > 0,58:1.0,37:1.0,13:1.0,8:1.0,94:1.0,30:1.0,57:1.0,22:1.0, > 20:1.0,35:1.0,97:1.0,60:1.0,27:1.0,72:1.0,3:1.0,34:1.0,77: > 1.0,46:1.0,81:1.0,86:1.0,43:1.0})); not retrying > [ERROR] [TaskSetManager] Task 2.0 in stage 10.0 (TID 24) had a not > serializable result: org.apache.mahout.math.RandomAccessSparseVector > Serialization stack: > > ... > > Any ideas? > > On 15 October 2017 at 19:09, Pat Ferrel <p...@occamsmachete.com> wrote: > >> This is probably a timing issue in the integration test, which has to >> wait for `pio deploy` to finish before the queries can be made. If it >> doesn’t finish the queries will fail. By the time the rest of the test >> quits the model has been deployed so you can run queries. In the >> integration-test script increase the delay after `pio deploy…` and see if >> it passes then. >> >> This is probably an integrtion-test script problem not a problem in the >> system >> >> >> >> On Oct 6, 2017, at 4:21 AM, Noelia Osés Fernández <no...@vicomtech.org> >> wrote: >> >> Pat, >> >> I have run the integration test for the handmade example out of >> curiosity. 
Strangely enough things go more or less as expected apart from >> the fact that I get a message saying: >> >> >> >> >> >> >> >> >> >> >> *...[INFO] [CoreWorkflow$] Updating engine instance[INFO] [CoreWorkflow$] >> Training completed successfully.Model will remain deployed after this >> testWaiting 30 seconds for the server to startnohup: redirecting stderr to >> stdout % Total % Received % Xferd Average Speed Time Time >> Time Current Dload Upload Total >> Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- >> --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 8000: >> Connection refused* >> So the integration test does not manage to get the recommendations even >> though the model trained and deployed successfully. However, as soon as the >> integration test finishes, on the same terminal, I can get the >> recommendations by doing the following: >> >> $ curl -H "Content-Type: application/json" -d ' >> > { >> > "user": "u1" >> > }' http://localhost:8000/queries.json >> {"itemScores":[{"item":"Nexus","score":0.057719700038433075} >> ,{"item":"Surface","score":0.0}]} >> >> Isn't this odd? Can you guess what's going on? >> >> Thank you very much for all your support! >> noelia >> >> >> >> On 5 October 2017 at 19:22, Pat Ferrel <p...@occamsmachete.com> wrote: >> >>> Ok, that config should work. Does the integration test pass? >>> >>> The data you are using is extremely small and though it does look like >>> it has cooccurrences, they may not meet minimum “big-data” thresholds used >>> by default. Try adding more data or use the handmade example data, rename >>> purchase to view and discard the existing view data if you wish. >>> >>> The error is very odd and I’ve never seen it. If the integration test >>> works I can only surmise it's your data. >>> >>> >>> On Oct 5, 2017, at 12:02 AM, Noelia Osés Fernández <no...@vicomtech.org> >>> wrote: >>> >>> SPARK: spark-1.6.3-bin-hadoop2.6 >>> >>> PIO: 0.11.0-incubating >>> >>> Scala: whatever gets installed when installing PIO 0.11.0-incubating, I >>> haven't installed Scala separately >>> >>> UR: ActionML's UR v0.6.0 I suppose as that's the last version mentioned >>> in the readme file. I have attached the UR zip file I downloaded from the >>> actionml github account. >>> >>> Thank you for your help!! >>> >>> On 4 October 2017 at 17:20, Pat Ferrel <p...@occamsmachete.com> wrote: >>> >>>> What version of Scala. Spark, PIO, and UR are you using? >>>> >>>> >>>> On Oct 4, 2017, at 6:10 AM, Noelia Osés Fernández <no...@vicomtech.org> >>>> wrote: >>>> >>>> Hi all, >>>> >>>> I'm still trying to create a very simple app to learn to use >>>> PredictionIO and still having trouble. I have done pio build no problem. >>>> But when I do pio train I get a very long error message related to >>>> serialisation (error message copied below). >>>> >>>> pio status reports system is all ready to go. >>>> >>>> The app I'm trying to build is very simple, it only has 'view' events. 
>>>> Here's the engine.json: >>>> >>>> *===========================================================* >>>> { >>>> "comment":" This config file uses default settings for all but the >>>> required values see README.md for docs", >>>> "id": "default", >>>> "description": "Default settings", >>>> "engineFactory": "com.actionml.RecommendationEngine", >>>> "datasource": { >>>> "params" : { >>>> "name": "tiny_app_data.csv", >>>> "appName": "TinyApp", >>>> "eventNames": ["view"] >>>> } >>>> }, >>>> "algorithms": [ >>>> { >>>> "comment": "simplest setup where all values are default, >>>> popularity based backfill, must add eventsNames", >>>> "name": "ur", >>>> "params": { >>>> "appName": "TinyApp", >>>> "indexName": "urindex", >>>> "typeName": "items", >>>> "comment": "must have data for the first event or the model >>>> will not build, other events are optional", >>>> "eventNames": ["view"] >>>> } >>>> } >>>> ] >>>> } >>>> *===========================================================* >>>> >>>> The data I'm using is: >>>> >>>> "u1","i1" >>>> "u2","i1" >>>> "u2","i2" >>>> "u3","i2" >>>> "u3","i3" >>>> "u4","i4" >>>> >>>> meaning user u viewed item i. >>>> >>>> The data has been added to the database with the following python code: >>>> >>>> *===========================================================* >>>> """ >>>> Import sample data for recommendation engine >>>> """ >>>> >>>> import predictionio >>>> import argparse >>>> import random >>>> >>>> RATE_ACTIONS_DELIMITER = "," >>>> SEED = 1 >>>> >>>> >>>> def import_events(client, file): >>>> f = open(file, 'r') >>>> random.seed(SEED) >>>> count = 0 >>>> print "Importing data..." >>>> >>>> items = [] >>>> users = [] >>>> f = open(file, 'r') >>>> for line in f: >>>> data = line.rstrip('\r\n').split(RATE_ACTIONS_DELIMITER) >>>> users.append(data[0]) >>>> items.append(data[1]) >>>> client.create_event( >>>> event="view", >>>> entity_type="user", >>>> entity_id=data[0], >>>> target_entity_type="item", >>>> target_entity_id=data[1] >>>> ) >>>> print "Event: " + "view" + " entity_id: " + data[0] + " >>>> target_entity_id: " + data[1] >>>> count += 1 >>>> f.close() >>>> >>>> users = set(users) >>>> items = set(items) >>>> print "All users: " + str(users) >>>> print "All items: " + str(items) >>>> for item in items: >>>> client.create_event( >>>> event="$set", >>>> entity_type="item", >>>> entity_id=item >>>> ) >>>> count += 1 >>>> >>>> >>>> print "%s events are imported." % count >>>> >>>> >>>> if __name__ == '__main__': >>>> parser = argparse.ArgumentParser( >>>> description="Import sample data for recommendation engine") >>>> parser.add_argument('--access_key', default='invald_access_key') >>>> parser.add_argument('--url', default="http://localhost:7070") >>>> parser.add_argument('--file', default="./data/tiny_app_data.csv") >>>> >>>> args = parser.parse_args() >>>> print args >>>> >>>> client = predictionio.EventClient( >>>> access_key=args.access_key, >>>> url=args.url, >>>> threads=5, >>>> qsize=500) >>>> import_events(client, args.file) >>>> *===========================================================* >>>> >>>> My pio_env.sh is the following: >>>> >>>> *===========================================================* >>>> #!/usr/bin/env bash >>>> # >>>> # Copy this file as pio-env.sh and edit it for your site's >>>> configuration. >>>> # >>>> # Licensed to the Apache Software Foundation (ASF) under one or more >>>> # contributor license agreements. 
See the NOTICE file distributed with >>>> # this work for additional information regarding copyright ownership. >>>> # The ASF licenses this file to You under the Apache License, Version >>>> 2.0 >>>> # (the "License"); you may not use this file except in compliance with >>>> # the License. You may obtain a copy of the License at >>>> # >>>> # http://www.apache.org/licenses/LICENSE-2.0 >>>> # >>>> # Unless required by applicable law or agreed to in writing, software >>>> # distributed under the License is distributed on an "AS IS" BASIS, >>>> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or >>>> implied. >>>> # See the License for the specific language governing permissions and >>>> # limitations under the License. >>>> # >>>> >>>> # PredictionIO Main Configuration >>>> # >>>> # This section controls core behavior of PredictionIO. It is very >>>> likely that >>>> # you need to change these to fit your site. >>>> >>>> # SPARK_HOME: Apache Spark is a hard dependency and must be configured. >>>> # SPARK_HOME=$PIO_HOME/vendors/spark-2.0.2-bin-hadoop2.7 >>>> SPARK_HOME=$PIO_HOME/vendors/spark-1.6.3-bin-hadoop2.6 >>>> >>>> POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-42.1.4.jar >>>> MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.41.jar >>>> >>>> # ES_CONF_DIR: You must configure this if you have advanced >>>> configuration for >>>> # your Elasticsearch setup. >>>> # ES_CONF_DIR=/opt/elasticsearch >>>> #ES_CONF_DIR=$PIO_HOME/vendors/elasticsearch-1.7.6 >>>> >>>> # HADOOP_CONF_DIR: You must configure this if you intend to run >>>> PredictionIO >>>> # with Hadoop 2. >>>> # HADOOP_CONF_DIR=/opt/hadoop >>>> >>>> # HBASE_CONF_DIR: You must configure this if you intend to run >>>> PredictionIO >>>> # with HBase on a remote cluster. >>>> # HBASE_CONF_DIR=$PIO_HOME/vendors/hbase-1.0.0/conf >>>> >>>> # Filesystem paths where PredictionIO uses as block storage. >>>> PIO_FS_BASEDIR=$HOME/.pio_store >>>> PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines >>>> PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp >>>> >>>> # PredictionIO Storage Configuration >>>> # >>>> # This section controls programs that make use of PredictionIO's >>>> built-in >>>> # storage facilities. Default values are shown below. 
>>>> # >>>> # For more information on storage configuration please refer to >>>> # http://predictionio.incubator.apache.org/system/anotherdatastore/ >>>> >>>> # Storage Repositories >>>> >>>> # Default is to use PostgreSQL >>>> PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta >>>> PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH >>>> >>>> PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event >>>> PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE >>>> >>>> PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model >>>> PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS >>>> >>>> # Storage Data Sources >>>> >>>> # PostgreSQL Default Settings >>>> # Please change "pio" to your database name in >>>> PIO_STORAGE_SOURCES_PGSQL_URL >>>> # Please change PIO_STORAGE_SOURCES_PGSQL_USERNAME and >>>> # PIO_STORAGE_SOURCES_PGSQL_PASSWORD accordingly >>>> PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc >>>> PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio >>>> PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio >>>> PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio >>>> >>>> # MySQL Example >>>> # PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc >>>> # PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://localhost/pio >>>> # PIO_STORAGE_SOURCES_MYSQL_USERNAME=pio >>>> # PIO_STORAGE_SOURCES_MYSQL_PASSWORD=pio >>>> >>>> # Elasticsearch Example >>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch >>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost >>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200 >>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http >>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/ela >>>> sticsearch-5.2.1 >>>> # Elasticsearch 1.x Example >>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch >>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=myprojectES >>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost >>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300 >>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/ela >>>> sticsearch-1.7.6 >>>> >>>> # Local File System Example >>>> PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs >>>> PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models >>>> >>>> # HBase Example >>>> PIO_STORAGE_SOURCES_HBASE_TYPE=hbase >>>> PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase-1.2.6 >>>> >>>> >>>> *===========================================================Error >>>> message:* >>>> >>>> *===========================================================* >>>> [ERROR] [TaskSetManager] Task 2.0 in stage 10.0 (TID 24) had a not >>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector >>>> Serialization stack: >>>> - object not serializable (class: >>>> org.apache.mahout.math.RandomAccessSparseVector, >>>> value: {3:1.0,2:1.0}) >>>> - field (class: scala.Tuple2, name: _2, type: class >>>> java.lang.Object) >>>> - object (class scala.Tuple2, (2,{3:1.0,2:1.0})); not retrying >>>> [ERROR] [TaskSetManager] Task 3.0 in stage 10.0 (TID 25) had a not >>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector >>>> Serialization stack: >>>> - object not serializable (class: >>>> org.apache.mahout.math.RandomAccessSparseVector, >>>> value: {0:1.0,3:1.0}) >>>> - field (class: scala.Tuple2, name: _2, type: class >>>> java.lang.Object) >>>> - object (class scala.Tuple2, (3,{0:1.0,3:1.0})); not retrying >>>> [ERROR] [TaskSetManager] Task 1.0 in stage 10.0 (TID 23) had a not >>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector >>>> Serialization stack: >>>> - object not serializable (class: >>>> org.apache.mahout.math.RandomAccessSparseVector, >>>> 
value: {1:1.0}) >>>> - field (class: scala.Tuple2, name: _2, type: class >>>> java.lang.Object) >>>> - object (class scala.Tuple2, (1,{1:1.0})); not retrying >>>> [ERROR] [TaskSetManager] Task 0.0 in stage 10.0 (TID 22) had a not >>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector >>>> Serialization stack: >>>> - object not serializable (class: >>>> org.apache.mahout.math.RandomAccessSparseVector, >>>> value: {0:1.0}) >>>> - field (class: scala.Tuple2, name: _2, type: class >>>> java.lang.Object) >>>> - object (class scala.Tuple2, (0,{0:1.0})); not retrying >>>> Exception in thread "main" org.apache.spark.SparkException: Job >>>> aborted due to stage failure: Task 2.0 in stage 10.0 (TID 24) had a not >>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector >>>> Serialization stack: >>>> - object not serializable (class: >>>> org.apache.mahout.math.RandomAccessSparseVector, >>>> value: {3:1.0,2:1.0}) >>>> - field (class: scala.Tuple2, name: _2, type: class >>>> java.lang.Object) >>>> - object (class scala.Tuple2, (2,{3:1.0,2:1.0})) >>>> at org.apache.spark.scheduler.DAGScheduler.org >>>> <http://org.apache.spark.scheduler.dagscheduler.org/>$apache$ >>>> spark$scheduler$DAGScheduler$$failJobAndIndependentStages( >>>> DAGScheduler.scala:1431) >>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$ >>>> 1.apply(DAGScheduler.scala:1419) >>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$ >>>> 1.apply(DAGScheduler.scala:1418) >>>> at scala.collection.mutable.ResizableArray$class.foreach(Resiza >>>> bleArray.scala:59) >>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.sca >>>> la:47) >>>> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGSchedu >>>> ler.scala:1418) >>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskS >>>> etFailed$1.apply(DAGScheduler.scala:799) >>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskS >>>> etFailed$1.apply(DAGScheduler.scala:799) >>>> at scala.Option.foreach(Option.scala:236) >>>> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed( >>>> DAGScheduler.scala:799) >>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOn >>>> Receive(DAGScheduler.scala:1640) >>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onRe >>>> ceive(DAGScheduler.scala:1599) >>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onRe >>>> ceive(DAGScheduler.scala:1588) >>>> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) >>>> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler. >>>> scala:620) >>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832) >>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952) >>>> at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1088) >>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperati >>>> onScope.scala:150) >>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperati >>>> onScope.scala:111) >>>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) >>>> at org.apache.spark.rdd.RDD.fold(RDD.scala:1082) >>>> at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.com >>>> <http://s.drm.checkpointeddrmspark.com/>puteNRow(CheckpointedDrmSpark. 
>>>> scala:188) >>>> at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.nro >>>> w$lzycompute(CheckpointedDrmSpark.scala:55) >>>> at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.nro >>>> w(CheckpointedDrmSpark.scala:55) >>>> at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.new >>>> RowCardinality(CheckpointedDrmSpark.scala:219) >>>> at com.actionml.IndexedDatasetSpark$.apply(Preparator.scala:213) >>>> at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:71) >>>> at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:49) >>>> at scala.collection.TraversableLike$$anonfun$map$1.apply(Traver >>>> sableLike.scala:244) >>>> at scala.collection.TraversableLike$$anonfun$map$1.apply(Traver >>>> sableLike.scala:244) >>>> at scala.collection.immutable.List.foreach(List.scala:318) >>>> at scala.collection.TraversableLike$class.map(TraversableLike.s >>>> cala:244) >>>> at scala.collection.AbstractTraversable.map(Traversable.scala:105) >>>> at com.actionml.Preparator.prepare(Preparator.scala:49) >>>> at com.actionml.Preparator.prepare(Preparator.scala:32) >>>> at org.apache.predictionio.controller.PPreparator.prepareBase(P >>>> Preparator.scala:37) >>>> at org.apache.predictionio.controller.Engine$.train(Engine.scal >>>> a:671) >>>> at org.apache.predictionio.controller.Engine.train(Engine.scala >>>> :177) >>>> at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(Core >>>> Workflow.scala:67) >>>> at org.apache.predictionio.workflow.CreateWorkflow$.main(Create >>>> Workflow.scala:250) >>>> at org.apache.predictionio.workflow.CreateWorkflow.main(CreateW >>>> orkflow.scala) >>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce >>>> ssorImpl.java:62) >>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe >>>> thodAccessorImpl.java:43) >>>> at java.lang.reflect.Method.invoke(Method.java:498) >>>> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy >>>> $SparkSubmit$$runMain(SparkSubmit.scala:731) >>>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit >>>> .scala:181) >>>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scal >>>> a:206) >>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) >>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) >>>> >>>> *===========================================================* >>>> Thank you all for your help. 
>>>> >>>> Best regards, >>>> noelia

-- <http://www.vicomtech.org> Noelia Osés Fernández, PhD Senior Researcher | Investigadora Senior no...@vicomtech.org +[34] 943 30 92 30
"u1","i1" "u2","i1" "u2","i2" "u3","i2" "u3","i3" "u4","i4" "u1","i5" "u2","i5" "u2","i6" "u3","i6" "u3","i7" "u4","i8" "u1","i9" "u2","i9" "u2","i10" "u3","i10" "u3","i11" "u4","i12" "u1","i13" "u2","i13" "u2","i14" "u3","i14" "u3","i15" "u4","i16" "u1","i17" "u2","i17" "u2","i18" "u3","i18" "u3","i19" "u4","i20" "u1","i21" "u2","i21" "u2","i22" "u3","i22" "u3","i23" "u4","i24" "u1","i25" "u2","i25" "u2","i26" "u3","i26" "u3","i27" "u4","i28" "u1","i29" "u2","i29" "u2","i30" "u3","i30" "u3","i31" "u4","i32" "u1","i33" "u2","i33" "u2","i34" "u3","i34" "u3","i35" "u4","i36" "u1","i37" "u2","i37" "u2","i38" "u3","i38" "u3","i39" "u4","i40" "u1","i41" "u2","i41" "u2","i42" "u3","i42" "u3","i43" "u4","i44" "u1","i45" "u2","i45" "u2","i46" "u3","i46" "u3","i47" "u4","i48" "u1","i49" "u2","i49" "u2","i50" "u3","i50" "u3","i51" "u4","i52" "u1","i53" "u2","i53" "u2","i54" "u3","i54" "u3","i55" "u4","i56" "u1","i57" "u2","i57" "u2","i58" "u3","i58" "u3","i59" "u4","i60" "u1","i61" "u2","i61" "u2","i62" "u3","i62" "u3","i63" "u4","i64" "u1","i65" "u2","i65" "u2","i66" "u3","i66" "u3","i67" "u4","i68" "u1","i69" "u2","i69" "u2","i70" "u3","i70" "u3","i71" "u4","i72" "u1","i73" "u2","i73" "u2","i74" "u3","i74" "u3","i75" "u4","i76" "u1","i77" "u2","i77" "u2","i78" "u3","i78" "u3","i79" "u4","i80" "u1","i81" "u2","i81" "u2","i82" "u3","i82" "u3","i83" "u4","i84" "u1","i85" "u2","i85" "u2","i86" "u3","i86" "u3","i87" "u4","i88" "u1","i89" "u2","i89" "u2","i90" "u3","i90" "u3","i91" "u4","i92" "u1","i93" "u2","i93" "u2","i94" "u3","i94" "u3","i95" "u4","i96" "u1","i97" "u2","i97" "u2","i98" "u3","i98" "u3","i99" "u4","i100"
engine.json
Description: application/json
integration-test
Description: Binary data
""" Import sample data for recommendation engine """ import predictionio import argparse import random RATE_ACTIONS_DELIMITER = "," SEED = 1 def import_events(client, file): f = open(file, 'r') random.seed(SEED) count = 0 print "Importing data..." items = [] users = [] f = open(file, 'r') for line in f: data = line.rstrip('\r\n').split(RATE_ACTIONS_DELIMITER) users.append(data[0]) items.append(data[1]) client.create_event( event="view", entity_type="user", entity_id=data[0], target_entity_type="item", target_entity_id=data[1] ) print "Event: " + "view" + " entity_id: " + data[0] + " target_entity_id: " + data[1] count += 1 f.close() users = set(users) items = set(items) print "All users: " + str(users) print "All items: " + str(items) for item in items: client.create_event( event="$set", entity_type="item", entity_id=item ) count += 1 print "%s events are imported." % count if __name__ == '__main__': parser = argparse.ArgumentParser( description="Import sample data for recommendation engine") parser.add_argument('--access_key', default='invald_access_key') parser.add_argument('--url', default="http://localhost:7070") parser.add_argument('--file', default="./data/tiny_app_data.csv") args = parser.parse_args() print args client = predictionio.EventClient( access_key=args.access_key, url=args.url, threads=5, qsize=500) import_events(client, args.file)