Thanks for the explanation, Pat! I think the best course of action is for me to read the documentation and understand how the algorithm works. Then I'll try again with a slightly larger dataset.
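For reference: the LLR test mentioned further down in the thread scores each item pair from a 2x2 table of view counts. Below is a minimal Python sketch of Dunning's G² (log-likelihood ratio) statistic; the function names and the example counts are illustrative only, not Mahout's actual code. It shows how a pair that cooccurs for just one user out of four can legitimately score zero.

*===========================================================*
import math

def xlogx(x):
    # x * log(x), with 0 * log(0) taken as 0
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # unnormalized entropy term used by the G^2 / LLR statistic
    return xlogx(sum(counts)) - sum(xlogx(k) for k in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 cooccurrence table.

    k11: users who viewed both items    k12: users who viewed only item A
    k21: users who viewed only item B   k22: users who viewed neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

# i1/i2 in the tiny dataset: u2 viewed both, u1 only i1, u3 only i2, u4 neither
print(llr(1, 1, 1, 1))    # 0.0 -- not statistically significant
# a pair that cooccurs far more often than chance scores much higher
print(llr(10, 1, 1, 20))  # roughly 26
*===========================================================*

Only pairs that cooccur more often than chance would predict get a non-zero LLR, so with this few events an all-zero result is plausible.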
Thank you very much! On 19 October 2017 at 17:15, Pat Ferrel <p...@occamsmachete.com> wrote: > This sample dataset is too small with too few cooccurrences. U1 will never > get i1 due to the blacklist (u1 has already viewed i1 so will not be > recommended that again). The blacklist can be disabled if you want to > recommend viewed items again, but beware that they may dominate every > recommendation set if you do turn it off, since it is self-fulfilling. Why > not i2? Not sure without running the math; the UR looks at things > statistically and with this small a dataset anomalies can be seen since the > data is not statistically significant. i1 will show up in internal > intermediate results (A’A for instance) but these are then filtered by a > statistical test called LLR, which requires a certain amount of data to > work. > > Notice the handmade dataset has many more cooccurrences and produces > understandable results. Also notice that in your dataset i3 and i4 can only > be recommended by “popularity” since they have no cooccurrence. > > > > On Oct 19, 2017, at 1:28 AM, Noelia Osés Fernández <no...@vicomtech.org> > wrote: > > Pat, this worked!!!!! Thank you very much!!!! > > The only odd thing now is that all the results I get now are 0s. For > example: > > Using the dataset: > > "u1","i1" > "u2","i1" > "u2","i2" > "u3","i2" > "u3","i3" > "u4","i4" > > echo "Recommendations for user: u1" > echo "" > curl -H "Content-Type: application/json" -d ' > { > "user": "u1" > }' http://localhost:8000/queries.json > echo "" > > What I get is: > > {"itemScores":[{"item":"\"i2\"","score":0.0},{"item":"\"i1\"","score":0.0},{"item":"\"i3\"","score":0.0},{"item":"\"i4\"","score":0.0}]} > > > If user u1 has viewed i1 and user u2 has viewed i1 and i2 then I think the > algorithm should return a non-zero score for i2 (and possibly i1, too). > > Even using the bigger dataset with 100 items I still get all scores of 0. > > So now I'm going to spend some time reading the following documentation, > unless there is some other documentation you recommend I read first! > > - [The Universal Recommender](http://actionml.com/docs/ur) > - [The Correlated Cross-Occurrence Algorithm](http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html) > - [The Universal Recommender Slide Deck](http://www.slideshare.net/pferrel/unified-recommender-39986309) > - [Multi-domain predictive AI or how to make one thing predict another](https://developer.ibm.com/dwblog/2017/mahout-spark-correlated-cross-occurences/) > > Thank you very much for all your patience and help getting me to this > point!!! > > Best regards, > Noelia > > > On 18 October 2017 at 18:33, Pat Ferrel <p...@occamsmachete.com> wrote: > >> It is the UR, so Events are taken from the EventStore and converted into a >> Mahout DistributedRowMatrix of RandomAccessSparseVectors, which are both >> serializable. This path works fine and has for several years. >> >> This must be a config problem, like not using the MahoutKryoRegistrator, >> which registers the serializers for these. >> >> @Noelia, you have left out the sparkConf section of the engine.json. 
The >> one used in the integration test should work: >> >> { >> "comment":" This config file uses default settings for all but the >> required values see README.md for docs", >> "id": "default", >> "description": "Default settings", >> "engineFactory": "com.actionml.RecommendationEngine", >> "datasource": { >> "params" : { >> "name": "tiny_app_data.csv", >> "appName": "TinyApp", >> "eventNames": ["view"] >> } >> }, >> "sparkConf": { <================= THIS WAS LEFT OUT IN YOUR >> ENGINE.JSON BELOW IN THIS THREAD >> "spark.serializer": "org.apache.spark.serializer.KryoSerializer", >> "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator", >> "spark.kryo.referenceTracking": "false", >> "spark.kryoserializer.buffer": "300m", >> "es.index.auto.create": "true" >> }, >> "algorithms": [ >> { >> "comment": "simplest setup where all values are default, >> popularity based backfill, must add eventsNames", >> "name": "ur", >> "params": { >> "appName": "TinyApp", >> "indexName": "urindex", >> "typeName": "items", >> "comment": "must have data for the first event or the model will >> not build, other events are optional", >> "eventNames": ["view"] >> } >> } >> ] >> } >> >> >> On Oct 18, 2017, at 8:49 AM, Donald Szeto <don...@apache.org> wrote: >> >> Chiming in a bit. Looking at the serialization error, it looks like we >> are just one little step away from getting this to work. >> >> Noelia, what does your synthesized data look like? All data that is >> processed by Spark needs to be serializable. At some point, a >> non-serializable vector object showing in the stack is created out of your >> synthesized data. It would be great to know what your input event looks >> like and to see where in the code path this is caused. >> >> Regards, >> Donald >> >> On Tue, Oct 17, 2017 at 12:14 AM Noelia Osés Fernández < >> no...@vicomtech.org> wrote: >> >>> Pat, you mentioned the problem could be that the data I was using was >>> too small. So now I'm using the attached data file as the data (4 users and >>> 100 items). But I'm still getting the same error. I'm sorry I forgot to >>> mention I had increased the dataset. >>> >>> The reason why I want to make it work with a very small dataset is >>> because I want to be able to follow the calculations. I want to understand >>> what the UR is doing and understand the impact of changing this or that, >>> here or there... I find that easier to achieve with a small example in >>> which I know exactly what's happening. I want to build my trust in my >>> understanding of the UR before I move on to applying it to a real problem. >>> If I'm not confident that I know how to use it, how can I tell my client >>> that the results I'm getting are good with any degree of confidence? >>> >>> >>> >>> >>> >>> On 16 October 2017 at 20:44, Pat Ferrel <p...@occamsmachete.com> wrote: >>> >>>> So all setup is the same for the integration-test and your modified >>>> test *except the data*? >>>> >>>> The error looks like a setup problem because the serialization should >>>> happen with either test. But if the only difference really is the data, >>>> then toss it and use either real data or the integration test data. Why are >>>> you trying to synthesize fake data if it causes the error? >>>> >>>> BTW the data you include below in this thread would never create >>>> internal IDs as high as 94 in the vector. You must have switched to a new >>>> dataset??? 
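A `pio export` dump, as suggested in the next paragraph, can be sanity-checked with a few lines of Python. This is only a sketch: the path is hypothetical and it assumes the default JSON-lines event format that `pio export` writes.

*===========================================================*
import json
from collections import Counter

users, items, event_counts = set(), set(), Counter()

# Hypothetical path: point this at a part file written by `pio export`
with open("export/part-00000") as f:
    for line in f:
        ev = json.loads(line)
        event_counts[ev["event"]] += 1
        if ev.get("entityType") == "user":
            users.add(ev["entityId"])
        if ev.get("targetEntityType") == "item":
            items.add(ev["targetEntityId"])

print("event counts:", dict(event_counts))
print("distinct users:", len(users))
print("distinct items:", len(items))
*===========================================================*

If the distinct user and item counts are not what you expect, that mismatch is the first thing to chase.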
>>>> >>>> I would get a dump of your data using `pio export` and make sure it’s >>>> what you thought it was. You claim to have only 4 user ids and 4 item ids >>>> but the serialized vector thinks you have at least 94 of user or item ids. >>>> Something doesn’t add up. >>>> >>>> >>>> On Oct 16, 2017, at 4:43 AM, Noelia Osés Fernández <no...@vicomtech.org> >>>> wrote: >>>> >>>> Pat, you are absolutely right! I increased the sleep time and now the >>>> integration test for handmade works perfectly. >>>> >>>> However, the integration test adapted to run with my tiny app runs into >>>> the same problem I've been having with this app: >>>> >>>> [ERROR] [TaskSetManager] Task 1.0 in stage 10.0 (TID 23) had a not >>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector >>>> Serialization stack: >>>> - object not serializable (class: >>>> org.apache.mahout.math.RandomAccessSparseVector, >>>> value: {66:1.0,29:1.0,70:1.0,91:1.0,58:1.0,37:1.0,13:1.0,8:1.0,94:1 >>>> .0,30:1.0,57:1.0,22:1.0,20:1.0,35:1.0,97:1.0,60:1.0,27:1.0,7 >>>> 2:1.0,3:1.0,34:1.0,77:1.0,46:1.0,81:1.0,86:1.0,43:1.0}) >>>> - field (class: scala.Tuple2, name: _2, type: class >>>> java.lang.Object) >>>> - object (class scala.Tuple2, (1,{66:1.0,29:1.0,70:1.0,91:1. >>>> 0,58:1.0,37:1.0,13:1.0,8:1.0,94:1.0,30:1.0,57:1.0,22:1.0,20: >>>> 1.0,35:1.0,97:1.0,60:1.0,27:1.0,72:1.0,3:1.0,34:1.0,77:1.0, >>>> 46:1.0,81:1.0,86:1.0,43:1.0})); not retrying >>>> [ERROR] [TaskSetManager] Task 2.0 in stage 10.0 (TID 24) had a not >>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector >>>> Serialization stack: >>>> >>>> ... >>>> >>>> Any ideas? >>>> >>>> On 15 October 2017 at 19:09, Pat Ferrel <p...@occamsmachete.com> wrote: >>>> >>>>> This is probably a timing issue in the integration test, which has to >>>>> wait for `pio deploy` to finish before the queries can be made. If it >>>>> doesn’t finish the queries will fail. By the time the rest of the test >>>>> quits the model has been deployed so you can run queries. In the >>>>> integration-test script increase the delay after `pio deploy…` and see if >>>>> it passes then. >>>>> >>>>> This is probably an integrtion-test script problem not a problem in >>>>> the system >>>>> >>>>> >>>>> >>>>> On Oct 6, 2017, at 4:21 AM, Noelia Osés Fernández <no...@vicomtech.org> >>>>> wrote: >>>>> >>>>> Pat, >>>>> >>>>> I have run the integration test for the handmade example out of >>>>> curiosity. Strangely enough things go more or less as expected apart from >>>>> the fact that I get a message saying: >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> *...[INFO] [CoreWorkflow$] Updating engine instance[INFO] >>>>> [CoreWorkflow$] Training completed successfully.Model will remain deployed >>>>> after this testWaiting 30 seconds for the server to startnohup: >>>>> redirecting >>>>> stderr to stdout % Total % Received % Xferd Average Speed Time >>>>> Time Time Current Dload Upload >>>>> Total Spent Left Speed 0 0 0 0 0 0 0 0 >>>>> --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost >>>>> port 8000: Connection refused* >>>>> So the integration test does not manage to get the recommendations >>>>> even though the model trained and deployed successfully. 
However, as soon >>>>> as the integration test finishes, on the same terminal, I can get the >>>>> recommendations by doing the following: >>>>> >>>>> $ curl -H "Content-Type: application/json" -d ' >>>>> > { >>>>> > "user": "u1" >>>>> > }' http://localhost:8000/queries.json >>>>> {"itemScores":[{"item":"Nexus","score":0.057719700038433075} >>>>> ,{"item":"Surface","score":0.0}]} >>>>> >>>>> Isn't this odd? Can you guess what's going on? >>>>> >>>>> Thank you very much for all your support! >>>>> noelia >>>>> >>>>> >>>>> >>>>> On 5 October 2017 at 19:22, Pat Ferrel <p...@occamsmachete.com> wrote: >>>>> >>>>>> Ok, that config should work. Does the integration test pass? >>>>>> >>>>>> The data you are using is extremely small and though it does look >>>>>> like it has cooccurrences, they may not meet minimum “big-data” >>>>>> thresholds >>>>>> used by default. Try adding more data or use the handmade example data, >>>>>> rename purchase to view and discard the existing view data if you wish. >>>>>> >>>>>> The error is very odd and I’ve never seen it. If the integration test >>>>>> works I can only surmise it's your data. >>>>>> >>>>>> >>>>>> On Oct 5, 2017, at 12:02 AM, Noelia Osés Fernández < >>>>>> no...@vicomtech.org> wrote: >>>>>> >>>>>> SPARK: spark-1.6.3-bin-hadoop2.6 >>>>>> >>>>>> PIO: 0.11.0-incubating >>>>>> >>>>>> Scala: whatever gets installed when installing PIO 0.11.0-incubating, >>>>>> I haven't installed Scala separately >>>>>> >>>>>> UR: ActionML's UR v0.6.0 I suppose as that's the last version >>>>>> mentioned in the readme file. I have attached the UR zip file I >>>>>> downloaded >>>>>> from the actionml github account. >>>>>> >>>>>> Thank you for your help!! >>>>>> >>>>>> On 4 October 2017 at 17:20, Pat Ferrel <p...@occamsmachete.com> wrote: >>>>>> >>>>>>> What version of Scala. Spark, PIO, and UR are you using? >>>>>>> >>>>>>> >>>>>>> On Oct 4, 2017, at 6:10 AM, Noelia Osés Fernández < >>>>>>> no...@vicomtech.org> wrote: >>>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> I'm still trying to create a very simple app to learn to use >>>>>>> PredictionIO and still having trouble. I have done pio build no problem. >>>>>>> But when I do pio train I get a very long error message related to >>>>>>> serialisation (error message copied below). >>>>>>> >>>>>>> pio status reports system is all ready to go. >>>>>>> >>>>>>> The app I'm trying to build is very simple, it only has 'view' >>>>>>> events. 
Here's the engine.json: >>>>>>> >>>>>>> *===========================================================* >>>>>>> { >>>>>>> "comment":" This config file uses default settings for all but >>>>>>> the required values see README.md for docs", >>>>>>> "id": "default", >>>>>>> "description": "Default settings", >>>>>>> "engineFactory": "com.actionml.RecommendationEngine", >>>>>>> "datasource": { >>>>>>> "params" : { >>>>>>> "name": "tiny_app_data.csv", >>>>>>> "appName": "TinyApp", >>>>>>> "eventNames": ["view"] >>>>>>> } >>>>>>> }, >>>>>>> "algorithms": [ >>>>>>> { >>>>>>> "comment": "simplest setup where all values are default, >>>>>>> popularity based backfill, must add eventsNames", >>>>>>> "name": "ur", >>>>>>> "params": { >>>>>>> "appName": "TinyApp", >>>>>>> "indexName": "urindex", >>>>>>> "typeName": "items", >>>>>>> "comment": "must have data for the first event or the model >>>>>>> will not build, other events are optional", >>>>>>> "eventNames": ["view"] >>>>>>> } >>>>>>> } >>>>>>> ] >>>>>>> } >>>>>>> *===========================================================* >>>>>>> >>>>>>> The data I'm using is: >>>>>>> >>>>>>> "u1","i1" >>>>>>> "u2","i1" >>>>>>> "u2","i2" >>>>>>> "u3","i2" >>>>>>> "u3","i3" >>>>>>> "u4","i4" >>>>>>> >>>>>>> meaning user u viewed item i. >>>>>>> >>>>>>> The data has been added to the database with the following python >>>>>>> code: >>>>>>> >>>>>>> *===========================================================* >>>>>>> """ >>>>>>> Import sample data for recommendation engine >>>>>>> """ >>>>>>> >>>>>>> import predictionio >>>>>>> import argparse >>>>>>> import random >>>>>>> >>>>>>> RATE_ACTIONS_DELIMITER = "," >>>>>>> SEED = 1 >>>>>>> >>>>>>> >>>>>>> def import_events(client, file): >>>>>>> f = open(file, 'r') >>>>>>> random.seed(SEED) >>>>>>> count = 0 >>>>>>> print "Importing data..." >>>>>>> >>>>>>> items = [] >>>>>>> users = [] >>>>>>> f = open(file, 'r') >>>>>>> for line in f: >>>>>>> data = line.rstrip('\r\n').split(RATE_ACTIONS_DELIMITER) >>>>>>> users.append(data[0]) >>>>>>> items.append(data[1]) >>>>>>> client.create_event( >>>>>>> event="view", >>>>>>> entity_type="user", >>>>>>> entity_id=data[0], >>>>>>> target_entity_type="item", >>>>>>> target_entity_id=data[1] >>>>>>> ) >>>>>>> print "Event: " + "view" + " entity_id: " + data[0] + " >>>>>>> target_entity_id: " + data[1] >>>>>>> count += 1 >>>>>>> f.close() >>>>>>> >>>>>>> users = set(users) >>>>>>> items = set(items) >>>>>>> print "All users: " + str(users) >>>>>>> print "All items: " + str(items) >>>>>>> for item in items: >>>>>>> client.create_event( >>>>>>> event="$set", >>>>>>> entity_type="item", >>>>>>> entity_id=item >>>>>>> ) >>>>>>> count += 1 >>>>>>> >>>>>>> >>>>>>> print "%s events are imported." 
% count >>>>>>> >>>>>>> >>>>>>> if __name__ == '__main__': >>>>>>> parser = argparse.ArgumentParser( >>>>>>> description="Import sample data for recommendation engine") >>>>>>> parser.add_argument('--access_key', default='invald_access_key') >>>>>>> parser.add_argument('--url', default="http://localhost:7070") >>>>>>> parser.add_argument('--file', default="./data/tiny_app_data.csv") >>>>>>> >>>>>>> args = parser.parse_args() >>>>>>> print args >>>>>>> >>>>>>> client = predictionio.EventClient( >>>>>>> access_key=args.access_key, >>>>>>> url=args.url, >>>>>>> threads=5, >>>>>>> qsize=500) >>>>>>> import_events(client, args.file) >>>>>>> *===========================================================* >>>>>>> >>>>>>> My pio_env.sh is the following: >>>>>>> >>>>>>> *===========================================================* >>>>>>> #!/usr/bin/env bash >>>>>>> # >>>>>>> # Copy this file as pio-env.sh and edit it for your site's >>>>>>> configuration. >>>>>>> # >>>>>>> # Licensed to the Apache Software Foundation (ASF) under one or more >>>>>>> # contributor license agreements. See the NOTICE file distributed >>>>>>> with >>>>>>> # this work for additional information regarding copyright ownership. >>>>>>> # The ASF licenses this file to You under the Apache License, >>>>>>> Version 2.0 >>>>>>> # (the "License"); you may not use this file except in compliance >>>>>>> with >>>>>>> # the License. You may obtain a copy of the License at >>>>>>> # >>>>>>> # http://www.apache.org/licenses/LICENSE-2.0 >>>>>>> # >>>>>>> # Unless required by applicable law or agreed to in writing, software >>>>>>> # distributed under the License is distributed on an "AS IS" BASIS, >>>>>>> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or >>>>>>> implied. >>>>>>> # See the License for the specific language governing permissions and >>>>>>> # limitations under the License. >>>>>>> # >>>>>>> >>>>>>> # PredictionIO Main Configuration >>>>>>> # >>>>>>> # This section controls core behavior of PredictionIO. It is very >>>>>>> likely that >>>>>>> # you need to change these to fit your site. >>>>>>> >>>>>>> # SPARK_HOME: Apache Spark is a hard dependency and must be >>>>>>> configured. >>>>>>> # SPARK_HOME=$PIO_HOME/vendors/spark-2.0.2-bin-hadoop2.7 >>>>>>> SPARK_HOME=$PIO_HOME/vendors/spark-1.6.3-bin-hadoop2.6 >>>>>>> >>>>>>> POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-42.1.4.jar >>>>>>> MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.41.jar >>>>>>> >>>>>>> # ES_CONF_DIR: You must configure this if you have advanced >>>>>>> configuration for >>>>>>> # your Elasticsearch setup. >>>>>>> # ES_CONF_DIR=/opt/elasticsearch >>>>>>> #ES_CONF_DIR=$PIO_HOME/vendors/elasticsearch-1.7.6 >>>>>>> >>>>>>> # HADOOP_CONF_DIR: You must configure this if you intend to run >>>>>>> PredictionIO >>>>>>> # with Hadoop 2. >>>>>>> # HADOOP_CONF_DIR=/opt/hadoop >>>>>>> >>>>>>> # HBASE_CONF_DIR: You must configure this if you intend to run >>>>>>> PredictionIO >>>>>>> # with HBase on a remote cluster. >>>>>>> # HBASE_CONF_DIR=$PIO_HOME/vendors/hbase-1.0.0/conf >>>>>>> >>>>>>> # Filesystem paths where PredictionIO uses as block storage. >>>>>>> PIO_FS_BASEDIR=$HOME/.pio_store >>>>>>> PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines >>>>>>> PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp >>>>>>> >>>>>>> # PredictionIO Storage Configuration >>>>>>> # >>>>>>> # This section controls programs that make use of PredictionIO's >>>>>>> built-in >>>>>>> # storage facilities. Default values are shown below. 
>>>>>>> # >>>>>>> # For more information on storage configuration please refer to >>>>>>> # http://predictionio.incubator.apache.org/system/anotherdatastore/ >>>>>>> >>>>>>> # Storage Repositories >>>>>>> >>>>>>> # Default is to use PostgreSQL >>>>>>> PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta >>>>>>> PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH >>>>>>> >>>>>>> PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event >>>>>>> PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE >>>>>>> >>>>>>> PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model >>>>>>> PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS >>>>>>> >>>>>>> # Storage Data Sources >>>>>>> >>>>>>> # PostgreSQL Default Settings >>>>>>> # Please change "pio" to your database name in >>>>>>> PIO_STORAGE_SOURCES_PGSQL_URL >>>>>>> # Please change PIO_STORAGE_SOURCES_PGSQL_USERNAME and >>>>>>> # PIO_STORAGE_SOURCES_PGSQL_PASSWORD accordingly >>>>>>> PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc >>>>>>> PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio >>>>>>> PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio >>>>>>> PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio >>>>>>> >>>>>>> # MySQL Example >>>>>>> # PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc >>>>>>> # PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://localhost/pio >>>>>>> # PIO_STORAGE_SOURCES_MYSQL_USERNAME=pio >>>>>>> # PIO_STORAGE_SOURCES_MYSQL_PASSWORD=pio >>>>>>> >>>>>>> # Elasticsearch Example >>>>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch >>>>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost >>>>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200 >>>>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http >>>>>>> # PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/ >>>>>>> elasticsearch-5.2.1 >>>>>>> # Elasticsearch 1.x Example >>>>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch >>>>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=myprojectES >>>>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost >>>>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300 >>>>>>> PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/ >>>>>>> elasticsearch-1.7.6 >>>>>>> >>>>>>> # Local File System Example >>>>>>> PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs >>>>>>> PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models >>>>>>> >>>>>>> # HBase Example >>>>>>> PIO_STORAGE_SOURCES_HBASE_TYPE=hbase >>>>>>> PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase-1.2.6 >>>>>>> >>>>>>> >>>>>>> *===========================================================Error >>>>>>> message:* >>>>>>> >>>>>>> *===========================================================* >>>>>>> [ERROR] [TaskSetManager] Task 2.0 in stage 10.0 (TID 24) had a not >>>>>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector >>>>>>> Serialization stack: >>>>>>> - object not serializable (class: >>>>>>> org.apache.mahout.math.RandomAccessSparseVector, >>>>>>> value: {3:1.0,2:1.0}) >>>>>>> - field (class: scala.Tuple2, name: _2, type: class >>>>>>> java.lang.Object) >>>>>>> - object (class scala.Tuple2, (2,{3:1.0,2:1.0})); not retrying >>>>>>> [ERROR] [TaskSetManager] Task 3.0 in stage 10.0 (TID 25) had a not >>>>>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector >>>>>>> Serialization stack: >>>>>>> - object not serializable (class: >>>>>>> org.apache.mahout.math.RandomAccessSparseVector, >>>>>>> value: {0:1.0,3:1.0}) >>>>>>> - field (class: scala.Tuple2, name: _2, type: class >>>>>>> java.lang.Object) >>>>>>> - object (class scala.Tuple2, (3,{0:1.0,3:1.0})); not retrying >>>>>>> [ERROR] [TaskSetManager] 
Task 1.0 in stage 10.0 (TID 23) had a not >>>>>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector >>>>>>> Serialization stack: >>>>>>> - object not serializable (class: >>>>>>> org.apache.mahout.math.RandomAccessSparseVector, >>>>>>> value: {1:1.0}) >>>>>>> - field (class: scala.Tuple2, name: _2, type: class >>>>>>> java.lang.Object) >>>>>>> - object (class scala.Tuple2, (1,{1:1.0})); not retrying >>>>>>> [ERROR] [TaskSetManager] Task 0.0 in stage 10.0 (TID 22) had a not >>>>>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector >>>>>>> Serialization stack: >>>>>>> - object not serializable (class: >>>>>>> org.apache.mahout.math.RandomAccessSparseVector, >>>>>>> value: {0:1.0}) >>>>>>> - field (class: scala.Tuple2, name: _2, type: class >>>>>>> java.lang.Object) >>>>>>> - object (class scala.Tuple2, (0,{0:1.0})); not retrying >>>>>>> Exception in thread "main" org.apache.spark.SparkException: Job >>>>>>> aborted due to stage failure: Task 2.0 in stage 10.0 (TID 24) had a not >>>>>>> serializable result: org.apache.mahout.math.RandomAccessSparseVector >>>>>>> Serialization stack: >>>>>>> - object not serializable (class: >>>>>>> org.apache.mahout.math.RandomAccessSparseVector, >>>>>>> value: {3:1.0,2:1.0}) >>>>>>> - field (class: scala.Tuple2, name: _2, type: class >>>>>>> java.lang.Object) >>>>>>> - object (class scala.Tuple2, (2,{3:1.0,2:1.0})) >>>>>>> at org.apache.spark.scheduler.DAGScheduler.org >>>>>>> <http://org.apache.spark.scheduler.dagscheduler.org/>$apache$ >>>>>>> spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DA >>>>>>> GScheduler.scala:1431) >>>>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$ >>>>>>> 1.apply(DAGScheduler.scala:1419) >>>>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$ >>>>>>> 1.apply(DAGScheduler.scala:1418) >>>>>>> at scala.collection.mutable.ResizableArray$class.foreach(Resiza >>>>>>> bleArray.scala:59) >>>>>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer. >>>>>>> scala:47) >>>>>>> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGSchedu >>>>>>> ler.scala:1418) >>>>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskS >>>>>>> etFailed$1.apply(DAGScheduler.scala:799) >>>>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskS >>>>>>> etFailed$1.apply(DAGScheduler.scala:799) >>>>>>> at scala.Option.foreach(Option.scala:236) >>>>>>> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed( >>>>>>> DAGScheduler.scala:799) >>>>>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOn >>>>>>> Receive(DAGScheduler.scala:1640) >>>>>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onRe >>>>>>> ceive(DAGScheduler.scala:1599) >>>>>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onRe >>>>>>> ceive(DAGScheduler.scala:1588) >>>>>>> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala: >>>>>>> 48) >>>>>>> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler. 
>>>>>>> scala:620) >>>>>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832) >>>>>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952) >>>>>>> at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala: >>>>>>> 1088) >>>>>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperati >>>>>>> onScope.scala:150) >>>>>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperati >>>>>>> onScope.scala:111) >>>>>>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) >>>>>>> at org.apache.spark.rdd.RDD.fold(RDD.scala:1082) >>>>>>> at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.com >>>>>>> <http://s.drm.checkpointeddrmspark.com/> >>>>>>> puteNRow(CheckpointedDrmSpark.scala:188) >>>>>>> at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark. >>>>>>> nrow$lzycompute(CheckpointedDrmSpark.scala:55) >>>>>>> at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark. >>>>>>> nrow(CheckpointedDrmSpark.scala:55) >>>>>>> at org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark.new >>>>>>> RowCardinality(CheckpointedDrmSpark.scala:219) >>>>>>> at com.actionml.IndexedDatasetSpark$.apply(Preparator.scala:213) >>>>>>> at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:71) >>>>>>> at com.actionml.Preparator$$anonfun$3.apply(Preparator.scala:49) >>>>>>> at scala.collection.TraversableLike$$anonfun$map$1.apply( >>>>>>> TraversableLike.scala:244) >>>>>>> at scala.collection.TraversableLike$$anonfun$map$1.apply( >>>>>>> TraversableLike.scala:244) >>>>>>> at scala.collection.immutable.List.foreach(List.scala:318) >>>>>>> at scala.collection.TraversableLike$class.map(TraversableLike. >>>>>>> scala:244) >>>>>>> at scala.collection.AbstractTraversable.map(Traversable.scala: >>>>>>> 105) >>>>>>> at com.actionml.Preparator.prepare(Preparator.scala:49) >>>>>>> at com.actionml.Preparator.prepare(Preparator.scala:32) >>>>>>> at org.apache.predictionio.controller.PPreparator.prepareBase( >>>>>>> PPreparator.scala:37) >>>>>>> at org.apache.predictionio.controller.Engine$.train(Engine. >>>>>>> scala:671) >>>>>>> at org.apache.predictionio.controller.Engine.train(Engine. >>>>>>> scala:177) >>>>>>> at org.apache.predictionio.workflow.CoreWorkflow$.runTrain( >>>>>>> CoreWorkflow.scala:67) >>>>>>> at org.apache.predictionio.workflow.CreateWorkflow$.main(Create >>>>>>> Workflow.scala:250) >>>>>>> at org.apache.predictionio.workflow.CreateWorkflow.main(CreateW >>>>>>> orkflow.scala) >>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce >>>>>>> ssorImpl.java:62) >>>>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe >>>>>>> thodAccessorImpl.java:43) >>>>>>> at java.lang.reflect.Method.invoke(Method.java:498) >>>>>>> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy >>>>>>> $SparkSubmit$$runMain(SparkSubmit.scala:731) >>>>>>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit >>>>>>> .scala:181) >>>>>>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit. >>>>>>> scala:206) >>>>>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala: >>>>>>> 121) >>>>>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) >>>>>>> >>>>>>> *===========================================================* >>>>>>> Thank you all for your help. >>>>>>> >>>>>>> Best regards, >>>>>>> noelia >>>>>>> >>>>>>> >>>>>> > > > > > > > > -- > You received this message because you are subscribed to the Google Groups > "actionml-user" group. 
-- Noelia Osés Fernández, PhD | Senior Researcher | no...@vicomtech.org | Vicomtech