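Thank you, it was the escape character. Setting .option("escape", "\"") fixed it. The offending record escapes embedded quotes by doubling them (""soft""), while the CSV reader's default escape character is a backslash.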
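For anyone hitting the same error, here is a minimal sketch of the read with the escape option set, assuming Spark 2.x run in local mode. The one-record sample file and the temp path are made up for illustration; only the option calls come from this thread.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("CSV escape").master("local").getOrCreate()

// Hypothetical sample: one review with an embedded comma and doubled quotes,
// written to a temp file so the sketch does not depend on data/amazon_baby.csv.
val csvText =
  """name,review,rating
    |Potty,"The ""soft"" seat is not so soft, honestly.",2
    |""".stripMargin
val path = Files.createTempFile("reviews", ".csv")
Files.write(path, csvText.getBytes(StandardCharsets.UTF_8))

val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .option("quote", "\"")   // the default, spelled out for clarity
  .option("escape", "\"")  // treat a doubled quote inside a quoted field as a literal quote
  .csv(path.toString)

val rows = df.collect()
rows.foreach(println)
```

With the real file, only the path changes. Counting rows where rating is null after the read is a quick way to check whether any records are still being split incorrectly.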
Regards

On Sat, Nov 19, 2016 at 11:10 PM, Meeraj Kunnumpurath <mee...@servicesymphony.com> wrote:

> I tried .option("quote", "\""), which I believe is the default, but I still
> get the same error. This is the offending record.
>
> Primo 4-In-1 Soft Seat Toilet Trainer and Step Stool White with Pastel Blue Seat,"I chose this potty for my son because of the good reviews. I do not like it. I'm honestly baffled by all the great reviews now that I have this thing in front of me.1)It is made of cheap material, feels flimsy, the grips on the bottom of the thing do nothing to keep it in place when the child sits on it.2)It comes apart into 5 or 6 different pieces and all my son likes to do is take it apart. I did not want a potty that would turn into a toy, and this has just become like a puzzle for him, with all the different pieces.3)It is a little big for him. He is young still but he's a big boy for his age. I looked at one of the pictures posted and he looks about the same size as the curly haired kid reading the book, but the potty in that picture is NOT this potty! This one is a little bigger and he can't get quite touch his feet on the ground, which is important.4)And one final thing, maybe most importantly, the ""soft"" seat is not so soft. Doesn't seem very comfortable to me. It's just plastic on top of plastic... and after my son sits on it for just a few minutes his butt has horrible red marks all over it! Definitely not comfortable.So, overall, i'm not impressed at all.I gave it 2 stars because... it gets the job done I suppose, and for a child a little bit older than my son it might fit a little better. Also I really liked the idea that it was 4-in-1.Overall though, I do not suggest getting this potty. Look elseware!It's probably best to actually go to a store and look at them first hand, and not order online. That's what I should have done in the first place.",2
>
> On Sat, Nov 19, 2016 at 10:59 PM, Meeraj Kunnumpurath <mee...@servicesymphony.com> wrote:
>
>> Digging through it, this looks like an issue with reading the CSV. Some of
>> the fields have embedded commas, and these fields are rightly quoted.
>> However, the CSV reader seems to get into a pickle when a record contains
>> both quoted and unquoted fields. Fields are quoted only when they contain
>> commas; otherwise they are unquoted.
>>
>> Regards
>> Meeraj
>>
>> On Sat, Nov 19, 2016 at 10:10 PM, Meeraj Kunnumpurath <mee...@servicesymphony.com> wrote:
>>
>>> Hello,
>>>
>>> I have the following code that trains a mapping of review text to
>>> ratings. I use a tokenizer to split each review into words, and a count
>>> vectorizer to turn the words into feature vectors. However, when I train
>>> the classifier I get a match error. Any pointers will be very helpful.
>>>
>>> The code is below,
>>>
>>> import org.apache.spark.ml.classification.LogisticRegression
>>> import org.apache.spark.ml.feature.{CountVectorizer, Tokenizer}
>>> import org.apache.spark.sql.SparkSession
>>> import org.apache.spark.sql.functions.udf
>>>
>>> val spark = SparkSession.builder().appName("Logistic Regression").master("local").getOrCreate()
>>> import spark.implicits._
>>>
>>> val df = spark.read.option("header", "true").option("inferSchema", "true").csv("data/amazon_baby.csv")
>>> val tk = new Tokenizer().setInputCol("review").setOutputCol("words")
>>> val cv = new CountVectorizer().setInputCol("words").setOutputCol("features")
>>>
>>> // Label a review as good (1) when its rating is 4 or higher, else 0
>>> val isGood = udf((x: Int) => if (x >= 4) 1 else 0)
>>>
>>> val words = tk.transform(df.withColumn("label", isGood('rating)))
>>> val Array(training, test) = cv.fit(words).transform(words).randomSplit(Array(0.8, 0.2), 1)
>>>
>>> val classifier = new LogisticRegression()
>>>
>>> training.show(10)
>>>
>>> val simpleModel = classifier.fit(training)
>>> simpleModel.evaluate(test).predictions.select("words", "label", "prediction", "probability").show(10)
>>>
>>> And the error I get is below.
>>>
>>> 16/11/19 22:06:45 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 9)
>>> scala.MatchError: [null,1.0,(257358,[0,1,2,3,4,5,6,7,8,9,10,13,15,16,20,25,27,29,34,37,40,42,45,48,49,52,58,68,71,76,77,86,89,93,98,99,100,108,109,116,122,124,129,169,208,219,221,235,249,255,260,353,355,371,431,442,641,711,972,1065,1411,1663,1776,1925,2596,2957,3355,3828,4860,6288,7294,8951,9758,12203,18319,21779,48525,72732,75420,146476,192184],[3.0,8.0,1.0,1.0,4.0,2.0,7.0,4.0,2.0,1.0,1.0,2.0,1.0,4.0,3.0,1.0,1.0,1.0,1.0,1.0,5.0,1.0,1.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0])] (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)
>>>   at org.apache.spark.ml.classification.LogisticRegression$$anonfun$6.apply(LogisticRegression.scala:266)
>>>   at org.apache.spark.ml.classification.LogisticRegression$$anonfun$6.apply(LogisticRegression.scala:266)
>>>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>>>   at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:214)
>>>   at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:919)
>>>   at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:910)
>>>   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866)
>>>   at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:910)
>>>   at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:668)
>>>   at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
>>>
>>> Many thanks
>>> --
>>> *Meeraj Kunnumpurath*
>>>
>>> *Director and Executive Principal*
>>> *Service Symphony Ltd*
>>> *00 44 7702 693597*
>>> *00 971 50 409 0169*
>>> *mee...@servicesymphony.com <mee...@servicesymphony.com>*