I tried .option("quote", "\""), which I believe is the default, but I still
get the same error. This is the offending record.

Primo 4-In-1 Soft Seat Toilet Trainer and Step Stool White with Pastel Blue
Seat,"I chose this potty for my son because of the good reviews. I do not
like it. I'm honestly baffled by all the great reviews now that I have this
thing in front of me.1)It is made of cheap material, feels flimsy, the
grips on the bottom of the thing do nothing to keep it in place when the
child sits on it.2)It comes apart into 5 or 6 different pieces and all my
son likes to do is take it apart. I did not want a potty that would turn
into a toy, and this has just become like a puzzle for him, with all the
different pieces.3)It is a little big for him. He is young still but he's a
big boy for his age. I looked at one of the pictures posted and he looks
about the same size as the curly haired kid reading the book, but the potty
in that picture is NOT this potty! This one is a little bigger and he can't
get quite touch his feet on the ground, which is important.4)And one final
thing, maybe most importantly, the ""soft"" seat is not so soft. Doesn't
seem very comfortable to me. It's just plastic on top of plastic... and
after my son sits on it for just a few minutes his butt has horrible red
marks all over it! Definitely not comfortable.So, overall, i'm not
impressed at all.I gave it 2 stars because... it gets the job done I
suppose, and for a child a little bit older than my son it might fit a
little better. Also I really liked the idea that it was 4-in-1.Overall
though, I do not suggest getting this potty. Look elseware!It's probably
best to actually go to a store and look at them first hand, and not order
online. That's what I should have done in the first place.",2
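
A possible cause, stated as an assumption rather than a confirmed diagnosis:
the record above escapes quotes inside the quoted review by doubling them
(""soft""), whereas Spark's CSV reader uses a backslash as its default
"escape" character. If that is what is happening here, passing
.option("escape", "\"") to the reader may be worth trying. The sketch below
is not Spark's parser; CsvSketch, splitLine, and the shortened sample line
are hypothetical and only illustrate how a splitter that understands doubled
quotes would see three fields in a record of this shape.

```scala
// Minimal sketch of a splitter that treats "" inside a quoted field as one
// literal quote. Hypothetical helper for illustration; not Spark's CSV parser.
object CsvSketch {
  def splitLine(line: String): Vector[String] = {
    val fields = Vector.newBuilder[String]
    val cur = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (inQuotes) {
        if (c == '"') {
          if (i + 1 < line.length && line.charAt(i + 1) == '"') {
            cur += '"'        // doubled quote inside a quoted field: literal "
            i += 1            // skip the second quote of the pair
          } else {
            inQuotes = false  // a lone quote closes the field
          }
        } else {
          cur += c            // commas inside quotes are ordinary characters
        }
      } else {
        if (c == '"') inQuotes = true
        else if (c == ',') { fields += cur.toString; cur.clear() }
        else cur += c
      }
      i += 1
    }
    fields += cur.toString    // the last field has no trailing comma
    fields.result()
  }

  def main(args: Array[String]): Unit = {
    // Shortened, made-up stand-in for the offending record (name, review, rating).
    val sample = "Trainer Seat,\"Nice, but the \"\"soft\"\" seat is hard\",2"
    splitLine(sample).foreach(println)
  }
}
```

With doubled quotes handled this way, the embedded comma and the quoted word
stay inside the review field and the rating remains the third field.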

On Sat, Nov 19, 2016 at 10:59 PM, Meeraj Kunnumpurath <
mee...@servicesymphony.com> wrote:

> Digging through it, this looks like an issue with reading the CSV. Some of
> the fields have embedded commas, and these fields are correctly quoted.
> However, the CSV reader seems to get into a pickle when records contain
> both quoted and unquoted data. Fields are quoted only when they contain
> commas; otherwise they are unquoted.
>
> Regards
> Meeraj
>
> On Sat, Nov 19, 2016 at 10:10 PM, Meeraj Kunnumpurath <
> mee...@servicesymphony.com> wrote:
>
>> Hello,
>>
>> I have the following code that trains a mapping of review text to
>> ratings. I use a tokenizer to split each review into words, and a count
>> vectorizer to turn those words into feature vectors. However, when I train
>> the classifier I get a match error. Any pointers will be very helpful.
>>
>> The code is below:
>>
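>> import org.apache.spark.ml.classification.LogisticRegression
>> import org.apache.spark.ml.feature.{CountVectorizer, Tokenizer}
>> import org.apache.spark.sql.SparkSession
>> import org.apache.spark.sql.functions.udf
>>
>> val spark = SparkSession.builder()
>>   .appName("Logistic Regression")
>>   .master("local")
>>   .getOrCreate()
>> import spark.implicits._
>>
>> val df = spark.read
>>   .option("header", "true")
>>   .option("inferSchema", "true")
>>   .csv("data/amazon_baby.csv")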
>> val tk = new Tokenizer().setInputCol("review").setOutputCol("words")
>> val cv = new CountVectorizer().setInputCol("words").setOutputCol("features")
>>
>> val isGood = udf((x: Int) => if (x >= 4) 1 else 0)
>>
>> val words = tk.transform(df.withColumn("label", isGood('rating)))
>> val Array(training, test) =
>>   cv.fit(words).transform(words).randomSplit(Array(0.8, 0.2), 1)
>>
>> val classifier = new LogisticRegression()
>>
>> training.show(10)
>>
>> val simpleModel = classifier.fit(training)
>> simpleModel.evaluate(test).predictions
>>   .select("words", "label", "prediction", "probability")
>>   .show(10)
>>
>>
>> And the error I get is below.
>>
>> 16/11/19 22:06:45 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 9)
>> scala.MatchError: [null,1.0,(257358,[0,1,2,3,4,5,6,7,8,9,10,13,15,16,20,
>> 25,27,29,34,37,40,42,45,48,49,52,58,68,71,76,77,86,89,93,98,99,100,108,
>> 109,116,122,124,129,169,208,219,221,235,249,255,260,353,355,371,431,442,
>> 641,711,972,1065,1411,1663,1776,1925,2596,2957,3355,3828,4860,6288,7294,
>> 8951,9758,12203,18319,21779,48525,72732,75420,146476,192184],
>> [3.0,8.0,1.0,1.0,4.0,2.0,7.0,4.0,2.0,1.0,1.0,2.0,1.0,
>> 4.0,3.0,1.0,1.0,1.0,1.0,1.0,5.0,1.0,1.0,1.0,2.0,2.0,1.0,1.0,
>> 1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,
>> 1.0,2.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,
>> 1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,
>> 1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0])]
>> (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)
>> at org.apache.spark.ml.classification.LogisticRegression$$anonfun$6.apply(LogisticRegression.scala:266)
>> at org.apache.spark.ml.classification.LogisticRegression$$anonfun$6.apply(LogisticRegression.scala:266)
>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>> at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:214)
>> at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:919)
>> at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:910)
>> at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866)
>> at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:910)
>> at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:668)
>> at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
>>
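
The [null,1.0,(...)] at the start of the MatchError shows that the first
value in the failing row, which appears to be the label, is null. One
reading, offered as an assumption since the Spark source is not quoted in
this thread, is that the classifier pattern-matches each row expecting a
non-null Double there, so a rating that failed to parse surfaces as a
MatchError rather than a clearer message. The sketch below reproduces that
behaviour in plain Scala; MatchErrorSketch and toLabel are hypothetical
names, and a Seq[Any] stands in for Spark's Row.

```scala
// Plain-Scala illustration of why a null label can trigger scala.MatchError.
// Hypothetical stand-in for a (label, weight) row; not Spark's Row type.
object MatchErrorSketch {
  def toLabel(row: Seq[Any]): Double = row match {
    // A typed pattern such as `label: Double` never matches null.
    case Seq(label: Double, _: Double) => label
  }

  def main(args: Array[String]): Unit = {
    // A well-formed row matches and yields its label.
    println(toLabel(Seq(1.0, 1.0)))

    // A row with a null label falls through every case and throws MatchError.
    val threw =
      try { toLabel(Seq(null, 1.0)); false }
      catch { case _: MatchError => true }
    println(threw)
  }
}
```

If this is the mechanism, checking the parsed rating column for nulls before
fitting would show which records the CSV reader failed to split correctly.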
>> Many thanks
>> --
>> *Meeraj Kunnumpurath*
>>
>>
>> *Director and Executive Principal*
>> *Service Symphony Ltd*
>> *00 44 7702 693597*
>> *00 971 50 409 0169*
>> *mee...@servicesymphony.com <mee...@servicesymphony.com>*
>>
>
>



