Hi Suela,

(Please subscribe to our user mailing list and send your questions there in the future.) In your case, each file contains a single column of numbers, so you can use `sc.textFile` to read them, zip the two RDDs together, and then create labeled points:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// ex2x.dat: one feature value per line
val xx = sc.textFile("/path/to/ex2x.dat").map(x => Vectors.dense(x.toDouble))
// ex2y.dat: one label per line
val yy = sc.textFile("/path/to/ex2y.dat").map(_.toDouble)
// zip needs the same number of partitions and of elements per partition in both RDDs
val examples = yy.zip(xx).map { case (y, x) => LabeledPoint(y, x) }

Best,
Xiangrui

On Thu, May 29, 2014 at 2:35 AM, Suela Haxhi <suelaha...@gmail.com> wrote:
>
> Hello Xiangrui,
> my name is Suela Haxhi. Let me ask you for a little help. I am having some
> difficulty loading data files in MLlib, namely for:
> Binary Classification;
> Linear Regression;
> ........
>
> E.g., the file "mllib/data/sample_svm_data.txt" contains the
> following data:
> 1 0 2.52078447201548 0 0 0 2.004684436494304 2.000347299268466 0
> 2.228387042742021 2.228387042742023 0 0 0 0 0 0
> 0 2.857738033247042 0 0 2.619965104088255 0 2.004684436494304
> 2.000347299268466 0 2.228387042742021 2.228387042742023 0 0 0 0 0 0
>
> etc .... ......
>
> I don't understand what the inputs and outputs are.
> The problem comes when I want to load another type of dataset. E.g., I want
> to do binary classification on the presence of a disease.
>
> For example, the esteemed professor Andrew Ng, in his machine learning
> course, explains:
>
> Download ex2Data.zip, and extract the files from the zip file. The files
> contain some example measurements of various heights for boys between the
> ages of two and eight. The y-values are the heights measured in meters, and
> the x-values are the ages of the boys corresponding to the heights. Each
> height and age tuple constitutes one training example $(x^{(i)}, y^{(i)})$
> in our dataset. There are $m = 50$ training examples, and you will
> use them to develop a linear regression model.
> In this problem, you'll implement linear regression using gradient descent.
> In Matlab/Octave, you can load the training set using the commands
> x = load('ex2x.dat');
> y = load('ex2y.dat');
>
>
>
> But in MLlib, I can't figure out what these data mean
> (mllib/data/sample_svm_data.txt).
> And I don't know how to load another type of data set using the following
> code:
>
> Binary Classification
> import org.apache.spark.SparkContext
> import org.apache.spark.mllib.classification.SVMWithSGD
> import org.apache.spark.mllib.regression.LabeledPoint
>
> // Load and parse the data file
>
> // Run training algorithm to build the model
>
> // Evaluate model on training examples and compute the training error
>
>
>
> Can you help me please? Thank you in advance.
>
> Best Regards
> Suela Haxhi
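
On the question of what the numbers in sample_svm_data.txt mean: as far as the MLlib guide's own example parses that file, the first number on each line is the label (the output) and the remaining numbers are the features (the input). The sketch below shows that per-line logic, plus the pairing of two single-column files, in plain Scala with no Spark dependency so it can be tried in a REPL. The names `LabeledLines`, `parseLine`, and `pairColumns` are hypothetical and used only for illustration; it assumes every line is non-empty and contains only numbers.

```scala
// Plain-Scala sketch (no Spark needed) of the per-line parsing logic.
// `LabeledLines`, `parseLine`, and `pairColumns` are hypothetical names.
object LabeledLines {

  // sample_svm_data.txt style: the first number is the label (output),
  // the remaining numbers are the features (input).
  // Assumes a non-empty line of whitespace-separated numbers.
  def parseLine(line: String): (Double, Array[Double]) = {
    val parts = line.trim.split("\\s+").map(_.toDouble)
    (parts.head, parts.tail)
  }

  // ex2x.dat / ex2y.dat style: one number per line in each file, where
  // line i of both files describes the same training example.
  def pairColumns(xLines: Seq[String], yLines: Seq[String]): Seq[(Double, Array[Double])] = {
    require(xLines.length == yLines.length, "both files need the same number of lines")
    yLines.zip(xLines).map { case (y, x) => (y.trim.toDouble, Array(x.trim.toDouble)) }
  }
}
```

In a Spark job, the same idea would presumably be applied inside `sc.textFile(path).map(...)`, wrapping each result as `LabeledPoint(label, Vectors.dense(features))` before passing the RDD to a trainer such as `SVMWithSGD.train`.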