We dump fairly large datasets to LIBSVM format so that we can compare results against LIBLINEAR/LIBSVM. The following code writes the features of a SparseVector in LIBSVM format (feature part only, without the label):

import org.apache.spark.mllib.linalg.SparseVector

def toLibSvm(features: SparseVector): String = {
  // LIBSVM indices are 1-based; SparseVector indices are 0-based.
  val indices = features.indices.map(_ + 1)
  val values = features.values
  // Emit "index:value" pairs separated by single spaces.
  indices.zip(values).map { case (i, v) => s"$i:$v" }.mkString(" ")
}

On Wed, Sep 17, 2014 at 9:11 PM, Burak Yavuz <bya...@stanford.edu> wrote:

> Hi,
>
> The spacing between the inputs should be a single space, not a tab. I feel
> like your inputs have tabs between them instead of a single space.
> Therefore the parser cannot parse the input.
>
> Best,
> Burak
>
> ----- Original Message -----
> From: "Sameer Tilak" <ssti...@live.com>
> To: user@spark.apache.org
> Sent: Wednesday, September 17, 2014 7:25:10 PM
> Subject: MLLib: LIBSVM issue
>
> Hi All,
>
> We have a fairly large amount of sparse data. I was following these
> instructions in the manual:
>
> Sparse data
>
> It is very common in practice to have sparse training data.
> MLlib supports reading training examples stored in LIBSVM format, which is
> the default format used by LIBSVM and LIBLINEAR. It is a text format in
> which each line represents a labeled sparse feature vector using the
> following format:
>
> label index1:value1 index2:value2 ...
>
> import org.apache.spark.mllib.regression.LabeledPoint
> import org.apache.spark.mllib.util.MLUtils
> import org.apache.spark.rdd.RDD
>
> val examples: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc,
>   "data/mllib/sample_libsvm_data.txt")
>
> I believe that I have formatted my data as per the required LIBSVM format.
> Here is a snippet of it:
>
> 1 122:1 1693:1 1771:1 1974:1 2334:1 2378:1 2562:1
> 1 118:1 1389:1 1413:1 1454:1 1780:1 2562:1 5051:1 5417:1 5548:1 5798:1 5862:1
> 0 150:1 214:1 468:1 1013:1 1078:1 1092:1 1117:1 1489:1 1546:1 1630:1 1635:1 1827:1 2024:1 2215:1 2478:1 2761:1 5985:1 6115:1 6218:1
> 0 251:1 5578:1
>
> However, when I use MLUtils.loadLibSVMFile(sc, "path-to-data-file") I get
> the following error messages in my spark-shell. Can someone please point me
> in the right direction?
>
> java.lang.NumberFormatException: For input string: "150:1 214:1
> 468:1 1013:1 1078:1 1092:1 1117:1
> 1489:1 1546:1 1630:1 1635:1 1827:1
> 2024:1 2215:1 2478:1 2761:1 5985:1
> 6115:1 6218:1"
>   at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
>   at java.lang.Double.parseDouble(Double.java:540)
>   at scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:232)
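
P.S. To tie the two points in this thread together, here is a minimal sketch in plain Scala with no Spark dependency. The object name LibSvmText and both helpers are hypothetical and not part of MLlib. The first helper assumes the only problem with an input line is tab or repeated whitespace between tokens, as Burak suggests. The second assumes 0-based indices passed as plain arrays and writes a full "label index:value ..." line with 1-based indices.

```scala
// Hypothetical helpers for illustration only; not part of MLlib.
object LibSvmText {

  /** Collapse tabs and runs of whitespace into single spaces. */
  def normalizeSpacing(line: String): String =
    line.trim.split("\\s+").mkString(" ")

  /** Format one labeled record; indices are assumed 0-based on input
    * and are written 1-based, as LIBSVM expects.
    */
  def toLibSvmLine(label: Double, indices: Array[Int], values: Array[Double]): String = {
    require(indices.length == values.length,
      "indices and values must have the same length")
    val features = indices.zip(values).map { case (i, v) => s"${i + 1}:$v" }
    (label.toString +: features).mkString(" ")
  }
}
```

One way to use the first helper would be to map it over the lines of the data file (for example with sc.textFile and saveAsTextFile) and then point MLUtils.loadLibSVMFile at the cleaned copy. I have not run this against Sameer's data.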