Hi,
I modified the example code for logistic regression to compute the classification error; please see below. However, the code fails when it makes this call:

    labelsAndPreds.filter(lambda (v, p): v != p).count()

with the following error (something related to NumPy or the dot product):

    File "/opt/spark-1.0.0-bin-hadoop2/python/pyspark/mllib/classification.py", line 65, in predict
      margin = _dot(x, self._coeff) + self._intercept
    File "/opt/spark-1.0.0-bin-hadoop2/python/pyspark/mllib/_common.py", line 443, in _dot
      return vec.dot(target)
    AttributeError: 'numpy.ndarray' object has no attribute 'dot'

FYI, I am running the code with spark-submit, i.e.:

    ./bin/spark-submit examples/src/main/python/mllib/logistic_regression2.py

The code is posted below in case it is useful in any way:

    from math import exp
    import sys
    import time

    from pyspark import SparkContext
    from pyspark.mllib.classification import LogisticRegressionWithSGD
    from pyspark.mllib.regression import LabeledPoint
    from numpy import array

    # Load and parse the data
    def parsePoint(line):
        values = [float(x) for x in line.split(',')]
        if values[0] == -1:    # Convert -1 labels to 0 for MLlib
            values[0] = 0
        return LabeledPoint(values[0], values[1:])

    sc = SparkContext(appName="PythonLR")

    # Start timing
    start = time.time()
    #start = time.clock()

    data = sc.textFile("sWAMSpark_train.csv")
    parsedData = data.map(parsePoint)

    # Build the model
    model = LogisticRegressionWithSGD.train(parsedData)

    # Load test data
    testdata = sc.textFile("sWSpark_test.csv")
    parsedTestData = testdata.map(parsePoint)

    # Evaluate the model on test data
    labelsAndPreds = parsedTestData.map(lambda p: (p.label, model.predict(p.features)))
    testErr = labelsAndPreds.filter(lambda (v, p): v != p).count() / float(parsedTestData.count())
    print("Test Error = " + str(testErr))

    end = time.time()
    print("Time is = " + str(end - start))

(Note: the original version divided the misclassification count by parsedData.count() and printed it as "Training Error", even though it is computed on the test set; corrected above to divide by parsedTestData.count().)
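For reference, here is a quick standalone check (independent of Spark; the helper name is mine, not from the example) of whether the installed NumPy provides the `ndarray.dot` method that `_dot()` relies on. That method only exists in NumPy 1.5 and later, so an older NumPy would raise exactly this AttributeError:

```python
import numpy as np

def ndarray_has_dot():
    # Spark 1.0's pyspark.mllib._common._dot() calls vec.dot(target).
    # ndarray.dot was added in NumPy 1.5, so on an older NumPy this
    # attribute is missing and the call fails with the AttributeError above.
    return hasattr(np.array([1.0, 2.0]), "dot")

print("NumPy %s, ndarray.dot available: %s" % (np.__version__, ndarray_has_dot()))
```

If this prints False on the worker nodes' Python, the NumPy there is too old for the MLlib predict path.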