Are you using an old version of NumPy, 1.4? I think this is fixed in the latest master. Try replacing vec.dot(target) with numpy.dot(vec, target), or use the latest master. -Xiangrui
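A minimal sketch of the suggested workaround (the vector values are illustrative, not from the thread): the module-level numpy.dot function works on any array-like input, so it avoids the missing ndarray.dot method on very old NumPy builds.

```python
import numpy as np

# Illustrative stand-ins for the model coefficients and a feature vector
vec = np.array([1.0, 2.0, 3.0])
target = np.array([4.0, 5.0, 6.0])

# Module-level function instead of the vec.dot(target) method call
margin = np.dot(vec, target)
print(margin)  # 32.0
```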
On Mon, Jun 30, 2014 at 2:04 PM, Sam Jacobs <sam.jac...@us.abb.com> wrote:
> Hi,
>
> I modified the example code for logistic regression to compute the error in
> classification. Please see below. However, the code fails when it makes a
> call to:
>
>     labelsAndPreds.filter(lambda (v, p): v != p).count()
>
> with the error message (something related to numpy or dot product):
>
>     File "/opt/spark-1.0.0-bin-hadoop2/python/pyspark/mllib/classification.py",
>     line 65, in predict
>         margin = _dot(x, self._coeff) + self._intercept
>     File "/opt/spark-1.0.0-bin-hadoop2/python/pyspark/mllib/_common.py",
>     line 443, in _dot
>         return vec.dot(target)
>     AttributeError: 'numpy.ndarray' object has no attribute 'dot'
>
> FYI, I am running the code using spark-submit, i.e.
>
>     ./bin/spark-submit examples/src/main/python/mllib/logistic_regression2.py
>
> The code is posted below if it will be useful in any way:
>
>     from math import exp
>
>     import sys
>     import time
>
>     from pyspark import SparkContext
>     from pyspark.mllib.classification import LogisticRegressionWithSGD
>     from pyspark.mllib.regression import LabeledPoint
>     from numpy import array
>
>     # Load and parse the data
>     def parsePoint(line):
>         values = [float(x) for x in line.split(',')]
>         if values[0] == -1:  # Convert -1 labels to 0 for MLlib
>             values[0] = 0
>         return LabeledPoint(values[0], values[1:])
>
>     sc = SparkContext(appName="PythonLR")
>     # start timing
>     start = time.time()
>     #start = time.clock()
>
>     data = sc.textFile("sWAMSpark_train.csv")
>     parsedData = data.map(parsePoint)
>
>     # Build the model
>     model = LogisticRegressionWithSGD.train(parsedData)
>
>     # load test data
>     testdata = sc.textFile("sWSpark_test.csv")
>     parsedTestData = testdata.map(parsePoint)
>
>     # Evaluating the model on test data
>     labelsAndPreds = parsedTestData.map(
>         lambda p: (p.label, model.predict(p.features)))
>     trainErr = (labelsAndPreds.filter(lambda (v, p): v != p).count()
>                 / float(parsedData.count()))
>
>     print("Training Error = " + str(trainErr))
>     end = time.time()
>     print("Time is = " + str(end - start))
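For reference, the filter-and-count step in the quoted code can be sketched in plain Python without Spark (the label/prediction pairs below are illustrative). Two caveats worth noting: Python 3 removed tuple-unpacking lambdas like `lambda (v, p): ...`, so the unpacking has to move into the body, and the quoted code divides the mismatch count by `parsedData.count()` (the training-set size) even though the pairs come from the test set, which only yields a true test error if both sets happen to be the same size.

```python
# Illustrative (label, prediction) pairs standing in for labelsAndPreds
labels_and_preds = [(1.0, 1.0), (0.0, 1.0), (1.0, 0.0), (0.0, 0.0)]

# Python-3-safe equivalent of filter(lambda (v, p): v != p).count():
# unpack inside the expression instead of in the lambda signature
errors = sum(1 for v, p in labels_and_preds if v != p)

# Divide by the number of test points, not the training count
test_err = errors / float(len(labels_and_preds))
print(test_err)  # 0.5
```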