Hi,

I modified the example code for logistic regression to compute the
classification error; please see below. However, the code fails when it reaches
the call:


labelsAndPreds.filter(lambda (v, p): v != p).count()


with an error message that seems related to NumPy or its dot product:


  File "/opt/spark-1.0.0-bin-hadoop2/python/pyspark/mllib/classification.py", line 65, in predict
    margin = _dot(x, self._coeff) + self._intercept
  File "/opt/spark-1.0.0-bin-hadoop2/python/pyspark/mllib/_common.py", line 443, in _dot
    return vec.dot(target)
AttributeError: 'numpy.ndarray' object has no attribute 'dot'
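One thing I checked while debugging (this is my guess at the cause, not
something the traceback confirms): the `dot` *method* on `numpy.ndarray`, as
opposed to the `numpy.dot` function, was only added in NumPy 1.5.0, and
`_common.py` calls `vec.dot(target)`. So an older NumPy on the driver or on any
worker would raise exactly this AttributeError. A quick sanity check:

```python
import numpy as np

# ndarray.dot (the method form used by pyspark/mllib/_common.py) was added
# in NumPy 1.5.0; on older versions only the numpy.dot() function exists.
print(np.__version__)
print(hasattr(np.ones(3), "dot"))  # False would explain the AttributeError
```

If this prints False anywhere in the cluster, upgrading NumPy there should be
enough.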


FYI, I am running the code using spark-submit, i.e.


./bin/spark-submit examples/src/main/python/mllib/logistic_regression2.py



The full code is posted below in case it is useful:


import time

from pyspark import SparkContext
from pyspark.mllib.classification import LogisticRegressionWithSGD
from pyspark.mllib.regression import LabeledPoint


# Load and parse the data
def parsePoint(line):
    values = [float(x) for x in line.split(',')]
    if values[0] == -1:   # Convert -1 labels to 0 for MLlib
        values[0] = 0
    return LabeledPoint(values[0], values[1:])
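For what it's worth, the label conversion itself seems fine; here is a quick
standalone check of the same parsing logic, with LabeledPoint swapped for a
plain tuple so it runs without Spark installed:

```python
# Same parsing logic as parsePoint above, minus the LabeledPoint wrapper,
# so the -1 -> 0 label mapping can be checked in isolation.
def parse_point(line):
    values = [float(x) for x in line.split(',')]
    if values[0] == -1:   # MLlib expects labels in {0, 1}
        values[0] = 0
    return (values[0], values[1:])

print(parse_point("-1,0.5,2.0"))  # -> (0, [0.5, 2.0])
```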
sc = SparkContext(appName="PythonLR")
# start timing
start = time.time()
#start = time.clock()

data = sc.textFile("sWAMSpark_train.csv")
parsedData = data.map(parsePoint)

# Build the model
model = LogisticRegressionWithSGD.train(parsedData)

#load test data

testdata = sc.textFile("sWSpark_test.csv")
parsedTestData = testdata.map(parsePoint)

# Evaluating the model on test data
labelsAndPreds = parsedTestData.map(lambda p: (p.label, model.predict(p.features)))
testErr = labelsAndPreds.filter(lambda (v, p): v != p).count() / \
    float(parsedTestData.count())   # divide by the test-set size, not the training set
print("Test Error = " + str(testErr))
end = time.time()
print("Time is = " + str(end - start))

