It looks like you are using an old version of NumPy (1.4?). I believe
this is fixed in the latest master. Try replacing vec.dot(target) with
numpy.dot(vec, target), or use the latest master. -Xiangrui
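For reference, the suggested workaround looks like this, as a minimal sketch outside Spark; `vec` and `target` are stand-ins for the feature vector and coefficient vector that `_dot` receives:

```python
import numpy

# Stand-in vectors; in _common.py these would be the feature vector
# and the model coefficients.
vec = numpy.array([1.0, 2.0, 3.0])
target = numpy.array([0.5, -1.0, 2.0])

# The module-level numpy.dot works even on old NumPy releases
# where the ndarray.dot method is missing.
margin = numpy.dot(vec, target)
print(margin)  # 4.5
```

The module-level function and the array method compute the same product; the method was simply added to NumPy later, which is why old installs raise the AttributeError above.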

On Mon, Jun 30, 2014 at 2:04 PM, Sam Jacobs <sam.jac...@us.abb.com> wrote:
> Hi,
>
>
> I modified the example code for logistic regression to compute the
> classification error. Please see below. However, the code fails when it
> makes a call to:
>
>
> labelsAndPreds.filter(lambda (v, p): v != p).count()
>
>
> with the error message (something related to numpy or dot product):
>
>
> File "/opt/spark-1.0.0-bin-hadoop2/python/pyspark/mllib/classification.py",
> line 65, in predict
>
>     margin = _dot(x, self._coeff) + self._intercept
>
>   File "/opt/spark-1.0.0-bin-hadoop2/python/pyspark/mllib/_common.py", line
> 443, in _dot
>
>     return vec.dot(target)
>
> AttributeError: 'numpy.ndarray' object has no attribute 'dot'
>
>
> FYI, I am running the code using spark-submit i.e.
>
>
> ./bin/spark-submit examples/src/main/python/mllib/logistic_regression2.py
>
>
>
> The code is posted below in case it is useful:
>
>
> from math import exp
>
> import sys
> import time
>
> from pyspark import SparkContext
>
> from pyspark.mllib.classification import LogisticRegressionWithSGD
> from pyspark.mllib.regression import LabeledPoint
> from numpy import array
>
>
> # Load and parse the data
> def parsePoint(line):
>     values = [float(x) for x in line.split(',')]
>     if values[0] == -1:   # Convert -1 labels to 0 for MLlib
>         values[0] = 0
>     return LabeledPoint(values[0], values[1:])
>
> sc = SparkContext(appName="PythonLR")
> # start timing
> start = time.time()
> #start = time.clock()
>
> data = sc.textFile("sWAMSpark_train.csv")
> parsedData = data.map(parsePoint)
>
> # Build the model
> model = LogisticRegressionWithSGD.train(parsedData)
>
> #load test data
>
> testdata = sc.textFile("sWSpark_test.csv")
> parsedTestData = testdata.map(parsePoint)
>
> # Evaluating the model on test data
> labelsAndPreds = parsedTestData.map(lambda p: (p.label,
> model.predict(p.features)))
> testErr = labelsAndPreds.filter(lambda (v, p): v != p).count() \
>     / float(parsedTestData.count())  # divide by the test set size
> print("Test Error = " + str(testErr))
> end = time.time()
> print("Time is = " + str(end - start))
>
