Alberto Bonsanto created SPARK-11992:
----------------------------------------
Summary: Several numbers in my spark shell (pyspark)
Key: SPARK-11992
URL: https://issues.apache.org/jira/browse/SPARK-11992
Project: Spark
Issue Type: Question
Components: MLlib, PySpark
Affects Versions: 1.5.2
Environment: Linux Ubuntu 14.04 LTS, Jupyter, Spark 1.5.2
Reporter: Alberto Bonsanto
Priority: Blocker

The problem is very weird. I am currently trying to fit several classifiers from the MLlib library (SVM, LogisticRegression, RandomForest, DecisionTree and NaiveBayes) so that I can compare their performance by evaluating their predictions against my validation data (the typical pipeline). The problem is that when I try to fit any of them, my spark-shell console prints millions and millions of entries, and after that the fitting process stops. You can see it [here|http://i.imgur.com/mohLnwr.png].

Some details:
- My data has around 15M entries.
- I use LabeledPoints to represent each entry, where the features are SparseVectors with *104* features (dimensions).
- I don't log much to the console; see my [log4j.properties|https://gist.github.com/Bonsanto/c487624db805f56882b8].
- The program runs locally on a computer with 16 GB of RAM.

I have already asked this on StackOverflow, you can see it here: [Crazy print|http://stackoverflow.com/questions/33807347/pyspark-shell-outputs-several-numbers-instead-of-the-loading-arrow]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
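To make the reported setup concrete, here is a minimal pure-Python sketch of the data layout the ticket describes (LabeledPoints whose features are 104-dimensional SparseVectors). This is not the actual `pyspark.mllib` API, only an illustrative stand-in with the same shape; the indices and values below are invented for the example.

```python
# Illustrative stand-in for pyspark.mllib.regression.LabeledPoint and
# pyspark.mllib.linalg.SparseVector, matching the layout in the report:
# each training record is a label plus a sparse 104-dimensional feature vector.
from collections import namedtuple

SparseVector = namedtuple("SparseVector", ["size", "indices", "values"])
LabeledPoint = namedtuple("LabeledPoint", ["label", "features"])

def to_dense(sv):
    """Expand a sparse vector into a dense list of length sv.size."""
    dense = [0.0] * sv.size
    for i, v in zip(sv.indices, sv.values):
        dense[i] = v
    return dense

# One hypothetical record: label 1.0, three non-zero features out of 104.
point = LabeledPoint(1.0, SparseVector(104, [3, 17, 99], [0.5, 1.2, 3.3]))
```

With ~15M such records, only the non-zero entries are stored per row, which is why the sparse representation is the sensible choice at this scale.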