Re: PySpark with OpenCV causes python worker to crash

2015-06-05 Thread Sam Stoelinga
Please ignore this whole thread. It's suddenly working again and I'm not sure what the root cause was. After I restarted the VM the previous SIFT code also started working. On Fri, Jun 5, 2015 at 10:40 PM, Sam Stoelinga sammiest...@gmail.com wrote: Thanks Davies. I will file a bug later with code

Re: PySpark with OpenCV causes python worker to crash

2015-05-30 Thread Sam Stoelinga
If the bytes that came from sequenceFile() are broken, it's easy to crash a C library in Python (OpenCV). On Thu, May 28, 2015 at 8:33 AM, Sam Stoelinga sammiest...@gmail.com wrote: Hi sparkers, I am working on a PySpark application which uses the OpenCV library. It runs fine when running
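A minimal sketch of the kind of guard being suggested here, assuming the images are stored as (filename, raw bytes) records in a sequence file; all names and paths are illustrative:

    import numpy as np
    import cv2

    def decode_image(record):
        # Decode raw bytes from a (filename, bytes) record, checking the
        # result before the data reaches OpenCV's native code.
        name, raw = record
        buf = np.frombuffer(raw, dtype=np.uint8)
        img = cv2.imdecode(buf, cv2.IMREAD_COLOR)
        return (name, img)  # img is None if the bytes were corrupt

    # pairs = sc.sequenceFile('hdfs:///path/to/images')
    # decoded = pairs.map(decode_image).filter(lambda kv: kv[1] is not None)

Skipping records that decode to None avoids handing broken buffers to native code, which is one way a C library can take down the whole Python worker.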

Re: PySpark with OpenCV causes python worker to crash

2015-05-30 Thread Sam Stoelinga
.COLOR_BGR2GRAY) sift = cv2.xfeatures2d.SIFT_create() kp, descriptors = sift.detectAndCompute(gray, None) return (imgfilename, test) And corresponding tests.py: https://gist.github.com/samos123/d383c26f6d47d34d32d6 On Sat, May 30, 2015 at 8:04 PM, Sam Stoelinga sammiest...@gmail.com wrote
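The fragment above appears to come from a feature-extraction function along these lines. A hedged reconstruction (the preview actually returns a variable named test; descriptors is substituted here, and cv2.xfeatures2d requires the opencv-contrib build):

    import numpy as np
    import cv2

    def extract_sift(imgfilename, imgbytes):
        # Decode, convert to grayscale, then compute SIFT keypoints and descriptors.
        img = cv2.imdecode(np.frombuffer(imgbytes, dtype=np.uint8), cv2.IMREAD_COLOR)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        sift = cv2.xfeatures2d.SIFT_create()
        kp, descriptors = sift.detectAndCompute(gray, None)
        return (imgfilename, descriptors)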

PySpark with OpenCV causes python worker to crash

2015-05-28 Thread Sam Stoelinga
This is the error message taken from the STDERR of the worker log: https://gist.github.com/samos123/3300191684aee7fc8013 I would like pointers or tips on how to debug further. It would be nice to know why the worker crashed. Thanks, Sam Stoelinga org.apache.spark.SparkException: Python worker exited

MLib KMeans on large dataset issues

2015-04-29 Thread Sam Stoelinga
Looking forward to hearing you point out my stupidity, or to work-arounds that could make Spark KMeans work well on large datasets. Regards, Sam Stoelinga

Re: MLib KMeans on large dataset issues

2015-04-29 Thread Sam Stoelinga
PM, Jeetendra Gangele gangele...@gmail.com wrote: How are you passing the feature vector to K-means? Is it in 2-D space or a 1-D array? Did you try using Streaming KMeans? Will you be able to paste code here? On 29 April 2015 at 17:23, Sam Stoelinga sammiest...@gmail.com wrote: Hi Sparkers, I
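For reference, MLlib's KMeans.train expects an RDD in which each element is one data point as a 1-D vector (a list, NumPy array, or MLlib Vector). A minimal sketch; the path, dimensions, and k are made up:

    from pyspark import SparkContext
    from pyspark.mllib.clustering import KMeans

    sc = SparkContext(appName='kmeans-sketch')
    # One point per line: space-separated floats -> one 1-D vector per element.
    data = sc.textFile('hdfs:///path/to/features.txt')
    points = data.map(lambda line: [float(x) for x in line.split()])
    model = KMeans.train(points, k=10, maxIterations=20)
    print(model.clusterCenters)

Feeding it a single 2-D structure, or transposing rows and columns (as turns out to have happened later in this thread), makes each "point" the wrong shape and produces exactly this kind of weird result.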

Re: MLib KMeans on large dataset issues

2015-04-29 Thread Sam Stoelinga
Guys, great feedback by pointing out my stupidity :D Rows and columns got intermixed, hence the weird results I was seeing. Ignore my previous issues; I will reformat my data first. On Wed, Apr 29, 2015 at 8:47 PM, Sam Stoelinga sammiest...@gmail.com wrote: I'm mostly using example code, see here

monit with spark

2015-02-15 Thread Mike Sam
We want to monitor the Spark master and Spark slaves using monit, but we want to use the sbin scripts to do so. The scripts create the Spark master and slave processes independently of themselves, so monit would not know the PID of the started process to watch. Is this correct? Should we watch the ports?
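For what it's worth, sbin/spark-daemon.sh does write a pid file (by default under /tmp, named after the unix user and the daemon class), so monit can watch that rather than guessing. A hedged sketch of a monit stanza; the pid path, install prefix, and port all depend on your setup:

    # /etc/monit/conf.d/spark-master
    check process spark-master
        # pid file name embeds the unix user (here "spark"); adjust to match
        with pidfile /tmp/spark-spark-org.apache.spark.deploy.master.Master-1.pid
        start program = "/opt/spark/sbin/start-master.sh"
        stop program  = "/opt/spark/sbin/stop-master.sh"
        if failed host 127.0.0.1 port 7077 then restart

Watching the port as well (the "if failed ... port" line) covers the case where the JVM is alive but the master has stopped listening.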

Strategy to automatically configure spark workers env params in standalone mode

2015-02-14 Thread Mike Sam
We are planning to use servers of varying specs (32 GB, 64 GB, 244 GB RAM or even higher, and varying cores) for a standalone deployment of Spark, but we do not know the spec of the server ahead of time and we need to script up some logic that will run on the server on boot and automatically set the
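A minimal sketch of that boot-time logic, assuming Linux and a standalone deployment where the script regenerates conf/spark-env.sh before the daemons start; the install path and the "reserve 4 GB for the OS" heuristic are made up:

    import multiprocessing, os

    def write_spark_env(path='/opt/spark/conf/spark-env.sh'):
        cores = multiprocessing.cpu_count()
        # Total RAM in GB via sysconf (Linux), minus headroom for the OS.
        total_gb = os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES') // (1024 ** 3)
        worker_mem = max(1, total_gb - 4)
        with open(path, 'w') as f:
            f.write('export SPARK_WORKER_CORES=%d\n' % cores)
            f.write('export SPARK_WORKER_MEMORY=%dg\n' % worker_mem)

    write_spark_env()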

Re: Spark (Streaming?) holding on to Mesos resources

2015-01-27 Thread Sam Bessalah
Hi Gerard, isn't this the same issue as this? https://issues.apache.org/jira/browse/MESOS-1688 On Mon, Jan 26, 2015 at 9:17 PM, Gerard Maas gerard.m...@gmail.com wrote: Hi, We are observing with certain regularity that our Spark jobs, as a Mesos framework, are hoarding resources and not

Spark response times for queries seem slow

2015-01-05 Thread Sam Flint
if there is a configuration that needs to be tweaked or if this is the expected response time. Machines have 30 GB RAM and 4 cores. It seems the CPUs are just getting pegged and that is what is taking so long. Any help on this would be amazing. Thanks, -- MAGNE+IC Sam Flint | Lead Developer, Data

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: pyspark on yarn

2015-01-05 Thread Sam Flint
) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:207) at java.lang.Thread.run(Thread.java:745) -- MAGNE+IC Sam Flint | Lead Developer, Data Analytics

Strange results of running Spark GenSort.scala

2014-12-28 Thread Sam Liu
. Why? Thanks! Sam Liu

Actor System Corrupted!

2014-12-10 Thread Stephen Samuel (Sam)
Hi all, Having a strange issue that I can't find any previous reports of on the mailing list or Stack Overflow. Frequently we are getting "ACTOR SYSTEM CORRUPTED!! A Dispatcher can't have less than 0 inhabitants!" with a stack trace from Akka in the executor logs, and the executor is marked as

Re: NEW to spark and sparksql

2014-11-20 Thread Sam Flint
that contains all the data. On Wed, Nov 19, 2014 at 2:46 PM, Sam Flint sam.fl...@magnetic.com wrote: Michael, Thanks for your help. I found wholeTextFiles(), which I can use to import all files in a directory. I believe this would be the case if all the files existed in the same directory
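For reference, a minimal sketch of wholeTextFiles, which yields one (path, entire file content) pair per file under a directory; the path is made up:

    from pyspark import SparkContext

    sc = SparkContext(appName='whole-text-files-sketch')
    files = sc.wholeTextFiles('hdfs:///data/input-dir')
    print(files.keys().collect())   # the file paths
    path, body = files.first()      # one whole file as a single string

Unlike textFile, which yields individual lines pooled across all matched files, this keeps each file intact, which matters when a record spans multiple lines.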

single worker vs multiple workers on each machine

2014-09-11 Thread Mike Sam
Hi There, I am new to Spark and I was wondering: when you have a lot of memory on each machine of the cluster, is it better to run multiple workers with limited memory on each machine, or is it better to run a single worker with access to the majority of the machine's memory? If the answer is it
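For context, the standalone scripts can start several workers per machine via SPARK_WORKER_INSTANCES, so both layouts are possible. A hedged conf/spark-env.sh sketch splitting one large box into four workers; the sizes are illustrative:

    # conf/spark-env.sh -- four medium workers instead of one giant one
    export SPARK_WORKER_INSTANCES=4
    export SPARK_WORKER_CORES=8      # cores per worker
    export SPARK_WORKER_MEMORY=60g   # memory per worker, not per machine

A common reason to split is that very large JVM heaps tend to suffer long garbage-collection pauses.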

Why spark-submit command hangs?

2014-07-21 Thread Sam Liu
! Sam Liu

RE: Spark 1.0 and Logistic Regression Python Example

2014-07-01 Thread Sam Jacobs
Thanks Xiangrui, your suggestion fixed the problem. I will see if I can upgrade the numpy/python for a permanent fix. My current versions of python and numpy are 2.6 and 4.1.9 respectively. Thanks, Sam -Original Message- From: Xiangrui Meng [mailto:men...@gmail.com] Sent: Tuesday

Spark 1.0 and Logistic Regression Python Example

2014-06-30 Thread Sam Jacobs
Hi, I modified the example code for logistic regression to compute the classification error. Please see below. However, the code fails when it makes a call to labelsAndPreds.filter(lambda (v, p): v != p).count(), with an error message (something related to numpy or a dot product):
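For context, the failing line comes from the MLlib logistic regression example; a hedged sketch of the surrounding code in the style of the Spark 1.0 docs (the data path is made up, and the final lambda indexes a tuple instead of using Python 2-only tuple-parameter unpacking):

    from pyspark import SparkContext
    from pyspark.mllib.classification import LogisticRegressionWithSGD
    from pyspark.mllib.regression import LabeledPoint

    sc = SparkContext(appName='logreg-error-sketch')

    def parse_point(line):
        values = [float(x) for x in line.split(' ')]
        return LabeledPoint(values[0], values[1:])

    parsed = sc.textFile('hdfs:///data/sample_svm_data.txt').map(parse_point)
    model = LogisticRegressionWithSGD.train(parsed)

    # Training error = fraction of points whose prediction differs from the label.
    labelsAndPreds = parsed.map(lambda p: (p.label, model.predict(p.features)))
    trainErr = labelsAndPreds.filter(lambda vp: vp[0] != vp[1]).count() / float(parsed.count())
    print('Training Error = ' + str(trainErr))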

Re: ArrayIndexOutOfBoundsException when reading bzip2 files

2014-06-09 Thread sam
Any idea when they will release it? Also, I'm uncertain what we will need to do to fix the shell. Will we have to reinstall Spark, or reinstall Hadoop? (I'm not a devops person, so maybe this question sounds silly.)

Re: spark on yarn fail with IOException

2014-06-04 Thread sam
I get a very similar stack trace and have no idea what could be causing it (see below). I've created an SO question: http://stackoverflow.com/questions/24038908/spark-fails-on-big-jobs-with-java-io-ioexception-filesystem-closed 14/06/02 20:44:04 INFO client.AppClient$ClientActor: Executor updated:

Re: Trouble launching EC2 Cluster with Spark

2014-06-04 Thread Sam Taylor Steyer
be much appreciated! Sam - Original Message - From: Krishna Sankar ksanka...@gmail.com To: user@spark.apache.org Sent: Wednesday, June 4, 2014 8:52:59 AM Subject: Re: Trouble launching EC2 Cluster with Spark One reason could be that the keys are in a different region. Need to create the keys

Re: Trouble launching EC2 Cluster with Spark

2014-06-04 Thread Sam Taylor Steyer
PM, Sam Taylor Steyer sste...@stanford.edu wrote: Also, once my friend logged in to his cluster he received the error Permissions 0644 for 'FinalKey.pem' are too open. This sounds like the other problem described. How do we make the permissions more private? Thanks very much, Sam
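The usual fix for the "Permissions 0644 ... are too open" message is to make the key readable only by its owner, since ssh refuses world-readable private keys:

    chmod 400 FinalKey.pem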

Apache Spark Throws java.lang.IllegalStateException: unread block data

2014-05-17 Thread sam
What we are doing is: 1. Installing Spark 0.9.1 according to the documentation on the website, along with the CDH4 (and, on another cluster, CDH5) distros of Hadoop/HDFS. 2. Building a fat jar with a Spark app with sbt, then trying to run it on the cluster. I've also included code snippets, and sbt

Re: Spark is slow

2014-04-21 Thread Sam Bessalah
Why don't you start by explaining what kind of operation you're running on Spark that's faster than Hadoop MapReduce. Maybe we could start there. And yes, this mailing list is very busy since many people are getting into Spark; it's hard to answer everyone. On 21 Apr 2014 20:23, Joe L selme...@yahoo.com

Re: [ann] Spark-NYC Meetup

2014-04-21 Thread Sam Bessalah
Sounds great François. On 21 Apr 2014 22:31, François Le Lay f...@spotify.com wrote: Hi everyone, This is a quick email to announce the creation of a Spark-NYC Meetup. We have 2 upcoming events, one at PlaceIQ, another at Spotify where Reynold Xin (Databricks) and Christopher Johnson

Re: worker keeps getting disassociated upon a failed job spark version 0.90

2014-03-22 Thread sam
I have this problem too. Eventually the job fails (on the UI) and hangs the terminal until I hit Ctrl-C (logs below). Now, the Spark docs explain that the heartbeat configuration can be tweaked to handle GC hangs. I'm wondering if this is symptomatic of pushing the cluster a little too hard (we
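For reference, the heartbeat knobs the docs describe are the Akka failure-detector settings. A hedged sketch of loosening them for long GC pauses, using property names from the configuration docs of that era; the values are illustrative:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName('tolerant-heartbeats')
            # Tolerate longer pauses before a peer is declared dead.
            .set('spark.akka.heartbeat.pauses', '600')
            .set('spark.akka.heartbeat.interval', '1000')
            .set('spark.akka.failure-detector.threshold', '300.0'))
    sc = SparkContext(conf=conf)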
