Re: Map output statuses exceeds frameSize

2014-11-13 Thread pouryas
Has anyone experienced this before? Any help would be appreciated. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Map-output-statuses-exceeds-frameSize-tp18783p18866.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Map output statuses exceeds frameSize

2014-11-12 Thread pouryas
Hey all, I am doing a groupBy on nearly 2 TB of data and am getting this error: 2014-11-13 00:25:30 ERROR org.apache.spark.MapOutputTrackerMasterActor - Map output statuses were 32163619 bytes which exceeds spark.akka.frameSize (10485760 bytes). org.apache.spark.SparkException: Map output
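The numbers in the error can be checked directly. Below is a minimal Scala sketch, assuming (as in the Spark 1.x configuration docs) that `spark.akka.frameSize` is given in megabytes; `requiredFrameSizeMb` is a hypothetical helper for the arithmetic, not a Spark API.

```scala
// Arithmetic behind the error message. Assumption: spark.akka.frameSize
// is expressed in MB, as in Spark 1.x.
val BytesPerMb: Long = 1024L * 1024L

// Hypothetical helper: smallest whole-MB frame size that can carry
// a message of the given size (ceiling division).
def requiredFrameSizeMb(messageBytes: Long): Long =
  (messageBytes + BytesPerMb - 1) / BytesPerMb

val defaultFrameBytes = 10L * BytesPerMb // 10485760, the limit quoted in the log
val mapStatusBytes    = 32163619L        // size of the map output statuses in the log

println(s"default limit: $defaultFrameBytes bytes")
println(s"map statuses need at least ${requiredFrameSizeMb(mapStatusBytes)} MB")
```

By this arithmetic the serialized map output statuses need a frame of at least 31 MB, so in Spark 1.x one would presumably set `spark.akka.frameSize` above that, for example `new SparkConf().set("spark.akka.frameSize", "64")`. The exact value is a judgment call. A larger frame only raises the limit and does not make the statuses smaller.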

Re: S3 - Extra $_folder$ files for every directory node

2014-09-30 Thread pouryas
I would also like to know how to avoid creating those $_folder$ files on S3. I can delete them afterwards, but it would be nice if Spark handled this for you.

Re: Re:

2014-09-25 Thread pouryas
I had a similar problem writing to Cassandra using the Spark Cassandra connector. I am not sure whether this will work for you, but I reduced the number of cores to 1 per machine and my job became stable. More explanation of my issue...

Spark Cassandra Connector Issue and performance

2014-09-24 Thread pouryas
Hey all, I tried the Spark connector with Cassandra and ran into a problem that blocked me for a couple of weeks. I managed to find a solution, but I am not sure whether it was a bug in the connector or in Spark. I had three tables in Cassandra (Running Cassandra on 5 node

Optimal Cluster Setup for Spark

2014-09-24 Thread pouryas
Hi there, what is an optimal cluster setup for Spark? Given X amount of resources, would you favour more worker nodes with fewer resources each, or fewer worker nodes with more resources each? Is this application dependent? If so, what are the things to consider, and what are good practices? Cheers