scala Vector vs mllib Vector

2014-10-03 Thread ll
what are the pros/cons of each? when should we use mllib Vector, and when to use standard scala Vector? thanks. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/scala-Vector-vs-mllib-Vector-tp15736.html Sent from the Apache Spark User List mailing list arch

Re: android + spark streaming?

2014-10-03 Thread ll
any comment/feedback/advice on this is much appreciated! thanks.

Re: Trouble getting filtering on field correct

2014-10-03 Thread Davies Liu
rdd.filter(lambda line: int(line.split(' ')[8]) >= 125) On Fri, Oct 3, 2014 at 8:16 PM, Chop wrote: > Given an RDD with multiple lines of the form: > > u'207.86.121.131 207.86.121.131 2012-11-27 13:02:17 titlestring 622592 27 > 184464' > (fields are separated by a " ") > > What pyspark function/
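Davies' one-liner is the whole answer; for readers who want to test the logic without a cluster, the same split-convert-compare can be run on a plain string. Note that the sample line in this thread splits into 8 fields (indices 0 to 7), so index 8 would be out of range for that exact line; pick the index that matches your data. A standalone sketch (function name is mine, not a Spark API):

```python
def keep_line(line, field_index=7, threshold=125):
    """Return True if the numeric field at field_index is >= threshold."""
    fields = line.split(' ')
    return int(fields[field_index]) >= threshold

sample = u'207.86.121.131 207.86.121.131 2012-11-27 13:02:17 titlestring 622592 27 184464'
print(keep_line(sample))  # True (184464 >= 125)
# With an RDD this becomes: rdd.filter(keep_line)
```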

Re: Fwd: Breeze Library usage in Spark

2014-10-03 Thread DB Tsai
You don't have to include the breeze jar, which is already in the spark assembly jar. For the native one, it's optional. Sent from my Google Nexus 5 On Oct 3, 2014 8:04 PM, "Priya Ch" wrote: > > > > yes. I have included breeze-0.9 in build.sbt file. I ll change this to > 0.7. Apart from this, do we need to inc

Re: How to make ./bin/spark-sql work with hive?

2014-10-03 Thread Li HM
If I change it to org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProvider The error becomes: Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.hi

Re: How to make ./bin/spark-sql work with hive?

2014-10-03 Thread Li HM
It won't work with org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator. I just wonder how and why it works for you guys. Here is the new error: Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apac

Re: Null values in Date field only when RDD is saved as File.

2014-10-03 Thread manasdebashiskar
Correction to my question. (5) should read 5) save the tuple RDD(created at step 3) to HDFS using SaveAsTextFile. Can someone please guide me in the right direction? Thanks in advance Manas On Fri, Oct 3, 2014 at 11:42 PM, manasdebashiskar [via Apache Spark User List] wrote: > Correction to my

Trouble getting filtering on field correct

2014-10-03 Thread Chop
Given an RDD with multiple lines of the form: u'207.86.121.131 207.86.121.131 2012-11-27 13:02:17 titlestring 622592 27 184464' (fields are separated by a " ") What pyspark function/commands do I use to filter out those lines where line[8] >= x? (i.e line[8] >= 125) when I use line.split(" ") I

Null values in Date field only when RDD is saved as File.

2014-10-03 Thread Manas Kar
Hi, I am using a library that parses Ais Messages. My code which follows the simple steps gives me null values in Date field. 1) Get the message from file. 2) parse the message. 3) map the message RDD to only keep the (Date, SomeInfo) 4) take top 100 element. Result = the Date field appears fine
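Without seeing the AIS parser it is hard to say why the dates turn null only on save, but one thing worth ruling out: SaveAsTextFile writes each element's string representation, so converting the Date to an explicit string in the map step (step 3) makes the saved form deterministic and makes genuine parse failures visible. A sketch of that idea, with Python's datetime standing in for the parsed message (all names here are illustrative, not from the original code):

```python
from datetime import datetime

# Illustrative records: (parsed date, payload); a None date models a
# message whose date field failed to parse.
records = [(datetime(2012, 11, 27, 13, 2, 17), "info1"),
           (None, "info2")]

def to_line(rec):
    # Format the date explicitly instead of relying on default toString
    d, info = rec
    date_str = d.strftime("%Y-%m-%d %H:%M:%S") if d is not None else "MISSING"
    return date_str + "\t" + info

lines = [to_line(r) for r in records]
print(lines)
```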

Re: pyspark on python 3

2014-10-03 Thread Josh Rosen
It would be great if we supported Python 3 and I'd be happy to review any pull requests to add it. I don't know that Python 3 is very widely-used, but I'm open to supporting it if it won't require too much work. By the way, we recently added support for PyPy: https://github.com/apache/spark/pull/

Re: pyspark on python 3

2014-10-03 Thread tomo cocoa
Hi, I would prefer that PySpark can also be executed on Python 3. Do you have a particular reason or requirement to use PySpark with Python 3? If you create an issue on JIRA, I will try to resolve it. On 4 October 2014 06:47, Gen wrote: > According to the official site of spark, for the latest version of > s

Re: Spark inside Eclipse

2014-10-03 Thread Sanjay Subramanian
So some progress but still errors

object WordCount {
  def main(args: Array[String]) {
    if (args.length < 1) {
      System.err.println("Usage: WordCount ")
      System.exit(1)
    }
    val conf = new SparkConf().setMaster("local").setAppName(s"Whatever")
    val sc = new SparkContext(conf);

Spark Streaming writing to HDFS

2014-10-03 Thread Abraham Jacob
Hi All, Would really appreciate it if someone in the community can help me with this. I have a simple Java spark streaming application - NetworkWordCount:

SparkConf sparkConf = new SparkConf().setMaster("yarn-cluster").setAppName("Streaming WordCount");
JavaStreamingContext jssc = ne

Re: How to make ./bin/spark-sql work with hive?

2014-10-03 Thread Hmxxyy
No, it is hive 0.12.4. Let me try your suggestion. It is an existing hive db. I am using the original hive-site.xml as is. Sent from my iPhone > On Oct 3, 2014, at 5:02 PM, Edwin Chiu wrote: > > Are you using hive 0.13? > > Switching back to HadoopDefaultAuthenticator in your hive-site.xml w

Re: How to make ./bin/spark-sql work with hive?

2014-10-03 Thread Edwin Chiu
Are you using hive 0.13? Switching back to HadoopDefaultAuthenticator in your hive-site.xml is worth a shot:

hive.security.authenticator.manager
org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator

- Edwin On Fri, Oct 3, 2014 at 4:25 PM, Li HM wrote: > If I

Re: How to make ./bin/spark-sql work with hive?

2014-10-03 Thread Li HM
If I don't have that jar, I am getting the following error: Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: org.apache.hcatalog.security.HdfsAuthorizationProvider at org.apache.hadoop.hive.ql.session.SessionS

Re: MLlib Collaborative Filtering failed to run with rank 1000

2014-10-03 Thread Xiangrui Meng
It would be really helpful if you can help test the scalability of the new ALS impl: https://github.com/mengxr/spark-als/blob/master/src/main/scala/org/apache/spark/ml/SimpleALS.scala . It should be faster and more scalable, but the code is messy now. Best, Xiangrui On Fri, Oct 3, 2014 at 11:57

any good library to implement multilabel classification on spark?

2014-10-03 Thread critikaled
Hi, Going through the spark mllib doc I have noticed that it supports multiclass classification. Can anybody help me in implementing multilabel classification on spark, like in the "Mulan" and "Meka" libraries?

My task is finished successfully, however, I find some exceptions in webpage.

2014-10-03 Thread Tim Chou
Hi All, Sorry to disturb you. I have built a spark cluster based on mesos. I run some tests on the spark shell and it works. However, I find some exceptions in the web UI. scala> val textFile = sc.textFile("hdfs://10.1.2.12:9000/README.md") scala> textFile.count() 14/10/03 15:20:54 INFO mapred.FileInpu

Accumulator question

2014-10-03 Thread Nathan Kronenfeld
I notice that accumulators register themselves with a private Accumulators object. I don't notice any way to unregister them when one is done. Am I missing something? If not, is there any plan for how to free up that memory? I've a case where we're gathering data from repeated queries using some

problem with user@spark.apache.org spam filter

2014-10-03 Thread Andy Davidson
Any idea why my email was returned with the following error message? Thanks Andy This is the mail system at host smtprelay06.hostedemail.com. I'm sorry to have to inform you that your message could not be delivered to one or more recipients. It's attached below. For further assistance, please

Re: How to make ./bin/spark-sql work with hive?

2014-10-03 Thread Michael Armbrust
Why are you including hcatalog-core.jar? That is probably causing the issues. On Fri, Oct 3, 2014 at 3:03 PM, Li HM wrote: > This is my SPARK_CLASSPATH after cleanup > SPARK_CLASSPATH=/home/test/lib/hcatalog-core.jar:$SPARK_CLASSPATH > > now use mydb works. > > but "show tables" and "select * f

Re: [ANN] SparkSQL support for Cassandra with Calliope

2014-10-03 Thread tian zhang
Hi, Rohit, Thank you for sharing this good news. I have a relevant issue that I would like to ask your help with. I am using spark 1.1.0 and I have a spark application using "com.tuplejump" % "calliope-core_2.10" % "1.1.0-CTP-U2". At runtime there are the following errors that seem to indicate that calli

Re: partitions number with variable number of cores

2014-10-03 Thread Gen
Maybe I am wrong, but how many resources a spark application can use depends on the mode of deployment (the type of resource manager); you can take a look at https://spark.apache.org/docs/latest/job-scheduling.html . For your case, I

Re: How to make ./bin/spark-sql work with hive?

2014-10-03 Thread Li HM
This is my SPARK_CLASSPATH after cleanup SPARK_CLASSPATH=/home/test/lib/hcatalog-core.jar:$SPARK_CLASSPATH now use mydb works. but "show tables" and "select * from test" still gives exception: spark-sql> show tables; OK java.io.IOException: java.io.IOException: Cannot create an instance of Input

Worker with no Executor (YARN client-mode)

2014-10-03 Thread jonathan.keebler
Hi all, We're running Spark 1.0 on CDH 5.1.2. We're using Spark in YARN-client mode. We're seeing that one of our nodes is not being assigned any tasks, and no resources (RAM,cpu) are being used on this node. In the CM UI this worker node is in good health and the spark Worker process is runnin

Re: pyspark on python 3

2014-10-03 Thread Gen
According to the official site of spark, for the latest version of spark (1.1.0), it does not work with Python 3: "Spark 1.1.0 works with Python 2.6 or higher (but not Python 3). It uses the standard CPython interpreter, so C libraries like NumPy can be used."
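Since this thread hinges on the "Python 2.6 or higher, but not Python 3" constraint, a driver script can check the interpreter up front and fail with a clear message rather than an obscure import error. A minimal sketch (the function name is my own, not a PySpark API):

```python
import sys

def python_ok_for_pyspark(version_info=sys.version_info):
    """Spark 1.1.0's stated requirement: Python 2.6+, but not Python 3."""
    major, minor = version_info[0], version_info[1]
    return major == 2 and minor >= 6

# Checked against a few hypothetical interpreter versions:
print(python_ok_for_pyspark((2, 7, 8)))  # True
print(python_ok_for_pyspark((3, 4, 1)))  # False
```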

pyspark on python 3

2014-10-03 Thread Ariel Rokem
Hi everyone, What is the state of affairs w.r.t. python 3? Is this post still a good description of the situation? https://groups.google.com/forum/#!topic/spark-users/GRKmVo0ZDBc Thanks! Ariel

Re: Spark Monitoring with Ganglia

2014-10-03 Thread TANG Gen
Maybe you can follow the instructions in this link: https://github.com/mesos/spark-ec2/tree/v3/ganglia . For me it works well.

Re: The question about mount ephemeral disk in slave-setup.sh

2014-10-03 Thread TANG Gen
I have taken a look at the code of mesos spark-ec2 and the documentation of AWS. I think that maybe I found the answer. In fact, there are two types of AMI in AWS: EBS-backed AMI and instance-store-backed AMI. For an EBS-backed AMI, we can add instance store volumes when we create the image (the details can

Re: Handling tree reduction algorithm with Spark in parallel

2014-10-03 Thread Boromir Widas
Thanks Matei, will check out the MLLib implementation. On Wed, Oct 1, 2014 at 2:24 PM, Andy Twigg wrote: > Yes, that makes sense. It's similar to the all reduce pattern in vw. > > > On Wednesday, 1 October 2014, Matei Zaharia > wrote: > >> Some of the MLlib algorithms do tree reduction in 1.1:

Re: Akka "connection refused" when running standalone Scala app on Spark 0.9.2

2014-10-03 Thread Yana Kadiyska
I don't think it's a red herring... (btw. spark.driver.host needs to be set to the IP or FQDN of the machine where you're running the program). I am running 0.9.2 on CDH4 and the beginning of my executor log looks like below (I've obfuscated the IP -- this is the log from executor a100-2-200-245)

Re: window every n elements instead of time based

2014-10-03 Thread Michael Allman
Hi, I also have a use for count-based windowing. I'd like to process data batches by size as opposed to time. Is this feature on the development roadmap? Is there a JIRA ticket for it? Thank you, Michael

Re: [SparkSQL] Function parity with Shark?

2014-10-03 Thread Michael Armbrust
Thanks for digging in! These both look like they should have JIRAs. On Fri, Oct 3, 2014 at 8:14 AM, Yana Kadiyska wrote: > Thanks -- it does appear that I misdiagnosed a bit: case works generally > but it doesn't seem to like the bit operation, which does not seem to work > (type of bit_field i

Re: MLlib Collaborative Filtering failed to run with rank 1000

2014-10-03 Thread jw.cmu
Thanks, Xiangrui. I didn't check the test error yet. I agree that rank 1000 might overfit for this particular dataset. Currently I'm just running some scalability tests - I'm trying to see how large the model can be scaled to given a fixed amount of hardware.

Re: Akka "connection refused" when running standalone Scala app on Spark 0.9.2

2014-10-03 Thread Irina Fedulova
Yana, many thanks for looking into this! I am not running spark-shell in local mode, I am really starting spark-shell with --master spark://master:7077 and run in cluster mode. Second thing is I tried to set "spark.driver.host" to "master" both in scala app when creating context, and in conf/

Re: MLlib Collaborative Filtering failed to run with rank 1000

2014-10-03 Thread Xiangrui Meng
The current impl of ALS constructs least squares subproblems in memory. So for rank 100, the total memory it requires is about 480,189 * 100^2 / 2 * 8 bytes ~ 20GB, divided by the number of blocks. For rank 1000, this number goes up to 2TB, unfortunately. There is a JIRA for optimizing ALS: https:/
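Xiangrui's estimate can be reproduced with simple arithmetic: each of the ~480,189 users/items gets an in-memory least-squares subproblem of roughly rank^2/2 doubles at 8 bytes apiece. A quick check of the figures quoted in this thread:

```python
def als_subproblem_bytes(n, rank):
    # n least-squares subproblems, each ~rank^2/2 doubles (8 bytes apiece)
    return n * rank ** 2 / 2 * 8

N = 480189  # number of subproblems cited in the thread
print(round(als_subproblem_bytes(N, 100) / 1e9, 1), "GB")    # thread says ~20GB
print(round(als_subproblem_bytes(N, 1000) / 1e12, 2), "TB")  # thread says ~2TB
```

Both match the email's back-of-the-envelope numbers (before dividing by the number of blocks).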

array size limit vs partition number

2014-10-03 Thread anny9699
Hi, Sorry, I am not very familiar with Java. I found that if I set the RDD partition number to be higher, I meet this error message: "java.lang.OutOfMemoryError: Requested array size exceeds VM limit"; however, if I set the RDD partition number to be lower, the error is gone. My aws ec2 cluster has 7
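For context on the error itself (general JVM background, not an analysis of this particular job): Java arrays are indexed by int, so no single array, and hence no single in-memory buffer built on one, can exceed about 2^31 - 1 elements, regardless of heap size:

```python
# The JVM's hard cap on a single array, independent of heap size
MAX_ARRAY_ELEMS = 2 ** 31 - 1
print(MAX_ARRAY_ELEMS)                            # 2147483647
print(round(MAX_ARRAY_ELEMS * 8 / 1e9, 2), "GB")  # one double[] tops out near here
```

"Requested array size exceeds VM limit" is thrown when an allocation request approaches this cap.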

[ANN] SparkSQL support for Cassandra with Calliope

2014-10-03 Thread Rohit Rai
Hi All, A year ago we started this journey and laid the path for the Spark + Cassandra stack. We established the groundwork and direction for Spark Cassandra connectors and we have been happy seeing the results. With the Spark 1.1.0 and SparkSQL release, it's time to take Calliope

Fwd: Breeze Library usage in Spark

2014-10-03 Thread Priya Ch
yes. I have included breeze-0.9 in the build.sbt file. I'll change this to 0.7. Apart from this, do we need to include the breeze jars explicitly in the spark context via sc.addJar()? And what about the dependencies netlib-native_ref-linux-x86_64-1.1-natives.jar, netlib-native_system-linux-x86_64-1.

MLlib Collaborative Filtering failed to run with rank 1000

2014-10-03 Thread jw.cmu
I was able to run collaborative filtering with low rank numbers, like 20~160 on the netflix dataset, but it fails due to the following error when I set the rank to 1000: 14/10/03 03:27:36 WARN TaskSetManager: Loss was due to java.lang.IllegalArgumentException java.lang.IllegalArgumentException: Si

Re: HiveContext: cache table not supported for partitioned table?

2014-10-03 Thread Du Li
Thanks for your explanation. From: Cheng Lian Date: Thursday, October 2, 2014 at 8:01 PM To: Du Li, "d...@spark.apache.org" Cc: "user@spark.apache.org

Re: Breeze Library usage in Spark

2014-10-03 Thread Xiangrui Meng
Did you add a different version of breeze to the classpath? In Spark 1.0, we use breeze 0.7, and in Spark 1.1 we use 0.9. If the breeze version you used is different from the one comes with Spark, you might see class not found. -Xiangrui On Fri, Oct 3, 2014 at 4:22 AM, Priya Ch wrote: > Hi Team,
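Following Xiangrui's point, the usual fix is to pin breeze in build.sbt to the version bundled with the Spark release in use. A hedged sketch for Spark 1.0.x (the coordinates are the standard Breeze/Spark ones, but double-check them against your Spark version's own pom):

```scala
// build.sbt fragment: keep breeze in sync with the version Spark bundles
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.0.2" % "provided",
  "org.apache.spark" %% "spark-mllib" % "1.0.2" % "provided",
  "org.scalanlp"     %% "breeze"      % "0.7"    // use 0.9 with Spark 1.1.x
)
```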

Using GraphX with Spark Streaming?

2014-10-03 Thread Arko Provo Mukherjee
Hello Spark Gurus, I am trying to learn Spark. I am especially interested in GraphX. Since Spark can be used in a streaming context as well, I wanted to know whether it is possible to use Spark toolkits like GraphX or MLlib in the streaming context? Apologies if this is a stupid question but I am

Re: Akka "connection refused" when running standalone Scala app on Spark 0.9.2

2014-10-03 Thread Yana Kadiyska
when you're running spark-shell and the example, are you actually specifying --master spark://master:7077 as shown here: http://spark.apache.org/docs/latest/programming-guide.html#initializing-spark because if you're not, your spark-shell is running in local mode and not actually connecting to the

Re: [SparkSQL] Function parity with Shark?

2014-10-03 Thread Yana Kadiyska
Thanks -- it does appear that I misdiagnosed a bit: case works generally but it doesn't seem to like the bit operation, which does not seem to work (type of bit_field in Hive is bigint): Error: java.lang.RuntimeException: Unsupported language features in query: select (case when bit_field & 1=1 th
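For reference, the logic the failing HiveQL expression is after (testing whether the lowest bit of bit_field is set) is just:

```python
def lowest_bit_set(bit_field):
    # equivalent of the HiveQL predicate: bit_field & 1 = 1
    return (bit_field & 1) == 1

print(lowest_bit_set(5), lowest_bit_set(4))  # True False
```

The thread's issue is purely that SparkSQL's HiveQL parser at the time rejected the `&` operator, not that the predicate itself is exotic.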

Question about addFiles()

2014-10-03 Thread Tom Weber
Just getting started with Spark, so hopefully this is all there and I just haven't found it yet. I have a driver pgm on my client machine, I can use addFiles to distribute files to the remote worker nodes of the cluster. They are there to be found by my code running in the executors, so al is g

Re: Akka Connection refused - standalone cluster using spark-0.9.0

2014-10-03 Thread irina
Hi ssimanta, were you able to resolve the problem where the standalone scala program fails but the spark repl works just fine? I am getting the same issue... Thanks, Irina

Re: Spark inside Eclipse

2014-10-03 Thread jay vyas
For IntelliJ + SBT, you can also follow the directions at http://jayunit100.blogspot.com/2014/07/set-up-spark-application-devleopment.html . It's really easy to run spark in an IDE. The process for eclipse is virtually identical. On Fri, Oct 3, 2014 at 10:03 AM, Sanjay Subramanian < sanjaysubraman.

Re: Spark inside Eclipse

2014-10-03 Thread Sanjay Subramanian
cool thanks will set this up and report back how things went. regards, sanjay From: Daniel Siegmann To: Ashish Jain Cc: Sanjay Subramanian ; "user@spark.apache.org" Sent: Thursday, October 2, 2014 6:52 AM Subject: Re: Spark inside Eclipse You don't need to do anything special to run

Re: Setup/Cleanup for RDD closures?

2014-10-03 Thread Sean Owen
Yes, though it's a little more complex than that: http://mail-archives.apache.org/mod_mbox/spark-user/201407.mbox/%3CCAPH-c_O9kQO6yJ4khXUVdO=+D4vj=JfG2tP9eqn5RPko=dr...@mail.gmail.com%3E On Fri, Oct 3, 2014 at 9:58 AM, Mayur Rustagi wrote: > Current approach is to use mappartition, initialize th

Re: how to debug ExecutorLostFailure

2014-10-03 Thread jamborta
digging a bit deeper, the executors get lost when the memory gets close to the physical memory size. I'm not clear whether I am allocating too much or too little memory in this case. thanks,

How to save Spark log into file

2014-10-03 Thread arthur.hk.c...@gmail.com
Hi, How can the spark log be saved into a file instead of being shown on the console? Below is my conf/log4j.properties:

###
# Root logger option
log4j.rootLogger=INFO, file

# Direct log messages to a log file
log4j.appender.file=org.apache.log4j.RollingFileAppender
#Redirect
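The properties file above is cut off in the archive; a commonly used complete version of this RollingFileAppender setup looks like the following (the file path, sizes, and layout pattern are assumptions to adjust for your environment, not values from the original mail):

```properties
# Root logger option
log4j.rootLogger=INFO, file

# Direct log messages to a rolling log file instead of the console
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/spark/spark.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
```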

Breeze Library usage in Spark

2014-10-03 Thread Priya Ch
Hi Team, When I am trying to use DenseMatrix of the breeze library in spark, it's throwing the following error: java.lang.NoClassDefFoundError: breeze/storage/Zero Can someone help me on this? Thanks, Padma Ch

Re: Could Spark make use of Intel Xeon Phi?

2014-10-03 Thread 牛兆捷
What are the specific features of intel Xeon Phi that can be utilized by Spark? 2014-10-03 18:09 GMT+08:00 余 浪 : > Hi, > > I have set up Spark 1.0.2 on the cluster using standalone mode and the > input is managed by HDFS. One node of the cluster has Intel Xeon Phi 5110P > coprocessor. Is there an

Could Spark make use of Intel Xeon Phi?

2014-10-03 Thread 余 浪
Hi, I have set up Spark 1.0.2 on the cluster using standalone mode and the input is managed by HDFS. One node of the cluster has Intel Xeon Phi 5110P coprocessor. Is there any possibility that spark could be aware of Phi and run job on Xeon Phi? Do I have to modify the code of scheduler? Thank

The question about mount ephemeral disk in slave-setup.sh

2014-10-03 Thread TANG Gen
Hi, I am quite a new user of spark, and I have a stupid question about mounting the ephemeral disk for AWS EC2. If I understand the spark_ec2.py script well, it is spark-ec2/setup-slave.sh that mounts the ephemeral disk for AWS EC2 (Instance Store Volumes). However, in setup-slave.sh, it seems that these

Re: Any issues with repartition?

2014-10-03 Thread jamborta
I have two nodes with 96G RAM and 16 cores; my setup is as follows:

conf = (SparkConf()
        .setMaster("yarn-cluster")
        .set("spark.executor.memory", "30G")
        .set("spark.cores.max", 32)
        .set("spark.executor.instances", 2)
        .set("spark.executor.c

Akka "connection refused" when running standalone Scala app on Spark 0.9.2

2014-10-03 Thread Irina Fedulova
Hi, I have set up Spark 0.9.2 standalone cluster using CDH5 and pre-built spark distribution archive for Hadoop 2. I was not using spark-ec2 scripts because I am not on EC2 cloud. Spark-shell seems to be working properly -- I am able to perform simple RDD operations, as well as e.g. SparkPi

Re: SparkSQL on Hive error

2014-10-03 Thread Cheng Lian
Also make sure to call hiveContext.sql within the same thread where hiveContext is created, because Hive uses a thread-local variable to initialize the Driver.conf. On 10/3/14 4:52 PM, Michael Armbrust wrote: Are you running master? There was briefly a regression here that is hopefully f

Re: SparkSQL on Hive error

2014-10-03 Thread arthur.hk.c...@gmail.com
hi, I have just tested the same command, it works here, can you please provide your create table command? regards Arthur scala> hiveContext.hql("show tables") warning: there were 1 deprecation warning(s); re-run with -deprecation for details 2014-10-03 17:14:33,575 INFO [main] parse.ParseDriv

Re: Setup/Cleanup for RDD closures?

2014-10-03 Thread Mayur Rustagi
Current approach is to use mappartition, initialize the connection in the beginning, iterate through the data & close off the connector. Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi On Fri, Oct 3, 2014 at 10:16 AM, Stephe
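Mayur's description sketched as runnable Python, with a FakeConnection standing in for whatever connector you initialize (with a real RDD the call would be rdd.mapPartitions(process_partition)):

```python
class FakeConnection:
    """Stand-in for a real DB/socket connector (an assumption for this sketch)."""
    def __init__(self):
        self.is_open = True

    def send(self, record):
        return "sent:" + str(record)

    def close(self):
        self.is_open = False


def process_partition(records):
    conn = FakeConnection()      # setup: once per partition, not per record
    try:
        for r in records:        # iterate through the partition's data
            yield conn.send(r)
    finally:
        conn.close()             # cleanup runs even if iteration fails

print(list(process_partition([1, 2, 3])))  # ['sent:1', 'sent:2', 'sent:3']
```

The key point is that the function receives an iterator over one partition's elements, so setup and teardown bracket the whole loop rather than running per element as they would with map().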

spark 1.1.0 - hbase 0.98.6-hadoop2 version - py4j.protocol.Py4JJavaError java.lang.ClassNotFoundException

2014-10-03 Thread serkan.dogan
Hi, I installed hbase-0.98.6-hadoop2. It's working, no problem with that. When I try to run the spark hbase python examples (the wordcount examples work, so it's not a python issue): ./bin/spark-submit --master local --driver-class-path ./examples/target/spark-examples_2.10-1.1.0.jar ./examples/src

Re: How to make ./bin/spark-sql work with hive?

2014-10-03 Thread Michael Armbrust
Often java.lang.NoSuchMethodError means that you have more than one version of a library on your classpath, in this case it looks like hive. On Thu, Oct 2, 2014 at 8:44 PM, Li HM wrote: > I have rebuild package with -Phive > Copied hive-site.xml to conf (I am using hive-0.12) > > When I run ./bi

Re: SparkSQL on Hive error

2014-10-03 Thread Michael Armbrust
Are you running master? There was briefly a regression here that is hopefully fixed by spark#2635 . On Fri, Oct 3, 2014 at 1:43 AM, Kevin Paul wrote: > Hi all, I tried to launch my application with spark-submit, the command I > use is: > > bin/spark-su

SparkSQL on Hive error

2014-10-03 Thread Kevin Paul
Hi all, I tried to launch my application with spark-submit; the command I use is: bin/spark-submit --class ${MY_CLASS} --jars ${MY_JARS} --master local myApplicationJar.jar I've built spark with SPARK_HIVE=true, and was able to start HiveContext, and was able to run commands like hiveContext.sql

Re: Any issues with repartition?

2014-10-03 Thread Akhil Das
What is your cluster setup? and how much memory are you allocating to the executor? Thanks Best Regards On Fri, Oct 3, 2014 at 7:52 AM, jamborta wrote: > Hi Arun, > > Have you found a solution? Seems that I have the same problem. > > thanks, > > > > -- > View this message in context: > http://a

Re: new error for me

2014-10-03 Thread Akhil Das
I used to face this while running it on a single node machine when I allocated too much memory for the executor (i.e., my machine had 28GB memory and I allocated 26GB for the executor; dropping the memory from 26GB to 20GB solved my issue). If you are seeing an executor lost exception then you can try

Re: Type problem in Java when using flatMapValues

2014-10-03 Thread Robin Keunen
Damn, you're right, I wasn't looking at it properly. I was confused by intelliJ I guess. Many thanks! On 2014-10-02 19:02, Sean Owen wrote: Eh, is it not that you are mapping the values of an RDD whose keys are StringStrings, but expecting the keys are Strings? That's also about what the comp