spark.kryo.classesToRegister

2016-01-27 Thread amit tewari
This is what I have added in my code: rdd.persist(StorageLevel.MEMORY_ONLY_SER()); conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"); Is it compulsory to do anything via spark.kryo.classesToRegister, or is the above code sufficient to achieve the performance gain from using Kryo?
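
For reference, a minimal Java sketch of pairing the Kryo serializer with explicit class registration. Registration is optional (the code above works without it), but unregistered classes make Kryo write the full class name alongside every object, which gives back part of the space saving. MyRecord below is a placeholder for whatever class the RDD actually stores; registerKryoClasses is the programmatic counterpart of the spark.kryo.classesToRegister property.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    SparkConf conf = new SparkConf()
        .setAppName("kryo-example")
        // Use Kryo for shuffle data and serialized storage levels.
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        // Optional: registered classes are written as small numeric IDs
        // instead of full class names.  MyRecord is a hypothetical class.
        .registerKryoClasses(new Class<?>[] { MyRecord.class });
    JavaSparkContext jsc = new JavaSparkContext(conf);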

JavaPairRDD.treeAggregate

2015-11-25 Thread amit tewari
Hi, does anyone have experience with, or knowledge of, using JavaPairRDD.treeAggregate? Even sample code would be helpful; not many articles are available on the web. Thanks, Amit
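
A minimal sketch of what a call can look like, assuming a JavaSparkContext named jsc and a Spark release where treeAggregate is exposed on the Java API (1.3+). treeAggregate takes a zero value, a per-partition fold (seqOp), a merge function (combOp), and a tree depth; the example simply sums the values of a JavaPairRDD.

    import java.util.Arrays;
    import org.apache.spark.api.java.JavaPairRDD;
    import scala.Tuple2;

    JavaPairRDD<String, Integer> pairs = jsc.parallelizePairs(Arrays.asList(
        new Tuple2<String, Integer>("a", 1),
        new Tuple2<String, Integer>("b", 2),
        new Tuple2<String, Integer>("a", 3)));

    // Sum all values: seqOp folds each (key, value) pair into a per-partition
    // accumulator; combOp merges the partial sums pairwise in a tree of
    // depth 2 instead of sending every partition result straight to the driver.
    Integer total = pairs.treeAggregate(
        0,
        (acc, kv) -> acc + kv._2(),
        (a, b) -> a + b,
        2);

The multi-level combine mainly pays off when there are many partitions or the partial results are large; with few partitions it behaves much like plain aggregate.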

Avoid RDD.saveAsTextFile() generating empty part-* and .crc files

2015-11-04 Thread amit tewari
Dear Spark Users, I have an RDD.saveAsTextFile() statement that is generating many empty part-* and .crc files. I understand that the empty part-* files are due to the number of partitions, but I would still like to avoid generating either the empty part-* files or the .crc files. How can I achieve this? Thanks
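
One common workaround, sketched below with placeholder names and an arbitrary partition count: the empty part-* files correspond to partitions that hold no records, so reducing the partition count (or repartitioning so every partition has data) before the save removes them. The .crc files are checksum siblings written by Hadoop's local/checksum file system and are hard to suppress from Spark itself; writing to HDFS rather than the local file system is the usual way to stop seeing them next to the output.

    import org.apache.spark.api.java.JavaRDD;

    // rdd stands for the RDD being saved.  Collapse it to a small, known
    // number of partitions so no partition (and hence no part-* file) ends
    // up empty.  The count 4 and the output path are placeholders.
    JavaRDD<String> compacted = rdd.coalesce(4);
    compacted.saveAsTextFile("/output/path");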

Re: Programmatically create RDDs based on input

2015-11-02 Thread amit tewari
On Sat, Oct 31, 2015 at 11:18 PM, ayan guha wrote: My Java knowledge is limited, but you may try a HashMap and put the RDDs in it? On Sun, Nov 1, 2015 at 4:34 AM, amit tewari wrote: Thanks Ayan, that's somethi

Re: Programmatically create RDDs based on input

2015-10-31 Thread amit tewari
Python equivalent:

    # In Driver
    fileList = ["/file1.txt", "/file2.txt"]
    rdds = []
    for f in fileList:
        rdd = jsc.textFile(f)
        rdds.append(rdd)

On Sat, Oct 31, 2015 at 11:09 PM,

Programmatically create RDDs based on input

2015-10-31 Thread amit tewari
Hi, I need the ability to create RDDs programmatically inside my program (e.g. based on a variable number of input files). Can this be done? I need this because I want to run the following statement inside an iteration: JavaRDD rdd1 = jsc.textFile("/file1.txt"); Thanks, Amit
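
Yes, this works: RDDs are ordinary objects, so they can be created in a loop and kept in a collection (the list/HashMap idea from the replies above). A minimal Java sketch with placeholder file names, assuming a JavaSparkContext named jsc:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.api.java.JavaRDD;

    List<String> files = Arrays.asList("/file1.txt", "/file2.txt");
    List<JavaRDD<String>> rdds = new ArrayList<>();
    for (String f : files) {
        // One RDD per input file, collected for later use.
        rdds.add(jsc.textFile(f));
    }

    // If the goal is a single RDD over all the files, textFile also accepts
    // a comma-separated list of paths.
    JavaRDD<String> all = jsc.textFile("/file1.txt,/file2.txt");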

How to run scala script in Datastax Spark distribution?

2015-06-11 Thread amit tewari
Hi, I am struggling to find out how to run a Scala script on Datastax Spark (SPARK_HOME/bin/spark-shell -i test.scala is deprecated). I don't want to use the Scala prompt. Thanks, AT

Re: Spark error "value join is not a member of org.apache.spark.rdd.RDD[((String, String), String, String)]"

2015-06-09 Thread amit tewari
On Tue, Jun 9, 2015 at 1:54 PM, amit tewari wrote: Actually the question was: will keyBy() accept multiple fields (e.g. x(0), x(1)) as the key? On Tue, Jun 9, 2015 at 1:07 PM, amit tewari wrote: Thanks Akhil, as you su

Re: Spark error "value join is not a member of org.apache.spark.rdd.RDD[((String, String), String, String)]"

2015-06-09 Thread amit tewari
Actually the question was: will keyBy() accept multiple fields (e.g. x(0), x(1)) as the key? On Tue, Jun 9, 2015 at 1:07 PM, amit tewari wrote: Thanks Akhil, as you suggested, I have to go with keyBy(route) as I need the columns intact. But will keyBy() accept multiple fields (eg
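
Yes, keyBy() accepts a composite key: the key function can return a Tuple2 (or any type with sensible equals/hashCode). The thread itself is in Scala (x(0), x(1)); the same idea in the Java API, assuming records is a JavaRDD of String[] where each record has already been split into columns, looks roughly like this:

    import org.apache.spark.api.java.JavaPairRDD;
    import scala.Tuple2;

    // The composite key (col0, col1) is kept alongside the untouched record,
    // so the remaining columns stay intact.
    JavaPairRDD<Tuple2<String, String>, String[]> keyed =
        records.keyBy(x -> new Tuple2<String, String>(x[0], x[1]));

    // Two RDDs keyed this way can be joined on the composite key:
    // keyed.join(otherKeyed) yields ((col0, col1), (leftRecord, rightRecord)).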

Re: Spark error "value join is not a member of org.apache.spark.rdd.RDD[((String, String), String, String)]"

2015-06-09 Thread amit tewari
(x(0) + x(1)*),x(2),x(3)))
scala> input11.join(input22).take(10)
PairRDDFunctions basically requires RDD[(K, V)], and in your case it's ((String, String), String, String). You can also look at keyBy if you don't want to concatenate your keys. Thanks

Re: Spark error "value join is not a member of org.apache.spark.rdd.RDD[((String, String), String, String)]"

2015-06-08 Thread amit tewari
Thanks, but Spark 1.2 doesn't yet have DataFrame, I guess? Regards, Amit. On Tue, Jun 9, 2015 at 10:25 AM, Ted Yu wrote: join is an operation on DataFrame. You can call sc.createDataFrame(myRDD) to obtain a DataFrame, where sc is the sqlContext. Cheers. On Mon, Jun
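
The doubt is well founded: DataFrame appeared in Spark 1.3 (1.2 only had SchemaRDD). On 1.3+, Ted Yu's suggestion looks roughly like the sketch below in the Java API, where Record is a hypothetical Java bean describing one row and "key" is a placeholder column name:

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    SQLContext sqlContext = new SQLContext(jsc);
    // leftRecordRdd and rightRecordRdd are assumed to be JavaRDD<Record>
    // built elsewhere from the input data.
    DataFrame left = sqlContext.createDataFrame(leftRecordRdd, Record.class);
    DataFrame right = sqlContext.createDataFrame(rightRecordRdd, Record.class);
    // DataFrames expose join directly, here on an explicit column equality.
    DataFrame joined = left.join(right, left.col("key").equalTo(right.col("key")));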

Spark error "value join is not a member of org.apache.spark.rdd.RDD[((String, String), String, String)]"

2015-06-08 Thread amit tewari
Hi Dear Spark Users, I am very new to Spark/Scala. I am using Datastax (4.7/Spark 1.2.1) and struggling with the following error/issue. I have already tried options like import org.apache.spark.SparkContext._ or the explicit import org.apache.spark.SparkContext.rddToPairRDDFunctions, but the error is not resolved. Help
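
The root cause, for anyone hitting the same error: join comes from PairRDDFunctions, and the implicit conversion (rddToPairRDDFunctions) only applies to an RDD whose element type is a two-field pair (K, V). An RDD[((String, String), String, String)] is a three-field tuple, so no import can make join appear; the elements have to be reshaped into key/value pairs first. A hedged sketch of that reshaping in the Java API, assuming two JavaRDD<String[]> inputs named leftRecords and rightRecords:

    import org.apache.spark.api.java.JavaPairRDD;
    import scala.Tuple2;

    // Key = (col0, col1), value = (col2, col3): a genuine two-field pair,
    // so the pair-RDD operations (join, reduceByKey, ...) become available.
    JavaPairRDD<Tuple2<String, String>, Tuple2<String, String>> left =
        leftRecords.mapToPair(x -> new Tuple2<>(
            new Tuple2<String, String>(x[0], x[1]),
            new Tuple2<String, String>(x[2], x[3])));

    JavaPairRDD<Tuple2<String, String>, Tuple2<String, String>> right =
        rightRecords.mapToPair(x -> new Tuple2<>(
            new Tuple2<String, String>(x[0], x[1]),
            new Tuple2<String, String>(x[2], x[3])));

    // Each joined element is ((col0, col1), (leftValue, rightValue)).
    JavaPairRDD<Tuple2<String, String>,
                Tuple2<Tuple2<String, String>, Tuple2<String, String>>> joined =
        left.join(right);

In Scala the same reshaping is a map to ((x(0), x(1)), (x(2), x(3))), after which the input11.join(input22) shown in the reply above compiles.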

Re: Hadoop 2.X Spark Client Jar 0.9.0 problem

2014-04-04 Thread Amit Tewari
I believe you have to set the following: SPARK_HADOOP_VERSION=2.2.0 (or whatever your version is) and SPARK_YARN=true, then type sbt/sbt assembly. If you are using Maven to compile: mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package. Hope this helps. -A On Fri, Apr 4, 2014 a