This is what I have added in my code:
rdd.persist(StorageLevel.MEMORY_ONLY_SER());
conf.set("spark.serializer","org.apache.spark.serializer.KryoSerializer");
Is it mandatory to also register classes via spark.kryo.classesToRegister?
Or is the above code sufficient to achieve a performance gain using Kryo?
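For reference, a minimal, untested Java sketch of explicit registration (MyCustomClass is a hypothetical placeholder for whatever classes your RDD holds); registration is optional, but without it Kryo writes the full class name alongside every serialized object:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf()
    .setAppName("KryoExample")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // Optional: registering classes lets Kryo write a small ID instead of
    // the full class name for each object (MyCustomClass is hypothetical).
    .registerKryoClasses(new Class<?>[]{MyCustomClass.class});
JavaSparkContext jsc = new JavaSparkContext(conf);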
Hi, does anyone have experience/knowledge of using
JavaPairRDD.treeAggregate?
Even sample code would be helpful.
Not many articles are available on the web.
Thanks
Amit
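For what it's worth, a minimal, untested sketch (assuming jsc is an existing JavaSparkContext) that sums the values of a small JavaPairRDD with treeAggregate:

import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;
import java.util.Arrays;

JavaPairRDD<String, Integer> pairs = jsc.parallelizePairs(Arrays.asList(
    new Tuple2<>("a", 1), new Tuple2<>("b", 2), new Tuple2<>("c", 3)));

Integer total = pairs.treeAggregate(
    0,                            // zero value for each partition
    (acc, kv) -> acc + kv._2(),   // seqOp: fold one (key, value) pair into the running sum
    (a, b) -> a + b,              // combOp: merge partial sums, combined tree-wise
    2);                           // depth of the aggregation tree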
Dear Spark Users
I have an RDD.saveAsTextFile() statement that is generating many empty part-*
and .crc files.
I understand that the empty part-* files are due to the number of partitions,
but I would still prefer not to generate either empty part-* files or .crc files.
How can I achieve this?
Thanks
tu
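One common workaround (not from this thread, just a rough sketch; the output path is illustrative) is to reduce the number of partitions before saving so that no partition is empty:

// Assumes rdd is an existing JavaRDD<String>; coalesce(1) writes a single
// part file, at the cost of funnelling all data through one task.
rdd.coalesce(1).saveAsTextFile("/output/path");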
>
>
> On Sat, Oct 31, 2015 at 11:18 PM, ayan guha wrote:
>
>> My Java knowledge is limited, but you may try a HashMap and put the RDDs
>> in it?
>>
>> On Sun, Nov 1, 2015 at 4:34 AM, amit tewari
>> wrote:
>>
>>> Thanks Ayan thats somethi
Python equivalent:
>>
>> # In Driver
>> fileList = ["/file1.txt", "/file2.txt"]
>> rdds = []
>> for f in fileList:
>>     rdd = jsc.textFile(f)
>>     rdds.append(rdd)
>>
>>
>>
>> On Sat, Oct 31, 2015 at 11:09 PM,
Hi
I need the ability to create RDDs programmatically inside my
program (e.g. based on a variable number of input files).
Can this be done?
I need this as I want to run the following statement inside an iteration:
JavaRDD<String> rdd1 = jsc.textFile("/file1.txt");
Thanks
Amit
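A rough Java sketch of the approach suggested in the replies above (assumes jsc is an existing JavaSparkContext; the file paths are illustrative):

import org.apache.spark.api.java.JavaRDD;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Build one RDD per input file inside a loop and keep them in a list.
List<String> files = Arrays.asList("/file1.txt", "/file2.txt");
List<JavaRDD<String>> rdds = new ArrayList<>();
for (String f : files) {
    rdds.add(jsc.textFile(f));
}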
Hi
I am struggling to find out how to run a Scala script on Datastax Spark.
(SPARK_HOME/bin/spark-shell -i test.scala is deprecated.)
I don't want to use the Scala prompt.
Thanks
AT
Tue, Jun 9, 2015 at 1:54 PM, amit tewari
> wrote:
>
>> Actually the question was: will keyBy() accept multiple fields (e.g.
>> x(0), x(1)) as the key?
>>
>>
>> On Tue, Jun 9, 2015 at 1:07 PM, amit tewari
>> wrote:
>>
>>> Thanks Akhil, as you su
Actually the question was: will keyBy() accept multiple fields (e.g.
x(0), x(1)) as the key?
On Tue, Jun 9, 2015 at 1:07 PM, amit tewari wrote:
> Thanks Akhil, as you suggested, I have to go with keyBy(route) as I need the
> columns intact.
> But will keyBy() accept multiple fields, e.g. ((x(0) + x(1)), x(2), x(3))?
>
> scala> input11.join(input22).take(10)
>
>
> PairFunctions basically requires RDD[K,V] and in your case it's ((String,
> String), String, String). You can also look at keyBy if you don't want to
> concatenate your keys.
>
> Thanks
>
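A rough sketch of the keyBy idea using the Java API (assumes an existing JavaRDD<String[]> named rows whose columns correspond to x(0)..x(3); all names here are illustrative):

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import scala.Tuple2;

// Key each row by a tuple of its first two fields, keeping the full row as
// the value, so the columns stay intact and no keys are concatenated.
JavaPairRDD<Tuple2<String, String>, String[]> keyed =
    rows.keyBy(x -> new Tuple2<>(x[0], x[1]));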
Thanks, but Spark 1.2 doesn't yet have DataFrame, I guess?
Regards
Amit
On Tue, Jun 9, 2015 at 10:25 AM, Ted Yu wrote:
> join is an operation on DataFrame.
>
> You can call sc.createDataFrame(myRDD) to obtain a DataFrame, where sc is
> the sqlContext.
>
> Cheers
>
> On Mon, Jun
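A rough sketch of that suggestion (requires Spark 1.3+, where DataFrame was introduced; Person, peopleRdd, and jsc are hypothetical placeholders):

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Person is a hypothetical JavaBean; peopleRdd is a hypothetical
// JavaRDD<Person>. createDataFrame infers the columns from the bean fields.
SQLContext sqlContext = new SQLContext(jsc);
DataFrame people = sqlContext.createDataFrame(peopleRdd, Person.class);
people.show();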
Hi Dear Spark Users
I am very new to Spark/Scala.
I am using Datastax (4.7 / Spark 1.2.1) and am struggling with the following
error/issue.
I have already tried options like import org.apache.spark.SparkContext._ or
the explicit import org.apache.spark.SparkContext.rddToPairRDDFunctions,
but the error is not resolved.
Help
I believe you have to set the following:
SPARK_HADOOP_VERSION=2.2.0 (or whatever your version is)
SPARK_YARN=true
then run sbt/sbt assembly.
If you are using Maven to compile:
mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean
package
Hope this helps
-A
On Fri, Apr 4, 2014 a