Re: Getting the type of an RDD in spark AND pyspark

2014-09-06 Thread Aaron Davidson
Pretty easy to do in Scala: rdd.elementClassTag.runtimeClass. You can access this method from Python as well by using the internal _jrdd. It would look something like this (warning, I have not tested it): rdd._jrdd.classTag().runtimeClass(). (The method name is classTag for JavaRDDLike, and elementClassTag for Scala's RDD.)
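
A minimal PySpark sketch of the idea above, assuming a live SparkContext named sc. Note that _jrdd is a private PySpark internal and, as Aaron warns, untested here, so it may break across versions:

    from pyspark import SparkContext

    sc = SparkContext("local", "rdd-type-demo")
    rdd = sc.parallelize([1, 2, 3])

    # Reach through Py4J to the underlying JavaRDD, ask for its element
    # ClassTag, and then for the JVM runtime class that tag carries.
    java_class = rdd._jrdd.classTag().runtimeClass()
    print(java_class.getName())

In Scala, the equivalent one-liner is rdd.elementClassTag.runtimeClass, as quoted above.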

Re: Getting the type of an RDD in spark AND pyspark

2014-09-06 Thread Davies Liu
But you cannot get what you expect in PySpark: the underlying Scala RDD holds serialized Python objects, so it will always be RDD[Array[Byte]], whatever the element type of the RDD in Python is. Davies
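
To illustrate Davies's point concretely (same assumptions as the sketch above): whatever the Python element type, the JVM-side class tag reports a byte array, because PySpark ships pickled bytes to the JVM.

    ints = sc.parallelize([1, 2, 3])
    strs = sc.parallelize(["a", "b", "c"])

    # Both print the same JVM class (a byte array, shown as "[B"),
    # since the Scala side only ever sees pickled Python objects.
    print(ints._jrdd.classTag().runtimeClass().getName())
    print(strs._jrdd.classTag().runtimeClass().getName())

    # A Python-side workaround (my addition, not from the thread):
    # sample an element and inspect its Python type.
    print(type(ints.first()))  # <class 'int'>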

Getting the type of an RDD in spark AND pyspark

2014-09-04 Thread esamanas
Hi, I'm new to Spark and Scala, so apologies if this is obvious. Every RDD appears to be typed, which I can see from the spark-shell output when I execute 'take':

    scala> val t = sc.parallelize(Array(1,2,3))
    t: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[6] at parallelize at ...
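
For comparison, the PySpark REPL shows no element type in an RDD's repr; the usual way to see the Python-side type is to pull an element back to the driver and inspect it. A hedged sketch, assuming an interactive pyspark session with sc already defined:

    >>> t = sc.parallelize([1, 2, 3])
    >>> t    # the repr names the RDD but carries no element type
    ParallelCollectionRDD[0] at parallelize at ...
    >>> type(t.take(1)[0])
    <class 'int'>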