Joins are super slow in Spark: 100x slower than Hadoop.
On 14-Jul-2015, at 10:59 PM, Wush Wu wush...@gmail.com wrote:
I don't understand.
By the way, the `joinWithCassandraTable` does improve my query time
from 40 mins to 3 mins.
2015-07-15 13:19 GMT+08:00 ÐΞ€ρ@Ҝ wrote:
Dear all,
I am trying to join two RDDs, named rdd1 and rdd2.
rdd1 is loaded from a textfile with about 33000 records.
rdd2 is loaded from a table in cassandra which has about 3 billion records.
I tried the following code:
```scala
val rdd1: RDD[(String, XXX)] = sc.textFile(...).map(...)
```
See https://github.com/datastax/spark-cassandra-connector/blob/v1.3.0-M2/doc/2_loading.md
Wush
2015-07-15 12:15 GMT+08:00 Wush Wu wush...@gmail.com:
Dear all,
I am trying to join two RDDs, named rdd1 and rdd2.
rdd1 is loaded from a textfile with about 33000 records.
rdd2 is loaded from a table in cassandra which has about 3 billion records.
On ..., 2015 at 9:35 PM, Wush Wu wush...@gmail.com wrote:
Dear all,
I have found a post discussing the same thing:
https://groups.google.com/a/lists.datastax.com/forum/#!searchin/spark-connector-user/join/spark-connector-user/q3GotS-n0Wk/g-LPTteCEg0J
The solution is to use `joinWithCassandraTable`.
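For reference, a minimal sketch of the `joinWithCassandraTable` approach, assuming a hypothetical keyspace `ks` and table `big_table` whose partition key is a single text column; the point is that the connector pushes each lookup down to Cassandra instead of shuffling the 3-billion-row table:

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("JoinWithCassandraSketch")
val sc = new SparkContext(conf)

// Small RDD of lookup keys loaded from a text file.
val keys = sc.textFile("keys.txt").map(line => Tuple1(line.trim))

// Each Spark partition issues targeted Cassandra queries for its keys,
// rather than scanning and shuffling the full table.
val joined = keys.joinWithCassandraTable("ks", "big_table")
joined.take(10).foreach(println)
```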
Dear all,
I am trying to upgrade Spark from 1.2 to 1.3 and switch the existing
SchemaRDD-creation API over to DataFrame.
After testing, I notice that the following behavior is changed:
```scala
import java.sql.Date
import com.bridgewell.SparkTestUtils
import org.apache.spark.rdd.RDD
// ...
```
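The snippet above is cut off in the archive. For context, a minimal sketch of the 1.2-to-1.3 API shift, assuming an existing `sc` and a hypothetical record type:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext

// Hypothetical record type for illustration.
case class Event(name: String, when: java.sql.Date)

val sqlContext = new SQLContext(sc)
val rdd: RDD[Event] =
  sc.parallelize(Seq(Event("a", java.sql.Date.valueOf("2015-01-01"))))

// Spark 1.2 relied on an implicit conversion to SchemaRDD:
//   import sqlContext.createSchemaRDD
//   rdd.registerTempTable("events")

// Spark 1.3 builds a DataFrame explicitly (or via rdd.toDF()):
val df = sqlContext.createDataFrame(rdd)
df.printSchema()
```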
Dear all,
I am a new Spark user coming from R.
After exploring SchemaRDD, I noticed that it is similar to R's data.frame.
Is there a feature like `model.matrix` in R that converts a SchemaRDD into a
model matrix automatically, according to the column types, without converting
them one by one?
Thanks,
Wush
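For readers with the same question: the closest analogue I know of arrived later, in Spark 1.5, as `org.apache.spark.ml.feature.RFormula`, which expands a DataFrame into a feature vector much like R's `model.matrix`. A minimal sketch, assuming an existing `sqlContext` and hypothetical column names:

```scala
import org.apache.spark.ml.feature.RFormula

// Hypothetical toy DataFrame: one numeric and one categorical column.
val df = sqlContext.createDataFrame(Seq(
  (1.0, "a", 0.0),
  (2.0, "b", 1.0)
)).toDF("x1", "x2", "label")

// Numeric columns pass through; string columns are one-hot encoded;
// everything is assembled into a single feature vector,
// much like model.matrix in R.
val formula = new RFormula()
  .setFormula("label ~ x1 + x2")
  .setFeaturesCol("features")
  .setLabelCol("label")

formula.fit(df).transform(df).show()
```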
Dear all,
I want to implement a sequential algorithm on an RDD.
For example:
```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
conf.setMaster("local[2]").
  setAppName("SequentialSuite")
val sc = new SparkContext(conf)
val rdd = sc.
  parallelize(Array(1, 3, 2, 7, 1, 4, 2, 5, 1, 8, 9), 2).
  sortBy(x => x, true)
```
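One way to run sequential, stateful logic over a sorted RDD like the one above (a sketch, not necessarily what the thread settled on) is to pull elements to the driver in partition order with `toLocalIterator`:

```scala
// toLocalIterator streams one partition at a time to the driver,
// preserving the sort order, so sequential logic can run locally.
var runningTotal = 0
for (x <- rdd.toLocalIterator) {
  runningTotal += x
  println(s"element $x, running total $runningTotal")
}
```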
Dear Cheng Hao,
You are right!
After using the HiveContext, the issue is solved.
Thanks,
Wush
2015-02-15 10:42 GMT+08:00 Cheng, Hao hao.ch...@intel.com:
Are you using the SQLContext? I think the HiveContext is recommended.
Cheng Hao
From: Wush Wu [mailto:w...@bridgewell.com]
Dear all,
I am new to Spark SQL and have no experience with Hive.
I tried to use the built-in Hive function to extract the hour from a
timestamp in Spark SQL, but got: java.util.NoSuchElementException: key
not found: hour
How should I extract the hour from a timestamp?
And I am very confused about ...
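Putting the exchange above together, a minimal sketch of the fix: `hour()` is a Hive UDF, so it resolves under HiveContext but not under the plain SQLContext in these Spark versions (the table and column names here are hypothetical, and an existing `sc` is assumed):

```scala
import org.apache.spark.sql.hive.HiveContext

// Hive's built-in UDFs (hour, day, ...) resolve under HiveContext.
val hiveContext = new HiveContext(sc)

// Hypothetical table with a timestamp column `ts`.
hiveContext.sql("SELECT hour(ts) AS h FROM events").collect().foreach(println)
```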
Dear all,
Does Spark support sparse matrices/vectors for logistic regression (LR) now?
Best,
Wush
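For what it's worth, MLlib does provide a sparse vector type that its logistic regression accepts. A minimal sketch, assuming an existing `sc`:

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// A sparse vector of length 10 with non-zeros at indices 1 and 7.
val x = Vectors.sparse(10, Array(1, 7), Array(0.5, 2.0))

val data = sc.parallelize(Seq(
  LabeledPoint(1.0, x),
  LabeledPoint(0.0, Vectors.sparse(10, Array(3), Array(1.0)))
))

val model = LogisticRegressionWithSGD.train(data, 20)
```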
On 2014/6/2 at 3:19 PM, praveshjain1991 praveshjain1...@gmail.com wrote:
Thank you for your replies. I've now been using integer datasets but ran
into another issue.
Dear all,
We have a Spark 0.8.1 cluster on Mesos 0.15. Some of my colleagues are
familiar with Python, but some of the features are developed in Java. I am
looking for a way to integrate Java and Python on Spark.
I notice that the initialization of PySpark does not include a field to
distribute ...