Re: spark stream and spark sql with data warehouse

2015-06-11 Thread Yi Tian
Here is an example:

    val sc = new SparkContext(new SparkConf)
    // access hive tables
    val hqlc = new HiveContext(sc)
    import hqlc.implicits._
    // access files on hdfs
    val sqlc = new SQLContext(sc)
    import sqlc.implicits._
    sqlc.jsonFile("xxx").registerTempTable("xxx")
    // access other DB
    sqlc.jdbc("u
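A fuller version of the same sketch, assuming Spark 1.3-era APIs (jsonFile, registerTempTable, and the two-argument jdbc were current then); the table name, JSON path, and JDBC URL below are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("warehouse-demo"))

    // HiveContext reads tables registered in the Hive metastore
    val hqlc = new HiveContext(sc)
    val hiveRows = hqlc.sql("SELECT * FROM some_hive_table LIMIT 10") // placeholder table

    // SQLContext reads raw files on HDFS and exposes them as temp tables
    val sqlc = new SQLContext(sc)
    sqlc.jsonFile("hdfs:///path/to/events.json").registerTempTable("events") // placeholder path
    val recent = sqlc.sql("SELECT * FROM events LIMIT 10")

    // ...and can load external databases over JDBC (Spark 1.3 signature: jdbc(url, table))
    val orders = sqlc.jdbc("jdbc:mysql://db-host:3306/shop?user=u&password=p", "orders") // placeholder URL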

Re: Exception while select into table.

2015-03-02 Thread Yi Tian
Hi, some suggestions:
1. You should tell us the versions of Spark and Hive you are using.
2. You should paste the full stack trace of the exception.
In this case, I guess you have a nested directory in the path that bak_startup_log_uid_20150227 points to, and the config field hive.mapred.suppor
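For reference, the truncated config is presumably hive.mapred.supports.subdirectories (treat that completion as an assumption); a minimal sketch of flipping it, plus the matching Hadoop-side recursion flag, through a HiveContext:

    // Sketch, assuming the truncated setting is hive.mapred.supports.subdirectories.
    // The second flag enables recursive input listing on the Hadoop side.
    val hqlc = new org.apache.spark.sql.hive.HiveContext(sc)
    hqlc.sql("SET hive.mapred.supports.subdirectories=true")
    hqlc.sql("SET mapreduce.input.fileinputformat.input.dir.recursive=true")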

Build spark failed with maven

2015-02-10 Thread Yi Tian
Hi all, I got an ERROR when I built the Spark master branch with maven (commit: 2d1e916730492f5d61b97da6c483d3223ca44315):

    [INFO]
    [INFO]
    [INFO] Building Spark Project Catalyst 1.3.0-SNAPSHOT
    [INFO] -

Re: SparkSQL tasks spend too much time to finish.

2015-01-26 Thread Yi Tian
Hi San, you need to provide more information to diagnose this problem, such as:
1. What kind of SQL did you execute?
2. If there is a group operation in this SQL, could you do some statistics on how many unique group keys there are in this case?
On 1/26/15 17:01, luohui20...@sina.com wrote: Hi
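A quick way to gather that statistic, as a hypothetical sketch (table t and column key are placeholders for whatever the query groups by):

    // Count the unique group keys for a table `t` grouped by column `key`
    val row = sqlContext.sql("SELECT COUNT(DISTINCT key) FROM t").collect()(0)
    println(s"unique group keys: ${row.getLong(0)}")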

Is there any way to support multiple users executing SQL on thrift server?

2015-01-19 Thread Yi Tian
Is there any way to support multiple users executing SQL on one thrift server? I think there are some problems in Spark 1.2.0, for example:
1. Start the thrift server as user A.
2. Connect to the thrift server via beeline as user B.
3. Execute “insert into table dest select … from table src”, then w

Re: Spark SQL configuration

2014-10-26 Thread Yi Tian
Regards, Yi Tian tianyi.asiai...@gmail.com > On Oct 27, 2014, at 07:59, Pagliari, Roberto wrote: > > I'm a newbie with Spark. After installing it on all the machines I want to > use, do I need to tell it about the Hadoop configuration, or will it be able to > find it by itself? > > Thank you,
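For what it's worth, Spark normally picks the Hadoop configuration up from HADOOP_CONF_DIR; individual settings can also be adjusted on the context itself — a sketch, with the namenode address as a placeholder:

    import org.apache.spark.{SparkConf, SparkContext}

    // Spark reads core-site.xml / hdfs-site.xml from HADOOP_CONF_DIR if it is set;
    // Hadoop settings can also be overridden programmatically on the context.
    val sc = new SparkContext(new SparkConf().setAppName("conf-demo"))
    sc.hadoopConfiguration.set("fs.defaultFS", "hdfs://namenode:8020") // placeholder host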

Re: How to not write empty RDD partitions in RDD.saveAsTextFile()

2014-10-20 Thread Yi Tian
I think you could use `repartition` to make sure there would be no empty partitions. You could also try `coalesce` to combine partitions, but it can't guarantee that no empty partitions remain. Best Regards, Yi Tian tianyi.asiai...@gmail.com On Oct 18, 2014, at 20:30, j
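To make the difference concrete, a sketch under the assumption of an RDD bound for HDFS (the partition count and output path are placeholders):

    // repartition(n) shuffles records evenly across n partitions, so none is
    // empty as long as there are at least n records; coalesce(n) only merges
    // existing partitions without a shuffle and may still leave empty ones.
    val out = rdd.repartition(8)           // 8 is an arbitrary example
    out.saveAsTextFile("hdfs:///out/path") // placeholder path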

Re: default parallelism bug?

2014-10-20 Thread Yi Tian
Could you show your Spark version and the value of `spark.default.parallelism` you are setting? Best Regards, Yi Tian tianyi.asiai...@gmail.com On Oct 20, 2014, at 12:38, Kevin Jung wrote: > Hi, > I usually use a file on hdfs to make a PairRDD and analyze it by using > com
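For context, a sketch of how the setting is usually applied (the value 16 is an arbitrary example):

    import org.apache.spark.{SparkConf, SparkContext}

    // spark.default.parallelism sets the default partition count used by
    // shuffles and by parallelize() when no explicit count is given.
    val conf = new SparkConf().setAppName("parallelism-demo")
      .set("spark.default.parallelism", "16")
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 1000).partitions.length) // 16 under this setting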

Re: Spark SQL DDL, DML commands

2014-10-16 Thread Yi Tian
What do you mean by "executed directly"? Best Regards, Yi Tian tianyi.asiai...@gmail.com On Oct 16, 2014, at 22:50, neeraj wrote: > Hi, > > Does Spark SQL have DDL and DML commands that can be executed directly? If yes, > please share the link. > > If No, please help
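For comparison, a sketch of DDL/DML statements that a HiveContext can run directly (assuming a Hive-backed setup; table and column names are placeholders):

    // Sketch: DDL and DML issued straight through HiveContext.sql
    val hqlc = new org.apache.spark.sql.hive.HiveContext(sc)
    hqlc.sql("CREATE TABLE IF NOT EXISTS demo (id INT, name STRING)") // DDL
    hqlc.sql("INSERT INTO TABLE demo SELECT id, name FROM src")       // DML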

Re: Build spark with Intellij IDEA 13

2014-09-28 Thread Yi Tian
…spark/yarn/stable/src/main/scala" as the source path of module "yarn-parent_2.10".
7. Then you can run "Build -> Rebuild Project" in IDEA.
PS: you should run "Rebuild" after you run any mvn or sbt command on the Spark project. Best Regards, Yi Tian tianyi.asiai...@gmail.com On

Re: Memory used in Spark-0.9.0-incubating

2014-09-25 Thread Yi Tian
"yarn.scheduler.increment-allocation-mb" defaults to 1024. It means that if you ask for 4097 MB of memory for a container, the resource manager will create a container which uses 5120 MB of memory. But I can't figure out where the 5 GB comes from. Maybe there is some code that mistakes 1024 for 1000? Best Regards, Yi Tian tianyi.asiai
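The rounding described above works out as follows (a worked example of the arithmetic, not the actual YARN code):

    // Requests are rounded up to the next multiple of
    // yarn.scheduler.increment-allocation-mb (1024 by default).
    val increment = 1024
    val requested = 4097
    val granted = math.ceil(requested.toDouble / increment).toInt * increment
    println(granted) // 5120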

Re: What does "appMasterRpcPort: -1" indicate ?

2014-08-31 Thread Yi Tian
I think -1 means your application master has not been started yet. > On Aug 31, 2014, at 23:02, Tao Xiao wrote: > > I'm using CDH 5.1.0, which bundles Spark 1.0.0 with it. > > Following "How-to: Run a Simple Apache Spark App in CDH 5", I tried to submit > my job in local mode, Spark Standalone mode and

Powered By Spark

2014-08-30 Thread Yi Tian
…Meanwhile, we also build innovative big data applications to help our customers in real-time marketing, cross-product selling, and customer behavior analysis, as well as other areas, by using Spark technology. Yi Tian