Here is an example:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf)

// access hive tables
val hqlc = new HiveContext(sc)
import hqlc.implicits._

// access files on hdfs
val sqlc = new SQLContext(sc)
import sqlc.implicits._
sqlc.jsonFile("xxx").registerTempTable("xxx")

// access other DBs over JDBC (the original message is cut off here;
// "url" and "tablename" are placeholders)
sqlc.jdbc("url", "tablename")
Hi,
Some suggestions:
1. You should tell us the versions of Spark and Hive you are using.
2. You should paste the full stack trace of the exception.
In this case, I guess you have a nested directory in the path which
`bak_startup_log_uid_20150227` points to, and the config field
`hive.mapred.supports.subdirectories`
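If that guess is right, here is a minimal sketch of the usual workaround
(both SET keys are standard Hive/Hadoop configs; the table name comes from
the original message):

val hqlc = new HiveContext(sc) // assumes an existing SparkContext `sc`
hqlc.sql("SET hive.mapred.supports.subdirectories=true")
hqlc.sql("SET mapreduce.input.fileinputformat.input.dir.recursive=true")
hqlc.sql("SELECT * FROM bak_startup_log_uid_20150227")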
Hi, all
I got an ERROR when building the Spark master branch with maven (commit:
`2d1e916730492f5d61b97da6c483d3223ca44315`)
[INFO]
[INFO]
[INFO] Building Spark Project Catalyst 1.3.0-SNAPSHOT
[INFO] -
Hi, San
You need to provide more information to diagnose this problem, like:
1. What kind of SQL did you execute?
2. If there are `group` operations in this SQL, could you gather some
statistics about how many unique group keys there are in this case?
On 1/26/15 17:01, luohui20...@sina.com wrote:
Hi
Is there any way to support multiple users executing SQL on one thrift
server?
I think there are some problems with Spark 1.2.0, for example:
1. Start the thrift server as user A
2. Connect to the thrift server via beeline as user B
3. Execute "insert into table dest select … from table src"
then w
Regards,
Yi Tian
tianyi.asiai...@gmail.com
> On Oct 27, 2014, at 07:59, Pagliari, Roberto wrote:
>
> I'm a newbie with Spark. After installing it on all the machines I want to
> use, do I need to tell it about the Hadoop configuration, or will it be able
> to find it by itself?
>
> Thank you,
I think you could use `repartition` to make sure there are no empty
partitions.
You could also try `coalesce` to combine partitions, but it cannot guarantee
that no partitions end up empty.
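For example, a minimal sketch (the RDD and the numbers are made up):

val rdd = sc.parallelize(1 to 100, 10) // assumes an existing SparkContext `sc`

// repartition always performs a full shuffle, redistributing the records
// evenly, so none of the 4 resulting partitions stays empty
val even = rdd.repartition(4)

// coalesce only merges existing partitions (no shuffle by default), which is
// cheaper but gives no guarantee that every result partition is non-empty
val fewer = rdd.coalesce(4)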
Best Regards,
Yi Tian
tianyi.asiai...@gmail.com
On Oct 18, 2014, at 20:30, j
Could you show your Spark version?
And the value of `spark.default.parallelism` you are setting?
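For reference, a minimal sketch of how that property can be set (the value 8
is only an illustration):

val conf = new SparkConf().set("spark.default.parallelism", "8")
val sc = new SparkContext(conf)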
Best Regards,
Yi Tian
tianyi.asiai...@gmail.com
On Oct 20, 2014, at 12:38, Kevin Jung wrote:
> Hi,
> I usually use files on HDFS to make a PairRDD and analyze it by using
> com
What do you mean by "executed directly"?
Best Regards,
Yi Tian
tianyi.asiai...@gmail.com
On Oct 16, 2014, at 22:50, neeraj wrote:
> Hi,
>
> Does Spark SQL have DDL and DML commands that can be executed directly? If
> yes, please share the link.
>
> If no, please help
"spark/yarn/stable/src/main/scala" as the source path of the module
"yarn-parent_2.10"
7. Then you can run "Build -> Rebuild Project" in IDEA.
PS: you should run "Rebuild" again after you run any mvn or sbt command on the
Spark project.
Best Regards,
Yi Tian
tianyi.asiai...@gmail.com
On
`yarn.scheduler.increment-allocation-mb`, default is 1024.
It means that if you ask for 4097 MB of memory for a container, the resource
manager will create a container that uses 5120 MB of memory.
But I can't figure out where the 5 GB comes from.
Maybe there is some code that confuses 1024 with 1000?
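A sketch of the rounding rule as I understand it (the numbers come from the
example above):

val incrementMb = 1024 // yarn.scheduler.increment-allocation-mb
val requestedMb = 4097
// YARN rounds each request up to the next multiple of the increment
val grantedMb = ((requestedMb + incrementMb - 1) / incrementMb) * incrementMb
// grantedMb == 5120, i.e. 5 * 1024 MB -- "5 GB" only if a GB is read as 1000 MB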
Best Regards,
Yi Tian
tianyi.asiai
I think -1 means your application master has not been started yet.
> On Aug 31, 2014, at 23:02, Tao Xiao wrote:
>
> I'm using CDH 5.1.0, which bundles Spark 1.0.0 with it.
>
> Following "How-to: Run a Simple Apache Spark App in CDH 5", I tried to submit
> my job in local mode, Spark Standalone mode and
Meanwhile, we also build innovative big data applications to help our
customers in real-time marketing, cross-product selling, customer behavior
analysis, and other areas, using Spark technology.
Yi Tian