Re: How to connect JDBC DB based on Spark Sql

2015-04-13 Thread doovsaid
Great! It works. Thanks. Best, Yi ----- Original Message ----- From: Augustin Borsu To: doovs...@sina.com Cc: dev Subject: Re: How to connect JDBC DB based on Spark Sql Date: April 14, 2015, 14:14 Hello Yi, You can actually pass the username and password in the URL. E.g. val url = " jdbc:postgresql://ip.ip.ip.ip/ow-fe

Re: Streamline contribution process with update to Contribution wiki, JIRA rules

2015-04-13 Thread Sree V
Hi Sean, This is not the first time I am hearing this, and I agree with the JIRA suggestion. In most of the companies I have worked at, a JIRA has 'no status' and 'no type' when it is created, and we set both in sprint planning meetings. I am not sure how easy that would be for the Apache JIRA. As any change mig

Re: How to connect JDBC DB based on Spark Sql

2015-04-13 Thread Augustin Borsu
Hello Yi, You can actually pass the username and password in the url. E.g. val url = " jdbc:postgresql://ip.ip.ip.ip/ow-feeder?user=MY_LOGIN&password=MY_PASSWORD" val query = "(SELECT * FROM \"YadaYada\" WHERE type='item' LIMIT 100) as MY_DB" val jdbcDF = sqlContext.load("jdbc", Map( "url" -> ur
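The truncated snippet above can be reconstructed as the following plain-Scala sketch. The host, table, and credential values are the placeholders from the thread, and the resulting map is what would be handed to sqlContext.load("jdbc", options):

```scala
// Placeholders taken from the thread; substitute real connection details.
val user = "MY_LOGIN"
val password = "MY_PASSWORD"

// Credentials can be passed directly as URL query parameters.
val url = s"jdbc:postgresql://ip.ip.ip.ip/ow-feeder?user=$user&password=$password"

// A parenthesized subquery is also valid as "dbtable", so the database
// performs the filtering before Spark sees any rows.
val query = "(SELECT * FROM \"YadaYada\" WHERE type='item' LIMIT 100) as MY_DB"

// The options map that would be passed to sqlContext.load("jdbc", options).
val options = Map("url" -> url, "dbtable" -> query)
```
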

How to connect JDBC DB based on Spark Sql

2015-04-13 Thread doovsaid
Hi all, According to the official document, SparkContext can load a database table into a DataFrame using the DataSources API. However, it just supports the following properties: url (the JDBC URL to connect to) and dbtable (the JDBC table that should be read; note that anything that is valid in

Using memory mapped file for shuffle

2015-04-13 Thread Kannan Rajah
DiskStore.getBytes uses memory-mapped files if the length is more than a configured limit. This code path is used during map-side shuffle in ExternalSorter. I want to know if it's possible for the length to exceed the limit in the case of shuffle. The reason I ask is that in the case of Hadoop, each map
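For context, the configured limit being referred to is (assuming the 1.x code path) the spark.storage.memoryMapThreshold setting: blocks at or above it are memory-mapped by DiskStore.getBytes, smaller ones are read through a regular stream. A sketch of raising it in spark-defaults.conf, with an illustrative value:

```
# spark-defaults.conf -- raise the mmap threshold so shuffle blocks smaller
# than 8 MB are read via a plain stream instead of being memory-mapped.
# (Value in bytes; the Spark 1.x default is 2097152, i.e. 2 MB.)
spark.storage.memoryMapThreshold  8388608
```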

Eliminate partition filters in execution.Filter after filter pruning

2015-04-13 Thread Yijie Shen
Hi, Suppose I have a table t(id: String, event: String) saved as parquet file, and have directory hierarchy:   hdfs://path/to/data/root/dt=2015-01-01/hr=00 After partition discovery, the result schema should be (id: String, event: String, dt: String, hr: Int) If I have a query like: df.select(
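To make the idea concrete, here is a minimal plain-Scala model (no Spark; the paths are the ones from the message) of what pruning on a partition filter amounts to: dt and hr exist only in the directory names, so a predicate on them can be evaluated against the paths alone, before any Parquet file is opened:

```scala
// Partition directories in the layout described above (illustrative paths).
val partitions = Seq(
  "hdfs://path/to/data/root/dt=2015-01-01/hr=00",
  "hdfs://path/to/data/root/dt=2015-01-01/hr=01",
  "hdfs://path/to/data/root/dt=2015-01-02/hr=00")

// A predicate on the partition column dt is decided from the path alone,
// which is why re-applying it per row inside execution.Filter is redundant.
val pruned = partitions.filter { path =>
  path.split('/').contains("dt=2015-01-01")
}
```
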

Re: [VOTE] Release Apache Spark 1.3.1 (RC3)

2015-04-13 Thread GuoQiang Li
+1 (non-binding) -- Original -- From: "Patrick Wendell" Date: Sat, Apr 11, 2015 02:05 PM To: "dev@spark.apache.org" Subject: [VOTE] Release Apache Spark 1.3.1 (RC3) Please vote on releasing the following candidate as Apache Spark version 1.3.1! The

Re: Query regarding inferring data types in pyspark

2015-04-13 Thread Davies Liu
Hey Suraj, You should use "date" for DataType: df.withColumn(df.DateCol.cast("date")) Davies On Sat, Apr 11, 2015 at 10:57 PM, Suraj Shetiya wrote: > Humble reminder > > On Sat, Apr 11, 2015 at 12:16 PM, Suraj Shetiya > wrote: >> >> Hi, >> >> Below is one line from the json file. >> I have hi
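As a hedged aside, the one-argument call in the quoted reply omits the column-name argument that withColumn requires; a sketch of the equivalent cast in the Scala API (the DataFrame df and column name DateCol are the ones from the thread):

```scala
// Assumes a DataFrame `df` with a string column "DateCol".
// withColumn takes (name, column); the name decides which column is replaced.
val withDate = df.withColumn("DateCol", df("DateCol").cast("date"))
```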

Re: Streamline contribution process with update to Contribution wiki, JIRA rules

2015-04-13 Thread Nicholas Chammas
Wow, I had an open email draft to whine (yet again) about our open PR count and provide some suggestions. Will redirect that to the JIRA Sean created. Sweet! Nick On Mon, Apr 13, 2015 at 7:05 PM Patrick Wendell wrote: > Would just like to encourage everyone who is active in day-to-day > develo

Re: Streamline contribution process with update to Contribution wiki, JIRA rules

2015-04-13 Thread Patrick Wendell
Would just like to encourage everyone who is active in day-to-day development to give feedback on this (and I will do same). Sean has spent a lot of time looking through different ways we can streamline our dev process. - Patrick On Mon, Apr 13, 2015 at 3:59 PM, Sean Owen wrote: > Pardon, I want

Streamline contribution process with update to Contribution wiki, JIRA rules

2015-04-13 Thread Sean Owen
Pardon, I wanted to call attention to a JIRA I just created... https://issues.apache.org/jira/browse/SPARK-6889 ... in which I propose what I hope are some changes to the contribution process wiki that could help a bit with the flood of reviews and PRs. I'd be grateful for your thoughts and comme

Re: Spark Sql reading hive partitioned tables?

2015-04-13 Thread Michael Armbrust
Yeah, we don't currently push down predicates into the metastore. Though, we do prune partitions based on predicates (so we don't read the data). On Mon, Apr 13, 2015 at 2:53 PM, Tom Graves wrote: > Hey, > I was trying out spark sql using the HiveContext and doing a select on a > partitioned ta

Spark Sql reading hive partitioned tables?

2015-04-13 Thread Tom Graves
Hey, I was trying out Spark SQL using the HiveContext and doing a select on a partitioned table with lots of partitions (16,000+). It took over 6 minutes before it even started the job. It looks like it was querying the Hive metastore and got a good chunk of data back, which I'm guessing is inf

Re: [VOTE] Release Apache Spark 1.3.1 (RC3)

2015-04-13 Thread Marcelo Vanzin
+1 (non-binding) Tested 2.6 build with standalone and yarn (no external shuffle service this time, although it does come up). On Fri, Apr 10, 2015 at 11:05 PM, Patrick Wendell wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.3.1! > > The tag to be voted on i

Re: [VOTE] Release Apache Spark 1.3.1 (RC3)

2015-04-13 Thread Sree V
+1. Builds - check. Tests - check. Installs and sample run - check. Thanking you. With Regards Sree On Friday, April 10, 2015 11:07 PM, Patrick Wendell wrote: Please vote on releasing the following candidate as Apache Spark version 1.3.1! The tag to be voted on is v1.3.1-rc2 (commit 3e8

Re: SPARK-5364

2015-04-13 Thread Sree V
Thank you, Reynold. Thanking you. With Regards Sree On Sunday, April 12, 2015 11:18 AM, Reynold Xin wrote: I closed it. Thanks. On Sun, Apr 12, 2015 at 11:08 AM, Sree V wrote: > Hi, > I was browsing through the JIRAs and found this one can be closed. If anyone > who has edit permi

Re: How is hive-site.xml loaded?

2015-04-13 Thread Steve Loughran
There's some magic in the process that is worth knowing and being cautious of. Those special HDFSConfiguration, YarnConfiguration, and HiveConf objects all do work in their class initializers, calling Configuration.addDefaultResource; this puts their -default and -site XML files onto the list of defa
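A tiny plain-Scala model of the pattern described above (all names here are invented for illustration; the real classes are Hadoop's Configuration and Hive's HiveConf):

```scala
// Global registry standing in for Hadoop Configuration's default-resource list.
object ConfRegistry {
  private var defaults = Vector("core-default.xml", "core-site.xml")
  def addDefaultResource(name: String): Unit = defaults :+= name
  def resources: Vector[String] = defaults
}

// Stands in for HiveConf: its initializer registers hive-site.xml as a side
// effect, so merely touching the class changes what every later load sees.
object HiveConfLike {
  ConfRegistry.addDefaultResource("hive-site.xml")
}

// First reference triggers the initializer, just as JVM class loading does.
val touched = HiveConfLike
```
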

RE: Spark ThriftServer encounter java.lang.IllegalArgumentException: Unknown auth type: null Allowed values are: [auth-int, auth-conf, auth]

2015-04-13 Thread Andrew Lee
Hi Cheng, I couldn't find the component for Spark ThriftServer; would that be the 'SQL' component? JIRA created: https://issues.apache.org/jira/browse/SPARK-6882 > Date: Sun, 15 Mar 2015 21:03:34 +0800 > From: lian.cs@gmail.com > To: alee...@hotmail.com; dev@spark.apache.org > Subject: Re: Spark Th

Re: [VOTE] Release Apache Spark 1.3.1 (RC3)

2015-04-13 Thread Sean McNamara
+1 Sean > On Apr 11, 2015, at 12:07 AM, Patrick Wendell wrote: > > Please vote on releasing the following candidate as Apache Spark version > 1.3.1! > > The tag to be voted on is v1.3.1-rc2 (commit 3e83913): > https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=3e8391327ba586eaf544

Spark SQL 1.3.1 "saveAsParquetFile" will output tachyon file with different block size

2015-04-13 Thread zhangxiongfei
Hi experts, I ran the code below in the Spark shell to access Parquet files in Tachyon. 1. First, created a DataFrame by loading a bunch of Parquet files in Tachyon: val ta3 = sqlContext.parquetFile("tachyon://tachyonserver:19998/apps/tachyon/zhangxf/parquetAdClick-6p-256m"); 2. Second, set the "fs.local.block
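A hedged reconstruction of the steps being described (Spark 1.3.1 shell; the Tachyon path and the 256 MB figure come from the message, while the full property name is assumed to be fs.local.block.size, since the message is truncated after "fs.local.block"):

```scala
// Sketch only: property name after the truncation is an assumption.
val ta3 = sqlContext.parquetFile(
  "tachyon://tachyonserver:19998/apps/tachyon/zhangxf/parquetAdClick-6p-256m")

// Request 256 MB blocks on write; whether saveAsParquetFile honours this
// for Tachyon output is exactly the question being raised.
sc.hadoopConfiguration.setLong("fs.local.block.size", 256L * 1024 * 1024)
ta3.saveAsParquetFile("tachyon://tachyonserver:19998/apps/tachyon/zhangxf/out")
```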

Manning looking for a co-author for the GraphX in Action book

2015-04-13 Thread Reynold Xin
Hi all, Manning (the publisher) is looking for a co-author for the GraphX in Action book. The book currently has one author (Michael Malak), but they are looking for a co-author to work closely with Michael, improve the writing, and make it more consumable. Early access page for the book: http

Re: Integrating D3 with Spark

2015-04-13 Thread Paolo Platter
Hi, I integrated charts on spark-notebook, a very similar task. To reduce D3 boilerplate I suggest using dimple.js; it provides out-of-the-box D3-based charts. Bye Paolo Sent from my Windows Phone From: anshu shukla Invi