[GitHub] spark issue #16090: [SPARK-18661] [SQL] Creating a partitioned datasource ta...

2016-12-05 Thread ericl
Github user ericl commented on the issue: https://github.com/apache/spark/pull/16090 Filed https://issues.apache.org/jira/browse/SPARK-18725 https://issues.apache.org/jira/browse/SPARK-18726

2016-12-04 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16090 LGTM, merging to master/2.1! @ericl please create tickets for the other 2 issues.

2016-12-04 Thread ericl
Github user ericl commented on the issue: https://github.com/apache/spark/pull/16090 Yeah I was wondering if we should also try to fix that. It seems maybe not as bad since unpartitioned tables usually aren't that big. We can create separate tickets for investigating that,

2016-12-04 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16090 After looking more at the code, now I agree with your approach. One question: it seems we still scan the files when creating an unpartitioned external data source table?

2016-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16090 Merged build finished. Test PASSed.

2016-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16090 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69630/

2016-12-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16090 **[Test build #69630 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69630/consoleFull)** for PR 16090 at commit

2016-12-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16090 **[Test build #69630 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69630/consoleFull)** for PR 16090 at commit

2016-12-03 Thread ericl
Github user ericl commented on the issue: https://github.com/apache/spark/pull/16090 Not sure I follow - could you explain more about why that would resolve the issue? Btw, I reverted this PR to b405635, which passes all tests.

2016-12-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16090 If we are going to hack it, how about this?
```
val dataSource = DataSource(...)
if (classOf[FileFormat].isAssignableFrom(dataSource.providingClass)) {
```

2016-12-02 Thread ericl
Github user ericl commented on the issue: https://github.com/apache/spark/pull/16090 Seems like the caching broke a bunch of tests. I'll take a look at this again tomorrow. On Fri, Dec 2, 2016, 7:49 PM UCB AMPLab wrote: > Test FAILed.

2016-12-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16090 Merged build finished. Test FAILed.

2016-12-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16090 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69600/

2016-12-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16090 **[Test build #69600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69600/consoleFull)** for PR 16090 at commit

2016-12-02 Thread ericl
Github user ericl commented on the issue: https://github.com/apache/spark/pull/16090 cc @rxin: please merge unless Wenchen gets to it first.

2016-12-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16090 **[Test build #69600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69600/consoleFull)** for PR 16090 at commit

2016-12-02 Thread ericl
Github user ericl commented on the issue: https://github.com/apache/spark/pull/16090 Fixed by adding a private cache to `DataSource`, which is used to avoid the duplicate file reads with `InMemoryFileIndex`.
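The fix described in this comment amounts to memoizing the file listing, so that schema/partition inference and index construction share one directory scan instead of each listing the files again. A minimal Scala sketch of that idea, assuming hypothetical `ListingCounter` and `CachingSource` classes (these are illustrations only, not Spark's actual `DataSource` internals):

```scala
import scala.collection.mutable

// Hypothetical stand-in for an expensive directory listing (NOT a Spark API).
// It counts calls so the effect of caching is observable.
final class ListingCounter {
  var calls = 0
  def list(path: String): Seq[String] = {
    calls += 1
    Seq(s"$path/part=1/a.parquet", s"$path/part=2/b.parquet")
  }
}

// Hypothetical source that keeps a private per-path cache, so inference and
// index building reuse one listing per root path.
final class CachingSource(paths: Seq[String], lister: ListingCounter) {
  private val cache = mutable.Map.empty[String, Seq[String]]

  // Lists a path at most once; later callers get the cached result.
  private def filesFor(path: String): Seq[String] =
    cache.getOrElseUpdate(path, lister.list(path))

  // Infers partition column names from "key=value" path segments.
  def inferPartitionColumns(): Set[String] =
    paths.flatMap(filesFor)
      .flatMap(file => file.split('/').toSeq.filter(_.contains("=")).map(_.takeWhile(_ != '=')))
      .toSet

  // Builds the file index from the same cached listing.
  def buildIndex(): Seq[String] = paths.flatMap(filesFor)
}
```

Calling `inferPartitionColumns()` and then `buildIndex()` on one `CachingSource` triggers only a single `list` call per root path. The plain `mutable.Map` keeps the sketch short but is not thread-safe; a shared cache would need synchronization or a concurrent map.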

2016-12-02 Thread ericl
Github user ericl commented on the issue: https://github.com/apache/spark/pull/16090 Seems like we also create InMemoryFileIndex twice for non-catalog tables. Let me try to fix that too.

2016-12-02 Thread ericl
Github user ericl commented on the issue: https://github.com/apache/spark/pull/16090 I looked at avoiding the creation of a CatalogFileIndex, but the way table resolution works right now, the only way is to create some sort of dummy file index class that does not support scans. It's
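To picture the "dummy file index class that does not support scans" mentioned here: a placeholder that carries table metadata (in this sketch, only the root paths) but fails loudly if anything tries to list files. The `SimpleFileIndex` trait and `NonScanningIndex` class are hypothetical and much smaller than Spark's own file index interfaces:

```scala
// Hypothetical, simplified index interface (NOT Spark's real one).
trait SimpleFileIndex {
  def rootPaths: Seq[String]
  def listFiles(): Seq[String]
}

// Placeholder index: usable while resolving a table definition, since it
// exposes the root paths, but any attempt to scan throws.
final class NonScanningIndex(val rootPaths: Seq[String]) extends SimpleFileIndex {
  def listFiles(): Seq[String] =
    throw new UnsupportedOperationException("this index does not support scans")
}
```

Throwing from `listFiles()` makes an accidental scan show up as an immediate failure rather than as a silent, expensive directory listing.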

2016-12-02 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16090 My main concern is that, in `CreateDataSourceTableCommand`, we call `DataSource.resolveRelation` to infer the schema and partition columns. At that time, the table is not created yet, so

2016-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16090 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69536/

2016-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16090 Merged build finished. Test PASSed.

2016-12-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16090 **[Test build #69536 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69536/consoleFull)** for PR 16090 at commit

2016-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16090 Merged build finished. Test FAILed.

2016-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16090 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69534/

2016-12-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16090 **[Test build #69534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69534/consoleFull)** for PR 16090 at commit

2016-12-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16090 **[Test build #69536 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69536/consoleFull)** for PR 16090 at commit

2016-12-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16090 **[Test build #69534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69534/consoleFull)** for PR 16090 at commit

2016-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16090 Merged build finished. Test FAILed.

2016-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16090 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69514/

2016-12-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16090 **[Test build #69514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69514/consoleFull)** for PR 16090 at commit

2016-12-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16090 **[Test build #69514 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69514/consoleFull)** for PR 16090 at commit

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16090 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69437/

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16090 Merged build finished. Test FAILed.

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16090 **[Test build #69437 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69437/consoleFull)** for PR 16090 at commit

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16090 **[Test build #69437 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69437/consoleFull)** for PR 16090 at commit

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16090 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69435/

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16090 **[Test build #69435 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69435/consoleFull)** for PR 16090 at commit

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16090 Merged build finished. Test FAILed.

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16090 **[Test build #69435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69435/consoleFull)** for PR 16090 at commit

2016-11-30 Thread ericl
Github user ericl commented on the issue: https://github.com/apache/spark/pull/16090 @rxin