Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/16938
thanks, merging to master!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16938
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73680/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16938
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16938
**[Test build #73680 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73680/testReport)**
for PR 16938 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16938
**[Test build #73680 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73680/testReport)**
for PR 16938 at commit
Github user windpiger commented on the issue:
https://github.com/apache/spark/pull/16938
I am modifying the hacky code
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16938
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73648/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16938
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16938
**[Test build #73648 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73648/testReport)**
for PR 16938 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16938
**[Test build #73648 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73648/testReport)**
for PR 16938 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16938
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73634/
Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16938
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16938
**[Test build #73634 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73634/testReport)**
for PR 16938 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16938
**[Test build #73634 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73634/testReport)**
for PR 16938 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16938
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73587/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16938
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16938
**[Test build #73587 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73587/testReport)**
for PR 16938 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16938
**[Test build #73587 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73587/testReport)**
for PR 16938 at commit
Github user windpiger commented on the issue:
https://github.com/apache/spark/pull/16938
@gatorsmile @cloud-fan could you help review this PR? Thanks :)
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16938
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16938
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73424/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16938
**[Test build #73424 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73424/testReport)**
for PR 16938 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16938
**[Test build #73424 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73424/testReport)**
for PR 16938 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16938
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73411/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16938
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16938
**[Test build #73411 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73411/testReport)**
for PR 16938 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16938
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73399/
Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16938
Build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16938
**[Test build #73411 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73411/testReport)**
for PR 16938 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16938
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16938
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73405/
Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16938
**[Test build #73405 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73405/testReport)**
for PR 16938 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16938
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73400/
Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16938
Merged build finished. Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16938
**[Test build #73400 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73400/testReport)**
for PR 16938 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16938
**[Test build #73405 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73405/testReport)**
for PR 16938 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16938
**[Test build #73400 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73400/testReport)**
for PR 16938 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16938
**[Test build #73399 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73399/testReport)**
for PR 16938 at commit
Github user windpiger commented on the issue:
https://github.com/apache/spark/pull/16938
@gatorsmile I have tested it for `partition path exists`; the result is
still the same as for `table path exists`.
**2. CREATE TABLE ... PARTITIONED BY ... LOCATION path AS SELECT ...**
a)
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/16938
@tejasapatil Spark doesn't need to be exactly the same as Hive. We follow
Hive's behavior if it's reasonable, or use our own logic if Hive's behavior
doesn't make sense.
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/16938
I looked into the code. It looks like that version is only used for picking the
Hive shim and metastore interactions and has nothing to do with the semantics of
SQL operations. So you are most likely
Github user windpiger commented on the issue:
https://github.com/apache/spark/pull/16938
@tejasapatil In my opinion, testing in Hive 2.0.0 is just a comparison with
Spark; the goal is to decide these behaviors in Spark, not to be consistent
with Hive 2.0.0 or Hive 1.2.1, isn't it?
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/16938
@windpiger: I realised that you are checking the Hive behavior against
Hive 2.0.0. Spark is expected to support the semantics of Hive 1.2.1:
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16938
Thank you for your work!
Maybe the last question.
```
**2. CREATE TABLE ...PARTITIONED BY ... LOCATION path AS SELECT ...**
a) path exists
hive(external) ->
Github user windpiger commented on the issue:
https://github.com/apache/spark/pull/16938
oh, you are right~ thanks!
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16938
Basically, the rules you proposed are:
- When users specify the location in CT or CTAS (i.e., creating an external
table), we should create a new directory if it does not exist, or overwrite
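A minimal Python sketch of the first half of this proposed rule (create the LOCATION directory if it does not exist yet). It is not Spark's catalog code: `prepare_external_location` is a hypothetical helper invented for illustration, and the truncated "or overwrite" branch is intentionally not modeled.

```python
import os
import tempfile


def prepare_external_location(path: str) -> bool:
    """Hypothetical helper: make sure an external table's LOCATION directory exists.

    Returns True if the directory had to be created, False if it already existed.
    """
    if os.path.isdir(path):
        # The location already exists: reuse it as-is.
        return False
    # The location is missing: create it, including any missing parents.
    # (If `path` exists as a regular file, os.makedirs raises FileExistsError.)
    os.makedirs(path)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        loc = os.path.join(root, "warehouse", "t1")
        print(prepare_external_location(loc))  # created on first call
        print(prepare_external_location(loc))  # reused on second call
```

Returning a flag rather than nothing keeps the "created vs. reused" distinction visible, which is the case the discussion is trying to pin down.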
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16938
I found you also changed the following cases:
**4. CREATE TABLE**
**5. CREATE TABLE ... AS SELECT ...**
Actually, they are managed tables. You do not need to update them. Can
Github user windpiger commented on the issue:
https://github.com/apache/spark/pull/16938
@cloud-fan @gatorsmile @tejasapatil As discussed above, we have three
cases to decide:
# 1. CREATE TABLE ... (PARTITIONED BY ...) LOCATION path
*situation: path does not exist*
Github user windpiger commented on the issue:
https://github.com/apache/spark/pull/16938
@gatorsmile sorry, I made a mistake here; I have updated the comparison
test above.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16938
Based on [the
doc](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect(CTAS)),
Hive does not support CTAS when the target table is external.
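The Hive restriction cited above can be pictured as a validation step run before a CTAS executes. This is a hypothetical sketch, not Hive's or Spark's code; `validate_ctas_target` and `CtasNotSupportedError` are names invented for the example.

```python
class CtasNotSupportedError(Exception):
    """Hypothetical error type used only for this illustration."""


def validate_ctas_target(is_external: bool) -> None:
    """Reject a CTAS whose target table is external, per the Hive DDL manual."""
    if is_external:
        raise CtasNotSupportedError("CTAS target table cannot be an external table")


if __name__ == "__main__":
    validate_ctas_target(is_external=False)  # managed target: accepted
    try:
        validate_ctas_target(is_external=True)
    except CtasNotSupportedError as e:
        print(f"rejected: {e}")
```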
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/16938
@windpiger yes for both questions.
Github user windpiger commented on the issue:
https://github.com/apache/spark/pull/16938
@cloud-fan
Situation 2, CREATE TABLE ... (PARTITIONED BY ...) LOCATION path AS SELECT
...,
behaves differently when `path exists`, which is what this PR is going to resolve. It is ok
to make it consist
Github user windpiger commented on the issue:
https://github.com/apache/spark/pull/16938
@tejasapatil
* `throw exception` is the result of the test; it really happens on the
current Spark master branch.
* Hive CTAS does not support partitioned tables.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/16938
> CREATE TABLE ... (PARTITIONED BY ...) LOCATION path
I think hive's behavior makes more sense. Users may wanna insert data to
this table and put the data in a specified location, even it
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/16938
@windpiger :
- What does `throw exception(...)` mean? Is the operation supported or not?
It might throw an exception, but the operation itself might still have happened.
- For the 2nd point, you said
Github user windpiger commented on the issue:
https://github.com/apache/spark/pull/16938
@gatorsmile I have tested all the cases above and updated them. The results show that
Spark data source tables with HiveExternalCatalog and InMemoryCatalog
behave the same.
spark for hive
Github user windpiger commented on the issue:
https://github.com/apache/spark/pull/16938
@gatorsmile
Sorry, I forgot to mention that in the tests above, `spark` means a Parquet
table with HiveExternalCatalog, and `hive` means a Hive table in Hive 2.0.0.
I will add hive serde
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16938
Could you check the behaviors for both data source tables and hive serde
tables? Later, we also need to check the behaviors of InMemoryCatalog for data
source tables without enabling Hive
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16938
@windpiger Thank you for your efforts! What you did above needs to be
written as test cases. Could you do it in a separate PR?
In addition, all the cases you tried are only for hive
Github user windpiger commented on the issue:
https://github.com/apache/spark/pull/16938
Compare spark-master branch and hive-2.0.0
**1. CREATE TABLE ... PARTITIONED BY ... LOCATION path**
```
a) path exists
hive -> ok
spark -> ok
b) path not
Github user windpiger commented on the issue:
https://github.com/apache/spark/pull/16938
**1. CREATE TABLE ... LOCATION path**
```
a) path exists
hive -> ok
spark -> ok
b) path not exists
hive -> ok
spark -> throw exception(path
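One way to keep a Hive-vs-Spark comparison like this checkable is to record the reported outcomes as data and compute where the two engines disagree. The sketch below uses only the two outcomes visible in the comment above (the Spark error message is truncated in the archive, so it is recorded only as "throw exception"); `REPORTED` and `mismatches` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Outcomes as reported above, keyed by (statement, situation).
REPORTED: Dict[Tuple[str, str], Dict[str, str]] = {
    ("CREATE TABLE ... LOCATION path", "path exists"): {"hive": "ok", "spark": "ok"},
    ("CREATE TABLE ... LOCATION path", "path not exists"): {"hive": "ok", "spark": "throw exception"},
}


def mismatches(results: Dict[Tuple[str, str], Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return the (statement, situation) pairs where Hive and Spark disagree."""
    return [key for key, outcome in results.items() if outcome["hive"] != outcome["spark"]]


if __name__ == "__main__":
    for statement, situation in mismatches(REPORTED):
        print(f"{statement} / {situation}")
```

Further rows (the partitioned and CTAS cases) could be appended the same way as their results are confirmed.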
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16938
One more case:
5. `CREATE TABLE` or `CTAS` without the location spec: if the default path
exists, should we succeed or fail?
After we finish the TABLE-level DDLs, we also need to
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/16938
ok let's discuss it case by case:
1. `CREATE TABLE ... LOCATION path` works if path exists, it's expected
2. `CREATE TABLE ... LOCATION path` fails if path doesn't exist, is it
Github user windpiger commented on the issue:
https://github.com/apache/spark/pull/16938
@cloud-fan @gatorsmile @tejasapatil let's discuss this together?
Github user windpiger commented on the issue:
https://github.com/apache/spark/pull/16938
I think that in CTAS, an existing table is not allowed, but there is no
restriction on whether the path exists. In DataFrameWriter.save with ErrorIfExists
mode, an existing path is not allowed.
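The distinction drawn above, as windpiger describes it, can be modeled with two checks. This is an illustrative sketch of that description, not Spark's implementation; `check_ctas`, `check_save_error_if_exists`, and `AnalysisError` are hypothetical names (`AnalysisError` is a stand-in, not Spark's `AnalysisException`).

```python
class AnalysisError(Exception):
    """Hypothetical stand-in error type for this illustration."""


def check_ctas(table_exists: bool, path_exists: bool) -> None:
    """CTAS as described: an existing table is rejected."""
    if table_exists:
        raise AnalysisError("table already exists")
    # path_exists is deliberately not checked: per the description,
    # CTAS places no restriction on an existing path.


def check_save_error_if_exists(path_exists: bool) -> None:
    """DataFrameWriter.save in ErrorIfExists mode: an existing path is rejected."""
    if path_exists:
        raise AnalysisError("path already exists")
```

The asymmetry is the point of the comment: the same existing directory passes the CTAS check but fails the `save` check.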
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/16938
From what I understand, this change is applicable for EXTERNAL tables only.
There are two main uses of EXTERNAL tables I am aware of (repost from
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/16938
We need to define a consistent rule in Catalog for how to handle the scenario
where the to-be-created directory already exists. So far, in most DDL scenarios,
when trying to create a directory but it
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/16938
I don't think we should treat it as a bug just because hive supports it, we
should think more. Does it make sense to specify an existing directory in CTAS?
Github user windpiger commented on the issue:
https://github.com/apache/spark/pull/16938
cc @gatorsmile @cloud-fan
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16938
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72932/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/16938
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16938
**[Test build #72932 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72932/testReport)**
for PR 16938 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/16938
**[Test build #72932 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72932/testReport)**
for PR 16938 at commit