[GitHub] [iceberg] openinx commented on a change in pull request #1197: Refactor the GenericOrcWriter by using OrcSchemaWithTypeVisitor#visit

2020-07-21 Thread GitBox
openinx commented on a change in pull request #1197: URL: https://github.com/apache/iceberg/pull/1197#discussion_r458510725 ## File path: orc/src/main/java/org/apache/iceberg/orc/OrcValueWriter.java ## @@ -19,20 +19,28 @@ package org.apache.iceberg.orc; -import java.io.IOE

[GitHub] [iceberg] JingsongLi commented on issue #1215: FlinkParquetWriter should build writer with schema visitor

2020-07-21 Thread GitBox
JingsongLi commented on issue #1215: URL: https://github.com/apache/iceberg/issues/1215#issuecomment-662196003 > we can have one for the external Row as well, but if Flink can already convert from RowData to Row, I would like to try to use those conversions instead. Yes, Flink `Data

[GitHub] [iceberg] openinx commented on a change in pull request #1158: Flink: Add Orc value reader, writer implementations

2020-07-21 Thread GitBox
openinx commented on a change in pull request #1158: URL: https://github.com/apache/iceberg/pull/1158#discussion_r458494939 ## File path: data/src/main/java/org/apache/iceberg/data/orc/BaseOrcReader.java ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] [iceberg] sudssf commented on a change in pull request #1221: Spark: Fix estimateStatistics when called without filters

2020-07-21 Thread GitBox
sudssf commented on a change in pull request #1221: URL: https://github.com/apache/iceberg/pull/1221#discussion_r458455773 ## File path: site/docs/configuration.md ## @@ -109,14 +110,14 @@ spark.read .table("catalog.db.table") ``` -| Spark option| Default

[GitHub] [iceberg] sudssf commented on a change in pull request #1221: Spark: Fix estimateStatistics when called without filters

2020-07-21 Thread GitBox
sudssf commented on a change in pull request #1221: URL: https://github.com/apache/iceberg/pull/1221#discussion_r458455594 ## File path: site/docs/configuration.md ## @@ -109,14 +110,14 @@ spark.read .table("catalog.db.table") ``` -| Spark option| Default

[GitHub] [iceberg] HotSushi commented on a change in pull request #1192: Implement Hive input format

2020-07-21 Thread GitBox
HotSushi commented on a change in pull request #1192: URL: https://github.com/apache/iceberg/pull/1192#discussion_r458440578 ## File path: mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergInputFormat.java ## @@ -0,0 +1,169 @@ +/* + * Licensed to the Apache Software F

[GitHub] [iceberg] aokolnychyi commented on issue #1224: Importing a Spark table with Whitespace in Partition URI Results in FileNotFound Exception

2020-07-21 Thread GitBox
aokolnychyi commented on issue #1224: URL: https://github.com/apache/iceberg/issues/1224#issuecomment-662133811 We would need to check the remove orphan files action in a follow-up too to make sure we are safe. This is an au

[GitHub] [iceberg] raptond commented on issue #1224: Importing a Spark table with Whitespace in Partition URI Results in FileNotFound Exception

2020-07-21 Thread GitBox
raptond commented on issue #1224: URL: https://github.com/apache/iceberg/issues/1224#issuecomment-662130198 👍 to keep the paths as `URI` objects and not convert to` String`s. I thought perhaps we should `unescape`, but it seems a better idea to delegate that to `Path`.

[GitHub] [iceberg] skambha opened a new pull request #1226: Fix TestHiveTableConcurrency test failure: Increase the timeout to 3min

2020-07-21 Thread GitBox
skambha opened a new pull request #1226: URL: https://github.com/apache/iceberg/pull/1226 In my local env, this test testConcurrentConnections fails with a timeout. ``` org.apache.iceberg.hive.TestHiveTableConcurrency > testConcurrentConnections FAILED java.lang.Assertio

[GitHub] [iceberg] RussellSpitzer edited a comment on issue #1224: Importing a Spark table with Whitespace in Partition URI Results in FileNotFound Exception

2020-07-21 Thread GitBox
RussellSpitzer edited a comment on issue #1224: URL: https://github.com/apache/iceberg/issues/1224#issuecomment-662112918 ```scala scala> import java.net.URI import java.net.URI scala> import org.apache.hadoop.fs.Path import org.apache.hadoop.fs.Path scala> val uri = n

[GitHub] [iceberg] RussellSpitzer edited a comment on issue #1224: Importing a Spark table with Whitespace in Partition URI Results in FileNotFound Exception

2020-07-21 Thread GitBox
RussellSpitzer edited a comment on issue #1224: URL: https://github.com/apache/iceberg/issues/1224#issuecomment-662111528 Yes. Basically the constructor `public Path(String pathString)` will just take apart the string as is without encoding and attempts to break it into the correct parts.

[GitHub] [iceberg] skambha opened a new issue #1225: Test TestHiveTableConcurrency.testConcurrentConnections test fails with a timeout assertion error

2020-07-21 Thread GitBox
skambha opened a new issue #1225: URL: https://github.com/apache/iceberg/issues/1225 In my local env, this test testConcurrentConnections fails with a timeout. ``` org.apache.iceberg.hive.TestHiveTableConcurrency > testConcurrentConnections FAILED java.lang.AssertionErro

[GitHub] [iceberg] RussellSpitzer edited a comment on issue #1224: Importing a Spark table with Whitespace in Partition URI Results in FileNotFound Exception

2020-07-21 Thread GitBox
RussellSpitzer edited a comment on issue #1224: URL: https://github.com/apache/iceberg/issues/1224#issuecomment-662112918 scala> import java.net.URI import java.net.URI scala> import org.apache.hadoop.fs.Path import org.apache.hadoop.fs.Path scala> val uri = new URI("file

[GitHub] [iceberg] RussellSpitzer edited a comment on issue #1224: Importing a Spark table with Whitespace in Partition URI Results in FileNotFound Exception

2020-07-21 Thread GitBox
RussellSpitzer edited a comment on issue #1224: URL: https://github.com/apache/iceberg/issues/1224#issuecomment-662112918 ```scala> new URI("file:///has%20spaces") res3: java.net.URI = file:///has%20spaces scala> val uri = new URI("file:///has%20spaces") uri: java.net.URI = file

[GitHub] [iceberg] RussellSpitzer commented on issue #1224: Importing a Spark table with Whitespace in Partition URI Results in FileNotFound Exception

2020-07-21 Thread GitBox
RussellSpitzer commented on issue #1224: URL: https://github.com/apache/iceberg/issues/1224#issuecomment-662112918 ```scala> new URI("file:///has%20spaces") res3: java.net.URI = file:///has%20spaces scala> val uri = new URI("file:///has%20spaces") uri: java.net.URI = file:///has

[GitHub] [iceberg] RussellSpitzer commented on issue #1224: Importing a Spark table with Whitespace in Partition URI Results in FileNotFound Exception

2020-07-21 Thread GitBox
RussellSpitzer commented on issue #1224: URL: https://github.com/apache/iceberg/issues/1224#issuecomment-662111528 Yes. Basically the constructor `public Path(String pathString) throws IllegalArgumentException` is just take apart string as is and attempts to break it into the correct parts

[GitHub] [iceberg] aokolnychyi commented on issue #1224: Importing a Spark table with Whitespace in Partition URI Results in FileNotFound Exception

2020-07-21 Thread GitBox
aokolnychyi commented on issue #1224: URL: https://github.com/apache/iceberg/issues/1224#issuecomment-662101559 Is it because `URI.toString` that we are using does not decode the URI correctly? We later create `Path` from `String`? Spark has `CatalogUtils` that handles `URI` to `String` co

[GitHub] [iceberg] RussellSpitzer opened a new issue #1224: Importing a Spark table with Whitespace in Partition URI Results in FileNotFound Exception

2020-07-21 Thread GitBox
RussellSpitzer opened a new issue #1224: URL: https://github.com/apache/iceberg/issues/1224 Because SparkPartition stores partition URI's as a String there is an issue when reconverting the URI into a Hadoop path if the String contains URI encoded characters (like a space -> %20). This end

[GitHub] [iceberg] massdosage commented on pull request #1192: Implement Hive input format

2020-07-21 Thread GitBox
massdosage commented on pull request #1192: URL: https://github.com/apache/iceberg/pull/1192#issuecomment-662058640 @guilload OK, thanks, let me know what you find. It has made me think we should probably have this mr module produce an "uber jar" so for the Hive use case you just have to a

[GitHub] [iceberg] guilload commented on pull request #1192: Implement Hive input format

2020-07-21 Thread GitBox
guilload commented on pull request #1192: URL: https://github.com/apache/iceberg/pull/1192#issuecomment-662056963 @massdosage, yes I have managed to use the input format in a "real" Hive client. I'm taking a shortcut though, I just add `iceberg-mr` to the `iceberg-spark-runtime` subproject

[GitHub] [iceberg] aokolnychyi commented on pull request #1223: Adds Tests for Untested RemoveSnapshot Pathway

2020-07-21 Thread GitBox
aokolnychyi commented on pull request #1223: URL: https://github.com/apache/iceberg/pull/1223#issuecomment-662054831 Thanks, @RussellSpitzer! This is an automated message from the Apache Git Service. To respond to the message

[GitHub] [iceberg] aokolnychyi merged pull request #1223: Adds Tests for Untested RemoveSnapshot Pathway

2020-07-21 Thread GitBox
aokolnychyi merged pull request #1223: URL: https://github.com/apache/iceberg/pull/1223 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [iceberg] rdblue commented on a change in pull request #1221: Spark: Fix estimateStatistics when called without filters

2020-07-21 Thread GitBox
rdblue commented on a change in pull request #1221: URL: https://github.com/apache/iceberg/pull/1221#discussion_r458276824 ## File path: site/docs/configuration.md ## @@ -109,14 +110,14 @@ spark.read .table("catalog.db.table") ``` -| Spark option| Default

[GitHub] [iceberg] sudssf commented on a change in pull request #1221: Spark: Fix estimateStatistics when called without filters

2020-07-21 Thread GitBox
sudssf commented on a change in pull request #1221: URL: https://github.com/apache/iceberg/pull/1221#discussion_r458266469 ## File path: site/docs/configuration.md ## @@ -109,14 +110,14 @@ spark.read .table("catalog.db.table") ``` -| Spark option| Default

[GitHub] [iceberg] rdblue commented on a change in pull request #1213: Abstract the generic task writers for sharing the common codes between spark and flink

2020-07-21 Thread GitBox
rdblue commented on a change in pull request #1213: URL: https://github.com/apache/iceberg/pull/1213#discussion_r458265385 ## File path: core/src/main/java/org/apache/iceberg/taskio/FileAppenderFactory.java ## @@ -0,0 +1,29 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [iceberg] rdblue commented on issue #1215: FlinkParquetWriter should build writer with schema visitor

2020-07-21 Thread GitBox
rdblue commented on issue #1215: URL: https://github.com/apache/iceberg/issues/1215#issuecomment-661994888 Thanks for the clarification on the types that Flink uses. I agree that we should create readers and writers that work directly with these types, instead of trying to make the I

[GitHub] [iceberg] rdblue commented on a change in pull request #1158: Flink: Add Orc value reader, writer implementations

2020-07-21 Thread GitBox
rdblue commented on a change in pull request #1158: URL: https://github.com/apache/iceberg/pull/1158#discussion_r458260259 ## File path: data/src/main/java/org/apache/iceberg/data/orc/BaseOrcReader.java ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (AS

[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #1223: Adds Tests for Untested RemoveSnapshot Pathway

2020-07-21 Thread GitBox
RussellSpitzer commented on a change in pull request #1223: URL: https://github.com/apache/iceberg/pull/1223#discussion_r458253047 ## File path: core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java ## @@ -356,6 +356,74 @@ public void testRetainZeroSnapshots() {

[GitHub] [iceberg] aokolnychyi commented on a change in pull request #1223: Adds Tests for Untested RemoveSnapshot Pathway

2020-07-21 Thread GitBox
aokolnychyi commented on a change in pull request #1223: URL: https://github.com/apache/iceberg/pull/1223#discussion_r458249862 ## File path: core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java ## @@ -356,6 +356,74 @@ public void testRetainZeroSnapshots() {

[GitHub] [iceberg] rymurr commented on pull request #1216: [Python] create table support

2020-07-21 Thread GitBox
rymurr commented on pull request #1216: URL: https://github.com/apache/iceberg/pull/1216#issuecomment-661981645 > I merged #1214. Could you please rebase this one? Thanks! :+1: rebased and ready for review! This is an

[GitHub] [iceberg] rdblue commented on a change in pull request #1221: Spark: Fix estimateStatistics when called without filters

2020-07-21 Thread GitBox
rdblue commented on a change in pull request #1221: URL: https://github.com/apache/iceberg/pull/1221#discussion_r458240257 ## File path: spark/src/main/java/org/apache/iceberg/spark/source/ReaderUtils.java ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] [iceberg] rdblue commented on a change in pull request #1221: Spark: Fix estimateStatistics when called without filters

2020-07-21 Thread GitBox
rdblue commented on a change in pull request #1221: URL: https://github.com/apache/iceberg/pull/1221#discussion_r458241058 ## File path: spark/src/main/java/org/apache/iceberg/spark/source/ReaderUtils.java ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] [iceberg] rdblue commented on a change in pull request #1221: Spark: Fix estimateStatistics when called without filters

2020-07-21 Thread GitBox
rdblue commented on a change in pull request #1221: URL: https://github.com/apache/iceberg/pull/1221#discussion_r458241058 ## File path: spark/src/main/java/org/apache/iceberg/spark/source/ReaderUtils.java ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] [iceberg] rdblue commented on a change in pull request #1221: Spark: Fix estimateStatistics when called without filters

2020-07-21 Thread GitBox
rdblue commented on a change in pull request #1221: URL: https://github.com/apache/iceberg/pull/1221#discussion_r458240257 ## File path: spark/src/main/java/org/apache/iceberg/spark/source/ReaderUtils.java ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] [iceberg] rdblue commented on a change in pull request #1221: Spark: Fix estimateStatistics when called without filters

2020-07-21 Thread GitBox
rdblue commented on a change in pull request #1221: URL: https://github.com/apache/iceberg/pull/1221#discussion_r458239096 ## File path: spark/src/main/java/org/apache/iceberg/spark/source/ReaderUtils.java ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] [iceberg] rdblue commented on a change in pull request #1221: Spark: Fix estimateStatistics when called without filters

2020-07-21 Thread GitBox
rdblue commented on a change in pull request #1221: URL: https://github.com/apache/iceberg/pull/1221#discussion_r458238571 ## File path: spark/src/main/java/org/apache/iceberg/spark/source/ReaderUtils.java ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] [iceberg] rdblue commented on a change in pull request #1221: Spark: Fix estimateStatistics when called without filters

2020-07-21 Thread GitBox
rdblue commented on a change in pull request #1221: URL: https://github.com/apache/iceberg/pull/1221#discussion_r458238870 ## File path: spark/src/main/java/org/apache/iceberg/spark/source/ReaderUtils.java ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] [iceberg] rdblue commented on a change in pull request #1221: Spark: Fix estimateStatistics when called without filters

2020-07-21 Thread GitBox
rdblue commented on a change in pull request #1221: URL: https://github.com/apache/iceberg/pull/1221#discussion_r458238324 ## File path: site/docs/configuration.md ## @@ -109,14 +110,14 @@ spark.read .table("catalog.db.table") ``` -| Spark option| Default

[GitHub] [iceberg] rdblue commented on a change in pull request #1197: Refactor the GenericOrcWriter by using OrcSchemaWithTypeVisitor#visit

2020-07-21 Thread GitBox
rdblue commented on a change in pull request #1197: URL: https://github.com/apache/iceberg/pull/1197#discussion_r458234776 ## File path: orc/src/main/java/org/apache/iceberg/orc/OrcValueWriter.java ## @@ -19,20 +19,28 @@ package org.apache.iceberg.orc; -import java.io.IOEx

[GitHub] [iceberg] rdblue commented on a change in pull request #1197: Refactor the GenericOrcWriter by using OrcSchemaWithTypeVisitor#visit

2020-07-21 Thread GitBox
rdblue commented on a change in pull request #1197: URL: https://github.com/apache/iceberg/pull/1197#discussion_r458233957 ## File path: orc/src/main/java/org/apache/iceberg/orc/OrcFileAppender.java ## @@ -51,12 +51,12 @@ private final OutputFile file; private final Write

[GitHub] [iceberg] rdblue commented on a change in pull request #1197: Refactor the GenericOrcWriter by using OrcSchemaWithTypeVisitor#visit

2020-07-21 Thread GitBox
rdblue commented on a change in pull request #1197: URL: https://github.com/apache/iceberg/pull/1197#discussion_r458233371 ## File path: data/src/test/java/org/apache/iceberg/data/TestMetricsRowGroupFilterTypes.java ## @@ -180,7 +180,7 @@ public void createOrcInputFile(List re

[GitHub] [iceberg] rdblue commented on a change in pull request #1197: Refactor the GenericOrcWriter by using OrcSchemaWithTypeVisitor#visit

2020-07-21 Thread GitBox
rdblue commented on a change in pull request #1197: URL: https://github.com/apache/iceberg/pull/1197#discussion_r458230021 ## File path: data/src/main/java/org/apache/iceberg/data/orc/GenericOrcWriters.java ## @@ -0,0 +1,418 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [iceberg] rdblue commented on a change in pull request #1197: Refactor the GenericOrcWriter by using OrcSchemaWithTypeVisitor#visit

2020-07-21 Thread GitBox
rdblue commented on a change in pull request #1197: URL: https://github.com/apache/iceberg/pull/1197#discussion_r458230021 ## File path: data/src/main/java/org/apache/iceberg/data/orc/GenericOrcWriters.java ## @@ -0,0 +1,418 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [iceberg] rdblue commented on a change in pull request #1197: Refactor the GenericOrcWriter by using OrcSchemaWithTypeVisitor#visit

2020-07-21 Thread GitBox
rdblue commented on a change in pull request #1197: URL: https://github.com/apache/iceberg/pull/1197#discussion_r458230021 ## File path: data/src/main/java/org/apache/iceberg/data/orc/GenericOrcWriters.java ## @@ -0,0 +1,418 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [iceberg] rdblue commented on pull request #1216: [Python] create table support

2020-07-21 Thread GitBox
rdblue commented on pull request #1216: URL: https://github.com/apache/iceberg/pull/1216#issuecomment-661956214 I merged #1214. Could you please rebase this one? Thanks! This is an automated message from the Apache Git Servic

[GitHub] [iceberg] rdblue commented on a change in pull request #1214: [Python] Minor fixes for tests

2020-07-21 Thread GitBox
rdblue commented on a change in pull request #1214: URL: https://github.com/apache/iceberg/pull/1214#discussion_r458219274 ## File path: python/tests/api/test_helpers.py ## @@ -74,7 +74,7 @@ def predicate(self, pred): return None -class TestDataFile(DataFile):

[GitHub] [iceberg] rdblue merged pull request #1214: [Python] Minor fixes for tests

2020-07-21 Thread GitBox
rdblue merged pull request #1214: URL: https://github.com/apache/iceberg/pull/1214 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

[GitHub] [iceberg] RussellSpitzer opened a new pull request #1223: Adds Tests for Untested RemoveSnapshot Pathway

2020-07-21 Thread GitBox
RussellSpitzer opened a new pull request #1223: URL: https://github.com/apache/iceberg/pull/1223 This is an easy PR, no Prod code changes, just improves test coverage :) Previously none of the tests in TestRemoveSnapshots used the code path for valid snapshots which contained refer

[GitHub] [iceberg] RussellSpitzer commented on pull request #1223: Adds Tests for Untested RemoveSnapshot Pathway

2020-07-21 Thread GitBox
RussellSpitzer commented on pull request #1223: URL: https://github.com/apache/iceberg/pull/1223#issuecomment-661948242 cc @aokolnychyi This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [iceberg] openinx commented on a change in pull request #1145: Implement the flink stream writer to accept the row data and emit the complete data files event to downstream

2020-07-21 Thread GitBox
openinx commented on a change in pull request #1145: URL: https://github.com/apache/iceberg/pull/1145#discussion_r458059875 ## File path: core/src/main/java/org/apache/iceberg/BaseFile.java ## @@ -360,7 +360,7 @@ public ByteBuffer keyMetadata() { if (list != null) {

[GitHub] [iceberg] openinx commented on a change in pull request #1145: Implement the flink stream writer to accept the row data and emit the complete data files event to downstream

2020-07-21 Thread GitBox
openinx commented on a change in pull request #1145: URL: https://github.com/apache/iceberg/pull/1145#discussion_r458045368 ## File path: core/src/main/java/org/apache/iceberg/BaseFile.java ## @@ -360,7 +360,7 @@ public ByteBuffer keyMetadata() { if (list != null) {

[GitHub] [iceberg] xhochy commented on a change in pull request #1214: [Python] Minor fixes for tests

2020-07-21 Thread GitBox
xhochy commented on a change in pull request #1214: URL: https://github.com/apache/iceberg/pull/1214#discussion_r457953657 ## File path: python/tests/api/test_helpers.py ## @@ -74,7 +74,7 @@ def predicate(self, pred): return None -class TestDataFile(DataFile):