Re: Write reliability in Iceberg

2020-01-28 Thread suds
We referred to https://iceberg.incubator.apache.org/custom-catalog/ and implemented the atomic commit operation using DynamoDB optimistic locking. The Iceberg codebase has an excellent test case to validate a custom implementation: https://github.com/apache/incubator-iceberg/blob/master/hive/src/test/java/org/apac…
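The core of the approach above is a compare-and-swap on the table's metadata pointer. A minimal sketch of that optimistic-locking idea, simulated with an in-memory dict instead of a real DynamoDB conditional write; the names (`CatalogStore`, `commit`) are illustrative, not Iceberg's or DynamoDB's API:

```python
class CommitFailedException(Exception):
    """Raised when another writer updated the metadata pointer first."""

class CatalogStore:
    """In-memory stand-in for a DynamoDB-backed catalog table."""

    def __init__(self):
        # table name -> {"location": metadata file location, "version": int}
        self._items = {}

    def load(self, table):
        return self._items.get(table)

    def commit(self, table, expected_location, new_location):
        """Atomically swap the metadata pointer, conditional-put style:
        the write succeeds only if the stored location still equals the
        one this writer read before producing its new metadata file."""
        current = self._items.get(table)
        current_location = current["location"] if current else None
        if current_location != expected_location:
            raise CommitFailedException(
                f"concurrent update: expected {expected_location}, "
                f"found {current_location}")
        version = (current["version"] + 1) if current else 1
        self._items[table] = {"location": new_location, "version": version}
        return version
```

A writer holding a stale `expected_location` fails its commit and must retry from the refreshed pointer, which is exactly the behavior the Iceberg catalog test suite exercises.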

help/suggestions for schema reordering issues

2020-01-22 Thread suds
Our team has created a PR to address schema reordering issues. Is there a better solution/guideline for schema ordering? https://github.com/apache/incubator-iceberg/pull/745 Context: we have an incoming Avro payload; when the field ordering of the Avro schema is out of sync with Iceberg, writing to the table fails because of com…
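The mismatch described above is purely positional: the same fields exist in both schemas, just in a different order. A hedged sketch of the workaround idea, reordering incoming records by name to match a target schema's field order (illustrative helper, not Iceberg's API; assumes field names match 1:1):

```python
def reorder_record(record, target_field_names):
    """Return a new dict whose keys follow the target schema's order.

    record: incoming payload as a name -> value mapping.
    target_field_names: field names in the table schema's order.
    """
    missing = [name for name in target_field_names if name not in record]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    # Python 3.7+ dicts preserve insertion order, so this rebuilds the
    # record in the target schema's field order.
    return {name: record[name] for name in target_field_names}
```

Resolving fields by name rather than position is also how Iceberg's own schema evolution is designed to work, which is why a name-based reorder step upstream of the writer is a natural fix.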

Datasource v2 some task metrics are missing?

2019-12-09 Thread suds
We are collecting a bunch of metrics from Spark sources. Recently we switched some of the tables to Iceberg and noticed that data is not available for some of the metrics. The following metrics always show count = 0: bytesRead: 0, recordsRead: 0, bytesWritten: 0, recordsWritten: 0, diskBytesSpilled: 0…

Re: passing clustering spec to datasource v2

2019-12-04 Thread suds
> https://github.com/apache/incubator-iceberg/issues/430#issuecomment-533360026 > > > > On 26 Nov 2019, at 21:17, suds wrote: > > > > I looked at the open issue and discussion around the sort spec > https://github.com/apache/incubator-iceberg/issues/317 > > > > for…

passing clustering spec to datasource v2

2019-11-26 Thread suds
I looked at the open issue and discussion around the sort spec https://github.com/apache/incubator-iceberg/issues/317. For now we have added a sort spec external to Iceberg and made it work by adding additional logic to sort the dataframe before writing to the Iceberg table (it's a hack until the above issue gets resolv…
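The workaround above boils down to applying an externally stored sort spec to the data just before the write. A minimal sketch of that step with plain Python lists standing in for a Spark DataFrame (column names and the `sort_spec` shape are illustrative assumptions, not Iceberg's sort-spec format):

```python
def apply_sort_spec(rows, sort_spec):
    """Sort rows by a spec of (column, ascending) pairs, left to right.

    rows: list of dict records.
    sort_spec: e.g. [("event_date", True), ("id", True)].
    """
    # Sort by the least-significant key first; because Python's sort is
    # stable, earlier keys in the spec end up dominating the order.
    for column, ascending in reversed(sort_spec):
        rows = sorted(rows, key=lambda r: r[column], reverse=not ascending)
    return rows
```

In Spark itself the equivalent pre-write step would be something like `df.sortWithinPartitions(...)` with the spec's columns, so files are written in clustered order without a full shuffle.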

Re: Understanding Iceberg's dependency configuration

2019-10-10 Thread suds
…son. That way, it > doesn't conflict with versions used by Spark. Are you using the > iceberg-spark-runtime build? > > rb > > On Thu, Oct 10, 2019 at 2:24 PM suds wrote: > >> I am also seeing issues when using the master branch with Spark v2.4.0+ >> Caused by: com…

Re: Understanding Iceberg's dependency configuration

2019-10-10 Thread suds
I am also seeing issues when using the master branch with Spark v2.4.0+: Caused by: com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.8.8 requires Jackson Databind version >= 2.8.0 and < 2.9.0 at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:66…
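The suggestion in the reply above is to depend on the shaded runtime artifact rather than the individual Iceberg modules, so Iceberg's relocated Jackson cannot clash with the Jackson version Spark ships. A hedged build sketch; the artifact name comes from the thread, but the group id and the version placeholder are assumptions to check against the actual release you use:

```groovy
// Shaded runtime jar: Iceberg's Jackson is relocated inside it,
// so it does not conflict with Spark's own Jackson on the classpath.
dependencies {
    implementation 'org.apache.iceberg:iceberg-spark-runtime:<version>'
}
```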

Re: spark partition discovery vs iceberg partition discovery implementation

2019-04-18 Thread suds
…at date ordinal. If you're partitioning by > identity(date_col), it will store date_col. > > When reading data, values from the manifest are used for identity > partition data to avoid extra work materializing the same value for every > row. > > On Thu, Apr 18, 2019 at 8:…

spark partition discovery vs iceberg partition discovery implementation

2019-04-18 Thread suds
I am working on a Spark project and came across an interesting convention Spark uses (it was known in Hive): https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#partition-discovery In Spark, if I partition a dataset, the partition columns do not exist in the Parquet schema and hence are not in the final data file.…
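The convention in question encodes partition column values in `key=value` directory segments of the file path instead of storing them in the Parquet files, and the reader reconstructs those columns from the path at scan time. A small illustrative helper showing the recovery step (not Spark's actual implementation):

```python
def partition_values(path):
    """Extract {column: value} pairs from key=value path segments,
    the Spark/Hive partition-discovery layout."""
    values = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            values[key] = value
    return values
```

Iceberg deliberately avoids this convention: partition values are tracked in table metadata (the manifests), so file layout carries no semantic meaning, though for identity partitions the effect at read time is similar (the value is attached from metadata rather than read from the data file).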

Re: Question about schema evolution in iceberg table

2019-02-20 Thread suds
…cates to evaluate on the partition data in the manifest. > So this is a bug where we haven't passed the current table schema down to > the manifest reader. > > I'll open an issue for it and fix this. Thanks! > > rb > > On Fri, Feb 15, 2019 at 11:34 AM suds wrote:…

Re: Question about schema evolution in iceberg table

2019-02-15 Thread suds
Thanks for the reply, Ryan. I created a gist with a code example: https://gist.github.com/sudssf/e5f2de7463487f98c0a269221bbe0f1a Please let me know if I am not using the API correctly. On Thu, Feb 14, 2019 at 5:38 PM Ryan Blue wrote: > Sudsport, > > I'm wondering if you had the table cached somewhere? Th…