We have referred to https://iceberg.incubator.apache.org/custom-catalog/ and
implemented the atomic operation using DynamoDB optimistic locking. The
Iceberg codebase has an excellent test case to validate a custom implementation:
https://github.com/apache/incubator-iceberg/blob/master/hive/src/test/java/org/apac
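The check-and-set pattern we used can be sketched as follows. This is an illustrative simulation only, not our actual implementation: a plain dict stands in for the DynamoDB table, and all names (`commit`, `catalog`, the key layout) are invented for the example. In real DynamoDB the same compare is expressed as a `ConditionExpression` on `update_item`, which rejects the write atomically if another committer won the race.

```python
# Sketch of optimistic locking for an atomic metadata-pointer swap.
# A dict stands in for the DynamoDB table; names are hypothetical.

class CommitFailedException(Exception):
    """Raised when another writer committed first."""

catalog = {}  # table_name -> {"metadata_location": str, "version": int}

def commit(table_name, expected_location, new_location):
    entry = catalog.get(table_name)
    current = entry["metadata_location"] if entry else None
    # Conditional write: succeed only if nobody moved the pointer.
    if current != expected_location:
        raise CommitFailedException(
            f"expected {expected_location}, found {current}")
    catalog[table_name] = {
        "metadata_location": new_location,
        "version": (entry["version"] + 1) if entry else 0,
    }
```

A stale committer that still holds the old metadata location gets a `CommitFailedException` and must refresh and retry, which is what makes the swap safe under concurrent writers.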
Our team has created a PR to address schema reordering issues. Is there a
better solution/guideline for schema ordering?
https://github.com/apache/incubator-iceberg/pull/745
context:
we have an incoming Avro payload; when the field ordering of the Avro payload
is out of sync with Iceberg, writing to the table fails because of com
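The workaround amounts to realigning incoming records to the table's field order by name before writing. A minimal sketch, assuming hypothetical field names (the helper and schema below are invented for illustration, not the PR's code):

```python
# Realign an out-of-order record to the table schema's field order
# by name, so a differently-ordered Avro payload does not break the
# write. All names here are hypothetical.

table_schema = ["id", "name", "ts"]  # field order expected by the table

def realign(record, schema_fields):
    # Select values by field name in the table's order; fields the
    # payload omits come back as None.
    return {f: record.get(f) for f in schema_fields}

incoming = {"ts": 1571234567, "id": 42, "name": "suds"}  # out of order
aligned = realign(incoming, table_schema)
```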
We are collecting a bunch of metrics from Spark sources. Recently we switched
some of the tables to Iceberg and noticed that metrics data is not
available for some of the metrics.
The following metrics always show count = 0:
bytesRead: 0
recordsRead: 0
bytesWritten: 0
recordsWritten: 0
diskBytesSpilled: 0
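A minimal stdlib illustration of why these surface as 0 rather than as missing: if a source never reports a metric, an aggregator seeded with zero defaults simply leaves it at its default. This is not Spark's actual metrics code, just a sketch of the effect.

```python
# Illustration only: unreported metrics stay at their zero defaults
# when task metrics are summed, which is what the zeros above look like.
from collections import Counter

def aggregate(task_metrics):
    total = Counter({"bytesRead": 0, "recordsRead": 0,
                     "bytesWritten": 0, "recordsWritten": 0})
    for m in task_metrics:
        total.update(m)  # adds only what each task reported
    return dict(total)

# Tasks that report only what the source instruments:
tasks = [{"recordsWritten": 100}, {"recordsWritten": 50}]
totals = aggregate(tasks)
```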
> https://github.com/apache/incubator-iceberg/issues/430#issuecomment-533360026
>
> On 26 Nov 2019, at 21:17, suds wrote:
I looked at the open issue and discussion around the sort spec:
https://github.com/apache/incubator-iceberg/issues/317
For now we have added a sort spec external to Iceberg and made it work by
adding additional logic to sort the dataframe before writing to the Iceberg
table (it's a hack until the above issue gets resolved).
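The external-sort hack boils down to ordering rows by the sort-spec columns just before the write. In a Spark job this would typically be a `sortWithinPartitions` call on the dataframe; the stdlib sketch below stands in for that, with invented column names:

```python
# Apply a sort spec outside Iceberg by ordering rows before the write.
# Column names are hypothetical.

sort_spec = ["event_date", "account_id"]

def presort(rows, spec):
    # Order rows by the sort-spec columns, leftmost column first.
    return sorted(rows, key=lambda r: tuple(r[c] for c in spec))

rows = [
    {"event_date": "2019-11-26", "account_id": 7},
    {"event_date": "2019-11-25", "account_id": 9},
    {"event_date": "2019-11-25", "account_id": 3},
]
ordered = presort(rows, sort_spec)
```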
> son. That way, it doesn't conflict with versions used by Spark. Are you
> using the iceberg-spark-runtime build?
>
> rb
>
> On Thu, Oct 10, 2019 at 2:24 PM suds wrote:
I am also seeing issues when using the master branch with Spark v2.4.0+:
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Scala
module 2.8.8 requires Jackson Databind version >= 2.8.0 and < 2.9.0
at
com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:66)
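The suggestion in the reply above is to depend on the shaded runtime artifact instead of the individual modules, so Iceberg's Jackson does not clash with the Jackson version Spark ships. A sketch of what that dependency might look like in Maven (the version is deliberately a property placeholder, not a real release number; check the current release for the exact coordinates):

```xml
<!-- Hedged sketch: substitute the actual released version. -->
<dependency>
  <groupId>org.apache.iceberg</groupId>
  <artifactId>iceberg-spark-runtime</artifactId>
  <version>${iceberg.version}</version>
</dependency>
```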
> at date ordinal. If you're partitioning by identity(date_col), it will
> store date_col.
>
> When reading data, values from the manifest are used for identity
> partition data to avoid extra work materializing the same value for
> every row.
>
> On Thu, Apr 18, 2019 at 8:
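The behavior described in that reply can be sketched as follows. This is an illustration of the idea only, not Iceberg's reader code, and all names are invented: for an identity partition, the value recorded once in the manifest entry is attached as a constant to every row of the file at read time, instead of being stored or re-derived per row.

```python
# Attach the identity-partition value from the manifest entry as a
# constant column on every row of a data file at read time.
# Names are hypothetical.

def read_with_identity_partition(file_rows, manifest_partition):
    # manifest_partition: e.g. {"date_col": "2019-04-18"} from the manifest
    for row in file_rows:
        yield {**row, **manifest_partition}

rows = [{"id": 1}, {"id": 2}]
out = list(read_with_identity_partition(rows, {"date_col": "2019-04-18"}))
```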
I am working on a Spark project and came across an interesting convention
Spark uses (it was known in Hive):
https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#partition-discovery
In Spark, if I partition a dataset, the partition columns do not exist in the
Parquet schema and hence are absent from the final data file.
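The Hive-style convention linked above keeps partition values in the directory path (`col=value`) rather than in the data files, and discovery recovers them by parsing paths. A minimal sketch of that parsing, as an illustration of the convention rather than Spark's actual discovery code:

```python
# Recover partition column values from a Hive-style path, where each
# `key=value` path segment encodes one partition column.

def discover_partition_values(path):
    parts = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts

values = discover_partition_values(
    "s3://bucket/table/date_col=2019-04-18/part-00000.parquet")
```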
> cates to evaluate on the partition data in the manifest. So this is a
> bug where we haven't passed the current table schema down to the
> manifest reader.
>
> I'll open an issue for it and fix this. Thanks!
>
> rb
>
> On Fri, Feb 15, 2019 at 11:34 AM suds wrote:
Thanks for the reply, Ryan.
I created a gist with a code example:
https://gist.github.com/sudssf/e5f2de7463487f98c0a269221bbe0f1a
Please let me know if I am not using the API correctly.
On Thu, Feb 14, 2019 at 5:38 PM Ryan Blue wrote:
> Sudsport,
>
> I'm wondering if you had the table cached somewhere? Th