Some of the conversions we are seeing are:
- Decimal to Decimal; not just limited to increasing precision as with Iceberg
- varchar to string
- numeric type to numeric type (float to Decimal, double to Decimal, Decimal to double, etc.)
- numeric type to string
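For contrast, here is a minimal sketch of what Iceberg's own schema-evolution API accepts; the `table` handle and the "amount" column are assumptions for illustration only, the point being that Iceberg restricts type changes to safe promotions:

    import org.apache.iceberg.Table
    import org.apache.iceberg.types.Types

    // `table` is an already-loaded Iceberg Table (e.g. from a HiveCatalog);
    // the column name is made up.
    def widenAmount(table: Table): Unit = {
      // Permitted promotions: int -> long, float -> double, and widening a
      // decimal's precision while keeping the same scale.
      table.updateSchema()
        .updateColumn("amount", Types.DecimalType.of(38, 2))
        .commit()
      // Conversions like decimal -> double or numeric -> string are rejected
      // here, which is the extra flexibility Hive allows above.
    }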
Data needs to be clustered so the Iceberg writer receives data for one
table partition at a time. If it isn't clustered, Iceberg would need to
either keep multiple files open (one per unfinished partition) or close and
open new files for the same partition, resulting in small files.
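As a rough sketch of what that clustering looks like in Spark (the column names and the `iceberg_bucket16` UDF follow the examples in this thread; `df`, the target table name, and the helper name `writeClustered` are assumptions):

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.{col, expr}

    // Hash-repartition on the partition values so related rows land in the same
    // task, then sort locally so the writer sees one partition's rows at a time.
    def writeClustered(df: DataFrame): Unit = {
      df.repartition(col("category"), col("ts"), expr("iceberg_bucket16(id)"))
        .sortWithinPartitions(col("category"), col("ts"), expr("iceberg_bucket16(id)"))
        .write
        .format("iceberg")
        .mode("append")
        .save("db.events") // table identifier or path, depending on the setup
    }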
By “task receives data clustered by partition”, do you mean that I should
repartition using the same columns I order by? For example:

    df
      .repartition(col("category"), col("ts"), expr("iceberg_bucket16(id)"))
      .orderBy(col("category"), col("ts"), expr("iceberg_bucket16(id)"))

…or am I misunderstanding?
You left the complex types off of your list (struct, map, array,
uniontype). All of them have natural mappings in Iceberg, except for
uniontype. Interval is supported on output, but not as a column type.
Unfortunately, we have some tables with uniontype, so we'll need a solution
for how to deal with them.
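For reference, a sketch of those natural mappings using Iceberg's type API; the field names and IDs are invented:

    import org.apache.iceberg.Schema
    import org.apache.iceberg.types.Types
    import org.apache.iceberg.types.Types.NestedField.{optional, required}

    // Hive struct/array/map correspond directly to Iceberg struct/list/map;
    // field names and IDs here are made up.
    val schema = new Schema(
      required(1, "id", Types.LongType.get()),
      optional(2, "tags", Types.ListType.ofOptional(3, Types.StringType.get())),
      optional(4, "attrs",
        Types.MapType.ofOptional(5, 6, Types.StringType.get(), Types.StringType.get())),
      optional(7, "point", Types.StructType.of(
        required(8, "x", Types.DoubleType.get()),
        required(9, "y", Types.DoubleType.get())))
    )
    // There is no Iceberg type corresponding to Hive's uniontype.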
It should work if you use `ORDER BY category, ts, iceberg_bucket16(id)`.
You just need to ensure that each task receives data clustered by partition.
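A hedged DataFrame-side sketch of that suggestion (the table name, `df`, and the helper `writeSorted` are assumptions; the bucket UDF is assumed to be registered as `iceberg_bucket16`):

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.{col, expr}

    // A global ORDER BY range-partitions the data, so each write task already
    // receives its rows grouped by (category, ts, bucket).
    def writeSorted(df: DataFrame): Unit = {
      df.orderBy(col("category"), col("ts"), expr("iceberg_bucket16(id)"))
        .write
        .format("iceberg")
        .mode("append")
        .save("db.events") // hypothetical table name
    }

Compared to an explicit repartition plus a local sort (as in the sketch further up), the global sort does the clustering implicitly but adds a sampling pass for the range partitioner.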
On Tue, Nov 24, 2020 at 7:25 AM Kruger, Scott wrote:
> I did register the bucket UDF (you can see me using it in the examples),
> and the docs were helpful to an extent…
One of the challenges we've had is that Hive is more flexible with schema
evolution compared to Iceberg. Are you guys also looking at this aspect?
On Tue, Nov 24, 2020 at 8:21 PM Peter Vary wrote:
> Hi Team,
>
> With Shardul we had a longer discussion yesterday about the schema
> synchronization
I did register the bucket UDF (you can see me using it in the examples), and
the docs were helpful to an extent, but the issue is that it only shows how to
use bucketing when it’s the only partitioning scheme, not the innermost of a
multi-level partitioning scheme. That’s what I’m having trouble with.
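For illustration, a partition spec with bucketing as the innermost of several levels might be declared like this; the `schema` value and the exact transforms are assumptions, not the poster's actual table:

    import org.apache.iceberg.PartitionSpec

    // `schema` is the table's Iceberg schema; the transforms are only an
    // illustration of "bucket as the innermost level of a multi-level spec".
    val spec = PartitionSpec.builderFor(schema)
      .identity("category")
      .day("ts")
      .bucket("id", 16)
      .build()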
Hi Team,
With Shardul we had a longer discussion yesterday about the schema
synchronization between Iceberg and Hive, and we thought that it would be good
to ask the opinion of the greater community too.
We can have 2 sources for the schemas:
- Hive table definition / schema
- Iceberg schema
If