From the developers' point of view, there is less burden in always supporting the latest version
of Spark (for example). But from the users' point of view, especially for those of us who
maintain Spark internally, it is not easy to upgrade the Spark version the
first time (since we have many customizations internally), and we're
still p
Great, thanks for sharing!
Best
Saisai
On Fri, Dec 4, 2020 at 10:35 AM libis wrote:
> Nice, thanks for sharing
>
> On Fri, Dec 4, 2020 at 3:58 AM Jacques Nadeau wrote:
>
>> Yeah, thanks for sharing
>>
>> On Thu, Dec 3, 2020 at 11:57 AM John Zhuge wrote:
>>
>>> Very nice!
>>>
>>> On Thu, Dec 3, 2020 at 10:36 AM Miao Wang
opFileIO and a
> FileSystem works great and is probably the easiest way to maintain an
> implementation for the object store.
>
> On Thu, Nov 12, 2020 at 7:31 PM Saisai Shao
> wrote:
>
>> Hi all,
>>
>> Sorry to chime in, I also have the same concern about using Iceberg
Hi all,
Sorry to chime in, I also have the same concern about using Iceberg with
object storage.
> One of my concerns with S3FileIO is getting tied too much to a single
> cloud provider. I'm wondering if an ObjectStoreFileIO would be helpful
> so that S3FileIO and (a future) GCSFileIO could share log
I would like to get the structured streaming reader merged in the next release :).
I will spend time addressing the new feedback.
Thanks
Saisai
On Thu, Aug 27, 2020 at 10:36 PM Mass Dosage wrote:
> I'm all for a release. The only thing still required for basic Hive read
> support (other than documentation of course!) is
+1 for the changes.
On Thu, Aug 20, 2020 at 5:46 PM Mass Dosage wrote:
> +1 for `iceberg-hive-metastore` as I found this confusing when I first
> started working with the code.
>
> On Thu, 20 Aug 2020 at 03:27, Jungtaek Lim
> wrote:
>
>> +1 for `iceberg-hive-metastore` and also +1 for RD's proposal.
>>
>> Tha
Congrats!
Thanks
Saisai
On Thu, Jul 23, 2020 at 10:06 AM OpenInx wrote:
> Congratulations!
>
> On Thu, Jul 23, 2020 at 9:31 AM Jingsong Li
> wrote:
>
>> Congratulations Shardul! Well deserved!
>>
>> Best,
>> Jingsong
>>
>> On Thu, Jul 23, 2020 at 7:27 AM Anton Okolnychyi
>> wrote:
>>
>>> Congrats and welco
Hi team,
Is there any plan to get this Structured Streaming read support (
https://github.com/apache/iceberg/pull/796) into the 0.9.0 release? It would be
appreciated if anyone could take a look and review it. Thanks!
Best regards,
Saisai
On Tue, Jun 23, 2020 at 6:42 AM Ryan Blue wrote:
> Hi everyone,
>
> I just posted my notes from the comm
+1 for graduation.
On Wed, May 13, 2020 at 9:33 AM Junjie Chen wrote:
> +1
>
> On Wed, May 13, 2020 at 8:07 AM RD wrote:
>
>> +1 for graduation!
>>
>> On Tue, May 12, 2020 at 3:50 PM John Zhuge wrote:
>>
>>> +1
>>>
>>> On Tue, May 12, 2020 at 3:33 PM parth brahmbhatt <
>>> brahmbhatt.pa...@gmail.com> wrote:
>
> On 2020/03/27 01:53:09, Saisai Shao wrote:
> > Thanks Ryan, let me give it a try.
> >
> > Best regards,
> > Saisai
> >
> > On Fri, Mar 27, 2020 at 12:15 AM Ryan Blue wrote:
> >
> > > Here’s how it was done before:
> > >
> h
>> the iceberg-hive and iceberg-mr modules so that they aren't locked to the
>> same versions as the rest of the projects.
>>
>> On Thu, 26 Mar 2020 at 01:53, Saisai Shao wrote:
>>
>>> Hi Ryan,
>>>
>>> As mentioned in the meeting, would
take a lot of time. I know in our
> company we have no near term plans to move to Spark 3.
>
> -Best,
> R.
>
> On Thu, Mar 5, 2020 at 6:33 PM Saisai Shao wrote:
>
>> I was wondering if it is possible to limit the version lock plugin to
>> only the Iceberg core-related
5pm PST on any day works for me.
Looking forward to it.
Thanks
Saisai
Hi team,
With more companies and developers joining the community, I was
wondering if we could have regular sync-ups to discuss anything about
Iceberg, like milestones, feature design, etc. I think this would be quite
helpful for growing the community and moving the project forward.
Would like to hear
ity?
Any suggestions on this?
Best regards,
Saisai
On Thu, Mar 5, 2020 at 3:12 PM Saisai Shao wrote:
> I think the requirement of supporting different versions should be quite
> common, since Iceberg is a table format that should be adapted to different
> engines like Hive, Flink, and Spark. To support diff
ditional checks that
> baseline provides in general since this is a short-term problem. It would
> just be nice if we could have versions that are confined to a single
> module. The Nebula plugin that baseline uses claims to support that, but I
> couldn't get it to work.
>
an be differentiated by name when generating jars, and they will
not be depended on by other modules in Iceberg.
So this dependency issue should not apply here, and in Maven it could
be achieved easily. Please correct me if I'm wrong.
Best regards,
Saisai
On Wed, Mar 4, 2020 at 10:01 AM Saisai Shao wrote:
> Thanks Matt
>
> I would think that branching would be the best way to build and publish
> against multiple versions of a dependency.
>
>
>
> -Matt Cheah
>
>
>
> *From: *Saisai Shao
> *Reply-To: *"dev@iceberg.apache.org"
> *Date: *Tuesday, March 3, 2020 at
you can get it working, I think it's a
>> great idea to get this into master.
>>
>> Otherwise, I was thinking about proposing an 0.8.0 release in the next
>> month or so based on Spark 2.4. Then we could merge the branch into master
>> and do another release
Hi team,
I was thinking of merging the spark-3 branch into master. Also, per the
earlier discussion, we could make spark-2 and spark-3 coexist as 2
different sub-modules. With this, one build could generate both spark-2 and
spark-3 runtime jars, and users could pick either based on their preference.
One concern is t
wrote:
>
>> +1 for Iceberg branch
>>
>> Thanks for the contribution from you and your team!
>>
>> On Fri, Nov 22, 2019 at 8:29 AM Anton Okolnychyi
>> wrote:
>>
>>> +1 on having a branch in Iceberg as we have for vectorized reads.
is fixed in 3.0 because Spark will choose its behavior and correctly
> configure the source with a dynamic overwrite or an overwrite using an
> expression.
>
> On Thu, Nov 21, 2019 at 11:33 PM Saisai Shao
> wrote:
>
>> Hi Team,
>>
>> I found that Iceberg's
Hi Team,
I found that Iceberg's "overwrite" is different from that of Spark's built-in
sources like Parquet. The "overwrite" semantics in Iceberg seem more like an
"upsert": partitions that are not present in the new data are not deleted.
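To make the difference concrete, here is a toy model of the two behaviors. It assumes, hypothetically, that a table is just a Python dict mapping a partition value to its rows; `static_overwrite` and `dynamic_overwrite` are illustrative names, not Iceberg or Spark APIs. Spark's file sources such as Parquet default to static mode (`spark.sql.sources.partitionOverwriteMode=static`), where an unfiltered overwrite drops every partition, while the behavior described above matches a dynamic partition overwrite.

```python
from collections import defaultdict

# Toy model (hypothetical, not Iceberg or Spark API): a "table" is a dict
# mapping a partition value to the list of rows stored in that partition.


def group_by_partition(rows, key):
    """Group rows by the value of their partition column."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return dict(parts)


def static_overwrite(table, new_rows, key):
    """Static overwrite with no partition filter: every existing partition is
    dropped and the table ends up containing only the new rows."""
    # `table` is intentionally unused: nothing from the old data survives.
    return group_by_partition(new_rows, key)


def dynamic_overwrite(table, new_rows, key):
    """Dynamic overwrite: only partitions that appear in `new_rows` are
    replaced; partitions absent from the new data are left untouched."""
    result = {part: list(rows) for part, rows in table.items()}
    result.update(group_by_partition(new_rows, key))
    return result


if __name__ == "__main__":
    table = {
        "2019-11-20": [{"day": "2019-11-20", "id": 1}],
        "2019-11-21": [{"day": "2019-11-21", "id": 2}],
    }
    new_rows = [{"day": "2019-11-21", "id": 3}]
    # Only the 2019-11-21 partition remains, holding id 3.
    print(static_overwrite(table, new_rows, "day"))
    # 2019-11-20 keeps id 1; 2019-11-21 now holds only id 3.
    print(dynamic_overwrite(table, new_rows, "day"))
```

Note that the row with id 2 is gone after either call: a dynamic overwrite replaces whole partitions, so it is not a row-level upsert even though untouched partitions survive.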
I would like to know the purpose of this design choice. Also
y, so that they
the community could review and contribute to it.
I would like to hear your suggestions.
Best regards,
Saisai
On Wed, Nov 20, 2019 at 1:27 AM Ryan Blue wrote:
> Sounds great, thanks Saisai!
>
> On Mon, Nov 18, 2019 at 3:29 AM Saisai Shao
> wrote:
>
>> Thanks Anton, I
>
> - Anton
>
>
> On 18 Nov 2019, at 02:08, Saisai Shao wrote:
>
> Hi Anton,
>
> Thanks for bringing this up. We already have a branch building against Spark
> 3.0 (the master branch, actually) internally, and we're actively working on it.
> I think it is a good idea to create
Hi Anton,
Thanks for bringing this up. We already have a branch building against Spark
3.0 (the master branch, actually) internally, and we're actively working on it.
I think it is a good idea to create an upstream Spark 3.0 branch; we could
share ours if the community would like.
Best regards,
Sa
Hi team,
I have some newbie questions about schema evolution and partition
evolution. According to the design spec, Iceberg supports schema evolution and
partition spec evolution. My questions are:
1. If a new column is added, are we going to rewrite all the data? If not,
how do we support it?
2. Do we
Congrats Anton!
Best regards,
Saisai
On Tue, Sep 3, 2019 at 7:48 AM Daniel Weeks wrote:
> Congrats Anton!
>
> On Fri, Aug 30, 2019 at 1:54 PM Edgar Rodriguez
> wrote:
>
>> Nice! Congratulations, Anton!
>>
>> Cheers,
>>
>> On Fri, Aug 30, 2019 at 1:42 PM Dongjoon Hyun
>> wrote:
>>
>>> Congratulations, Anton!
rch (in my opinion). I'm just one vote, though, so if most people
>> prefer to move to JIRA I'm open to it.
>>
>> What do you think is missing compared to JIRA?
>>
>> On Fri, Aug 16, 2019 at 3:09 AM Saisai Shao
>> wrote:
>>
>>> Hi Team,
Hi Team,
It seems the Iceberg project uses GitHub issues instead of JIRA. IMHO JIRA is more
powerful and easier to manage, and most Apache projects use JIRA to track
everything. Is there any plan to move to JIRA, or will we stick with GitHub issues?
Thanks
Saisai
e, Spark supports the latest features available in DataSourceV2,
> and will continue to. In fact, we're adding features to DSv2 based on what
> we've built internally at Netflix to support Iceberg.
>
> On Wed, Aug 7, 2019 at 7:03 PM Saisai Shao wrote:
>
>> Thanks a lo
> missing automatically?
>
> On Wed, Aug 7, 2019 at 7:13 PM Saisai Shao wrote:
>
>> Thanks guys for your reply.
>>
>> I didn't do anything special; I don't even have Hive configured. I
>> simply put the Iceberg (assembly) jar into Spark and started a local
IMHO, I agree that we should have a branch to track the changes for Spark
3.0.0. Spark 3.0.0 has several changes regarding DataSource V2, so it would
be better to evaluate them and take the 3.0 changes into account in the
design.
My two cents :)
Best regards,
Saisai
In Aug 2019, Edgar Rodriguez wrote:
der if a newer version of Hive would avoid this problem? What version
> are you linking with?
> >
> > On Tue, Aug 6, 2019 at 8:59 PM Saisai Shao
> wrote:
> > Hi team,
> >
> > I just ran into some issues when trying Iceberg with the quick start guide. Not
> sure if it is pr
y to call this yet. Same for
> MERGE
> INTO, open source Spark doesn’t support the operation yet. We’re also
> working on building support into Spark as we go.
>
> I hope that helps!
>
> On Wed, Aug 7, 2019 at 4:25 AM Saisai Shao wrote:
>
>> Hi team,
>>
Hi team,
The Delta Lake project recently announced version 0.3.0, which added several
new API-level features, like update, delete, merge, vacuum, etc. May I
ask whether there is any plan to add such features to Iceberg?
Thanks
Saisai
Hi team,
I just ran into some issues when trying Iceberg with the quick start guide. Not sure
if it is proper to send this to the @dev mailing list (it seems there's no user mailing
list).
One issue is that it seems the current Iceberg cannot run with an embedded metastore;
it throws an exception. Is this an on-purpose beha