[GitHub] [hudi] chenbodeng719 commented on issue #8279: [SUPPORT]I use flink to bulk insert a mor table with bucket index. But it seems that you can not change the write.tasks when you stop insert and

2023-04-03 Thread via GitHub
chenbodeng719 commented on issue #8279: URL: https://github.com/apache/hudi/issues/8279#issuecomment-1493775969 Finally, the problem turned out to be that the record key string cannot contain the character ":". I think there should be a reminder about this. It took time to figure out.
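For anyone hitting the same symptom, here is a rough sketch of how a ":" in the record key could send every record to the same bucket. It assumes (this is not confirmed in the thread and not checked against the Hudi source here) that the bucket hashing treats a key containing ":" as a list of "field:value" pairs and looks up the index field by name. The function names `extract_hash_keys` and `bucket_id` are hypothetical and are not Hudi APIs.

```python
from typing import List, Optional


def java_string_hash(s: str) -> int:
    """Java-style String.hashCode as a signed 32-bit int (exact for ASCII keys)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def extract_hash_keys(record_key: str, index_field: str) -> List[Optional[str]]:
    """ASSUMED parsing rule: a key containing ':' is read as 'field:value,field:value'."""
    if ":" not in record_key:
        return [record_key]  # plain key: hash the whole string
    pairs = {}
    for part in record_key.split(","):
        if ":" in part:
            name, value = part.split(":", 1)
            pairs[name] = value
    # If the text before ':' is not the index field name, the lookup yields None.
    return [pairs.get(index_field)]


def bucket_id(record_key: str, index_field: str, num_buckets: int) -> int:
    """Java-style List.hashCode over the hash keys, then modulo the bucket count."""
    h = 1
    for key in extract_hash_keys(record_key, index_field):
        h = (31 * h + (0 if key is None else java_string_hash(key))) & 0xFFFFFFFF
    return (h & 0x7FFFFFFF) % num_buckets


# A uid shaped like the ones in this thread: the part before ':' is not "uid",
# so under the assumed rule the hash key is None for every such record.
print(bucket_id("tiq_fb3c7524-206c-4cef-a87f-4e6379190f38:htmtalent", "uid", 5))  # -> 1
print(bucket_id("tiq_some-other-id:anything", "uid", 5))                          # -> 1

# Keys without ':' are hashed as whole strings and spread over several buckets.
print(sorted({bucket_id(f"user_{i}", "uid", 5) for i in range(100)}))
```

Under that assumption, every key of the form `prefix:suffix` hashes the same way regardless of its content, which would match both "all of them hash to 1" and the single parquet file with 5 buckets. Replacing ":" in the key with another character before writing avoids the collision in this sketch.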

2023-04-03 Thread via GitHub
chenbodeng719 commented on issue #8279: URL: https://github.com/apache/hudi/issues/8279#issuecomment-1493753828 Is there some key length check? It's hard to believe there are that many conflicts.

2023-04-03 Thread via GitHub
chenbodeng719 commented on issue #8279: URL: https://github.com/apache/hudi/issues/8279#issuecomment-1493728296 > Yes, seems some hash conflicts maybe. But we have about 1 million uids like this. Do all of them hash to 1? I can't believe it.

2023-04-03 Thread via GitHub
chenbodeng719 commented on issue #8279: URL: https://github.com/apache/hudi/issues/8279#issuecomment-1493725310 It's weird. My uid keys look like the ones below. It seems that the bucket hash function maps them all to 1. ``` |tiq_fb3c7524-206c-4cef-a87f-4e6379190f38:htmtalent

2023-04-02 Thread via GitHub
chenbodeng719 commented on issue #8279: URL: https://github.com/apache/hudi/issues/8279#issuecomment-1493692921 > > I see this PR #8308. It seems that this feature hasn't been merged? So 0.13.0 doesn't support this feature? > > Spark support for bulk_insert with bucket index is

2023-04-02 Thread via GitHub
chenbodeng719 commented on issue #8279: URL: https://github.com/apache/hudi/issues/8279#issuecomment-1493552591 I see this PR: https://github.com/apache/hudi/pull/8308. It seems that this feature hasn't been merged? So 0.13.0 doesn't support this feature?

2023-04-02 Thread via GitHub
chenbodeng719 commented on issue #8279: URL: https://github.com/apache/hudi/issues/8279#issuecomment-1493546337 I tested upsert with bucket index in Spark and it works. But bulk insert with bucket index does not behave as I expected.

2023-04-02 Thread via GitHub
chenbodeng719 commented on issue #8279: URL: https://github.com/apache/hudi/issues/8279#issuecomment-1493539406 > I'm sure that they are unique. I tested upsert in Spark. It worked as expected, but it's very slow. > How many distinct uids do you have in your dataset? I'm

2023-04-02 Thread via GitHub
chenbodeng719 commented on issue #8279: URL: https://github.com/apache/hudi/issues/8279#issuecomment-1493538014 I'm sure that they are unique. I tested upsert in Spark. It worked as expected, but it's very slow.

2023-04-01 Thread via GitHub
chenbodeng719 commented on issue #8279: URL: https://github.com/apache/hudi/issues/8279#issuecomment-1492970849 I used the conf below to test bulk insert. Only one parquet file was written. Did I miss something? I expected 5 parquet files. My dataset is about 120 GB. ``` CREATE TABLE

2023-04-01 Thread via GitHub
chenbodeng719 commented on issue #8279: URL: https://github.com/apache/hudi/issues/8279#issuecomment-1492884944 > You do not declare the index type as bucket while doing the bulk_insert. So do you mean I should change my bulk insert conf as below? ``` CREATE TABLE

2023-03-31 Thread via GitHub
chenbodeng719 commented on issue #8279: URL: https://github.com/apache/hudi/issues/8279#issuecomment-1492207358 - bulk insert conf ``` CREATE TABLE hbase2hudi_sink( uid STRING PRIMARY KEY NOT ENFORCED, oridata STRING, update_time