Re: [I] rewrite_data_files procedure fails with Premature end of Content-Length when using S3 client [iceberg]

2024-04-22 Thread via GitHub
megri commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-2069863018 I am experiencing the same issues, using the same setup as paulpaul1076. Thanks to this discussion I also tried changing from S3FileIO to the default and so far it seems to be working
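A minimal sketch of the workaround described above, assuming the catalog is configured through Spark properties and that S3FileIO was selected with Iceberg's `io-impl` catalog property. Removing that key leaves the FileIO choice to the catalog's default. The catalog name `my_catalog`, the bucket, and the helper `use_default_file_io` are hypothetical.

```python
# Fully qualified class name of Iceberg's S3 FileIO implementation.
S3_FILE_IO = "org.apache.iceberg.aws.s3.S3FileIO"


def use_default_file_io(spark_conf, catalog):
    """Return a copy of spark_conf with the explicit S3FileIO setting removed."""
    key = f"spark.sql.catalog.{catalog}.io-impl"
    cleaned = dict(spark_conf)
    # Only drop the key when it actually points at S3FileIO.
    if cleaned.get(key) == S3_FILE_IO:
        del cleaned[key]
    return cleaned


# Hypothetical configuration, for illustration only.
before = {
    "spark.sql.catalog.my_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.my_catalog.warehouse": "s3a://example-bucket/warehouse",
    "spark.sql.catalog.my_catalog.io-impl": S3_FILE_IO,
}
after = use_default_file_io(before, "my_catalog")
for name, value in after.items():
    print(f"{name}={value}")
```

Whether the default FileIO can reach the bucket depends on the Hadoop S3 settings of the cluster, which this sketch does not cover.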

2024-02-29 Thread via GitHub
paulpaul1076 commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1970723432 @ajantha-bhat you don't need large scale to reproduce it at all. For me this problem started happening after the first run of rewrite_data_files. The second run started failing

2024-02-29 Thread via GitHub
paulpaul1076 commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1970719190 @RussellSpitzer thanks a lot for helping with this. Want to give a bit more details (we discussed this with Russell in iceberg slack). This is how I would load my catalog

2024-02-28 Thread via GitHub
ajantha-bhat commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1970555983 Thanks for helping to narrow this down, @RussellSpitzer 👍 We still need to figure out the solution to this problem, but I am not sure how to reproduce it locally with small data.

2024-02-28 Thread via GitHub
paulpaul1076 commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1969253762 @ajantha-bhat yea, so, wondering, maybe there are some settings that could be tuned to let this work in spark SQL. The thing is, I ran both Spark DSL and Spark SQL and com

2024-02-28 Thread via GitHub
ajantha-bhat commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1969247917 I don't think this is related to catalogs. Catalogs just keep track of the table metadata file. Here the call stack is about Spark reading the Parquet file from storage using

2024-02-28 Thread via GitHub
paulpaul1076 commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1969238526 Just to update this. We deployed the Nessie catalog in prod and this issue persists for some odd reason.

2024-02-16 Thread via GitHub
paulpaul1076 commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1949240582 @nastra unfortunately this doesn't seem to be the only reason for the content-length exception. We now discovered that it still fails, even though I stopped using the direct str

2024-02-14 Thread via GitHub
paulpaul1076 commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1944320729 Discussed in slack that this is due to iceberg's streaming writer not being unique, this PR should fix this: https://github.com/apache/iceberg/pull/9255, waiting for iceberg 1.5

2024-02-14 Thread via GitHub
paulpaul1076 commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1944183346 @nastra So, I seem to have discovered new info about what's going on. For some reason in Iceberg metadata there are 2 entries of the same file: ![image](https://github.co
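A minimal sketch of how duplicate entries like the ones mentioned above could be spotted, assuming the data file paths have already been collected into a list (for example by selecting `file_path` from the table's `files` metadata table). The paths and the helper `duplicate_paths` are hypothetical; the actual paths from the issue are not shown here.

```python
from collections import Counter


def duplicate_paths(file_paths):
    """Return a mapping of each path listed more than once to its count."""
    counts = Counter(file_paths)
    return {path: n for path, n in counts.items() if n > 1}


# Hypothetical file paths, for illustration only.
paths = [
    "s3://example-bucket/tbl/data/00000-a.parquet",
    "s3://example-bucket/tbl/data/00001-b.parquet",
    "s3://example-bucket/tbl/data/00000-a.parquet",
]
dups = duplicate_paths(paths)
for path, n in dups.items():
    print(f"{path} is listed {n} times")
```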

2024-02-14 Thread via GitHub
paulpaul1076 commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1943746314 @nastra these are the logs from the driver that does compaction and fails with this content length exception, and from one of the executors: [logs.zip](https://github.com/

2024-02-13 Thread via GitHub
paulpaul1076 commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1942611489 @nastra thank you very much, I will try tomorrow and let you know!

2024-02-13 Thread via GitHub
nastra commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1941427126 @paulpaul1076 I don't have an Airflow setup but I ran a streaming job locally and created 4000+ files. The specific setup I used was from the [Spark quickstart example](https:

2024-02-10 Thread via GitHub
paulpaul1076 commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1937083012 @nastra The easiest way to reproduce it is just use my streaming job, just leave it running, maybe for a few days even. And also schedule in airflow the compaction job to run ev

2024-02-10 Thread via GitHub
paulpaul1076 commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1937082611 Let me know if you manage to do it or not.

2024-02-10 Thread via GitHub
nastra commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1936929767 Thanks @paulpaul1076, I will try and reproduce this next week on my end

2024-02-09 Thread via GitHub
paulpaul1076 commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1936244579 Btw, as I said the Scala DSL for compaction works, Spark SQL doesn't. I compared the job parameters in the Spark UI tab, they are absolutely identical, so, it's not like t
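For reference, the Spark SQL path that fails is the `rewrite_data_files` stored procedure. A minimal sketch of how that call is usually written, assuming a catalog named `my_catalog` and a table `db.events` (both hypothetical, as is the helper function); the statement would be handed to `spark.sql(...)` on a live session:

```python
def rewrite_data_files_sql(catalog, table):
    """Build the CALL statement for Iceberg's rewrite_data_files procedure."""
    return f"CALL {catalog}.system.rewrite_data_files(table => '{table}')"


# Hypothetical catalog and table names, for illustration only.
stmt = rewrite_data_files_sql("my_catalog", "db.events")
print(stmt)
# With a running SparkSession named `spark`, this would be: spark.sql(stmt)
```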

2024-02-09 Thread via GitHub
paulpaul1076 commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1936239049 @nastra where should I upload the data for you? I will upload it, then you can register the table in your catalog. I used hive catalog, but I don't think it matters. Anyw

2024-02-08 Thread via GitHub
paulpaul1076 commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1933903982 Basically I had a streaming job that was streaming small files. Then I stopped it, tried compacting, and it failed with these content-length exceptions. I'll try to find some fr

2024-02-08 Thread via GitHub
nastra commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1933750195 > Anyways, got it to work, now there's a similar exception, but written a bit different: > > ``` > org.apache.iceberg.exceptions.RuntimeIOException: java.io.EOFException: Re

2024-02-08 Thread via GitHub
nastra commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1933702701 > Looks like iceberg-aws-bundle doesn't have this class: > > `Exception in thread "main" java.lang.NoClassDefFoundError: software/amazon/awssdk/http/urlconnection/UrlConnectionH

2024-02-07 Thread via GitHub
paulpaul1076 commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1932587930 Anyways, got it to work, now there's a similar exception, but written a bit different: ``` org.apache.iceberg.exceptions.RuntimeIOException: java.io.EOFException: Reac

2024-02-07 Thread via GitHub
paulpaul1076 commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1932570894 Looks like iceberg-aws-bundle doesn't have this class: `Exception in thread "main" java.lang.NoClassDefFoundError: software/amazon/awssdk/http/urlconnection/UrlConnectionH

2024-02-07 Thread via GitHub
paulpaul1076 commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1932550064 I just need to set a spark option like this, right: `spark.sql.catalog.my_catalog.http-client.type=urlconnection` ?
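A minimal sketch of the option being asked about, assuming Iceberg catalog properties are forwarded to the catalog when prefixed with `spark.sql.catalog.<catalog>.`. The helper `catalog_conf` is hypothetical; `my_catalog` and `http-client.type=urlconnection` are taken from the comments in this thread.

```python
def catalog_conf(catalog, props):
    """Prefix Iceberg catalog properties so they can be passed as Spark options."""
    prefix = f"spark.sql.catalog.{catalog}."
    return {prefix + key: value for key, value in props.items()}


conf = catalog_conf("my_catalog", {"http-client.type": "urlconnection"})

# Render the options as spark-submit arguments.
for name, value in conf.items():
    print(f"--conf {name}={value}")
```

The same key/value pairs could instead be set on a `SparkSession` builder with `.config(name, value)` before the session is created.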

2024-02-07 Thread via GitHub
paulpaul1076 commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1932545715 Yea, I can try with that setting, where do I set it, by the way? Do I have to rebuild iceberg jars? The problem is not the RewriteDataFiles Spark action, it's the procedur

2024-02-07 Thread via GitHub
nastra commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1932300265 @paulpaul1076 do you have a chance to try with `http-client.type=urlconnection`? It's of course also possible that there's a bug in `RewriteDataFilesSparkAction` that went unnoticed.

2024-02-07 Thread via GitHub
paulpaul1076 commented on issue #9679: URL: https://github.com/apache/iceberg/issues/9679#issuecomment-1932287892 @nastra the Scala code works fine, the problem is inside iceberg.