[GitHub] [hadoop] steveloughran commented on pull request #2530: HADOOP-17414. Magic committer files don't have the count of bytes written collected by spark

2020-12-07 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-740161845 + @sunchao @dongjoon-hyun This is not for merging, just for run through yetus and discussion. Tested S3A London (consistent!) with/without S3guard ```

2020-12-08 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-740536282 I have a more straightforward solution to this: have S3A implement the FileSystem.getXAttr API call to return the headers. No risk to other applications; all Spark will need
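A minimal sketch of how a caller such as Spark could use the xattr route to recover the real byte count, assuming the visible file is a zero-byte marker and the real length travels in an object header exposed under the xattr name `header.x-hadoop-s3a-magic-data-length` as a decimal string. That name and encoding are assumptions, not confirmed in this thread. The `xattrs` map stands in for what Hadoop's `FileSystem.getXAttrs(Path)` returns (`Map<String, byte[]>`); the `BytesWritten` class is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical helper: resolve the bytes written for a committed file. */
final class BytesWritten {
  // ASSUMED xattr name for the magic-marker length header; not confirmed here.
  static final String MAGIC_LENGTH_XATTR = "header.x-hadoop-s3a-magic-data-length";

  private BytesWritten() {}

  /**
   * Returns the length recorded in the marker xattr if present and parseable;
   * otherwise falls back to the listed file length.
   */
  static long resolve(long fileLength, Map<String, byte[]> xattrs) {
    byte[] raw = xattrs == null ? null : xattrs.get(MAGIC_LENGTH_XATTR);
    if (raw == null) {
      return fileLength; // not a marker file: the listed length is correct
    }
    try {
      return Long.parseLong(new String(raw, StandardCharsets.UTF_8).trim());
    } catch (NumberFormatException e) {
      // Malformed header value: trust the listed length instead.
      return fileLength;
    }
  }
}
```

The fallback keeps the change invisible to filesystems that expose no such attribute, which matches the "no risk to other applications" argument.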

2020-12-08 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-740865126 Latest iteration is something viable to be merged in:

All HTTP headers are returned as xattrs

* New HeaderProcessing store operation to isolate this
* ContextAcces
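The idea of returning every HTTP header as an xattr can be sketched as a simple map transformation. This assumes each header is exposed under a `header.` prefix with its value UTF-8 encoded; the prefix and the `HeadersAsXAttrs` class are illustrative, not taken from the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical mapping of HTTP response headers to xattr entries. */
final class HeadersAsXAttrs {
  // Illustrative prefix so header-derived names don't collide with other xattrs.
  static final String PREFIX = "header.";

  private HeadersAsXAttrs() {}

  static Map<String, byte[]> toXAttrs(Map<String, String> headers) {
    // TreeMap gives a stable, sorted listing of the attribute names.
    Map<String, byte[]> xattrs = new TreeMap<>();
    for (Map.Entry<String, String> h : headers.entrySet()) {
      if (h.getValue() != null) {
        // xattr values are byte arrays, so encode the header string.
        xattrs.put(PREFIX + h.getKey(), h.getValue().getBytes(StandardCharsets.UTF_8));
      }
    }
    return xattrs;
  }
}
```

Keeping the transformation in one place is in the spirit of isolating header handling in a single store operation, so the rest of the filesystem code never sees raw header maps.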

2020-12-10 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-742788785 The matching Spark PR is https://github.com/apache/spark/pull/30714. I've now done downstream testing in Spark local mode, verifying from the log that the attribute was re

2020-12-11 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-743129197 Fixed the new javadocs. Javadoc quality across the whole AWS codebase has slowly been dropping; need to do a cleanup there

2020-12-11 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-743157050 This is what the "-getfattr" command looks like against a non-magic-marker file ``` bin/hadoop fs -getfattr -d s3a://stevel-london/cloud-integration/DELAY_LISTING

2020-12-14 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-744455742 I'm happy with this PR. Design issues to consider:

* log headers @ debug when we copy metadata
* should all headers be mapped to lower case in XAttr API?
* is the new
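One input to the lower-case question: HTTP header field names are case-insensitive (RFC 7230, section 3.2), so normalising the xattr name lets callers match a header however the store capitalised it. A sketch, assuming a hypothetical `header.` prefix and a hypothetical `XAttrNames` helper:

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical normalisation of header-derived xattr names. */
final class XAttrNames {
  static final String PREFIX = "header.";

  private XAttrNames() {}

  /** Lower-cases with Locale.ROOT to avoid locale surprises such as the Turkish dotless i. */
  static String normalize(String headerName) {
    return PREFIX + headerName.toLowerCase(Locale.ROOT);
  }

  /** Looks up an attribute in a map whose keys were built with normalize(). */
  static byte[] lookup(Map<String, byte[]> normalized, String requested) {
    // Accept either "header.ETag" or a bare "etag" from the caller.
    String name = requested.startsWith(PREFIX)
        ? requested.substring(PREFIX.length())
        : requested;
    return normalized.get(normalize(name));
  }
}
```

The trade-off is that `-getfattr -d` would then list names that differ in case from what the store returned on the wire.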

2020-12-18 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-748063774

> Gentle ping~

For whom? I'm happy with this; the only open design issue is "should we lower case all the classic headers?" Primarily as it reduces the risk of

2021-01-04 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-753983870 rebased to trunk. _not yet retested/reviewed_

2021-01-05 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-754696434 Rebased, full test run. Failures of HADOOP-17403 and HADOOP-17451; both unrelated

2021-01-06 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-755437582 @sunchao @liuml07 could either of you take a look at this?

2021-01-07 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-756223044 Squashed and rebased down to a single patch, then tested against S3 London with `-Dparallel-tests -DtestsThreadCount=6 -Dscale -Dmarkers=delete -Ds3guard -Ddynamo`

2021-01-11 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-758139586

> Looks good to me overall. Using XAttr is smart and extensible for solving problems like this - and it breaks nothing!

Thanks. The biggest risk is actually the copy cod

2021-01-11 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-758146786 Next PR to come in:

* tries to address all review comments
* adds stats gathering
* adds almost all the AWS headers (everything but some of the encryption stuff) as

2021-01-11 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-758165380 (testing in progress)

2021-01-12 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-758556408 Left the tests running overnight and the laptop clearly dropped off the internet for a while. This revealed an NPE lurking in test teardown if test setup fails
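The general shape of a fix for this class of bug, sketched with a hypothetical fixture (not the actual S3A test code): teardown still runs after a failed setup, so it must tolerate fields that setup never assigned. `FixtureSketch` and its nested `Client` are illustrative stand-ins.

```java
/** Hypothetical test fixture whose teardown survives a failed setup. */
final class FixtureSketch {
  /** Stand-in for a filesystem client; calling close() on null is where the NPE would occur. */
  static final class Client {
    private boolean open = true;

    void close() {
      open = false;
    }

    boolean isOpen() {
      return open;
    }
  }

  private Client client; // stays null if setup() fails
  private boolean tornDown;

  void setup(boolean storeReachable) {
    if (!storeReachable) {
      // Simulates the network dropping out before the client is created.
      throw new IllegalStateException("store unreachable");
    }
    client = new Client();
  }

  void teardown() {
    if (client != null) { // the guard that prevents the NPE
      client.close();
      client = null;
    }
    tornDown = true;
  }

  boolean tornDown() {
    return tornDown;
  }
}
```

With the guard in place, the original setup failure is what gets reported rather than a secondary NPE that hides it.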

2021-01-13 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-759555115 Build failed. Looks spurious, but I'm going to rebase anyway.

2021-01-13 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-759567381 It's in the commit log, but I'll add it as an explicit comment: I've added getXAttr() for directories too. The code initially does a HEAD on the path and, if that returns 404, does a
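The two-step lookup can be sketched as "HEAD the path; on a 404, hand over to a second probe". What the patch actually does after the 404 is cut off in this archive, so the sketch leaves that step as a caller-supplied function rather than guessing it. `HeadWithFallback`, its `NotFound` exception, and the function parameters are all hypothetical.

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical two-step metadata lookup for getXAttr on files or directories. */
final class HeadWithFallback {
  /** Signals a 404 from the store; a stand-in for the real client's exception. */
  static final class NotFound extends RuntimeException {
    NotFound(String key) {
      super("404: " + key);
    }
  }

  private HeadWithFallback() {}

  static Map<String, String> headers(String key,
      Function<String, Map<String, String>> head,
      Function<String, Map<String, String>> onNotFound) {
    try {
      return head.apply(key); // first attempt: HEAD the path itself
    } catch (NotFound e) {
      return onNotFound.apply(key); // second step deliberately left to the caller
    }
  }
}
```

Files pay for exactly one request; only the 404 case costs a second round trip.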

2021-01-19 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-762809195 style ``` ./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/ContextAccessors.java:95: ObjectMetadata getObjectMetadata(final String key) throws I

2021-01-20 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-763617693 all test failures are in hadoop-common and therefore unrelated. JVM ones: thread space, and one other which I'm going to view as transient. This PR is ready for revi

2021-01-20 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-763814617 note: I think I'm going to remove the change to always enable the magic committer from this PR. Instead I'm going to remove the switch entirely in the same PR which updates all

2021-01-25 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-766807576 Reverted the switching of fs.s3a.magic.enabled; I've got a bigger patch which removes that switch entirely. It was only ever in there for two concerns:

1. to make sure people

2021-01-25 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-766884586 having to rebase due to the site doc changes; sorry

2021-01-25 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-766897657

* Pushed up rebased and squashed PR.
* added changes to go with Mukund's review as a second patch
* testing in progress

2021-01-25 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-766943305 Retested: S3 London, unguarded. All good. I think we are done here. Can I get a vote from anyone with commit rights, e.g. @liuml07 @sunchao @bgaborg?

2021-01-25 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-766807576

2021-01-26 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-767775075 Ahh, great, thanks!

2021-01-26 Thread GitBox
steveloughran commented on pull request #2530: URL: https://github.com/apache/hadoop/pull/2530#issuecomment-767778010 Big thanks to all reviewers. It's in trunk, the backport to 3.3 is in progress, and I'll get the Spark one in now as well