[ https://issues.apache.org/jira/browse/HADOOP-18757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-18757.
-------------------------------------
    Fix Version/s: 3.4.0
                   3.3.9
       Resolution: Fixed

> S3A Committer only finalizes the commits in a single thread
> -----------------------------------------------------------
>
>                 Key: HADOOP-18757
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18757
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.3.5
>            Reporter: Moditha Hewasinghage
>            Assignee: Moditha Hewasinghage
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.9
>
>
> The S3A committer is bottlenecked on the driver when finalizing the commits:
> it appears that only a single thread is used to finalize them. In our
> experiment we save 36,000 files and the commit phase takes almost 2 hours,
> each file taking 0.1 - 0.5 seconds, while all the executors sit idle and the
> driver commits. I have attached driver log snippets to support this theory,
> comparing against Spark 3.4.0.
> The most likely cause is the way the
> [ThreadPoolExecutor|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadPoolExecutor.html]
> is created in
> [https://github.com/apache/hadoop/blob/706d88266abcee09ed78fbaa0ad5f74d818ab0e9/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/impl/CommitContext.java#L239-L244]:
> an unbounded LinkedBlockingQueue is combined with a corePoolSize of 0, which
> ends up running everything on a single thread and never creating any more.
> From the documentation
> {code:java}
> Unbounded queues. Using an unbounded queue (for example a LinkedBlockingQueue 
> without a predefined capacity) will cause new tasks to wait in the queue when 
> all corePoolSize threads are busy. Thus, no more than corePoolSize threads 
> will ever be created. (And the value of the maximumPoolSize therefore doesn't 
> have any effect.) This may be appropriate when each task is completely 
> independent of others, so tasks cannot affect each others execution; for 
> example, in a web page server. While this style of queuing can be useful in 
> smoothing out transient bursts of requests, it admits the possibility of 
> unbounded work queue growth when commands continue to arrive on average 
> faster than they can be processed.{code}
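> A minimal standalone sketch (class name and task counts are invented here,
> not taken from the Hadoop code) reproduces the effect: with corePoolSize = 0
> and an unbounded queue, the executor drains everything on one worker thread
> and maximumPoolSize = 32 is never reached.
> {code:java}
> import java.util.concurrent.LinkedBlockingQueue;
> import java.util.concurrent.ThreadPoolExecutor;
> import java.util.concurrent.TimeUnit;
>
> public class PoolSizeDemo {
>   public static void main(String[] args) throws Exception {
>     // Same shape as the CommitContext pool: corePoolSize = 0,
>     // maximumPoolSize = 32, unbounded LinkedBlockingQueue.
>     ThreadPoolExecutor pool = new ThreadPoolExecutor(
>         0, 32, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
>
>     // Queue far more tasks than maximumPoolSize.
>     for (int i = 0; i < 1000; i++) {
>       pool.execute(() -> {
>         try {
>           Thread.sleep(10);
>         } catch (InterruptedException e) {
>           Thread.currentThread().interrupt();
>         }
>       });
>     }
>
>     Thread.sleep(200);
>     // Prints 1: the unbounded queue absorbs every task, so the pool
>     // never grows beyond the single worker it spins up.
>     System.out.println("pool size = " + pool.getPoolSize());
>
>     // A pool that would actually scale to 32 threads could instead use
>     // corePoolSize == maximumPoolSize with allowCoreThreadTimeOut(true);
>     // this is only one possible remedy, not necessarily the fix applied.
>     pool.shutdown();
>   }
> }
> {code}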
> Magic Committer, Spark 3.5.0-SNAPSHOT
> {code:java}
> 2023-05-26 15:35:04,852 DEBUG impl.CommitContext: creating thread pool of 
> size 32
> 2023-05-26 15:35:04,922 INFO yarn.YarnAllocator: Driver requested a total 
> number of 0 executor(s) for resource profile id: 0.
> 2023-05-26 15:35:07,910 INFO commit.AbstractS3ACommitter: Starting: 
> committing the output of 36000 task(s)
> 2023-05-26 15:35:07,914 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000000.pendingset
> 2023-05-26 15:35:07,941 DEBUG files.PersistentCommitData: Reading commit data from file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000000.pendingset
> 2023-05-26 15:35:08,036 DEBUG impl.CommitContext: creating thread pool of 
> size 32
> 2023-05-26 15:35:08,037 DEBUG impl.CommitOperations: Committing single commit SinglePendingCommit{version=2, uri='s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/tasks/attempt_202305261454174742717183892533843_0031_m_000000_1920855/__base/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', destination='<PATH>/date=20230520/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', uploadId='SCp78J9aYOmnPwrtfd5a.Q6B9Zu6olQw3eZcpyt.W.BCu6.M6fz54nlTe2ZYUDicSskFgPocsjlVXAbfiPn3Xu26MzXNdWcg5j_kBGg9iWpeoWh4K21gt7bbKetML95MXAck15yP.VGFeOleoAspUg--', created=1685113555232, saved=1685113555232, size=110636173, date='Fri May 26 15:05:55 UTC 2023', jobId='b9167b1a-35a5-448e-8022-31e0f7457354', taskId='attempt_202305261454174742717183892533843_0031_m_000000_1920855', notes='', etags=[64b9d14a89890cf7f931ff05074f6b6a,7a8c22d95d85736806205ad0f84094a5,82f55a1b0595b38420c6b55cb2763a46,212a3105afcf4a9045730f885a622e0a,b563ec91932a66498e8e9d717b625961,0425b7ae17a8a2af80734f0adf4debe9,496e9fbc08a583657fd29779dcd848d6,72bff33d090699baacb6a6171760a675,b2c6b9b76256416d304392bfbabca382,1a8f63d960cf50589b83b407ddfc77c7,d79cf49f16cd61fb00faf67034268529,e3b7ea2bd0fff2e20a89eccd0ebbeb4a,8edc254a94902fee2884350b4a89f223,f0858853a4542b562c6973e28ee40c19]}
> 2023-05-26 15:35:08,037 INFO impl.CommitOperations: Starting: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110636173
> 2023-05-26 15:35:08,198 DEBUG impl.CommitOperations: Successful commit of file length 110636173
> 2023-05-26 15:35:08,199 INFO impl.CommitOperations: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00000-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110636173: duration 0:00.162s
> 2023-05-26 15:35:08,208 INFO commit.AbstractS3ACommitter: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000000.pendingset: duration 0:00.294s
> 2023-05-26 15:35:08,208 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000001.pendingset
> 2023-05-26 15:35:08,208 DEBUG files.PersistentCommitData: Reading commit data from file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000001.pendingset
> 2023-05-26 15:35:08,232 DEBUG impl.CommitOperations: Committing single commit SinglePendingCommit{version=2, uri='s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/tasks/attempt_202305261454174742717183892533843_0031_m_000001_1920856/__base/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', destination='<PATH>/date=20230520/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', uploadId='I.5Kns.n5JKLbupr5.AWbzApL9YDGhQyI438WQT3JXUjevkQS8u1aQwQhkVbwBoxnFXItCbKJCS8EwPdtoijtcUFrjU0eU3pNvt7lwcYRJ0iklLwXexHbYMpZq3pDj0udwsHCq0trnqyq2Ee.o.prw--', created=1685113548923, saved=1685113548923, size=110570244, date='Fri May 26 15:05:48 UTC 2023', jobId='b9167b1a-35a5-448e-8022-31e0f7457354', taskId='attempt_202305261454174742717183892533843_0031_m_000001_1920856', notes='', etags=[835dd18d4ce995c22c1a84de5cd26fd8,c1f2acd3360b2f94ce90ef0d80edd7e5,208787fce146d917dbc50b1cf4adff4c,a66b228379201a0a278061b096dc2faf,58134b7ae228f0d2a6879ec53a3f35c1,e847a7614639d3f1a43a1b3090b603e1,2d8eac2399238e6d356fe0bdfafd076e,e0c3a964ee2a3c270f67e7e4a7791440,84b05192e896e41e1d16aa363f0eb48b,c3cf764e4361da4573b8dfa6361e4174,78b145f682946c7e18750758da3578cd,d9cd438899d20522668b9cd7f61cc098,e8b3944929f17d927ed7aef31800bdbf,52d279a75e9382ead94fe4d9b81c3bf9]}
> 2023-05-26 15:35:08,232 INFO impl.CommitOperations: Starting: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110570244
> 2023-05-26 15:35:08,438 DEBUG impl.CommitOperations: Successful commit of file length 110570244
> 2023-05-26 15:35:08,438 INFO impl.CommitOperations: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00001-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110570244: duration 0:00.206s
> 2023-05-26 15:35:08,448 INFO commit.AbstractS3ACommitter: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000001.pendingset: duration 0:00.240s
> 2023-05-26 15:35:08,448 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000002.pendingset
> 2023-05-26 15:35:08,448 DEBUG files.PersistentCommitData: Reading commit data from file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/task_202305261454174742717183892533843_0031_m_000002.pendingset
> 2023-05-26 15:35:08,489 DEBUG impl.CommitOperations: Committing single commit SinglePendingCommit{version=2, uri='s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b9167b1a-35a5-448e-8022-31e0f7457354/00/tasks/attempt_202305261454174742717183892533843_0031_m_000002_1920857/__base/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', destination='<PATH>/date=20230520/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet', uploadId='AB982jhG0EAfnSxXWphX7R.QfqrkS2ipP6APqJZYGlqDl335WX4X2xn81wy7l2i5NBQrvG.eVcQgYFudk51wi5EMDJRNVz__SEWyF0xTncpIHpaC5xn8La.AY8gzCHQrAfRJ83nErQhe4Idmhcs04w--', created=1685113553072, saved=1685113553072, size=110136058, date='Fri May 26 15:05:53 UTC 2023', jobId='b9167b1a-35a5-448e-8022-31e0f7457354', taskId='attempt_202305261454174742717183892533843_0031_m_000002_1920857', notes='', etags=[f043628528fc9bdd6db4107fd175eb1d,11e36b4bde35bdb1d2e160ffd5fcc3f6,5105cb3ebee1b27cfec66f7682679316,1a85e5c9477e855e38f7bd4e81bed28e,7c57360865e12c19b3125cbfb8c786f8,59d928629e7afff4c4c038a1dadd0ce7,bfcedee2f51b45c20a097fbbae85a48f,7e3ff2665714d8cdc6ec20134856eb4c,0117e5c47f94a9ff4d8d0eafd1f6b76e,3c4f27e69544e4cc5297c56413231639,a32cd6422aaae63bdcbaafa7400ab889,1d5d07f0a0b258773993a456bc7ec7ee,a220e786aa36f643edc330a184e23d88,e19a85ab3accaa4dc697e7dfbf5d5325]}
> 2023-05-26 15:35:08,489 INFO impl.CommitOperations: Starting: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110136058
> 2023-05-26 15:35:08,646 DEBUG impl.CommitOperations: Successful commit of file length 110136058
> 2023-05-26 15:35:08,646 INFO impl.CommitOperations: Committing file <PATH>/date=20230520/rawRequestType=imp_dsp/part-00002-b196e04e-c87b-43e1-97e0-7ad34ef0bedf.c000.zstd.parquet size 110136058: duration 0:00.157s
> {code}
> Magic Committer, Spark 3.4.0
> {code:java}
> 2023-05-24 09:47:19,906 INFO yarn.YarnAllocator: Driver requested a total 
> number of 524 executor(s) for resource profile id: 0.
> 2023-05-24 09:47:23,064 INFO commit.AbstractS3ACommitter: Starting: 
> committing the output of 36000 task(s)
> 2023-05-24 09:47:23,069 DEBUG commit.AbstractS3ACommitter: Task committer 
> attempt_202305240933298436217226765687147_0000_m_000000_0: creating thread 
> pool of size 32
> 2023-05-24 09:47:23,074 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,074 INFO commit.AbstractS3ACommitter: Starting: Loading 
> and committing files in pendingset 
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic
> /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000000.pendingset
> 2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading 
> and committing files in pendingset 
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic
> /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000001.pendingset
> 2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading 
> and committing files in pendingset 
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic
> /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000002.pendingset
> 2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading 
> and committing files in pendingset 
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic
> /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000003.pendingset
> 2023-05-24 09:47:23,075 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,075 INFO commit.AbstractS3ACommitter: Starting: Loading 
> and committing files in pendingset 
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic
> /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000004.pendingset
> 2023-05-24 09:47:23,076 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,076 INFO commit.AbstractS3ACommitter: Starting: Loading 
> and committing files in pendingset 
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic
> /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000005.pendingset
> 2023-05-24 09:47:23,076 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,076 INFO commit.AbstractS3ACommitter: Starting: Loading 
> and committing files in pendingset 
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic
> /job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000006.pendingset
> 2023-05-24 09:47:23,076 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000004.pendingset
> 2023-05-24 09:47:23,076 INFO commit.AbstractS3ACommitter: Starting: Loading and committing files in pendingset s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000007.pendingset
> 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000003.pendingset
> 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000000.pendingset
> 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000002.pendingset
> 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in file s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000001.pendingset
> 2023-05-24 09:47:23,078 DEBUG commit.Tasks: Executing task
> 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in 
> file 
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000005.pendingset
> 2023-05-24 09:47:23,076 DEBUG files.PendingSet: Reading pending commits in 
> file 
> s3://<BUCKET_NAME>/<PATH>/date=20230520/__magic/job-b15cffa2-2764-45f6-883b-7ae3c5785060/task_20230524093329924571364925494992_0031_m_000007.pendingset{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
