[ https://issues.apache.org/jira/browse/HDFS-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15329298#comment-15329298 ]

Steve Loughran commented on HDFS-10413:
---------------------------------------

Can I note that, from the perspective of S3a, using listFiles(recursive=true) is
significantly faster than walking the tree with listStatus()? If callers were
encouraged to use that API rather than their own tree walk, anything that works
with object stores would see a significant speedup.
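
To make the cost difference concrete, here is a toy model of an object store with a flat key space, where "directories" exist only as key prefixes. It is a hypothetical sketch, not S3A code: the class `ListingCost` and its methods are invented for illustration. A tree walk in the style of repeated listStatus() calls issues one delimited LIST request per directory. A flat, undelimited prefix listing, assumed here to be roughly what a recursive listFiles() can do against an object store, issues one request per page of keys no matter how deep the tree is.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical toy model of an object store with a flat key space: "directories"
 * exist only as key prefixes. It counts LIST requests; it is not the S3A implementation.
 */
class ListingCost {
    private final NavigableMap<String, Long> objects = new TreeMap<>();
    private int requests = 0;

    void put(String key, long size) { objects.put(key, size); }
    int requests() { return requests; }
    void resetRequests() { requests = 0; }

    /** One delimited LIST: files directly under prefix; child "directories" go into subdirs. */
    private List<String> listOneLevel(String prefix, List<String> subdirs) {
        requests++;
        List<String> files = new ArrayList<>();
        for (String key : objects.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) break; // keys sharing a prefix are contiguous when sorted
            String rest = key.substring(prefix.length());
            int slash = rest.indexOf('/');
            if (slash < 0) {
                files.add(key);
            } else {
                String dir = prefix + rest.substring(0, slash + 1);
                if (!subdirs.contains(dir)) subdirs.add(dir);
            }
        }
        return files;
    }

    /** Tree walk in the style of repeated listStatus() calls: one request per directory. */
    List<String> treeWalk(String prefix) {
        List<String> subdirs = new ArrayList<>();
        List<String> files = listOneLevel(prefix, subdirs);
        for (String dir : subdirs) files.addAll(treeWalk(dir));
        return files;
    }

    /** Flat prefix listing: one request per page of keys, whatever the depth. pageSize must be > 0. */
    List<String> flatListing(String prefix, int pageSize) {
        List<String> files = new ArrayList<>();
        String startAfter = null;
        boolean truncated = true;
        while (truncated) {
            requests++;
            NavigableMap<String, Long> tail = startAfter == null
                    ? objects.tailMap(prefix, true)
                    : objects.tailMap(startAfter, false);
            Iterator<String> it = tail.keySet().iterator();
            int n = 0;
            truncated = false;
            while (it.hasNext()) {
                String key = it.next();
                if (!key.startsWith(prefix)) break;
                if (n == pageSize) { truncated = true; break; } // more keys remain: fetch another page
                files.add(key);
                startAfter = key;
                n++;
            }
        }
        return files;
    }

    public static void main(String[] args) {
        ListingCost store = new ListingCost();
        String[] keys = {"data/a.csv", "data/2016/01/b.csv", "data/2016/02/c.csv",
                "data/2016/02/d.csv", "data/2017/e.csv"};
        for (String key : keys) store.put(key, 1L);
        int files = store.treeWalk("data/").size();
        System.out.println("tree walk: " + files + " files, " + store.requests() + " LIST requests");
        store.resetRequests();
        files = store.flatListing("data/", 1000).size();
        System.out.println("flat listing: " + files + " files, " + store.requests() + " LIST requests");
    }
}
```

With five files spread over five "directories", the tree walk makes 5 LIST requests and the flat listing makes 1. In a real store each request is a network round trip, so the gap grows with the number of directories; the actual figures will depend on page size and latency.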

Also, listFiles and similar calls return a RemoteIterator. That code can be
async, so that results can be arriving while the client is still processing
earlier ones. The code I'm working on in HADOOP-13208 doesn't do that, but it
does do windowed queries: only one window's worth of files is listed, filtered
and made available at a time. This keeps memory consumption down.
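
The sketch below combines the two ideas from the paragraph above under stated assumptions: a windowed iterator that holds only one filtered page of results at a time, plus an asynchronous prefetch of the next page while the caller works through the current one. `RemoteIterator` here is a local stand-in with the same two methods as Hadoop's `org.apache.hadoop.fs.RemoteIterator`, so the example compiles without Hadoop; `Page`, `PageFetcher` and `WindowedListing` are hypothetical names and are not HADOOP-13208 code (which, per the comment, is windowed but not async).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.function.Predicate;

/** Local stand-in with the same two methods as Hadoop's org.apache.hadoop.fs.RemoteIterator. */
interface RemoteIterator<E> {
    boolean hasNext() throws IOException;
    E next() throws IOException;
}

/** Hypothetical: one window of listed keys plus a token for the next window (null if last). */
final class Page {
    final List<String> keys;
    final String nextToken;

    Page(List<String> keys, String nextToken) {
        this.keys = keys;
        this.nextToken = nextToken;
    }
}

/** Hypothetical paged LIST call; a null token asks for the first window. */
interface PageFetcher {
    Page fetch(String token) throws IOException;
}

/**
 * Holds only the current filtered window, plus at most one window being fetched in
 * the background, so memory is bounded by the window size, not the listing size.
 */
final class WindowedListing implements RemoteIterator<String> {
    private final PageFetcher fetcher;
    private final Predicate<String> filter;
    private final ExecutorService executor;
    private List<String> window = new ArrayList<>();
    private int pos = 0;
    private CompletableFuture<Page> pending; // null once the last window has been requested

    WindowedListing(PageFetcher fetcher, Predicate<String> filter, ExecutorService executor) {
        this.fetcher = fetcher;
        this.filter = filter;
        this.executor = executor;
        this.pending = request(null); // start fetching the first window immediately
    }

    private CompletableFuture<Page> request(String token) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return fetcher.fetch(token);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, executor);
    }

    @Override
    public boolean hasNext() throws IOException {
        // When the current window is used up, wait for the next one; skip windows the filter empties.
        while (pos >= window.size() && pending != null) {
            Page page;
            try {
                page = pending.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while listing", e);
            } catch (ExecutionException e) {
                Throwable cause = e.getCause();
                throw cause instanceof UncheckedIOException
                        ? ((UncheckedIOException) cause).getCause()
                        : new IOException(cause);
            }
            // Request the following window before filtering this one, so fetch and processing overlap.
            pending = page.nextToken == null ? null : request(page.nextToken);
            window = new ArrayList<>();
            for (String key : page.keys) {
                if (filter.test(key)) window.add(key);
            }
            pos = 0;
        }
        return pos < window.size();
    }

    @Override
    public String next() throws IOException {
        if (!hasNext()) throw new NoSuchElementException();
        return window.get(pos++);
    }
}
```

Dropping the prefetch (calling `fetcher.fetch` synchronously inside `hasNext()`) would give a purely windowed iterator like the one described for HADOOP-13208; the prefetch trades one extra window of memory for overlapping network latency with client-side processing.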

> Implement asynchronous listStatus for DistributedFileSystem
> -----------------------------------------------------------
>
>                 Key: HDFS-10413
>                 URL: https://issues.apache.org/jira/browse/HDFS-10413
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Xiaobing Zhou
>            Assignee: Xiaobing Zhou
>
> Per the 
> [comment|https://issues.apache.org/jira/browse/HDFS-9924?focusedCommentId=15285597&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15285597]
>  from [~mingma], this Jira tracks efforts of implementing async listStatus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

