[ https://issues.apache.org/jira/browse/SPARK-46339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
L. C. Hsieh reassigned SPARK-46339:
-----------------------------------

    Assignee: L. C. Hsieh

> Directory with number name should not be treated as metadata log
> ----------------------------------------------------------------
>
>                 Key: SPARK-46339
>                 URL: https://issues.apache.org/jira/browse/SPARK-46339
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 3.3.3, 3.4.2, 3.5.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>              Labels: pull-request-available
>
> HDFSMetadataLog takes a metadata path as a parameter. When it retrieves the
> metadata of all batches, it calls `CheckpointFileManager.list` to get all
> files under the metadata path. However, all current implementations of
> `CheckpointFileManager.list` return every file and directory under the given
> path. So if there is a directory whose name is a batch number (a long value),
> that directory is returned too and causes trouble when HDFSMetadataLog
> tries to read it.
> The documentation of `CheckpointFileManager.list` clearly states that it
> lists the "files" in a path. That being said, the current implementations
> don't follow the doc. We should fix it.
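
The problem described above can be illustrated with a minimal sketch. It is written in Python against the local filesystem rather than in Spark's Scala, and it does not use `CheckpointFileManager` or `HDFSMetadataLog`. All function names below (`is_batch_name`, `list_batch_ids_naive`, `list_batch_ids_files_only`, `demo`) are hypothetical and exist only for illustration; the actual fix in Spark may look different.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def is_batch_name(name: str) -> bool:
    """Return True if the entry name looks like a batch id (ASCII digits only)."""
    return name.isascii() and name.isdigit()


def list_batch_ids_naive(metadata_path: Path) -> List[int]:
    """Mimic the reported behaviour: every entry with a numeric name counts,
    whether it is a file or a directory."""
    return sorted(
        int(entry.name)
        for entry in metadata_path.iterdir()
        if is_batch_name(entry.name)
    )


def list_batch_ids_files_only(metadata_path: Path) -> List[int]:
    """Assumed fix: only regular files with a numeric name count as batches."""
    return sorted(
        int(entry.name)
        for entry in metadata_path.iterdir()
        if entry.is_file() and is_batch_name(entry.name)
    )


def demo() -> Tuple[List[int], List[int]]:
    """Build a metadata directory with batch files 0 and 1 plus a stray
    directory named 2, then list it both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        metadata_path = Path(tmp)
        (metadata_path / "0").write_text("batch 0 metadata")
        (metadata_path / "1").write_text("batch 1 metadata")
        (metadata_path / "2").mkdir()  # a directory, not a batch file
        return (
            list_batch_ids_naive(metadata_path),
            list_batch_ids_files_only(metadata_path),
        )


if __name__ == "__main__":
    naive, files_only = demo()
    print("naive listing:     ", naive)
    print("files-only listing:", files_only)
```

In this sketch the naive listing reports the directory `2` as if it were a batch, while the files-only listing skips it, which is the behaviour the issue asks `CheckpointFileManager.list` to have.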