Hi Averell,

Is this all the custom code for "CustomFileSource"?
If not, can you share the entire file with us, and if you can set the log
level to DEBUG, it will help you analyze and locate the problem.
If you can't come to a conclusion, you can share the log with us.

Thanks, vino.


Averell <lvhu...@gmail.com> 于2018年9月21日周五 上午6:51写道:

> Good day everyone,
>
> I have about 100 thousand files to read, and a custom FilePathFilter with a
> simple filterPath method defined as below (the custom part is only to check
> file-size and skip files with size = 0)
>         override def filterPath(filePath: Path): Boolean = {
>                 filePath == null ||
>                                 filePath.getName.startsWith(".") ||
>                                 filePath.getName.startsWith("_") ||
>
> filePath.getName.contains(FilePathFilter.HADOOP_COPYING) ||
>                                 {
>                                                 def fileStatus =
> filePath.getFileSystem.getFileStatus(filePath)
>                                                 !fileStatus.isDir &&
> fileStatus.getLen == 0
>                                 }
>         }
>
> It is running fine either when I disable checkpointing or when I use the
> default FilePathFilter. It takes about 7 minutes to finished processing all
> files (from source to sink).
> However, when I have both, customer filter and checkpointing, it usually
> takes 15-20 minutes for Flink to start reading my files (in Flink GUI, the
> CustomFileSource monitor generates 0 records during that 15-20 minutes
> period)
>
> Could someone please help with this?
>
> Thank you very much for your time.
> Best regards,
> Averell
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>

Reply via email to