Hi Arjun,
If you want to filter files by a regex pattern, I think the config
`source.path.regex-pattern` [1] may be what you want.
'source.path.regex-pattern' = '...',  -- optional: regex pattern to filter files to read under
                                      -- the directory of the `path` option. The regex pattern is
                                      -- matched against the absolute file path. If this option
                                      -- is set, the connector will recursively read all files
                                      -- under the directory of the `path` option.
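For example, here is a minimal sketch based on your table with the option added. The `report_[0-9]+\.csv` file-name pattern is only an assumption for illustration; note the leading `.*`, since the pattern is matched against the absolute file path:
=============
CREATE TABLE sample (
  col1 STRING,
  col2 STRING,
  col3 STRING,
  col4 STRING,
  `file.path` STRING NOT NULL METADATA
) WITH (
  'connector' = 'filesystem',
  'path' = 'file:///home/techuser/inputdata',
  'format' = 'csv',
  'source.monitor-interval' = '10000',
  -- assumed pattern: only read files named like report_123.csv
  'source.path.regex-pattern' = '.*/report_[0-9]+\.csv'
)
=============
Files whose absolute paths do not match the pattern should simply be skipped by the source.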
Best,
Yu Chen
[1]
https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/table/filesystem/
________________________________
From: arjun s <[email protected]>
Sent: November 6, 2023 20:50
To: [email protected] <[email protected]>
Subject: Handling Schema Variability and Applying Regex Patterns in Flink Job
Configuration
Hi team,
I'm currently using the Table API in my Flink job to read records from CSV
files located in a source directory. To obtain the file names, I create a table
and specify its schema with the Table API. When the schema matches, my Flink
job submits and executes as intended; however, when the schema does not match,
the job fails to submit. Since the schema of the files in the source directory
is unpredictable, I'm looking for a way to handle this situation.
Create table query
=============
CREATE TABLE sample (
  col1 STRING,
  col2 STRING,
  col3 STRING,
  col4 STRING,
  `file.path` STRING NOT NULL METADATA
) WITH (
  'connector' = 'filesystem',
  'path' = 'file:///home/techuser/inputdata',
  'format' = 'csv',
  'source.monitor-interval' = '10000'
)
=============
Furthermore, is there a way to read files from the source directory based on a
specific regex pattern? This is relevant in our situation because only file
names that match a particular pattern need to be processed by the Flink job.
Thanks and Regards,
Arjun