Hi Arjun,
If you want to filter files by a regex pattern, I think the config
`source.path.regex-pattern` [1] may be what you want.
'source.path.regex-pattern' = '...',  -- optional: regex pattern to filter files to read under
                                      -- the directory of the `path` option. The regex pattern is
                                      -- matched against the absolute file path. If this option
                                      -- is set, the connector will recursively read all files
                                      -- under the directory of the `path` option.
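For example, here is a minimal sketch based on your table with the option added. The `report_[0-9]+\.csv` file-name pattern is only an assumption for illustration; note the leading `.*`, since the pattern is matched against the absolute file path:
=============
CREATE TABLE sample (
  col1 STRING,
  col2 STRING,
  col3 STRING,
  col4 STRING,
  `file.path` STRING NOT NULL METADATA
) WITH (
  'connector' = 'filesystem',
  'path' = 'file:///home/techuser/inputdata',
  'format' = 'csv',
  'source.monitor-interval' = '10000',
  -- assumed pattern: only read files named like report_123.csv
  'source.path.regex-pattern' = '.*/report_[0-9]+\.csv'
)
=============
Files whose absolute paths do not match the pattern should simply be skipped by the source.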
Best,
Yu Chen
[1]
https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/table/filesystem/
________________________________
From: arjun s <[email protected]>
Sent: November 6, 2023 20:50
To: [email protected] <[email protected]>
Subject: Handling Schema Variability and Applying Regex Patterns in Flink Job
Configuration
Hi team,
I'm currently using the Table API in my Flink job to read records from CSV
files located in a source directory. To obtain the file names, I create a table
and specify its schema with the Table API. When the schema matches, my Flink
job submits and executes as intended; however, when the schema does not match,
the job fails to submit. Since the schema of the files in the source directory
is unpredictable, I'm looking for a way to handle this situation.
Create table query
=============
CREATE TABLE sample (
  col1 STRING,
  col2 STRING,
  col3 STRING,
  col4 STRING,
  `file.path` STRING NOT NULL METADATA
) WITH (
  'connector' = 'filesystem',
  'path' = 'file:///home/techuser/inputdata',
  'format' = 'csv',
  'source.monitor-interval' = '10000'
)
=============
Furthermore, is there a way to read files from the source directory based on a
specific regex pattern? This is relevant in our situation because only file
names that match a particular pattern need to be processed by the Flink job.
Thanks and Regards,
Arjun