Hi team,
Thank you for your response. Could you please provide a sample
regex (for source.path.regex-pattern) for the following scenarios:

1. Matching filenames that start with "flink", e.g. flink_2023_11_08.csv
2. Matching filenames that end with "flink.csv", e.g. customer_2023_11_08_flink.csv
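
My rough guess, given that the pattern has to match the absolute file path (so
the patterns below are only my assumptions, not something I have confirmed),
would be along these lines:

  'source.path.regex-pattern' = '.*/flink[^/]*'   -- file names starting with "flink"
  'source.path.regex-pattern' = '.*flink\.csv'    -- file names ending with "flink.csv"

Please correct these if that is not how the option is meant to be used.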

Thanks and regards,
Arjun

On Tue, 7 Nov 2023 at 16:00, Yu Chen <yuchen.e...@gmail.com> wrote:

> Hi Arjun,
>
> As stated in the document, 'This regex pattern should be matched with the
> absolute file path.'
> Therefore, you should adjust your regular expression to match absolute
> paths.
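>
> For example (just a sketch on my side, assuming your files sit directly under
> the directory from your table definition), a pattern along these lines matches
> the absolute path instead of only the file name:
>
>   'source.path.regex-pattern' = '.*/customer[^/]*'
>
> whereas '^customer.*' is anchored at the very beginning of the absolute path,
> so it can never match.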
>
> Please let me know if there are any other problems.
>
> Best,
> Yu Chen
>
> > On 7 Nov 2023, at 18:11, arjun s <arjunjoice...@gmail.com> wrote:
> >
> > Hi Chen,
> > I attempted to configure the 'source.path.regex-pattern' property in the
> table settings as '^customer.*' to ensure that the Flink job only processes
> file names starting with "customer" in the specified directory. However, it
> appears that this configuration is not producing the expected results. Are
> there any additional configurations or adjustments that need to be made?
> The table script I used is as follows:
> > CREATE TABLE sample (
> >   col1 STRING,
> >   col2 STRING,
> >   col3 STRING,
> >   col4 STRING,
> >   `file.path` STRING NOT NULL METADATA
> > ) WITH (
> >   'connector' = 'filesystem',
> >   'path' = 'file:///home/techuser/inputdata',
> >   'format' = 'csv',
> >   'source.path.regex-pattern' = '^customer.*',
> >   'source.monitor-interval' = '10000'
> > )
> > Thanks in advance,
> > Arjun
> >
> > On Mon, 6 Nov 2023 at 20:56, Chen Yu <yuchen.e...@gmail.com> wrote:
> > Hi Arjun,
> >
> > If you can filter files by a regex pattern, I think the config
> > `source.path.regex-pattern`[1] may be what you want.
> >
> >   'source.path.regex-pattern' = '...',  -- optional: regex pattern to filter files to read
> >                                         -- under the directory of the `path` option. This
> >                                         -- regex pattern should be matched with the absolute
> >                                         -- file path. If this option is set, the connector
> >                                         -- will recursively read all files under the
> >                                         -- directory of the `path` option.
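> >
> > As a rough sketch (the pattern below is just a placeholder, not something
> > specific to your data), the option sits next to the other filesystem options
> > like this:
> >
> >   CREATE TABLE sample (...) WITH (
> >     'connector' = 'filesystem',
> >     'path' = 'file:///home/techuser/inputdata',
> >     'format' = 'csv',
> >     'source.path.regex-pattern' = '.*\.csv'  -- matched against the absolute file path
> >   )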
> >
> > Best,
> > Yu Chen
> >
> >
> > [1]
> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/table/filesystem/
> >
> > From: arjun s <arjunjoice...@gmail.com>
> > Sent: 6 November 2023, 20:50
> > To: user@flink.apache.org <user@flink.apache.org>
> > Subject: Handling Schema Variability and Applying Regex Patterns in Flink Job Configuration
> >
> > Hi team,
> > I'm currently utilizing the Table API function within my Flink job, with
> the objective of reading records from CSV files located in a source
> directory. To obtain the file names, I'm creating a table and specifying
> the schema using the Table API in Flink. Consequently, when the schema
> matches, my Flink job successfully submits and executes as intended.
> However, in cases where the schema does not match, the job fails to submit.
> Given that the schema of the files in the source directory is
> unpredictable, I'm seeking a method to handle this situation.
> > Create table query
> > =============
> > CREATE TABLE sample (
> >   col1 STRING, col2 STRING, col3 STRING, col4 STRING,
> >   `file.path` STRING NOT NULL METADATA
> > ) WITH (
> >   'connector' = 'filesystem', 'path' = 'file:///home/techuser/inputdata',
> >   'format' = 'csv', 'source.monitor-interval' = '10000'
> > )
> > =============
> >
> > Furthermore, I have a question about whether there's a way to read files
> from the source directory based on a specific regex pattern. This is
> relevant in our situation because only file names that match a particular
> pattern need to be processed by the Flink job.
> >
> > Thanks and Regards,
> > Arjun
>
>
