Hi Nikola,

for the ORC source it is fine to use `TableEnvironment#fromTableSource`. It is true that this method is deprecated but, as I said, not all connectors have been ported to the SQL DDL via string properties yet. Therefore, `TableEnvironment#fromTableSource` remains accessible until all connectors are supported in the DDL.
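
In code, that could look roughly like this (a minimal sketch, assuming Flink 1.12 with flink-orc on the classpath; the path, the ORC schema and the query are placeholders, and the exact environment setup may differ in your setup):

    // imports: org.apache.flink.orc.OrcTableSource,
    // org.apache.flink.table.api.EnvironmentSettings / Table / TableEnvironment
    TableEnvironment tableEnv = TableEnvironment.create(
        EnvironmentSettings.newInstance().inBatchMode().build());

    OrcTableSource orcSrc = OrcTableSource.builder()
        .path("file:///my/data/file.orc")                                           // placeholder path
        .forOrcSchema("struct<col1:boolean,col2:tinyint,col3:smallint,col4:int>")   // placeholder schema
        .build();

    // deprecated, but still usable until all connectors are supported in the DDL
    Table orcTable = tableEnv.fromTableSource(orcSrc);
    tableEnv.createTemporaryView("orcTable", orcTable);

    // if the ORC source does not work with this environment in your setup,
    // the legacy BatchTableEnvironment is an alternative
    Table res = tableEnv.sqlQuery("SELECT * FROM orcTable");
    res.execute().print();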

Btw, it might also make sense to look into the Hive connector for reading ORC.
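
If you go the Hive route, a rough sketch could look like this (assuming flink-connector-hive and the Hive dependencies are on the classpath; the catalog name, hive-conf directory and table name below are placeholders):

    // imports: org.apache.flink.table.api.EnvironmentSettings / Table / TableEnvironment,
    // org.apache.flink.table.catalog.hive.HiveCatalog
    TableEnvironment tableEnv = TableEnvironment.create(
        EnvironmentSettings.newInstance().inBatchMode().build());

    // register a Hive catalog; "/path/to/hive-conf" must point to your hive-site.xml directory
    HiveCatalog hive = new HiveCatalog("myhive", "default", "/path/to/hive-conf");
    tableEnv.registerCatalog("myhive", hive);
    tableEnv.useCatalog("myhive");

    // query an existing Hive table that is stored as ORC (hypothetical table name)
    Table res = tableEnv.sqlQuery("SELECT * FROM my_orc_table");
    res.execute().print();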

Regards,
Timo

On 22.03.21 18:02, Nikola Hrusov wrote:
Hi Timo,

I need to read ORC files and run a query on them, as in the example above. Since the example given in the docs is not recommended, what should I use?

I looked into the method you suggest - TableEnvironment#fromTableSource - and it shows as Deprecated in the docs: https://ci.apache.org/projects/flink/flink-docs-release-1.12/api/java/org/apache/flink/table/api/TableEnvironment.html#fromTableSource-org.apache.flink.table.sources.TableSource-

However, it doesn't say what I should use instead.

I have looked through all the docs available for 1.12, but I cannot find how to achieve the same result as in previous versions. In previous versions you could call `tableEnv.registerTableSource(tableName, orcTableSource);`, but that method is not available anymore.

What is the way to go from here? I would like to read from orc files, run a query and transform the result. I do not necessarily need it to be with the DataSet API.

Regards,
Nikola

On Mon, Mar 22, 2021 at 6:49 PM Timo Walther <twal...@apache.org> wrote:

    Hi Nikola,


    the OrcTableSource has not been updated to be usable in SQL DDL. You
    can define your own table factory [1] that translates string properties
    into source instances, or use
    `org.apache.flink.table.api.TableEnvironment#fromTableSource`. I
    recommend the latter option.

    Please keep in mind that we are about to drop DataSet support for the
    Table API in 1.13. Batch and streaming use cases are already possible
    with the unified TableEnvironment.

    Are you sure that you really need the DataSet API?

    Regards,
    Timo

    [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/sourcessinks/

    On 21.03.21 15:42, Nikola Hrusov wrote:
     > Hello,
     >
     > I am trying to find some examples of how to use the OrcTableSource
     > and query it.
     > I got to the documentation here:
     > https://ci.apache.org/projects/flink/flink-docs-release-1.12/api/java/org/apache/flink/orc/OrcTableSource.html

     > and it says that an OrcTableSource is used as below:
     >
     > OrcTableSource orcSrc = OrcTableSource.builder()
     >     .path("file:///my/data/file.orc")
     >     .forOrcSchema("struct<col1:boolean,col2:tinyint,col3:smallint,col4:int>")
     >     .build();
     > tEnv.registerTableSourceInternal("orcTable", orcSrc);
     > Table res = tableEnv.sqlQuery("SELECT * FROM orcTable");
     >
     >
     > My question is what should tEnv be so that I can use
     > the registerTableSourceInternal method?
     > My end goal is to query the orc source and then return a DataSet.
     >
     > Regards,
     > Nikola

