Thanks, looks good, nice job!

Best,
Jingsong Lee
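For background on the quoted thread below: `ParquetAvroWriters.forReflectRecord` derives the parquet schema from the Java class by reflection, so the file's column order follows the class's declared-field order, not the Hive DDL. A minimal plain-Java sketch of that derivation — the two-field `RobotUploadData0101` stand-in and the `columnOrder` helper are illustrative only, and `getDeclaredFields` order is JVM-typical rather than guaranteed by the spec:

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class ReflectSchemaSketch {

    // Two-field stand-in for the real 60+-field RobotUploadData0101 class.
    static class RobotUploadData0101 {
        int robotId;
        long robotTime;
    }

    // Roughly what reflection-based schema derivation does: the parquet
    // column order comes from the class's declared-field order.
    static List<String> columnOrder(Class<?> clazz) {
        List<String> names = new ArrayList<>();
        for (Field f : clazz.getDeclaredFields()) {
            names.add(f.getName().toLowerCase()); // Hive lowercases column names
        }
        return names;
    }

    public static void main(String[] args) {
        // The file keeps this order regardless of how the DDL orders columns.
        System.out.println(columnOrder(RobotUploadData0101.class));
        // [robotid, robottime]
    }
}
```

A reader that pairs DDL columns with file columns by position therefore only works when the DDL repeats this exact order; name-based mapping, which FLINK-17086 asks for, is order-independent.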
On Fri, Apr 10, 2020 at 5:56 PM wangl...@geekplus.com.cn <wangl...@geekplus.com.cn> wrote:

> https://issues.apache.org/jira/browse/FLINK-17086
>
> This is my first time creating a Flink JIRA issue.
> Please point it out and correct me if I wrote something wrong.
>
> Thanks,
> Lei
>
> ------------------------------
> wangl...@geekplus.com.cn
>
> *From:* Jingsong Li <jingsongl...@gmail.com>
> *Date:* 2020-04-10 11:03
> *To:* wangl...@geekplus.com.cn
> *CC:* Jark Wu <imj...@gmail.com>; lirui <li...@apache.org>; user <user@flink.apache.org>
> *Subject:* Re: Re: flink sql client not able to read parquet format table
>
> Hi Lei,
>
> I think the reason is that our `HiveMapredSplitReader` does not support
> name-mapped reading for the parquet format.
> Can you create a JIRA to track this?
>
> Best,
> Jingsong Lee
>
> On Fri, Apr 10, 2020 at 9:42 AM wangl...@geekplus.com.cn <wangl...@geekplus.com.cn> wrote:
>
>> I am using Hive 3.1.1.
>> The table has many fields; each field corresponds to a field in the
>> RobotUploadData0101 class.
>>
>> CREATE TABLE `robotparquet`(
>>   `robotid` int, `framecount` int, `robottime` bigint,
>>   `robotpathmode` int, `movingmode` int, `submovingmode` int,
>>   `xlocation` int, `ylocation` int, `robotradangle` int,
>>   `velocity` int, `acceleration` int, `angularvelocity` int,
>>   `angularacceleration` int, `literangle` int, `shelfangle` int,
>>   `onloadshelfid` int, `rcvdinstr` int, `sensordist` int,
>>   `pathstate` int, `powerpresent` int, `neednewpath` int,
>>   `pathelenum` int, `taskstate` int, `receivedtaskid` int,
>>   `receivedcommcount` int, `receiveddispatchinstr` int,
>>   `receiveddispatchcount` int, `subtaskmode` int, `versiontype` int,
>>   `version` int, `liftheight` int, `codecheckstatus` int,
>>   `cameraworkmode` int, `backrimstate` int, `frontrimstate` int,
>>   `pathselectstate` int, `codemisscount` int, `groundcameraresult` int,
>>   `shelfcameraresult` int, `softwarerespondframe` int, `paramstate` int,
>>   `pilotlamp` int, `codecount` int, `dist2waitpoint` int,
>>   `targetdistance` int, `obstaclecount` int, `obstacleframe` int,
>>   `cellcodex` int, `cellcodey` int, `cellangle` int,
>>   `shelfqrcode` int, `shelfqrangle` int, `shelfqrx` int, `shelfqry` int,
>>   `trackthetaerror` int, `tracksideerror` int, `trackfuseerror` int,
>>   `lifterangleerror` int, `lifterheighterror` int, `linearcmdspeed` int,
>>   `angluarcmdspeed` int, `liftercmdspeed` int, `rotatorcmdspeed` int)
>> PARTITIONED BY (`hour` string) STORED AS parquet;
>>
>> Thanks,
>> Lei
>>
>> ------------------------------
>> wangl...@geekplus.com.cn
>>
>> *From:* Jingsong Li <jingsongl...@gmail.com>
>> *Date:* 2020-04-09 21:45
>> *To:* wangl...@geekplus.com.cn
>> *CC:* Jark Wu <imj...@gmail.com>; lirui <li...@apache.org>; user <user@flink.apache.org>
>> *Subject:* Re: Re: flink sql client not able to read parquet format table
>>
>> Hi Lei,
>>
>> Which Hive version are you using?
>> Can you share the complete Hive DDL?
>>
>> Best,
>> Jingsong Lee
>>
>> On Thu, Apr 9, 2020 at 7:15 PM wangl...@geekplus.com.cn <wangl...@geekplus.com.cn> wrote:
>>
>>> I am using the newest 1.10 blink planner.
>>>
>>> Perhaps it is because of the way I wrote the parquet file:
>>> receive a Kafka message, transform each message into a Java class
>>> object, write the object to HDFS using StreamingFileSink, and add the
>>> HDFS path as a partition of the Hive table.
>>>
>>> No matter what order the fields are declared in the Hive DDL
>>> statement, the Hive client works, as long as the field names match the
>>> Java object field names. But the Flink SQL client does not.
>>>
>>> DataStream<RobotUploadData0101> sourceRobot = source.map(x -> transform(x));
>>> final StreamingFileSink<RobotUploadData0101> sink =
>>>     StreamingFileSink
>>>         .forBulkFormat(
>>>             new Path("hdfs://172.19.78.38:8020/user/root/wanglei/robotdata/parquet"),
>>>             ParquetAvroWriters.forReflectRecord(RobotUploadData0101.class))
>>>         .build();
>>>
>>> For example, RobotUploadData0101 has two fields: robotId int, robotTime long.
>>>
>>> CREATE TABLE `robotparquet`(`robotid` int, `robottime` bigint) and
>>> CREATE TABLE `robotparquet`(`robottime` bigint, `robotid` int)
>>> are the same for the Hive client, but different for the Flink SQL client.
>>>
>>> Is this expected behavior?
>>>
>>> Thanks,
>>> Lei
>>>
>>> ------------------------------
>>> wangl...@geekplus.com.cn
>>>
>>> *From:* Jark Wu <imj...@gmail.com>
>>> *Date:* 2020-04-09 14:48
>>> *To:* wangl...@geekplus.com.cn; Jingsong Li <jingsongl...@gmail.com>; lirui <li...@apache.org>
>>> *CC:* user <user@flink.apache.org>
>>> *Subject:* Re: flink sql client not able to read parquet format table
>>>
>>> Hi Lei,
>>>
>>> Are you using the newest 1.10 blink planner?
>>>
>>> I'm not familiar with Hive and parquet, but I know @Jingsong Li
>>> <jingsongl...@gmail.com> and @li...@apache.org <li...@apache.org> are
>>> experts on this. Maybe they can help with this question.
>>>
>>> Best,
>>> Jark
>>>
>>> On Tue, 7 Apr 2020 at 16:17, wangl...@geekplus.com.cn <wangl...@geekplus.com.cn> wrote:
>>>
>>>> The Hive table is stored as parquet.
>>>>
>>>> Under the hive client:
>>>>
>>>> hive> select robotid from robotparquet limit 2;
>>>> OK
>>>> 1291097
>>>> 1291044
>>>>
>>>> But under the flink sql-client the result is 0:
>>>>
>>>> Flink SQL> select robotid from robotparquet limit 2;
>>>> robotid
>>>> 0
>>>> 0
>>>>
>>>> Any insight on this?
>>>>
>>>> Thanks,
>>>> Lei
>>>>
>>>> ------------------------------
>>>> wangl...@geekplus.com.cn

--
Best, Jingsong Lee
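The symptom in the last message reduces to a by-position versus by-name column lookup. A plain-Java sketch of the difference, with no Flink or Hive involved and all names hypothetical (in the real case the Flink path surfaced 0 for `robotid` rather than the neighbouring cell, but the positional pairing error is the same):

```java
import java.util.List;

public class ColumnMappingSketch {

    // File-side schema: column names in the order they were written,
    // i.e. the Java class's field order (robotId, robotTime).
    static final List<String> FILE_COLUMNS = List.of("robotid", "robottime");
    // One "row" as stored in the file, positionally aligned with FILE_COLUMNS.
    static final Object[] FILE_ROW = {1291097, 1586241234000L};

    // By-position mapping: the DDL order must match the file order,
    // otherwise the wrong cell is returned.
    static Object readByPosition(List<String> ddlColumns, String wanted) {
        return FILE_ROW[ddlColumns.indexOf(wanted)];
    }

    // By-name mapping: look the column up in the file schema itself,
    // so the DDL order is irrelevant.
    static Object readByName(String wanted) {
        return FILE_ROW[FILE_COLUMNS.indexOf(wanted)];
    }

    public static void main(String[] args) {
        // DDL declared in the opposite order of the file.
        List<String> ddl = List.of("robottime", "robotid");
        System.out.println(readByPosition(ddl, "robotid")); // 1586241234000 -- the robottime cell
        System.out.println(readByName("robotid"));          // 1291097 -- correct
    }
}
```

The `readByName` behavior is consistent with both DDL orders working under the hive client in the thread; the by-position behavior is what the flink sql-client exhibited before FLINK-17086.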