Hi Lei,

I think the reason is that our `HiveMapredSplitReader` does not support
name-based column mapping when reading the parquet format.
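To illustrate the difference (a toy sketch only, not the actual reader code; the field names and values are made up to match Lei's example):

```java
import java.util.Arrays;
import java.util.List;

public class ColumnMappingDemo {
    // Suppose the parquet file was written with field order (robottime, robotid),
    // while the Hive DDL declares the columns as (robotid, robottime).
    static final List<String> FILE_FIELDS = Arrays.asList("robottime", "robotid");
    static final Object[] FILE_ROW = {1586395335000L, 1291097};

    // Position-based mapping: column i of the DDL is read from column i of the file.
    static Object readByPosition(List<String> ddlFields, String wanted) {
        return FILE_ROW[ddlFields.indexOf(wanted)];
    }

    // Name-based mapping: the column is located in the file schema by its name.
    static Object readByName(String wanted) {
        return FILE_ROW[FILE_FIELDS.indexOf(wanted)];
    }

    public static void main(String[] args) {
        List<String> ddl = Arrays.asList("robotid", "robottime");
        // Position-based lookup silently reads the wrong file column:
        System.out.println("by position: " + readByPosition(ddl, "robotid"));
        // Name-based lookup finds the right one regardless of DDL order:
        System.out.println("by name:     " + readByName("robotid"));
    }
}
```

With only positional mapping, a DDL whose column order differs from the writer's field order reads the wrong (or type-mismatched) columns, which is consistent with the zeros you saw.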
Can you create a JIRA for tracking this?

Best,
Jingsong Lee

On Fri, Apr 10, 2020 at 9:42 AM wangl...@geekplus.com.cn <
wangl...@geekplus.com.cn> wrote:

>
> I am using Hive 3.1.1
> The table has many fields, each field is corresponded to a feild in the 
> RobotUploadData0101
> class.
>
> CREATE TABLE `robotparquet`(`robotid` int,   `framecount` int,
> `robottime` bigint,   `robotpathmode` int,   `movingmode` int,
> `submovingmode` int,   `xlocation` int,   `ylocation` int,
> `robotradangle` int,   `velocity` int,   `acceleration` int,
> `angularvelocity` int,   `angularacceleration` int,   `literangle` int,
> `shelfangle` int,   `onloadshelfid` int,   `rcvdinstr` int,   `sensordist`
> int,   `pathstate` int,   `powerpresent` int,   `neednewpath` int,
> `pathelenum` int,   `taskstate` int,   `receivedtaskid` int,
> `receivedcommcount` int,   `receiveddispatchinstr` int,
> `receiveddispatchcount` int,   `subtaskmode` int,   `versiontype` int,
> `version` int,   `liftheight` int,   `codecheckstatus` int,
> `cameraworkmode` int,   `backrimstate` int,   `frontrimstate` int,
> `pathselectstate` int,   `codemisscount` int,   `groundcameraresult` int,
> `shelfcameraresult` int,   `softwarerespondframe` int,   `paramstate` int,
>   `pilotlamp` int,   `codecount` int,   `dist2waitpoint` int,
> `targetdistance` int,   `obstaclecount` int,   `obstacleframe` int,
> `cellcodex` int,   `cellcodey` int,   `cellangle` int,   `shelfqrcode` int,
>   `shelfqrangle` int,   `shelfqrx` int,   `shelfqry` int,
> `trackthetaerror` int,   `tracksideerror` int,   `trackfuseerror` int,
> `lifterangleerror` int,   `lifterheighterror` int,   `linearcmdspeed` int,
>   `angluarcmdspeed` int,   `liftercmdspeed` int,   `rotatorcmdspeed` int)
> PARTITIONED BY (`hour` string) STORED AS parquet;
>
>
> Thanks,
> Lei
> ------------------------------
> wangl...@geekplus.com.cn
>
>
> *From:* Jingsong Li <jingsongl...@gmail.com>
> *Date:* 2020-04-09 21:45
> *To:* wangl...@geekplus.com.cn
> *CC:* Jark Wu <imj...@gmail.com>; lirui <li...@apache.org>; user
> <user@flink.apache.org>
> *Subject:* Re: Re: fink sql client not able to read parquet format table
> Hi lei,
>
> Which hive version did you use?
> Can you share the complete hive DDL?
>
> Best,
> Jingsong Lee
>
> On Thu, Apr 9, 2020 at 7:15 PM wangl...@geekplus.com.cn <
> wangl...@geekplus.com.cn> wrote:
>
>>
>> I am using the newest 1.10 blink planner.
>>
>> Perhaps it is because of the method I used to write the parquet file.
>>
>> I receive Kafka messages, transform each message into a Java object,
>> write the objects to HDFS using StreamingFileSink, and then add the HDFS
>> path as a partition of the Hive table.
>>
>> Regardless of the order of the field declarations in the Hive DDL
>> statement, the Hive client works, as long as the field names match the
>> Java object's field names.
>> But the Flink SQL client does not.
>>
>> DataStream<RobotUploadData0101> sourceRobot = source.map(x -> transform(x));
>> final StreamingFileSink<RobotUploadData0101> sink = StreamingFileSink
>>     .forBulkFormat(
>>         new Path("hdfs://172.19.78.38:8020/user/root/wanglei/robotdata/parquet"),
>>         ParquetAvroWriters.forReflectRecord(RobotUploadData0101.class))
>>     .build();
>>
>> For example, suppose RobotUploadData0101 has two fields: robotId (int)
>> and robotTime (long).
>>
>> CREATE TABLE `robotparquet`(  `robotid` int,  `robottime` bigint ) and
>> CREATE TABLE `robotparquet`(  `robottime` bigint,   `robotid` int)
>> are equivalent for the Hive client, but behave differently for the
>> Flink SQL client.
>>
>> Is this expected behavior?
>>
>> Thanks,
>> Lei
>>
>> ------------------------------
>> wangl...@geekplus.com.cn
>>
>>
>> *From:* Jark Wu <imj...@gmail.com>
>> *Date:* 2020-04-09 14:48
>> *To:* wangl...@geekplus.com.cn; Jingsong Li <jingsongl...@gmail.com>;
>> lirui <li...@apache.org>
>> *CC:* user <user@flink.apache.org>
>> *Subject:* Re: fink sql client not able to read parquet format table
>> Hi Lei,
>>
>> Are you using the newest 1.10 blink planner?
>>
>> I'm not familiar with Hive and parquet, but I know @Jingsong Li
>> <jingsongl...@gmail.com> and @li...@apache.org <li...@apache.org> are
>> experts on this. Maybe they can help on this question.
>>
>> Best,
>> Jark
>>
>> On Tue, 7 Apr 2020 at 16:17, wangl...@geekplus.com.cn <
>> wangl...@geekplus.com.cn> wrote:
>>
>>>
>>> Hive table stored as parquet.
>>>
>>> Under hive client:
>>> hive> select robotid from robotparquet limit 2;
>>> OK
>>> 1291097
>>> 1291044
>>>
>>>
>>> But under the Flink SQL client the result is 0:
>>> Flink SQL> select robotid  from robotparquet limit 2;
>>>                   robotid
>>>                          0
>>>                          0
>>>
>>> Any insight on this?
>>>
>>> Thanks,
>>> Lei
>>>
>>>
>>>
>>> ------------------------------
>>> wangl...@geekplus.com.cn
>>>
>>>
>
> --
> Best, Jingsong Lee
>
>
