Hi, we are using a plugin from Apache Hudi that defines its own input format for a Hive external table:
    ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    WITH SERDEPROPERTIES ( 'serialization.format' = '1' )
    STORED AS
      INPUTFORMAT 'com.uber.hoodie.hadoop.HoodieInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
    LOCATION 's3a://vungle2-dataeng/jun-test/stage20190424new'

Queries against this table work in spark-shell, but not in the Spark Thrift Server with the same config. After debugging, I found that the execution plan in spark-shell differs from the one in the Spark Thrift Server.

1. In spark-shell (the table is read with HiveTableScan):

    == Physical Plan ==
    TakeOrderedAndProject(limit=10, orderBy=[datestr#130 ASC NULLS FIRST,event_id#81 DESC NULLS LAST], output=[event_id#81,datestr#130,c#74L])
    +- *(2) Filter (c#74L > 1)
       +- *(2) HashAggregate(keys=[event_id#81, datestr#130], functions=[count(1)])
          +- Exchange hashpartitioning(event_id#81, datestr#130, 200)
             +- *(1) HashAggregate(keys=[event_id#81, datestr#130], functions=[partial_count(1)])
                +- HiveTableScan [event_id#81, datestr#130], HiveTableRelation `default`.`hoodie_test_as_reportads_new`, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, [_hoodie_record_key#78, _hoodie_commit_time#79, _hoodie_commit_seqno#8...

2. In the Spark Thrift Server (the table is read with FileScan parquet):

    == Physical Plan ==
    TakeOrderedAndProject(limit=10, orderBy=[datestr#63 ASC NULLS FIRST,event_id#14 DESC NULLS LAST], output=[event_id#14,datestr#63,c#7L])
    +- *(2) Filter (c#7L > 1)
       +- *(2) HashAggregate(keys=[event_id#14, datestr#63], functions=[count(1)])
          +- Exchange hashpartitioning(event_id#14, datestr#63, 200)
             +- *(1) HashAggregate(keys=[event_id#14, datestr#63], functions=[partial_count(1)])
                +- *(1) FileScan parquet default.hoodie_test_as_reportads_new[event_id#14,datestr#63] Batched: true, Format: Parquet, Location: PrunedInMemoryFileIndex[s3a://vungle2-dataeng/jun-test/stage20190424new/2019-04-24_08, s3

It looks like the Thrift Server does not recognize the custom input format and falls back to Spark's built-in Parquet reader. Any thoughts? Or can I configure the Thrift Server to use HiveTableScan instead of FileScan?
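One setting that may be worth checking here is spark.sql.hive.convertMetastoreParquet. When it is true (the default), Spark reads Hive metastore Parquet tables with its built-in Parquet reader instead of the Hive SerDe and input format. Below is a minimal sketch of checking this from a Thrift Server session. It assumes the session accepts SET commands and that this setting is what differs between the two environments, which is not confirmed. The SELECT is only an illustrative query built from the table and column names in the plans above.

```sql
-- Assumption: this property is the cause of the FileScan plan; not verified here.
-- Ask Spark to read metastore Parquet tables through the Hive SerDe / input format.
SET spark.sql.hive.convertMetastoreParquet=false;

-- Illustrative query only; re-check which scan node the plan now uses.
EXPLAIN
SELECT event_id, datestr
FROM default.hoodie_test_as_reportads_new
LIMIT 10;
```

If the plan then shows HiveTableScan, the same property could instead be set when the Thrift Server is started, so that it applies to every session.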
Thanks,

Best,
--
Jun Zhu
Sr. Engineer I, Data
+86 18565739171
<https://vungle.com/>
Units 3801, 3804, 38F, C Block, Beijing Yintai Center, Beijing, China