Re: What does the ORC SERDE do

Jörn Franke Sun, 13 May 2018 07:43:02 -0700

For the details you can check the source code, but in short a SerDe needs to
translate an object to a Hive object and vice versa. Usually this is very
simple (just passing the object through, or creating a HiveDecimal, etc.). It
also provides an ObjectInspector that describes an object in more detail (e.g.
so that it can be processed by a UDF). For example, it can tell you the
precision and scale of an object. In the case of ORC, it also describes how a
batch of objects (vectorized) can be mapped to Hive objects and the other way
around. Furthermore, it provides statistics and the means to deal with
partitions as well as table properties (which are not the same as the
input/output format properties).
Although it sounds complex, Hive provides most of the functionality, so
implementing a SerDe is easy most of the time.
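
To make the two roles above more concrete, here is a minimal sketch in plain
Java. It does not use Hive's actual classes: ToySerDeSketch, DecimalInspector
and ToyDecimalSerDe are hypothetical names made up for illustration, and a real
SerDe works with Hive's own object and inspector types rather than Strings and
BigDecimal. It only shows the idea of translating values in both directions,
exposing precision and scale through an inspector-like object, and mapping a
whole batch of values at once.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.List;

public class ToySerDeSketch {

    /** Hypothetical stand-in for an ObjectInspector describing a decimal column. */
    static final class DecimalInspector {
        final int precision;
        final int scale;

        DecimalInspector(int precision, int scale) {
            this.precision = precision;
            this.scale = scale;
        }
    }

    /** Hypothetical stand-in for a SerDe that handles one decimal column. */
    static final class ToyDecimalSerDe {
        private final DecimalInspector inspector;

        ToyDecimalSerDe(int precision, int scale) {
            this.inspector = new DecimalInspector(precision, scale);
        }

        /** Lets a consumer (e.g. a UDF) ask what the objects look like. */
        DecimalInspector getInspector() {
            return inspector;
        }

        /** Stored representation -> in-memory object, normalized to the declared scale. */
        BigDecimal deserialize(String stored) {
            return new BigDecimal(stored).setScale(inspector.scale, RoundingMode.HALF_UP);
        }

        /** In-memory object -> stored representation. */
        String serialize(BigDecimal value) {
            return value.setScale(inspector.scale, RoundingMode.HALF_UP).toPlainString();
        }

        /** Maps a whole batch of stored values at once (the "vectorized" case). */
        List<BigDecimal> deserializeBatch(String[] column) {
            List<BigDecimal> out = new ArrayList<>(column.length);
            for (String stored : column) {
                out.add(deserialize(stored));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        ToyDecimalSerDe serde = new ToyDecimalSerDe(10, 2);
        System.out.println(serde.deserialize("12.5"));                   // prints 12.50
        System.out.println(serde.serialize(new BigDecimal("3.14159")));  // prints 3.14
        System.out.println(serde.getInspector().scale);                  // prints 2
        System.out.println(serde.deserializeBatch(new String[] {"1", "2.345"}).size()); // prints 2
    }
}
```

In this toy version the inspector is the only place that knows the declared
scale, which mirrors the point that the inspector, not the value itself,
carries the type details.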


> On 13. May 2018, at 16:34, 侯宗田 <[email protected]> wrote:
> 
> Hello everyone,
>   I know the JSON SerDe turns the fields in a row into JSON format, and the
> CSV SerDe turns them into CSV format according to its serdeproperties. But I
> wonder what the ORC SerDe does when I choose to store a table as ORC. And why
> are there still escaper and separator entries in the ORC serdeproperties? The
> same goes for RC and Parquet. I think these formats are just about how data is
> stored and compressed by their respective input and output formats, but I
> don’t know what their SerDes do. Can anyone give me a hint?
