Also, I forgot to mention, I'm using Hive v3.1.2.
On 2022/05/16 03:09:19 Julien Phalip wrote:
> Hi,
>
> I've noticed an odd behavior with the 'hive.io.file.readcolumn.names' conf
> property.
>
> Imagine a simple table "mytable" with two fields: "text" and "number".
>
> - If you run the query "SELECT * FROM mytable", then the
> "hive.io.file.readcolumn.names" has the value: "text,number". Makes sense
> so far.
> - If you run the query "SELECT text FROM mytable", then the
> "hive.io.file.readcolumn.names" has the value: "text". Still makes sense.
>
> However, if you add a predicate (WHERE clause), then the behavior of that
> property seems strange to me:
>
> - If you run the query "SELECT * FROM mytable WHERE number = 999", then the
> "hive.io.file.readcolumn.names" has the value: "text". The "number" column
> is missing from the property.
> - If you run the query "SELECT number FROM mytable WHERE number = 999",
> then the "hive.io.file.readcolumn.names" has the value: "" (empty string).
> The "number" column is still missing from the property.
>
> In other terms, it looks like if a column is part of a predicate, then it
> is omitted from the "hive.io.file.readcolumn.names" property. Do you know
> why that is?
>
> I'm writing a custom StorageHandler and so I would need to know exactly
> what columns the user is requesting. Is there a way to consistently
> retrieve all the requested columns either from the configuration or from
> within the InputFormat class, even when there is a WHERE clause?
>
> Thanks,
>
> Julien
>
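
To make the observation above concrete, here is a minimal sketch of how the comma-separated property value could be turned into a list of column names, including the empty-string case. It is an illustration only and does not answer the question: the class name `ReadColumnNames` and the helpers `parse` and `fromConf` are hypothetical, and a plain `Map<String, String>` stands in for the Hadoop configuration object so the snippet runs without Hive or Hadoop on the classpath.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Hive: parses the value of the
// "hive.io.file.readcolumn.names" property into a list of column names.
public class ReadColumnNames {
    // Property name as quoted in the message above.
    static final String KEY = "hive.io.file.readcolumn.names";

    // Splits a comma-separated value; a null or empty value yields an empty list.
    static List<String> parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // Assumption: a plain Map stands in for the job configuration here.
    static List<String> fromConf(Map<String, String> conf) {
        return parse(conf.get(KEY));
    }

    public static void main(String[] args) {
        // Values reported for "SELECT * FROM mytable" and for
        // "SELECT number FROM mytable WHERE number = 999".
        System.out.println(parse("text,number"));
        System.out.println(parse(""));
    }
}
```

With the values reported above, the first call yields the two columns and the second yields an empty list, which is the case a storage handler would have to treat as "no projected columns listed" rather than "no columns requested".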