A bug belongs to Hive or Elephant-bird

java8964 java8964 Fri, 08 Mar 2013 11:45:44 -0800

Hi, 
Hive 0.9.0 + Elephant-Bird 3.0.7
I ran into a problem using Elephant-Bird with Hive. I think I know what causes
it, but I don't know which side the bug belongs to. Let me explain the
problem.
If we define a Google protobuf file with a field name like 'dateString' (the
field name contains an uppercase 'S'), then when I query the table like this:
select dateString from table .............


I will get the following exception trace:
Caused by:
java.lang.RuntimeException: cannot find field datestring from
[org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@49aacd5f
 .....................
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:321)
        at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldRef(UnionStructObjectInspector.java:96)
        at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
        at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878)
        at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
        at org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:73)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
        at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)

Here is the code of the method that throws this error:

  public static StructField getStandardStructFieldRef(String fieldName,
      List<? extends StructField> fields) {
    fieldName = fieldName.toLowerCase();
    for (int i = 0; i < fields.size(); i++) {
      if (fields.get(i).getFieldName().equals(fieldName)) {
        return fields.get(i);
      }
    }
    // For backward compatibility: fieldNames can also be integer Strings.
    try {
      int i = Integer.parseInt(fieldName);
      if (i >= 0 && i < fields.size()) {
        return fields.get(i);
      }
    } catch (NumberFormatException e) {
      // ignore
    }
    throw new RuntimeException("cannot find field " + fieldName + " from "
        + fields);
    // return null;
  }

I understand the problem happens because at this point fieldName is
"datestring" (all lowercase characters), but the name stored in the fields
list for that field is "dateString". The equals() comparison therefore never
matches, and the RuntimeException is thrown.
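
To make the mismatch concrete, here is a minimal standalone sketch of the lookup logic quoted above. It uses no Hive classes: the class name LookupMismatchSketch, the method findStrict, and the plain list of names are all hypothetical stand-ins for the StructField list, and it returns -1 where the real method throws.

```java
import java.util.Arrays;
import java.util.List;

// Standalone sketch; no Hive classes are used. All names are hypothetical.
class LookupMismatchSketch {

    // Mirrors the quoted lookup: lowercase the requested name, then compare
    // it with an exact equals() against each stored name. Returns the index
    // of the match, or -1 if there is none (the real method throws instead).
    static int findStrict(String fieldName, List<String> storedNames) {
        String wanted = fieldName.toLowerCase();
        for (int i = 0; i < storedNames.size(); i++) {
            if (storedNames.get(i).equals(wanted)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Names as an ObjectInspector might report them, mixed case vs. lowercase.
        List<String> mixedCase = Arrays.asList("id", "dateString");
        List<String> lowerCase = Arrays.asList("id", "datestring");

        System.out.println(findStrict("dateString", mixedCase)); // prints -1
        System.out.println(findStrict("dateString", lowerCase)); // prints 1
    }
}
```

The lookup only succeeds when the stored name is already lowercase, which matches what I see with the 'dateString' field.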
But I don't know which side this bug belongs to, and I would like to know
more detail about the contract Hive expects from an implementation.
From this link:
https://cwiki.apache.org/Hive/user-faq.html#UserFAQ-AreHiveQLidentifiers%2528e.g.tablenames%252Ccolumnnames%252Cetc%2529casesensitive%253F
I know that in Hive, table names and column names are case insensitive. So
even though my query used "select dateString", the fieldName was changed to
"datestring" in the code, while the StructField of the ObjectInspector from
Elephant-Bird returns the exact field name as defined in the proto file,
"dateString" in this case. Of course, I can change my proto file to use only
lowercase field names to work around this bug, but my questions are:
1) If I implement my own ObjectInspector, should I pay attention to the field
   names? Do they need to be lowercase?
2) I would consider this a bug in Hive, right? If this line lowercases the
   requested name:
     fieldName = fieldName.toLowerCase();
   then the comparison should also lowercase the stored name, by changing
     if (fields.get(i).getFieldName().equals(fieldName))
   to
     if (fields.get(i).getFieldName().toLowerCase().equals(fieldName))
   right?
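
The two options behind these questions can be sketched as follows. This is only an illustration under my own assumptions, not a Hive or Elephant-Bird patch: the class CaseInsensitiveLookupSketch and the methods findIgnoreCase and lowerCaseAll are hypothetical, and plain strings stand in for the StructField objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Standalone sketch; no Hive classes are used. All names are hypothetical.
class CaseInsensitiveLookupSketch {

    // Option for question 2: compare without regard to case on the lookup
    // side, so the stored name may keep its original spelling.
    // Returns the index of the match, or -1 if there is none.
    static int findIgnoreCase(String fieldName, List<String> storedNames) {
        for (int i = 0; i < storedNames.size(); i++) {
            if (storedNames.get(i).equalsIgnoreCase(fieldName)) {
                return i;
            }
        }
        return -1;
    }

    // Option for question 1: the ObjectInspector side reports lowercase
    // names, so an exact comparison against a lowercased query name matches.
    static List<String> lowerCaseAll(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            result.add(name.toLowerCase(Locale.ROOT));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> stored = Arrays.asList("id", "dateString");
        System.out.println(findIgnoreCase("datestring", stored)); // prints 1
        System.out.println(lowerCaseAll(stored)); // prints [id, datestring]
    }
}
```

Either change alone would make the lookup succeed in this sketch; which side should make it is exactly what I am asking.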
Thanks
Yong
