[ https://issues.apache.org/jira/browse/DRILL-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Volodymyr Vysotskyi resolved DRILL-4184.
----------------------------------------
    Resolution: Fixed

Fixed in the scope of DRILL-6094

> Drill does not support Parquet DECIMAL values in variable length BINARY fields
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-4184
>                 URL: https://issues.apache.org/jira/browse/DRILL-4184
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.4.0
>         Environment: Windows 7 Professional, Java 1.8.0_66
>            Reporter: Dave Oshinsky
>            Priority: Major
>
> Encoding a DECIMAL logical type in Parquet using the variable-length BINARY
> primitive type is not supported by Drill as of versions 1.3.0 and 1.4.0. The
> problem first surfaces as the ClassCastException shown below, but fixing the
> immediate cause of the exception is not sufficient to support this
> combination (DECIMAL, BINARY) in a Parquet file.
> In Drill, DECIMAL is currently assumed to be INT32, INT64, INT96, or
> FIXED_LEN_BYTE_ARRAY. Are there any plans to support DECIMAL with
> variable-length BINARY? Avro definitely supports encoding DECIMAL as
> variable-length bytes (see https://avro.apache.org/docs/current/spec.html#Decimal),
> but support for this in Parquet is less clear.
> Filtering on a BINARY DECIMAL field in a Parquet file throws the exception
> shown below (java.lang.ClassCastException:
> org.apache.drill.exec.vector.Decimal28SparseVector cannot be cast to
> org.apache.drill.exec.vector.VariableWidthVector). The successful query at the
> bottom filters on a string field in the same file.
> 0: jdbc:drill:zk=local> select count(*) from dfs.`c:/dao/DBArchivePredictor/tenrows.parquet` where acct_no=70000020;
> org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet record reader.
> Message: Failure in setting up reader
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message sbi.acct_mstr {
>   required binary ACCT_NO (DECIMAL(20,0));
>   optional binary SF_NO (UTF8);
>   optional binary LF_NO (UTF8);
>   optional binary BRANCH_NO (DECIMAL(20,0));
>   optional binary INTRO_CUST_NO (DECIMAL(20,0));
>   optional binary INTRO_ACCT_NO (DECIMAL(20,0));
>   optional binary INTRO_SIGN (UTF8);
>   optional binary TYPE (UTF8);
>   optional binary OPR_MODE (UTF8);
>   optional binary CUR_ACCT_TYPE (UTF8);
>   optional binary TITLE (UTF8);
>   optional binary CORP_CUST_NO (DECIMAL(20,0));
>   optional binary APLNDT (UTF8);
>   optional binary OPNDT (UTF8);
>   optional binary VERI_EMP_NO (DECIMAL(20,0));
>   optional binary VERI_SIGN (UTF8);
>   optional binary MANAGER_SIGN (UTF8);
>   optional binary CURBAL (DECIMAL(8,2));
>   optional binary STATUS (UTF8);
> }
> , metadata: {parquet.avro.schema={"type":"record","name":"acct_mstr","namespace":"sbi","fields":[
> {"name":"ACCT_NO","type":{"type":"bytes","logicalType":"decimal","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.math.BigDecimal","cv_connection":"oracle.jdbc.driver.T4CConnection","cv_currency":true,"cv_def_writable":false,"cv_nullable":0,"cv_precision":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":1,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}},
> {"name":"SF_NO","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":10,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":2,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},
> {"name":"LF_NO","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":10,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":3,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},
> {"name":"BRANCH_NO","type":["null",{"type":"bytes","logicalType":"decimal","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.math.BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullable":1,"cv_precision":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":4,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}]},
> {"name":"INTRO_CUST_NO","type":["null",{"type":"bytes","logicalType":"decimal","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.math.BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullable":1,"cv_precision":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":5,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}]},
> {"name":"INTRO_ACCT_NO","type":["null",{"type":"bytes","logicalType":"decimal","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.math.BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullable":1,"cv_precision":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":6,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}]},
> {"name":"INTRO_SIGN","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":1,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":7,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},
> {"name":"TYPE","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":2,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":8,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},
> {"name":"OPR_MODE","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":2,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":9,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},
> {"name":"CUR_ACCT_TYPE","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":4,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":10,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},
> {"name":"TITLE","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":30,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":11,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},
> {"name":"CORP_CUST_NO","type":["null",{"type":"bytes","logicalType":"decimal","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.math.BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullable":1,"cv_precision":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":12,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}]},
> {"name":"APLNDT","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.sql.Timestamp","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":0,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":13,"cv_type":93,"cv_typename":"DATE","cv_writable":true}]},
> {"name":"OPNDT","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.sql.Timestamp","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":0,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":14,"cv_type":93,"cv_typename":"DATE","cv_writable":true}]},
> {"name":"VERI_EMP_NO","type":["null",{"type":"bytes","logicalType":"decimal","precision":20,"scale":0,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.math.BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullable":1,"cv_precision":20,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":15,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}]},
> {"name":"VERI_SIGN","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":1,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":16,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},
> {"name":"MANAGER_SIGN","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":1,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":17,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]},
> {"name":"CURBAL","type":["null",{"type":"bytes","logicalType":"decimal","precision":8,"scale":2,"cv_auto_incr":false,"cv_case_sensitive":false,"cv_column_class":"java.math.BigDecimal","cv_currency":true,"cv_def_writable":false,"cv_nullable":1,"cv_precision":8,"cv_read_only":false,"cv_scale":2,"cv_searchable":true,"cv_signed":true,"cv_subscript":18,"cv_type":2,"cv_typename":"NUMBER","cv_writable":true}]},
> {"name":"STATUS","type":["null",{"type":"string","cv_auto_incr":false,"cv_case_sensitive":true,"cv_column_class":"java.lang.String","cv_currency":false,"cv_def_writable":false,"cv_nullable":1,"cv_precision":1,"cv_read_only":false,"cv_scale":0,"cv_searchable":true,"cv_signed":true,"cv_subscript":19,"cv_type":12,"cv_typename":"VARCHAR2","cv_writable":true}]}]}}}, blocks: [BlockMetaData{10, 1281 [
> ColumnMetaData{SNAPPY [ACCT_NO] BINARY  [BIT_PACKED, PLAIN], 4},
> ColumnMetaData{SNAPPY [SF_NO] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 88},
> ColumnMetaData{SNAPPY [LF_NO] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 163},
> ColumnMetaData{SNAPPY [BRANCH_NO] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 241},
> ColumnMetaData{SNAPPY [INTRO_CUST_NO] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 298},
> ColumnMetaData{SNAPPY [INTRO_ACCT_NO] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 364},
> ColumnMetaData{SNAPPY [INTRO_SIGN] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 421},
> ColumnMetaData{SNAPPY [TYPE] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 478},
> ColumnMetaData{SNAPPY [OPR_MODE] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 538},
> ColumnMetaData{SNAPPY [CUR_ACCT_TYPE] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 598},
> ColumnMetaData{SNAPPY [TITLE] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 658},
> ColumnMetaData{SNAPPY [CORP_CUST_NO] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 736},
> ColumnMetaData{SNAPPY [APLNDT] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 802},
> ColumnMetaData{SNAPPY [OPNDT] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 919},
> ColumnMetaData{SNAPPY [VERI_EMP_NO] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 1036},
> ColumnMetaData{SNAPPY [VERI_SIGN] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 1093},
> ColumnMetaData{SNAPPY [MANAGER_SIGN] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 1150},
> ColumnMetaData{SNAPPY [CURBAL] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 1207},
> ColumnMetaData{SNAPPY [STATUS] BINARY  [RLE, BIT_PACKED, PLAIN_DICTIONARY], 1270}]}]}
>         at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise(ParquetRecordReader.java:346)
>         at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.setup(ParquetRecordReader.java:339)
>         at org.apache.drill.exec.physical.impl.ScanBatch.<init>(ScanBatch.java:101)
>         at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:168)
>         at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:56)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:151)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:131)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:174)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:105)
>         at org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:79)
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:230)
>         at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassCastException: org.apache.drill.exec.vector.Decimal28SparseVector cannot be cast to org.apache.drill.exec.vector.VariableWidthVector
>         at org.apache.drill.exec.store.parquet.columnreaders.VarLengthValuesColumn.<init>(VarLengthValuesColumn.java:44)
>         at org.apache.drill.exec.store.parquet.columnreaders.VarLengthColumnReaders$Decimal28Column.<init>(VarLengthColumnReaders.java:52)
>         at org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory.getReader(ColumnReaderFactory.java:178)
>         at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.setup(ParquetRecordReader.java:319)
>         ... 22 more
> Error: SYSTEM ERROR: ClassCastException: org.apache.drill.exec.vector.Decimal28SparseVector cannot be cast to org.apache.drill.exec.vector.VariableWidthVector
> Fragment 0:0
> [Error Id: 22bfa8dd-1129-4300-9449-409e96d6c800 on DaveOshinsky-PC.gp.cv.commvault.com:31010] (state=,code=0)
> 0: jdbc:drill:zk=local> select count(*) from dfs.`c:/dao/DBArchivePredictor/tenrows.parquet` where opr_mode='JO';
> +---------+
> | EXPR$0  |
> +---------+
> | 10      |
> +---------+
> 1 row selected (0.406 seconds)
> 0: jdbc:drill:zk=local>
> The immediate cause of this exception is that Drill, in
> org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader,
> assumes that all BINARY values are read into VariableWidthVectors. For
> BINARY DECIMAL this is not true: Decimal28SparseVector, for example, is a
> FixedWidthVector, not a VariableWidthVector. The assumption that DECIMAL is
> never encoded in variable-length BINARY appears in a number of other places
> in the Drill code, including:
> - org.apache.drill.exec.store.parquet.columnreaders.ColumnReaderFactory only
>   contains logic to handle DECIMAL with INT32, INT64, INT96, or
>   FIXED_LEN_BYTE_ARRAY. BINARY is not supported with DECIMAL.
> - org.apache.drill.exec.store.parquet.columnreaders.NullableFixedByteAlignedReaders
>   does not provide a nullable reader for BINARY in its getNullableColumnReader
>   method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
