shabnam perween created IMPALA-7278:
---------------------------------------

             Summary: distinct clause is not working as expected with custom UDFs
                 Key: IMPALA-7278
                 URL: https://issues.apache.org/jira/browse/IMPALA-7278
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 2.8.0
            Reporter: shabnam perween


Distinct clause when executed with custom UDF returns unexpected results.

Custom UDF Definition:

udf.h file:
==========
#ifndef IMPALA_UDF_SAMPLE_UDF_H
#define IMPALA_UDF_SAMPLE_UDF_H

#include "udf.h"

using namespace impala_udf;

#ifdef __cplusplus
extern "C"
{
#endif

StringVal udf_clear(FunctionContext* context, StringVal& sInput);

#ifdef __cplusplus
}
#endif

#endif

udf.cpp:
========
#include "clear.h"

StringVal udf_clear(
    FunctionContext* context,
    StringVal& sInput /* String to encrypt */
    )
{
    unsigned char* pReturnData = context->Allocate( 100 );
    memset( pReturnData, NULL, 100);
    memcpy(pReturnData, sInput.ptr, sInput.len );
    StringVal sResult( pReturnData );
    sResult.len = sInput.len;
    context->Free( (uint8_t*)pReturnData );
    return sResult;
}

CMakeLists.txt:
===============
project (clear)
ADD_LIBRARY (clear2.8_RHEL SHARED clear.cpp )
TARGET_LINK_LIBRARIES (clear2.8_RHEL libImpalaUdf.a )
SET_TARGET_PROPERTIES (clear2.8_RHEL PROPERTIES SUFFIX ".so")
SET_TARGET_PROPERTIES (clear2.8_RHEL PROPERTIES PREFIX "")
INSTALL ( TARGETS clear2.8_RHEL DESTINATION . )

Query Syntax:

CREATE TABLE clear (c1 STRING, c2 STRING) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA INPATH '/user/clear.csv' OVERWRITE INTO TABLE clear;

Query: describe clear
+------+--------+---------+
| name | type   | comment |
+------+--------+---------+
| c1   | string |         |
| c2   | string |         |
+------+--------+---------+
Fetched 2 row(s) in 0.04s

select * from clear;
+---------+---------+
| c1      | c2      |
+---------+---------+
| 1111111 | 1111111 |
| 1111111 | 1111111 |
| 222222  | 222222  |
| 444444  | 444444  |
| 222222  | 222222  |
| 3333333 | 3333333 |
| 3333333 | 3333333 |
+---------+---------+
Fetched 7 row(s) in 0.14s

select distinct udf_clear(c1),c2 from clear;
+-----------------------+---------+
| default.udf_clear(c1) | c2      |
+-----------------------+---------+
| 222222                | 444444  |  <== this should be 444444
| 222222                | 222222  |
| 3333333               | 3333333 |
| 1111111               | 1111111 |
+-----------------------+---------+
Fetched 4 row(s) in 0.24s

Expected result:

select distinct c1,c2 from clear;
+---------+---------+
| c1      | c2      |
+---------+---------+
| 444444  | 444444  |
| 222222  | 222222  |
| 3333333 | 3333333 |
| 1111111 | 1111111 |
+---------+---------+
Fetched 4 row(s) in 0.25s



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)