Rohini Palaniswamy created PIG-5288: ---------------------------------------
Summary: Improve performance of PigTextRawBytesComparator Key: PIG-5288 URL: https://issues.apache.org/jira/browse/PIG-5288 Project: Pig Issue Type: Improvement Reporter: Rohini Palaniswamy Assignee: Rohini Palaniswamy Fix For: 0.18.0 Came across this stacktrace for a group by when investigating a different performance issue. {code} "TezChild" #22 daemon prio=5 os_prio=0 tid=0x00007fa935495000 nid=0x7c3e runnable [0x00007fa91d354000] java.lang.Thread.State: RUNNABLE at sun.nio.cs.UTF_8$Decoder.decodeLoop(UTF_8.java:412) at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:579) at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:802) at org.apache.hadoop.io.Text.decode(Text.java:412) at org.apache.hadoop.io.Text.decode(Text.java:389) at org.apache.hadoop.io.Text.toString(Text.java:280) at org.apache.pig.impl.io.NullableText.getValueAsPigType(NullableText.java:46) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextRawComparator.compare(PigTextRawComparator.java:95) at org.apache.tez.runtime.library.common.ValuesIterator.readNextKey(ValuesIterator.java:188) at org.apache.tez.runtime.library.common.ValuesIterator.access$300(ValuesIterator.java:47) at org.apache.tez.runtime.library.common.ValuesIterator$1$1.next(ValuesIterator.java:143) at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POShuffleTezLoad.getNextTuple(POShuffleTezLoad.java:218) {code} Conversion to String and comparing is a wastage (result of extending from PigTextRawBytesComparator which is used in sorting). PigCharArrayWritableComparator which is the equivalent used in mapreduce does not. It directly compares it as a Text. -- This message was sent by Atlassian JIRA (v6.4.14#64029)