Josh Rosen created SPARK-9785:
---------------------------------

             Summary: HashPartitioning guarantees / compatibleWith violate those methods' contracts
                 Key: SPARK-9785
                 URL: https://issues.apache.org/jira/browse/SPARK-9785
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.0
            Reporter: Josh Rosen
            Assignee: Josh Rosen
            Priority: Blocker


HashPartitioning compatibility (compatibleWith / guarantees) is defined w.r.t.
the _set_ of expressions, but in other contexts, such as computing a row's
hash code, the ordering of those expressions matters. This is illustrated by
the following regression test:

{code}
  test("HashPartitioning compatibility") {
    val expressions = Seq(Literal(2), Literal(3))
    // Consider two HashPartitionings that have the same _set_ of hash expressions but which are
    // created with different orderings of those expressions:
    val partitioningA = HashPartitioning(expressions, 100)
    val partitioningB = HashPartitioning(expressions.reverse, 100)
    // These partitionings are not considered equal:
    assert(partitioningA != partitioningB)
    // However, they both satisfy the same clustered distribution:
    val distribution = ClusteredDistribution(expressions)
    assert(partitioningA.satisfies(distribution))
    assert(partitioningB.satisfies(distribution))
    // Both partitionings are compatible with and guarantee each other:
    assert(partitioningA.compatibleWith(partitioningB))
    assert(partitioningB.compatibleWith(partitioningA))
    assert(partitioningA.guarantees(partitioningB))
    assert(partitioningB.guarantees(partitioningA))
    // Given all of this, we would expect these partitionings to compute the same hashcode for
    // any given row:
    def computeHashCode(partitioning: HashPartitioning): Int = {
      val hashExprProj = new InterpretedMutableProjection(partitioning.expressions, Seq.empty)
      hashExprProj.apply(InternalRow.empty).hashCode()
    }
    assert(computeHashCode(partitioningA) === computeHashCode(partitioningB))
  }
{code}
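
The mismatch can also be seen without any Spark classes. The following is a minimal sketch only: HashOrderingSketch, orderedHash and sameSet are hypothetical names invented for illustration, and the 31-based hash is an assumed stand-in, not the hash Spark actually uses. It shows that a set-based comparison treats Seq(2, 3) and Seq(3, 2) as equal while an order-sensitive hash of the same values does not:

```scala
// Hypothetical stand-in for illustration; this is NOT Spark's HashPartitioning.
object HashOrderingSketch {
  // Order-sensitive combination of the values (assumed 31-based scheme,
  // chosen only because it is a common way to hash a sequence).
  def orderedHash(values: Seq[Int]): Int =
    values.foldLeft(17)((acc, v) => 31 * acc + v)

  // Set-based comparison: ignores the order in which the values appear.
  def sameSet(a: Seq[Int], b: Seq[Int]): Boolean = a.toSet == b.toSet

  def main(args: Array[String]): Unit = {
    val exprs = Seq(2, 3)
    val reversed = exprs.reverse
    // The two sequences contain the same set of values...
    println(s"same set:  ${sameSet(exprs, reversed)}")
    // ...but the order-sensitive hash distinguishes them.
    println(s"same hash: ${orderedHash(exprs) == orderedHash(reversed)}")
  }
}
```

If compatibility is decided the way sameSet decides it, while rows are routed the way orderedHash computes it, two "compatible" partitionings can send the same row to different partitions.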



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
