Hi Darin,

In Spark SQL we have finer-grained information about partitioning, so we don't use the RDD Partitioner. Here's a notebook that walks through what we do expose and how it is used by the query planner:
<https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/3633335638369146/2840265927289860/latest.html>
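As a minimal sketch of the distinction (assuming a local SparkSession and a hypothetical parquet path; the `ds`/`om` names follow Darin's mail), the partitioning a Dataset carries after `repartition` is visible on its physical plan via `queryExecution.executedPlan.outputPartitioning`, even though the RDD view reports no Partitioner:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder()
  .appName("partitioning-demo")
  .master("local[*]")            // assumption: local demo session
  .getOrCreate()
import spark.implicits._

val ds = spark.read.parquet("/path/to/docs.parquet")  // hypothetical input path
val om = ds.repartition($"docId").persist(StorageLevel.MEMORY_AND_DISK)

// The RDD view of a Dataset carries no Partitioner:
println(om.rdd.partitioner)      // prints: None

// But the physical plan records the hash partitioning SQL will use:
println(om.queryExecution.executedPlan.outputPartitioning)
// e.g. something like hashpartitioning(docId#..., 200)
```

The query planner consults this plan-level partitioning to elide unnecessary shuffles (e.g. for a subsequent join or aggregation on `docId`), which is why SQL does not need the RDD-level `Partitioner` at all.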
Michael

On Tue, Sep 20, 2016 at 11:22 AM, McBeath, Darin W (ELS-STL) <d.mcbe...@elsevier.com> wrote:
> I'm using Spark 2.0.
>
> I've created a dataset from a parquet file, repartitioned it on one of the
> columns (docId), and persisted the repartitioned dataset:
>
>   val om = ds.repartition($"docId").persist(StorageLevel.MEMORY_AND_DISK)
>
> When I try to confirm the partitioner with
>
>   om.rdd.partitioner
>
> I get
>
>   Option[org.apache.spark.Partitioner] = None
>
> I would have thought it would be HashPartitioner.
>
> Does anyone know why this would be None and not HashPartitioner?
>
> Thanks.
>
> Darin.