TableInputFormat doesn't read memstore.

bq. I am inserting 10-20 entries only

You can query JMX and check the values for the following:

flushedCellsCount
flushedCellsSize
FlushMemstoreSize_num_ops
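If it helps, here is a quick sketch of pulling those counters off the region server's /jmx HTTP endpoint. The host name is a placeholder, 16030 is assumed to be the default region server info port, and the exact metric key casing can vary by HBase version, so adjust for your deployment:

import scala.io.Source

// Placeholder host; 16030 is the default region server info port.
val url = "http://regionserver-host:16030/jmx"

// Counters of interest; matched case-insensitively since key casing
// can differ between HBase versions.
val wanted = Seq("flushedcellscount", "flushedcellssize",
  "flushmemstoresize_num_ops")

// The servlet returns JSON; print only the lines carrying the flush counters.
Source.fromURL(url).getLines()
  .filter(line => wanted.exists(w => line.toLowerCase.contains(w)))
  .foreach(println)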
For Q2, there is no client-side support for knowing where the data comes from.

On Wed, Jun 28, 2017 at 8:15 PM, Sachin Jain <sachinjain...@gmail.com> wrote:

> Hi,
>
> I have used TableInputFormat and newAPIHadoopRDD defined on sparkContext
> to do a full table scan and get an RDD from it.
>
> A partial piece of the code looks like this:
>
> sparkContext.newAPIHadoopRDD(
>   HBaseConfigurationUtil.hbaseConfigurationForReading(
>     table.getName.getNameWithNamespaceInclAsString,
>     hbaseQuorum, hBaseFilter, versionOpt, zNodeParentOpt),
>   classOf[TableInputFormat],
>   classOf[ImmutableBytesWritable],
>   classOf[Result]
> )
>
> As per my understanding this full table scan works fast because we are
> reading HFiles directly.
>
> *Q1. Does that mean we are skipping memstores?* If yes, then we should
> have missed some data which is present in the memstore, because that data
> has not been persisted to disk yet and hence is not available via an HFile.
>
> *In my local setup, I always get all the data.* Since I am inserting only
> 10-20 entries, I am assuming they are present in the memstore when I issue
> the full table scan Spark job.
>
> Q2. When I issue a get command, is there a way to know whether the record
> is served from the block cache, memstore, or an HFile?
>
> Thanks
> -Sachin
>
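By the way, the quoted snippet goes through a custom HBaseConfigurationUtil helper. In case it is useful to others on the list, a rough self-contained equivalent using only the stock TableInputFormat configuration keys could look like the following; the quorum and table name are placeholders:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.SparkContext

def fullTableScan(sc: SparkContext) = {
  val conf = HBaseConfiguration.create()
  conf.set("hbase.zookeeper.quorum", "zk-host")            // placeholder quorum
  conf.set(TableInputFormat.INPUT_TABLE, "my_ns:my_table") // placeholder table

  // Standard MapReduce-over-HBase scan exposed as an RDD of (row key, Result).
  sc.newAPIHadoopRDD(
    conf,
    classOf[TableInputFormat],
    classOf[ImmutableBytesWritable],
    classOf[Result]
  )
}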