Thanks Vladimir. We will try this out soon.

Regards,
Gautam

On Mon, Jun 1, 2015 at 12:22 AM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:

> InternalScan has a constructor that takes a Scan object.
>
> See https://issues.apache.org/jira/browse/HBASE-12720
>
> You can instantiate InternalScan from Scan, set checkOnlyMemStore, then
> open RegionScanner, but the best approach is to cache data on write and
> run a regular RegionScanner from memstore and block cache.
>
> best,
> -Vlad
>
> On Sun, May 31, 2015 at 11:45 PM, Anoop John <anoop.hb...@gmail.com> wrote:
>
> > If your scan has a time range specified, HBase internally checks it
> > against the time range of each file and skips the files that are
> > clearly outside the range you are interested in. You don't have to do
> > anything for this. Make sure you set the TimeRange for your read.
> >
> > -Anoop-
> >
> > On Mon, Jun 1, 2015 at 12:09 PM, ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com> wrote:
> >
> > > We have a postScannerOpen hook in the CP, but that may not give you
> > > direct access to know which internal scanners are on the memstore
> > > and which are on the store files. This is possible, but we may need
> > > to add some new hooks at the place where we explicitly add the
> > > internal scanners required for a scan.
> > >
> > > But still a general question: are you sure that your data will be
> > > only in the memstore, and that the latest data will not have been
> > > flushed by that time from your memstore to the HFiles? I see that
> > > your scenario is write centric, so how can you guarantee your data
> > > is in the memstore only? Your time range may say it is the latest
> > > data (maybe 10 to 15 min), but you should be able to configure your
> > > memstore flushing in such a way that no flushes happen for the
> > > latest data in that 10 to 15 min window. Just sharing my thoughts
> > > here.
> > >
> > > On Mon, Jun 1, 2015 at 11:46 AM, Gautam Borah <gbo...@appdynamics.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Here is our use case.
> > > >
> > > > We have a very write-heavy cluster. We also run periodic endpoint
> > > > coprocessor based jobs, every 10 minutes, that operate on the data
> > > > written in the last 10-15 mins.
> > > >
> > > > Is there a way to query only the MemStore from the endpoint
> > > > coprocessor? The periodic job scans for data using a time range.
> > > > We would like to implement a simple logic:
> > > >
> > > > a. If the query time range is within the MemStore's
> > > >    TimeRangeTracker, then query only the MemStore.
> > > > b. If the end time of the query time range is within the
> > > >    MemStore's TimeRangeTracker, but the query start time is
> > > >    outside it (a MemStore flush happened), then query both the
> > > >    MemStore and the files.
> > > > c. If the start time and end time of the query are both outside
> > > >    the MemStore's TimeRangeTracker, we query only the files.
> > > >
> > > > The incoming data is time series and we do not allow old data
> > > > (out of sync with the clock) to come into the system (HBase).
> > > >
> > > > Cloudera has a scanner,
> > > > org.apache.hadoop.hbase.regionserver.InternalScan, that has
> > > > methods like checkOnlyMemStore() and checkOnlyStoreFiles(). Is
> > > > this available in Trunk?
> > > >
> > > > Also, how do I access the MemStore for a Column Family in the
> > > > endpoint coprocessor from CoprocessorEnvironment?
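
For reference, the a/b/c rule from the original question can be sketched in plain Java without touching any HBase classes. This is only an illustration under stated assumptions: ScanTargetChooser, Target and choose are hypothetical names, not HBase API; all four timestamps are treated as inclusive bounds; and case (c) is read as "the query ends before the oldest cell in the MemStore". Any other overlap (for example a query that spans the whole MemStore range) conservatively falls back to reading both.

```java
/**
 * Hypothetical helper, not part of HBase: decides where a time-range
 * query should read from, given the min/max timestamps currently held
 * in the MemStore (for example as reported by its TimeRangeTracker).
 * All bounds are assumed to be inclusive.
 */
final class ScanTargetChooser {

    /** Where a time-range query should read from. */
    enum Target { MEMSTORE_ONLY, MEMSTORE_AND_FILES, FILES_ONLY }

    private ScanTargetChooser() {}

    static Target choose(long queryStart, long queryEnd, long memMin, long memMax) {
        if (queryStart > queryEnd || memMin > memMax) {
            throw new IllegalArgumentException("start must not exceed end");
        }
        boolean startInMem = memMin <= queryStart && queryStart <= memMax;
        boolean endInMem = memMin <= queryEnd && queryEnd <= memMax;

        // Case (a): the whole query range lies inside the MemStore range.
        if (startInMem && endInMem) {
            return Target.MEMSTORE_ONLY;
        }
        // Case (c), as interpreted here: the query ends before the oldest
        // MemStore timestamp, so only flushed files can hold matching cells.
        if (queryEnd < memMin) {
            return Target.FILES_ONLY;
        }
        // Case (b) and every other overlap: be conservative and read both.
        return Target.MEMSTORE_AND_FILES;
    }
}
```

With the result in hand, MEMSTORE_ONLY and FILES_ONLY would map onto the checkOnlyMemStore() and checkOnlyStoreFiles() calls mentioned above, and MEMSTORE_AND_FILES onto an ordinary scan. The sketch does not handle an empty MemStore, and it does not address the flush-timing concern raised earlier in the thread.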