Hi Pat, I don't think HBase TTL is the issue, because:
1. I added the data only 1 day back.
2. I have a similar server running with 1.5 million events, each having 6k features, with data 10 days old, and it is working fine.

Regards,
Abhimanyu

On Thu, Nov 23, 2017 at 10:58 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> My vague recollection is that HBase may mark things for removal but wait
> for certain operations before they are compacted. If this is the case I’m
> sure there is a way to get the correct count, so this may be a question for
> the HBase list.
>
> On Nov 23, 2017, at 1:51 AM, Abhimanyu Nagrath <abhimanyunagr...@gmail.com> wrote:
>
> Done the same as you have mentioned, but the problem still persists.
>
> Regards,
> Abhimanyu
>
> On Thu, Nov 23, 2017 at 2:53 PM, Александр Лактионов <lokotoc...@gmail.com> wrote:
>
>> Hi Abhimanyu,
>>
>> Try setting a TTL for the rows in your HBase table.
>> It can be set in the HBase shell:
>> alter 'pio_event:events_?', NAME => 'e', TTL => <seconds to live>
>> and then do the following in the shell:
>> major_compact 'pio_event:events_?'
>>
>> You can configure automatic major compaction: it will delete all the rows
>> that are older than the TTL.
>>
>> On Nov 23, 2017, at 12:19, Abhimanyu Nagrath <abhimanyunagr...@gmail.com> wrote:
>>
>> Hi,
>>
>> I am stuck at this point. How do I identify the problem?
>>
>> Regards,
>> Abhimanyu
>>
>> On Mon, Nov 20, 2017 at 11:08 AM, Abhimanyu Nagrath <abhimanyunagr...@gmail.com> wrote:
>>
>>> Hi, I am new to PredictionIO v0.12.0 (Elasticsearch 5.2.1, HBase 1.2.6,
>>> Spark 2.6.0), running on hardware with 244 GB RAM and 32 cores. I have
>>> uploaded about 1 million events, each containing 30k features. While
>>> uploading I could see the HBase disk usage increasing, and after all the
>>> events were uploaded the HBase disk usage was 567 GB. To verify the
>>> upload, I ran the following commands:
>>>
>>> - pio-shell --with-spark --conf spark.network.timeout=10000000
>>>   --driver-memory 30G --executor-memory 21G --num-executors 7
>>>   --executor-cores 3 --conf spark.driver.maxResultSize=4g --conf
>>>   spark.executor.heartbeatInterval=10000000
>>> - import org.apache.predictionio.data.store.PEventStore
>>> - val eventsRDD = PEventStore.find(appName="test")(sc)
>>> - val c = eventsRDD.count()
>>>
>>> It shows the event count as 18944.
>>>
>>> After that, from the script through which I uploaded the events, I
>>> randomly queried with their event IDs and I was getting those events back.
>>>
>>> I don't know how to make sure that all the events uploaded by me are
>>> there in the app. Any help is appreciated.
>>>
>>> Regards,
>>> Abhimanyu
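
Following Pat's point above about rows surviving until compaction, a minimal sketch of one way to cross-check the numbers is to compare the PEventStore count against a raw HBase row count on the events table. The table name pio_event:events_1 below assumes the app id is 1 (check the real id with pio app list) and is only an illustration; substitute your own table name.

    # assumes app id 1 -> table 'pio_event:events_1'; substitute your app's id
    # raw row count from the HBase shell (can be slow on large tables)
    count 'pio_event:events_1', INTERVAL => 100000

    # or the bundled MapReduce row counter, which scales better
    hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'pio_event:events_1'

If the raw row count is also far below 1 million, the events were most likely never written during import; if it is much higher than the PEventStore count, the gap is on the read or compaction side, as Pat suggests.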