No, but I did run a major compaction. As I explained initially, I disabled the table so I could change its TTL, re-enabled it, and then ran a major compaction so it would clean up the data that had expired due to the TTL change.
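For reference, the sequence I ran was essentially the following. This is only a sketch against the 0.90-era client API; the one-day TTL value below is purely an example, not the value I actually used:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class LowerTtlAndCompact {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        byte[] tableName = Bytes.toBytes("gs_raw_events");

        // The table must be disabled before its schema can be altered.
        admin.disableTable(tableName);

        // Lower the TTL on the 'events' family (one day here, only as an example).
        HTableDescriptor desc = admin.getTableDescriptor(tableName);
        HColumnDescriptor events = desc.getFamily(Bytes.toBytes("events"));
        events.setTimeToLive(24 * 60 * 60);
        admin.modifyTable(tableName, desc);

        // Re-enable the table, then request a major compaction so that cells
        // expired by the new TTL are actually removed from the store files.
        admin.enableTable(tableName);
        admin.majorCompact("gs_raw_events");
      }
    }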
-eran

On Wed, Jul 6, 2011 at 02:43, Ted Yu <[email protected]> wrote:
> Eran:
> You didn't run hbck during the enabling of the gs_raw_events table, right?
>
> I saw:
> 2011-06-29 16:43:50,395 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction (major) requested for gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. because User-triggered major compaction; priority=1, compaction queue size=1248
>
> The above might be related to:
>> 2011-06-29 16:43:57,880 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
>
> Thanks
>
> On Tue, Jul 5, 2011 at 7:19 AM, Ted Yu <[email protected]> wrote:
> > Eran:
> > I logged https://issues.apache.org/jira/browse/HBASE-4060 for you.
> >
> > On Mon, Jul 4, 2011 at 2:30 AM, Ted Yu <[email protected]> wrote:
> >> Thanks for the understanding.
> >>
> >> Can you log a JIRA and put your ideas below in it?
> >>
> >> On Jul 4, 2011, at 12:42 AM, Eran Kutner <[email protected]> wrote:
> >> > Thanks for the explanation Ted,
> >> >
> >> > I will try to apply HBASE-3789 and hope for the best, but my understanding is that it doesn't really solve the problem, it only reduces the probability of it happening, at least in one particular scenario. I would hope for a more robust solution.
> >> > My concern is that the region allocation process seems to rely too much on timing considerations and doesn't seem to take enough measures to guarantee conflicts do not occur. I understand that in a distributed environment, when you don't get a timely response from a remote machine, you can't know for sure whether it did or did not receive the request; however, there are things that can be done to mitigate this and reduce the conflict time significantly. For example, when I run hbck it knows that some regions are multiply assigned; the master could do the same and try to resolve the conflict. Another approach would be to handle late responses: even if the response from the remote machine arrives after it was assumed to be dead, the master should have enough information to know it had created a conflict by assigning the region to another server. An even better solution, I think, is for the RS to periodically test that it is indeed the rightful owner of every region it holds and relinquish control over the region if it's not.
> >> > Obviously a state where two RSs hold the same region is pathological and can lead to data loss, as demonstrated in my case. The system should be able to actively protect itself against such a scenario. It probably doesn't need saying, but there is really nothing worse for a data storage system than data loss.
> >> >
> >> > In my case the problem didn't happen in the initial phase but after disabling and enabling a table with about 12K regions.
> >> >
> >> > -eran
> >> >
> >> > On Sun, Jul 3, 2011 at 23:49, Ted Yu <[email protected]> wrote:
> >> >> Let me try to answer some of your questions.
> >> >> The two paragraphs below were written along my reasoning, which is in reverse order of the actual call sequence.
> >> >>
> >> >> For #4 below, the log indicates that the following was executed:
> >> >>
> >> >>   private void assign(final RegionState state, final boolean setOfflineInZK,
> >> >>       final boolean forceNewPlan) {
> >> >>     for (int i = 0; i < this.maximumAssignmentAttempts; i++) {
> >> >>       if (setOfflineInZK && !*setOfflineInZooKeeper*(state)) return;
> >> >>
> >> >> The above was due to the timeout which you noted in #2, which would have caused TimeoutMonitor.chore() to run this code (line 1787):
> >> >>
> >> >>   for (Map.Entry<HRegionInfo, Boolean> e: assigns.entrySet()){
> >> >>     assign(e.getKey(), false, e.getValue());
> >> >>   }
> >> >>
> >> >> This means there is a lack of coordination between AssignmentManager.TimeoutMonitor and OpenedRegionHandler.
> >> >>
> >> >> The reason I mention HBASE-3789 is that it is marked as an Incompatible change and is in TRUNK already. The application of HBASE-3789 to the 0.90 branch would change the behavior (timing) of region assignment.
> >> >>
> >> >> I think it makes sense to evaluate the effect of HBASE-3789 in 0.90.4.
> >> >>
> >> >> BTW, were the incorrect region assignments observed for a table with multiple initial regions? If so, I have HBASE-4010 in TRUNK which speeds up initial region assignment by about 50%.
> >> >>
> >> >> Cheers
> >> >>
> >> >> On Sun, Jul 3, 2011 at 12:02 PM, Eran Kutner <[email protected]> wrote:
> >> >>> Ted,
> >> >>> So if I understand correctly, the theory is that because of the issue fixed in HBASE-3789 the master took too long to detect that the region was successfully opened by the first server, so it force-closed it and transitioned it to a second server. But there are a few things about this scenario I don't understand, probably because I don't know enough about the inner workings of the region transition process, and I would appreciate it if you can help me understand:
> >> >>> 1. The RS opened the region at 16:37:49.
> >> >>> 2. The master started handling the opened event at 16:39:54 - this delay can probably be explained by HBASE-3789.
> >> >>> 3. At 16:39:54 the master log says: Opened region gs_raw_events,..... on hadoop1-s05.farm-ny.gigya.com
> >> >>> 4. Then at 16:40:00 the master log says: master:60000-0x13004a31d7804c4 Creating (or updating) unassigned node for 584dac5cc70d8682f71c4675a843c309 with OFFLINE state - why did it decide to take the region offline after learning it was successfully opened?
> >> >>> 5. Then it tries to reopen the region on hadoop1-s05, which indicates in its log that the open request failed because the region was already open - why didn't the master use that information to learn that the region was already open?
> >> >>> 6. At 16:43:57 the master decides the region transition timed out and starts forcing the transition - HBASE-3789 again?
> >> >>> 7. Now the master forces the transition of the region to hadoop1-s02, but there is no sign of that on hadoop1-s05 - why doesn't the old RS (hadoop1-s05) detect that it is no longer the owner and relinquish control of the region?
> >> >>>
> >> >>> Thanks.
> >> >>>
> >> >>> -eran
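As an illustration of the self-check suggested in Eran's earlier message (and in question 7 above): a region server could periodically verify against .META. that it is still the server recorded for each region it holds, and close the region if it is not. The sketch below is hypothetical code, not anything that exists in HBase; the class name and the hard-coded region/server strings are placeholders, and it assumes the 0.90 .META. layout in which info:server holds the serving host:port.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    // Hypothetical sketch: check whether the .META. row for a region still
    // points at this server; if not, the RS should stop serving the region.
    public class RegionOwnershipCheck {
      public static boolean isStillOwner(Configuration conf,
                                         String regionName,
                                         String myServerAddress) throws Exception {
        HTable meta = new HTable(conf, ".META.");
        try {
          Get get = new Get(Bytes.toBytes(regionName));
          get.addColumn(Bytes.toBytes("info"), Bytes.toBytes("server"));
          Result r = meta.get(get);
          byte[] server = r.getValue(Bytes.toBytes("info"), Bytes.toBytes("server"));
          return server != null && myServerAddress.equals(Bytes.toString(server));
        } finally {
          meta.close();
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        String region = "gs_raw_events,GSLoad_1308518553_168_WEB204,"
            + "1308533970928.584dac5cc70d8682f71c4675a843c309.";
        // Placeholder server address, taken from the logs in this thread.
        System.out.println(isStillOwner(conf, region,
            "hadoop1-s05.farm-ny.gigya.com:60020"));
      }
    }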
> >> >>>
> >> >>> On Sun, Jul 3, 2011 at 20:09, Ted Yu <[email protected]> wrote:
> >> >>>> HBASE-3789 should have sped up region assignment.
> >> >>>> The patch for 0.90 is attached to that JIRA.
> >> >>>>
> >> >>>> You may prudently apply that patch.
> >> >>>>
> >> >>>> Regards
> >> >>>>
> >> >>>> On Sun, Jul 3, 2011 at 10:01 AM, Eran Kutner <[email protected]> wrote:
> >> >>>>> Thanks Ted, but, as stated before, I'm already using 0.90.3, so either it's not fixed or it's not the same thing.
> >> >>>>>
> >> >>>>> -eran
> >> >>>>>
> >> >>>>> On Sun, Jul 3, 2011 at 17:27, Ted Yu <[email protected]> wrote:
> >> >>>>>> Eran:
> >> >>>>>> I was thinking of this:
> >> >>>>>> HBASE-3789 Cleanup the locking contention in the master
> >> >>>>>>
> >> >>>>>> though it doesn't directly handle the 'PENDING_OPEN for too long' case.
> >> >>>>>>
> >> >>>>>> https://issues.apache.org/jira/browse/HBASE-3741 is in 0.90.3 and actually close to the symptom you described.
> >> >>>>>>
> >> >>>>>> On Sun, Jul 3, 2011 at 12:00 AM, Eran Kutner <[email protected]> wrote:
> >> >>>>>>> It does seem that both servers opened the same region around the same time. The region was offline because I disabled the table so I could change its TTL.
> >> >>>>>>>
> >> >>>>>>> Here is the log from hadoop1-s05:
> >> >>>>>>> 2011-06-29 16:37:12,576 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> >> >>>>>>> 2011-06-29 16:37:12,680 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> >> >>>>>>> 2011-06-29 16:37:12,680 DEBUG > >> >>>>> org.apache.hadoop.hbase.zookeeper.ZKAssign: > >> >>>>>>> regionserver:60020-0x33004a38816050b Attempting to transition > >> >> node > >> >>>>>>> 584dac5cc70d8682f71c4675a843c309 from M_ZK_REGION_OFFLINE to > >> >>>>>>> RS_ZK_REGION_OPENING > >> >>>>>>> 2011-06-29 16:37:12,711 DEBUG > >> >>>>> org.apache.hadoop.hbase.zookeeper.ZKAssign: > >> >>>>>>> regionserver:60020-0x33004a38816050b Successfully transitioned > >> >> node > >> >>>>>>> 584dac5cc70d8682f71c4675a843c309 from M_ZK_REGION_OFFLINE to > >> >>>>>>> RS_ZK_REGION_OPENING > >> >>>>>>> 2011-06-29 16:37:12,711 DEBUG > >> >>>>>> org.apache.hadoop.hbase.regionserver.HRegion: > >> >>>>>>> Opening region: REGION => {NAME => > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > 'gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.', > >> >>>>>>> STARTKEY => 'GSLoad_1308518553_168_WEB204', ENDKEY => > >> >>>>>>> 'GSLoad_1308518810_1249_WEB204', ENCODED => > >> >>>>>>> 584dac5cc70d8682f71c4675a843c309, TABLE => {{NAME => > >> >>> 'gs_raw_events', > >> >>>>>>> FAMILIES => [{NAME => 'events', BLOOMFILTER => 'NONE', > >> >>>>> REPLICATION_SCOPE > >> >>>>>> => > >> >>>>>>> '1', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '604800', > >> >>>> BLOCKSIZE > >> >>>>> => > >> >>>>>>> '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}} > >> >>>>>>> 2011-06-29 16:37:12,711 DEBUG > >> >>>>>> org.apache.hadoop.hbase.regionserver.HRegion: > >> >>>>>>> Instantiated > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>> 2011-06-29 16:37:12,847 DEBUG > >> >>>>> org.apache.hadoop.hbase.regionserver.Store: > >> >>>>>>> loaded > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > hdfs://hadoop1-m1:8020/hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/events/1971818821800304360, > >> >>>>>>> isReference=false, isBulkLoadResult=false, seqid=1162228062, > >> >>>>>>> majorCompaction=false > >> >>>>>>> 2011-06-29 16:37:12,848 INFO > >> >>>>>> org.apache.hadoop.hbase.regionserver.HRegion: > >> >>>>>>> Onlined > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.; > >> >>>>>>> next sequenceid=1162228063 > >> >>>>>>> 2011-06-29 16:37:12,849 DEBUG > >> >>>>> org.apache.hadoop.hbase.zookeeper.ZKAssign: > >> >>>>>>> regionserver:60020-0x33004a38816050b Attempting to transition > >> >> node > >> >>>>>>> 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to > >> >>>>>>> RS_ZK_REGION_OPENING > >> >>>>>>> 2011-06-29 16:37:12,875 DEBUG > >> >>>>> org.apache.hadoop.hbase.zookeeper.ZKAssign: > >> >>>>>>> regionserver:60020-0x33004a38816050b Successfully transitioned > >> >> node > >> >>>>>>> 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to > >> >>>>>>> RS_ZK_REGION_OPENING > >> >>>>>>> 2011-06-29 16:37:12,951 INFO > >> >>>>> org.apache.hadoop.hbase.catalog.MetaEditor: > >> >>>>>>> Updated row > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. 
> >> >>>>>>> in region .META.,,1 with server= > >> >>> hadoop1-s05.farm-ny.gigya.com:60020, > >> >>>>>>> startcode=1307349217076 > >> >>>>>>> 2011-06-29 16:37:12,951 DEBUG > >> >>>>> org.apache.hadoop.hbase.zookeeper.ZKAssign: > >> >>>>>>> regionserver:60020-0x33004a38816050b Attempting to transition > >> >> node > >> >>>>>>> 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to > >> >>>>>>> RS_ZK_REGION_OPENED > >> >>>>>>> 2011-06-29 16:37:12,964 DEBUG > >> >>>>> org.apache.hadoop.hbase.zookeeper.ZKAssign: > >> >>>>>>> regionserver:60020-0x33004a38816050b Successfully transitioned > >> >> node > >> >>>>>>> 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to > >> >>>>>>> RS_ZK_REGION_OPENED > >> >>>>>>> 2011-06-29 16:37:12,964 DEBUG > >> >>>>>>> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: > >> >>>> Opened > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>> 2011-06-29 16:40:00,878 INFO > >> >>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Received > >> >>> request > >> >>>> to > >> >>>>>>> open > >> >>>>>>> region: > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>> 2011-06-29 16:40:00,878 DEBUG > >> >>>>>>> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: > >> >>>>>> Processing > >> >>>>>>> open of > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>> 2011-06-29 16:40:01,079 WARN > >> >>>>>>> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: > >> >>>>> Attempted > >> >>>>>>> open of > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>> but already online on this server > >> >>>>>>> 2011-06-29 16:43:50,395 DEBUG > >> >>>>>>> org.apache.hadoop.hbase.regionserver.CompactSplitThread: > >> >> Compaction > >> >>>>>> (major) > >> >>>>>>> requested for > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>> because User-triggered major compaction; priority=1, compaction > >> >>> queue > >> >>>>>>> size=1248 > >> >>>>>>> 2011-06-29 20:19:49,906 INFO > >> >>>>>> org.apache.hadoop.hbase.regionserver.HRegion: > >> >>>>>>> Starting major compaction on region > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. 
> >> >>>>>>> 2011-06-29 20:19:49,906 INFO > >> >>>>> org.apache.hadoop.hbase.regionserver.Store: > >> >>>>>>> Started compaction of 1 file(s) in cf=events into > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > hdfs://hadoop1-m1:8020/hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/.tmp, > >> >>>>>>> seqid=1162228062, totalSize=98.3m > >> >>>>>>> 2011-06-29 20:19:49,906 DEBUG > >> >>>>> org.apache.hadoop.hbase.regionserver.Store: > >> >>>>>>> Compacting > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > hdfs://hadoop1-m1:8020/hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/events/1971818821800304360, > >> >>>>>>> keycount=6882816, bloomtype=NONE, size=98.3m > >> >>>>>>> 2011-06-29 20:19:59,920 INFO > >> >>>>>> org.apache.hadoop.hbase.regionserver.HRegion: > >> >>>>>>> completed compaction on region > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>> after 10sec > >> >>>>>>> > >> >>>>>>> And here is the one from hadoop1-s02: > >> >>>>>>> 2011-06-29 16:43:57,935 INFO > >> >>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Received > >> >>> request > >> >>>> to > >> >>>>>>> open > >> >>>>>>> region: > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>> 2011-06-29 16:43:58,990 DEBUG > >> >>>>>>> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: > >> >>>>>> Processing > >> >>>>>>> open of > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. 
> >> >>>>>>> 2011-06-29 16:43:58,990 DEBUG > >> >>>>> org.apache.hadoop.hbase.zookeeper.ZKAssign: > >> >>>>>>> regionserver:60020-0x23004a31d8904de Attempting to transition > >> >> node > >> >>>>>>> 584dac5cc70d8682f71c4675a843c309 from M_ZK_REGION_OFFLINE to > >> >>>>>>> RS_ZK_REGION_OPENING > >> >>>>>>> 2011-06-29 16:43:59,002 DEBUG > >> >>>>> org.apache.hadoop.hbase.zookeeper.ZKAssign: > >> >>>>>>> regionserver:60020-0x23004a31d8904de Successfully transitioned > >> >> node > >> >>>>>>> 584dac5cc70d8682f71c4675a843c309 from M_ZK_REGION_OFFLINE to > >> >>>>>>> RS_ZK_REGION_OPENING > >> >>>>>>> 2011-06-29 16:43:59,002 DEBUG > >> >>>>>> org.apache.hadoop.hbase.regionserver.HRegion: > >> >>>>>>> Opening region: REGION => {NAME => > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > 'gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.', > >> >>>>>>> STARTKEY => 'GSLoad_1308518553_168_WEB204', ENDKEY => > >> >>>>>>> 'GSLoad_1308518810_1249_WEB204', ENCODED => > >> >>>>>>> 584dac5cc70d8682f71c4675a843c309, TABLE => {{NAME => > >> >>> 'gs_raw_events', > >> >>>>>>> FAMILIES => [{NAME => 'events', BLOOMFILTER => 'NONE', > >> >>>>> REPLICATION_SCOPE > >> >>>>>> => > >> >>>>>>> '1', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '604800', > >> >>>> BLOCKSIZE > >> >>>>> => > >> >>>>>>> '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}} > >> >>>>>>> 2011-06-29 16:43:59,003 DEBUG > >> >>>>>> org.apache.hadoop.hbase.regionserver.HRegion: > >> >>>>>>> Instantiated > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>> 2011-06-29 16:43:59,204 DEBUG > >> >>>>> org.apache.hadoop.hbase.regionserver.Store: > >> >>>>>>> loaded > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > hdfs://hadoop1-m1:8020/hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/events/1971818821800304360, > >> >>>>>>> isReference=false, isBulkLoadResult=false, seqid=1162228062, > >> >>>>>>> majorCompaction=false > >> >>>>>>> 2011-06-29 16:43:59,205 INFO > >> >>>>>> org.apache.hadoop.hbase.regionserver.HRegion: > >> >>>>>>> Onlined > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.; > >> >>>>>>> next sequenceid=1162228063 > >> >>>>>>> 2011-06-29 16:43:59,205 DEBUG > >> >>>>> org.apache.hadoop.hbase.zookeeper.ZKAssign: > >> >>>>>>> regionserver:60020-0x23004a31d8904de Attempting to transition > >> >> node > >> >>>>>>> 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to > >> >>>>>>> RS_ZK_REGION_OPENING > >> >>>>>>> 2011-06-29 16:43:59,212 DEBUG > >> >>>>> org.apache.hadoop.hbase.zookeeper.ZKAssign: > >> >>>>>>> regionserver:60020-0x23004a31d8904de Successfully transitioned > >> >> node > >> >>>>>>> 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to > >> >>>>>>> RS_ZK_REGION_OPENING > >> >>>>>>> 2011-06-29 16:43:59,214 INFO > >> >>>>> org.apache.hadoop.hbase.catalog.MetaEditor: > >> >>>>>>> Updated row > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. 
> >> >>>>>>> in region .META.,,1 with server= > >> >>> hadoop1-s02.farm-ny.gigya.com:60020, > >> >>>>>>> startcode=1306919627544 > >> >>>>>>> 2011-06-29 16:43:59,214 DEBUG > >> >>>>> org.apache.hadoop.hbase.zookeeper.ZKAssign: > >> >>>>>>> regionserver:60020-0x23004a31d8904de Attempting to transition > >> >> node > >> >>>>>>> 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to > >> >>>>>>> RS_ZK_REGION_OPENED > >> >>>>>>> 2011-06-29 16:43:59,224 DEBUG > >> >>>>> org.apache.hadoop.hbase.zookeeper.ZKAssign: > >> >>>>>>> regionserver:60020-0x23004a31d8904de Successfully transitioned > >> >> node > >> >>>>>>> 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to > >> >>>>>>> RS_ZK_REGION_OPENED > >> >>>>>>> 2011-06-29 16:43:59,224 DEBUG > >> >>>>>>> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: > >> >>>> Opened > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>> java.io.IOException: Got error in response to OP_READ_BLOCK > >> >> self=/ > >> >>>>>>> 10.1.104.2:33356, remote=/10.1.104.2:50010 for file > >> >>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > /hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/events/1971818821800304360 > >> >>>>>>> for block 3674866614142268536_674205 > >> >>>>>>> > >> >>>>>>> > >> >>>>>>> Ted, can you please point me to J-D's bug fix you mentioned? Are > >> >>> you > >> >>>>>>> positive it's the same scenario - data loss is a very serious > >> >>> problem > >> >>>>> for > >> >>>>>> a > >> >>>>>>> DB. > >> >>>>>>> I'd really like to apply that patch ASAP, because when I run > hbck > >> >> I > >> >>>> get > >> >>>>>>> over > >> >>>>>>> 400 regions which are multiply assigned. > >> >>>>>>> Last question, I understand the region's data is lost but is > >> >> there > >> >>> a > >> >>>>> way > >> >>>>>> to > >> >>>>>>> at least make the table consistent again by some how removing > the > >> >>>> lost > >> >>>>>>> region? > >> >>>>>>> > >> >>>>>>> Thanks. > >> >>>>>>> > >> >>>>>>> -eran > >> >>>>>>> > >> >>>>>>> > >> >>>>>>> > >> >>>>>>> On Sat, Jul 2, 2011 at 01:46, Ted Yu <[email protected]> > >> >> wrote: > >> >>>>>>> > >> >>>>>>>>>> 2011-06-29 16:43:57,880 INFO > >> >>>>>>>> org.apache.hadoop.hbase. > >> >>>>>>>> master.AssignmentManager: Region has been > >> >>>>>>>> PENDING_OPEN for too long, reassigning > >> >>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > region=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>>> > >> >>>>>>>> The double assignment should have been fixed by J-D's recent > >> >>>> checkin. > >> >>>>>>>> > >> >>>>>>>> On Fri, Jul 1, 2011 at 3:14 PM, Stack <[email protected]> > >> >> wrote: > >> >>>>>>>> > >> >>>>>>>>> Is > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>>>> the region that was having the issue? If so, if you looked > >> >> in > >> >>>>>>>>> hadoop1-s05's logs, was this region opened around 2011-06-29 > >> >>>>>> 16:43:57? > >> >>>>>>>>> Was it also opened hadoop1-s02 not long after? Did you say > >> >>> what > >> >>>>>>>>> version of hbase you are on? 
> >> >>>>>>>>> > >> >>>>>>>>> St.Ack > >> >>>>>>>>> > >> >>>>>>>>> On Fri, Jul 1, 2011 at 5:08 AM, Eran Kutner <[email protected]> > >> >>>>> wrote: > >> >>>>>>>>>> Hi Stack, > >> >>>>>>>>>> I'm not sure what the log means. I do see references to two > >> >>>>>> different > >> >>>>>>>>>> servers, but that would probably happen if there was normal > >> >>>>>>> transition > >> >>>>>>>> I > >> >>>>>>>>>> assume. I'm using version 0.90.3 > >> >>>>>>>>>> Here are the relevant lines from the master logs: > >> >>>>>>>>>> > >> >>>>>>>>>> 2011-06-19 21:39:37,164 INFO > >> >>>>>>>>> org.apache.hadoop.hbase.master.ServerManager: > >> >>>>>>>>>> Received REGION_SPLIT: > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533691659.9000a5d8df9502efc90d2c23567e4658.: > >> >>>>>>>>>> Daughters; > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309., > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518810_1249_WEB204,1308533970928.46f876a4e97be04edb35eb8f8959d482. > >> >>>>>>>>>> from hadoop1-s05.farm-ny.gigya.com,60020,1307349217076 > >> >>>>>>>>>> 2011-06-19 21:43:12,983 INFO > >> >>>>>>>> org.apache.hadoop.hbase.catalog.MetaEditor: > >> >>>>>>>>>> Deleted daughter reference > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309., > >> >>>>>>>>>> qualifier=splitA, from parent > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533691659.9000a5d8df9502efc90d2c23567e4658. > >> >>>>>>>>>> 2011-06-29 16:29:36,143 DEBUG > >> >>>>>>>>>> org.apache.hadoop.hbase.master.AssignmentManager: Starting > >> >>>>>>> unassignment > >> >>>>>>>>> of > >> >>>>>>>>>> region > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>>>>> (offlining) > >> >>>>>>>>>> 2011-06-29 16:29:36,146 DEBUG > >> >>>>>>>>>> org.apache.hadoop.hbase.master.AssignmentManager: Sent > >> >> CLOSE > >> >>> to > >> >>>>>>>>> serverName= > >> >>>>>>>>>> hadoop1-s05.farm-ny.gigya.com,60020,1307349217076, > >> >>>>>> load=(requests=0, > >> >>>>>>>>>> regions=1654, usedHeap=1870, maxHeap=12483) for region > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. 
> >> >>>>>>>>>> 2011-06-29 16:29:38,327 DEBUG > >> >>>>>>>>>> org.apache.hadoop.hbase.master.AssignmentManager: Handling > >> >>> new > >> >>>>>>>> unassigned > >> >>>>>>>>>> node: /hbase/unassigned/584dac5cc70d8682f71c4675a843c309 > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > (region=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309., > >> >>>>>>>>>> server=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076, > >> >>>>>>>>>> state=RS_ZK_REGION_CLOSED) > >> >>>>>>>>>> 2011-06-29 16:29:38,327 DEBUG > >> >>>>>>>>>> org.apache.hadoop.hbase.master.AssignmentManager: Handling > >> >>>>>>>>>> transition=RS_ZK_REGION_CLOSED, > >> >>>>>>>>>> server=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076, > >> >>>>>>>>>> region=584dac5cc70d8682f71c4675a843c309 > >> >>>>>>>>>> 2011-06-29 16:30:53,742 DEBUG > >> >>>>>>>>>> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: > >> >>>>>> Handling > >> >>>>>>>>> CLOSED > >> >>>>>>>>>> event for 584dac5cc70d8682f71c4675a843c309 > >> >>>>>>>>>> 2011-06-29 16:30:53,742 DEBUG > >> >>>>>>>>>> org.apache.hadoop.hbase.master.AssignmentManager: Table > >> >> being > >> >>>>>>> disabled > >> >>>>>>>> so > >> >>>>>>>>>> deleting ZK node and removing from regions in transition, > >> >>>>> skipping > >> >>>>>>>>>> assignment of region > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>>>>> 2011-06-29 16:30:53,742 DEBUG > >> >>>>>>>> org.apache.hadoop.hbase.zookeeper.ZKAssign: > >> >>>>>>>>>> master:60000-0x13004a31d7804c4 Deleting existing unassigned > >> >>>> node > >> >>>>>> for > >> >>>>>>>>>> 584dac5cc70d8682f71c4675a843c309 that is in expected state > >> >>>>>>>>>> RS_ZK_REGION_CLOSED > >> >>>>>>>>>> 2011-06-29 16:30:53,801 DEBUG > >> >>>>>>>> org.apache.hadoop.hbase.zookeeper.ZKAssign: > >> >>>>>>>>>> master:60000-0x13004a31d7804c4 Successfully deleted > >> >>> unassigned > >> >>>>> node > >> >>>>>>> for > >> >>>>>>>>>> region 584dac5cc70d8682f71c4675a843c309 in expected state > >> >>>>>>>>>> RS_ZK_REGION_CLOSED > >> >>>>>>>>>> 2011-06-29 16:34:01,453 INFO > >> >>>>>>>> org.apache.hadoop.hbase.catalog.MetaEditor: > >> >>>>>>>>>> Updated region > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>>>>> in META > >> >>>>>>>>>> 2011-06-29 16:37:12,247 DEBUG > >> >>>>>>>> org.apache.hadoop.hbase.zookeeper.ZKAssign: > >> >>>>>>>>>> master:60000-0x13004a31d7804c4 Creating (or updating) > >> >>>> unassigned > >> >>>>>> node > >> >>>>>>>> for > >> >>>>>>>>>> 584dac5cc70d8682f71c4675a843c309 with OFFLINE state > >> >>>>>>>>>> 2011-06-29 16:37:12,576 DEBUG > >> >>>>>>>>>> org.apache.hadoop.hbase.master.AssignmentManager: No > >> >> previous > >> >>>>>>>> transition > >> >>>>>>>>>> plan was found (or we are ignoring an existing plan) for > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. 
> >> >>>>>>>>>> so generated a random one; > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > hri=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309., > >> >>>>>>>>>> src=, dest=hadoop1-s05.farm-ny.gigya.com > >> >>> ,60020,1307349217076; > >> >>>> 5 > >> >>>>>>>>> (online=5, > >> >>>>>>>>>> exclude=null) available servers > >> >>>>>>>>>> 2011-06-29 16:37:12,576 DEBUG > >> >>>>>>>>>> org.apache.hadoop.hbase.master.AssignmentManager: Assigning > >> >>>>> region > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>>>>> to hadoop1-s05.farm-ny.gigya.com,60020,1307349217076 > >> >>>>>>>>>> 2011-06-29 16:37:13,102 DEBUG > >> >>>>>>>>>> org.apache.hadoop.hbase.master.AssignmentManager: Handling > >> >>>>>>>>>> transition=RS_ZK_REGION_OPENED, > >> >>>>>>>>>> server=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076, > >> >>>>>>>>>> region=584dac5cc70d8682f71c4675a843c309 > >> >>>>>>>>>> 2011-06-29 16:39:54,075 DEBUG > >> >>>>>>>>>> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: > >> >>>>>> Handling > >> >>>>>>>>> OPENED > >> >>>>>>>>>> event for 584dac5cc70d8682f71c4675a843c309; deleting > >> >>> unassigned > >> >>>>>> node > >> >>>>>>>>>> 2011-06-29 16:39:54,075 DEBUG > >> >>>>>>>> org.apache.hadoop.hbase.zookeeper.ZKAssign: > >> >>>>>>>>>> master:60000-0x13004a31d7804c4 Deleting existing unassigned > >> >>>> node > >> >>>>>> for > >> >>>>>>>>>> 584dac5cc70d8682f71c4675a843c309 that is in expected state > >> >>>>>>>>>> RS_ZK_REGION_OPENED > >> >>>>>>>>>> 2011-06-29 16:39:54,192 DEBUG > >> >>>>>>>> org.apache.hadoop.hbase.zookeeper.ZKAssign: > >> >>>>>>>>>> master:60000-0x13004a31d7804c4 Successfully deleted > >> >>> unassigned > >> >>>>> node > >> >>>>>>> for > >> >>>>>>>>>> region 584dac5cc70d8682f71c4675a843c309 in expected state > >> >>>>>>>>>> RS_ZK_REGION_OPENED > >> >>>>>>>>>> 2011-06-29 16:39:54,326 DEBUG > >> >>>>>>>>>> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: > >> >>>>> Opened > >> >>>>>>>> region > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>>>>> on hadoop1-s05.farm-ny.gigya.com,60020,1307349217076 > >> >>>>>>>>>> 2011-06-29 16:40:00,598 DEBUG > >> >>>>>>>> org.apache.hadoop.hbase.zookeeper.ZKAssign: > >> >>>>>>>>>> master:60000-0x13004a31d7804c4 Creating (or updating) > >> >>>> unassigned > >> >>>>>> node > >> >>>>>>>> for > >> >>>>>>>>>> 584dac5cc70d8682f71c4675a843c309 with OFFLINE state > >> >>>>>>>>>> 2011-06-29 16:40:00,877 DEBUG > >> >>>>>>>>>> org.apache.hadoop.hbase.master.AssignmentManager: No > >> >> previous > >> >>>>>>>> transition > >> >>>>>>>>>> plan was found (or we are ignoring an existing plan) for > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. 
> >> >>>>>>>>>> so generated a random one; > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > hri=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309., > >> >>>>>>>>>> src=, dest=hadoop1-s05.farm-ny.gigya.com > >> >>> ,60020,1307349217076; > >> >>>> 5 > >> >>>>>>>>> (online=5, > >> >>>>>>>>>> exclude=null) available servers > >> >>>>>>>>>> 2011-06-29 16:40:00,877 DEBUG > >> >>>>>>>>>> org.apache.hadoop.hbase.master.AssignmentManager: Assigning > >> >>>>> region > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>>>>> to hadoop1-s05.farm-ny.gigya.com,60020,1307349217076 > >> >>>>>>>>>> 2011-06-29 16:43:57,879 INFO > >> >>>>>>>>>> org.apache.hadoop.hbase.master.AssignmentManager: Regions > >> >> in > >> >>>>>>> transition > >> >>>>>>>>>> timed out: > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>>>>> state=PENDING_OPEN, ts=1309380052723 > >> >>>>>>>>>> 2011-06-29 16:43:57,880 INFO > >> >>>>>>>>>> org.apache.hadoop.hbase.master.AssignmentManager: Region > >> >> has > >> >>>> been > >> >>>>>>>>>> PENDING_OPEN for too long, reassigning > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > region=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>>>>> 2011-06-29 16:43:57,936 DEBUG > >> >>>>>>>>>> org.apache.hadoop.hbase.master.AssignmentManager: Forcing > >> >>>>> OFFLINE; > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > was=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>>>>> state=PENDING_OPEN, ts=1309380052723 > >> >>>>>>>>>> 2011-06-29 16:43:57,936 DEBUG > >> >>>>>>>>>> org.apache.hadoop.hbase.master.AssignmentManager: No > >> >> previous > >> >>>>>>>> transition > >> >>>>>>>>>> plan was found (or we are ignoring an existing plan) for > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>>>>> so generated a random one; > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > hri=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309., > >> >>>>>>>>>> src=, dest=hadoop1-s02.farm-ny.gigya.com > >> >>> ,60020,1306919627544; > >> >>>> 5 > >> >>>>>>>>> (online=5, > >> >>>>>>>>>> exclude=null) available servers > >> >>>>>>>>>> 2011-06-29 16:43:57,936 DEBUG > >> >>>>>>>>>> org.apache.hadoop.hbase.master.AssignmentManager: Assigning > >> >>>>> region > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. 
> >> >>>>>>>>>> to hadoop1-s02.farm-ny.gigya.com,60020,1306919627544 > >> >>>>>>>>>> 2011-06-29 16:43:59,022 DEBUG > >> >>>>>>>>>> org.apache.hadoop.hbase.master.AssignmentManager: Handling > >> >>>>>>>>>> transition=RS_ZK_REGION_OPENING, > >> >>>>>>>>>> server=hadoop1-s02.farm-ny.gigya.com,60020,1306919627544, > >> >>>>>>>>>> region=584dac5cc70d8682f71c4675a843c309 > >> >>>>>>>>>> 2011-06-29 16:43:59,221 DEBUG > >> >>>>>>>>>> org.apache.hadoop.hbase.master.AssignmentManager: Handling > >> >>>>>>>>>> transition=RS_ZK_REGION_OPENING, > >> >>>>>>>>>> server=hadoop1-s02.farm-ny.gigya.com,60020,1306919627544, > >> >>>>>>>>>> region=584dac5cc70d8682f71c4675a843c309 > >> >>>>>>>>>> 2011-06-29 16:43:59,226 DEBUG > >> >>>>>>>>>> org.apache.hadoop.hbase.master.AssignmentManager: Handling > >> >>>>>>>>>> transition=RS_ZK_REGION_OPENED, > >> >>>>>>>>>> server=hadoop1-s02.farm-ny.gigya.com,60020,1306919627544, > >> >>>>>>>>>> region=584dac5cc70d8682f71c4675a843c309 > >> >>>>>>>>>> 2011-06-29 16:43:59,274 DEBUG > >> >>>>>>>>>> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: > >> >>>>>> Handling > >> >>>>>>>>> OPENED > >> >>>>>>>>>> event for 584dac5cc70d8682f71c4675a843c309; deleting > >> >>> unassigned > >> >>>>>> node > >> >>>>>>>>>> 2011-06-29 16:43:59,274 DEBUG > >> >>>>>>>> org.apache.hadoop.hbase.zookeeper.ZKAssign: > >> >>>>>>>>>> master:60000-0x13004a31d7804c4 Deleting existing unassigned > >> >>>> node > >> >>>>>> for > >> >>>>>>>>>> 584dac5cc70d8682f71c4675a843c309 that is in expected state > >> >>>>>>>>>> RS_ZK_REGION_OPENED > >> >>>>>>>>>> 2011-06-29 16:43:59,296 DEBUG > >> >>>>>>>> org.apache.hadoop.hbase.zookeeper.ZKAssign: > >> >>>>>>>>>> master:60000-0x13004a31d7804c4 Successfully deleted > >> >>> unassigned > >> >>>>> node > >> >>>>>>> for > >> >>>>>>>>>> region 584dac5cc70d8682f71c4675a843c309 in expected state > >> >>>>>>>>>> RS_ZK_REGION_OPENED > >> >>>>>>>>>> 2011-06-29 16:43:59,375 WARN > >> >>>>>>>>>> org.apache.hadoop.hbase.master.AssignmentManager: > >> >> Overwriting > >> >>>>>>>>>> 584dac5cc70d8682f71c4675a843c309 on > >> >>>>>>>>>> serverName=hadoop1-s05.farm-ny.gigya.com > >> >>> ,60020,1307349217076, > >> >>>>>>>>>> load=(requests=0, regions=1273, usedHeap=2676, > >> >> maxHeap=12483) > >> >>>>>>>>>> 2011-06-29 16:43:59,375 DEBUG > >> >>>>>>>>>> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: > >> >>>>> Opened > >> >>>>>>>> region > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > >> >>>>>>>>>> on hadoop1-s02.farm-ny.gigya.com,60020,1306919627544 > >> >>>>>>>>>> > >> >>>>>>>>>> Thanks. > >> >>>>>>>>>> > >> >>>>>>>>>> -eran > >> >>>>>>>>>> > >> >>>>>>>>>> > >> >>>>>>>>>> > >> >>>>>>>>>> On Fri, Jul 1, 2011 at 09:05, Stack <[email protected]> > >> >>> wrote: > >> >>>>>>>>>> > >> >>>>>>>>>>> So, Eran, it seems as though two RegionServers were > >> >> carrying > >> >>>> the > >> >>>>>>>>>>> region? One deleted a file (compaction on its side)? Can > >> >>> you > >> >>>>>>> figure > >> >>>>>>>>>>> if indeed two servers had same region? (Check master logs > >> >>> for > >> >>>>>> this > >> >>>>>>>>>>> regions assignments). > >> >>>>>>>>>>> > >> >>>>>>>>>>> What version of hbase? 
> >> >>>>>>>>>>> > >> >>>>>>>>>>> St.Ack > >> >>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>>>> On Thu, Jun 30, 2011 at 3:58 AM, Eran Kutner < > >> >>> [email protected]> > >> >>>>>>> wrote: > >> >>>>>>>>>>>> Hi, > >> >>>>>>>>>>>> I have a cluster of 5 nodes with one large table that > >> >>>>> currently > >> >>>>>>> has > >> >>>>>>>>>>> around > >> >>>>>>>>>>>> 12000 regions. Everything was working fine for > >> >> relatively > >> >>>> long > >> >>>>>>> time, > >> >>>>>>>>>>> until > >> >>>>>>>>>>>> now. > >> >>>>>>>>>>>> Yesterday I significantly reduced the TTL on the table > >> >> and > >> >>>>>>> initiated > >> >>>>>>>>>>> major > >> >>>>>>>>>>>> compaction. This should have reduced the table size to > >> >>> about > >> >>>>> 20% > >> >>>>>>> of > >> >>>>>>>>> its > >> >>>>>>>>>>>> original size. > >> >>>>>>>>>>>> Today, I'm getting errors of inaccessible files on HDFS, > >> >>> for > >> >>>>>>>> example: > >> >>>>>>>>>>>> java.io.IOException: Got error in response to > >> >>> OP_READ_BLOCK > >> >>>>>> self=/ > >> >>>>>>>>>>>> 10.1.104.2:58047, remote=/10.1.104.2:50010 for file > >> >>>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > /hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/events/1971818821800304360 > >> >>>>>>>>>>>> for block 3674866614142268536_674205 > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1487) > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1811) > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>> > >> >>>>> > >> >>> > >> > org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1948) > >> >>>>>>>>>>>> at > >> >>>>> java.io.DataInputStream.read(DataInputStream.java:132) > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:105) > >> >>>>>>>>>>>> at > >> >>>>>>>>> > >> >> java.io.BufferedInputStream.fill(BufferedInputStream.java:218) > >> >>>>>>>>>>>> at > >> >>>>>>>>> > >> >> java.io.BufferedInputStream.read(BufferedInputStream.java:237) > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > org.apache.hadoop.io.compress.BlockDecompressorStream.rawReadInt(BlockDecompressorStream.java:128) > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:68) > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:75) > >> >>>>>>>>>>>> at > >> >>>>>>>>> > >> >> java.io.BufferedInputStream.read1(BufferedInputStream.java:256) > >> >>>>>>>>>>>> at > >> >>>>>>>>> > >> >> java.io.BufferedInputStream.read(BufferedInputStream.java:317) > >> >>>>>>>>>>>> at > >> 
>>>>>> org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102) > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>> > >> >>>>> > >> >>> > >> > org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1094) > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>> > >> >>>>> > >> >>> > >> org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:1036) > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.seekTo(HFile.java:1433) > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:139) > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:96) > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:77) > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>>> > >> >>>>> > >> >> > org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1341) > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.<init>(HRegion.java:2269) > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > org.apache.hadoop.hbase.regionserver.HRegion.instantiateInternalScanner(HRegion.java:1126) > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1118) > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1102) > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1781) > >> >>>>>>>>>>>> at > >> >>>> sun.reflect.GeneratedMethodAccessor46.invoke(Unknown > >> >>>>>>>> Source) > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > >> >>>>>>>>>>>> at > >> >> java.lang.reflect.Method.invoke(Method.java:597) > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>> > >> >> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) > >> >>>>>>>>>>>> at > >> >>>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) > >> >>>>>>>>>>>> > >> >>>>>>>>>>>> I checked and the file, indeed doesn't exist on 
HDFS, > >> >> here > >> >>>> is > >> >>>>>> the > >> >>>>>>>> name > >> >>>>>>>>>>> node > >> >>>>>>>>>>>> logs for this block, apparently because it was deleted: > >> >>>>>>>>>>>> 2011-06-19 21:39:36,651 INFO > >> >>>>> org.apache.hadoop.hdfs.StateChange: > >> >>>>>>>>> BLOCK* > >> >>>>>>>>>>>> NameSystem.allocateBlock: > >> >>>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > /hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/.tmp/2096863423111131624. > >> >>>>>>>>>>>> blk_3674866614142268536_674205 > >> >>>>>>>>>>>> 2011-06-19 21:40:11,954 INFO > >> >>>>> org.apache.hadoop.hdfs.StateChange: > >> >>>>>>>>> BLOCK* > >> >>>>>>>>>>>> NameSystem.addStoredBlock: blockMap updated: > >> >>>> 10.1.104.2:50010 > >> >>>>> is > >> >>>>>>>>> added > >> >>>>>>>>>>> to > >> >>>>>>>>>>>> blk_3674866614142268536_674205 size 67108864 > >> >>>>>>>>>>>> 2011-06-19 21:40:11,954 INFO > >> >>>>> org.apache.hadoop.hdfs.StateChange: > >> >>>>>>>>> BLOCK* > >> >>>>>>>>>>>> NameSystem.addStoredBlock: blockMap updated: > >> >>>> 10.1.104.3:50010 > >> >>>>> is > >> >>>>>>>>> added > >> >>>>>>>>>>> to > >> >>>>>>>>>>>> blk_3674866614142268536_674205 size 67108864 > >> >>>>>>>>>>>> 2011-06-19 21:40:11,955 INFO > >> >>>>> org.apache.hadoop.hdfs.StateChange: > >> >>>>>>>>> BLOCK* > >> >>>>>>>>>>>> NameSystem.addStoredBlock: blockMap updated: > >> >>>> 10.1.104.5:50010 > >> >>>>> is > >> >>>>>>>>> added > >> >>>>>>>>>>> to > >> >>>>>>>>>>>> blk_3674866614142268536_674205 size 67108864 > >> >>>>>>>>>>>> 2011-06-29 20:20:01,662 INFO > >> >>>>> org.apache.hadoop.hdfs.StateChange: > >> >>>>>>>>> BLOCK* > >> >>>>>>>>>>> ask > >> >>>>>>>>>>>> 10.1.104.2:50010 to delete > >> >>> blk_3674866614142268536_674205 > >> >>>>>>>>>>>> 2011-06-29 20:20:13,671 INFO > >> >>>>> org.apache.hadoop.hdfs.StateChange: > >> >>>>>>>>> BLOCK* > >> >>>>>>>>>>> ask > >> >>>>>>>>>>>> 10.1.104.5:50010 to delete > >> >>> blk_-4056387895369608597_675174 > >> >>>>>>>>>>>> blk_-5017882805850873821_672281 > >> >>>> blk_702373987100607684_672288 > >> >>>>>>>>>>>> blk_-5357157478043290010_668506 > >> >>>> blk_7118175133735412789_674903 > >> >>>>>>>>>>>> blk_-3569812563715986384_675231 > >> >>>> blk_8296855057240604851_669285 > >> >>>>>>>>>>>> blk_-6483679172530609101_674268 > >> >>>> blk_8738539715363739108_673682 > >> >>>>>>>>>>>> blk_1744841904626813502_675238 > >> >>>> blk_-6035315106100051103_674266 > >> >>>>>>>>>>>> blk_-1789501623010070237_674908 > >> >>>> blk_1944054629336265129_673689 > >> >>>>>>>>>>>> blk_3674866614142268536_674205 > >> >>>> blk_7930425446738143892_647410 > >> >>>>>>>>>>>> blk_-3007186753042268449_669296 > >> >>>>> blk_-5482302621772778061_647416 > >> >>>>>>>>>>>> blk_-3765735404924932181_672004 > >> >>>> blk_7476090998956811081_675169 > >> >>>>>>>>>>>> blk_7862291659285127712_646890 > >> >>>> blk_-2666244746343584727_672013 > >> >>>>>>>>>>>> blk_6039172613960915602_674206 > >> >>>> blk_-8470884397893086564_646899 > >> >>>>>>>>>>>> blk_4558230221166712802_668510 > >> >>>>>>>>>>>> 2011-06-29 20:20:46,698 INFO > >> >>>>> org.apache.hadoop.hdfs.StateChange: > >> >>>>>>>>> BLOCK* > >> >>>>>>>>>>> ask > >> >>>>>>>>>>>> 10.1.104.3:50010 to delete > >> >>> blk_-7851606440036350812_671552 > >> >>>>>>>>>>>> blk_9214649160203453845_647566 > >> >>> blk_702373987100607684_672288 > >> >>>>>>>>>>>> blk_5958099369749234073_668143 > >> >>>> blk_-5172218034084903173_673109 > >> >>>>>>>>>>>> blk_-2934555181472719276_646476 > >> >>>>> blk_-1409986679370073931_672552 > >> 
>>>>>>>>>>>> blk_-2786034325506235869_669086 > >> >>>> blk_3674866614142268536_674205 > >> >>>>>>>>>>>> blk_510158930393283118_673225 > >> >>> blk_916244738216205237_677068 > >> >>>>>>>>>>>> blk_-4317027806407316617_670379 > >> >>>> blk_8555705688850972639_673485 > >> >>>>>>>>>>>> blk_-3765735404924932181_672004 > >> >>>>> blk_-5482302621772778061_647416 > >> >>>>>>>>>>>> blk_-2461801145731752623_674605 > >> >>>>> blk_-8737702908048998927_672549 > >> >>>>>>>>>>>> blk_-8470884397893086564_646899 > >> >>>> blk_4558230221166712802_668510 > >> >>>>>>>>>>>> blk_-4056387895369608597_675174 > >> >>>>> blk_-8675430610673886073_647695 > >> >>>>>>>>>>>> blk_-6642870230256028318_668211 > >> >>>>> blk_-3890408516362176771_677483 > >> >>>>>>>>>>>> blk_-3569812563715986384_675231 > >> >>>>> blk_-5007142629771321873_674548 > >> >>>>>>>>>>>> blk_-3345355191863431669_667066 > >> >>>> blk_8296855057240604851_669285 > >> >>>>>>>>>>>> blk_-6595462308187757470_672420 > >> >>>>> blk_-2583945228783203947_674607 > >> >>>>>>>>>>>> blk_-346988625120916345_677063 > >> >>>> blk_4449525876338684218_674496 > >> >>>>>>>>>>>> blk_2617172363857549730_668201 > >> >>>> blk_8738539715363739108_673682 > >> >>>>>>>>>>>> blk_-208904675456598428_679286 > >> >>>> blk_-497549341281882641_646477 > >> >>>>>>>>>>>> blk_-6035315106100051103_674266 > >> >>>>> blk_-2356539038067297411_672388 > >> >>>>>>>>>>>> blk_-3881703084497103249_668137 > >> >>>> blk_2214397881104950315_646643 > >> >>>>>>>>>>>> blk_-5907671443455357710_673223 > >> >>>>> blk_-2431880309956605679_669204 > >> >>>>>>>>>>>> blk_6039172613960915602_674206 > >> >>>> blk_5053643911633142711_669194 > >> >>>>>>>>>>>> blk_-2636977729205236686_674664 > >> >>>>>>>>>>>> > >> >>>>>>>>>>>> I assume the file loss is somehow related to this change > >> >>> and > >> >>>>> the > >> >>>>>>>> major > >> >>>>>>>>>>>> compaction that followed because the same scan that is > >> >>>> failing > >> >>>>>> now > >> >>>>>>>> was > >> >>>>>>>>>>>> working fine yesterday and that is the only changed that > >> >>>>>> happened > >> >>>>>>> on > >> >>>>>>>>> the > >> >>>>>>>>>>>> cluster. Any suggestions what to do now? > >> >>>>>>>>>>>> > >> >>>>>>>>>>>> Thanks. > >> >>>>>>>>>>>> > >> >>>>>>>>>>>> -eran > >> >>>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>>> > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>> > >> >>>>>> > >> >>>>> > >> >>>> > >> >>> > >> >> > >> > > > > >
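For completeness, the manual check described above ("I checked and the file indeed doesn't exist on HDFS") can be reproduced with a few lines against the plain Hadoop FileSystem API. This is just a sketch: the path is the store file from the stack trace, and it assumes the client configuration points at the hadoop1-m1 namenode.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Check whether the store file the regionserver is trying to read
    // still exists in HDFS.
    public class CheckStoreFile {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path storeFile = new Path(
            "/hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/"
            + "events/1971818821800304360");
        System.out.println(storeFile + " exists: " + fs.exists(storeFile));
      }
    }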
