Andrew hi, can you please paste the method itself and not the output of the patch, so I will be able to test it (hopefully)?

Thanks,
Mikael.S
On Fri, Feb 17, 2012 at 12:23 AM, Mikael Sitruk <[email protected]> wrote:

OK, I understand you now, but I think the lines are different, so can you paste the method (full content instead of the patch) into the email? I will compile and check.

Mikael.S

On Thu, Feb 16, 2012 at 7:49 PM, Andrew Purtell <[email protected]> wrote:

I'm wondering if the removal and re-add of the lease is racy. We used to just refresh the lease.

In the patch provided I don't remove the lease and add it back; instead I just refresh it on the way out. If you apply the patch and the LeaseExceptions go away, then we will know this works for you. I've applied this patch to our internal build as part of tracking down what might be spurious LeaseExceptions. I've been blaming the clients, but maybe that is wrong.

Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
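A rough illustration of the kind of race a remove-then-re-add pattern can run into (this is only a guess at what Andy has in mind, and the class below is a toy model, not HBase's actual Leases implementation): while the lease is out of the map during a long-running next() call, any concurrent path that looks up the same lease, for example a close() on the same scanner or the expiration handler, finds nothing and surfaces a LeaseException; a renew leaves the lease in place the whole time while still pushing out its expiration.

import java.util.concurrent.ConcurrentHashMap;

// Toy model only; names and behavior are illustrative, not HBase's Leases class.
public class ToyLeases {
  private final ConcurrentHashMap<String, Long> expirations =
      new ConcurrentHashMap<String, Long>();
  private final long leasePeriodMs;

  public ToyLeases(long leasePeriodMs) {
    this.leasePeriodMs = leasePeriodMs;
  }

  public void addLease(String name) {
    expirations.put(name, Long.valueOf(System.currentTimeMillis() + leasePeriodMs));
  }

  // Remove-then-re-add: between removeLease() and the later addLease(),
  // any other caller that touches the same lease finds nothing and throws.
  public long removeLease(String name) throws LeaseException {
    Long exp = expirations.remove(name);
    if (exp == null) throw new LeaseException("lease '" + name + "' does not exist");
    return exp.longValue();
  }

  // Renew: the lease never leaves the map, so concurrent lookups still see it,
  // while the expiration time is still pushed out.
  public void renewLease(String name) throws LeaseException {
    Long renewed = Long.valueOf(System.currentTimeMillis() + leasePeriodMs);
    if (expirations.replace(name, renewed) == null) {
      throw new LeaseException("lease '" + name + "' does not exist");
    }
  }

  public static class LeaseException extends Exception {
    public LeaseException(String msg) { super(msg); }
  }
}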
----- Original Message -----
From: Mikael Sitruk <[email protected]>
To: [email protected]; Andrew Purtell <[email protected]>
Cc:
Sent: Wednesday, February 15, 2012 11:32 PM
Subject: Re: LeaseException while extracting data via pig/hbase integration

Andy hi

Not sure what you mean by "Does something like the below help?" The current code running is pasted below; the line numbers are slightly different from yours. It seems very close to the first file (revision "a") in your extract.

Mikael.S

public Result[] next(final long scannerId, int nbRows) throws IOException {
  String scannerName = String.valueOf(scannerId);
  InternalScanner s = this.scanners.get(scannerName);
  if (s == null) throw new UnknownScannerException("Name: " + scannerName);
  try {
    checkOpen();
  } catch (IOException e) {
    // If checkOpen failed, server not running or filesystem gone,
    // cancel this lease; filesystem is gone or we're closing or something.
    try {
      this.leases.cancelLease(scannerName);
    } catch (LeaseException le) {
      LOG.info("Server shutting down and client tried to access missing scanner " +
        scannerName);
    }
    throw e;
  }
  Leases.Lease lease = null;
  try {
    // Remove lease while its being processed in server; protects against case
    // where processing of request takes > lease expiration time.
    lease = this.leases.removeLease(scannerName);
    List<Result> results = new ArrayList<Result>(nbRows);
    long currentScanResultSize = 0;
    List<KeyValue> values = new ArrayList<KeyValue>();
    for (int i = 0; i < nbRows
        && currentScanResultSize < maxScannerResultSize; i++) {
      requestCount.incrementAndGet();
      // Collect values to be returned here
      boolean moreRows = s.next(values);
      if (!values.isEmpty()) {
        for (KeyValue kv : values) {
          currentScanResultSize += kv.heapSize();
        }
        results.add(new Result(values));
      }
      if (!moreRows) {
        break;
      }
      values.clear();
    }
    // Below is an ugly hack where we cast the InternalScanner to be a
    // HRegion.RegionScanner. The alternative is to change InternalScanner
    // interface but its used everywhere whereas we just need a bit of info
    // from HRegion.RegionScanner, IF its filter if any is done with the scan
    // and wants to tell the client to stop the scan. This is done by passing
    // a null result.
    return ((HRegion.RegionScanner) s).isFilterDone() && results.isEmpty() ? null
        : results.toArray(new Result[0]);
  } catch (Throwable t) {
    if (t instanceof NotServingRegionException) {
      this.scanners.remove(scannerName);
    }
    throw convertThrowableToIOE(cleanup(t));
  } finally {
    // We're done. On way out readd the above removed lease. Adding resets
    // expiration time on lease.
    if (this.scanners.containsKey(scannerName)) {
      if (lease != null) this.leases.addLease(lease);
    }
  }
}

On Thu, Feb 16, 2012 at 3:10 AM, Andrew Purtell <[email protected]> wrote:

Hmm...

Does something like the below help?

diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java b/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
index f9627ed..0cee8e3 100644
--- a/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
+++ b/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
@@ -2137,11 +2137,7 @@ public class HRegionServer implements HRegionInterface, HBaseRPCErrorHandler,
       }
       throw e;
     }
-    Leases.Lease lease = null;
     try {
-      // Remove lease while its being processed in server; protects against case
-      // where processing of request takes > lease expiration time.
-      lease = this.leases.removeLease(scannerName);
       List<Result> results = new ArrayList<Result>(nbRows);
       long currentScanResultSize = 0;
       List<KeyValue> values = new ArrayList<KeyValue>();
@@ -2197,10 +2193,9 @@ public class HRegionServer implements HRegionInterface, HBaseRPCErrorHandler,
       }
       throw convertThrowableToIOE(cleanup(t));
     } finally {
-      // We're done. On way out readd the above removed lease. Adding resets
-      // expiration time on lease.
+      // We're done. On way out reset expiration time on lease.
       if (this.scanners.containsKey(scannerName)) {
-        if (lease != null) this.leases.addLease(lease);
+        this.leases.renewLease(scannerName);
       }
     }
   }

Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
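For completeness, applying that diff to the method pasted above should leave next() looking roughly like the sketch below (the row-collection loop is unchanged and elided; this has not been compiled against any particular revision, so treat it as an approximation rather than the authoritative patched method):

public Result[] next(final long scannerId, int nbRows) throws IOException {
  String scannerName = String.valueOf(scannerId);
  InternalScanner s = this.scanners.get(scannerName);
  if (s == null) throw new UnknownScannerException("Name: " + scannerName);
  try {
    checkOpen();
  } catch (IOException e) {
    // If checkOpen failed, cancel this lease; filesystem is gone or we're closing.
    try {
      this.leases.cancelLease(scannerName);
    } catch (LeaseException le) {
      LOG.info("Server shutting down and client tried to access missing scanner " +
        scannerName);
    }
    throw e;
  }
  // No removeLease() here any more; the lease stays registered while the
  // request is being processed.
  try {
    List<Result> results = new ArrayList<Result>(nbRows);
    // ... row-collection loop exactly as in the method pasted above ...
    return ((HRegion.RegionScanner) s).isFilterDone() && results.isEmpty() ? null
        : results.toArray(new Result[0]);
  } catch (Throwable t) {
    if (t instanceof NotServingRegionException) {
      this.scanners.remove(scannerName);
    }
    throw convertThrowableToIOE(cleanup(t));
  } finally {
    // We're done. On way out reset expiration time on lease.
    if (this.scanners.containsKey(scannerName)) {
      this.leases.renewLease(scannerName);
    }
  }
}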
----- Original Message -----
From: Jean-Daniel Cryans <[email protected]>
To: [email protected]
Cc:
Sent: Wednesday, February 15, 2012 10:17 AM
Subject: Re: LeaseException while extracting data via pig/hbase integration

You would have to grep the lease's id; in your first email it was "-7220618182832784549".

About the time it takes to process each row, I meant on the client (pig) side, not in the RS.

J-D

On Tue, Feb 14, 2012 at 1:33 PM, Mikael Sitruk <[email protected]> wrote:

Please see answers inline.
Thanks
Mikael.S

On Tue, Feb 14, 2012 at 8:30 PM, Jean-Daniel Cryans <[email protected]> wrote:

On Tue, Feb 14, 2012 at 2:01 AM, Mikael Sitruk <[email protected]> wrote:

Mikael: Hi. Well no, I can't figure out what the problem is, but I saw that someone else had the same problem (see the email "LeaseException despite high hbase.regionserver.lease.period"). What I can tell is the following: last week the problem was consistent.

Mikael: 1. I updated hbase.regionserver.lease.period=300000 (5 mins) and restarted the cluster, and still got the problem; the maps got this exception even before the 5 mins (some after 1 min and 20 sec).

J-D: That's extremely suspicious. Are you sure the setting is getting picked up? :)

Mikael: I hope so :-)

J-D: You should be able to tell when the lease really expires by simply grepping for the number in the region server log; it should give you a good idea of what your lease period is.

Mikael: Grepping on which value? The lease period configured here, 300000? It does not return anything. I also tried on the current execution, where some maps were OK and some were not.

Mikael: 2. The problem occurs only on jobs that extract a large number of columns (>150 cols per row).

J-D: What's your scanner caching set to? Are you spending a lot of time processing each row?

Mikael: From the job configuration generated by Pig I can see caching set to 1. Regarding the processing time of each row, I have no clue how much time it spent; the data for each row is 150 columns of 2k each, which is approximately 5 blocks to bring.
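As a side note on the caching question: with roughly 150 columns of 2 KB each (about 300 KB per row), a caching value of 1 means one round trip per row, while a very large value makes each next() call heavy and stretches the time between calls. A moderate value is usually the sweet spot. Below is a generic sketch for the 0.90/0.92-era client API; the property name and classes are given as I remember them, so double-check against the version in use.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;

public class ScanCachingExample {
  public static void main(String[] args) {
    // Raise the client-wide default (the default in this era is 1 row per next() RPC).
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.client.scanner.caching", 100);

    // Or set it per scan, e.g. on the Scan handed to TableInputFormat.
    // ~150 columns x 2 KB = ~300 KB per row, so 100 rows is roughly 30 MB per RPC.
    Scan scan = new Scan();
    scan.setCaching(100);
    scan.setCacheBlocks(false); // block caching is usually disabled for full scans from MR
  }
}

Depending on the Pig version, HBaseStorage may also expose a -caching option, so the job-generated configuration does not have to stay at 1; that is worth checking in the HBaseStorage documentation for the release in use.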
Mikael: 3. The problem never occurred when only 1 map per server is running (I have 8 CPUs with hyper-threading enabled = 16, so using only 1 map per machine is just a waste). At this stage I was thinking there is perhaps a multi-threading problem.

J-D: More mappers would pull more data from the region servers, so more concurrency from the disks; using more mappers might just slow you down enough that you hit the issue.

Mikael: Today I ran with 8 mappers and some failed and some didn't (2 of 4); they got the lease exception after 5 mins. I will try to check the logs/sar/metric files for additional info.

Mikael: This week I got a slightly different behavior, after having restarted the servers. The extracts were able to run OK in most of the runs, even with 4 maps running (per server); I got the exception only once, and the job was not killed as in other runs last week.

J-D: If the client got an UnknownScannerException before the timeout expires (the client also keeps track of it, although it may have a different configuration), it will recreate the scanner.

Mikael: No, this is not the case.

J-D: Which reminds me, are your regions moving around? If so, and your clients don't know about the high timeout, then they might let the exception pass on to your own code.

Mikael: Regions are presplit ahead; I do not have any region splits during the run. Region size is set to 8 GB, and the store files are around 3.5 GB. The test was run after major compaction, so the number of store files is 1 per RS/family.

J-D

--
Mikael.S
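One thing that may be worth double-checking, given J-D's point that the client keeps its own copy of the timeout: make sure the configuration the Pig/MapReduce tasks actually load carries the same hbase.regionserver.lease.period as the region servers (as far as I can tell, clients of this era read the scanner timeout from that same key). A trivial hypothetical check, run with the same classpath and hbase-site.xml as the tasks:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class LeasePeriodCheck {
  public static void main(String[] args) {
    // Prints the lease period the loaded configuration resolves to, so it can be
    // compared with the 300000 ms set for the region servers.
    // 60000 ms is the stock default for hbase.regionserver.lease.period.
    Configuration conf = HBaseConfiguration.create();
    long leasePeriod = conf.getLong("hbase.regionserver.lease.period", 60000L);
    System.out.println("hbase.regionserver.lease.period = " + leasePeriod + " ms");
  }
}

The same check against the configuration the region servers start with is arguably the more important one, since the maps are failing well before the configured 5 minutes, which points at the servers not picking up the new value, as J-D suspected.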
