Github user keith-turner commented on the issue:
https://github.com/apache/accumulo/pull/121
@ShawnWalker can you close this PR? It's in 1.8 in commit
3e5524c3c391d2556492d070710a789510be3532. Locally, I squashed these commits
into one, cherry picked that single commit to 1.8, and
Github user keith-turner commented on the issue:
https://github.com/apache/accumulo/pull/121
I looked in the code. The suspension time is obtained from a table config
object. I am going to push these changes to 1.8 and master.
---
If your project is set up for it, you can reply to
Github user keith-turner commented on the issue:
https://github.com/apache/accumulo/pull/121
I ran some manual tests on EC2 w/ 10 nodes. It worked nicely. I created
1000 tablets and ran continuous ingest. One thing I tried is setting the
suspend duration to 10m, admin stop,
Github user keith-turner commented on the issue:
https://github.com/apache/accumulo/pull/121
I cherry picked this back to 1.8 in my local fork. My intent is to play
around with this on a 10 node EC2 cluster. I will report back after I do that.
If anyone has any particular
Github user ShawnWalker commented on the issue:
https://github.com/apache/accumulo/pull/121
> Did you test running stop-here.sh?
I couldn't get stop-here.sh to do anything in my setup. Running `accumulo
admin stop ...` brought a flaw to my attention, which I've fixed: I was
Github user keith-turner commented on the issue:
https://github.com/apache/accumulo/pull/121
> Indeed, I found this out when I went to implement my idea.
I just noticed that commit. I can look over those changes today. I like
the idea of changing save to an enum.
Github user ShawnWalker commented on the issue:
https://github.com/apache/accumulo/pull/121
> I took a quick look and it seemed a tserver did not know why it was
> unloading. It seems like the tserver would need to know why it was unloading so
> it could only suspend when being stopped
Github user keith-turner commented on the issue:
https://github.com/apache/accumulo/pull/121
> My current thought would be to make unloading a tablet this way suspend
> the tablet instead of unassigning
I took a quick look and it seemed a tserver did not know why it was
Github user ShawnWalker commented on the issue:
https://github.com/apache/accumulo/pull/121
> The stop-here.sh command has the master unload the tablets I think. How
> will this patch handle that case?
This patch won't handle such a case at all. I'm sure it shows my
inexperience
Github user mjwall commented on the issue:
https://github.com/apache/accumulo/pull/121
@ShawnWalker I reviewed this and talked with @keith-turner some. I imagine
during a rolling upgrade the process would be:
- Stage the new install on the box
- Run stop-here.sh
-
Github user keith-turner commented on the issue:
https://github.com/apache/accumulo/pull/121
> Did you have suggestions for additional tests (manual or automated) that
> you would like to see?
No, it sounds like you did the manual testing I would have done. The case
where
Github user ShawnWalker commented on the issue:
https://github.com/apache/accumulo/pull/121
Some testing on a small development cluster before I wrote
`SuspendedTabletsIT`, but nothing significantly different from what I had
`SuspendedTabletsIT` perform: Start Accumulo, create table,
Github user keith-turner commented on the issue:
https://github.com/apache/accumulo/pull/121
@ShawnWalker did you do any testing besides the IT?
Github user keith-turner commented on the issue:
https://github.com/apache/accumulo/pull/121
> If the tablet server returns before master notices it's gone, master will
> see it as a new empty tablet server.
One possible beginning of a solution is to change the behavior of
Github user ShawnWalker commented on the issue:
https://github.com/apache/accumulo/pull/121
If the tablet server returns before master notices it's gone, master will
see it as a new empty tablet server. This will (usually) cause balancing to be
run, to give that tserver some work.
Github user keith-turner commented on the issue:
https://github.com/apache/accumulo/pull/121
> I haven't accounted for the possibility that a tablet server might return
> before the master notices it had died.
What do you think will happen in this situation?
Github user ShawnWalker commented on the issue:
https://github.com/apache/accumulo/pull/121
Upon further thought, I haven't accounted for the possibility that a tablet
server might return before the master notices it had died. Such a situation
would likely happen during a rolling
Github user ShawnWalker commented on the issue:
https://github.com/apache/accumulo/pull/121
Well, Jenkins doesn't seem to like my PR, but its console output suggests a
problem in the build server.