I've seen items.c:1183 reported elsewhere in 1.4.24... so the bug was
probably introduced when I rewrote the page mover for that.
I didn't mean for you to send me a core file: I meant that if you dump the
core, you can load it in gdb and get the backtrace (bt + thread apply all bt)
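(A minimal sketch of that workflow, assuming the binary and core file are in
the current directory; your paths will differ:)
$ ulimit -c unlimited         # allow the core dump before starting memcached
$ gdb ./memcached core        # load the binary together with the core file
(gdb) bt                      # backtrace of the faulting thread
(gdb) thread apply all bt     # backtraces for every thread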
Don't have a handler for conv ...
Sorry for the data dumps here, but I want to give you everything I have. I
found 3 more addresses that showed up in the dmesg logs:
$ for addr in 40e013 40eff4 40f7c4; do addr2line -e memcached $addr; done
.../build/memcached-1.4.24-slab-rebal-next/slabs.c:265 (discriminator 1)
.../build/memcac
got it. that might be a decent hint actually... I had added a bugfix to
the branch to not miscount the mem_requested counter, but either it's not
working or I missed a spot.
On Thu, 1 Oct 2015, Scott Mansfield wrote:
> The number now, after maybe 90 minutes of writes, is 1,446. I think after
> disabling ...
ok... slab class 12 claims to have 2 in "total_pages", yet 14g in
mem_requested. is this stat wrong?
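(For reference, one way to eyeball those two counters per class; the
host/port are assumptions, and -q1 is the GNU/OpenBSD nc flag to close the
connection after EOF:)
$ echo 'stats slabs' | nc -q1 127.0.0.1 11211 | egrep 'total_pages|mem_requested'
If I'm reading the 1.4.x slab code right, a page is item_size_max bytes, so 2
pages at -I 4m caps the class at 8m; 14g in mem_requested can only be an
accounting bug.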
On Thu, 1 Oct 2015, Scott Mansfield wrote:
> The ones that crashed (new code cluster) were set to only be written to from
> the client applications. The data is an index key and a series of data ...
Any chance you could describe (perhaps privately?) in very broad strokes
what the write load looks like? (they're getting only writes, too?)
otherwise I'll have to devise arbitrary torture tests. I'm sure the bug's
in there, but it's not obvious yet.
On Thu, 1 Oct 2015, dormando wrote:
> perfect,
perfect, thanks! I have $dayjob as well but will look into this as soon as
I can. my torture test machines are in a box but I'll try to borrow one.
On Thu, 1 Oct 2015, Scott Mansfield wrote:
> Yes. Exact args:
> -p 11211 -u -l 0.0.0.0 -c 10 -o slab_reassign -o
> lru_maintainer,lru_crawler,hash_algorithm=murmur3 -I 4m -m 56253
Were lru_maintainer/lru_crawler/etc enabled though? even if slab mover is
off, those two were the big changes in .24
On Thu, 1 Oct 2015, Scott Mansfield wrote:
> The same cluster has > 400 servers happily running 1.4.24. It's been our
> standard deployment for a while now, and we haven't seen any crashes. ...
Yes. Exact args:
-p 11211 -u -l 0.0.0.0 -c 10 -o slab_reassign -o
lru_maintainer,lru_crawler,hash_algorithm=murmur3 -I 4m -m 56253
On Thursday, October 1, 2015 at 12:41:06 PM UTC-7, Dormando wrote:
>
> Were lru_maintainer/lru_crawler/etc enabled though? even if slab mover is
> off, those two were the big changes in .24
The same cluster has > 400 servers happily running 1.4.24. It's been our
standard deployment for a while now, and we haven't seen any crashes. The
servers in the same cluster running 1.4.24 (with the same write load the
new build was taking) have been up for 29 days. The start options do not
co ...
Just before I sit down and try to narrow this down: have you run any host on
1.4.24 mainline with those same start options? just in case the crash is
older.
On Thu, 1 Oct 2015, Scott Mansfield wrote:
> Another message for you:
> [78098.528606] traps: memcached[2757] general protection ip:412b9d
> sp:7fc0700dbdd0 error:0 in memcached[40+1d000]
Another message for you:
[78098.528606] traps: memcached[2757] general protection ip:412b9d
sp:7fc0700dbdd0 error:0 in memcached[40+1d000]
addr2line shows:
$ addr2line -e memcached 412b9d
/mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-slab-rebal-next/a
Ok, thanks!
I'll noodle this a bit... unfortunately a backtrace might be more helpful.
I'll ask you to attempt to get one if I don't figure anything out in time.
(allow it to core dump, or attach a GDB session, set an ignore handler
for sigpipe/int/etc, and run "continue")
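(Sketch of the attach variant; the pidof lookup is an assumption, use
whatever pid your init system reports:)
$ gdb -p $(pidof memcached)
(gdb) handle SIGPIPE nostop noprint pass   # the ignore handler for sigpipe
(gdb) handle SIGINT nostop noprint pass    # likewise for sigint
(gdb) continue                             # let it run until the fault
(gdb) thread apply all bt                  # then grab all the backtraces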
what were your full startup args?
Oops, forgot the startup args:
-p 11211 -u -l 0.0.0.0 -c 10 -o
slab_reassign,slab_automove,lru_maintainer,lru_crawler,hash_algorithm=murmur3
-I 2m -m 56253
On Thursday, October 1, 2015 at 1:22:12 AM UTC-7, Scott Mansfield wrote:
>
> The commit was the latest in slab_rebal_next at the time: ...
The commit was the latest in slab_rebal_next at the time:
https://github.com/dormando/memcached/commit/bdd688b4f20120ad844c8a4803e08c6e03cb061a
addr2line gave me this output:
$ addr2line -e memcached 0x40e007
/mnt/builds/slave/workspace/TL-SYS-memcached-slab_rebal_next/build/memcached-1.4.24-sl
How many servers were you running it on? I hope it wasn't more than a
handful. I'd recommend starting with one :P
can you do an addr2line? what were your startup args, and what was the
commit sha1 for the branch you pulled?
sorry about that :/
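(For the sha1, something like this from the checkout you built should do; the
path is an assumption:)
$ git -C /path/to/your/checkout rev-parse HEAD
addr2line wants the same unstripped binary you deployed, plus the ip value
from the dmesg line.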
On Thu, 1 Oct 2015, Scott Mansfield wrote:
> A few different servers (5 / 205) experienced a segfault all within an hour ...
A few different servers (5 / 205) experienced a segfault all within an hour
or so. Unfortunately at this point I'm a bit out of my depth. I have the
dmesg output, which is identical for all 5 boxes:
[46545.316351] memcached[2789]: segfault at 0 ip 0040e007 sp
7f362ceedeb0 error 4 in ...