Hi Pierre,
Yes, I'm describing multiple symptoms here. But the message queues were
the problem, despite not seeing anything in the logs.
5. The integrity errors look like this (not a failing disk in this case.):
2017-03-07T07:36:04-0500 [-] Got fatal Exception on DB
Traceback (most recent call last):
Failure: sqlalchemy.exc.IntegrityError: (IntegrityError) update
or delete on table "changes" violates foreign key constraint
"changes_parent_changeids_fkey" on table "changes"
DETAIL: Key (changeid)=(5983) is still referenced from table
"changes".
'DELETE FROM changes WHERE changes.changeid IN
(%(changeid_1)s, %(changeid_2)s, %(changeid_3)s, %(changeid_4)s,
%(changeid_5)s, %(changeid_6)s, %(changeid_7)s, %(changeid_8)s,
%(changeid_9)s, %(changeid_10)s, %(changeid_11)s, %(changeid_12)s,
%(changeid_13)s, %(changeid_14)s, %(changeid_15)s, %(changeid_16)s,
%(changeid_17)s, %(changeid_18)s, %(changeid_19)s, %(changeid_20)s,
%(changeid_21)s, %(changeid_22)s, %(changeid_23)s, %(changeid_24)s,
%(changeid_25)s, %(changeid_26)s, %(changeid_27)s, %(changeid_28)s,
%(changeid_29)s, %(changeid_30)s, %(changeid_31)s, %(changeid_32)s,
%(changeid_33)s, %(changeid_34)s, %(changeid_35)s, %(changeid_36)s,
%(changeid_37)s, %(changeid_38)s, %(changeid_39)s, %(changeid_40)s,
%(changeid_41)s, %(changeid_42)s, %(changeid_43)s, %(changeid_44)s,
%(changeid_45)s, %(changeid_46)s, %(changeid_47)s, %(changeid_48)s,
%(changeid_49)s, %(changeid_50)s, %(changeid_51)s, %(changeid_52)s,
%(changeid_53)s, %(changeid_54)s, %(changeid_55)s, %(changeid_56)s,
%(changeid_57)s, %(changeid_58)s, %(changeid_59)s, %(changeid_60)s,
%(changeid_61)s, %(changeid_62)s, %(changeid_63)s, %(changeid_64)s,
%(changeid_65)s, %(changeid_66)s, %(changeid_67)s, %(changeid_68)s,
%(changeid_69)s, %(changeid_70)s, %(changeid_71)s, %(changeid_72)s,
%(changeid_73)s, %(changeid_74)s, %(changeid_75)s, %(changeid_76)s,
%(changeid_77)s, %(changeid_78)s, %(changeid_79)s, %(changeid_80)s,
%(changeid_81)s, %(changeid_82)s, %(changeid_83)s, %(changeid_84)s,
%(changeid_85)s, %(changeid_86)s, %(changeid_87)s, %(changeid_88)s,
%(changeid_89)s, %(changeid_90)s, %(changeid_91)s, %(changeid_92)s,
%(changeid_93)s, %(changeid_94)s, %(changeid_95)s, %(changeid_96)s,
%(changeid_97)s, %(changeid_98)s, %(changeid_99)s, %(changeid_100)s)'
{'changeid_100': 5903, 'changeid_29': 5974, 'changeid_28': 5975,
'changeid_27': 5976, 'changeid_26': 5977, 'changeid_25': 5978,
'changeid_24': 5979, 'changeid_23': 5980, 'changeid_22': 5981,
'changeid_21': 5982, 'changeid_20': 5983, 'changeid_89': 5914,
'changeid_88': 5915, 'changeid_81': 5922, 'changeid_80': 5923,
'changeid_83': 5920, 'changeid_82': 5921, 'changeid_85': 5918,
'changeid_84': 5919, 'changeid_87': 5916, 'changeid_86': 5917,
'changeid_38': 5965, 'changeid_39': 5964, 'changeid_34': 5969,
'changeid_35': 5968, 'changeid_36': 5967, 'changeid_37': 5966,
'changeid_30': 5973, 'changeid_31': 5972, 'changeid_32': 5971,
'changeid_33': 5970, 'changeid_98': 5905, 'changeid_99': 5904,
'changeid_96': 5907, 'changeid_97': 5906, 'changeid_94': 5909,
'changeid_95': 5908, 'changeid_92': 5911, 'changeid_93': 5910,
'changeid_90': 5913, 'changeid_91': 5912, 'changeid_63': 5940,
'changeid_62': 5941, 'changeid_61': 5942, 'changeid_60': 5943,
'changeid_67': 5936, 'changeid_66': 5937, 'changeid_65': 5938,
'changeid_64': 5939, 'changeid_69': 5934, 'changeid_68': 5935,
'changeid_16': 5987, 'changeid_17': 5986, 'changeid_14': 5989,
'changeid_15': 5988, 'changeid_12': 5991, 'changeid_13': 5990,
'changeid_10': 5993, 'changeid_11': 5992, 'changeid_18': 5985,
'changeid_19': 5984, 'changeid_70': 5933, 'changeid_71': 5932,
'changeid_72': 5931, 'changeid_73': 5930, 'changeid_74': 5929,
'changeid_75': 5928, 'changeid_76': 5927, 'changeid_77': 5926,
'changeid_78': 5925, 'changeid_79': 5924, 'changeid_45': 5958,
'changeid_44': 5959, 'changeid_47': 5956, 'changeid_46': 5957,
'changeid_41': 5962, 'changeid_40': 5963, 'changeid_43': 5960,
'changeid_42': 5961, 'changeid_49': 5954, 'changeid_48': 5955,
'changeid_4': 5999, 'changeid_5': 5998, 'changeid_6': 5997,
'changeid_7': 5996, 'changeid_1': 6002, 'changeid_2': 6001,
'changeid_3': 6000, 'changeid_8': 5995, 'changeid_9': 5994,
'changeid_58': 5945, 'changeid_59': 5944, 'changeid_52': 5951,
'changeid_53': 5950, 'changeid_50': 5953, 'changeid_51': 5952,
'changeid_56': 5947, 'changeid_57': 5946, 'changeid_54': 5949,
'changeid_55': 5948}
2017-03-07T07:36:04-0500 [-] while pruning changes
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 551, in
__bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/_threads/_threadworker.py",
line 46, in work
task()
File
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/_threads/_team.py",
line 190, in doWork
task()
--- <exception caught here> ---
File
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/python/threadpool.py",
line 246, in inContext
result = inContext.theWork()
File
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/python/threadpool.py",
line 262, in <lambda>
inContext.theWork = lambda: context.call(ctx, func, *args,
**kw)
File
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/python/context.py",
line 118, in callWithContext
return self.currentContext().callWithContext(ctx, func,
*args, **kw)
File
"/usr/local/lib/python2.7/dist-packages/Twisted-16.3.0-py2.7-linux-x86_64.egg/twisted/python/context.py",
line 81, in callWithContext
return func(*args,**kw)
File
"/usr/local/lib/python2.7/dist-packages/buildbot-0.9.3-py2.7.egg/buildbot/db/pool.py",
line 180, in __thd
rv = callable(arg, *args, **kwargs)
File
"/usr/local/lib/python2.7/dist-packages/buildbot-0.9.3-py2.7.egg/buildbot/db/changes.py",
line 338, in thd
table.delete(table.c.changeid.in_(batch)))
File
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 662, in
execute
File
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 761, in
_execute_clauseelement
File
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 874, in
_execute_context
File
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 1024, in
_handle_dbapi_exception
File
"build/bdist.linux-x86_64/egg/sqlalchemy/util/compat.py", line 195, in
raise_from_cause
File
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/base.py", line 867, in
_execute_context
File
"build/bdist.linux-x86_64/egg/sqlalchemy/engine/default.py", line 324,
in do_execute
sqlalchemy.exc.IntegrityError: (IntegrityError) update or
delete on table "changes" violates foreign key constraint
"changes_parent_changeids_fkey" on table "changes"
DETAIL: Key (changeid)=(5983) is still referenced from table
"changes".
'DELETE FROM changes WHERE changes.changeid IN
(%(changeid_1)s, %(changeid_2)s, %(changeid_3)s, %(changeid_4)s,
%(changeid_5)s, %(changeid_6)s, %(changeid_7)s, %(changeid_8)s,
%(changeid_9)s, %(changeid_10)s, %(changeid_11)s, %(changeid_12)s,
%(changeid_13)s, %(changeid_14)s, %(changeid_15)s, %(changeid_16)s,
%(changeid_17)s, %(changeid_18)s, %(changeid_19)s, %(changeid_20)s,
%(changeid_21)s, %(changeid_22)s, %(changeid_23)s, %(changeid_24)s,
%(changeid_25)s, %(changeid_26)s, %(changeid_27)s, %(changeid_28)s,
%(changeid_29)s, %(changeid_30)s, %(changeid_31)s, %(changeid_32)s,
%(changeid_33)s, %(changeid_34)s, %(changeid_35)s, %(changeid_36)s,
%(changeid_37)s, %(changeid_38)s, %(changeid_39)s, %(changeid_40)s,
%(changeid_41)s, %(changeid_42)s, %(changeid_43)s, %(changeid_44)s,
%(changeid_45)s, %(changeid_46)s, %(changeid_47)s, %(changeid_48)s,
%(changeid_49)s, %(changeid_50)s, %(changeid_51)s, %(changeid_52)s,
%(changeid_53)s, %(changeid_54)s, %(changeid_55)s, %(changeid_56)s,
%(changeid_57)s, %(changeid_58)s, %(changeid_59)s, %(changeid_60)s,
%(changeid_61)s, %(changeid_62)s, %(changeid_63)s, %(changeid_64)s,
%(changeid_65)s, %(changeid_66)s, %(changeid_67)s, %(changeid_68)s,
%(changeid_69)s, %(changeid_70)s, %(changeid_71)s, %(changeid_72)s,
%(changeid_73)s, %(changeid_74)s, %(changeid_75)s, %(changeid_76)s,
%(changeid_77)s, %(changeid_78)s, %(changeid_79)s, %(changeid_80)s,
%(changeid_81)s, %(changeid_82)s, %(changeid_83)s, %(changeid_84)s,
%(changeid_85)s, %(changeid_86)s, %(changeid_87)s, %(changeid_88)s,
%(changeid_89)s, %(changeid_90)s, %(changeid_91)s, %(changeid_92)s,
%(changeid_93)s, %(changeid_94)s, %(changeid_95)s, %(changeid_96)s,
%(changeid_97)s, %(changeid_98)s, %(changeid_99)s, %(changeid_100)s)'
{'changeid_100': 5903, 'changeid_29': 5974, 'changeid_28': 5975,
'changeid_27': 5976, 'changeid_26': 5977, 'changeid_25': 5978,
'changeid_24': 5979, 'changeid_23': 5980, 'changeid_22': 5981,
'changeid_21': 5982, 'changeid_20': 5983, 'changeid_89': 5914,
'changeid_88': 5915, 'changeid_81': 5922, 'changeid_80': 5923,
'changeid_83': 5920, 'changeid_82': 5921, 'changeid_85': 5918,
'changeid_84': 5919, 'changeid_87': 5916, 'changeid_86': 5917,
'changeid_38': 5965, 'changeid_39': 5964, 'changeid_34': 5969,
'changeid_35': 5968, 'changeid_36': 5967, 'changeid_37': 5966,
'changeid_30': 5973, 'changeid_31': 5972, 'changeid_32': 5971,
'changeid_33': 5970, 'changeid_98': 5905, 'changeid_99': 5904,
'changeid_96': 5907, 'changeid_97': 5906, 'changeid_94': 5909,
'changeid_95': 5908, 'changeid_92': 5911, 'changeid_93': 5910,
'changeid_90': 5913, 'changeid_91': 5912, 'changeid_63': 5940,
'changeid_62': 5941, 'changeid_61': 5942, 'changeid_60': 5943,
'changeid_67': 5936, 'changeid_66': 5937, 'changeid_65': 5938,
'changeid_64': 5939, 'changeid_69': 5934, 'changeid_68': 5935,
'changeid_16': 5987, 'changeid_17': 5986, 'changeid_14': 5989,
'changeid_15': 5988, 'changeid_12': 5991, 'changeid_13': 5990,
'changeid_10': 5993, 'changeid_11': 5992, 'changeid_18': 5985,
'changeid_19': 5984, 'changeid_70': 5933, 'changeid_71': 5932,
'changeid_72': 5931, 'changeid_73': 5930, 'changeid_74': 5929,
'changeid_75': 5928, 'changeid_76': 5927, 'changeid_77': 5926,
'changeid_78': 5925, 'changeid_79': 5924, 'changeid_45': 5958,
'changeid_44': 5959, 'changeid_47': 5956, 'changeid_46': 5957,
'changeid_41': 5962, 'changeid_40': 5963, 'changeid_43': 5960,
'changeid_42': 5961, 'changeid_49': 5954, 'changeid_48': 5955,
'changeid_4': 5999, 'changeid_5': 5998, 'changeid_6': 5997,
'changeid_7': 5996, 'changeid_1': 6002, 'changeid_2': 6001,
'changeid_3': 6000, 'changeid_8': 5995, 'changeid_9': 5994,
'changeid_58': 5945, 'changeid_59': 5944, 'changeid_52': 5951,
'changeid_53': 5950, 'changeid_50': 5953, 'changeid_51': 5952,
'changeid_56': 5947, 'changeid_57': 5946, 'changeid_54': 5949,
'changeid_55': 5948}
Thanks!
Neil Gilmore
grammatech.com
On 3/7/2017 3:49 AM, Pierre Tardy wrote:
Hi Neil,
I am not sure exactly how I can help on this as you are describing
lots of symptoms.
What goes to my mind right now is a problem with the message queue. In
the multimaster tests I am doing, I figured out that a disconnection
of the message queue is not recovered right now, which could explain
why build do not start (the master will not check for new requests
unless they receive a message)
However, when the mq fails, I can see evidence of it in the logs, but
you don't mention any issue in the logs.
Database integrity errors looks bad also, what kind of errors is that?
We already had some reports of those which were due to a failing disk.
Could that be the case?
Regards
Pierre
On Mon, Mar 6, 2017 at 10:36 PM Neil Gilmore <[email protected]
<mailto:[email protected]>> wrote:
Hi everyone,
Well, things ran OK for a couple weeks. But we had some problems
starting last weekend. At least some failure emails don't seem to be
getting sent out. And a problem we'd been having a bit of got a
lot worse.
For whatever reason, queued builds don't seem to want to start.
Sometimes for hours. Even forced builds. This doesn't seem to be a
locking problem, though I'll be having a look at that side in a
bit. But
we'll have builds sitting for hours before they start. If they start.
Some of our people get antsy and cancel the current queue then force a
build. But sometimes those wait, too.
And we're having trouble getting the masters to deal with new
revisions
fro svn. Everything else looks OK (postcommit hooks, etc.) I'm
just not
sure what's going on.
Reconfig hasn't helped, nor has restarting one of the masters.
We are getting integrity errors in our database, too.
Except for the database problem, the rest looks like network
connection
stuff, perhaps, though we haven't had any problems there for a while.
Neil Gilmore
grammatech.com <http://grammatech.com>
_______________________________________________
users mailing list
[email protected] <mailto:[email protected]>
https://lists.buildbot.net/mailman/listinfo/users
_______________________________________________
users mailing list
[email protected]
https://lists.buildbot.net/mailman/listinfo/users