[Bug 68753] Database upgrade MariaDB 10: Lock wait timeouts / deadlocks in a row

2014-08-06 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=68753

--- Comment #7 from metatron metat...@online.ms ---
See also

https://bugzilla.wikimedia.org/show_bug.cgi?id=69182

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 68753] Database upgrade MariaDB 10: Lock wait timeouts / deadlocks in a row

2014-07-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=68753

--- Comment #2 from Tim Landscheidt t...@tim-landscheidt.de ---
(In reply to Sean Pringle from comment #1)
> [...]
>
> Today we are trialing READ-COMMITTED isolation level which was initially
> REPEATABLE-READ. This has to be watched carefully; while toolserver was RC,
> the labsdb replicas were RR and tools may rely on one or the other.
>
> [...]

Asher wrote in http://permalink.gmane.org/gmane.org.wikimedia.labs/1308:

| > [...]

| > Is this limited to the replicated tables or does it affect
| > the user tables as well?  Does this mean REPEATABLE READ is
| > not available at all on the labsdb servers?

| Essentially.  While tx_isolation is set to REPEATABLE READ, only READ
| COMMITTED is actually guaranteed.  This applies to all labsdb systems.

So Tools shouldn't have been able to rely on RR.



[Bug 68753] Database upgrade MariaDB 10: Lock wait timeouts / deadlocks in a row

2014-07-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=68753

--- Comment #3 from Tim Landscheidt t...@tim-landscheidt.de ---
*tools



[Bug 68753] Database upgrade MariaDB 10: Lock wait timeouts / deadlocks in a row

2014-07-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=68753

--- Comment #4 from metatron metat...@online.ms ---
(In reply to Sean Pringle from comment #1)

Though I didn't change anything, the massive lock wait timeouts or deadlocks
disappeared. Right now I'm trying:
ENGINE=Aria PAGE_CHECKSUM=0 TABLE_CHECKSUM=0 ROW_FORMAT=FIXED
for these special-purpose tables.


Another issue, also related to blocking:
From time to time the statistics collector appears and blocks all current
requests for ~10-20 sec.

1903088  s51187  10.68.16.29:48497   zhwiki_p        Query  37  Opening tables
1903141  s51187  10.68.17.123:45670  ptwiki_p        Query  33  Sending data
1903172  s51187  10.68.16.29:48546   dewiki_p        Query  29  statistics
1903179  s51187  10.68.17.123:45770  dewiki_p        Query  25  statistics
1903183  s51187  10.68.16.29:48562   wikidatawiki_p  Query  24  statistics
1903184  s51187  10.68.16.29:48566   wikidatawiki_p  Query  24  statistics
1903185  s51187  10.68.17.123:45790  ptwiki_p        Query  24  statistics
...



[Bug 68753] Database upgrade MariaDB 10: Lock wait timeouts / deadlocks in a row

2014-07-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=68753

--- Comment #5 from Sean Pringle sprin...@wikimedia.org ---
(In reply to Tim Landscheidt from comment #2)
> (In reply to Sean Pringle from comment #1)
> > [...]
> >
> > Today we are trialing READ-COMMITTED isolation level which was initially
> > REPEATABLE-READ. This has to be watched carefully; while toolserver was RC,
> > the labsdb replicas were RR and tools may rely on one or the other.
> >
> > [...]
>
> Asher wrote in http://permalink.gmane.org/gmane.org.wikimedia.labs/1308:
>
> | > [...]
>
> | > Is this limited to the replicated tables or does it affect
> | > the user tables as well?  Does this mean REPEATABLE READ is
> | > not available at all on the labsdb servers?
>
> | Essentially.  While tx_isolation is set to REPEATABLE READ, only READ
> | COMMITTED is actually guaranteed.  This applies to all labsdb systems.
>
> So Tools shouldn't have been able to rely on RR.

More complicated than that, unfortunately. Using innodb_locks_unsafe_for_binlog
means that only READ COMMITTED is guaranteed for INSERT/UPDATE/DELETE queries;
SELECT still respects the REPEATABLE READ isolation level. So this would only
be a potential behavior change for certain multi-statement transactions.
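
A tool that depends on one particular level inside a multi-statement
transaction could set it explicitly for its own session instead of relying on
the server default. A minimal sketch of building that statement follows. The
helper name `isolation_statement` is hypothetical, and the sketch does not
check whether a given labsdb server actually honors the requested level.

```python
# Isolation levels accepted by SET [SESSION] TRANSACTION ISOLATION LEVEL
# in MySQL/MariaDB.
ISOLATION_LEVELS = (
    "READ UNCOMMITTED",
    "READ COMMITTED",
    "REPEATABLE READ",
    "SERIALIZABLE",
)


def isolation_statement(level):
    """Return the SQL that sets the session isolation level.

    Accepts spellings such as "read-committed" or "Repeatable Read" and
    rejects anything that is not a known level, so the string is never
    built from unchecked input.
    """
    normalized = " ".join(level.replace("-", " ").upper().split())
    if normalized not in ISOLATION_LEVELS:
        raise ValueError("unknown isolation level: %r" % (level,))
    return "SET SESSION TRANSACTION ISOLATION LEVEL " + normalized
```

The returned string would be executed once, right after connecting and before
the first transaction starts.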



[Bug 68753] Database upgrade MariaDB 10: Lock wait timeouts / deadlocks in a row

2014-07-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=68753

--- Comment #6 from Sean Pringle sprin...@wikimedia.org ---
(In reply to metatron from comment #4)

> Though I didn't change anything, the massive lock wait timeouts or deadlocks
> disappeared. Right now I'm trying:
> ENGINE=Aria PAGE_CHECKSUM=0 TABLE_CHECKSUM=0 ROW_FORMAT=FIXED  for these
> special-purpose tables.

Ok, good. Let's watch it for a day or two before declaring the locking problem
solved.



[Bug 68753] Database upgrade MariaDB 10: Lock wait timeouts / deadlocks in a row

2014-07-28 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=68753

--- Comment #1 from Sean Pringle sprin...@wikimedia.org ---
(In reply to metatron from comment #0)
> - Is this behaviour transient? If not, please make it go away.

Hopefully transient, yes.

We are using the TokuDB storage engine on the new instance in order to handle
the write load from replicating multiple shards. TokuDB has a shorter lock-wait
setting by default, just 4s. Immediately after upgrade that default was in use
for around 24h, but has since been increased to 50s to match InnoDB config.
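
Tools that hit a lock wait timeout or a deadlock can often simply retry the
affected unit of work. The following is a minimal sketch of such a retry loop,
not a description of how any existing tool handles it. `DbError` is a
hypothetical stand-in for a driver exception that carries the server error
number, and `run_with_retry` is likewise an illustrative name. The two error
numbers are the MySQL/MariaDB codes for "lock wait timeout exceeded" (1205)
and "deadlock found" (1213).

```python
import time

# MySQL/MariaDB error numbers: 1205 = lock wait timeout, 1213 = deadlock.
RETRYABLE_ERRNOS = {1205, 1213}


class DbError(Exception):
    """Hypothetical stand-in for a driver exception with an error number."""

    def __init__(self, errno, msg=""):
        super().__init__(errno, msg)
        self.errno = errno


def run_with_retry(txn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call txn() and retry it when it fails with a lock-related error.

    txn is expected to run one complete unit of work, including its own
    rollback on failure. The delay doubles after every failed attempt.
    Other errors, and the last failed attempt, are re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return txn()
        except DbError as exc:
            if exc.errno not in RETRYABLE_ERRNOS or attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter exists only so the back-off can be replaced in tests.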

Today we are trialing READ-COMMITTED isolation level which was initially
REPEATABLE-READ. This has to be watched carefully; while toolserver was RC, the
labsdb replicas were RR and tools may rely on one or the other.

> - Is new user-schema-storage slower than the older one? And is this related?

The new instance is on a RAID 10 array during migration. It will be migrated
back onto SSD this week after the old instances are removed. Yes, it may be
slower until then.

Also worth noting that TokuDB is likely to be a little slower than InnoDB in
cases when the dataset fits entirely in memory, however it should be faster
when the dataset exceeds memory and/or data is cold. The difference should
always be well within the same order of magnitude. Anything running drastically
slower is a bug.

> - Is there some process that interferes with user schemas?

Do you mean user schemas or user traffic? If queries are running for many
hours, or doing table scans on large replicated tables that block replication,
they may be killed.

Some tools use very slow processes that only *seemed* ok on the old instances,
but in fact caused issues behind the scenes up to and including instance OOM
crashes. We need to reach out to certain labs users to help redesign some
queries to be a little friendlier.

In most cases it will be a matter of breaking up large, slow transactions into
batches.
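
As an illustration of that batching idea, here is a minimal sketch that
deletes old rows in small transactions rather than in one large one. It uses
Python's built-in sqlite3 module only so that it runs anywhere; the `events`
table, its columns and the function names are made up for the example. On
MariaDB the same loop would more naturally use `DELETE ... WHERE ... LIMIT n`
instead of the subquery.

```python
import sqlite3


def delete_in_batches(conn, cutoff, batch_size=1000):
    """Delete rows with ts < cutoff, batch_size rows per transaction.

    Returns the total number of rows removed.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM events WHERE id IN "
            "(SELECT id FROM events WHERE ts < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        # Commit after every batch so locks are held only briefly.
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


def demo():
    """Build a small in-memory table and purge part of it in batches."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, ts INTEGER)")
    conn.executemany(
        "INSERT INTO events (ts) VALUES (?)", [(i,) for i in range(2500)]
    )
    conn.commit()
    removed = delete_in_batches(conn, cutoff=2000, batch_size=500)
    remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    conn.close()
    return removed, remaining
```

Smaller batches mean more round trips but shorter lock hold times. A suitable
batch size depends on the table and the workload.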
