In this article I'd like to share my findings about a deadlock caused by the
FTWRL statement executed by xtrabackup on a running MySQL slave node, and my
thoughts on how MySQL server locking and xtrabackup work.
I. Background
The 'flush tables with read lock' (FTWRL) statement makes a MySQL node
completely read-only. After FTWRL is executed and until 'unlock tables' is
executed, only select statements and other read-only statements (e.g. show)
can run. It is used by mysqldump and older versions of percona-xtrabackup to
prevent any data files on the node from being modified while the files in the
MySQL data directory are being copied.
To make a MySQL instance read-only, FTWRL acquires two locks one after another
(doing some extra work in between): a global read lock (GRL) and then a
COMMIT lock (CL).
The GRL blocks insert/delete/update statement execution in the entire mysqld
process, because each insert/delete/update statement acquires a GLOBAL
intention write lock at its very start and releases it at the end of the
statement; this GLOBAL intention lock conflicts with the GRL taken by FTWRL.
The CL blocks commits of transactions that are executing a 'commit' statement
or are in the commit phase of an autocommit statement: each transaction commit
acquires an intention COMMIT lock. These intention locks are compatible with
each other, but they conflict with the COMMIT lock mode acquired by FTWRL.
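The two-level locking behaviour can be sketched as a small compatibility
table. This is a simplified illustration, not MySQL's actual metadata-lock
(MDL) implementation; the namespace and mode names below are mine:

```python
# Simplified model of the two FTWRL lock levels and what each blocks.
# Names are illustrative, not MySQL's real MDL code.

# Which (held, requested) mode pairs conflict, per lock namespace.
CONFLICTS = {
    # GLOBAL namespace: FTWRL's global read lock (GRL) vs the
    # intention write lock every insert/delete/update takes.
    "GLOBAL": {("READ", "INTENTION_WRITE"), ("INTENTION_WRITE", "READ")},
    # COMMIT namespace: FTWRL's COMMIT lock vs the intention COMMIT
    # lock every committing transaction takes.
    "COMMIT": {("BLOCK_COMMIT", "INTENTION_COMMIT"),
               ("INTENTION_COMMIT", "BLOCK_COMMIT")},
}

def conflicts(namespace, held, requested):
    """Return True if a held lock mode blocks a requested mode."""
    return (held, requested) in CONFLICTS[namespace]

# DML is blocked once FTWRL holds the GRL:
assert conflicts("GLOBAL", "READ", "INTENTION_WRITE")
# Two concurrent commits do not block each other:
assert not conflicts("COMMIT", "INTENTION_COMMIT", "INTENTION_COMMIT")
# But FTWRL's COMMIT lock blocks every commit:
assert conflicts("COMMIT", "BLOCK_COMMIT", "INTENTION_COMMIT")
```

Note the asymmetry the model captures: ordinary statements only conflict with
FTWRL, never with each other, which is why FTWRL alone can freeze the node.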
II. The Unresolvable Deadlock
I found this deadlock when our DBA reported an issue to me: xtrabackup caused
a slave node to block forever. To illustrate the deadlock, I'll first outline
the steps xtrabackup performs:
a. Start copying InnoDB data files (*.ibd) and InnoDB redo log files. If other
transactional storage engines are in use, do the same for them.
b. Execute FTWRL.
c. Copy FRM files and all data files of non-transactional storage engines.
d. Run 'show master status' to record the binlog position; this is the
position from which one must start applying binlogs when restoring a MySQL
instance from this backup.
e. Finish copying InnoDB redo log files and any other storage engines' redo
logs. The redo logs must be copied all the way up to the tip (newest end) of
the log.
f. Execute 'unlock tables'.
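The ordering constraints among the steps above are what make the backup
consistent: the binlog position must be read inside the locked window, and the
redo-log copy must outlast every data-file copy. A toy sketch of those
invariants (the step names are mine, not xtrabackup internals):

```python
# Toy model of the xtrabackup step ordering described above.
# Step names are illustrative; they are not xtrabackup internals.
STEPS = [
    "start_copy_innodb_data_and_redo",   # a
    "ftwrl",                             # b
    "copy_frm_and_nontrans_files",       # c
    "record_binlog_position",            # d
    "finish_copy_redo_to_tip",           # e
    "unlock_tables",                     # f
]

def index(name):
    return STEPS.index(name)

# The binlog position is recorded strictly inside the FTWRL window:
assert index("ftwrl") < index("record_binlog_position") < index("unlock_tables")
# Redo copying starts before any non-transactional file copy and finishes
# after it, so every change made during the copy lands in the redo stream:
assert index("start_copy_innodb_data_and_redo") < index("copy_frm_and_nontrans_files")
assert index("copy_frm_and_nontrans_files") < index("finish_copy_redo_to_tip")
```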
The problem is the following. On a slave node, during execution of an FTWRL
statement (e.g. by xtrabackup), after the GRL is acquired but before the CL is
acquired, the intention COMMIT lock can already be held by a slave worker
thread S1 committing a transaction T1. Meanwhile another slave worker thread
S0 may be executing a transaction T0 that precedes T1, and S0 can be blocked
by FTWRL's GRL. At T1's commit, S1 has to wait for T0's commit (because
slave_preserve_commit_order is true), which is blocked by FTWRL's GRL, while
the FTWRL statement, holding the GRL, is blocked by T1's intention COMMIT
lock. This forms a cyclic wait graph, hence a deadlock.
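The cyclic wait can be checked mechanically: build the wait-for graph just
described and search it for a cycle. A generic sketch (the node names come
from the example above, not from MySQL internals):

```python
# Wait-for graph of the slave-side FTWRL deadlock described above.
# Edge X -> Y means "X waits for Y".
waits_for = {
    "FTWRL": ["T1"],     # FTWRL's COMMIT-lock request waits for T1's commit
    "T1":    ["T0"],     # slave_preserve_commit_order: T1 waits for T0
    "T0":    ["FTWRL"],  # T0's DML waits for FTWRL's global read lock
}

def find_cycle(graph):
    """Return one cycle as a list of nodes, or None if the graph is acyclic."""
    def dfs(node, path, on_path):
        for nxt in graph.get(node, []):
            if nxt in on_path:
                return path[path.index(nxt):] + [nxt]
            found = dfs(nxt, path + [nxt], on_path | {nxt})
            if found:
                return found
        return None
    for start in graph:
        cycle = dfs(start, [start], {start})
        if cycle:
            return cycle
    return None

cycle = find_cycle(waits_for)
assert cycle is not None            # the three-way wait is a genuine deadlock
assert set(cycle) >= {"FTWRL", "T0", "T1"}
```

The server's deadlock detector never sees this cycle because one edge (T1
waiting for T0's commit order) lives in the replication layer, not in the
lock manager, which is why the wait lasts forever.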
III. Why FTWRL works the way it does
To fix or work around this issue, one might think that if FTWRL didn't acquire
the GRL, the issue wouldn't happen. Indeed, for transactional storage engines
in MySQL, as long as the COMMIT lock is held no transaction can commit, so the
global data view stays consistent. The only minor shortcoming is that a lot of
redo log would be produced and written to the storage engine's (e.g. InnoDB's)
redo log files during the backup and copied by xtrabackup, wasting storage
space and CPU cycles.
The bigger issues are with non-transactional storage engines. MySQL started
out using MyISAM, which isn't transactional, and there are still several
non-transactional storage engines in MySQL and MariaDB today (MyISAM, CSV,
Aria, etc.). This is why xtrabackup has to perform an FTWRL before the backup:
it must flush all modified table files to disk and prevent insert/delete/update
statements from further modifying any non-transactional table files during the
backup. Because MyISAM and the like have no transactional redo logging or
recovery, any partial/dirty changes made to table files during the backup could
be flushed to disk and end up in the backup, producing a backup that not only
contains dirty changes but may even contain corrupted data structures.
Also, DDL against tables of any storage engine has to be blocked during the
backup, because before MySQL 8.0 table metadata in the MySQL server layer is
stored in FRM files rather than in transactional tables. DDL statements modify
FRM files, so they cannot be allowed to run during a backup, lest dirty and/or
incomplete changes to the modified FRM files be copied.
For DBMSes that are fully transactional, e.g. PostgreSQL, making a backup is as
easy as copying all data directories and redo log files while the instance is
running. No long-term global locks like those taken by xtrabackup are acquired
at all; transactions execute and commit just as they would without the running
hot backup, and the redo logs keep growing. As long as the backup process keeps
copying the incrementally appended redo logs until all data files have been
copied, the backup is in a consistent state. At restoration, such a DBMS simply
runs recovery to restore the changes to tables whose pages were not flushed to
disk during the backup, after which the restored instance is in a consistent
state.
Then what about the COMMIT lock acquired by FTWRL and contended at the commit
phase of each transaction? Can this COMMIT lock be removed for transactions
that modify only tables in transactional storage engines, so that MySQL
instances using only transactional storage engines can be backed up as simply
as other transactional DBMSes? After all, without the COMMIT lock, transactions
wouldn't have to block waiting for the long-running xtrabackup process holding
the FTWRL locks (i.e. steps b to f above), so parallelism could improve a lot!
Unfortunately for MySQL the answer is no, because of the binlog. When we
restore a db instance from a backup, we often need to apply binlogs (backed up
separately) to bring the instance to a target point in time or to a version
identified by a GTID. The binlog replay must start from a position that is
consistent with the storage engine data version. That is why xtrabackup
records this position while holding the FTWRL GRL and COMMIT locks: only
during this window can one reliably get a binlog position at which the binlog
and the storage engines match, i.e. have the same set of committed
transactions. Otherwise, when one replays binlogs later, one can hit errors
caused by binlog events applied to data that already contains those changes,
or applied to data that doesn't even exist in the storage engine, and the
restored instance would be unable to catch up and become a usable new db
instance.
In short, there is no way this deadlock can be fixed or avoided as long as
xtrabackup works in the old mode.
IV. Percona backup locks
Percona Server introduced backup locks in recent xtrabackup and percona-mysql
versions to make the backup process less costly to server performance, and
this enhancement also partly fixes the above deadlock issue for transactional
storage engines such as InnoDB.
Backup locks are acquired by xtrabackup via the 'lock tables for backup' and
'lock binlog for backup' statements, in place of the GRL and CL acquired by
the FTWRL statement. The table backup lock serves a similar purpose to the GRL
of FTWRL but is much lighter weight: insert/delete/update statements on
transactional storage engines (e.g. InnoDB) don't acquire the table backup
lock (so they are never blocked by xtrabackup), and thus the deadlock above is
avoided in this case. And without blocking on the lock, parallelism and
performance improve.
Also, open transactional tables are not closed as they are by FTWRL, and no
transactional tables have to be flushed during the backup, which saves quite a
lot of IO and CPU cycles.
The table backup lock is still acquired by insert/delete/update statements
against tables in non-transactional storage engines like MyISAM, CSV, etc.,
and such tables must be flushed to disk by 'lock tables for backup' just as in
FTWRL. The backup lock is also acquired by DDL statements against tables of
any storage engine, for the same reason the corresponding action is performed
in FTWRL: table metadata is stored in FRM files rather than in transactional
data dictionary tables.
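Whether a statement is blocked under backup locks thus depends on what it
touches. A rough decision table, my simplification of the rules described
above:

```python
# Simplified decision rules for Percona backup locks, as described above:
# which operations must take the table backup lock (and so can be blocked
# by xtrabackup's 'lock tables for backup')?
def needs_table_backup_lock(stmt_kind, engine_transactional):
    if stmt_kind == "ddl":
        return True                      # DDL rewrites FRM files: always blocked
    if stmt_kind == "dml":
        return not engine_transactional  # only MyISAM/CSV/... DML is blocked
    return False                         # plain reads are never blocked

# InnoDB DML proceeds during the backup:
assert not needs_table_backup_lock("dml", engine_transactional=True)
# MyISAM DML and any DDL are blocked:
assert needs_table_backup_lock("dml", engine_transactional=False)
assert needs_table_backup_lock("ddl", engine_transactional=True)
```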
The binlog backup lock serves the same purpose as the COMMIT lock acquired by
FTWRL: keeping the binlog and the backed-up storage engine data/redo logs
consistent. But there are two differences between FTWRL's COMMIT lock and the
binlog backup lock acquired by 'lock binlog for backup'.
The big difference is that if you don't do binary logging, your transactions
don't acquire the binlog backup lock. This means that if a transaction
modifies only transactional tables and doesn't do binary logging (which is
rare today, because almost everyone needs a slave node for failover), it is
never blocked by xtrabackup. This way, a backup has much less impact on the
normal operation of mysqld, just as in other transactional DBMSes (e.g.
PostgreSQL). And the copying of the ever-growing InnoDB redo logs can stop
once all InnoDB data files have been copied (the same applies to any other
transactional storage engines).
The second difference is that rather than being blocked at the very beginning
of transaction commit, transactional storage engines can now finish the first
phase of the internal 2PC commit, leaving many internal transactions in the
'prepared' state in InnoDB and other transactional storage engines; this means
more progress for the running instance. On the other hand, without the
corresponding binlogs, such prepared transactions will all be aborted at
server startup during restoration, so the backup ends up copying a lot of redo
log in vain.
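The recovery-time effect of this second difference can be sketched as
follows: at startup, a prepared engine transaction survives only if its
commit made it into the binlog. A schematic model of that binlog-based XA
recovery decision, not server code:

```python
# Schematic model of crash/restore recovery with MySQL's internal 2PC:
# a prepared engine transaction is committed only if it reached the binlog,
# otherwise it is rolled back at startup.
def recover(prepared_txns, binlogged_txns):
    committed = [t for t in prepared_txns if t in binlogged_txns]
    rolled_back = [t for t in prepared_txns if t not in binlogged_txns]
    return committed, rolled_back

# Under 'lock binlog for backup', engines kept preparing but nothing new
# reached the binlog, so those transactions are aborted on restore:
committed, rolled_back = recover(
    prepared_txns=["t1", "t2", "t3"],
    binlogged_txns=["t1"],
)
assert committed == ["t1"]
assert rolled_back == ["t2", "t3"]
```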
As for the deadlock issue above: for transactions that only touch
transactional storage engines and do no DDL, the deadlock is fixed; for those
touching non-transactional storage engines, and for all DDL statements, the
deadlock still exists.
The only workaround for the deadlock is to turn off
slave_preserve_commit_order. But if one uses slave nodes for read scale-out,
turning it off would often leave the slave in a state or view that the master
was never in, because of the committed-transaction gaps formed by out-of-order
commits, a natural result of parallel slave replication.
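The "state the master was never in" problem can be made concrete: if the
slave commits T2 before T1, a reader momentarily sees a committed set that no
reader on the master could ever have seen. A toy illustration:

```python
# Toy illustration of commit gaps when slave_preserve_commit_order is off.
# The master committed in order T1, T2, T3; prefixes of that order are the
# only states the master ever exposed to readers.
master_order = ["T1", "T2", "T3"]
master_states = [set(master_order[:i]) for i in range(len(master_order) + 1)]

# A parallel slave without commit ordering may commit T2 before T1:
slave_commits = ["T2", "T1", "T3"]
seen = set()
states_never_on_master = []
for t in slave_commits:
    seen.add(t)
    if set(seen) not in master_states:
        states_never_on_master.append(set(seen))

# After committing only T2, the slave shows {T2}: a gap state the
# master never exposed to any reader.
assert states_never_on_master == [{"T2"}]
```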
I'll share my findings on a few other questions I have about xtrabackup later,
plus a possible future enhancement for xtrabackup.