Recently, I set up several new database pairs for our backend search team to use. After bringing them online, the search team began backfilling data by writing to the A-side. A bit later, I noticed that replication had started falling behind, maxing out at ~1000 inserts/second.
There isn’t much unusual about these servers. They run Percona Server 5.6.19 to take advantage of the TokuDB storage engine option that Percona now offers. The schema is sharded into many databases, so each database (db1, db2, db3) has the same schema but different data in its tables. With SSD-backed storage, the most likely explanation was a simple one: a single replication thread unable to keep up with the volume of inserts it was receiving from the master. A perfect use case to try multithreaded replication (MTR). The setup was very easy:
- Stop both the IO and SQL threads.
- Enable TABLE-based repositories for the master.info and relay-log.info data. The relevant options are master_info_repository and relay_log_info_repository. Switching these from their default of FILE to TABLE causes MySQL to use two InnoDB tables, mysql.slave_master_info and mysql.slave_relay_log_info, instead of the usual master.info and relay-log.info files. This is needed so that the state of each of the multiple workers (SQL threads) can be tracked alongside the slave_relay_log_info table, with the ACID guarantees that InnoDB provides.
- Change slave_parallel_workers from its default of 0. At the default, the mysql.slave_worker_info table is not used. Setting it to 1 or above will use mysql.slave_worker_info and hand work off to however many workers you specify. Anecdotally, I had heard that 8 was a good setting for performance.
- Start replication again.
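Taken together, the steps above look roughly like this from the MySQL client (a sketch, not a copy of our exact session; in 5.6 the repository variables can only be changed while the replication threads are stopped, and the worker count of 8 is the anecdotal value mentioned above):

```sql
-- Stop both the IO and SQL threads before changing repository settings.
STOP SLAVE;

-- Switch the coordinate repositories from the default FILE to the
-- InnoDB tables mysql.slave_master_info and mysql.slave_relay_log_info.
SET GLOBAL master_info_repository = 'TABLE';
SET GLOBAL relay_log_info_repository = 'TABLE';

-- Enable multithreaded replication with 8 worker threads.
SET GLOBAL slave_parallel_workers = 8;

START SLAVE;
```

You’ll also want the same three settings in my.cnf so they survive a restart.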
This yielded some impressive results:
Each gap in the mysql_slave_status_secs graph represented a different worker-thread setting:
Funny enough, replication catchup was so quick that it was hard to really gauge how well 16 threads did. In the end, the sustained insert rate averaged ~1200/sec, about 200 more than a single SQL thread could handle.
There are some concerns about using MTR with regard to how it handles the replication coordinates. As Stephane Combaudon notes in his Percona blog post, the coordinates written to the relay log table are the oldest ones across the workers. If you were to crash during replication, or use a restored backup, a few events may already have been processed past that position, possibly leading to duplicate key errors. Now that Percona Server supports online GTID (Global Transaction ID) deployments, it should be easier to roll GTIDs out to existing environments. There is also an excellent MariaDB blog entry about how to manage MTR with Percona Xtrabackup.
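To see how far each worker has actually progressed, you can query the worker table directly. This is a sketch against the 5.6 mysql.slave_worker_info table; verify the column names against your build:

```sql
-- Per-worker applied positions. With MTR, SHOW SLAVE STATUS only
-- reports the position of the oldest group, so the workers may each
-- be further ahead than the coordinates it shows.
SELECT Id, Master_log_name, Master_log_pos
FROM mysql.slave_worker_info;
```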
Backups end up not being an issue in our use case. To back up TokuDB, you need to use the TokuDB Enterprise hot-backup tool, shut your database down, or use LVM to ensure a consistent snapshot of your volume. We opted for LVM + mylvmbackup, with a few gotchas.
Deploying Percona Server 5.6 had a few side effects for how we deploy in general, which had been exclusively Percona Server 5.5.
- Our long-running-thread monitor wasn’t set up to ignore the replication threads owned by “system user”. Since the ‘time’ column in information_schema.processlist tracks how long each process has been running, it didn’t take long before the alerts went off.
- As 5.6 adds the mysql.slave_master_info and mysql.slave_relay_log_info InnoDB tables, ibdata1 and ib_logfile0/1 are now present as soon as you install the server. Our Chef recipe that builds new instances of MySQL didn’t expect these additional files deployed by Percona-Server-server-56. Since we also deploy a my.cnf file that changes the defaults for these, I had to modify the initial size of our ibdata1 file in my.cnf to match the one installed by the package. Thankfully, 5.6 detects changes to the ib_logfiles and resizes them accordingly (h/t to Peter Boros and Andrew Moore for pointing this out).
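For the monitor in the first bullet, excluding the replication threads comes down to one extra WHERE clause. A minimal sketch of the kind of check we needed, with a hypothetical 300-second threshold:

```sql
-- Find client threads running longer than 300 seconds, skipping the
-- replication IO/SQL/worker threads, which all run as 'system user'.
SELECT id, user, time, state, info
FROM information_schema.processlist
WHERE user <> 'system user'
  AND command <> 'Sleep'
  AND time > 300;
```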