One of the most commonly misunderstood areas of MySQL configuration is parallel replication threads, and, more importantly, how to monitor a replica to ensure it is configured effectively.
Background on parallel replication
Traditionally, MySQL replication was single threaded: changes applied on a writer database are fetched by a replica and applied serially by a single applier thread, in the same order they were executed on the writer. Over time, optimizations were introduced to allow replicas to apply non-dependent transactions in parallel.
To decide whether a change is dependent, MySQL first introduced database-level dependency tracking, where changes applied to different databases could be applied in parallel on a replica.
Then MySQL introduced the "logical clock". When an event is written to the binary log on a MySQL database, whether a replica is connected or not, each transaction/event is also associated with a logical timestamp. The logical timestamp is calculated according to the type of dependency tracking configured on the source, and consists of two values: `last_committed` and `sequence_number`.
- `sequence_number` - each transaction in a binlog file, starting with the first, is assigned a sequence number, starting from 1 and monotonically increasing. In other words, the Nth transaction in a file will have `sequence_number == N`.
- `last_committed` - this field of the timestamp is the sequence number of the most recent transaction which the current transaction depends on. In other words, this is the last transaction which the current transaction cannot be applied in parallel with.
Before MySQL 5.7 there was one mechanism for determining dependencies based on the logical timestamp: COMMIT_ORDER. The core concept here is that transactions can safely run in parallel if they held their locks simultaneously at commit time on the source. This is implemented by having each transaction track its dependencies through a commit_parent variable, which updates as locks are acquired. After each non-COMMIT statement, this variable is updated to match the latest transaction in the binlog, and at commit time this value becomes the last_committed number. The mechanism works because once a transaction has executed its last pre-COMMIT statement, it holds all the locks it needs. Any transactions that commit during this window can safely run in parallel on a replica, since they would have held their locks concurrently. This approach ensures data consistency while allowing parallel execution, though the actual parallelism gains may be modest depending on workload. There are some caveats: for example, while a DDL is executing, no other replica worker threads can be active, which effectively blocks all other threads from scheduling work, because all transactions are treated as dependent on the executing DDL.
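To make the scheduling rule concrete, here is a minimal Python sketch (my own illustration, not the actual server code) of how a replica coordinator can use these two timestamp values: a transaction may start applying once every transaction with `sequence_number <= last_committed` has finished.

```python
# Illustrative sketch of the logical-clock scheduling rule on a replica
# (not MySQL's actual implementation).

def can_schedule(last_committed: int, lowest_unfinished_seq: int) -> bool:
    """lowest_unfinished_seq is the smallest sequence_number still being
    applied by any worker (use a large sentinel if all workers are idle).
    The transaction can start only if everything it depends on is done."""
    return last_committed < lowest_unfinished_seq

# A trx with last_committed = 5 can run while trx 6 and 7 are in flight:
assert can_schedule(5, 6)
# A trx with last_committed = 6 must wait until trx 6 finishes:
assert not can_schedule(6, 6)
```

When no transaction can be scheduled this way, the coordinator waits, and that wait is exactly what the "waited at clock conflicts" counter discussed later in this post accumulates.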
In this example, we have parsed the binary log using the mysqlbinlog tool.

| Statement | last_committed | sequence_number | Note |
|---|---|---|---|
| create database test_db | 0 | 1 | This is the first statement in a new binary log file, so last_committed is zero, and the first statement is assigned a sequence number of 1 |
| create table test_db.t1(id int) | 1 | 2 | This is the second transaction/event in this binary log file, so it is assigned a sequence number of 2. last_committed is set to 1, the sequence number of the previous transaction on which this one depends. Logically, it would not make sense to execute create table (sequence number 2) in a database which does not exist (sequence number 1) |
This implementation (COMMIT_ORDER) was heavily dependent on the concurrency/throughput of the source system. The timestamps generated depend heavily on the commit rate, and low-concurrency sources commonly produced dependency information that was not granular enough. As a result, MySQL introduced a new mechanism for determining dependencies called WRITESET dependency tracking, which can be enabled via `binlog_transaction_dependency_tracking` (WL#9556). Rather than relying on timing-dependent COMMIT_ORDER tracking, WRITESET tracks actual data dependencies between transactions. When WRITESET is used, MySQL creates transaction "writesets", which are collections of hashes. Each hash represents a modified database row on the MySQL source database.
The algorithm tracks:
- sequence_number (unique transaction ID)
- last_committed (most recent dependent transaction)
- Overlapping writesets (indicating dependencies)
Using this information, MySQL is able to determine dependencies based on writesets, rather than just locks held at commit time. Two transactions are deemed dependent if their writesets overlap. In the binary log from the source, last_committed now points to the most recent transaction with shared data. Using writesets, independent transactions can execute in parallel even if:
- They were sequential on the source
- Their group commit/commit windows didn't overlap
- They were in different sessions
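To make the overlap rule concrete, here is a minimal Python sketch of writeset bookkeeping (my own illustration; the server's actual hashing and history management differ): each transaction hashes the rows it modified, and last_committed becomes the newest earlier transaction that shares a row hash.

```python
import hashlib

def row_hash(schema: str, table: str, pk: str) -> int:
    """Hash identifying one modified row (illustrative, not MySQL's hash)."""
    digest = hashlib.md5(f"{schema}.{table}#{pk}".encode()).hexdigest()
    return int(digest[:16], 16)

history = {}  # row hash -> sequence_number of the last transaction to write it

def assign_last_committed(seq, writeset):
    """last_committed = newest earlier trx whose writeset overlaps ours."""
    last_committed = max((history.get(h, 0) for h in writeset), default=0)
    for h in writeset:
        history[h] = seq
    return last_committed

t1 = {row_hash("shop", "orders", "id=1")}
t2 = {row_hash("shop", "orders", "id=2")}  # disjoint rows
t3 = {row_hash("shop", "orders", "id=1")}  # rewrites t1's row

assert assign_last_committed(1, t1) == 0  # no dependencies
assert assign_last_committed(2, t2) == 0  # can run in parallel with trx 1
assert assign_last_committed(3, t3) == 1  # must wait for trx 1
```

Note how transactions 1 and 2 get last_committed = 0 even though they were sequential on the source: under COMMIT_ORDER they would only have been marked independent if their commit windows overlapped.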
Since I didn't want to write "yet another" blog on dependency tracking I may have skipped over some details above but if you want to read more into this I would highly recommend reading Daniel Nichter's blog series here:
- Group commit and trx dependency tracking
- Replica preserve commit order (RPCO)
- Monitoring MTR lag with the Performance Schema
New Performance Schema Replication Tables
Replication: This release adds the MySQL Replication Applier Metrics component, which provides users with statistical information about replication formerly logged in the error log. The component adds two tables containing this information to the MySQL Performance Schema:

- `replication_applier_metrics`: provides replication applier metrics for a given replication channel.
- `replication_applier_progress_by_worker`: provides similar metrics for a specific worker.

This enhances observability of replication by gathering statistics from the entire replication pipeline and unifying their presentation. As part of this work, some metrics which were not especially helpful have been replaced with more useful ones.

For more information about this component, see Replication Applier Metrics Component. (WL#15620)

References: See also: Bug #32587480.
Multi-threaded replica statistics
$ grep "Multi-threaded replica statistics" node1/data/msandbox.err | head -n 4
2025-01-27T18:07:46.516639Z 6 [Note] [MY-010559] [Repl] Multi-threaded replica statistics for channel '': seconds elapsed = 250; events assigned = 13313; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 7565719600 waited (count) when Workers occupied = 44 waited when Workers occupied = 1489733600
2025-01-27T18:09:46.150048Z 6 [Note] [MY-010559] [Repl] Multi-threaded replica statistics for channel '': seconds elapsed = 120; events assigned = 466945; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 55769034800 waited (count) when Workers occupied = 15175 waited when Workers occupied = 69850249500
2025-01-27T18:12:04.031062Z 16 [Note] [MY-010559] [Repl] Multi-threaded replica statistics for channel '': seconds elapsed = 120; events assigned = 976897; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 103952309000 waited (count) when Workers occupied = 0 waited when Workers occupied = 0
2025-01-27T18:15:09.087390Z 38 [Note] [MY-010559] [Repl] Multi-threaded replica statistics for channel '': seconds elapsed = 120; events assigned = 1163265; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 111850198700 waited (count) when Workers occupied = 0 waited when Workers occupied = 0
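These lines are grep-able, but the counters can also be pulled apart programmatically. Here is a small Python sketch (field labels copied from the log format above; it mirrors what the SQL view in the appendix does with SUBSTRING_INDEX):

```python
import re

# One MY-010559 line, taken from the error log excerpt above.
LINE = ("Multi-threaded replica statistics for channel '': "
        "seconds elapsed = 120; events assigned = 466945; "
        "worker queues filled over overrun level = 0; "
        "waited due a Worker queue full = 0; waited due the total size = 0; "
        "waited at clock conflicts = 55769034800 "
        "waited (count) when Workers occupied = 15175 "
        "waited when Workers occupied = 69850249500")

PATTERN = re.compile(
    r"channel '(?P<channel>[^']*)': "
    r"seconds elapsed = (?P<seconds_elapsed>\d+); "
    r"events assigned = (?P<events_assigned>\d+); "
    r"worker queues filled over overrun level = (?P<queues_filled>\d+); "
    r"waited due a Worker queue full = (?P<queue_full>\d+); "
    r"waited due the total size = (?P<total_size>\d+); "
    r"waited at clock conflicts = (?P<clock_conflicts>\d+) "
    r"waited \(count\) when Workers occupied = (?P<occupied_count>\d+) "
    r"waited when Workers occupied = (?P<occupied_time>\d+)")

stats = {k: int(v) if v.isdigit() else v
         for k, v in PATTERN.search(LINE).groupdict().items()}
assert stats["events_assigned"] == 466945
assert stats["occupied_count"] == 15175
```

A parser like this is handy for shipping these counters to an external monitoring system, which I come back to in the final thoughts.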
You can find all the definitions in the MySQL docs, but at first glance we can see this prints information on worker thread utilization (waited when Workers occupied) and clock conflicts (waited at clock conflicts)! So what's the catch?
- This only gets printed when `log_error_verbosity=3`. Since MySQL defaults to 2, a lot of users have probably never seen it. This is a dynamic parameter which can be changed online.
- When introduced, there were no views exposing this information at the database level, requiring grepping and wrangling of log files. This, coupled with the `log_error_verbosity` default, probably led to this log message being neglected by MySQL operators over the years. Luckily, MySQL 8.0 changes this with the introduction of the `performance_schema.error_log` table. More on this below.
- Messages are not printed in real time. As per the docs, "The statistics are printed depending on the volume of events that the coordinator thread has assigned to applier worker threads, with a maximum frequency of once every 120 seconds". So we only get output, at most, every two minutes. This shouldn't be an issue on most replication setups, though, as we should be generating enough replication events in every two-minute window.
New replication views
As described above, from MySQL 8.0 we can use the `performance_schema.error_log` table to query the error log directly from the database. To demonstrate this I will load some sysbench sample data and run some tests. After a couple of minutes with `log_error_verbosity=3` on the replica, you should see some output.
slave1 [*********:23244] {msandbox} (performance_schema) > SELECT * FROM
performance_schema.error_log
WHERE
SUBSYSTEM = 'Repl'
AND ERROR_CODE = 'MY-010559'
AND DATA LIKE '%Multi-threaded % statistics for channel%'\G
*************************** 1. row ***************************
LOGGED: 2025-04-25 21:28:05.406046
THREAD_ID: 6
PRIO: Note
ERROR_CODE: MY-010559
SUBSYSTEM: Repl
DATA: Multi-threaded replica statistics for channel '': seconds elapsed = 132; events assigned = 5121; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 883466700 waited (count) when Workers occupied = 8 waited when Workers occupied = 86657300
[...]
*************************** 30. row ***************************
LOGGED: 2025-04-25 22:37:53.021052
THREAD_ID: 16
PRIO: Note
ERROR_CODE: MY-010559
SUBSYSTEM: Repl
DATA: Multi-threaded replica statistics for channel '': seconds elapsed = 120; events assigned = 58414081; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 316849996600 waited (count) when Workers occupied = 55876 waited when Workers occupied = 31639630000
30 rows in set (0.00 sec)
Now that we have a way of getting the log messages, we can parse the DATA column to present it in a neater way. To do this I've created a view called `binlog_replication_coordinator_stats`, which you can see in the appendix below.
Column | Description |
---|---|
timestamp | When these statistics were logged |
channel | Replication channel name ('default' if unnamed channel) |
seconds_elapsed | Time period since last stats output in error log |
events_assigned | Number of events the coordinator assigned to worker threads |
avg_events_assigned | Average events assigned per second (events_assigned/seconds_elapsed) |
queues_filled | Number of times worker queues exceeded the overrun level in this period. High numbers indicate workers can't keep up with coordinator |
waited_worker_queue_full | Times coordinator waited because a worker's queue was full |
waited_pending_total_size | Times coordinator waited due to total pending jobs size limit |
clock_conflict_waits | Times the coordinator waited due to commit order conflicts. High numbers can indicate: 1. Transactions touching the same databases (when replica_parallel_type=DATABASE) 2. Dependency conflicts between transactions (when replica_parallel_type=LOGICAL_CLOCK) 3. Commit order preservation overhead with replica_preserve_commit_order=ON: - Workers must commit in the same order as the source binlog - An earlier transaction in a worker queue blocks later ones - Critical for preventing data inconsistency. Can be influenced by: - the binlog_transaction_dependency_tracking setting on the source (COMMIT_ORDER, WRITESET, or WRITESET_SESSION) - the replica_parallel_type setting on the replica (DATABASE or LOGICAL_CLOCK) - transaction writeset contents - transaction size and complexity |
workers_occupied_count | Number of times coordinator found all workers busy. Indicates potential need for more worker threads |
workers_occupied_time | Total microseconds waited for busy workers. Long waits suggest workers are the bottleneck |
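To summarize the table above, here is a rough triage heuristic of my own (not an official interpretation) for reading one row of these stats, based on the two dominant wait counters:

```python
def suggest_next_step(clock_conflict_waits: int, workers_occupied_count: int) -> str:
    """My own rough heuristic for picking the first tuning lever to try."""
    if clock_conflict_waits == 0 and workers_occupied_count == 0:
        return "no applier-side bottleneck visible in this window"
    if clock_conflict_waits == 0:
        return "workers saturated: try raising replica_parallel_workers"
    if workers_occupied_count == 0:
        return "clock conflicts dominate: revisit dependency tracking on the source"
    return "both: add workers first, then revisit dependency tracking"

# A row with huge clock waits and busy workers points at both levers:
assert suggest_next_step(100624635100, 8256).startswith("both")
# A row with clock waits but workers never all busy points at the source:
assert "dependency tracking" in suggest_next_step(90936245100, 0)
```

This is exactly the decision process the troubleshooting flow below walks through by hand.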
Example troubleshooting flow:
slave1 [*********:23244] {msandbox} (repl_mon) > pager grep "Seconds_Behind_Master"
PAGER set to 'grep "Seconds_Behind_Master"'
slave1 [*********:23244] {msandbox} (repl_mon) > show slave status\G
Seconds_Behind_Master: 51
1 row in set, 1 warning (0.00 sec)
slave1 [*********:23244] {msandbox} (repl_mon) > select * from binlog_replication_coordinator_stats;
+----------------------------+---------+-----------------+-----------------+---------------------+---------------+--------------------------+---------------------------+----------------------+------------------------+-----------------------+
| timestamp | channel | seconds_elapsed | events_assigned | avg_events_assigned | queues_filled | waited_worker_queue_full | waited_pending_total_size | clock_conflict_waits | workers_occupied_count | workers_occupied_time |
+----------------------------+---------+-----------------+-----------------+---------------------+---------------+--------------------------+---------------------------+----------------------+------------------------+-----------------------+
| 2025-04-28 21:25:44.328627 | default | 345 | 1025 | 3 | 0 | 0 | 0 | 24208500 | 4 | 8697300 |
| 2025-04-28 21:27:44.012512 | default | 120 | 5208064 | 43401 | 0 | 0 | 0 | 100624635100 | 8256 | 4027435500 |
| 2025-04-28 21:35:57.464912 | default | 493 | 4176896 | 8472 | 0 | 0 | 0 | 69082991700 | 5879 | 2292737600 |
| 2025-04-28 21:37:57.020855 | default | 120 | 5331968 | 44433 | 0 | 0 | 0 | 100571333200 | 7642 | 3408264300 |
+----------------------------+---------+-----------------+-----------------+---------------------+---------------+--------------------------+---------------------------+----------------------+------------------------+-----------------------+
4 rows in set (0.00 sec)
From the above, we can see some waits on Workers occupied and also clock conflicts. In show slave status we can see `Seconds_Behind_Master` is 51 seconds. This is an indicator that I may have underprovisioned `@@replica_parallel_workers`, but also that I may need to look into dependency tracking.

Since changing `replica_parallel_workers` is low-hanging fruit, let's start there.
To test this out I will:

1. Increase `replica_parallel_workers` to a higher value and restart the replication threads.
2. Observe replication lag to see if it decreases.
3. Tune `replica_parallel_workers` further, or continue troubleshooting.
Step 1: Change replica_parallel_workers and restart replication:
slave1 [*********:23244] {msandbox} (repl_mon) > select @@replica_parallel_workers;
+----------------------------+
| @@replica_parallel_workers |
+----------------------------+
| 4 |
+----------------------------+
1 row in set (0.00 sec)
slave1 [*********:23244] {msandbox} (repl_mon) > set global replica_parallel_workers=20;
Query OK, 0 rows affected (0.00 sec)
slave1 [*********:23244] {msandbox} (repl_mon) > select @@replica_parallel_workers;
+----------------------------+
| @@replica_parallel_workers |
+----------------------------+
| 20 |
+----------------------------+
1 row in set (0.00 sec)
slave1 [*********:23244] {msandbox} (repl_mon) > stop slave;start slave;
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.09 sec)
Step 2: Wait 4-5 minutes to observe changes in `workers_occupied_count` and `seconds_behind_master`.

After a couple of minutes we can see that `workers_occupied_count` dropped to zero, which is great! But our replication lag has not improved.
slave1 [*********:23244] {msandbox} (repl_mon) > select * from binlog_replication_coordinator_stats;
+----------------------------+---------+-----------------+-----------------+---------------------+---------------+--------------------------+---------------------------+----------------------+------------------------+-----------------------+
| timestamp | channel | seconds_elapsed | events_assigned | avg_events_assigned | queues_filled | waited_worker_queue_full | waited_pending_total_size | clock_conflict_waits | workers_occupied_count | workers_occupied_time |
+----------------------------+---------+-----------------+-----------------+---------------------+---------------+--------------------------+---------------------------+----------------------+------------------------+-----------------------+
| 2025-04-28 21:25:44.328627 | default | 345 | 1025 | 3 | 0 | 0 | 0 | 24208500 | 4 | 8697300 |
| 2025-04-28 21:27:44.012512 | default | 120 | 5208064 | 43401 | 0 | 0 | 0 | 100624635100 | 8256 | 4027435500 |
| 2025-04-28 21:35:57.464912 | default | 493 | 4176896 | 8472 | 0 | 0 | 0 | 69082991700 | 5879 | 2292737600 |
| 2025-04-28 21:37:57.020855 | default | 120 | 5331968 | 44433 | 0 | 0 | 0 | 100571333200 | 7642 | 3408264300 |
| 2025-04-28 21:42:53.061556 | default | 120 | 4802561 | 40021 | 0 | 0 | 0 | 90936245100 | 0 | 0 |
| 2025-04-28 21:44:53.014055 | default | 120 | 5512192 | 45935 | 0 | 0 | 0 | 103869504400 | 0 | 0 |
+----------------------------+---------+-----------------+-----------------+---------------------+---------------+--------------------------+---------------------------+----------------------+------------------------+-----------------------+
6 rows in set (0.00 sec)
slave1 [*********:23244] {msandbox} (repl_mon) > pager grep "Seconds_Behind_Master"
PAGER set to 'grep "Seconds_Behind_Master"'
slave1 [*********:23244] {msandbox} (repl_mon) > show slave status\G
Seconds_Behind_Master: 75
1 row in set, 1 warning (0.00 sec)
Step 3: Now we should look into dependency tracking changes. Here I will take Daniel's advice and enable WRITESET on the source instance. This can be done online, so let's try:
master [*********:23243] {msandbox} ((none)) > set global binlog_transaction_dependency_tracking='writeset';
Query OK, 0 rows affected, 1 warning (0.00 sec)
master [*********:23243] {msandbox} ((none)) > select @@binlog_transaction_dependency_tracking,@@version;
+------------------------------------------+-----------+
| @@binlog_transaction_dependency_tracking | @@version |
+------------------------------------------+-----------+
| WRITESET | 8.0.42 |
+------------------------------------------+-----------+
1 row in set, 1 warning (0.00 sec)
Now we need to wait a few minutes to allow the change to propagate. Remember that dependency tracking is applied on the source, so only binlog events written after `binlog_transaction_dependency_tracking` is changed will use writesets to record dependencies in the binary log; the replica needs to catch up to the change point before it can take advantage.
slave1 [*********:23244] {msandbox} (repl_mon) > show slave status\G
Seconds_Behind_Master: 143
1 row in set, 1 warning (0.00 sec)
slave1 [*********:23244] {msandbox} (repl_mon) > \n
PAGER set to stdout
slave1 [*********:23244] {msandbox} (repl_mon) > select * from binlog_replication_coordinator_stats;
+----------------------------+---------+-----------------+-----------------+---------------------+---------------+--------------------------+---------------------------+----------------------+------------------------+-----------------------+
| timestamp | channel | seconds_elapsed | events_assigned | avg_events_assigned | queues_filled | waited_worker_queue_full | waited_pending_total_size | clock_conflict_waits | workers_occupied_count | workers_occupied_time |
+----------------------------+---------+-----------------+-----------------+---------------------+---------------+--------------------------+---------------------------+----------------------+------------------------+-----------------------+
| 2025-04-28 21:25:44.328627 | default | 345 | 1025 | 3 | 0 | 0 | 0 | 24208500 | 4 | 8697300 |
| 2025-04-28 21:27:44.012512 | default | 120 | 5208064 | 43401 | 0 | 0 | 0 | 100624635100 | 8256 | 4027435500 |
| 2025-04-28 21:35:57.464912 | default | 493 | 4176896 | 8472 | 0 | 0 | 0 | 69082991700 | 5879 | 2292737600 |
| 2025-04-28 21:37:57.020855 | default | 120 | 5331968 | 44433 | 0 | 0 | 0 | 100571333200 | 7642 | 3408264300 |
| 2025-04-28 21:42:53.061556 | default | 120 | 4802561 | 40021 | 0 | 0 | 0 | 90936245100 | 0 | 0 |
| 2025-04-28 21:44:53.014055 | default | 120 | 5512192 | 45935 | 0 | 0 | 0 | 103869504400 | 0 | 0 |
| 2025-04-28 21:46:53.018936 | default | 120 | 5500928 | 45841 | 0 | 0 | 0 | 104048594800 | 0 | 0 |
| 2025-04-28 21:48:53.020615 | default | 120 | 5524480 | 46037 | 0 | 0 | 0 | *********300 | 0 | 0 |
+----------------------------+---------+-----------------+-----------------+---------------------+---------------+--------------------------+---------------------------+----------------------+------------------------+-----------------------+
8 rows in set (0.00 sec)
Step 4: Verify the effect of binlog_transaction_dependency_tracking=writeset.

After a few minutes, we can see replication lag has disappeared!
slave1 [*********:23244] {msandbox} (repl_mon) > pager grep "Seconds_Behind_Master"
PAGER set to 'grep "Seconds_Behind_Master"'
slave1 [*********:23244] {msandbox} (repl_mon) > show slave status\G
Seconds_Behind_Master: 0
1 row in set, 1 warning (0.00 sec)
This is great, and shows the impact that clock conflicts were having on our replication throughput.
However, if we look at binlog_replication_coordinator_stats we can now see the bottleneck has moved back to the number of worker threads!
Can we achieve more parallelism? Let's increase `replica_parallel_workers` again and see:
slave1 [*********:23244] {msandbox} (repl_mon) > select * from binlog_replication_coordinator_stats;
+----------------------------+---------+-----------------+-----------------+---------------------+---------------+--------------------------+---------------------------+----------------------+------------------------+-----------------------+
| timestamp | channel | seconds_elapsed | events_assigned | avg_events_assigned | queues_filled | waited_worker_queue_full | waited_pending_total_size | clock_conflict_waits | workers_occupied_count | workers_occupied_time |
+----------------------------+---------+-----------------+-----------------+---------------------+---------------+--------------------------+---------------------------+----------------------+------------------------+-----------------------+
| 2025-04-28 21:25:44.328627 | default | 345 | 1025 | 3 | 0 | 0 | 0 | 24208500 | 4 | 8697300 |
| 2025-04-28 21:27:44.012512 | default | 120 | 5208064 | 43401 | 0 | 0 | 0 | 100624635100 | 8256 | 4027435500 |
| 2025-04-28 21:35:57.464912 | default | 493 | 4176896 | 8472 | 0 | 0 | 0 | 69082991700 | 5879 | 2292737600 |
| 2025-04-28 21:37:57.020855 | default | 120 | 5331968 | 44433 | 0 | 0 | 0 | 100571333200 | 7642 | 3408264300 |
| 2025-04-28 21:42:53.061556 | default | 120 | 4802561 | 40021 | 0 | 0 | 0 | 90936245100 | 0 | 0 |
| 2025-04-28 21:44:53.014055 | default | 120 | 5512192 | 45935 | 0 | 0 | 0 | 103869504400 | 0 | 0 |
| 2025-04-28 21:46:53.018936 | default | 120 | 5500928 | 45841 | 0 | 0 | 0 | 104048594800 | 0 | 0 |
| 2025-04-28 21:48:53.020615 | default | 120 | 5524480 | 46037 | 0 | 0 | 0 | 104187049300 | 0 | 0 |
| 2025-04-28 21:50:53.017684 | default | 120 | 13111296 | 109261 | 0 | 0 | 0 | 38211462000 | 148868 | 34451537500 |
| 2025-04-28 21:52:53.016574 | default | 120 | 9456640 | 78805 | 0 | 0 | 0 | 265316200 | 52396 | 14188590700 |
| 2025-04-28 21:54:53.014725 | default | 120 | 7398400 | 61653 | 0 | 0 | 0 | 43328300 | 4493 | 3769165700 |
+----------------------------+---------+-----------------+-----------------+---------------------+---------------+--------------------------+---------------------------+----------------------+------------------------+-----------------------+
11 rows in set (0.00 sec)
slave1 [*********:23244] {msandbox} (repl_mon) > select @@replica_parallel_workers;
+----------------------------+
| @@replica_parallel_workers |
+----------------------------+
| 20 |
+----------------------------+
1 row in set (0.00 sec)
slave1 [*********:23244] {msandbox} (repl_mon) > set global replica_parallel_workers=40;
Query OK, 0 rows affected (0.00 sec)
slave1 [*********:23244] {msandbox} (repl_mon) > stop slave;start slave;
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.10 sec)
Step 5: Wait 4-5 minutes to observe changes in `workers_occupied_count`.

Now we can see results have improved. For fun, we then increased all the way to 100 to see how things would go. After doing so, the bottleneck swung back to `clock_conflict_waits` and `workers_occupied_count` dropped to zero.
slave1 [*********:23244] {msandbox} (repl_mon) > select * from binlog_replication_coordinator_stats;
[... previous output truncated for brevity ...]
# For fun, lets increase even more
slave1 [*********:23244] {msandbox} (repl_mon) > select @@replica_parallel_workers;
+----------------------------+
| @@replica_parallel_workers |
+----------------------------+
| 100 |
+----------------------------+
1 row in set (0.00 sec)
Final thoughts:
- Replication is a first-class citizen in MySQL; it would be nice to have native CE views for this information instead of relying on error log hacks.
- The performance_schema.error_log table is limited in size, so I would suggest periodically publishing metrics to your preferred monitoring solution so logs are not deleted or rotated accidentally.
- log_error_verbosity is a dynamic variable, so if replication logging is too noisy for your liking, you can enable it only when needed; however, historical info can be useful in such investigations, so it's a tradeoff :).
- Starting in MySQL 8.4, writeset is enabled by default (and is the only configuration allowed); on 8.0 you should really look into enabling it.
- Parallel replication threads defaults to 4 from MySQL version 8.0.27. Prior to this it was disabled by default.
- Tuning worker threads (and replication in general!) should be done with caution; there are always trade-offs and you should optimize based on your requirements. In the above example, while provisioning 100 replica_parallel_workers removed the bottleneck on worker threads, my problem was solved at ~20 threads. Be very careful that you are not shifting deck chairs around on the Titanic: sensible values can help avoid fires elsewhere and act as a throttling mechanism, and you don't want to saturate other parts of your system unnecessarily at peak times.
- It's not always the replication configuration, so be careful and verify as you change things. For example:
- There can be other bottlenecks, such as missing primary keys, DDLs, and long-running transactions blocking replication applier threads, which can lead to lag too. See Daniel's blog for advice on monitoring these. I will try to do a follow-up here too if I can find the time.
- Increasing replica_parallel_workers can introduce bottlenecks elsewhere; in my case I started to see the following messages in the error log, indicating my redo log size was not adequate. So be careful to test any such changes at both peak and regular load.
slave1 [*********:23244] {msandbox} (repl_mon) > select * from performance_schema.error_log where error_code='MY-013865' order by 1 desc limit 4;
+----------------------------+-----------+---------+------------+-----------+---------------------------------------------------------------------------------------------------+
| LOGGED | THREAD_ID | PRIO | ERROR_CODE | SUBSYSTEM | DATA |
+----------------------------+-----------+---------+------------+-----------+---------------------------------------------------------------------------------------------------+
| 2025-04-28 22:12:12.303168 | 0 | Warning | MY-013865 | InnoDB | Redo log writer is waiting for a new redo log file. Consider increasing innodb_redo_log_capacity. |
| 2025-04-28 22:11:59.514332 | 0 | Warning | MY-013865 | InnoDB | Redo log writer is waiting for a new redo log file. Consider increasing innodb_redo_log_capacity. |
| 2025-04-28 22:11:47.038615 | 0 | Warning | MY-013865 | InnoDB | Redo log writer is waiting for a new redo log file. Consider increasing innodb_redo_log_capacity. |
| 2025-04-28 22:11:39.048435 | 0 | Warning | MY-013865 | InnoDB | Redo log writer is waiting for a new redo log file. Consider increasing innodb_redo_log_capacity. |
+----------------------------+-----------+---------+------------+-----------+---------------------------------------------------------------------------------------------------+
4 rows in set (0.01 sec)
Appendix
View Definition
Below is the definition for the view used in this blog post. Use/test with caution, created for demonstration purposes :)
CREATE OR REPLACE VIEW repl_mon.binlog_replication_coordinator_stats AS
WITH repl_stats AS (
SELECT
LOGGED as timestamp,
SUBSTRING_INDEX(SUBSTRING_INDEX(DATA, 'channel \'', -1), '\':', 1) as channel,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(DATA, 'seconds elapsed = ', -1), ';', 1) AS UNSIGNED) as seconds_elapsed,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(DATA, 'events assigned = ', -1), ';', 1) AS UNSIGNED) as events_assigned,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(DATA, 'worker queues filled over overrun level = ', -1), ';', 1) AS UNSIGNED) as queues_filled,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(DATA, 'waited due a Worker queue full = ', -1), ';', 1) AS UNSIGNED) as waited_worker_queue_full,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(DATA, 'waited due the total size = ', -1), ';', 1) AS UNSIGNED) as waited_pending_total_size,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(DATA, 'waited at clock conflicts = ', -1), ' waited', 1) AS UNSIGNED) as clock_conflict_waits,
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(DATA, 'waited (count) when Workers occupied = ', -1), ' waited', 1) AS UNSIGNED) as workers_occupied_count,
CAST(SUBSTRING_INDEX(DATA, 'when Workers occupied = ', -1) AS UNSIGNED) as workers_occupied_time
FROM performance_schema.error_log
WHERE
SUBSYSTEM = 'Repl'
AND ERROR_CODE = 'MY-010559'
AND DATA LIKE '%Multi-threaded % statistics for channel%'
)
SELECT
timestamp,
CASE WHEN channel = '' THEN 'default' ELSE channel END as channel,
seconds_elapsed,
events_assigned,
ROUND(events_assigned/seconds_elapsed) as avg_events_assigned,
queues_filled,
waited_worker_queue_full,
waited_pending_total_size,
clock_conflict_waits,
workers_occupied_count,
workers_occupied_time
FROM repl_stats;
Quick notes on common issues and interpretations (feel free to add comments as you see them!):
- High waited_worker_queue_full: Consider increasing replica_pending_jobs_size_max
- High waited_pending_total_size: Check overall replication memory usage and limits
- High clock_conflict_waits: Could be improved by:
- Using WRITESET dependency tracking on master
- Using LOGICAL_CLOCK as replica_parallel_type instead of DATABASE
- Better database partitioning (if using DATABASE parallel type)
- Optimizing transaction batching
Note: If replica_preserve_commit_order=ON:
- Conflicts are necessary for consistency
- Disabling it may improve performance but:
- Can break data consistency
- May cause temporary read inconsistencies
- Not suitable if applications depend on commit order
- See https://hackmysql.com/replica-preserve-commit-order/ for more details
- High workers_occupied counts/times: Consider increasing replica_parallel_workers
Related Configuration Variables
- replica_parallel_workers: Number of worker threads (per channel)
- replica_parallel_type: DATABASE or LOGICAL_CLOCK
- replica_preserve_commit_order: Affects commit order preservation
- replica_pending_jobs_size_max: Per-worker queue size limit
- binlog_transaction_dependency_tracking: How master tracks dependencies (COMMIT_ORDER, WRITESET, or WRITESET_SESSION)
Write-set Based Dependency Tracking
Write-set based dependency tracking can significantly reduce conflicts by:
- Tracking actual row-level write conflicts instead of schema-level
- Allowing parallel execution of non-conflicting transactions
- Supporting better transaction scheduling
- Reducing false dependencies in LOGICAL_CLOCK mode
Note: Each replication channel has its own coordinator thread and worker threads, allowing different parallel replication configurations per channel.
Related Reading
- Group commit and trx dependency tracking - Daniel Nichter
- Replica preserve commit order (RPCO) - Daniel Nichter
- Monitoring MTR lag with the Performance Schema - Daniel Nichter
- MySQL Documentation: replica_parallel_workers
- MySQL Documentation: binlog_transaction_dependency_tracking
Version Information
The examples in this blog post were tested with:
- MySQL Version: 8.0.42
Last updated: April 2025