Monday, October 14, 2024

InnoDB data_locks: a happy ending/new beginning in MySQL 8.0.40 ! (?)

 

Earlier today lefred tweeted that the new MySQL releases were rolling out today! As per the usual release process, the yum repositories typically have the release rpms available first, followed by github (release tags here) and then the release notes.


Since the release notes/commits are not out yet(at time of writing!) I’ll hold back on commenting on other changes, but one change I’ve really been looking forward to is improvements in the performance_schema.data_locks and performance_schema.data_lock_waits tables which I was notified of in my Bug #112035 report last month. You can see more on these tables in the MySQL docs, but at a high level these views can be used to view and troubleshoot locks being held by InnoDB. You have lock contention or want to see a lock tree? Check these tables!

But not so fast!!!

For years now, when dealing with high concurrency MySQL deployments, touching these tables should be done with extreme caution, especially in an automated fashion! There have been multiple bugs reported [1] where querying these views on systems under heavy load can lead to InnoDB freezing up and stalling, which is the last thing you need when you are trying to debug a server which is already struggling due to lock contention! I’ve lost count of the number monitoring scripts and services I have seen bitten by this over the years.

Just to demonstrate, here is what we can see in mysql 8.0.39, if you attempt to:
  1. Run a sysbench workload (top pane)
  2. Have an open transaction holding many row locks (middle pane). Here I' doing so on an unrelated set of tables(20m rows) to the foreground transactions for demonstration purposes. 
  3. In a third session, select from the performance_schema.data_locks table (bottom pane)
MySQL 8.0.39 : limit clause

As you can see, when the select ENGINE,ENGINE_LOCK_ID,ENGINE_TRANSACTION_ID from data_locks limit 1; is executed, the sysbench workload stalls to zero until the query is completed(14.4 seconds here). But why is this? To simplify, its because when querying this table, InnoDB must gather the trx_sys mutex while scanning through all the active transactions/row locks. If there are a lot of active transactions or locks being held by InnoDB, the mutex can be held for quite some time while the view is materializing, blocking foreground transactions from “doing work“. (see linked bug reports for more detailed info)
So when I seen the following in my bug report Bug #112035 last month I was pretty excited. 

Posted by developer:

Added the following note to the MySQL Server 8.0.40 release notes:

Redesigned the performance schema data_locks and data_lock_waits tables so
that querying them does not require an exclusive global mutex on the
transaction or lock system. It now iterates over buckets of hash tables
that hold the locks to only latch the actively processed shard, when
previously it iterated over the transactions. This also improves the
iteration logic complexity in terms of speed and memory to decrease the
impact of these queries on the rest of the system.

Note that the query result might show an incomplete list of transaction locks
if it committed, started, or otherwise changed the set of owned locks
in-between visiting two buckets. This differs from previous behavior which
always showed a consistent snapshot of locks held by individual
transactions, although two different transactions could have been
presented at different moments. In other words, the new approach gives a
consistent view of a single wait queue to show conflicting locks with a
waiting lock because they are always in the same bucket, while the old
approach could miss some of them because they belonged to other
transactions. The old approach would always show all the other locks held
by a reported transaction but could miss locks of other transactions even
if they were conflicting.
Also, thanks to the MySQL team for the detailed explanations in the proposed release note entry!

So back to where we started, when I seen lefred’s tweet I couldn't wait to give it a spin. After running the same 8.0.39 test from above on 8.0.40, I could no longer see the stall, and my basic query completed in less than a second.
MySQL 8.0.40 : limit clause

Running without the limit clause will increase execution time as we need to materialize the view, but will not cause a stall for the entire duration in my test case, as noticed in 8.0.39:
MySQL 8.0.40 : No limit clause

Conclusion

- While this is only a first look and not an extensive test, so be sure to test and report any issues you may find. Saying that, its a very promising sign and hopefully will put this to bed; allowing users monitor, troubleshoot and track locks/waits more granularly, with less risk, going forward. 

- Note the changes in behavior for these tables starting from 8.0.40 from bug report Bug #112035 and review 8.0/8.4/9.x release notes for final information once released. 

- Time will tell, but if the above holds true, its a good enough reason to go to 8.0.40 on its own!

Also, kudos to the MySQL team for including this fix in the 8.0 releases, and not just LTS and innovation releases. In the past this has been the case as mentioned by JFG here, but nice to see they have taken feedback onboard. Exited to see what else is included in 8.0.40/8.4.3/9.2.0 when their release notes are put out.(not available at time of writing) Long live da Dolphin!! 🐬 :-)

Appendix:

Where you will find MySQL release notes once published:


Some MySQL docs on locking:

Some other resources with lots more deep dive info on InnoDB internals/Locking



[1] Example bug reports for the data_locks tables, not exhaustive.

Sunday, July 7, 2024

Bug hunting in MySQL with git bisect


Troubleshooting a database crash or regression can be a tedious task. Whether you have recently deployed new application changes, upgraded your database or "nothing has changed" ™ it can be tricky to corner why the database started behaving differently to how it has before.


In the case of a database crash or regression post upgrade, my usual first step in troubleshooting will be to take a look at the database workload and backtrace in the `mysql-error.log`, and see where it's crashing. From the stack you should be able to get a good idea of where the engine is crashing. 
Upon checking that, I will review release notes to see if any changes have been made in this area recently or if there are any open bug reports. Sometimes you will get a good idea of what change may have introduced the issue, but others can be difficult to corner and we have to resort to using other tools. The release note review process can be a lot easier if you only went from one minor to the next, but this becomes a lot tricker when moving multiple minors, 8.0.23 to 8.0.38 will have hundreds of changes.
Whether or not you get a good idea of what changed, the next step is to get a reliable repro. This is key for understanding the issue and opening a good quality bug report. In a lot of cases this can be very difficult, such as when "production" like workload and data is required; especially when debugging is required on production servers. 


Provided you can get a reliable way of reproducing the crash or regression you can then start narrowing down which release or change may have led to this change in behavior. Here I will show how you can do this using git bisect


As per the docs:

This command uses a binary search algorithm to find which commit in your project’s history introduced a bug. You use it by first telling it a "bad" commit that is known to contain the bug, and a "good" commit that is known to be before the bug was introduced. Then git bisect picks a commit between those two endpoints and asks you whether the selected commit is "good" or "bad". It continues narrowing down the range until it finds the exact commit that introduced the change.


 

Git bisect process

With git bisect, you can specify two points in time(branches, tags or commits), execute a reproducible test case and evaluate if the regression/change of behavior is still present. Once the bisect process is complete you should be presented with a commit where the issue started occurring. 

The most well-known process to:

  1. Start the bisect process outlining the known good and bad commits/branches.
  2. Build MySQL
  3. Run your test
    1. If test fails: Mark as bad
    2. If test Succeeds: Mark as good
  4. Repeat

You continue this process until you are presented with the “first” offending commit. This can be quite a tedious process of manually stepping through commits, so here we will demonstrate using git bisect run to automatically handle this for you. With git bisect run you create a script which builds MySQL and executes your test case. Based on the exit status, bisect will automatically mark it as good or bad, and eventually spit out the first bad commit. Just start it, go for a coffee and come back to your bad commit! 

Note: See the git bisect docs for more information on how exit status codes are handled. In general, exit code 0 will be “good”, non-zero will be “bad“. 


Once you have the commit, you can start focusing your debugging on a single area of the code, and if you have upgraded multiple minor releases find which release the regression was introduced.
Even if you do not feel comfortable, or it’s not possible to revert or fix the issue, knowing what commit introduced a change in behavior can help in a few ways :

  • It improves the quality of bug reports. All bug reports should include reproducible test cases, this makes things a lot easier for the bug verification team to identify cause and introduce a fix. Providing the commit which introduced the regression is even better; make life easier for the database developers!
  • Knowing what area of the code triggers the behavior can also help DBAs and Application developers better understand what caused the regression, and if possible, introduce an application side change which can help in avoiding the problem until a permanent fix is in place. For example, if you see the regression/crash is caused by a commit relating to INSTANT DDL in InnoDB, you can make application-level changes to avoid this to restore availability until a permanent fix is available.

In this post I will demonstrate using git bisect run to identify a commit which led to a regression in 8.0.37, that has since been fixed in 8.0.38. You can see more information on the bug here PS-9222 , originally reported in private bug MySQL bug # 114837. I have chosen this as we have a reproducible test case already using the MySQL test framework (MTR). 

Note:
The MySQL test framework (MTR) is being used here but it is not required. You can do whatever test you want in your shell script; you just need to ensure the correct exit status is returned so 
git bisect run knows whether it should mark is as good or bad. For example, here is a simple script which catches the exit code and returns something git bisect run
 is comfortable with:


#!/bin/bash

# Build MySQL
build_mysql.sh 

# Start MySQL binary with a custom data directory
~/mysql-server/bld/runtime_output_directory/mysqld \
--defaults-group-suffix=.1 --defaults-file=~/my.cnf \
--log-output=file --explain-format=TRADITIONAL_STRICT \
--loose-debug-sync-timeout=600 --initialize=0 --core-file \
--init-file=~/init.sql

exit_code=$?
if [[ ${exit_code} -gt 127 ]]; then
    exit 1
fi

Example


The first step is to get the mysql-server source code
git clone https://github.com/mysql/mysql-server.git
Once you have the source code, checkout your branch. In MySQL, each release has its own tag. So, we will checkout mysql-8.0.37

cd mysql-server
git checkout mysql-8.0.37

Now we will get the test cases from PS-9222.
  1. mysql-test/suite/innodb/include/instant_ddl_recovery.inc
  2. mysql-test/suite/innodb/r/instant_alter_stored_column_order.result
  3. mysql-test/suite/innodb/r/instant_ddl_recovery_debug.result
  4. mysql-test/suite/innodb/t/instant_alter_stored_column_order.test

Build MySQL and verify that the above test cases fail on mysql-8.0.37. I won’t go through the MySQL build process here, but here are the results of the above.

mkdir mysql-server/bld
cd mysql-server/bld
cmake .. -DWITH_DEBUG=1 -DWITH_SSL=system -DWITH_ZLIB=bundled -DMYSQL_MAINTAINER_MODE=0 -DENABLED_LOCAL_INFILE=1  -DDOWNLOAD_BOOST=1 -DWITH_BOOST=$PWD/../boost \
  -DCMAKE_CXX_FLAGS="$CXXF" -DCMAKE_C_FLAGS="$CF" \
  -DWITH_ROUTER=OFF -DWITH_MYSQLX=OFF -DWITH_UNIT_TESTS=OFF
make -j 8

Note: Here I’m building a debug build. It’s not a requirement for git bisect but a lot of MTR tests will require it. If you are not using MTR, you can build non-debug and use it in the same way


Once built, try running the test using mtr in your bld directory, this is linked to mysql-test-run-pl. (see here for usage information):

 /home/ec2-user/mysql-server/bld/mysql-test/mtr innodb.instant_alter_stored_column_order;


Upon running the test, you will see that the test did not pass, the engine crashed(reproduced the issue!) and a non-zero exit status was returned. 


[ec2-user@i-0c6e9baba12c0d932 mysql-server]$ /home/ec2-user/mysql-server/bld/mysql-test/mtr innodb.instant_alter_stored_column_order;
Logging: /home/ec2-user/mysql-server/mysql-test/mysql-test-run.pl  innodb.instant_alter_stored_column_order
MySQL Version 8.0.37
Checking supported features
 - Binaries are debug compiled
Using 'all' suites
Collecting tests
Checking leftover processes
Removing old var directory
Creating var directory '/home/ec2-user/mysql-server/bld/mysql-test/var'
Installing system database
Using parallel: 1

==============================================================================
                  TEST NAME                       RESULT  TIME (ms) COMMENT
------------------------------------------------------------------------------
[ 50%] innodb.instant_alter_stored_column_order  [ fail ]
        Test ended at 2024-07-07 22:50:42

CURRENT_TEST: innodb.instant_alter_stored_column_order
mysqltest: At line 17: Query 'UPDATE t1 SET c4 = 'value' WHERE c1 = 'k1'' failed.
ERROR 2013 (HY000): Lost connection to MySQL server during query

The result from queries just before the failure was:
# PS-9222 - Testing if ALGORITHM=instant crashes server
CREATE TABLE t1 (c1 TINYTEXT COLLATE ascii_bin NOT NULL, c2 DATETIME(3) NOT NULL, c3 TEXT, PRIMARY KEY (c1(30)));
INSERT INTO t1 (c1, c2, c3) VALUE ('k1','2021-12-21','something');
INSERT INTO t1 (c1, c2, c3) VALUE ('k3','2021-12-21','something else');
ALTER TABLE t1 ADD COLUMN c4 VARCHAR(18) NOT NULL, algorithm=instant;
UPDATE t1 SET c4 = 'value' WHERE c1 = 'k1';
safe_process[187000]: Child process: 187001, exit: 1


Server [mysqld.1 - pid: 186954, winpid: 186954, exit: 256] failed during test run
Server log from this test:
----------SERVER LOG START-----------
2024-07-07T22:50:22.121344Z 0 [Note] [MY-013667] [Server] Error-log destination "stderr" is not a file. Can not restore error log messages from previous run.
2024-07-07T22:50:22.116996Z 0 [Warning] [MY-013711] [Server] Manifest file '/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld.my' is not read-only. For better security, please make sure that the file is read-only.
2024-07-07T22:50:22.118817Z 0 [Warning] [MY-010099] [Server] Insecure configuration for --secure-file-priv: Data directory is accessible through --secure-file-priv. Consider choosing a different directory.
2024-07-07T22:50:22.118828Z 0 [Warning] [MY-010101] [Server] Insecure configuration for --secure-file-priv: Location is accessible to all OS users. Consider choosing a different directory.
2024-07-07T22:50:22.118909Z 0 [Note] [MY-013932] [Server] BuildID[sha1]=22bfc288cb831a3fc0bdeee5e573d5e8649563d4
2024-07-07T22:50:22.118923Z 0 [Note] [MY-010949] [Server] Basedir set to /home/ec2-user/mysql-server/bld/.
2024-07-07T22:50:22.118933Z 0 [System] [MY-010116] [Server] /home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld (mysqld 8.0.37-debug) starting as process 186955
2024-07-07T22:50:22.125897Z 0 [Warning] [MY-010075] [Server] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 4aaa3aec-3cb3-11ef-a9d1-0e76622b41ab.
2024-07-07T22:50:22.130885Z 0 [Note] [MY-012366] [InnoDB] Using Linux native AIO
2024-07-07T22:50:22.131239Z 0 [Note] [MY-010747] [Server] Plugin 'FEDERATED' is disabled.
2024-07-07T22:50:22.131324Z 0 [Note] [MY-010747] [Server] Plugin 'ndbcluster' is disabled.
2024-07-07T22:50:22.131357Z 0 [Note] [MY-010747] [Server] Plugin 'ndbinfo' is disabled.
2024-07-07T22:50:22.131368Z 0 [Note] [MY-010747] [Server] Plugin 'ndb_transid_mysql_connection_map' is disabled.
2024-07-07T22:50:22.133583Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2024-07-07T22:50:22.133620Z 1 [Note] [MY-013546] [InnoDB] Atomic write enabled
2024-07-07T22:50:22.133675Z 1 [Note] [MY-012932] [InnoDB] PUNCH HOLE support available
2024-07-07T22:50:22.133690Z 1 [Note] [MY-012937] [InnoDB] !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
2024-07-07T22:50:22.133695Z 1 [Note] [MY-012944] [InnoDB] Uses event mutexes
2024-07-07T22:50:22.133701Z 1 [Note] [MY-012945] [InnoDB] GCC builtin __atomic_thread_fence() is used for memory barrier
2024-07-07T22:50:22.133708Z 1 [Note] [MY-012948] [InnoDB] Compressed tables use zlib 1.2.13
2024-07-07T22:50:22.137355Z 1 [Note] [MY-012951] [InnoDB] Using hardware accelerated crc32 and polynomial multiplication.
2024-07-07T22:50:22.137865Z 1 [Note] [MY-012203] [InnoDB] Directories to scan './'
2024-07-07T22:50:22.137908Z 1 [Note] [MY-012204] [InnoDB] Scanning './'
2024-07-07T22:50:22.138749Z 1 [Note] [MY-012208] [InnoDB] Completed space ID check of 8 files.
2024-07-07T22:50:22.139293Z 1 [Note] [MY-012955] [InnoDB] Initializing buffer pool, total size = 24.000000M, instances = 1, chunk size =24.000000M
2024-07-07T22:50:22.143342Z 1 [Note] [MY-012957] [InnoDB] Completed initialization of buffer pool
2024-07-07T22:50:22.148312Z 0 [Note] [MY-011952] [InnoDB] If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2024-07-07T22:50:22.148587Z 1 [Note] [MY-013532] [InnoDB] Using './#ib_16384_0.dblwr' for doublewrite
2024-07-07T22:50:22.148880Z 1 [Note] [MY-013532] [InnoDB] Using './#ib_16384_1.dblwr' for doublewrite
2024-07-07T22:50:22.159089Z 1 [Note] [MY-013566] [InnoDB] Double write buffer files: 2
2024-07-07T22:50:22.159109Z 1 [Note] [MY-013565] [InnoDB] Double write buffer pages per instance: 4
2024-07-07T22:50:22.159140Z 1 [Note] [MY-013532] [InnoDB] Using './#ib_16384_0.dblwr' for doublewrite
2024-07-07T22:50:22.159155Z 1 [Note] [MY-013532] [InnoDB] Using './#ib_16384_1.dblwr' for doublewrite
2024-07-07T22:50:22.229760Z 1 [Note] [MY-013883] [InnoDB] The latest found checkpoint is at lsn = 20116776 in redo log file ./#innodb_redo/#ib_redo61.
2024-07-07T22:50:22.229839Z 1 [Note] [MY-013086] [InnoDB] Starting to parse redo log at lsn = 20116527, whereas checkpoint_lsn = 20116776 and start_lsn = 20116480
2024-07-07T22:50:22.234657Z 1 [Note] [MY-013083] [InnoDB] Log background threads are being started...
2024-07-07T22:50:22.235450Z 1 [Note] [MY-012532] [InnoDB] Applying a batch of 0 redo log records ...
2024-07-07T22:50:22.235464Z 1 [Note] [MY-012535] [InnoDB] Apply batch completed!
2024-07-07T22:50:22.235902Z 1 [Note] [MY-013252] [InnoDB] Using undo tablespace './undo_001'.
2024-07-07T22:50:22.236084Z 1 [Note] [MY-013252] [InnoDB] Using undo tablespace './undo_002'.
2024-07-07T22:50:22.236510Z 1 [Note] [MY-012910] [InnoDB] Opened 2 existing undo tablespaces.
2024-07-07T22:50:22.236573Z 1 [Note] [MY-011980] [InnoDB] GTID recovery trx_no: 1086
2024-07-07T22:50:22.294196Z 1 [Note] [MY-013776] [InnoDB] Parallel initialization of rseg complete
2024-07-07T22:50:22.294247Z 1 [Note] [MY-013777] [InnoDB] Time taken to initialize rseg using 4 thread: 57685 ms.
2024-07-07T22:50:22.294501Z 1 [Note] [MY-012923] [InnoDB] Creating shared tablespace for temporary tables
2024-07-07T22:50:22.294574Z 1 [Note] [MY-012265] [InnoDB] Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2024-07-07T22:50:22.326292Z 1 [Note] [MY-012266] [InnoDB] File './ibtmp1' size is now 12 MB.
2024-07-07T22:50:22.326574Z 1 [Note] [MY-013627] [InnoDB] Scanning temp tablespace dir:'./#innodb_temp/'
2024-07-07T22:50:22.368475Z 1 [Note] [MY-013018] [InnoDB] Created 128 and tracked 128 new rollback segment(s) in the temporary tablespace. 128 are now active.
2024-07-07T22:50:22.369219Z 1 [Note] [MY-012976] [InnoDB] 8.0.37 started; log sequence number 20116786
2024-07-07T22:50:22.369525Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2024-07-07T22:50:22.384916Z 1 [Note] [MY-011089] [Server] Data dictionary restarting version '80023'.
2024-07-07T22:50:22.845682Z 1 [Note] [MY-012357] [InnoDB] Reading DD tablespace files
2024-07-07T22:50:22.849801Z 1 [Note] [MY-012356] [InnoDB] Scanned 10 tablespaces. Validated 10.
2024-07-07T22:50:22.924166Z 1 [Note] [MY-010006] [Server] Using data dictionary with version '80023'.
2024-07-07T22:50:23.700027Z 0 [Note] [MY-010902] [Server] Thread priority attribute setting in Resource Group SQL shall be ignored due to unsupported platform or insufficient privilege.
2024-07-07T22:50:23.715619Z 0 [Note] [MY-013911] [Server] Crash recovery finished in binlog engine. No attempts to commit, rollback or prepare any transactions.
2024-07-07T22:50:23.715648Z 0 [Note] [MY-013911] [Server] Crash recovery finished in InnoDB engine. No attempts to commit, rollback or prepare any transactions.
2024-07-07T22:50:23.721492Z 0 [Note] [MY-012487] [InnoDB] DDL log recovery : begin
2024-07-07T22:50:23.721664Z 0 [Note] [MY-012488] [InnoDB] DDL log recovery : end
2024-07-07T22:50:23.722107Z 0 [Note] [MY-011946] [InnoDB] Loading buffer pool(s) from /home/ec2-user/mysql-server/bld/mysql-test/var/mysqld.1/data/ib_buffer_pool
2024-07-07T22:50:23.728378Z 0 [Note] [MY-011946] [InnoDB] Buffer pool(s) load completed at 240708  1:50:23
2024-07-07T22:50:23.762999Z 0 [Note] [MY-010303] [Server] Skipping generation of SSL certificates as options related to SSL are specified.
2024-07-07T22:50:23.765041Z 0 [Warning] [MY-010068] [Server] CA certificate /home/ec2-user/mysql-server/mysql-test/std_data/cacert.pem is self signed.
2024-07-07T22:50:23.765066Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
2024-07-07T22:50:23.765077Z 0 [Note] [MY-010310] [Server] Skipping generation of RSA key pair as --sha256_password_auto_generate_rsa_keys is set to OFF.
2024-07-07T22:50:23.765082Z 0 [Note] [MY-010310] [Server] Skipping generation of RSA key pair as --caching_sha2_password_auto_generate_rsa_keys is set to OFF.
2024-07-07T22:50:23.766780Z 0 [Note] [MY-010252] [Server] Server hostname (bind-address): '*'; port: 13000
2024-07-07T22:50:23.766827Z 0 [Note] [MY-010253] [Server] IPv6 is available.
2024-07-07T22:50:23.766839Z 0 [Note] [MY-010264] [Server]   - '::' resolves to '::';
2024-07-07T22:50:23.766859Z 0 [Note] [MY-010251] [Server] Server socket created on IP: '::'.
2024-07-07T22:50:23.780117Z 0 [Warning] [MY-000067] [Server] unknown variable 'loose-mysqlx-port=13020'.
2024-07-07T22:50:23.780140Z 0 [Warning] [MY-000067] [Server] unknown variable 'loose-mysqlx-socket=/home/ec2-user/mysql-server/bld/mysql-test/var/tmp/mysqlx.1.sock'.
2024-07-07T22:50:23.780146Z 0 [Warning] [MY-000067] [Server] unknown variable 'loose-mysqlx-ssl=DISABLED'.
2024-07-07T22:50:23.780150Z 0 [Warning] [MY-000067] [Server] unknown variable 'loose-mysqlx-ssl-ca='.
2024-07-07T22:50:23.780153Z 0 [Warning] [MY-000067] [Server] unknown variable 'loose-mysqlx-ssl-cert='.
2024-07-07T22:50:23.780157Z 0 [Warning] [MY-000067] [Server] unknown variable 'loose-mysqlx-ssl-key='.
2024-07-07T22:50:23.856915Z 0 [Note] [MY-011025] [Repl] Failed to start replica threads for channel ''.
2024-07-07T22:50:23.865353Z 5 [Note] [MY-010051] [Server] Event Scheduler: scheduler thread started with id 5
2024-07-07T22:50:23.865943Z 0 [System] [MY-010931] [Server] /home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld: ready for connections. Version: '8.0.37-debug' socket: '/home/ec2-user/mysql-server/bld/mysql-test/var/tmp/mysqld.1.sock'  port: 13000  Source distribution.
2024-07-07T22:50:24.332276Z 9 [ERROR] [MY-013183] [InnoDB] Assertion failure: mtr0log.cc:647:total > (ulint)(log_ptr - log_start) thread 139761664595520
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
2024-07-07T22:50:24Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=22bfc288cb831a3fc0bdeee5e573d5e8649563d4
Thread pointer: 0x7f1c2c011bd0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f1ccc5f09f0 thread_stack 0x100000
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x43) [0x49e97e7]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(print_fatal_signal(int)+0x3a2) [0x35c5d19]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(my_server_abort()+0x6b) [0x35c5fcb]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(my_abort()+0xd) [0x49e01d5]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x1d1) [0x4de03e9]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4c0165a]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4c0176e]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4c03767]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(mlog_open_and_write_index(mtr_t*, unsigned char const*, dict_index_t const*, mlog_id_t, unsigned long, unsigned char*&)+0x5c5) [0x4c01d48]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4c38a52]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(page_cur_delete_rec(page_cur_t*, dict_index_t const*, unsigned long const*, mtr_t*)+0x363) [0x4c39019]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(btr_cur_pessimistic_update(unsigned long, btr_cur_t*, unsigned long**, mem_block_info_t**, mem_block_info_t*, big_rec_t**, upd_t*, unsigned long, que_thr_t*, unsigned long, unsigned long, mtr_t*, btr_pcur_t*)+0xe7e) [0x4e64749]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4d24445]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4d24d1b]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4d251af]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(row_upd_step(que_thr_t*)+0x248) [0x4d2566a]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4ca8b43]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(row_update_for_mysql(unsigned char const*, row_prebuilt_t*)+0xbb) [0x4ca8f2f]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(ha_innobase::update_row(unsigned char const*, unsigned char*)+0x423) [0x4a58f2d]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(handler::ha_update_row(unsigned char const*, unsigned char*)+0x1ea) [0x3770398]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(Sql_cmd_update::update_single_table(THD*)+0x214b) [0x34d3f63]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(Sql_cmd_update::execute_inner(THD*)+0x10c) [0x34d6d74]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(Sql_cmd_dml::execute(THD*)+0x6ed) [0x341d0b1]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(mysql_execute_command(THD*, bool)+0x2156) [0x33971cb]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(dispatch_sql_command(THD*, Parser_state*)+0x798) [0x339c811]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x16ec) [0x3392541]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(do_command(THD*)+0x5bd) [0x339048e]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x35b0edf]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x5685133]
/lib64/libc.so.6(+0x9f7f2) [0x7f1cd589f7f2]
/lib64/libc.so.6(+0x3f450) [0x7f1cd583f450]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f1c2c5cc1c0): UPDATE t1 SET c4 = 'value' WHERE c1 = 'k1'
Connection ID (thread ID): 9
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
safe_process[186954]: Child process: 186955, killed by signal: 6
----------SERVER LOG END-------------


 - the logfile can be found in '/home/ec2-user/mysql-server/bld/mysql-test/var/log/innodb.instant_alter_stored_column_order/instant_alter_stored_column_order.log'

[100%] shutdown_report                           [ pass ]
------------------------------------------------------------------------------
The servers were restarted 0 times
The servers were reinitialized 0 times
Spent 0.000 of 33 seconds executing testcases

Completed: Failed 1/2 tests, 50.00% were successful.

Failing test(s): innodb.instant_alter_stored_column_order

The log files in var/log may give you some hint of what went wrong.

If you want to report this error, please read first the documentation
at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html

mysql-test-run: *** ERROR: there were failing test cases
[ec2-user@i-0c6e9baba12c0d932 mysql-server]$ echo $?
1
[ec2-user@i-0c6e9baba12c0d932 mysql-server]$


Now that we have verified the above, we can stitch it all together into a single shell script, and start git bisect:


$ cat ~/bisect_commands.sh
#!/bin/bash
#

# Build MySQL
cd /home/ec2-user/mysql-server/bld
cmake .. -DWITH_DEBUG=1 -DWITH_SSL=system -DWITH_ZLIB=bundled -DMYSQL_MAINTAINER_MODE=0 -DENABLED_LOCAL_INFILE=1  -DDOWNLOAD_BOOST=1 -DWITH_BOOST=$PWD/../boost \
  -DCMAKE_CXX_FLAGS="$CXXF" -DCMAKE_C_FLAGS="$CF" \
  -DWITH_ROUTER=OFF -DWITH_MYSQLX=OFF -DWITH_UNIT_TESTS=OFF
make -j 8

# Run test case, after each build.
# If test passes(exit code 0), bisect will mark as good, otherwise bad. 
/home/ec2-user/mysql-server/bld/mysql-test/mtr innodb.instant_alter_stored_column_order;


Now start bisect specifying the good and bad releases. Here I’m using mysql-8.0.36 where I know we dont crash, and mysql-8.0.37 where I know we do crash!


[ec2-user@i-0c6e9baba12c0d932 bld]$ git bisect start mysql-8.0.37 mysql-8.0.36
Bisecting: 156 revisions left to test after this (roughly 7 steps)
[310704bc6b3a6dad52726bf8820f5a1e3194c844] Bug#35993709 Several MTR tests fail when MySQL Server built with VS2022


Then I execute git bisect run ~/bisect_commands.sh and go grab coffee!


[ec2-user@i-0c6e9baba12c0d932 bld]$ chmod +x ~/bisect_commands.sh
[ec2-user@i-0c6e9baba12c0d932 bld]$ git bisect run ~/bisect_commands.sh >> ~/bisect_log.txt 2>&1


It can take a while to complete, but you can tail the ~/bisect_log.txt to check progress. While running you will see outputs of the MTR test run, mysql logs and the bisect commands. For the bisect commands you will see something like the following after each build and test run letting you know how the bisect is progressing


[ec2-user@i-0c6e9baba12c0d932 ~]$ grep "Bisect" bisect_log.txt
Bisecting: 77 revisions left to test after this (roughly 6 steps)
Bisecting: 38 revisions left to test after this (roughly 5 steps)
Bisecting: 19 revisions left to test after this (roughly 4 steps)
Bisecting: 9 revisions left to test after this (roughly 3 steps)
Bisecting: 4 revisions left to test after this (roughly 2 steps)
Bisecting: 2 revisions left to test after this (roughly 1 step)
Bisecting: 0 revisions left to test after this (roughly 0 steps)

You can see the full output here, but looking at the log you will see the outputs from the build, and also that in some cases the test passed, while others it failed. This is expected and is part of the bisect process.


Failed:


Logging: /home/ec2-user/mysql-server/mysql-test/mysql-test-run.pl  innodb.instant_alter_stored_column_order
MySQL Version 8.0.37
Checking supported features
 - Binaries are debug compiled
Using 'all' suites
Collecting tests
Checking leftover processes
Removing old var directory
Creating var directory '/home/ec2-user/mysql-server/bld/mysql-test/var'
Installing system database
Using parallel: 1

==============================================================================
                  TEST NAME                       RESULT  TIME (ms) COMMENT
------------------------------------------------------------------------------
[ 50%] innodb.instant_alter_stored_column_order  [ fail ]
        Test ended at 2024-07-07 23:21:12
        Completed: Failed 1/2 tests, 50.00% were successful.
.
.
.
2024-07-07T23:20:55.043456Z 9 [ERROR] [MY-013183] [InnoDB] Assertion failure: mtr0log.cc:646:total > (ulint)(log_ptr - log_start) thread 139912252782144
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
2024-07-07T23:20:55Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=b8458559fd278542e79ef051af2982224cca197d
Thread pointer: 0x7f3f3c5b8d20
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f3fdc206a30 thread_stack 0x100000
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x43) [0x49e2b43]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(print_fatal_signal(int)+0x3a2) [0x35c0a1d]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(my_server_abort()+0x6b) [0x35c0ccf]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(my_abort()+0xd) [0x49d9531]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x1d1) [0x4dd936f]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4bfa71c]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4bfa830]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4bfc829]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(mlog_open_and_write_index(mtr_t*, unsigned char const*, dict_index_t const*, mlog_id_t, unsigned long, unsigned char*&)+0x5c5) [0x4bfae0a]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4c31aae]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(page_cur_delete_rec(page_cur_t*, dict_index_t const*, unsigned long const*, mtr_t*)+0x363) [0x4c32075]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(btr_cur_pessimistic_update(unsigned long, btr_cur_t*, unsigned long**, mem_block_info_t**, mem_block_info_t*, big_rec_t**, upd_t*, unsigned long, que_thr_t*, unsigned long, unsigned long, mtr_t*, btr_pcur_t*)+0xe7e) [0x4e5d707]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4d1d4a1]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4d1dd77]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4d1e20b]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(row_upd_step(que_thr_t*)+0x248) [0x4d1e6c6]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4ca1b9f]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(row_update_for_mysql(unsigned char const*, row_prebuilt_t*)+0xbb) [0x4ca1f8b]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(ha_innobase::update_row(unsigned char const*, unsigned char*)+0x423) [0x4a5228b]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(handler::ha_update_row(unsigned char const*, unsigned char*)+0x1ea) [0x376afdc]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(Sql_cmd_update::update_single_table(THD*)+0x214b) [0x34cec67]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(Sql_cmd_update::execute_inner(THD*)+0x10c) [0x34d1a78]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(Sql_cmd_dml::execute(THD*)+0x6ed) [0x3417af1]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(mysql_execute_command(THD*, bool)+0x2156) [0x3391b99]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(dispatch_sql_command(THD*, Parser_state*)+0x798) [0x33971df]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x16ec) [0x338cf0f]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(do_command(THD*)+0x5bd) [0x338ae5c]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x35abbe3]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x567d049]
/lib64/libc.so.6(+0x9f7f2) [0x7f3fe3a9f7f2]
/lib64/libc.so.6(+0x3f450) [0x7f3fe3a3f450]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f3f3c368790): UPDATE t1 SET c4 = 'value' WHERE c1 = 'k1'
Connection ID (thread ID): 9
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
safe_process[220791]: Child process: 220792, killed by signal: 6
----------SERVER LOG END-------------
.
.
.
Failing test(s): innodb.instant_alter_stored_column_order

The log files in var/log may give you some hint of what went wrong.

If you want to report this error, please read first the documentation
at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html

mysql-test-run: *** ERROR: there were failing test cases
Bisecting: 4 revisions left to test after this (roughly 2 steps)


Passed:



Logging: /home/ec2-user/mysql-server/mysql-test/mysql-test-run.pl  innodb.instant_alter_stored_column_order
MySQL Version 8.0.37
Checking supported features
 - Binaries are debug compiled
Using 'all' suites
Collecting tests
Checking leftover processes
Removing old var directory
Creating var directory '/home/ec2-user/mysql-server/bld/mysql-test/var'
Installing system database
Using parallel: 1

==============================================================================
                  TEST NAME                       RESULT  TIME (ms) COMMENT
------------------------------------------------------------------------------
[ 50%] innodb.instant_alter_stored_column_order  [ pass ]   2530
[100%] shutdown_report                           [ pass ]
------------------------------------------------------------------------------
The servers were restarted 0 times
The servers were reinitialized 0 times
Spent 2.530 of 20 seconds executing testcases

Completed: All 2 tests were successful.

Bisecting: 2 revisions left to test after this (roughly 1 step)
[908d0de4420946a45bd89fd9bd79702fd5c28773] Bug #33970854  Connect timeout too long towards MGMD nodes

At the very end of the bisect log, you will see a message indicating what the “first bad commit“ is:


e6e13a8f64ca1005f0d2d4da8463fdbef3767917 is the first bad commit
commit e6e13a8f64ca1005f0d2d4da8463fdbef3767917
Author: Yasufumi Kinoshita <yasufumi.kinoshita@oracle.com>
Date:   Tue Aug 22 12:03:52 2023 +0900

    Bug#35183686: ALTER_STORED_COLUMN_ORDER with algorithm=INSTANT makes wrong INSERT REDO log

    REDO log is not logged (non instant add/del) column order change with instant DDL.
    It might cause wrong REDO log replay when recovery, very danger.

    The fix makes the column order change logged also correctly.

    Change-Id: I7e617735bb9d327136b18b5165039e1155f1fe50

 .../suite/innodb/include/instant_ddl_recovery.inc  | 23 +++++++++++
 .../innodb/r/instant_ddl_recovery_debug.result     | 45 ++++++++++++++++++++
 storage/innobase/mtr/mtr0log.cc                    | 48 ++++++++++++++++++----
 3 files changed, 107 insertions(+), 9 deletions(-)
bisect found first bad commit



In this case it identified the first bad commit as “Bug#35183686: ALTER_STORED_COLUMN_ORDER with algorithm=INSTANT makes wrong INSERT REDO log”, which as we guessed, is related to instant DDL changes in InnoDB, that was introduced in MySQL 8.0.37. This makes sense based on our repro! After reviewing release notes on 8.0.38, you will be happy to see this is already fixed so you can just upgrade![1] And remember, you have a test case; so why not test it out on 8.0.38 prior to upgrading!

[1]

InnoDB: MySQL unexpectedly halted on an UPDATE after an ALTER TABLE operation. (Bug #36571091)
References: This issue is a regression of: Bug #35183686.

ref


What now?

Now that you know the offending commit, you have a few options.

  1. If you are not on the latest MySQL version, try checking release notes and upgrading to see if the issue is fixed. In this case, the issue is fixed in 8.0.38. 
  2. If the issue is not fixed, try digging deeper into the cause of the issue. Knowing what introduced the bug is a great head start into understanding why, and a kickstart to the debugging process. Read the commit message as to why this change was made and familiarize yourself with this area of code. Once you have done this consider making a contribution to have it included in a future release. If you don’t feel comfortable or cannot fix the issue, don’t fret! You should still open a bug report including your repro, test case and let the bug verification team know what you think may be causing the issue, or at least what commit you think introduced the problem. I’m sure they would be happy with the leg up!
  3. Even if you do open a bug report, MySQL releases are quarterly and prior to release, there are no announcements if/when your contribution will be accepted or the bug fixed. So it may be a good idea, if not already fixed in another release to evaluate if any application or database parameter changes could alleviate the pain. For the example above, since its related to INSTANT DDL, one option could be to consider using other DDL types such as the INPLACE DDL algorithm, or an online schema change tool such as PTOSCGH-ost or Spirit.


Thanks for reading this far, hope this will help someone else out!

 

InnoDB data_locks: a happy ending/new beginning in MySQL 8.0.40 ! (?)

  Earlier today lefred tweeted that the new MySQL releases were rolling out today! As per the usual release process, the yum repositories t...