As of MySQL Version 3.23.6, you can choose between three basic
table formats (
MySQL may support additional table type (
BDB), depending on how you compile it.
When you create a new table, you can tell MySQL which table type it should use for the table. MySQL will always create a `.frm' file to hold the table and column definitions. Depending on the table type, the index and data will be stored in other files.
Note that to use
InnoDB tables you have to use at least
innodb_data_file_path startup option. See section 7.5.2 InnoDB Startup Options.
The default table type in MySQL is
MyISAM. If you are
trying to use a table type that is not compiled-in or activated,
MySQL will instead create a table of type
is a very useful feature when you want to copy tables between different
SQL servers that supports different table types (like copying tables to
a slave that is optimised for speed by not having transactional tables).
This automatic table changing can however also be very confusing for new
MySQL users. We plan to fix this by introducing warnings in
MySQL 4.0 and giving a warning when a table type is automatically
You can convert tables between different types with the
TABLE statement. See section 6.5.4
ALTER TABLE Syntax.
Note that MySQL supports two different kinds of
tables: transaction-safe tables (
and not transaction-safe tables (
Advantages of transaction-safe tables (TST):
ROLLBACKto ignore your changes (if you are not running in auto-commit mode).
Advantages of not transaction-safe tables (NTST):
You can combine TST and NTST tables in the same statements to get the best of both worlds.
MyISAM is the default table type in MySQL Version 3.23. It's
based on the
ISAM code and has a lot of useful extensions.
The index is stored in a file with the `.MYI' (MYIndex) extension,
and the data is stored in a file with the `.MYD' (MYData) extension.
You can check/repair
MyISAM tables with the
utility. See section 188.8.131.52 Using
myisamchk for Crash Recovery. You can compress
MyISAM tables with
myisampack to take up much less space.
See section 4.7.4
myisampack, The MySQL Compressed Read-only Table Generator.
The following is new in
MyISAMfile that indicates whether the table was closed correctly. If
mysqldis started with
MyISAMtables will automatically be checked and/or repaired on open if the table wasn't closed properly.
INSERTnew rows in a table that doesn't have free blocks in the middle of the datafile, at the same time other threads are reading from the table (concurrent insert). An free block can come from an update of a dynamic length row with much data to a row with less data or when deleting rows. When all free blocks are used up, all future inserts will be concurrent again.
MyISAMwill automatically update this on
AUTO_INCREMENTvalue can be reset with
myisamchk. This will make
AUTO_INCREMENTcolumns faster (at least 10%) and old numbers will not be reused as with the old
ISAM. Note that when an
AUTO_INCREMENTis defined on the end of a multi-part-key the old behaviour is still present.
AUTO_INCREMENTcolumn) the key tree will be split so that the high node only contains one key. This will improve the space utilisation in the key tree.
TEXTcolumns can be indexed.
NULLvalues are allowed in indexed columns. This takes 0-1 bytes/key.
myisamchkwill mark tables as checked if one runs it with
myisamchk --fastwill only check those tables that don't have this mark.
myisamchk -astores statistics for key parts (and not only for whole keys as in
DATA/INDEX DIRECTORY="path"option to
CREATE TABLE). See section 6.5.3
MyISAM also supports the following things, which MySQL
will be able to use in the near future:
VARCHARcolumn starts with a length stored in 2 bytes.
VARCHARmay have fixed or dynamic record length.
CHARmay be up to 64K. All key segments have their own language definition. This will enable MySQL to have different language definitions per column.
UNIQUE. This will allow you to have
UNIQUEon any combination of columns in a table. (You can't search on a
UNIQUEcomputed index, however.)
Note that index files are usually much smaller with
MyISAM than with
ISAM. This means that
MyISAM will normally use less
system resources than
ISAM, but will need more CPU time when inserting
data into a compressed index.
The following options to
mysqld can be used to change the behaviour of
MyISAM tables. See section 184.108.40.206
|Automatic recovery of crashed tables.|
|Buffer used when recovering tables.|
|Don't flush key buffers between writes for any MyISAM table|
|Used to help MySQL to decide when to use the slow but safe key cache index create method. Note that this parameter is given in megabytes before 4.0.3 and in bytes starting from this version.|
|Don't use the fast sort index method to created index if the temporary file would get bigger than this. Note that this parameter is given in megabytes before 4.0.3 and in bytes starting from this version.|
|Size of tree cache used in bulk insert optimisation. Note that this is a limit per thread!|
The automatic recovery is activated if you start
--myisam-recover=#. See section 4.1.1
mysqld Command-line Options.
On open, the table is checked if it's marked as crashed or if the open
count variable for the table is not 0 and you are running with
--skip-external-locking. If either of the above is true the following
If the recover wouldn't be able to recover all rows from a previous
completed statement and you didn't specify
FORCE as an option to
myisam-recover, then the automatic repair will abort with an error
message in the error file:
Error: Couldn't repair table: test.g00pages
If you in this case had used the
FORCE option you would instead have got
a warning in the error file:
Warning: Found 344 of 354 rows when repairing ./test/g00pages
Note that if you run automatic recover with the
you should have a
cron script that automatically moves file with names
like `tablename-datetime.BAK' from the database directories to a
See section 4.1.1
mysqld Command-line Options.
MySQL can support different index types, but the normal type is
ISAM or MyISAM. These use a B-tree index, and you can roughly calculate
the size for the index file as
(key_length+4)/0.67, summed over
all keys. (This is for the worst case when all keys are inserted in
sorted order and we don't have any compressed keys.)
String indexes are space compressed. If the first index part is a
string, it will also be prefix compressed. Space compression makes the
index file smaller than the above figures if the string column has a lot
of trailing space or is a
VARCHAR column that is not always used
to the full length. Prefix compression is used on keys that start
with a string. Prefix compression helps if there are many strings
with an identical prefix.
MyISAM tables, you can also prefix compress numbers by specifying
PACK_KEYS=1 when you create the table. This helps when you have
many integer keys that have an identical prefix when the numbers are stored
MyISAM supports 3 different table types. Two of them are chosen
automatically depending on the type of columns you are using. The third,
compressed tables, can only be created with the
ALTER a table you can for tables that
BLOBs force the table format to
FIXED with the
ROW_FORMAT=# table option. In the future
you will be able to compress/decompress tables by specifying
ROW_FORMAT=compressed | default to
See section 6.5.3
CREATE TABLE Syntax.
This is the default format. It's used when the table contains no
This format is the simplest and most secure format. It is also the fastest of the on-disk formats. The speed comes from the easy way data can be found on disk. When looking up something with an index and static format it is very simple. Just multiply the row number by the row length.
Also, when scanning a table it is very easy to read a constant number of records with each disk read.
The security is evidenced if your computer crashes when writing to a
fixed-size MyISAM file, in which case
myisamchk can easily figure out where each
row starts and ends. So it can usually reclaim all records except the
partially written one. Note that in MySQL all indexes can always be
DECIMALcolumns are space-padded to the column width.
myisamchk) unless a huge number of records are deleted and you want to return free disk space to the operating system.
This format is used if the table contains any
TEXT columns or if the table was created with
This format is a little more complex because each row has to have a header that says how long it is. One record can also end up at more than one location when it is made longer at an update.
You can use
OPTIMIZE table or
myisamchk to defragment a
table. If you have static data that you access/change a lot in the same
table as some
BLOB columns, it might be a good
idea to move the dynamic columns to other tables just to avoid
'') for string columns, or zero for numeric columns. (This isn't the same as columns containing
NULLvalues.) If a string column has a length of zero after removal of trailing spaces, or a numeric column has a value of zero, it is marked in the bit map and not saved to disk. Non-empty strings are saved as a length byte plus the string contents.
myisamchk -rfrom time to time to get better performance. Use
myisamchk -ei tbl_namefor some statistics.
3 + (number of columns + 7) / 8 + (number of char columns) + packed size of numeric columns + length of strings + (number of NULL columns + 7) / 8There is a penalty of 6 bytes for each link. A dynamic record is linked whenever an update causes an enlargement of the record. Each new link will be at least 20 bytes, so the next enlargement will probably go in the same link. If not, there will be another link. You may check how many links there are with
myisamchk -ed. All links may be removed with
This is a read-only type that is generated with the optional
myisampack tool (
0are stored using 1 bit.
BIGINTcolumn (8 bytes) may be stored as a
TINYINTcolumn (1 byte) if all values are in the range
The file format that MySQL uses to store data has been extensively tested, but there are always circumstances that may cause database tables to become corrupted.
Even if the MyISAM table format is very reliable (all changes to a table is written before the SQL statements returns) , you can still get corrupted tables if some of the following things happens:
mysqldprocess being killed in the middle of a write.
Typial typical symptoms for a corrupt table is:
Incorrect key file for table: '...'. Try to repair itwhile selecting data from the table.
You can check if a table is ok with the command
TABLE. See section 4.4.4
CHECK TABLE Syntax.
You can repair a corrupted table with
REPAIR TABLE. See section 4.4.5
REPAIR TABLE Syntax.
You can also repair a table, when
mysqld is not running with
If your tables get corrupted a lot you should try to find the reason for this! See section A.4.1 What To Do If MySQL Keeps Crashing.
In this case the most important thing to know is if the table got
corrupted if the
mysqld died (one can easily verify this by
checking if there is a recent row
restarted mysqld in the mysqld
error file). If this isn't the case, then you should try to make a test
case of this. See section E.1.6 Making a Test Case When You Experience Table Corruption.
MyISAM `.MYI' file has in the header a counter that can
be used to check if a table has been closed properly.
If you get the following warning from
CHECK TABLE or
# clients is using or hasn't closed the table properly
this means that this counter has come out of sync. This doesn't mean that the table is corrupted, but means that you should at least do a check on the table to verify that it's okay.
The counter works as follows:
FLUSHor because there isn't room in the table cache) the counter is decremented if the table has been updated at any point.
In other words, the only ways this can go out of sync are:
MyISAMtables are copied without a
myisamchk --update-stateon a table that was in use by
mysqldservers are using the table and one has done a
CHECKof the table while it was in use by another server. In this setup the
CHECKis safe to do (even if you will get the warning from other servers), but
REPAIRshould be avoided as it currently replaces the datafile with a new one, which is not signaled to the other servers.
MERGE tables are new in MySQL Version 3.23.25. The code
is still in gamma, but should be resonable stable.
MERGE table (also known as a
MRG_MyISAM table) is a
collection of identical
MyISAM tables that can be used as one.
You can only
UPDATE from the
collection of tables. If you
MERGE table, you
are only dropping the
DELETE FROM merge_table used without a
will only clear the mapping for the table, not delete everything in the
mapped tables. (We plan to fix this in 4.1).
With identical tables we mean that all tables are created with identical
column and key information. You can't merge tables in which the
columns are packed differently, doesn't have exactly the same columns,
or have the keys in different order. However, some of the tables can be
myisampack. See section 4.7.4
myisampack, The MySQL Compressed Read-only Table Generator.
When you create a
MERGE table, you will get a `.frm' table
definition file and a `.MRG' table list file. The `.MRG' just
contains a list of the index files (`.MYI' files) that should
be used as one. All used tables must be in the same database as the
MERGE table itself.
For the moment, you need to have
DELETE privileges on the tables you map to a
MERGE tables can help you solve the following problems:
myisampack, and then create a
MERGEto use these as one.
MERGEtable on this could be much faster than using the big table. (You can, of course, also use a RAID to get the same kind of benefits.)
MERGEtable for others. You can even have many different
MERGEtables active, with possible overlapping files.
MERGEfile than trying to repair a really big file.
MERGEtable uses the index of the individual tables. It doesn't need to maintain an index of its one. This makes
MERGEtable collections VERY fast to make or remap. Note that you must specify the key definitions when you create a
MERGEtable on them on demand. This is much faster and will save a lot of disk space.
MERGEover one table. There shouldn't be any really notable performance impacts of doing this (only a couple of indirect calls and
memcpy()calls for each read).
The disadvantages with
MERGE tables are:
MyISAMtables for a
MERGEtables uses more file descriptors. If you are using a
MERGEtable that maps over 10 tables and 10 users are using this, you are using 10*10 + 10 file descriptors. (10 datafiles for 10 users and 10 shared index files.)
MERGEhandler will need to issue a read on all underlying tables to check which one most closely matches the given key. If you then do a 'read-next' then the merge table handler will need to search the read buffers to find the next key. Only when one key buffer is used up, the handler will need to read the next key block. This makes
MERGEkeys much slower on
eq_refsearches, but not much slower on
refsearches. See section 5.2.1
EXPLAINSyntax (Get Information About a
DELETE FROM table_namewithout a
WHEREclause on any of the table that is mapped by a
MERGEtable that is 'open'. If you do this, the
MERGEtable may still refer to the original table and you will get unexpected results.
When you create a
MERGE table you have to specify with
UNION(list-of-tables) which tables you want to use as
one. Optionally you can specify with
INSERT_METHOD if you want
insert for the
MERGE table to happen in the first or last table
UNION list. If you don't specify
NO, then all
INSERT commands on the
table will return an error.
The following example shows you how to use
CREATE TABLE t1 (a INT AUTO_INCREMENT PRIMARY KEY, message CHAR(20)); CREATE TABLE t2 (a INT AUTO_INCREMENT PRIMARY KEY, message CHAR(20)); INSERT INTO t1 (message) VALUES ("Testing"),("table"),("t1"); INSERT INTO t2 (message) VALUES ("Testing"),("table"),("t2"); CREATE TABLE total (a INT AUTO_INCREMENT PRIMARY KEY, message CHAR(20)) TYPE=MERGE UNION=(t1,t2) INSERT_METHOD=LAST;
Note that you can also manipulate the `.MRG' file directly from the outside of the MySQL server:
shell> cd /mysql-data-directory/current-database shell> ls -1 t1.MYI t2.MYI > total.MRG shell> mysqladmin flush-tables
Now you can do things like:
mysql> SELECT * FROM total; +---+---------+ | a | message | +---+---------+ | 1 | Testing | | 2 | table | | 3 | t1 | | 1 | Testing | | 2 | table | | 3 | t2 | +---+---------+
Note that the
a column, though declared as
is not really unique, as
MERGE table cannot enforce uniqueness
over a set of underlying
To remap a
MERGE table you can do one of the following:
DROPthe table and re-create it
ALTER TABLE table_name UNION(...)
FLUSH TABLEon the
MERGEtable and all underlying tables to force the handler to read the new definition file.
The following are the known problems with
MERGEtable cannot maintain
UNIQUEconstraints over the whole table. When you do
INSERT, the data goes into the first or last table (according to
INSERT_METHOD=xxx) and this
MyISAMtable ensures that the data are unique, but it knows nothing about others
DELETE FROM merge_tableused without a
WHEREwill only clear the mapping for the table, not delete everything in the mapped tables.
RENAME TABLEon a table used in an active
MERGEtable may corrupt the table. This will be fixed in MySQL 4.0.x.
MERGEdoesn't check if the underlying tables are of compatible types. If you use
MERGEtables in this fashion, you are very likely to run into strange problems.
ALTER TABLEto first add an
UNIQUEindex to a table used in a
MERGEtable and then use
ALTER TABLEto add a normal index on the
MERGEtable, the key order will be different for the tables if there was an old non-unique key in the table. This is because
UNIQUEkeys before normal keys to be able to detect duplicate keys as early as possible.
MERGEtable efficiently and may sometimes produce non-optimal joins. This will be fixed in MySQL 4.0.x.
DROP TABLEon a table that is in use by a
MERGEtable will not work on Windows because the
MERGEhandler does the table mapping hidden from the upper layer of MySQL. Because Windows doesn't allow you to drop files that are open, you first must flush all
FLUSH TABLES) or drop the
MERGEtable before dropping the table. We will fix this at the same time we introduce
You can also use the deprecated
ISAM table type. This will disappear
rather soon (probably in MySQL 4.1) because
MyISAM is a better
implementation of the same thing.
ISAM uses a
B-tree index. The
index is stored in a file with the `.ISM' extension, and the data
is stored in a file with the `.ISD' extension. You can
ISAM tables with the
isamchk utility. See section 220.127.116.11 Using
myisamchk for Crash Recovery.
ISAM has the following features/properties:
Most of the things true for
MyISAM tables are also true for
tables. See section 7.1
MyISAM Tables. The major differences compared
MyISAM tables are:
ISAMtables are not binary portable across OS/Platforms.
pack_isamrather than with
If you want to convert an
ISAM table to a
MyISAM table so
that you can use utilities such as
mysqlcheck, use an
mysql> ALTER TABLE tbl_name TYPE = MYISAM;
The embedded MySQL versions doesn't support
HEAP tables use a hashed index and are stored in memory. This
makes them very fast, but if MySQL crashes you will lose all
data stored in them.
HEAP is very useful for temporary tables!
The MySQL internal
HEAP tables use 100% dynamic hashing
without overflow areas. There is no extra space needed for free lists.
HEAP tables also don't have problems with delete + inserts, which
normally is common with hashed tables:
mysql> CREATE TABLE test TYPE=HEAP SELECT ip,SUM(downloads) AS down -> FROM log_table GROUP BY ip; mysql> SELECT COUNT(ip),AVG(down) FROM test; mysql> DROP TABLE test;
Here are some things you should consider when you use
CREATEstatement to ensure that you accidentally do not use all memory.
<=>(but are VERY fast).
HEAPtables can only use whole keys to search for a row; compare this to
MyISAMtables where any prefix of the key can be used to find rows.
HEAPtables use a fixed record length format.
HEAPdoesn't support an index on a
HEAPtable (this isn't common for hashed tables).
HEAPtables are shared between all clients (just like any other table).
HEAPtables are allocated in small blocks. The tables are 100% dynamic (on inserting). No overflow areas and no extra key space are needed. Deleted rows are put in a linked list and are reused when you insert new data into the table.
HEAPtables that you want to use at the same time.
DELETE FROM heap_table,
DROP TABLE heap_table.
MyISAMtable to a
HEAPtables bigger than
The memory needed for one row in a
HEAP table is:
SUM_OVER_ALL_KEYS(max_length_of_key + sizeof(char*) * 2) + ALIGN(length_of_row+1, sizeof(char*))
sizeof(char*) is 4 on 32-bit machines and 8 on 64-bit machines.
InnoDB provides MySQL with a transaction-safe (
table handler with commit, rollback, and crash recovery capabilities.
InnoDB does locking on row level and also provides an Oracle-style
non-locking read in
SELECTs. These features increase
multiuser concurrency and performance. There is no need for
lock escalation in InnoDB,
because row level locks in InnoDB fit in very small space.
InnoDB tables support
FOREIGN KEY constraints
as the first table type in MySQL.
InnoDB has been designed for maximum performance when processing large data volumes. Its CPU efficiency is probably not matched by any other disk-based relational database engine.
Technically, InnoDB is a complete database backend placed under MySQL. InnoDB has its own buffer pool for caching data and indexes in main memory. InnoDB stores its tables and indexes in a tablespace, which may consist of several files. This is different from, for example, MyISAM tables where each table is stored as a separate file. InnoDB tables can be of any size also on those operating systems where file-size is limited to 2 GB.
You can find the latest information about InnoDB at http://www.innodb.com/. The most up-to-date version of the InnoDB manual is always placed there, and you can also order commercial licenses and support for InnoDB.
InnoDB is currently (October 2001) used in production at several large database sites requiring high performance. The famous Internet news site Slashdot.org runs on InnoDB. Mytrix, Inc. stores over 1 TB of data in InnoDB, and another site handles an average load of 800 inserts/updates per second in InnoDB.
InnoDB tables are included in the MySQL source distribution starting from 3.23.34a and are activated in the MySQL -Max binary. For Windows the -Max binaries are contained in the standard distribution.
If you have downloaded a binary version of MySQL that includes
support for InnoDB, simply follow the instructions of the
for installing a binary version of MySQL. If you already have
MySQL-3.23 installed, then the simplest way to install
MySQL -Max is to replace the server executable `mysqld'
with the corresponding executable in the -Max distribution.
MySQL and MySQL -Max differ only in the server executable.
See section 2.2.7 Installing a MySQL Binary Distribution.
See section 4.7.5
mysqld-max, An Extended
To compile MySQL with InnoDB support,
download MySQL-3.23.34a or newer version from
and configure MySQL with the
--with-innodb option. See the
about installing a MySQL source distribution.
See section 2.3 Installing a MySQL Source Distribution.
cd /path/to/source/of/mysql-3.23.37 ./configure --with-innodb
To use InnoDB you have to specify InnoDB startup options in
your `my.cnf' or `my.ini' file. The minimal way
to modify it is to add to the
[mysqld] section the line
but to get good performance it is best that you specify options as recommended. See section 7.5.2 InnoDB Startup Options.
InnoDB is distributed under the GNU GPL License Version 2 (of June 1991). In the source distribution of MySQL, InnoDB appears as a subdirectory.
To use InnoDB tables in MySQL-Max-3.23 you MUST specify configuration
[mysqld] section of
the configuration file `my.cnf', or on Windows optionally in
At the minimum, in 3.23 you must specify
where you specify the names and the sizes of datafiles. If you do
innodb_data_home_dir in `my.cnf' the default
is to create these files to the
datadir of MySQL.
If you specify
innodb_data_home_dir as an empty string,
then you can give absolute paths to your data files in
In MySQL-4.0 you do not need to specify even
innodb_data_file_path: the default for it is to create
an auto-extending 10 MB file `ibdata1' to the
of MySQL. (In MySQL-4.0.0 and 4.0.1 the datafile is 64 MB and not
If you don't want to use
InnoDB tables, you can add the
skip-innodb option to your MySQL option file.
But to get good performance you MUST explicitly set the InnoDB parameters listed in the following examples.
Starting from versions 3.23.50 and 4.0.2 InnoDB allows the last
datafile on the
to be specified as auto-extending. The syntax for
innodb_data_file_path is then the following:
pathtodatafile:sizespecification;pathtodatafile:sizespecification;... ... ;pathtodatafile:sizespecification[:autoextend[:max:sizespecification]]
If you specify the last datafile with the autoextend option, InnoDB will extend the last datafile if it runs out of free space in the tablespace. The increment is 8 MB at a time. An example:
innodb_data_home_dir = innodb_data_file_path = /ibdata/ibdata1:100M:autoextend
instructs InnoDB to create just a single datafile whose initial size is
100 MB and which is extended in 8 MB blocks when space runs out.
If the disk becomes full you may want to add another data
file to another disk, for example. Then you have to look the size
of `ibdata1', round the size downward to
the closest multiple of 1024 * 1024 bytes (= 1 MB), and specify
the rounded size of `ibdata1' explicitly in
After that you can add another datafile:
innodb_data_home_dir = innodb_data_file_path = /ibdata/ibdata1:988M;/disk2/ibdata2:50M:autoextend
Be cautious on filesystems where the maximum file-size is 2 GB! InnoDB is not aware of the OS maximum file-size. On those filesystems you might want to specify the max size for the datafile:
innodb_data_home_dir = innodb_data_file_path = /ibdata/ibdata1:100M:autoextend:max:2000M
A simple `my.cnf' example. Suppose you have a computer
with 128 MB RAM and one hard disk. Below is an example of
possible configuration parameters in `my.cnf' or
`my.ini' for InnoDB. We assume you are running
MySQL-Max-3.23.50 or later, or MySQL-4.0.2 or later.
This example suits most users, both on Unix and Windows,
who do not want to distribute InnoDB datafiles and
log files on several disks. This creates an
auto-extending data file `ibdata1' and two InnoDB log files
`ib_logfile0' and `ib_logfile1' to the
datadir of MySQL (typically `/mysql/data').
Also the small archived InnoDB log file
`ib_arch_log_0000000000' ends up in the
[mysqld] # You can write your other MySQL server options here # ... # Data file(s) must be able to # hold your data and indexes. # Make sure you have enough # free disk space. innodb_data_file_path = ibdata1:10M:autoextend # Set buffer pool size to # 50 - 80 % of your computer's # memory set-variable = innodb_buffer_pool_size=70M set-variable = innodb_additional_mem_pool_size=10M # Set the log file size to about # 25 % of the buffer pool size set-variable = innodb_log_file_size=20M set-variable = innodb_log_buffer_size=8M # Set ..flush_log_at_trx_commit # to 0 if you can afford losing # some last transactions innodb_flush_log_at_trx_commit=1
Check that the MySQL server has the rights to create files in
Note that datafiles must be < 2G in some file systems! The combined size of the log files must be < 4G. The combined size of datafiles must be >= 10 MB.
When you for the first time create an InnoDB database, it is best that you start the MySQL server from the command prompt. Then InnoDB will print the information about the database creation to the screen, and you see what is happening. See below next section what the printout should look like. For example, in Windows you can start `mysqld-max.exe' with:
Where to put `my.cnf' or `my.ini' in Windows? The rules for Windows are the following:
SETcommand of MS-DOS to print the value of WINDIR.
Where to specify options in Unix? On Unix `mysqld' reads options from the following files, if they exist, in the following order:
`COMPILATION_DATADIR' is the MySQL data directory which was
specified as a
./configure option when `mysqld'
(typically `/usr/local/mysql/data' for a binary installation or `/usr/local/var' for a source installation).
If you are not sure from where `mysqld' reads its `my.cnf'
or `my.ini', you can give the path as the first command-line
option to the server:
InnoDB forms the directory path to a datafile by textually catenating
innodb_data_home_dir to a datafile name or path in
innodb_data_file_path, adding a possible slash or
backslash in between if needed. If the keyword
innodb_data_home_dir is not mentioned in
`my.cnf' at all, the default for it is the
'dot' directory `./' which means the
datadir of MySQL.
An advanced `my.cnf' example. Suppose you have a Linux computer with 2 GB RAM and three 60 GB hard disks (at directory paths `/', `/dr2' and `/dr3'). Below is an example of possible configuration parameters in `my.cnf' for InnoDB.
Note that InnoDB does not create directories: you
have to create them yourself. Use the Unix or MS-DOS
mkdir command to create the data and log group home directories.
[mysqld] # You can write your other MySQL server options here # ... innodb_data_home_dir = # Data files must be able to # hold your data and indexes innodb_data_file_path = /ibdata/ibdata1:2000M;/dr2/ibdata/ibdata2:2000M:autoextend # Set buffer pool size to # 50 - 80 % of your computer's # memory, but make sure on Linux # x86 total memory usage is # < 2 GB set-variable = innodb_buffer_pool_size=1G set-variable = innodb_additional_mem_pool_size=20M innodb_log_group_home_dir = /dr3/iblogs # .._log_arch_dir must be the same # as .._log_group_home_dir innodb_log_arch_dir = /dr3/iblogs set-variable = innodb_log_files_in_group=3 # Set the log file size to about # 15 % of the buffer pool size set-variable = innodb_log_file_size=150M set-variable = innodb_log_buffer_size=8M # Set ..flush_log_at_trx_commit to # 0 if you can afford losing # some last transactions innodb_flush_log_at_trx_commit=1 set-variable = innodb_lock_wait_timeout=50 #innodb_flush_method=fdatasync #set-variable = innodb_thread_concurrency=5
Note that we have placed the two datafiles on different disks. InnoDB will fill the tablespace formed by the datafiles from bottom up. In some cases it will improve the performance of the database if all data is not placed on the same physical disk. Putting log files on a different disk from data is very often beneficial for performance. You can also use raw disk partitions (raw devices) as datafiles. In some Unixes they speed up I/O. See the manual section on InnoDB file space management about how to specify them in `my.cnf'.
Warning: on Linux x86 you must be careful you do not set memory usage too high. glibc will allow the process heap to grow over thread stacks, which will crash your server. It is a risk if the value of
innodb_buffer_pool_size + key_buffer + max_connections * (sort_buffer + read_buffer_size) + max_connections * 2 MB
is close to 2 GB or exceeds 2 GB. Each thread will use a stack
(often 2 MB, but in MySQL AB binaries only 256 kB) and in the worst case also
sort_buffer + read_buffer_size
How to tune other `mysqld' server parameters? Typical values which suit most users are:
skip-locking set-variable = max_connections=200 set-variable = read_buffer_size=1M set-variable = sort_buffer=1M # Set key_buffer to 5 - 50% # of your RAM depending on how # much you use MyISAM tables, but # keep key_buffer + InnoDB # buffer pool size < 80% of # your RAM set-variable = key_buffer=...
Note that some parameters are given using the numeric `my.cnf'
set-variable = innodb... = 123, others
(string and boolean parameters) with another format:
innodb_... = ... .
The meanings of the configuration parameters are the following:
The common part of the directory path for all InnoDB datafiles.
If you do not mentioned this option in `my.cnf'
the default is the |
|Paths to individual datafiles and their sizes. The full directory path to each datafile is acquired by concatenating innodb_data_home_dir to the paths specified here. The file-sizes are specified in megabytes, hence the 'M' after the size specification above. InnoDB also understands the abbreviation 'G', 1G meaning 1024M. Starting from 3.23.44 you can set the file-size bigger than 4 GB on those operating systems which support big files. On some operating systems files must be < 2 GB. The sum of the sizes of the files must be at least 10 MB.|
|Number of identical copies of log groups we keep for the database. Currently this should be set to 1.|
|Directory path to InnoDB log files.|
|Number of log files in the log group. InnoDB writes to the files in a circular fashion. Value 3 is recommended here.|
|Size of each log file in a log group in megabytes. Sensible values range from 1M to 1/nth of the size of the buffer pool specified below, where n is the number of log files in the group. The bigger the value, the less checkpoint flush activity is needed in the buffer pool, saving disk I/O. But bigger log files also mean that recovery will be slower in case of a crash. The combined size of log files must be < 4 GB on 32-bit computers.|
|The size of the buffer which InnoDB uses to write log to the log files on disk. Sensible values range from 1M to 8M. A big log buffer allows large transactions to run without a need to write the log to disk until the transaction commit. Thus, if you have big transactions, making the log buffer big will save disk I/O.|
|Normally this is set to 1, meaning that at a transaction commit the log is flushed to disk, and the modifications made by the transaction become permanent, and survive a database crash. If you are willing to compromise this safety, and you are running small transactions, you may set this to 0 to reduce disk I/O to the logs.|
The directory where fully written log files would be archived if we used
log archiving. The value of this parameter should currently be set the
same as |
|This value should currently be set to 0. As recovery from a backup is done by MySQL using its own log files, there is currently no need to archive InnoDB log files.|
|The size of the memory buffer InnoDB uses to cache data and indexes of its tables. The bigger you set this the less disk I/O is needed to access data in tables. On a dedicated database server you may set this parameter up to 80% of the machine physical memory size. Do not set it too large, though, because competition of the physical memory may cause paging in the operating system.|
|Size of a memory pool InnoDB uses to store data dictionary information and other internal data structures. A sensible value for this might be 2M, but the more tables you have in your application the more you will need to allocate here. If InnoDB runs out of memory in this pool, it will start to allocate memory from the operating system, and write warning messages to the MySQL error log.|
|Number of file I/O threads in InnoDB. Normally, this should be 4, but on Windows disk I/O may benefit from a larger number.|
Timeout in seconds an InnoDB transaction may wait for a lock before
being rolled back. InnoDB automatically detects transaction deadlocks
in its own lock table and rolls back the transaction. If you use
(Available from 3.23.40 up.)
The default value for this is |
Suppose you have installed MySQL and have edited `my.cnf' so that it contains the necessary InnoDB configuration parameters. Before starting MySQL you should check that the directories you have specified for InnoDB datafiles and log files exist and that you have access rights to those directories. InnoDB cannot create directories, only files. Check also you have enough disk space for the data and log files.
When you now start MySQL, InnoDB will start creating your datafiles and log files. InnoDB will print something like the following:
~/mysqlm/sql > mysqld InnoDB: The first specified datafile /home/heikki/data/ibdata1 did not exist: InnoDB: a new database to be created! InnoDB: Setting file /home/heikki/data/ibdata1 size to 134217728 InnoDB: Database physically writes the file full: wait... InnoDB: datafile /home/heikki/data/ibdata2 did not exist: new to be created InnoDB: Setting file /home/heikki/data/ibdata2 size to 262144000 InnoDB: Database physically writes the file full: wait... InnoDB: Log file /home/heikki/data/logs/ib_logfile0 did not exist: new to be created InnoDB: Setting log file /home/heikki/data/logs/ib_logfile0 size to 5242880 InnoDB: Log file /home/heikki/data/logs/ib_logfile1 did not exist: new to be created InnoDB: Setting log file /home/heikki/data/logs/ib_logfile1 size to 5242880 InnoDB: Log file /home/heikki/data/logs/ib_logfile2 did not exist: new to be created InnoDB: Setting log file /home/heikki/data/logs/ib_logfile2 size to 5242880 InnoDB: Started mysqld: ready for connections
A new InnoDB database has now been created. You can connect to the MySQL
server with the usual MySQL client programs like
When you shut down the MySQL server with `mysqladmin shutdown',
InnoDB output will be like the following:
010321 18:33:34 mysqld: Normal shutdown 010321 18:33:34 mysqld: Shutdown Complete InnoDB: Starting shutdown... InnoDB: Shutdown completed
You can now look at the datafiles and logs directories and you will see the files created. The log directory will also contain a small file named `ib_arch_log_0000000000'. That file resulted from the database creation, after which InnoDB switched off log archiving. When MySQL is again started, the output will be like the following:
~/mysqlm/sql > mysqld InnoDB: Started mysqld: ready for connections
If InnoDB prints an operating system error in a file operation, usually the problem is one of the following:
If something goes wrong in an InnoDB database creation, you should delete all files created by InnoDB. This means all datafiles, all log files, the small archived log file, and in the case you already did create some InnoDB tables, delete also the corresponding `.frm' files for these tables from the MySQL database directories. Then you can try the InnoDB database creation again.
Suppose you have started the MySQL client with the command
To create a table in the InnoDB format you must specify
TYPE = InnoDB in the table creation SQL command:
CREATE TABLE CUSTOMER (A INT, B CHAR (20), INDEX (A)) TYPE = InnoDB;
This SQL command will create a table and an index on column
into the InnoDB tablespace consisting of the datafiles you specified
in `my.cnf'. In addition MySQL will create a file
`CUSTOMER.frm' to the MySQL database directory `test'.
Internally, InnoDB will add to its own data dictionary an entry
'test/CUSTOMER'. Thus you can create a table
of the same name
CUSTOMER in another database of MySQL, and
the table names will not collide inside InnoDB.
You can query the amount of free space in the InnoDB tablespace
by issuing the table status command of MySQL for any table you have
TYPE = InnoDB. Then the amount of free
space in the tablespace appears in the table comment section in the
SHOW. An example:
SHOW TABLE STATUS FROM test LIKE 'CUSTOMER'
Note that the statistics
SHOW gives about InnoDB tables
are only approximate: they are used in SQL optimisation. Table and
index reserved sizes in bytes are accurate, though.
InnoDB does not have a special optimisation for separate index creation.
Therefore it does not pay to export and import the table and create indexes
The fastest way to alter a table to InnoDB is to do the inserts
directly to an InnoDB table, that is, use
ALTER TABLE ... TYPE=INNODB,
or create an empty InnoDB table with identical definitions and insert
the rows with
INSERT INTO ... SELECT * FROM ....
To get better control over the insertion process, it may be good to insert big tables in pieces:
INSERT INTO newtable SELECT * FROM oldtable WHERE yourkey > something AND yourkey <= somethingelse;
After all data has been inserted you can rename the tables.
During the conversion of big tables you should set the InnoDB buffer pool size big to reduce disk I/O. Not bigger than 80% of the physical memory, though. You should set InnoDB log files big, and also the log buffer large.
Make sure you do not run out of tablespace: InnoDB tables take a lot
more space than MyISAM tables. If an
ALTER TABLE runs out
of space, it will start a rollback, and that can take hours if it is
In inserts InnoDB uses the insert buffer to merge secondary index records
to indexes in batches. That saves a lot of disk I/O. In rollback no such
mechanism is used, and the rollback can take 30 times longer than the
In the case of a runaway rollback, if you do not have valuable data in your database, it is better that you kill the database process and delete all InnoDB data and log files and all InnoDB table `.frm' files, and start your job again, rather than wait for millions of disk I/Os to complete.
Starting from version 3.23.43b InnoDB features foreign key constraints. InnoDB is the first MySQL table type which allows you to define foreign key constraints to guard the integrity of your data.
The syntax of a foreign key constraint definition in InnoDB:
[CONSTRAINT symbol] FOREIGN KEY (index_col_name, ...) REFERENCES table_name (index_col_name, ...) [ON DELETE CASCADE | ON DELETE SET NULL]
Both tables have to be InnoDB type and there must be an index where the foreign key and the referenced key are listed as the first columns. InnoDB does not auto-create indexes on foreign keys or referenced keys: you have to create them explicitly.
Corresponding columns in the foreign key and the referenced key must have similar internal data types inside InnoDB so that they can be compared without a type conversion. The size and the signedness of integer types has to be the same. The length of string types need not be the same.
Starting from version 3.23.50 you can also associate the
ON DELETE CASCADE or
ON DELETE SET NULL
clause with the foreign key constraint.
ON DELETE CASCADE is specified, and a row in the parent
table is deleted, then InnoDB automatically deletes also all those rows
in the child table whose foreign key values are equal to
the referenced key value in the parent row. If
ON DELETE SET NULL
is specified, the child rows are automatically updated so that the
columns in the foreign key are set to the SQL NULL value.
Starting from version 3.23.50, InnoDB does not check foreign key constraints on those foreign key or referenced key values which contain a NULL column.
Starting from version 3.23.50 the InnoDB parser allows you to
use backquotes (`) around table and column names in the
FOREIGN KEY ... REFERENCES ... clause
but the InnoDB parser is not yet aware of the option
lower_case_table_names you can specify in `my.cnf'.
CREATE TABLE parent(id INT NOT NULL, PRIMARY KEY (id)) TYPE=INNODB; CREATE TABLE child(id INT, parent_id INT, INDEX par_ind (parent_id), FOREIGN KEY (parent_id) REFERENCES parent(id) ON DELETE SET NULL ) TYPE=INNODB;
If MySQL gives the error number 1005 from a
statement, and the error message string refers to errno 150, then
the table creation failed because a foreign key constraint was not
Similarly, if an
ALTER TABLE fails and it refers to errno
150, that means a foreign key definition would be incorrectly
formed for the altered table.
Starting from version 3.23.50 InnoDB allows you to add a new foreign key constraint to a table through
ALTER TABLE yourtablename ADD [CONSTRAINT symbol] FOREIGN KEY (...) REFERENCES anothertablename(...)
Remember to create the required indexes first, though.
In InnoDB versions < 3.23.50
should not be used in connection with tables which have foreign
key constraints or which are referenced in foreign key constraints:
ALTER TABLE removes all foreign key
constrainst defined for the table. You should not use
ALTER TABLE to the referenced table either, but
DROP TABLE and
CREATE TABLE to modify the
schema. When MySQL does an
ALTER TABLE it may internally
RENAME TABLE, and that will confuse the
foreign key costraints which refer to the table.
CREATE INDEX statement is in MySQL
processed as an
ALTER TABLE, and these
restrictions apply also to it.
When doing foreign key checks InnoDB sets shared row level locks on child or parent records it has to look at. InnoDB checks foreign key constraints immediately: the check is not deferred to transaction commit.
InnoDB allows you to drop any table even though that would break the foreign key constraints which reference the table. When you drop a table the constraints which were defined in its create statement are also dropped.
If you re-create a table which was dropped, it has to have a definition which conforms to the foreign key constraints referencing it. It must have the right column names and types, and it must have indexes on the referenced keys, as stated above. If these are not satisfied, MySQL returns error number 1005 and refers to errno 150 in the error message string.
Starting from version 3.23.50 InnoDB returns the foreign key definitions of a table when you call
SHOW CREATE TABLE yourtablename
Then also `mysqldump' produces correct definitions of tables to the dump file, and does not forget about the foreign keys.
You can also list the foreign key constraints for a table
SHOW TABLE STATUS FROM yourdatabasename LIKE 'T'
The foreign key constraints are listed in the table comment of the output.
From version 3.23.50 and 4.0.2 you can specify the last InnoDB datafile
autoextend. Alternatively, you can increase to your tablespace
by specifying an additional datafile. To do this you have to shut down
the MySQL server, edit the `my.cnf' file adding a new datafile
innodb_data_file_path, and then start the MySQL server again.
Currently you cannot remove a datafile from InnoDB. To decrease the size of your database you have to use `mysqldump' to dump all your tables, create a new database, and import your tables to the new database.
If you want to change the number or the size of your InnoDB log files, you have to shut down MySQL and make sure that it shuts down without errors. Then copy the old log files into a safe place just in case something went wrong in the shutdown and you will need them to recover the database. Delete then the old log files from the log file directory, edit `my.cnf', and start MySQL again. InnoDB will tell you at the startup that it is creating new log files.
The key to safe database management is taking regular backups.
InnoDB Hot Backup is an online backup tool you can use to backup your InnoDB database while it is running. InnoDB Hot Backup does not require you to shut down your database and it does not set any locks or disturb your normal database processing. InnoDB Hot Backup is a non-free additional tool which is not included in the standard MySQL distribution. See the InnoDB Hot Backup homepage http://www.innodb.com/hotbackup.html for detailed information and screenshots.
If you are able to shut down your MySQL server, then to take a 'binary' backup of your database you have to do the following:
In addition to taking the binary backups described above, you should also regularly take dumps of your tables with `mysqldump'. The reason to this is that a binary file may be corrupted without you noticing it. Dumped tables are stored into text files which are human-readable and much simpler than database binary files. Seeing table corruption from dumped files is easier, and since their format is simpler, the chance for serious data corruption in them is smaller.
A good idea is to take the dumps at the same time you take a binary backup of your database. You have to shut out all clients from your database to get a consistent snapshot of all your tables into your dumps. Then you can take the binary backup, and you will then have a consistent snapshot of your database in two formats.
To be able to recover your InnoDB database to the present from the binary backup described above, you have to run your MySQL database with the general logging and log archiving of MySQL switched on. Here by the general logging we mean the logging mechanism of the MySQL server which is independent of InnoDB logs.
To recover from a crash of your MySQL server process, the only thing you have to do is to restart it. InnoDB will automatically check the logs and perform a roll-forward of the database to the present. InnoDB will automatically roll back uncommitted transactions which were present at the time of the crash. During recovery, InnoDB will print out something like the following:
~/mysqlm/sql > mysqld InnoDB: Database was not shut down normally. InnoDB: Starting recovery from log files... InnoDB: Starting log scan based on checkpoint at InnoDB: log sequence number 0 13674004 InnoDB: Doing recovery: scanned up to log sequence number 0 13739520 InnoDB: Doing recovery: scanned up to log sequence number 0 13805056 InnoDB: Doing recovery: scanned up to log sequence number 0 13870592 InnoDB: Doing recovery: scanned up to log sequence number 0 13936128 ... InnoDB: Doing recovery: scanned up to log sequence number 0 20555264 InnoDB: Doing recovery: scanned up to log sequence number 0 20620800 InnoDB: Doing recovery: scanned up to log sequence number 0 20664692 InnoDB: 1 uncommitted transaction(s) which must be rolled back InnoDB: Starting rollback of uncommitted transactions InnoDB: Rolling back trx no 16745 InnoDB: Rolling back of trx no 16745 completed InnoDB: Rollback of uncommitted transactions completed InnoDB: Starting an apply batch of log records to the database... InnoDB: Apply batch completed InnoDB: Started mysqld: ready for connections
If your database gets corrupted or your disk fails, you have to do the recovery from a backup. In the case of corruption, you should first find a backup which is not corrupted. From a backup do the recovery from the general log files of MySQL according to instructions in the MySQL manual.
InnoDB implements a checkpoint mechanism called a fuzzy checkpoint. InnoDB will flush modified database pages from the buffer pool in small batches, there is no need to flush the buffer pool in one single batch, which would in practice stop processing of user SQL statements for a while.
In crash recovery InnoDB looks for a checkpoint label written to the log files. It knows that all modifications to the database before the label are already present on the disk image of the database. Then InnoDB scans the log files forward from the place of the checkpoint applying the logged modifications to the database.
InnoDB writes to the log files in a circular fashion. All committed modifications which make the database pages in the buffer pool different from the images on disk must be available in the log files in case InnoDB has to do a recovery. This means that when InnoDB starts to reuse a log file in the circular fashion, it has to make sure that the database page images on disk already contain the modifications logged in the log file InnoDB is going to reuse. In other words, InnoDB has to make a checkpoint and often this involves flushing of modified database pages to disk.
The above explains why making your log files very big may save disk I/O in checkpointing. It can make sense to set the total size of the log files as big as the buffer pool or even bigger. The drawback in big log files is that crash recovery can last longer because there will be more log to apply to the database.
InnoDB data and log files are binary-compatible on all platforms
if the floating-point number format on the machines is the same.
You can move an InnoDB database simply by copying all the relevant
files, which we already listed in the previous section on backing up
a database. If the floating-point formats on the machines are
different but you have not used
data types in your tables then the procedure is the same: just copy
the relevant files. If the formats are different and your tables
contain floating-point data, you have to use `mysqldump'
and `mysqlimport' to move those tables.
A performance tip is to switch off auto-commit mode when you import data into your database, assuming your tablespace has enough space for the big rollback segment the big import transaction will generate. Do the commit only after importing a whole table or a segment of a table.
In the InnoDB transaction model the goal has been to combine the best properties of a multi-versioning database to traditional two-phase locking. InnoDB does locking on row level and runs queries by default as non-locking consistent reads, in the style of Oracle. The lock table in InnoDB is stored so space-efficiently that lock escalation is not needed: typically several users are allowed to lock every row in the database, or any random subset of the rows, without InnoDB running out of memory.
In InnoDB all user activity happens inside transactions. If the
auto-commit mode is used in MySQL, then each SQL statement
will form a single transaction. If the auto-commit mode is
switched off, then we can think that a user always has a transaction
open. If he issues
ROLLBACK statement, that
ends the current transaction, and a new starts. Both statements
will release all InnoDB locks that were set during the
current transaction. A
COMMIT means that the
changes made in the current transaction are made permanent
and become visible to other users. A
on the other hand cancels all modifications made by the current
A consistent read means that InnoDB uses its multi-versioning to present to a query a snapshot of the database at a point in time. The query will see the changes made by exactly those transactions that committed before that point of time, and no changes made by later or uncommitted transactions. The exception to this rule is that the query will see the changes made by the transaction itself which issues the query.
When a transaction issues its first consistent read, InnoDB assigns the snapshot, or the point of time, which all consistent reads in the same transaction will use. In the snapshot are all transactions that committed before assigning the snapshot. Thus the consistent reads within the same transaction will also be consistent with respect to each other. You can get a fresher snapshot for your queries by committing the current transaction and after that issuing new queries.
Consistent read is the default mode in which InnoDB processes
SELECT statements. A consistent read does not set any locks
on the tables it accesses, and therefore other users are free to
modify those tables at the same time a consistent read is being performed
on the table.
A consistent read is not convenient in some circumstances.
Suppose you want to add a new row into your table
and make sure that the child already has a parent in table
Suppose you use a consistent read to read the table
and indeed see the parent of the child in the table. Can you now safely
add the child row to table
CHILD? No, because it may
happen that meanwhile some other user has deleted the parent row
from the table
PARENT, and you are not aware of that.
The solution is to perform the
SELECT in a locking
LOCK IN SHARE MODE.
SELECT * FROM PARENT WHERE NAME = 'Jones' LOCK IN SHARE MODE;
Performing a read in share mode means that we read the latest
available data, and set a shared mode lock on the rows we read.
If the latest data belongs to a yet uncommitted transaction of another
user, we will wait until that transaction commits.
A shared mode lock prevents others from updating or deleting
the row we have read. After we see that the above query returns
'Jones', we can safely add his child
CHILD, and commit our transaction.
This example shows how to implement referential
integrity in your application code.
Let us look at another example: we have an integer counter field in
CHILD_CODES which we use to assign
a unique identifier to each child we add to table
Obviously, using a consistent read or a shared mode read
to read the present value of the counter is not a good idea, since
then two users of the database may see the same value for the
counter, and we will get a duplicate key error when we add
the two children with the same identifier to the table.
In this case there are two good ways to implement the
reading and incrementing of the counter: (1) update the counter
first by incrementing it by 1 and only after that read it,
or (2) read the counter first with
a lock mode
FOR UPDATE, and increment after that:
SELECT COUNTER_FIELD FROM CHILD_CODES FOR UPDATE; UPDATE CHILD_CODES SET COUNTER_FIELD = COUNTER_FIELD + 1;
SELECT ... FOR UPDATE will read the latest
available data setting exclusive locks on each row it reads.
Thus it sets the same locks a searched SQL
UPDATE would set
on the rows.
In row level locking InnoDB uses an algorithm called next-key locking. InnoDB does the row level locking so that when it searches or scans an index of a table, it sets shared or exclusive locks on the index records in encounters. Thus the row level locks are more precisely called index record locks.
The locks InnoDB sets on index records also affect the 'gap'
before that index record. If a user has a shared or exclusive
lock on record R in an index, then another user cannot insert
a new index record immediately before R in the index order.
This locking of gaps is done to prevent the so-called phantom
problem. Suppose I want to read and lock all children with identifier
bigger than 100 from table
and update some field in the selected rows.
SELECT * FROM CHILD WHERE ID > 100 FOR UPDATE;
Suppose there is an index on table
CHILD on column
ID. Our query will scan that index starting from
the first record where
ID is bigger than 100.
Now, if the locks set on the index records would not lock out
inserts made in the gaps, a new child might meanwhile be
inserted to the table. If now I in my transaction execute
SELECT * FROM CHILD WHERE ID > 100 FOR UPDATE;
again, I will see a new child in the result set the query returns. This is against the isolation principle of transactions: a transaction should be able to run so that the data it has read does not change during the transaction. If we regard a set of rows as a data item, then the new 'phantom' child would break this isolation principle.
When InnoDB scans an index it can also lock the gap
after the last record in the index. Just that happens in the previous
example: the locks set by InnoDB will prevent any insert to
the table where
ID would be bigger than 100.
You can use next-key locking to implement a uniqueness check in your application: if you read your data in share mode and do not see a duplicate for a row you are going to insert, then you can safely insert your row and know that the next-key lock set on the successor of your row during the read will prevent anyone meanwhile inserting a duplicate for your row. Thus the next-key locking allows you to 'lock' the non-existence of something in your table.
SELECT ... FROM ...: this is a consistent read, reading a snapshot of the database and setting no locks.
SELECT ... FROM ... LOCK IN SHARE MODE: sets shared next-key locks on all index records the read encounters.
SELECT ... FROM ... FOR UPDATE: sets exclusive next-key locks on all index records the read encounters.
INSERT INTO ... VALUES (...): sets an exclusive lock on the inserted row; note that this lock is not a next-key lock and does not prevent other users from inserting to the gap before the inserted row. If a duplicate key error occurs, sets a shared lock on the duplicate index record.
INSERT INTO T SELECT ... FROM S WHERE ...sets an exclusive (non-next-key) lock on each row inserted into
T. Does the search on
Sas a consistent read, but sets shared next-key locks on
Sif the MySQL logging is on. InnoDB has to set locks in the latter case because in roll-forward recovery from a backup every SQL statement has to be executed in exactly the same way as it was done originally.
CREATE TABLE ... SELECT ...performs the
SELECTas a consistent read or with shared locks, like in the previous item.
REPLACEis done like an insert if there is no collision on a unique key. Otherwise, an exclusive next-key lock is placed on the row which has to be updated.
UPDATE ... SET ... WHERE ...: sets an exclusive next-key lock on every record the search encounters.
DELETE FROM ... WHERE ...: sets an exclusive next-key lock on every record the search encounters.
FOREIGN KEYconstraint is defined on a table, any insert, update, or delete which requires checking of the constraint condition sets shared record level locks on the records it looks at to check the constraint. Also in the case where the constraint fails, InnoDB sets these locks.
LOCK TABLES ...: sets table locks. In the implementation the MySQL layer of code sets these locks. The automatic deadlock detection of InnoDB cannot detect deadlocks where such table locks are involved: see the following section. Also, since MySQL does know about row level locks, it is possible that you get a table lock on a table where another user currently has row level locks. But that does not put transaction integerity into danger. See section 7.5.13 Restrictions on InnoDB Tables.
InnoDB automatically detects a deadlock of transactions and rolls
back the transaction whose lock request was the last one to build
a deadlock, that is, a cycle in the waits-for graph of transactions.
InnoDB cannot detect deadlocks where a lock set by a MySQL
LOCK TABLES statement is involved, or if a lock set
in another table handler than InnoDB is involved. You have to resolve
these situations using
innodb_lock_wait_timeout set in
When InnoDB performs a complete rollback of a transaction, all the locks of the transaction are released. However, if just a single SQL statement is rolled back as a result of an error, some of the locks set by the SQL statement may be preserved. This is because InnoDB stores row locks in a format where it cannot afterwards know which was set by which SQL statement.
When you issue a consistent read, that is, an ordinary
statement, InnoDB will give your transaction a timepoint according
to which your query sees the database. Thus, if transaction B deletes
a row and commits after your timepoint was assigned, then you will
not see the row deleted. Similarly with inserts and updates.
You can advance your timepoint by committing your transaction
and then doing another
This is called multi-versioned concurrency control.
User A User B SET AUTOCOMMIT=0; SET AUTOCOMMIT=0; time | SELECT * FROM t; | empty set | INSERT INTO t VALUES (1, 2); | v SELECT * FROM t; empty set COMMIT; SELECT * FROM t; empty set; COMMIT; SELECT * FROM t; --------------------- | 1 | 2 | ---------------------
Thus user A sees the row inserted by B only when B has committed the insert, and A has committed his own transaction so that the timepoint is advanced past the commit of B.
If you want to see the ``freshest'' state of the database, you should use a locking read:
SELECT * FROM t LOCK IN SHARE MODE;
Deadlocks are a classic problem in transactional databases, but they are not dangerous unless they are so frequent that you cannot run certain transactions at all. Normally you have to write your applications so that they are always prepared to re-issue a transaction if it gets rolled back because of a deadlock.
InnoDB uses automatic row level locking. You can get deadlocks even in the case of transactions which just insert or delete a single row. That is because these operations are not really 'atomic': they automatically set locks on the (possibly several) index records of the row inserted/deleted.
You can cope with deadlocks and reduce the number of them with the following tricks:
SELECTto return data from an old snapshot, do not add the clause
LOCK IN SHARE MODEto it.
LOCK TABLES t1 WRITE, t2 READ, ... ; [do something with tables t1 and t2 here]; UNLOCK TABLES.Table level locks make you transactions to queue nicely, and deadlocks are avoided. Note that
LOCK TABLESimplicitly starts a transaction, just like the command
UNLOCK TABLESimplicitly ends the transaction in a
1. If the Unix `top' or the Windows `Task Manager' shows that the CPU usage percentage with your workload is less than 70%, your workload is probably disk-bound. Maybe you are making too many transaction commits, or the buffer pool is too small. Making the buffer pool bigger can help, but do not set it bigger than 80% of physical memory.
2. Wrap several modifications into one transaction. InnoDB must flush the log to disk at each transaction commit, if that transaction made modifications to the database. Since the rotation speed of a disk is typically at most 167 revolutions/second, that constrains the number of commits to the same 167/second if the disk does not fool the operating system.
If you can afford the loss of some latest committed transactions, you can
set the `my.cnf' parameter
to zero. InnoDB tries to flush the log anyway once in a second,
though the flush is not guaranteed.
4. Make your log files big, even as big as the buffer pool. When InnoDB has written the log files full, it has to write the modified contents of the buffer pool to disk in a checkpoint. Small log files will cause many unnecessary disk writes. The drawback in big log files is that recovery time will be longer.
5. Also the log buffer should be quite big, say 8 MB.
6. (Relevant from 3.23.39 up.)
In some versions of Linux and Unix, flushing files to disk with the Unix
fdatasync and other similar methods is surprisingly slow.
The default method InnoDB uses is the
If you are not satisfied with the database write performance, you may
innodb_flush_method in `my.cnf'
O_DSYNC, though O_DSYNC seems to be slower on most systems.
7. In importing data to InnoDB, make sure that MySQL does not have
autocommit=1 on. Then every insert requires a log flush to disk.
Put before your plain SQL import file line
and after it
If you use the `mysqldump' option
--opt, you will get dump
files which are fast to import also to an InnoDB table, even without wrapping
them to the above
SET AUTOCOMMIT=0; ... COMMIT; wrappers.
8. Beware of big rollbacks of mass inserts: InnoDB uses the insert buffer to save disk I/O in inserts, but in a corresponding rollback no such mechanism is used. A disk-bound rollback can take 30 times the time of the corresponding insert. Killing the database process will not help because the rollback will start again at the database startup. The only way to get rid of a runaway rollback is to increase the buffer pool so that the rollback becomes CPU-bound and runs fast, or delete the whole InnoDB database.
Beware also of other big disk-bound operations.
DROP TABLE or
TRUNCATE (from MySQL-4.0 up) to empty a
DELETE FROM yourtable.
Use the multi-line
INSERT to reduce
communication overhead between the client and the server if you need
to insert many rows:
INSERT INTO yourtable VALUES (1, 2), (5, 5);
This tip is of course valid for inserts into any table type, not just InnoDB.
Starting from version 3.23.41 InnoDB includes the InnoDB
Monitor which prints information on the InnoDB internal state.
When switched on, InnoDB Monitor
will make the MySQL server `mysqld' to print data
(note: the MySQL client will not print anything)
to the standard
output about once every 15 seconds. This data is useful in
On Windows you must start
from a MS-DOS prompt
options to direct the output to the MS-DOS prompt
There is a separate
prints the same information as
plus information on locks set by each transaction.
The printed information includes data on:
You can start InnoDB Monitor through the following SQL command:
CREATE TABLE innodb_monitor(a int) type = innodb;
and stop it by
DROP TABLE innodb_monitor;
CREATE TABLE syntax is just a way to pass a command
to the InnoDB engine through the MySQL SQL parser: the created
table is not relevant at all for InnoDB Monitor. If you shut down
the database when the monitor is running, and you want to start
the monitor again, you have to drop the
table before you can issue a new
to start the monitor.
This syntax may change in a future release.
A sample output of the InnoDB Monitor:
================================ 010809 18:45:06 INNODB MONITOR OUTPUT ================================ -------------------------- LOCKS HELD BY TRANSACTIONS -------------------------- LOCK INFO: Number of locks in the record hash table 1294 LOCKS FOR TRANSACTION ID 0 579342744 TABLE LOCK table test/mytable trx id 0 582333343 lock_mode IX RECORD LOCKS space id 0 page no 12758 n bits 104 table test/mytable index PRIMARY trx id 0 582333343 lock_mode X Record lock, heap no 2 PHYSICAL RECORD: n_fields 74; 1-byte offs FALSE; info bits 0 0: len 4; hex 0001a801; asc ;; 1: len 6; hex 000022b5b39f; asc ";; 2: len 7; hex 000002001e03ec; asc ;; 3: len 4; hex 00000001; ... ----------------------------------------------- CURRENT SEMAPHORES RESERVED AND SEMAPHORE WAITS ----------------------------------------------- SYNC INFO: Sorry, cannot give mutex list info in non-debug version! Sorry, cannot give rw-lock list info in non-debug version! ----------------------------------------------------- SYNC ARRAY INFO: reservation count 6041054, signal count 2913432 4a239430 waited for by thread 49627477 op. S-LOCK file NOT KNOWN line 0 Mut ex 0 sp 5530989 r 62038708 sys 2155035; rws 0 8257574 8025336; rwx 0 1121090 1848344 ----------------------------------------------------- CURRENT PENDING FILE I/O'S -------------------------- Pending normal aio reads: Reserved slot, messages 40157658 4a4a40b8 Reserved slot, messages 40157658 4a477e28 ... Reserved slot, messages 40157658 4a4424a8 Reserved slot, messages 40157658 4a39ea38 Total of 36 reserved aio slots Pending aio writes: Total of 0 reserved aio slots Pending insert buffer aio reads: Total of 0 reserved aio slots Pending log writes or reads: Reserved slot, messages 40158c98 40157f98 Total of 1 reserved aio slots Pending synchronous reads or writes: Total of 0 reserved aio slots ----------- BUFFER POOL ----------- LRU list length 8034 Free list length 0 Flush list length 999 Buffer pool size in pages 8192 Pending reads 39 Pending writes: LRU 0, flush list 0, single page 0 Pages read 31383918, created 51310, written 2985115 ---------------------------- END OF INNODB MONITOR OUTPUT ============================ 010809 18:45:22 InnoDB starts purge 010809 18:45:22 InnoDB purged 0 pages
Some notes on the output:
UNIV_SYNC_DEBUGdefined in `univ.i'.
Since InnoDB is a multi-versioned database, it must keep information of old versions of rows in the tablespace. This information is stored in a data structure we call a rollback segment after an analogous data structure in Oracle.
InnoDB internally adds two fields to each row stored in the database. A 6-byte field tells the transaction identifier for the last transaction which inserted or updated the row. Also a deletion is internally treated as an update where a special bit in the row is set to mark it as deleted. Each row also contains a 7-byte field called the roll pointer. The roll pointer points to an undo log record written to the rollback segment. If the row was updated, then the undo log record contains the information necessary to rebuild the content of the row before it was updated.
InnoDB uses the information in the rollback segment to perform the undo operations needed in a transaction rollback. It also uses the information to build earlier versions of a row for a consistent read.
Undo logs in the rollback segment are divided into insert and update undo logs. Insert undo logs are only needed in transaction rollback and can be discarded as soon as the transaction commits. Update undo logs are used also in consistent reads, and they can be discarded only after there is no transaction present for which InnoDB has assigned a snapshot that in a consistent read could need the information in the update undo log to build an earlier version of a database row.
You must remember to commit your transactions regularly, also those transactions which only issue consistent reads. Otherwise InnoDB cannot discard data from the update undo logs, and the rollback segment may grow too big, filling up your tablespace.
The physical size of an undo log record in the rollback segment is typically smaller than the corresponding inserted or updated row. You can use this information to calculate the space need for your rollback segment.
In our multi-versioning scheme a row is not physically removed from the database immediately when you delete it with an SQL statement. Only when InnoDB can discard the update undo log record written for the deletion, it can also physically remove the corresponding row and its index records from the database. This removal operation is called a purge, and it is quite fast, usually taking the same order of time as the SQL statement which did the deletion.
MySQL stores its data dictionary information of tables
files in database directories. But every InnoDB type table
also has its own entry in InnoDB internal data dictionaries
inside the tablespace. When MySQL drops a table or a database,
it has to delete both a `.frm' file or files, and
the corresponding entries inside the InnoDB data dictionary.
This is the reason why you cannot move InnoDB tables between
databases simply by moving the `.frm' files, and why
DROP DATABASE did not work for InnoDB type tables
in MySQL versions <= 3.23.43.
Every InnoDB table has a special index called the clustered index
where the data of the rows is stored. If you define a
PRIMARY KEY on your table, then the index of the primary key
will be the clustered index.
If you do not define a primary key for your table, InnoDB will internally generate a clustered index where the rows are ordered by the row id InnoDB assigns to the rows in such a table. The row id is a 6-byte field which monotonically increases as new rows are inserted. Thus the rows ordered by the row id will be physically in the insertion order.
Accessing a row through the clustered index is fast, because the row data will be on the same page where the index search leads us. In many databases the data is traditionally stored on a different page from the index record. If a table is large, the clustered index architecture often saves a disk I/O when compared to the traditional solution.
The records in non-clustered indexes (we also call them secondary indexes), in InnoDB contain the primary key value for the row. InnoDB uses this primary key value to search for the row from the clustered index. Note that if the primary key is long, the secondary indexes will use more space.
All indexes in InnoDB are B-trees where the index records are stored in the leaf pages of the tree. The default size of an index page is 16 kB. When new records are inserted, InnoDB tries to leave 1 / 16 of the page free for future insertions and updates of the index records.
If index records are inserted in a sequential (ascending or descending) order, the resulting index pages will be about 15/16 full. If records are inserted in a random order, then the pages will be 1/2 - 15/16 full. If the fillfactor of an index page drops below 1/2, InnoDB will try to contract the index tree to free the page.
It is a common situation in a database application that the primary key is a unique identifier and new rows are inserted in the ascending order of the primary key. Thus the insertions to the clustered index do not require random reads from a disk.
On the other hand, secondary indexes are usually non-unique and insertions happen in a relatively random order into secondary indexes. This would cause a lot of random disk I/Os without a special mechanism used in InnoDB.
If an index record should be inserted to a non-unique secondary index, InnoDB checks if the secondary index page is already in the buffer pool. If that is the case, InnoDB will do the insertion directly to the index page. But, if the index page is not found from the buffer pool, InnoDB inserts the record to a special insert buffer structure. The insert buffer is kept so small that it entirely fits in the buffer pool, and insertions can be made to it very fast.
The insert buffer is periodically merged to the secondary index trees in the database. Often we can merge several insertions on the same page in of the index tree, and hence save disk I/Os. It has been measured that the insert buffer can speed up insertions to a table up to 15 times.
If a database fits almost entirely in main memory, then the fastest way to perform queries on it is to use hash indexes. InnoDB has an automatic mechanism which monitors index searches made to the indexes defined for a table, and if InnoDB notices that queries could benefit from building of a hash index, such an index is automatically built.
But note that the hash index is always built based on an existing B-tree index on the table. InnoDB can build a hash index on a prefix of any length of the key defined for the B-tree, depending on what search pattern InnoDB observes on the B-tree index. A hash index can be partial: it is not required that the whole B-tree index is cached in the buffer pool. InnoDB will build hash indexes on demand to those pages of the index which are often accessed.
In a sense, through the adaptive hash index mechanism InnoDB adapts itself to ample main memory, coming closer to the architecture of main memory databases.
After a database startup, when a user first does an insert to a
where an auto-increment column has been defined, and the user does not provide
an explicit value for the column, then InnoDB executes
MAX(auto-inc-column) FROM T, and assigns that value incremented
by one to the column and the auto-increment counter of the table.
We say that
the auto-increment counter for table
T has been initialised.
InnoDB follows the same procedure in initializing the auto-increment counter for a freshly created table.
Note that if the user specifies in an insert the value 0 to the auto-increment column, then InnoDB treats the row like the value would not have been specified.
After the auto-increment counter has been initialised, if a user inserts a row where he explicitly specifies the column value, and the value is bigger than the current counter value, then the counter is set to the specified column value. If the user does not explicitly specify a value, then InnoDB increments the counter by one and assigns its new value to the column.
The auto-increment mechanism, when assigning values from the counter, bypasses locking and transaction handling. Therefore you may also get gaps in the number sequence if you roll back transactions which have got numbers from the counter.
The behaviour of auto-increment is not defined if a user gives a negative value to the column or if the value becomes bigger than the maximum integer that can be stored in the specified integer type.
In disk I/O InnoDB uses asynchronous I/O. On Windows NT it uses the native asynchronous I/O provided by the operating system. On Unix, InnoDB uses simulated asynchronous I/O built into InnoDB: InnoDB creates a number of I/O threads to take care of I/O operations, such as read-ahead. In a future version we will add support for simulated aio on Windows NT and native aio on those versions of Unix which have one.
On Windows NT InnoDB uses non-buffered I/O. That means that the disk pages InnoDB reads or writes are not buffered in the operating system file cache. This saves some memory bandwidth.
Starting from 3.23.41 InnoDB uses a novel file flush technique called doublewrite. It adds safety to crash recovery after an operating system crash or a power outage, and improves performance on most Unix flavors by reducing the need for fsync operations.
Doublewrite means that InnoDB before writing pages to a datafile first writes them to a contiguous tablespace area called the doublewrite buffer. Only after the write and the flush to the doublewrite buffer has completed, InnoDB writes the pages to their proper positions in the datafile. If the operating system crashes in the middle of a page write, InnoDB will in recovery find a good copy of the page from the doublewrite buffer.
Starting from 3.23.41
you can also use a raw disk partition as a datafile, though this has
not been tested yet. When you create a new datafile you have
to put the keyword
newraw immediately after the data
innodb_data_file_path. The partition must be
>= than you specify as the size. Note that 1M in InnoDB is
1024 x 1024 bytes, while in disk specifications 1 MB usually means
1000 000 bytes.
When you start the database again you must change the keyword
raw. Otherwise, InnoDB will write over your
By using a raw disk you can on some Unixes perform unbuffered I/O.
There are two read-ahead heuristics in InnoDB: sequential read-ahead and random read-ahead. In sequential read-ahead InnoDB notices that the access pattern to a segment in the tablespace is sequential. Then InnoDB will post in advance a batch of reads of database pages to the I/O system. In random read-ahead InnoDB notices that some area in a tablespace seems to be in the process of being fully read into the buffer pool. Then InnoDB posts the remaining reads to the I/O system.
The datafiles you define in the configuration file form the tablespace of InnoDB. The files are simply catenated to form the tablespace, there is no striping in use. Currently you cannot directly instruct where the space is allocated for your tables, except by using the following fact: from a newly created tablespace InnoDB will allocate space starting from the low end.
The tablespace consists of database pages whose default size is 16 kB. The pages are grouped into extents of 64 consecutive pages. The 'files' inside a tablespace are called segments in InnoDB. The name of the rollback segment is somewhat misleading because it actually contains many segments in the tablespace.
For each index in InnoDB we allocate two segments: one is for non-leaf nodes of the B-tree, the other is for the leaf nodes. The idea here is to achieve better sequentiality for the leaf nodes, which contain the data.
When a segment grows inside the tablespace, InnoDB allocates the first 32 pages to it individually. After that InnoDB starts to allocate whole extents to the segment. InnoDB can add to a large segment up to 4 extents at a time to ensure good sequentiality of data.
Some pages in the tablespace contain bitmaps of other pages, and therefore a few extents in an InnoDB tablespace cannot be allocated to segments as a whole, but only as individual pages.
When you issue a query
SHOW TABLE STATUS FROM ... LIKE ...
to ask for available free space in the tablespace, InnoDB will
report the extents which are definitely free in the tablespace.
InnoDB always reserves some extents for clean-up and other internal
purposes; these reserved extents are not included in the free space.
When you delete data from a table, InnoDB will contract the corresponding B-tree indexes. It depends on the pattern of deletes if that frees individual pages or extents to the tablespace, so that the freed space is available for other users. Dropping a table or deleting all rows from it is guaranteed to release the space to other users, but remember that deleted rows can be physically removed only in a purge operation after they are no longer needed in transaction rollback or consistent read.
If there are random insertions or deletions in the indexes of a table, the indexes may become fragmented. By fragmentation we mean that the physical ordering of the index pages on the disk is not close to the alphabetical ordering of the records on the pages, or that there are many unused pages in the 64-page blocks which were allocated to the index.
It can speed up index scans if you
mysqldump to dump the table to
a text file, drop the table, and reload it from the dump.
Another way to do the defragmenting is to
ALTER the table type to
MyISAM and back to
Note that a
MyISAM table must fit in a single file
on your operating system.
If the insertions to and index are always ascending and records are deleted only from the end, then the file space management algorithm of InnoDB guarantees that fragmentation in the index will not occur.
The error handling in InnoDB is not always the same as specified in the ANSI SQL standards. According to the ANSI standard, any error during an SQL statement should cause the rollback of that statement. InnoDB sometimes rolls back only part of the statement, or the whole transaction. The following list specifies the error handling of InnoDB.
'Table is full'error and InnoDB rolls back the SQL statement.
INSERT INTO ... SELECT .... This will probably change so that the SQL statement will be rolled back if you have not specified the
IGNOREoption in your statement.
SHOW TABLE STATUSdoes not give accurate statistics on InnoDB tables, except for the physical size reserved by the table. The row count is only a rough estimate used in SQL optimisation.
CREATE TABLE T (A CHAR(20), B INT, UNIQUE (A(5))) TYPE = InnoDB;If you create a non-unique index on a prefix of a column, InnoDB will create an index over the whole column.
INSERT DELAYEDis not supported for InnoDB tables.
LOCK TABLESoperation does not know of InnoDB row level locks set in already completed SQL statements: this means that you can get a table lock on a table even if there still exist transactions of other users which have row level locks on the same table. Thus your operations on the table may have to wait if they collide with these locks of other users. Also a deadlock is possible. However, this does not endanger transaction integrity, because the row level locks set by InnoDB will always take care of the integrity. Also, a table lock prevents other transactions from acquiring more row level locks (in a conflicting lock mode) on the table.
DELETE FROM TABLEdoes not regenerate the table but instead deletes all rows, one by one, which is not that fast. In future versions of MySQL you can use
TRUNCATEwhich is fast.
phone: 358-9-6969 3250 (office) 358-40-5617367 (mobile) Innobase Oy Inc. World Trade Center Helsinki Aleksanterinkatu 17 P.O.Box 800 00101 Helsinki Finland
BDB tables is included in the MySQL source distribution
starting from Version 3.23.34 and is activated in the MySQL-Max
BerkeleyDB, available at http://www.sleepycat.com/ has provided
MySQL with a transactional table handler. By using BerkeleyDB
tables, your tables may have a greater chance of surviving crashes, and also
ROLLBACK on transactions. The
MySQL source distribution comes with a
BDB distribution that has a
couple of small patches to make it work more smoothly with MySQL.
You can't use a non-patched
BDB version with MySQL.
We at MySQL AB are working in close cooperation with Sleepycat to keep the quality of the MySQL/BDB interface high.
When it comes to supporting
BDB tables, we are committed to help our
users to locate the problem and help creating a reproducable test case
for any problems involving
BDB tables. Any such test case will be
forwarded to Sleepycat who in turn will help us find and fix the
problem. As this is a two stage operation, any problems with
may take a little longer for us to fix than for other table handlers.
However, as the BerkeleyDB code itself has been used by many other
applications than MySQL, we don't envision any big problems with
this. See section 1.4.1 Support Offered by MySQL AB.
If you have downloaded a binary version of MySQL that includes
support for BerkeleyDB, simply follow the instructions for installing a
binary version of MySQL.
See section 2.2.7 Installing a MySQL Binary Distribution. See section 4.7.5
mysqld-max, An Extended
To compile MySQL with Berkeley DB support, download MySQL
Version 3.23.34 or newer and configure
MySQL with the
--with-berkeley-db option. See section 2.3 Installing a MySQL Source Distribution.
cd /path/to/source/of/mysql-3.23.34 ./configure --with-berkeley-db
Please refer to the manual provided with the
BDB distribution for
more updated information.
Even though Berkeley DB is in itself very tested and reliable, the MySQL interface is still considered beta quality. We are actively improving and optimising it to get it stable very soon.
If you are running with
AUTOCOMMIT=0 then your changes in
tables will not be updated until you execute
COMMIT. Instead of commit
you can execute
ROLLBACK to forget your changes. See section 6.7.1
If you are running with
AUTOCOMMIT=1 (the default), your changes
will be committed immediately. You can start an extended transaction with
BEGIN WORK SQL command, after which your changes will not be
committed until you execute
COMMIT (or decide to
The following options to
mysqld can be used to change the behaviour of
| Base directory for |
| Berkeley lock detect. One of (|
|Berkeley DB log file directory.|
|Don't synchronously flush logs.|
|Don't start Berkeley DB in recover mode.|
| Start Berkeley DB in multi-process mode (Don't use |
|Berkeley DB temporary file directory.|
| Disable usage of |
| Set the maximum number of locks possible. See section 18.104.22.168 |
If you use
--skip-bdb, MySQL will not initialise the
Berkeley DB library and this will save a lot of memory. Of course,
you cannot use
BDB tables if you are using this option. If you try
to create a
BDB table, MySQL will instead create a
Normally you should start
--bdb-no-recover if you
intend to use
BDB tables. This may, however, give you problems when you
try to start
mysqld if the
BDB log files are corrupted. See section 2.4.2 Problems Starting the MySQL Server.
bdb_max_lock you can specify the maximum number of locks
(10000 by default) you can have active on a
BDB table. You should
increase this if you get errors of type
bdb: Lock table is out of
available locks or
Got error 12 from ... when you have do long
transactions or when
mysqld has to examine a lot of rows to
calculate the query.
You may also want to change
max_binlog_cache_size if you are using big multi-line transactions.
See section 6.7.1
BDBhandler maintains log files. For maximum performance you should place these on another disk than your databases by using the
BDBlog file is started, and removes any log files that are not needed for current transactions. One can also run
FLUSH LOGSat any time to checkpoint the Berkeley DB tables. For disaster recovery, one should use table backups plus MySQL's binary log. See section 4.4.1 Database Backups. Warning: If you delete old log files that are in use,
BDBwill not be able to do recovery at all and you may lose data if something goes wrong.
PRIMARY KEYin each
BDBtable to be able to refer to previously read rows. If you don't create one, MySQL will create an maintain a hidden
PRIMARY KEYfor you. The hidden key has a length of 5 bytes and is incremented for each insert attempt.
BDBtable are part of the same index or part of the primary key, then MySQL can execute the query without having to access the actual row. In a
MyISAMtable the above holds only if the columns are part of the same index.
PRIMARY KEYwill be faster than any other key, as the
PRIMARY KEYis stored together with the row data. As the other keys are stored as the key data + the
PRIMARY KEY, it's important to keep the
PRIMARY KEYas short as possible to save disk and get better speed.
LOCK TABLESworks on
BDBtables as with other tables. If you don't use
LOCK TABLE, MySQL will issue an internal multiple-write lock on the table to ensure that the table will be properly locked if another thread issues a table lock.
BDBtables is done on page level.
SELECT COUNT(*) FROM table_nameis slow as
BDBtables doesn't maintain a count of the number of rows in the table.
MyISAMtables as one has data in
BDBtables stored in B-trees and not in a separate datafile.
BDBtable may make an automatic rollback and any read may fail with a deadlock error.
MyISAMtables. In other words, the key information will take a little more space in
BDBtables compared to
MyISAMtables which don't use
BDBtable to allow you to insert new rows in the middle of the key tree. This makes
BDBtables somewhat larger than
BDBtable. If you don't issue a lot of
ROLLBACKstatements, this number should be accurate enough for the MySQL optimiser, but as MySQL only stores the number on close, it may be incorrect if MySQL dies unexpectedly. It should not be fatal even if this number is not 100% correct. One can update the number of rows by executing
OPTIMIZE TABLE. See section 4.5.2
ANALYZE TABLESyntax . See section 4.5.1
BDBtable, you will get an error (probably error 28) and the transaction should roll back. This is in contrast with
mysqldwill wait for enough free disk before continuing.
BDBtables at the same time. If you are going to use
BDBtables, you should not have a very big table cache (like >256) and you should use
mysqlclient. We plan to partly fix this in 4.0.
SHOW TABLE STATUSdoesn't yet provide that much information for
If you after having built MySQL with support for
BDB tables get
the following error in the log file when you start
bdb: architecture lacks fast mutexes: applications cannot be threaded Can't init dtabases
This means that
BDB tables are not supported for your architecture.
In this case you have to rebuild MySQL without
BDB table support.
Note: The following list is not complete; we will update it as we receive more information about this.
Currently we know that the
BDB handler works with the following
It doesn't work with the following operating systems:
Here follows the restrictions you have when using
BDBtables store in the `.db' file the path to the file as it was created. (This was done to be able to detect locks in a multi-user environment that supports symlinks). The effect of this is that
BDBtables are not movable between directories!
BDBtables, you have to either use
mysqldumpor take a backup of all
table_name.dbfiles and the
BDBlog files. The
BDBlog files are the files in the base data directory named
log.XXXXXXXXXX(ten digits); The
BDBtable handler stores unfinished transactions in the log files and requires these to be present when
hostname.err logwhen starting
bdb: Ignoring log file: .../log.XXXXXXXXXX: unsupported log version #it means that the new
BDBversion doesn't support the old log file format. In this case you have to delete all
BDBlogs from your database directory (the files with names that have the format
log.XXXXXXXXXX) and restart
mysqld. We would also recommend you to do a
mysqldump --optof your old
BDBtables, delete the old tables, and restore the dump.
001119 23:43:56 bdb: Missing log fileid entry 001119 23:43:56 bdb: txn_abort: Log undo failed for LSN: 1 3644744: InvalidThis is not fatal but we don't recommend that you delete tables if you are not in auto-commit mode, until this problem is fixed (the fix is not trivial).