Converting The Charset For An Entire Schema

Key Length Limitations In MySQL

Suppose I try to create a table whose primary key covers 500 characters of VARCHAR, and MySQL complains the key length is longer than the maximum of 1000 bytes. 500 is less than 1000. What’s happening? Plus, a tasty (yet apparently harmless) bug in MySQL.

Here’s a statement that fails on my server (the 1000-byte limit in the error is the one MyISAM enforces on a key):

CREATE TABLE test(c VARCHAR(250), d VARCHAR(250), PRIMARY KEY(c,d));
ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes

Why does it fail? Simple; my default character set is multi-byte:

show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

While most characters fit in one or two bytes, the utf8 encoding of Unicode, as implemented by MySQL, can require up to 3 bytes per character. MySQL must therefore be pessimistic and assume the worst case of every character requiring 3 bytes, so the two VARCHAR(250) columns above count as 3 * 500 = 1500 bytes. It’s easy to see this by trying to create a table with a single VARCHAR(334) primary key. It will fail, because 3 * 334 = 1002 bytes, but VARCHAR(333) will succeed, because 3 * 333 = 999 bytes, which is under the 1000-byte limit.
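The arithmetic can be checked with a short Python sketch. It assumes the 1000-byte limit from the error message and worst-case sizes of 1 byte per character for latin1 and 3 for MySQL's utf8. The function names are made up for illustration, and any per-column overhead the server may add is ignored.

```python
# Worst-case key size as described above: characters * max bytes per character.
# Assumed values: the 1000-byte limit from the error message, and the
# per-character maxima for latin1 (1) and MySQL's utf8 (3).
MAX_KEY_BYTES = 1000
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3}


def key_bytes(column_lengths, charset):
    """Worst-case byte length of a key built from VARCHAR columns."""
    return sum(column_lengths) * MAX_BYTES_PER_CHAR[charset]


def fits(column_lengths, charset, limit=MAX_KEY_BYTES):
    """True if the worst-case key length is within the limit."""
    return key_bytes(column_lengths, charset) <= limit


print(key_bytes([250, 250], "utf8"))  # 1500, over the limit
print(fits([333], "utf8"))            # True: 999 bytes
print(fits([334], "utf8"))            # False: 1002 bytes
print(fits([250, 250], "latin1"))     # True: 500 bytes
```

The last line shows why the same statement would go through under a single-byte default character set.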

Here’s a fun bug (bug #18927):

mysql> CREATE TABLE test(c VARCHAR(250), d VARCHAR(250), PRIMARY KEY(c,d));
ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes
mysql> CREATE TABLE test(c VARCHAR(334), d VARCHAR(334), PRIMARY KEY(c,d));
ERROR 1071 (42000): Specified key was too long; max key length is 999 bytes

Sometimes it says 999, sometimes 1000. I have no idea why. Rounding, maybe?

It may be a good idea to check the default character set to get the best performance out of a database. If there’s no need for multi-byte encodings, switching to a single-byte encoding might be significantly more efficient. It’s especially important to keep indexes as small as possible.

By now, we’re all aware that a utf8 charset and collation is the “right” thing to do, with most folks opting for utf8_general_ci or utf8_unicode_ci. However, lots of installations still default to some form of latin1, which is unfortunate. There are multiple ways to convert an existing schema, all of which functionally do the same thing. Be aware that there is a big potential gotcha along the way.

Here’s the “safe” way:

First, the schema itself:

ALTER DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci;

Then…

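-- Generate one pair of ALTER statements per base table in the schema.
SELECT CONCAT('ALTER TABLE `', TABLE_SCHEMA, '`.`', TABLE_NAME, '` CHARACTER SET utf8 COLLATE utf8_general_ci; ',
    'ALTER TABLE `', TABLE_SCHEMA, '`.`', TABLE_NAME, '` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;')
    AS alter_sql
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'dbname'      -- replace with your schema name
  AND TABLE_TYPE = 'BASE TABLE';   -- skip views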

Capture this output to a file and then execute it.

If you’re feeling brave, you can do the whole thing, database and tables, with a one-liner:

DB="dbname"; ( echo 'ALTER DATABASE `'"$DB"'` CHARACTER SET utf8 COLLATE utf8_general_ci;'; mysql "$DB" -e "SHOW TABLES" --batch --skip-column-names | xargs -I{} echo 'ALTER TABLE `'{}'` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;' ) | mysql "$DB"
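If you would rather review the statements before anything touches the server, the same output can be built offline. This is a minimal Python sketch, not a replacement for the commands above. It assumes you already have the list of table names (for example from SHOW TABLES). The helper names `quote_identifier` and `conversion_statements` are made up for illustration, as are the table names.

```python
def quote_identifier(name):
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def conversion_statements(schema, tables, charset="utf8", collation="utf8_general_ci"):
    """Return the ALTER statements for a schema and its tables, in order."""
    target = f"CHARACTER SET {charset} COLLATE {collation}"
    statements = [f"ALTER DATABASE {quote_identifier(schema)} {target};"]
    for table in tables:
        full_name = f"{quote_identifier(schema)}.{quote_identifier(table)}"
        statements.append(f"ALTER TABLE {full_name} CONVERT TO {target};")
    return statements


# Hypothetical schema and table names, for illustration only.
for statement in conversion_statements("dbname", ["users", "order items"]):
    print(statement)
```

Redirect the printed statements to a file, read them over, and feed the file to the mysql client once you are happy with it.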
