Writing Optimized Queries

When writing a query there are four basic parts:
1) the result set (the columns you select)
2) the tables to get the data from
3) the WHERE clause, which provides filtering
4) the GROUP BY clause, which groups data to be aggregated.


This article will discuss areas 2 and 3 since that’s where most query optimizations occur.

Joining Tables efficiently:

MySQL has a practical limit to the number of tables you should join.  A common rule of thumb is about 10 (some opinions differ).
While MySQL can join more than that, statement performance may suffer due to the cost of calculating the statement execution plan.  The more tables MySQL has to deal with, the greater the number of plans it has to explore when deciding
how to execute the statement.

When you join 2 tables you want to make sure the columns you join on are indexed in at least one of the tables.
If the columns are indexed, MySQL can find the row in the joined table quickly via an index lookup.

Ex:
    SELECT col_1, col_2, col_3
    FROM table_a
    JOIN table_b ON table_b.indexed_column = table_a.indexed_column

In this example the column ‘indexed_column’ represents a column that has an index on it in 
table_a and/or table_b.

So for each row in table_a, the select will quickly find the matching row in table_b because the join condition is on ‘indexed_column’.
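A quick way to confirm the index is actually being used is EXPLAIN.  A minimal sketch, assuming hypothetical customers and orders tables where orders.customer_id is indexed:

```sql
-- Hypothetical schema: customers(customer_id PK, name),
--                      orders(order_id PK, customer_id indexed)
EXPLAIN
SELECT c.name, o.order_id
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id;
-- In the output, the orders row should show access type 'ref' and the index
-- name in the 'key' column; type 'ALL' would mean a table scan.
```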

Now suppose we join the tables on columns that are not indexed in either table_a or table_b:

    SELECT col_1, col_2, col_3
    FROM table_a
    JOIN table_b ON table_b.unindexed_column = table_a.unindexed_column

MySQL will have to perform a table scan of table_b for each row in table_a.
So if table_a has 100 rows and table_b has 1 million rows, MySQL will have to scan table_b 100 times, reading a total of 100 million rows.
This is very inefficient.
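The fix, continuing the example above, is simply to index the join column in the larger table:

```sql
-- Add an index so each lookup into table_b is an index probe instead of a scan
ALTER TABLE table_b ADD INDEX idx_unindexed_column (unindexed_column);
```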

Subselects as tables:

MySQL materializes subselects.  What that means is the first time the subselect is run, MySQL creates a temporary table to store the result set so it does not have to re-execute the subselect.  Note: if your result set is very large, say hundreds of thousands of rows, that can create a very large temp table.

Many times I have seen queries that use a subselect as a table.  That is fine.
However you must remember that if you join a subselect to another table, the column used for the join condition must be indexed in the joined table.  
If the joined table does not have an index on the column used in the join condition then the joined table will be scanned for each row in the subselect result set.

That also means that you should not join one subselect to another subselect, since the result sets of subselects are not indexed.
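As an illustration of the safe pattern, here is a derived table joined to a base table on a column that is indexed in the base table (same hypothetical orders/customers schema as before):

```sql
-- The derived table t is materialized once; customers.customer_id (the PK)
-- is indexed, so each row of t is matched with an index lookup
SELECT c.name, t.order_count
FROM (SELECT customer_id, COUNT(*) AS order_count
      FROM orders
      GROUP BY customer_id) AS t
JOIN customers c ON c.customer_id = t.customer_id;
```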

Another thing to keep in mind is that the columns used in join conditions must be the same datatype.  If they are not the same datatype, MySQL cannot make use of any indexes on those columns, even if they exist, and will end up performing table scans.
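A sketch of the mismatch, assuming a hypothetical schema where users.id is INT but logs.user_id was declared VARCHAR:

```sql
-- users.id is INT (indexed); logs.user_id is VARCHAR(20) (indexed)
-- The comparison forces an implicit string-to-number conversion, so the
-- index on the VARCHAR side cannot be used and rows end up being scanned
SELECT u.name, l.message
FROM users u
JOIN logs l ON l.user_id = u.id;

-- Better: make the datatypes match (assuming the stored values are numeric)
ALTER TABLE logs MODIFY user_id INT;
```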

Something else you need to know is that MySQL is pretty smart.  When it creates its join plan, it can detect which table should be the primary table and which should be the joined table.  The primary table is the one MySQL starts reading rows from first.
For example, if the join condition column is indexed in table_a but not in table_b, MySQL will join table_a to table_b using table_b as the primary table.  It also means that MySQL may not join the tables in the order you wrote them.  I have seen situations where MySQL did not ‘guess’ correctly.
To override MySQL’s guess, you can specify ‘STRAIGHT_JOIN’ after the SELECT keyword.  This will cause your tables to be joined in the order you specify.
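A sketch of the override, using the example tables from above:

```sql
-- STRAIGHT_JOIN forces MySQL to read the tables in the order written:
-- table_a first, then table_b for each matching row
SELECT STRAIGHT_JOIN a.col_1, b.col_2
FROM table_a a
JOIN table_b b ON b.indexed_column = a.indexed_column;
```

Use this only after EXPLAIN shows the optimizer picking a worse order than yours; forcing the order also prevents MySQL from adapting when the table sizes change.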

The ‘WHERE’ clause:

This is the most important part of an SQL statement as it filters out rows.
If you omit it, or use it incorrectly, your select will be performing full table scans.
This is very, very bad.

Using a WHERE clause is easy.  You simply specify a column = another_column AND/OR a column = a CONSTANT.

Any time you filter on a column, the column should be indexed.
Also, never change the datatype of the column you are filtering on (as in using CAST).  This prevents MySQL from using any associated index.
MySQL can only use an index when the column is used in its ‘native’ datatype.

The datatypes of the column you are filtering on and the constant/column you are comparing to must match.  If you need to change a datatype, change that of the constant/column, not that of the indexed column.
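For example, assuming a hypothetical events table with an indexed DATETIME column created_at:

```sql
-- Index usable: the column stays in its native datatype;
-- only the constants are literals
SELECT id FROM events
WHERE created_at >= '2024-01-01 00:00:00'
  AND created_at <  '2024-02-01 00:00:00';

-- Index NOT usable: the column is wrapped in a function,
-- changing its datatype
-- SELECT id FROM events WHERE DATE(created_at) = '2024-01-15';
```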

Using an ‘IN’ clause:
Limit the number of items inside an ‘IN’ clause.  If you have 50 items, that’s fine.  Don’t have 25 million.
There are better ways to filter.
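One of those better ways, sketched with hypothetical names, is to load the values into an indexed temporary table and join against it:

```sql
-- Instead of: SELECT * FROM big_table WHERE id IN (/* millions of ids */);
CREATE TEMPORARY TABLE tmp_ids (id INT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO tmp_ids VALUES (1), (2), (3);   -- bulk-load the real id list here

-- Each probe into tmp_ids is a primary-key lookup
SELECT b.*
FROM big_table b
JOIN tmp_ids t ON t.id = b.id;
```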

Subselect in a WHERE clause:

Using a subselect in a WHERE clause is okay if the subselect returns a small result set.
If your subselect returns a very large result set, you may be better off simply joining the tables in the subselect like normal tables.

Get rid of subselects:

Because subselect result sets cannot be indexed, you can often get rid of subselects entirely.
What you do is create a temporary table instead.  If you specify ‘ENGINE=InnoDB’, the table will be created as an InnoDB table and will live in the InnoDB buffer pool.  This avoids eating up the server’s free RAM (and reduces the chance of an out-of-memory situation).  When you create this temporary table, you can also create indexes on it.
Then, in your query, you can join this temporary table just like a normal table on the indexed columns.
Maintaining the indexes (upon insert into the table) will cause a minor performance hit, but the performance gain from having the indexes will more than offset that.
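Putting those steps together (hypothetical orders/customers schema):

```sql
-- 1) Create the indexed InnoDB temporary table
CREATE TEMPORARY TABLE tmp_totals (
  customer_id INT PRIMARY KEY,
  total DECIMAL(12,2)
) ENGINE=InnoDB;

-- 2) Fill it with what the subselect would have produced
INSERT INTO tmp_totals
SELECT customer_id, SUM(amount)
FROM orders
GROUP BY customer_id;

-- 3) Join it like a normal table, on the indexed column
SELECT c.name, t.total
FROM customers c
JOIN tmp_totals t ON t.customer_id = c.customer_id;
```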


The ‘Takeaway’:

The takeaway from this document is that INDEXES are one of the best performance boosters.  Create them and use them.

You cannot use any function (such as UPPER(), LOWER(), a user-defined function, etc.) on an indexed column in a filter, or MySQL will not be able to use the index.
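If you genuinely need to filter on an expression, one mitigation (assuming MySQL 5.7+, which supports indexes on generated columns, and a hypothetical people table) is to store the expression as its own indexed column:

```sql
-- Add a stored generated column holding the expression, then index it
ALTER TABLE people
  ADD COLUMN last_name_upper VARCHAR(64) AS (UPPER(last_name)) STORED,
  ADD INDEX idx_last_name_upper (last_name_upper);

-- This filter can now use idx_last_name_upper
SELECT id FROM people WHERE last_name_upper = 'SMITH';
```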

MySQL Classes And Tutorials

These are online, self-guided ways for developers wishing to gain more RDBMS understanding (or have a refresher course) in the two most ubiquitous platforms — MySQL and PostgreSQL.  I add and remove from this list from time to time as material becomes outdated or better bits are found.

Disclaimer:  I didn’t have a hand in creating any of these, but have recommended them in one format or another for several years.


TUTORIALS

MySQL for Developers
https://www.mysqltutorial.org/

MySQL in 7 Days
https://www.guru99.com/mysql-tutorial.html

Learn MySQL via TutorialsPoint (quite good, complete)
https://www.tutorialspoint.com/mysql/index.htm

Simple SQL Overview (beginners, from W3Schools)
https://www.w3schools.com/sql/

PostgreSQL Tutorial (also has you install it on your platform of choice)
https://www.postgresqltutorial.com/

MAINTENANCE TIPS

15 Useful MySQL/MariaDB Performance Tuning and Optimization Tips
https://www.tecmint.com/mysql-mariadb-performance-tuning-and-optimization/

VIDEO TRAINING

MySQL Tutorial for Beginners [Full Course] (Programming with Mosh)
https://www.youtube.com/watch?v=7S_tz1z_5bA – (3h 10m)

MySQL Tutorial For Beginners (Intellipaat)
https://www.youtube.com/watch?v=WmGgxTpGs_8 – (2h 5m)

Learn PostgreSQL Tutorial – Full Course for Beginners
https://www.youtube.com/watch?v=qw--VYLpxG4 – (4h 20m)

Postgres for Beginners (Simplilearn)
https://www.youtube.com/watch?v=eMIxuk0nOkU – (1h 30m)

Run Queries On All DB Instances

Assumes there is a text file, “instances.txt”, that has the IP or symbolic name of each server where the instances are.  Just put the query in quotes at the end of the script invocation (after pasting the below into runonall.sh).

params="$@"

for i in $(cat /home/rbyrd2/bin/instances.txt); do
  echo "$i:  "
  mysql -h "$i" -N -B -e "$params"
done

MySQL – Sizing And Report Scripts

Various ways to get sizing, counts, etc.

-- Total rows and size for an entire server, excluding system schemata, ordered by size DESC 
SELECT table_schema,table_name,
 CONCAT(ROUND(SUM(table_rows) / 1000000,2),'M') rows,
 CONCAT(ROUND(SUM(data_length) / (1024 * 1024 * 1024),2), 'G') DATA,
 CONCAT(ROUND(SUM(index_length) / (1024 * 1024 * 1024), 2),'G') idx,
 CONCAT(ROUND(SUM(data_length + index_length) / (1024 * 1024 * 1024), 2),'G') total_size,
 ROUND(SUM(index_length) / SUM(data_length),2) idxfrac
 FROM information_schema.TABLES
 WHERE table_schema NOT IN ('mysql','performance_schema','information_schema') 
 GROUP BY table_schema,table_name
 ORDER BY SUM(data_length + index_length) DESC;
-- Total rows and size for an entire server, excluding system schemata, ordered by schema and table name 
 SELECT table_schema,table_name,
 CONCAT(ROUND(SUM(table_rows) / 1000000,2),'M') rows,
 CONCAT(ROUND(SUM(data_length) / (1024 * 1024 * 1024),2), 'G') DATA,
 CONCAT(ROUND(SUM(index_length) / (1024 * 1024 * 1024), 2),'G') idx,
 CONCAT(ROUND(SUM(data_length + index_length) / (1024 * 1024 * 1024), 2),'G') total_size,
 ROUND(SUM(index_length) / SUM(data_length),2) idxfrac
 FROM information_schema.TABLES
 WHERE table_schema NOT IN ('mysql','performance_schema','information_schema') 
 GROUP BY table_schema,table_name
 ORDER BY table_schema,table_name;
-- Total rows and size
 SELECT COUNT(*) TABLES,
 CONCAT(ROUND(SUM(table_rows) / 1000000,2),'M') rows,
 CONCAT(ROUND(SUM(data_length) / (1024 * 1024 * 1024),2), 'G') DATA,
 CONCAT(ROUND(SUM(index_length) / (1024 * 1024 * 1024), 2),'G') idx,
 CONCAT(ROUND(SUM(data_length + index_length) / (1024 * 1024 * 1024), 2),'G') total_size,
 ROUND(SUM(index_length) / SUM(data_length),2) idxfrac
 FROM information_schema.TABLES;
-- Database wise report
 SELECT COUNT(*) TABLES, table_schema,
 CONCAT(ROUND(SUM(table_rows) / 1000000,2),'M') rows,
 CONCAT(ROUND(SUM(data_length) / (1024 * 1024 * 1024),2), 'G') DATA,
 CONCAT(ROUND(SUM(index_length) / (1024 * 1024 * 1024), 2),'G') idx,
 CONCAT(ROUND(SUM(data_length + index_length) / (1024 * 1024 * 1024), 2),'G') total_size,
 ROUND(SUM(index_length) / SUM(data_length),2) idxfrac
 FROM information_schema.TABLES
 GROUP BY table_schema
 ORDER BY SUM(data_length + index_length) DESC;
-- Database wise report in MB
 SELECT COUNT(*) TABLES, table_schema,
 CONCAT(ROUND(SUM(table_rows) / 1000000,2),'M') rows,
 CONCAT(ROUND(SUM(data_length) / (1024 * 1024 ),2), 'M') DATA,
 CONCAT(ROUND(SUM(index_length) / (1024 * 1024 ), 2),'M') idx,
 CONCAT(ROUND(SUM(data_length + index_length) / (1024 * 1024 ), 2),'M') total_size,
 ROUND(SUM(index_length) / SUM(data_length),2) idxfrac
 FROM information_schema.TABLES
 GROUP BY table_schema
 ORDER BY SUM(data_length + index_length) DESC;
-- Engine wise breakup
 SELECT ENGINE, COUNT(*) TABLES,
 CONCAT(ROUND(SUM(table_rows) / 1000000,2),'M') rows,
 CONCAT(ROUND(SUM(data_length) / (1024 * 1024 * 1024),2), 'G') DATA,
 CONCAT(ROUND(SUM(index_length) / (1024 * 1024 * 1024), 2),'G') idx,
 CONCAT(ROUND(SUM(data_length + index_length) / (1024 * 1024 * 1024), 2),'G') total_size,
 ROUND(SUM(index_length) / SUM(data_length),2) idxfrac
 FROM information_schema.TABLES
 GROUP BY engine ORDER BY SUM(data_length + index_length) DESC;
-- Top 30 tables
 SELECT table_schema, table_name, engine, table_rows,
 CONCAT(ROUND(data_length / (1024 * 1024),2),'MB') AS DATA,
 CONCAT(ROUND((data_length + index_length) / (1024 * 1024), 2),'MB') AS total_size
 FROM INFORMATION_SCHEMA.TABLES
 ORDER BY data_length DESC LIMIT 30;
-- List of Full Text Indexes
 SELECT TABLE_SCHEMA, TABLE_NAME, INDEX_NAME, SEQ_IN_INDEX, COLUMN_NAME, CARDINALITY, NULLABLE
 FROM INFORMATION_SCHEMA.STATISTICS
 WHERE index_type LIKE 'FULLTEXT%' ORDER BY TABLE_SCHEMA, TABLE_NAME;
-- The number of columns for each datatype
 SELECT DATA_TYPE, COUNT(*) AS mycount
 FROM INFORMATION_SCHEMA.COLUMNS
 WHERE TABLE_SCHEMA NOT IN ('information_schema','mysql')
 GROUP BY DATA_TYPE ORDER BY mycount DESC;
-- The number of columns for each datatype with size (replace my_database)
 SELECT COLUMN_TYPE, COUNT(*) AS mycount
 FROM INFORMATION_SCHEMA.COLUMNS
 WHERE TABLE_SCHEMA = 'my_database'
 GROUP BY COLUMN_TYPE ORDER BY mycount DESC;
-- Blob, Float and Double Data types details
 SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, ORDINAL_POSITION, COLUMN_DEFAULT,
 IS_NULLABLE, NUMERIC_PRECISION, COLUMN_TYPE, COLUMN_KEY
 FROM INFORMATION_SCHEMA.COLUMNS
 WHERE DATA_TYPE IN ('FLOAT','BLOB','DOUBLE')
 AND TABLE_SCHEMA != 'mysql'
 ORDER BY DATA_TYPE, COLUMN_TYPE;
-- All tables with no primary key
use INFORMATION_SCHEMA;
 select CONCAT(t.table_schema,".",t.table_name) as tbl
 from INFORMATION_SCHEMA.TABLES AS t
 LEFT JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS c
 ON (t.TABLE_NAME=c.TABLE_NAME
 AND c.CONSTRAINT_SCHEMA=t.TABLE_SCHEMA
 AND constraint_name='PRIMARY')
 WHERE t.table_schema!="information_schema"
 AND constraint_name IS NULL;
-- All tables and their primary keys, if exist:
use INFORMATION_SCHEMA;
 select CONCAT(t.table_schema,".",t.table_name) as tbl,
 c.column_name,c.constraint_name
 from INFORMATION_SCHEMA.TABLES AS t
 LEFT JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS c
 ON (t.TABLE_NAME=c.TABLE_NAME
 AND c.CONSTRAINT_SCHEMA=t.TABLE_SCHEMA
 AND constraint_name='PRIMARY')
 WHERE t.table_schema!="information_schema"
 order by constraint_name;
-- Foreign Key constraints for Database dbName
SELECT A.TABLE_SCHEMA AS FKTABLE_SCHEM, A.TABLE_NAME AS FKTABLE_NAME, A.COLUMN_NAME AS FKCOLUMN_NAME,
 A.REFERENCED_TABLE_SCHEMA AS PKTABLE_SCHEM, A.REFERENCED_TABLE_NAME AS PKTABLE_NAME,
 A.REFERENCED_COLUMN_NAME AS PKCOLUMN_NAME, A.CONSTRAINT_NAME AS FK_NAME
 FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE A
 JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS B
   ON A.TABLE_SCHEMA = B.TABLE_SCHEMA AND A.TABLE_NAME = B.TABLE_NAME
  AND A.CONSTRAINT_NAME = B.CONSTRAINT_NAME
 WHERE B.CONSTRAINT_TYPE = 'FOREIGN KEY'
  AND A.TABLE_SCHEMA = 'dbName'
 ORDER BY A.TABLE_SCHEMA, A.TABLE_NAME, A.ORDINAL_POSITION LIMIT 1000;