๐—ง๐—ผ๐—ฝ ๐Ÿฎ๐Ÿฌ ๐—ฆ๐—ค๐—Ÿ ๐—พ๐˜‚๐—ฒ๐—ฟ๐˜† ๐—ผ๐—ฝ๐˜๐—ถ๐—บ๐—ถ๐˜‡๐—ฎ๐˜๐—ถ๐—ผ๐—ป ๐˜๐—ฒ๐—ฐ๐—ต๐—ป๐—ถ๐—พ๐˜‚๐—ฒ๐˜€

Here is the list of the top 20 SQL query optimization techniques I found important:

1. Create an index on very large tables (>1.000.000) rows

2. Use EXIST() instead of COUNT() to find an element in the table

3. SELECT fields instead of using SELECT *

4. Avoid Subqueries in WHERE Clause

5. Avoid SELECT DISTINCT where possible

6. Use WHERE Clause instead of HAVING

7. Create joins with INNER JOIN (not WHERE)

8. Use LIMIT to sample query results

9. Use UNION ALL instead of UNION wherever possible

10. Use UNION where instead of WHERE … or … query.

11. Run your query during off-peak hours

12. Avoid using OR in join queries

14. Choose GROUP BY over window functions

15. Use derived and temporary tables

16. Drop the index before loading bulk data

16. Use materialized views instead of views

17. Avoid != or <> (not equal) operator

18. Minimize the number of subqueries

19. Try to use INNER join as little as possible when you can get the same output using LEFT/RIGHT join.

20. For retrieving the same dataset frequently try to use temporary sources.

Do you know what is ๐—ค๐˜‚๐—ฒ๐—ฟ๐˜† ๐—ข๐—ฝ๐˜๐—ถ๐—บ๐—ถ๐˜‡๐—ฒ๐—ฟ? Its primary function is to determine ๐˜๐—ต๐—ฒ ๐—บ๐—ผ๐˜€๐˜ ๐—ฒ๐—ณ๐—ณ๐—ถ๐—ฐ๐—ถ๐—ฒ๐—ป๐˜ ๐˜„๐—ฎ๐˜† to execute a given SQL query by finding the best execution plan. The query optimizer works by taking the SQL query as input and analyzing it to determine how best to execute it. The first step is to parse the SQL query and create a syntax tree. The optimizer then analyzes the syntax tree to determine the various ways the query can be executed.

Next, the optimizer generates ๐—ฎ๐—น๐˜๐—ฒ๐—ฟ๐—ป๐—ฎ๐˜๐—ถ๐˜ƒ๐—ฒ ๐—ฒ๐˜…๐—ฒ๐—ฐ๐˜‚๐˜๐—ถ๐—ผ๐—ป ๐—ฝ๐—น๐—ฎ๐—ป๐˜€, which are different ways of executing the same query. Each execution plan specifies the order in which the tables should be accessed, the join methods to use, and any filtering or sorting operations to be performed. The optimizer then assigns a ๐—ฐ๐—ผ๐˜€๐˜ to each execution plan based on factors such as the number of disk reads and the amount of CPU time required to execute the query.

Finally, the optimizer ๐—ฐ๐—ต๐—ผ๐—ผ๐˜€๐—ฒ๐˜€ ๐˜๐—ต๐—ฒ ๐—ฒ๐˜…๐—ฒ๐—ฐ๐˜‚๐˜๐—ถ๐—ผ๐—ป ๐—ฝ๐—น๐—ฎ๐—ป with the lowest cost as the optimal execution plan for the query. This plan is then used to execute the query.

Working with CDC in SQL server

How do you read all the changes captured by CDC in a table?

USE AdventureWorks2022; 
GO 
DECLARE @from_lsn binary(10), @to_lsn binary(10); 
SET @from_lsn = sys.fn_cdc_get_min_lsn('HR_Department'); 
SET @to_lsn = sys.fn_cdc_get_max_lsn(); 
SELECT * FROM cdc.fn_cdc_get_all_changes_HR_Department (@from_lsn, @to_lsn, N'all'); 
GO

How do you read the net changes captured by CDC in a table?

USE AdventureWorks2022;  
GO  
DECLARE @begin_time datetime, @end_time datetime, @from_lsn binary(10), @to_lsn binary(10);  
-- Obtain the beginning of the time interval.  
SET @begin_time = DATEADD(day, -1, GETDATE()) ;  
-- DML statements to produce changes in the HumanResources.Department table.  
INSERT INTO HumanResources.Department (Name, GroupName)  
VALUES (N'MyDept', N'MyNewGroup');  
  
UPDATE HumanResources.Department  
SET GroupName = N'Resource Control'  
WHERE GroupName = N'Inventory Management';  
  
DELETE FROM HumanResources.Department  
WHERE Name = N'MyDept';  
  
-- Obtain the end of the time interval.  
SET @end_time = GETDATE();  
-- Map the time interval to a change data capture query range.  
SET @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @begin_time);  
SET @from_lsn = ISNULL(sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @begin_time), [sys].[fn_cdc_get_min_lsn]('HR_Department') );
SET @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);  
  
-- Return the net changes occurring within the query window.  
SELECT * FROM cdc.fn_cdc_get_net_changes_HR_Department(@from_lsn, @to_lsn, 'all');

source: cdc.fn_cdc_get_net_changes_<capture_instance> (Transact-SQL) – SQL Server | Microsoft Learn

List the schema owner in Redshift

select s.nspname as table_schema,
       s.oid as schema_id,
       u.usename as owner
from pg_catalog.pg_namespace s
join pg_catalog.pg_user u on u.usesysid = s.nspowner
where nspname not in ('information_schema', 'pg_catalog', 'public')
      and nspname not like 'pg_toast%'
      and nspname not like 'pg_temp_%'
order by table_schema;

MySQL replication types

MySQL supports two (or three, depending on how you look at it) different methods of replicating databases from master to slave. All of these methods use the binary log; however, they differ in the type of data that is written to the masterโ€™s binary log.

  • Statement-based replication Under this method, the binary log stores the SQL statements used to change databases on the master server. The slave reads this data and reexecutes these SQL statements to produce a copy of the master database. This is the default replication method in MySQL 5.1.11 and earlier and MySQL 5.1.29 onwards.
  • Row-based replication Under this method, the binary log stores the record-level changes that occur to database tables on the master server. The slave reads this data and manipulates its records accordingly to produce a copy of the master database.
  • Mixed-format replication Under this method, the server can dynamically choose between statement-based replication and row-based replication, depending on certain conditions. Some of these conditions include using a user-defined function (UDF), using an INSERT command with the DELAYED clause, using temporary tables, or using a statement that uses system variables. This is the default replication method in MySQL 5.1.12 to MySQL 5.1.28.

If youโ€™re unsure which replication method to use and your replication needs arenโ€™t complex, itโ€™s best to stick to statement-based replication, as itโ€™s been around longest and therefore has had the most time to have its kinks worked out. That said, certain types of statements cannot be replicated using this method, and it also tends to require a higher number of table locks. Row-based replication is useful for these situations. Because it replicates changes to rows, any change can be replicated, and it also requires fewer table locks.

The replication method currently in use on the server is listed in the binlog_format server variable.

mysql> SHOW VARIABLES LIKE 'binlog_format';
t0305-01
1 row in set (0.08 sec)

To alter the replication method, set a new value for this variable, as shown, using the SET command with either GLOBAL or SESSION scope. Note that using GLOBAL scope requires a server restart for the change in method to take effect.

mysql> SET binlog_format = 'MIXED';
Query OK, 0 rows affected (0.02 sec)
mysql> SELECT @@SESSION.binlog_format;
+-------------------------+
| @@SESSION.binlog_format |
+-------------------------+
| MIXED                   |
+-------------------------+
1 row in set (0.00 sec)

mysql> SET GLOBAL binlog_format = 'ROW';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT @@GLOBAL.binlog_format;;
+------------------------+
| @@GLOBAL.binlog_format |
+------------------------+
| ROW                    |
+------------------------+
1 row in set (0.00 sec)

Upload files and folder into Azure Blob Storage using Powershell

Recently, I have had to automate a process to generate csv files and upload them to a certain folder in Azure blob storage. Here is the powershell code that does just that. After some research online, I have put together couple of solutions into one that worked well for me.

In order for this to work, you will need to install Azure powershell module on your machine. This solution assumes you are authenticating using account name and storage account key

#Upload csv files to Azure Blob Storage
$ErrorActionPreference = "Stop"

$acct = "storage-account-name" #Storage Account Name
$key = "storage-account-key" #Storage Account Key
$ContainerName = "container-name" #Container Name
$containerdirectory = "subfolder path within the container"
$localfilepath = "local-file-directory"

#create a context for communicating with azure storage
$ctx = New-AzStorageContext -StorageAccountName $acct -StorageAccountKey $key -Protocol Https
$container = Get-AzStorageContainer -Name $ContainerName -Context $ctx

$container.CloudBlobContainer.Uri.AbsoluteUri

if ($container) {
#use Set-AzStorageBlobContent to upload file
    $filesToUpload = Get-Childitem -Path $localfilepath -Filter "*.csv" 

        ForEach ($x in $filesToUpload) { 
            $targetPath = $containerdirectory+($x.fullname.Substring($localfilepath.Length)).Replace("\", "/")

            Write-Verbose "Uploading $("\" + $x.fullname.Substring($localfilepath.Length)) to $($container.CloudBlobContainer.Uri.AbsoluteUri + "/" + $targetPath)"
            Set-AzStorageBlobContent -File $x.fullname -Container $container.Name -Blob $targetPath -Context $ctx -Force:$Force | Out-Null
            }
        }