Deep Dive: Snowflake to Azure Private Link Service Connection

Let's break this down in plain language, building from the basics to the complete picture.

What is Azure Private Link Service? (The Analogy)

Think of Azure Private Link Service like a private, secure doorway between two buildings:

  • Building A (Snowflake’s building) is in a gated community that you don’t control
  • Building B (Your Azure resources) is your own building
  • You can’t build a bridge between the buildings (no VNet peering)
  • You can’t use the public street (no internet for PHI data)

Azure Private Link Service creates a private tunnel through the ground (Azure’s backbone network) that only connects these two specific buildings, and nobody else can use it.


1. The Purpose of Azure Private Link Service

The Problem It Solves:

Imagine you have a service (like your MySQL database) running in your private Azure network. Normally, there are only two ways for external services to reach it:

  1. Public Internet: Open a door to the internet (❌ Not acceptable for PHI)
  2. VNet Peering: Directly connect networks (❌ Not possible with Snowflake’s multi-tenant setup)

The Solution:

Azure Private Link Service provides a third way:

  • It creates a private endpoint that external services can connect to
  • Traffic flows over Microsoft’s private backbone network (never touches the internet)
  • The external service thinks it’s connecting to a simple endpoint
  • Your service remains completely private within your VNet

Real-World Analogy:

It’s like having a private phone line installed between two companies:

  • The caller (Snowflake) dials a special number (private endpoint)
  • The call routes through a private telephone network (Azure backbone)
  • It rings at your reception desk (Load Balancer)
  • Your receptionist transfers it internally (to ProxySQL, then MySQL)
  • No one can intercept the call or even know it’s happening

2. When is Azure Private Link Service Used?

Common Scenarios:

Scenario A: Third-Party SaaS Needs Private Access

  • You use a SaaS product (like Snowflake, Databricks, or vendor tools)
  • They need to access YOUR Azure resources
  • You want zero public internet exposure
  • Use Private Link Service

Scenario B: Customer Access to Your Services

  • You provide a service to customers
  • Customers are in their own Azure subscriptions
  • You want to offer private connectivity
  • Use Private Link Service

Scenario C: Cross-Region Private Access

  • Your resources are in different Azure regions
  • You want private connectivity without complex VNet peering chains
  • Use Private Link Service

Your Specific Use Case:

You’re using it because:

  1. Snowflake is external (different Azure subscription, multi-tenant)
  2. MySQL is private (VNet-integrated, no public access)
  3. PHI compliance (must avoid public internet)
  4. No VNet peering option (Snowflake’s network architecture doesn’t support it)

3. How Azure Private Link Service Connects with External Services Securely

Let me walk through this step-by-step with your Snowflake scenario:

Step 1: You Create the Private Link Service

In your Azure subscription:

Your Actions:
├─ Create Standard Load Balancer
│  └─ Backend pool: ProxySQL VMs
├─ Create Private Link Service
│  ├─ Attach to Load Balancer's frontend IP
│  ├─ Generate a unique "alias" (like a secret address)
│  └─ Set visibility controls

What happens:

  • Azure creates a special resource that “wraps” your Load Balancer
  • It generates an alias – think of this as a secret, unique address like:
    • mycompany-mysql-pls.abc123.azure.privatelinkservice
  • This alias is how external services find your Private Link Service
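
A minimal Azure CLI sketch of this step (all resource names here are hypothetical, and the exact flag names can vary slightly between CLI versions):

# The subnet that hosts the Private Link Service NAT IPs must have
# private-link-service network policies disabled
az network vnet subnet update \
  --resource-group rg-mysql-prod \
  --vnet-name vnet-db \
  --name snet-pls \
  --disable-private-link-service-network-policies true

# Create the Private Link Service on the internal load balancer's frontend
az network private-link-service create \
  --resource-group rg-mysql-prod \
  --name pls-mysql \
  --vnet-name vnet-db \
  --subnet snet-pls \
  --lb-name lb-proxysql \
  --lb-frontend-ip-configs LoadBalancerFrontEnd

# Retrieve the alias to share with Snowflake (Step 2)
az network private-link-service show \
  --resource-group rg-mysql-prod \
  --name pls-mysql \
  --query alias --output tsv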

Step 2: You Share the Alias with Snowflake

You give Snowflake two things:

  1. The alias (the secret address)
  2. Permission (you’ll approve their connection request)

Important: The alias itself doesn’t grant access – it’s just the address. Think of it like knowing someone’s phone number doesn’t mean they’ll accept your call.

Step 3: Snowflake Creates a Private Endpoint in Their Network

In Snowflake’s Azure environment (which you don’t control):

Snowflake's Actions:
├─ Create Private Endpoint in their VNet
│  └─ Target: Your Private Link Service alias
├─ This generates a connection request
└─ Connection shows as "Pending" (waiting for your approval)

What happens:

  • Snowflake creates a network interface in THEIR VNet
  • This interface gets a PRIVATE IP address (like 10.x.x.x) in their network
  • Azure knows this interface wants to connect to YOUR Private Link Service
  • But it can’t connect yet – you must approve it

Step 4: You Approve the Connection

In your Azure portal:

You see:
├─ Connection Request from Snowflake
│  ├─ Shows their subscription ID
│  ├─ Shows requested resource (your Private Link Service)
│  └─ Status: "Pending"
└─ You click: "Approve"

What happens:

  • Azure creates a secure tunnel through its backbone network
  • This tunnel connects Snowflake’s private endpoint to your Load Balancer
  • The tunnel is encrypted and isolated at the network fabric level
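
The same approval can be done from the Azure CLI; a sketch with hypothetical names (the connection name comes from the list output):

# List pending private endpoint connections on your Private Link Service
az network private-link-service show \
  --resource-group rg-mysql-prod \
  --name pls-mysql \
  --query "privateEndpointConnections[].{name:name, status:privateLinkServiceConnectionState.status}"

# Approve a specific pending connection
az network private-link-service connection update \
  --resource-group rg-mysql-prod \
  --service-name pls-mysql \
  --name <connection-name> \
  --connection-status Approved \
  --description "Approved for Snowflake OpenFlow"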

Step 5: The Connection is Established

Now the complete path exists:

Snowflake OpenFlow
    ↓
Connects to: 10.x.x.x (private IP in Snowflake's VNet)
    ↓
[AZURE BACKBONE - ENCRYPTED TUNNEL]
    ↓
Arrives at: Your Load Balancer (in your VNet)
    ↓
Routes to: ProxySQL → MySQL


4. How the Secure Connection Actually Works (The Technical Magic)

Layer 1: Network Isolation

Physical Level:

  • Azure’s network uses software-defined networking (SDN)
  • Your traffic gets VXLAN encapsulation (a tunnel within the network)
  • It’s tagged with your specific Private Link connection ID
  • Other tenants’ traffic cannot see or intercept it

Analogy: It’s like sending a letter inside a locked box, which is then placed inside another locked box, with special routing instructions that only the Azure network fabric can read.

Layer 2: No Public IP Addresses

Key Point: Neither end has a public IP address

Snowflake side:
├─ Private Endpoint: 10.0.5.20 (private IP)
└─ NOT exposed to internet

Your side:
├─ Load Balancer: 10.1.2.10 (private frontend IP)
└─ NOT exposed to internet

Why this matters:

  • No IP address exists that could be attacked from the Internet
  • Port scanners can’t find it
  • DDoS attacks can’t target it

Layer 3: Azure Backbone Routing

The path traffic takes:

  1. Snowflake’s VM sends packets to its private endpoint (10.0.5.20)
  2. Azure SDN intercepts these packets
  3. Encapsulation happens: packet gets wrapped in Azure’s internal routing protocol
  4. Backbone transit: Travels through Microsoft’s private fiber network
  5. Decapsulation: Arrives at your Private Link Service
  6. Delivery: Forwarded to your Load Balancer

The security here:

  • Traffic never routes through public internet routers
  • Doesn’t traverse untrusted networks
  • Uses Microsoft’s dedicated fiber connections
  • Encrypted at the transport layer

Layer 4: Encryption in Transit

Even though it’s on a private network, data is still encrypted:

Application Layer (MySQL):
└─ TLS/SSL encryption (MySQL protocol with SSL)
   └─ Your ProxySQL and MySQL enforce encrypted connections

Transport Layer:
└─ Private Link provides network-level isolation
   └─ Even if someone accessed the physical network, data is encrypted


5. How We Ensure Azure Private Link Service is Secure

Let me give you a comprehensive security checklist:

A. Access Control (Who Can Connect?)

1. Approval Requirement

Setting: Auto-approval vs Manual approval
Your choice: MANUAL APPROVAL ✓

Why:
- You explicitly approve each connection request
- See who's requesting (subscription ID, service name)
- Can reject suspicious requests
- Can revoke access anytime

Action Item: Always use manual approval for production PHI data.

2. Visibility Settings

Options:
├─ Role-Based (Subscription): Only specific subscriptions can see your service
├─ Role-Based (Explicit list): You list exact subscription IDs
└─ Public: Anyone can request (you still approve)

Your choice: Role-Based with Snowflake's subscription ID ✓

Why:
- Limits who can even discover your Private Link Service
- Reduces surface area for connection attempts

Action Item: Configure visibility to allow ONLY Snowflake’s Azure subscription ID.
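
Both the approval mode and the visibility list are properties of the Private Link Service itself; a hedged CLI sketch using the create-time flags (the subscription ID is a placeholder):

# As in the Step 1 sketch, plus visibility/approval controls:
az network private-link-service create \
  --resource-group rg-mysql-prod \
  --name pls-mysql \
  --vnet-name vnet-db \
  --subnet snet-pls \
  --lb-name lb-proxysql \
  --lb-frontend-ip-configs LoadBalancerFrontEnd \
  --visibility <snowflake-subscription-id> \
  --auto-approval ""

# An empty --auto-approval list means every connection request stays
# "Pending" until you approve it manually.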

3. Connection State Monitoring

Monitor:
├─ Active connections (who's connected right now)
├─ Pending requests (who's trying to connect)
├─ Rejected connections (what you've denied)
└─ Connection history (audit trail)

Action Item: Set up Azure Monitor alerts for:

  • New connection requests
  • Approved connections
  • Connection removals
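
A hedged sketch of one such alert via the CLI; the operationName value is an assumption, so verify the exact name in your Activity Log before relying on it:

az monitor activity-log alert create \
  --resource-group rg-mysql-prod \
  --name alert-pls-connection-change \
  --scope /subscriptions/<sub-id>/resourceGroups/rg-mysql-prod \
  --condition "category=Administrative and operationName=Microsoft.Network/privateLinkServices/privateEndpointConnections/write" \
  --action-group ag-security-ops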

B. Network Security (Traffic Control)

1. Network Security Groups (NSGs)

On your Load Balancer subnet:

Inbound Rules:
├─ ALLOW: From Private Link Service (Azure service tag)
│  └─ Port: 3306 (MySQL)
├─ DENY: Everything else
└─ Priority: Specific to general

Why:
- Even though traffic comes via Private Link, add NSG layer
- Defense in depth principle
- Logs any unexpected traffic patterns
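
Traffic from a Private Link Service arrives source-NAT'ed from the PLS subnet, so the allow rule is typically scoped to that subnet's prefix. A sketch with hypothetical names and address space:

# Allow MySQL traffic arriving via the Private Link Service NAT subnet
az network nsg rule create \
  --resource-group rg-mysql-prod \
  --nsg-name nsg-proxysql \
  --name AllowMySQLFromPrivateLink \
  --priority 100 \
  --direction Inbound --access Allow --protocol Tcp \
  --source-address-prefixes 10.1.3.0/24 \
  --destination-port-ranges 3306

# Explicit catch-all deny beneath it
az network nsg rule create \
  --resource-group rg-mysql-prod \
  --nsg-name nsg-proxysql \
  --name DenyAllInbound \
  --priority 4096 \
  --direction Inbound --access Deny --protocol '*' \
  --source-address-prefixes '*' \
  --destination-port-ranges '*'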

2. Load Balancer Restrictions

Load Balancer Configuration:
├─ Frontend: INTERNAL only (not public)
├─ Backend pool: Only specific ProxySQL VMs
├─ Health probes: Ensure only healthy backends receive traffic
└─ Session persistence: Configure based on OpenFlow needs

Action Item: Use Internal Load Balancer (not public-facing).

3. ProxySQL VM Security

ProxySQL VM:
├─ No public IP address ✓
├─ NSG allows ONLY from Load Balancer subnet ✓
├─ OS firewall (firewalld/iptables) allows only port 3306 ✓
├─ Managed identity for authentication (no passwords) ✓
└─ Regular security patching ✓
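
A sketch of the OS-firewall rule on the ProxySQL VM, assuming firewalld and a hypothetical Load Balancer subnet of 10.1.2.0/24:

sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.1.2.0/24" port port="3306" protocol="tcp" accept'
sudo firewall-cmd --reload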


C. Data Security (Protecting the Payload)

1. Force TLS/SSL Encryption

On MySQL Flexible Server:

-- require_secure_transport is a server parameter on Flexible Server;
-- set it to ON in the portal or via the CLI sketch at the end of this
-- subsection, then verify from any session:
SHOW VARIABLES LIKE 'require_secure_transport';
-- Should return: ON

On ProxySQL:

-- Configure SSL for backend connections
UPDATE mysql_servers
SET use_ssl=1
WHERE hostgroup_id=0;

-- Force SSL for frontend users
UPDATE mysql_users
SET use_ssl=1;

-- Apply and persist (ProxySQL config changes are staged until loaded)
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK;

Action Item: Never allow unencrypted MySQL connections, even over Private Link.
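
On Azure MySQL Flexible Server, require_secure_transport is set as a server parameter; a CLI sketch with hypothetical resource names:

az mysql flexible-server parameter set \
  --resource-group rg-mysql-prod \
  --server-name mysql-prod \
  --name require_secure_transport \
  --value ON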

2. Certificate Validation

MySQL SSL Configuration:
├─ Use Azure-managed certificates
├─ Enable certificate validation
├─ Rotate certificates regularly
└─ Monitor certificate expiration

ProxySQL Configuration:
├─ Validate MySQL server certificates
├─ Use proper CA bundle
└─ Log SSL handshake failures


D. Authentication & Authorization

1. MySQL User Permissions

Create a dedicated user for Snowflake OpenFlow:

-- Create dedicated user
CREATE USER 'snowflake_cdc'@'%'
IDENTIFIED WITH mysql_native_password BY 'strong_password'
REQUIRE SSL;

-- Replication privileges are global-only, so they must be granted ON *.*
GRANT REPLICATION SLAVE, REPLICATION CLIENT
ON *.*
TO 'snowflake_cdc'@'%';

-- Grant SELECT only on the database OpenFlow needs
GRANT SELECT
ON your_database.*
TO 'snowflake_cdc'@'%';

-- Restrict to specific tables if possible
GRANT SELECT
ON your_database.specific_table
TO 'snowflake_cdc'@'%';

Key Principle: Least privilege – only grant what OpenFlow needs for CDC.

2. ProxySQL Authentication Layer

-- ProxySQL can add another auth layer:
-- different credentials externally vs internally

INSERT INTO mysql_users (
  username,
  password,
  default_hostgroup,
  max_connections
) VALUES (
  'snowflake_cdc',
  'external_password',
  0,   -- default hostgroup (routes to the MySQL backend pool)
  100  -- connection limit
);

-- Changes take effect only after loading to runtime (persist to disk too)
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;

Benefit: You can change backend MySQL credentials without updating Snowflake.


E. Monitoring & Auditing (Detect Issues)

1. Azure Monitor Logs

Enable diagnostics on:
├─ Private Link Service
│  └─ Log: Connection requests, approvals, data transfer
├─ Load Balancer
│  └─ Metrics: Connection count, bytes transferred, health status
├─ Network Security Groups
│  └─ Flow logs: All traffic patterns
└─ ProxySQL VM
   └─ Azure Monitor Agent: OS and application logs
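
A sketch of wiring one of these up via the CLI (the resource ID, workspace, and category names are placeholders; available categories vary by resource type):

az monitor diagnostic-settings create \
  --name diag-pls \
  --resource /subscriptions/<sub-id>/resourceGroups/rg-mysql-prod/providers/Microsoft.Network/privateLinkServices/pls-mysql \
  --workspace law-security \
  --metrics '[{"category": "AllMetrics", "enabled": true}]'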

2. MySQL Audit Logging

On Azure MySQL Flexible Server, the audit log is enabled through server parameters (audit_log_enabled, audit_log_events) rather than SET GLOBAL; see the CLI sketch after this list. Verify from a session with:

SHOW VARIABLES LIKE 'audit_log%';

Monitor for:
├─ Failed authentication attempts
├─ Unusual query patterns
├─ Large data exports
└─ Access outside business hours
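
A hedged CLI sketch for the audit parameters on Flexible Server (resource names are placeholders; the valid audit_log_events values are listed in the Azure docs):

az mysql flexible-server parameter set \
  --resource-group rg-mysql-prod \
  --server-name mysql-prod \
  --name audit_log_enabled --value ON

az mysql flexible-server parameter set \
  --resource-group rg-mysql-prod \
  --server-name mysql-prod \
  --name audit_log_events --value "CONNECTION,GENERAL"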

3. Alert on Anomalies

Set up alerts for:
├─ New Private Link connection requests
├─ High data transfer volumes (possible data exfiltration)
├─ Failed authentication attempts
├─ ProxySQL health check failures
├─ Load Balancer backend unhealthy
└─ NSG rule matches (blocked traffic attempts)


F. Compliance & Governance (PHI Requirements)

1. Data Residency

Ensure:
├─ Private Link Service: Same region as MySQL
├─ Load Balancer: Same region
├─ ProxySQL VMs: Same region
└─ Snowflake region: Appropriate for your PHI requirements

Why: Minimize data travel, maintain compliance boundaries

2. Encryption at Rest

MySQL Flexible Server:
├─ Enable: Customer-managed keys (CMK) with Azure Key Vault
├─ Rotate: Keys regularly
└─ Audit: Key access logs

Why: PHI must be encrypted at rest

3. Access Logs & Retention

Retention policy:
├─ Connection logs: 90 days minimum
├─ Query audit logs: Per HIPAA requirements
├─ NSG flow logs: 30 days minimum
└─ Azure Activity logs: 90 days minimum

Why: Compliance audits, incident investigation


6. Security Validation Checklist

Before going live, verify:

Pre-Production Checklist:

☐ Private Link Service visibility: Restricted to Snowflake subscription
☐ Manual approval: Enabled (no auto-approval)
☐ NSG rules: Properly configured on all subnets
☐ Load Balancer: Internal only (no public IP)
☐ ProxySQL VMs: No public IPs, managed identities configured
☐ MySQL SSL: Required for all connections
☐ Certificate validation: Enabled
☐ Least privilege: MySQL user has minimal permissions
☐ Audit logging: Enabled on MySQL and ProxySQL
☐ Azure Monitor: Diagnostic logs enabled on all components
☐ Alerts: Configured for security events
☐ Network flow logs: Enabled for NSGs
☐ Encryption at rest: Customer-managed keys configured
☐ Backup encryption: Verified
☐ Disaster recovery: Tested with Private Link maintained
☐ Documentation: Architecture diagram, runbooks created
☐ Incident response: Plan includes Private Link scenarios

Ongoing Security Practices:

☐ Weekly: Review connection logs
☐ Weekly: Check for pending connection requests
☐ Monthly: Audit MySQL user permissions
☐ Monthly: Review NSG rules for drift
☐ Quarterly: Certificate expiration check
☐ Quarterly: Penetration testing (authorized)
☐ Annually: Full security audit
☐ Continuous: Monitor alerts in real-time


Common Security Concerns & Answers

Q: Can someone intercept traffic on the Azure backbone?

A: No, for multiple reasons:

  1. Network isolation: Your Private Link traffic is isolated using VXLAN tunneling
  2. Logical separation: Azure’s SDN ensures tenant isolation at the hypervisor level
  3. Encryption: Even if someone accessed the physical layer (impossible), MySQL traffic is TLS-encrypted
  4. Microsoft’s security: Azure backbone is not accessible to customers or internet

Q: How do I know Snowflake is really who they say they are?

A: Multiple verification layers:

  1. Subscription ID: You see Snowflake’s Azure subscription ID in the connection request
  2. Manual approval: You explicitly approve the connection
  3. Authentication: Even after the connection, Snowflake must authenticate to MySQL
  4. Certificate validation: TLS certificates verify server identity
  5. Coordination: You coordinate with Snowflake support to verify the timing of the connection

Q: What if someone compromises Snowflake’s environment?

A: Defense in depth protects you:

  1. Authentication required: Attacker still needs MySQL credentials
  2. Least privilege: CDC user can only SELECT and read binlog, cannot modify data
  3. Audit logging: Any unusual activity is logged
  4. Revocable access: You can instantly reject/delete the Private Link connection
  5. Network monitoring: Unusual data transfer patterns trigger alerts

Q: Is Private Link more secure than VPN or ExpressRoute?

A: Yes, for several reasons:

  1. No customer routing: You don’t manage routes or BGP
  2. No gateway management: No VPN gateways to patch/secure
  3. No shared circuits: Unlike ExpressRoute, Private Link is point-to-point
  4. Simpler security model: Less complexity = fewer vulnerabilities
  5. Azure-managed: Microsoft handles underlying infrastructure security

Summary: The Security Model

Azure Private Link Service creates a multi-layered secure connection:

Security Layers (from outer to inner):
│
├─ Layer 1: Access Control
│   └─ Manual approval, subscription restrictions
│
├─ Layer 2: Network Isolation  
│   └─ Private IPs, no internet exposure, Azure backbone
│
├─ Layer 3: Traffic Control
│   └─ NSGs, Load Balancer rules, firewall policies
│
├─ Layer 4: Encryption in Transit
│   └─ TLS/SSL for MySQL protocol
│
├─ Layer 5: Authentication
│   └─ MySQL user credentials, certificate validation
│
├─ Layer 6: Authorization
│   └─ Least privilege MySQL permissions
│
├─ Layer 7: Monitoring & Auditing
│   └─ Logs, alerts, anomaly detection
│
└─ Layer 8: Encryption at Rest
    └─ Customer-managed keys for MySQL data

Each layer provides protection, so even if one layer is compromised, others maintain security.


Query Azure MySQL Audit Logs

The common practice for user management in Azure AD is to create Azure AD security groups and assign permissions to groups rather than to individual users. For example, users who need access to an Azure MySQL instance would be added to an Azure AD group (say MySqlDb_Users), and that group would then be set up for access on the Azure MySQL instance.

Pre-requisite

  • Ensure that audit logs are enabled on all production Azure MySQL instances as documented here — https://docs.microsoft.com/en-us/azure/mysql/howto-configure-audit-logs-portal
  • Make sure to enable auditing in the Server Parameters section (at least the Connection audit log event type).
  • The logs can be sent to a Storage Account, Event Hubs, or a Log Analytics Workspace. We used a Log Analytics Workspace for this runbook because it provides rich querying capabilities.

The following Log Analytics query shows that individual user activity is tracked in the audit logs even though users connect to Azure MySQL with the Azure AD group name as the username: the external_user_s field logs the exact individual user, while connection_id_d and thread_id_d carry the connection/thread ID, which can be used to tie the user across all log records.

AzureDiagnostics
| where Category == "MySqlAuditLogs"
| order by TimeGenerated desc
| project TimeGenerated, event_class_s, event_subclass_s, user_s, external_user_s, connection_id_d, thread_id_d, event_time_t, sql_text_s, db_s, table_name_s, error_code_d
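
A variant of the same query that summarizes connection activity per individual user (the event_class_s value is as observed in MySqlAuditLogs; verify it against your own workspace):

AzureDiagnostics
| where Category == "MySqlAuditLogs" and event_class_s == "connection_log"
| summarize connections = count() by external_user_s, user_s, bin(TimeGenerated, 1h)
| order by TimeGenerated desc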

How do you set binlog retention period on MySQL?

In MySQL, you can control the retention period of binary log files using the expire_logs_days system variable. This variable determines how long binary log files are retained before they are automatically purged. Here’s how you can set up the binary log retention period:

  1. Check current configuration: You can check the current value of expire_logs_days by executing the following SQL query:

     SHOW VARIABLES LIKE 'expire_logs_days';

  2. Set binary log retention period: To set the binary log retention period to a specific number of days, use the following SQL statement, replacing <number_of_days> with the desired retention period:

     SET GLOBAL expire_logs_days = <number_of_days>;

  3. Update configuration file (optional): To make the change persistent across MySQL server restarts, add or update the expire_logs_days setting in your MySQL configuration file (e.g., my.cnf or my.ini):

     expire_logs_days = <number_of_days>

  4. Manually purge binary logs (optional): If you want to manually purge binary logs before the specified retention period, you can use the PURGE BINARY LOGS statement. For example, to purge logs older than 7 days:

     PURGE BINARY LOGS BEFORE NOW() - INTERVAL 7 DAY;

     Be cautious while manually purging logs, as this operation cannot be undone.

  5. Restart MySQL server (optional): If you updated the configuration file, you might need to restart the MySQL server for the changes to take effect.

Keep in mind that setting a very short retention period may lead to potential data loss during certain recovery scenarios, so choose a value that aligns with your backup and recovery strategy. Additionally, be aware of the disk space implications of retaining binary logs for an extended period.
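
Note that in MySQL 8.0, expire_logs_days is deprecated in favor of binlog_expire_logs_seconds (on Azure MySQL Flexible Server it is exposed as a server parameter). A minimal equivalent for MySQL 8.0:

SET GLOBAL binlog_expire_logs_seconds = 604800;  -- 7 days, in seconds
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';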

MySQL replication types

MySQL supports two (or three, depending on how you look at it) different methods of replicating databases from master to slave. All of these methods use the binary log; however, they differ in the type of data that is written to the master’s binary log.

  • Statement-based replication Under this method, the binary log stores the SQL statements used to change databases on the master server. The slave reads this data and reexecutes these SQL statements to produce a copy of the master database. This is the default replication method in MySQL 5.1.11 and earlier and MySQL 5.1.29 onwards.
  • Row-based replication Under this method, the binary log stores the record-level changes that occur to database tables on the master server. The slave reads this data and manipulates its records accordingly to produce a copy of the master database.
  • Mixed-format replication Under this method, the server can dynamically choose between statement-based replication and row-based replication, depending on certain conditions. Some of these conditions include using a user-defined function (UDF), using an INSERT command with the DELAYED clause, using temporary tables, or using a statement that uses system variables. This is the default replication method in MySQL 5.1.12 to MySQL 5.1.28.

If you’re unsure which replication method to use and your replication needs aren’t complex, it’s best to stick to statement-based replication, as it’s been around longest and therefore has had the most time to have its kinks worked out. That said, certain types of statements cannot be replicated using this method, and it also tends to require a higher number of table locks. Row-based replication is useful for these situations. Because it replicates changes to rows, any change can be replicated, and it also requires fewer table locks.

The replication method currently in use on the server is listed in the binlog_format server variable.

mysql> SHOW VARIABLES LIKE 'binlog_format';
+---------------+-----------+
| Variable_name | Value     |
+---------------+-----------+
| binlog_format | STATEMENT |
+---------------+-----------+
1 row in set (0.08 sec)

To alter the replication method, set a new value for this variable, as shown, using the SET command with either GLOBAL or SESSION scope. Note that a change made with GLOBAL scope applies only to clients that connect after the change; existing sessions, including your own, keep their current format.

mysql> SET binlog_format = 'MIXED';
Query OK, 0 rows affected (0.02 sec)
mysql> SELECT @@SESSION.binlog_format;
+-------------------------+
| @@SESSION.binlog_format |
+-------------------------+
| MIXED                   |
+-------------------------+
1 row in set (0.00 sec)

mysql> SET GLOBAL binlog_format = 'ROW';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT @@GLOBAL.binlog_format;
+------------------------+
| @@GLOBAL.binlog_format |
+------------------------+
| ROW                    |
+------------------------+
1 row in set (0.00 sec)

Writing queries that are optimized

When writing a query there are four basic parts:
1) the result set
2) the tables to get the data from
3) the WHERE clause, which provides filtering
4) the GROUP BY clause, for grouping data to be aggregated.


This article will discuss areas 2 and 3 since that’s where most query optimizations occur.

Joining Tables efficiently:

MySQL has a practical limit to the number of tables you should join: about 10 (opinions differ).
While MySQL can join more than that, statement performance may suffer due to the cost of calculating the execution plan. The more tables MySQL has to deal with, the greater the number of plans it has to explore when deciding
how to execute the statement.

When you join 2 tables you want to make sure the columns you join on are indexed in at least one of the tables.
If the columns are indexed, MySQL can find the row in the joined table quickly via an index lookup.

Ex:
    SELECT a.col1, a.col2, b.col3
    FROM table_a AS a
    JOIN table_b AS b ON b.indexed_column = a.indexed_column;

In this example the column 'indexed_column' represents a column that has an index on it in table_a and/or table_b.

So for each row in table_a, the select will quickly find the matching row in table_b due to the join condition on 'indexed_column'.

Now suppose we join the tables on columns that are not indexed in either table_a or table_b:

    SELECT a.col1, a.col2, b.col3
    FROM table_a AS a
    JOIN table_b AS b ON b.unindexed_column = a.unindexed_column;

MySQL will have to perform a table scan of table_b for each row in table_a.
So if table_a has 100 rows and table_b has 1 million rows, MySQL will have to scan table_b 100 times, reading a total of 100 million rows.
This is very inefficient.
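
You can confirm the access pattern with EXPLAIN; in its output, type = ALL on table_b means a full scan for every row read from table_a (hypothetical tables):

EXPLAIN
SELECT a.col1, b.col2
FROM table_a AS a
JOIN table_b AS b ON b.unindexed_column = a.unindexed_column;
-- Look for type = ALL and a large rows estimate on table_b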

Subselects as tables:

MySQL materializes subselects: the first time the subselect is run, MySQL creates a temporary table to store the result set so it does not have to re-execute the subselect. Note: if your result set is very large, say hundreds of thousands of rows, that can create a very large temp table.

Many times I have seen queries that use a subselect as a table. That is fine.
However, you must remember that if you join a subselect to another table, the column used for the join condition must be indexed in the joined table.
If the joined table does not have an index on the column used in the join condition, the joined table will be scanned for each row in the subselect result set.

That also means that you should not join one subselect to another subselect since the result sets of subselects are not indexed.

Another thing to keep in mind is that the columns used in join conditions must be the same datatype. If they are not, then even if there is an index on the columns, MySQL cannot make use of it and will end up performing table scans.

Something else you need to know is that MySQL is pretty smart. When it creates its join plan, it can detect which table should be the main table and which should be the joined table.
For example, if the join condition column is indexed in table_a but not in table_b, it will join table_a to table_b using table_b as the primary table. It also means that MySQL may not join the tables in the same order you think it should; I have seen situations where MySQL did not 'guess' correctly.
To override MySQL's guess, you can specify STRAIGHT_JOIN after the SELECT keyword. This will cause your tables to be joined in the order you specify.
The primary table is where MySQL starts getting rows from first.

The ‘WHERE’ clause:

This is the most important part of a SQL statement, as it filters out rows.
If you don't use one, or it is not used correctly, your select will perform full table scans.
This is very, very bad.

Using a WHERE clause is easy.  You simply specify a column = another_column AND/OR a column = a CONSTANT.

Anytime you filter on a column, the column should be indexed.
Also, never change the datatype of the column you are filtering on (as in using CAST); this prevents MySQL from using any associated index.
MySQL can only use an index when the column is used in its 'native' datatype.

The datatypes of the column you are filtering on and the constant/column you are comparing to must match. If you need to change a datatype, change the datatype of the constant/column, not that of the indexed column, as shown below.
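
For example (hypothetical table; indexed_int_col is an indexed INT column):

-- Good: the constant is converted; the indexed column stays in its native type
SELECT * FROM table_a WHERE indexed_int_col = CAST('42' AS UNSIGNED);

-- Bad: casting the indexed column disables the index and forces a table scan
SELECT * FROM table_a WHERE CAST(indexed_int_col AS CHAR) = '42';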

Using an ‘IN’ clause:
Limit the number of items inside an 'IN' clause. If you have 50 items, that's fine; don't have 25 million.
There are better ways to filter.

Subselect in a WHERE clause:

Using a subselect in a WHERE clause is okay if the subselect returns a small result set.
If your subselect returns a very large result set, you may be better off simply joining the tables in the subselect like normal tables.

Get rid of subselects:

Because subselect result sets cannot be indexed, you can get rid of subselects entirely by creating a temporary table.
If you specify ENGINE=InnoDB, the table will be created as an InnoDB table and will live in the InnoDB buffer pool. This avoids eating up the server's free RAM (reducing the chance of an out-of-memory situation). When you create this temporary table, you can create indexes on it.
Then, in your query, you can join this temporary table just like a normal table on the indexed columns; see the sketch below.
Maintaining the indexes (upon insert into the table) will cause a minor performance hit, but the performance gain from having the indexes will more than offset that.
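
A minimal sketch of the technique, with hypothetical tables:

-- Materialize what would have been a subselect into an InnoDB temp table
CREATE TEMPORARY TABLE tmp_recent_orders ENGINE=InnoDB AS
SELECT customer_id, SUM(total) AS order_total
FROM orders
WHERE order_date >= CURDATE() - INTERVAL 30 DAY
GROUP BY customer_id;

-- Index the join column so the outer query can use index lookups
ALTER TABLE tmp_recent_orders ADD INDEX idx_customer (customer_id);

-- Join it like a normal table
SELECT c.name, t.order_total
FROM customers AS c
JOIN tmp_recent_orders AS t ON t.customer_id = c.customer_id;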


The ‘Takeaway’:

The takeaway from this document is that INDEXES are one of the best performance boosters.  Create them and use them.

You cannot use any function (such as UPPER(), LOWER(), a user-defined function, …) on an indexed column, or MySQL will not be able to use the index.
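
For example (hypothetical table):

-- The index on email is unusable: the function wraps the indexed column
SELECT * FROM users WHERE UPPER(email) = 'A@B.COM';

-- The index is usable: the column is compared directly
SELECT * FROM users WHERE email = 'a@b.com';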