Automate SQL Server Database Role Assignment with PowerShell and dbatools

Introduction

As a SQL Server DBA, one of the most repetitive administrative tasks is granting user access to multiple databases — especially in environments with dozens or even hundreds of databases.
Instead of manually connecting to each database and assigning roles, you can automate the process with the dbatools PowerShell module.

In this post, we’ll walk through how to automatically grant a user db_datareader and db_datawriter roles across all user databases, while excluding system databases.


Prerequisites

Before running the script:

  • Install the dbatools PowerShell module: Install-Module dbatools -Scope CurrentUser -Force
  • Ensure your account has sufficient permissions (sysadmin or equivalent).
  • Know the SQL instance name and the login you want to grant permissions to.

The PowerShell Script

# Requires dbatools
# Install-Module dbatools -Scope CurrentUser -Force

$SqlInstance = "MyServer\MyInstance"  # Replace with your SQL Server instance
$Login = "MyDomain\MyUser"            # Replace with your Windows or SQL login

# Get all user databases (excluding system DBs)
$UserDatabases = Get-DbaDatabase -SqlInstance $SqlInstance | Where-Object { -not $_.IsSystemObject }

foreach ($db in $UserDatabases) {
    Write-Host "Processing database: $($db.Name)" -ForegroundColor Cyan

    try {
        # Create the user if not already present
        $user = Get-DbaDbUser -SqlInstance $SqlInstance -Database $db.Name -User $Login -ErrorAction SilentlyContinue
        if (-not $user) {
            New-DbaDbUser -SqlInstance $SqlInstance -Database $db.Name -Login $Login -Username $Login -Confirm:$false | Out-Null
        }

        # Grant roles
        Add-DbaDbRoleMember -SqlInstance $SqlInstance -Database $db.Name -Role db_datareader -User $Login -Confirm:$false -ErrorAction Stop
        Add-DbaDbRoleMember -SqlInstance $SqlInstance -Database $db.Name -Role db_datawriter -User $Login -Confirm:$false -ErrorAction Stop

        Write-Host "✅ Granted db_datareader and db_datawriter in $($db.Name)" -ForegroundColor Green
    }
    catch {
        Write-Warning "Failed to process $($db.Name): $_"
    }
}

Write-Host "Completed assigning roles for $Login on all user databases." -ForegroundColor Green


Explanation

  • Get-DbaDatabase retrieves all databases and filters out system ones.
  • New-DbaDbUser ensures the login exists as a user in each DB.
  • Add-DbaDbRoleMember grants the necessary roles.
  • The script is non-interactive (-Confirm:$false), making it perfect for automation or CI/CD pipelines.

Example Usage

.\Grant-DbRoles.ps1 -SqlInstance "SQL01" -Login "Contoso\User123"


Key Takeaways

  • Save hours by automating repetitive access management tasks.
  • dbatools provides robust error handling and clean PowerShell syntax.
  • Works seamlessly with both Windows and SQL logins.
  • Ideal for onboarding new users or service accounts.

Remediating Redshift User Permissions

Overview

This guide covers the complete process for remediating Redshift user permissions as part of quarterly user access reviews. When users leave the company or their access needs change, we receive tickets with specific Schema-Permission attributes that need to be removed.

Out of the box, Redshift doesn’t make user permission management easy – especially when dealing with default privileges, object ownership, and the various ways permissions can be granted. This guide provides a systematic approach to handle all the edge cases you’ll encounter.

Complete Remediation Process

Step 1: Comprehensive User Audit

Always start by understanding the current state. Run this comprehensive audit to see all permissions:

-- Replace 'john.doe' with the target username
select user_name, schema_name, super_user, has_create, has_insert, has_update, has_delete, has_select, has_references, valuntil from ( select u.usename user_name, u.usesuper super_user, s.schemaname schema_name, has_schema_privilege(u.usename,s.schemaname,'create') has_create, has_table_privilege(u.usename,s.schemaname||'.'||s.tablename,'insert') has_insert, has_table_privilege(u.usename,s.schemaname||'.'||s.tablename,'update') has_update, has_table_privilege(u.usename,s.schemaname||'.'||s.tablename,'delete') has_delete, has_table_privilege(u.usename,s.schemaname||'.'||s.tablename,'select') has_select, has_table_privilege(u.usename,s.schemaname||'.'||s.tablename,'references') has_references, valuntil from pg_user u CROSS join ( SELECT DISTINCT schemaname, tablename FROM pg_tables where schemaname not like 'pg_%' and tablename not like '%newsletter_exp_prior_lookback_temptable%' ) s where (super_user = 1 or has_create = 1 or has_insert = 1 or has_update = 1 or has_delete = 1 or has_select = 1 or has_references = 1 ) and (u.valuntil > NOW() or u.valuntil is NULL) and u.usename = 'john.doe' ) group by user_name, schema_name, super_user, has_create, has_insert, has_update, has_delete, has_select, has_references, valuntil order by user_name, schema_name, has_select, has_create, has_insert, has_update, has_delete;

Step 2: Check for Object Ownership

Critical: If the user owns any tables, views, or functions, they need to be reassigned before permissions can be fully revoked:

select * from ( SELECT n.nspname AS schema_name, c.relname AS rel_name, c.relkind AS rel_kind, pg_get_userbyid(c.relowner) AS owner_name FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace UNION ALL SELECT n.nspname AS schema_name, p.proname, 'p', pg_get_userbyid(p.proowner) FROM pg_proc p JOIN pg_namespace n ON n.oid = p.pronamespace ) sub1 where owner_name = 'john.doe';

If this returns results, you’ll need to reassign ownership:

-- Example: Reassign table ownership to a service account
ALTER TABLE schema_name.table_name OWNER TO service_account;

Step 3: Check for Default Privileges

Critical for ETL accounts: Users with default privileges cannot be dropped until these are cleaned up. Check for default privileges:

select pg_get_userbyid(d.defacluser) as user, n.nspname as schema, decode(d.defaclobjtype, 'r', 'tables', 'f', 'functions') as object_type, array_to_string(d.defaclacl, ' + ') as default_privileges from pg_catalog.pg_default_acl d left join pg_catalog.pg_namespace n on n.oid = d.defaclnamespace where array_to_string(defaclacl, ',') like '%john.doe%';

If default privileges exist, you must clean them up using this workaround (due to a PostgreSQL bug):

-- Grant temporary CREATE permission (required due to PostgreSQL bug)
grant create on schema ops to john.doe;

-- Revoke the default privileges
alter default privileges for user john.doe in schema ops revoke all privileges on tables from group ops, group engineering; 

-- Remove the temporary CREATE permission
revoke create on schema ops from john.doe;

Step 4: Identify Group Memberships

Find all groups the user belongs to and generate removal statements:

SELECT u.usesysid, g.groname, u.usename, 'ALTER GROUP "' || g.groname || '" DROP USER "' || u.usename || '";' as drop_statement FROM pg_user u LEFT JOIN pg_group g ON u.usesysid = ANY (g.grolist) WHERE u.usename = 'john.doe';

Step 5: Remove Group Memberships

Execute the group removal statements. Based on typical Schema-Permission patterns:

-- Remove user from read-only groups (SELECT permissions)
ALTER GROUP "ops" DROP USER "john.doe"; 
ALTER GROUP "person" DROP USER "john.doe"; 
-- Remove user from groups with broader permissions
ALTER GROUP "nonpii" DROP USER "john.doe"; 
ALTER GROUP "nonpii_readwrite" DROP USER "john.doe";

Step 6: Handle Direct Table Permissions

Some users may have been granted direct permissions on specific tables. This query will find them and generate REVOKE statements:

with users as ( select 'john.doe'::text as username ) select 'REVOKE ALL PRIVILEGES ON TABLE ' || pg_namespace.nspname || '.' || pg_class.relname || ' FROM ' || u.username || ';' as revoke_statement, pg_namespace.nspname as schemaname, pg_class.relname as tablename, array_to_string(pg_class.relacl, ',') as acls from pg_class left join pg_namespace on pg_class.relnamespace = pg_namespace.oid join users u on (array_to_string(pg_class.relacl, ',') like '%' || u.username || '=%') where pg_class.relacl is not null and pg_namespace.nspname not in ('pg_catalog', 'pg_toast', 'information_schema');

Execute the generated REVOKE statements:

-- Example output from above query
REVOKE ALL PRIVILEGES ON TABLE ops.claims_grading FROM john.doe; REVOKE ALL PRIVILEGES ON TABLE person.user_segments FROM john.doe;

Step 7: Handle Schema-Level Permissions

Remove any direct schema-level permissions:

-- Revoke CREATE permissions
REVOKE CREATE ON SCHEMA ops FROM "john.doe"; 
REVOKE CREATE ON SCHEMA person FROM "john.doe"; 

-- Revoke USAGE permissions
REVOKE USAGE ON SCHEMA ops FROM "john.doe"; 
REVOKE USAGE ON SCHEMA person FROM "john.doe";

Step 8: Comprehensive Verification

After remediation, verify all permissions have been removed:

-- Check for any remaining table permissions
SELECT n.nspname AS schema_name, c.relname AS table_name, u.usename AS username, has_table_privilege(u.usename, c.oid, 'SELECT') AS has_select, has_table_privilege(u.usename, c.oid, 'INSERT') AS has_insert, has_table_privilege(u.usename, c.oid, 'UPDATE') AS has_update, has_table_privilege(u.usename, c.oid, 'DELETE') AS has_delete, 'REVOKE ALL ON "' || n.nspname || '"."' || c.relname || '" FROM "' || u.usename || '";' as cleanup_statement, 'SHOW GRANTS ON TABLE "' || n.nspname || '"."' || c.relname || '";' as verification_statement FROM pg_catalog.pg_namespace n JOIN pg_catalog.pg_class c ON n.oid = c.relnamespace CROSS JOIN pg_catalog.pg_user u WHERE u.usename = 'john.doe' AND n.nspname IN ('ops', 'personalization') AND (has_table_privilege(u.usename, c.oid, 'SELECT') = true OR has_table_privilege(u.usename, c.oid, 'INSERT') = true OR has_table_privilege(u.usename, c.oid, 'UPDATE') = true OR has_table_privilege(u.usename, c.oid, 'DELETE') = true) AND c.relkind = 'r' ORDER BY n.nspname ASC;

If this query returns no rows, the remediation was successful.

Step 9: Final Verification

Run these final checks to ensure complete cleanup:

-- Verify no group memberships remain
SELECT u.usesysid, g.groname, u.usename FROM pg_user u LEFT JOIN pg_group g ON u.usesysid = ANY (g.grolist) WHERE u.usename = 'john.doe' AND g.groname IS NOT NULL;
-- Show any remaining grants
SHOW GRANTS FOR "john.doe";
-- Check specific tables if needed
SHOW GRANTS ON TABLE "ops"."claims_grading";

Advanced Troubleshooting

Case 1: User Still Has Access After All Revocations

This usually means permissions were granted to PUBLIC. Check and revoke:

-- Check for PUBLIC grants on specific problematic tables 
SHOW GRANTS ON TABLE ops.claims_grading;
-- If PUBLIC has access, revoke it (this is the "nuclear option")
REVOKE ALL ON ops.claims_grading FROM PUBLIC;

Case 2: Cannot Drop User Due to Default Privileges

If you see this error:

ERROR: user "username" cannot be dropped because some objects depend on it DETAIL: owner of default privileges on new relations belonging to user username in schema schema_name

Follow the default privileges cleanup process in Step 3. This is a known PostgreSQL limitation that requires the temporary CREATE permission workaround.

Case 3: Permission Denied When Revoking Default Privileges

If you get “permission denied for schema” when trying to revoke default privileges, you need to temporarily grant CREATE permissions first (see Step 3). This is due to a semi-bug in PostgreSQL.

Case 4: Complex Permission Inheritance

Sometimes users inherit permissions through multiple group memberships or nested groups. In these cases:

  1. Run the comprehensive audit query multiple times during remediation
  2. Check for indirect permissions through role inheritance
  3. Verify that group memberships are removed (some systems cache group information)

Best Practices

  1. Always follow the order: Object ownership → Default privileges → Group memberships → Direct permissions → Schema permissions → Verification
  2. Document edge cases: Each remediation teaches you something new. Keep notes on unusual patterns.
  3. Test in non-production first: For complex users (especially ETL accounts), test the remediation process in a non-production environment.
  4. Handle default privileges immediately: Don’t wait until user departure to clean up default privileges – they’re the most significant source of complications.
  5. Use the verification queries: The verification step isn’t optional – it’s the only way to be specific remediation was successful.
  6. Check PUBLIC permissions last: If a user still has unexpected access, PUBLIC permissions are usually the culprit.

Emergency Procedures

If You Need to Immediately Revoke All Access

For urgent security situations, you can disable a user account immediately:

-- Disable the account (prevents login but doesn't remove permissions)ALTER USER "john.doe" VALID UNTIL '1900-01-01';
-- Then follow the normal remediation process when time permits

If You Accidentally Revoke Too Much

If you accidentally remove permissions that should remain:

  1. Check the original ticket carefully – what should actually be removed?
  2. Re-grant the appropriate group memberships
  3. Verify using the audit query that permissions match expectations
  4. Document the mistake to prevent future occurrences

This comprehensive approach ensures that user permissions are properly remediated while handling all the edge cases that make Redshift user management challenging.

Decoding a Python Script: An Improv-Inspired Guide for Beginners

By Vinay Rahul Are, Python Enthusiast \& Improv Comedy Fan


Introduction

Learning Python can feel intimidating—unless you approach it with a sense of play! Just like improv comedy, Python is about saying “yes, and…” to new ideas, experimenting, and having fun. In this post, I’ll walk you through a real-world Python script, breaking down each part so you can understand, explain, and even perform it yourself!


The Script’s Purpose

The script we’ll explore automates the process of running multiple SQL files against an Amazon Redshift database. For each SQL file, it:

  • Executes the file’s SQL commands on Redshift
  • Logs how many rows were affected, how long it took, and any errors
  • Moves the file to a “Done” folder when finished

It’s a practical tool for data engineers, but the structure and logic are great for any Python beginner to learn from.


1. The “Show Description” (Docstring)

At the top, you’ll find a docstring—a big comment block that tells you what the script does, what you need to run it, and how to use it.

"""
Batch Redshift SQL Script Executor with Per-Script Logging, Timing, and Post-Execution Archiving

Pre-requisites:
---------------
1. Python 3.x installed on your machine.
2. The following Python packages must be installed:
    - psycopg2-binary
3. (Recommended) Use a virtual environment to avoid dependency conflicts.
4. Network access to your Amazon Redshift cluster.

Installation commands:
----------------------
python -m venv venv
venv\Scripts\activate        # On Windows
pip install psycopg2-binary

Purpose:
--------
This script automates the execution of multiple .sql files against an Amazon Redshift cluster...
"""

2. Importing the “Cast and Crew” (Modules)

Every show needs its cast. In Python, that means importing modules:

import os
import glob
import psycopg2
import getpass
import shutil
import time
  • os, glob, shutil: Handle files and folders
  • psycopg2: Talks to the Redshift database
  • getpass: Securely prompts for passwords
  • time: Measures how long things take

3. The “Stage Directions” (Configuration)

Before the curtain rises, set your stage:

HOST = '<redshift-endpoint>'
PORT = 5439
USER = '<your-username>'
DATABASE = '<your-database>'
SCRIPT_DIR = r'C:\redshift_scripts'
DONE_DIR = os.path.join(SCRIPT_DIR, 'Done')
  • Replace the placeholders with your actual Redshift details and script folder path.

4. The “Comedy Routine” (Function Definition)

The main function, run_sql_script, is like a well-rehearsed bit:

def run_sql_script(script_path, conn):
    log_path = os.path.splitext(script_path)[0] + '.log'
    with open(script_path, 'r', encoding='utf-8') as sql_file, open(log_path, 'w', encoding='utf-8') as log_file:
        sql = sql_file.read()
        log_file.write(f"Running script: {script_path}\n")
        start_time = time.perf_counter()
        try:
            with conn.cursor() as cur:
                cur.execute(sql)
                end_time = time.perf_counter()
                elapsed_time = end_time - start_time
                rows_affected = cur.rowcount if cur.rowcount != -1 else 'Unknown'
                log_file.write(f"Rows affected: {rows_affected}\n")
                log_file.write(f"Execution time: {elapsed_time:.2f} seconds\n")
                conn.commit()
                log_file.write("Execution successful.\n")
        except Exception as e:
            end_time = time.perf_counter()
            elapsed_time = end_time - start_time
            log_file.write(f"Error: {str(e)}\n")
            log_file.write(f"Execution time (until error): {elapsed_time:.2f} seconds\n")
            conn.rollback()
  • Reads the SQL file
  • Logs what’s happening
  • Measures execution time
  • Handles success or errors gracefully

5. The “Main Event” (main function)

This is the showrunner, making sure everything happens in order:

def main():
    password = getpass.getpass("Enter your Redshift password: ")
    if not os.path.exists(DONE_DIR):
        os.makedirs(DONE_DIR)
    sql_files = glob.glob(os.path.join(SCRIPT_DIR, '*.sql'))
    conn = psycopg2.connect(
        host=HOST,
        port=PORT,
        user=USER,
        password=password,
        dbname=DATABASE
    )
    for script_path in sql_files:
        print(f"Running {script_path} ...")
        run_sql_script(script_path, conn)
        try:
            shutil.move(script_path, DONE_DIR)
            print(f"Moved {script_path} to {DONE_DIR}")
        except Exception as move_err:
            print(f"Failed to move {script_path}: {move_err}")
    conn.close()
    print("All scripts executed.")
  • Prompts for your password (no peeking!)
  • Makes sure the “Done” folder exists
  • Finds all .sql files
  • Connects to Redshift
  • Runs each script, logs results, and moves the file when done

6. The “Curtain Call” (Script Entry Point)

This line ensures the main event only happens if you run the script directly:

if __name__ == "__main__":
    main()

7. Explaining the Script in Plain English

“This script automates running a bunch of SQL files against a Redshift database. For each file, it logs how many rows were affected, how long it took, and any errors. After running, it moves the file to a ‘Done’ folder so you know it’s finished. It’s organized with clear sections for setup, reusable functions, and the main execution flow.”


8. Why This Structure?

  • Imports first: So all your helpers are ready before the show starts.
  • Functions: Keep the code neat, reusable, and easy to understand.
  • Main block: Keeps your script from running accidentally if imported elsewhere.
  • Comments and docstrings: Make it easy for others (and future you) to understand what’s going on.

9. Final Thoughts: Python is Improv!

Just like improv, Python is best learned by doing. Try things out, make mistakes, and remember: if your code “crashes,” it’s just the computer’s way of saying, “Yes, and…let’s try that again!”

If you want to dig deeper into any part of this script, just ask in the comments below. Happy coding—and yes, and… keep learning!


How Increasing Azure PostgreSQL IOPS Supercharged Our Bulk Insert Performance

Loading millions of records into a cloud database can be a frustratingly slow task—unless you identify where your bottlenecks are. In this post, I will share how we significantly improved our insertion speeds on Azure Database for PostgreSQL Flexible Server by adjusting a single, often-overlooked setting: provisioned IOPS.


The Challenge: Slow Inserts Despite Low CPU

We were running a large data migration from Databricks to Azure Database for PostgreSQL Flexible Server. Our setup:

  • Instance: Memory Optimized, E8ds_v4 (8 vCores, 64 GiB RAM, 256 GiB Premium SSD)
  • Insert Method: 8 parallel threads from Databricks, each batching 50,000 rows

Despite this robust configuration, our insert speeds were disappointing. Monitoring showed:

  • CPU usage: ~10%
  • Disk IOPS: 100% utilization

Clearly, our CPU wasn’t the problem—disk I/O was.


The Bottleneck: Disk IOPS Saturation

Azure Database for PostgreSQL Flexible Server ties write performance directly to your provisioned IOPS (Input/Output Operations Per Second). PostgreSQL is forced to queue up write operations when your workload hits this limit, causing inserts to slow down dramatically.

Key signs you’re IOPS-bound:

  • Disk IOPS metric at or near 100%
  • Low CPU and memory utilization
  • Inserts (and possibly other write operations) are much slower than expected

The Fix: Increase Provisioned IOPS

We increased our provisioned IOPS from 1,100 to 5,000 using the Azure Portal:

  1. Go to your PostgreSQL Flexible Server in Azure.
  2. Select Compute + storage.
  3. Adjust the IOPS slider (or enter a higher value if using Premium SSD v2).
  4. Save changes—no downtime required.

Result:
Insert speeds improved immediately and dramatically. Disk performance no longer throttled the database, and we could fully utilize our CPU and memory resources.


Lessons Learned & Best Practices

  • Monitor your bottlenecks: Always check disk IOPS, CPU, and memory during heavy data loads.
  • Scale IOPS with workload: Azure lets you increase IOPS on the fly. For bulk loads, temporarily raising IOPS can save hours or days of processing time.
  • Batch and parallelize wisely: Match your parallel threads to your vCPU count, but remember that IOPS is often the true limiter for bulk writes.
  • Optimize indexes and constraints: Fewer indexes mean fewer writes per insert. Drop non-essential indexes before bulk loads and recreate them afterward.

Conclusion:
If your PostgreSQL inserts are slow on Azure, check your disk IOPS. Increasing provisioned IOPS can unlock the performance your hardware is capable of—sometimes, it’s the simplest tweak that makes the biggest difference.


pg_whoisactive

/*
================================================================================
pg_whoisactive - PostgreSQL Session Monitoring Tool
================================================================================

DESCRIPTION:
    A comprehensive PostgreSQL inspired by Adam Machanic's sp_whoisactive stored 
    procedure used on SQL server. This function provides detailed real-time information about active 
    database sessions, including:
    
    - Current executing queries and their duration
    - Session blocking and lock information  
    - I/O statistics and resource usage
    - Connection details and client information
    - Transaction status and isolation levels
    - Wait states and performance metrics

CAPABILITIES:
    ✓ Monitor active and idle database sessions
    ✓ Identify blocking chains and deadlock scenarios
    ✓ Track query execution times and resource consumption
    ✓ Display I/O statistics (reads, writes, temp usage)
    ✓ Show connection information (host, application, user)
    ✓ Filter sessions by various criteria
    ✓ Format output for easy analysis
    ✓ Help system with parameter documentation

PARAMETERS:
    show_own_spid (BOOLEAN, default: FALSE)
        - Include the current session in results
        
    show_system_spids (BOOLEAN, default: FALSE)  
        - Include system/background processes
        
    show_sleeping_spids (INTEGER, default: 1)
        - 0: Show only active sessions
        - 1: Show active + idle sessions with open transactions
        - 2: Show all sessions (including completely idle)
        
    get_full_inner_text (BOOLEAN, default: FALSE)
        - TRUE: Display complete SQL query text
        - FALSE: Truncate long queries to 100 characters
        
    get_plans (BOOLEAN, default: FALSE)
        - Reserved for future query plan integration
        
    get_locks (BOOLEAN, default: FALSE)
        - Include detailed lock information in wait_info column
        
    get_avg_time (BOOLEAN, default: FALSE)
        - Reserved for future average execution time tracking
        
    get_additional_info (BOOLEAN, default: FALSE)
        - Reserved for additional session metadata
        
    find_block_leaders (BOOLEAN, default: TRUE)
        - Identify and report blocking relationships between sessions
        
    delta_interval (INTEGER, default: 5)
        - Reserved for future delta calculations between samples
        
    format_output (INTEGER, default: 1)
        - 1: Normal output format
        - 2: Reserved for detailed format
        
    destination_table (TEXT, default: NULL)
        - Reserved for future table output functionality
        
    help (BOOLEAN, default: FALSE)
        - Display help information and parameter descriptions

OUTPUT COLUMNS:
    dd_hh_mm_ss              - Session duration (DD:HH:MM:SS or HH:MM:SS)
    session_id               - PostgreSQL process ID (PID)
    sql_text                 - Current SQL statement being executed
    sql_command              - SQL command type (SELECT, INSERT, UPDATE, etc.)
    login_name               - Database username
    wait_info                - Current wait state or blocking information
    tran_log_writes          - Transaction information and duration
    cpu_time                 - CPU time used (reserved for future use)
    tempdb_allocations       - Temporary file count
    tempdb_current           - Current temporary space usage
    blocking_session_id      - PID of session blocking this one
    blocked_session_count    - Number of sessions this one is blocking
    reads                    - Read I/O statistics
    writes                   - Write I/O statistics  
    context_switches         - Context switch information (reserved)
    physical_io              - Physical I/O metrics (reserved)
    query_cost               - Query execution cost (reserved)
    status                   - Current session status
    isolation_level          - Transaction isolation level (reserved)
    open_tran_count          - Number of open transactions
    percent_complete         - Query completion percentage (reserved)
    host_name                - Client hostname or IP address
    database_name            - Current database name
    program_name             - Client application name
    start_time               - When the current query started
    login_time               - When the session was established
    request_id               - Request identifier (uses PID)
    collection_time          - Timestamp when data was collected

USAGE EXAMPLES:
    -- Basic monitoring (most common use case)
    SELECT * FROM pg_whoisactive();
    
    -- Show all sessions including idle ones
    SELECT * FROM pg_whoisactive(show_sleeping_spids := 2);
    
    -- Focus on blocking issues
    SELECT * FROM pg_whoisactive(find_block_leaders := true, get_locks := true)
    WHERE blocking_session_id IS NOT NULL OR blocked_session_count > 0;
    
    -- Detailed analysis of active sessions
    SELECT * FROM pg_whoisactive(
        show_sleeping_spids := 0,
        get_full_inner_text := true,
        get_locks := true
    );
    
    -- Monitor specific session
    SELECT * FROM pg_whoisactive() WHERE session_id = 12345;
    
    -- Get help
    SELECT * FROM pg_whoisactive(help := true);

COMPANION FUNCTIONS:
    pg_blocking_tree() - Shows hierarchical blocking relationships

COMPATIBILITY:
    - PostgreSQL 9.6+
    - Requires pg_stat_activity and pg_locks system views
    - Works with all PostgreSQL editions

AUTHOR: Database Monitoring Tool
VERSION: 1.0
CREATED: 2024
================================================================================
*/

CREATE OR REPLACE FUNCTION public.pg_whoisactive(
    show_own_spid BOOLEAN DEFAULT FALSE,           -- Show current session
    show_system_spids BOOLEAN DEFAULT FALSE,       -- Show system processes  
    show_sleeping_spids INTEGER DEFAULT 1,         -- 0=no idle, 1=sleeping with open trans, 2=all sleeping
    get_full_inner_text BOOLEAN DEFAULT FALSE,     -- Show full query text
    get_plans BOOLEAN DEFAULT FALSE,               -- Get query plans (if available)
    get_locks BOOLEAN DEFAULT FALSE,               -- Show lock information
    get_avg_time BOOLEAN DEFAULT FALSE,            -- Show average execution times
    get_additional_info BOOLEAN DEFAULT FALSE,     -- Show additional session info
    find_block_leaders BOOLEAN DEFAULT TRUE,       -- Identify blocking chain leaders
    delta_interval INTEGER DEFAULT 5,              -- Seconds between samples for delta calculations
    format_output INTEGER DEFAULT 1,               -- 1=normal, 2=detailed
    destination_table TEXT DEFAULT NULL,           -- Future: output to table
    help BOOLEAN DEFAULT FALSE                     -- Show help information
)
RETURNS TABLE (
    dd_hh_mm_ss TEXT,                    -- Session duration
    session_id INTEGER,                  -- Process ID
    sql_text TEXT,                       -- Current SQL statement
    sql_command TEXT,                    -- Command type (SELECT, INSERT, etc.)
    login_name TEXT,                     -- Username
    wait_info TEXT,                      -- What the session is waiting for
    tran_log_writes TEXT,               -- Transaction info
    cpu_time TEXT,                      -- CPU time (if available)
    tempdb_allocations TEXT,            -- Temp space usage
    tempdb_current TEXT,                -- Current temp usage
    blocking_session_id INTEGER,        -- Session that's blocking this one
    blocked_session_count INTEGER,      -- How many sessions this one is blocking
    reads TEXT,                         -- Read statistics
    writes TEXT,                        -- Write statistics
    context_switches TEXT,              -- Context switch info
    physical_io TEXT,                   -- Physical I/O
    query_cost TEXT,                    -- Query cost (if available)
    status TEXT,                        -- Session status
    isolation_level TEXT,               -- Transaction isolation level
    open_tran_count INTEGER,            -- Number of open transactions
    percent_complete TEXT,              -- Progress (if available)
    host_name TEXT,                     -- Client hostname/IP
    database_name TEXT,                 -- Current database
    program_name TEXT,                  -- Application name
    start_time TIMESTAMP WITH TIME ZONE, -- When session started
    login_time TIMESTAMP WITH TIME ZONE, -- When user logged in
    request_id INTEGER,                  -- Request ID
    collection_time TIMESTAMP WITH TIME ZONE -- When this data was collected
) 
LANGUAGE plpgsql
AS $function$

-- pg_whoisactive - PostgreSQL equivalent of SQL Server's sp_whoisactive
-- A comprehensive session monitoring tool for PostgreSQL

DECLARE
    rec RECORD;
    blocking_info RECORD;
    blocked_count INTEGER;
    lock_info TEXT;
    wait_description TEXT;
    query_text TEXT;
    cmd_type TEXT;
    duration_formatted TEXT;
    days INTEGER;
    hours INTEGER;
    minutes INTEGER;
    seconds INTEGER;
    total_seconds NUMERIC;
    reads_value TEXT;
    writes_value TEXT;
    temp_files_value TEXT;
    temp_bytes_value TEXT;
BEGIN
    -- Show help if requested
    IF help THEN
        RETURN QUERY
        SELECT 
            'HELP'::TEXT as dd_hh_mm_ss,
            0 as session_id,
            'pg_whoisactive - PostgreSQL equivalent of sp_whoisactive'::TEXT as sql_text,
            'HELP'::TEXT as sql_command,
            'show_own_spid: Include current session (default: false)'::TEXT as login_name,
            'show_system_spids: Include system processes (default: false)'::TEXT as wait_info,
            'show_sleeping_spids: 0=no idle, 1=idle with trans, 2=all idle (default: 1)'::TEXT as tran_log_writes,
            'get_full_inner_text: Show complete SQL text (default: false)'::TEXT as cpu_time,
            'get_locks: Include lock information (default: false)'::TEXT as tempdb_allocations,
            'find_block_leaders: Identify blocking leaders (default: true)'::TEXT as tempdb_current,
            NULL::INTEGER as blocking_session_id,
            NULL::INTEGER as blocked_session_count,
            'Usage: SELECT * FROM pg_whoisactive();'::TEXT as reads,
            'Advanced: SELECT * FROM pg_whoisactive(show_sleeping_spids := 2, get_locks := true);'::TEXT as writes,
            NULL::TEXT as context_switches,
            NULL::TEXT as physical_io,
            NULL::TEXT as query_cost,
            NULL::TEXT as status,
            NULL::TEXT as isolation_level,
            NULL::INTEGER as open_tran_count,
            NULL::TEXT as percent_complete,
            NULL::TEXT as host_name,
            NULL::TEXT as database_name,
            NULL::TEXT as program_name,
            NULL::TIMESTAMP WITH TIME ZONE as start_time,
            NULL::TIMESTAMP WITH TIME ZONE as login_time,
            NULL::INTEGER as request_id,
            now() as collection_time;
        RETURN;
    END IF;

    -- Main query logic
    FOR rec IN 
        SELECT 
            a.pid,
            a.usename,
            a.application_name,
            a.client_addr,
            a.client_hostname,
            a.backend_start,
            a.xact_start,
            a.query_start,
            a.state_change,
            a.state,
            a.query,
            a.datname,
            a.backend_type,
            -- Get I/O stats if available
            COALESCE(s.blks_read, 0) as blks_read,
            COALESCE(s.blks_hit, 0) as blks_hit,
            COALESCE(s.tup_returned, 0) as tup_returned,
            COALESCE(s.tup_fetched, 0) as tup_fetched,
            COALESCE(s.tup_inserted, 0) as tup_inserted,
            COALESCE(s.tup_updated, 0) as tup_updated,
            COALESCE(s.tup_deleted, 0) as tup_deleted,
            COALESCE(s.temp_files, 0) as temp_files,
            COALESCE(s.temp_bytes, 0) as temp_bytes
        FROM pg_stat_activity a
        LEFT JOIN pg_stat_database s ON s.datname = a.datname
        WHERE 
            -- Filter current session
            (show_own_spid = TRUE OR a.pid <> pg_backend_pid())
            -- Filter system processes  
            AND (show_system_spids = TRUE OR (a.datname IS NOT NULL AND a.backend_type = 'client backend'))
            -- Filter idle connections based on show_sleeping_spids parameter
            AND (
                show_sleeping_spids = 2 OR  -- Show all
                (show_sleeping_spids = 1 AND (a.state <> 'idle' OR a.xact_start IS NOT NULL)) OR  -- Show active or idle in transaction
                (show_sleeping_spids = 0 AND a.state = 'active')  -- Show only active
            )
        ORDER BY 
            CASE WHEN a.state = 'active' THEN 1 ELSE 2 END,  -- Active sessions first
            a.query_start DESC NULLS LAST
    LOOP
        -- Calculate and format session duration
        total_seconds := EXTRACT(EPOCH FROM (COALESCE(now() - rec.backend_start, INTERVAL '0')));
        days := FLOOR(total_seconds / 86400);
        hours := FLOOR((total_seconds % 86400) / 3600);
        minutes := FLOOR((total_seconds % 3600) / 60);
        seconds := FLOOR(total_seconds % 60);
        
        IF days > 0 THEN
            duration_formatted := LPAD(days::TEXT, 2, '0') || ':' || LPAD(hours::TEXT, 2, '0') || ':' || LPAD(minutes::TEXT, 2, '0') || ':' || LPAD(seconds::TEXT, 2, '0');
        ELSE
            duration_formatted := LPAD(hours::TEXT, 2, '0') || ':' || LPAD(minutes::TEXT, 2, '0') || ':' || LPAD(seconds::TEXT, 2, '0');
        END IF;

        -- Process SQL text
        IF get_full_inner_text THEN
            query_text := rec.query;
        ELSE
            query_text := CASE 
                WHEN rec.query IS NULL THEN NULL
                WHEN LENGTH(rec.query) > 100 THEN LEFT(rec.query, 97) || '...'
                ELSE rec.query 
            END;
        END IF;

        -- Extract command type
        IF rec.query IS NOT NULL AND TRIM(rec.query) <> '' THEN
            cmd_type := UPPER(SPLIT_PART(TRIM(rec.query), ' ', 1));
            IF cmd_type = '' THEN cmd_type := NULL; END IF;
        ELSE
            cmd_type := NULL;
        END IF;

        -- Determine wait info and status
        wait_description := CASE 
            WHEN rec.state = 'active' THEN 'RUNNABLE'
            WHEN rec.state = 'idle' THEN 'IDLE'
            WHEN rec.state = 'idle in transaction' THEN 'IDLE IN TRANSACTION'
            WHEN rec.state = 'idle in transaction (aborted)' THEN 'IDLE IN TRANSACTION (ABORTED)'
            WHEN rec.state = 'fastpath function call' THEN 'FASTPATH FUNCTION CALL'
            WHEN rec.state = 'disabled' THEN 'DISABLED'
            ELSE UPPER(rec.state)
        END;

        -- Find blocking information if requested
        IF find_block_leaders THEN
            -- Check if this session is blocked
            SELECT b.blocking_pid INTO blocking_info
            FROM (
                SELECT 
                    blocked_locks.pid as blocked_pid,
                    blocking_locks.pid as blocking_pid
                FROM pg_catalog.pg_locks blocked_locks
                    JOIN pg_catalog.pg_locks blocking_locks ON (
                        blocking_locks.locktype = blocked_locks.locktype
                        AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
                        AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
                        AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
                        AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
                        AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
                        AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
                        AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
                        AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
                        AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
                        AND blocking_locks.virtualtransaction IS NOT DISTINCT FROM blocked_locks.virtualtransaction
                    )
                WHERE NOT blocked_locks.granted 
                    AND blocking_locks.granted
                    AND blocked_locks.pid <> blocking_locks.pid
            ) b
            WHERE b.blocked_pid = rec.pid;

            -- Count how many sessions this one is blocking
            SELECT COUNT(*) INTO blocked_count
            FROM (
                SELECT 
                    blocked_locks.pid as blocked_pid,
                    blocking_locks.pid as blocking_pid
                FROM pg_catalog.pg_locks blocked_locks
                    JOIN pg_catalog.pg_locks blocking_locks ON (
                        blocking_locks.locktype = blocked_locks.locktype
                        AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
                        AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
                        AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
                        AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
                        AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
                        AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
                        AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
                        AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
                        AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
                        AND blocking_locks.virtualtransaction IS NOT DISTINCT FROM blocked_locks.virtualtransaction
                    )
                WHERE NOT blocked_locks.granted 
                    AND blocking_locks.granted
                    AND blocked_locks.pid <> blocking_locks.pid
            ) b
            WHERE b.blocking_pid = rec.pid;
        ELSE
            blocking_info := NULL;
            blocked_count := 0;
        END IF;

        -- Get lock information if requested
        IF get_locks THEN
            SELECT STRING_AGG(
                l.locktype || ':' || l.mode || 
                CASE WHEN l.granted THEN ':GRANTED' ELSE ':WAITING' END, 
                ', '
            ) INTO lock_info
            FROM pg_locks l
            WHERE l.pid = rec.pid;
        ELSE
            lock_info := NULL;
        END IF;

        -- Format read/write statistics
        reads_value := CASE 
            WHEN rec.blks_read > 0 OR rec.blks_hit > 0 THEN 
                'Logical: ' || (rec.blks_read + rec.blks_hit)::TEXT || ', Physical: ' || rec.blks_read::TEXT
            ELSE NULL 
        END;

        writes_value := CASE 
            WHEN (rec.tup_inserted + rec.tup_updated + rec.tup_deleted) > 0 THEN 
                'Rows: ' || (rec.tup_inserted + rec.tup_updated + rec.tup_deleted)::TEXT
            ELSE NULL 
        END;

        -- Format temp usage
        temp_files_value := CASE 
            WHEN rec.temp_files > 0 THEN rec.temp_files::TEXT || ' files'
            ELSE NULL 
        END;

        temp_bytes_value := CASE 
            WHEN rec.temp_bytes > 0 THEN pg_size_pretty(rec.temp_bytes)
            ELSE NULL 
        END;

        -- Return the row
        RETURN QUERY
        SELECT 
            duration_formatted,                                         -- dd_hh_mm_ss
            rec.pid,                                                   -- session_id
            query_text,                                                -- sql_text
            cmd_type,                                                  -- sql_command
            rec.usename::TEXT,                                         -- login_name
            CASE 
                WHEN blocking_info.blocking_pid IS NOT NULL THEN 
                    'BLOCKED BY: ' || blocking_info.blocking_pid::TEXT
                WHEN get_locks AND lock_info IS NOT NULL THEN lock_info
                ELSE wait_description 
            END,                                                       -- wait_info
            CASE 
                WHEN rec.xact_start IS NOT NULL THEN 
                    'Open: ' || EXTRACT(EPOCH FROM (now() - rec.xact_start))::INTEGER::TEXT || 's'
                ELSE NULL 
            END,                                                       -- tran_log_writes
            NULL::TEXT,                                                -- cpu_time (not available in PostgreSQL)
            temp_files_value,                                          -- tempdb_allocations
            temp_bytes_value,                                          -- tempdb_current
            blocking_info.blocking_pid,                                -- blocking_session_id
            CASE WHEN blocked_count > 0 THEN blocked_count ELSE NULL END, -- blocked_session_count
            reads_value,                                               -- reads
            writes_value,                                              -- writes
            NULL::TEXT,                                                -- context_switches (not available)
            NULL::TEXT,                                                -- physical_io
            NULL::TEXT,                                                -- query_cost (would need EXPLAIN)
            wait_description,                                          -- status
            NULL::TEXT,                                                -- isolation_level (would need transaction info)
            CASE WHEN rec.xact_start IS NOT NULL THEN 1 ELSE 0 END,   -- open_tran_count
            NULL::TEXT,                                                -- percent_complete (not available)
            COALESCE(rec.client_hostname, rec.client_addr::TEXT, 'local'), -- host_name
            rec.datname::TEXT,                                         -- database_name
            COALESCE(rec.application_name::TEXT, 'unknown'),           -- program_name
            rec.query_start,                                           -- start_time
            rec.backend_start,                                         -- login_time
            rec.pid,                                                   -- request_id (using pid)
            now();                                                     -- collection_time
    END LOOP;
END;
$function$;

-- Create a helper function for quick blocking analysis
CREATE OR REPLACE FUNCTION pg_blocking_tree()
RETURNS TABLE (
    level INTEGER,
    blocking_pid INTEGER,
    blocked_pid INTEGER,
    blocked_user TEXT,
    blocked_query TEXT,
    blocked_duration TEXT
)
LANGUAGE sql
AS $$
/*
================================================================================
pg_blocking_tree - Hierarchical Blocking Chain Analysis
================================================================================

DESCRIPTION:
    Displays blocking relationships in a hierarchical tree format, showing
    which sessions are blocking others and the chain of dependencies.
    
RETURNS:
    level               - Depth in the blocking hierarchy
    blocking_pid        - Session causing the block
    blocked_pid         - Session being blocked
    blocked_user        - Username of blocked session
    blocked_query       - SQL being blocked
    blocked_duration    - How long the session has been blocked

USAGE:
    SELECT * FROM pg_blocking_tree();
================================================================================
*/
    WITH RECURSIVE blocking_tree AS (
        -- Find the root blockers (sessions that block others but aren't blocked themselves)
        SELECT 
            1 as level,
            blocking_locks.pid as blocking_pid,
            blocked_locks.pid as blocked_pid,
            blocked_activity.usename as blocked_user,
            blocked_activity.query as blocked_query,
            EXTRACT(EPOCH FROM (now() - blocked_activity.query_start))::INTEGER::TEXT || 's' as blocked_duration
        FROM pg_catalog.pg_locks blocked_locks
            JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
            JOIN pg_catalog.pg_locks blocking_locks ON (
                blocking_locks.locktype = blocked_locks.locktype
                AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
                AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
                AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
                AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
                AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
                AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
                AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
                AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
                AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
                AND blocking_locks.virtualtransaction IS NOT DISTINCT FROM blocked_locks.virtualtransaction
            )
        WHERE NOT blocked_locks.granted 
            AND blocking_locks.granted
            AND blocked_locks.pid <> blocking_locks.pid
    )
    SELECT * FROM blocking_tree
    ORDER BY level, blocking_pid, blocked_pid;
$$;

/*
================================================================================
USAGE EXAMPLES AND REFERENCE
================================================================================

-- BASIC MONITORING
--------------------------------------------------------------------------------
-- View all active sessions (most common usage)
SELECT * FROM pg_whoisactive();

-- Show only currently executing queries
SELECT * FROM pg_whoisactive(show_sleeping_spids := 0);

-- Include all sessions (even completely idle ones)
SELECT * FROM pg_whoisactive(show_sleeping_spids := 2);


-- BLOCKING ANALYSIS
--------------------------------------------------------------------------------
-- Quick check for blocking issues
SELECT * FROM pg_whoisactive(find_block_leaders := true) 
WHERE blocking_session_id IS NOT NULL OR blocked_session_count > 0;

-- Detailed blocking analysis with lock information
SELECT session_id, login_name, sql_text, wait_info, blocking_session_id, blocked_session_count
FROM pg_whoisactive(get_locks := true, find_block_leaders := true)
WHERE blocking_session_id IS NOT NULL OR blocked_session_count > 0;

-- Show complete blocking hierarchy
SELECT * FROM pg_blocking_tree();


-- DETAILED SESSION ANALYSIS  
--------------------------------------------------------------------------------
-- Full query text and detailed information
SELECT * FROM pg_whoisactive(
    get_full_inner_text := true,
    get_locks := true,
    find_block_leaders := true
);

-- Monitor specific session by PID
SELECT * FROM pg_whoisactive() WHERE session_id = 12345;

-- Long-running transactions
SELECT session_id, login_name, sql_command, tran_log_writes, dd_hh_mm_ss
FROM pg_whoisactive()
WHERE tran_log_writes IS NOT NULL
ORDER BY dd_hh_mm_ss DESC;


-- PERFORMANCE MONITORING
--------------------------------------------------------------------------------
-- Sessions with high I/O activity
SELECT session_id, login_name, sql_command, reads, writes, tempdb_current
FROM pg_whoisactive()
WHERE reads IS NOT NULL OR writes IS NOT NULL OR tempdb_current IS NOT NULL;

-- Currently active sessions sorted by duration
SELECT session_id, dd_hh_mm_ss, login_name, sql_text, status
FROM pg_whoisactive(show_sleeping_spids := 0)
ORDER BY dd_hh_mm_ss DESC;


-- TROUBLESHOOTING QUERIES
--------------------------------------------------------------------------------
-- Find sessions from specific application
SELECT * FROM pg_whoisactive() WHERE program_name LIKE '%myapp%';

-- Find sessions from specific user
SELECT * FROM pg_whoisactive() WHERE login_name = 'problem_user';

-- Find sessions connected from specific host
SELECT * FROM pg_whoisactive() WHERE host_name LIKE '%problemhost%';

-- Sessions in specific database
SELECT * FROM pg_whoisactive() WHERE database_name = 'production_db';


-- ADMINISTRATIVE TASKS
--------------------------------------------------------------------------------
-- Help and parameter reference
SELECT * FROM pg_whoisactive(help := true);

-- Include system processes for maintenance operations
SELECT * FROM pg_whoisactive(show_system_spids := true);

-- Monitor your own session activity
SELECT * FROM pg_whoisactive(show_own_spid := true) WHERE session_id = pg_backend_pid();

================================================================================
*/