By Vinay Rahul Are, Python Enthusiast \& Improv Comedy Fan
Introduction
Learning Python can feel intimidating—unless you approach it with a sense of play! Just like improv comedy, Python is about saying “yes, and…” to new ideas, experimenting, and having fun. In this post, I’ll walk you through a real-world Python script, breaking down each part so you can understand, explain, and even perform it yourself!
The Script’s Purpose
The script we’ll explore automates the process of running multiple SQL files against an Amazon Redshift database. For each SQL file, it:
- Executes the file’s SQL commands on Redshift
- Logs how many rows were affected, how long it took, and any errors
- Moves the file to a “Done” folder when finished
It’s a practical tool for data engineers, but the structure and logic are great for any Python beginner to learn from.
1. The “Show Description” (Docstring)
At the top, you’ll find a docstring—a big comment block that tells you what the script does, what you need to run it, and how to use it.
"""
Batch Redshift SQL Script Executor with Per-Script Logging, Timing, and Post-Execution Archiving
Pre-requisites:
---------------
1. Python 3.x installed on your machine.
2. The following Python packages must be installed:
- psycopg2-binary
3. (Recommended) Use a virtual environment to avoid dependency conflicts.
4. Network access to your Amazon Redshift cluster.
Installation commands:
----------------------
python -m venv venv
venv\Scripts\activate # On Windows
pip install psycopg2-binary
Purpose:
--------
This script automates the execution of multiple .sql files against an Amazon Redshift cluster...
"""
2. Importing the “Cast and Crew” (Modules)
Every show needs its cast. In Python, that means importing modules:
import os
import glob
import psycopg2
import getpass
import shutil
import time
os,glob,shutil: Handle files and folderspsycopg2: Talks to the Redshift databasegetpass: Securely prompts for passwordstime: Measures how long things take
3. The “Stage Directions” (Configuration)
Before the curtain rises, set your stage:
HOST = '<redshift-endpoint>'
PORT = 5439
USER = '<your-username>'
DATABASE = '<your-database>'
SCRIPT_DIR = r'C:\redshift_scripts'
DONE_DIR = os.path.join(SCRIPT_DIR, 'Done')
- Replace the placeholders with your actual Redshift details and script folder path.
4. The “Comedy Routine” (Function Definition)
The main function, run_sql_script, is like a well-rehearsed bit:
def run_sql_script(script_path, conn):
log_path = os.path.splitext(script_path)[0] + '.log'
with open(script_path, 'r', encoding='utf-8') as sql_file, open(log_path, 'w', encoding='utf-8') as log_file:
sql = sql_file.read()
log_file.write(f"Running script: {script_path}\n")
start_time = time.perf_counter()
try:
with conn.cursor() as cur:
cur.execute(sql)
end_time = time.perf_counter()
elapsed_time = end_time - start_time
rows_affected = cur.rowcount if cur.rowcount != -1 else 'Unknown'
log_file.write(f"Rows affected: {rows_affected}\n")
log_file.write(f"Execution time: {elapsed_time:.2f} seconds\n")
conn.commit()
log_file.write("Execution successful.\n")
except Exception as e:
end_time = time.perf_counter()
elapsed_time = end_time - start_time
log_file.write(f"Error: {str(e)}\n")
log_file.write(f"Execution time (until error): {elapsed_time:.2f} seconds\n")
conn.rollback()
- Reads the SQL file
- Logs what’s happening
- Measures execution time
- Handles success or errors gracefully
5. The “Main Event” (main function)
This is the showrunner, making sure everything happens in order:
def main():
password = getpass.getpass("Enter your Redshift password: ")
if not os.path.exists(DONE_DIR):
os.makedirs(DONE_DIR)
sql_files = glob.glob(os.path.join(SCRIPT_DIR, '*.sql'))
conn = psycopg2.connect(
host=HOST,
port=PORT,
user=USER,
password=password,
dbname=DATABASE
)
for script_path in sql_files:
print(f"Running {script_path} ...")
run_sql_script(script_path, conn)
try:
shutil.move(script_path, DONE_DIR)
print(f"Moved {script_path} to {DONE_DIR}")
except Exception as move_err:
print(f"Failed to move {script_path}: {move_err}")
conn.close()
print("All scripts executed.")
- Prompts for your password (no peeking!)
- Makes sure the “Done” folder exists
- Finds all
.sqlfiles - Connects to Redshift
- Runs each script, logs results, and moves the file when done
6. The “Curtain Call” (Script Entry Point)
This line ensures the main event only happens if you run the script directly:
if __name__ == "__main__":
main()
7. Explaining the Script in Plain English
“This script automates running a bunch of SQL files against a Redshift database. For each file, it logs how many rows were affected, how long it took, and any errors. After running, it moves the file to a ‘Done’ folder so you know it’s finished. It’s organized with clear sections for setup, reusable functions, and the main execution flow.”
8. Why This Structure?
- Imports first: So all your helpers are ready before the show starts.
- Functions: Keep the code neat, reusable, and easy to understand.
- Main block: Keeps your script from running accidentally if imported elsewhere.
- Comments and docstrings: Make it easy for others (and future you) to understand what’s going on.
9. Final Thoughts: Python is Improv!
Just like improv, Python is best learned by doing. Try things out, make mistakes, and remember: if your code “crashes,” it’s just the computer’s way of saying, “Yes, and…let’s try that again!”
If you want to dig deeper into any part of this script, just ask in the comments below. Happy coding—and yes, and… keep learning!