SELECT
    namespace AS schemaname,
    item AS object,
    pu.groname AS groupname,
    DECODE(charindex('r', split_part(split_part(array_to_string(relacl, '|'), pu.groname, 2), '/', 1)), 0, 0, 1) AS "select",
    DECODE(charindex('w', split_part(split_part(array_to_string(relacl, '|'), pu.groname, 2), '/', 1)), 0, 0, 1) AS "update",
    DECODE(charindex('a', split_part(split_part(array_to_string(relacl, '|'), pu.groname, 2), '/', 1)), 0, 0, 1) AS "insert",
    DECODE(charindex('d', split_part(split_part(array_to_string(relacl, '|'), pu.groname, 2), '/', 1)), 0, 0, 1) AS "delete"
FROM
    (SELECT
        use.usename AS subject,
        nsp.nspname AS namespace,
        c.relname AS item,
        c.relkind AS type,
        use2.usename AS owner,
        c.relacl
    FROM pg_user use
    CROSS JOIN pg_class c
    LEFT JOIN pg_namespace nsp ON (c.relnamespace = nsp.oid)
    LEFT JOIN pg_user use2 ON (c.relowner = use2.usesysid)
    WHERE c.relowner = use.usesysid
        AND c.relname = 'fa_analystbyregion'
        AND nsp.nspname NOT IN ('pg_catalog', 'pg_toast', 'information_schema')
    ) acls
JOIN pg_group pu
    ON array_to_string(relacl, '|') LIKE '%' || pu.groname || '%';
Generalized Best Practices
Executive Summary
"Advice is seldom welcome, and those who want it most always like it the least." (Philip Dormer Stanhope, Earl of Chesterfield, 1694-1773)
Guiding philosophy of this document: the hero isn't the developer who battles problems through the weekend, but the one who avoided the problem in the first place.
Best practices (BP in this document) can be a wide avenue leading to a solid, reliable infrastructure, or, alternatively, the last refuge of a lazy scoundrel. The pivot on which this turns is justification. If a set of guidelines and rules leads to a cleaner business and technical process, then it is useful. Otherwise, the practices are not acceptable. BP must apply to all cases or show clearly where they are applicable.
In short, No Justification == Not Acceptable.
In this set of practices, the software development Capability Maturity Model will be used as a framework. Its levels, in order of maturity, are:
- INITIAL – “Individual Heroics”
- REPEATABLE – Basic Processes, e.g. Project Management, Quality Assurance
- DEFINED – Documentation, Standardization, Integration, Communication
- MANAGED – Monitoring, Measurement, Reporting
- OPTIMIZED – Continuous Improvement
General Guidelines
The goal of this document is to create a solution set that will allow any member of the development team who examines the tables and schema of Vacasa databases, whether production or development, to get a fair idea of the purpose and methodology of the database. This can be achieved in a number of ways:
- Comment all functions, procedures, triggers, CREATE scripts, macros, and code:
Good commenting does not have to be a novella, nor necessarily dry and purely technical, but all relevant components should have at least an informative, brief explanation of the type of data requested, what it will be used for, and its final appearance. This information is entered in the design view of the table or within the script/code.
- Please execute database work via JIRA tickets, and only via JIRA tickets:
The sole purpose of a ticketing system is to provide automatic documentation and an audit trail. Do not work from email or Slack; this is simply asking for trouble. Rather, encourage project sponsors and/or your colleagues to file a ticket for actual work requests. Never execute work from a verbal order; this only leads to confusion, and IMs aren't much better. If the issue at hand is your own project, file the ticket yourself so there is a work log. NO CHANGES to the database DDL should be made without a JIRA ticket in the database queue.
- Provide written documentation:
As long as the item above is being followed, this is no problem. Within the ticket(s), include the schema and how to use the database, relevant CREATE scripts, and, when necessary, file attachments and screenshots. This method also instantly allows feedback from other members of the technical group.
- Naming conventions:
With respect to data structure and schema creation, the Vacasa standard is all lowercase. Neither Leszynski and other Hungarian-style systems nor Microsoft/C/Java/Javascript systems will be used within Vacasa databases. The following conventions will apply (see the brief example after this list):
- Production databases will all be prefixed with prod_ (prod_domain, prod_stat, etc.)
- Database names will be singular.
- Table names will be singular. (e.g., account is valid, prod_domain.accounts is not)
- Column names are singular.
- Use the underscore as a word delimiter (e.g., flag_type is correct, flagtype is not)
- Never use any capital letters in a database, table, or column name. Within DBMS PERL scripts, follow the same conventions. For PHP and other development department coding, follow the development department’s best practices. The key here is to integrate into whatever environment you may find yourself.
- Remember that business users are neither technologists nor engineers. Do not expect them to be. Information technology exists to support what they do, not the reverse.
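By way of illustration, a minimal sketch of these conventions in DDL; the table and column names here are hypothetical, chosen only to show the pattern:
-- Correct: lowercase, singular, underscore-delimited (hypothetical names)
CREATE TABLE prod_domain.account_flag (
    id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    flag_type VARCHAR(32) NOT NULL
);
-- Incorrect: plural table name, capital letters, no word delimiter
-- CREATE TABLE prod_domain.AccountFlags (ID INT, FlagType VARCHAR(32));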
Detailed Standards
Architecture
Vacasa’s database servers can be divided into the following broad groups:
- Production
- Staging
- Development
- Utility
More servers may be added to the development or utility areas, but these are the primary classifications.
All revenue-bearing processes and websites will access databases on production servers only. Under no circumstances will production websites or processes access any databases other than those housed on production-grade servers. Periodic auditing by senior developers and DBAs will ensure this is so.
Security
Development servers
All database instances defined as “development” class are relatively open access for all members of the tech team. Full privileges will be granted to all necessary personnel, including Update, Create, Drop, Alter, Create Routine, Alter Routine, and Lock Tables permissions. When requested, a DBA will notify all users of the server, refresh the database from production, and then reinitialize it. Note that prior notice is required for data refresh operations (see above under General Guidelines).
Utility servers
TBD.
Staging servers
TBD.
Production servers
Production servers are defined as restricted-access boxes. There are two areas of security to consider: direct access to shell accounts on these servers, and access/permissions to the database instance(s) running on the production servers.
Users and passwords
- The shell-level access on production servers will be limited to system administrators and DBAs. Developers shall not have access to these servers. Likewise, only DBAs and automated processes will have any access at all to production databases. Each human user will have their own account, and each automated process will have its own account, which will be granted appropriate access only to the databases/tables to which it requires access (see below).
- The “root” MySQL user on all production databases will be removed immediately after the clean installation of the DBMS is completed. The root user in MySQL both has superuser powers and is a known account name. No users will be created on production databases without ticket-based approval from the senior DBA. Unapproved accounts will be immediately deleted. No passwords may ever be blank, and production passwords for software and automated operations will be appropriately encrypted. Whenever feasible, do not use the “all hosts” hostname (%) when creating users; rather, grant access only for hosts from which the user will be accessing the database instance.
- No production database server will be exposed directly to the internet for any reason.
- Automated tasks (shell/PERL/PHP scripts, stored procedures, etc.) will have individual database users created for their needs. Access will be granted to these automated users based on precisely what they need to accomplish, with table-level, not global, access. The nomenclature for these users will take the form of xxxxxxx_user, where xxxxxxx indicates the function of the process in question, e.g. sproc_user, tableau_user, appname_user, etc. Passwords for these users will remain confidential in the same manner as other production passwords. (A sketch of this pattern follows this list.)
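As a minimal sketch of the user and grant pattern above, assuming a hypothetical reporting process that only needs read access to two tables (the host range and table names are invented for the example, and the password is masked as elsewhere in this document):
-- Host-restricted automated user with table-level grants only
CREATE USER 'tableau_user'@'10.0.12.%' IDENTIFIED BY 'xxxxxxxxxx';
GRANT SELECT ON prod_stat.reservation TO 'tableau_user'@'10.0.12.%';
GRANT SELECT ON prod_stat.unit TO 'tableau_user'@'10.0.12.%';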
Queries
- The first line of offense when tuning queries is EXPLAIN SELECT. Use it. Always.
- Always try to isolate indexed fields on one side of the condition in a query (see the example after this list).
- Avoid using CURRENT_DATE() or NOW(), as they invalidate the query cache. Instead, grab the date/time value once, assign it to a variable, and then use that variable repeatedly within the query/stored procedure.
- Avoid correlated subqueries. Think in sets, not loops.
- Avoid more than one self-join; if feasible, use temporary/memory tables to shrink the dataset with which the larger queries are working.
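To illustrate the first two points, a sketch against a hypothetical booking table with an index on create_date. Wrapping the indexed column in a function hides it from the optimizer; an open-ended range keeps the index usable:
-- Defeats the index: the indexed column is buried inside a function call
EXPLAIN SELECT id FROM booking WHERE DATE(create_date) = '2022-06-01';
-- Uses the index: create_date is isolated on one side of each condition
EXPLAIN SELECT id FROM booking WHERE create_date >= '2022-06-01'
                                 AND create_date <  '2022-06-02';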
Programming standards
The following guidelines are presented in a bullet-pointed format as a set of practices which will do much towards optimizing our underlying database systems.
- Do not use the database for BLOB or TEXT storage unless a strong case can be made for the utility of such an endeavor. These columns are slow, involve external storage files, and are inefficient; indexing them is very expensive.
- Likewise, do not store images within the database. It’s a database, not a filesystem.
- Stick to ANSI SQL whenever possible. Using MySQL's extra toolsets is certainly encouraged where warranted, but code becomes non-portable when extensions are utilized. Be aware that rewriting will be required if the ANSI standard is abandoned.
- Do not mix display code and database code.
- Do not store display code or HTML in the database, save for templates.
- Use connection pooling whenever possible. This saves time and RAM.
- Design applications from the ground up to have separate read and write connection strings/filehandles. This makes load balancing possible when load becomes an issue.
- All shell/PERL/PHP scripts residing on production servers must be checked into source code control.
- All utility code written by DBA personnel on utility servers must be checked into source code control.
Table schemas
The following guidelines are presented in a bullet-pointed format as a set of practices which will do much towards optimizing our underlying database systems.
- Every approved table will have exactly one primary key. This column will be named “id” and defined as BIGINT UNSIGNED AUTO_INCREMENT.
- All tables will have one column named “mod_date” which is a DATETIME column and has the qualifier ON UPDATE CURRENT_TIMESTAMP enabled.
- Whenever appropriate, include one column named “create_date” of type DATETIME which inserts the creation date of the row via its DEFAULT VALUE.
- Normalize wherever possible, but remember that every step toward full fifth-normal-form (5NF) normalization requires a separate JOIN to retrieve data. It is suggested that normalization only occur to the extent of removing redundant columns.
- Denormalization should only occur when the benefits are immediate, provable, and readily apparent, and do not conversely lead to multiple copies of the same data in multiple tables.
- InnoDB is the Vacasa standard table storage engine.
- Order your columns in such a way that the first column is always the “id” column, the penultimate column is “create_date,” and the last column is always “mod_date.” (A skeleton illustrating these rules follows this list.)
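Putting the rules above together, a minimal sketch of the standard table skeleton, assuming MySQL 5.6+ so that DATETIME columns can take CURRENT_TIMESTAMP defaults; the table name and middle columns are hypothetical:
CREATE TABLE unit_flag (
    id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,      -- required primary key, always first
    unit_id     BIGINT UNSIGNED NOT NULL,                     -- hypothetical payload columns
    flag_type   VARCHAR(32)     NOT NULL,
    create_date DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- penultimate column
    mod_date    DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
                ON UPDATE CURRENT_TIMESTAMP,                  -- always last
    PRIMARY KEY (id)
) ENGINE=InnoDB;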
Replication
TBD. Still getting my head around this vis-à-vis the wonky AWS setup.
Server Configuration, Tuning, and Maintenance
- When we move to standalone database instances, a standard skeleton my.cnf will be available via Confluence. Until such time, a set of standard SET GLOBAL commands will be available to be executed on RDS instance startup.
- Do not make changes in production without peer review.
- NEVER benchmark without a goal. Have a stated objective such as “improve overall performance by 20%”. Otherwise, you'll waste a lot of time tuning milliseconds out and miss other areas (perhaps even within the codebase) that are the actual bottleneck.
- Change just one thing at a time and re-run the benchmarks.
- Disable the query cache by setting the cache size to 0 when running MySQL benchmarks.
- Log slow queries and use mysqldumpslow to parse the log. [NB: A shell script has already been developed to extract logs from table-based logs in a standard MySQL logging format for further analysis.] The option (--log-queries-not-using-indexes) of logging any query that does not use an index on a table will also be judiciously utilized (see the sketch after this list). However, always bear in mind that production logging is a two-edged sword; every millisecond spent writing/flushing logs takes away from overall instance performance.
- Use the mytop utility for monitoring the threads and overall performance of MySQL servers.
- Repeated queries on an unindexed field will kill your application faster than anything else. Ensure that all queries are on indexed fields. I say again: EXPLAIN EXTENDED.
- Don’t de-normalize just because you think it will be easier to initially code. Start with normalized database schemes. Remember that someone will most likely have to maintain your code later, and it may not be you.
- Server parameter tweaking is not a catch-all. Tuning server parameters can help but it’s very specific to certain situations.
- On multi-column indexes, pay attention to order of fields within the index definition. Match the composite indexes to the queries, and assist developers in always querying in the same order.
- Use the smallest data types possible. Don't use bigint when int will do, and don't use char(200) when a varchar or smaller char() would do.
- Consider vertically splitting many-columned tables if they contain a lot of NULLs or rarely used columns. As a very generalized rule of thumb, if a table has more than 10 columns, it may need to be split into multiple tables.
- InnoDB can't optimize SELECT COUNT(*). Use counter tables or gather this information from information_schema.
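A sketch of the slow-query logging settings referred to above, expressed as runtime SET GLOBAL commands of the kind an RDS startup script might issue; the one-second threshold is an assumption for illustration, not a stated Vacasa standard:
SET GLOBAL slow_query_log = 1;                  -- enable the slow query log
SET GLOBAL long_query_time = 1;                 -- log anything slower than 1 second (assumed threshold)
SET GLOBAL log_queries_not_using_indexes = 1;   -- also log queries that use no index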
Find Locking/Blocking Redshift Queries
https://aws.amazon.com/premiumsupport/knowledge-center/prevent-locks-blocking-queries-redshift/
#!/bin/bash
#################################################################################
# findlockblocks.sh
#
# Dead-stupid script that leverages existing RS queries and does a mashup that reports
# the current running queries that are blocking others, sorted by time running.
#
# Nice, simple way to see if there's actually a problem or if RS is just swamped.
#
#
# TODO: Utilize a sourced bash file with usernames and passwords, as with other scripts
# in the library.
#
#################################################################################
RSUSER="vacasaroot"
RSPASS="xxxxxxxxxx"
read -r -d '' stats_sql << 'EOF'
SELECT
a.txn_owner,
a.xid,
a.pid,
a.txn_start,
a.lock_mode,
a.relation AS table_id,
nvl (TRIM(c."name"), d.relname) AS tablename,
a.granted,
b.pid AS blocking_pid,
DATEDIFF(s, a.txn_start, getdate ()) / 86400 || ' days ' || DATEDIFF(s, a.txn_start, getdate ()) % 86400 / 3600 || ' hrs ' || DATEDIFF(s, a.txn_start, getdate ()) % 3600 / 60 || ' mins ' || DATEDIFF(s, a.txn_start, getdate ()) % 60 || ' secs' AS txn_duration
FROM
svv_transactions a
LEFT JOIN
(SELECT
pid,
relation,
granted
FROM
pg_locks
GROUP BY 1,
2,
3) b
ON a.relation = b.relation
AND a.granted = 'f'
AND b.granted = 't'
LEFT JOIN
(SELECT
*
FROM
stv_tbl_perm
WHERE slice = 0) c
ON a.relation = c.id
LEFT JOIN pg_class d
ON a.relation = d.oid
WHERE a.relation IS NOT NULL;
EOF
echo "$stats_sql" > /tmp/stats.sql
BLOCKPIDS=`cat /tmp/stats.sql | PGPASSWORD=${RSPASS} psql -h warehouse.vacasa.services -p 5439 -U${RSUSER} -dwarehouse | cut -d'|' -f9 | grep "\S" | egrep -v "blocking|rows|\-\-\-"| sed 's/ //g' | sort -u`
# Construct the IN clause; bail out early if nothing is currently blocking.
if [ -z "$BLOCKPIDS" ]; then
    echo "No blocking queries found."
    exit 0
fi
for PID in $BLOCKPIDS
do
    PIDCLAUSE="$PIDCLAUSE,$PID"
done
PIDCLAUSE=" AND pid in (${PIDCLAUSE:1}) "
stats_sql="select pid, duration/1000000 as seconds, trim(user_name) as user,substring(query,1,200) as querytxt from stv_recents where status = 'Running' and seconds >=1 ${PIDCLAUSE} order by seconds desc;"
echo "$stats_sql" > /tmp/stats.sql
echo "
Current active Redshift queries that are blocking the execution of other queries (may or may not be critical)
=============================================================================================================
"
cat /tmp/stats.sql | PGPASSWORD=${RSPASS} psql -h warehouse.vacasa.services -p 5439 -U${RSUSER} -dwarehouse
Sum Total Number Of Rows Affected By Queries
Stupid command line tricks: I often run SQL jobs that can take many days to fully execute, and I always pipe the error and stdout to a file. As such, you end up with lots of lines like this in the output:
Query OK, 81218 rows affected (52.76 sec)
Here's a silly one-liner that will quickly total up ALLLLLL the rows that have been affected. (This assumes the output is being directed to a file named do_archive.out.)
grep affected do_archive.out | cut -d' ' -f3 | perl -nle '$sum += $_ } END { print $sum'
Want to format your number with commas? Pipe it through this:
sed ':a;s/\B[0-9]\{3\}\>/,&/;ta'
Installing Homebrew On A Mac
Advanced Mac users may appreciate using the Homebrew package manager, which greatly simplifies the process of installing command line software and tools on a Mac.
For example, if you want to easily install favorite command line tools on a Mac like cask, htop, wget, nmap, tree, irssi, links, colordiff, or virtually any other familiar unix command line utility, you can do so with a simple command. Homebrew downloads and builds the package for you.
This is obviously aimed at more technically savvy Mac users who spend a lot of time at the command line. While there’s no particular issue for novice users installing Homebrew on their Mac, the odds of novices finding it useful are slim, unless they intend to embark on learning the command line environment. Contrast that to power users who practically live in a terminal environment, whether longtime Mac users or migrating to the platform from the Windows or Linux world, who will immediately see the value of Homebrew.
Requirements for Installing Homebrew on Mac OS
Prerequisites to installing Homebrew on a Mac include the following:
- A Mac running Mac OS X 10.10 or later, though earlier versions are sort of supported
- Command Line Tools must be installed on the Mac (either independently or through Xcode)
- Knowledge of the command line and using bash or zsh
Assuming you're interested in installing Homebrew and meet those requirements, the rest is equally straightforward.
How to Install Homebrew on Mac OS
The simplest way to install Homebrew is through Ruby and curl, accomplished with a single command. This approach is the same for installing Homebrew in all supported versions of Mac OS and Mac OS X.
- Open the “Terminal” application, found in /Applications/Utilities/
- Enter the following command into a single line of the terminal:
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
- Hit Return and you'll see a series of lines about what the script will install and where; hit Return again to agree, or hit Escape to cancel
- Enter the administrator password (required by sudo execution) to begin installation
Installation of Homebrew will take a while depending on the speed of your Mac and internet connection, as each necessary package is downloaded and installed by the script.
When complete, you will see an “Installation successful!” message.
Now you’re ready to install software packages through Homebrew, or you can read the help documentation with the following command:
brew help
Installing Software Packages through Homebrew on Mac
Installing packages with Homebrew is super easy; just use the following syntax:
brew install [package name]
For example, to install wget through Homebrew you could use the following syntax:
brew install wget
Simple, easy. Once complete, you can run wget as usual.