Database developers have a tough job. Whether they are using SQL Server, Oracle, DB2, MySQL, PostgreSQL or SQLite, the challenges are similar. It’s easy to write queries that perform poorly, waste system resources, or don’t take advantage of database features designed to make life easier.
Here are seven common pitfalls to avoid when writing database applications.
7 SQL mistakes to avoid
Blindly reusing queries
Nesting views
Running large, multi-table operations in a single transaction
Clustering on GUIDs or other “volatile” columns
Counting rows to check if data is present
Using triggers
Doing negative searches
Blindly reusing queries
A SQL query is usually tailored to retrieve the data needed for one specific job. If you repurpose a query that fits most of your use case, it may appear to work, but it can also return far more data than you need.
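As a rough illustration, the sketch below uses Python's built-in sqlite3 module with a hypothetical customers table (the table, its columns, and both query names are assumptions made for this example). It contrasts a query borrowed from another job with one trimmed to what the current job needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT, notes TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, email, notes) VALUES (?, ?, ?)",
    [("Ada", "ada@example.com", "x" * 1000), ("Bob", "bob@example.com", "y" * 1000)],
)

# Reused query: written for a customer-detail screen, so it pulls every column,
# including the large notes field.
REUSED_QUERY = "SELECT * FROM customers ORDER BY id"

# Tailored query: a mailing job only needs the email addresses.
TAILORED_QUERY = "SELECT email FROM customers ORDER BY id"

reused_rows = conn.execute(REUSED_QUERY).fetchall()
tailored_rows = conn.execute(TAILORED_QUERY).fetchall()
```

Both queries "work" for the mailing job, but the reused one drags four columns per row across the wire when only one is wanted.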
Nesting views
Views provide a standard way to look at data and spare users from having to deal with complex queries. The problem occurs when we use views to query other views.
These nested views have drawbacks. Most importantly, they query far more data than you typically need.
If you use a view, don’t query other views with it. All nested views should be “flattened” and rewritten to take only what is needed.
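To make "flattening" concrete, here is a minimal sketch using Python's sqlite3 module. The orders table and the two view names are hypothetical; the point is only that the flattened query reads the base table directly and asks for just the columns it needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, status TEXT, total REAL);
INSERT INTO orders (customer, status, total) VALUES
    ('Ada', 'open', 10.0), ('Bob', 'closed', 25.0), ('Ada', 'open', 5.0);

-- A general-purpose view, and a second view stacked on top of it.
CREATE VIEW order_details AS SELECT id, customer, status, total FROM orders;
CREATE VIEW open_orders AS SELECT * FROM order_details WHERE status = 'open';
""")

# Nested: goes through open_orders, which goes through order_details.
NESTED = "SELECT customer, total FROM open_orders ORDER BY id"

# Flattened: the same result, written directly against the base table.
FLATTENED = "SELECT customer, total FROM orders WHERE status = 'open' ORDER BY id"

nested_rows = conn.execute(NESTED).fetchall()
flat_rows = conn.execute(FLATTENED).fetchall()
```

With two tiny views the difference is cosmetic; the cost tends to show up when each layer joins extra tables that the final query never uses.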
Running large, multi-table operations in a single transaction
Let’s say you need to delete data from 10 tables as part of an operation. You may be tempted to run all the deletes across all the tables in a single transaction, but don’t do this. Instead, handle each table’s operations separately.
If you need the deletes across tables to occur atomically, you can split the work into many smaller transactions. For example, if you have 10,000 rows across 20 tables that need to be deleted, you can delete the first thousand rows from all 20 tables in one transaction, then the next thousand in another transaction, and so on. (This is a good use case for a task queue mechanism in your business logic, where operations like this can be managed, paused, and resumed if necessary.)
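The batching idea can be sketched roughly as follows with Python's sqlite3 module. The function name, the table names, and the assumption that every table has an integer id column are all invented for this example; a real schema would need its own batching key.

```python
import sqlite3

def delete_in_batches(conn, tables, cutoff_id, batch_size=1000):
    """Delete rows with id <= cutoff_id from every table, one small transaction per batch.

    Returns the number of batches that actually deleted something.
    """
    batches = 0
    while True:
        deleted = 0
        with conn:  # commits the batch on success, rolls it back on error
            for table in tables:  # table names must be trusted constants, not user input
                cur = conn.execute(
                    f"DELETE FROM {table} WHERE id IN "
                    f"(SELECT id FROM {table} WHERE id <= ? ORDER BY id LIMIT ?)",
                    (cutoff_id, batch_size),
                )
                deleted += cur.rowcount
        if deleted == 0:
            return batches
        batches += 1

# Hypothetical setup: two tables with 2,500 rows each.
conn = sqlite3.connect(":memory:")
TABLES = ["events", "event_tags"]
for t in TABLES:
    conn.execute(f"CREATE TABLE {t} (id INTEGER PRIMARY KEY)")
    conn.executemany(f"INSERT INTO {t} (id) VALUES (?)", [(i,) for i in range(1, 2501)])
conn.commit()

batches_run = delete_in_batches(conn, TABLES, cutoff_id=2000, batch_size=1000)
remaining = {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in TABLES}
```

Each batch is atomic across all the tables, but locks are held only for the length of one batch. Because the loop is restartable, it could also be driven from a task queue and paused between batches.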
Clustering on GUIDs or other “volatile” columns
GUIDs, or globally unique identifiers, are 16-byte random numbers used to give objects a distinct identifier. Many databases support them as a native column type. However, they should not be used to cluster the rows they live in. Because they are random, they cause the table’s clustering to become highly fragmented, and table operations can quickly become much slower. In short, don’t cluster on columns with a lot of randomness. Date columns or sequential ID columns work best.
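One common way to follow this advice is to keep the GUID as an indexed lookup column while clustering on a sequential key. The sketch below assumes SQLite, where an INTEGER PRIMARY KEY column is the rowid that determines how rows are stored; the documents table and the add_document helper are hypothetical. Other databases express clustering differently, so treat this as an illustration rather than a recipe.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,          -- sequential key; SQLite stores rows in this order
        public_id TEXT NOT NULL UNIQUE,  -- random GUID, indexed but not the clustering key
        title TEXT NOT NULL
    )
""")

def add_document(title):
    """Insert a row with a freshly generated GUID and return that GUID."""
    public_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO documents (public_id, title) VALUES (?, ?)", (public_id, title)
    )
    return public_id

guids = [add_document(f"doc {n}") for n in range(5)]

# Rows are laid out by the sequential id, regardless of how random the GUIDs are.
ids = [row[0] for row in conn.execute("SELECT id FROM documents ORDER BY id")]

# Lookups by GUID still work through the secondary unique index.
found = conn.execute(
    "SELECT title FROM documents WHERE public_id = ?", (guids[3],)
).fetchone()
```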
Counting rows to check if data is present
Using an operation such as SELECT COUNT(ID) FROM table1 to determine whether some data exists in a table is generally inefficient. Some databases can intelligently optimize SELECT COUNT() operations, but not all have this capability. A better approach, if your SQL dialect offers it, is an existence check: for example, IF EXISTS (SELECT 1 FROM table1) BEGIN…END in T-SQL, or SELECT 1 FROM table1 LIMIT 1 in dialects that support LIMIT.
If it’s the row count you want, another approach is to obtain row-count statistics from the system tables. Some database vendors also have specific commands; for example, in MySQL you can use SHOW TABLE STATUS to get statistics on all tables, including row counts (which can be approximate, depending on the storage engine). Microsoft T-SQL has the sp_spaceused stored procedure.
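For example, the existence check could be wrapped like this using Python's sqlite3 module. The has_rows helper is hypothetical, and the SELECT EXISTS (...) form shown is what SQLite accepts; other dialects spell the same idea differently.

```python
import sqlite3

def has_rows(conn, table):
    """Return True if the table holds at least one row, without counting them all."""
    # The table name is interpolated into the SQL, so it must be a trusted identifier.
    row = conn.execute(f"SELECT EXISTS (SELECT 1 FROM {table})").fetchone()
    return bool(row[0])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY)")

empty_before = has_rows(conn, "table1")
conn.execute("INSERT INTO table1 DEFAULT VALUES")
present_after = has_rows(conn, "table1")
```

The database can stop at the first matching row instead of visiting every row to produce a total that the caller would only compare against zero.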
Using triggers
As useful as triggers are, they come with a big limitation: They must run in the same transaction as the operation that fired them. If you create a trigger to modify one table when another is modified, both tables will be locked, at least until the trigger finishes. If you must use a trigger, make sure it doesn’t lock more resources than can be tolerated. Stored procedures might be a better solution because they can split trigger-like operations across multiple transactions.
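The same-transaction behavior is easy to observe. In this sketch (Python's sqlite3 module, with hypothetical accounts and audit_log tables), rolling back the original update also discards the row the trigger wrote, which shows that the two changes share one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
CREATE TABLE audit_log (account_id INTEGER, old_balance INTEGER, new_balance INTEGER);

-- Every balance change also writes a row to a second table.
CREATE TRIGGER log_balance AFTER UPDATE OF balance ON accounts
BEGIN
    INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
END;

INSERT INTO accounts (id, balance) VALUES (1, 100);
""")

def count(table):
    """Number of rows currently visible in the given (trusted) table name."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

# The UPDATE opens a transaction; the trigger's INSERT runs inside it.
conn.execute("UPDATE accounts SET balance = 50 WHERE id = 1")
logged_inside_txn = count("audit_log")

# Rolling back undoes both the update and the trigger's audit row.
conn.rollback()
logged_after_rollback = count("audit_log")
balance_after_rollback = conn.execute(
    "SELECT balance FROM accounts WHERE id = 1"
).fetchone()[0]
```

Because the trigger's work cannot be committed separately, anything it touches stays locked for as long as the outer transaction stays open.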
Doing negative searches
Queries such as SELECT * FROM Users WHERE Users.Status <> 2 are problematic. An index on the Users.Status column is useful, but negative searches like this often fall back to a table scan. The better solution is to write queries so that they make efficient use of covering indexes; for example, SELECT * FROM Users WHERE Users.ID NOT IN (SELECT Users.ID FROM Users WHERE Users.Status = 2). This can let the database use the indexes on both the ID and Status columns to pare out what we don’t want, without doing a table scan.
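Whether the rewrite actually avoids a scan depends on the database engine and its optimizer, so it is worth checking the query plan rather than assuming. The sketch below, using Python's sqlite3 module with a small hypothetical Users table and index, confirms that the two forms return the same rows and captures each plan for inspection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (ID INTEGER PRIMARY KEY, Name TEXT, Status INTEGER NOT NULL);
CREATE INDEX idx_users_status ON Users (Status);
INSERT INTO Users (Name, Status) VALUES ('Ada', 1), ('Bob', 2), ('Cy', 3), ('Di', 2);
""")

# The negative search from the text.
NEGATIVE = "SELECT * FROM Users WHERE Users.Status <> 2 ORDER BY ID"

# The rewritten form: a positive, indexable search inside the subquery.
REWRITTEN = """
    SELECT * FROM Users
    WHERE Users.ID NOT IN (SELECT Users.ID FROM Users WHERE Users.Status = 2)
    ORDER BY ID
"""

negative_rows = conn.execute(NEGATIVE).fetchall()
rewritten_rows = conn.execute(REWRITTEN).fetchall()

# Plans are engine- and version-specific; print them to see what your database does.
negative_plan = conn.execute("EXPLAIN QUERY PLAN " + NEGATIVE).fetchall()
rewritten_plan = conn.execute("EXPLAIN QUERY PLAN " + REWRITTEN).fetchall()
```

Note that NOT IN behaves unexpectedly if the subquery can return NULL; here ID is a primary key, so that case does not arise.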