Software Development Resources
Create account | Log in | Log in with OpenID | Help


From DocForge

Revision as of 13:30, 25 November 2011 by Matt (Talk | contribs)

SQL, Structured Query Language, is a language for storing and retrieving data in a relational database management system. It's an ANSI/ISO standard, yet most databases implement some additions to the language to compensate for its perceived shortcomings. It is the most famous and widely used language for relational database programming, although it departs significantly from the theoretical model of relational databases with features such as support for NULL values.



SQL statements are fundamentally similar to English sentences in their simplest form. They are supplemented with functions and mathematical statements. The easiest way to learn SQL is to master statements in their simplest form and then build progressively more complicated statements over time.

The terms "row" and "record" refer to one item in a table. Each record can have one or more "fields", also called "columns." Usually the terms records/fields or rows/columns are used together.


There are three statements for updating the data stored in database tables. One statement is used to retrieve the data. Generally, the reserved words used in the statements are not case sensitive, but that's determined by each database system's implementation and settings. Table and column name case sensitivity is also determined by the system.


The SELECT statement retrieves data from database tables and views.


    [ALL | DISTINCT] select_expression1, select_expression2, ...
    [FROM table_references
    [WHERE where_condition]
    [GROUP BY {column_name | expression | position}
      [ASC | DESC], ...]
    [HAVING where_condition]
    [ORDER BY {column_name | expression | position}
      [ASC | DESC], ...]
    [LIMIT {[offset,] row_count | row_count OFFSET offset}]

A simple SELECT statement would be:

SELECT id, username, favorite_color FROM users WHERE favorite_color = 'green';

This would return a result set similar to:

| id | username | favorite_color |
|  26|   jhonnym|           green|
|  57|   martinh|           green|
|  23| samanthag|           green|


The INSERT statement adds one or more rows to a table.


    [INTO] table_name [(column_name,...)]
    VALUES ({expression | DEFAULT},...),(...),...

    [INTO] table_name [(column_name,...)]
    SELECT ...


The UPDATE statement alters data in columns.

UPDATE [LOW_PRIORITY] [IGNORE] table_references
    SET column_name1=expression1 [, column_name2=expression2 ...]
    [WHERE where_condition]
    [ORDER BY ...]
    [LIMIT row_count]

The table_references syntax is identical to the FROM clause of a SELECT statement.


The DELETE statement removes one or more rows from a table.

    [WHERE where_condition]
    [ORDER BY ...]
    [LIMIT row_count]

    table_name[.*] [, table_name[.*]] ...
    FROM table_references
    [WHERE where_condition]

From Clause

The FROM clause used in SELECT and DELETE statements, as well as table references in UPDATE statements, is used to specify one or more tables on which to act. It's also the standard place to specify table joins.

FROM table1 [table1_alias], table2 [table2_alias], ...

FROM table1 [table1_alias] INNER JOIN table2 [table2_alias] ON field1 = field2

FROM table1 [table1_alias] 
     INNER JOIN (table2 [table2_alias] 
          INNER JOIN table3 [table3_alias]
          ON table2.field = table3.field)
     ON table1.field = table2.field

The table alias names can be used in other parts of the same query as shorthand for a table name. For example:

SELECT, ca.state
FROM contact_address ca

Table Joins

Joins define the correlations between tables for processing a query. Without specifying the joins between tables, a SELECT query FROM two tables would assume the results should be from the cartesian product of those data sets, i.e. every record in one table returned for every record in the other table. Typically in a relational database tables are joined in queries by associated identifyers. For example, a product id column in a table of products might relate to a product id column in a table of shopping carts.

INNER JOINs define an exact matching correlation between columns. Only records from two data sets where the ON statement is always true are affected. Inner joining table a to table b on column id would only affect rows for which the ids match in both tables.

OUTER JOINs define a correlation where records exist in one table, but might not exist in another. Outer joins specify a direction to identify the table for which all records are affected. LEFT OUTER JOIN table a to table b ON id affects every row in table a and only rows in table b for which id matches in table a. The inverse it true for RIGHT OUTER JOINs.


Standard SQL includes a small set of scalar and aggregate functions. Scalar functions operate on a single value while aggregate functions operate on a set of values, typically from multiple rows.

For example, using the scalar function char_length:

SELECT char_length(col) FROM table1

will return the length of each column value in the table.

However, using the aggregate function sum:

SELECT sum(col) FROM table1

will return the sum of all column values combined in the table.

To provide more useful features, most database implementations offer more functions than the standard. While very useful, care must be taken if portability of SQL is a concern.

Aggregate Functions

Most aggregate functions can have their expression prefixed by an optional ALL or DISTINCT. ALL, the default, evaluates all rows. DISTINCT will only evaluate unique values when computing the aggregate.

Returns the average of the expression values; NULL values are ignored
Returns the count of rows as defined by the expression
Returns the minimum value
Returns the maximum value
Returns the sum of the expression values; NULL values are ignored

Scalar Functions

Returns the number of characters in a string as an integer
CONCATENATE(string1 || string2)
Returns string1 and string2 concatenated together as one string
SUBSTRING(string FROM starting_position [FOR length] [COLLATE name])
Returns part of a string

Query Performance

Query performance is based on many factors, including database engine implementation, server configuration, proper indexing, and also how the query is written. One query, producing one particular result set, can often be written in multiple ways with varying performance. While each situation must be analyzed individually, there are some general best practices for performance when writing SQL:

  • Only use DISTINCT when needed. If the result set is guaranteed to have distinct results, adding DISTINCT will cause the database to check for duplicates when it's not actually needed.
  • Don't join to any tables unnecessarily. Avoid joining to any table that is not required for obtaining data in a result set or for adding limitations to the query. The database server will perform the join, causing a performance penalty with no benefits.
  • Use indexes for columns often used in table joins and WHERE clauses. The purpose of any table index is to speed data lookups and these will be used whenever possible by the database server.
  • Create unique indexes when appropriate. Informing the database that indexed columns are unique provides for more efficient indexing and searching algorithms.
  • Don't use an outer join when an inner join will suffice. Inner joins, with proper indexes, are far more efficient.