Tutorials  /  Using SQL  /  Basic SQL

Common SQL Operations

When doing maps it’s a good idea to introduce you to a bit of the query language we use to render our maps. SQL is a language widely used to query relational databases and actually a powerful tool to analyze your data.

CARTO allows you to interact with your datasets using the interface so you can filter, order and modify your data values directly from Builder. Sometimes it will be useful to use the SQL view to perform more advanced tasks like formatting your data joining different tables.

This section will cover the basics of SQL and is meant to be exercised using either the procedure outlined in the frontpage of this module or the CARTO DATA tab SQL view, where you can run queries and see the results immediately. If you don’t have it yet, import from CARTO Data Library the ne_10m_populated_places_simple and world_bordersdatasets so you can follow all these queries.

Selecting all columns

The most basic query to a table is requesting all rows and columns.

SELECT *
  FROM ne_10m_populated_places_simple;

Selecting some columns

Sometimes we don’t need all the columns of a table so we can select just some of them by putting their names. This is specially useful when you have big tables. Note also that we can change the name of the columns using aliases.

  SELECT cartodb_id,
         name AS city,
         adm1name AS region,
         adm0name AS country,
         pop_max,
         pop_min
    FROM ne_10m_populated_places_simple

Selecting distinct values

If for any reason you want to know the domain of values that of a field use the DISTINCT keyword.

SELECT DISTINCT featurecla
  FROM ne_10m_populated_places_simple

Filtering numeric fields

Filtering is a common operation when working with CARTO. With the following examples you’ll see how to subset your table according to different criteria.

You can use the >, <, =, != operators to restrict a numeric or a date field.

SELECT name, pop_max
  FROM ne_10m_populated_places_simple
 WHERE pop_max > 5000000

Filtering character fields

Even you can use = with text fields, is more convenient to use LIKE or even better ILIKE. The former will do a case-insesitive search.

SELECT name, adm0name
  FROM ne_10m_populated_places_simple
 WHERE adm0name ilike 'spain'

Filtering a range

If you want to filter by several values you can use the IN keyword and pass a list of values between parenthesis and comma separated.

  SELECT *
    FROM ne_10m_populated_places_simple
   WHERE name IN ('Madrid', 'Barcelona')
     AND adm0name ilike 'spain'

Combining character and numeric filters

Filters can be combined using the AND, OR and NOT keywords. If you have doubts about the operator precedence is always good idea to use parenthesis to make explicit your conditions.

SELECT *
  FROM ne_10m_populated_places_simple
 WHERE name IN ('Madrid', 'Barcelona')
   AND adm0name ilike 'spain'
   AND pop_max > 5000000

Ordering results

Even you can order the results on Builder, sometimes it’s useful to order explicitly the results of your query by some field. By default ORDER works in ascending order (ASC) so you don’t need to specify it.

  SELECT cartodb_id,
         name AS city,
         adm1name AS region,
         adm0name AS country,
         pop_max
    FROM ne_10m_populated_places_simple
   WHERE adm0name ilike 'spain'
ORDER BY pop_max DESC

Limiting results

If your data is ordered, then you can limit the number of results to retrieve for example the top ten municipalities of Spain by population where Spanish PP party won.

  SELECT cartodb_id,
         name AS city,
         adm1name AS region,
         adm0name AS country,
         pop_max
    FROM ne_10m_populated_places_simple
   WHERE adm0name ilike 'spain'
ORDER BY pop_max DESC
   LIMIT 10

Making calculations

You can make calculations and run functions on your query SELECT part and also on the WHERE section. This way you can compute densities, normalize columns, format dates and numbers, etc. Next example shows how to get the 20 most densified countries.

WITH data AS (
  SELECT *,
         pop2005/ST_Area(the_geom::geography) * 1e6 AS pop_km2
    FROM world_borders
)
  SELECT *
    FROM data
ORDER BY pop_km2 DESC

More about mathematical functions here.

Note also the use of the WITH clause that creates a virtual named table that is afterwards referenced on the final SELECT. This is known as a Common Table Expression and you can find more about it at the official documentation.

Joining and grouping datasets

It’s very common to have different datasets that we need to join to produce a map. Typically we have a geographic reference dataset and we need to join it with some business data. To do so we use the JOIN clause where we refer to another table and make explicit the condition to join fields from one table to the other, normally using a common field or like in this case a spatial relation (don’t worry about that ST_Intersects function as for now).

Our example relates cities and countries, so we have a one to many relationship because we are counting cities inside every country and summing their population. To make the aggregation we group by country name. Finally on our example we order and limit the results to get the 10 countries with the highest sum of population on their main cities.

SELECT w.*,
       count(pp.*) AS places,
       sum(pp.pop_max)  AS cities_pop
  FROM world_borders AS w
  JOIN ne_10m_populated_places_simple AS pp
    ON ST_Intersects(w.the_geom,pp.the_geom)
 GROUP BY w.cartodb_id
 ORDER BY cities_pop DESC
 LIMIT 10

Note the use of alias for tables and how they are used at the SELECT section.

Note also the GROUP BY. Normally we should put all the world_borders fields but as the cartodb_id field is a Primary Key we don’t need to do so.

Other aggregation functions

There are other functions you can apply to your columns. For example you can compute aggregated functions to get the maximum, minimum and average values for a column.

SELECT count(*) AS counts,
       max(pop_max) AS max_pob,
       min(pop_max) AS min_pob,
       avg(pop_max) AS avg_pob
  FROM ne_10m_populated_places_simple;

ROUND and TRUNC will convert float numbers into integers, the first rounding to the nearest one. ROUND can also accept a second parameter to round to a specific decimal position. TO_CHAR is a more complex function that can be used to format numbers and dates into strings with decimal and thousand separators, any arbitrary date format, etc.

SELECT round(1.9) AS rounded,
       round(1.193,1) AS rounded2,
       trunc(1.9) AS truncated,
       to_char(12345.9332,'999,999.99') AS formatted,
       to_char(now(),'Day DD/MM/YY HH:mm') AS today;10:06:32' */

More about the TO_CHAR function here.