Data Synchronization with OGR

CartoDB has an excellent and easy-to-use synced tables feature, that reads data from a public URL on a regular schedule and keeps an associated CartoDB table up-to-date. However, synced tables require the data to be publicly available so CartoDB can read it, and that the destination table get over-written at each update.

For some users, these limitations are not acceptable–their data is not public or they want to continually append new information to their table.

Fortunately, it’s easy to create your own live data tables in CartoDB, updated on your own schedule, from your private data, using the ogr2ogr tool from the open source GDAL library. The support for CartoDB in GDAL has gotten a lot better recently, so use the latest version from SVN, or the upcoming GDAL 2.0 release.

Initial Load

You can load a fresh table directly into CartoDB using ogr2ogr like this (I’m loading my trusty 1:250K map grid for British Columbia):

ogr2ogr \
  --config CARTODB_API_KEY abcdefghijabcdefghijabcdefghijabcdefghij \
  -t_srs EPSG:4326 \
  -f CartoDB \
  "CartoDB:pramsey" \
  g250.shp

The commandline syntax of ogr2ogr is not the most obvious thing in the world.

  • The API key is required to allow write access into the database.
  • The -t_srs EPSG:4326 option tells ogr2ogr reproject the data to geographic coordinates, which is required for data to be published via the CartoDB Editor.
  • The -f CartoDB option tells ogr2ogr to use the CartoDB driver for writing.
  • The CartoDB:pramsey entry is the destination data, in this case, the pramsey account at CartoDB.
  • The g250.shp entry is the source data.

Once the command has run though, the data are available in the CartoDB dashboard, ready for mapping. (It may take a minute for the dashboard to recognize the new table, be patient, and hit reload.)

Refreshing the Data

Once you’ve loaded and styled your data, and published your map, you want to keep the map up-to-date with the latest data. (Remember, if your data are public, then the easiest way to do this is with a synced table).

If you’ve read the ogr2ogr manual, you may want to use the -overwrite option: don’t.

The overwrite option will drop the table, and then create a replacement. For some applications this would be fine, but for CartoDB it means the associated visualizations will also get dropped. For that reason we want to leave the table in place, but simply remove the data inside it before loading new data.

  • First, remove all the existing data. We do that by sending the TRUNCATE command directly to the SQL API to quickly delete all the records in the table.

      key=abcdefghijabcdefghijabcdefghijabcdefghij
      sql=`echo "TRUNCATE TABLE g250" | tr ' ' +`
      curl "http://pramsey.cartodb.com:80/api/v2/sql?api_key=${key}&q=${sql}"
    
  • Then, load new data into the table. Now that the table is empty, we can run ogr2ogr in -append mode to put the new records in place.

      ogr2ogr \
          --config CARTODB_API_KEY abcdefghijabcdefghijabcdefghijabcdefghij \
          -append \
          -t_srs EPSG:4326 \
          -f CartoDB \
          "CartoDB:pramsey" g250.shp
    

Now your table (and all the maps that depend on it) is full of the latest data, and you didn’t have to place the raw data on a public URL to make it happen.

Appending new Data

Sometimes applications with real-time data will not need to simply replace existing data, but rather add new data as it comes in. You can do this by just running ogr2ogr in -append mode without first running the TRUNCATE command.

The table will grow and grow and grow as you add data, and you may want to delete older data. If your raw data has a time stamp, you can use that as a key to delete old data. If it doesn’t, you can alter your table after import, to include a timestamp field that holds the date the data was loaded.

  • First, load your initial raw data.

      ogr2ogr \
          --config CARTODB_API_KEY abcdefghijabcdefghijabcdefghijabcdefghij \
          -t_srs EPSG:4326 \
          -f CartoDB \
          "CartoDB:pramsey" g250.shp
    
  • Then, add a time column with a default value, so that it gets filled in automatically when new data are inserted.

      key=abcdefghijabcdefghijabcdefghijabcdefghij
      sql=`echo "ALTER TABLE g250 ADD COLUMN ts TIMESTAMP DEFAULT now()" | tr ' ' +`
      curl "http://pramsey.cartodb.com:80/api/v2/sql?api_key=${key}&q=${sql}"
    
  • Then, run your import in -append mode to see that new columns are timestamped with the time of import.

      ogr2ogr \
          --config CARTODB_API_KEY abcdefghijabcdefghijabcdefghijabcdefghij \
          -append \
          -t_srs EPSG:4326 \
          -f CartoDB \
          "CartoDB:pramsey" g250.shp
    
  • Finally, run a scheduled job to delete all records you consider too old to map. For example, 1 week old.

      key=abcdefghijabcdefghijabcdefghijabcdefghij
      sql=`echo "DELETE FROM g250 WHERE ts < now() - '1 week'::interval" | tr ' ' +`
      curl "http://pramsey.cartodb.com:80/api/v2/sql?api_key=${key}&q=${sql}"
    

That’s all there is to it! The GDAL 2.0 release is expected in the next month or so, at which point binary builds for Windows and other platforms will be available. But for now, pull down the source, build, and start syncing your data with ogr2ogr!

About the author

Ready to optimize your territories with Location Intelligence?

Close circle icon

Contact us

Please fill out the below form and we'll be in touch real soon.