Are you a python developer? Have you ever implemented some python code to produce, load or process some data and then wanted to upload it to CARTO? Here is how.

Introducing the CARTO Python SDK’s COPY methods

A few weeks ago Paul posted about our new endpoints in the SQL API to use the PostgreSQL COPY statement for bulk ingestion of data into CARTO.

Although writing a client with requests library can be relatively straightforward, getting it right may be tricky. That’s why we decided to implement what we hope will be a reference client for the SQL API.

This implementation takes care of your COPY’s, and does it in a performant way. It takes care of the data to make sure it is transferred in chunks and uses compression by default, resulting in faster data transfer, in a more pythonic fashion.

Show me the code!

Here are some examples of usage of the python SDK for fast data transfer. Hopefully they will inspire you to accomplish greater endeavors with the SDK.

Working with CSV files

This is the simplest task to accomplish: import or export a CSV file. Okay, you can do it with the Import API. But using python and the CopySQLClient you gain great control over the process. Furthermore, it is several times faster than the traditional imports.

To COPY FROM a CSV file (import), you just need to specify the columns and the format in the COPY query and pass it along with the file to the copyfrom_file_path method:

copy_client = CopySQLClient(auth_client)

from_query = """COPY copy_example (the_geom, name, age)
FROM stdin WITH (FORMAT csv, HEADER true)"""

result = copy_client.copyfrom_file_path(
    from_query,
    'copy_from.csv'
)

Similarly, to COPY TO a CSV file (export), you specify a table or a query and the output file and you just get it done:

to_query = """COPY copy_example TO stdout
WITH (FORMAT csv, HEADER true)"""

copy_client.copyto_file_path(to_query, 'export.csv')

Pro tip: In the COPY TO query you can specify an arbitrary subquery. E.g:

COPY (SELECT * FROM copy_example WHERE cartodb_id <= 1000)
TO stdout WITH (FORMAT csv, HEADER true)

Here’s the complete source code of a working example demonstrating the basic usage of the methods above.

Transferring tables

The example above shows how to deal with CSV files stored on the disk.

However, you may want to transfer some data that lives within your script and that does not need to be stored just for the sake of transferring it. Here’s how to pipe the data from a table in an account to another table in another account:

response = copy_src_client.copyto(
    'COPY %s TO STDOUT' % TABLE_NAME
)

result = copy_dst_client.copyfrom(
    'COPY %s FROM STDIN' % TABLE_NAME,
    response
)

As simple as it gets, doesn’t it?

Here’s the complete source of a tool that lets you transfer tables across CARTO accounts. It does work across clouds and on-premise installations. Enjoy it!

Loading data from external services

The most interesting use case, in my opinion, is when you use python for extracting and processing some data, and then you want to transfer it.

As a demonstration of this concepts, you can find here a tool that gets fresh radar information, process it with numpy, and uploads it straight to CARTO. The resulting dataset was then used to produce the cover image of this post.

Use the Source, Luke

You can find here the documentation for the CopySQLClient as well as the reference documentation. That should be your next stop if you want to make the most of this new interface.

Here’s the source code repository. As always, we’re open to contributions via issues and PR’s.

What’s next

We plan to integrate this with CARTOFrames, so that its users can benefit from the speed boost.

We may also extend the interface of CopySQlClient in order to keep the implementation of client apps and integrations as simple as possible while keeping all the advantages of a good usage of the SQL API and underlying services.

Let us know if you have any feedback and suggestions.

Thanks for reading!