COPY'ing with the Python SDK

Summary

We'd like to introduce the COPY in the CARTO's Python SDK to you

This post may describe functionality for an old version of CARTO. Find out about the latest and cloud-native version here.
COPY'ing with the Python SDK

Introducing the CARTO Python SDK's COPY methods

A few weeks ago Paul posted about our new endpoints in the SQL API to use the PostgreSQL COPY statement for bulk ingestion of data into CARTO.

Although writing a client with requests library can be relatively straightforward  getting it right may be tricky. That's why we decided to implement what we hope will be a reference client for the SQL API.

This implementation takes care of your COPY's  and does it in a performant way. It takes care of the data to make sure it is transferred in chunks and uses compression by default  resulting in faster data transfer  in a more pythonic fashion.

Show me the code!

Here are some examples of usage of the python SDK for fast data transfer. Hopefully they will inspire you to accomplish greater endeavors with the SDK.

Working with CSV files

This is the simplest task to accomplish: import or export a CSV file. Okay  you can do it with the Import API. But using python and the CopySQLClient you gain great control over the process. Furthermore  it is several times faster than the traditional imports.

To COPY FROM a CSV file (import)  you just need to specify the columns and the format in the COPY query and pass it along with the file to the copyfrom_file_path method:

copy_client = CopySQLClient(auth_client)

from_query = """COPY copy_example (the_geom  name  age)
FROM stdin WITH (FORMAT csv  HEADER true)"""

result = copy_client.copyfrom_file_path(
    from_query 
    'copy_from.csv'
)

Similarly  to COPY TO a CSV file (export)  you specify a table or a query and the output file and you just get it done:

pre>to_query = """COPY copy_example TO stdout WITH (FORMAT csv HEADER true)""" copy_client.copyto_file_path(to_query 'export.csv')

Pro tip: In the COPY TO query you can specify an arbitrary subquery. E.g:

COPY (SELECT * FROM copy_example WHERE cartodb_id <= 1000)
TO stdout WITH (FORMAT csv  HEADER true)

Here's the complete source code of a working example demonstrating the basic usage of the methods above.

Transferring tables

The example above shows how to deal with CSV files stored on the disk.

However  you may want to transfer some data that lives within your script and that does not need to be stored just for the sake of transferring it. Here's how to pipe the data from a table in an account to another table in another account:

response = copy_src_client.copyto(
    'COPY %s TO STDOUT' % TABLE_NAME
)

result = copy_dst_client.copyfrom(
    'COPY %s FROM STDIN' % TABLE_NAME 
    response
)

As simple as it gets  doesn't it?

Here's the complete source of a tool that lets you transfer tables across CARTO accounts. It does work across clouds and on-premise installations. Enjoy it!

Loading data from external services

The most interesting use case  in my opinion  is when you use python for extracting and processing some data  and then you want to transfer it.

As a demonstration of this concepts  you can find here a tool that gets fresh radar information  process it with numpy  and uploads it straight to CARTO. The resulting dataset was then used to produce the cover image of this post.

Use the Source  Luke

You can find here the documentation for the CopySQLClient as well as the reference documentation. That should be your next stop if you want to make the most of this new interface.

Here's the source code repository. As always  we're open to contributions via issues and PR's.

What's next

We plan to integrate this with CARTOFrames  so that its users can benefit from the speed boost.

We may also extend the interface of CopySQlClient in order to keep the implementation of client apps and integrations as simple as possible while keeping all the advantages of a good usage of the SQL API and underlying services.

Let us know if you have any feedback and suggestions.

Thanks for reading!Are you a python developer? Have you ever implemented some python code to produce  load or process some data and then wanted to upload it to CARTO? Here is how.