Are you a python developer? Have you ever implemented some python code to produce, load or process some data and then wanted to upload it to CARTO? Here is how.
Introducing the CARTO Python SDK’s COPY methods
A few weeks ago Paul posted about our new endpoints in the SQL API to use the PostgreSQL COPY statement for bulk ingestion of data into CARTO.
Although writing a client with
requests library can be relatively straightforward, getting it right may be tricky. That’s why we decided to implement what we hope will be a reference client for the SQL API.
This implementation takes care of your COPY’s, and does it in a performant way. It takes care of the data to make sure it is transferred in chunks and uses compression by default, resulting in faster data transfer, in a more pythonic fashion.
Show me the code!
Here are some examples of usage of the python SDK for fast data transfer. Hopefully they will inspire you to accomplish greater endeavors with the SDK.
Working with CSV files
This is the simplest task to accomplish: import or export a CSV file. Okay, you can do it with the Import API. But using python and the
CopySQLClient you gain great control over the process. Furthermore, it is several times faster than the traditional imports.
To COPY FROM a CSV file (import), you just need to specify the columns and the format in the COPY query and pass it along with the file to the
copy_client = CopySQLClient(auth_client) from_query = """COPY copy_example (the_geom, name, age) FROM stdin WITH (FORMAT csv, HEADER true)""" result = copy_client.copyfrom_file_path( from_query, 'copy_from.csv' )
Similarly, to COPY TO a CSV file (export), you specify a table or a query and the output file and you just get it done:
to_query = """COPY copy_example TO stdout WITH (FORMAT csv, HEADER true)""" copy_client.copyto_file_path(to_query, 'export.csv')
Pro tip: In the COPY TO query you can specify an arbitrary subquery. E.g:
COPY (SELECT * FROM copy_example WHERE cartodb_id <= 1000) TO stdout WITH (FORMAT csv, HEADER true)
Here’s the complete source code of a working example demonstrating the basic usage of the methods above.
The example above shows how to deal with CSV files stored on the disk.
However, you may want to transfer some data that lives within your script and that does not need to be stored just for the sake of transferring it. Here’s how to pipe the data from a table in an account to another table in another account:
response = copy_src_client.copyto( 'COPY %s TO STDOUT' % TABLE_NAME ) result = copy_dst_client.copyfrom( 'COPY %s FROM STDIN' % TABLE_NAME, response )
As simple as it gets, doesn’t it?
Here’s the complete source of a tool that lets you transfer tables across CARTO accounts. It does work across clouds and on-premise installations. Enjoy it!
Loading data from external services
The most interesting use case, in my opinion, is when you use python for extracting and processing some data, and then you want to transfer it.
As a demonstration of this concepts, you can find here a tool that gets fresh radar information, process it with
numpy, and uploads it straight to CARTO. The resulting dataset was then used to produce the cover image of this post.
Use the Source, Luke
Here’s the source code repository. As always, we’re open to contributions via issues and PR’s.
We plan to integrate this with CARTOFrames, so that its users can benefit from the speed boost.
We may also extend the interface of
CopySQlClient in order to keep the implementation of client apps and integrations as simple as possible while keeping all the advantages of a good usage of the SQL API and underlying services.
Let us know if you have any feedback and suggestions.
Thanks for reading!