COPY'ing with the Python SDK
Introducing the CARTO Python SDK's COPY methods
A few weeks ago Paul posted about our new endpoints in the SQL API to use the PostgreSQL COPY statement for bulk ingestion of data into CARTO.
Although writing a client with requests library can be relatively straightforward getting it right may be tricky. That's why we decided to implement what we hope will be a reference client for the SQL API.
This implementation takes care of your COPY's and does it in a performant way. It takes care of the data to make sure it is transferred in chunks and uses compression by default resulting in faster data transfer in a more pythonic fashion.
Show me the code!
Here are some examples of usage of the python SDK for fast data transfer. Hopefully they will inspire you to accomplish greater endeavors with the SDK.
Working with CSV files
This is the simplest task to accomplish: import or export a CSV file. Okay you can do it with the Import API. But using python and the CopySQLClient you gain great control over the process. Furthermore it is several times faster than the traditional imports.
To COPY FROM a CSV file (import) you just need to specify the columns and the format in the COPY query and pass it along with the file to the copyfrom_file_path method:
Similarly to COPY TO a CSV file (export) you specify a table or a query and the output file and you just get it done:
Pro tip: In the COPY TO query you can specify an arbitrary subquery. E.g:
Here's the complete source code of a working example demonstrating the basic usage of the methods above.
Transferring tables
The example above shows how to deal with CSV files stored on the disk.
However you may want to transfer some data that lives within your script and that does not need to be stored just for the sake of transferring it. Here's how to pipe the data from a table in an account to another table in another account:
As simple as it gets doesn't it?
Here's the complete source of a tool that lets you transfer tables across CARTO accounts. It does work across clouds and on-premise installations. Enjoy it!
Loading data from external services
The most interesting use case in my opinion is when you use python for extracting and processing some data and then you want to transfer it.
As a demonstration of this concepts you can find here a tool that gets fresh radar information process it with numpy and uploads it straight to CARTO. The resulting dataset was then used to produce the cover image of this post.
Use the Source Luke
You can find here the documentation for the CopySQLClient as well as the reference documentation. That should be your next stop if you want to make the most of this new interface.
Here's the source code repository. As always we're open to contributions via issues and PR's.
What's next
We plan to integrate this with CARTOFrames so that its users can benefit from the speed boost.
We may also extend the interface of CopySQlClient in order to keep the implementation of client apps and integrations as simple as possible while keeping all the advantages of a good usage of the SQL API and underlying services.
Let us know if you have any feedback and suggestions.
Thanks for reading!Are you a python developer? Have you ever implemented some python code to produce load or process some data and then wanted to upload it to CARTO? Here is how.