Tutorials  /  Data Management  /  Connect Data

How to connect to CARTO from external applications using Databricks

Connecting to CARTO from external applications using Databricks.

This tutorial shows to connect with the CARTO database using the ODBC connection. We would strongly recommend reading the SQL direct connection article to know more about this feature before reading this one.

Introduction

With the Direct SQL Connector, you can connect directly from your Databricks cluster to your CARTO database. You can read CARTO datasets as Spark dataframes, perform spatial analysis on massive datasets (using the many available libraries), and store the results back into CARTO for visualizations.

Tested environments

  • Databricks Community Edition
  • Runtime 6.4 (Scala 2.11, Spark 2.4.5, OpenJDK 8)

Connect from notebook

  • Go to the Cluster configuration page. Select the Spark Cluster UI - Master tab and get the master node IP address from the hostname label

  • Through the Settings page in your CARTO dashboard, add this IP address to the list of IP addresses

  • Click Home in the sidebar and create a new Python notebook

  • Test connectivity from your cluster to CARTO server running the following instruction in a cell:

%sh nc -vz dbconn-geuw.carto.com 5432
  • Copy the certificate files to the master node. You can use a S3 bucket, the wget tool or any other method.
%sh wget {url} -O {path_to_file/filename}
  • Read a CARTO dataset as Spark DataFrame (you must use the DER PKCS #8 file, with a “.pk8” extension) with the following instruction:
remote_table = spark.read.format("jdbc")\
    .option("url", "jdbc:postgresql://dbconn-geuw.carto.com:5432/cartodb") \
    .option("dbtable", "{dataset}") \
    .option("user", "{username}") \
    .option("password", "{API key}") \
    .option("ssl", True) \
    .option("sslmode", "verify-full" ) \
    .option("sslcert", "{path_to_file}/client.crt" ) \
    .option("sslkey", "{path_to_file}/client.key.pk8" ) \
    .option("sslrootcert", "{path_to_file}/server_ca.pem") \
    .load()
  • Run your spatial analysis in your Databricks cluster. Then store the results in your CARTO dataset.