Installing ODBC on Databricks

Databricks comes with JDBC drivers installed by default, and it has an ODBC driver available for connecting to Databricks itself. However, although the pyodbc Python library is part of the standard Databricks libraries, the Microsoft ODBC drivers are not installed by default. You may need these drivers so that libraries like SQLAlchemy can run stored procedures or perform other operations on your Microsoft or Azure SQL databases. You can install ODBC on Databricks using a shell script that runs on cluster startup. I have detailed this before, but newer versions of Databricks require a new script.


Choosing your script

Which script you need depends on which Databricks runtime you are using. In their infinite wisdom, Databricks do not specify the Ubuntu version for each LTS version in their LTS runtime details table. However, judging by the individual LTS release notes, the versions appear to be as follows.

Databricks LTS version | Linux version      | Key handling
---------------------- | ------------------ | ------------
12.2 LTS               | Ubuntu 20.04.5 LTS | apt-key
13.3 LTS               | Ubuntu 22.04.2 LTS | apt-key
14.3 LTS               | Ubuntu 22.04.3 LTS | apt-key
15.4 LTS               | Ubuntu 22.04.4 LTS | apt-key
16.4 LTS               | Ubuntu 24.04.2 LTS | gpg
17.3 LTS               | Ubuntu 24.04.3 LTS | gpg
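
If you would rather check than trust the table, a rough sketch like the one below reads the Ubuntu version from the node itself and reports which key handling the table suggests. Note that key_method is a hypothetical helper of my own, not part of Databricks or apt, and the cut-off at Ubuntu 24 is simply taken from the table above.

```shell
#!/bin/bash
# Sketch: report the Ubuntu release of the current node and which key-handling
# approach the table above suggests for it.

# key_method is a hypothetical helper (an assumption, not a Databricks or apt tool).
# It takes a version such as "22.04" or "24.04.2" and prints "apt-key" or "gpg".
key_method() {
    local major="${1%%.*}"   # keep the part before the first dot, e.g. "24"
    if [ "$major" -ge 24 ]; then
        echo "gpg"
    else
        echo "apt-key"
    fi
}

# On Ubuntu, /etc/os-release defines VERSION_ID (for example "22.04").
if [ -r /etc/os-release ]; then
    . /etc/os-release
    if [ -n "${VERSION_ID:-}" ]; then
        echo "Version ${VERSION_ID}: use $(key_method "${VERSION_ID}")"
    fi
fi
```

Running this in a %sh cell should be enough to tell you which of the two scripts below applies to your cluster.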

Script for Databricks Up to and Including 15.4 LTS

For earlier versions of Databricks up to 15.4 LTS, you can install msodbcsql17 drivers with a startup shell script as follows:

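#!/bin/bash
echo "Installing msodbcsql17"
# Register Microsoft's package signing key with apt
curl -fsSL https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
# Add the Microsoft repository for the runtime's Ubuntu version (22.04 here)
curl -fsSL https://packages.microsoft.com/config/ubuntu/22.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
# Refresh the package index, then install the driver, accepting the EULA non-interactively
apt-get update
ACCEPT_EULA=Y apt-get -q -y install msodbcsql17
echo "Installation complete"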

Note that if you are using 12.2 LTS, you will need to change the Ubuntu version in the prod.list URL from 22.04 to 20.04 to match that runtime (it's out of support though, so not recommended).
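
One way to avoid editing the version by hand is to build the URL from the node's own version. This is only a sketch: prod_list_url is a hypothetical helper, and it assumes Microsoft publishes a prod.list for your Ubuntu version under the same URL pattern as the script above.

```shell
#!/bin/bash
# prod_list_url is a hypothetical helper (an assumption, not an existing tool).
# It builds the Microsoft repository list URL for a given Ubuntu version,
# following the URL pattern used in the script above.
prod_list_url() {
    echo "https://packages.microsoft.com/config/ubuntu/$1/prod.list"
}

# On Ubuntu, /etc/os-release defines VERSION_ID (for example "20.04" or "22.04").
if [ -r /etc/os-release ]; then
    . /etc/os-release
    if [ -n "${VERSION_ID:-}" ]; then
        echo "Repository list for this node: $(prod_list_url "${VERSION_ID}")"
    fi
fi

# In the install script, the hardcoded curl line could then become:
# curl -fsSL "$(prod_list_url "$VERSION_ID")" > /etc/apt/sources.list.d/mssql-release.list
```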

Script for Databricks 16.4 LTS Onwards

From 16.4 LTS onwards, the script above will not work because apt-key has been removed. Therefore I recommend this rather different script which uses gpg directly for key verification:

#!/bin/bash
set -e

echo "Installing msodbcsql17 for Ubuntu 24.04"

# 1. Create the keyrings directory if it doesn't exist
mkdir -p /usr/share/keyrings

# 2. Download the Microsoft GPG key and convert it to a format apt understands (.gpg)
curl -fsSL https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor -o /usr/share/keyrings/microsoft-prod.gpg

# 3. Add the repository, explicitly referencing the new keyring
echo "deb [arch=amd64,arm64,armhf signed-by=/usr/share/keyrings/microsoft-prod.gpg] https://packages.microsoft.com/ubuntu/24.04/prod noble main" > /etc/apt/sources.list.d/mssql-release.list

# 4. Update and Install
apt-get update
ACCEPT_EULA=Y apt-get install -y msodbcsql17

# Optional: Install mssql-tools if needed for bcp/sqlcmd
# ACCEPT_EULA=Y apt-get install -y mssql-tools
# echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc

echo "Installation complete"
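
Once either script has run, it can be worth confirming that the driver is actually registered. The sketch below assumes the package registers itself in /etc/odbcinst.ini under the name "ODBC Driver 17 for SQL Server", which is the name you would then use in a pyodbc or SQLAlchemy connection string. The driver_registered function is a hypothetical helper of my own.

```shell
#!/bin/bash
# driver_registered is a hypothetical helper (an assumption, not an ODBC tool).
# It succeeds if the named driver has a [section] header in the given
# odbcinst.ini-style file, which defaults to /etc/odbcinst.ini.
driver_registered() {
    local name="$1"
    local ini="${2:-/etc/odbcinst.ini}"
    # -x matches the whole line, -F treats the brackets literally
    [ -r "$ini" ] && grep -qxF "[$name]" "$ini"
}

if driver_registered "ODBC Driver 17 for SQL Server"; then
    echo "ODBC Driver 17 for SQL Server is registered"
else
    echo "ODBC Driver 17 for SQL Server not found in /etc/odbcinst.ini"
fi
```

You could append the check to the end of the init script so the result shows up in the init script logs, or run it in a %sh cell after the cluster has started.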

Startup vs Notebook

You can also run these scripts directly in a notebook by putting the %sh magic command at the head of a cell. This is useful if you do not want to install the ODBC drivers onto your clusters as a matter of course. The disadvantage of running in a notebook is that the script only runs on the cluster's driver (controlling) node, so the ODBC driver cannot be used in distributed processes. Loading the script as an init script ensures all nodes and all notebooks have access to the driver, as noted previously. This means an init script is likely the best way of installing ODBC on Databricks.
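
If you want to see for yourself that an init script runs on every node, a small addition like the sketch below can log which kind of node it is on. It relies on the DB_IS_DRIVER environment variable that Databricks sets for init scripts, which I am assuming is "TRUE" on the driver node. The node_role function is a hypothetical helper, and the result is only meaningful inside an init script, since the variable may not be set in a %sh cell.

```shell
#!/bin/bash
# node_role is a hypothetical helper (an assumption, not a Databricks tool).
# It reports whether this script is running on the driver or a worker,
# based on the DB_IS_DRIVER variable Databricks sets for init scripts
# (assumed to be "TRUE" on the driver node).
node_role() {
    if [ "${DB_IS_DRIVER:-}" = "TRUE" ]; then
        echo "driver"
    else
        echo "worker"
    fi
}

echo "Installing msodbcsql17 on $(node_role) node"
```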
