Using SQLAlchemy to Export Data from Pandas

In the last blog post I discussed using SQLAlchemy to import SQL database data into pandas for data analysis. But what if you want to export processed data from pandas, or another data source, back to an SQL database? Fortunately, once again SQLAlchemy has your back.

[Image: pandas logo with an arrow pointing to a database symbol]

As usual, the first step is to define your connection to the database.

 
# import the relevant python libraries
import pandas as pd
from sqlalchemy import create_engine, text

# set your parameters for the database
user = "user"
password = "password"
host = "abc.efg.hij.rds.amazonaws.com"
port = 3306
schema = "db_schema"

# connect to the database using sqlalchemy
conn_string = 'mysql+pymysql://{0}:{1}@{2}:{3}/{4}'.format(
        user, password, host, port, schema)
db = create_engine(conn_string)
conn = db.connect()

The basic format for writing data to the database is to send an SQL INSERT statement via the SQLAlchemy connection.

# set up our query to insert the customer record
query = """
    INSERT 
        INTO customer (
            id, name, surname
        ) 
        VALUES (
            'AX-12345', 'Bill', 'Smith'
        )
    ON DUPLICATE KEY UPDATE updatedAt=CURRENT_TIMESTAMP;
"""
stmt = text(query)
conn.execute(stmt)

Obviously, hard-coding values into our query is not especially flexible. However, as before, we can use Python's format method for more flexibility. (Note that format does not escape values, so only use it with data you trust.)

# set up a dictionary holding the data
query_variables = {'id': "AX34587",
                   'name': "Bill",
                   'surname': "Smith"}

# set up our query to insert the customer record
query = """
    INSERT
        INTO customer (
            id, name, surname
        )
        VALUES (
            '{id}', '{name}', '{surname}'
        )
        ON DUPLICATE KEY UPDATE
            id = '{id}',
            name = '{name}',
            surname = '{surname}';
"""

# substitute in our variables using format
query = query.format(
        id = query_variables['id'],
        name = query_variables['name'],
        surname = query_variables['surname'],
)

# now we can execute the SQL command
stmt = text(query)
conn.execute(stmt)

This is better, but we are still only uploading one row of data at a time, which would not be efficient for a large dataframe. Fortunately, SQLAlchemy has a neat feature allowing the upload of large numbers of records at once. To do this it uses a colon syntax: each column name in the SQL query is prefixed with a colon (for example, :id for the id column), allowing us to insert multiple records at a time.

Before we can do this we need to convert our pandas dataframe into the correct format for upload. First, rename the columns of the dataframe to match the column names we intend to use in the query. Next, convert the dataframe into a dictionary using the “records” format. An example is given below.

# use of :variables in the sqlalchemy library

# ensure dataframe columns match upload names
df.columns = ["id", "name", "surname"]
# convert our dataframe chunk to a dictionary
values = df.to_dict('records')
# create our standardised insertion query
query="""
    INSERT
        INTO customer (
            id, name, surname
        )
        VALUES (
            :id, :name, :surname
        )
        ON DUPLICATE KEY UPDATE
            id = :id,
            name = :name,
            surname = :surname;
"""
# now we can execute the SQL command
stmt = text(query)
# note the additional 'values' parameter in the execution statement
res = conn.execute(stmt, values)
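The snippet above assumes the MySQL connection and dataframe already exist. As a self-contained sketch of the same multi-row technique, the version below runs against an in-memory SQLite database instead (the MySQL-specific ON DUPLICATE KEY UPDATE clause is dropped, since SQLite does not support it, and the table name and columns are just the example ones from this post):

```python
# self-contained sketch: multi-row insert with :column bind parameters,
# demonstrated on an in-memory SQLite database instead of MySQL
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")
conn = engine.connect()
conn.execute(text(
    "CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT, surname TEXT)"
))

# a small example dataframe with non-matching column names
df = pd.DataFrame({
    "customer_id": ["AX-12345", "AX-34587"],
    "first_name": ["Bill", "Jane"],
    "last_name": ["Smith", "Doe"],
})
# rename the columns to match the bind-parameter names in the query
df.columns = ["id", "name", "surname"]
# convert the dataframe to a list of per-row dictionaries
values = df.to_dict("records")

# passing the list of dictionaries executes the insert once per row
stmt = text("INSERT INTO customer (id, name, surname) VALUES (:id, :name, :surname)")
conn.execute(stmt, values)
```

The same pattern works unchanged against the MySQL connection built earlier; only the engine URL differs.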

This method allows you to upload an entire pandas dataframe into an SQL database in one go, which is very useful. Important note: because SQLAlchemy treats a colon as introducing a bind parameter, any colon that must appear verbatim in the SQL statement, such as within an inline string, needs to be escaped with a backslash (\:).
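The colon-escaping rule above can be seen in a small sketch (again using an in-memory SQLite database so it runs without a server; the string value is just an illustrative example):

```python
# sketch: escaping a literal colon inside a text() query with a backslash
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")
with engine.connect() as conn:
    # without the backslash, :draft would be parsed as a bind parameter;
    # the raw string (r"...") keeps the backslash intact for SQLAlchemy
    stmt = text(r"SELECT 'tag \:draft' AS label")
    label = conn.execute(stmt).scalar()
print(label)
```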

Now you can both download and upload data between SQL databases and pandas. Happy data processing!
