In the last blog post I discussed using SQLAlchemy to import SQL database data into pandas for data analysis. But what if you want to export processed data from pandas, or another data source, back to an SQL database? Fortunately, once again SQLAlchemy has your back.
As usual, the first step is to define the connection to the database.
# import the relevant python libraries
import pandas as pd
from sqlalchemy import create_engine, text

# set your parameters for the database
user = "user"
password = "password"
host = "abc.efg.hij.rds.amazonaws.com"
port = 3306
schema = "db_schema"

# connect to the database using sqlalchemy
conn_string = 'mysql+pymysql://{0}:{1}@{2}:{3}/{4}'.format(
    user, password, host, port, schema)
db = create_engine(conn_string)
conn = db.connect()
The basic format for writing data to the database is to send an SQL INSERT statement via the SQLAlchemy connection.
# set up our insert statement
query = """
INSERT INTO customer (
    id,
    name,
    surname
)
VALUES (
    'AX-12345',
    'Bill',
    'Smith'
)
ON DUPLICATE KEY UPDATE updatedAt = CURRENT_TIMESTAMP;
"""
stmt = text(query)
conn.execute(stmt)
Obviously, hard-coding values into our query is not especially flexible. However, as before, we can use Python's string format method to substitute in variables. (Bear in mind that building SQL by string formatting is vulnerable to SQL injection if the values come from user input; the parameter-binding approach shown later avoids this.)
# set up a dictionary holding the data
query_variables = {'id': "AX34587", 'name': "Bill", 'surname': "Smith"}

# set up our insert statement
query = """
INSERT INTO customer (
    id,
    name,
    surname
)
VALUES (
    '{id}',
    '{name}',
    '{surname}'
)
ON DUPLICATE KEY UPDATE
    id = '{id}',
    name = '{name}',
    surname = '{surname}';
"""

# substitute in our variables using format
query = query.format(
    id=query_variables['id'],
    name=query_variables['name'],
    surname=query_variables['surname'],
)

# now we can execute the SQL command
stmt = text(query)
conn.execute(stmt)
This is better, but we are still only uploading one row of data at a time, which would not be efficient for a large dataframe. Fortunately, SQLAlchemy has a neat feature allowing the upload of large numbers of records at once. To do this it uses a colon syntax: each column name in the SQL query is prefixed with a colon, marking it as a bound parameter so that multiple records can be inserted in one call. For example, :id stands for the value of the id column.
Before we can do this we need to convert our pandas dataframe into the correct format for upload. First, rename the dataframe's columns to match the parameter names used in the query. Next, convert the dataframe into a list of dictionaries using the 'records' orientation. An example is given below.
# use of :variables in the sqlalchemy library
# ensure dataframe columns match upload names
df.columns = ["id", "name", "surname"]

# convert our dataframe to a list of records (one dict per row)
values = df.to_dict('records')

# create our standardised insertion query
query = """
INSERT INTO customer (
    id,
    name,
    surname
)
VALUES (
    :id,
    :name,
    :surname
)
ON DUPLICATE KEY UPDATE
    id = :id,
    name = :name,
    surname = :surname;
"""

# now we can execute the SQL command
stmt = text(query)

# note the additional 'values' parameter in the execution statement
res = conn.execute(stmt, values)
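As a quick sanity check, the same multi-row pattern can be run end to end against an in-memory SQLite database. This is a minimal sketch with made-up sample data: the ON DUPLICATE KEY UPDATE clause is MySQL-specific, so a plain INSERT is used here, and the sqlite:// URL replaces the MySQL connection string from above.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# in-memory SQLite stands in for the MySQL database in this sketch
engine = create_engine("sqlite://")
conn = engine.connect()

# create a table matching the customer example
conn.execute(text(
    "CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT, surname TEXT)"))

# build a small dataframe of made-up records
df = pd.DataFrame({
    "id": ["AX-1", "AX-2"],
    "name": ["Bill", "Anna"],
    "surname": ["Smith", "Jones"],
})

# convert to a list of dicts and insert all rows in one execute call
values = df.to_dict("records")
conn.execute(
    text("INSERT INTO customer (id, name, surname) "
         "VALUES (:id, :name, :surname)"),
    values)

# confirm both rows arrived
count = conn.execute(text("SELECT COUNT(*) FROM customer")).scalar()
print(count)
```

Passing a list of dictionaries as the parameters argument triggers SQLAlchemy's "executemany" style of execution, which is what makes the single call insert every row.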
This method allows you to upload an entire pandas dataframe into an SQL database in one go, which is very useful. Important note: because SQLAlchemy treats a colon as the start of a bound parameter, any SQL statement where a colon is required verbatim, such as inside an inline string literal, needs the colon escaped with a backslash.
Now you can both download and upload data between SQL databases and pandas. Happy data processing!