If you work in data science, you have probably come across the pipeline model for handling data transformations. It is used by many machine learning…
Tag: pandas
This is a follow-on from my last post about starting with PySpark and Databricks. Here is a link to a table I have…
Occasionally you may want to invoke a stored procedure from your Python code in order to manipulate data as part of a larger task. Naively…
In the last blog post I discussed using SQLAlchemy to import SQL database data into pandas for data analysis. But what if you wish…
Sometimes you may want to use Python to extract data from a SQL database to analyse using pandas. There are a couple of issues here. Firstly…
Currently there is a fun competition running over on the Kaggle Data Science website. The objective is to use metrics from a large data set…
Now that we have obtained our dataset from the Edinburgh Open Data store, we need to tidy it up and see if we need to transform…
As a keen cyclist I thought I would take a look at Edinburgh Council’s Bike Counter dataset. The website states that “The dataset includes bike…