Recently I needed to check for equality between PySpark DataFrames as part of a test suite. To my surprise, I discovered that there is no…
Category: Problem Solving
Recently I have been involved in organising a free online tabletop games convention called ScaleCon. This came about at very short notice due to…
Following on from my earlier post with useful tools for geeks, here are some more free online resources I find handy. Click on the titles…
There are several ways to filter strings in PySpark, each with its own advantages and disadvantages. This post will consider three of the…
A quick list of tech meetups I have found useful in Edinburgh. Given that Meetup has announced that it may start charging high…
Jupyter notebooks are great; however, the interface for file handling has its issues. One issue is that files have to be downloaded individually, there is…
PySpark is very powerful. However, because it is based on Scala, we need to be careful about types, as they are not Pythonic. And because…
If you work in data science, you have probably come across the pipeline model for handling data transformations. It is used by many machine learning…
Occasionally you may want to invoke a stored procedure from your Python code to manipulate data as part of a larger task. Naively…
In the last blog post, I discussed using SQLAlchemy to import SQL database data into pandas for data analysis. But what if you wish…