Justin’s Blog

Justin's Coding and Geek Blog

Tag: Data Science

Data Science

SQL to PySpark Conversion Cheatsheet

Posted on November 24, 2019 by justinmatters

Following on from my pandas to PySpark cheatsheet, here is another cheatsheet to help convert SQL queries into PySpark DataFrame commands. Like the last one…

Data Science

PySpark’s Delta Storage Format

Posted on October 19, 2019 (updated November 23, 2019) by justinmatters

Recently Databricks open-sourced a very useful new storage format for use with Spark called Delta. Delta is an extension to the Parquet…

Data Science

Some PySpark Gotchas

Posted on August 25, 2019 by justinmatters

PySpark is very powerful. However, because it is based on Scala, we need to be careful about types, as they are not Pythonic. And because…

Data Science

PyData Talk – Comparing Advertising Channel Performance

Posted on June 8, 2019 by justinmatters

I recently gave a talk at PyData Edinburgh about some of the work I am doing at QueryClick. We investigated the effectiveness of TV and…

Data Science

Pandas to PySpark Conversion Cheatsheet

Posted on June 1, 2019 (updated July 9, 2019) by justinmatters

This is a follow-on from my last post about starting with PySpark and Databricks. Here is a link to a table I have…

Data Science

Useful Things to Know when Starting with PySpark and Databricks

Posted on May 26, 2019 (updated September 28, 2019) by justinmatters

Databricks is a very handy cloud platform for large-scale data processing and machine learning using Spark. However, it does have some idiosyncrasies. Here are…

Data Science

Using SQLAlchemy to Export Data from Pandas

Posted on March 16, 2019 (updated April 4, 2019) by justinmatters

In the last blog post I discussed using SQLAlchemy to import SQL database data into pandas for data analysis. But what if you wish…

Data Science

Using SQLAlchemy to Import Data to Pandas

Posted on February 24, 2019 (updated April 4, 2019) by justinmatters

Sometimes you may want to use Python to extract data from a SQL database to analyse using pandas. There are a couple of issues here. Firstly…

Data Science

Kaggle PUBG Competition Building a Model

Posted on November 22, 2018 (updated November 24, 2018) by justinmatters

Having completed our analysis of the PlayerUnknown's Battlegrounds dataset from Kaggle, we can now build a model. We can start by building a very…

Data Science

Kaggle PUBG Competition Data Analysis

Posted on November 19, 2018 (updated December 6, 2018) by justinmatters

Currently there is a fun competition running over on the Kaggle Data Science website. The objective is to use metrics from a large data set…

© Copyright 2019 – Justin's Blog