March 6, 2020 (updated March 22, 2020) by justinmatters

My PyData Presentation on PySpark and Databricks

I gave a flash talk at Edinburgh PyData yesterday. It covered the merits and pitfalls of PySpark and Databricks as a big data processing platform. I thought I would take a moment to share my slides online in case anyone else would like to take a look. They can be found here on Google Slides.

TLDR: a promising set of tools that is still under rapid development. The learning curve can be a little steep in places, because both tools depart from standard Python and dataframe design principles. They are very useful now (I am using them in production at QueryClick) and likely to get even better over time.

Published by justinmatters

Data Scientist specialising in Python, PySpark, SQL and Machine Learning
