My PyData Presentation on PySpark and Databricks

I gave a flash talk at Edinburgh PyData yesterday. It covered the merits and pitfalls of PySpark and Databricks as a big data processing platform. I thought I would take a moment to share my slides online in case anyone else would like to take a look. They can be found here on Google Slides.

TL;DR: PySpark and Databricks are a promising set of tools that is still under rapid development. The learning curve can be steep in places because of some departures from standard Python and dataframe design principles. They are very useful now (I am using them in production at QueryClick) and likely to get even better over time.

 
