I gave a flash talk at Edinburgh PyData yesterday. It covered the merits and pitfalls of PySpark and Databricks as a big data processing platform.…
Tag: Databricks
There are a variety of ways to filter strings in PySpark, each with its own advantages and disadvantages. This post will consider three of the…
PySpark is very powerful. However, because it is based on Scala, we need to be careful about types, as they are not Pythonic. And because…
This is a follow-up to my last post about starting with PySpark and Databricks. Here is a link to a table I have…
Databricks is a very handy cloud platform for large-scale data processing and machine learning using Spark. However, it does have some idiosyncrasies. Here are…