When you ask Databricks to write a JSON file, you may be surprised by the results. Rather than writing a simple individual JSON file, instead…
Tag: PySpark
Sometimes a situation will crop up where you want to access functionality in Databricks which is not readily accessible via Python. In these cases, Databricks allows…
I recently came across a strange little problem with a satisfying solution, which people building path-based models might be interested in. The Problem: Imagine…
The updates to graphing with PySpark on Databricks have made it much nicer to work with. Options exist to aggregate data directly in the graph, handle…
Recently Databricks made an exciting announcement. They have created a new library which allows you to use large language models to perform operations on PySpark…
Recently I noticed that the ArrayType in PySpark is missing some useful aggregation functions. Let's suppose you have a data frame created as follows: If…
Exploding arrays is often very useful in PySpark. However, because row order is not guaranteed in PySpark DataFrames, it would be extremely useful to be…
Sometimes it is useful not only to compute aggregate functions, but also to be able to compare them on a row-by-row basis with…
If you use PySpark, you are likely aware that, as well as being able to group by and count elements, you are also able to group…
We frequently find that we want to combine the results of several calculations into a single column. For instance, perhaps we have various data…