How time flies! Data Scotland 2025 was another great event. Here are the things I learned about this year. Where possible I have linked to…
Category: Data Science
Apache Druid is a very powerful, scalable Online Analytical Processing (OLAP) database system. However, getting started with it can be a little intimidating for some.…
I recently attended Data Scotland 2024 and sat in on a number of interesting talks. Here are some takeaways. Where possible I have linked to the talk…
When you ask Databricks to write a JSON file, you may be surprised by the results. Rather than writing a simple individual JSON file, instead…
Two commonplace and tedious tasks people often face are summarising documents and comparing them for similarities and differences. The good news is…
Last month I produced a proof-of-concept zero-shot categoriser for images using OpenAI’s GPT-4 Vision-Language model (VLM) capabilities. However, this had the disadvantage…
Categorisation of images has long been possible with a variety of visual machine learning models. However, in almost all cases the categories have to be…
Sometimes a situation will crop up where you want to access functionality in Databricks which is not readily accessible via Python. In these cases Databricks allows…
I recently came across a strange little problem with a satisfying solution, which people building path-based models might be interested in. The Problem Imagine…
The updates to graphing with PySpark on Databricks have made it much nicer to work with. Options exist to aggregate data directly in the graph, handle…