{"id":737,"date":"2020-08-30T15:25:58","date_gmt":"2020-08-30T15:25:58","guid":{"rendered":"https:\/\/justinmatters.co.uk\/wp\/?p=737"},"modified":"2020-09-01T15:46:49","modified_gmt":"2020-09-01T15:46:49","slug":"pyspark-window-functions","status":"publish","type":"post","link":"https:\/\/justinmatters.co.uk\/wp\/pyspark-window-functions\/","title":{"rendered":"PySpark Window Functions"},"content":{"rendered":"<p>PySpark window functions are useful when you want to examine relationships within groups of data rather than between groups of data, as groupBy does. To use them, you start by defining a window, then select a function or set of functions to operate within that window.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-630 aligncenter\" src=\"https:\/\/justinmatters.co.uk\/wp\/wp-content\/uploads\/2019\/12\/Pyspark_logo.png\" alt=\"\" width=\"410\" height=\"224\" srcset=\"https:\/\/justinmatters.co.uk\/wp\/wp-content\/uploads\/2019\/12\/Pyspark_logo.png 640w, https:\/\/justinmatters.co.uk\/wp\/wp-content\/uploads\/2019\/12\/Pyspark_logo-300x164.png 300w\" sizes=\"auto, (max-width: 410px) 100vw, 410px\" \/><\/p>\n<p>If you prefer to work through the example, you can <a href=\"https:\/\/github.com\/JustinMatters\/pyspark-example-workbooks\/blob\/master\/PySpark%20Window%20Functions.ipynb\">download the workbook from GitHub<\/a>. The workbook is designed to work on <a href=\"https:\/\/community.cloud.databricks.com\/\">Databricks Community Edition<\/a>. Alternatively, for the next six months <a href=\"https:\/\/databricks-prod-cloudfront.cloud.databricks.com\/public\/4027ec902e239c93eaaa8714f173bcfc\/968100988546031\/157591980591166\/8836542754149149\/latest.html\">this link should work to give you direct access to a copy of the workbook on Databricks<\/a>, now that their public publishing system is working.<\/p>\n<p>Let's take a look at the sorts of things that can be achieved with window functions of varying complexity. 
The imports and DataFrame I am going to be using for this explanation are:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nimport pandas as pd\r\nimport pyspark.sql.functions as fn\r\nfrom pyspark.sql import SparkSession\r\nfrom pyspark.sql import Window\r\n\r\n# Create a spark session\r\nspark_session = SparkSession.builder.getOrCreate()\r\n\r\n# let's define a demonstration DataFrame to work on\r\ndf_data = {\r\n  'partition': &#x5B;'a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c',],\r\n  'col_1': &#x5B;1, 1, 1, 1, 2, 2, 2, 3, 3,],\r\n  'aggregation': &#x5B;1, 2, 3, 4, 5, 6, 7, 8, 9,],\r\n  'ranking': &#x5B;4, 3, 2, 1, 1, 1, 3, 1, 5,],\r\n  'lagging': &#x5B;9, 8, 7, 6, 5, 4, 3, 2, 1,],\r\n  'cumulative': &#x5B;1, 2, 4, 6, 1, 1, 1, 20, 30,],\r\n}\r\ndf_pandas = pd.DataFrame.from_dict(df_data)\r\n# create the spark dataframe\r\ndf = spark_session.createDataFrame(df_pandas)\r\n<\/pre>\n<h2>Simple aggregation functions<\/h2>\n<p>We can use the standard groupBy aggregations with window functions. These functions use the simplest form of window, which just defines grouping. 
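<\/p>\n<p>Before running the Spark version, it can help to see what values will be attached to the rows. Here is a plain-Python sketch (not Spark code; it simply recomputes the per-partition aggregates from the demo data above):<\/p>

```python
# Plain-Python cross-check of the per-partition aggregates that the
# window aggregation attaches to every row of the demo DataFrame.
data = {
    'partition': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c'],
    'aggregation': [1, 2, 3, 4, 5, 6, 7, 8, 9],
}

# gather the 'aggregation' values belonging to each partition
groups = {}
for part, value in zip(data['partition'], data['aggregation']):
    groups.setdefault(part, []).append(value)

# every row in a partition receives the same aggregate for its partition
aggregation_sum = {p: sum(v) for p, v in groups.items()}
aggregation_avg = {p: sum(v) / len(v) for p, v in groups.items()}
print(aggregation_sum)  # {'a': 10, 'b': 18, 'c': 17}
print(aggregation_avg)  # {'a': 2.5, 'b': 6.0, 'c': 8.5}
```

<p>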
However, instead of producing one value for each group, the value is added to every row of each partition in the window.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n# aggregation functions use the simplest form of window\r\n# which just defines grouping\r\naggregation_window = Window.partitionBy('partition')\r\n# then we can use this window function for our aggregations\r\ndf_aggregations = df.select(\r\n  'partition', 'aggregation'\r\n).withColumn(\r\n  'aggregation_sum', fn.sum('aggregation').over(aggregation_window),\r\n).withColumn(\r\n  'aggregation_avg', fn.avg('aggregation').over(aggregation_window),\r\n).withColumn(\r\n  'aggregation_min', fn.min('aggregation').over(aggregation_window),\r\n).withColumn(\r\n  'aggregation_max', fn.max('aggregation').over(aggregation_window),\r\n)\r\n<\/pre>\n<h2>Row-wise ordering and ranking functions<\/h2>\n<p>We can also use window functions to order and rank data. These functions add an element to the definition of the window, so that it defines both grouping and ordering.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n# let's define a ranking window\r\nranking_window = Window.partitionBy('partition').orderBy('ranking')\r\n\r\ndf_ranks = df.select(\r\n  'partition', 'ranking'\r\n).withColumn(\r\n  # note that fn.row_number() does not take any arguments\r\n  'ranking_row_number', fn.row_number().over(ranking_window)\r\n).withColumn(\r\n  # rank will leave gaps in the ranking to account for preceding rows\r\n  # receiving equal ranks\r\n  'ranking_rank', fn.rank().over(ranking_window)\r\n).withColumn(\r\n  # dense rank does not leave gaps after equal rankings\r\n  'ranking_dense_rank', fn.dense_rank().over(ranking_window)\r\n).withColumn(\r\n  # percent rank ranges between 0 and 1, not 0 and 100\r\n  'ranking_percent_rank', fn.percent_rank().over(ranking_window)\r\n).withColumn(\r\n  # fn.ntile takes a parameter for how many 'buckets' to divide\r\n  # rows into when ranking\r\n  
'ranking_ntile_rank', fn.ntile(2).over(ranking_window)\r\n)\r\n<\/pre>\n<p>We can also reverse the order of ranking using .desc(). In this case the window would be defined as follows:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n# let's define a ranking window in reverse order using .desc()\r\n# note that we need to use fn.col to define the column\r\ndesc_ranking_window = Window.partitionBy(\r\n  'partition'\r\n).orderBy(\r\n  fn.col('ranking').desc()\r\n)\r\n<\/pre>\n<h3>Cumulative Calculations (Running Totals and Averages)<\/h3>\n<p>There are often good reasons to want to create a running total or running average column, and in some cases we might want running totals for subsets of data. Window functions are well suited to this.<\/p>\n<p>In order to calculate such things we need to add yet another element to the window, so that it accounts for partition, order and which rows should be covered by the function. This can be done in two ways: we can use <code>rangeBetween<\/code> to define how close a row's value must be to the current row's value for the row to be included, or we can use <code>rowsBetween<\/code> to define how many rows should be considered. The current row is considered row zero, the following rows are numbered positively and the preceding rows negatively. 
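<\/p>\n<p>The distinction matters when the ordering column contains ties. The following is a plain-Python sketch (not Spark code) of the two frame semantics for a cumulative sum, using the tied values 1, 1, 1 that partition 'b' of the demo data has in its 'cumulative' column:<\/p>

```python
# Plain-Python sketch of the two frame semantics for a cumulative sum,
# using partition 'b' of the demo data, where 'cumulative' is [1, 1, 1]
values = [1, 1, 1]

# rowsBetween(Window.unboundedPreceding, 0): the frame is every physical
# row up to and including the current row
rows_sums = [sum(values[:i + 1]) for i in range(len(values))]

# rangeBetween(Window.unboundedPreceding, 0): the frame is every row whose
# ordering value is &lt;= the current row's value, so ties are all included
range_sums = [sum(v for v in values if v <= current) for current in values]

print(rows_sums)   # [1, 2, 3]
print(range_sums)  # [3, 3, 3] - every tied row sees the whole tie group
```

<p>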
For cumulative calculations you can define &#8220;all previous rows&#8221; with <code>Window.unboundedPreceding<\/code> and &#8220;all following rows&#8221; with <code>Window.unboundedFollowing<\/code>.<\/p>\n<p>Note that the window may vary in size as it progresses over the rows, since at the start and end of each partition part of the window may &#8220;extend past&#8221; the existing rows.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n# suppose we want to average over the previous, current and next values\r\n# running calculations need a more complicated window as shown here\r\ncumulative_window_1 = Window.partitionBy(\r\n  'partition'\r\n).orderBy(\r\n  'cumulative'\r\n  # for a rolling average let's use rowsBetween\r\n).rowsBetween(\r\n  -1, 1\r\n)\r\n\r\ndf_cumulative_1 = df.select(\r\n  'partition', 'cumulative'\r\n).withColumn(\r\n  'cumulative_avg', fn.avg('cumulative').over(cumulative_window_1)\r\n)\r\n\r\n# running totals also require a more complicated window, as here\r\ncumulative_window_2 = Window.partitionBy(\r\n  'partition'\r\n).orderBy(\r\n  'cumulative'\r\n  # in this case we will use rangeBetween for the sum\r\n).rangeBetween(\r\n  # here we use Window.unboundedPreceding to catch all earlier rows\r\n  Window.unboundedPreceding, 0\r\n)\r\n\r\ndf_cumulative_2 = df.select(\r\n  'partition', 'cumulative'\r\n).withColumn(\r\n  'cumulative_sum', fn.sum('cumulative').over(cumulative_window_2)\r\n)\r\n<\/pre>\n<h2>Combining Windows and Calling Different Columns<\/h2>\n<p>It is also possible to combine windows, and to apply window functions to columns other than the ordering column. 
These more advanced uses can require careful thought to ensure you achieve the intended results.<\/p>\n<p>First let's look at using multiple window functions in a single expression:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n# we can make a window function equivalent to a standard groupBy:\r\n\r\n# first define two windows\r\naggregation_window = Window.partitionBy('partition')\r\ngrouping_window = Window.partitionBy('partition').orderBy('aggregation')\r\n\r\n# then we can use these window functions for our aggregations\r\ndf_aggregations = df.select(\r\n  'partition', 'aggregation'\r\n).withColumn(\r\n  # note that we calculate row number over the grouping_window\r\n  'group_rank', fn.row_number().over(grouping_window)\r\n).withColumn(\r\n  # but we calculate the other columns over the aggregation_window\r\n  'aggregation_sum', fn.sum('aggregation').over(aggregation_window),\r\n).withColumn(\r\n  'aggregation_avg', fn.avg('aggregation').over(aggregation_window),\r\n).withColumn(\r\n  'aggregation_min', fn.min('aggregation').over(aggregation_window),\r\n).withColumn(\r\n  'aggregation_max', fn.max('aggregation').over(aggregation_window),\r\n).where(\r\n  fn.col('group_rank') == 1\r\n).select(\r\n  'partition',\r\n  'aggregation_sum',\r\n  'aggregation_avg',\r\n  'aggregation_min',\r\n  'aggregation_max'\r\n)\r\n\r\n# this is equivalent to the rather simpler expression below\r\ndf_groupby = df.select(\r\n  'partition', 'aggregation'\r\n).groupBy(\r\n  'partition'\r\n).agg(\r\n  fn.sum('aggregation').alias('aggregation_sum'),\r\n  fn.avg('aggregation').alias('aggregation_avg'),\r\n  fn.min('aggregation').alias('aggregation_min'),\r\n  fn.max('aggregation').alias('aggregation_max'),\r\n)\r\n<\/pre>\n<p>Second, here is an example of ordering by one column but operating on a different column. 
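<\/p>\n<p>The example below uses fn.lag, which returns the value of a column from an earlier row of the window. As a plain-Python sketch (not Spark code; None stands in for Spark's null) of what fn.lag(col, 1) produces for partition 'a' of the demo data:<\/p>

```python
# Plain-Python sketch of fn.lag over partition 'a' of the demo data:
# order the partition by 'lagging', then shift the values down one row.
lagging = [9, 8, 7, 6]         # the 'lagging' values in partition 'a'
ordered = sorted(lagging)      # orderBy('lagging') -> [6, 7, 8, 9]
lag_1 = [None] + ordered[:-1]  # fn.lag(col, 1): the previous row's value
print(lag_1)  # [None, 6, 7, 8] - None stands in for Spark's null
```

<p>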
This can only be done for calculation functions which take a column as a parameter:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n# create a window on one column but use the window on another column\r\nlag_window = Window.partitionBy('partition').orderBy('lagging')\r\n\r\ndf_lagged = df.select(\r\n  'partition', 'lagging', 'cumulative',\r\n).withColumn(\r\n  'lag_the_lagging_col', fn.lag('lagging', 1).over(lag_window)\r\n).withColumn(\r\n  # it is possible to lag a column which was not the orderBy column\r\n  'lag_the_cumulative_col', fn.lag('cumulative', 1).over(lag_window)\r\n)\r\n<\/pre>\n<p>The effect of window functions is best understood by experimenting with them, so I encourage you to make use of the <a href=\"https:\/\/databricks-prod-cloudfront.cloud.databricks.com\/public\/4027ec902e239c93eaaa8714f173bcfc\/968100988546031\/157591980591166\/8836542754149149\/latest.html\">Databricks workbook linked at the top of the page<\/a>.<\/p>\n<h2>Further Reading<\/h2>\n<p>Useful references about PySpark window functions can be found here:<\/p>\n<ul>\n<li><a href=\"https:\/\/databricks.com\/blog\/2015\/07\/15\/introducing-window-functions-in-spark-sql.html\">https:\/\/databricks.com\/blog\/2015\/07\/15\/introducing-window-functions-in-spark-sql.html<\/a><\/li>\n<li><a href=\"https:\/\/sparkbyexamples.com\/pyspark\/pyspark-window-functions\/\">https:\/\/sparkbyexamples.com\/pyspark\/pyspark-window-functions\/<\/a><\/li>\n<li><a href=\"https:\/\/knockdata.github.io\/spark-window-function\/\">https:\/\/knockdata.github.io\/spark-window-function\/<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>PySpark window functions are useful when you want to examine relationships within groups of data rather than between groups of data as for groupBy. 
To&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[33,55,54,14,56,76]}