<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The Lakehouse Path]]></title><description><![CDATA[From Fundamentals to Production - the Lakehouse Way.]]></description><link>https://www.thelakehousepath.com</link><image><url>https://substackcdn.com/image/fetch/$s_!5ZtM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89a29407-5ef2-4752-9bd6-bcf212989a03_462x462.png</url><title>The Lakehouse Path</title><link>https://www.thelakehousepath.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 04 May 2026 12:42:52 GMT</lastBuildDate><atom:link href="https://www.thelakehousepath.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Martin Debus]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[thelakehousepath@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[thelakehousepath@substack.com]]></itunes:email><itunes:name><![CDATA[Martin Debus]]></itunes:name></itunes:owner><itunes:author><![CDATA[Martin Debus]]></itunes:author><googleplay:owner><![CDATA[thelakehousepath@substack.com]]></googleplay:owner><googleplay:email><![CDATA[thelakehousepath@substack.com]]></googleplay:email><googleplay:author><![CDATA[Martin Debus]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Building a LinkedIn Analytics Pipeline on Databricks — Part 2: Connecting to the LinkedIn API]]></title><description><![CDATA[Part 2 of 4 of the Series on getting LinkedIn data into Databricks]]></description><link>https://www.thelakehousepath.com/p/building-a-linkedin-analytics-pipeline-part2</link><guid 
isPermaLink="false">https://www.thelakehousepath.com/p/building-a-linkedin-analytics-pipeline-part2</guid><dc:creator><![CDATA[Martin Debus]]></dc:creator><pubDate>Thu, 30 Apr 2026 05:08:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!1jkl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf21860-0971-4f03-bb9b-2bbd0520da3e_2163x1563.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This is the second post in a four-part series. <a href="https://open.substack.com/pub/thelakehousepath/p/building-a-linkedin-analytics-pipeline-part1">Part 1</a> set up the project scaffold with Declarative Automation Bundles. Part 3 covers building the medallion pipeline with Declarative Pipelines. Part 4 covers scheduling, cost, and going to production.</em></p><p>In <a href="https://open.substack.com/pub/thelakehousepath/p/building-a-linkedin-analytics-pipeline-part1">Part 1</a> we set up the project structure and deployed the three schemas and the landing volume to our dev workspace. 
Now we need to fill that volume with data.</p><p>This post covers everything required to pull your LinkedIn analytics via API: getting access, authenticating, calling the right endpoints, handling a few quirks the documentation doesn&#8217;t warn you about, and writing the results to the landing volume we created in Part 1.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1jkl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf21860-0971-4f03-bb9b-2bbd0520da3e_2163x1563.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1jkl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf21860-0971-4f03-bb9b-2bbd0520da3e_2163x1563.heic 424w, https://substackcdn.com/image/fetch/$s_!1jkl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf21860-0971-4f03-bb9b-2bbd0520da3e_2163x1563.heic 848w, https://substackcdn.com/image/fetch/$s_!1jkl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf21860-0971-4f03-bb9b-2bbd0520da3e_2163x1563.heic 1272w, https://substackcdn.com/image/fetch/$s_!1jkl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf21860-0971-4f03-bb9b-2bbd0520da3e_2163x1563.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1jkl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf21860-0971-4f03-bb9b-2bbd0520da3e_2163x1563.heic" width="1456" height="1052" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/adf21860-0971-4f03-bb9b-2bbd0520da3e_2163x1563.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1052,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:74853,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/191877916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf21860-0971-4f03-bb9b-2bbd0520da3e_2163x1563.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1jkl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf21860-0971-4f03-bb9b-2bbd0520da3e_2163x1563.heic 424w, https://substackcdn.com/image/fetch/$s_!1jkl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf21860-0971-4f03-bb9b-2bbd0520da3e_2163x1563.heic 848w, https://substackcdn.com/image/fetch/$s_!1jkl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf21860-0971-4f03-bb9b-2bbd0520da3e_2163x1563.heic 1272w, https://substackcdn.com/image/fetch/$s_!1jkl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadf21860-0971-4f03-bb9b-2bbd0520da3e_2163x1563.heic 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>LinkedIn API access</h1><p>LinkedIn&#8217;s API is not a developer-friendly, open ecosystem. To use the APIs, you need to own a LinkedIn Page for an organization, create a developer application, and apply for access to the Community Management API. Once that&#8217;s in place, you can fetch analytics data for your personal profile and your organization. In this series we focus on the personal profile.</p><p>Here&#8217;s the full process.</p><blockquote><p>You might ask why I did not use an AI agent for automation, and that is a valid question. My intention was to create an enterprise-grade process that uses reliable APIs at minimal cost. You can just as well set up an agent that pulls the data from LinkedIn. If you have done that, feel free to contact me; I would be interested in how you achieved it.</p></blockquote><h2>Create a LinkedIn Page for an Organization</h2><p>You need an organization page to back your developer application. In my case, I already have an organization page for my consulting company <a href="https://snowglobe.ai">SNOWGLOBE</a>, so I went with that. Alternatively, you can <a href="https://www.linkedin.com/help/linkedin/answer/a543852">create a page</a> for this purpose, but note that you will later have to provide organization details such as a homepage and an address, which LinkedIn reviews before approval. If you want to use your company's existing page, make sure you know the page administrator, who will need to approve your developer application.</p><blockquote><p>Turns out, this is the trickiest part.
LinkedIn can deny API access for your organization without providing a reason. You could get the impression that they don't want individuals to use their API.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RvY4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5024fc9e-fe25-49d8-b191-5630ab507d4e_1612x886.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RvY4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5024fc9e-fe25-49d8-b191-5630ab507d4e_1612x886.png 424w, https://substackcdn.com/image/fetch/$s_!RvY4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5024fc9e-fe25-49d8-b191-5630ab507d4e_1612x886.png 848w, https://substackcdn.com/image/fetch/$s_!RvY4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5024fc9e-fe25-49d8-b191-5630ab507d4e_1612x886.png 1272w, https://substackcdn.com/image/fetch/$s_!RvY4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5024fc9e-fe25-49d8-b191-5630ab507d4e_1612x886.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RvY4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5024fc9e-fe25-49d8-b191-5630ab507d4e_1612x886.png" width="1456" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5024fc9e-fe25-49d8-b191-5630ab507d4e_1612x886.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:187610,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/191877916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5024fc9e-fe25-49d8-b191-5630ab507d4e_1612x886.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RvY4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5024fc9e-fe25-49d8-b191-5630ab507d4e_1612x886.png 424w, https://substackcdn.com/image/fetch/$s_!RvY4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5024fc9e-fe25-49d8-b191-5630ab507d4e_1612x886.png 848w, https://substackcdn.com/image/fetch/$s_!RvY4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5024fc9e-fe25-49d8-b191-5630ab507d4e_1612x886.png 1272w, https://substackcdn.com/image/fetch/$s_!RvY4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5024fc9e-fe25-49d8-b191-5630ab507d4e_1612x886.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Organization Page example (for my company SNOWGLOBE)</figcaption></figure></div><h2>Create a Developer Application</h2><p>The LinkedIn Developer Portal allows you to create a <a href="https://www.linkedin.com/developers/apps/new">developer application</a>. Once created, the application has to be verified by an administrator of the organization page. In the <code>Settings</code> tab of your newly created application, click the <code>Verify</code> button. You will get a URL you can send to an admin to verify your application. 
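</p><p><em>Once the application is verified, its Auth tab provides a Client ID and Client Secret for OAuth. As a preview of the authentication step, here is a minimal sketch of building LinkedIn&#8217;s 3-legged OAuth authorization URL. The client ID, redirect URI, and scope list are placeholders; the scopes you can actually request depend on the products approved for your app.</em></p>

```python
from urllib.parse import urlencode

# LinkedIn's 3-legged OAuth authorization endpoint.
AUTH_ENDPOINT = "https://www.linkedin.com/oauth/v2/authorization"


def build_authorization_url(client_id: str, redirect_uri: str,
                            scopes: list[str], state: str) -> str:
    """Build the URL a member opens to grant your app access.

    After approval, LinkedIn redirects to `redirect_uri` with a short-lived
    authorization code that you then exchange for an access token.
    """
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,  # must exactly match the app settings
        "state": state,                # opaque anti-CSRF value; verify it on return
        "scope": " ".join(scopes),
    }
    return f"{AUTH_ENDPOINT}?{urlencode(params)}"


# Placeholder values: substitute your app's Client ID, a redirect URI
# registered in the portal, and the scopes granted to your application.
url = build_authorization_url(
    client_id="YOUR_CLIENT_ID",
    redirect_uri="https://localhost:3000/callback",
    scopes=["openid", "profile"],
    state="random-opaque-string",
)
print(url)
```

<p><em>Opening the printed URL in a browser and approving the app sends a one-time code to the redirect URI, which the authentication section below exchanges for an access token.</em></p><p>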
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ons-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3014324-8aed-4f3e-80f3-cefd9c3f3f80_1978x1926.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ons-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3014324-8aed-4f3e-80f3-cefd9c3f3f80_1978x1926.png 424w, https://substackcdn.com/image/fetch/$s_!Ons-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3014324-8aed-4f3e-80f3-cefd9c3f3f80_1978x1926.png 848w, https://substackcdn.com/image/fetch/$s_!Ons-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3014324-8aed-4f3e-80f3-cefd9c3f3f80_1978x1926.png 1272w, https://substackcdn.com/image/fetch/$s_!Ons-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3014324-8aed-4f3e-80f3-cefd9c3f3f80_1978x1926.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ons-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3014324-8aed-4f3e-80f3-cefd9c3f3f80_1978x1926.png" width="1456" height="1418" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a3014324-8aed-4f3e-80f3-cefd9c3f3f80_1978x1926.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1418,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:269020,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/191877916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3014324-8aed-4f3e-80f3-cefd9c3f3f80_1978x1926.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ons-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3014324-8aed-4f3e-80f3-cefd9c3f3f80_1978x1926.png 424w, https://substackcdn.com/image/fetch/$s_!Ons-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3014324-8aed-4f3e-80f3-cefd9c3f3f80_1978x1926.png 848w, https://substackcdn.com/image/fetch/$s_!Ons-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3014324-8aed-4f3e-80f3-cefd9c3f3f80_1978x1926.png 1272w, https://substackcdn.com/image/fetch/$s_!Ons-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3014324-8aed-4f3e-80f3-cefd9c3f3f80_1978x1926.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">LinkedIn Application Creation Page</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Fkgf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F010199f8-cb86-4ddc-b923-0143cda98afb_1270x822.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Fkgf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F010199f8-cb86-4ddc-b923-0143cda98afb_1270x822.png 424w, 
https://substackcdn.com/image/fetch/$s_!Fkgf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F010199f8-cb86-4ddc-b923-0143cda98afb_1270x822.png 848w, https://substackcdn.com/image/fetch/$s_!Fkgf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F010199f8-cb86-4ddc-b923-0143cda98afb_1270x822.png 1272w, https://substackcdn.com/image/fetch/$s_!Fkgf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F010199f8-cb86-4ddc-b923-0143cda98afb_1270x822.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Fkgf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F010199f8-cb86-4ddc-b923-0143cda98afb_1270x822.png" width="1270" height="822" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/010199f8-cb86-4ddc-b923-0143cda98afb_1270x822.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:822,&quot;width&quot;:1270,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:105770,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/191877916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F010199f8-cb86-4ddc-b923-0143cda98afb_1270x822.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!Fkgf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F010199f8-cb86-4ddc-b923-0143cda98afb_1270x822.png 424w, https://substackcdn.com/image/fetch/$s_!Fkgf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F010199f8-cb86-4ddc-b923-0143cda98afb_1270x822.png 848w, https://substackcdn.com/image/fetch/$s_!Fkgf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F010199f8-cb86-4ddc-b923-0143cda98afb_1270x822.png 1272w, https://substackcdn.com/image/fetch/$s_!Fkgf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F010199f8-cb86-4ddc-b923-0143cda98afb_1270x822.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">My newly created li-analytics application that still needs verification</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Nk6M!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85f3d14e-8c7c-4090-82e6-2ae86cea7aa6_978x696.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Nk6M!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85f3d14e-8c7c-4090-82e6-2ae86cea7aa6_978x696.png 424w, https://substackcdn.com/image/fetch/$s_!Nk6M!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85f3d14e-8c7c-4090-82e6-2ae86cea7aa6_978x696.png 848w, https://substackcdn.com/image/fetch/$s_!Nk6M!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85f3d14e-8c7c-4090-82e6-2ae86cea7aa6_978x696.png 1272w, https://substackcdn.com/image/fetch/$s_!Nk6M!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85f3d14e-8c7c-4090-82e6-2ae86cea7aa6_978x696.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Nk6M!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85f3d14e-8c7c-4090-82e6-2ae86cea7aa6_978x696.png" width="461" height="328.07361963190186" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/85f3d14e-8c7c-4090-82e6-2ae86cea7aa6_978x696.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:696,&quot;width&quot;:978,&quot;resizeWidth&quot;:461,&quot;bytes&quot;:88266,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/191877916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85f3d14e-8c7c-4090-82e6-2ae86cea7aa6_978x696.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Nk6M!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85f3d14e-8c7c-4090-82e6-2ae86cea7aa6_978x696.png 424w, https://substackcdn.com/image/fetch/$s_!Nk6M!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85f3d14e-8c7c-4090-82e6-2ae86cea7aa6_978x696.png 848w, https://substackcdn.com/image/fetch/$s_!Nk6M!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85f3d14e-8c7c-4090-82e6-2ae86cea7aa6_978x696.png 1272w, https://substackcdn.com/image/fetch/$s_!Nk6M!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85f3d14e-8c7c-4090-82e6-2ae86cea7aa6_978x696.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">This is what the administrator will see when clicking on the verification link</figcaption></figure></div><h2>Request Access to the Community Management API</h2><p>In the Products tab of your application, request access to the Community Management API. This covers all the analytics endpoints we need, for both your personal profile and your organization. 
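</p><p><em>Once the product is granted, a quick way to sanity-check an access token before building the pipeline is the OpenID Connect <code>userinfo</code> endpoint (it needs the <code>openid</code> and <code>profile</code> scopes). The sketch below also shows the extra headers LinkedIn&#8217;s versioned REST endpoints expect; the version pin is an example value, not a recommendation.</em></p>

```python
import requests  # third-party HTTP client: pip install requests


def rest_headers(access_token: str, version: str = "202405") -> dict:
    """Headers LinkedIn's versioned REST endpoints expect on every call.

    The "YYYYMM" version pin is an example; pick a currently supported one.
    """
    return {
        "Authorization": f"Bearer {access_token}",
        "LinkedIn-Version": version,
        "X-Restli-Protocol-Version": "2.0.0",
    }


def fetch_userinfo(access_token: str) -> dict:
    """Smoke-test a token against the OpenID Connect userinfo endpoint."""
    resp = requests.get(
        "https://api.linkedin.com/v2/userinfo",
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


# Usage with a real token (needs network access, so not called here):
#   fetch_userinfo(token) returns fields such as "sub" and "name".
print(rest_headers("example-token")["LinkedIn-Version"])
```

<p>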
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!b9zC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc186afb9-aea7-42e7-b62d-067ac3f33192_1268x1666.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!b9zC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc186afb9-aea7-42e7-b62d-067ac3f33192_1268x1666.png 424w, https://substackcdn.com/image/fetch/$s_!b9zC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc186afb9-aea7-42e7-b62d-067ac3f33192_1268x1666.png 848w, https://substackcdn.com/image/fetch/$s_!b9zC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc186afb9-aea7-42e7-b62d-067ac3f33192_1268x1666.png 1272w, https://substackcdn.com/image/fetch/$s_!b9zC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc186afb9-aea7-42e7-b62d-067ac3f33192_1268x1666.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!b9zC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc186afb9-aea7-42e7-b62d-067ac3f33192_1268x1666.png" width="566" height="743.6561514195583" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c186afb9-aea7-42e7-b62d-067ac3f33192_1268x1666.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1666,&quot;width&quot;:1268,&quot;resizeWidth&quot;:566,&quot;bytes&quot;:240064,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/191877916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc186afb9-aea7-42e7-b62d-067ac3f33192_1268x1666.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!b9zC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc186afb9-aea7-42e7-b62d-067ac3f33192_1268x1666.png 424w, https://substackcdn.com/image/fetch/$s_!b9zC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc186afb9-aea7-42e7-b62d-067ac3f33192_1268x1666.png 848w, https://substackcdn.com/image/fetch/$s_!b9zC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc186afb9-aea7-42e7-b62d-067ac3f33192_1268x1666.png 1272w, https://substackcdn.com/image/fetch/$s_!b9zC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc186afb9-aea7-42e7-b62d-067ac3f33192_1268x1666.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>LinkedIn will ask you to verify your business email and fill out a Development Tier Access Form with details about your use case and organization. 
Once approved, the API endpoints are unlocked.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3dth!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b1f58a-6e36-4220-b0a6-d79e660f0c7a_2166x1716.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3dth!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b1f58a-6e36-4220-b0a6-d79e660f0c7a_2166x1716.png 424w, https://substackcdn.com/image/fetch/$s_!3dth!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b1f58a-6e36-4220-b0a6-d79e660f0c7a_2166x1716.png 848w, https://substackcdn.com/image/fetch/$s_!3dth!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b1f58a-6e36-4220-b0a6-d79e660f0c7a_2166x1716.png 1272w, https://substackcdn.com/image/fetch/$s_!3dth!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b1f58a-6e36-4220-b0a6-d79e660f0c7a_2166x1716.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3dth!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b1f58a-6e36-4220-b0a6-d79e660f0c7a_2166x1716.png" width="1456" height="1154" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/81b1f58a-6e36-4220-b0a6-d79e660f0c7a_2166x1716.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1154,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:247902,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/191877916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b1f58a-6e36-4220-b0a6-d79e660f0c7a_2166x1716.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3dth!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b1f58a-6e36-4220-b0a6-d79e660f0c7a_2166x1716.png 424w, https://substackcdn.com/image/fetch/$s_!3dth!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b1f58a-6e36-4220-b0a6-d79e660f0c7a_2166x1716.png 848w, https://substackcdn.com/image/fetch/$s_!3dth!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b1f58a-6e36-4220-b0a6-d79e660f0c7a_2166x1716.png 1272w, https://substackcdn.com/image/fetch/$s_!3dth!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81b1f58a-6e36-4220-b0a6-d79e660f0c7a_2166x1716.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">LinkedIn Community Management API - Development Tier Access Form</figcaption></figure></div><h2>Create an Access Token</h2><p>In order to use the API, you have to <a href="https://learn.microsoft.com/en-us/linkedin/shared/authentication/developer-portal-tools?view=li-lms-2026-02#generate-a-token-in-the-developer-portal">create an access token</a> in the Developer Portal. In the Auth tab of your application, click on the OAuth 2.0 tools link in order to generate a Token. 
For this pipeline, the required scopes are:</p><ul><li><p>r_member_postAnalytics</p></li><li><p>r_member_profileAnalytics</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-0Md!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedd7287f-cd62-4f40-bdc2-f36572f6dfcf_1966x592.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-0Md!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedd7287f-cd62-4f40-bdc2-f36572f6dfcf_1966x592.png 424w, https://substackcdn.com/image/fetch/$s_!-0Md!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedd7287f-cd62-4f40-bdc2-f36572f6dfcf_1966x592.png 848w, https://substackcdn.com/image/fetch/$s_!-0Md!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedd7287f-cd62-4f40-bdc2-f36572f6dfcf_1966x592.png 1272w, https://substackcdn.com/image/fetch/$s_!-0Md!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedd7287f-cd62-4f40-bdc2-f36572f6dfcf_1966x592.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-0Md!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedd7287f-cd62-4f40-bdc2-f36572f6dfcf_1966x592.png" width="1456" height="438" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/edd7287f-cd62-4f40-bdc2-f36572f6dfcf_1966x592.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:438,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:125985,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/191877916?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedd7287f-cd62-4f40-bdc2-f36572f6dfcf_1966x592.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-0Md!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedd7287f-cd62-4f40-bdc2-f36572f6dfcf_1966x592.png 424w, https://substackcdn.com/image/fetch/$s_!-0Md!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedd7287f-cd62-4f40-bdc2-f36572f6dfcf_1966x592.png 848w, https://substackcdn.com/image/fetch/$s_!-0Md!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedd7287f-cd62-4f40-bdc2-f36572f6dfcf_1966x592.png 1272w, https://substackcdn.com/image/fetch/$s_!-0Md!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedd7287f-cd62-4f40-bdc2-f36572f6dfcf_1966x592.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Click on the OAuth 2.0 tools link in the Auth section of the application</figcaption></figure></div><h2>Storing the Token securely</h2><p>The access token should not live in the code or notebook. Ever. Not hardcoded, not in a widget default value, not in a comment. A notebook that gets committed to source control with a token in it is a security incident waiting to happen.</p><p>The production pattern here is:</p><p>1. Store the token in <strong>Azure Key Vault</strong></p><p>2. Back the Key Vault with a <strong>Databricks secret scope</strong></p><p>3. 
Retrieve it at runtime with <code>dbutils.secrets.get()</code></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;aceb6371-1970-4930-9885-15ba6d9bfb26&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">access_token = dbutils.secrets.get("secret-scope-key-vault-adb", "linkedin-token")</code></pre></div><p>This means the token never appears in logs, never gets printed to a notebook output cell, and never touches source control. Databricks redacts secret values automatically in cell outputs: if you accidentally print the token, you&#8217;ll see <code>[REDACTED]</code> instead of the actual value.</p><p>If you're running in Databricks Free Edition without Key Vault, you can use a <a href="https://docs.databricks.com/security/secrets/secret-scopes.html">Databricks-backed secret scope</a> as an alternative.</p><blockquote><p>In the following sections, I outline how to get the data from the API. Only excerpts of the code are shown. Please visit my GitHub for the full ingestion notebook. </p></blockquote><h1>The five endpoints that matter</h1><p>For a personal LinkedIn analytics pipeline, there are exactly five calls you need using the <a href="https://learn.microsoft.com/en-us/linkedin/marketing/community-management/community-management-overview?view=li-lms-2026-02#member-analytics">Community Management</a> member analytics APIs. All of them sit under <code>api.linkedin.com/rest/</code>.</p><blockquote><p>The full code is available in the accompanying GitHub Repo. 
Click the Button below to view the repo.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://github.com/MartinDebus/databricks-linkedin-api-pipeline/releases/tag/Part-2&quot;,&quot;text&quot;:&quot;GitHub Repo&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://github.com/MartinDebus/databricks-linkedin-api-pipeline/releases/tag/Part-2"><span>GitHub Repo</span></a></p></blockquote><h2>Follower delta by date (1)</h2><p>This gives you the net change in followers for each day within a date range. The response comes back as a list of elements, one per day, each with a <code>DateRange</code> object and a <code>memberFollowersCount</code> field.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;8ab2f953-3d3d-4aaa-8204-cf5ecf600c97&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">url = f"https://api.linkedin.com/rest/memberFollowersCount?q=dateRange&amp;dateRange=(start:({start}),end:({end}))"</code></pre></div><h2>Total follower snapshot (2)</h2><p>This gives you your current total follower count as a snapshot at the moment of the API call. There&#8217;s no date range parameter here; it&#8217;s just &#8220;right now.&#8221; I&#8217;ll come back to why this creates an interesting data modeling problem.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;90b38698-e571-4694-92a3-88623d4b35dc&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">url = "https://api.linkedin.com/rest/memberFollowersCount?q=me"</code></pre></div><h2>Post analytics (3, 4, 5)</h2><p>The <code>memberCreatorPostAnalytics</code> endpoint serves impressions, reactions, and comments depending on the <code>queryType</code> parameter. 
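</p><p>Since the three calls differ only in the <code>queryType</code> value, the URL construction can be factored into a small builder. A sketch (the <code>post_analytics_url</code> name is mine, not from the series notebook):</p>

```python
def post_analytics_url(query_type: str, start: str, end: str) -> str:
    """Build a memberCreatorPostAnalytics URL for one metric.

    query_type is IMPRESSION, REACTION, or COMMENT; start and end are
    Restli date fragments such as 'year:2024,month:3,day:1'.
    (Illustrative helper, not part of the original notebook.)
    """
    return (
        "https://api.linkedin.com/rest/memberCreatorPostAnalytics"
        f"?q=me&queryType={query_type}&aggregation=DAILY"
        f"&dateRange=(start:({start}),end:({end}))"
    )

# One call per metric, e.g. in a loop:
urls = [post_analytics_url(qt, "year:2024,month:3,day:1", "year:2024,month:3,day:31")
        for qt in ("IMPRESSION", "REACTION", "COMMENT")]
```

<p>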
Each is a separate API call with <code>aggregation=DAILY</code> and a date range.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;44fc5e15-7288-48f0-8b21-54fc3d7c4078&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># Impressions
url = f"https://api.linkedin.com/rest/memberCreatorPostAnalytics?q=me&amp;queryType=IMPRESSION&amp;aggregation=DAILY&amp;dateRange=(start:({start}),end:({end}))"

# Reactions
url = f"https://api.linkedin.com/rest/memberCreatorPostAnalytics?q=me&amp;queryType=REACTION&amp;aggregation=DAILY&amp;dateRange=(start:({start}),end:({end}))"

# Comments
url = f"https://api.linkedin.com/rest/memberCreatorPostAnalytics?q=me&amp;queryType=COMMENT&amp;aggregation=DAILY&amp;dateRange=(start:({start}),end:({end}))"</code></pre></div><blockquote><p>Initially, I wanted to pull the metrics for each individual post as well. Unfortunately, I was not able to accomplish that, although I followed the <a href="https://learn.microsoft.com/en-us/linkedin/marketing/community-management/members/post-statistics?view=li-lms-2026-02&amp;tabs=http#retrieve-single-post-statistics">documentation</a>. If you have done it successfully, please reach out to me and share your solution. </p></blockquote><h1>The date format LinkedIn expects</h1><p>LinkedIn&#8217;s API does not use ISO 8601 date strings. It uses a custom struct format that looks like this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;649ba128-7a87-45db-ba5e-37bf6e6567be&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">(start:(year:2024,month:3,day:1),end:(year:2024,month:3,day:31))</code></pre></div><p>This is LinkedIn&#8217;s Restli protocol format. Perfectly parseable, but unusual enough that it trips people up the first time they look at the API reference. The Python to generate it looks like this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;c106c6bc-92f2-45a0-b0e1-5a2d479176bd&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import datetime as dt

end_dt = dt.datetime.now() - dt.timedelta(days=1)
end = f"year:{end_dt.year},month:{end_dt.month},day:{end_dt.day}"
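</code></pre></div><p>Wrapped in a small helper, the fragment and the full <code>dateRange</code> value can be produced from ordinary date objects. A sketch (the <code>restli_date</code> and <code>restli_range</code> names are mine, not from the notebook):</p>

```python
import datetime as dt

def restli_date(d: dt.date) -> str:
    """Format one date as LinkedIn's Restli struct fragment."""
    return f"year:{d.year},month:{d.month},day:{d.day}"

def restli_range(start_d: dt.date, end_d: dt.date) -> str:
    """Assemble the full dateRange value: (start:(...),end:(...))."""
    return f"(start:({restli_date(start_d)}),end:({restli_date(end_d)}))"

restli_range(dt.date(2024, 3, 1), dt.date(2024, 3, 31))
# -> '(start:(year:2024,month:3,day:1),end:(year:2024,month:3,day:31))'
```

<p>The start of the rolling window is built the same way:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">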

start_dt = end_dt - dt.timedelta(days=60)
start = f"year:{start_dt.year},month:{start_dt.month},day:{start_dt.day}"</code></pre></div><p>Note the minus one day on <code>end_dt</code>. LinkedIn&#8217;s data for &#8220;today&#8221; isn&#8217;t fully settled yet when you&#8217;re running a 6am job, so I always pull up through yesterday to avoid partial-day numbers showing up in the pipeline.</p><h1>The required headers</h1><p>Every request needs three headers beyond the authorization token:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;1e967de3-e774-45eb-9a4c-bd3974b70929&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">headers = {
  'Authorization': f'Bearer {access_token}',
  'Content-Type': 'application/json',
  'X-Restli-Protocol-Version': '2.0.0',
  'Linkedin-Version': '202506'
}</code></pre></div><p>The <code>X-Restli-Protocol-Version: 2.0.0</code> header is required for the Restli query syntax to work. Without it, LinkedIn either rejects the request or returns an unexpected response shape.</p><p>The <code>Linkedin-Version</code> header pins you to a specific API version. LinkedIn&#8217;s REST API is versioned by date (year + month), and they do deprecate old versions. Pinning this explicitly in code means your pipeline doesn&#8217;t silently break when LinkedIn releases a new API version with a different response schema.</p><h1>The silent failure to watch out for</h1><p>This one cost me time. LinkedIn&#8217;s API returns HTTP 200 with an error payload when the token expires. It looks like a success to your HTTP client, but the body contains something like:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:&quot;3efb281e-66f0-4276-9c8b-78c904a961c4&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">{
  &#8220;code&#8221;: "EXPIRED_ACCESS_TOKEN",
  &#8220;message&#8221;: &#8220;The token used in the request has expired&#8221;
}</code></pre></div><p>A naive <code>response.raise_for_status()</code> call will not catch this. The request returned 200, so as far as <code>requests</code> is concerned, everything is fine. Your pipeline runs, writes a file with an error body to storage, and you won&#8217;t notice until you wonder why your follower count is null.</p><p>The fix is a second check after deserializing the response:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;b1670eec-130d-42d2-b979-ac26e37d98ab&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import requests

def get_linkedin_data(url, access_token):
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    data = response.json()

    # LinkedIn can return error details with HTTP 200
    if "status" in data and data["status"] >= 400:
        raise RuntimeError(f"LinkedIn API error {data['status']}: {data.get('message', data)}")

    return data</code></pre></div><p>This pattern, checking for an error status in the response body after already checking the HTTP status code, is worth keeping in any LinkedIn API integration. It&#8217;s explicitly handled in the bronze transformation layer too, which I&#8217;ll cover in Part 3.</p><h1>Storing the data: landing in a Databricks Volume</h1><p>Once you have the API responses, you need somewhere to put them. I&#8217;m writing raw JSON directly to a Databricks Unity Catalog Volume. Each API call produces one timestamped file:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;648f0094-ecdc-4260-806d-f1ed46d5bac4&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">def write_data_to_json_file(data, path, file_prefix):
    timestamp = dt.datetime.now().strftime("%Y%m%d%H%M%S")
    file_path = f"{path}{file_prefix}{timestamp}.json"
    dbutils.fs.put(file_path, json.dumps(data), overwrite=True)</code></pre></div><p>The directory structure in the volume mirrors the data type:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;78ddf956-8643-45ad-b0eb-59d61a837256&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">/Volumes/&lt;catalog&gt;/bronze_linkedin/landing/
&#9500;&#9472;&#9472; followers_delta/    followers_delta_20260315060112.json
&#9500;&#9472;&#9472; followers_count/    followers_count_20260315060114.json
&#9500;&#9472;&#9472; impressions/        impressions_20260315060118.json
&#9500;&#9472;&#9472; reactions/          reactions_20260315060121.json
&#9492;&#9472;&#9472; comments/           comments_20260315060124.json</code></pre></div><p>The timestamp in the filename turns out to be more than just cosmetic. The <code>followers_count</code> endpoint (the <code>q=me</code> one that returns your current total) doesn&#8217;t include a date in its response. It&#8217;s a point-in-time snapshot. In the transformation layer, I derive the date of that snapshot from the filename timestamp itself. More on that in Part 3.</p><h1>Initial load vs. delta</h1><p>My example notebook accepts a <code>run_type</code> parameter with two values: <code>initial</code> and <code>delta</code>.</p><p><strong>Initial load</strong> pulls everything from January 2020 to yesterday. You run this once when you first set up the pipeline.</p><p><strong>Delta load</strong> pulls a rolling 60-day window ending yesterday. This is what runs every morning. The 60-day window is intentionally wider than one day to handle late-arriving data and to make re-runs safe. If a day&#8217;s job fails and you re-run it the next day, you&#8217;ll pick up yesterday&#8217;s data within the delta window without needing a manual backfill.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;07a35583-21d0-45fb-ad00-1bcc834febb2&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">if run_type == "delta":
    start_dt = end_dt - dt.timedelta(days=60)
    start = f"year:{start_dt.year},month:{start_dt.month},day:{start_dt.day}"
else:
    start = "year:2020,month:1,day:1"</code></pre></div><p>Why 60 days and not 7 or 30? LinkedIn&#8217;s analytics data can be revised retroactively for a few days after the fact. A wider window means the pipeline naturally self-corrects if LinkedIn updates historical numbers. The deduplication logic in the silver layer (covered in Part 3) handles the overlapping records cleanly using CDC upserts keyed by date.</p><blockquote><p>The full code is available in the accompanying GitHub Repo. Click the Button below to view the repo.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://github.com/MartinDebus/databricks-linkedin-api-pipeline/releases/tag/Part-2&quot;,&quot;text&quot;:&quot;GitHub Repo&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://github.com/MartinDebus/databricks-linkedin-api-pipeline/releases/tag/Part-2"><span>GitHub Repo</span></a></p></blockquote><h2>Adding the notebook to the bundle</h2><p>Now we wire the notebook into the bundle as a job task. Create <code>bundles/projects/linkedin/resources/jobs.yml</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;yaml&quot;,&quot;nodeId&quot;:&quot;456ac231-43dc-4e5e-bab5-f64ecd3e3585&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-yaml">resources:
  jobs:
    linkedin_ingest:
      name: linkedin-medallion

      parameters:
        - name: catalog
          default: ${var.catalog}
        - name: schema
          default: ${resources.schemas.bronze_linkedin.name}
        - name: volume
          default: ${resources.volumes.landing.name}
        - name: run_type
          default: ${var.run_type}

      run_as:
        service_principal_name: ${var.service_principal_app_id}

      email_notifications:
        on_success:
          - &lt;your-email@domain.com&gt;
        on_failure:
          - &lt;your-email@domain.com&gt;

      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ../notebooks/ingest.ipynb
          environment_key: serverless

      environments:
        - environment_key: serverless
          spec:
            client: "4"</code></pre></div><p>The job passes the catalog, schema, volume, and run_type down to the notebook as parameters, and the notebook reads them via <code>dbutils.widgets</code>. The service principal identity runs the job, and email notifications alert you on both success and failure.</p><p>Note there&#8217;s only one task here for now. In Part 3 we&#8217;ll add the pipeline task as a second step with an explicit dependency on this ingest task.</p><p>Deploy the updated bundle:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;6294291a-557f-4b2f-8e2e-cd7bc2e06653&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">databricks bundle deploy --target dev-user</code></pre></div><h2>Run it manually and verify</h2><p>Trigger a manual run from the Databricks UI (or via the CLI: <code>databricks bundle run linkedin_ingest --target dev-user</code>). Set <code>run_type</code> to <code>initial</code> for the first run to pull the full historical data.</p><p>When it completes, navigate to your catalog in the workspace, open the <code>bronze_linkedin</code> schema, click on the <code>landing</code> volume, and browse the folder structure. You should see JSON files in each of the five subdirectories, timestamped from this run.</p><p>That&#8217;s the data. In Part 3 we&#8217;ll build the Declarative Pipeline that reads these files, transforms them through bronze and silver, and produces a clean <code>daily_metrics</code> gold table.</p><h2>What&#8217;s next</h2><p>Part 3 covers the transformation layer: streaming tables for incremental file ingestion, CDC upserts for idempotent daily records, and a materialized view to join all five metrics into a single queryable table. 
We'll also add the pipeline as the second task in the job so that ingestion and transformation run end-to-end automatically.</p><p>Part 3 will be published 2026/05/28.</p>]]></content:encoded></item><item><title><![CDATA[Building a LinkedIn Analytics Pipeline on Databricks — Part 1: Project Setup with Declarative Automation Bundles]]></title><description><![CDATA[Part 1 of 4 of the Series on getting LinkedIn data into Databricks]]></description><link>https://www.thelakehousepath.com/p/building-a-linkedin-analytics-pipeline-part1</link><guid isPermaLink="false">https://www.thelakehousepath.com/p/building-a-linkedin-analytics-pipeline-part1</guid><dc:creator><![CDATA[Martin Debus]]></dc:creator><pubDate>Thu, 02 Apr 2026 05:43:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!EtVM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb59279-38ca-4c99-91c3-974a40b42969_2163x1563.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This is the first post in a four-part series. Part 2 covers connecting to the LinkedIn API and pulling your analytics data. 
Part 3 covers building the medallion pipeline with Declarative Pipelines. Part 4 covers scheduling, cost, and going to production.</em></p><p>Ever since I started posting regularly on LinkedIn, I wanted to have my metrics (follower count, impressions, comments, reactions) not just in the LinkedIn UI, but in my own Databricks Lakehouse. Something I own, can query freely, and that accumulates history indefinitely rather than disappearing behind a rolling API window.</p><p>It turns out this is also a near-perfect project for learning how to build a proper Lakehouse-style data pipeline. It has everything: data ingestion via API, processing through the medallion layers (Bronze, Silver, Gold), and insights delivered via a dashboard as the end result. Not a toy example, not an enterprise-scale complexity monster, just a real use case at a manageable size, with all the patterns that matter.</p><p>Across four posts, we&#8217;ll build this end to end. By the time we&#8217;re done you&#8217;ll have a pipeline that runs automatically every morning, a clean daily metrics table in Unity Catalog, and a Lakeview dashboard on top of it, all deployed from version-controlled code.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EtVM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb59279-38ca-4c99-91c3-974a40b42969_2163x1563.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EtVM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb59279-38ca-4c99-91c3-974a40b42969_2163x1563.png 424w, 
https://substackcdn.com/image/fetch/$s_!EtVM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb59279-38ca-4c99-91c3-974a40b42969_2163x1563.png 848w, https://substackcdn.com/image/fetch/$s_!EtVM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb59279-38ca-4c99-91c3-974a40b42969_2163x1563.png 1272w, https://substackcdn.com/image/fetch/$s_!EtVM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb59279-38ca-4c99-91c3-974a40b42969_2163x1563.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EtVM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb59279-38ca-4c99-91c3-974a40b42969_2163x1563.png" width="1456" height="1052" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7eb59279-38ca-4c99-91c3-974a40b42969_2163x1563.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1052,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:229642,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/192087362?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb59279-38ca-4c99-91c3-974a40b42969_2163x1563.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!EtVM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb59279-38ca-4c99-91c3-974a40b42969_2163x1563.png 424w, https://substackcdn.com/image/fetch/$s_!EtVM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb59279-38ca-4c99-91c3-974a40b42969_2163x1563.png 848w, https://substackcdn.com/image/fetch/$s_!EtVM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb59279-38ca-4c99-91c3-974a40b42969_2163x1563.png 1272w, https://substackcdn.com/image/fetch/$s_!EtVM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb59279-38ca-4c99-91c3-974a40b42969_2163x1563.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Before writing a single line of pipeline code, I want to set up the project properly. That means infrastructure as code from the start: schemas, volumes, jobs, pipelines, all defined in version-controlled YAML and deployable to any environment with a single command.</p><p>The tool that makes this possible on Databricks is Declarative Automation Bundles. If you set this up first, everything that follows in Parts 2, 3, and 4 slots in cleanly. If you skip it and create resources manually, you&#8217;ll eventually want to come back and do it properly anyway.</p><p>So let&#8217;s start here.</p><h2>What are Declarative Automation Bundles?</h2><p>A Declarative Automation Bundle (DAB, formerly known as Databricks Asset Bundles) is a project format that packages your Databricks code and resource definitions together. 
You define everything (jobs, pipelines, schemas, volumes, dashboards) as YAML files alongside your SQL and Python code, commit it all to source control, and use the Databricks CLI to deploy to any target workspace.</p><p>The mental model is similar to Terraform, but Databricks-native. The CLI understands the relationships between resources, handles variable substitution across environments, and deploys notebooks and SQL files alongside the infrastructure that runs them.</p><h2>Prerequisites</h2><p>To follow along you&#8217;ll need:</p><ul><li><p>A Databricks workspace with Unity Catalog enabled</p></li><li><p>A catalog where you have permission to create schemas (or an existing catalog you can use)</p></li><li><p>The Databricks CLI installed (<a href="https://docs.databricks.com/dev-tools/cli/install.html">official docs</a>)</p></li><li><p>A terminal and a code editor like VS Code</p></li><li><p>A service principal added to your Databricks workspace. Create one via Azure Entra ID and add it to your workspace, or create a Databricks-scoped SP directly in the Databricks UI. Note down its application (client) ID; you&#8217;ll need it shortly.</p></li></ul><p>Authenticate the CLI against your workspace. 
The recommended approach with the newer Databricks CLI (v0.200+) is OAuth, which opens a browser for interactive login; there is no token to create or manage:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;1d8822e1-0bf7-49c3-9444-4cdfadfb65e2&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">databricks auth login --host https://&lt;your-workspace-url&gt;</code></pre></div><p>Alternatively, if you prefer a personal access token, use:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;f5971c44-1b1a-48fb-8ca8-919be938e0bf&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">databricks configure --token</code></pre></div><h2>Project structure</h2><p>Now we can start creating our resource definitions. To do that, we need a repo with a proper project structure. I will go with a structure that separates shared infrastructure from project-specific resources. Here&#8217;s the full folder layout we&#8217;ll build across this series:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;6438573b-79ce-4c87-aaa7-9fd72a62da7e&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">bundles/
&#9500;&#9472;&#9472; databricks.yml # Root bundle config
&#9500;&#9472;&#9472; resources/
&#9474;   &#9492;&#9472;&#9472; sql-warehouses.yml # Shared SQL warehouse
&#9492;&#9472;&#9472; projects/
    &#9492;&#9472;&#9472; linkedin/
        &#9500;&#9472;&#9472; resources/
        &#9474;   &#9500;&#9472;&#9472; jobs.yml # The daily ingest job
        &#9474;   &#9500;&#9472;&#9472; pipelines.yml # The Declarative Pipeline
        &#9474;   &#9500;&#9472;&#9472; schemas.yml # Bronze, silver, gold schemas
        &#9474;   &#9492;&#9472;&#9472; volumes.yml # Landing volume
        &#9492;&#9472;&#9472; src/
            &#9500;&#9472;&#9472; dashboards/
            &#9474;    &#9492;&#9472;&#9472; linkedin.lvdash.json # Dashboard
            &#9500;&#9472;&#9472; notebooks/
            &#9474;   &#9492;&#9472;&#9472; ingest.ipynb # LinkedIn API ingest notebook
            &#9492;&#9472;&#9472; pipelines/
                 &#9500;&#9472;&#9472; load_bronze.sql
                 &#9500;&#9472;&#9472; load_silver.sql
                 &#9492;&#9472;&#9472; load_gold.sql
            </code></pre></div><p>In this post we&#8217;ll create the bundle root, the shared warehouse, and the LinkedIn project&#8217;s schemas and volume. By the end you&#8217;ll have a working <code>databricks bundle deploy</code> that creates real resources in your workspace.</p><p>The structure is built for growth. You can easily add more projects like <code>linkedin</code> and deploy them in one bundle. However, when things get really big and you have several completely separate data products, I recommend going with multiple bundles. For this use case, though, this structure serves us well and leaves room for further development.</p><blockquote><p>The full code is available in the accompanying GitHub repo. Click the button below to view it.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://github.com/MartinDebus/databricks-linkedin-api-pipeline/releases/tag/Part-1&quot;,&quot;text&quot;:&quot;GitHub Repo&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://github.com/MartinDebus/databricks-linkedin-api-pipeline/releases/tag/Part-1"><span>GitHub Repo</span></a></p></blockquote><h2>The root bundle configuration</h2><p>Create <code>bundles/databricks.yml</code>. This is the entry point for everything:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;yaml&quot;,&quot;nodeId&quot;:&quot;2ceffeaa-8098-42aa-8c6d-a90ae84e72ce&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-yaml">bundle:
  name: my-lakehouse

include:
  - resources/*.yml
  - projects/*/resources/*.yml

variables:
  catalog:
    description: "Unity Catalog catalog name for this target"
  service_principal_app_id:
    description: "Application (client) ID of the service principal used to run jobs"

targets:

  dev-user:
    mode: development
    workspace:
      host: https://&lt;your-workspace-url&gt;.azuredatabricks.net
    variables:
      catalog: &lt;your-dev-catalog&gt;
      service_principal_app_id: &lt;your-sp-app-id&gt;

  dev:
    mode: production
    workspace:
      host: https://&lt;your-workspace-url&gt;.azuredatabricks.net
      root_path: /Users/${var.service_principal_app_id}/.bundle/${bundle.name}/${bundle.target}
    variables:
      catalog: &lt;your-dev-catalog&gt;
      service_principal_app_id: &lt;your-sp-app-id&gt;

  prod:
    mode: production
    workspace:
      host: https://&lt;your-prod-workspace-url&gt;.azuredatabricks.net
      root_path: /Users/${var.service_principal_app_id}/.bundle/${bundle.name}/${bundle.target}
    variables:
      catalog: &lt;your-prod-catalog&gt;
      service_principal_app_id: &lt;your-prod-sp-app-id&gt;</code></pre></div><p>A few things to understand here:</p><p><code>include</code> tells the bundle which YAML files contain resource definitions. The glob patterns mean any new project you add under <code>projects/</code> is automatically picked up without touching the root config.</p><p><code>variables</code> defines the parameters that differ between environments, primarily the catalog name and the service principal identity. These are referenced throughout all resource files as <code>${var.catalog}</code> and <code>${var.service_principal_app_id}</code>.</p><p><code>targets</code> defines three environments, and the distinction between <code>dev-user</code> and <code>dev</code> is worth explaining.</p><p><code>dev-user</code> is for active development on your local machine. It uses <code>mode: development</code>, which automatically prefixes all deployed resource names with your username (e.g. <code>martin_linkedin-medallion</code> instead of <code>linkedin-medallion</code>). This means multiple developers can deploy their own version of the project to the same shared workspace simultaneously without stepping on each other.</p><p><code>dev</code> is an integration environment. It uses <code>mode: production</code> (no name prefixing) and deploys to a fixed path owned by the service principal. The purpose is to have a stable, always-running version of the pipeline in the dev catalog that you can validate against before deploying to prod. It&#8217;s what you&#8217;d point a shared team dashboard at, for example.</p><p><code>prod</code> is a completely separate workspace with its own catalog and service principal, isolated from dev by design.</p><p>If you&#8217;re working alone or just getting started, you can simplify to just <code>dev-user</code> and <code>prod</code> pointing at the same workspace with different catalogs. 
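</p><p>To make the <code>${var.catalog}</code> substitution mentioned above concrete, here is a toy resolver for <code>${var.&lt;name&gt;}</code> references. It is purely illustrative (the real bundle engine resolves many more reference types, such as <code>${bundle.target}</code> and <code>${resources.schemas.bronze_linkedin.name}</code>), but it captures the idea of per-target variable values:</p>

```python
import re

def resolve(template: str, variables: dict) -> str:
    # Toy stand-in for bundle variable substitution:
    # replaces ${var.<name>} with the value defined for the active target.
    return re.sub(r"\$\{var\.(\w+)\}", lambda m: variables[m.group(1)], template)

dev_user = {"catalog": "dev_catalog", "service_principal_app_id": "1234-abcd"}
print(resolve("catalog_name: ${var.catalog}", dev_user))
# catalog_name: dev_catalog
```

<p>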
The key is that <code>prod</code> should always use a separate catalog from dev.</p><div><hr></div><h2>Shared resources: the SQL warehouse</h2><p>Create <code>bundles/resources/sql-warehouses.yml</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;yaml&quot;,&quot;nodeId&quot;:&quot;d6e160a2-bcb2-4e44-a1e2-20b18348f7ba&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-yaml">resources:
  sql_warehouses:
    serverless_warehouse:
      name: "serverless-warehouse"
      cluster_size: "2X-Small"
      warehouse_type: PRO
      enable_serverless_compute: true
      auto_stop_mins: 10
      channel:
        name: CHANNEL_NAME_CURRENT</code></pre></div><p>This warehouse will be used by the Lakeview dashboard we add in Part 4. It&#8217;s defined at the bundle level (not inside the LinkedIn project folder) because it&#8217;s shared infrastructure, and other projects in the same bundle can reference it.</p><p><code>enable_serverless_compute: true</code> means the warehouse uses serverless compute rather than classic clusters. <code>auto_stop_mins: 10</code> keeps costs low by shutting down after 10 minutes of inactivity.</p><blockquote><p>If you are running this on Databricks Free Edition, make sure to remove the SQL Warehouse by commenting out the <code>resources/*.yml</code> include in <code>databricks.yml</code>. Free Edition allows only one warehouse, so the deployment will run for a long time (about 30 minutes) and finally fail with an error. You can use the warehouse provided by Databricks Free Edition later on.</p></blockquote><h2>Project resources: schemas and volumes</h2><p>Create <code>bundles/projects/linkedin/resources/schemas.yml</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;yaml&quot;,&quot;nodeId&quot;:&quot;1c80530c-74c6-40ca-8e3e-f03642699521&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-yaml">resources:
  schemas:
    bronze_linkedin:
      name: bronze_linkedin
      catalog_name: ${var.catalog}
      comment: "Raw ingested data from LinkedIn"
      grants:
        - principal: &lt;your-user@email.com&gt;
          privileges:
            - ALL_PRIVILEGES
        - principal: ${var.service_principal_app_id}
          privileges:
            - ALL_PRIVILEGES

    silver_linkedin:
      name: silver_linkedin
      catalog_name: ${var.catalog}
      comment: "Cleaned and deduplicated LinkedIn data"
      grants:
        - principal: &lt;your-user@email.com&gt;
          privileges:
            - ALL_PRIVILEGES
        - principal: ${var.service_principal_app_id}
          privileges:
            - ALL_PRIVILEGES

    gold_linkedin:
      name: gold_linkedin
      catalog_name: ${var.catalog}
      comment: "Aggregated LinkedIn metrics for reporting"
      grants:
        - principal: &lt;your-user@email.com&gt;
          privileges:
            - ALL_PRIVILEGES
        - principal: ${var.service_principal_app_id}
          privileges:
            - ALL_PRIVILEGES</code></pre></div><p>Three schemas, one per medallion layer. Each has grants defined inline, meaning your user account and the service principal both get full access. When you deploy to a new environment, the grants are applied automatically. No manual permission management in the UI.</p><p>Now create the landing volume in <code>bundles/projects/linkedin/resources/volumes.yml</code>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;yaml&quot;,&quot;nodeId&quot;:&quot;8124b13b-66d7-470d-a2e1-ab1f6d2596e4&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-yaml">resources:
  volumes:
    landing:
      name: landing
      catalog_name: ${var.catalog}
      schema_name: ${resources.schemas.bronze_linkedin.name}
      volume_type: MANAGED
      comment: "Landing zone for raw LinkedIn API data before loading into Delta tables"</code></pre></div><p>The volume lives inside <code>bronze_linkedin</code> and uses <code>MANAGED</code> storage, meaning Databricks controls where the files are stored. The <code>${resources.schemas.bronze_linkedin.name}</code> reference means the bundle knows the schema must exist before the volume can be created and handles the dependency ordering automatically.</p><blockquote><p>The full code is available in the accompanying GitHub repo. Click the button below to view it.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://github.com/MartinDebus/databricks-linkedin-api-pipeline/releases/tag/Part-1&quot;,&quot;text&quot;:&quot;GitHub Repo&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://github.com/MartinDebus/databricks-linkedin-api-pipeline/releases/tag/Part-1"><span>GitHub Repo</span></a></p></blockquote><h2>First deploy</h2><p>With the root config, the warehouse, the schemas, and the volume in place, you can do your first deploy from a terminal in VS Code (run it from the <code>bundles</code> root folder):</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;0f642c22-a185-44f7-b0bd-50c2aba864f1&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">databricks bundle deploy --target dev-user</code></pre></div><p>The CLI will validate your configuration, resolve all variable substitutions, and create the resources in your workspace. You can also run the validation on its own, without deploying, to check that everything is in order. 
For that, just use <code>validate</code> instead of <code>deploy</code>: </p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;9712a8d5-0843-4dc3-8f0a-87b40a338125&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">databricks bundle validate --target dev-user</code></pre></div><p>During deployment, you should see output like:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7hZl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0641ab0f-17dc-4815-bb1e-6199df8ad55a_1596x264.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7hZl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0641ab0f-17dc-4815-bb1e-6199df8ad55a_1596x264.png 424w, https://substackcdn.com/image/fetch/$s_!7hZl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0641ab0f-17dc-4815-bb1e-6199df8ad55a_1596x264.png 848w, https://substackcdn.com/image/fetch/$s_!7hZl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0641ab0f-17dc-4815-bb1e-6199df8ad55a_1596x264.png 1272w, https://substackcdn.com/image/fetch/$s_!7hZl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0641ab0f-17dc-4815-bb1e-6199df8ad55a_1596x264.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!7hZl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0641ab0f-17dc-4815-bb1e-6199df8ad55a_1596x264.png" width="728" height="120.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0641ab0f-17dc-4815-bb1e-6199df8ad55a_1596x264.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:241,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:51704,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/192087362?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0641ab0f-17dc-4815-bb1e-6199df8ad55a_1596x264.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7hZl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0641ab0f-17dc-4815-bb1e-6199df8ad55a_1596x264.png 424w, https://substackcdn.com/image/fetch/$s_!7hZl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0641ab0f-17dc-4815-bb1e-6199df8ad55a_1596x264.png 848w, https://substackcdn.com/image/fetch/$s_!7hZl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0641ab0f-17dc-4815-bb1e-6199df8ad55a_1596x264.png 1272w, https://substackcdn.com/image/fetch/$s_!7hZl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0641ab0f-17dc-4815-bb1e-6199df8ad55a_1596x264.png 1456w" 
sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Deployment to target dev-user from VS Code Terminal</figcaption></figure></div><p>Open your Databricks workspace, navigate to the catalog, and you&#8217;ll see the three schemas and the landing volume already there, exactly as defined.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1nzN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8798ed5-30c5-4173-9acb-6e90739f5284_1782x666.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1nzN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8798ed5-30c5-4173-9acb-6e90739f5284_1782x666.png 424w, https://substackcdn.com/image/fetch/$s_!1nzN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8798ed5-30c5-4173-9acb-6e90739f5284_1782x666.png 848w, https://substackcdn.com/image/fetch/$s_!1nzN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8798ed5-30c5-4173-9acb-6e90739f5284_1782x666.png 1272w, https://substackcdn.com/image/fetch/$s_!1nzN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8798ed5-30c5-4173-9acb-6e90739f5284_1782x666.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1nzN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8798ed5-30c5-4173-9acb-6e90739f5284_1782x666.png" width="1456" height="544" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c8798ed5-30c5-4173-9acb-6e90739f5284_1782x666.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:544,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:147766,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/192087362?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8798ed5-30c5-4173-9acb-6e90739f5284_1782x666.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1nzN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8798ed5-30c5-4173-9acb-6e90739f5284_1782x666.png 424w, https://substackcdn.com/image/fetch/$s_!1nzN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8798ed5-30c5-4173-9acb-6e90739f5284_1782x666.png 848w, https://substackcdn.com/image/fetch/$s_!1nzN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8798ed5-30c5-4173-9acb-6e90739f5284_1782x666.png 1272w, https://substackcdn.com/image/fetch/$s_!1nzN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8798ed5-30c5-4173-9acb-6e90739f5284_1782x666.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Resulting prefixed Schemas and Volume for LinkedIn data in the Catalog</figcaption></figure></div><p>The prefixed shared Serverless SQL Warehouse should appear in the SQL Warehouses section.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_NqO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f6af3-43e4-4a36-b347-fc3a6e81ad72_2114x700.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!_NqO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f6af3-43e4-4a36-b347-fc3a6e81ad72_2114x700.png 424w, https://substackcdn.com/image/fetch/$s_!_NqO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f6af3-43e4-4a36-b347-fc3a6e81ad72_2114x700.png 848w, https://substackcdn.com/image/fetch/$s_!_NqO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f6af3-43e4-4a36-b347-fc3a6e81ad72_2114x700.png 1272w, https://substackcdn.com/image/fetch/$s_!_NqO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f6af3-43e4-4a36-b347-fc3a6e81ad72_2114x700.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_NqO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f6af3-43e4-4a36-b347-fc3a6e81ad72_2114x700.png" width="1456" height="482" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dd8f6af3-43e4-4a36-b347-fc3a6e81ad72_2114x700.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:482,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:121275,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/192087362?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f6af3-43e4-4a36-b347-fc3a6e81ad72_2114x700.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_NqO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f6af3-43e4-4a36-b347-fc3a6e81ad72_2114x700.png 424w, https://substackcdn.com/image/fetch/$s_!_NqO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f6af3-43e4-4a36-b347-fc3a6e81ad72_2114x700.png 848w, https://substackcdn.com/image/fetch/$s_!_NqO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f6af3-43e4-4a36-b347-fc3a6e81ad72_2114x700.png 1272w, https://substackcdn.com/image/fetch/$s_!_NqO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd8f6af3-43e4-4a36-b347-fc3a6e81ad72_2114x700.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Serverless SQL Warehouse with prefix in name</figcaption></figure></div><p>Nothing runs yet (the Serverless SQL Warehouse is started after deployment, but it will shut down after 10 minutes of inactivity if you do not terminate it manually). There&#8217;s no job, no pipeline, no notebook. But the scaffolding is in place and version-controlled. Everything we add in Parts 2, 3, and 4 will slot into this structure.</p><p>If you want to start from scratch, you can destroy all deployed resources like this: </p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;39e3a1d1-f8d1-4962-bffa-3661ee82eee5&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">databricks bundle destroy --target dev-user</code></pre></div><p>You are prompted for confirmation before anything critical, such as schemas that contain data, is destroyed.</p><h2>What&#8217;s next</h2><p>In Part 2 we&#8217;ll connect to the LinkedIn API, walking through the API access setup, building the ingest notebook step by step, and wiring it into the bundle as the first job task. 
By the end of Part 2 you&#8217;ll have raw JSON files landing in the Volume you just created.</p><p>Part 2 will be published on 2026/04/30.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.thelakehousepath.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Lakehouse Path! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[Understanding Query Caching in Databricks SQL Warehouses]]></title><description><![CDATA[Features, Limitations, and How to Monitor Cache Usage]]></description><link>https://www.thelakehousepath.com/p/understanding-query-caching-in-databricks</link><guid isPermaLink="false">https://www.thelakehousepath.com/p/understanding-query-caching-in-databricks</guid><dc:creator><![CDATA[Martin Debus]]></dc:creator><pubDate>Thu, 12 Feb 2026 06:40:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!pjly!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F174e6176-b491-4d06-aabc-23217fd5652c_2660x2616.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Recently, I took a deep dive into SQL Warehouse query caching in Databricks. In this blog post I share what I learned along the way. 
</p><h2>How to use Query Caching</h2><p>The SQL query cache in Databricks SQL Warehouses<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> can be a powerful tool. Once a query is sent to the Warehouse, its results are cached and reused in subsequent runs.</p><p>Query caching is automatically activated in Databricks SQL. You can check if the cache was used by looking at the statement execution details. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pjly!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F174e6176-b491-4d06-aabc-23217fd5652c_2660x2616.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pjly!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F174e6176-b491-4d06-aabc-23217fd5652c_2660x2616.png 424w, https://substackcdn.com/image/fetch/$s_!pjly!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F174e6176-b491-4d06-aabc-23217fd5652c_2660x2616.png 848w, https://substackcdn.com/image/fetch/$s_!pjly!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F174e6176-b491-4d06-aabc-23217fd5652c_2660x2616.png 1272w, https://substackcdn.com/image/fetch/$s_!pjly!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F174e6176-b491-4d06-aabc-23217fd5652c_2660x2616.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!pjly!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F174e6176-b491-4d06-aabc-23217fd5652c_2660x2616.png" width="1456" height="1432" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/174e6176-b491-4d06-aabc-23217fd5652c_2660x2616.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1432,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:780089,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/175434993?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F174e6176-b491-4d06-aabc-23217fd5652c_2660x2616.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pjly!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F174e6176-b491-4d06-aabc-23217fd5652c_2660x2616.png 424w, https://substackcdn.com/image/fetch/$s_!pjly!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F174e6176-b491-4d06-aabc-23217fd5652c_2660x2616.png 848w, https://substackcdn.com/image/fetch/$s_!pjly!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F174e6176-b491-4d06-aabc-23217fd5652c_2660x2616.png 1272w, https://substackcdn.com/image/fetch/$s_!pjly!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F174e6176-b491-4d06-aabc-23217fd5652c_2660x2616.png 
1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Query Execution Details for a Cached Query</figcaption></figure></div><p>If a query is run for the first time, it is fully executed and its result is stored in the cache. </p><pre><code><code>-- First invocation, populates cache
SELECT 
  * 
FROM 
  samples.bakehouse.sales_customers; </code></code></pre><p>Subsequent queries will use the cached result. All these queries will utilize the cached results from the first query: </p><pre><code><code>-- Reads from cache 
SELECT 
  * 
FROM 
  samples.bakehouse.sales_customers; 

-- Reads from cache, too
select 
  * 
from 
  SAMPLES.BAKEHOUSE.SALES_CUSTOMERS; 

-- Still reads from cache :-)
SELECT 
  * 
FROM 
  samples.bakehouse.sales_customers sc; </code></code></pre><p>Note that the cache is case-insensitive: <code>SELECT * FROM table_name</code> yields the same result as <code>select * from TABLE_NAME</code>. Snowflake handles this differently<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>: its result cache is case-sensitive. </p><h2>Getting Query Details</h2><p>To examine query execution details, use the system tables for query metadata. The following example shows how to retrieve caching information for specific queries from the <code>system.query.history</code> table:</p><pre><code>SELECT 
  statement_text,
  total_duration_ms,
  read_rows,
  produced_rows,
  from_result_cache,
  cache_origin_statement_id
FROM
  system.query.history
WHERE 
  statement_id in ("put", "your", "statement_ids", "here")</code></pre><p>This example shows four query executions&#8212;two distinct queries that were each run twice. Notice that when <code>from_result_cache = true</code>, the <code>read_rows</code> value is 0 and <code>total_duration_ms</code> is significantly lower. The <code>cache_origin_statement_id</code> field provides a helpful reference back to the original query that populated the cache.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BAmF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93356b0c-e6b0-4065-b791-05a0820b95a0_2332x260.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BAmF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93356b0c-e6b0-4065-b791-05a0820b95a0_2332x260.png 424w, https://substackcdn.com/image/fetch/$s_!BAmF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93356b0c-e6b0-4065-b791-05a0820b95a0_2332x260.png 848w, https://substackcdn.com/image/fetch/$s_!BAmF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93356b0c-e6b0-4065-b791-05a0820b95a0_2332x260.png 1272w, https://substackcdn.com/image/fetch/$s_!BAmF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93356b0c-e6b0-4065-b791-05a0820b95a0_2332x260.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!BAmF!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93356b0c-e6b0-4065-b791-05a0820b95a0_2332x260.png" width="1200" height="133.5164835164835" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/93356b0c-e6b0-4065-b791-05a0820b95a0_2332x260.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:162,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:84353,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/175434993?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93356b0c-e6b0-4065-b791-05a0820b95a0_2332x260.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BAmF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93356b0c-e6b0-4065-b791-05a0820b95a0_2332x260.png 424w, https://substackcdn.com/image/fetch/$s_!BAmF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93356b0c-e6b0-4065-b791-05a0820b95a0_2332x260.png 848w, https://substackcdn.com/image/fetch/$s_!BAmF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93356b0c-e6b0-4065-b791-05a0820b95a0_2332x260.png 1272w, 
https://substackcdn.com/image/fetch/$s_!BAmF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93356b0c-e6b0-4065-b791-05a0820b95a0_2332x260.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Result from Query History System Table</figcaption></figure></div><p>System tables like these are invaluable for administrators monitoring cache utilization and identifying opportunities to optimize queries for better cache performance.</p><h2>Technical Details</h2><p>There is not only one, but four layers of caches involved. They are checked in the following order: </p><ol><li><p>UI Cache: a user-scoped cache in the workspace file system</p></li><li><p>Local Cache: the in-memory cache of the SQL Warehouse that is lost when the Warehouse is terminated</p></li><li><p>Remote Cache: this applies to Serverless Warehouses only and is shared by Warehouses across the Workspace</p></li><li><p>Disk Cache: the last resort for retrieving query results</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DCEI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0565517-1090-460f-b8b5-04abdf5acd8b_1444x944.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DCEI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0565517-1090-460f-b8b5-04abdf5acd8b_1444x944.png 424w, https://substackcdn.com/image/fetch/$s_!DCEI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0565517-1090-460f-b8b5-04abdf5acd8b_1444x944.png 848w, 
https://substackcdn.com/image/fetch/$s_!DCEI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0565517-1090-460f-b8b5-04abdf5acd8b_1444x944.png 1272w, https://substackcdn.com/image/fetch/$s_!DCEI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0565517-1090-460f-b8b5-04abdf5acd8b_1444x944.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DCEI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0565517-1090-460f-b8b5-04abdf5acd8b_1444x944.png" width="1444" height="944" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c0565517-1090-460f-b8b5-04abdf5acd8b_1444x944.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:944,&quot;width&quot;:1444,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;query caches&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="query caches" title="query caches" srcset="https://substackcdn.com/image/fetch/$s_!DCEI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0565517-1090-460f-b8b5-04abdf5acd8b_1444x944.png 424w, https://substackcdn.com/image/fetch/$s_!DCEI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0565517-1090-460f-b8b5-04abdf5acd8b_1444x944.png 848w, 
https://substackcdn.com/image/fetch/$s_!DCEI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0565517-1090-460f-b8b5-04abdf5acd8b_1444x944.png 1272w, https://substackcdn.com/image/fetch/$s_!DCEI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0565517-1090-460f-b8b5-04abdf5acd8b_1444x944.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Types of query caches in Databricks SQL, <a 
href="https://learn.microsoft.com/en-us/azure/databricks/sql/user/queries/query-caching">source</a></figcaption></figure></div><p>Here are some more useful facts about caching: </p><ul><li><p>The cache is invalidated after 24 hours or when the underlying tables get updated. </p></li><li><p>If you want to disable caching, you can add <code>SET use_cached_result = false</code> to the top of your script. This could be useful for true &#8220;cold&#8221; benchmarking of query performance.</p></li><li><p>Because of the remote cache, switching from one Serverless Warehouse to another will still utilize the cache.</p></li></ul><h2>Limitations</h2><ul><li><p>The remote cache is only available in Serverless SQL Warehouses. In practice, this means that the Warehouse can be terminated and started again, and the cache is still used. This will not work with Classic and Pro Warehouses.</p></li><li><p>Query caching does not work with dynamic views or functions like <code>current_timestamp()</code>. Using these leads to a cache miss and full execution of the query.</p></li><li><p>Users have no control over the cache lifecycle.</p></li><li><p>Users cannot interact with cache entries, for example by listing all cached queries or deleting individual entries.</p></li></ul><h2>Key Takeaways</h2><p>Query caching in Databricks SQL can dramatically improve performance for analytical workloads with repeated queries. To make the most of it, design your queries to be cache-friendly by avoiding unnecessary dynamic functions, and consider using Serverless Warehouses when cache persistence across warehouse restarts is important for your use case.</p><p>Use the <code>system.query.history</code> table to monitor cache utilization and verify performance gains. 
The <code>from_result_cache</code> and <code>cache_origin_statement_id</code> fields provide valuable insights into which queries benefit from caching, helping you identify optimization opportunities and understand your workload patterns.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.thelakehousepath.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Lakehouse Path! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>https://learn.microsoft.com/en-us/azure/databricks/sql/user/queries/query-caching</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>https://docs.snowflake.com/en/user-guide/querying-persisted-results</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[Simple Python Logger Framework for Databricks, Part 3: Analyze Cluster Log Files in Databricks SQL]]></title><description><![CDATA[Part 3 of 3 of the Simple Python Logger Framework for Databricks series]]></description><link>https://www.thelakehousepath.com/p/simple-python-logger-framework-for</link><guid 
isPermaLink="false">https://www.thelakehousepath.com/p/simple-python-logger-framework-for</guid><dc:creator><![CDATA[Martin Debus]]></dc:creator><pubDate>Mon, 05 Jan 2026 11:20:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ts1A!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b6e7775-2d94-4441-b498-2fc0c2e82afd_872x570.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the third part of the tutorial, in which we explore how to build a simple yet powerful Python logging framework with minimal effort in Azure Databricks.</p><ul><li><p><a href="https://www.thelakehousepath.com/p/simple-python-logger-framework-for-databricks-part1">Part 1</a> is about redirecting the Driver logs from Databricks Clusters to a Volume in order to permanently store and analyze them.</p></li><li><p>In <a href="https://open.substack.com/pub/thelakehousepath/p/simple-python-logger-framework-for-databricks-part2?utm_campaign=post-expanded-share&amp;utm_medium=web">Part 2</a> we created a custom logger object in Python that can be used in the data processing framework of the Data Lakehouse.</p></li><li><p>Part 3 shows how the resulting cluster log files can be analyzed using Databricks onboard tools.</p></li></ul><p>In this third and final part we will make use of the log entries we created. The log files could be analyzed with specialized tools like Grafana or Azure Monitoring (Log Analytics), but we will stick to an all-Databricks approach. The log entries will be loaded into Delta tables and prepared for further analytics. 
Finally, we will build a simple dashboard to visualize the log information.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ts1A!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b6e7775-2d94-4441-b498-2fc0c2e82afd_872x570.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ts1A!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b6e7775-2d94-4441-b498-2fc0c2e82afd_872x570.png 424w, https://substackcdn.com/image/fetch/$s_!ts1A!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b6e7775-2d94-4441-b498-2fc0c2e82afd_872x570.png 848w, https://substackcdn.com/image/fetch/$s_!ts1A!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b6e7775-2d94-4441-b498-2fc0c2e82afd_872x570.png 1272w, https://substackcdn.com/image/fetch/$s_!ts1A!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b6e7775-2d94-4441-b498-2fc0c2e82afd_872x570.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ts1A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b6e7775-2d94-4441-b498-2fc0c2e82afd_872x570.png" width="872" height="570" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b6e7775-2d94-4441-b498-2fc0c2e82afd_872x570.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:570,&quot;width&quot;:872,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:58219,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/169215343?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b6e7775-2d94-4441-b498-2fc0c2e82afd_872x570.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ts1A!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b6e7775-2d94-4441-b498-2fc0c2e82afd_872x570.png 424w, https://substackcdn.com/image/fetch/$s_!ts1A!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b6e7775-2d94-4441-b498-2fc0c2e82afd_872x570.png 848w, https://substackcdn.com/image/fetch/$s_!ts1A!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b6e7775-2d94-4441-b498-2fc0c2e82afd_872x570.png 1272w, https://substackcdn.com/image/fetch/$s_!ts1A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b6e7775-2d94-4441-b498-2fc0c2e82afd_872x570.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>Create some (synthetic) Log Entries</h1><p>For this tutorial, I have used the Logger to create some artificial log entries. I simply created a Notebook and ran the following code multiple times and with different names for the Logger. </p><pre><code>import random
import logging
import time

from databrickslogger import DatabricksLogger

# Define log levels and their weights (higher weight = more frequent)
log_levels = [
    (logging.INFO, 40),    # INFO: 40% chance
    (logging.DEBUG, 30),   # DEBUG: 30% chance
    (logging.WARNING, 20), # WARNING: 20% chance
    (logging.ERROR, 10)    # ERROR: 10% chance
]

# Generate artificial logs
def generate_logs(num_logs=200):
    for _ in range(num_logs):
        # Choose a random log level based on weights
        level = random.choices(
            [l[0] for l in log_levels],
            weights=[l[1] for l in log_levels],
            k=1
        )[0]

        # Generate a log message
        if level == logging.ERROR:
            message = f"Failed due to {random.choice(['timeout', 'permission denied', 'corrupt data'])}"
        elif level == logging.WARNING:
            message = f"{random.choice(['Slow response', 'Partial data loaded', 'Retrying...'])}"
        elif level == logging.DEBUG:
            message = f"{random.choice(['Debug info: row count=1000', 'Debug info: schema validated', 'Debug info: connection established'])}"
        else:  # INFO
            message = f"{random.choice(['Success', 'In progress', 'No issues detected'])}"

        # Log the message
        logger.log(level, message)

        # Simulate delay between logs
        time.sleep(random.uniform(0.1, 0.5))

# Create Logger Object
logger = DatabricksLogger('DataLakehouseLogger')

# Run the log generator
generate_logs()
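
# Format check (added here; not part of the original script): the
# DatabricksLogger from Part 2 emits lines shaped as
# "name|timestamp|level|message". The same stdlib Formatter pattern
# reproduces that shape for a single record:
_fmt = logging.Formatter('%(name)s|%(asctime)s|%(levelname)s|%(message)s')
_rec = logging.LogRecord('DataLakehouseLogger', logging.INFO, 'notebook', 0, 'Success', None, None)
print(_fmt.format(_rec))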
</code></pre><p>This generates log entries with the configured distribution of log levels and some artificial messages. To test the loading mechanism for log files, this is sufficient. </p><h1>Read the Log with a Lakeflow Declarative Pipeline</h1><p>Assuming the DatabricksLogger was used to create some log entries (either real or synthetic, as described above), we now read the log files, process them, and write them into tables. This is a two-step process: First, we read all log data from the driver stdout into a table. Second, we extract only the entries created by the DatabricksLogger and write them to another table. </p><p>We will use <a href="https://learn.microsoft.com/en-us/azure/databricks/ldp/concepts">Spark Declarative Pipelines</a> and SQL to create the data processing pipeline. </p><p>Create a new pipeline like this:</p><ul><li><p>Go to <em>Pipelines &amp; Jobs</em></p></li><li><p>Select <em>ETL pipeline</em></p></li><li><p>Give it a name like <em>Process Log Files</em></p></li><li><p>Select <em>Start with an empty file</em></p></li><li><p>Select SQL as the language for the first file</p></li><li><p>On the right side next to the pipeline, configure your default <code>catalog</code> and <code>schema</code> for this pipeline. If no other namespaces are specified in the code, these defaults are used. </p></li></ul><p>Rename the <code>my_transformation.sql</code> to <code>driver_stdout_raw.sql</code> in the <code>transformations</code> folder and add the following content. </p><pre><code>CREATE OR REFRESH STREAMING TABLE 
  driver_stdout_raw
COMMENT "Raw Driver Log Data."

AS 

SELECT 
  value, 
  current_timestamp AS _load_timestamp, 
  _metadata.file_path AS _file_path
FROM STREAM 
  read_files(
    '/Volumes/&lt;catalog-name&gt;/system/logging/*/driver/stdout*',
    format =&gt; "text"
  )</code></pre><p>This reads all driver stdout log data from every cluster that wrote its logs into the volume. The table is defined as a STREAMING TABLE reading from a Volume. That means Databricks Auto Loader is used and only new files are processed. If no new files are found, no data is loaded. </p><blockquote><p><strong>Note:</strong> In order to catch not only new files but also changed files, <a href="https://learn.microsoft.com/en-us/azure/databricks/connect/unity-catalog/cloud-storage/manage-external-locations#-recommended-enable-file-events-for-an-external-location">enable file events for an external location</a>.</p></blockquote><p>Next, we extract the DatabricksLogger data and store it in a MATERIALIZED VIEW. We split the <code>value</code> column using the pipe-delimited pattern defined in our custom DatabricksLogger. Because <code>SPLIT</code> treats its delimiter as a regular expression, the pipe has to be escaped as <code>\\|</code>; the <code>REGEXP_REPLACE</code> swaps the comma in Python's default timestamp format (e.g. <code>10:23:20,065</code>) for a dot so that <code>TO_TIMESTAMP</code> can parse it with the <code>yyyy-MM-dd HH:mm:ss.SSS</code> pattern. Add a <code>logger_data.sql</code> file to the <code>transformations</code> folder with this code.</p><pre><code>CREATE OR REFRESH MATERIALIZED VIEW 
  logger_data
COMMENT "Data From Databricks Logger."

AS

SELECT 
  SPLIT(value,"\\|")[0] AS app,
  TO_TIMESTAMP(
         REGEXP_REPLACE(SPLIT(value,"\\|")[1], ',', '.'),
         'yyyy-MM-dd HH:mm:ss.SSS'
       ) AS timestamp,
  SPLIT(value,"\\|")[2] AS level,
  SPLIT(value,"\\|")[3] AS message,
  _file_path AS file_path
FROM 
  driver_stdout_raw
  
WHERE SPLIT(value,"\\|")[3] IS NOT NULL</code></pre><p>Now you can run the Pipeline and see if everything works without error. In this case it should look something like this. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cXV2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2153f0a1-c325-42de-b6bc-cb14ed37b20a_2744x1814.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cXV2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2153f0a1-c325-42de-b6bc-cb14ed37b20a_2744x1814.png 424w, https://substackcdn.com/image/fetch/$s_!cXV2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2153f0a1-c325-42de-b6bc-cb14ed37b20a_2744x1814.png 848w, https://substackcdn.com/image/fetch/$s_!cXV2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2153f0a1-c325-42de-b6bc-cb14ed37b20a_2744x1814.png 1272w, https://substackcdn.com/image/fetch/$s_!cXV2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2153f0a1-c325-42de-b6bc-cb14ed37b20a_2744x1814.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cXV2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2153f0a1-c325-42de-b6bc-cb14ed37b20a_2744x1814.png" width="1456" height="963" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2153f0a1-c325-42de-b6bc-cb14ed37b20a_2744x1814.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:963,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:454510,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/169215343?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2153f0a1-c325-42de-b6bc-cb14ed37b20a_2744x1814.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cXV2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2153f0a1-c325-42de-b6bc-cb14ed37b20a_2744x1814.png 424w, https://substackcdn.com/image/fetch/$s_!cXV2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2153f0a1-c325-42de-b6bc-cb14ed37b20a_2744x1814.png 848w, https://substackcdn.com/image/fetch/$s_!cXV2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2153f0a1-c325-42de-b6bc-cb14ed37b20a_2744x1814.png 1272w, https://substackcdn.com/image/fetch/$s_!cXV2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2153f0a1-c325-42de-b6bc-cb14ed37b20a_2744x1814.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Successful Run of Spark Declarative Pipeline loading the Log Entries</figcaption></figure></div><p>Finally, create a schedule to run the pipeline periodically to refresh the tables. </p><blockquote><p>I recommend <a href="https://learn.microsoft.com/en-us/azure/databricks/ldp/convert-to-dab">turning the pipeline into a Databricks Asset Bundle</a> and deploying it to your workspace. </p></blockquote><h1>Visualize Data on a Dashboard</h1><p>The data can now be analyzed using SQL. We will go a step further and create a simple Databricks Dashboard. 
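</p><p>Each dashboard widget is backed by a query on the selected dataset. As a rough sketch (assuming the pipeline's default catalog and schema are selected, so the materialized view can be referenced by its bare name), the "logs per Log-Level" bar chart described below could be based on a query like this:</p><pre><code>SELECT
  level,
  COUNT(*) AS log_count
FROM
  logger_data
GROUP BY
  level
ORDER BY
  log_count DESC</code></pre><p>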
</p><ul><li><p>Go to <em>Dashboards</em></p></li><li><p>Click <em>Create dashboard</em></p></li><li><p>Give the dashboard a name like <em>Databricks Logger</em></p></li><li><p>On the data tab, click <em>Add data source</em> and select the <em>logger_data</em> materialized view</p></li><li><p>Go to the <em>Untitled page</em> tab and rename it to <em>Overview</em></p></li></ul><p>You can now create visualizations for your data. I did the following: </p><ul><li><p>added a filter for App</p></li><li><p>added a counter for the total number of log entries</p></li><li><p>added a bar chart showing the number of logs per App</p></li><li><p>added a bar chart showing the number of logs per Log-Level</p></li><li><p>added a table with the full log data</p></li></ul><p>The result looks like this. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zKQ_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14aff1e-8669-4e83-9238-ed7b16e1c937_3174x1634.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zKQ_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14aff1e-8669-4e83-9238-ed7b16e1c937_3174x1634.png 424w, https://substackcdn.com/image/fetch/$s_!zKQ_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14aff1e-8669-4e83-9238-ed7b16e1c937_3174x1634.png 848w, https://substackcdn.com/image/fetch/$s_!zKQ_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14aff1e-8669-4e83-9238-ed7b16e1c937_3174x1634.png 1272w, 
https://substackcdn.com/image/fetch/$s_!zKQ_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14aff1e-8669-4e83-9238-ed7b16e1c937_3174x1634.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zKQ_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14aff1e-8669-4e83-9238-ed7b16e1c937_3174x1634.png" width="1456" height="750" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e14aff1e-8669-4e83-9238-ed7b16e1c937_3174x1634.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:750,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:322073,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/169215343?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14aff1e-8669-4e83-9238-ed7b16e1c937_3174x1634.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zKQ_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14aff1e-8669-4e83-9238-ed7b16e1c937_3174x1634.png 424w, https://substackcdn.com/image/fetch/$s_!zKQ_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14aff1e-8669-4e83-9238-ed7b16e1c937_3174x1634.png 848w, 
https://substackcdn.com/image/fetch/$s_!zKQ_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14aff1e-8669-4e83-9238-ed7b16e1c937_3174x1634.png 1272w, https://substackcdn.com/image/fetch/$s_!zKQ_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14aff1e-8669-4e83-9238-ed7b16e1c937_3174x1634.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Databricks Logger data visualization with Databricks Dashboard</figcaption></figure></div><p>You should publish your 
Dashboard (I recommend <em>Individual Data Permissions</em>) and share it with others. </p><h1>Wrap Up</h1><p>In this comprehensive end-to-end tutorial, I presented my approach to creating a custom Python logger for Databricks. I focused on using Databricks' built-in features: redirecting the cluster logs into a Volume (Part 1), creating a custom logger object based on the standard Python logging module (Part 2), and finally ingesting the log data into tables to enable log analytics (Part 3). </p><p>Thank you for reading and following along. If you liked this tutorial, consider subscribing to my Substack. </p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.thelakehousepath.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Lakehouse Path! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Simple Python Logger Framework for Databricks, Part 2: Create a Python DatabricksLogger]]></title><description><![CDATA[Part 2 of 3 of the Simple Python Logger Framework for Databricks series]]></description><link>https://www.thelakehousepath.com/p/simple-python-logger-framework-for-databricks-part2</link><guid isPermaLink="false">https://www.thelakehousepath.com/p/simple-python-logger-framework-for-databricks-part2</guid><dc:creator><![CDATA[Martin Debus]]></dc:creator><pubDate>Thu, 02 Oct 2025 07:41:41 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!e6Xn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3d081e-5fce-434c-ada8-d704373206d4_872x570.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to Part 2 of the tutorial, in which we explore how to build a simple yet powerful Python logging framework with minimal effort in Azure Databricks. The tutorial is split into three parts, each covering a distinct topic: </p><ul><li><p><a href="https://www.thelakehousepath.com/p/simple-python-logger-framework-for-databricks-part1">Part 1</a> is about redirecting the Driver logs from Databricks Clusters to a Volume in order to permanently store and analyze them.</p></li><li><p>In Part 2 we create a custom logger object in Python that can be used in the data processing framework of the Data Lakehouse. </p></li><li><p><a href="https://www.thelakehousepath.com/p/simple-python-logger-framework-for?r=58nu1d">Part 3</a> shows how the resulting cluster log files can be analyzed using Databricks' built-in tools. </p></li></ul><p>In the first part of this tutorial, we made sure that the log files are permanently stored in a Unity Catalog Volume. The second part is about proper logging inside Python that will show up in the cluster logs. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!e6Xn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3d081e-5fce-434c-ada8-d704373206d4_872x570.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!e6Xn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3d081e-5fce-434c-ada8-d704373206d4_872x570.png 424w, https://substackcdn.com/image/fetch/$s_!e6Xn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3d081e-5fce-434c-ada8-d704373206d4_872x570.png 848w, https://substackcdn.com/image/fetch/$s_!e6Xn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3d081e-5fce-434c-ada8-d704373206d4_872x570.png 1272w, https://substackcdn.com/image/fetch/$s_!e6Xn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3d081e-5fce-434c-ada8-d704373206d4_872x570.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!e6Xn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3d081e-5fce-434c-ada8-d704373206d4_872x570.png" width="872" height="570" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/da3d081e-5fce-434c-ada8-d704373206d4_872x570.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:570,&quot;width&quot;:872,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:56931,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/169215174?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3d081e-5fce-434c-ada8-d704373206d4_872x570.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!e6Xn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3d081e-5fce-434c-ada8-d704373206d4_872x570.png 424w, https://substackcdn.com/image/fetch/$s_!e6Xn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3d081e-5fce-434c-ada8-d704373206d4_872x570.png 848w, https://substackcdn.com/image/fetch/$s_!e6Xn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3d081e-5fce-434c-ada8-d704373206d4_872x570.png 1272w, https://substackcdn.com/image/fetch/$s_!e6Xn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda3d081e-5fce-434c-ada8-d704373206d4_872x570.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.thelakehousepath.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Lakehouse Path! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h1>Create and Test a Custom Python Logger Class</h1><blockquote><p>Note: All things described in this tutorial will work without completing <strong>Part 1: Deliver Cluster Logs to a Volume</strong>. However, in order to properly analyze the logged events this should be taken care of. </p></blockquote><p>In order to quickly create and test a custom Python logger class, go to your Databricks Workspace and create a Notebook. In the first cell copy and paste the following code.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;9fe77876-f389-4555-85d5-a31ae3b2e7bd&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import logging
import sys

class DatabricksLogger(logging.Logger):
    """
    A custom logger class that extends logging.Logger to facilitate    
    logging with configurable log levels using string inputs. 

    The logs will be directed to the stdout destination. Make sure to
    redirect cluster logs of the Databricks Cluster to a Volume for      
    later use.

    This class allows users to set the logging level using strings, 
    abstracting the need to import and use the logging module directly.

    Parameters
    ----------
    name : str
        The name of the logger.
    level : str, optional
        The logging level as a string (e.g., 'DEBUG', 'INFO'). 
        Default is 'DEBUG'.

    Attributes
    ----------
    _LOG_LEVELS : dict
        A dictionary mapping string representations of log levels to 
        their corresponding logging level constants.
    """

    # Define a mapping of level names to logging levels
    _LOG_LEVELS = {
        'DEBUG': logging.DEBUG,
        'INFO': logging.INFO,
        'WARNING': logging.WARNING,
        'ERROR': logging.ERROR,
        'CRITICAL': logging.CRITICAL
    }

    def __init__(self, name, level='DEBUG'):
        """
        Initialize the DatabricksLogger with a specified name and
        logging level.

        Parameters
        ----------
        name : str
            The name of the logger.
        level : str, optional
            The logging level as a string (e.g., 'DEBUG', 'INFO'). 
            Default is 'DEBUG'.
        """
        # Map the string level to the corresponding logging level 
        # constant
        log_level = self._LOG_LEVELS.get(level.upper(), logging.DEBUG)
        super().__init__(name, log_level)
        self._setup_logging(log_level)

    def _setup_logging(self, level):
        """
        Set up the logging configuration with the specified level.

        Parameters
        ----------
        level : int
            The logging level constant from the logging module.
        """
        handler = logging.StreamHandler(sys.stdout)
        formatter = logging.Formatter(
            '%(name)s|%(asctime)s|%(levelname)s|%(message)s'
        )
        handler.setFormatter(formatter)
        # prevent duplicate handlers
        if not any(
            isinstance(h, logging.StreamHandler) 
            and h.stream == sys.stdout
            for h in self.handlers
        ):
            self.addHandler(handler)

        self.setLevel(level)</code></pre></div><p>Our DatabricksLogger directly inherits from the built-in Python logging facility.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> Basically, we are only changing three things: </p><ol><li><p>We create a mapping from strings to the logging module's level constants. This improves usability since we only have to pass the log level as a string (I love string parameters, although I know some would like to kill me for that). </p></li><li><p>We add a StreamHandler that redirects the logs to the stdout destination, which in Databricks ends up in the Cluster logs. This way, we don&#8217;t have to take care of file handling for the log files since it is already implemented by Databricks. </p></li><li><p>The format of the log string is defined as the name of the Logger, the timestamp, the log level, and the actual message, all separated by a pipe so we can comfortably parse them later on. </p></li></ol><p>To test the logger, run the cell. The compute must not be a serverless instance, since log file delivery is not yet available for this type of compute.</p><p>Next, create another cell and an instance of our logger. With this logger object, anything can be logged within Notebooks or custom Python packages. </p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;03b6f6c1-2428-47af-b2d2-659bec9de793&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">logger = DatabricksLogger("DataLakehouseLogger")

logger.info("Logger is working")</code></pre></div><p>This results in something like this in the console: </p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;dd8ebaf6-0b26-47a5-9e5a-0cf46a0d0d05&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">DataLakehouseLogger|2025-08-13 10:23:20,065|INFO|Logger is working</code></pre></div><p>When inspecting the driver logs of the cluster this was running on, you will see something like this. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QSQD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17817503-1700-4c99-ae1d-9676b76226d4_2130x1518.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QSQD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17817503-1700-4c99-ae1d-9676b76226d4_2130x1518.png 424w, https://substackcdn.com/image/fetch/$s_!QSQD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17817503-1700-4c99-ae1d-9676b76226d4_2130x1518.png 848w, https://substackcdn.com/image/fetch/$s_!QSQD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17817503-1700-4c99-ae1d-9676b76226d4_2130x1518.png 1272w, https://substackcdn.com/image/fetch/$s_!QSQD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17817503-1700-4c99-ae1d-9676b76226d4_2130x1518.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!QSQD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17817503-1700-4c99-ae1d-9676b76226d4_2130x1518.png" width="724" height="516.1483516483516" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/17817503-1700-4c99-ae1d-9676b76226d4_2130x1518.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1038,&quot;width&quot;:1456,&quot;resizeWidth&quot;:724,&quot;bytes&quot;:366196,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/169215174?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17817503-1700-4c99-ae1d-9676b76226d4_2130x1518.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QSQD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17817503-1700-4c99-ae1d-9676b76226d4_2130x1518.png 424w, https://substackcdn.com/image/fetch/$s_!QSQD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17817503-1700-4c99-ae1d-9676b76226d4_2130x1518.png 848w, https://substackcdn.com/image/fetch/$s_!QSQD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17817503-1700-4c99-ae1d-9676b76226d4_2130x1518.png 1272w, 
https://substackcdn.com/image/fetch/$s_!QSQD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17817503-1700-4c99-ae1d-9676b76226d4_2130x1518.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Driver logs after the logger was used</figcaption></figure></div><h1>Creating a Python Package</h1><p>Now that we have tested our logger, it is time to create a Python package that can be used in our Databricks Workspace.</p><h2>Setup the Repo Structure</h2><blockquote><p><strong>Note:</strong> Instead of creating this Repo from scratch like 
described here, you can download it from my GitHub.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://github.com/MartinDebus/databricks-simple-python-logger&quot;,&quot;text&quot;:&quot;GitHub Repo&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://github.com/MartinDebus/databricks-simple-python-logger"><span>GitHub Repo</span></a></p></blockquote><p>Go to your favorite version control platform and create a Git repository, then clone it to your local machine. I use Azure DevOps or GitHub in combination with VS Code most of the time. </p><p>We will create a very basic Python package containing only the strictly necessary files. Create the following folder structure with these files: </p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;9dec15c6-7643-4d96-87b1-afc2c84fd61d&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">DatabricksLogger/
&#9474;
&#9500;&#9472;&#9472; src/
&#9474;   &#9474;
&#9474;   &#9492;&#9472;&#9472; databrickslogger
&#9474;       &#9500;&#9472;&#9472; __init__.py
&#9474;       &#9492;&#9472;&#9472; logger.py
&#9474;
&#9500;&#9472;&#9472; .gitignore
&#9492;&#9472;&#9472; setup.py</code></pre></div><p>Copy the code from the very first Notebook cell, which defines the custom logger class, into <code>logger.py</code>. Next, add the following code to the <code>__init__.py</code> file:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;9001c814-33b1-4ee1-8507-c81c867ddc80&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">from .logger import DatabricksLogger</code></pre></div><blockquote><p><strong>Note: </strong>This ensures we can later import the logger directly like this: </p><pre><code>from databrickslogger import DatabricksLogger</code></pre><p>Without it, we would have to use the more verbose form:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;25bad3b6-c113-4360-a05e-e81f75c7472a&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">from databrickslogger.logger import DatabricksLogger</code></pre></div></blockquote><p>In the root of your repository, add the following content to the <code>setup.py</code> file: </p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;21dd50c1-f22a-45ca-94be-ccc17de14447&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">from setuptools import setup, find_packages
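
# Note: "python setup.py bdist_wheel" (used later in this tutorial) still
# works, but direct setup.py invocation is deprecated in recent setuptools;
# "python -m build" (after "pip install build") is assumed to produce the
# same wheel in the dist/ folder.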

setup(
    name="DatabricksLogger",
    version="0.1",
    author="Martin Debus",
    author_email="martin.debus@snowglobe.ai",
    description="A simple Databricks Logger",
    python_requires="&gt;=3.9",
    classifiers=[
        "Programming Language :: Python :: 3",
        "Operating System :: OS Independent",
    ],
    license="MIT",
    packages=find_packages(),
)</code></pre></div><p>You may change the personal information or apply your own versioning scheme. </p><p>Finally, add the following lines to the .gitignore file: </p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;6173f514-910d-4426-8af6-6a045077da9a&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">dist
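# build/ is also created during the wheel build (assumption: default
# setuptools behavior); keep it out of version control as well
build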
DatabricksLogger.egg-info</code></pre></div><p>This keeps your repository clean from the files created during the build process of the wheel package file when committing changes. </p><h2>Create a Wheel </h2><p>With all the necessary files in place, we will proceed to create a wheel file. In this tutorial, I will describe how to do that using VS Code on Mac. The process is quite similar on Windows. </p><ol><li><p>Make sure you have checked out the repository and the files to your local machine. </p></li><li><p>In VS, open the terminal and navigate to the root of the repository. </p></li><li><p>Run: <code>python setup.py bdist_wheel</code></p></li></ol><p>This creates a wheel file in the <code>dist</code> folder (this folder is automatically created).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XXlO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdcfaa6-5b05-4460-877e-ff7f6ee6baab_880x642.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XXlO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdcfaa6-5b05-4460-877e-ff7f6ee6baab_880x642.png 424w, https://substackcdn.com/image/fetch/$s_!XXlO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdcfaa6-5b05-4460-877e-ff7f6ee6baab_880x642.png 848w, https://substackcdn.com/image/fetch/$s_!XXlO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdcfaa6-5b05-4460-877e-ff7f6ee6baab_880x642.png 1272w, 
https://substackcdn.com/image/fetch/$s_!XXlO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdcfaa6-5b05-4460-877e-ff7f6ee6baab_880x642.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XXlO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdcfaa6-5b05-4460-877e-ff7f6ee6baab_880x642.png" width="450" height="328.29545454545456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/abdcfaa6-5b05-4460-877e-ff7f6ee6baab_880x642.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:642,&quot;width&quot;:880,&quot;resizeWidth&quot;:450,&quot;bytes&quot;:64035,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/169215174?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdcfaa6-5b05-4460-877e-ff7f6ee6baab_880x642.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XXlO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdcfaa6-5b05-4460-877e-ff7f6ee6baab_880x642.png 424w, https://substackcdn.com/image/fetch/$s_!XXlO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdcfaa6-5b05-4460-877e-ff7f6ee6baab_880x642.png 848w, 
https://substackcdn.com/image/fetch/$s_!XXlO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdcfaa6-5b05-4460-877e-ff7f6ee6baab_880x642.png 1272w, https://substackcdn.com/image/fetch/$s_!XXlO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdcfaa6-5b05-4460-877e-ff7f6ee6baab_880x642.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Wheel file as the result of the build process</figcaption></figure></div><p>This wheel might now be used in any Package 
manager. We will skip this step and will directly upload into Databricks. This is not ideal for production environments, but since we do not intend to change the logger frequently, for this use case this is the most straightforward approach. </p><h1>Use the Logger in Databricks</h1><p>To use the Logger in Databricks we have to upload it into a Volume and add it to the cluster libraries. Then we can use it in our code. </p><h2>Upload to Volume</h2><p>We will now upload the wheel file to a Volume in Unity Catalog where any Cluster might be able to access it. In <a href="https://www.thelakehousepath.com/p/simple-python-logger-framework-for-databricks-part1">Part 1</a> of this tutorial we created a system schema and within this a Volume named <code>packages</code>. Upload the wheel file into this Volume. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rl64!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd765912f-203d-4565-aeac-527ede9374b5_1274x790.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rl64!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd765912f-203d-4565-aeac-527ede9374b5_1274x790.png 424w, https://substackcdn.com/image/fetch/$s_!rl64!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd765912f-203d-4565-aeac-527ede9374b5_1274x790.png 848w, https://substackcdn.com/image/fetch/$s_!rl64!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd765912f-203d-4565-aeac-527ede9374b5_1274x790.png 1272w, 
https://substackcdn.com/image/fetch/$s_!rl64!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd765912f-203d-4565-aeac-527ede9374b5_1274x790.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rl64!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd765912f-203d-4565-aeac-527ede9374b5_1274x790.png" width="1274" height="790" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d765912f-203d-4565-aeac-527ede9374b5_1274x790.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:790,&quot;width&quot;:1274,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:121116,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/169215174?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb98e6fec-637a-48c0-a2b9-d7bbe57de467_1280x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rl64!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd765912f-203d-4565-aeac-527ede9374b5_1274x790.png 424w, https://substackcdn.com/image/fetch/$s_!rl64!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd765912f-203d-4565-aeac-527ede9374b5_1274x790.png 848w, 
https://substackcdn.com/image/fetch/$s_!rl64!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd765912f-203d-4565-aeac-527ede9374b5_1274x790.png 1272w, https://substackcdn.com/image/fetch/$s_!rl64!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd765912f-203d-4565-aeac-527ede9374b5_1274x790.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Wheel file uploaded to the packages Volume</figcaption></figure></div><h2>Install on Cluster</h2><p>Next, go to your 
Cluster and do the following: </p><ul><li><p>go to the <code>libraries</code> section</p></li><li><p>click <code>install</code></p></li><li><p>select <code>Volume</code></p></li><li><p>browse to your wheel file and select it</p></li></ul><p>It will now be installed on your cluster every time you start it. </p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jf1L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb00dc568-2c9a-4f0e-bd4e-d6435d241fe4_2948x520.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jf1L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb00dc568-2c9a-4f0e-bd4e-d6435d241fe4_2948x520.png 424w, https://substackcdn.com/image/fetch/$s_!jf1L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb00dc568-2c9a-4f0e-bd4e-d6435d241fe4_2948x520.png 848w, https://substackcdn.com/image/fetch/$s_!jf1L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb00dc568-2c9a-4f0e-bd4e-d6435d241fe4_2948x520.png 1272w, https://substackcdn.com/image/fetch/$s_!jf1L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb00dc568-2c9a-4f0e-bd4e-d6435d241fe4_2948x520.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jf1L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb00dc568-2c9a-4f0e-bd4e-d6435d241fe4_2948x520.png" width="1456" height="257" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b00dc568-2c9a-4f0e-bd4e-d6435d241fe4_2948x520.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:257,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:150883,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/169215174?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb00dc568-2c9a-4f0e-bd4e-d6435d241fe4_2948x520.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jf1L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb00dc568-2c9a-4f0e-bd4e-d6435d241fe4_2948x520.png 424w, https://substackcdn.com/image/fetch/$s_!jf1L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb00dc568-2c9a-4f0e-bd4e-d6435d241fe4_2948x520.png 848w, https://substackcdn.com/image/fetch/$s_!jf1L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb00dc568-2c9a-4f0e-bd4e-d6435d241fe4_2948x520.png 1272w, https://substackcdn.com/image/fetch/$s_!jf1L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb00dc568-2c9a-4f0e-bd4e-d6435d241fe4_2948x520.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">DatabricksLogger installed on an All Purpose Cluster</figcaption></figure></div><blockquote><p><strong>Note: </strong>For any job cluster, this must be defined as a 
library in the job configs in order to use it. </p></blockquote><h2>Usage in code</h2><p>In any Notebook or Python package you can now use the logger to log events that will end up in the log files in the <code>logging</code> Volume like this: </p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;23c6d0a6-41b8-4ecb-b9fa-f790b13a53b9&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">from databrickslogger import DatabricksLogger

logger = DatabricksLogger("DataLakehouseLogger")
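# Beyond info() shown below, the other standard logging levels are assumed
# to be available, since the logger builds on Python's logging module:
logger.warning("This is a warning")
logger.error("This is an error")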

logger.info("Logger is working")</code></pre></div><h1>Wrap Up</h1><p>This concludes the second part of the series <strong>Simple Python Logger Framework for Databricks</strong>. We created a custom Python logger object, packaged it into a wheel file, and deployed it to our Databricks Workspace. The logger writes messages to the console as well as to the cluster logs. </p><p>The last part of the series dives into how to use and analyze the cluster logs: <strong><a href="https://www.thelakehousepath.com/p/simple-python-logger-framework-for?r=58nu1d">Part 3: Analyze Cluster Log Files in Databricks SQL</a></strong>.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.thelakehousepath.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Lakehouse Path! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>https://docs.python.org/3/library/logging.html</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[Simple Python Logger Framework for Databricks, Part 1: Deliver Cluster Logs to a Volume]]></title><description><![CDATA[Part 1 of 3 of the Simple Python Logger Framework for Databricks series]]></description><link>https://www.thelakehousepath.com/p/simple-python-logger-framework-for-databricks-part1</link><guid isPermaLink="false">https://www.thelakehousepath.com/p/simple-python-logger-framework-for-databricks-part1</guid><dc:creator><![CDATA[Martin Debus]]></dc:creator><pubDate>Thu, 04 Sep 2025 10:16:07 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!dWrj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3f8ddc-c3ab-4273-881b-46616958741a_872x570.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the Simple Python Logger Framework for Databricks tutorial. We will explore how to build a simple yet powerful Python logging framework with minimal effort in Azure Databricks.  
The tutorial is split into three parts, each covering a distinct topic: </p><ul><li><p>Part 1 is about redirecting the driver logs from Databricks clusters to a Volume in order to permanently store and analyze them.</p></li><li><p>In <a href="https://www.thelakehousepath.com/p/simple-python-logger-framework-for-databricks-part2?r=58nu1d">Part 2</a> we create a custom Python logger object that can be used in the data processing framework of the Data Lakehouse. </p></li><li><p><a href="https://www.thelakehousepath.com/p/simple-python-logger-framework-for?r=58nu1d">Part 3</a> shows how the resulting cluster log files can be analyzed using Databricks&#8217; built-in tools. </p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dWrj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3f8ddc-c3ab-4273-881b-46616958741a_872x570.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dWrj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3f8ddc-c3ab-4273-881b-46616958741a_872x570.png 424w, https://substackcdn.com/image/fetch/$s_!dWrj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3f8ddc-c3ab-4273-881b-46616958741a_872x570.png 848w, https://substackcdn.com/image/fetch/$s_!dWrj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3f8ddc-c3ab-4273-881b-46616958741a_872x570.png 1272w, 
https://substackcdn.com/image/fetch/$s_!dWrj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3f8ddc-c3ab-4273-881b-46616958741a_872x570.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dWrj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3f8ddc-c3ab-4273-881b-46616958741a_872x570.png" width="872" height="570" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0b3f8ddc-c3ab-4273-881b-46616958741a_872x570.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:570,&quot;width&quot;:872,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:54718,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/169208842?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3f8ddc-c3ab-4273-881b-46616958741a_872x570.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dWrj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3f8ddc-c3ab-4273-881b-46616958741a_872x570.png 424w, https://substackcdn.com/image/fetch/$s_!dWrj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3f8ddc-c3ab-4273-881b-46616958741a_872x570.png 848w, https://substackcdn.com/image/fetch/$s_!dWrj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3f8ddc-c3ab-4273-881b-46616958741a_872x570.png 
1272w, https://substackcdn.com/image/fetch/$s_!dWrj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b3f8ddc-c3ab-4273-881b-46616958741a_872x570.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.thelakehousepath.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for 
reading The Lakehouse Path! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h1>Why Logging Matters</h1><p>When building a Data Lakehouse from scratch, most start by adding value through the integration of data sources. I believe this is the right approach because it allows you to deliver business value as quickly as possible. However, there are some downsides to this method. After integrating the second data source, you may realize that some kind of framework to automate processes would be very helpful. Otherwise, you risk ending up with a growing collection of ingestion and data transformation scripts that become difficult to manage.</p><p>As you progress, you might find that information about your processing is needed to debug or audit your data pipeline runs. More often than not, this leads to the use of print commands. While this approach is valid for debugging and development purposes, it is not suitable for a production environment.</p><p>Therefore, implementing a lightweight logging framework would be beneficial to keep things organized and structured. Here are some common use cases for logging in Data Lakehouses:</p><ul><li><p>Good coding practice: Avoid using print statements in code; instead, use Python logging.</p></li><li><p>Keep logs for audit purposes.</p></li><li><p>Manage numerous job runs efficiently without manually searching through them; have all logs in one place.</p></li><li><p>Utilize log analytics frameworks like Azure Log Analytics.</p></li></ul><blockquote><p><strong>Note: </strong>All technical details explained in this article apply for Azure Databricks in Azure region Western Europe. 
They were verified and tested by me in August 2025 and may be subject to changes by Databricks in the future. Feature enablement and technical details may vary for other regions and cloud providers. Please refer to the official documentation.</p></blockquote><p>These are the assumptions and prerequisites to implement this solution: </p><ul><li><p>An Azure Databricks Workspace is in place. </p></li><li><p>A catalog was created in Unity Catalog and attached to the Workspace by an Account Admin.</p></li><li><p>You are an Admin or a member of a special group like <em>developers</em> in the Databricks Workspace.</p></li><li><p>(Optional) You have a Service Principal that you are allowed to use in the Workspace, and this Service Principal is an Admin or a member of the special group like <em>developers</em>. </p></li></ul><p>The steps described in the following sections will enable you to recreate the solution manually, step by step. Creating a CI/CD process that can easily be deployed to several environments is explicitly out of scope for this tutorial. However, we make sure that the resulting artifacts can be integrated into a CI/CD strategy (e.g. Azure DevOps, Databricks Asset Bundles, GitHub Actions) with low effort. </p><h1>Driver Logs in Databricks</h1><p>Every Compute type in Databricks creates log files. There are log files for the driver, the executors, events, and even init scripts. We will focus on the driver logs here. They can be accessed in the Driver Logs section of a Cluster in the UI. 
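Anything your code writes to stdout or stderr ends up in these files, which is why switching from print statements to Python's <code>logging</code> module pays off. A minimal sketch (the logger name and format string are illustrative, not from this article):</p>

```python
import logging

# Build a logger whose output goes to stderr (the StreamHandler default),
# so messages surface in the driver's stderr log with a consistent format.
logger = logging.getLogger("lakehouse_demo")
logger.setLevel(logging.INFO)

handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s - %(message)s")
)
logger.addHandler(handler)

logger.info("Ingestion started")
```

<p>Unlike bare print statements, this gives every message a timestamp, a level, and a logger name, which makes the driver logs much easier to filter later. 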
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VqgQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6680f33a-becf-469c-91c3-ee4a9e0aef6e_858x1236.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VqgQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6680f33a-becf-469c-91c3-ee4a9e0aef6e_858x1236.png 424w, https://substackcdn.com/image/fetch/$s_!VqgQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6680f33a-becf-469c-91c3-ee4a9e0aef6e_858x1236.png 848w, https://substackcdn.com/image/fetch/$s_!VqgQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6680f33a-becf-469c-91c3-ee4a9e0aef6e_858x1236.png 1272w, https://substackcdn.com/image/fetch/$s_!VqgQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6680f33a-becf-469c-91c3-ee4a9e0aef6e_858x1236.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VqgQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6680f33a-becf-469c-91c3-ee4a9e0aef6e_858x1236.png" width="858" height="1236" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6680f33a-becf-469c-91c3-ee4a9e0aef6e_858x1236.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1236,&quot;width&quot;:858,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:371990,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/169208842?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6680f33a-becf-469c-91c3-ee4a9e0aef6e_858x1236.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VqgQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6680f33a-becf-469c-91c3-ee4a9e0aef6e_858x1236.png 424w, https://substackcdn.com/image/fetch/$s_!VqgQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6680f33a-becf-469c-91c3-ee4a9e0aef6e_858x1236.png 848w, https://substackcdn.com/image/fetch/$s_!VqgQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6680f33a-becf-469c-91c3-ee4a9e0aef6e_858x1236.png 1272w, https://substackcdn.com/image/fetch/$s_!VqgQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6680f33a-becf-469c-91c3-ee4a9e0aef6e_858x1236.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Databricks Driver Logs UI</figcaption></figure></div><p>There are three types of driver logs delivered: </p><ul><li><p><strong>stdout (Standard Output):</strong> This log captures the standard output of the driver node. It typically includes the results of print statements, logs from user applications, and other output generated by the execution of code on the driver. It's useful for tracking the progress and output of your applications.</p></li><li><p><strong>stderr (Standard Error):</strong> This log captures error messages and stack traces that are output to the standard error stream. 
It is particularly useful for debugging issues, as it contains error messages and exceptions that occur during the execution of your code on the driver node.</p></li><li><p><strong>log4j:</strong> This log contains messages generated by the Log4j logging framework, which is used by Databricks and many other Java-based applications. It includes detailed log messages from various components of the Databricks environment and your applications, allowing for comprehensive logging and monitoring.</p></li></ul><p>By default, Databricks retains cluster logs (which include the driver logs) for 30 days. They may also be purged manually from the Workspace. </p><h1>Redirect the Cluster Logs to a Volume</h1><p>To store the cluster logs permanently, they can be redirected into a Volume. This feature was introduced by Databricks in March 2025<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>. It is currently (as of August 2025) not available for Serverless Compute and therefore will not work with the Databricks Free Edition. </p><p>To set it all up, we need to do two things: </p><ul><li><p>Set up a Schema and Volume as a destination for the logs</p></li><li><p>Configure your clusters to write the logs to this destination</p></li></ul><h2>Create Schema and Volume</h2><p>If you already have a Schema and Volume where system-related data is stored, you can use them. If not, create a Schema and/or Volume as a destination for the logs. </p><p>I recommend creating them using code, so your CI/CD can apply them to all environments in the same manner. I assume you have a catalog tied to your workspace in Unity Catalog and that you will be using it. In my case, the catalog is called <code>snowglobe_demo_dev_we</code>, since it is for demo purposes in my dev environment in Western Europe. </p><p>In Databricks, create a new Notebook named something like <code>init-system-schema</code>. 
Put the following code in this Notebook: </p><pre><code><code># Create a Widget to get the catalog and schema name
dbutils.widgets.text("catalog", "snowglobe_demo_dev", "Catalog")
dbutils.widgets.text("schema", "system", "Schema")

# Get the catalog and schema name
catalog = dbutils.widgets.get("catalog")
schema = dbutils.widgets.get("schema")

# Create a managed Schema system
spark.sql(f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}")

# Grant access for developers group (including SP)
spark.sql(f"GRANT ALL PRIVILEGES ON SCHEMA {catalog}.{schema} TO `developers`")

# Create a managed Volume logging
spark.sql(f"CREATE VOLUME IF NOT EXISTS {catalog}.{schema}.logging")
# Create a managed Volume package
spark.sql(f"CREATE VOLUME IF NOT EXISTS {catalog}.{schema}.package")</code></code></pre><p>The catalog and schema name can be dynamically set by a Widget in order to work in different environments and may be injected via a Job Parameter. I am going with the name <code>system</code> for the Schema and <code>logging</code> and <code>package</code> for the Volumes to describe their purpose. </p><blockquote><p>Note that the <em>developers</em> group gets all privileges on the Schema. You may want to restrict that in a production environment.</p></blockquote><p>Run this Notebook in interactive mode or create a Job with one Task running this Notebook. I would recommend running it as a Job on behalf of the Service Principal. </p><p>Now we have set up the location where the logs should be delivered. In the next step we will make sure the logs are delivered to this location. </p><h2>Configure Clusters</h2><p>There are several ways to configure the clusters to deliver the cluster logs to the Volume. You can use any of them to fit your needs. </p><h3>Databricks Web UI</h3><p>The most basic way is to simply use the Databricks UI to configure a cluster to deliver the logs to the Volume. </p><ol><li><p>Go to the Cluster and click <code>Edit</code></p></li><li><p>Go to <code>Advanced Options</code></p></li><li><p>Select the <code>Logging</code> tab</p></li><li><p>Select <code>Volume</code> as Destination</p></li><li><p>Browse your Catalog to the Volume and select it as the path, or simply paste the path into the field</p></li></ol><p>To quickly try things out, this is definitely the way to go. However, it is not very convenient for multiple clusters and environments, so it is advisable to automate it using Terraform or configuration files. We will briefly cover those in the next sections. 
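If you automate clusters via the Databricks SDK or REST API instead, the same setting is just a small dictionary on the cluster definition. Below is a minimal sketch; the helper name and its defaults are my own, and it only assembles the <code>cluster_log_conf</code> payload in the same shape as the Terraform and YAML snippets in this section:</p>

```python
# Hypothetical helper: assemble the cluster_log_conf block that points
# cluster log delivery at the Volume created earlier.
def build_cluster_log_conf(catalog: str, schema: str = "system", volume: str = "logging") -> dict:
    return {"volumes": {"destination": f"/Volumes/{catalog}/{schema}/{volume}"}}

conf = build_cluster_log_conf("snowglobe_demo_dev_we")
print(conf["volumes"]["destination"])  # /Volumes/snowglobe_demo_dev_we/system/logging
```

<p>Keeping the path construction in one helper makes it easy to reuse the same convention across jobs and environments. 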
</p><h3>Terraform</h3><p>If you are deploying your Databricks resources like clusters and/or jobs via Terraform, simply add this to the <code>databricks_cluster</code> resource<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>: </p><pre><code>cluster_log_conf {
  volumes {
    destination = "/Volumes/${local.catalog_name}/system/logging/"
  }
}</code></pre><p>This assumes that the catalog name is set in the Terraform locals to keep it dynamic. </p><h3>Job Cluster Configuration</h3><p>If you manage your jobs using yml-config files, add the following to the job configuration: </p><pre><code>cluster_log_conf:
  volumes:
    destination: /Volumes/{catalog}/system/logging</code></pre><h3>Cluster Policies</h3><p>Cluster Policies are also a convenient way to configure cluster log delivery. You can either do it by hand and overwrite the policies, or use the Databricks SDK to automate it and apply dynamic resolution of environment-specific variables. The code looks quite similar to the job cluster configuration: </p><pre><code>cluster_log_conf.path:
  type: fixed
  value: /Volumes/{catalog}/system/logging</code></pre><h2>Detour: Resolve Variables in Configuration Files</h2><p>In order to dynamically resolve variables like the catalog that might vary between environments, you can apply some custom Python functions to the configuration files. </p><p>First, add a function that recursively formats values in a dictionary or list. The <code>format_values</code> dictionary contains the mapping of the actual variables to values like <code>"catalog": "snowglobe_demo_dev_we"</code>. </p><pre><code>def format_dict_values(
    dictionary: dict, format_values: dict = {}
) -&gt; dict:

    def recursive_format(obj):
        if isinstance(obj, str):
            return obj.format(**format_values)
        elif isinstance(obj, dict):
            return {k: recursive_format(v) for k, v in obj.items()}
        elif isinstance(obj, list):
            return [recursive_format(item) for item in obj]
        else:
            return obj

    return recursive_format(dictionary)</code></pre><p>Second, when reading the configuration, e.g. the <code>jobs.yml</code> where jobs are described, create a mapping dictionary and apply the <code>format_dict_values</code> function to the configuration. In our case, the implementation of the <code>get_catalog_name()</code> function dynamically returns the currently used catalog (e.g. the <code>dev</code> or <code>prod</code> catalog). When applied to the configuration file, the variables <code>{catalog}</code> will be replaced with the actual value during runtime. </p><pre><code>import yaml 

with open(f"{path}/jobs.yml", 'r') as file:
  jobs_config = yaml.safe_load(file)

mapping = {"catalog": get_catalog_name()}

jobs_config_mapped = format_dict_values(jobs_config, mapping)</code></pre><h1>Wrap Up</h1><p>After the cluster log destination is configured, the cluster logs are delivered to the Volume every five minutes on a best-effort basis. That means: </p><ul><li><p>If the Volume does not (yet) exist, the logs are not delivered. This happens silently, so no error occurs. </p></li><li><p>The user running the cluster must have write access to the Volume. If not, the logs are not delivered.</p></li></ul><p>The logs show up in folders named after the corresponding cluster ID. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4cUh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe035da4-05ef-4641-a9b2-acf007dd0186_793x523.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4cUh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe035da4-05ef-4641-a9b2-acf007dd0186_793x523.png 424w, https://substackcdn.com/image/fetch/$s_!4cUh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe035da4-05ef-4641-a9b2-acf007dd0186_793x523.png 848w, https://substackcdn.com/image/fetch/$s_!4cUh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe035da4-05ef-4641-a9b2-acf007dd0186_793x523.png 1272w, https://substackcdn.com/image/fetch/$s_!4cUh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe035da4-05ef-4641-a9b2-acf007dd0186_793x523.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!4cUh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe035da4-05ef-4641-a9b2-acf007dd0186_793x523.png" width="793" height="523" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe035da4-05ef-4641-a9b2-acf007dd0186_793x523.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:523,&quot;width&quot;:793,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:51442,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/169208842?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe035da4-05ef-4641-a9b2-acf007dd0186_793x523.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4cUh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe035da4-05ef-4641-a9b2-acf007dd0186_793x523.png 424w, https://substackcdn.com/image/fetch/$s_!4cUh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe035da4-05ef-4641-a9b2-acf007dd0186_793x523.png 848w, https://substackcdn.com/image/fetch/$s_!4cUh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe035da4-05ef-4641-a9b2-acf007dd0186_793x523.png 1272w, https://substackcdn.com/image/fetch/$s_!4cUh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe035da4-05ef-4641-a9b2-acf007dd0186_793x523.png 1456w" sizes="100vw" 
loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Logging information for a Cluster in the destination Volume</figcaption></figure></div><p>This concludes the first part of the series <strong>Simple Python Logger Framework for Databricks</strong>. 
The next parts of this series are:</p><ul><li><p><strong><a href="https://www.thelakehousepath.com/p/simple-python-logger-framework-for-databricks-part2?r=58nu1d">Part 2: Create a Python DatabricksLogger</a></strong> </p></li><li><p><strong><a href="https://www.thelakehousepath.com/p/simple-python-logger-framework-for?r=58nu1d">Part 3: Analyze Cluster Log Files in Databricks SQL</a></strong> </p></li></ul><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Cluster log delivery to Volume: <a href="https://learn.microsoft.com/en-us/azure/databricks/release-notes/product/2025/march#compute-logs-can-now-be-delivered-to-volumes-public-preview">https://learn.microsoft.com/en-us/azure/databricks/release-notes/product/2025/march#compute-logs-can-now-be-delivered-to-volumes-public-preview</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Terraform Databricks Cluster Resource Documentation: 
<a href="https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/cluster">https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/cluster</a></p></div></div>]]></content:encoded></item><item><title><![CDATA[Thinking in Pandas vs. Thinking in PySpark]]></title><description><![CDATA[My Journey from Pandas to PySpark]]></description><link>https://www.thelakehousepath.com/p/thinking-in-pandas-vs-thinking-in</link><guid isPermaLink="false">https://www.thelakehousepath.com/p/thinking-in-pandas-vs-thinking-in</guid><dc:creator><![CDATA[Martin Debus]]></dc:creator><pubDate>Thu, 07 Aug 2025 06:55:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!b6B-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248bdca3-a8fe-4ba5-b67b-84fdc86e7562_872x570.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When I started out learning Python as a Data Scientist and Data Engineer, I naturally ran into Pandas. It is the default when beginning to explore the language from a data perspective. I really liked the straightforward approach of Pandas, but something always felt slightly off, and I could not place it.</p><p>This changed when I first encountered PySpark.</p><p>That was back in the days when we ran a Hadoop Cluster on-premises and had Spark installed on top. I began exploring PySpark because we had a ton of data we could not handle with Pandas. While it often required more code to achieve the same goals as Pandas, it felt more natural to me. 
And over time, I began to understand why.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!b6B-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248bdca3-a8fe-4ba5-b67b-84fdc86e7562_872x570.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!b6B-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248bdca3-a8fe-4ba5-b67b-84fdc86e7562_872x570.png 424w, https://substackcdn.com/image/fetch/$s_!b6B-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248bdca3-a8fe-4ba5-b67b-84fdc86e7562_872x570.png 848w, https://substackcdn.com/image/fetch/$s_!b6B-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248bdca3-a8fe-4ba5-b67b-84fdc86e7562_872x570.png 1272w, https://substackcdn.com/image/fetch/$s_!b6B-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248bdca3-a8fe-4ba5-b67b-84fdc86e7562_872x570.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!b6B-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248bdca3-a8fe-4ba5-b67b-84fdc86e7562_872x570.png" width="872" height="570" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/248bdca3-a8fe-4ba5-b67b-84fdc86e7562_872x570.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:570,&quot;width&quot;:872,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:47230,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/168646367?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248bdca3-a8fe-4ba5-b67b-84fdc86e7562_872x570.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!b6B-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248bdca3-a8fe-4ba5-b67b-84fdc86e7562_872x570.png 424w, https://substackcdn.com/image/fetch/$s_!b6B-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248bdca3-a8fe-4ba5-b67b-84fdc86e7562_872x570.png 848w, https://substackcdn.com/image/fetch/$s_!b6B-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248bdca3-a8fe-4ba5-b67b-84fdc86e7562_872x570.png 1272w, https://substackcdn.com/image/fetch/$s_!b6B-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F248bdca3-a8fe-4ba5-b67b-84fdc86e7562_872x570.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>One of the first things that clicked was how PySpark encourages chaining operations. I love that style: transforming data in a series of clearly separated steps, each returning a new DataFrame. No surprises. No accidental overwriting. No mysterious <code>inplace=True</code> flags that behave inconsistently. In PySpark, every transformation is explicit and composable. The whole pipeline feels clear and easy to follow, especially when revisiting your code months later.</p><p>With Pandas, I often found myself juggling side effects and debugging strange behaviors. Something about the way Pandas allows (and sometimes encourages) mutation of data structures in place, combined with its rich but sometimes inconsistent syntax, just didn&#8217;t sit well with me.</p><p>I could say that Pandas is just not my style, but I tried to dig a bit deeper. </p><h2><strong>Pandas and PySpark in Action</strong></h2><p>To illustrate the difference, let's take the simple operation of adding a column to a DataFrame. We will look at the Pandas way and the PySpark way.</p><p>We create a simple dataset and add a new column by multiplying an existing one by the factor <code>2</code>.</p><p><em>Pandas</em></p><pre><code><code>import pandas as pd  
df = pd.DataFrame(
  {
     "id": [1, 2, 3, 4], 
     "value": [10, 20, None, 40] 
  }
)  

# Create a new column
df['value_doubled'] = df['value'] * 2

print(df)</code></code></pre><p>This modifies <code>df</code> in place; the original version of <code>df</code> no longer exists.</p><p><em>PySpark</em></p><pre><code><code>import pyspark.sql.functions as F

df = spark.createDataFrame(
  [(1, 10),(2, 20),(3, None), (4, 40)], 
  ["id", "value"]
)  

# Transformations always return a new DataFrame 
df_new = df.withColumn('value_doubled', F.col('value') * 2)</code></code></pre><p>With PySpark, the original <code>df</code> is preserved. You always assign the result of a transformation to a new variable. Moreover, PySpark encourages chaining of commands, which I especially like. Say you want to add another column; you can simply adjust the code to this:</p><pre><code><code>df_new = (
  df
    .withColumn('value_doubled', F.col('value') * 2) # first new column
    .withColumn('department', F.lit('Marketing')) # second new column
)</code></code></pre><p>The dot syntax allows for clean chaining of commands. This is also possible in Pandas, but not always smart. To find out why, we have to take a little look under the hood.</p><h2><strong>The technical difference</strong></h2><p>The core difference between Pandas and PySpark is how they run operations and where the data lives during processing.</p><p>Pandas runs eagerly. Every time you write a line like <code>df['value'] * 2</code>, the operation is executed right away. The result is held in memory. If your dataset is small, this is fast and efficient. But as data grows, so does memory usage. And unless you explicitly copy your DataFrame, you're often modifying it in place. That can be risky. You might overwrite something by accident or struggle to track what changed and when.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xh4R!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e45921-6643-40c8-81f2-a5645a8bee18_1261x1266.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xh4R!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e45921-6643-40c8-81f2-a5645a8bee18_1261x1266.png 424w, https://substackcdn.com/image/fetch/$s_!xh4R!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e45921-6643-40c8-81f2-a5645a8bee18_1261x1266.png 848w, https://substackcdn.com/image/fetch/$s_!xh4R!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e45921-6643-40c8-81f2-a5645a8bee18_1261x1266.png 1272w, 
https://substackcdn.com/image/fetch/$s_!xh4R!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e45921-6643-40c8-81f2-a5645a8bee18_1261x1266.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xh4R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e45921-6643-40c8-81f2-a5645a8bee18_1261x1266.png" width="332" height="333.31641554321965" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/37e45921-6643-40c8-81f2-a5645a8bee18_1261x1266.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:1266,&quot;width&quot;:1261,&quot;resizeWidth&quot;:332,&quot;bytes&quot;:145708,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/168646367?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e45921-6643-40c8-81f2-a5645a8bee18_1261x1266.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xh4R!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e45921-6643-40c8-81f2-a5645a8bee18_1261x1266.png 424w, https://substackcdn.com/image/fetch/$s_!xh4R!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e45921-6643-40c8-81f2-a5645a8bee18_1261x1266.png 848w, 
https://substackcdn.com/image/fetch/$s_!xh4R!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e45921-6643-40c8-81f2-a5645a8bee18_1261x1266.png 1272w, https://substackcdn.com/image/fetch/$s_!xh4R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e45921-6643-40c8-81f2-a5645a8bee18_1261x1266.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>PySpark works the other way around. It uses lazy execution. 
When you write <code>withColumn</code> or <code>filter</code>, nothing happens immediately. You're not processing data yet. You're building a plan. Spark collects all the steps you've described and waits until you trigger an action. That could be something like <code>.show()</code>, <code>.collect()</code>, or writing the data out. Only then does Spark start doing the actual work.</p><p>This is a big difference. Because Spark waits, it can look at the full plan and find the most efficient way to run it. It can skip steps that don&#8217;t matter. It can reorder operations. It can push filters down to the data source so less data needs to be loaded in the first place.</p><p>Another important point is mutability. Pandas allows you to change data in place. That can save memory, but it also opens the door to side effects. PySpark takes a different approach. Each transformation creates a new DataFrame. The original stays untouched. This leads to cleaner code and fewer surprises, especially in larger projects or shared environments.</p><p>So the technical difference isn&#8217;t just about memory or performance. It&#8217;s also about how they execute. Pandas is immediate and flexible, great for quick work. PySpark is deliberate and structured, built for scale and clarity.</p><h2><strong>Wrap Up</strong></h2><p>Technically, Pandas and PySpark are very different. Pandas runs in-memory on a single machine and is great for small to medium-sized datasets. PySpark was designed for distributed computing from the beginning. It can handle massive datasets by spreading the work across a cluster. That adds complexity, but also enables scalability.</p><p>Pandas works entirely in memory, which makes it very fast. But this comes with a hard limit: if Pandas runs out of memory, processing fails. One strategy to avoid this is to modify objects in place instead of replicating them. 
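</p><p>As a tiny, made-up sketch of the two styles (the DataFrame and column name here are purely illustrative):</p><pre><code class="language-python">import pandas as pd

df = pd.DataFrame({'value': [1, 2, 3]})

# Copy-based: assign() returns a new DataFrame, df itself stays untouched,
# but memory usage roughly doubles while both objects are alive
df_copy = df.assign(value=df['value'] * 2)

# In place: overwrites the column on df directly, avoiding a second full copy
df['value'] = df['value'] * 2</code></pre><p>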
From a memory point of view, this is the smartest thing you can do.</p><p>PySpark brings a more functional mindset. You don&#8217;t mutate, you transform. And although the syntax can be more verbose, it enforces a discipline that leads to cleaner code structure. That creates a predictable data flow and makes debugging easier. At least for me.</p><p>Today, I always use PySpark. Not just because of the scale it offers, but because I genuinely prefer the structure, clarity, and coding style. Even when my data would fit into memory, I find myself reaching for PySpark. It just feels right.</p><p></p>]]></content:encoded></item><item><title><![CDATA[Save 4–5 workdays of waiting per year with Databricks Serverless Compute]]></title><description><![CDATA[Databricks Serverless Compute starts within seconds.]]></description><link>https://www.thelakehousepath.com/p/save-45-workdays-of-waiting-per-year</link><guid isPermaLink="false">https://www.thelakehousepath.com/p/save-45-workdays-of-waiting-per-year</guid><dc:creator><![CDATA[Martin Debus]]></dc:creator><pubDate>Sat, 12 Jul 2025 09:33:30 GMT</pubDate><enclosure
url="https://substackcdn.com/image/fetch/$s_!FdfD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F964b5473-785e-4fb8-b438-b36b4017aef8_716x562.gif" length="0" type="image/gif"/><content:encoded><![CDATA[<p>Here&#8217;s how I calculated it:</p><p>Waiting for a Databricks cluster to start up is frustrating&#8212;especially when you just want to test something or run a small piece of code against your data.</p><p>Let&#8217;s assume you start a compute cluster 2&#8211;3 times a day and wait 4&#8211;5 minutes for it to be ready. Replace that with a 15-second startup time, and you&#8217;ll save over 40 hours of waiting throughout the year. That&#8217;s based on an average of 2.5 cluster starts per day, 200 workdays, and a 5-minute waiting period: 2.5 &#215; 200 &#215; 5 = 2,500 minutes per year, or roughly 41 hours.</p><p>There are still some limitations, like custom Python environments, but Databricks is addressing those. Meanwhile, we can enjoy the time saved running (and debugging) our code.</p><p>So, why not test serverless while you wait for your cluster to start?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FdfD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F964b5473-785e-4fb8-b438-b36b4017aef8_716x562.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FdfD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F964b5473-785e-4fb8-b438-b36b4017aef8_716x562.gif 424w, https://substackcdn.com/image/fetch/$s_!FdfD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F964b5473-785e-4fb8-b438-b36b4017aef8_716x562.gif 848w, 
https://substackcdn.com/image/fetch/$s_!FdfD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F964b5473-785e-4fb8-b438-b36b4017aef8_716x562.gif 1272w, https://substackcdn.com/image/fetch/$s_!FdfD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F964b5473-785e-4fb8-b438-b36b4017aef8_716x562.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FdfD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F964b5473-785e-4fb8-b438-b36b4017aef8_716x562.gif" width="716" height="562" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/964b5473-785e-4fb8-b438-b36b4017aef8_716x562.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:562,&quot;width&quot;:716,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:444331,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://thelakehousepath.substack.com/i/168136233?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F964b5473-785e-4fb8-b438-b36b4017aef8_716x562.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FdfD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F964b5473-785e-4fb8-b438-b36b4017aef8_716x562.gif 424w, 
https://substackcdn.com/image/fetch/$s_!FdfD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F964b5473-785e-4fb8-b438-b36b4017aef8_716x562.gif 848w, https://substackcdn.com/image/fetch/$s_!FdfD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F964b5473-785e-4fb8-b438-b36b4017aef8_716x562.gif 1272w, https://substackcdn.com/image/fetch/$s_!FdfD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F964b5473-785e-4fb8-b438-b36b4017aef8_716x562.gif 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div>]]></content:encoded></item><item><title><![CDATA[Welcome to The Lakehouse Path]]></title><description><![CDATA[Your resource for all things Databricks. New and helpful features, in-depth tutorials, and everything along my journey mastering The Lakehouse Path myself.]]></description><link>https://www.thelakehousepath.com/p/coming-soon</link><guid isPermaLink="false">https://www.thelakehousepath.com/p/coming-soon</guid><dc:creator><![CDATA[Martin Debus]]></dc:creator><pubDate>Fri, 11 Jul 2025 12:44:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!5ZtM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89a29407-5ef2-4752-9bd6-bcf212989a03_462x462.png" length="0" type="image/png"/><content:encoded><![CDATA[<p>Welcome to "The Lakehouse Path," a dedicated Substack focused on exploring how to build Data Lakehouses. </p><h1>Who I am</h1><p>I am Martin, a Data Professional since 2008. I started out as a Data Scientist in the publishing industry, learning the basics using SAS. After three years I moved on to grocery retail and built a team analyzing customer behaviour and building data products around the loyalty program. Over time, my role shifted toward architecting the next generation of our data platform. Still on SAS in 2015, I began exploring Python and ended up with Databricks in 2018 when it was first launched on Azure, which was the cloud provider of our choice back then. Since then, I have been designing and building Data Solutions with Databricks. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mcci!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fada6b19d-32be-4fe2-84cf-a646e9179953_3939x2845.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mcci!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fada6b19d-32be-4fe2-84cf-a646e9179953_3939x2845.jpeg 424w, https://substackcdn.com/image/fetch/$s_!mcci!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fada6b19d-32be-4fe2-84cf-a646e9179953_3939x2845.jpeg 848w, https://substackcdn.com/image/fetch/$s_!mcci!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fada6b19d-32be-4fe2-84cf-a646e9179953_3939x2845.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!mcci!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fada6b19d-32be-4fe2-84cf-a646e9179953_3939x2845.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mcci!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fada6b19d-32be-4fe2-84cf-a646e9179953_3939x2845.jpeg" width="1456" height="1052" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ada6b19d-32be-4fe2-84cf-a646e9179953_3939x2845.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1052,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1647392,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thelakehousepath.com/i/168071327?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fada6b19d-32be-4fe2-84cf-a646e9179953_3939x2845.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mcci!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fada6b19d-32be-4fe2-84cf-a646e9179953_3939x2845.jpeg 424w, https://substackcdn.com/image/fetch/$s_!mcci!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fada6b19d-32be-4fe2-84cf-a646e9179953_3939x2845.jpeg 848w, https://substackcdn.com/image/fetch/$s_!mcci!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fada6b19d-32be-4fe2-84cf-a646e9179953_3939x2845.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!mcci!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fada6b19d-32be-4fe2-84cf-a646e9179953_3939x2845.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>In 2022, I launched a boutique consultancy called SNOWGLOBE, based in Hamburg, Germany. We focus on enabling our clients to get the most out of their data with Azure Databricks. We have already built custom data solutions for the retail, construction, and financial industries. Our clients range from mid-sized to large-scale businesses. </p><h1>What to Expect</h1><p>In &#8220;The Lakehouse Path&#8221; I share what I am learning building data lakehouses and using Databricks. These are the main content categories you will see on this blog: </p><ul><li><p><strong>Databricks Bits &amp; Pieces</strong>: On LinkedIn, I share Bits &amp; Pieces about best practices using Databricks. From time to time, I will compile them into longer and more comprehensive posts on this blog. 
</p></li><li><p><strong>Deep-Dive Tutorials</strong>: Occasional in-depth guides and tutorials that explore complex topics and advanced techniques in Databricks.</p></li></ul><p>Thank you for joining me on "The Lakehouse Path."</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.thelakehousepath.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.thelakehousepath.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item></channel></rss>