This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.
The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory. Learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions in a casual and fun setting. SEASON 1 DATA BROS Eldad and Boaz Farkash shared the same stuffed toys growing up as well as a big passion for data. After founding Sisense and building it to become a high-growth analytics unicorn, they moved on to their next venture, Firebolt, a leading hig ...
Welcome to The Data Flowcast: Mastering Airflow for Data Engineering & AI — the podcast where we keep you up to date with insights and ideas propelling the Airflow community forward. Join us each week, as we explore the current state, future and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workflow management system to meet the ever-evolving needs of data engineering and AI ecosystems. Podcast Webpage: https://www.astronomer.io/podcast/
Databases and data engineering episodes of Software Engineering Daily
Discussions around Data Engineering
"Unlocking the Power of Data: A Guide for Leaders and Executives" As a leader or executive, you know the importance of data in driving business decisions and staying ahead of the competition. But, with the increasing amount of data generated daily, it can be overwhelming to know where to start and how to utilize this valuable asset effectively. This blog, with multiple topics, addresses the technical terminology in data engineering and analytics on the cloud.
Bridging Code and UI in Data Orchestration with Kestra
44:30
Summary In this episode of the Data Engineering Podcast, Anna Geller talks about the integration of code and UI-driven interfaces for data orchestration. Anna defines data orchestration as automating the coordination of workflow nodes that interact with data across various business functions, discussing how it goes beyond ETL and analytics to enabl…
Tech Stacks and Tradeoffs: Xudo's Founder on Picking the Right Tools for BI Success
24:56
Wouter Trappers is the founder of Xudo and shares his slightly unconventional path from philosopher to data consultant with the Bros in this latest episode of The Data Engineering Show. Wouter’s grounding in philosophy has proved to be a shaping influence on his approach to business intelligence. Much more than just a software solution, for Wouter,…
How Uber Manages 1 Million Daily Tasks Using Airflow, with Shobhit Shah and Sumit Maheshwari
28:44
When data orchestration reaches Uber’s scale, innovation becomes a necessity, not a luxury. In this episode, we discuss the innovations behind Uber’s unique Airflow setup. With our guests Shobhit Shah and Sumit Maheshwari, both Staff Software Engineers at Uber, we explore how their team manages one of the largest data workflow systems in the world.…
Streaming Data Into The Lakehouse With Iceberg And Trino At Going
39:49
In this episode, I had the pleasure of speaking with Ken Pickering, VP of Engineering at Going, about the intricacies of streaming data into a Trino and Iceberg lakehouse. Ken shared his journey from product engineering to becoming deeply involved in data-centric roles, highlighting his experiences in ecommerce and InsurTech. At Going, Ken leads th…
An Opinionated Look At End-to-end Code Only Analytical Workflows With Bruin
56:11
Summary The challenges of integrating all of the tools in the modern data stack have led to a new generation of tools that focus on a fully integrated workflow. At the same time, there have been many approaches to how much of the workflow is driven by code vs. not. Burak Karakan is of the opinion that a fully integrated workflow that is driven entir…
Building Resilient Data Systems for Modern Enterprises at Astrafy with Andrea Bombino
28:29
Efficient data orchestration is the backbone of modern analytics and AI-driven workflows. Without the right tools, even the best data can fall short of its potential. In this episode, Andrea Bombino, Co-Founder and Head of Analytics Engineering at Astrafy, shares insights into his team’s approach to optimizing data transformation and orchestration …
Feldera: Bridging Batch and Streaming with Incremental Computation
47:36
Summary In this episode of the Data Engineering Podcast, the creators of Feldera talk about their incremental compute engine designed for continuous computation of data, machine learning, and AI workloads. The discussion covers the concept of incremental computation, the origins of Feldera, and its unique ability to handle both streaming and batch …
Data Rewind: Conversation Highlights from Zach Wilson, Matthew Housley, Joe Reis, and Krishnan Viswanathan
28:02
In this special roundup episode of The Data Engineering Show, the Bros revisit some of the best bits from episodes with data thought leaders Zach Wilson, Matthew Housley, Joe Reis, and Krishnan Viswanathan, spotlighting essential trends and lessons learned across the evolving data engineering landscape. From data observability to bridging academia…
Inside Airflow 3: Redefining Data Engineering with Vikram Koka
30:08
Data orchestration is evolving faster than ever and Apache Airflow 3 is set to revolutionize how enterprises handle complex workflows. In this episode, we dive into the exciting advancements with Vikram Koka, Chief Strategy Officer at Astronomer and PMC Member at The Apache Software Foundation. Vikram shares his insights on the evolution of Airflow…
Accelerate Migration Of Your Data Warehouse with Datafold's AI Powered Migration Agent
48:50
Summary Gleb Mezhanskiy, CEO and co-founder of Datafold, joins Tobias Macey to discuss the challenges and innovations in data migrations. Gleb shares his experiences building and scaling data platforms at companies like Autodesk and Lyft, and how these experiences inspired the creation of Datafold to address data quality issues across teams. He out…
Building a Data-Driven HR Platform at 15Five with Guy Dassa
20:25
Data and AI are revolutionizing HR, empowering leaders to measure performance and drive strategic decisions like never before. In this episode, we explore the transformation of HR technology with Guy Dassa, Chief Technology Officer at 15Five, as he shares insights into their evolving data platform. Guy discusses how 15Five equips HR leaders with to…
Bring Vector Search And Storage To The Data Lake With Lance
58:01
Summary The rapid growth of generative AI applications has prompted a surge of investment in vector databases. While there are numerous engines available now, Lance is designed to integrate with data lake and lakehouse architectures. In this episode Weston Pace explains the inner workings of the Lance format for table definitions and file storage, …
The Role of Python in Shaping the Future of Data Platforms with DLT
54:08
Summary In this episode of the Data Engineering Podcast, Adrian Broderieux and Marcin Rudolph, co-founders of DLT Hub, delve into the principles guiding DLT's development, emphasizing its role as a library rather than a platform, and its integration with lakehouse architectures and AI application frameworks. The episode explores the impact of the P…
Build Your Data Transformations Faster And Safer With SDF
42:36
Summary In this episode of the Data Engineering Podcast Lukas Schulte, co-founder and CEO of SDF, explores the development and capabilities of this fast and expressive SQL transformation tool. From its origins as a solution for addressing data privacy, governance, and quality concerns in modern data management, to its unique features like static an…
The Intersection of AI and Data Management at Dosu with Devin Stein
20:18
Unlocking engineering productivity goes beyond coding — it’s about managing knowledge efficiently. In this episode, we explore the innovative ways in which Dosu leverages Airflow for data orchestration and supports the Airflow project. Devin Stein, Founder of Dosu, shares his insights on how engineering teams can focus on value-added work by automa…
The Resurgence of SQL: Insights from Ryanne Dolan from LinkedIn
32:57
In this episode of The Data Engineering Show, the bros, Eldad and Benjamin, are joined by Ryanne Dolan from LinkedIn to discuss the innovative Hoptimator (H2) project. This conversation reveals how LinkedIn has improved its data pipelines by automating the setup and management of complex workflows. Together they cover: Automated Data Pipelines: Ryan…
Scaling Airbyte: Challenges and Milestones on the Road to 1.0
57:11
Summary Airbyte is one of the most prominent platforms for data movement. Over the past 4 years they have invested heavily in solutions for scaling the self-hosted and cloud operations, as well as the quality and stability of their connectors. As a result of that hard work, they have declared their commitment to the future of the platform with a 1.…
AI-Powered Vehicle Automation at Ford Motor Company with Serjesh Sharma
26:11
Harnessing data at scale is the key to driving innovation in autonomous vehicle technology. In this episode, we uncover how advanced orchestration tools are transforming machine learning operations in the automotive industry. Serjesh Sharma, Supervisor ADAS Machine Learning Operations (MLOps) at Ford Motor Company, joins us to discuss the challenge…
From Task Failures to Operational Excellence at GumGum with Brendan Frick
24:06
Data failures are inevitable but how you manage them can define the success of your operations. In this episode, we dive deep into the challenges of data engineering and AI with Brendan Frick, Senior Engineering Manager, Data at GumGum. Brendan shares his unique approach to managing task failures and DAG issues in a high-stakes ad-tech environment.…
Enhancing Data Accessibility and Governance with Gravitino
38:41
Summary As data architectures become more elaborate and the number of applications of data increases, it becomes increasingly challenging to locate and access the underlying data. Gravitino was created to provide a single interface to locate and query your data. In this episode Junping Du explains how Gravitino works, the capabilities that it unloc…
From Sensors to Datasets: Enhancing Airflow at Astronomer with Maggie Stark and Marion Azoulai
22:25
A 13% reduction in failure rates: this is how two data scientists at Astronomer revolutionized their data pipelines using Apache Airflow. In this episode, we enter the world of data orchestration and AI with Maggie Stark and Marion Azoulai, both Senior Data Scientists at Astronomer. Maggie and Marion discuss how their team re-architected their use …
Mastering Data Orchestration with Airflow at M Science with Ben Tallman
24:36
Mastering the flow of data is essential for driving innovation and efficiency in today’s competitive landscape. In this episode, we explore the evolution of data orchestration and the pivotal role of Apache Airflow in modern data workflows. Ben Tallman, Chief Technology Officer at M Science, joins us and shares his extensive experience with Airflow,…
Enhancing Business Metrics With Airflow at Artlist with Hannan Kravitz
23:51
Data orchestration is revolutionizing the way companies manage and process data. In this episode, we explore the critical role of data orchestration in modern data workflows and how Apache Airflow is used to enhance data processing and AI model deployment. Hannan Kravitz, Data Engineering Team Leader at Artlist, joins us to share his insights on lev…
Cutting-Edge Data Engineering at Teya with Alexandre Magno Lima Martins
23:46
Data engineering is constantly evolving and staying ahead means mastering tools like Apache Airflow. In this episode, we explore the world of data engineering with Alexandre Magno Lima Martins, Senior Data Engineer at Teya. Alexandre talks about optimizing data workflows and the smart solutions they've created at Teya to make data processing easier…
The Evolution of DataOps: Insights from DataKitchen's CEO
53:30
Summary In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Chris Berg, CEO of DataKitchen, to discuss his ongoing mission to simplify the lives of data engineers. Chris explains the challenges faced by data engineers, such as constant system failures, the need for rapid changes, and high customer demands. Chris delves …
Achieving Data Reliability: The Role of Data Contracts in Modern Data Management
49:26
Summary Data contracts are both an enforcement mechanism for data quality, and a promise to downstream consumers. In this episode Tom Baeyens returns to discuss the purpose and scope of data contracts, emphasizing their importance in achieving reliable analytical data and preventing issues before they arise. He explains how data contracts can be us…
Airflow Strategies for Business Efficiency at Campbell with Larry Komenda
26:10
Managing data workflows well can change the game for any company. In this episode, we talk about how Airflow makes this possible. Larry Komenda, Chief Technology Officer at Campbell, shares how Airflow supports their operations and improves efficiency. Larry discusses his role at Campbell, their switch to Airflow, and its impact. We look at their st…
How Generative AI Is Impacting Data Engineering Teams
54:45
Summary Generative AI has rapidly gained adoption for numerous use cases. To support those applications, organizational data platforms need to add new features and data teams have increased responsibility. In this episode Lior Gavish, co-founder of Monte Carlo, discusses the various ways that data teams are evolving to support AI powered features a…
How Laurel Uses Airflow To Enhance Machine Learning Pipelines with Vincent La and Jim Howard
23:58
The world of timekeeping for knowledge workers is transforming through the use of AI and machine learning. Understanding how to leverage these technologies is crucial for improving efficiency and productivity. In this episode, we’re joined by Vincent La, Principal Data Scientist at Laurel, and Jim Howard, Principal Machine Learning Engineer at Laure…
The Role of Product Managers in Data-Centric Organizations
52:58
Summary In this episode Praveen Gujar, Director of Product at LinkedIn, talks about the intricacies of product management for data and analytical platforms. Praveen shares his journey from Amazon to Twitter and now LinkedIn, highlighting his extensive experience in building data products and platforms, digital advertising, AI, and cloud services. H…
Neon: A Serverless And Developer Friendly Postgres
57:43
Summary Postgres is one of the most widely respected and liked database engines ever. To make it even easier for developers to use, Nikita Shamgunov decided to make it serverless, so that it can scale from zero to infinity. In this episode he explains the engineering involved to make that possible, as well as the numerous details that he an…
Databases underpin almost every user experience on the web, but scaling a database is one of the most fundamental infrastructure challenges in software development. PlanetScale offers a MySQL platform that is managed and highly scalable. Sam Lambert is the CEO of PlanetScale and he joins the show to talk about why he started the platform, scaling …
Improve Data Quality Through Engineering Rigor And Business Engagement With Synq
59:48
Summary This episode features an insightful conversation with Petr Janda, the CEO and founder of Synq. Petr shares his journey from being an engineer to founding Synq, emphasizing the importance of treating data systems with the same rigor as engineering systems. He discusses the challenges and solutions in data reliability, including the need for …
How Vibrant Planet's Self-Healing Pipelines Revolutionize Data Processing
23:51
Discover the cutting-edge methods Vibrant Planet uses to revolutionize geospatial data processing and resource management. In this episode, we delve into the intricacies of scaling geospatial data processing and resource allocation with experts from Vibrant Planet. Joining us are Cyrus Dukart, Engineering Lead, and David Sacerdote, Staff Software En…