April 9th, 2020 | 32 mins 1 sec
data processing, data science, pyspark, python
Apache Spark is a unified analytics engine for large-scale data processing.
PySpark blends the powerful Spark big data processing engine with the Python programming language to provide a data analysis platform that can scale up for nearly any task.
November 30th, 2019 | 22 mins 48 secs
data engineering, data pipelines, data science, machine learning, pipeline debt, pipeline tests
Software developers have long known that automated testing is essential for managing complex codebases. Great Expectations brings the same discipline, confidence, and acceleration to data science and engineering teams.
January 31st, 2019 | 33 mins 34 secs
corporate training, data science, machine learning, python
Matt Harrison is an author and instructor who teaches Python and data science. This episode focuses on his training company, MetaSnake, and on corporate training.
December 10th, 2018 | 30 mins 47 secs
data engineering, data pipelines, data science, etl, machine learning, software engineering
Data science, data engineering, data analysis, and machine learning are part of the recent massive growth of Python.
But what is data science, really?
Vicki Boykis works on machine learning and data engineering projects across a variety of industries, and joins this episode to help us understand what data science really is.
November 30th, 2017 | 37 mins 14 secs
data science, fuzz testing, software testing
A discussion with Katharine Jarmul, aka kjam, about some of the challenges of testing in data science.