A former insurance professional turned data scientist, Jonathan is obsessed with finding better ways to express thoughts through code. During the day, he works as a data science lead at EPAM Systems, where he tries to explain simply what he learned the hard way. He is also the author of the book "PySpark in Action", which greatly simplifies big data analysis. Mechanical keyboard enthusiast, armchair DIY-ist, and soon-to-be ex-procrastinator, when the right time comes.
April 9th, 2020 | 32 mins 1 sec
data processing, data science, pyspark, python
Apache Spark is a unified analytics engine for large-scale data processing.
PySpark blends the powerful Spark big data processing engine with the Python programming language to provide a data analysis platform that can scale up for nearly any task.