Python has an extremely rich and healthy ecosystem of data science tools (known as the PyData ecosystem). Python's popularity for data science is largely due to the strength of its core libraries and its high productivity for prototyping and for building small, reusable systems. Unfortunately, this ecosystem can be a bit confusing for those new to Python, and even for experienced programmers moving to Python for its excellent data analysis capabilities.
Peadar will touch on pure Python, NumPy, Pandas, Blaze, xray, bcolz, Dask, and Spark, with a focus on the use cases for each one. What do you do when your data doesn't fit in memory? When do you need a functional programming approach, and when do you need compression? Where does Dask fit into all of this? When do you need Spark?
- Peadar Coyle, data scientist at Channel 4