Credit: Data Science Central
Last time, I posted Part 2 of a blog trilogy on data programming with Python. That article revolved around NumPy, a comprehensive library created in 2005 that extends the Python cor… “large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.” Besides introducing a wealth of high-performance data structures and mathematical functions, NumPy changed the data programming metaphor in Python from procedural to specification-based: instead of looping over individual elements, you declare operations on whole arrays.
Part 1 demonstrated basic data programming with Python. There, I resurrected scripts written 10 years ago that used core Python data structures, functions, and loop-style code to assemble a Python list for analyzing stock market returns. Fun, but a lot more work than performing the same tasks with NumPy in Part 2.
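To make the procedural-versus-specification contrast concrete, here is a minimal sketch that computes simple daily returns from a short series of index levels twice: once with a core-Python loop in the spirit of Part 1, and once as a single vectorized NumPy expression in the spirit of Part 2. The levels are made-up illustrative numbers, not Russell 3000 data, and this is not the code from either earlier post.

```python
import numpy as np

# Hypothetical index levels, for illustration only (not real market data).
levels = [100.0, 101.0, 99.99, 102.0]

# Procedural style: walk the list and append one return at a time.
returns_loop = []
for i in range(1, len(levels)):
    returns_loop.append(levels[i] / levels[i - 1] - 1)

# Specification style: state the whole computation on the array at once.
# arr[1:] is "today", arr[:-1] is "yesterday"; division is element-wise.
arr = np.array(levels)
returns_vec = arr[1:] / arr[:-1] - 1

print(returns_loop)
print(returns_vec)
```

Both versions produce the same three returns; the NumPy one simply says *what* to compute and leaves the iteration to compiled code.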
This Part 3 post replicates the work done in Parts 1 and 2 using the even more productive Pandas library. In Pandas, core Python data structures such as lists and dictionaries, and functional constructs like list comprehensions, serve mainly to feed the Pandas beast.
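A minimal sketch of that "feed the beast" pattern, assuming a handful of hypothetical (date, level) records: core Python comprehensions only shape the raw input into a dictionary of lists, and Pandas takes over from there for indexing and vectorized math. The column names `date`, `level`, and `ret` are illustrative choices, not taken from the original post.

```python
import pandas as pd

# Hypothetical raw records, e.g. as parsed from a text file; values are made up.
raw = [
    ("2018-01-02", 100.0),
    ("2018-01-03", 101.0),
    ("2018-01-04", 99.99),
]

# Core Python just shapes the input: a dict of lists built by comprehensions.
data = {
    "date": [pd.Timestamp(d) for d, _ in raw],
    "level": [lvl for _, lvl in raw],
}

# Pandas does the rest: a date index plus a vectorized daily-return column.
df = pd.DataFrame(data).set_index("date")
df["ret"] = df["level"].pct_change()  # first row is NaN (no prior day)
print(df)
```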
Since the Part 3 code is simpler than the NumPy code of Part 2, and much less involved than the list processing of Part 1, I’ve added a few graphs at the end, implemented with the Seaborn statistical data visualization library, which is built on top of the Python mainstay matplotlib. Seaborn has grown by leaps and bounds recently and is now a legitimate competitor to R’s ggplot2 for statistical graphics.
In the analysis that follows, I focus on the performance of the Russell 3000 index, a Wilshire 5000-like portfolio for “measuring the market.” I first download two files, a year-to-date file and a history file, which together provide final daily Russell 3000 index levels starting in 2005. Attributes include index name, date, level without dividends reinvested, and level with dividends reinvested. I then wrangle the data with Pandas to reach the desired end state.
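The wrangling step can be sketched roughly as follows. This is a hedged illustration, not the author's code: the two downloads are replaced by small in-memory CSV strings, and the column names (`index_name`, `date`, `level`, `level_tr`) and values are hypothetical. The sketch stacks the history and year-to-date frames, drops a date that appears in both files, sorts chronologically, and computes daily returns with and without dividends reinvested.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the two downloaded files; the layout and numbers
# are assumptions for illustration, not the vendor's actual format.
history_csv = """index_name,date,level,level_tr
Russell 3000,2017-12-27,100.0,200.0
Russell 3000,2017-12-28,101.0,202.0
Russell 3000,2017-12-29,100.0,200.5
"""
ytd_csv = """index_name,date,level,level_tr
Russell 3000,2017-12-29,100.0,200.5
Russell 3000,2018-01-02,102.0,204.6
"""

frames = [pd.read_csv(io.StringIO(s), parse_dates=["date"])
          for s in (history_csv, ytd_csv)]

# Stack history + year-to-date, keep one row per date (the later file wins),
# and order the rows chronologically under a date index.
r3000 = (pd.concat(frames, ignore_index=True)
           .drop_duplicates(subset="date", keep="last")
           .sort_values("date")
           .set_index("date"))

# Daily returns: price-only level and dividends-reinvested (total return) level.
r3000["ret"] = r3000["level"].pct_change()
r3000["ret_tr"] = r3000["level_tr"].pct_change()
print(r3000)
```

Using `keep="last"` assumes the year-to-date file is the more current source for any overlapping date; if the files never overlap, the `drop_duplicates` call is simply a no-op safeguard.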
The technology used for all three articles consists of JupyterLab 0.32.1, Anaconda Python 3.6.5, NumPy 1.14.3, and Pandas 0.23.0.
See the remainder of the blog here.