Python, Pandas, and other Data Analysis animals

A note before we start: this is by no means an original analysis, just my attempt at learning something new. With that said:

A slow period at work and a desire to learn something new have had me fumbling around with Python for data analysis over the last few weeks. I've dabbled in Python before, mainly for parsing the rugby data collected in a previous post. However, this time I wanted to see what I could do with Python from start to finish.

So I trawled the internet for resources and training material, and came across an excellent blog by yhat that uses their Rodeo IDE for data science. Be sure to check it out. With Rodeo downloaded and my iPhone Health app's step counter data extracted, I decided to repeat their analysis and train my Python skills.

The source

First of all I extracted my Health app data from my iPhone and loaded it into Rodeo to get a sense of the data it collects. Depending on how much you use the Health app's features, there's actually quite a bit of data you can pull from it, e.g. heart rates, distances and steps.
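For anyone wanting to try the same thing, here is a rough sketch of how the step records could be pulled out of a Health export and summed per day. It assumes the export is an XML file of `Record` elements with `type`, `startDate` and `value` attributes, and that step counts use the type `HKQuantityTypeIdentifierStepCount`; the `daily_steps` helper and the tiny inline sample are made up for illustration, so check them against your own export before relying on this.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime

# Assumed record type for step counts in the Health export
STEP_TYPE = "HKQuantityTypeIdentifierStepCount"


def daily_steps(xml_source):
    """Sum step-count records per calendar day (hypothetical helper).

    xml_source can be a file path or a binary file-like object.
    """
    totals = defaultdict(int)
    # iterparse streams the document, which helps because real exports can be large
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "Record" and elem.get("type") == STEP_TYPE:
            # Assumed timestamp layout "2016-07-14 08:15:00 +0100";
            # only the date part is used, so a record is credited to its start day
            day = datetime.strptime(elem.get("startDate")[:10], "%Y-%m-%d").date()
            totals[day] += int(float(elem.get("value")))
        elem.clear()  # free memory for elements we are done with
    return dict(sorted(totals.items()))


# Tiny made-up sample standing in for a real export file
sample = b"""<HealthData>
  <Record type="HKQuantityTypeIdentifierStepCount" startDate="2016-07-14 08:15:00 +0100" value="120"/>
  <Record type="HKQuantityTypeIdentifierStepCount" startDate="2016-07-14 18:02:00 +0100" value="80"/>
  <Record type="HKQuantityTypeIdentifierHeartRate" startDate="2016-07-14 18:05:00 +0100" value="72"/>
  <Record type="HKQuantityTypeIdentifierStepCount" startDate="2016-07-15 09:30:00 +0100" value="50"/>
</HealthData>"""

steps = daily_steps(io.BytesIO(sample))
for day, total in steps.items():
    print(day, total)
```

With a real export you would pass the path to the XML file instead of the in-memory sample. The resulting dictionary can be handed to `pd.Series(steps)` if you want to carry on in pandas.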


OK, so a quick time series doesn't show much, just that there was a period (around day 550) when either I didn't walk at all, or maybe just didn't have my phone. The flat part of the time series, up to day 80 or so, is the period before I received the iPhone. The first point is actually 9 steps; someone must have been messing around with my phone in the factory.

The interesting


I'm pretty sure I walk much less during the weekend than the weekdays...


Yup, that definitely confirms it. The plot above shows the frequency distribution of my steps on weekends (purple) and weekdays (teal), and the conclusion is clear: I need to walk more at the weekend. The weekday distribution is clearly bi-modal, and I believe this is due to a 'weekend' effect, i.e. on bank holidays I do very little walking, just like at weekends.
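The weekend/weekday split can be sketched in pandas along these lines. This is only an illustration under my own assumptions: the step totals below are made-up numbers rather than my real data, and the column name `is_weekend` is just a label I picked.

```python
import pandas as pd

# Made-up daily totals for illustration only; 2016-07-04 is a Monday
steps = pd.Series(
    [9000, 8500, 9500, 8800, 9200, 3000, 2500,
     9100, 8700, 9300, 8900, 9400, 2800, 3200],
    index=pd.date_range("2016-07-04", periods=14, freq="D"),
    name="steps",
)

df = steps.to_frame()
# dayofweek runs Monday=0 ... Sunday=6, so 5 and 6 are Saturday and Sunday
df["is_weekend"] = df.index.dayofweek >= 5

# One row per group: False = weekday, True = weekend
summary = df.groupby("is_weekend")["steps"].agg(["count", "mean", "median"])
print(summary)
```

To get the two overlaid histograms, each group can be plotted with `Series.plot.hist` (matplotlib required). Bank holidays would need their own list of dates added to the weekend flag, which this sketch does not attempt.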

I'm currently using two phones, and tend to have both with me most days. My second phone is an Android device, the WileyFox Swift. I have the Google Fit app installed on it, so I could load up the step data from this device and perform a quick comparison between the two:


A negative step difference means the Android phone recorded more steps that day, so it looks like there are actually plenty of occasions where I've forgotten to carry my iPhone.
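A comparison like this boils down to lining the two daily series up by date and subtracting. The sketch below assumes the difference is taken as iPhone minus Android (so negative values mean Android counted more), uses invented numbers, and picks an arbitrary 1000-step threshold for "probably left the iPhone at home".

```python
import pandas as pd

# Invented daily totals for both devices, for illustration only
days = pd.date_range("2016-08-01", periods=5, freq="D")
iphone = pd.Series([8000, 0, 7500, 6000, 0], index=days, name="iphone")
android = pd.Series([7900, 6500, 7600, 6100, 5000], index=days, name="android")

# Align on date; a day missing from one device counts as zero steps
both = pd.concat([iphone, android], axis=1).fillna(0)

# Assumed sign convention: iPhone minus Android
both["diff"] = both["iphone"] - both["android"]

# Arbitrary threshold: Android ahead by more than 1000 steps
forgot_iphone = both[both["diff"] < -1000]
print(forgot_iphone)
```

Small differences in either direction are to be expected, since the two phones will not count steps identically; only the large negative gaps point to a day without the iPhone.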

For this final plot, I've taken the larger of the iPhone and Android step counts for each day, and then split the days based on whether the date falls before or after 14th July 2016. Why this date? That's when PokemonGo came out in the UK...


One can clearly see a bi-modal shape in both the pre- and post-PokemonGo step distributions; however, the standard deviation of the post-PokemonGo distribution is clearly larger.
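The daily maximum and the before/after split could be done roughly as follows. The numbers are again invented, and counting the release day itself as "post" is my own assumption for this sketch.

```python
import pandas as pd

# Invented daily totals around the release date, for illustration only
days = pd.date_range("2016-07-10", periods=8, freq="D")
both = pd.DataFrame(
    {
        "iphone": [4000, 0, 5000, 4500, 9000, 3000, 0, 12000],
        "android": [3900, 4200, 4800, 4600, 8800, 3100, 7000, 11000],
    },
    index=days,
)

# Best estimate per day: whichever phone recorded more steps
best = both.max(axis=1)

# UK release date; the day itself is assigned to "post" (an assumption)
release = pd.Timestamp("2016-07-14")
pre = best[best.index < release]
post = best[best.index >= release]

# Sample standard deviation of each period
print("pre :", pre.mean(), pre.std())
print("post:", post.mean(), post.std())
```

Taking the maximum rather than the sum avoids double counting on days when both phones were in my pocket.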

The conclusion

Well, what does all this tell me?
  1. I need to walk more on the weekends
  2. PokemonGo is helping with that, just not enough
So I'm going out for a walk.
