Step 1 - First Impressions

Before diving into data exploration, let's take a peek at one of the files. It's always good to have an idea of the data we'll be working with!

Notes

Nice. The data files have "txt" extensions, but contain structured, tabular data. In other words - CSV! So we can use the Pandas library, which will make our lives easier.
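As a rough sketch of that observation, the snippet below writes a tiny stand-in file with a "txt" extension, peeks at its first raw lines, and then reads it with pandas. The file name, the comma delimiter, and the "lat"/"lon" column names are assumptions made for illustration; only the "dow" column is named in this analysis, and the real files may look different.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample contents; the real files may use other columns or delimiters.
SAMPLE = (
    "lat,lon,dow\n"
    "-15.251,-64.752,1\n"
    "-15.252,-64.751,1\n"
    "-15.671,-64.970,6\n"
)


def peek(path, n=5):
    """Return the first n raw lines of a text file."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for _, line in zip(range(n), handle)]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "user1.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE)

    first_lines = peek(path, n=2)
    # read_csv parses the content; the "txt" extension does not matter.
    readings = pd.read_csv(path)

print(first_lines)
print(readings.dtypes)
```

If the real files turn out to use another delimiter (tabs, semicolons), the `sep` argument of `read_csv` is the place to change it.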


Step 2 - Loading the Data

Now we load the contents of the other files into the same structure, tagging each reading with a user number so we can tell which file (data source) it came from.
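One way this loading step could look is sketched below. It is not necessarily the code used for this analysis: the in-memory file contents, the "lat"/"lon" column names, and the "user" column name are all assumptions, and in practice each source would be a path to one of the data files.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the per-user files (user number -> file contents).
FILES = {
    1: "lat,lon,dow\n-15.25,-64.75,1\n-15.67,-64.97,6\n",
    2: "lat,lon,dow\n-15.25,-64.75,2\n",
}


def load_all(sources):
    """Read every source and tag its rows with the user number."""
    frames = []
    for user, source in sources.items():
        frame = pd.read_csv(source)
        frame["user"] = user  # records which file each row came from
        frames.append(frame)
    # ignore_index gives the combined frame one continuous row index.
    return pd.concat(frames, ignore_index=True)


readings = load_all({user: io.StringIO(text) for user, text in FILES.items()})
print(len(readings), "readings for", readings["user"].nunique(), "users")
```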

Notes

We have a reasonably large dataset: 137824 readings for our 4 users.


Step 3 - Plotting Movement

Since this data describes user movements, let's see what it looks like when we project it onto a surface - it's good to have a feeling for our users' movement patterns. We could have used a map, but these coordinates were transformed for privacy reasons, so the map would just be noise.
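A minimal sketch of such a projection is shown below, assuming matplotlib and a scatter plot of longitude against latitude with one colour per user. The plotting library, the chart type, and the synthetic readings are assumptions; the original figure may have been produced differently.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic readings; column names are assumptions.
readings = pd.DataFrame({
    "user": [1, 1, 2, 2],
    "lat": [-15.25, -15.67, -15.25, -15.07],
    "lon": [-64.75, -64.97, -64.75, -64.65],
})

fig, ax = plt.subplots()
for user, group in readings.groupby("user"):
    # One scatter layer per user, so overlapping locations stay visible.
    ax.scatter(group["lon"], group["lat"], s=10, alpha=0.5, label=f"User {user}")
ax.set_xlabel("Longitude (transformed)")
ax.set_ylabel("Latitude (transformed)")
ax.legend()
```

A low `alpha` helps here because readings from different users pile up on the same coordinates.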

Notes

We can easily see in Figure 1 that there is some overlap between the locations of all users. We cannot conclude much, however, since the data was transformed.


Step 4 - Finding Important Locations

Let's see if there are any relevant locations for each of our users - i.e., the places where the users spend most of their time. Assuming that readings were taken at regular intervals, the most relevant locations for a user would be the ones with the most readings. In other words, one could expect relevant locations to have a higher reading density.

So we'll create a density heatmap for the readings of each user.
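The numbers behind a density heatmap are just a two-dimensional histogram, so the densest cell can also be found without drawing anything. The sketch below uses NumPy on synthetic readings for a single user; the bin counts, the coordinate extent, and the data itself are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic readings for one user: a dense cluster plus scattered noise.
lat = np.concatenate([rng.normal(-15.25, 0.002, 500), rng.uniform(-15.8, -15.0, 100)])
lon = np.concatenate([rng.normal(-64.75, 0.002, 500), rng.uniform(-65.0, -64.6, 100)])


def densest_bin(lat, lon, bins, extent):
    """Return the (lat range, lon range, count) of the fullest histogram cell."""
    counts, lat_edges, lon_edges = np.histogram2d(lat, lon, bins=bins, range=extent)
    i, j = np.unravel_index(np.argmax(counts), counts.shape)
    return (
        (lat_edges[i], lat_edges[i + 1]),
        (lon_edges[j], lon_edges[j + 1]),
        int(counts[i, j]),
    )


lat_range, lon_range, count = densest_bin(
    lat, lon, bins=(40, 20), extent=[(-15.8, -15.0), (-65.0, -64.6)]
)
print(lat_range, lon_range, count)
```

Under the regular-sampling assumption, the cell with the highest count is the place where this synthetic user spends the most time.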

Notes

It's easy to see from the heatmaps and the marginal histograms of Figures 2-5 that, for most users, location readings fall mostly into a couple of coordinate bins. Put simply, each user has a couple of relevant locations (they may have more, but the two most relevant locations per user are enough for this analysis). Here are the coordinate ranges of those locations, as extracted from the charts above (just hover the cursor over the heatmap cells/histogram bins).

          Location 1                               Location 2
          Latitude (range)    Longitude (range)    Latitude (range)    Longitude (range)
User 1    [-15.30, -15.25]    [-64.76, -64.74]     [-15.70, -15.65]    [-64.98, -64.96]
User 2    [-15.26, -15.24]    [-64.76, -64.74]     [-15.08, -15.06]    [-64.66, -64.64]
User 3    [-15.26, -15.24]    [-64.76, -64.74]     [-15.46, -15.44]    [-64.74, -64.72]
User 4    [-15.26, -15.24]    [-64.76, -64.74]     [-15.04, -15.02]    [-64.61, -64.60]


Step 5 - Adding Meaning to Locations

Now that we have found the most relevant locations for each of our users, we can try to learn more about those locations. So let's explore another important dimension in the dataset - time. Our readings already include the day of the week (column "dow"), so we can easily see how many readings were taken on each day of the week (i.e., Monday readings vs. Tuesday readings, and so on).
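A sketch of that count is shown below. It labels each reading with the location whose coordinate range contains it, then tallies readings per location and day of the week. The ranges are User 1's from the table in Step 4; the readings are synthetic, and the "lat"/"lon" column names and the encoding of "dow" (0 = Monday) are assumptions.

```python
import pandas as pd

# Coordinate ranges for User 1, taken from the table in Step 4.
LOCATIONS = {
    "Location 1": {"lat": (-15.30, -15.25), "lon": (-64.76, -64.74)},
    "Location 2": {"lat": (-15.70, -15.65), "lon": (-64.98, -64.96)},
}

# Synthetic readings; the "dow" encoding (0 = Monday) is an assumption.
readings = pd.DataFrame({
    "lat": [-15.27, -15.28, -15.26, -15.68, -15.66, -15.00],
    "lon": [-64.75, -64.75, -64.75, -64.97, -64.97, -64.00],
    "dow": [0, 0, 1, 5, 6, 2],
})

# Label each reading with the location whose box contains it (None otherwise).
readings["location"] = None
for name, box in LOCATIONS.items():
    inside = readings["lat"].between(*box["lat"]) & readings["lon"].between(*box["lon"])
    readings.loc[inside, "location"] = name

# Rows are locations, columns are days of the week, cells are reading counts.
counts = (
    readings.dropna(subset=["location"])
    .groupby(["location", "dow"])
    .size()
    .unstack(fill_value=0)
)
print(counts)
```

Readings that fall outside both ranges are dropped before counting, so they do not distort the per-day totals.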

Notes

There is a considerable number of "Location 1" readings on every working day of the week, while "Location 2" has readings every day and is especially prevalent on weekends. In that sense we can assume:


Step 6 - User Routines

Now we can focus on something more interesting than just coordinates and time. We can get a sense of our users' routines! Let's try to gain some insight into our users' daily lives.


Conclusions

There! We can now wrap up our analysis with three important findings about our users: