How to Evaluate Machine Learning Algorithms for Human Activity Recognition
Human activity recognition is the problem of classifying sequences of accelerometer data, recorded by specialized harnesses or smartphones, into known, well-defined movements.
Classical approaches to the problem involve hand-crafting features from the time series data based on fixed-size windows and training machine learning models, such as ensembles of decision trees. The difficulty is that this feature engineering requires deep expertise in the field.
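To make the classical approach concrete, here is a minimal sketch of splitting one accelerometer axis into fixed-size, overlapping windows and computing a few hand-crafted summary statistics per window. The window length of 128 samples with 50% overlap, the helper names `sliding_windows` and `handcrafted_features`, the synthetic signal, and the choice of features (mean, standard deviation, min, max) are all illustrative assumptions, not the feature set used to prepare the dataset.

```python
import numpy as np

def sliding_windows(signal, window_size, step):
    """Split a 1D signal into fixed-size windows; a trailing partial window is dropped."""
    if len(signal) < window_size:
        raise ValueError("signal is shorter than one window")
    n_windows = 1 + (len(signal) - window_size) // step
    return np.stack([signal[i * step : i * step + window_size] for i in range(n_windows)])

def handcrafted_features(windows):
    """Compute simple per-window summary statistics (one row per window)."""
    return np.column_stack([
        windows.mean(axis=1),
        windows.std(axis=1),
        windows.min(axis=1),
        windows.max(axis=1),
    ])

# Synthetic stand-in for one accelerometer axis: 1,000 noisy samples.
rng = np.random.default_rng(seed=1)
signal = np.sin(np.linspace(0, 20 * np.pi, 1000)) + 0.1 * rng.standard_normal(1000)

windows = sliding_windows(signal, window_size=128, step=64)  # step = half a window, i.e. 50% overlap
features = handcrafted_features(windows)
print(windows.shape, features.shape)  # (14, 128) (14, 4)
```

Each row of `features` is what a model such as a decision tree ensemble would be trained on; real feature sets are far larger, which is where the domain expertise comes in.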
Recently, deep learning methods such as recurrent neural networks and one-dimensional convolutional neural networks, or CNNs, have been shown to provide state-of-the-art results on challenging activity recognition tasks with little or no feature engineering, relying instead on feature learning from the raw data.
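The core operation a one-dimensional CNN applies to a raw window can be sketched in NumPy: slide a small filter along the signal, take a dot product at each position (a "valid" cross-correlation, which is what deep learning libraries call convolution), and pass the result through a ReLU. In a real CNN the filter weights are learned during training; here they are set by hand purely for illustration, and the function name `conv1d_valid` is hypothetical.

```python
import numpy as np

def conv1d_valid(x, kernel, bias=0.0):
    """Valid cross-correlation of a 1D signal with a 1D kernel, as in a CNN layer."""
    k = len(kernel)
    out = np.array([np.dot(x[i : i + k], kernel) for i in range(len(x) - k + 1)])
    return out + bias

def relu(z):
    return np.maximum(z, 0.0)

# A toy raw window: flat, then a sudden jump (e.g., the onset of a movement).
x = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Hand-set "rising edge" filter; a CNN would learn weights like these from data.
kernel = np.array([-1.0, 1.0])

feature_map = relu(conv1d_valid(x, kernel))
print(feature_map)  # [0. 0. 1. 0. 0.]
```

The output fires only where the signal rises, which is the kind of feature an engineer would otherwise have to design by hand.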
This tutorial is divided into three parts; they are:
- Activity Recognition Using Smartphones Dataset
- Modeling Feature Engineered Data
- Modeling Raw Data
Activity Recognition Using Smartphones Dataset
Human Activity Recognition, or HAR for short, is the problem of predicting what a person is doing based on a trace of their movement using sensors.
A standard human activity recognition dataset is the ‘Activity Recognition Using Smart Phones’ dataset made available in 2012.
It was prepared and made available by Davide Anguita, et al. from the University of Genova, Italy and is described in full in their 2013 paper “A Public Domain Dataset for Human Activity Recognition Using Smartphones.” The dataset was modeled with machine learning algorithms in their 2012 paper titled “Human Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine.”
The dataset can be downloaded for free from the UCI Machine Learning Repository:
Modeling Feature Engineered Data
In this section, we will develop code to load the feature-engineered version of the dataset and evaluate a suite of nonlinear machine learning algorithms, including the SVM used in the original paper.
The goal is to achieve at least 89% accuracy on the test dataset.
The results of methods using the feature-engineered version of the dataset provide a baseline for any methods developed for the raw data version.
This section is divided into five parts; they are:
- Load Dataset
- Define Models
- Evaluate Models
- Summarize Results
- Complete Example
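The define, evaluate, and summarize steps listed above can be sketched as a small harness: a dictionary mapping model names to models, a loop that fits each model on the training set and scores accuracy on the test set, and a summary sorted by accuracy. This is a minimal sketch under stated assumptions: the helper names (`define_models`, `evaluate_models`, `summarize_results`) are hypothetical, the two NumPy classifiers (a majority-class baseline and 1-nearest-neighbor) are toy stand-ins for the nonlinear models such as SVM evaluated in this tutorial, chosen so the example runs without extra dependencies, and synthetic data replaces the real dataset.

```python
import numpy as np

class MajorityClass:
    """Baseline: always predict the most frequent training label."""
    def fit(self, X, y):
        values, counts = np.unique(y, return_counts=True)
        self.label_ = values[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.label_)

class OneNearestNeighbor:
    """Predict the label of the closest training row (Euclidean distance)."""
    def fit(self, X, y):
        self.X_, self.y_ = X, y
        return self

    def predict(self, X):
        # Pairwise squared distances, shape (n_test, n_train).
        d = ((X[:, None, :] - self.X_[None, :, :]) ** 2).sum(axis=2)
        return self.y_[np.argmin(d, axis=1)]

def define_models():
    """Map a short name to each (hypothetical stand-in) model."""
    return {"majority": MajorityClass(), "1nn": OneNearestNeighbor()}

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

def evaluate_models(trainX, trainy, testX, testy, models):
    """Fit each model on the train set and record test accuracy as a percentage."""
    results = {}
    for name, model in models.items():
        model.fit(trainX, trainy)
        results[name] = accuracy(testy, model.predict(testX)) * 100.0
    return results

def summarize_results(results):
    """Print models from best to worst test accuracy."""
    for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {score:.3f}%")

# Synthetic two-class data: two well-separated Gaussian blobs in 5 dimensions.
rng = np.random.default_rng(seed=7)

def make_blobs(n):
    X = np.vstack([rng.normal(0.0, 1.0, (n, 5)), rng.normal(4.0, 1.0, (n, 5))])
    y = np.array([0] * n + [1] * n)
    return X, y

trainX, trainy = make_blobs(50)
testX, testy = make_blobs(20)
results = evaluate_models(trainX, trainy, testX, testy, define_models())
summarize_results(results)
```

Keeping models in a dictionary means adding another algorithm is a one-line change in `define_models`, and the majority-class score gives a floor against which the 89% target can be judged.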
The first step is to load the train and test input (X) and output (y) data.
Specifically, the following files:
The input data is stored as text in a CSV-like format in which columns are separated by whitespace rather than commas. Each of these files can be loaded as a NumPy array.
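Because the columns are separated by whitespace, each file can be read with NumPy's `loadtxt`, which splits on runs of whitespace by default. The sketch below assumes a hypothetical `load_file` helper and writes two tiny temporary files with made-up values in the same layout, so it runs without the real download; with the actual dataset you would pass the path of each extracted file instead. Label files hold one integer class per row, so they load as a single column.

```python
import os
import tempfile

import numpy as np

def load_file(filepath, dtype=float):
    """Load a whitespace-delimited text file into a 2D NumPy array."""
    # ndmin=2 keeps a single-column file (e.g., labels) as shape (n_rows, 1).
    return np.loadtxt(filepath, dtype=dtype, ndmin=2)

with tempfile.TemporaryDirectory() as tmpdir:
    x_path = os.path.join(tmpdir, "X_sample.txt")
    y_path = os.path.join(tmpdir, "y_sample.txt")
    # Made-up values; irregular spacing mimics whitespace-aligned columns.
    with open(x_path, "w") as f:
        f.write("  0.25  -0.10   0.03\n")
        f.write(" -0.50   0.75  -0.20\n")
    with open(y_path, "w") as f:
        f.write("5\n4\n")

    X = load_file(x_path)
    y = load_file(y_path, dtype=int)

print(X.shape, y.shape)  # (2, 3) (2, 1)
```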