Yoga Pose Identifier using PyTorch

Harini Ashok
4 min read · Jun 29, 2020

For my first real project with PyTorch, I decided to build a Yoga Pose Classifier. I used three models of varying complexity and compared their results to find the best model for this dataset.

Dataset and Preprocessing:

I found a custom dataset collected by Anastasia Marchenkova, which has a collection of 10 types of Yoga poses, and a total training set of 730 images. The objective is to classify a particular image into one of the 10 categories to the maximum accuracy possible.

The various Poses/classes in the dataset:

There is also a varying number of input images in each class:

Since the images are of different sizes, I resized them to a standard 256 x 256 and normalized them with per-channel means and variances that I calculated from the dataset.

Code to calculate mean and variance: (uncomment and run once and then save the mean and variance in a list)
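The original snippet is not reproduced here, so the following is only a minimal sketch of how such statistics could be computed. The function name `channel_mean_std` is hypothetical, and it assumes the dataset yields `(image, label)` pairs with images already resized to a common shape. It returns the standard deviation (the square root of the variance), because torchvision's `transforms.Normalize` takes a mean and a standard deviation.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def channel_mean_std(dataset, batch_size=32):
    """Return the per-channel mean and standard deviation of an image dataset.

    Assumes every item is (image, label) with image shaped (C, H, W)
    and that all images share the same size.
    """
    loader = DataLoader(dataset, batch_size=batch_size)
    channel_sum = None
    channel_sq_sum = None
    pixel_count = 0
    for images, _ in loader:
        images = images.double()  # accumulate in float64 to limit rounding error
        if channel_sum is None:
            channels = images.shape[1]
            channel_sum = torch.zeros(channels, dtype=torch.float64)
            channel_sq_sum = torch.zeros(channels, dtype=torch.float64)
        # Sum over the batch, height and width axes, keeping the channel axis.
        channel_sum += images.sum(dim=(0, 2, 3))
        channel_sq_sum += (images ** 2).sum(dim=(0, 2, 3))
        pixel_count += images.shape[0] * images.shape[2] * images.shape[3]
    mean = channel_sum / pixel_count
    variance = channel_sq_sum / pixel_count - mean ** 2
    std = variance.clamp(min=0).sqrt()
    return mean.float(), std.float()


# Stand-in data: 50 random 3-channel "images" instead of the real yoga photos.
demo_images = torch.rand(50, 3, 8, 8)
demo_dataset = TensorDataset(demo_images, torch.zeros(50, dtype=torch.long))
demo_mean, demo_std = channel_mean_std(demo_dataset)
```

Running this once and saving the two resulting lists avoids recomputing the statistics on every run.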

A sample image from the dataset:

Out of the 730 images, I used 630 for training and 100 for validation, splitting them with the random_split() function of torch.utils.data. I then loaded the training and validation sets into two different loaders: the standard DataLoader for use with the ResNet model, and a data bunch created with ImageDataBunch for use with the fast.ai library.
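As a rough sketch, the 630/100 split and the two standard loaders might look like the following. The random tensors stand in for the real images, and the batch size and seed are assumptions, not values from the original project.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-in for the real dataset: 730 small random "images" across 10 classes.
images = torch.randn(730, 3, 32, 32)
labels = torch.randint(0, 10, (730,))
dataset = TensorDataset(images, labels)

# A seeded generator makes the split reproducible (the seed is an assumption).
generator = torch.Generator().manual_seed(42)
train_ds, val_ds = random_split(dataset, [630, 100], generator=generator)

batch_size = 64  # assumed value, not taken from the article
train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
val_dl = DataLoader(val_ds, batch_size=batch_size * 2)
```

Only the training loader is shuffled; the validation loader can use a larger batch because no gradients are stored during evaluation.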

A single batch of data from the DataLoader and ImageDataBunch loaders:

Model Definition:

I defined a CNN model with 8 convolutional layers, followed by linear layers at the end that reduce the output to the 10 classes. I used Rectified Linear Unit (ReLU) activation, with a 2x2 max pooling layer after every two convolutional layers.
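A model matching that description could be sketched as below. The class name `YogaCnn`, the channel widths, the kernel size and the size of the hidden linear layer are all assumptions made for illustration; only the 8 convolutional layers, the ReLU activations and the 2x2 pooling after every two convolutions come from the description above.

```python
import torch
import torch.nn as nn


def conv_pair(in_channels, out_channels):
    """Two 3x3 convolutions with ReLU, followed by one 2x2 max pool."""
    return [
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2, 2),
    ]


class YogaCnn(nn.Module):
    """Hypothetical 8-convolution classifier for square RGB images."""

    def __init__(self, num_classes=10, image_size=256):
        super().__init__()
        widths = [3, 16, 32, 64, 128]  # assumed channel widths
        layers = []
        for c_in, c_out in zip(widths[:-1], widths[1:]):
            layers += conv_pair(c_in, c_out)
        self.features = nn.Sequential(*layers)
        # Four 2x2 pools shrink each side by a factor of 16.
        final_side = image_size // 16
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(widths[-1] * final_side * final_side, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))


# A small input size keeps this demonstration cheap; the article uses 256.
demo_model = YogaCnn(num_classes=10, image_size=32)
demo_logits = demo_model(torch.randn(2, 3, 32, 32))
```

With 256 x 256 inputs the feature map entering the linear layers is 128 x 16 x 16 under these assumed widths.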

The second model is a ResNet9 model (learn more about ResNets and their variants here). Each convolution block contains batch normalization, LeakyReLU activation and a max pooling function along with the generic convolution layer.
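The building blocks of such a network might look like the sketch below. The helper names `conv_block` and `ResidualBlock` are hypothetical, and making the pooling step optional is an assumption so that the residual addition can work on tensors of matching shape.

```python
import torch
import torch.nn as nn


def conv_block(in_channels, out_channels, pool=False):
    """Convolution, batch normalization and LeakyReLU, with optional 2x2 pooling."""
    layers = [
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_channels),
        nn.LeakyReLU(),
    ]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)


class ResidualBlock(nn.Module):
    """Two shape-preserving conv blocks whose output is added back to the input."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            conv_block(channels, channels),
            conv_block(channels, channels),
        )

    def forward(self, x):
        return self.body(x) + x  # the skip connection


# Example: a pooled conv block followed by a residual block.
demo_input = torch.randn(2, 3, 16, 16)
demo_features = conv_block(3, 8, pool=True)(demo_input)
demo_output = ResidualBlock(8)(demo_features)
```

The skip connection lets gradients bypass the two inner blocks, which is the main idea that distinguishes a ResNet from a plain stack of convolutions.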

For the third model, I used the fast.ai library’s amazing transfer learning modules, and used the pre-trained Squeezenet1_1 model.

Using the GPU:

To let PyTorch use the GPU for faster execution and parallel computing, I followed three steps:

  1. Test GPU availability
  2. Define a method to move tensors onto the GPU
  3. Define a method to wrap the DataLoader and move data onto the GPU
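The three steps above can be sketched as follows. The names `get_default_device`, `to_device` and `DeviceDataLoader` are illustrative choices, not necessarily the ones used in the original notebook, and the code falls back to the CPU when no GPU is present.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def get_default_device():
    """Step 1: pick the GPU if one is available, otherwise the CPU."""
    return torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")


def to_device(data, device):
    """Step 2: move a tensor, or a list/tuple of tensors, onto the device."""
    if isinstance(data, (list, tuple)):
        return [to_device(item, device) for item in data]
    return data.to(device, non_blocking=True)


class DeviceDataLoader:
    """Step 3: wrap a DataLoader so each batch is moved to the device lazily."""

    def __init__(self, dataloader, device):
        self.dataloader = dataloader
        self.device = device

    def __iter__(self):
        for batch in self.dataloader:
            yield to_device(batch, self.device)

    def __len__(self):
        return len(self.dataloader)


# Usage with stand-in data.
device = get_default_device()
plain_dl = DataLoader(TensorDataset(torch.randn(10, 3), torch.arange(10)), batch_size=4)
device_dl = DeviceDataLoader(plain_dl, device)
```

Moving batches one at a time, rather than the whole dataset at once, keeps GPU memory usage bounded by the batch size. The model itself also has to be moved with the same helper, for example `to_device(model, device)`.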

Training:

Model 1: To train the CNN model, which initially had a validation loss of 2.3069, I set the learning rate to 0.001 (both 0.01 and 0.0001 gave very bad results while training), used the Adam optimizer (which worked better than SGD) and trained for 20 epochs.
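A training loop along these lines is sketched below. The function names `evaluate` and `fit` are hypothetical, and the use of cross-entropy loss is an assumption, since the loss function is not named here. The `weight_decay` parameter is included so the same loop could also serve a configuration that uses it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def evaluate(model, val_dl):
    """Return (mean validation loss, validation accuracy)."""
    model.eval()
    total_loss, correct, count = 0.0, 0, 0
    for images, labels in val_dl:
        logits = model(images)
        total_loss += F.cross_entropy(logits, labels, reduction="sum").item()
        correct += (logits.argmax(dim=1) == labels).sum().item()
        count += labels.shape[0]
    return total_loss / count, correct / count


def fit(model, train_dl, val_dl, epochs=20, lr=1e-3, weight_decay=0.0):
    """Train with Adam and record validation metrics after every epoch."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    history = []
    for _ in range(epochs):
        model.train()
        for images, labels in train_dl:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            optimizer.step()
        history.append(evaluate(model, val_dl))
    return history


# Tiny stand-in problem: 12 features, 10 classes, a single linear layer.
demo_dl = DataLoader(
    TensorDataset(torch.randn(40, 12), torch.randint(0, 10, (40,))), batch_size=8
)
demo_history = fit(nn.Linear(12, 10), demo_dl, demo_dl, epochs=2)
```

If the loss is indeed cross-entropy, a starting value close to ln(10) ≈ 2.303 is what a model produces when it spreads its predictions almost evenly over the 10 classes, which is consistent with the initial validation loss of 2.3069.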

The final validation accuracy was 0.4766.

Model 2: To train the ResNet9 model, which also had an initial validation loss of 2.3069, I set the learning rate to 1e-5 and the weight decay to 1e-4, used the Adam optimizer and trained for 35 epochs.

The final validation loss was 1.2623.

Model 3: Initial Val loss: 1.3419. Using the learner module of the fast.ai library, with the layers frozen initially, I trained the model using the fit_one_cycle() method for 10 epochs. After that, I unfroze the layers and trained for 10 epochs at a learning rate ranging between 1e-3 and 1e-5, and for 10 more epochs at a learning rate between 1e-4 and 1e-6.

Final Val loss: 0.6744

Loss and Accuracy Curve:

Accuracy Curve: Model 1 (blue line), Model 2 (yellow line)

The CNN model's accuracy oscillates up and down before settling at around 50 percent, whereas the ResNet model's accuracy jumps from 10 to 60 percent and then settles at around 65 percent.

Loss Curve: Model 1

There is a huge difference between the validation loss and training loss. While the training loss decreases exponentially, the validation loss increases with more training, which is a clear case of overfitting.

Loss Curve: Model 2

Both the training and validation losses seem to decrease exponentially and then flatten out, the training loss at around 0.01 and the validation loss at around 1.2.

Prediction:

I tried predicting with the ResNet model, using images from the test dataset and images taken directly from Google, and got the following results:

Example of a correctly classified and misclassified image from test set:

Classification of external image:

For the Squeezenet model, since the test set and validation set were the same, the model was able to accurately classify most of the images in the test set.

The model also correctly classified the same external images that were tested with the ResNet model.

Conclusion and Future Work:

The Squeezenet model works really well in classifying images into different poses, with an accuracy of around 93%, which is really good considering the small training set size. Transfer learning, along with techniques like regularization and batch normalization, produces a well-optimized model capable of obtaining accurate results with only a small dataset.

This work can be extended to classification of the Yoga82 dataset, which contains more than 28.4K images in 82 different classes. It poses a much more complicated problem that requires a deeper neural network, highly customized to this particular type of data: possibly a ResNet101, a DenseNet or an Inception model.

The Categorization of Yoga82 dataset

