Engineering Partner, Lucas Cramer

A supportive neural network: My Hackathon Onboarding Project

July 14, 2021 in CHOP

In their first week with Commit, each new Engineering Partner takes on a hackathon onboarding project. They build a project, present it to the Commit community, and write a short summary of their experience. There are no restrictions, no limits, no joke.

I self-identify as a perpetual worrier (the pandemic has certainly not helped!), and two strategies that have always helped me are to write down my thoughts and to find others who share similar thoughts and feelings. I created Thought Bubbles, my Hackathon Onboarding Project, to give users a safe and anonymous space to share whatever is on their minds and to offer comfort in knowing they are never alone in their thoughts.

As an example, suppose Bob is anxious about a presentation at work. Bob could “send” his anxiety as a thought bubble, and immediately receive thoughts from other anonymous users who are also experiencing anxiety. This would help Bob remember that anxiety is perfectly normal and there are always others with similar feelings.

Classifying feelings

Thought Bubbles is powered by a trained neural network that attempts to classify each thought as one of joy, fear, anger, sadness, disgust, shame, or guilt. The training data set was PsychExp, which contains around 7,500 phrases, each labelled with one of these emotions. A pre-trained BERT model was used to tokenize phrases and create word embeddings.

Figure: A high-level overview of the predictor with the outputted classification probabilities
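
As a rough illustration of that last step, here is a minimal sketch of turning a vector of classification probabilities into an emotion label. The order of EMOTIONS and the example probabilities are assumptions made up for illustration; the real order depends on how the labels were encoded during training.

```python
# Hypothetical label order; the real mapping depends on the training setup.
EMOTIONS = ["joy", "fear", "anger", "sadness", "disgust", "shame", "guilt"]


def top_emotion(probabilities):
    """Return the (label, probability) pair with the highest probability."""
    if len(probabilities) != len(EMOTIONS):
        raise ValueError("expected one probability per emotion")
    # Index of the largest probability
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return EMOTIONS[best], probabilities[best]


# Made-up probabilities for illustration only
example = [0.05, 0.62, 0.08, 0.15, 0.03, 0.04, 0.03]
label, probability = top_emotion(example)
print(label, probability)
```

With this assumed label order, Bob's presentation worry from the earlier example would ideally come out as fear.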

Now, you may be wondering what BERT is. In fact, if you have used Google in the past year you have already been exposed to BERT, since as of 2020 it is used in nearly every English-language query on Google Search. BERT stands for Bidirectional Encoder Representations from Transformers, with "bidirectional" being the key word. Without getting into too much technical detail, BERT is highly contextual, meaning that it can understand the meaning of each word in the context of an entire sentence, rather than only the word in isolation. As a trivial example, this allows the word "bank" in "river bank" and "bank deposit" to be treated as having entirely different meanings.

Working it out

The application itself is rather simple. A React app sends the contents of the “thought” to a single endpoint on a backend Flask service. The service runs the thought through the trained model and matches its classifications against other thoughts stored in a SQLite database. Finally, the original thought is inserted into the database with its classifications so that future thoughts can be matched to it. Both the frontend and backend were built and deployed with the Commit App Playground, which is built on top of Zero and makes it quick to spin up infrastructure in AWS and expose it securely over a publicly accessible route.

Figure: Sending a thought
Figure: Viewing matched thoughts
Figure: High-level system design
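
The match-then-store flow can be sketched with Python's built-in sqlite3 module. This is a simplified illustration, not the project's actual code: the table layout, the function names, and the choice to store a single classification ID per thought are all assumptions.

```python
import sqlite3


def create_store(path=":memory:"):
    """Open a SQLite database and create a (hypothetical) thoughts table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS thoughts ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "text TEXT NOT NULL, "
        "classification_id INTEGER NOT NULL)"
    )
    return conn


def match_and_store(conn, text, classification_id, limit=5):
    """Return earlier thoughts with the same classification, then store the new one."""
    # Look up matches first so a thought is never matched with itself
    rows = conn.execute(
        "SELECT text FROM thoughts WHERE classification_id = ? "
        "ORDER BY id DESC LIMIT ?",
        (classification_id, limit),
    ).fetchall()
    # Store the new thought so that future thoughts can be matched to it
    conn.execute(
        "INSERT INTO thoughts (text, classification_id) VALUES (?, ?)",
        (text, classification_id),
    )
    conn.commit()
    return [row[0] for row in rows]


conn = create_store()
first = match_and_store(conn, "I am nervous about my presentation", 1)
second = match_and_store(conn, "My exam is tomorrow", 1)
print(first, second)
```

The first call finds nothing because the database is empty; the second call returns the first thought because both share the same classification ID.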

Learning machine learning

The main challenges came from training and working with the machine learning model, largely because I had close to zero prior machine learning experience. For training the model itself I was fortunate to have found James Briggs on YouTube, who created an excellent guide for building a neural network using a pre-trained BERT model. However, the trained model ended up being close to 500 MB in size, which is too large to commit to GitHub (GitHub blocks files larger than 100 MB). This was problematic because the Docker image to be deployed to Kubernetes was built from the code in the GitHub repository, and without the trained model the predictor would fail. I resolved this by uploading the saved model to my Dropbox account and creating a bash script to download the model to the expected directory. Finally, I modified the Dockerfile to invoke the script while building the image, which ensured the trained model was available on the deployed Kubernetes pod.

#!/usr/bin/env bash
set -e

# URL of the zipped model (quoted so the shell does not treat < and > as redirections)
MODEL_URL="https://www.dropbox.com/s/<path-to-saved-model>"

# Create a tmp directory and work inside it
mkdir -p tmp
cd tmp

# Download the zip file to the tmp directory
wget -O thought_classifier_model.zip "$MODEL_URL"

# Unzip the zip file in the tmp directory
unzip thought_classifier_model.zip

# Move the thought_classifier_model directory to the required location
mv thought_classifier_model ../

# Return to the main directory and delete the tmp directory
cd ../
rm -rf tmp

Then, invoking it from within the Dockerfile was as simple as adding:

RUN scripts/download-model.sh

After deploying the code, I was surprised to discover that I needed to bump the memory requests and limits for each Kubernetes pod to 2 GB and 4 GB respectively to avoid an immediate out-of-memory crash. It turns out TensorFlow is highly memory intensive, particularly for larger trained models.

The code to predict the classification of thoughts was quite simple as well. The thought text is tokenized in the exact same way as the training data and run through the predictor. Finally, the classification with the highest probability is returned.

# Methods of the predictor class; they assume `import numpy as np` and `import tensorflow as tf`.
def predict_classification_ids(self, thought_text):
    prepped_data = self.__prep_data(thought_text)
    predictions = self.model.predict(prepped_data)

    # Note that this only returns the top prediction
    return [np.argmax(predictions[0])]

def __prep_data(self, text):
    # Tokenize the same way as the training data: pad or truncate to SEQ_LEN tokens
    tokens = self.tokenizer.encode_plus(text,
                                        max_length=self.SEQ_LEN,
                                        truncation=True,
                                        padding="max_length",
                                        add_special_tokens=True,
                                        return_token_type_ids=False,
                                        return_attention_mask=True,
                                        return_tensors='tf')
    # The model expects float inputs, so cast the integer tensors
    return {
        'input_ids': tf.cast(tokens['input_ids'], tf.float64),
        'attention_mask': tf.cast(tokens['attention_mask'], tf.float64)
    }
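
As the comment in predict_classification_ids notes, only the top prediction is returned. If a thought should be matched on more than one emotion, one possible variation is to return the few most likely class IDs instead. The helper below is a hypothetical sketch in plain Python and is not part of the original project; the name top_k_classification_ids and its k and threshold parameters are assumptions.

```python
def top_k_classification_ids(probabilities, k=2, threshold=0.0):
    """Return up to k class IDs, most likely first, skipping any below threshold."""
    # Rank class IDs by probability, highest first
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: probabilities[i],
        reverse=True,
    )
    return [i for i in ranked[:k] if probabilities[i] >= threshold]


# Made-up probabilities for three classes
ids = top_k_classification_ids([0.1, 0.6, 0.3], k=2)
print(ids)
```

Raising threshold keeps weak secondary predictions from producing unrelated matches, at the cost of sometimes returning a single ID.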

Thinking of the future

Overall I am extremely happy with how Thought Bubbles turned out, given my lack of prior machine learning and Python/Flask experience. In the future, I hope to extend the classifier so that thoughts can be matched based on the content of the thoughts themselves, rather than only their predicted emotion, and to allow users to report when their thoughts are misclassified in order to improve the model.

You can try out Thought Bubbles for yourself here, but please note that because it requires an exceedingly large amount of memory in a shared sandbox environment, there is no guarantee that the service will be up and running at any given time.

The following resources were particularly helpful throughout the development of Thought Bubbles:

  • “ML Zero to Hero” YouTube series from the official TensorFlow channel for introductory concepts
  • James Briggs on YouTube for working with the BERT word embedder

Lucas is part of Commit’s Engineering Partner Program. Visit commit.dev to learn more, or apply to join today!

Lucas Cramer is a full-stack Software Engineer who loves creating resilient and extensible software with end users in mind. In his spare time he enjoys tearing up the courts in tennis and nerding out over video games.