A couple of weeks ago, my office was all abuzz about making brackets for “The Bachelor.” “The Bachelor” is a so-called reality TV show in which about 30 contestants vie for the heart of a single bachelor. In the end, he and this one lucky lady will get married and live happily ever after. The brackets are people’s predictions for the order in which contestants will get eliminated and who will be the eventual winner.
The reality of The Bachelor is that the couple almost never stays together and the bachelor and contestants have very little say in what happens; instead, the producers call most of the shots to create drama and drive viewers. But while they might not find love, the top contestants may walk away with fame and fortune. And isn’t that what living happily ever after is really about?
Given the reality of the show, the best way to predict a winner would be to watch the previous 23 seasons and learn the formula for The Bachelor: understand what type of person makes the show exciting and how the direction and editing of the early episodes foreshadow what's to come. But I had an hour before work the day the brackets were due, so I did what everyone seems to do these days: throw deep learning at the problem and ask questions later.
I found a dataset of "facial beauty" ratings created by a team at the South China University of Technology. The researchers collected 5,500 images of Asian and Caucasian men and women, aged 15–60, from the internet. Sixty volunteers, aged 18–27, then rated each face on a scale of 1 to 5, with 5 being the most beautiful. The researchers don't comment on the diversity of the raters.
I used Keras, which makes it easy to work with image data. The following code creates a generator for feeding input into a model. It includes data augmentation, which modifies incoming images with transformations such as rotation and stretching. This helps prevent overfitting and makes the model more generalizable.
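A minimal sketch of such a generator, assuming the standard Keras `ImageDataGenerator`; the specific augmentation parameters, image size, and the random stand-in data here are illustrative, not the original configuration:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation settings: random rotations, shifts, zoom (stretch), and flips.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,       # normalize pixel values to [0, 1]
    rotation_range=20,       # rotate up to 20 degrees
    width_shift_range=0.1,   # shift horizontally up to 10%
    height_shift_range=0.1,  # shift vertically up to 10%
    zoom_range=0.15,         # zoom (stretch) up to 15%
    horizontal_flip=True,    # randomly mirror faces left-right
)

# Stand-in data: 8 random 299x299 RGB "faces" with scores in [1, 5].
images = np.random.randint(0, 256, size=(8, 299, 299, 3)).astype("float32")
scores = np.random.uniform(1, 5, size=(8,))

# flow() yields (augmented_batch, targets) pairs ready for model.fit().
generator = datagen.flow(images, scores, batch_size=4)
batch_x, batch_y = next(generator)
```

Each call to the generator produces a freshly transformed batch, so the model rarely sees the exact same image twice.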
With Keras, you can repurpose state-of-the-art architectures in just a few lines of code. For example, the following uses the Inception ResNet V2 architecture, one of the top performers in image-classification benchmarks. We can easily adapt it for regression (predicting beauty ratings from images).
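A sketch of that adaptation: load the architecture without its classification head and attach a single linear output so the network predicts one continuous score. (I use `weights=None` here so the snippet runs offline; in practice you would start from `weights="imagenet"`.)

```python
from tensorflow.keras.applications import InceptionResNetV2
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# Load the architecture without its classification head.
base = InceptionResNetV2(include_top=False, weights=None,
                         input_shape=(299, 299, 3))

# Replace the classifier with a single linear unit for the beauty score.
x = GlobalAveragePooling2D()(base.output)
output = Dense(1, activation="linear")(x)
model = Model(inputs=base.input, outputs=output)

# Mean squared error turns this into a regression problem.
model.compile(optimizer="adam", loss="mse")
```

Swapping the softmax classifier for a single linear unit plus an MSE loss is all it takes to turn a classification backbone into a regressor.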
It took around 10 minutes on Google Colab for the validation mean squared error to stop improving at around 0.2. I then ran the contestants’ images through the final model and got a beauty score for each one.
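The scoring step looks roughly like this. The tiny stand-in model and random "photos" below exist only so the sketch is self-contained; in practice the trained Inception ResNet V2 regressor and real contestant images take their place:

```python
import numpy as np
from tensorflow.keras.layers import Dense, Flatten, Input
from tensorflow.keras.models import Model

# Stand-in model so this sketch runs on its own; in practice this is
# the trained Inception ResNet V2 regressor.
inp = Input(shape=(299, 299, 3))
model = Model(inp, Dense(1)(Flatten()(inp)))

def score_face(model, image):
    """Predict a beauty score for a single face image."""
    arr = image.astype("float32") / 255.0           # same rescaling as training
    return float(model.predict(arr[np.newaxis], verbose=0)[0, 0])

# Stand-in "contestant photos": random pixels in place of real images.
contestants = {name: np.random.randint(0, 256, (299, 299, 3))
               for name in ["contestant_a", "contestant_b"]}
scores = {name: score_face(model, img) for name, img in contestants.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```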
This model rated Sarah the highest with 3.6/5. Another version had Alexa on top with 3.4/5.
Too bad both of them were already eliminated! I guess good TV is more than good looks ¯\_(ツ)_/¯
This was a (mostly) harmless exercise — I don’t feel too bad about ranking the appearance of people who volunteer to be on a show that is “reductive and objectifying.”
But it illustrates larger societal issues with how AI is being used in ways that seriously affect people's lives. For example, applicant-screening algorithms perpetuate the biased hiring practices they seek to fix. Police are using black-box, proprietary algorithms to generate risk scores. And computer-vision products are less accurate for women with darker skin because they aren't trained on diverse, representative datasets.
My process was imbued with the same questionable and misguided practices. I took a biased dataset that attempts to quantify the subjective, used it to train a model that I barely understand, and passed it off as authoritative under the guise of science, artificial intelligence and deep learning.
And now you can do it too!