Recently, I found a great dataset on Kaggle containing the transcript of The Big Bang Theory TV series. The Big Bang Theory is a very popular sitcom featuring my favorite nerds! The show premiered in 2007 and concluded in 2019. On Kaggle, I submitted a notebook that visualizes the data with Word Clouds. A Word Cloud is a data visualization technique for representing text data: the size of each word indicates how frequently (or how importantly) it appears. Generating Word Clouds with Python can be a handy way to explore text data.
Introduction to Word Clouds
Let’s explore The Big Bang Theory transcript by visualizing its text in Word Clouds. First, I import the dataset into my notebook along with the packages needed to generate a Word Cloud: Pandas, NumPy, Matplotlib, and the wordcloud package.
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file
import matplotlib.pyplot as plt # data visualization
# Continue with loading all necessary libraries
from os import path
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
Besides Matplotlib for plotting the figure, you need the wordcloud package. Its WordCloud class generates the image, and the package ships with a built-in set of common English stopwords called STOPWORDS.
Word Clouds and stopwords
Stopwords are very common words, such as prepositions, conjunctions, and articles, that carry little meaning on their own. Therefore, I exclude them from the analysis. The code below builds the stopword set used for the Word Clouds. To exclude additional words, put them inside the square brackets of stopwords.update([]).
# Create stopword list:
stopwords = set(STOPWORDS)
stopwords.update([])
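To see what excluding stopwords does to the counts behind a Word Cloud, here is a minimal sketch in plain Python: split the text into lowercase words, drop anything in a stopword set, and count the rest. The tiny `stopwords_demo` set and the sample sentence are made up for illustration; the wordcloud package uses its own tokenizer and the much larger STOPWORDS set, so its exact counts will differ.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword set for illustration only; in the notebook
# the real set is wordcloud's STOPWORDS plus any words added with update().
stopwords_demo = {"the", "a", "and", "of", "to", "i", "is", "it"}

def count_words(text, stopwords):
    """Lowercase the text, keep alphabetic tokens, drop stopwords, count the rest."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)

sample = "The train is loud and the train is fast. I like the train."
counts = count_words(sample, stopwords_demo)
print(counts.most_common(1))  # [('train', 3)]
```

Without the stopword filter, "the" would tie with "train" as the most frequent word and would be drawn just as large.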
Creating Word Clouds
First, I create a list of the characters I want to include in the Word Clouds: the main characters and a few other characters. Then, I create a new dataset that contains only the dialogue of these characters.
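# List the main characters
persons = ['Sheldon', 'Leonard', 'Raj', 'Penny', 'Howard', 'Amy', 'Bernadette']
# List some other characters
others = ['Ramona', 'Beverley']
# Keep only the rows spoken by these characters
# (df is the transcript DataFrame loaded from the Kaggle dataset)
data = df[df.person_scene.isin(persons + others)]
data.head(5)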
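Filtering the dataset once per character works, but pandas can also collect every character's lines in a single pass with groupby. A minimal sketch, assuming the person_scene and dialogue columns used above; the four-row DataFrame is a made-up stand-in for the real transcript.

```python
import pandas as pd

# Made-up stand-in for the transcript; in the notebook, df comes from the Kaggle CSV.
df = pd.DataFrame({
    "person_scene": ["Sheldon", "Leonard", "Sheldon", "Stuart"],
    "dialogue": ["Hello there.", "Hi.", "Good night.", "Welcome."],
})

persons = ['Sheldon', 'Leonard', 'Raj', 'Penny', 'Howard', 'Amy', 'Bernadette']

# Keep only the listed characters, then join each character's lines into one string.
data = df[df.person_scene.isin(persons)]
text_by_person = data.groupby("person_scene")["dialogue"].apply(" ".join)

print(text_by_person["Sheldon"])  # Hello there. Good night.
```

Each entry, for example text_by_person["Leonard"], is a single string that can be passed to WordCloud(...).generate(...) in the same way as the per-character strings built below.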
Now, let’s create the first Word Cloud, using Sheldon’s lines. I join all of his dialogue into one string and exclude the words in the stopword set I created previously. I use a white background and set max_words=60, so the image shows at most 60 words.
# Join all of Sheldon's lines into one string
sheldon = " ".join(dialogue for dialogue in data[data["person_scene"]=="Sheldon"].dialogue)
# Generate a word cloud image
wordcloud = WordCloud(stopwords=stopwords, max_words=60, background_color="white").generate(sheldon)
# Display the generated image:
# the matplotlib way:
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
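Why are some words bigger than others? The wordcloud package counts the words and, before choosing font sizes, scales each count relative to the most frequent word. The sketch below reproduces only that normalization step in plain Python with made-up counts; the library's actual font sizes also depend on parameters such as relative_scaling and on the space left in the image, so treat this as an approximation.

```python
from collections import Counter

def relative_sizes(counts):
    """Scale word counts so the most frequent word gets 1.0 (an approximation of
    the normalization a word cloud applies before picking font sizes)."""
    if not counts:
        return {}
    top = max(counts.values())
    return {word: n / top for word, n in counts.items()}

# Made-up counts for illustration.
counts = Counter({"physics": 8, "train": 4, "costume": 2})
print(relative_sizes(counts))  # {'physics': 1.0, 'train': 0.5, 'costume': 0.25}
```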

Next, let’s look at the words Leonard uses across all episodes. The only change is that the dialogue is filtered on Leonard instead of Sheldon, as in the code below.
leonard = " ".join(dialogue for dialogue in data[data["person_scene"]=="Leonard"].dialogue)
# Generate a word cloud image
wordcloud = WordCloud(stopwords=stopwords, max_words=60, background_color="white").generate(leonard)
# Display the generated image:
# the matplotlib way:
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

Word Clouds by episode
Across all episodes, the most frequent words are everyday conversational words. To find more distinctive vocabulary, let’s look at the words the main characters use most often in individual episodes.
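The episode-level Word Clouds need one more filter: restrict the rows to a single episode before joining a character’s lines. A minimal sketch, assuming the transcript has a column holding the episode title; the name episode_name used here is an assumption, so check df.columns in the real dataset. The rows are made up.

```python
import pandas as pd

# Made-up rows; "episode_name" is an assumed column name.
df = pd.DataFrame({
    "episode_name": ["Episode A", "Episode A", "Episode B"],
    "person_scene": ["Penny", "Sheldon", "Penny"],
    "dialogue": ["Welcome to the party.", "I am a wave.", "Hello."],
})

def episode_text(df, episode, person, episode_col="episode_name"):
    """Join all lines spoken by `person` in `episode` into one string."""
    mask = (df[episode_col] == episode) & (df["person_scene"] == person)
    return " ".join(df.loc[mask, "dialogue"])

print(episode_text(df, "Episode A", "Penny"))  # Welcome to the party.
```

The resulting string can be passed to WordCloud(...).generate(...). It is worth checking that the string is not empty first, because the wordcloud package raises an error when there are no words to plot.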
Season 1 Episode 6: The Halloween Party
In the first season, Penny throws a Halloween party and invites her friends Leonard, Sheldon, Howard, and Raj. Sheldon dresses up as the Doppler effect. The Doppler effect is the change in the frequency, and therefore the pitch, of a wave when its source moves relative to the listener: the pitch is higher while the source approaches and lower once it moves away. A familiar example is the horn of a train passing while you wait in your car at a railroad crossing. In the Word Clouds, many words relate to the costumes and the costume party. The largest words in the figures are the ones mentioned most often during this episode.
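For the physics behind Sheldon’s costume: for a stationary listener and a sound source moving straight toward or away from them, the standard textbook relation between the emitted frequency f and the heard frequency f' is given below, where c is the speed of sound and v_s is the speed of the source. The minus sign applies while the source approaches (higher pitch) and the plus sign once it moves away (lower pitch).

```latex
f' = f \, \frac{c}{c \mp v_s}
```

With illustrative values of a 500 Hz train horn, v_s = 30 m/s, and c ≈ 343 m/s, the listener hears about 500 × 343 / 313 ≈ 548 Hz as the train approaches and about 500 × 343 / 373 ≈ 460 Hz as it moves away.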




Season 2 Episode 15: The Maternal Capacitance
In season 2, Leonard’s mother, an accomplished scientist with a cold and distant personality, comes to town.


Season 2 Episode 6: The Cooper-Nowitzki Theorem
In this episode, Ramona, a young student who is fascinated by Sheldon’s work, helps him with his scientific breakthrough. The images below show the words used most often during this episode.


Season 4 Episode 22: The Wildebeest Implementation
In this episode, Bernadette goes shopping with the girls, and later on, she has a double date with Leonard and Priya. What words did she use most often during this episode? The answer is in this Word Cloud.

Season 5 Episode 2: The Infestation Hypothesis
And how about the episode in which Penny puts a chair that someone threw away in her apartment? You can set the number of words in a Word Cloud with the max_words parameter in the code: for example, max_words=20 for a sparse cloud or max_words=100 for a dense one.
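The max_words parameter keeps only the most frequent words and drops the rest. The plain-Python sketch below mimics that cut-off with Counter.most_common(n); the sample text is made up, and the real library applies its own tokenizing and stopword handling before truncating the list.

```python
from collections import Counter

def top_words(text, n):
    """Return the n most frequent lowercase words, roughly what a max_words limit keeps."""
    words = text.lower().split()
    return [word for word, _ in Counter(words).most_common(n)]

sample = "chair chair chair bug bug apartment"
print(top_words(sample, 2))  # ['chair', 'bug']
```

Asking for more words than the text contains simply returns all of them, so max_words=100 on a short episode may show fewer than 100 words.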



Season 6 Episode 20: The Tenure Turbulence
In this episode, the guys apply for a tenured position. Let’s see the Word Clouds.


Read More
https://www.kaggle.com/lydia70/big-bang-theory-tv-show
https://github.com/Lydia70/my-kaggle-projects/blob/main/big-bang-theory-tv-show.ipynb
Please upvote the notebook on Kaggle if you find it useful!