Want to use artificial intelligence to analyse people’s reactions to a YouTube video? This AI-based Python project for YouTube comments sentiment analysis can be the solution. It is a natural language processing project in which we use a Naive Bayes classifier to separate positive comments from negative ones.
We have divided this project into three parts.
- Scraping YouTube comments for dataset preparation
- Creating an AI model for sentiment analysis
- Scraping comments in real-time and applying sentiment analysis
Scraping YouTube Comments for the Dataset
In my previous blog, we saw how to scrape YouTube comments. You can follow that blog to build your dataset. After scraping the comments, we saved them in an Excel sheet like this.👇
Now, in the next column, we will classify each comment. This is a manual task and it will take time. I can understand that this step is tedious, but trust me, it defines the accuracy of your AI model: the more comments you add to this sheet, the more accurate your model will be. Keep in mind that if comments are labeled incorrectly, the accuracy of your model will suffer.
In front of each positive comment we will put 1, and in front of each negative comment we will put 0. So your dataset will look like this👇
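Before training, it also helps to check that the labels are reasonably balanced, since a heavily skewed sheet will bias the classifier toward the majority class. A small sketch (the column names "Comments" and "Label" are just examples; in practice you would get `df` from `pd.read_excel`):

```python
import pandas as pd

# Stand-in for pd.read_excel(...) on the labeled sheet
df = pd.DataFrame({
    "Comments": ["great video", "loved it", "worst ever", "so boring"],
    "Label":    [1, 1, 0, 0],
})

# How many comments of each class do we have?
counts = df["Label"].value_counts()
print(counts)
```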
We are done with the first step. Next, we will create a model and use this dataset to train it.
Creating an AI model for sentiment analysis
First of all, we have to preprocess the dataset. Here we import the libraries needed for data preprocessing.
```python
import re
import nltk
import pandas as pd

nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

ps = PorterStemmer()
all_stopwords = stopwords.words('english')
# keep words that carry sentiment out of the stopword list
for word in ['not', 'very', 'such', 'than', "wasn't", 'was']:
    all_stopwords.remove(word)
```
First of all, we will load the dataset from Excel into a pandas DataFrame.
```python
df = pd.read_excel(r"C:\Users\Anique Khan\Desktop\ai project.xlsx")  # change this to the path of your file
```
Let us do some preprocessing:
```python
corpus = []
for i in range(0, len(df.Comments)):
    review = re.sub('[^a-zA-Z]', ' ', df['Comments'][i])  # keep letters only
    review = review.lower()
    review = review.split()
    review = [ps.stem(word) for word in review if word not in set(all_stopwords)]
    review = ' '.join(review)
    corpus.append(review)
```
Now we have to apply CountVectorizer to the data. (CountVectorizer builds a vocabulary from the corpus and turns each comment into a vector of word counts, i.e. a bag-of-words representation.)
```python
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer(max_features=5000)
X = cv.fit_transform(corpus).toarray()
y = df.iloc[:, -1].values
```
Now we have the input data in the X variable and the output data in the y variable (the labels, 0 or 1).
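As a toy illustration (made-up sentences, not the real dataset) of what the count matrix looks like:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["great video thanks", "bad video", "great great content"]
cv_demo = CountVectorizer()
X_demo = cv_demo.fit_transform(docs).toarray()

print(sorted(cv_demo.vocabulary_))  # ['bad', 'content', 'great', 'thanks', 'video']
print(X_demo.shape)                 # (3, 5): one row per document, one column per word
print(X_demo[2])                    # [0 1 2 0 0]: "great" appears twice in the third document
```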
Importing the Model
Let us import the Naive Bayes classifier from scikit-learn. Next, we pass our data to this model for training.
```python
from sklearn.naive_bayes import GaussianNB

classifier = GaussianNB()
classifier.fit(X, y)
```
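Here we train on the full dataset; a common addition (not in the original project) is holding out part of the data to estimate accuracy before using the model. A sketch with synthetic stand-in data so it runs on its own:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Stand-ins for the real count matrix X and labels y built above
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(100, 20)).astype(float)
y = (X[:, 0] > 1).astype(int)  # label depends only on the first feature

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = GaussianNB().fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
print(acc)
```

With your real X and y, a low held-out accuracy is a signal to label more comments or clean up mislabeled rows.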
After training the model, let us make a prediction.
```python
comment = ["Thank you so Much"]
# new text must go through the same fitted CountVectorizer
# before it can be passed to the classifier
y_pred = classifier.predict(cv.transform(comment).toarray())
print(y_pred)
```
You can also save this model for later use so you don’t have to train it every time.
```python
# Exporting the NB classifier for later use in prediction
import joblib
joblib.dump(classifier, 'nclassifier')
```
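One thing worth noting: to classify new comments in a later session, you need the fitted CountVectorizer as well, not just the classifier. A minimal sketch of saving and reloading both (tiny toy data, and the 'vectorizer' filename is just an example):

```python
import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB

# Toy training data standing in for the real corpus and labels
texts = ["great video thank", "love it", "terrible video", "waste of time"]
labels = [1, 1, 0, 0]

cv = CountVectorizer()
X = cv.fit_transform(texts).toarray()
clf = GaussianNB().fit(X, labels)

# Save both pieces
joblib.dump(clf, 'nclassifier')
joblib.dump(cv, 'vectorizer')

# Later, in another script: reload and predict on a new comment
clf2 = joblib.load('nclassifier')
cv2 = joblib.load('vectorizer')
pred = clf2.predict(cv2.transform(["great video"]).toarray())
print(pred)
```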
YouTube Comments Sentiment Analysis in Real-time
Now we will open a browser and play a YouTube video in it. Next, we will scrape all the comments from the video and apply sentiment analysis to them. We have designed a Selenium bot for this purpose. Here is the Python code👇
```python
import re
import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

ps = PorterStemmer()
all_stopwords = stopwords.words('english')
for word in ['not', 'very', 'such', 'than', "wasn't", 'was']:
    all_stopwords.remove(word)

# Replace the path with the path of the geckodriver if you are using Firefox.
# If you are using Chrome, replace Firefox with Chrome and give the path of chromedriver.
# For more details open https://aniquekhan.com/how-to-use-selenium-with-python-automate-tasks/
browser = webdriver.Firefox(executable_path=r"C:\Users\Anique Khan\Desktop\Instagram Bot\geckodriver.exe")

# Put the link of the YouTube video here
browser.get("https://www.youtube.com/watch?v=i7uyRSnsp3k")

# Loop for scrolling through the comments
i = 0
while i < 5000:
    browser.execute_script("window.scrollBy(0, 500)")
    i = i + 1

# Scraping comments
comments = []
html = browser.page_source
soup = BeautifulSoup(html, 'html.parser')
comment = soup.find_all('yt-formatted-string', id="content-text")
time.sleep(3)
for j in comment:
    comments.append(j.text)

# Preprocessing comments
final_comments = []
for i in range(0, len(comments)):
    review = re.sub('[^a-zA-Z]', ' ', comments[i])
    review = review.lower()
    review = review.split()
    review = [ps.stem(word) for word in review if word not in set(all_stopwords)]
    review = ' '.join(review)
    final_comments.append(review)

# Vectorize with the same CountVectorizer (cv) and classifier fitted
# during the training step above (run in the same session)
text = cv.transform(final_comments).toarray()
result = classifier.predict(text)

# Finalizing results
pos = sum(1 for i in result if i == 1)
neg = sum(1 for i in result if i == 0)
result_graph = [pos, neg]

# Displaying results in the form of a chart
import matplotlib.pyplot as plt
plt.pie(result_graph, labels=["positive", "negative"], autopct='%1.1f%%')
plt.show()
```
Complete Code
Here is the complete Python code:
```python
import re
import time

import joblib
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

ps = PorterStemmer()
all_stopwords = stopwords.words('english')
for word in ['not', 'very', 'such', 'than', "wasn't", 'was']:
    all_stopwords.remove(word)

# Load the labeled dataset (change this to the path of your file)
df = pd.read_excel(r"C:\Users\Anique Khan\Desktop\ai project.xlsx")

# Preprocess the training comments
corpus = []
for i in range(0, len(df.Comments)):
    review = re.sub('[^a-zA-Z]', ' ', df['Comments'][i])
    review = review.lower()
    review = review.split()
    review = [ps.stem(word) for word in review if word not in set(all_stopwords)]
    review = ' '.join(review)
    corpus.append(review)

from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features=5000)
X = cv.fit_transform(corpus).toarray()
y = df.iloc[:, -1].values

from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X, y)

# Quick test prediction (the comment must go through the same vectorizer)
comment = ["Thank you so Much"]
y_pred = classifier.predict(cv.transform(comment).toarray())
print(y_pred)

# Exporting the NB classifier for later use in prediction
joblib.dump(classifier, 'nclassifier')

# Replace the path with the path of the geckodriver if you are using Firefox.
# If you are using Chrome, replace Firefox with Chrome and give the path of chromedriver.
# For more details open https://aniquekhan.com/how-to-use-selenium-with-python-automate-tasks/
browser = webdriver.Firefox(executable_path=r"C:\Users\Anique Khan\Desktop\Instagram Bot\geckodriver.exe")

# Put the link of the YouTube video here
browser.get("https://www.youtube.com/watch?v=i7uyRSnsp3k")

# Loop for scrolling through the comments
i = 0
while i < 5000:
    browser.execute_script("window.scrollBy(0, 500)")
    i = i + 1

# Scraping comments
comments = []
html = browser.page_source
soup = BeautifulSoup(html, 'html.parser')
comment = soup.find_all('yt-formatted-string', id="content-text")
time.sleep(3)
for j in comment:
    comments.append(j.text)

# Preprocessing the scraped comments
final_comments = []
for i in range(0, len(comments)):
    review = re.sub('[^a-zA-Z]', ' ', comments[i])
    review = review.lower()
    review = review.split()
    review = [ps.stem(word) for word in review if word not in set(all_stopwords)]
    review = ' '.join(review)
    final_comments.append(review)

# Vectorize with the same CountVectorizer fitted during training
text = cv.transform(final_comments).toarray()
result = classifier.predict(text)

# Finalizing results
pos = sum(1 for i in result if i == 1)
neg = sum(1 for i in result if i == 0)
result_graph = [pos, neg]

# Displaying results in the form of a chart
import matplotlib.pyplot as plt
plt.pie(result_graph, labels=["positive", "negative"], autopct='%1.1f%%')
plt.show()
```
I hope you found this blog post useful and informative. It takes a lot of time and effort to put this code together and make it available to you. We encourage you to subscribe to our blog if you want to stay up to date, and don’t forget to follow us on social media for the latest updates.