Data Science Movies Recommendation System

Arpit Bhushan Sharma
5 min read · Jul 11, 2021

Nearly everybody likes to spend their leisure time watching movies with their loved ones. We have all had the same experience: we sit down on the couch to pick a film for the next two hours, and 20 minutes later we still have not found one. It is so frustrating. We definitely need a computer agent that gives us movie recommendations when we have to pick a film, and saves us time. Evidently, movie recommendation agents have already become an essential part of our lives. According to Data Science Central, hard data is difficult to come by, but many informed sources estimate that, for major e-commerce platforms like Amazon and Netflix, recommenders may be responsible for as much as 10% to 25% of incremental revenue.

What is a Recommender System?

There are two main types of recommendation systems:

1. Content-Based Recommender System

A content-based recommender system works on data generated by the user. This data can be produced either explicitly (such as clicking likes) or implicitly (such as clicking links). The information is used to build a profile for the user that includes the metadata of the items the user interacted with. The more data the engine collects, the more accurate the content-based recommender system becomes.

2. Collaborative Recommender System

A collaborative recommender system makes suggestions based on how an item was rated by similar users. The system groups users with common preferences. In addition to user similarities, recommender systems can also perform collaborative filtering using item similarities (such as 'Users who liked item X also liked Y'). Most systems will be a combination of these two methods.
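
The 'users who liked X also liked Y' idea can be sketched in a few lines of plain Python. This is only an illustration: the user names and liked sets are made up, and `also_liked` is a hypothetical helper, not a function from any library.

```python
from collections import Counter

# Hypothetical "liked" sets per user (illustrative data only).
likes = {
    "alice": {"Avatar", "Gravity", "Interstellar"},
    "bob": {"Avatar", "Gravity"},
    "carol": {"Titanic", "The Notebook"},
    "dave": {"Avatar", "Gravity"},
}


def also_liked(item, likes):
    """Count what else was liked by the users who liked `item`."""
    counts = Counter()
    for liked in likes.values():
        if item in liked:
            # Count every other item this user liked.
            counts.update(liked - {item})
    return counts.most_common()


print(also_liked("Avatar", likes))
# [('Gravity', 3), ('Interstellar', 1)]
```

Real systems replace the raw counts with a similarity measure such as cosine similarity, which is what the code later in this post uses.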

Making suggestions is not a novel idea. Even before e-commerce became so prevalent, sales staff in retail stores recommended goods to customers for the purpose of upselling and cross-selling, ultimately maximising profit. The goal of recommendation systems is exactly the same.

The other goal of a recommendation system is to achieve customer satisfaction by delivering relevant content and maximising the time a person spends on your website or channel. It also tends to increase customer engagement. In addition, advertising budgets can be optimised by showing products and services only to those who are likely to respond to them.

Why Recommendation Systems?

1. They help customers find items of interest

2. They help providers deliver their products to the right customers

a. Identify the most relevant products for each user

b. Show each user personalised content

c. Recommend top deals and discounts to the right customer

3. They improve user engagement on websites

4. They raise company profits through increased consumption

Real-Life Examples of Recommender Systems:

1. GroupLens

a) Helped develop the initial recommender systems by pioneering the collaborative filtering model

b) Also provided many datasets to train models, including MovieLens and BookLens

2. Amazon

a) Implemented commercial recommender systems

b) Also implemented a lot of computational improvements

3. Netflix

a) Pioneered Latent Factor / Matrix Factorization models

4. Google

a) Search results in the search bar

b) Next-word suggestions while typing in Gmail

5. YouTube

a) Making a playlist

b) Suggesting videos of the same genre

c) Hybrid recommendation systems

d) Deep learning-based systems

Let's move on to the coding part. The dataset link is: https://www.kaggle.com/rounakbanik/the-movies-dataset

import pandas as pd
import numpy as np

# Load the movie metadata
df1 = pd.read_csv('../input/movies-dataset/movie_dataset.csv')

df1.columns

df1.head(5)

import matplotlib.pyplot as plt

# Sort movies by budget, highest first
rich = df1.sort_values('budget', ascending=False)

# Pass the figure size when the figure is created so that it takes effect
fig, ax = plt.subplots(figsize=(50, 50))
rects1 = ax.bar(rich['title'].head(15), rich['budget'].head(15),
                color=["Red", "Orange", "Yellow", "Green", "Blue"])
plt.xlabel("Movie Title")
plt.title("Budget Wise top movies")
plt.ylabel("Movie Budget")

def autolabel(rects):
    # Write each bar's value (budget divided by 100,000) above the bar
    for rect in rects:
        height = rect.get_height()
        ax.text(rect.get_x() + rect.get_width()/2., 1.05*height,
                '%f' % float(height/100000),
                ha='center', va='bottom')

autolabel(rects1)
plt.xticks(rotation=90)
plt.show()

# Sort movies by average rating, highest first
rich1 = df1.sort_values('vote_average', ascending=False)
rich1.head()

# Pass the figure size when the figure is created so that it takes effect
fig, ax = plt.subplots(figsize=(30, 20))
rects1 = ax.bar(rich1['title'].head(20), rich1['vote_average'].head(20),
                color=["Red", "Orange", "Yellow", "Green", "Blue"])
plt.xlabel("Movie Title")
plt.title("Rating Wise top movies")
plt.ylabel("Average rating")

def autolabel(rects):
    # Write each bar's average rating above the bar
    for rect in rects:
        height = rect.get_height()
        ax.text(rect.get_x() + rect.get_width()/2., 1.05*height,
                '%f' % float(height),
                ha='center', va='bottom')

autolabel(rects1)
plt.xticks(rotation=90)
plt.show()

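# C: mean rating across all movies
C = df1['vote_average'].mean()
print(C)

# m: minimum number of votes required (90th percentile of the vote counts)
m = df1['vote_count'].quantile(0.9)

# Keep only the movies that have at least m votes
q_movies = df1.copy().loc[df1['vote_count'] >= m]
q_movies.shape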

def weightedrating(x, m=m, C=C):
    v = x['vote_count']
    R = x['vote_average']
    # Calculation based on the IMDB formula
    return (v/(v+m) * R) + (m/(m+v) * C)

# A new column for weighted rating named weight_score in the dataset
q_movies['weight_score'] = q_movies.apply(weightedrating, axis=1)

# Sort movies based on score calculated above
q_movies = q_movies.sort_values('weight_score', ascending=False)

# Print the top 20 movies
q_movies[['title', 'vote_count', 'vote_average', 'weight_score']].head(20)
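
To see what the weighted rating does, here is a small worked example. The score is (v/(v+m))·R + (m/(v+m))·C, where v is the movie's vote count, R its average rating, m the minimum-votes threshold and C the mean rating over all movies. The numbers below are invented for illustration and do not come from the dataset; `weighted_rating` is a standalone restatement of the function above.

```python
def weighted_rating(v, R, m, C):
    """Blend a movie's own rating R with the global mean C, weighted by votes."""
    return (v / (v + m)) * R + (m / (v + m)) * C


# Assumed values for illustration: threshold of 1000 votes, global mean of 6.0.
m, C = 1000, 6.0

few_votes = weighted_rating(v=100, R=9.0, m=m, C=C)     # rated 9.0 by only 100 voters
many_votes = weighted_rating(v=10000, R=8.0, m=m, C=C)  # rated 8.0 by 10,000 voters

print(round(few_votes, 4))   # 6.2727
print(round(many_votes, 4))  # 7.8182
```

The movie with few votes is pulled strongly towards the global mean, so the 8.0 movie with many votes ends up ranked above the 9.0 movie with few votes.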

# Sort movies by popularity, highest first
pop = df1.sort_values('popularity', ascending=False)

import matplotlib.pyplot as plt

plt.figure(figsize=(12,4))
plt.barh(pop['title'].head(5), pop['popularity'].head(5), align='center',
         color=['red','pink','orange','yellow','green'])
plt.gca().invert_yaxis()
plt.xlabel("Popularity")
plt.title("Popular Movies")

df1['overview'].head(5)

features = ['keywords', 'cast', 'genres', 'director']

# Step 3: Create a column in DF which combines all selected features
# Replace missing values with empty strings so the features can be concatenated
for feature in features:
    df1[feature] = df1[feature].fillna('')

def combine_features(row):
    try:
        return row['keywords'] + " " + row['cast'] + " " + row["genres"] + " " + row["director"]
    except Exception:
        print("Error:", row)

df1["combined_features"] = df1.apply(combine_features, axis=1)

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Step 4: Create a count matrix from the combined features
cv = CountVectorizer()
count_matrix = cv.fit_transform(df1["combined_features"])

# Step 5: Compute the Cosine Similarity based on the count_matrix
cosine_sim = cosine_similarity(count_matrix)

# Label the rows and columns of the similarity matrix with the movie titles
sim_df = pd.DataFrame(cosine_sim, index=df1.title, columns=df1.title)
sim_df.head()
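
To make the count matrix and cosine similarity less of a black box, here is a minimal sketch that does both by hand with NumPy. The three 'films' and their keywords are made up, and the whitespace tokenizer is a simplification of what CountVectorizer does.

```python
import numpy as np

# Hypothetical combined-feature strings (illustrative only).
docs = {
    "film_a": "space alien future action",
    "film_b": "space astronaut future drama",
    "film_c": "love wedding drama",
}
titles = list(docs)

# Vocabulary: every distinct word, in sorted order.
vocab = sorted({word for text in docs.values() for word in text.split()})

# Count matrix: one row per film, one column per vocabulary word.
counts = np.array(
    [[text.split().count(word) for word in vocab] for text in docs.values()],
    dtype=float,
)

# Cosine similarity: dot product of the rows after scaling each to unit length.
unit = counts / np.linalg.norm(counts, axis=1, keepdims=True)
sim = unit @ unit.T

print(np.round(sim, 3))
```

Here film_a and film_b share two of their four words, so their similarity is 0.5, while film_a and film_c share nothing and score 0.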

# The 20 movies most similar to Avatar
movie_user_likes = "Avatar"
sim_df[movie_user_likes].sort_values(ascending=False)[:20]

# The 20 movies most similar to Gravity
movie_user_likes = "Gravity"
sim_df[movie_user_likes].sort_values(ascending=False)[:20]
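
One thing to notice is that the movie itself always comes first in such a list, because every movie has a similarity of 1 with itself. A possible way to wrap the lookup in a function that drops it is sketched below. `recommend` is a hypothetical helper and the small similarity table is invented; it also assumes that titles are unique.

```python
import pandas as pd

# Hypothetical similarity table (illustrative values, not from the dataset).
titles = ["Avatar", "Gravity", "Titanic"]
toy_sim = pd.DataFrame(
    [[1.0, 0.6, 0.1],
     [0.6, 1.0, 0.2],
     [0.1, 0.2, 1.0]],
    index=titles,
    columns=titles,
)


def recommend(sim, title, n=10):
    """Return the n titles most similar to `title`, excluding the title itself."""
    scores = sim[title].drop(labels=title)
    return scores.sort_values(ascending=False).head(n)


top = recommend(toy_sim, "Avatar", n=2)
print(top)
```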

from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

# Load the ratings table (one column per movie); missing ratings become 0
ratings = pd.read_csv("../input/colab-fitting/toy_dataset.csv", index_col=0)
ratings = ratings.fillna(0)
ratings

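def standardize(row):
    # Mean-center the ratings and scale them by their range
    new_row = (row - row.mean())/(row.max()-row.min())
    return new_row

# apply() works column by column by default, so each movie is standardized
ratings_std = ratings.apply(standardize)

# Transpose so that each row is a movie, then compare the movies pairwise
item_similarity = cosine_similarity(ratings_std.T)
print(item_similarity)

item_similarity_df = pd.DataFrame(item_similarity, index=ratings.columns, columns=ratings.columns)
item_similarity_df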
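
Since the toy dataset itself is not shown here, the following sketch repeats the standardize-then-compare step on a tiny made-up ratings table. The user and movie names are assumptions, and the cosine similarity is computed with NumPy directly so the example has no other dependencies.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings: rows are users, columns are movies.
toy = pd.DataFrame(
    {"action1": [5, 1, 3], "romantic1": [1, 5, 2]},
    index=["user1", "user2", "user3"],
)


def standardize(col):
    """Mean-center one movie's ratings and scale them by their range."""
    return (col - col.mean()) / (col.max() - col.min())


toy_std = toy.apply(standardize)  # applied to each column (movie)
print(toy_std["action1"].tolist())  # [0.5, -0.5, 0.0]

# Item-item cosine similarity: one row per movie after transposing.
vals = toy_std.to_numpy().T
unit = vals / np.linalg.norm(vals, axis=1, keepdims=True)
item_sim = pd.DataFrame(unit @ unit.T, index=toy.columns, columns=toy.columns)
print(item_sim.round(3))
```

Because the users who rated action1 highly rated romantic1 low, the two movies come out with a strongly negative similarity. Note that a movie whose ratings are all identical has a range of zero, so this standardization would divide by zero for it.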

def get_similar_movies(movie_name, user_rating):
    # Ratings above 2.5 keep the sign of the similarity; ratings below 2.5 flip it
    similar_score = item_similarity_df[movie_name]*(user_rating-2.5)
    similar_score = similar_score.sort_values(ascending=False)
    return similar_score

print(get_similar_movies("romantic3", 1))

action_lover = [("action1", 5), ("romantic2", 1), ("romantic3", 1)]

# DataFrame.append was removed in pandas 2.0, so collect the rows in a list first
rows = [get_similar_movies(movie, rating) for movie, rating in action_lover]
similar_movies = pd.DataFrame(rows).reset_index(drop=True)
similar_movies.head()

# Add up the scores per movie and rank them
similar_movies.sum().sort_values(ascending=False)
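
The effect of multiplying by (rating - 2.5) is easiest to see with numbers. Below is a minimal, self-contained sketch with an invented similarity table; `score_for` is a hypothetical stand-in for the function above that takes the table as an argument.

```python
import pandas as pd

# Hypothetical item-item similarities (illustrative values only).
movies = ["action1", "action2", "romantic1"]
toy_item_sim = pd.DataFrame(
    [[1.0, 0.8, -0.5],
     [0.8, 1.0, -0.4],
     [-0.5, -0.4, 1.0]],
    index=movies,
    columns=movies,
)


def score_for(sim, movie, rating, midpoint=2.5):
    """Scale a movie's similarity column by how far the rating is from the midpoint."""
    return sim[movie] * (rating - midpoint)


# A user who loved an action movie and disliked a romantic one.
profile = [("action1", 5), ("romantic1", 1)]

# One column of scores per rated movie, summed across columns per candidate.
per_movie = pd.concat([score_for(toy_item_sim, m, r) for m, r in profile], axis=1)
totals = per_movie.sum(axis=1).sort_values(ascending=False)
print(totals)
```

The low rating for romantic1 produces a negative multiplier, which turns its negative similarity to the action movies into a positive contribution. As a result action2 gets a high total score even though the user never rated it.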

When the user or the movie is very new, we do not have many records to predict results. In such cases, the last value in the prediction will appear in the recommendations. The performance of a recommendation system can be evaluated by comparing the predicted values with the original rating values, for which we calculate the RMSE (root mean squared error). In this case, the RMSE value is 0.9313; whether that is good or bad depends on the size of the dataset.
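
For reference, RMSE can be computed as in the following sketch. The ratings and predictions here are made up to show the arithmetic and are unrelated to the 0.9313 figure above.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted ratings."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Hypothetical ratings and predictions (illustrative only).
actual = [4, 3, 5, 2]
predicted = [3.5, 3, 4, 2.5]

# Errors are 0.5, 0, 1 and -0.5, so the mean squared error is 0.375.
print(round(rmse(actual, predicted), 4))  # 0.6124
```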

Disadvantages of a Movie Recommendation System:

  1. It does not work for a new user who has not rated any item yet, because a content-based recommender needs enough ratings to evaluate the user's preferences and provide accurate recommendations.
  2. No recommendation of serendipitous items.
  3. Limited content analysis: the recommender does not work if the system fails to distinguish the items that a user likes from the items that the user does not like.

Give it some claps if you found it helpful as a beginner.
This blog is contributed by Arpit Bhushan Sharma!

For more projects, follow us on Github: www.github.com/arpit1920
