In this article, we will build a simple movie recommender system using Node.js and MongoDB.
You can check out the demo here.
Note: Neither Node.js nor MongoDB is an optimal choice for building a full-fledged AI/ML application. This article aims to give you a basic understanding of how a recommender system works.
To put it simply, a recommendation system gives you recommendations based on the data it has on you. Familiar examples include:
- Amazon product recommendation.
- Facebook friends recommendation.
- Netflix recommendation.
- YouTube video recommendation.
To implement such a system, we need a way to measure how closely related two pieces of data are.
Euclidean distance is the straight-line distance between two points in a plane or in n-dimensional space.
To see how it works, take two points P and Q on a number line, at the 2nd and 6th units respectively. If I ask you for the distance between these two points, you will reply 4 (that is, 6 - 2). The same logic applies here, but in this case we consider two points on a plane (a graph).
In the above picture, the points P and Q lie in a plane and are plotted on a graph. What is the distance between them now?
This is where we use the Euclidean distance formula. For two points $P = (p_1, p_2)$ and $Q = (q_1, q_2)$ it is:
$d(P, Q) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2}$
Note: the formula above calculates the distance between two points in a plane (2-dimensional space). We can also calculate the distance between points in higher dimensions, but that is beyond the scope of this article.
If we apply the formula to the above graph, where $p_1 = 2$, $p_2 = 6$, $q_1 = 10$, and $q_2 = 8$, we get:
$\sqrt{(10 - 2)^2 + (8 - 6)^2} = \sqrt{68} \approx 8.25$
So the distance between these two points is about 8.25. We use this value to determine how closely one point is related to another: the lower the value, the more closely related (closer) the two points are.
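The calculation can be checked with a few lines of JavaScript. This is a minimal sketch for the 2-dimensional case only; the function name `euclideanDistance` is made up for illustration, and the points are P = (2, 6) and Q = (10, 8) from the example.

```javascript
// Euclidean distance between two points in a plane.
// Each point is given as a two-element array [x, y].
function euclideanDistance([p1, p2], [q1, q2]) {
  return Math.sqrt((q1 - p1) ** 2 + (q2 - p2) ** 2);
}

const P = [2, 6];
const Q = [10, 8];

// sqrt((10 - 2)^2 + (8 - 6)^2) = sqrt(68)
console.log(euclideanDistance(P, Q).toFixed(2)); // prints "8.25"
```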
What does this have to do with recommending stuff? As an example, say we have 3 movies plotted on a graph of the number of voters versus the average rating given to each movie.
Considering the above graph, if a person picks “The Avengers” as their favorite movie and expects recommendations based on that, we can apply the formula to two pairs: The Avengers and Justice League, and The Avengers and The Shawshank Redemption. The results are below.
The Euclidean distance for The Avengers and Justice League is 1.71.
In the same way, the Euclidean distance for The Avengers and The Shawshank Redemption is 3.21.
So clearly, we can see that The Avengers and Justice League are more related in terms of rating and number of votes.
Note: It might not make sense to recommend movies based only on the number of voters and ratings. A more logical approach is to take genre into account as well. For the sake of simplicity, we consider only these two metrics here.
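To make the idea concrete, here is a small sketch of ranking candidates by their distance from a selected movie. The coordinates below are made-up placeholders, not the values plotted in the article's graph, and the names `recommend`, `voters`, and `rating` are chosen only for this illustration.

```javascript
// Distance between two movies on the (voters, rating) plane.
function distance(a, b) {
  return Math.sqrt((b.voters - a.voters) ** 2 + (b.rating - a.rating) ** 2);
}

// Return the other movies, closest first, cut off at `limit` entries.
function recommend(selected, movies, limit = 5) {
  return movies
    .filter((movie) => movie.title !== selected.title) // never recommend the pick itself
    .map((movie) => ({ title: movie.title, distance: distance(selected, movie) }))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, limit);
}

// Hypothetical coordinates, for illustration only.
const movies = [
  { title: 'The Avengers', voters: 8, rating: 7 },
  { title: 'Justice League', voters: 7, rating: 6 },
  { title: 'The Shawshank Redemption', voters: 5, rating: 9 },
];

const picks = recommend(movies[0], movies);
// Justice League comes first, then The Shawshank Redemption.
console.log(picks.map((movie) => movie.title));
```

Sorting in ascending order puts the closest movie at the top, which mirrors the "lower distance means more related" rule described earlier.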
Enough maths; let’s dive into the coding part.
Before we start coding, here are the prerequisites:
- Node.js and Express basics.
- MongoDB basics.
- Node.js and npm installed on your system.
These links will get you started.
- Express – https://expressjs.com/en/starter/hello-world.html
- Node.js – https://nodejs.org/en
- Express and MongoDB basics – https://zellwk.com/blog/crud-express-mongodb/
In this section, we will build a simple recommendation system using Node.js and MongoDB. We will also build a simple interface for listing the recommendations on a web page. Below is the basic flow of the application.
- We show a list of movies to select from, and the user clicks on a movie.
- Below the selected movie, we display a list of recommended movies in a table.
- The recommendations are based on the number of voters and the average rating given to each movie.
The data set was obtained from Kaggle (do check them out).
Our folder contains two main files: index.html for the interface, and index.js, which runs the Node.js server connected to MongoDB.
index.js contains the APIs for fetching movie names and for getting recommendations using a MongoDB query.
- First, we load all the environment variables using the dotenv module.
- Next, we do the basic Express setup.
- We use monk to connect to MongoDB.
- movie_data is the name of the collection containing our data set.
- The first route, ‘/’, serves the index.html file.
Getting all movies:
- Here we get the list of movies for the user to pick from.
- The data set is huge (45467 records), so it makes sense to implement basic pagination. We do this with the limit and skip keys, where skip is calculated from the requested page.
- The data is then sorted by _id.
Getting movie recommendations:
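The pagination step above can be sketched as a small helper that turns a page number into query options. This is only an illustration: the helper name `buildPageOptions`, the page size of 20, and the 1-based page numbering are assumptions, not details taken from the article's code.

```javascript
// Assumed page size; the article does not state the value it uses.
const PAGE_SIZE = 20;

// Turn a 1-based page number (possibly a query-string value) into
// limit/skip/sort options for a MongoDB find query.
function buildPageOptions(page, pageSize = PAGE_SIZE) {
  // Fall back to page 1 for missing, non-numeric, or non-positive input.
  const pageNumber = Math.max(1, parseInt(page, 10) || 1);
  return {
    limit: pageSize,
    skip: (pageNumber - 1) * pageSize, // records belonging to earlier pages
    sort: { _id: 1 },                  // stable order so pages do not overlap
  };
}

console.log(buildPageOptions('3')); // { limit: 20, skip: 40, sort: { _id: 1 } }
```

With monk, options like these would typically be passed as the second argument to the collection's `find` call. Sorting by _id keeps the order stable between requests, which is what makes skip-based paging consistent.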
- Here we receive the vote_average, vote_count, genres, and current _id (the selected movie's data) from the user.
- In this data set, vote_average and vote_count may be stored as strings such as “” or “no vote”; those records should be filtered out.
- For more relevance, we also match on the genre data.
- We don’t want the selected movie to appear in the recommended list, so we filter it out.
- In the $project stage, we project the fields needed for display, and the distance field holds the Euclidean distance formula.
- Then we filter out invalid distances (NULL).
- We sort by distance so that the minimum values (closest) are at the top.
- We also limit the recommendations to 5 (as discussed earlier, this is a huge data set).
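The steps above can be sketched as a function that builds the aggregation pipeline. This is a hedged illustration rather than the article's actual query: the function name `buildRecommendationPipeline` is made up, the `title` field and the assumption that `genres` is stored as an array of strings are guesses about the collection, and filtering with `$type: 'number'` is one possible way to drop the string values.

```javascript
// Build the aggregation stages that find the movies closest to the selected one.
// vote_average, vote_count, genres and _id are named in the article;
// "title" and the array shape of "genres" are assumptions.
function buildRecommendationPipeline({ id, voteAverage, voteCount, genres }, limit = 5) {
  return [
    {
      $match: {
        vote_average: { $type: 'number' }, // drops "" and "no vote" strings
        vote_count: { $type: 'number' },
        genres: { $in: genres },           // shares at least one genre
        _id: { $ne: id },                  // exclude the selected movie
      },
    },
    {
      $project: {
        title: 1,
        vote_average: 1,
        vote_count: 1,
        genres: 1,
        // sqrt((vote_average - a)^2 + (vote_count - c)^2)
        distance: {
          $sqrt: {
            $add: [
              { $pow: [{ $subtract: ['$vote_average', voteAverage] }, 2] },
              { $pow: [{ $subtract: ['$vote_count', voteCount] }, 2] },
            ],
          },
        },
      },
    },
    { $match: { distance: { $ne: null } } }, // discard invalid distances
    { $sort: { distance: 1 } },              // closest first
    { $limit: limit },
  ];
}

// Hypothetical input values, for illustration only.
const pipeline = buildRecommendationPipeline({
  id: 'selected-movie-id',
  voteAverage: 7.4,
  voteCount: 1200,
  genres: ['Action'],
});
console.log(pipeline.length); // prints 5
```

The resulting array is what would be handed to the collection's aggregate call. If _id is stored as an ObjectId, the id received from the browser would need to be converted before it is used in the `$ne` comparison.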
Running the server:
Finally, we start the server.
index.html contains the interface for our recommender system.
- First, we have a select component containing the movie names.
- The current movie section shows the selected movie's details.
- The table component lists the recommendations.
- The selectMovie(data) function updates the current movie section and calls the getRecommendation API with the required parameters.
- When the response arrives, we build a table row for each result and append it to the table component.
- getAllMovies() fetches movies page by page to load into the select component.
- The toggleDropDown() function toggles the select component.
- Inside the onload function, we listen for the scroll event of the select component to implement infinite scrolling.
Note: We can handle large data in the browser using Web Workers; check out the tutorial here.
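Two of the steps above, building a table row from a recommendation and deciding when to load the next page while scrolling, can be sketched as small helpers. The function names, the 20-pixel threshold, and the `title` field are assumptions made for this illustration; the sample movie is made up.

```javascript
// Escape text before placing it inside HTML, so titles containing
// characters like < or & are displayed rather than interpreted.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Build one <tr> string for the recommendations table.
function buildRecommendationRow(movie) {
  const cells = [movie.title, movie.vote_average, movie.vote_count, movie.distance.toFixed(2)];
  return `<tr>${cells.map((cell) => `<td>${escapeHtml(cell)}</td>`).join('')}</tr>`;
}

// True when a scrollable element is within `threshold` pixels of its bottom,
// which is the moment to request the next page for infinite scrolling.
function isNearBottom({ scrollTop, clientHeight, scrollHeight }, threshold = 20) {
  return scrollTop + clientHeight >= scrollHeight - threshold;
}

// Made-up sample data.
console.log(buildRecommendationRow({ title: 'Sample & Movie', vote_average: 7, vote_count: 120, distance: 1.5 }));
// <tr><td>Sample &amp; Movie</td><td>7</td><td>120</td><td>1.50</td></tr>
```

In the page itself, a scroll listener on the select component could pass the element to `isNearBottom` and call getAllMovies() for the next page when it returns true.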
Running the application:
To run the application, issue the command npm start in the terminal. You should see the “Server running” text in the command line. Open your browser and navigate to http://localhost:<port> (the port should be configured in the .env file; otherwise it falls back to 8080).
- You should see the above screen at the beginning.
- Select a movie from the drop-down, and the table below will be populated with the recommendations.
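The port fallback mentioned above can be sketched as follows. The variable name `PORT` and the helper name `resolvePort` are assumptions; the article only says that the port comes from the .env file and defaults to 8080.

```javascript
// Pick the port from the environment, falling back to 8080 when the
// value is missing or not a valid port number.
function resolvePort(env, fallback = 8080) {
  const port = Number.parseInt(env.PORT, 10);
  return Number.isInteger(port) && port > 0 && port < 65536 ? port : fallback;
}

// dotenv copies the entries of the .env file into process.env,
// so the same helper works once the file has been loaded.
console.log(`Server would listen on port ${resolvePort(process.env)}`);
```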