IMDB Movies and Directors Analysis

This project focuses on analyzing a dataset from IMDb, which includes information about movies and their directors. The aim was to uncover trends and insights within the film industry, such as popular genres, prolific directors, and the relationship between movie ratings and other factors.

The request and demand

We value your opinion and are committed to providing you with the best possible experience. To help us improve our services, we kindly request you to take a few minutes to complete the following survey.

Your feedback will be used to enhance our offerings and ensure that we meet your needs and expectations.

Sometimes we get the request in the form of paragraph with the request, and we have to generate the get demand and we might get it as follows:

    • Can you get all data about movies?

    • How do you get all data about directors?

    • Check how many movies are present in iMDB.

    • Find these 3 directors: James Cameron ; Luc Besson ; John Woo

    • Find all directors with name starting with Steven.

    • Count female directors.

    • Find the name of the 10th first women directors.

    • What are the 3 most popular movies?

    • What are the 3 most bankable movies?

    • What is the most awarded average vote since January 1st, 2000?

    • Which movie(s) were directed by Brenda Chapman?

    • Whose director made the most movies?

    • Whose director is the most bankable?

Let's get started

Firstly, we have to import the data from the database as per our system, either CRM, Data warehouse, or other source.

Our source today is SQLite dataset File Download

Second, we have to see the tables content and heads to start working on the request

				
					-- Referance Heads
SELECT
  *
FROM
  Movies
LIMIT
  (2);
				
			
				
					SELECT
  *
FROM
  directors
LIMIT
  (2);
				
			

It’s time to take some notes to understand the tables naming and content

Then let’s put the questions together and answer one by one with the SQL quarries, this like validating the tables already there, so try and check the output

Check how many movies are present in IMDB, then we will use count statement

				
					--Check how many movies are present in iMDB.
SELECT
  COUNT(*)
FROM
  movies;
				
			
Find these 3 directors: James Cameron ; Luc Besson ; John Woo
				
					--Find these 3 directors: James Cameron ; Luc Besson ; John Woo 
SELECT
  *
FROM
  directors
WHERE
  name IN ('James Cameron', 'Luc Besson', 'John Woo');

				
			
Find all directors with name starting with Steven by using LIKE statement
				
					--Find all directors with name starting with Steven.
SELECT
  *
FROM
  directors
WHERE
  name LIKE ('Steven%');
				
			
Count female directors.


After reviewing the sample of the data, we found the in the gender column in directors table 3 unique values 0, 1, 2, we have 2 options creating new column with the name of each value or just retrieve the data as it is and that what we will do at this time  

				
					--Count female directors. 
SELECT
  COUNT(*)
FROM
  directors
WHERE
  gender = 1;
				
			
 
Find the name of the 10th first women directors?
 
				
					--Find the name of the 10th first women directors? 
SELECT
  name
FROM
  directors
WHERE
  gender = 1
ORDER BY
  id ASC
LIMIT
  (10);
				
			
 
What are the 3 most popular movies?
 
				
					--What are the 3 most popular movies? 
SELECT
  original_title
FROM
  movies
ORDER By
  popularity DESC
LIMIT
  (3);
				
			
 
Extra to the last point – giving full information about the movie
 
				
					SELECT
  *
FROM
  movies
ORDER By
  popularity DESC
LIMIT
  (3);

				
			
 
What are the 3 most bankable movies? Bankbale means revenue
 
				
					--What are the 3 most bankable movies? Bankbale means revenue
SELECT
  original_title, revenue
FROM
  movies
ORDER By
  revenue DESC
LIMIT
  (3);
				
			
 
What is the most awarded average vote since the January 1st, 2000?
 
				
					--What is the most awarded average vote since the January 1st, 2000? 
SELECT
  original_title, vote_average
FROM
  movies
WHERE
  release_date > 1 -1 -2000
ORDER BY
  vote_average DESC
LIMIT
  (10);
				
			
 
Which movie(s) were directed by Brenda Chapman?

 it will be in 2 steps, first verify that we have this name in the table to make sure the next step will go right
Once verified we will go to the next step to retrieve the data

				
					--Which movie(s) were directed by Brenda Chapman? 
-- first we want to verify that we have director called Brenda Chapman
SELECT
  *
FROM
  directors
LIMIT
  (50);

-- once verified we will run the desired command to get the movies directed by her
-- second we will join director names by ID from directors table to movies table
SELECT
  original_title,
  directors.name
FROM
  movies
  JOIN directors ON movies.director_id = directors.id
WHERE
  directors.name = 'Brenda Chapman';
				
			
 
Whose director made the most movies?
 
				
					--Whose director made the most movies?
SELECT directors.name, COUNT(movies.original_title) AS movie_count
FROM directors
JOIN movies ON directors.id = movies.director_id
GROUP BY directors.name
ORDER BY movie_count DESC
LIMIT 10;

				
			
 
Whose director is the most bankable?
 
 
				
					SELECT directors.name, Max(movies.revenue) AS Revenue_count
FROM directors
JOIN movies ON directors.id = movies.director_id
GROUP BY directors.name
ORDER BY revenue_count DESC
LIMIT 1;
				
			
Now we have demonstrated a full project of IMDB movies database with request received from the client, and we still have room in this database to be analyzed  

All Rights Reserved | 2024