This project focuses on analyzing a dataset from IMDb, which includes information about movies and their directors. The aim was to uncover trends and insights within the film industry, such as popular genres, prolific directors, and the relationship between movie ratings and other factors.
The request and demand
We value your opinion and are committed to providing you with the best possible experience. To help us improve our services, we kindly request you to take a few minutes to complete the following survey.
Your feedback will be used to enhance our offerings and ensure that we meet your needs and expectations.
Sometimes we get the request in the form of paragraph with the request, and we have to generate the get demand and we might get it as follows:
-
- Can you get all data about movies?
-
- How do you get all data about directors?
-
- Check how many movies are present in iMDB.
-
- Find these 3 directors: James Cameron ; Luc Besson ; John Woo
-
- Find all directors with name starting with Steven.
-
- Count female directors.
-
- Find the name of the 10th first women directors.
-
- What are the 3 most popular movies?
-
- What are the 3 most bankable movies?
-
- What is the most awarded average vote since January 1st, 2000?
-
- Which movie(s) were directed by Brenda Chapman?
-
- Whose director made the most movies?
-
- Whose director is the most bankable?
Let's get started
Firstly, we have to import the data from the database as per our system, either CRM, Data warehouse, or other source.
Our source today is SQLite dataset File Download
Second, we have to see the tables content and heads to start working on the request
-- Referance Heads
SELECT
*
FROM
Movies
LIMIT
(2);
SELECT
*
FROM
directors
LIMIT
(2);
It’s time to take some notes to understand the tables naming and content
Then let’s put the questions together and answer one by one with the SQL quarries, this like validating the tables already there, so try and check the output
Check how many movies are present in IMDB, then we will use count statement
--Check how many movies are present in iMDB.
SELECT
COUNT(*)
FROM
movies;
--Find these 3 directors: James Cameron ; Luc Besson ; John Woo
SELECT
*
FROM
directors
WHERE
name IN ('James Cameron', 'Luc Besson', 'John Woo');
--Find all directors with name starting with Steven.
SELECT
*
FROM
directors
WHERE
name LIKE ('Steven%');
After reviewing the sample of the data, we found the in the gender column in directors table 3 unique values 0, 1, 2, we have 2 options creating new column with the name of each value or just retrieve the data as it is and that what we will do at this time
--Count female directors.
SELECT
COUNT(*)
FROM
directors
WHERE
gender = 1;
--Find the name of the 10th first women directors?
SELECT
name
FROM
directors
WHERE
gender = 1
ORDER BY
id ASC
LIMIT
(10);
--What are the 3 most popular movies?
SELECT
original_title
FROM
movies
ORDER By
popularity DESC
LIMIT
(3);
SELECT
*
FROM
movies
ORDER By
popularity DESC
LIMIT
(3);
--What are the 3 most bankable movies? Bankbale means revenue
SELECT
original_title, revenue
FROM
movies
ORDER By
revenue DESC
LIMIT
(3);
--What is the most awarded average vote since the January 1st, 2000?
SELECT
original_title, vote_average
FROM
movies
WHERE
release_date > 1 -1 -2000
ORDER BY
vote_average DESC
LIMIT
(10);
Once verified we will go to the next step to retrieve the data
--Which movie(s) were directed by Brenda Chapman?
-- first we want to verify that we have director called Brenda Chapman
SELECT
*
FROM
directors
LIMIT
(50);
-- once verified we will run the desired command to get the movies directed by her
-- second we will join director names by ID from directors table to movies table
SELECT
original_title,
directors.name
FROM
movies
JOIN directors ON movies.director_id = directors.id
WHERE
directors.name = 'Brenda Chapman';
--Whose director made the most movies?
SELECT directors.name, COUNT(movies.original_title) AS movie_count
FROM directors
JOIN movies ON directors.id = movies.director_id
GROUP BY directors.name
ORDER BY movie_count DESC
LIMIT 10;
SELECT directors.name, Max(movies.revenue) AS Revenue_count
FROM directors
JOIN movies ON directors.id = movies.director_id
GROUP BY directors.name
ORDER BY revenue_count DESC
LIMIT 1;