Profiling and Analyzing the Yelp Dataset
Tools & Skills Used
- SQL
- IS NULL
- AVG, MIN, MAX
- SUM
- LIKE
- JOIN
- Aliasing
This project was completed as the final assessment for the 'SQL for Data Science' course on Coursera.com, where I learned how to interpret the structure, meaning, and relationships in source data and how to use SQL to shape that data for targeted analysis.
The first section of the project focused on profiling the Yelp dataset to understand the relationships between its many tables. I used queries to:
- Count the distinct values in each table
- Determine whether any columns contained null values
- Calculate basic statistics (minimum, maximum, average, and sum) for selected fields
- Determine the cities with the most reviews
- Find the users with the most reviews and the most fans
- Search reviews for key phrases
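The profiling steps above can be sketched roughly as follows. This is a minimal illustration, not the queries from the original project: it uses Python's built-in sqlite3 module and a tiny in-memory database, and the tables (business, review, user), their columns, and every row are assumptions loosely modeled on the Yelp schema and made up for the example.

```python
import sqlite3

# Hypothetical, simplified stand-ins for three Yelp tables. Table and column
# names are assumptions modeled loosely on the course dataset; rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE business (id TEXT, name TEXT, city TEXT, stars REAL, review_count INTEGER);
CREATE TABLE review (id TEXT, business_id TEXT, user_id TEXT, stars INTEGER, text TEXT);
CREATE TABLE user (id TEXT, name TEXT, review_count INTEGER, fans INTEGER);
INSERT INTO business VALUES
    ('b1', 'Cafe A', 'Las Vegas', 4.5, 120),
    ('b2', 'Diner B', 'Phoenix', 3.0, 40),
    ('b3', 'Bar C', 'Las Vegas', 2.5, NULL);
INSERT INTO review VALUES
    ('r1', 'b1', 'u1', 5, 'Love the coffee here'),
    ('r2', 'b2', 'u2', 2, 'Slow service'),
    ('r3', 'b1', 'u2', 4, 'Friendly staff, love it');
INSERT INTO user VALUES
    ('u1', 'Ana', 250, 30),
    ('u2', 'Ben', 90, 120);
""")

# Number of distinct values in one column
distinct_cities = conn.execute(
    "SELECT COUNT(DISTINCT city) FROM business").fetchone()[0]

# Null check on a single column
null_review_counts = conn.execute(
    "SELECT COUNT(*) FROM business WHERE review_count IS NULL").fetchone()[0]

# Basic statistics; aggregate functions skip NULLs
min_stars, max_stars, avg_stars = conn.execute(
    "SELECT MIN(stars), MAX(stars), AVG(stars) FROM business").fetchone()

# Cities ranked by total reviews (alias used in ORDER BY); SUM ignores NULLs
top_cities = conn.execute("""
    SELECT city, SUM(review_count) AS total_reviews
    FROM business
    GROUP BY city
    ORDER BY total_reviews DESC
""").fetchall()

# Users with the most reviews and with the most fans
top_reviewer = conn.execute(
    "SELECT name, review_count FROM user ORDER BY review_count DESC LIMIT 1").fetchone()
top_by_fans = conn.execute(
    "SELECT name, fans FROM user ORDER BY fans DESC LIMIT 1").fetchone()

# Key-phrase search; '%' matches any run of characters
love_reviews = conn.execute(
    "SELECT COUNT(*) FROM review WHERE text LIKE '%love%'").fetchone()[0]

print(distinct_cities, null_review_counts, top_cities, top_reviewer, top_by_fans, love_reviews)
```

SQLite's LIKE is case-insensitive for ASCII letters by default, so '%love%' also matches "Love"; other database engines may compare case differently.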
The second section of the project involved choosing one city and one business category within the Yelp dataset. I then grouped the businesses by their overall star rating to analyze how their ratings could be affected by hours of operation, number of reviews, and location. To do this, I used JOINs to combine the relevant tables.
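A rough sketch of this kind of grouping, again using Python's sqlite3 module with made-up rows: businesses are joined to a category table and to their hours, filtered to one city and category, and split into star-rating groups. The schema, the 'Toronto' / 'Coffee & Tea' filter, the "Day|open-close" hours format, and the 4-star cutoff are all hypothetical choices for illustration, not taken from the original analysis.

```python
import sqlite3

# Hypothetical schema and made-up rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE business (id TEXT, name TEXT, city TEXT, postal_code TEXT,
                       stars REAL, review_count INTEGER);
CREATE TABLE category (business_id TEXT, category TEXT);
CREATE TABLE hours (business_id TEXT, hours TEXT);  -- assumed 'Day|open-close' format
INSERT INTO business VALUES
    ('b1', 'Cafe A', 'Toronto', 'M5V', 4.5, 120),
    ('b2', 'Cafe B', 'Toronto', 'M4C', 2.5, 15),
    ('b3', 'Cafe C', 'Toronto', 'M5V', 4.0, 60),
    ('b4', 'Cafe D', 'Phoenix', '85004', 3.0, 30);
INSERT INTO category VALUES
    ('b1', 'Coffee & Tea'), ('b2', 'Coffee & Tea'),
    ('b3', 'Coffee & Tea'), ('b4', 'Coffee & Tea');
INSERT INTO hours VALUES
    ('b1', 'Monday|7:00-19:00'), ('b1', 'Sunday|8:00-16:00'),
    ('b2', 'Monday|9:00-17:00'),
    ('b3', 'Monday|7:00-20:00');
""")

query = """
SELECT CASE WHEN b.stars >= 4 THEN '4+ stars' ELSE 'under 4 stars' END AS star_group,
       COUNT(*) AS businesses,
       AVG(b.review_count) AS avg_reviews,
       AVG(COALESCE(h.days_open, 0)) AS avg_days_open,
       GROUP_CONCAT(DISTINCT b.postal_code) AS postal_codes
FROM business AS b
JOIN category AS c ON c.business_id = b.id
-- Pre-aggregate hours so each business contributes exactly one row
LEFT JOIN (SELECT business_id, COUNT(*) AS days_open
           FROM hours GROUP BY business_id) AS h
       ON h.business_id = b.id
WHERE b.city = 'Toronto' AND c.category = 'Coffee & Tea'
GROUP BY star_group
ORDER BY star_group
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Counting hours in a derived table before joining keeps a business with several hours rows from being counted several times in AVG(review_count); joining the raw hours table directly would weight each business by its number of listed days.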
The final section of the project asked us to choose our own type of analysis to conduct on the Yelp dataset. I described what I intended to analyze, which tables and functions I would use for the analysis, and the conclusions I was able to draw.
