Statistical measures like mean, median, and mode are often used in data analysis to summarize data. In SQL, these measures help you derive insights from databases.
1. Mean (Average):
The mean is the sum of all values divided by the number of values. In SQL, the `AVG()` function is used to calculate this:
SELECT AVG(column_name)
FROM table_name;
- Example: To find the average salary from a `employees` table:
SELECT AVG(salary) FROM employees;
-Explanation: SQL sums the values in the salary column and divides by the total number of entries.
2.Median:
The median is the middle value in an ordered dataset. SQL doesn’t have a built-in function, but we can calculate it with a few steps. The basic idea is to:
- Order the dataset.
- Find the middle value if odd, or the average of the two middle values if even.
For odd rows:
SELECT column_name
FROM (
SELECT column_name, ROW_NUMBER() OVER (ORDER BY column_name) AS row_num
FROM table_name
) AS temp
WHERE row_num = (SELECT FLOOR(COUNT(*) / 2) FROM table_name);
For even rows, you’ll compute the average of the two middle values.
- Explanation: The query uses the ROW_NUMBER() window function to assign a rank to each row. You then extract the middle value by using a subquery to calculate the row count.
3. Mode:
The mode is the value that appears most frequently in a dataset. SQL can calculate the mode using GROUP BY and COUNT():
SELECT column_name, COUNT(*) AS frequency
FROM table_name
GROUP BY column_name
ORDER BY frequency DESC
LIMIT 1;
- Example: To find the most frequent product in a sales table:
SELECT product_name, COUNT(*) AS frequency
FROM sales
GROUP BY product_name
ORDER BY frequency DESC
LIMIT 1;
-Explanation: The query counts occurrences of each value, orders them in descending order by frequency, and limits the result to the most frequent one.
Applications:
- Mean: Useful in summarizing average salaries, prices, or sales.
- Median: Especially useful in skewed distributions, such as income or property prices.
- Mode: Common in categorical data like finding the most popular product or frequently occurring customer feedback.
By understanding how to implement these in SQL, you can effectively analyze and summarize data directly within your database environment.
Please feel free to share your views on this topic and connect on linkedIn.