Advance SQL for Data Science

11 min readMay 1, 2023

SQL (Structure Query Language) is used to perform operations on the records stored in the database, such as updating records, inserting records, deleting records, creating and modifying the databases tables, views etc.

As this Article primarily focuses on the advance SQL, to learn the basic SQL we can follow W3 School Website ( https://www.w3schools.com/sql/)

Advanced SQL is a set of techniques and concepts used to query complex databases, manipulate data, and analyze information.

Joins: SQL joins are used to combine data from two or more tables based on a common or set of columns. To understand different joins please refer this website. And learn the different types of joins, Please refer this websites. To learn more about the joins and Relationship, we can refer this by Tableau. Now we will learn more about the different types of joins.

i) Inner Join: Inner join returns only the rows that have matching values in both the tables.

Syntax

SELECT * FROM table1

INNER JOIN table2 ON table1.id = table2.id;

In this example, we are joining two tables (table1 and table2) on their id columns. The result set will include only the rows where there is a match between the id column in table1 and table2.

ii) Left join: Left join will returns all the rows from the left table and matching row from the right table.

Syntax

SELECT * FROM table1

LEFT JOIN table2 ON table1.id = table2.id;

In this example, we are joining two tables (table1 and table2) on their id columns, but this time we are using a left join. This means that all rows from table1 will be included in the result set, regardless of whether there is a match in table2. If there is a match, the corresponding rows from table2 will also be included.

iii) Right Join: Right join will returns all the rows from the right table and matching tow from the left table.

Syntax

SELECT * FROM table1

RIGHT JOIN table2 ON table1.id = table2.id;

In this example, we are joining two tables (table1 and table2) on their id columns, but this time we are using a right join. This means that all rows from table2 will be included in the result set, regardless of whether there is a match in table1. If there is a match, the corresponding rows from table1 will also be included.

iv) Full Outer Join: Full Outer Join returns all rows from both tables, with nulls in the columns where is no match.

Syntax

SELECT * FROM table1

FULL OUTER JOIN table2 ON table.id = table.id;

In this example, we are joining two tables (table1 and table2) on their id columns, but this time we are using a full outer join. This means that all rows from both tables will be included in the result set, with null values in the columns where there is no match.

v) Self Join: A self join is when a table is joined to itself. This can be useful when a table contain hierarchical data, such as organizational chart or family tree. Self joins can also be used to create hierarchical queries, such as finding all employees who report to a particular manager or all employees who report to a manager’s manager. To do this, we can perform a series of self joins, each time joining to the same table using a different column.

For Example: Suppose a table contain the following columns: id, name, and manager_id. The manager_id column contains the id of the employee manager.

Syntax

SELECT e1.name AS employee_name, e2.name AS manager_name

FROM employees e1

JOIN employees e2 ON e1.manager_id = e2.id;

In this example, we are joining the “employees” table to itself, using aliases “e1” and “e2” to distinguish between the two instances of the table. We are joining on the “manager_id” column in e1 and the “id” column in e2.The result set will include each employee’s name and the name of their manager. If an employee does not have a manager (i.e., they are the CEO or the highest-level employee in the organization), their manager name will be NULL.

2. Sub-queries: SQL sub-queries are used to create queries that include another query as a sub-query within the main query. For example: Suppose we have two tables: “orders” and “customers”. The “orders” table contains information about each order, including the customer_id of the customer who placed the order. The “customers” table contains information about each customer, including their name and city.

Syntax

SELECT name FROM customers

WHERE city = (SELECT city

FROM customers

WHERE id =1);

In this example, the sub-query is enclosed in parentheses and is used in place of a value in the WHERE clause. The sub-query returns the city of the customer with id = 1, and the main query returns the names of all customers who live in that city.

3. Aggregate Functions: An SQL aggregated function calculates on a set of values and return the single value. Please follow this link to learn more about the Aggregated Function.

4. Window Functions: SQL window functions are used to perform calculations across a set of rows that are related to each other. These functions are applied.

i) ROW_NUMBER : This function assigns a unique number to each row in a result set.Example: If you have a table named “Customers” with columns “CustomerID”, “CustomerName” and “Country”, you can use the following query to assign a unique number to each customer:

Syntax

SELECT ROW_NUMBER() OVER(ORDER BY CustomerName) AS RowNumber, CustomerName, Country

FROM customers;

ii) RANK: This function assigns a rank to each row in a result set based on a specific colummn. Example: If you have a table named “Sales” with columns “SalespersonID”, “ProductID” and “SalesAmount”, you can use the following query to rank the salespersons based on their total sales amount.

Syntax

SELECT SalespersonID, RANK() OVER(ORDER BY SUM(SalesAmount) DESC) AS Rank

FROM Sales

GROUP BY SalespersonID;

iii) DENSE_RANK: This function assigns a rank to each row in a result set based on a specific column, but with no gaps between the ranks. Example: If you have a table named “Scores” with columns “StudentID”, “Subject” and “Marks”, you can use the following query to assign dense ranks to the students based on their marks in each subject.

Syntax

SELECT StudentID, Subject, Marks, DENSE_RANK() OVER (PARTITION BY Subject ORDER BY Marks DENSE) as Rank

FROM Scores;

iv) LEAD: This Function returns the value of a specified column for the next row in the result set.Example: If you have a table named “Orders” with columns “OrderID”, “CustomerID” and “OrderDate”, you can use the following query to find the date of the next order for each customer.

Syntax

SELECT Order_ID, Customer_ID, OrderDate, LEAD(OrderDate) OVER(PARTITION BY Customer_ID ORDER BY OrderDate) as NextOrderDate

FROM Orders;

v) LAG: This function returns the value of a specified column for the previous row in a result set.Example: If you have a table named “StockPrices” with columns “StockID”, “Date” and “Price”, you can use the following query to find the percentage change in price for each stock from the previous day.

Syntax

SELECT StockID, Date, Price, ((Price- LAG(Price) OVER(PARTITION BY StocksID ORDER BY Date)) / LAG(Price) OVER (PARTITION BY StockID ORDER BY Date)) * 100 as PercentageChange

FROM StockPrices;

To learn more about Window Functions Please refer these websites

5. Common Table Expression : Common Table Expression is a temporary named result set that you can reference within a SELECT, INSERT, UPDATE or DELETE statement. It allows you to define a subquery that can be reference multiple times in the main query. CTEs are particularly useful for complex queries that involve multiple subqueris and joins. Using a CTE can make complex queries easier to read and understand, as well as improving performance by avoiding repeated subqueries.

Syntax

WITH CategorySales (CategoryName, TotalSales) AS ( SELECT CategoryName, SUM(SalesAmount) FROM Products JOIN Sales ON Products.ProductID = Sales.ProductID JOIN Categories ON Products.CategoryID = Categories.CategoryID GROUP BY CategoryName ) SELECT CategoryName, TotalSales FROM CategorySales ORDER BY TotalSales DESC;

6. View: A view in SQL is a virtual table that is based on the result of a SELECT statement. It’s like a shortcut to a complex query, allowing you to store and reuse the result set without actually storing the data. Here’s an example of how to create a view. Views can be useful for simplifying complex queries and reducing redundancy in your database design. They can also provide a layer of security by allowing you to restrict access to sensitive data by only granting access to the view, rather than the underlying tables.

Syntax

CREATE VIEW ProductSales AS SELECT Products.ProductID, Products.ProductName, SUM(Sales.SalesAmount) AS TotalSales FROM Products JOIN Sales ON Products.ProductID = Sales.ProductID GROUP BY Products.ProductID, Products.ProductName;

SELECT * FROM ProductSales;

Here are the similarities and differences between Common Table Expressions (CTEs) and Views in SQL:

Similarities:

Both CTEs and Views are used to simplify complex queries.
Both CTEs and Views can be used to encapsulate complex logic into a reusable query.
Both CTEs and Views can be used to improve query performance by caching the result set.
Both CTEs and Views can be used to limit access to sensitive data by restricting access to the view or CTE, rather than the underlying tables.

Differences:

CTEs are temporary result sets that exist only for the duration of a single query, while Views are named and stored as metadata in the database.
CTEs are defined within the query in which they are used, while Views are defined separately and can be used across multiple queries.
CTEs can be recursive, allowing them to traverse hierarchical data, while Views cannot.
Views can be used in JOINs, subqueries, and other SQL constructs, while CTEs are typically used in subqueries or as a derived table.
Views can be indexed, allowing for faster queries, while CTEs cannot be indexed.
Views can include an ORDER BY clause to sort the result set, while CTEs cannot.

Overall, CTEs and Views serve similar purposes, but they are used in different ways and have different limitations. CTEs are more suitable for ad-hoc queries or queries that require recursive processing, while Views are more suitable for reusable, complex queries that need to be accessed across multiple queries.

7. Indexing: Indexing in SQL is a technique used to improve the performance of database queries by reducing the amount of time required to search for and retrieve data from tables. An index is a data structure that stores the values of one or more columns in a table, along with a pointer to the corresponding row. Here’s an example of how indexing can improve the performance of a query. Suppose you have a table called Customers with the following columns: CustomerID, FirstName, LastName, EmailAddress, PhoneNumber, Address, and City. You want to retrieve all customers who live in a particular city, say "New York". Without an index, the database would have to scan the entire Customers table to find all customers who live in New York. This can be a slow and time-consuming process, especially for large tables.

Syntax

CREATE INDEX idx_customers_city ON Customers (City);

SELECT * FROM Customers WHERE City = ‘New York’;

In this query, the database can use the idx_customers_city index to quickly find all rows where the City column matches "New York". This can result in significant performance improvements, especially for large tables.

It’s important to note that creating too many indexes can actually slow down database performance, since each index requires additional storage space and overhead. It’s also important to periodically rebuild or reorganize indexes to maintain their efficiency over time.

8. Stored Procedure: A stored procedure is a set of SQL statements that are stored in the database server and can be executed as a single unit. Stored procedures can be used to encapsulate complex business logic, improve query performance, and simplify database maintenance. Here are some key features and benefits of stored procedures:

Features:

Stored procedures can accept input parameters and return output parameters, allowing for greater flexibility in data processing.
Stored procedures can contain control flow statements such as IF/ELSE and WHILE loops, allowing for more complex logic.
Stored procedures can be compiled and optimized by the database server for improved performance.
Stored procedures can be secured by granting or revoking permissions, allowing for greater control over data access.

Benefits:

Stored procedures can improve query performance by reducing network traffic and minimizing the amount of data returned from the database server.
Stored procedures can simplify database maintenance by encapsulating complex logic into a single unit that can be easily modified or updated.
Stored procedures can improve data security by allowing for more granular access control and limiting direct access to database tables.
Stored procedures can be reused across multiple applications, reducing the amount of redundant code.

Syntax

CREATE PROCEDURE gerCustomerbyCity

@city varchar(50)

BEGIN

SELECT * FROM Customers WHERE City = @city;

END

This statement would execute the getCustomersByCity stored procedure with the input parameter @City set to "New York" and return a result set of all customers who live in New York.

Stored procedures can be a powerful tool for improving database performance, simplifying maintenance, and enhancing data security. However, they should be used judiciously and with careful consideration for their impact on overall system performance and maintainability.

9 . Transactions : In SQL, a transaction is a series of operations that must be executed as a single unit of work. A transaction is used to ensure that all database operations are performed successfully, or none of them are performed at all. Transactions can be used to guarantee data consistency, prevent data corruption, and ensure data integrity.

Syntax

BEGIN TRANSACTION;

UPDATE Customers SET FirstName = ‘John’ WHERE LastName = ‘Doe’;
INSERT INTO Orders (CustomerId, OrderDate, Total) VALUES (1, GETDATE(), 100.00);
DELETE FROM ShoppingCart WHERE CustomerId = 1;

COMMIT TRANSACTION;

If we encounter an error or a problem during any of the operations, we can roll back the entire transaction using the ROLLBACK TRANSACTION statement.

In this modified example, we have added an error check after each operation. If an error is encountered, we print a message to the console and roll back the transaction. Otherwise, we commit the transaction as before.

Transactions can be a powerful tool for ensuring data consistency and maintaining data integrity in a database. By grouping related operations into a single transaction, you can prevent data corruption and ensure that all database changes are performed correctly.

10. Triggers: A trigger is a special type of stored procedure that is automatically executed in response to specific events or actions in a database. Triggers are used to enforce business rules, maintain data consistency, and perform other tasks that cannot be achieved using standard SQL statements. There are two types of triggers in SQL: DML (Data Manipulation Language) triggers and DDL (Data Definition Language) triggers. DML triggers are fired in response to changes in the data in a table, such as an INSERT, UPDATE, or DELETE operation, while DDL triggers are fired in response to changes in the structure of a database, such as the creation, modification, or deletion of tables, views, or stored procedures.

Syntax:

CREATE TRIGGER tr_orders
ON Orders
AFTER INSERT
AS
BEGIN
UPDATE Customers
SET TotalSales = TotalSales + (SELECT SUM(TotalPrice) FROM inserted WHERE CustomerID = Customers.CustomerID)
WHERE CustomerID IN (SELECT CustomerID FROM inserted)
END;

In this example, we are creating a trigger named tr_orders that is fired after an INSERT operation is performed on the Orders table. The trigger then updates the TotalSales column in the Customers table for each customer that has a new order.

Triggers can be a powerful tool for enforcing business rules, maintaining data consistency, and performing other tasks in a database. However, they should be used with caution, as poorly designed or implemented triggers can have a negative impact on database performance and stability.

Please do follow and connect for more Article related to DataScience.

Reference:

SQL Tutorial

SQL is a standard language for storing, manipulating and retrieving data in databases. Our SQL tutorial will teach you…

www.w3schools.com

https://www.javatpoint.com

https://www.freecodecamp.org

https://www.sqltutorial.org

Advance SQL for Data Science

SQL Tutorial

SQL is a standard language for storing, manipulating and retrieving data in databases. Our SQL tutorial will teach you…

Written by DS - VRP

No responses yet