SQL Data Science: Most Common Queries all Data Scientists should know
SQL is one of the top demanded skills in data science and data analysis. Indeed, SQL data science is mostly used to query the data you will be working with. It is therefore very important that you get comfortable with SQL as soon as possible. The good thing is that SQL is not a very difficult language to learn, and you should be able to apply these most common SQL operations much easier than you would if you were learning Python, R or Java.
The Golden Syntax Rule of SQL?
Queries start with the SELECT operator. Most of them do. You want to select some sort of dataset from the database. The most basic way of thinking about how the queries are run logically is displayed in the above picture (Link to original article here). ALL (MOST) SQL QUERIES that you will encounter will follow a similar structure. So, whenever you think about writing a query, no matter how complex it is, keep in mind the order of operators above and you should be fine. Let’s have a look at some common ways of querying a database.
How to select data from a SQL table?
This is probably the most important and the most used query you will have to use as a data scientist. You will need this to query a dataset from a particular table to use for your analysis.
SELECT <Column Names (separated with comma)>
FROM <Table Name>
WHERE <Condition>
The SELECT operator tells you what columns/variables/features you need to fetch, FROM operator tells you what table to get the data from, and the WHERE is used as a filter.
Example:
SELECT OrderDate,UserId, LastName, Total
FROM Sales
WHERE Date > '01/01/2021'
In the above example, we have a dataset that contains all the orders since 2021. You can translate it as “Select the data, user identifier, last name, and the total paid amount since the 1st January 2021 located at the Sales table.
NB: The column names and the table name are cases sensitive so they should be written exactly as they appear in the database or you will have an error displayed.
How to insert values in a SQL Table?
INSERT INTO <Table Name (columns names separated with comma)>
VALUES (column values)
To insert values into a database you will need to use the INSERT INTO operator. The syntax is straightforward. Add the table name and the column names in parenthesis. Finally, add the values to each column in a comma-separated parenthesis after the VALUES operator. The number of values within the parenthesis should match the number of columns.
Example:
INSERT INTO Sales (OrderDate , LastName, Total)
VALUES ('01/01/2021', 'Doe', 123)
How do you insert multiple rows in a single SQL query?
The syntax is not going to change. You will only need to add more values under the VALUES keyword. Each VALUES parenthesis will correspond to one database row
Example
INSERT INTO Sales (SaleDate , LastName, Total)
VALUES
('01/01/2021', 'Doe', 123),
('01/01/2021', 'Smith', 100),
('02/01/2021', 'James', 59),
('03/01/2021', 'Johnson', 23);
Keep in mind that the maximum number of rows in one VALUES clause is 1000.
How to delete data from a SQL table?
The next step is to delete elements from a table. To do so, you will use the operator DELETE FROM followed by the table you want to delete. Then, the deletion condition using the WHERE operator. For instance, you can set as a condition a user id, a data range, a name, or any feature/variable/column name.
DELETE FROM <Table Name>
WHERE <condition>
Example:
DELETE FROM Sales
WHERE OrderDate < '01/01/2021'
How to update data in a SQL table?
The UPDATE keyword is used to update values in the database (Well that was kind of obvious! :D). You will need to add the SET operator to change the feature value when it matches a particular condition. For example, you can UPDATE table_X by SETting the last_name to “SMITH” WHERE the user_id = 123.
UPDATE <Table Name>
SET Feature1 = Value1 , Feature2 = Value2, …
WHERE <condition>
Example:
UPDATE UserRecords
SET LastName= 'Smith'
WHERE UserId=123;
How to order data in a SQL Query
To order the data in a SQL table, the ORDER BY operator is used followed by the column name and the ASC or DESC operators to indicate an ascending or descending order.
SELECT <Column Names (separated with comma)>
FROM <Table Name>
WHERE <condition>
ORDER BY <column name>
ASC|DESC;
Example
SELECT FirstName, LastName, OrderDate FROM Orders
WHERE OrderDate > '10/10/2010'
ORDER BY OrderDate
How to select data from more than one SQL table?
Combining table is something that you will do almost daily, whether its to add features together, create data set, combine data from different databases, etc. You will need to be familiar with the various terms shown here. The most common operators for it are JOIN and UNION, and they are quite important in SQL used in data science.
Pull data from multiple tables using JOIN
JOIN is the most used keyword for combining datasets. It can be confusing at times, but once you get it, it is one of the most useful operators. To combine two tables using JOIN, you will need a common column between table 1 and table 2. All you have to do then is just input the two tables to be joined and all the common columns between the two tables after the ON operator.
SELECT <Column Names (separated with comma)>
FROM <Table Name1>
JOIN <Table Name2>
ON <Table Name1>.<Column_x>=<Table Name2>.<Column_x>
Example:
SELECT Orders.LastName, Countries.CountryName
FROM Orders
JOIN Countries
ON Orders.CountryID = Countries.ID
We have a couple of additional operators that you can all if you want to keep let’s say all data from one table and some from the other one. By default, the JOIN is INNER meaning that only common elements between the two tables are going to be merged. We have the LEFT or the RIGHT JOIN, meaning that it considers all elements of the earlier or later called table. We have the OUTER operator that joins the set outside of the common elements. For example, a FULL OUTER JOIN will consider joining all data from both tables. It works the same way as a UNION.
Refer to the picture above to visualize how the various join work. Once you have mastered the JOIN operator you will be able to do very awesome stuff with SQL.
Pull data from multiple SQL tables using UNION
To use the UNION operator, you write the dividual SELECT statements and join them by the keyword UNION. So you basically write the SELECT + Query from table 1 and the SELECT + Query from table 2 and add a UNION in between them.
The UNION can have an additional ALL operator. Using the ALL allows you to keep the duplicated rows whereas removing the ALL allows you to keep the duplicated row.
SELECT <Column Names (separated with comma)>
FROM <Table Name 1>
UNION
SELECT<Column Names (separated with comma)>
FROM <Table Name 2>
Example:
SELECT FirstName, LastName
FROM Orders2010
UNION
SELECT FirstName, LastName
FROM Orders2011
To wrap up
SQL in Data Science can get slightly more complicated than the function shown. But if you understand the intuition behind the above, you should not have a lot of issues using SQL for your analysis. In the future, I will get a sample project that goes into more detail about the different SQL operators so do subscribe to the mailing list not to miss any content.
If you made this far in the article, thank you very much.
I hope this information was of use to you.
Feel free to use any information from this page. I’d appreciate it if you can simply link to this article as the source. If you have any additional questions, you can reach out to malick@malicksarr.com or message me on Twitter. If you want more content like this, join my email list to receive the latest articles. I promise I do not spam.
[boldgrid_component type=”wp_mc4wp_form_widget”]
If you liked this article, maybe you will like these too.
A Gentle Introduction to Data Science Presentation (Storytelling)
Are Data Science Jobs in Demand?
Why is Machine Learning important? [in 2021]
Why Data Science is Important?