MySQL Cheat sheet
احييكم بتحية الإسلام السلام عليكم ورحمة الله تعالى وبركاته
Introduction
Hello world, Hossam is here. Welcome everyone to my humble blog. In this post, I will write down all the MYSQL syntax that I encountered while studying MYSQL or through my solutions on the HackerRank website and LeetCode. I simply consider this publication to be my notes that make it easier for me to quickly review SQL syntax, which helps me prepare for job interviews, in addition to being a good opportunity to share information with others.
I wrote the Syntax here through my application of the Mosh Hamidani course and some research and learning away from the course. You will also find all my solutions to SQL problems through the GitHub repository hackerrank-sql-solutions
[Note]: As long as you see these lines, it means that the post is under preparation and not yet completed. If there are any mistakes or misunderstandings of any information from my side, please inform me. Thank you in advance for your efforts.
Retrieving Data From a Single Table
Select statement
Use to retrieve columns from tables. each selection must end with ;
-- Return all user's records from user table
SELECT * FROM USERS;
-- return 1, 2
SELECT 1, 2;
-- return first_name, last_name, points columns
SELECT first_name, last_name, points FROM customers;
-- return customer name, points, and points * 5 as 5-times points
SELECT name, points, points * 5 AS '5-times points' FROM customers;
-- return the unique names from the customer's table.
SELECT DISTINCT name FROM customers;
Where clause
We can select some fields based on the condition after the WHERE
. <>
and !=
for the not equal and =
for equality.
SELECT *
FROM customers
WHERE points > 3000;
Order by clause
Used to order the results by specified column name, it’s valid also for allies columns in MySQL, and for unselected columns too.
-- get all orders by total_price column (allies) = quantity * unit_price
SELECT * FROM store.order_items WHERE order_id = 2 ORDER BY quantity * unit_price desc;
you can also order the result by the column position like this:
SELECT first_name, last_name, state FROM customers
WHERE state = 'FL'
ORDER BY 1, 2; --1: firstname, 2: last_name
Limit clause
used to limit the result by the number of records you wanna return. limit clause must come at the end of your query.
-- to get the first 3 records
SELECT * FROM store.order_items
WHERE order_id = 2
ORDER BY quantity * unit_price desc
LIMIT 3;
you can use it as a pagination clause using the following syntax.
-- Here we will skip the first 6 records
-- then we will return the 3 records after them
SELECT * FROM store.order_items
WHERE order_id = 2
ORDER BY quantity * unit_price desc
LIMIT 6, 3; --skip the first 6 records and then get the next 3 records only
Operators
AND
TRUE if all the conditions separated by AND are TRUE.
SELECT name FROM STATION WHERE LAT_N > 38.7880 AND LAT_N < 137.2345;
OR
TRUE if any of the conditions separated by OR is TRUE.
SELECT name FROM STATION WHERE LAT_N > 38.7880 OR LAT_N < 137.2345;
NOT
Reverses the value of any other Boolean operator.
SELECT name FROM STATION WHERE NOT (LAT_N > 38.7880 AND LAT_N < 137.2345);
IN
TRUE if the operand is equal to one of a list of expressions. use in logical operator to compare attributes with a list of values
SELECT * FROM Customers WHERE state in ('VA', 'FL', 'MX');
it can combined with the NOT
operator to get all customers that are not from the specified states.
SELECT * FROM Customers WHERE state NOT IN ('VA', 'FL', 'MX');
BETWEEN
TRUE if the operand lies within the range of comparisons.
we want to get all customers born after 1990/01/01
and before 2000/01/01
. we can use arithmetic operators such as <=
or <=
as follows.
SELECT * FROM Customers WHERE birth_date > '1990-01-01' AND birth_date < '2000-01-01';
or use the between
operator directly 😃
SELECT * FROM Customers WHERE birth_date BETWEEN '1990-01-01' AND '2000-01-01';
LIKE
TRUE if the operand matches a pattern, especially with a wildcard. we use _
to specify a character and %
for any number of characters. Note that %a
get all end with a
and a%
get all start with
-- Get a customer with phone ends at 9
SELECT * FROM store.customers WHERE phone LIKE '%9';
-- Get a customer with an address containing Trail or Avenue
SELECT * FROM store.customers WHERE
address LIKE '%Trail%'
OR
address LIKE '%Avenue%';
REGEXP
to get all first names that contain Elka
or Ambur
using the pipe |
that perform OR operation.
SELECT * FROM store.customers WHERE first_name REGEXP "ELKA|AMBUR";
$
indicates the end of the result to get all last names that end with EY
, and ON
SELECT * FROM store.customers WHERE last_name REGEXP "EY|ON$";
last names that start with specified regex. the ^
wildcard indicates the start of the string.
SELECT * FROM store.customers WHERE last_name REGEXP "^MY|se";
get the last name that contains B
followed by R
or U
.
here we didn’t use the pipe |
because we use the []
which allows us to specify the characters we need.
SELECT * FROM store.customers WHERE last_name REGEXP "B[ru]";
if you wanna search for a characters/numbers range use [start-end]
-- get all last names that contain characters in the range [a-z]
-- this will return all records :)
SELECT * FROM store.customers WHERE last_name REGEXP "[a-z]";
IS NULL
to get records attributes with null values.
-- get all customers that have no phone number.
SELECT * FROM store.customers WHERE phone IS NULL;
Retrieving Data From Multiple Tables
Inner Join
returns records that have matching values in both tables
SELECT order_id, first_name, last_name
FROM orders
JOIN customers
ON orders.customer_id = customers.customer_id;
you may face an error when you try to select customer_id
because it exists in the two tables so in this case you need to use allies such as:
SELECT order_id, o.customer_id, first_name, last_name
FROM orders o
JOIN customers c
ON o.customer_id = c.customer_id;
Joining across databases
SELECT *
FROM order_items oi
-- We add here the database name.
JOIN sql_inventory.products p
ON oi.product_id = p.product_id;
Self join
A self-join is a regular join, but the table is joined with itself.
SELECT
e.employee_id,
e.first_name,
m.first_name AS 'manager name'
FROM employees e -- for employees
JOIN employees m -- for managers
ON e.reports_to = m.employee_id;
Multiple join
join multiple tables with each other
SELECT
e.employee_id,
e.first_name,
m.first_name AS 'manager name'
FROM orders o
JOIN customers c
ON o.customer_id = c.customer_id
JOIN order_status os
ON o.status = os.order_status_id;
We can combine some join conditions like the following:
SELECT *
FROM order_items oi
JOIN order_item_notes oin
ON oi.order_id = oin.order_id
AND oi.product_id = oin.product_id
Implicit Join
Making join without writing join
, if you forget to write the condition we will get a cross join.
SELECT *
FROM orders o, customers c
WHERE o.customer_id = c.customer_id;
Outer Join
Left join
Returns all records from the left table, and the matched records from the right table
SELECT *
FROM orders o
LEFT OUTER JOIN customers c
ON c.customer_id = o.customer_id;
Right join
Returns all records from the right table, and the matched records from the left table
SELECT *
FROM orders o
RIGHT OUTER JOIN customers c
ON c.customer_id = o.customer_id;
the outer
keyword is optional so you can skip it if you wanna. you can also use the outer join to join multiple tables and you can combine inner and outer joins like this:
SELECT *
FROM customers c
RIGHT JOIN orders o -- outer join
ON o.customer_id = c.customer_id
JOIN shippers sh
ON o.shipper_id = sh.shipper_id
ORDER BY c.customer_id;
to convert from lest join to outer join or the opposite you can just change the order of tables customers <==> orders, and it’ll be done 😃.
Self outer join
SELECT
e.employee_id,
e.first_name,
m.first_name AS 'manager name'
FROM employees e -- for employees
LEFT JOIN employees m -- for managers
ON e.reports_to = m.employee_id;
Using clause
is useful when both the tables share a column of the same name on which they join.
SELECT *
FROM customers c
RIGHT JOIN orders o
USING(customer_id)
JOIN shippers sh
USING(shipper_id)
ORDER BY c.customer_id;
if you have multiple conditions use it as USING(col1, col2).
Natural joins
let the database engine determine the column name to join automatically, it’s may lead to unexpected results because we have no control over it.
SELECT *
FROM customers c
JOIN shippers sh;
Cross join
used to map every record from Table 1 with every record of Table 2, it is used in the real world in mapping sizes with colors in the cloths industry.
SELECT *
FROM customers c
CROSS JOIN products p;
-- Also we can use the Implicit from id it
SELECT *
FROM customers c, products p;
Unions
The UNION
operator is used to combine the result set of two or more SELECT
statements.
SELECT customer_id, first_name, points, "Bronze" as "type"
FROM customers
where points <= 2000
UNION
SELECT customer_id, first_name, points, "Silver" as "type"
FROM customers
where points between 2000 AND 3000
UNION
SELECT customer_id, first_name, points, "Gold" as "type"
FROM customers
where points > 3000
ORDER BY first_name;
Column Attributes
column | description |
---|---|
Datatype | determine the column data type |
PK | Stands for primary key |
NN | Stands for Not Null used for required fields |
UQ | Stands for Unique |
AI | Stands for Auto Increment usually used with the primary key |
Default Value | Specify the default value of the field if you didn’t pass anything |
Inserting, Updating, and Deleting Data
Create Record
-- insert one record
INSERT INTO customers (
first_name,
last_name,
dob,
phone
)
VALUES ("Hossam", "Hamdy", "1990-01-05", NULL);
-- insert many records
INSERT INTO customers (
first_name,
last_name,
dob,
phone
)
VALUES ("Hossam", "Hamdy", "1990-01-05", NULL),
("Hassan", "Aly", "1990-01-05", NULL),
("Root", "Toor", "1990-01-05", NULL);
-- Create a new table of an old table
CREATE TABLE new_orders AS SELECT * from orders;
Note
if you use this technique make sure that the copied attribute is copied using values only, if an attribute is PK
or AI
it’ll be a normal day.
Update
-- update one record
UPDATE customers
SET first_name = "root", last_name = "toor", dob = Null, phone = Null
WHERE id = 1;
-- to update multi-record is the same syntax
UPDATE orders
SET price = 1000
WHERE customer_id = 1;
-- If we have more than one record for customer 1 it'll be updated.
You can also use the IN
operator
UPDATE orders
SET price = 1000
WHERE customer_id IN (1, 2, 3);
Sub queries
UPDATE orders
SET price = 1000
-- If we don't know the customer ID but we know the name then we can use the Sub queries
WHERE customer_id =
(SELECT customer_id
FROM customers
WHERE name = "hossam")
;
Delete
-- This will delete all records in the customer's table.
DELETE FROM customers;
-- This will delete all records in the customer table.
DELETE FROM customers
WHERE customer_id = 5; -- also you can use the Sub queries here.
Summarizing Data
here you’ll learn how to write queries that can summarize data and answer some business questions.
Aggregate Functions
MAX()
applied on Numbers, String, and date
SELECT MAX(invoice_total) as "Highest" FROM invoices;
MIN()
SELECT MIN(invoice_total) as "Lowest" FROM invoices;
AVG()
SELECT AVG(invoice_total) as "Average" FROM invoices;
SUM()
SELECT SUM(invoice_total) as "Total" FROM invoices;
COUNT()
only perform on Non-Null
values.
SELECT SUM(invoice_total) as "Total" FROM invoices;
-- COUNT(*) to get the total of records
Group By clause
The GROUP BY
statement groups rows that have the same values into summary rows, like “find the number of customers in each country”.
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;
Having
The HAVING
clause is used in conjunction with the GROUP BY
clause to filter the results of a query based on the aggregated values. It allows you to specify a condition that filters the grouped rows returned by a GROUP BY
query.
SELECT column1, COUNT(*)
FROM your_table
GROUP BY column1
HAVING COUNT(*) > 1;
Rollup
used to get the accumulative value of the aggregating functions, this is only available for MySQL.
SELECT
state,
city,
SUM(invoice_total) AS total_sales
FROM invoices
GROUP BY state, city WITH ROLLUP
ALL keyword
Used to return all values that meet the condition, the normal select will return a single value, but with this approach, we will get a list of results that met _condition_
SELECT _column_name(s)_
FROM _table_name_
WHERE _column_name operator_ ALL (
SELECT _column_name_
FROM _table_name_
WHERE _condition_
);
ANY keyword
ANY
means that the condition will be true if the operation is true for any of the values in the range. It’s equal to using the IN
operator.
SELECT *
FROM clients
WHERE client_id ANY (
SELECT client_id
FROM invoices
GROUP BY client_id
HAVING COUNT(*) >= 2
)
Correlated Sub queries
SELECT *
FROM invoicing.invoices inv
WHERE inv.invoice_total > (
SELECT AVG(invoice_total)
FROM invoicing.invoices
WHERE inv.client_id = client_id
)
EXISTS operator
The EXISTS
operator is used to test for the existence of any record in a sub-query.
this operator is better than using the IN
operator because the IN
operator will return a list and then examine the condition, but EXISTS
will return true to the condition and then return the row that matches the condition.
SELECT SupplierName
FROM Suppliers
WHERE EXISTS (
SELECT ProductName
FROM Products
WHERE Products.SupplierID = Suppliers.supplierID
AND Price < 20);
Sub queries in select clause
SELECT
client_id,
name,
(
SELECT SUM(invoice_total)
FROM invoices i
WHERE client_id = c.client_id
) AS total_sales,
(
SELECT AVG(invoice_total)
FROM invoices
) AS average,
(SELECT total_sales - average) AS difference
FROM clients c;
Sub queries in from clause
With this, we can deal with the result table as a database table so we can perform select, aggregation functions, joins, and anything else.
SELECT *
FROM (
SELECT
client_id,
name,
(
SELECT SUM(invoice_total)
FROM invoices i
WHERE client_id = c.client_id
) AS total_sales,
(
SELECT AVG(invoice_total)
FROM invoices
) AS average,
(SELECT total_sales - average) AS difference
FROM clients c;
) AS sales_summary
Essential MySQL Functions
Numeric functions
-- Round the number
SELECT ROUND(5.123456789, 2) -- 5.12
SELECT ROUND(5.123456789, 5) -- 5.12345
-- returns the smallest than or equal to the passed number
SELECT CEILING(5.7) -- 6
SELECT FLOOR(5.2) -- 5
-- Remove the rest of the number
SELECT TRUNCATE(5.12345, 2) -- 5.12 and
SELECT ABS(-5.2) -- 5.2 returns the positive value
SELECT RAND() -- returns a random function between 0 and 1.
String functions
-- convert to lower and upper case
SELECT UPPER("hello") -- return HELLO
SELECT LOWER("HELLO") -- return hello
-- Remove any whitespace before or after the word
SELECT LTRIM(" HELLO") -- "HELLO"
SELECT RTRIM("HELLO ") -- "HELLO"
SELECT TRIN (" HELLO ") -- "HELLO"
-- get chars from the left side
SELECT LEFT("HELLO WORLD", 4) -- HELL
-- get chars from the right side
SELECT RIGHT("ROOT", 2) -- OT
-- Get string from the start to the end position
-- The end position is optional
SELECT SUBSTRING("HOW ARE YOU", 8, 11); -- YOU
-- search about char in string
-- If the search string is not in the string it'll return 0
SELECT LOCATE("D", "HELLO WORD")
-- HELLO WORLD
SELECT REPLACE("TELLO WORLD", "TELLO", "HELLO")
-- HI, HOSSAM
SELECT CONCAT("HI, ", "HOSSAM")
Date functions
SELECT NOW(), CURDATE(), CURTIME()
NOW() | CURDATE() | CURTIME() |
---|---|---|
2023-12-18 10:40:15 | 2023-12-18 | 10:40:15 |
SELECT YEAR(NOW())-- 2023
SELECT MONTH(NOW()) -- 12
SELECT DAY(NOW()) -- 18
-- You can use HOUR, MINUTE, and SECOND functions too, all of them return integers.
-- Also you can use DAYNAME
SELECT DAYNAME(NOW()) -- Monday
SELECT MONTHNAME(NOW()) -- December
SELECT EXTRACT (YEAR FROM NOW()) -- 2023
-- <<perform calculations on date and time>> --
-- add
SELECT DATE_ADD(NOW(), INTERVAL 1 YEAR) -- 2024-12-18 10:55:53
SELECT DATE_ADD(NOW(), INTERVAL -1 YEAR)--2022-12-18 10:56:21
-- def
SELECT DATEDIFF('2019-01-05', '2019-01-01') -- 4
SELECT TIME_TO_SEC('09:00') - TIME_TO_SEC('09:02') -- 120 s
IFNULL and COALESCE
-- If the test value is null it'll be replaced with NONE
IFNULL(test_vaule, "NONE")
-- Return the first non-null value in a list
SELECT COALESCE(NULL, NULL, NULL, 'W3Schools.com', NULL, 'Example.com');
IF function
SELECT
product_id,
name,
COUNT(*) AS orders,
-- true case, false case
IF(COUNT(*) > 1, 'Many times', 'Once') AS frequency
FROM products
JOIN order_items USING(product_id)
GROUP BY product_id, name
CASE operator
SELECT
CONCAT(first_name, ' ', last_name),
points,
CASE
WHEN points > 3000 THEN "Gold"
WHEN points >= 2000 THEN "Silver"
ELSE "Bronze"
END AS category
FROM customers
Views
view is a virtual table based on the result set of an SQL statement.
Create View
CREATE VIEW view_name AS
-- your script starts here
SELECT
name,
phone,
email
FROM users
after creating a view we can deal with it as a table in the database. Views don’t store data it’s a virtual table, it’s just a view
of the data.
Altering views
-- replace view if exist
CREATE OR REPLACE VIEW sales_by_client AS
SELECT
c.client_id,
s.name,
SUM(invoice_total) AS total_sales
FROM clients c
JOIN invoices i USING(client_id)
GROUP BY client_id, name
Update views
if we don’t have DISTINCT
, Aggregate function
, GROUP BY
, HAVING
, and UNION
then we can update the table using this view.
UPDATE invoices_with_balance
SET due_date = DATE_ADD(due_date, INTERVAL 2 DAY)
WHERE invoice_id = 1
WITH OPTION CHECK
The WITH CHECK OPTION
clause can be given for an updatable view to prevent inserts to rows for which the WHERE
clause in the select_statement
is not true. It also prevents updates to rows for which the WHERE
clause is true but the update would cause it to be not true (in other words, it prevents visible rows from being updated to nonvisible rows).
CREATE TABLE t1 (a INT);
CREATE VIEW v1 AS SELECT * FROM t1 WHERE a < 2
WITH CHECK OPTION;
CREATE VIEW v2 AS SELECT * FROM v1 WHERE a > 0
WITH LOCAL CHECK OPTION;
CREATE VIEW v3 AS SELECT * FROM v1 WHERE a > 0
WITH CASCADED CHECK OPTION;
Views Benefits
- Simplify queries
- Adding an abstraction layer that reduces the impact of changes
here we create a table once and apply all changes to a view to map these changes.
- restrict access to data.
Stored Procedures (SP)
A stored procedure
is a prepared SQL code that you can save, so the code can be reused over and over again.
Create SP
DELIMITER $$ -- use any sign you want
CREATE PROCEDURE get_clients()
BEGIN
SELECt * FROM clients;
END $$
DELIMITER ;
to call the SP we use the CALL
keyword
CALL get_clients()
DROP the stored procedure
DROP PROCEDURE IF EXISTS get_payments;
Passing parameters
DELIMITER $$
CREATE PROCEDURE get_clients_by_state(
state CHAR(2)
)
BEGIN
SELECT * FROM clients c
WHERE c.state = statue;
END $$
DELIMITER ;
set a parameter default value
DELIMITER $$
CREATE PROCEDURE get_clients_by_state(
state CHAR(2)
)
BEGIN
IF state IS NULL THEN
SET state = 'CA'; -- set the value you want
END IF;
SELECT * FROM clients c
WHERE c.state = statue;
END $$
DELIMITER ;
Validating SP parameters
DELIMITER $$
CREATE PROCEDURE make_payment(
invoice_id INT,
payment_amount DECIMAL(9, 2),
payment_date DATE
)
BEGIN
IF payment_amount <= 0 THEN
-- '22003' from the link below
SIGNAL SQLSTATE '22003'
SET MESSAGE_TEXT = "Invalid payment amount";
END IF;
-- your update code goes here
END $$
DELIMITER ;
Variables
-- user/session variables
SET @name = "Root";
-- local vars: any var that is defined inside a function or stored procedure
-- if we didn't assign 0 it'll be NULL
DECLARE VAR_NAME TYPE DEFAULT 0
to copy the value from select to your local variables we use the INTO
keyword.
CREATE PROCEDURE get_risk_factor()
BEGIN
DECLARE risk_factor DECIMAL(9, 2) DEFAULT 0;
DECLARE invoices_total DECIMAL(9, 2);
DECLARE invoices_count INT;
SELECT COUNT(*), SUM(invoice_total)
INTO invoices_count, invoices_total
FROM invoices;
SET risk_factor = invoices_total / invoices_count * 5;
SELECT risk_factor;
END
Functions
functions is like the stored procedure but it can only return a single value.
CREATE FUNCTION get_risk_factor_for_client(
client_id INT
) RETURNS int(11)
READ SQL DATA
BEGIN
DECLARE risk_factor DECIMAL(9, 2) DEFAULT 0;
DECLARE invoices_total DECIMAL(9, 2);
DECLARE invoices_count INT;
SELECT COUNT(*), SUM(invoice_total)
INTO invoices_count, invoices_total
FROM invoices i;
WHERE i.client_id = client_id;
SET risk_factor = invoices_total / invoices_count * 5;
RETURN risk_factor;
END
In SQL, a deterministic function is a function whose output is entirely determined by its input parameters
CREATE FUNCTION get_risk_factor_for_client(
client_id INT
) RETURNS int(11)
READ SQL DATA
DETERMINISTIC
BEGIN
DECLARE risk_factor DECIMAL(9, 2) DEFAULT 0;
DECLARE invoices_total DECIMAL(9, 2);
DECLARE invoices_count INT;
SELECT COUNT(*), SUM(invoice_total)
INTO invoices_count, invoices_total
FROM invoices i;
WHERE i.client_id = client_id;
SET risk_factor = invoices_total / invoices_count * 5;
RETURN risk_factor;
END
Triggers and Events
Triggers
a block of code gets executed after/before the (insert
, update
, delete
) actions. the trigger can’t be used on its table because it will trigger an infinite loop. We can use triggers for logging and auditing
DELIMITER $$
CREATE TRIGGER payments_after_insert
AFTER INSERT ON payments
FOR EACH ROW
BEGIN
UPDATE invoices
SET payment_total = payment_total + NEW.amount
WHERE invoice_id = NEW.invoice_id;
END $$
DELIMITER ;
to print all triggers in the database
SHOW TRIGGERS
To drop a trigger
DROP TRIGGER IF EXISTS trigger_name;
Events
block of SQL code that gets executed according to a schedule.
-- to get MySQL variables
SHOW VARIABLES;
-- to set a variable value
SET GLOBAL event_scheduler = OFF/ON;
create a new trigger, this will create a trigger that runs every year and delete last year’s audit logs from the payments_audit
table
DELIMITER $$
CREATE EVENT yearly_delete_stale_audit_rows
ON SCHEDULE
-- AT '2019-05-01' -- for run on a specified time.
EVERY 1 YEAR STARTS '2023-01-01' ENDS '2029-01-01'
DO BEGIN
DELETE FROM payments_audit
WHERE action_date < NOW() - INTERVAL 1 YEAR;
END $$
how to view, drop, and manipulate events:
-- View events
SHOW EVENTS
-- drop event
DROP EVENT IF EXISTS event_name;
-- alter events
ALTER EVENT event_name [DISABLE/ENABLE];
Transactions and Concurrency
Transactions
Transaction is a group of SQL statements that represent a single unit of work. All are completed successfully or all fail like bank transactions.
COMMIT
The COMMIT
statement is used to permanently save
the changes made during a transaction. When you issue a COMMIT
, all the changes made during the transaction become permanent and cannot be rolled back.
START TRANSACTION;
INSERT INTO orders (customer_id, order_date, status)
VALUES (1, '2019-01-01', 1);
INSERT INTO order_items
VALUES (1, 1, 1, 1);
COMMIT;
ROLLBACK
The ROLLBACK
statement is used to undo the changes made during a transaction that has not been committed yet. If there is an error or any reason you want to discard the changes made during a transaction, you issue a ROLLBACK
.
START TRANSACTION;
INSERT INTO orders (customer_id, order_date, status)
VALUES (1, '2019-01-01', 1);
INSERT INTO order_items
VALUES (1, 1, 1, 1);
ROLLBACK;
SAVEPOINT (Partial Rollback)
You can use savepoints
to create points within transactions which you can later roll back. This allows for more granular control over rolling back only part of a transaction.
START TRANSACTION;
-- Your SQL statements here
SAVEPOINT my_savepoint;
-- More SQL statements
ROLLBACK TO my_savepoint;
-- Continue or ROLLBACK again
COMMIT;
Locking
When you try to run a transaction with some rows, if any other transaction tries to deal with these rows in the first transaction it’ll wait until the first transaction is done then it will be executed and commit the changes. this is the default behavior in MySQL.
Common concurrency problems
Lost updates
this happens when two transactions try to update the same record, For example:
first wants to update customer_name
and is not committed yet, and the second wants to update phone_number
and also it’s not committed yet.
![[../../imgs/Pasted image 20231219113839.png]]
In this case if MySQL didn’t use the lock
, the second transaction will override the first transaction changes. But by default, MySQL uses locks to prevent this behavior.
Dirty Reads
A dirty read occurs in a database when a transaction reads data that has been modified by another transaction but has not yet been committed
![[../../imgs/Pasted image 20231219114334.png]]
NON-Repeating
A non-repeatable read occurs when a transaction reads the same data multiple times within the same transaction, and between those reads, another transaction modifies or deletes the data.
![[../../imgs/Pasted image 20231219114756.png]]
Phantom Reads
A phantom read is a phenomenon that can occur in database transactions, particularly in scenarios where multiple transactions are concurrently modifying a range of data. A phantom read happens when a transaction retrieves a set of rows that match a certain condition, but between the initial read and a subsequent read or modification, another transaction inserts or deletes rows that also match the condition. As a result, the second read in the first transaction includes rows that were not present in the initial read.
![[../../imgs/Pasted image 20231219115015.png]]
Each problem has its isolation level to solve it.
Isolation
levels
Name | Prevented Issue | Note |
---|---|---|
READ UNCOMMITTED | - | - |
READ COMMITTED | Dirty Reads |
- |
REPEATABLE READ | Lost Updates , Dirty Reads , NON-Repeating Reads |
MySQL Default Level |
SERIALIZABLE | Lost Updates , Dirty Reads , NON-Repeating Reads , Phantom Reads |
- |
The more isolation level the less performance we get.
READ UNCOMMITTED
Didn’t prevent any concurrency problem, you will face the Dirty Reads
issue because here you read Uncommitted changes from other transactions.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
START TRANSACTION;
SELECT points FROM customers WHERE customer_id = 1;
COMMIT;
READ COMMITTED
This isolation level prevents Dirty Reads
but you may face the NON-Repeating Reads
issue if another transaction changes a value you are reading now and commits the changes.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
-- For example, this will return 20
SELECT points FROM customers WHERE customer_id = 1;
-- another transaction runs at this moment
-- this will return 30
SELECT points FROM customers WHERE customer_id = 1;
COMMIT;
Another transaction starts to change the value
START TRANSACTION;
UPDATE customers
SET points = 20
WHERE customer_id = 1;
COMMIT;
REPEATABLE READ
Will solve Dirty Reads
, and NON-Repeating Reads
issues. this is the default option of isolation in MySQL. If the first transaction runs and is not committed yet it will return a value, in this case, if another transaction runs and updates the customer value and makes a new customer located in VA
it will not be included in the result of the first transaction because it’s not committed yet.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT * FROM customers WHERE state = 'VA';
COMMIT;
The second transaction will make a new record valid for the criteria above, but it’ll not be in the result and this is the phantom read
issue.
START TRANSACTION;
UPDATE customers
SET state = 'VA'
WHERE customer_id = 1;
COMMIT;
SERIALIZABLE
This transaction isolation level solves all concurrency issues. if the following transaction starts to update the customer with the customer_id
3 and is not committed yet. if the second transaction starts at this moment it’ll wait for the first transaction to be committed then it will start.
START TRANSACTION;
UPDATE customers
SET state = 'VA'
WHERE customer_id = 3;
COMMIT;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT * FROM customers WHERE state = 'VA';
COMMIT;
it will fix all the concurrency issues above.
Deadlock
A deadlock in MySQL, as in other relational database systems, occurs when two or more transactions are blocked indefinitely, each waiting for the other to release a lock. In simpler terms, a deadlock is a situation where transactions are unable to proceed because each holds a lock that the other needs to continue.
Deadlocks in Steps:
- Transaction A acquires a lock on resource X.
- Transaction B acquires a lock on resource Y.
- Transaction A now tries to acquire a lock on resource Y (which Transaction B holds), and Transaction B tries to acquire a lock on resource X (which Transaction A holds).
- Both transactions are now waiting for each other to release the lock, resulting in a deadlock.
-- Transaction A
START TRANSACTION;
UPDATE table1 SET column1 = 'new_value' WHERE id = 1; -- Acquires a lock on row with id = 1
-- Transaction B (concurrently)
START TRANSACTION;
UPDATE table2 SET column2 = 'new_value' WHERE id = 2; -- Acquires a lock on row with id = 2
-- Transaction A (later)
UPDATE table2 SET column2 = 'new_value' WHERE id = 2; -- Tries to acquire a lock on row with id = 2 (held by Transaction B)
-- Transaction B (later)
UPDATE table1 SET column1 = 'new_value' WHERE id = 1; -- Tries to acquire a lock on row with id = 1 (held by Transaction A)
We can avoid
the deadlocks not prevent them, To avoid
deadlocks, consider the following best practices:
- Follow the same order in the transaction code to reduce the deadlock.
- Make transactions code short as possible to take less time.
- Make Transactions run in scheduled time to run in non-peak time.
Data types
String type
Type | Size / Max Size | Potential Usage |
---|---|---|
CHAR(x) | Fixed-length | Country Code |
VARCHAR(x) | Max: 65535, characters (~64KB) - (50-255) good range | phone, usernames, passwords …etc. |
MEDIUMTEXT | Max 16MB | JSON, CSV, medium size text books |
LONGTEXT | Max 4GB | Textbooks, and years of log files |
TINYTEXT | max 255 Byte | 255 chars |
TEXT | Max 64KB | 64K char |
English Language: 1 Byte
European (Middle-Eastern): 2 Bytes
Asian: 3 Bytes
Integer type
Type | Size |
---|---|
TINTINT | 1 Bytes |
UNSIGNED TINTINT | - |
AMALLINT | 2 Bytes |
MEDIUMINT | 3 Bytes |
INT | 4 Bytes |
BIGINT | 8 Bytes |
Integer datatypes have a Zero Fill
option which allows you to pad the unused part of the integer and set them to zeros
. For example:
-- Number 1 will appear like this.
INT(4) --> 0001
NOTE:
This only affects how values will be displayed not how it will be stored
Fixed/Floating Point type
Type | Size |
---|---|
DECIMAL (number_length, digits_after_precision) | - |
DEC | - |
NUMERIC | - |
FIXED | - |
FLOAT | 4B |
DOUBLE | 8B |
Boolean type
Type | Possible values |
---|---|
BOOL/BOOLEAN | TRUE/FALSE or 1/0 |
Enum and Set type
In MySQL, the ENUM
data type is used to define a column that can have a set of predefined values. Each value in the ENUM
must be one of the possible enumeration values specified at the time of column creation.
CREATE TABLE colors (
color_name ENUM('Red', 'Green', 'Blue', 'Yellow', 'Black', 'White')
);
In MySQL, the SET
data type is similar to ENUM
but allows a column to contain zero or more values from a predefined set. Sets are case Sensitivity
CREATE TABLE hobbies (
hobbies_set SET('Reading', 'Writing', 'Drawing', 'Coding', 'Traveling')
);
If the set of possible values is likely to change over time, it might be better to use a reference table. If there are a large number of possible values, or if the values are more descriptive, a reference table might be a better choice.
Date and Time type
Type | Size |
---|---|
DATE | - |
TIME | - |
DATETIME | 8B |
TIMESTAMP | 4B (up to 2038) |
YEAR |
If you’ll store dates after the 2038 year it’s better to use the DATETIME
instead of TIMESTAMP
. Read more about the year 2038 problem 😃.
Blob type
Used for storing binary data
Type | Size |
---|---|
TINYBLOB | 225B |
BLOB | 65KB |
MEDIUMBLOB | 16MB |
LONGBLOB | 4GB |
it’s not the best practice to store binary data in your database because this approach does the following:
- Increase database size
- Slow the backup process
- Slow the performance
- More effort to write code that reads/writes images to the database
JSON type
UPDATE products
SET properties = '
{
"name": "root",
"password": "toor"
}
'
WHERE product_id = 1;
-- or --
UPDATE products
SET properties = JSON_OBJECT(
'name', 'root',
'password', 'toor',
'numbers', JSON_ARRAY(1, 2, 3),
'data', JSON_OBJECT('name', 'test')
)
WHERE product_id = 1;
All of these ways produce the following JSON object
{
"name": "root",
"password": "toor",
"numbers": [1, 2, 3],
"data": {
"name": "test"
}
}
to get data from the array attributes we can use the following ways
-- $ for the current JSON document
-- . for accessing the passed key
SELECT JSON_EXTRACT(properties, '$.name')
FROM products
WHERE product_id = 1;
-- OR --
-- $ for the current JSON document
-- . for accessing the passed key
SELECT properties -> '$.name' --> return "root"
-- SELECT properties -> '$.numbers[0]' --> return 1
-- SELECT properties -> '$.data.name' --> return "test"
-- if you want to get the value without the "" use ->>
-- this will be used too in the comparing in WHERE clause.
SELECT properties ->> '$.name' --> return "root"
FROM products
WHERE product_id = 1;
To update a JSON object we use the JSON_SET
UPDATE products
SET properties = JSON_SET(
properties, -- the JSON object will be updated
"$.name", "Hossam"
)
WHERE product_id = 1;
To remove data from JSON we use the JSON_REMOVE
, this will return a new JSON object with the new data without the removed item.
UPDATE products
SET properties = JSON_REMOVE(
properties, -- the JSON object will be updated
"$.name"
)
WHERE product_id = 1;
Designing Databases
Skipped for now
Indexing for High Performance
Indexes
Indexes
are a powerful tool used in the background of a database to speed up querying. When we create a new index on a table attribute we create a new lookup method to search over the table records using the given indexed attribute such as the following image:
in this case, if we wanna search for customers with state
it’s faster than usual because indexes are stored in memory not on the desk. the select won’t test all records it will search in the indexed state binary tree and get a reference to the row.
Cost of indexes
- Increase the size of the database because it’s stored next to the table.
- Slow down the write operation.
So we need to use indexes with performance-critical
queries. Don’t use it everywhere and in the designing phase, it’s used only if we have a slow query.
Create Index
CREATE INDEX idx ON customers(points);
Prefix index
When you create an index you can prefix the string index with the numbers of chars to be indexed, The following code will create an index that searches on the first 10 chars of the customer’s last name.
CREATE INDEX idx ON customers(last_name(10));
to define the number to create an index we use the following code to get a satisficed number of unique values to be identified based on our passed number.
SELECT
COUNT(DISTINCT LEFT(last_name, 1)) -- 25 unique value
COUNT(DISTINCT LEFT(last_name, 5)) -- 966 unique value
COUNT(DISTINCT LEFT(last_name, 10)) -- 996 unique value
FROM customers;
since 5 gives you 966 unique values and 10 gives you 996 -small improvement- we’ll choose the 5
to be our indexed length of the customer last_name
.
Full-text index
CREATE FULLTEXT INDEX idx_title_body ON posts (title, body);
To search in the text-indexed text we can use the following code
SELECT *
FROM posts
WHERE MATCH(title, body) AGAINST ("search value");
To display the record matching score you can select MATCH(title, body) AGAINST ("search value")
and it will return the score in the range from 0 to 1.
The default mode for the search the it the Natural language
mode, to enable the Boolean mode
such like this:
SELECT *
FROM posts
WHERE MATCH(title, body) AGAINST ("search value" IN BOOLEAN MODE);
Using -value
will return the records match the search value excluding the value that after -
, And the word after +
it will be included in the search result, "value"
By surrounding the search value that means this value must be included in the result.
Composite index
CREATE INDEX idx_state_points ON customers(state, points);
the order of the columns matter. We have 2 rules to apply here:
- Put the most frequently used columns first.
- Put the column with the higher cardinality to narrow the search range.
Use these rules reasonably, and choose the one that will fit your case. To choose a specific index to run in your search use the following code:
SELECT customers
USE INDEX(index_name)
WHERE state LIKE 'search_criteria' AND
last_name LIKE 'search_criteria';
When indexes are ignored?
Example 1:
SELECT customer_id
FROM customers
WHERE state = `CA` OR points > 1000;
Explain Example 1
In the previous example, if you have a composite index on points and states, The code will test all records in your database. This case isn’t a full-table
scan its a full-index
it reads data from memory not from the disk. to solve a problem like this we have to divide the query and union the result together.
EXPLAIN
SELECT customer_id
FROM customers
WHERE state = `CA`
UNION
SELECT customer_id
FROM customers
WHERE points > 1000;
Example 2:
EXPLAIN
SELECT customer_id
FROM customers
WHERE points + 10 > 2010;
Explain Example 2
this example will test the whole database records against the specified where clause but it’s also a full-index
scan. to solve this problem we can change the criteria in the where clause to points > 2000
it’ll give the same result but faster than the above code result.
EXPLAIN
SELECT customer_id
FROM customers
WHERE points > 2000;
Sorting data with indexes
If you have a composite index such as INDEX(state, points)
you can perform sorting in the following rules:
- You can sort data based on
state
value
SELECT customer_id
FROM customers
ORDER BY state
- You can sort data based on
state
, andpoints
values
SELECT customer_id
FROM customers
ORDER BY state, points
- You can sort data based on the reverse order of
both
columns
SELECT customer_id
FROM customers
ORDER BY state DESC, points DESC
- You can sort data based on the
points
only
SELECT customer_id
FROM customers
-- this case is the fastest one because data is already sorted by the state before.
ORDER BY points
To know the cost of the query you can use the following command to show the status variable with specified criteria
SHOW STATUS LIKE 'last_query_cost';
If you try to select something other than the state
, value
, and customer_id
, You’ll perform a full-table
because the indexes only have their column to look up with a reference to the record id
as we mentioned before.
Before creating new indexes check the existing ones.
Securing Databases
Skipped for now