Mastering Databases with PostgreSQL β Lecture Notes
1. Why Databases?
Persistence = storing data so it survives after the program that created it has stopped, and remains accessible across sessions and locations.
Without persistence: every app restart loses all data (a to-do app with no memory of your tasks).
Database (broad definition): Any structured persistent system that provides CRUD operations β Create, Read, Update, Delete. This includes your phoneβs contacts, browser localStorage, or even a text file.
Database (backend context): Disk-based storage systems. Why disk?
| Storage | Speed | Capacity | Cost |
|---|---|---|---|
| RAM (primary memory) | Very fast | 8β128 GB typical | Expensive |
| Disk / SSD (secondary) | Slower | 512 GB β 2 TB+ typical | Cheap |
Databases prioritize capacity over speed β disk-based. Caches (Redis, Memcached) prioritize speed β RAM-based.
2. Why Not Just Text Files?
Three fundamental problems with plain text as a database:
| Problem | Detail |
|---|---|
| Parsing overhead | Every read requires scanning and splitting the file in application code β slow and error-prone |
| No schema enforcement | Canβt prevent wrong data types; any string can go anywhere |
| No concurrency control | Two simultaneous writes to the same value β unpredictable result (last-write-wins race condition) |
These limitations motivated the creation of DBMS (Database Management System) software.
3. DBMS Responsibilities
A DBMS software (e.g., PostgreSQL, MySQL, MongoDB) handles:
- Data organization β efficient storage and retrieval algorithms
- Access β CRUD operations via a query language
- Integrity β enforces data accuracy and consistency at the database level
- Security β role-based access control, user permissions
4. Relational vs Non-Relational
| Relational (SQL) | Non-Relational (NoSQL) | |
|---|---|---|
| Structure | Tables, rows, columns | Collections, documents |
| Schema | Predefined, strict | Flexible, schema-less |
| Data Integrity | Enforced at DB level | Must be enforced in application code |
| Query language | SQL | Varies (MongoDB query language, etc.) |
| Examples | PostgreSQL, MySQL, SQLite | MongoDB, DynamoDB |
| Best for | CRM, e-commerce, financial data β anything requiring consistency | CMS, prototyping, unstructured content |
Non-relational flexibility can be a disadvantage: without schema enforcement, data inconsistencies are easier to introduce and must be caught in application code, which is more error-prone.
5. Why PostgreSQL?
- Open source and free β no licensing cost; can self-host
- SQL-standard compliant β migrating to MySQL or another SQL DB requires minimal changes
- Highly extensible β ~1,400 pages of documentation; extension system
- Reliable and scalable β production-proven at scale
- Native JSON support β
jsonbtype eliminates most reasons to choose MongoDB; PostgreSQL can handle dynamic/unstructured data without switching databases
Rule of thumb: Use PostgreSQL as your default for virtually every project. Only reconsider when youβre optimizing for very specific bottlenecks at massive scale.
6. PostgreSQL Data Types
Numeric Types
| Type | Notes |
|---|---|
SMALLINT, INTEGER, BIGINT | Integers; use BIGINT for IDs in production |
SERIAL, BIGSERIAL | Auto-incrementing integers; BIGSERIAL for primary keys |
DECIMAL(p, s) / NUMERIC(p, s) | Exact precision β use for prices, financial data |
REAL, DOUBLE PRECISION, FLOAT | Approximate β faster; use for scientific measurements |
Decimal vs Float: Use
DECIMALwhen accuracy matters (price, currency). UseFLOATwhen small inaccuracies are acceptable and speed matters (area calculations, measurements).
String Types
| Type | Behavior | When to use |
|---|---|---|
CHAR(n) | Fixed length; pads with spaces | Only if values are always exactly the same length (e.g., day codes: MON, TUE) |
VARCHAR(n) | Variable up to n characters | Legacy; mostly a MySQL convention |
TEXT | Unlimited length | Always prefer this in PostgreSQL |
PostgreSQL recommendation: Use
TEXTβ no performance difference fromVARCHAR, avoids arbitrary length limits that require migrations later, and makes schema easier to read.
Other Important Types
| Type | Notes |
|---|---|
BOOLEAN | true / false |
DATE | Date only |
TIME | Time only (HH:MM:SS) |
TIMESTAMP | Date + time |
TIMESTAMPTZ | Date + time + timezone β use this for created_at, updated_at |
INTERVAL | Duration (e.g., 10 days, 1 week) |
UUID | Universally unique identifier β preferred for primary keys |
JSON | Stored as plain text |
JSONB | Stored as binary β prefer this; faster queries and indexing |
ARRAY | Array of any data type (e.g., INTEGER[], TEXT[]) |
JSON vs JSONB: Always use
JSONBβ PostgreSQL serializes it to binary for faster processing and better query capabilities. The only reason to useJSONis if you need to preserve original formatting (rare).
Naming Conventions
- All table names and field names: lowercase snake_case (
full_name,created_at) - Table names: plural (
users,projects,tasks) - PostgreSQL is case-insensitive by default β camelCase forces you to use double quotes everywhere, making code ugly and fragile
7. Database Migrations
What are Migrations?
Versioned SQL files that track all changes made to a database schema over time.
db/
migrations/
001_create_users.sql
002_add_projects.sql
003_add_indexes.sql
A migration tool (e.g., dbmate, go-migrate) reads these files sequentially and applies them to the database.
Up vs Down Migrations
| Direction | Purpose |
|---|---|
| Up | Apply the change (CREATE TABLE, CREATE INDEX, etc.) |
| Down | Revert the change (DROP TABLE, DROP INDEX, etc.) |
Down migrations enable rollbacks β if something breaks in production, you can revert to the previous schema version.
Why Not Just Write Queries Manually?
- No version history of schema changes
- Hard to replicate state across dev/staging/production environments
- No rollback capability
- No audit trail of who changed what
The migration tool maintains a schema_migrations table in your DB to track the current version.
Seeding
Inserting test data into the database for development/testing. Typically a separate migration file labeled seed_data.
8. Schema Design β Project Management Platform
Enum Types (Custom Types)
Define a fixed set of allowed values at the database level:
CREATE TYPE project_status AS ENUM ('active', 'completed', 'archived');
CREATE TYPE task_status AS ENUM ('pending', 'in_progress', 'completed', 'cancelled');
CREATE TYPE member_role AS ENUM ('owner', 'admin', 'member');
Why enums over TEXT?
- Data integrity β DB rejects any value not in the enum; no application-level check needed
- Documentation β future developers can see all allowed values directly in the schema without reading application code
9. Table Design Patterns
Standard Fields for Every Table
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
NOT NULL rule: Add NOT NULL to every field unless you have an explicit reason for that field to be nullable. Default behavior in PostgreSQL allows nulls β donβt rely on it.
Constraints
| Constraint | Behavior |
|---|---|
PRIMARY KEY | Implicitly: NOT NULL + UNIQUE. Used to uniquely identify a row |
NOT NULL | Field cannot be null |
UNIQUE | No two rows can have the same value for this field |
CHECK (condition) | Custom condition must be true (e.g., CHECK (priority BETWEEN 1 AND 5)) |
REFERENCES table | Foreign key β value must exist as a primary key in the referenced table |
10. Relationships
One-to-One
Separate table for related data (e.g., users and user_profiles). The second table uses the primary key of the first table as its own primary key.
CREATE TABLE user_profiles (
user_id UUID PRIMARY KEY REFERENCES users ON DELETE CASCADE,
bio TEXT,
...
);
Why separate tables? User profiles grow over time (social links, preferences, etc.) without touching the core users table. Isolates concerns and reduces migration risk.
One-to-Many
A project has many tasks. The βmanyβ side stores a foreign key pointing to the βoneβ side.
CREATE TABLE tasks (
project_id UUID NOT NULL REFERENCES projects ON DELETE CASCADE,
...
);
Many-to-Many
Users can belong to many projects; projects can have many users. Requires a linking table with a composite primary key.
CREATE TABLE project_members (
project_id UUID REFERENCES projects ON DELETE CASCADE,
user_id UUID REFERENCES users ON DELETE CASCADE,
role member_role NOT NULL DEFAULT 'member',
PRIMARY KEY (project_id, user_id) -- composite primary key
);
The composite primary key ensures the same user canβt be added to the same project twice.
11. Referential Integrity Constraints
What happens to related rows when a parent row is deleted:
| Constraint | Behavior |
|---|---|
ON DELETE RESTRICT | Prevents deletion if related rows exist β must delete children first |
ON DELETE CASCADE | Automatically deletes all related rows |
ON DELETE SET NULL | Sets the foreign key to NULL (field must be nullable) |
ON DELETE SET DEFAULT | Sets the foreign key to its default value |
Example logic:
- Deleting a user who owns projects β
RESTRICT(canβt delete until projects are removed) - Deleting a project β
CASCADEon tasks (tasks belong to the project, delete them too) - Deleting a user assigned to a task β
SET NULLonassigned_to(task survives, just unassigned)
12. SQL Query Patterns for APIs
Get All Users (with Profile)
SELECT u.*, to_jsonb(up.*) AS profile
FROM users u
LEFT JOIN user_profiles up ON u.id = up.user_id
ORDER BY u.created_at DESC;
- LEFT JOIN β include users even if they have no profile entry
to_jsonb()β converts a row into a JSON object; embed profile as a nested field- Always sort β DB returns rows in undefined order without explicit ORDER BY
Get Single User (Parameterized Query)
SELECT u.*, to_jsonb(up.*) AS profile
FROM users u
LEFT JOIN user_profiles up ON u.id = up.user_id
WHERE u.id = $userId;
Dynamic Filters + Sorting + Pagination
SELECT u.*, to_jsonb(up.*) AS profile
FROM users u
LEFT JOIN user_profiles up ON u.id = up.user_id
WHERE u.full_name ILIKE $letter || '%' -- filter by first letter
ORDER BY $sortBy $sortOrder -- dynamic sort
LIMIT $limit OFFSET $offset; -- pagination
ILIKE= case-insensitive LIKE pattern matching%wildcard = match anything after the given letterOFFSET = (page - 1) * limitβ DB pages start at 0, UI pages start at 1
Insert (Create User)
INSERT INTO users (email, full_name, password_hash)
VALUES ($email, $name, $passwordHash)
RETURNING *;
RETURNING * returns the newly created row β use this instead of a second SELECT query.
Update (PATCH)
UPDATE user_profiles
SET bio = $bio, phone = $phone
WHERE user_id = $userId
RETURNING *;
Only include fields in SET that the user actually passed β construct this dynamically in application code.
13. Parameterized Queries
What: Placeholders in SQL queries ($1, $userId, etc.) filled at runtime.
Why: Prevents SQL injection β a vulnerability where user input is treated as SQL code. With parameterized queries, all input is treated as a string value, never as executable SQL.
Dangerous (string concatenation):
"SELECT * FROM users WHERE id = " + userId
β userId = "1; DROP TABLE users;" β SQL injection!
Safe (parameterized):
"SELECT * FROM users WHERE id = $userId"
β userId is always treated as a string, never parsed as SQL
Always use parameterized queries when incorporating any user-supplied value into a SQL query.
14. Indexes
The Problem
Without an index, finding a row by a field requires a sequential scan β checking every row one by one across disk locations. At millions of rows, this is very slow.
What an Index Does
An index is a lookup table maintained by the database:
Index on tasks.id:
task_id_1 β disk location A
task_id_2 β disk location B
task_id_3 β disk location C
When you query by id, the DB checks the index (fast, sequential lookup) β finds the disk location β jumps directly to the data.
Analogy: A bookβs index tells you which page each chapter starts on β you jump directly instead of scanning page by page.
When to Create an Index
Create an index when a field is used in:
- JOIN conditions (e.g.,
ON u.id = up.user_id) - WHERE clauses (e.g.,
WHERE status = 'pending') - ORDER BY / sorting (e.g.,
ORDER BY created_at DESC)
Primary keys are automatically indexed. Only index foreign keys and other frequently queried fields manually.
Index Direction
Specify ascending or descending to match your most common query:
CREATE INDEX idx_users_created_at ON users(created_at DESC);
If your API always returns results in descending order, a descending index avoids an extra sort step.
Performance Tradeoff
Every index adds overhead to INSERT and UPDATE operations (the index must be maintained). Evaluate:
- How frequently is the query called?
- Does the performance gain outweigh the write overhead?
- Monitor and drop indexes that arenβt earning their cost.
Common Indexes for the Example Schema
| Index | Reason |
|---|---|
users(email) | Frequent lookups by email (login, joins) |
users(created_at DESC) | Default sort order for list API |
tasks(project_id) | JOIN condition when fetching tasks of a project |
tasks(assigned_to) | JOIN condition when fetching tasks of a user |
tasks(status) | WHERE clause filter |
tasks(created_at DESC) | Default sort order |
project_members(user_id) | JOIN condition |
15. Triggers
Automate actions at the database level in response to events (INSERT, UPDATE, DELETE).
Use case: Automatically update updated_at whenever any row in any table is modified.
-- 1. Create the function
CREATE FUNCTION update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- 2. Attach trigger to each table
CREATE TRIGGER set_updated_at
BEFORE UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION update_updated_at();
Without a trigger: youβd have to manually add updated_at = NOW() to every UPDATE query in your application code β easy to forget, error-prone.
Quick Revision Checklist
- Persistence = data survives program restarts; disk-based DBs trade speed for capacity
- DBMS = software managing data organization, CRUD access, integrity, security
- Text files fail at: parsing efficiency, schema enforcement, concurrency
- Relational = strict schema, tables/rows, SQL, data integrity at DB level
- Non-relational = flexible schema, collections/documents, integrity in app code
- PostgreSQL preferred: open source, SQL-compliant, extensible, native JSONB support
- Data types: prefer
BIGSERIAL/UUIDfor IDs,DECIMALfor prices,TEXToverVARCHAR,TIMESTAMPTZfor timestamps,JSONBoverJSON - Naming: lowercase snake_case for all field and table names; plural table names
- Migrations: versioned SQL files; up = apply; down = revert; seeding = test data
- Enums: enforce allowed values at DB level; serve as documentation
- Constraints:
PRIMARY KEY(NOT NULL + UNIQUE),NOT NULL,UNIQUE,CHECK, foreign keyREFERENCES - Relationships: one-to-one (shared PK), one-to-many (FK on βmanyβ side), many-to-many (linking table + composite PK)
- Referential integrity:
RESTRICT(block),CASCADE(delete children),SET NULL,SET DEFAULT - Always use
RETURNING *after INSERT/UPDATE instead of a second SELECT - Parameterized queries prevent SQL injection β always use them for user-supplied values
- Indexes = lookup tables; speed up JOIN, WHERE, ORDER BY; add overhead to writes
- Primary keys auto-indexed; manually index foreign keys and frequent query fields
- Triggers = automate DB-level actions on events (e.g., auto-update
updated_at)