Table Design: Columns, Types, and Indexes

Every database starts with a table

Simple explanation

Think of a spreadsheet.

A table is like a spreadsheet with strict rules. Each column has a name and a type (text, number, date), and every row follows those rules. You cannot put someone’s name into a column that expects a number — the database will reject it.

Indexes are like the index at the back of a textbook. Instead of reading every page to find “vector search,” you flip to the index, find the page number, and jump straight there. Without an index, the database reads every single row — which works for 100 rows but is painfully slow for 10 million.

This module teaches you how to pick the right column types, build indexes that make queries fast, and use columnstore indexes for analytics workloads.

Choosing the right data type

Every column needs a data type. Picking the right one affects storage size, query performance, and data integrity.

The golden rule: use the smallest type that fits your data.

Data	Bad Choice	Good Choice	Why?
Customer age	BIGINT (8 bytes)	TINYINT (1 byte)	Ages fit in 0–255. You save 7 bytes per row.
Product price	FLOAT	DECIMAL(10,2)	FLOAT has rounding errors. Money needs exact precision.
Country code	VARCHAR(100)	CHAR(2)	ISO codes are always 2 characters. Fixed-length is faster.
Email address	VARCHAR(MAX)	VARCHAR(320)	MAX goes off-row. 320 is the RFC maximum for emails.
Yes/No flag	VARCHAR(3)	BIT (1 bit)	BIT uses almost no space. “Yes”/“No” wastes bytes.

Exam tip: VARCHAR vs NVARCHAR

VARCHAR stores ASCII (1 byte per character). NVARCHAR stores Unicode (2 bytes per character). If your data includes non-Latin characters (Chinese, Arabic, emoji), you need NVARCHAR. The exam may present a scenario where international data is stored in VARCHAR — that is the wrong choice.

Also watch for VARCHAR(MAX) and NVARCHAR(MAX). These store data off-row when the value exceeds 8,000 bytes, which hurts performance. Only use MAX when you genuinely need unlimited-length text (like document content).

Indexes: the speed layer

Without an index, SQL Server performs a table scan — it reads every row to find matches. With the right index, it jumps directly to the matching rows.

Clustered vs nonclustered

Every table can have one clustered index (which physically sorts the data on disk) and up to 999 nonclustered indexes (which are separate lookup structures).

Clustered vs nonclustered — the exam tests when each is appropriate
Feature	Clustered Index	Nonclustered Index
How many per table?	One only	Up to 999
Data storage	Data rows stored in index order	Separate structure with pointers to data rows
Best for	The column you query by most (usually the PK)	Columns used in WHERE, JOIN, ORDER BY
Write overhead	Moderate (data must stay sorted)	Higher per additional index (each insert updates every NCI)
Default	Created automatically with PRIMARY KEY	Must be created manually

Included columns and covering indexes

A covering index contains all the columns a query needs, so SQL Server never has to look up the actual data row. You create one by adding included columns:

-- Covering index for: SELECT Email, Name FROM Customers WHERE City = 'Auckland'
CREATE NONCLUSTERED INDEX IX_Customers_City
ON Customers (City)
INCLUDE (Email, Name);

The INCLUDE columns are stored in the leaf level of the index but not in the key. This makes the index wider but avoids expensive key lookups.

Scenario: Priya's index decision at Vault Bank

Priya Sharma is optimising the transaction lookup at Vault Bank. The most common query is:

SELECT TransactionDate, Amount, Description
FROM Transactions
WHERE AccountId = @AccountId
ORDER BY TransactionDate DESC;

She creates a covering index:

CREATE NONCLUSTERED INDEX IX_Transactions_Account
ON Transactions (AccountId, TransactionDate DESC)
INCLUDE (Amount, Description);

This index covers the entire query — no key lookups needed. The AccountId in the key handles the WHERE clause, TransactionDate DESC handles the ORDER BY, and Amount and Description in INCLUDE handle the SELECT.

Columnstore indexes: analytics at speed

Traditional (rowstore) indexes are great for looking up individual rows. But when you need to scan millions of rows for analytics — totals, averages, aggregations — columnstore indexes are dramatically faster.

Aspect	Rowstore Index	Columnstore Index
Storage	Row by row	Column by column (compressed)
Best for	Point lookups, small ranges	Aggregations over millions of rows
Compression	Moderate	Excellent (often 10x smaller)
Update pattern	Fast for single-row inserts	Optimised for bulk loads
Typical use	OLTP (transactions)	OLAP (analytics, reporting)

-- Create a clustered columnstore index for analytics
CREATE CLUSTERED COLUMNSTORE INDEX CCI_SalesHistory
ON SalesHistory;

You can also create a nonclustered columnstore index on an OLTP table to enable real-time analytics without changing the table’s rowstore structure:

CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders_Analytics
ON Orders (OrderDate, ProductId, Quantity, TotalAmount);

Exam tip: When NOT to use columnstore

Columnstore indexes shine for analytical queries (SUM, AVG, COUNT over millions of rows). They are NOT ideal for:

Frequent single-row lookups (use rowstore B-tree indexes)
Narrow result sets with many columns (columnstore reads entire column segments)
Tables with heavy singleton INSERT/UPDATE/DELETE patterns (delta store overhead)

The exam may present a scenario where an OLTP table needs point lookups — columnstore would be the wrong answer.

SEQUENCES: generating values across tables

IDENTITY generates auto-incrementing values for a single table. SEQUENCE is a database-level object that generates values you can use across multiple tables, in any order, and even peek at the next value before inserting.

-- Create a sequence
CREATE SEQUENCE dbo.InvoiceNumber
    AS INT
    START WITH 1000
    INCREMENT BY 1
    MINVALUE 1000
    MAXVALUE 9999999
    CYCLE;  -- Restarts at MINVALUE when MAXVALUE is reached

-- Use it in an INSERT
INSERT INTO Invoices (InvoiceNo, CustomerId, Amount)
VALUES (NEXT VALUE FOR dbo.InvoiceNumber, @CustomerId, @Amount);

SEQUENCE vs IDENTITY

Feature	IDENTITY	SEQUENCE
Scope	Single table	Database-wide (any table)
Get next value without inserting	No	Yes (NEXT VALUE FOR)
Reset/cycle	Requires RESEED	Built-in CYCLE option
Use in DEFAULT constraint	Automatic	Yes (with NEXT VALUE FOR)
Multiple tables sharing one counter	No	Yes
Performance	Slightly faster for single-table	Slightly more overhead

Scenario: Leo's shared sequence at SearchWave

Leo Torres at SearchWave needs a single incrementing reference number shared across three tables: Orders, Returns, and Exchanges. With IDENTITY, each table would have its own counter and numbers would overlap. With a SEQUENCE, all three tables pull from the same counter:

CREATE SEQUENCE dbo.ReferenceNumber AS BIGINT START WITH 100000 INCREMENT BY 1;

-- All three tables use the same sequence
ALTER TABLE Orders ADD CONSTRAINT DF_Orders_Ref DEFAULT NEXT VALUE FOR dbo.ReferenceNumber FOR ReferenceNo;
ALTER TABLE Returns ADD CONSTRAINT DF_Returns_Ref DEFAULT NEXT VALUE FOR dbo.ReferenceNumber FOR ReferenceNo;
ALTER TABLE Exchanges ADD CONSTRAINT DF_Exchanges_Ref DEFAULT NEXT VALUE FOR dbo.ReferenceNumber FOR ReferenceNo;

Now every reference number is unique across all three tables. Note that gaps are possible (e.g., from rollbacks or cache loss), but uniqueness is guaranteed.

Question

What is a covering index?

Click or press Enter to reveal answer

Answer

An index that contains all the columns a query needs (in the key and/or INCLUDE columns), so SQL Server can satisfy the query entirely from the index without looking up the actual data row. This eliminates key lookups and dramatically improves read performance.

Click to flip back

Question

When should you use a columnstore index instead of a rowstore index?

Click or press Enter to reveal answer

Answer

Use columnstore for analytical workloads — queries that aggregate (SUM, AVG, COUNT) over millions of rows. Columnstore stores data column-by-column with heavy compression, making it dramatically faster for scans. Use rowstore for OLTP workloads with point lookups and single-row operations.

Click to flip back

Question

What is the difference between IDENTITY and SEQUENCE?

Click or press Enter to reveal answer

Answer

IDENTITY is tied to a single table and auto-generates values on INSERT. SEQUENCE is a database-level object that generates values independently — you can share one sequence across multiple tables, get the next value without inserting (NEXT VALUE FOR), and configure cycling. Use SEQUENCE when multiple tables need a shared counter.

Click to flip back

Question

Why should you avoid VARCHAR(MAX) for short strings?

Click or press Enter to reveal answer

Answer

VARCHAR(MAX) stores data off-row when the value exceeds 8,000 bytes. Even when values are short, the engine may treat MAX columns differently — they cannot be used as index keys and may cause performance overhead. Use a sized VARCHAR (e.g., VARCHAR(200)) when you know the maximum length.

Click to flip back

Knowledge Check

Priya at Vault Bank needs to create a table for daily transaction summaries. The table will store 50 million rows and be queried primarily with aggregate functions (SUM, AVG) for monthly reports. Singleton lookups are rare. Which index strategy should she use?

Knowledge Check

Leo at SearchWave needs a reference number that is unique across the Orders, Returns, and Exchanges tables. The number must increment by 1 with no duplicates between tables. What should he use?

Next up: Constraints: Protecting Your Data — enforce data integrity with PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, and DEFAULT constraints.