## Basic functions

- Inner join vs Outer join

```
INNER JOIN        ( table1 ) ∩ ( table2 )   [ overlap only ]
    (  O  )(  O  )
       *****               <-- shaded overlap

LEFT JOIN         ( table1 ) ⊃ ( table2 )   [ all from table1 + overlap ]
    ( OOOOO )(  O  )
    ( OOOOO )  *****       <-- overlap shaded too

RIGHT JOIN        ( table1 ) ⊂ ( table2 )   [ all from table2 + overlap ]
    (  O  )( OOOOO )
      *****  ( OOOOO )     <-- overlap shaded too

FULL OUTER JOIN   ( table1 ) ∪ ( table2 )   [ everything from both ]
    ( OOOOO )( OOOOO )
    ( OOOOO )( OOOOO )
```

- How window functions work

Sample data:

| EmployeeID | Department | Salary |
|------------|------------|--------|
| 1          | HR         | 5000   |
| 2          | HR         | 4000   |
| 3          | IT         | 7000   |
| 4          | IT         | 6000   |
| 5          | IT         | 5000   |

```sql
-- Calculates the average salary per department without grouping the rows (each row still appears).
SELECT
    EmployeeID,
    Department,
    Salary,
    AVG(Salary)  OVER (PARTITION BY Department) AS AvgDeptSalary,
    ROW_NUMBER() OVER (PARTITION BY Department ORDER BY Salary DESC) AS RankInDept
FROM Employees;
```

Query result:

| EmployeeID | Department | Salary | AvgDeptSalary | RankInDept |
|------------|------------|--------|---------------|------------|
| 1          | HR         | 5000   | 4500          | 1          |
| 2          | HR         | 4000   | 4500          | 2          |
| 3          | IT         | 7000   | 6000          | 1          |
| 4          | IT         | 6000   | 6000          | 2          |
| 5          | IT         | 5000   | 6000          | 3          |

## Partitioning

[Data partitioning recommendations for reliability - Microsoft Azure Well-Architected Framework | Microsoft Learn](https://learn.microsoft.com/en-us/azure/well-architected/design-guides/partition-data#partition-in-azure-sql-database) covers scaling out with elastic pools.

[Partitioned Tables and Indexes - SQL Server, Azure SQL Database, Azure SQL Managed Instance | Microsoft Learn](https://learn.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-ver17#partitioning-column) covers partitioning table data.
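One detail from the linked partitioning docs worth calling out: `RANGE RIGHT` and `RANGE LEFT` decide which side of a boundary value the boundary itself belongs to. A minimal sketch (the function names `PF_Right` and `PF_Left` are hypothetical):

```sql
-- RANGE RIGHT: the boundary value itself belongs to the partition on its RIGHT.
CREATE PARTITION FUNCTION PF_Right (DATETIME)
AS RANGE RIGHT FOR VALUES ('2025-01-01');

-- RANGE LEFT: the boundary value itself belongs to the partition on its LEFT.
CREATE PARTITION FUNCTION PF_Left (DATETIME)
AS RANGE LEFT FOR VALUES ('2025-01-01');

-- $PARTITION shows which partition number a given value maps to:
SELECT $PARTITION.PF_Right('2025-01-01');  -- 2 (boundary goes right)
SELECT $PARTITION.PF_Left('2025-01-01');   -- 1 (boundary goes left)
```

`RANGE RIGHT` is the usual choice for datetime boundaries, so that midnight of Jan 1 lands in the new year's partition rather than the old one.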
- Logical view (what users see): table or index
- Physical view (what SQL Server stores on disk): heap or B-tree index (clustered / nonclustered)

Heap (table without a clustered index):

```
Table (logical)
└─ Heap (physical)
   ├─ Data pages
   ├─ IAM pages
   └─ Allocation units
```

Table with a clustered index (the most common case):

```
Table (logical)
└─ Clustered index (physical)
   ├─ Root
   ├─ Intermediate
   └─ Leaf level (actual table rows)
```

Partitioning case: the original table has a clustered PK on `Id`, and the strategy is to partition by a non-unique datetime column (created datetime). The first attempt was to add the partitioning column to the PK, but this turned out to require changes in applications as well as in dependent tables. A different approach was used instead: change `Id` to a nonclustered PK and create a separate clustered index on the partitioning column and `Id`. Changing the PK to nonclustered requires no modifications to dependent tables or application mappings, while the separate clustered index that includes the partitioning column allows the table to be physically partitioned and enables reliable partition elimination.
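The logical/physical distinction can be inspected directly: `sys.indexes` reports whether the physical structure is a heap or a B-tree, and `sys.partitions` shows how its rows are spread across partitions. A sketch (here querying a table named `dbo.Log`; any table name works):

```sql
SELECT  i.index_id,
        i.name              AS index_name,
        i.type_desc,        -- HEAP, CLUSTERED, or NONCLUSTERED
        p.partition_number,
        p.rows
FROM sys.indexes AS i
JOIN sys.partitions AS p
  ON  p.object_id = i.object_id
  AND p.index_id  = i.index_id
WHERE i.object_id = OBJECT_ID('dbo.Log')
ORDER BY i.index_id, p.partition_number;
```

A non-partitioned structure shows exactly one row per index with `partition_number = 1`; a partitioned clustered index shows one row per partition.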
[Partitioned Tables and Indexes - SQL Server, Azure SQL Database, Azure SQL Managed Instance | Microsoft Learn](https://learn.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-ver17#queries)

```sql
-- Parent table: PK is NONCLUSTERED
CREATE TABLE dbo.[Log]
(
    ID          INT      NOT NULL IDENTITY(1,1),
    insert_time DATETIME NOT NULL DEFAULT GETDATE(),
    CONSTRAINT PK_Log_ID PRIMARY KEY NONCLUSTERED (ID)  -- logical key only
);

-- Dependent table with FK
CREATE TABLE dbo.[Log_data1]
(
    ID          INT      NOT NULL IDENTITY(1,1),
    LogID       INT      NOT NULL,
    insert_time DATETIME NOT NULL DEFAULT GETDATE(),
    CONSTRAINT PK_Log_data1_ID PRIMARY KEY CLUSTERED (ID)
);

ALTER TABLE dbo.[Log_data1]
ADD CONSTRAINT FK_Log_data1_Log
FOREIGN KEY (LogID) REFERENCES dbo.[Log](ID);

-- Partition the table
CREATE PARTITION FUNCTION PF_Log_insert_time (DATETIME)
AS RANGE RIGHT FOR VALUES ('2025-01-01', '2026-01-01');

CREATE PARTITION SCHEME PS_Log_insert_time
AS PARTITION PF_Log_insert_time ALL TO ([PRIMARY]);

CREATE CLUSTERED INDEX CIX_Log_insert_time
ON dbo.[Log] (insert_time, ID)
ON PS_Log_insert_time(insert_time);
```

If this is a large table, we can create a new table and backfill the data in batches instead of making the changes directly on the original one.
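The batched backfill could be sketched as follows (the table name `dbo.Log_new`, the batch size, and the cutover steps are illustrative, not a tested migration script):

```sql
-- 1. Create dbo.Log_new with the partitioned layout shown above.
-- 2. Copy historical rows in batches to keep transactions and log growth small.
SET IDENTITY_INSERT dbo.Log_new ON;  -- allow copying existing ID values

DECLARE @BatchSize INT = 50000,
        @Rows      INT = 1;

WHILE @Rows > 0
BEGIN
    INSERT INTO dbo.Log_new (ID, insert_time)
    SELECT TOP (@BatchSize) s.ID, s.insert_time
    FROM dbo.[Log] AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.Log_new AS t WHERE t.ID = s.ID)
    ORDER BY s.ID;

    SET @Rows = @@ROWCOUNT;  -- loop ends when no rows are left to copy
END

SET IDENTITY_INSERT dbo.Log_new OFF;
-- 3. In a short maintenance window: stop writers, copy the remaining tail,
--    swap names with sp_rename, then recreate the FK from Log_data1.
```

The `NOT EXISTS` guard makes each batch idempotent, so the loop can be stopped and resumed safely while the original table stays online.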