## Basic concepts

[Clustered index VS Nonclustered index](https://learn.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described?view=sql-server-ver17)

> Clustered indexes sort and store the data rows in the table or view based on their key values. Nonclustered indexes have a structure separate from the data rows. A nonclustered index contains the nonclustered index key values and each key value entry has a pointer to the data row that contains the key value.

[The Key Differences Between Clustered vs. Nonclustered Index](https://www.solarwinds.com/database-optimization/clustered-vs-non-clustered-index)

> A write operation on a clustered index has higher overhead than simply adding to a heap, as the physical ordering still needs to be maintained.

[Cardinality Estimation (SQL Server) - SQL Server | Microsoft Learn](https://learn.microsoft.com/en-us/sql/relational-databases/performance/cardinality-estimation-sql-server?view=sql-server-ver17)

The SQL Server Query Optimizer is a cost-based Query Optimizer: it selects the query plan with the lowest estimated processing cost. The Query Optimizer determines the cost of executing a query plan based on two main factors:

- The total number of rows processed at each level of the query plan, referred to as the cardinality of the plan.
- The cost model of the algorithm dictated by the operators used in the query.

Cardinality estimation (CE) in SQL Server is derived primarily from histograms that are created when indexes or statistics are created, either manually or automatically. Sometimes, SQL Server also uses constraint information and logical rewrites of queries to determine cardinality.

In the following cases, SQL Server can't accurately calculate cardinalities. This causes inaccurate cost calculations that might lead to suboptimal query plans. Avoiding these constructs in queries might improve query performance.
Sometimes, alternative query formulations or other measures are possible and these are pointed out:

- Queries with predicates that use comparison operators between different columns of the same table.
- Queries with predicates that use operators, and any one of the following are true:
  - There are no statistics on the columns involved on either side of the operators.
  - The distribution of values in the statistics isn't uniform, but the query seeks a highly selective value set. This situation can be especially true if the operator is anything other than the equality (=) operator.
  - The predicate uses the not equal to (!=) comparison operator or the NOT logical operator.
- Queries that use any of the SQL Server built-in functions or a scalar-valued, user-defined function whose argument isn't a constant value.
- Queries that involve joining columns through arithmetic or string concatenation operators.
- Queries that compare variables whose values aren't known when the query is compiled and optimized.

### Index scan vs Index seek

In a plan regression case we noticed that the bad plan used a clustered index scan instead of a clustered index seek, which turned out to be the performance bottleneck, as it took most of the execution time. The cause was determined to be a mismatch between the predicate and the index key order, or the presence of residual predicates that prevent an efficient seek. The plan also showed implicit conversions (CONVERT_IMPLICIT) in scalar computations, which can prevent index seeks.

- Predicate: a condition or expression in a SQL query that evaluates to TRUE, FALSE, or UNKNOWN for each row. Predicates are most commonly found in WHERE clauses, JOIN conditions, and HAVING clauses.
  - For example, in a Clustered Index Scan, the predicate might look like `[Table].[Column] = @Parameter` or, if there's an implicit conversion or function, `CONVERT_IMPLICIT(decimal(19,3), [Table].[Column], 0) = @Parameter`.
- [SARGable (Search ARGument ABLE)](https://learn.microsoft.com/en-us/troubleshoot/sql/database-engine/performance/troubleshoot-high-cpu-usage-issues#step-6-investigate-and-resolve-sargability-issues): A predicate in a query is considered SARGable (Search ARGument-able) when the SQL Server engine can use an index seek to speed up the execution of the query. Many query designs prevent SARGability and lead to table or index scans and high CPU usage. Whether an index is being used for seeks or scans can be confirmed via [sys.dm_db_index_usage_stats](https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-index-usage-stats-transact-sql?view=sql-server-ver17).

https://learn.microsoft.com/en-us/sql/relational-databases/sql-server-index-design-guide?view=sql-server-ver17#query-considerations

> The term SARGable in relational databases refers to a Search ARGumentable predicate that can use an index to speed up the execution of the query. Applying any function or computation on the column(s) in the search predicate generally makes the query non-sargable and leads to higher CPU consumption.

Solutions typically involve rewriting the queries in a creative way to make them SARGable: remove the function from the query predicate, possibly searching a different column that yields the same results. For example:

- `WHERE SUBSTRING(ProductNumber, 0, 4) = 'HN-'` rewrite -> `WHERE Name LIKE 'Hex%'` (another column is searched, but the same results are achieved)
- `WHERE UnitPrice * 0.10 > 300` rewrite -> `WHERE UnitPrice > 300/0.10`

SARGability applies not only to WHERE clauses, but also to JOINs, HAVING, GROUP BY and ORDER BY clauses.
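The rewrite pattern above can be sketched as a pair of before/after queries. This is a hypothetical sketch: it assumes an AdventureWorks-style table `dbo.Product(ProductID, Name, ProductNumber, UnitPrice)` with nonclustered indexes on `ProductNumber` and `UnitPrice`; the table and index names are illustrative, not from the original text.

```sql
-- Non-SARGable: the function wraps the indexed column, so the
-- optimizer cannot map the predicate onto the index order -> scan.
SELECT ProductID
FROM dbo.Product
WHERE SUBSTRING(ProductNumber, 0, 4) = 'HN-';

-- SARGable: LIKE with a constant prefix can be converted into a
-- range seek on the ProductNumber index.
SELECT ProductID
FROM dbo.Product
WHERE ProductNumber LIKE 'HN-%';

-- Non-SARGable: arithmetic on the column side.
SELECT ProductID
FROM dbo.Product
WHERE UnitPrice * 0.10 > 300;

-- SARGable: move the computation to the constant side, leaving the
-- bare column for an index seek.
SELECT ProductID
FROM dbo.Product
WHERE UnitPrice > 300 / 0.10;
```

Comparing the actual execution plans (or `SET STATISTICS IO ON` output) of each pair should show the scan turning into a seek, assuming suitable indexes exist.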
Frequent causes of SARGability prevention in queries are CONVERT(), CAST(), ISNULL(), and COALESCE() functions used in WHERE or JOIN clauses, which lead to scans of the columns involved. In the data-type conversion cases (CONVERT or CAST), the solution may be to ensure you're comparing the same data types.

Additional reading: [SQL Server/SQL Database Reintroduction, Part 2: Index Seek and Index Scan](https://qiita.com/yyukawa/items/4cc1cd5e447b5a5d1b48) (Japanese)

```sql
-- https://qiita.com/yyukawa/items/4cc1cd5e447b5a5d1b48
-- Index Seek VS Index Scan
CREATE TABLE [dbo].[test](
    [id] [bigint] IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [name] [nvarchar](50) NOT NULL,
    [address] [nvarchar](200) NOT NULL,
    [rgdt] [datetime] NOT NULL
)

-- select count(1) from [dbo].[test]
INSERT INTO test (name, address, rgdt) VALUES('a000', 'a000', GETDATE())
GO 4241 -- in the demo it's 300000, but that never finishes on the BASIC tier

-- Analyze the clustered index:
-- index depth is 2, leaf page (level = 0) count is 33, record count is 4241.
SELECT o.name, d.index_id, d.index_depth, d.index_level, d.index_type_desc,
       d.page_count, d.record_count, d.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (
    DB_ID(), OBJECT_ID('[dbo].[test]'), NULL, NULL, 'DETAILED') d
JOIN sys.objects o ON o.object_id = d.object_id
WHERE d.index_type_desc = 'CLUSTERED INDEX'
-- name  index_id  index_depth  index_level  index_type_desc  page_count  record_count  avg_fragmentation_in_percent
-- test  1         2            0            CLUSTERED INDEX  33          4241          3.03030303030303
-- test  1         2            1            CLUSTERED INDEX  1           33            0

-- Check the nonclustered index
CREATE NONCLUSTERED INDEX IDX_test_rgdt ON test (rgdt) INCLUDE (name);
SELECT o.name, d.index_id, d.index_depth, d.index_level, d.index_type_desc,
       d.page_count, d.record_count, d.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (
    DB_ID(), OBJECT_ID('[dbo].[test]'), NULL, NULL, 'DETAILED') d
JOIN sys.objects o ON o.object_id = d.object_id
WHERE d.index_type_desc = 'NONCLUSTERED INDEX'
-- name  index_id  index_depth  index_level  index_type_desc     page_count  record_count  avg_fragmentation_in_percent
-- test  2         2            0            NONCLUSTERED INDEX  18          4241          0
-- test  2         2            1            NONCLUSTERED INDEX  1           18            0

-- Analyze how the index works
-- #1 force seek
SET STATISTICS IO ON
SELECT * FROM test WITH(FORCESEEK) WHERE id IN (1170, 1171)
-- (2 rows affected)
-- Table 'test'. Scan count 2, logical reads 4, physical reads 0

-- #2 force scan
SET STATISTICS IO ON
SELECT * FROM test WITH(FORCESCAN) WHERE id IN (1170, 1171)
-- (2 rows affected)
-- Table 'test'. Scan count 1, logical reads 35, physical reads 0
-- ↑ the logical reads roughly equal the index leaf page count.

-- #3 key search (range seek)
SET STATISTICS IO, TIME ON
SELECT * FROM test WHERE id BETWEEN 130 AND 140
SET STATISTICS IO, TIME OFF
-- (11 rows affected)
-- Table 'test'. Scan count 1, logical reads 3, physical reads 0
```

### Nested loop vs Hash join

[SQL Server; What is sql server hash join, loop join, merge join, nested loop - Microsoft Q&A](https://learn.microsoft.com/en-us/answers/questions/1012294/sql-server-what-is-sql-server-hash-join-loop-join)

> Nested Loops is good when the optimizer assumes that a small number of rows will be hit in the inner table. But nested loops is not good when the condition hits many rows in the inner table. In that case, it is better to scan at least one of the tables and build a hash table for the join. Merge Join is an alternative to Hash Join, which is useful when the inputs are sorted in the same way.
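To observe the trade-off described above directly, the physical join type can be forced with T-SQL query hints (`OPTION (LOOP JOIN)`, `OPTION (HASH JOIN)`, `OPTION (MERGE JOIN)`) and the plans compared. This is a sketch built on the `dbo.test` demo table; it assumes a second, hypothetical table `dbo.test2` with a compatible `id` clustered key.

```sql
SET STATISTICS IO ON;

-- Nested loops: good when few inner-side rows qualify; for each outer
-- row, the inner table is probed (ideally via an index seek).
SELECT t1.id, t2.name
FROM dbo.test AS t1
JOIN dbo.test2 AS t2 ON t2.id = t1.id
WHERE t1.id BETWEEN 130 AND 140
OPTION (LOOP JOIN);

-- Hash join: scans one input to build a hash table, then probes it
-- with the other; usually wins when many rows qualify on both sides.
SELECT t1.id, t2.name
FROM dbo.test AS t1
JOIN dbo.test2 AS t2 ON t2.id = t1.id
OPTION (HASH JOIN);

-- Merge join: requires both inputs sorted on the join key; here both
-- tables are clustered on id, so no extra sort operator is needed.
SELECT t1.id, t2.name
FROM dbo.test AS t1
JOIN dbo.test2 AS t2 ON t2.id = t1.id
OPTION (MERGE JOIN);
```

Comparing the `STATISTICS IO` output and actual plans of the three variants shows why the optimizer's cardinality estimate (few vs. many qualifying rows) drives its join choice.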