TSQL -> Query and update data distribution statistics. Part 1

SQLserver automatically creates and manages statistics about data distribution in the table columns. We can monitor situation and add missing statistics when necessary. In the following example we will query sys.stats and sys.stats_columns system views in order to see what info exists about Customers table.

USE colombo
GO

SELECT a.object_id,
   OBJECT_NAME(a.object_id) AS table_name,
   a.Name AS stat_name,
   COL_NAME(a.object_id, b.column_id) AS colname
FROM sys.stats AS a INNER JOIN sys.stats_columns AS b
  ON a.stats_id = b.stats_id AND a.object_id = b.object_id
WHERE a.object_id = OBJECT_ID( 'dbo.Customers')
ORDER BY a.Name;



The results will came as following:
object_idtable_namestat_namecolname
685245496Customers_WA_Sys_00000001_28D80438CustomerID
685245496Customers_WA_Sys_00000003_28D80438Country
685245496Customers_WA_Sys_00000004_28D80438City

From the results we can learn that server engine automatically calculated statistics for 3 columns from Customers table. Statististics for the column CustomerID recived internal name: "_WA_Sys_00000001_28D80438". To have updated statistics on the id column is better than nothing. But it also teaches us that we have forgot to index this important column, and engine created it's own statistics on it. Therefore we can fix this by dropping existing statistics and creating Unique Clustered index:

-- Drop Statistics
DROP STATISTICS dbo.Customers._WA_Sys_00000001_28D80438
GO

-- create PK index
ALTER TABLE [dbo].[Customers]
ADD CONSTRAINT [PK_Customers] PRIMARY KEY CLUSTERED
(
  [CustomerID] ASC
) WITH (SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY];
GO

Executing again query on Customers table statistics we will recive something like this:
object_idtable_namestat_namecolname
685245496Customers_WA_Sys_00000003_28D80438Country
685245496Customers_WA_Sys_00000004_28D80438City
685245496CustomersPK_CustomersCustomerID

We see that from now instead of "_WA_Sys_00000001_28D80438" statistics server engine will use PK_Customers as a primary source of knowledge about data distrubution and location in CustomerID column.


sqlexamples.info