Data Analysis
Download Northwind Database
- GITHUB
- DOWNLOAD Northwind
EXEC
- EXEC sp_helpdb
- show information of databse
INDEX
- FILLFACTOR
- use only 1% of leaf page space
- PAD_INDEX
- apply to FILLFACTOR middle page
- sys.indexes
- help to search index number
DBCC
- DBCC
- show database data that match with the condition
- IND
- show index data
- PAGE
- ref: [Database], [File Number], [Page Number], [Print Option]
- HEAD_RID
- ref: Page Address, File ID, Slot Number
- Random Access
- Access one page by one page to read
- Bookmark Lookup
- search row by RID
Multiple Index
Copy table to new table
- SELECT INTO
- copy the table and paste in a new table
EXEC
- EXEC sp_helpindex
- show information of index
Index performance
- Multiple Index
- when you use multiple index(A, B), the index order is A and B is next level of A
- So, if you want to use index, you have to start A
- if you start B, then the process cannot use index
- SEEK
- process speed is good
- use index when searching
- SCAN
- process speed is bad
- do not use index when searching
- when you use data processing, all of data will be scanned even it is index
Page Split
- Page Split
- if page is full, make a new page and insert in specific position
Clustered vs Non-Clustered
Clustered
- Clustered
- Leaf Page = Data Page
- data is ordered by Clustered index
Non-Clustered
- not exist Clustered Index
- data is saved in Heal Table
- Heap RIP → Heap Table To exract data
- HEAP RID
- Heap RID([Page Address(4)] [File ID(2)][Slot(2)] ROW)
- Heap Table[{Page} {Page} {Page} {Page}]
- exist Clustered Index
- no Heap Table. Real data is in Leaf Table
- has real Key value of Clustered Index
- Previous index can be changed
- no Heap Table. Real data is in Leaf Table
Index Scan VS Index Seek
Index Scan vs Index Seek
- Index Access Process
- Index Scan: Search LEAF PAGE sequentially
- Index Seek: Search with index
CONVERT
- CONVERT
- change type of variable
Clustered
Non-Clustered
SET STATISTICS
- TIME
- Time for Logical read
- IO
- Number of pages read to find actual data
Scan Process
- All table
- INDEX SCAN
- Non-Clustered
- INDEX SCAN + KEY LOOKUP
- N is number of rows which you want to search
- 2 is for root → page stesp in Non-Clustered
Seek Process
- Clustered
- INDEX SEEK
- root → page
- 2 steps
- Non-Clustered
- INDEX SEEK + KEY LOOKUP
- root → page → clustered root → page
- page has Key of clustered index
- 4 steps
Bookmark Lookup
SET STATISTICS
- PROFILE
- show the actual order in which it was executed
WITH INDEX
- WITH INDEX
- force index usage
- Database choose whether index usage or not
- if it is more better index not usage, database can choose
SCAN
Decrease Lookup
- Covered Index
- include all column to search in index
- but it can putsa load on the DML(CREATE, DELETE, UPDATE)
- INCLUDE
- give a hint to index by
INCLUDE
- give a hint to index by
- Clustered
- Only one column
- it can put a load if there is Non-Clustred
Index Column Order
Create dummy data
Create Index
idx_emp_ord
creates index byEmployeeID
at first, and then byOrderDate
idx_ord_emp
creates index byOrderDate
at first, and then byEmployeeID
Compare
- Case 1: no range in search
- there is same logiacal read time
- and both use index to search
- first, database searches the page which have a real data matching with condition
- and check the next row
- if the next row is not matched with condition, then return
- So,
idx_emp_ord
andidx_emp_ord
searched same count of row - this is because their search order did not affect to search
- Case 2: range in search
- both use index also
- but both of logical read time are different
- in
idx_emp_ord
, database searches the page which match with1
inEmployeeID
, and then match with between19970101
and19970103
inOrderDate
- and check next row
- it is ordered by
EmployeeID
first, so next row must have same or biggerEmployeeID
- So,
idx_emp_ord
searched only 3 rows
- in
idx_ord_emp
, database searches the page wich match with between19970101
and19970103
inOrderDate
, and then match with1
inEmployeeID
- and check next row
- it is ordered by
OrderDate
first, So if theOrderDate
is in between19970101
and19970103
, then database have to checkEmployeeID
in the rows - So
idx_ord_emp
searched more then 3 rows
IN-LIST
- IN()
- if between range is small, then change the range to
IN()
- if between range is small, then change the range to
Join
Hash Join
- Hash Join does not need to sort → The more data to Merge, The better to Hash
- not be infected by Index
- Hash is better than NL/Merge(depends on situation)
- But you have to concider the cost of hash table(process time become bigger → Index)
- Random Access X
- smaller data is good to make Hash Table
- Hash Join in C#