Makoto Yui, Jun Miyazaki, Shunsuke Uemura and Hayato Yamana: "Nb-GCLOCK: A Non-blocking Buffer Management based on the Generalized CLOCK", In Proc. ICDE, March 2010.
1. Nb-GCLOCK:
A Non-blocking Buffer Management
based on the Generalized CLOCK
Makoto YUI 1, Jun MIYAZAKI 2, Shunsuke UEMURA 3 and Hayato YAMANA 4
1. Research Fellow, JSPS (Japan Society for the Promotion of Science) / Visiting Postdoc at Waseda University, Japan and CWI, Netherlands
2. Nara Institute of Science and Technology
3. Nara Sangyo University
4. Waseda University / National Institute of Informatics
4. Background – Recent trends in CPU development
The # of CPU cores in a chip is doubling in two-year cycles.
[Chart: CPU core counts from 1990 to the 2000s: Single-Core CPUs (Pentium, Power4), Multi-Core CPUs (Core2, Nehalem), Many-Core CPUs (UltraSparc T2, Azul Vega, Larrabee?).]
The many-core era is coming:
- Niagara T2 – 8 cores x 8 SMT = 64 processors
- Azul Vega3 – 54 cores x 16 chips = 864 processors
9. Background – CPU Scalability of open source DBs
Open source DBs have faced CPU scalability problems.
Ryan Johnson et al.: "Shore-MT: A Scalable Storage Manager for the Multicore Era", In Proc. EDBT, 2009.
[Chart: throughput (normalized) vs. concurrent threads (1, 4, 8, 12, 16, 24, 32) for PostgreSQL, MySQL, and BDB; microbenchmark on UltraSparc T1 (32 procs). The gain after 16 threads is less than 5%.]
You might think… what about TPC-C?
14. CPU scalability of PostgreSQL
TPC-C benchmark result on a high-end Linux machine of Unisys (Xeon-SMP 32 CPUs, 16 GB memory, EMC RAID10 storage).
Doug Tolbert, David Strong, Johney Tsai (Unisys): "Scaling PostgreSQL on SMP Architectures", PGCon 2007.
[Chart: TPS vs. CPU cores for PostgreSQL versions 8.0, 8.1, and 8.2. The gain after 16 CPU cores is less than 5%.]
Q. What did the PostgreSQL community do?
A. They revised the synchronization mechanisms in the buffer management module.
19. Synchronization in Buffer Management Module
Several empirical studies have revealed that the largest bottleneck is synchronization in the buffer management module.
[1] Ryan Johnson, Ippokratis Pandis, Anastassia Ailamaki: "Critical Sections: Re-emerging Scalability Concerns for Database Storage Engines", In Proc. DaMoN, 2008.
[2] Stavros Harizopoulos, Daniel J. Abadi, Samuel Madden, and Michael Stonebraker: "OLTP Through the Looking Glass, and What We Found There", In Proc. SIGMOD, 2008.
[Diagram: the buffer manager sits between the CPU and the database files on disk, reducing disk access by caching database pages in memory. A page request first (1) looks up a hash table; on a hit the cached page is returned, and on a miss (2) the page replacement algorithm picks a frame and the page is read from the database files.]
34. Core idea of our approach
Previous approaches:
○ Reduce disk I/Os
× Locks are contended: there are enough processors, yet disk bandwidth is not utilized
Our optimistic approach:
△ # of I/Os slightly increases
○ No contention on locks
Intuition: reduce the lock granularity to one CPU instruction and remove the bottleneck.
[Diagram: in previous approaches, page requests from the CPU funnel through a lock-guarded buffer manager in front of the database files; in our approach, requests proceed through the buffer manager concurrently.]
39. Major Difference to Previous Approaches
Previous approaches: ○ reduce disk I/Os; × locks are contended.
Our optimistic approach: △ # of I/Os slightly increases; ○ no contention on locks.
Their goal: improve buffer hit rates to reduce I/Os. This has been the single goal for many decades, but is it still valid in the many-core era? There are also SSDs.
Our goal: improve throughput by utilizing (many) CPUs. Use non-blocking synchronization instead of acquiring locks!
45. What’s non-blocking and lock-free?
Formally:
Stopping one thread will not prevent global progress.
Individual threads make progress without waiting.
Less formally:
No thread 'locks' any resource.
No 'critical sections', locks, mutexes, spin-locks, etc.
Lock-free: every successful step makes global progress and completes within finite time (ensuring liveness).
Wait-free: every step makes global progress and completes within finite time (ensuring fairness).
49. Non-blocking synchronization
A synchronization method that does not acquire any lock, enabling concurrent accesses to shared resources.
Utilizes atomic CPU primitives: CAS (compare-and-swap), cmpxchg on x86.
Utilizes memory barriers.
Blocking:
acquire_lock(lock);
counter++;
release_lock(lock);
Non-blocking:
int old;
do {
    old = *counter;
} while (!CAS(counter, old, old + 1));
The counter is incremented only if its value still equals old; otherwise the loop retries.
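As a runnable illustration (our own, not from the slides), the same counter in C11 atomics; atomic_compare_exchange_weak is the portable CAS, and atomic_fetch_add shows the wait-free alternative:

#include <stdatomic.h>

static atomic_int counter;

/* Lock-free increment: retry the CAS until no other thread slips in
 * between our read and our update (liveness: some thread's CAS
 * always succeeds). */
void increment_lockfree(void) {
    int old = atomic_load(&counter);
    while (!atomic_compare_exchange_weak(&counter, &old, old + 1)) {
        /* on failure, `old` was refreshed with the current value; retry */
    }
}

/* Wait-free increment: a single fetch-and-add always completes,
 * so every thread makes progress (fairness). */
void increment_waitfree(void) {
    atomic_fetch_add(&counter, 1);
}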
56. Making the buffer manager non-blocking
3. Need to keep consistency between the lookup hash table and GCLOCK (in the right half of fig. 3): a reference in the buffer lookup table still holds a different page identifier immediately after the page allocation of a buffer frame changes.
4. Avoided locks on I/Os by utilizing pread, CAS, and memory barriers (in fig. 5).
[Diagram: page requests go to hash buckets; hits return directly, while misses go to the page replacement algorithm (GCLOCK), which previously performed "lock; lseek; read; unlock" against the database files.]
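A minimal sketch of the optimistic-I/O idea under our own assumptions (the frame layout, the FRAME_FREE marker, and the function name are ours, not the paper's): pread is positional, so no shared file offset needs locking, and a CAS on the frame's page identifier detects a concurrent re-assignment:

#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

#define PAGE_SIZE 8192
#define FRAME_FREE (-1L)   /* assumed marker for an unassigned frame */

typedef struct {
    _Atomic long page_id;  /* page currently held by this frame */
    char data[PAGE_SIZE];
} frame_t;

/* Read `page_id` into a frame without holding any lock. pread does
 * not touch the shared file offset, so no lseek (and no lock around
 * it) is needed. The final CAS publishes the frame only if no other
 * thread re-assigned it during the read; otherwise the caller retries. */
bool read_page_optimistic(int fd, frame_t *f, long page_id) {
    if (pread(fd, f->data, PAGE_SIZE, (off_t)page_id * PAGE_SIZE) != PAGE_SIZE)
        return false;                  /* I/O error or short read */
    long expected = FRAME_FREE;
    return atomic_compare_exchange_strong(&f->page_id, &expected, page_id);
}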
67. State Machine-based Reasoning for selecting a replacement victim
Construct the algorithm from many small 'steps': build a state machine to ensure global progress.
[State diagram; "E:" marks an entry action. Recoverable states and transitions: "Select a frame" (the start of finding a replacement victim; on null, continue with E: try next entry) leads, when an entry is found (!null), to "Check whether evicted". If not evicted, "Try to decrement the refcount" (E: decrement the refcount, i.e., decrement the weight count of a buffer page); while --refcount > 0, E: move the clock hand advances the CLOCK hand to check the next candidate. When --refcount <= 0, "Try to evict" (E: evict) ends in Pinned (pinned) or Evicted (!pinned). Once evicted, "Check whether swapped" (E: CAS value) on success reaches "Fix in pool" and returns a replacement victim; on !swapped, the search continues.]
Two threads may race on the same candidate: Thread B can intercept the frame Thread A selected ("Oops! The candidate is intercepted."), in which case Thread A simply continues with the next candidate instead of blocking.
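A compact C11 sketch of the sweep under our own assumptions (the field names, the pin-as-claim step, and the helper are ours; the paper's state machine has more states than this loop shows):

#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    _Atomic int weight;   /* GCLOCK reference weight of the cached page */
    _Atomic int pinned;   /* > 0 while some thread is using the frame */
} frame_t;

/* Sweep the clock until a frame's weight reaches zero and we win the
 * claim CAS; losing a CAS means another thread intercepted the
 * candidate, so we just move the hand on instead of blocking. */
size_t select_victim(frame_t *frames, size_t nframes,
                     _Atomic size_t *clock_hand) {
    for (;;) {
        size_t i = atomic_fetch_add(clock_hand, (size_t)1) % nframes;
        frame_t *f = &frames[i];
        if (atomic_load(&f->pinned) > 0)
            continue;                          /* in use: try next entry */
        int w = atomic_load(&f->weight);
        if (w > 0) {
            /* the only "locking" is one CAS: decrement the weight */
            atomic_compare_exchange_weak(&f->weight, &w, w - 1);
            continue;
        }
        int expected = 0;                      /* weight == 0: try to evict */
        if (atomic_compare_exchange_strong(&f->pinned, &expected, 1))
            return i;                          /* claimed: evict this frame */
        /* intercepted by another thread: keep sweeping */
    }
}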
70. Experimental settings
Workload: Zipf 80/20 distribution (a famous power law), containing 20% sequential scans; the dataset is 32 GB in total.
Machine used: UltraSPARC T2 (64 processors).
We also performed evaluations on various x86 settings in the paper.
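For intuition, a tiny C stand-in for such a hot/cold access pattern (our own simplification; the paper's generator and the scan mix are not shown here): 80% of accesses hit the hottest 20% of pages:

#include <stdlib.h>

/* Draw a page id with an 80/20 skew: 80% of draws land in the
 * hottest 20% of the page space. A rough stand-in for the Zipf-like
 * workload above (sequential scans omitted). */
long draw_page(long npages) {
    long hot = npages / 5;                 /* hottest 20% of pages */
    if (rand() % 100 < 80)                 /* 80% of accesses */
        return rand() % hot;
    return hot + rand() % (npages - hot);  /* remaining cold pages */
}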
73. Performance comparison on moderate I/Os (of fig. 9)
[Chart: throughput normalized by LRU (0.0 to 6.0) vs. processors (8, 16, 32, 64) for LRU, GCLOCK, and Nb-GCLOCK.]
CPU utilization: the previous approaches are low, about 20%; Nb-GCLOCK is high, more than 95%.
A larger difference in CPU time can be expected as the # of CPUs increases ➜ we expect even more throughput.
77. Maximum throughput to processors
Scalability to processors when pages are resident in memory, intending to see the scalability limit expected by each algorithm.
Throughput (operations/sec; plotted on a log scale):

Processors (cores) | 8 (1)     | 16 (2)    | 32 (4)     | 64 (8)
2Q                 | 890,992   | 819,975   | 866,009    | 662,782
GCLOCK             | 1,758,605 | 1,912,000 | 1,931,268  | 1,817,748
Nb-GCLOCK          | 3,409,819 | 7,331,722 | 14,245,524 | 25,834,449

Nb-GCLOCK achieved almost linear scalability, at least up to 64 processors! This is the first attempt that removed locks in buffer management.
Interesting here: GCLOCK hits a CPU scalability limit at around 16 processors, so caching solutions using GCLOCK share that limit.
82. Max throughput (operations/sec) evaluation
Workload is Zipf 80/20, evaluated on UltraSparc T2 (64 procs).
Accesses were issued from 64 threads for 60 seconds; thus, ideally 64 x 60 = 3,840 seconds of CPU time can be used.
[Chart annotations: the locking approaches use only about 10-20% of the CPU time, while Nb-GCLOCK uses most of the CPU time because it is non-blocking.]
The gap in CPU utilization would widen further as the # of processors grows, since lock contention increases.
85. TPC-C evaluation using Apache Derby
[Chart: transactions per minute (tpmC, roughly 800 to 1400) vs. # of terminals (threads): 8, 16, 32, 64, 128, comparing stock Derby and Derby with Nb-GCLOCK.]
The original scheme of Derby (CLOCK) decreased in throughput as terminals increased; our scheme showed a better result.
The throughput gain from the buffer management module is limited by a latch on the root page of the B+-tree ➜ we would also require a concurrent B+-tree (see OLFIT).
Sang Kyun Cha et al.: "Cache-Conscious Concurrency Control of Main-Memory Indexes on Shared-Memory Multiprocessor Systems", In Proc. VLDB, 2001.
89. Bp-Wrapper
Xiaoning Ding, Song Jiang, and Xiaodong Zhang: "BP-Wrapper: A System Framework Making Any Replacement Algorithms (Almost) Lock Contention Free", In Proc. ICDE, 2009.
Eliminates lock contention on buffer hits by using a batching and prefetching technique: it postpones the physical work (adjusting the buffer replacement list) and returns immediately after the logical operation. This is called lazy synchronization in the literature.
[Diagram: page requests go to hash buckets; on hits, the access is merely recorded; misses go to the page replacement algorithm (any).]
Pros:
- Works with any page replacement algorithm.
Cons:
- Does not increase the throughput of CLOCK variants, because CLOCK does not require locks on buffer hits.
- Cache misses involve batching; a larger lock holding time makes more contention.
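For concreteness, a minimal sketch of the batching idea in C (our own illustration, not BP-Wrapper's code; the names and batch size are assumptions): hits are recorded thread-locally and flushed to the replacement list under one lock acquisition:

#include <pthread.h>
#include <stddef.h>

#define BATCH 64  /* assumed batch size */

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local long batch[BATCH];
static _Thread_local size_t nbatched;

/* Stub standing in for the real list adjustment (LRU, ARC, ...). */
static void touch_replacement_list(long page_id) { (void)page_id; }

/* Record a buffer hit without touching the shared lock; amortize
 * one lock acquisition over BATCH recorded accesses. */
void record_hit(long page_id) {
    batch[nbatched++] = page_id;
    if (nbatched < BATCH)
        return;                        /* postpone the physical work */
    pthread_mutex_lock(&list_lock);
    for (size_t i = 0; i < nbatched; i++)
        touch_replacement_list(batch[i]);
    pthread_mutex_unlock(&list_lock);
    nbatched = 0;
}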
93. Conclusions
Proposed a lock-free variant of the GCLOCK page replacement algorithm, named Nb-GCLOCK:
- Almost linear scalability up to 64 processors, while existing locking-based schemes do not scale beyond 16 processors.
- The first attempt to introduce non-blocking synchronization to database buffer management.
- Optimistic I/Os using pread, CAS, and memory barriers.
- Linearizability and lock-freedom are proven in the paper.
Lock-freedom guarantees a certain throughput: any active thread taking a bounded number of steps ensures global progress.
This work is also useful for any caching solution that requires high throughput (e.g., C10K accesses).