Storage infrastructure using HBase behind LINE messages
2. Storage infrastructure using HBase behind LINE messages
NHN Japan Corp.
LINE Server Task Force
Shunsuke Nakamura
@sunsuk7tp
Hadoop Conference Japan 2013 Winter (2013-01-21)
3. To support LINE's users, we have built message storage that is:
• Large-scale (tens of billions of rows/day)
• Responsive (under 10 ms)
• Highly available (dual clusters)
4. Outline
• About LINE
• LINE & Storage requirements
• What we achieved
• Today’s topics
– IDC online migration
– NN failover
– Stabilizing LINE message cluster
• Conclusion
5. LINE
- A global messenger powered by NHN Japan -
Devices: 5 different mobile platforms + desktop support
9. New Year 2013 in Japan
[Chart: number of requests in an HBase cluster, plotted per 1 minute, usual peak hours vs. New Year 2013; New Year greetings such as "あけおめ!" and "新年好!" ("Happy New Year!" in Japanese and Chinese) spiked traffic to about 3x the usual peak]
3 times traffic explosion, and LINE Storage had no problems :)
10. LINE on Hadoop
Storage for service, backup, and logs
[Diagram: Hadoop components and their roles]
• For HBase, M/R, and log archive
• Bulk migration and ad-hoc analysis
• For HBase and Sharded-Redis
• Collecting Apache and Tomcat logs
• KPI and log analysis
13.1.21
Hadoop
Conference
Japan
2013
Winter
10
12. LINE service requirements
LINE is a...
• Messaging service: it should be fast
• Global service: downtime is not allowed
But LINE is not a simple messaging service:
• Messages are synchronized between phones & PCs
– so messages should be kept for a while
13. LINE’s storage requirements
• No data loss
• Eventual consistency
• Low latency
• HA
• Flexible schema
• Easy management
• Scale-out
14. Our selection is HBase
• Low latency for large amount of data
• Linearly scalable
• Relatively lower operating cost
– Replication by nature
– Automatic failover
• Data model fits our requirements
– Semi-structured
– Timestamp (see the sketch below)
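For illustration, a minimal sketch of a timestamped, semi-structured write using the 0.94-era HBase client API; the table name, rowkey layout, and column names here are hypothetical, not LINE's actual schema:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutMessageExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "message"); // hypothetical table name
    long ts = System.currentTimeMillis();
    // Semi-structured: columns can differ per row; no fixed schema needed.
    Put put = new Put(Bytes.toBytes("user123#" + ts)); // hypothetical rowkey
    // Each cell carries an explicit timestamp, which HBase stores natively.
    put.add(Bytes.toBytes("m"), Bytes.toBytes("body"), ts, Bytes.toBytes("Happy New Year!"));
    put.add(Bytes.toBytes("m"), Bytes.toBytes("from"), ts, Bytes.toBytes("user456"));
    table.put(put);
    table.close();
  }
}
```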
15. Stored rows per day in a cluster
[Chart: growth over time, y-axis from 2 to 10 billion rows/day]
16. What we achieved with HBase
• No data loss
– Persistent
– Data replication
• Automatic recovery from server failure
• Reasonable performance for large data sets
– Hundreds of billions of rows
– Write: ~1 ms
– Read: 1~10 ms
17. Many issues we had
• Coordinating heterogeneous storages
• IDC online migration
• Flush & compaction storms caused by “too many HLogs”
• Row & Column distribution
• Secondary Index
• Region Management
– Load and size balancing
– RS Allocation
– META region
– M/R
• Monitoring for diagnostics
• Traffic bursts caused by decommissioning
• NN problems
• Performance degradation
– hotspot problem
– timeout burst
– GC problem
• Client bugs
– Thread Blocking on server failure (HBASE-6364)
18. Today’s topics
IDC online migration
NN failover
Stabilizing LINE message cluster
20. Why?
• Move whole HBase clusters and data
• For better network infrastructure
• Without downtime
21. IDC online migration
Before migration:
[Diagram: the App Server writes only to src-HBase]
22. IDC online migration
• Write to both (client-level replication)
[Diagram: the App Server now writes to both src-HBase and dst-HBase]
23. IDC online migration
• New data: Incremental replication
• Old data: Bulk migration
• Each cell's timestamp in dst equals the one in src
[Diagram: the App Server keeps writing to both clusters while old data is migrated from src-HBase to dst-HBase]
24. LINE HBase Replicator & BulkMigrator
Replicator is for incremental replication
BulkMigrator is for bulk migration
25. LINE HBase Replicator
• Our own implementation
• Prefer pull to push
• Throughput throttling
• Workload isolation of replicator and RS
• Rowkey conversion and filtering
[Diagram: stock HBase replication pushes edits from src-HBase to dst-HBase; the LINE HBase Replicator instead pulls from src-HBase and writes to dst-HBase]
26. LINE HBase Replicator
- A simple daemon to replicate local regions -
1. HLogTracker reads a checkpoint and selects the next HLog.
2. For each entry in the HLog:
   1. Filter & convert the HLog.Entry
   2. Create Puts and batch them to the dst HBase
• Periodic checkpointing
• Generally, entries are replicated within seconds (see the sketch below)
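LINE's replicator is an in-house implementation whose code isn't in the deck; below is a minimal sketch of the loop described above, assuming the 0.94-era HBase API (HLog.Reader, HTable.batch). HLogTracker, checkpoint persistence, and delete-marker handling are elided, and shouldReplicate/convertRowkey are hypothetical stand-ins for the filter and rowkey-conversion steps:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Row;
import org.apache.hadoop.hbase.regionserver.wal.HLog;

public class ReplicatorSketch {
  // Replay one HLog file against the destination cluster.
  static void replicateHLog(Configuration conf, Path hlogPath, HTable dstTable)
      throws Exception {
    FileSystem fs = FileSystem.get(conf);
    HLog.Reader reader = HLog.getReader(fs, hlogPath, conf);
    try {
      List<Row> batch = new ArrayList<Row>();
      HLog.Entry entry;
      while ((entry = reader.next()) != null) {
        // NOTE: delete markers in the HLog are ignored here for brevity.
        for (KeyValue kv : entry.getEdit().getKeyValues()) {
          if (!shouldReplicate(kv)) continue;                 // filtering
          Put put = new Put(convertRowkey(kv.getRow()));      // rowkey conversion
          // Keep the source timestamp so src and dst cells match exactly.
          put.add(kv.getFamily(), kv.getQualifier(), kv.getTimestamp(), kv.getValue());
          batch.add(put);
        }
        if (batch.size() >= 1000) { // batch to dst; also a throttle point
          dstTable.batch(batch);
          batch.clear();
        }
      }
      if (!batch.isEmpty()) dstTable.batch(batch);
      // A real replicator would checkpoint its HLog position here.
    } finally {
      reader.close();
    }
  }

  static boolean shouldReplicate(KeyValue kv) { return true; }  // hypothetical filter
  static byte[] convertRowkey(byte[] srcRow) { return srcRow; } // hypothetical conversion
}
```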
27. Bulk migration
1. MapReduce between any two storages
– Map tasks only
– Read from the source, write to the destination
– Task scheduling is problematic: it depends on region allocation
2. Non-MapReduce version (BulkMigrator, sketched below)
– Our own implementation
– HBase → HBase
– On each RS, scan & batch region by region
– Throughput throttling
– Slow, but easy to implement and debug
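A minimal sketch of the non-MapReduce path, again assuming the 0.94-era client API. The caching value, batch size, and sleep-based throttle are illustrative, and startRow/stopRow would come from each region's boundaries:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Row;
import org.apache.hadoop.hbase.client.Scan;

public class BulkMigratorSketch {
  // Copy one region's key range from src to dst, preserving timestamps.
  static void migrateRange(HTable src, HTable dst, byte[] startRow, byte[] stopRow)
      throws Exception {
    Scan scan = new Scan(startRow, stopRow);
    scan.setMaxVersions();  // copy every version, not just the latest
    scan.setCaching(500);   // fetch rows from the RS in chunks
    ResultScanner scanner = src.getScanner(scan);
    try {
      List<Row> batch = new ArrayList<Row>();
      for (Result r : scanner) {
        Put put = new Put(r.getRow());
        for (KeyValue kv : r.raw()) {
          // Copy each cell with its original timestamp.
          put.add(kv.getFamily(), kv.getQualifier(), kv.getTimestamp(), kv.getValue());
        }
        batch.add(put);
        if (batch.size() >= 100) {
          dst.batch(batch);
          batch.clear();
          Thread.sleep(10); // crude throughput throttling
        }
      }
      if (!batch.isEmpty()) dst.batch(batch);
    } finally {
      scanner.close();
    }
  }
}
```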
31. NameNode failure
in October 2012
32. HA-NN failover failed
• The failure was not in the NameNode process itself
• Incorrect leader election under network partitioning
• Complicated configuration
– Easy to get wrong, difficult to control
– Pacemaker scripting was not straightforward
– A VIP is risky for HDFS
• DRBD split-brain problem
– Protocol C
– Unable to re-sync while the service is online
33. Now: In-house NN failure handling
• Bye-bye old HA-NN
– Had to restart whole HBase clusters after NN failover
• Alternative ideas
– Quorum-based leader election (Using ZK)
– Using L4 switch
– Implement our own AvatarNode
• We chose the safer solution, at the cost of a little downtime
34. In-house NN failure handling (1)
[Diagram: periodically rsync the NameNode metadata to a standby host with rsync --link-dest (hard-linked incremental snapshots)]
35. In-house NN failure handling (2)
[Diagram: the NameNode fails]
36. In-house NN failure handling (3)
38. Stabilizing LINE message cluster
[Diagram: four stabilization cases, surrounded by H/W failure handling, RS GC storms, and performance problems]
• Case 1: “Too many HLogs”
• Case 2: Hotspot problems
• Case 3: META region workload isolation
• Case 4: Region mappings to RS
39. Case 1: “Too many HLogs”
• Effect
– MemStore flush storms
– Compaction storms
• Cause
– Uneven growth across regions
– Heterogeneous tables in a single RS
• Solution
– Region balancing
– An external flush scheduler (sketched below)
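The deck doesn't show the scheduler itself; here is a minimal sketch of the idea using HBaseAdmin.flush from the 0.94-era API. The table names and intervals are hypothetical, and a production scheduler would flush per region and watch the HLog count:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Flush MemStores on our own schedule (e.g. off-peak) so that a RS never
// accumulates enough HLogs to trigger a forced flush storm at peak time.
public class ExternalFlushScheduler {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    String[] tables = {"message", "inbox"}; // hypothetical table names
    while (true) {
      for (String table : tables) {
        admin.flush(table);       // flush one table at a time...
        Thread.sleep(60 * 1000L); // ...paced, to avoid causing our own storm
      }
      Thread.sleep(60 * 60 * 1000L); // re-run hourly (illustrative)
    }
  }
}
```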
40. Case 1: Number of HLogs
[Chart: HLog count over peak and off-peak hours. Better case: periodic flushes off-peak keep the count low, so no flush is forced at peak. Worse case: forced flushes repeat at peak and pile up into a flush storm.]
41. Case 2: Hotspot problems
• Effect
– Excessive GC
– RS performance degradation (high CPU usage)
• Cause
– Get/Scan on:
• a row or column updated too frequently
• a row with too many columns (+ tombstones)
• Solution
– Schema design and row/column distribution are important (see the sketch below)
– Hotspot region isolation
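As one example of row/column distribution (a common technique, not necessarily LINE's actual scheme): prefix the rowkey with a short hash so that hot, sequential ids spread across regions, and reverse the timestamp so the newest data sorts first:

```java
import java.security.MessageDigest;

public class RowkeyDesign {
  // Hypothetical rowkey: [2-byte hash salt][user id][reversed timestamp].
  static byte[] saltedRowkey(String userId, long timestamp) throws Exception {
    byte[] id = userId.getBytes("UTF-8");
    byte[] salt = MessageDigest.getInstance("MD5").digest(id);
    byte[] row = new byte[2 + id.length + 8];
    System.arraycopy(salt, 0, row, 0, 2);       // hash prefix spreads writes
    System.arraycopy(id, 0, row, 2, id.length); // original id
    long reversed = Long.MAX_VALUE - timestamp; // newest rows sort first
    for (int i = 0; i < 8; i++) {
      row[2 + id.length + i] = (byte) (reversed >>> (56 - 8 * i));
    }
    return row;
  }
}
```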
42. Case 3: META region workload isolation
• Effect
1. High CPU on the RS
2. Excessive timeouts
3. META lookup timeouts
• Cause
– Inefficient exception handling in the HBase client
– A hotspot region and META on the same RS
• Solution
– A META-only RS (see the sketch below)
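A minimal sketch of pinning .META. to a dedicated RegionServer with the 0.94-era admin API; the destination server name is hypothetical, and the balancer must be kept from moving the region back (the deck doesn't show LINE's exact mechanism):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PinMetaRegion {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.balanceSwitch(false); // stop the balancer from undoing the move
    // Move .META. to a dedicated RS ("host,port,startcode" is hypothetical).
    admin.move(
        Bytes.toBytes(HRegionInfo.FIRST_META_REGIONINFO.getEncodedName()),
        Bytes.toBytes("meta-rs.example.com,60020,1358700000000"));
    admin.close();
  }
}
```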
43. Case 4: Region mappings to RS
• Effect
– Region mappings are not restored on RS restart
– Some region mappings aren't restored properly after a graceful restart
• graceful_stop.sh --restart --reload
• Cause
– HBase does not support this well
• Solution
– Periodically dump the mappings and restore them (sketched below)
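A minimal sketch of the dump & restore idea with the 0.94-era API: record each region's current ServerName, then after the restart move every region back to where the dump says it was (persisting the dump to disk is omitted):

```java
import java.util.Map;
import java.util.NavigableMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionMappingDump {
  // Dump the current region -> RS mapping for a table.
  static NavigableMap<HRegionInfo, ServerName> dump(Configuration conf, String table)
      throws Exception {
    HTable t = new HTable(conf, table);
    try {
      return t.getRegionLocations();
    } finally {
      t.close();
    }
  }

  // Restore: move each region back to the RS recorded in the dump.
  static void restore(Configuration conf, NavigableMap<HRegionInfo, ServerName> dumped)
      throws Exception {
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      for (Map.Entry<HRegionInfo, ServerName> e : dumped.entrySet()) {
        admin.move(Bytes.toBytes(e.getKey().getEncodedName()),
                   Bytes.toBytes(e.getValue().getServerName()));
      }
    } finally {
      admin.close();
    }
  }
}
```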
44. Summary
• IDC online migration
– Without downtime
– LINE HBase Replicator & BulkMigrator
• NN failover
– A solution simple enough for an operator who asks “What’s Hadoop?”
• Stabilizing LINE message cluster
– Improved response time of RS
45. Conclusion
We won 100M users running on HBase.
LINE Storage is a successful example
of a messaging service using HBase