Storage infrastructure using HBase behind LINE messages
2. Storage infrastructure using HBase behind LINE messages
NHN Japan Corp.
LINE Server Task Force
Shunsuke Nakamura
@sunsuk7tp
Hadoop Conference Japan 2013 Winter (2013-01-21)
3. To support LINE's users, we have built message storage that is:
• Large-scale (tens of billions of rows/day)
• Responsive (under 10 ms)
• Highly available (dual clusters)
4. Outline
• About LINE
• LINE & Storage requirements
• What we achieved
• Today’s topics
– IDC online migration
– NN failover
– Stabilizing LINE message cluster
• Conclusion
5. LINE
- A global messenger powered by NHN Japan -
Devices: 5 different mobile platforms + desktop support
9. New Year 2013 in Japan
[Chart: number of requests in an HBase cluster, plotted per 1 minute, usual peak hours vs. New Year 2013; New Year greetings such as "あけおめ!" and "新年好!" ("Happy New Year!" in Japanese and Chinese) spiked traffic to about 3x the usual peak]
3 times traffic explosion, and LINE Storage had no problems :)
10. LINE on Hadoop
Storage for service, backup, and logs
[Diagram: Hadoop components and their roles]
• For HBase, M/R, and log archive
• Bulk migration and ad-hoc analysis
• For HBase and Sharded-Redis
• Collecting Apache and Tomcat logs
• KPI and log analysis
13.1.21
Hadoop
Conference
Japan
2013
Winter
10
12. LINE service requirements
LINE is a...
• Messaging service: it should be fast
• Global service: downtime is not allowed
But LINE is not a simple messaging service:
• Messages are synchronized between phones & PCs
– so messages should be kept for a while
13. LINE’s storage requirements
• No data loss
• Eventual consistency
• Low latency
• HA
• Flexible schema
• Easy management
• Scale-out
14. Our selection is HBase
• Low latency for large amount of data
• Linearly scalable
• Relatively lower operating cost
– Replication by nature
– Automatic failover
• Data model fits our requirements
– Semi-structured
– Timestamp (see the sketch below)
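For illustration, a minimal sketch of a timestamped, semi-structured write using the 0.94-era HBase client API; the table name, rowkey layout, and column names here are hypothetical, not LINE's actual schema:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutMessageExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "message"); // hypothetical table name
    long ts = System.currentTimeMillis();
    // Semi-structured: columns can differ per row; no fixed schema needed.
    Put put = new Put(Bytes.toBytes("user123#" + ts)); // hypothetical rowkey
    // Each cell carries an explicit timestamp, which HBase stores natively.
    put.add(Bytes.toBytes("m"), Bytes.toBytes("body"), ts, Bytes.toBytes("Happy New Year!"));
    put.add(Bytes.toBytes("m"), Bytes.toBytes("from"), ts, Bytes.toBytes("user456"));
    table.put(put);
    table.close();
  }
}
```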
15. Stored rows per day in a cluster
[Chart: growth over time, y-axis from 2 to 10 billion rows/day]
16. What we achieved with HBase
• No data loss
– Persistent
– Data replication
• Automatic recovery from server failure
• Reasonable performance for large data sets
– Hundreds of billions of rows
– Write: ~1 ms
– Read: 1~10 ms
17. Many issues we had
• Coordinating heterogeneous storages
• IDC online migration
• Flush & compaction storms caused by “too many HLogs”
• Row & Column distribution
• Secondary Index
• Region Management
– Load and size balancing
– RS Allocation
– META region
– M/R
• Monitoring for diagnostics
• Traffic bursts caused by decommissioning
• NN problems
• Performance degradation
– hotspot problem
– timeout burst
– GC problem
• Client bugs
– Thread Blocking on server failure (HBASE-6364)
18. Today’s topics
IDC online migration
NN failover
Stabilizing LINE message cluster
20. Why?
• Move whole HBase clusters and data
• For better network infrastructure
• Without downtime
21. IDC online migration
Before migration:
[Diagram: the App Server writes only to src-HBase]
22. IDC online migration
• Write to both (client-level replication)
[Diagram: the App Server now writes to both src-HBase and dst-HBase]
23. IDC online migration
• New data: Incremental replication
• Old data: Bulk migration
• Each cell's timestamp in dst equals the one in src
[Diagram: the App Server keeps writing to both clusters while old data is migrated from src-HBase to dst-HBase]
24. LINE HBase Replicator & BulkMigrator
Replicator is for incremental replication
BulkMigrator is for bulk migration
25. LINE HBase Replicator
• Our own implementation
• Prefer pull to push
• Throughput throttling
• Workload isolation of replicator and RS
• Rowkey conversion and filtering
[Diagram: stock HBase replication pushes edits from src-HBase to dst-HBase; the LINE HBase Replicator instead pulls from src-HBase and writes to dst-HBase]
26. LINE HBase Replicator
- A simple daemon to replicate local regions -
1. HLogTracker reads a checkpoint and selects the next HLog.
2. For each entry in the HLog:
   1. Filter & convert the HLog.Entry
   2. Create Puts and batch them to the dst HBase
• Periodic checkpointing
• Generally, entries are replicated within seconds (see the sketch below)
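LINE's replicator is an in-house implementation whose code isn't in the deck; below is a minimal sketch of the loop described above, assuming the 0.94-era HBase API (HLog.Reader, HTable.batch). HLogTracker, checkpoint persistence, and delete-marker handling are elided, and shouldReplicate/convertRowkey are hypothetical stand-ins for the filter and rowkey-conversion steps:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Row;
import org.apache.hadoop.hbase.regionserver.wal.HLog;

public class ReplicatorSketch {
  // Replay one HLog file against the destination cluster.
  static void replicateHLog(Configuration conf, Path hlogPath, HTable dstTable)
      throws Exception {
    FileSystem fs = FileSystem.get(conf);
    HLog.Reader reader = HLog.getReader(fs, hlogPath, conf);
    try {
      List<Row> batch = new ArrayList<Row>();
      HLog.Entry entry;
      while ((entry = reader.next()) != null) {
        // NOTE: delete markers in the HLog are ignored here for brevity.
        for (KeyValue kv : entry.getEdit().getKeyValues()) {
          if (!shouldReplicate(kv)) continue;                 // filtering
          Put put = new Put(convertRowkey(kv.getRow()));      // rowkey conversion
          // Keep the source timestamp so src and dst cells match exactly.
          put.add(kv.getFamily(), kv.getQualifier(), kv.getTimestamp(), kv.getValue());
          batch.add(put);
        }
        if (batch.size() >= 1000) { // batch to dst; also a throttle point
          dstTable.batch(batch);
          batch.clear();
        }
      }
      if (!batch.isEmpty()) dstTable.batch(batch);
      // A real replicator would checkpoint its HLog position here.
    } finally {
      reader.close();
    }
  }

  static boolean shouldReplicate(KeyValue kv) { return true; }  // hypothetical filter
  static byte[] convertRowkey(byte[] srcRow) { return srcRow; } // hypothetical conversion
}
```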
27. Bulk migration
1. MapReduce between any two storages
– Map tasks only
– Read from the source, write to the destination
– Task scheduling is problematic: it depends on region allocation
2. Non-MapReduce version (BulkMigrator, sketched below)
– Our own implementation
– HBase → HBase
– On each RS, scan & batch region by region
– Throughput throttling
– Slow, but easy to implement and debug
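A minimal sketch of the non-MapReduce path, again assuming the 0.94-era client API. The caching value, batch size, and sleep-based throttle are illustrative, and startRow/stopRow would come from each region's boundaries:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Row;
import org.apache.hadoop.hbase.client.Scan;

public class BulkMigratorSketch {
  // Copy one region's key range from src to dst, preserving timestamps.
  static void migrateRange(HTable src, HTable dst, byte[] startRow, byte[] stopRow)
      throws Exception {
    Scan scan = new Scan(startRow, stopRow);
    scan.setMaxVersions();  // copy every version, not just the latest
    scan.setCaching(500);   // fetch rows from the RS in chunks
    ResultScanner scanner = src.getScanner(scan);
    try {
      List<Row> batch = new ArrayList<Row>();
      for (Result r : scanner) {
        Put put = new Put(r.getRow());
        for (KeyValue kv : r.raw()) {
          // Copy each cell with its original timestamp.
          put.add(kv.getFamily(), kv.getQualifier(), kv.getTimestamp(), kv.getValue());
        }
        batch.add(put);
        if (batch.size() >= 100) {
          dst.batch(batch);
          batch.clear();
          Thread.sleep(10); // crude throughput throttling
        }
      }
      if (!batch.isEmpty()) dst.batch(batch);
    } finally {
      scanner.close();
    }
  }
}
```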
31. NameNode failure
in October 2012
32. HA-NN failover failed
• The failure was not in the NameNode process itself
• Incorrect leader election under network partitioning
• Complicated configuration
– Easy to get wrong, difficult to control
– Pacemaker scripting was not straightforward
– A VIP is risky for HDFS
• DRBD split-brain problem
– Protocol C
– Unable to re-sync while the service is online
33. Now: In-house NN failure handling
• Bye-bye old HA-NN
– Had to restart whole HBase clusters after NN failover
• Alternative ideas
– Quorum-based leader election (Using ZK)
– Using L4 switch
– Implement our own AvatarNode
• We chose the safer solution, at the cost of a little downtime
34. In-house NN failure handling (1)
[Diagram: periodically rsync the NameNode metadata to a standby host with rsync --link-dest (hard-linked incremental snapshots)]
35. In-house NN failure handling (2)
[Diagram: the NameNode fails]
36. In-house NN failure handling (3)
38. Stabilizing LINE message cluster
[Diagram: four stabilization cases, surrounded by H/W failure handling, RS GC storms, and performance problems]
• Case 1: “Too many HLogs”
• Case 2: Hotspot problems
• Case 3: META region workload isolation
• Case 4: Region mappings to RS
39. Case 1: “Too many HLogs”
• Effect
– MemStore flush storms
– Compaction storms
• Cause
– Uneven growth across regions
– Heterogeneous tables in a single RS
• Solution
– Region balancing
– An external flush scheduler (sketched below)
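The deck doesn't show the scheduler itself; here is a minimal sketch of the idea using HBaseAdmin.flush from the 0.94-era API. The table names and intervals are hypothetical, and a production scheduler would flush per region and watch the HLog count:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Flush MemStores on our own schedule (e.g. off-peak) so that a RS never
// accumulates enough HLogs to trigger a forced flush storm at peak time.
public class ExternalFlushScheduler {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    String[] tables = {"message", "inbox"}; // hypothetical table names
    while (true) {
      for (String table : tables) {
        admin.flush(table);       // flush one table at a time...
        Thread.sleep(60 * 1000L); // ...paced, to avoid causing our own storm
      }
      Thread.sleep(60 * 60 * 1000L); // re-run hourly (illustrative)
    }
  }
}
```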
40. Case 1: Number of HLogs
[Chart: HLog count over peak and off-peak hours. Better case: periodic flushes off-peak keep the count low, so no flush is forced at peak. Worse case: forced flushes repeat at peak and pile up into a flush storm.]
41. Case 2: Hotspot problems
• Effect
– Excessive GC
– RS performance degradation (high CPU usage)
• Cause
– Get/Scan on:
• a row or column updated too frequently
• a row with too many columns (+ tombstones)
• Solution
– Schema design and row/column distribution are important (see the sketch below)
– Hotspot region isolation
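As one example of row/column distribution (a common technique, not necessarily LINE's actual scheme): prefix the rowkey with a short hash so that hot, sequential ids spread across regions, and reverse the timestamp so the newest data sorts first:

```java
import java.security.MessageDigest;

public class RowkeyDesign {
  // Hypothetical rowkey: [2-byte hash salt][user id][reversed timestamp].
  static byte[] saltedRowkey(String userId, long timestamp) throws Exception {
    byte[] id = userId.getBytes("UTF-8");
    byte[] salt = MessageDigest.getInstance("MD5").digest(id);
    byte[] row = new byte[2 + id.length + 8];
    System.arraycopy(salt, 0, row, 0, 2);       // hash prefix spreads writes
    System.arraycopy(id, 0, row, 2, id.length); // original id
    long reversed = Long.MAX_VALUE - timestamp; // newest rows sort first
    for (int i = 0; i < 8; i++) {
      row[2 + id.length + i] = (byte) (reversed >>> (56 - 8 * i));
    }
    return row;
  }
}
```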
42. Case 3: META region workload isolation
• Effect
1. High CPU on the RS
2. Excessive timeouts
3. META lookup timeouts
• Cause
– Inefficient exception handling in the HBase client
– A hotspot region and META on the same RS
• Solution
– A META-only RS (see the sketch below)
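A minimal sketch of pinning .META. to a dedicated RegionServer with the 0.94-era admin API; the destination server name is hypothetical, and the balancer must be kept from moving the region back (the deck doesn't show LINE's exact mechanism):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PinMetaRegion {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.balanceSwitch(false); // stop the balancer from undoing the move
    // Move .META. to a dedicated RS ("host,port,startcode" is hypothetical).
    admin.move(
        Bytes.toBytes(HRegionInfo.FIRST_META_REGIONINFO.getEncodedName()),
        Bytes.toBytes("meta-rs.example.com,60020,1358700000000"));
    admin.close();
  }
}
```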
43. Case 4: Region mappings to RS
• Effect
– Region mappings are not restored on RS restart
– Some region mappings aren't restored properly after a graceful restart
• graceful_stop.sh --restart --reload
• Cause
– HBase does not support this well
• Solution
– Periodically dump the mappings and restore them (sketched below)
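A minimal sketch of the dump & restore idea with the 0.94-era API: record each region's current ServerName, then after the restart move every region back to where the dump says it was (persisting the dump to disk is omitted):

```java
import java.util.Map;
import java.util.NavigableMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionMappingDump {
  // Dump the current region -> RS mapping for a table.
  static NavigableMap<HRegionInfo, ServerName> dump(Configuration conf, String table)
      throws Exception {
    HTable t = new HTable(conf, table);
    try {
      return t.getRegionLocations();
    } finally {
      t.close();
    }
  }

  // Restore: move each region back to the RS recorded in the dump.
  static void restore(Configuration conf, NavigableMap<HRegionInfo, ServerName> dumped)
      throws Exception {
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      for (Map.Entry<HRegionInfo, ServerName> e : dumped.entrySet()) {
        admin.move(Bytes.toBytes(e.getKey().getEncodedName()),
                   Bytes.toBytes(e.getValue().getServerName()));
      }
    } finally {
      admin.close();
    }
  }
}
```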
44. Summary
• IDC online migration
– Without downtime
– LINE HBase Replicator & BulkMigrator
• NN failover
– A solution simple enough for an operator who asks “What’s Hadoop?”
• Stabilizing LINE message cluster
– Improved response time of RS
45. Conclusion
We won 100M users running on HBase.
LINE Storage is a successful example
of a messaging service using HBase