This session will dive into our most successful (and unsuccessful!) multi-cluster event replication patterns.
An x-ray of the cross-cluster distribution model that powers our globally distributed APIs will cover the benefits this model has provided for client API experience, delivery agility and developer experience.
We will focus on recipes for effective use of Mirror Maker event replication to power platform distribution, including the challenges of managing a 'fan-in' event replication workflow: pulling events created in satellite clusters back to a mothership cluster for processing.
We will introduce the elegant technique of replication event multiplexing, which can simplify the burden of managing a 'fan-in' replication topology by eliminating regional awareness from the application domain and improving replication health monitoring & observability.
3. Global Platform Distribution Challenge
How best to centrally manage events generated in remote clusters?
Distribution across remote clusters yields great portability
4. Global Platform Distribution Challenge
• Why does this challenge exist?
• Kafka distribution best practices & mistakes
• Mirror Maker replication recipes
• Topic Multiplexing
• Separate global distribution concerns from application domain
• Improve observability & monitoring
6. Information Distribution Challenge
Foreign Exchange Platform
Desired Experience:
🌏 Local
🚀 Fast
⭐ High availability
Our Constraints:
🎯 Central connections
Aggregation opportunities
7. Key Learning – Kafka promotes decomposition
Isolation benefits
• Single responsibility
• Single writer
From monolith… …to pluggable microservice pipeline
8. Why is Kafka a good fit? Pluggability
Kafka is a universal application dependency
Workflows modelled as chain of microservices
Pluggability benefits
- Unburden critical path
- Support innovation
- Composable capability
- Portability
9. API on Kafka Pattern
Price distribution
• Push state & all decision making to very edge of system
• Fast API response time
🚀 Fast (API latency)
• Read API driven from in-memory state
• Write API publish instructions as events
⭐ High availability
• Convenient horizontal scalability
• Reduce dependencies on critical path
🌏 Local
• How to solve…? Instruction execution
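The read and write paths above can be sketched as follows. This is a minimal illustration, not the platform's actual code: `EventLog` is a hypothetical in-process stand-in for a Kafka topic, and the class names, currency pair and prices are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Hypothetical stand-in for a Kafka topic: an append-only list."""
    records: list = field(default_factory=list)

    def publish(self, key, value):
        self.records.append((key, value))
        return len(self.records) - 1  # position of the new record


class PriceApi:
    """Read path served from memory; write path only publishes events."""

    def __init__(self, price_log, instruction_log):
        self.price_log = price_log
        self.instruction_log = instruction_log
        self.latest = {}   # in-memory state: currency pair -> latest price
        self.position = 0  # next price record to apply

    def poll(self):
        # Apply any price events that arrived since the last poll.
        for pair, price in self.price_log.records[self.position:]:
            self.latest[pair] = price
        self.position = len(self.price_log.records)

    def get_price(self, pair):
        # No broker or database call on the read path.
        return self.latest.get(pair)

    def submit_instruction(self, instruction_id, payload):
        # Acknowledge once the event is published; execution happens downstream.
        self.instruction_log.publish(instruction_id, payload)
        return {"id": instruction_id, "status": "accepted"}


prices = EventLog()
instructions = EventLog()
api = PriceApi(prices, instructions)

prices.publish("AUDUSD", 0.65)  # illustrative values only
prices.publish("AUDUSD", 0.66)
api.poll()
quote = api.get_price("AUDUSD")
ack = api.submit_instruction("i-1", {"pair": "AUDUSD", "amount": 100})
```

The API never waits for the instruction to execute, which is what keeps the critical path short; the caller learns the outcome later, through eventual consistency.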
10. Learnings – Simplify the critical path for API responses
Don't:
• Burden client with Kafka RPC
• Exactly once semantics
Do:
• Embrace eventual consistency
• Enforce idempotency
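One common way to enforce idempotency without exactly-once semantics is to de-duplicate on a client-supplied instruction id. The sketch below assumes such an id exists; the class name and the in-memory `seen` set are hypothetical, and a real processor would need durable storage for the ids.

```python
class IdempotentProcessor:
    """Applies each instruction at most once, keyed by a client-supplied id."""

    def __init__(self):
        self.seen = set()  # ids already applied
        self.balance = 0

    def handle(self, instruction_id, amount):
        if instruction_id in self.seen:
            return False   # duplicate delivery: ignore
        self.seen.add(instruction_id)
        self.balance += amount
        return True


processor = IdempotentProcessor()
# At-least-once delivery may replay "i-1" after a retry or a rebalance.
deliveries = [("i-1", 100), ("i-2", 50), ("i-1", 100)]
results = [processor.handle(i, a) for i, a in deliveries]
```

The replayed `i-1` is dropped, so the balance reflects each instruction once even though it was delivered twice.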
11. Global Distribution Pattern 🌏
• Remote Kafka clusters power global API distribution
• Mirror Maker (v2) used to synchronize clusters
• Fan out vs Fan in
🚀 Fast (API latency)
• Internalize the cost of WAN distribution
⭐ High availability
• Opportunities for DR / cross region fallback (cross cluster)
🌏 Local
• We can deploy APIs near customers
• Kafka yields application portability & composability
12. Multi-cluster Challenges - Fan out
Fan out replication Pattern
• Mirror Maker pull model
• Avoid monolithic replication processor
• MM instance deployed to each remote region
• Topic data is cloned (but not offsets).
• Topic name remains the same
• Offsets vary
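A small simulation of why offsets vary between clusters: the destination assigns its own offsets as records are appended, so they need not match the source. The `replicate` function and the starting offset of 1000 are illustrative assumptions, not Mirror Maker internals.

```python
def replicate(source_records, destination_log):
    """Copy (source_offset, value) pairs; the destination assigns its own offsets."""
    mapping = {}
    for source_offset, value in source_records:
        destination_log.append(value)
        mapping[source_offset] = len(destination_log) - 1
    return mapping


# Source topic whose oldest records were already removed by retention,
# so the first surviving offset is 1000 (illustrative number).
source = [(1000, "a"), (1001, "b"), (1002, "c")]
destination = []
offset_map = replicate(source, destination)
```

The data and its order are identical, but a consumer offset committed against the source cluster means nothing on the destination without a translation step.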
13. Mirror Maker Best Practice - Isolation
MM Isolation Best Practice
• Separate MM instances for each region
• Isolate metadata per source-destination combination
• Version metadata to support case reset / remodel
⭐ Benefits
• Limit blast radius of cluster stability issues
• Flexibility to shutdown / maintain regions
• Migration pathway to support MM upgrades
MM Kafka cluster naming convention:
version.source.dest.[dest,src]
Example: v5.HQ.US.dest
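The naming convention can be captured in a pair of helpers. This is a sketch under assumptions: the function names are hypothetical, the segments are assumed to be joined by dots with no spaces, and the final segment is assumed to be exactly `dest` or `src`.

```python
def mm_cluster_name(version, source, dest, side):
    """Build a name following version.source.dest.[dest,src]."""
    if side not in ("dest", "src"):
        raise ValueError("side must be 'dest' or 'src'")
    return f"v{version}.{source}.{dest}.{side}"


def parse_mm_cluster_name(name):
    """Split a name back into its four segments."""
    version, source, dest, side = name.split(".")
    return {
        "version": int(version.lstrip("v")),
        "source": source,
        "dest": dest,
        "side": side,
    }


name = mm_cluster_name(5, "HQ", "US", "dest")
bumped = mm_cluster_name(6, "HQ", "US", "dest")  # new version, fresh metadata
parsed = parse_mm_cluster_name(name)
```

Bumping the version yields a distinct name, which is what lets a region be reset or remodelled without touching the metadata of the previous deployment.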
16. Multi-cluster Challenges - Fan in
Fan in challenge
Compromised isolation on merged topic
• Harder to reconcile / monitor
• Violate single writer principle
17. Multi-cluster Challenges - Fan in
Fan in – better isolation
Isolation ✅
• Region, topic, mirror maker
Portability ❌
• Application & env burden
• API & topic region
• Processor consumption
Scalability 😐
• Multiple workflows
Monitoring 😐
• How to monitor replication during off peak periods?
19. Multiplexing to the rescue!
Multiplexing: a method by which multiple analog or digital signals are combined into one signal over a shared medium.
The aim is to share a scarce resource.
20. Multiplexing to the rescue!
Could multiplexing techniques help us to abstract distribution / replication concerns away from the application / business domain?
21. Apply multiplex concept to fan-in challenge
We can solve isolation, distribution, portability and replication monitoring concerns without burdening our application
• Multiplex service can interlace all source topics into a single channel
• Simpler replication, easier to isolate, easier to monitor
• Demux service can deinterlace multiplexed byte stream
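The interlace / deinterlace round trip might look like the sketch below. It is not the actual mux service: `MuxRecord`, the header names `src-topic` and `src-partition`, and the topic names are all hypothetical, and a plain list stands in for the replicated mux topic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MuxRecord:
    headers: dict  # carries the original topic and partition
    key: bytes
    value: bytes


def mux(source_topic, source_partition, key, value):
    """Wrap a source record for the shared mux channel without decoding it."""
    headers = {"src-topic": source_topic, "src-partition": source_partition}
    return MuxRecord(headers, key, value)


def demux(mux_records):
    """Route each record back to a topic named after its original source."""
    topics = {}
    for record in mux_records:
        topic = record.headers["src-topic"]
        topics.setdefault(topic, []).append((record.key, record.value))
    return topics


# Records from two source topics interlaced on one channel.
channel = [
    mux("payments", 0, b"k1", b"p1"),
    mux("quotes", 2, b"k2", b"q1"),
    mux("payments", 0, b"k3", b"p2"),
]
restored = demux(channel)
```

Only the single channel has to be replicated and monitored; the application on either side sees its ordinary topics and never learns which region a record came from.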
23. Generating Mux Stream & Partitioning
• Multiplexing is not difficult: the entire mux service is 300 lines of Kotlin
• Respect source partition sequence: PartitionKey = srcTopic + srcPartition
• Can use ByteArray serdes to copy record key & value to the mux topic; no need to use the original schema
• Kafka headers provide a convenient way to carry the original context (the source partition & topic name) to the demux
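To see why keying on source topic plus source partition respects the source sequence, consider this sketch of the partition assignment. The `-` separator, the function names and the partition count of 6 are assumptions, and CRC32 is only a stand-in for the murmur2 hash that Kafka's default partitioner applies to keys.

```python
import zlib


def partition_key(source_topic, source_partition):
    # PartitionKey = srcTopic + srcPartition; the "-" separator is an assumption
    # that keeps ("topic1", 2) distinct from ("topic", 12).
    return f"{source_topic}-{source_partition}"


def mux_partition(source_topic, source_partition, mux_partition_count):
    """Map a source partition to one mux partition, always the same one."""
    key = partition_key(source_topic, source_partition).encode("utf-8")
    # CRC32 is a stand-in; Kafka's default partitioner hashes keys with murmur2.
    return zlib.crc32(key) % mux_partition_count


assignments = {
    (topic, partition): mux_partition(topic, partition, 6)
    for topic in ("payments", "quotes")
    for partition in range(3)
}
```

Because every record from a given source partition hashes to the same mux partition, and Kafka preserves order within a partition, the demux side can replay each source partition in its original sequence.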
27. Improving observability & monitoring
Dead replication alerts
• Synthetic ‘keepalive’ events generated to keep replication & monitor warm during off-peak time
• Synthetic events internalized to multiplexer / demultiplexer & do not affect business workflow
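A dead replication alert driven by keepalives could be sketched like this. The record shape, the `keepalive` flag, the 60 second threshold and the region names are hypothetical; the point is only that synthetic events refresh the monitor and are then swallowed before they reach business processing.

```python
class ReplicationMonitor:
    """Tracks the last time any record arrived from each source region."""

    def __init__(self, max_silence_seconds):
        self.max_silence_seconds = max_silence_seconds
        self.last_seen = {}

    def observe(self, region, timestamp):
        self.last_seen[region] = timestamp

    def dead_regions(self, now):
        return sorted(
            region
            for region, seen in self.last_seen.items()
            if now - seen > self.max_silence_seconds
        )


def demux_step(record, monitor, business_out):
    """Record liveness for every event, but forward only business events."""
    monitor.observe(record["region"], record["timestamp"])
    if record.get("keepalive"):
        return  # synthetic: swallowed by the demultiplexer
    business_out.append(record["value"])


monitor = ReplicationMonitor(max_silence_seconds=60)
forwarded = []
events = [
    {"region": "US", "timestamp": 0, "value": "order-1"},
    {"region": "EU", "timestamp": 0, "value": "order-2"},
    {"region": "US", "timestamp": 50, "keepalive": True},
    {"region": "US", "timestamp": 100, "keepalive": True},
]
for event in events:
    demux_step(event, monitor, forwarded)
alerts = monitor.dead_regions(now=110)
```

In this run the US region stays healthy on keepalives alone during a quiet period, while the EU region, which sent nothing after its first record, is flagged.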
29. Kafka Distribution Patterns – Fan in Fan out & the Multiplexer
✅ Isolation
✅ Portability
✅ Scalability
✅ Monitoring
🌏 Local
🚀 Fast
⭐ High availability
Fan In + Multiplexer
Fan Out
Conclusions
- Install a Kafka cluster in each customer region
- Keep API processing simple
- Strive for strong isolation between regions
- Separate application & distribution concerns
30. What Next…?
Support Multiple HQ Regions
Share The Multiplexer Code
Read more on our Blog!
medium.com/airwallex-engineering
Feedback / questions: alex.hilton@airwallex.com
airwallex.com/careers