SlideShare una empresa de Scribd logo
1 de 42
Descargar para leer sin conexión
Everything You Thought
You Already Knew About
Orchestration
Laura Frank
Director of Engineering,
Codeship
Managing Distributed State with Raft
Quorum 101
Leader Election
Log Replication
Service Scheduling
Failure Recovery
Agenda
bonus debugging tips!
They’re trying to get a collection of nodes to behave
like a single node.
• How does the system maintain state?
• How does work get scheduled?
The Big Problem(s)
What are tools like Swarm and Kubernetes trying to do?
Manager
leader
WorkerWorker
Manager
follower
Manager
follower
WorkerWorker Worker
raft consensus group
So You Think You
Have Quorum
Quorum
The minimum number of votes that a consensus
group needs in order to be allowed to perform
an operation.
Without quorum, your system can’t do work.
Math! Managers Quorum Fault Tolerance
1 1 0
2 2 0
3 2 1
4 3 1
5 3 2
6 4 2
7 4 3
(N/2) + 1
In simpler terms, it
means a majority
Math! Managers Quorum Fault Tolerance
1 1 0
2 2 0
3 2 1
4 3 1
5 3 2
6 4 2
7 4 3
(N/2) + 1
In simpler terms, it
means a majority
Having two managers instead of
one actually doubles your
chances of losing quorum.
Pay attention to datacenter topology when placing
managers.
Quorum With Multiple Regions
Manager Nodes Distribution across 3 Regions
3 1-1-1
5 1-2-2
7 3-2-2
9 3-3-3
magically works with
Docker for AWS
Let’s talk about Raft!
I think I’ll just write my
own distributed
consensus algorithm.
-no sensible person
Log replication
Leader election
Safety (won’t talk about this much today)
Raft is responsible for…
Being easier to understand
Orchestration systems typically use a key/value
store backed by a consensus algorithm
In a lot of cases, that algorithm is Raft!
Raft is used everywhere…
…that etcd is used
SwarmKit implements the
Raft algorithm directly.
In most cases, you don’t want to run
work on your manager nodes
docker node update --availability drain <NODE>
Participating in a Raft consensus group is work, too.
Make your manager nodes unavailable for tasks:
*I will run work on managers for educational purposes
Leader Election &
Log Replication
Manager
leader
Manager
candidate
Manager
follower
Manager
offline
demo.consensus.group
The log is the source of truth for
your application.
In the context of distributed computing (and this
talk), a log is an append-only, time-based record
of data.
2 10 30 25 5 12first entry append entry here!
This log is for computers, not humans.
2 10 30 25 5 12
Server
12
Client
12
In simple systems, the log is pretty straightforward.
In a manager group, that log entry can only “become
truth” once it is confirmed from the majority of
followers (quorum!)
Client
12
Manager
follower
Manager
follower
Manager
leader
demo.consensus.group
In distributed computing, it’s essential that you
understand log replication.
bit.ly/logging-post
Debugging Tip
Watch the Raft logs.
Monitor via inotifywait OR
just read them directly!
Scheduling
HA application
problems
scheduling
problems
orchestrator problems
Scheduling constraints
Restrict services to specific nodes, such as specific
architectures, security levels, or types
docker service create 
--constraint 'node.labels.type==web' my-app
New in 17.04.0-ce
Topology-aware scheduling!!1!
Implements a spread strategy over nodes that belong to a
certain category.
Unlike --constraint, this is a “soft” preference
—placement-pref ‘spread=node.labels.dc’
Swarm will not rebalance
healthy tasks when a new
node comes online
Debugging Tip
Add a manager to your
Swarm running with
--availability drain
and in Engine debug mode
Failure Recovery
Losing quorum
• Bring the downed nodes back online (derp)
Regain quorum
• On a healthy manager, run 

docker swarm init --force-new-cluster
This will create a new cluster with one healthy manager
• You need to promote new managers
The datacenter is
on fire
• Bring up a new manager and stop Docker
• sudo rm -rf /var/lib/docker/swarm
• Copy backup to /var/lib/docker/swarm
• Start Docker
• docker swarm init (--force-new-cluster)
Restore from a backup in 5 easy steps!
• In general, users shouldn’t be allowed to modify IP
addresses of nodes
• Restoring from a backup == old IP address for node1
• Workaround is to use elastic IPs with ability to reassign
But wait, there’s a bug… or a feature
Thank You!
@docker
#dockercon

Más contenido relacionado

La actualidad más candente

Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
Jun Rao
 

La actualidad más candente (20)

Shifter: Containers in HPC Environments
Shifter: Containers in HPC EnvironmentsShifter: Containers in HPC Environments
Shifter: Containers in HPC Environments
 
DevOps Foundations
DevOps FoundationsDevOps Foundations
DevOps Foundations
 
Plug-ins: Building, Shipping, Storing, and Running - Nandhini Santhanam and T...
Plug-ins: Building, Shipping, Storing, and Running - Nandhini Santhanam and T...Plug-ins: Building, Shipping, Storing, and Running - Nandhini Santhanam and T...
Plug-ins: Building, Shipping, Storing, and Running - Nandhini Santhanam and T...
 
SREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsSREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREs
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016
 
Combining Cloud Native & PaaS: Building a Fully Managed Application Platform ...
Combining Cloud Native & PaaS: Building a Fully Managed Application Platform ...Combining Cloud Native & PaaS: Building a Fully Managed Application Platform ...
Combining Cloud Native & PaaS: Building a Fully Managed Application Platform ...
 
Real-time in the real world: DIRT in production
Real-time in the real world: DIRT in productionReal-time in the real world: DIRT in production
Real-time in the real world: DIRT in production
 
Docker Swarm for Beginner
Docker Swarm for BeginnerDocker Swarm for Beginner
Docker Swarm for Beginner
 
Monitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance AnalysisMonitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance Analysis
 
Performance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudPerformance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloud
 
LinuxKit Deep Dive
LinuxKit Deep DiveLinuxKit Deep Dive
LinuxKit Deep Dive
 
Netflix: From Clouds to Roots
Netflix: From Clouds to RootsNetflix: From Clouds to Roots
Netflix: From Clouds to Roots
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
 
Linux BPF Superpowers
Linux BPF SuperpowersLinux BPF Superpowers
Linux BPF Superpowers
 
Docker Tips And Tricks at the Docker Beijing Meetup
Docker Tips And Tricks at the Docker Beijing MeetupDocker Tips And Tricks at the Docker Beijing Meetup
Docker Tips And Tricks at the Docker Beijing Meetup
 
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all startedKernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
 
Using eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster HealthUsing eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster Health
 
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache KafkaKafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
 
Apache Flink Stream Processing
Apache Flink Stream ProcessingApache Flink Stream Processing
Apache Flink Stream Processing
 

Similar a Everything You Thought You Already Knew About Orchestration

BP206 - Let's Give Your LotusScript a Tune-Up
BP206 - Let's Give Your LotusScript a Tune-Up BP206 - Let's Give Your LotusScript a Tune-Up
BP206 - Let's Give Your LotusScript a Tune-Up
Craig Schumann
 
Drools Presentation for Tallink.ee
Drools Presentation for Tallink.eeDrools Presentation for Tallink.ee
Drools Presentation for Tallink.ee
Anton Arhipov
 

Similar a Everything You Thought You Already Knew About Orchestration (20)

Dzone performancemonitoring2016-mastercode.vn
Dzone performancemonitoring2016-mastercode.vnDzone performancemonitoring2016-mastercode.vn
Dzone performancemonitoring2016-mastercode.vn
 
Management and Automation of MongoDB Clusters - Slides
Management and Automation of MongoDB Clusters - SlidesManagement and Automation of MongoDB Clusters - Slides
Management and Automation of MongoDB Clusters - Slides
 
Architecting for Failure in a Containerized World
Architecting for Failure in a Containerized WorldArchitecting for Failure in a Containerized World
Architecting for Failure in a Containerized World
 
BP206 - Let's Give Your LotusScript a Tune-Up
BP206 - Let's Give Your LotusScript a Tune-Up BP206 - Let's Give Your LotusScript a Tune-Up
BP206 - Let's Give Your LotusScript a Tune-Up
 
程序员实践之路
程序员实践之路程序员实践之路
程序员实践之路
 
arch_mtg_sqlsig_hcotter_replication.ppt
arch_mtg_sqlsig_hcotter_replication.pptarch_mtg_sqlsig_hcotter_replication.ppt
arch_mtg_sqlsig_hcotter_replication.ppt
 
Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE Incident Management in the Age of DevOps and SRE
Incident Management in the Age of DevOps and SRE
 
Os rtos.ppt
Os rtos.pptOs rtos.ppt
Os rtos.ppt
 
SharePoint Administration: Tips from the Field
SharePoint Administration: Tips from the FieldSharePoint Administration: Tips from the Field
SharePoint Administration: Tips from the Field
 
Drools Presentation for Tallink.ee
Drools Presentation for Tallink.eeDrools Presentation for Tallink.ee
Drools Presentation for Tallink.ee
 
Windows 7 client performance talk - Jeff Stokes
Windows 7 client performance talk - Jeff StokesWindows 7 client performance talk - Jeff Stokes
Windows 7 client performance talk - Jeff Stokes
 
Start Up Austin 2017: Manual vs Automation - When to Start Automating your Pr...
Start Up Austin 2017: Manual vs Automation - When to Start Automating your Pr...Start Up Austin 2017: Manual vs Automation - When to Start Automating your Pr...
Start Up Austin 2017: Manual vs Automation - When to Start Automating your Pr...
 
Realtime search at Yammer
Realtime search at YammerRealtime search at Yammer
Realtime search at Yammer
 
Real-time Search at Yammer - By Aleksandrovsky Boris
Real-time Search at Yammer - By Aleksandrovsky BorisReal-time Search at Yammer - By Aleksandrovsky Boris
Real-time Search at Yammer - By Aleksandrovsky Boris
 
Real Time Search at Yammer
Real Time Search at YammerReal Time Search at Yammer
Real Time Search at Yammer
 
Acc 340 Preview Full Course
Acc 340 Preview Full Course Acc 340 Preview Full Course
Acc 340 Preview Full Course
 
Acc 340 Preview Full Course
Acc 340 Preview Full CourseAcc 340 Preview Full Course
Acc 340 Preview Full Course
 
DockerCon SF 2019 - Observability Workshop
DockerCon SF 2019 - Observability WorkshopDockerCon SF 2019 - Observability Workshop
DockerCon SF 2019 - Observability Workshop
 
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
 
Oracle R12 EBS Performance Tuning
Oracle R12 EBS Performance TuningOracle R12 EBS Performance Tuning
Oracle R12 EBS Performance Tuning
 

Más de Laura Frank Tacho

Más de Laura Frank Tacho (8)

The Container Shame Spiral
The Container Shame SpiralThe Container Shame Spiral
The Container Shame Spiral
 
Using Docker For Development
Using Docker For DevelopmentUsing Docker For Development
Using Docker For Development
 
Deploying a Kubernetes App with Amazon EKS
Deploying a Kubernetes App with Amazon EKSDeploying a Kubernetes App with Amazon EKS
Deploying a Kubernetes App with Amazon EKS
 
Building Efficient Parallel Testing Platforms with Docker
Building Efficient Parallel Testing Platforms with DockerBuilding Efficient Parallel Testing Platforms with Docker
Building Efficient Parallel Testing Platforms with Docker
 
Efficient Parallel Testing with Docker
Efficient Parallel Testing with DockerEfficient Parallel Testing with Docker
Efficient Parallel Testing with Docker
 
Stop Being Lazy and Test Your Software
Stop Being Lazy and Test Your SoftwareStop Being Lazy and Test Your Software
Stop Being Lazy and Test Your Software
 
Happier Teams Through Tools
Happier Teams Through ToolsHappier Teams Through Tools
Happier Teams Through Tools
 
Rails Applications with Docker
Rails Applications with DockerRails Applications with Docker
Rails Applications with Docker
 

Último

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Everything You Thought You Already Knew About Orchestration