SlideShare una empresa de Scribd logo
1 de 36
Monitoring micro-services platform 
Boyan Dimitrov, 
Platform Engineering @ Hailo @nathariel
Outline 
• Intro to the Hailo world 
• Platform Overview 
• Monitoring Evolution
The Platform 
Troll a platform by Swinsto101 / CC BY-SA 3.0 / Desaturated 
from original
Platform specifics 
• SOA based on Go ( and Java… ) 
• 1000+ AWS instances spanning multiple regions 
• 160+ services in production 
• Designed specifically for the cloud – different building blocks and 
components will constantly be in flux, broken or unavailable.
eu-west-1 
Proxy Layer 
Message Bus+ 
Go Services 
Java 
Services 
C* 
us-east-1 
Proxy Layer 
Message Bus+ 
Go Services Java 
C* 
Services
Provisioning Service 
CI Pipeline (Janky/Jenkins) 
Amazon S3 
Provisioning Service Provisioning Service 
Provisioning Manager 
Docker Registry 
Inside an environment
A micro-service under the hood 
Handler platform-layer 
Logic 
Storage 
Library for abstracting service-to- 
service comms 
service-layer 
Self-configuring external 
service adapters 
Service 
Any service gets for free: 
• Provisioning 
• Discovery 
• Configuration 
• Authentication/Authorization 
• A/B testing capabilities 
• Self-configuring connectivity to 
third-party services 
• Monitoring 
• Instrumentation
Mission: 
Define high level platform and business metrics 
Gather as many insights as possible 
Add automatic failover and recovery capabilities 
"A[ollo 8 Launch Control Room” by Tfawls 
/ Desaturated from original
PHP Java 
Host Instance 
Graphite 
Zabbix 
Aspiration vs Reality 
CloudWatch 
Zabbix 
Agent 
StatsD Carbon
Challenges 
• Single StatsD instance and generic graphite setup cannot cope with all the traffic 
(surprise!) 
• No easy way of generating and searching for graphs quickly 
• We didn’t instrument everything 
• “Traditional” monitoring systems can only give basic app insights 
• Se#ing up app templates is a manual daunting process and does not scale 
• No in-depth visibility into our main KPIs 
• No way of identifying platform / release / config / cloud infrastructure changes
Instrumentation++ 
“Airplaine board” by Smithore 
/ Desaturated from original
Host Instance 
Graphite 
Cache 
Zabbix 
Iterate on what we already know 
Relay 
CloudWatch 
CollectD StatsD 
Cache 
Cache 
Zabbix 
Agent
Result 
• Scaling up graphite and moving StatsD to every box allowed us to collect millions 
of metrics 
• Instrumenting everything gives us a lot of insights. 
• Grafana allows us to quickly build, store and search for important graphs. Widely 
adopted by the whole development team! 
Tip: Focus on upper 95th and 99th percentiles and work out from there.
Monitoring & 
Instrumentation
RReatzhiinekl Service 
Monitoring
Provisioning Service 
Message bus 
Monitoring 
Service 
New 
Service 
Publish 
Healthchecks 
Host Instance 
Provisioning Manager 
Binding Discovery 
Provisioning Service 
Host Instance 
Monitoring 
V2
healthcheck.Register(&healthcheck.HealthCheck{! 
Id: “MyHCId”,! 
ServiceName: ServiceName,! 
ServiceVersion: ServiceVersion,! 
Hostname: Hostname,! 
InstanceId: InstanceID,! 
Interval: time.Minute,! 
Checker: myCallbackFunc,! 
Priority: hc.Warning,! 
})!
Service level health checks
Result 
• Service health checks give us in-depth service performance details 
• The monitoring service has a holistic view of our platform health and can identify 
degraded availability zones 
• Developers can identify what is important for their service and track & alert on it.
Trace++ 
Monitoring & 
Instrumentation 
“Abstract conception of network and communication” 
by Leszekglasner / Desaturated from original
Trace Architecture 
CollectD StatsD 
Zabbix 
Agent 
Provisioning Service 
Host Instance 
Phosphor 
Publish 
Trace 
Service 
Dashboards 
Monitoring 
In-memory 
Aggregates 
Optional 
persistant 
storage 
Async 
UDP
Live traffic flows
Live traffic flows
Automatic request tracing
Result 
• Trace incoming requests and pinpoint bo#lenecks & SLA offenders 
• Easily identify problems on the request/response path 
• Quickly find out exactly which services participate on the request path
Robomon
Automated Jobs
Result 
• Identify business impacting issues immediately 
• Highlight the service on the critical path that is most likely responsible for the 
problems
Event Correlation 
“Connection” by A2bb5s 
/ Desaturated from the original
CollectD StatsD 
Zabbix 
Agent 
Provisioning Service 
Host Instance 
Phosphor 
Publish 
c 
Dashboards 
Monitoring 
Persistent 
Storage 
SNS 
Platform 
Events 
Whisper 
Service 
c 
Platform events
Result 
• Answer to the most important “Did anything change?” question 
• Audit trail for any platform changes 
• Holistic view of our platform status
It is not over yet! 
++ Machine Learning 
++ Event source weighting
Thanks! 
PS. We’re hiring! 
@nathariel 
boyan@hailocab.com London DevOps

Más contenido relacionado

La actualidad más candente

Agile in a nutshell
Agile in a nutshellAgile in a nutshell
Agile in a nutshellDoc List
 
Modular PHP Development using CodeIgniter Bonfire
Modular PHP Development using CodeIgniter BonfireModular PHP Development using CodeIgniter Bonfire
Modular PHP Development using CodeIgniter BonfireJeff Fox
 
Janus: an open source and general purpose WebRTC (gateway) server
Janus: an open source and general purpose WebRTC (gateway) serverJanus: an open source and general purpose WebRTC (gateway) server
Janus: an open source and general purpose WebRTC (gateway) serverDevDay
 
ARCHITECTURE MICROSERVICE : TOUR D’HORIZON DU CONCEPT ET BONNES PRATIQUES
ARCHITECTURE MICROSERVICE : TOUR D’HORIZON DU CONCEPT ET BONNES PRATIQUESARCHITECTURE MICROSERVICE : TOUR D’HORIZON DU CONCEPT ET BONNES PRATIQUES
ARCHITECTURE MICROSERVICE : TOUR D’HORIZON DU CONCEPT ET BONNES PRATIQUESSOAT
 
Six Signs You Need Platform Engineering
Six Signs You Need Platform EngineeringSix Signs You Need Platform Engineering
Six Signs You Need Platform EngineeringWeaveworks
 
Agile có thể giúp chúng ta những gì?
Agile có thể giúp chúng ta những gì?Agile có thể giúp chúng ta những gì?
Agile có thể giúp chúng ta những gì?DUONG Trong Tan
 
CI with Gitlab & Docker
CI with Gitlab & DockerCI with Gitlab & Docker
CI with Gitlab & DockerJoerg Henning
 
What the Heck Is a Product Owner?
What the Heck Is a Product Owner?What the Heck Is a Product Owner?
What the Heck Is a Product Owner?Ron Lichty
 
Formation GIT gratuite par ippon 2014
Formation GIT gratuite par ippon 2014Formation GIT gratuite par ippon 2014
Formation GIT gratuite par ippon 2014Ippon
 
Grokking Techtalk: Problem solving for sw engineers
Grokking Techtalk: Problem solving for sw engineersGrokking Techtalk: Problem solving for sw engineers
Grokking Techtalk: Problem solving for sw engineers9diov
 
Introduction to Chef
Introduction to ChefIntroduction to Chef
Introduction to ChefKnoldus Inc.
 
Prometheus: From technical metrics to business observability
Prometheus: From technical metrics to business observabilityPrometheus: From technical metrics to business observability
Prometheus: From technical metrics to business observabilityJulien Pivotto
 
SAFe Dependency Board Widget V3 for IBM Rational Team Concert (RTC)
SAFe Dependency Board Widget V3 for IBM Rational Team Concert (RTC)SAFe Dependency Board Widget V3 for IBM Rational Team Concert (RTC)
SAFe Dependency Board Widget V3 for IBM Rational Team Concert (RTC)Markus Giacomuzzi
 
The Essence of Sprint Planning : Presented by Sprint Planning
The Essence of Sprint Planning : Presented by Sprint PlanningThe Essence of Sprint Planning : Presented by Sprint Planning
The Essence of Sprint Planning : Presented by Sprint PlanningoGuild .
 
Scrum Process For Offshore Team
Scrum Process For Offshore TeamScrum Process For Offshore Team
Scrum Process For Offshore TeamPaul Nguyen
 
Agile training
Agile trainingAgile training
Agile trainingLong Ta
 
Modeling microservices using DDD
Modeling microservices using DDDModeling microservices using DDD
Modeling microservices using DDDMasashi Narumoto
 
Service Discovery and Registration in a Microservices Architecture
Service Discovery and Registration in a Microservices ArchitectureService Discovery and Registration in a Microservices Architecture
Service Discovery and Registration in a Microservices ArchitecturePLUMgrid
 

La actualidad más candente (20)

Agile in a nutshell
Agile in a nutshellAgile in a nutshell
Agile in a nutshell
 
Modular PHP Development using CodeIgniter Bonfire
Modular PHP Development using CodeIgniter BonfireModular PHP Development using CodeIgniter Bonfire
Modular PHP Development using CodeIgniter Bonfire
 
Microservice intro
Microservice introMicroservice intro
Microservice intro
 
Janus: an open source and general purpose WebRTC (gateway) server
Janus: an open source and general purpose WebRTC (gateway) serverJanus: an open source and general purpose WebRTC (gateway) server
Janus: an open source and general purpose WebRTC (gateway) server
 
ARCHITECTURE MICROSERVICE : TOUR D’HORIZON DU CONCEPT ET BONNES PRATIQUES
ARCHITECTURE MICROSERVICE : TOUR D’HORIZON DU CONCEPT ET BONNES PRATIQUESARCHITECTURE MICROSERVICE : TOUR D’HORIZON DU CONCEPT ET BONNES PRATIQUES
ARCHITECTURE MICROSERVICE : TOUR D’HORIZON DU CONCEPT ET BONNES PRATIQUES
 
Six Signs You Need Platform Engineering
Six Signs You Need Platform EngineeringSix Signs You Need Platform Engineering
Six Signs You Need Platform Engineering
 
Git advanced
Git advancedGit advanced
Git advanced
 
Agile có thể giúp chúng ta những gì?
Agile có thể giúp chúng ta những gì?Agile có thể giúp chúng ta những gì?
Agile có thể giúp chúng ta những gì?
 
CI with Gitlab & Docker
CI with Gitlab & DockerCI with Gitlab & Docker
CI with Gitlab & Docker
 
What the Heck Is a Product Owner?
What the Heck Is a Product Owner?What the Heck Is a Product Owner?
What the Heck Is a Product Owner?
 
Formation GIT gratuite par ippon 2014
Formation GIT gratuite par ippon 2014Formation GIT gratuite par ippon 2014
Formation GIT gratuite par ippon 2014
 
Grokking Techtalk: Problem solving for sw engineers
Grokking Techtalk: Problem solving for sw engineersGrokking Techtalk: Problem solving for sw engineers
Grokking Techtalk: Problem solving for sw engineers
 
Introduction to Chef
Introduction to ChefIntroduction to Chef
Introduction to Chef
 
Prometheus: From technical metrics to business observability
Prometheus: From technical metrics to business observabilityPrometheus: From technical metrics to business observability
Prometheus: From technical metrics to business observability
 
SAFe Dependency Board Widget V3 for IBM Rational Team Concert (RTC)
SAFe Dependency Board Widget V3 for IBM Rational Team Concert (RTC)SAFe Dependency Board Widget V3 for IBM Rational Team Concert (RTC)
SAFe Dependency Board Widget V3 for IBM Rational Team Concert (RTC)
 
The Essence of Sprint Planning : Presented by Sprint Planning
The Essence of Sprint Planning : Presented by Sprint PlanningThe Essence of Sprint Planning : Presented by Sprint Planning
The Essence of Sprint Planning : Presented by Sprint Planning
 
Scrum Process For Offshore Team
Scrum Process For Offshore TeamScrum Process For Offshore Team
Scrum Process For Offshore Team
 
Agile training
Agile trainingAgile training
Agile training
 
Modeling microservices using DDD
Modeling microservices using DDDModeling microservices using DDD
Modeling microservices using DDD
 
Service Discovery and Registration in a Microservices Architecture
Service Discovery and Registration in a Microservices ArchitectureService Discovery and Registration in a Microservices Architecture
Service Discovery and Registration in a Microservices Architecture
 

Similar a Monitoring microservices platform

Moving to microservices – a technology and organisation transformational journey
Moving to microservices – a technology and organisation transformational journeyMoving to microservices – a technology and organisation transformational journey
Moving to microservices – a technology and organisation transformational journeyBoyan Dimitrov
 
Building an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult StepsBuilding an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult StepsDigitalOcean
 
Global Azure Bootcamp 2017 - Performance and Health Management for Modern App...
Global Azure Bootcamp 2017 - Performance and Health Management for Modern App...Global Azure Bootcamp 2017 - Performance and Health Management for Modern App...
Global Azure Bootcamp 2017 - Performance and Health Management for Modern App...Adin Ermie
 
Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...
Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...
Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...Amazon Web Services
 
Cloud Foundry Technical Overview
Cloud Foundry Technical OverviewCloud Foundry Technical Overview
Cloud Foundry Technical Overviewcornelia davis
 
Netflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open SourceNetflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open Sourceaspyker
 
Introducing ONAP for OpenStack St Louis Meetup
Introducing ONAP for OpenStack St Louis MeetupIntroducing ONAP for OpenStack St Louis Meetup
Introducing ONAP for OpenStack St Louis Meetupdjzook
 
WLCG Grid Infrastructure Monitoring
WLCG Grid Infrastructure MonitoringWLCG Grid Infrastructure Monitoring
WLCG Grid Infrastructure MonitoringJames Casey
 
Modernizing Testing as Apps Re-Architect
Modernizing Testing as Apps Re-ArchitectModernizing Testing as Apps Re-Architect
Modernizing Testing as Apps Re-ArchitectDevOps.com
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...Adrian Cockcroft
 
Unconference Round Table Notes
Unconference Round Table NotesUnconference Round Table Notes
Unconference Round Table NotesTimothy Spann
 
LogicMonitor: An Overview
LogicMonitor: An Overview LogicMonitor: An Overview
LogicMonitor: An Overview James McCabe
 
Streaming real time data with Vibe Data Stream
Streaming real time data with Vibe Data StreamStreaming real time data with Vibe Data Stream
Streaming real time data with Vibe Data StreamInformaticaMarketplace
 
Productionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices ArchitectureProductionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices ArchitectureDatabricks
 
Azure Cloud Application Development Workshop - UGIdotNET
Azure Cloud Application Development Workshop - UGIdotNETAzure Cloud Application Development Workshop - UGIdotNET
Azure Cloud Application Development Workshop - UGIdotNETLorenzo Barbieri
 
Cloudify workshop at CCCEU 2014
Cloudify workshop at CCCEU 2014 Cloudify workshop at CCCEU 2014
Cloudify workshop at CCCEU 2014 Uri Cohen
 
AMB110: IT Asset Management – How to Start When You Don’t Know Where to Start
AMB110: IT Asset Management – How to Start When You Don’t Know Where to StartAMB110: IT Asset Management – How to Start When You Don’t Know Where to Start
AMB110: IT Asset Management – How to Start When You Don’t Know Where to StartIvanti
 
Microservices: next-steps
Microservices: next-stepsMicroservices: next-steps
Microservices: next-stepsBoyan Dimitrov
 
z Technical Summit Track 3 Session 4 Developing mobilefirst app for z
z Technical Summit Track 3 Session 4 Developing mobilefirst app for zz Technical Summit Track 3 Session 4 Developing mobilefirst app for z
z Technical Summit Track 3 Session 4 Developing mobilefirst app for znick_garrod
 
Infrastructure as Code - Getting Started, Concepts & Tools
Infrastructure as Code - Getting Started, Concepts & ToolsInfrastructure as Code - Getting Started, Concepts & Tools
Infrastructure as Code - Getting Started, Concepts & ToolsLior Kamrat
 

Similar a Monitoring microservices platform (20)

Moving to microservices – a technology and organisation transformational journey
Moving to microservices – a technology and organisation transformational journeyMoving to microservices – a technology and organisation transformational journey
Moving to microservices – a technology and organisation transformational journey
 
Building an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult StepsBuilding an Observability Platform in 389 Difficult Steps
Building an Observability Platform in 389 Difficult Steps
 
Global Azure Bootcamp 2017 - Performance and Health Management for Modern App...
Global Azure Bootcamp 2017 - Performance and Health Management for Modern App...Global Azure Bootcamp 2017 - Performance and Health Management for Modern App...
Global Azure Bootcamp 2017 - Performance and Health Management for Modern App...
 
Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...
Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...
Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...
 
Cloud Foundry Technical Overview
Cloud Foundry Technical OverviewCloud Foundry Technical Overview
Cloud Foundry Technical Overview
 
Netflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open SourceNetflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open Source
 
Introducing ONAP for OpenStack St Louis Meetup
Introducing ONAP for OpenStack St Louis MeetupIntroducing ONAP for OpenStack St Louis Meetup
Introducing ONAP for OpenStack St Louis Meetup
 
WLCG Grid Infrastructure Monitoring
WLCG Grid Infrastructure MonitoringWLCG Grid Infrastructure Monitoring
WLCG Grid Infrastructure Monitoring
 
Modernizing Testing as Apps Re-Architect
Modernizing Testing as Apps Re-ArchitectModernizing Testing as Apps Re-Architect
Modernizing Testing as Apps Re-Architect
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
 
Unconference Round Table Notes
Unconference Round Table NotesUnconference Round Table Notes
Unconference Round Table Notes
 
LogicMonitor: An Overview
LogicMonitor: An Overview LogicMonitor: An Overview
LogicMonitor: An Overview
 
Streaming real time data with Vibe Data Stream
Streaming real time data with Vibe Data StreamStreaming real time data with Vibe Data Stream
Streaming real time data with Vibe Data Stream
 
Productionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices ArchitectureProductionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices Architecture
 
Azure Cloud Application Development Workshop - UGIdotNET
Azure Cloud Application Development Workshop - UGIdotNETAzure Cloud Application Development Workshop - UGIdotNET
Azure Cloud Application Development Workshop - UGIdotNET
 
Cloudify workshop at CCCEU 2014
Cloudify workshop at CCCEU 2014 Cloudify workshop at CCCEU 2014
Cloudify workshop at CCCEU 2014
 
AMB110: IT Asset Management – How to Start When You Don’t Know Where to Start
AMB110: IT Asset Management – How to Start When You Don’t Know Where to StartAMB110: IT Asset Management – How to Start When You Don’t Know Where to Start
AMB110: IT Asset Management – How to Start When You Don’t Know Where to Start
 
Microservices: next-steps
Microservices: next-stepsMicroservices: next-steps
Microservices: next-steps
 
z Technical Summit Track 3 Session 4 Developing mobilefirst app for z
z Technical Summit Track 3 Session 4 Developing mobilefirst app for zz Technical Summit Track 3 Session 4 Developing mobilefirst app for z
z Technical Summit Track 3 Session 4 Developing mobilefirst app for z
 
Infrastructure as Code - Getting Started, Concepts & Tools
Infrastructure as Code - Getting Started, Concepts & ToolsInfrastructure as Code - Getting Started, Concepts & Tools
Infrastructure as Code - Getting Started, Concepts & Tools
 

Más de Boyan Dimitrov

Complex architectures for authentication and authorization on AWS
Complex architectures for authentication and authorization on AWSComplex architectures for authentication and authorization on AWS
Complex architectures for authentication and authorization on AWSBoyan Dimitrov
 
Complex architectures for authentication and authorization on AWS
Complex architectures for authentication and authorization on AWSComplex architectures for authentication and authorization on AWS
Complex architectures for authentication and authorization on AWSBoyan Dimitrov
 
Building Highly Sophisticated Environments for Security and Compliance on AWS
Building Highly Sophisticated Environments for Security and Compliance on AWSBuilding Highly Sophisticated Environments for Security and Compliance on AWS
Building Highly Sophisticated Environments for Security and Compliance on AWSBoyan Dimitrov
 
Observability foundations in dynamically evolving architectures
Observability foundations in dynamically evolving architecturesObservability foundations in dynamically evolving architectures
Observability foundations in dynamically evolving architecturesBoyan Dimitrov
 
Anatomy of the modern application stack
Anatomy of the modern application stackAnatomy of the modern application stack
Anatomy of the modern application stackBoyan Dimitrov
 
Patterns for building resilient and scalable microservices platform on AWS
Patterns for building resilient and scalable microservices platform on AWSPatterns for building resilient and scalable microservices platform on AWS
Patterns for building resilient and scalable microservices platform on AWSBoyan Dimitrov
 
Microservices and elastic resource pools with Amazon EC2 Container Service
Microservices and elastic resource pools with Amazon EC2 Container ServiceMicroservices and elastic resource pools with Amazon EC2 Container Service
Microservices and elastic resource pools with Amazon EC2 Container ServiceBoyan Dimitrov
 
Scaling micro-services Architecture on AWS
Scaling micro-services Architecture on AWSScaling micro-services Architecture on AWS
Scaling micro-services Architecture on AWSBoyan Dimitrov
 
Scaling from 1 to 10 million users - Hailo
Scaling from 1 to 10 million users - HailoScaling from 1 to 10 million users - Hailo
Scaling from 1 to 10 million users - HailoBoyan Dimitrov
 

Más de Boyan Dimitrov (9)

Complex architectures for authentication and authorization on AWS
Complex architectures for authentication and authorization on AWSComplex architectures for authentication and authorization on AWS
Complex architectures for authentication and authorization on AWS
 
Complex architectures for authentication and authorization on AWS
Complex architectures for authentication and authorization on AWSComplex architectures for authentication and authorization on AWS
Complex architectures for authentication and authorization on AWS
 
Building Highly Sophisticated Environments for Security and Compliance on AWS
Building Highly Sophisticated Environments for Security and Compliance on AWSBuilding Highly Sophisticated Environments for Security and Compliance on AWS
Building Highly Sophisticated Environments for Security and Compliance on AWS
 
Observability foundations in dynamically evolving architectures
Observability foundations in dynamically evolving architecturesObservability foundations in dynamically evolving architectures
Observability foundations in dynamically evolving architectures
 
Anatomy of the modern application stack
Anatomy of the modern application stackAnatomy of the modern application stack
Anatomy of the modern application stack
 
Patterns for building resilient and scalable microservices platform on AWS
Patterns for building resilient and scalable microservices platform on AWSPatterns for building resilient and scalable microservices platform on AWS
Patterns for building resilient and scalable microservices platform on AWS
 
Microservices and elastic resource pools with Amazon EC2 Container Service
Microservices and elastic resource pools with Amazon EC2 Container ServiceMicroservices and elastic resource pools with Amazon EC2 Container Service
Microservices and elastic resource pools with Amazon EC2 Container Service
 
Scaling micro-services Architecture on AWS
Scaling micro-services Architecture on AWSScaling micro-services Architecture on AWS
Scaling micro-services Architecture on AWS
 
Scaling from 1 to 10 million users - Hailo
Scaling from 1 to 10 million users - HailoScaling from 1 to 10 million users - Hailo
Scaling from 1 to 10 million users - Hailo
 

Último

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 

Último (20)

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 

Monitoring microservices platform

  • 1. Monitoring micro-services platform Boyan Dimitrov, Platform Engineering @ Hailo @nathariel
  • 2. Outline • Intro to the Hailo world • Platform Overview • Monitoring Evolution
  • 3.
  • 4. The Platform Troll a platform by Swinsto101 / CC BY-SA 3.0 / Desaturated from original
  • 5. Platform specifics • SOA based on Go ( and Java… ) • 1000+ AWS instances spanning multiple regions • 160+ services in production • Designed specifically for the cloud – different building blocks and components will constantly be in flux, broken or unavailable.
  • 6. eu-west-1 Proxy Layer Message Bus+ Go Services Java Services C* us-east-1 Proxy Layer Message Bus+ Go Services Java C* Services
  • 7. Provisioning Service CI Pipeline (Janky/Jenkins) Amazon S3 Provisioning Service Provisioning Service Provisioning Manager Docker Registry Inside an environment
  • 8. A micro-service under the hood Handler platform-layer Logic Storage Library for abstracting service-to- service comms service-layer Self-configuring external service adapters Service Any service gets for free: • Provisioning • Discovery • Configuration • Authentication/Authorization • A/B testing capabilities • Self-configuring connectivity to third-party services • Monitoring • Instrumentation
  • 9. Mission: Define high level platform and business metrics Gather as many insights as possible Add automatic failover and recovery capabilities "A[ollo 8 Launch Control Room” by Tfawls / Desaturated from original
  • 10. PHP Java Host Instance Graphite Zabbix Aspiration vs Reality CloudWatch Zabbix Agent StatsD Carbon
  • 11. Challenges • Single StatsD instance and generic graphite setup cannot cope with all the traffic (surprise!) • No easy way of generating and searching for graphs quickly • We didn’t instrument everything • “Traditional” monitoring systems can only give basic app insights • Se#ing up app templates is a manual daunting process and does not scale • No in-depth visibility into our main KPIs • No way of identifying platform / release / config / cloud infrastructure changes
  • 12. Instrumentation++ “Airplaine board” by Smithore / Desaturated from original
  • 13. Host Instance Graphite Cache Zabbix Iterate on what we already know Relay CloudWatch CollectD StatsD Cache Cache Zabbix Agent
  • 14. Result • Scaling up graphite and moving StatsD to every box allowed us to collect millions of metrics • Instrumenting everything gives us a lot of insights. • Grafana allows us to quickly build, store and search for important graphs. Widely adopted by the whole development team! Tip: Focus on upper 95th and 99th percentiles and work out from there.
  • 17. Provisioning Service Message bus Monitoring Service New Service Publish Healthchecks Host Instance Provisioning Manager Binding Discovery Provisioning Service Host Instance Monitoring V2
  • 18. healthcheck.Register(&healthcheck.HealthCheck{! Id: “MyHCId”,! ServiceName: ServiceName,! ServiceVersion: ServiceVersion,! Hostname: Hostname,! InstanceId: InstanceID,! Interval: time.Minute,! Checker: myCallbackFunc,! Priority: hc.Warning,! })!
  • 20. Result • Service health checks give us in-depth service performance details • The monitoring service has a holistic view of our platform health and can identify degraded availability zones • Developers can identify what is important for their service and track & alert on it.
  • 21. Trace++ Monitoring & Instrumentation “Abstract conception of network and communication” by Leszekglasner / Desaturated from original
  • 22. Trace Architecture CollectD StatsD Zabbix Agent Provisioning Service Host Instance Phosphor Publish Trace Service Dashboards Monitoring In-memory Aggregates Optional persistant storage Async UDP
  • 26. Result • Trace incoming requests and pinpoint bo#lenecks & SLA offenders • Easily identify problems on the request/response path • Quickly find out exactly which services participate on the request path
  • 29. Result • Identify business impacting issues immediately • Highlight the service on the critical path that is most likely responsible for the problems
  • 30. Event Correlation “Connection” by A2bb5s / Desaturated from the original
  • 31. CollectD StatsD Zabbix Agent Provisioning Service Host Instance Phosphor Publish c Dashboards Monitoring Persistent Storage SNS Platform Events Whisper Service c Platform events
  • 32.
  • 33.
  • 34. Result • Answer to the most important “Did anything change?” question • Audit trail for any platform changes • Holistic view of our platform status
  • 35. It is not over yet! ++ Machine Learning ++ Event source weighting
  • 36. Thanks! PS. We’re hiring! @nathariel boyan@hailocab.com London DevOps