Cloud Native at Netflix
What Changed?
July 2013
Adrian Cockcroft
@adrianco #netflixcloud @NetflixOSS
http://www.linkedin.com/in/adriancockcroft
Cloud Native
Netflix Architecture
NetflixOSS
Cloud Native
What is it?
Why?
Engineers
Solve hard problems
Build amazing and complex things
Fix things when they break
Strive for perfection
Perfect code
Perfect hardware
Perfectly operated
But perfection takes too long…
Compromises…
Time to market vs. Quality
Utopia remains out of reach
Where time to market wins big
Making a land-grab
Disrupting competitors (OODA)
Anything delivered as web services
OODA loop: Observe, Orient, Decide, Act
Activities around the loop: Land grab opportunity, Competitive move, Customer pain point, Analysis, Get buy-in, Plan response, Commit resources, Implement, Deliver, Engage customers, Research alternatives, Measure customers
Colonel Boyd, USAF: “Get inside your adversaries' OODA loop to disorient them”
How Soon?
Code features in days instead of months
Get hardware in minutes instead of weeks
Incident response in seconds instead of hours
A new engineering challenge
Construct a highly agile and highly
available service from ephemeral and
assumed broken components
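One common way to build an available service out of components that are assumed to be broken is to retry fast and then degrade gracefully. The sketch below is a minimal, hypothetical illustration in Python (`call_with_fallback` and `FlakyDependency` are invented names, not NetflixOSS APIs). It retries a failing dependency a bounded number of times with short exponential pauses, then returns a fallback answer instead of an error.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_fallback(call: Callable[[], T], fallback: Callable[[], T],
                       attempts: int = 3, backoff_s: float = 0.01) -> T:
    """Try `call` up to `attempts` times, then degrade to `fallback()`.

    Hypothetical helper: it treats any exception as "this instance is broken"
    and assumes a retry may reach a healthy replica instead.
    """
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt < attempts - 1:
                # Short exponential pause so retries stay "fast".
                time.sleep(backoff_s * 2 ** attempt)
    # All attempts failed: serve a degraded answer rather than an error.
    return fallback()


class FlakyDependency:
    """Stand-in for a remote service whose first `failures` calls fail."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def recommendations(self) -> list[str]:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("instance terminated")
        return ["personalized row"]


# Two failures, then success on the third attempt.
svc = FlakyDependency(failures=2)
print(call_with_fallback(svc.recommendations, lambda: ["popular titles"], backoff_s=0))
# prints ['personalized row']
```

If the dependency never recovers within the attempt budget, the caller gets the fallback (here a non-personalized list) and the request still succeeds.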
Inspiration
How to get to Cloud Native
Freedom and Responsibility for Developers
Decentralize and Automate Ops Activities
Integrate DevOps into the Business Organization
Four Transitions
• Management: Integrated Roles in a Single Organization
– Business, Development, Operations -> BusDevOps
• Developers: Denormalized Data – NoSQL
– Decentralized, scalable, available, polyglot
• Responsibility from Ops to Dev: Continuous Delivery
– Decentralized small daily production updates
• Responsibility from Ops to Dev: Agile Infrastructure - Cloud
– Hardware in minutes, provisioned directly by developers
Netflix BusDevOps Organization
Chief Product Officer
• VP Product Management
– Directors, Product
• VP UI Engineering
– Directors, Development
– Developers + DevOps
– UI Data Sources
– AWS
• VP Discovery Engineering
– Directors, Development
– Developers + DevOps
– Discovery Data Sources
– AWS
• VP Platform
– Directors, Platform
– Developers + DevOps
– Platform Data Sources
– AWS
Denormalized, independently updated and scaled data
Cloud, self service updated & scaled infrastructure
Code, independently updated continuous delivery
Decentralized Deployment
Asgard Developer Portal
http://techblog.netflix.com/2012/06/asgard-web-based-cloud-management-and.html
Ephemeral Instances
• Largest services are autoscaled
• Average lifetime of an instance is 36 hours
Diagram labels: Push, Autoscale Up, Autoscale Down
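A minimal sketch of the arithmetic behind an autoscaled, ephemeral fleet. `desired_capacity` is a hypothetical target-tracking style rule (not Netflix's actual scaling policy) that resizes a group so average utilization approaches a target. `launches_per_day` applies Little's law: in steady state, a fleet of N instances with a 36 hour mean lifetime launches N / 36 instances per hour, roughly two thirds of the fleet per day. The fleet size of 900 is an arbitrary example.

```python
import math


def desired_capacity(current: int, utilization: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Target-tracking style rule (a hypothetical policy, not Netflix's).

    Scale the group so that average utilization would land at `target`,
    rounding up so we never under-provision, then clamp to the group bounds.
    """
    if current <= 0:
        return min_size
    raw = math.ceil(current * utilization / target)
    return max(min_size, min(max_size, raw))


def launches_per_day(fleet_size: float, mean_lifetime_hours: float) -> float:
    """Little's law in steady state: launch rate = fleet size / mean lifetime."""
    return fleet_size / mean_lifetime_hours * 24


print(desired_capacity(100, 0.75, 0.5, 10, 500))  # 150: scale up
print(launches_per_day(900, 36))                   # 600.0 launches per day
```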
Netflix Streaming
A Cloud Native Application based on
an open source platform
Netflix Member Web Site Home Page
Personalization Driven – How Does It Work?
How Netflix Streaming Works
Customer Device (PC, PS3, TV…)
Components: Web Site or Discovery API, User Data, Personalization, Streaming API, DRM, QoS Logging, OpenConnect CDN Boxes, CDN Management and Steering, Content Encoding
Diagram columns: Consumer Electronics, AWS Cloud Services, CDN Edge Locations
Chart labels: Nov 2012 Streaming Bandwidth, March 2013 Mean Bandwidth, +39% in 6 months
Real Web Server Dependencies Flow
(Netflix Home page business transaction as seen by AppDynamics)
Diagram legend: Start Here, memcached, Cassandra, Web service, S3 bucket, Personalization movie group choosers (for US, Canada and Latam)
Each icon is three to a few hundred instances across three AWS zones
Three Balanced Availability Zones
Test with Chaos Gorilla
Load Balancers
– Zone A: Cassandra and Evcache Replicas
– Zone B: Cassandra and Evcache Replicas
– Zone C: Cassandra and Evcache Replicas
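The zone-balancing argument can be checked with a few lines of arithmetic. This sketch assumes, which the slide does not state, a replication factor of 3 with one Cassandra replica per zone. Under that assumption, losing any one zone (the failure Chaos Gorilla simulates) still leaves a majority of replicas, and each zone must run below two thirds of its capacity so the surviving zones can absorb the lost zone's load. The function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas, as used for Cassandra's QUORUM-style reads/writes."""
    return replication_factor // 2 + 1


def survives_zone_loss(replicas_per_zone: dict[str, int], lost_zone: str,
                       required: int) -> bool:
    """True if the replicas outside `lost_zone` can still satisfy `required`."""
    remaining = sum(n for zone, n in replicas_per_zone.items() if zone != lost_zone)
    return remaining >= required


def max_safe_utilization(zones: int) -> float:
    """Highest steady per-zone utilization that still fits after losing one zone,
    assuming equal zone capacity and evenly redistributed load."""
    return (zones - 1) / zones


balanced = {"A": 1, "B": 1, "C": 1}   # RF=3, one replica per zone (assumed)
print(all(survives_zone_loss(balanced, z, quorum(3)) for z in balanced))  # True
print(round(max_safe_utilization(3), 3))  # 0.667
```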
Isolated Regions
US-East Load Balancers
– Cassandra Replicas in Zones A, B and C
EU-West Load Balancers
– Cassandra Replicas in Zones A, B and C
Cross Region Use Cases
• Geographic Isolation
– US to Europe replication of subscriber data
– Read intensive, low update rate
– Production use since late 2011
• Redundancy for regional failover
– US East to US West replication of everything
– Includes write intensive data, high update rate
– Testing now
Benchmarking Global Cassandra
Write intensive test of cross region replication capacity
16 x hi1.4xlarge SSD nodes per zone = 96 total
192 TB of SSD in six locations up and running Cassandra in 20 min
US-West-2 Region - Oregon: Cassandra Replicas in Zones A, B and C
US-East-1 Region - Virginia: Cassandra Replicas in Zones A, B and C
Load: Test Load, Test Load, Validation Load
Inter-Zone Traffic
1 Million writes at CL.ONE (wait for one replica to ack)
1 Million reads after 500ms at CL.ONE with no data loss
Inter-Region Traffic: up to 9Gbits/s, 83ms
18TB backups from S3
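The benchmark figures can be cross-checked: 16 nodes per zone, across three zones in each of two regions, gives 96 nodes, and 192 TB spread over them is 2 TB per node. The sketch also tabulates how many replica acknowledgements a subset of Cassandra consistency levels waits for, assuming (hypothetically) a replication factor of 3 in each region; CL.ONE, used in the test, needs only one. The data center names are illustrative.

```python
NODES_PER_ZONE = 16
ZONES_PER_REGION = 3
REGIONS = 2            # US-East-1 and US-West-2
TOTAL_SSD_TB = 192

total_nodes = NODES_PER_ZONE * ZONES_PER_REGION * REGIONS
ssd_tb_per_node = TOTAL_SSD_TB / total_nodes


def replicas_required(level: str, rf_per_dc: dict[str, int], local_dc: str) -> int:
    """Replica acks a coordinator waits for at a given consistency level.

    Covers a subset of Cassandra's levels: QUORUM counts replicas across
    all data centers, LOCAL_QUORUM only those in the coordinator's own.
    """
    total = sum(rf_per_dc.values())
    if level == "ONE":
        return 1
    if level == "LOCAL_QUORUM":
        return rf_per_dc[local_dc] // 2 + 1
    if level == "QUORUM":
        return total // 2 + 1
    if level == "ALL":
        return total
    raise ValueError(f"unsupported level: {level}")


rf = {"us-east-1": 3, "us-west-2": 3}   # assumed RF=3 per region
print(total_nodes, ssd_tb_per_node)      # 96 2.0
for level in ("ONE", "LOCAL_QUORUM", "QUORUM", "ALL"):
    print(level, replicas_required(level, rf, "us-east-1"))
```

Waiting for one ack keeps write latency low, which is why the test then checks separately that reads issued 500ms later show no data loss.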
Managing Multi-Region Availability
Each of two regions: Regional Load Balancers; Cassandra Replicas in Zones A, B and C
DNS: UltraDNS, DynECT DNS, AWS Route53
Denominator – manage traffic via multiple DNS providers with Java code
2013 Timeline - Concept Jan, Code Feb, OSS March, Production use May
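Denominator itself is a Java library, and the Python sketch below does not reproduce its API. It only illustrates the idea under stated assumptions: a hypothetical provider-neutral `DnsProvider` interface, in-memory stand-ins named after the three providers on the slide, and an `evacuate_region` helper that applies the same weighted-record change at every provider so traffic leaves a failed region consistently.

```python
from abc import ABC, abstractmethod


class DnsProvider(ABC):
    """Hypothetical provider-neutral interface (not Denominator's Java API)."""

    @abstractmethod
    def weights(self, record: str) -> dict[str, int]: ...

    @abstractmethod
    def set_weight(self, record: str, region: str, weight: int) -> None: ...


class InMemoryProvider(DnsProvider):
    """Stand-in for a real provider such as UltraDNS, DynECT or Route53."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._records: dict[str, dict[str, int]] = {}

    def weights(self, record: str) -> dict[str, int]:
        return dict(self._records.get(record, {}))

    def set_weight(self, record: str, region: str, weight: int) -> None:
        self._records.setdefault(record, {})[region] = weight


def evacuate_region(providers: list[DnsProvider], record: str,
                    failed: str, healthy: str) -> None:
    """Move all weighted traffic for `record` from `failed` to `healthy`,
    applying the same change at every provider so they stay consistent."""
    for provider in providers:
        current = provider.weights(record)
        moved = current.get(failed, 0)
        provider.set_weight(record, failed, 0)
        provider.set_weight(record, healthy, current.get(healthy, 0) + moved)


providers = [InMemoryProvider(n) for n in ("ultradns", "dynect", "route53")]
for p in providers:
    p.set_weight("www.example.com", "us-east-1", 50)
    p.set_weight("www.example.com", "us-west-2", 50)

evacuate_region(providers, "www.example.com", "us-east-1", "us-west-2")
print(providers[0].weights("www.example.com"))  # {'us-east-1': 0, 'us-west-2': 100}
```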
Incidents – Impact and Mitigation
PR (Public Relations, Media Impact): X Incidents
CS (High Customer Service Calls): XX Incidents
Metrics impact, Feature disable (Affects A/B Test Results): XXX Incidents
No Impact, fast retry or automated failover: XXXX Incidents
Y incidents mitigated by Active Active, game day practicing
YY incidents mitigated by better tools and practices
YYY incidents mitigated by better data tagging
Cloud Security
Automated attack surface monitoring
Crypto key store management (CloudHSM)
Scale to resist DDoS attacks
http://www.slideshare.net/jason_chan/resilience-and-security-scale-lessons-learned
What Changed?
“Impossible” deployments are easy
Jointly building code with vendors in public
Highly available and secure despite scale and speed
The DIY Question
Why doesn’t Netflix build and run its
own cloud?
Fitting Into Public Scale
Public, Grey Area, Private
Scale markers: 1,000 Instances, 100,000 Instances
Examples: Startups, Netflix, Facebook
How big is Public?
AWS upper bound estimate based on the number of public IP Addresses
Every provisioned instance gets a public IP by default (some VPC instances don’t)
AWS Maximum Possible Instance Count 4.2 Million – May 2013
Growth >10x in Three Years, >2x Per Annum - http://bit.ly/awsiprange
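The estimating method can be sketched with Python's standard `ipaddress` module: merge overlapping CIDR blocks and count the addresses they cover, which bounds the instance count from above if every instance holds at least one distinct public address. The ranges below are IETF documentation ranges, not real AWS allocations. The growth claim also checks out: a factor of 10 over three years compounds to about 2.15 per year, so >10x in three years implies >2x per annum.

```python
import ipaddress


def count_public_addresses(cidrs: list[str]) -> int:
    """Count distinct addresses covered by CIDR blocks (overlaps merged first).

    If every running instance holds at least one distinct public address,
    this total is an upper bound on running instances; VPC instances without
    a public IP are the caveat noted on the slide.
    """
    networks = ipaddress.collapse_addresses(ipaddress.ip_network(c) for c in cidrs)
    return sum(net.num_addresses for net in networks)


def annual_growth(total_growth: float, years: float) -> float:
    """Compound yearly factor that yields `total_growth` over `years`."""
    return total_growth ** (1 / years)


# Illustrative documentation ranges, NOT real AWS allocations.
example = ["192.0.2.0/24", "198.51.100.0/24", "198.51.100.128/25", "203.0.113.0/24"]
print(count_public_addresses(example))   # 768 (the /25 overlaps a /24)
print(round(annual_growth(10, 3), 2))    # 2.15
```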
A Cloud Native Open Source Platform
See netflix.github.com
Goals
• Establish our solutions as Best Practices / Standards
• Hire, Retain and Engage Top Engineers
• Build up Netflix Technology Brand
• Benefit from a shared ecosystem
Example Application – RSS Reader
Zuul Traffic Processing and Routing
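Zuul sits at the edge and runs requests through chains of filters. The sketch below is a loose, hypothetical Python model of that idea, with pre, route and post stages and filters ordered within each stage. It is not Zuul's Java API, and the header, path and origin cluster names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RequestContext:
    path: str
    headers: dict[str, str] = field(default_factory=dict)
    origin: Optional[str] = None
    response_headers: dict[str, str] = field(default_factory=dict)


@dataclass
class Filter:
    stage: str                                   # "pre", "route" or "post"
    order: int                                   # lower runs first within a stage
    should_run: Callable[[RequestContext], bool]
    run: Callable[[RequestContext], None]


def process(ctx: RequestContext, filters: list[Filter]) -> RequestContext:
    """Run every applicable filter, stage by stage, in ascending order."""
    for stage in ("pre", "route", "post"):
        staged = sorted((flt for flt in filters if flt.stage == stage),
                        key=lambda flt: flt.order)
        for flt in staged:
            if flt.should_run(ctx):
                flt.run(ctx)
    return ctx


def tag_request(ctx: RequestContext) -> None:
    ctx.headers["X-Edge"] = "zuul-sketch"


def route_api(ctx: RequestContext) -> None:
    ctx.origin = "api-cluster"                 # hypothetical origin cluster name


def route_default(ctx: RequestContext) -> None:
    ctx.origin = ctx.origin or "web-cluster"   # only if no earlier route matched


def record_route(ctx: RequestContext) -> None:
    ctx.response_headers["X-Routed-To"] = ctx.origin or "none"


FILTERS = [
    Filter("post", 10, lambda c: True, record_route),
    Filter("route", 10, lambda c: c.path.startswith("/api/"), route_api),
    Filter("route", 20, lambda c: True, route_default),
    Filter("pre", 10, lambda c: True, tag_request),
]

print(process(RequestContext("/api/feeds"), FILTERS).origin)  # api-cluster
```

Because filters are plain data, new routing rules can be added to the list without touching the processing loop.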
Ice – Detailed AWS “Chargeback”
http://techblog.netflix.com/2013/06/announcing-ice-cloud-spend-and-usage.html
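A chargeback report comes down to attributing each billing line item to an owner and summing. The sketch assumes a hypothetical, simplified line-item record with a `team` tag; it does not reflect Ice's actual input format. Untagged spend is kept as its own bucket so it stays visible.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LineItem:
    product: str     # e.g. "EC2", "S3"
    team: str        # value of a hypothetical "team" tag; "" when untagged
    cost_usd: float


def chargeback(items: list[LineItem]) -> dict[str, float]:
    """Total cost per team, largest first; untagged spend stays visible."""
    totals: dict[str, float] = defaultdict(float)
    for item in items:
        totals[item.team or "untagged"] += item.cost_usd
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


bill = [
    LineItem("EC2", "streaming", 120.0),
    LineItem("S3", "streaming", 30.0),
    LineItem("EC2", "discovery", 50.0),
    LineItem("EC2", "", 25.0),
]
print(chargeback(bill))  # {'streaming': 150.0, 'discovery': 50.0, 'untagged': 25.0}
```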
Boosting the @NetflixOSS Ecosystem
See netflix.github.com
What’s Coming Next?
More Use Cases, More Features
• Better portability
• Higher availability
• Easier to deploy
• Contributions from end users
• Contributions from vendors
Vendor Driven Portability
Interest in using NetflixOSS for Enterprise Private Clouds
“It’s done when it runs Asgard”
• Functionally complete
• Demonstrated in March
• Released in June in V3.3
Offering $10K prize for integration work
Vendor and end user interest
• OpenStack “Heat” getting there
• PayPal C3 Console based on Asgard
Functionality and scale now, portability coming
Moving from parts to a platform in 2013
Netflix is fostering a cloud native ecosystem
Rapid Evolution - Low MTBIAMSH
(Mean Time Between Idea And Making Stuff Happen)
Slideshare.net/Netflix Details
• Meetup S1E3 July – Featuring Contributors Eucalyptus, IBM, PayPal, Riot Games
– http://techblog.netflix.com/2013/07/netflixoss-meetup-series-1-episode-3.html
• Lightning Talks March S1E2
– http://www.slideshare.net/RuslanMeshenberg/netflixoss-meetup-lightning-talks-and-roadmap
• Lightning Talks Feb S1E1
– http://www.slideshare.net/RuslanMeshenberg/netflixoss-open-house-lightning-talks
• Asgard In Depth Feb S1E1
– http://www.slideshare.net/joesondow/asgard-overview-from-netflix-oss-open-house
• Security Architecture
– http://www.slideshare.net/jason_chan/resilience-and-security-scale-lessons-learned/
• Cost Aware Cloud Architectures – with Jinesh Varia of AWS
– http://www.slideshare.net/AmazonWebServices/building-costaware-architectures-jinesh-varia-aws-and-adrian-cockroft-netflix
What Changed?
Speed wins, Cloud Native helps you get there
NetflixOSS makes it easier for everyone to become Cloud Native
@adrianco #netflixcloud @NetflixOSS

Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013
