Globally Distributed Cloud Native Applications at Netflix
June 2013
Adrian Cockcroft
@adrianco #netflixcloud @NetflixOSS
http://www.linkedin.com/in/adriancockcroft
  
Cloud Native
Global Architecture
NetflixOSS Components

Cloud Native
  
We are Engineers
We solve hard problems
We build amazing and complex things
We fix things when they break

We strive for perfection
Perfect code
Perfect hardware
Perfectly operated
  
But perfection takes too long…
So we compromise
Time to market vs. Quality
Utopia remains out of reach
  
Where time to market wins big
Making a land-grab
Disrupting competitors (OODA)
Anything delivered as web services
  
	
  
How Soon?
Code features in days instead of months
Get hardware in minutes instead of weeks
Incident response in seconds instead of hours
  
Tipping the Balance
[Diagram labels: Utopia; Dystopia; Inefficient; Broken; Dynamic; Sooner; Cheaper; Better; Static]
  
A new engineering challenge
Construct a highly agile and highly available service from ephemeral and often broken components
  
Inspiration

Netflix Streaming
A Cloud Native Application based on an open source platform
  
Netflix Member Web Site Home Page
Personalization Driven – How Does It Work?
  
How Netflix Streaming Works
[Diagram. Boxes: Customer Device (PC, PS3, TV…); Web Site or Discovery API; User Data; Personalization; Streaming API; DRM; QoS Logging; OpenConnect CDN Boxes; CDN Management and Steering; Content Encoding. Tiers: Consumer Electronics; AWS Cloud Services; CDN Edge Locations]
  
[Chart: Streaming Bandwidth, Nov 2012 and March 2013. Labels: Amazon Video 1.31%; 18x Prime; 25x Prime; Mean Bandwidth +39% 6mo]
  
Real Web Server Dependencies Flow
(Netflix Home page business transaction as seen by AppDynamics)
[Diagram labels: Start Here; memcached; Cassandra; Web service; S3 bucket; Personalization movie group choosers (for US, Canada and Latam)]
Each icon is three to a few hundred instances across three AWS zones
  
Component Micro-Services
Test With Chaos Monkey, Latency Monkey
  
Three Balanced Availability Zones
Test with Chaos Gorilla
[Diagram: Load Balancers; Cassandra and Evcache Replicas in Zone A, Zone B and Zone C]
  
Triple Replicated Persistence
Cassandra maintenance affects individual replicas
[Diagram: Load Balancers; Cassandra and Evcache Replicas in Zone A, Zone B and Zone C]
  
Isolated Regions
[Diagram: US-East Load Balancers and EU-West Load Balancers, each with Cassandra Replicas in Zone A, Zone B and Zone C]
  
Failure Modes and Effects

Failure Mode          Probability   Current Mitigation Plan
Application Failure   High          Automatic degraded response
AWS Region Failure    Low           Switch traffic between regions
AWS Zone Failure      Medium        Continue to run on 2 out of 3 zones
Datacenter Failure    Medium        Migrate more functions to cloud
Data store failure    Low           Restore from S3 backups
S3 failure            Low           Restore from remote archive

Until we got really good at mitigating high and medium probability failures, the ROI for mitigating regional failures didn’t make sense. Working on it now.
  
Antifragile Testing
http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
•  Chaos Monkey makes sure systems are resilient
– Kill individual instances without customer impact
•  Chaos Gorilla shuts down entire zone
– Run in production once every 3 months
•  Latency Monkey
– Injects extra latency and error return codes
Monkeys
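The instance-killing idea behind Chaos Monkey can be sketched as a small in-memory simulation. This is only an illustration: the `Cluster` class, its methods and the instance IDs are hypothetical and are not taken from the Simian Army code, and a real tool would terminate instances through the cloud provider's API.

```python
import random


class Cluster:
    """In-memory stand-in for one service's group of instances (hypothetical)."""

    def __init__(self, instance_ids, min_healthy):
        self.instances = set(instance_ids)
        self.min_healthy = min_healthy

    def is_serving(self):
        # The service counts as healthy while enough instances remain.
        return len(self.instances) >= self.min_healthy

    def terminate(self, instance_id):
        self.instances.discard(instance_id)


def chaos_monkey_round(cluster, rng):
    """Terminate one randomly chosen instance and return its id."""
    victim = rng.choice(sorted(cluster.instances))
    cluster.terminate(victim)
    return victim


rng = random.Random(42)  # seeded so the simulation is repeatable
cluster = Cluster(["i-a1", "i-b1", "i-c1"], min_healthy=2)
victim = chaos_monkey_round(cluster, rng)
print(f"terminated {victim}; still serving: {cluster.is_serving()}")
```

With one instance per zone and a minimum of two healthy instances, losing any single instance leaves the simulated service serving, which is the property the test is meant to confirm.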
  
Edda – Configuration History
http://techblog.netflix.com/2012/11/edda-learn-stories-of-your-cloud.html
[Diagram labels: Edda; AWS: Instances, ASGs, etc.; Eureka: Services metadata; AppDynamics: Request flow]
  
Edda Query Examples
Find any instances that have ever had a specific public IP address
$ curl "http://edda/api/v2/view/instances;publicIpAddress=1.2.3.4;_since=0"
["i-0123456789","i-012345678a","i-012345678b"]

Show the most recent change to a security group
$ curl "http://edda/api/v2/aws/securityGroups/sg-0123456789;_diff;_all;_limit=2"
--- /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351040779810
+++ /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351044093504
@@ -1,33 +1,33 @@
{
…
"ipRanges" : [
"10.10.1.1/32",
"10.10.1.2/32",
+ "10.10.1.3/32",
- "10.10.1.4/32"
…
}
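Both queries above use the same URL shape: a collection path, an optional resource id, then semicolon-separated arguments that are either `key=value` pairs or bare flags. A minimal sketch of building such URLs follows; `edda_url` is a hypothetical helper written for this illustration, not part of Edda, and it only constructs the string without contacting any server.

```python
def edda_url(base, collection, resource_id=None, **matrix_args):
    """Build an Edda-style URL with semicolon-separated arguments.

    A value of True renders a bare flag (";_diff"); any other value
    renders as ";key=value".
    """
    path = f"{base.rstrip('/')}/api/v2/{collection}"
    if resource_id is not None:
        path += f"/{resource_id}"
    for key, value in matrix_args.items():
        if value is True:
            path += f";{key}"
        else:
            path += f";{key}={value}"
    return path


by_ip = edda_url("http://edda", "view/instances",
                 publicIpAddress="1.2.3.4", _since=0)
sg_diff = edda_url("http://edda", "aws/securityGroups", "sg-0123456789",
                   _diff=True, _all=True, _limit=2)
print(by_ip)
print(sg_diff)
```

The two printed URLs reproduce the ones passed to `curl` in the examples above.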
	
  
Highly Available Storage
A highly scalable, available and durable deployment pattern based on Apache Cassandra
  
Single Function Micro-Service Pattern
One keyspace, replaces a single table or materialized view
[Diagram labels: Single function Cassandra Cluster Managed by Priam, Between 6 and 144 nodes; Stateless Data Access REST Service, Astyanax Cassandra Client; Optional Datacenter Update Flow; Many Different Single-Function REST Clients]
  
AppDynamics Service Flow Visualization
Each icon represents a horizontally scaled service of three to hundreds of instances deployed over three availability zones
Over 50 Cassandra clusters
Over 1000 nodes
Over 30TB backup
Over 1M writes/s/cluster
  
Stateless Micro-Service Architecture
Linux Base AMI (CentOS or Ubuntu)
Optional Apache frontend, memcached, non-java apps
Monitoring
Log rotation to S3
AppDynamics machineagent
Epic/Atlas
Java (JDK 6 or 7)
AppDynamics appagent monitoring
GC and thread dump logging
Tomcat
Application war file, base servlet, platform, client interface jars, Astyanax
Healthcheck, status servlets, JMX interface, Servo autoscale
  
Cassandra Instance Architecture
Linux Base AMI (CentOS or Ubuntu)
Tomcat and Priam on JDK
Healthcheck, Status
Monitoring
AppDynamics machineagent
Epic/Atlas
Java (JDK 7)
AppDynamics appagent monitoring
GC and thread dump logging
Cassandra Server
Local Ephemeral Disk Space – 2TB of SSD or 1.6TB disk holding Commit log and SSTables
  
Priam – Cassandra Automation
Available at http://github.com/netflix
•  Netflix Platform Tomcat Code
•  Zero touch auto-configuration
•  State management for Cassandra JVM
•  Token allocation and assignment
•  Broken node auto-replacement
•  Full and incremental backup to S3
•  Restore sequencing from S3
•  Grow/Shrink Cassandra “ring”
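Token allocation can be illustrated with evenly spaced tokens around a ring. The sketch below is generic and is not Priam's actual algorithm: it assumes a RandomPartitioner-style token space of 0 to 2**127, and `evenly_spaced_tokens` is a hypothetical helper.

```python
RING_SIZE = 2 ** 127  # assumed token space (RandomPartitioner-style)


def evenly_spaced_tokens(node_count, ring_size=RING_SIZE):
    """Return one token per node, spaced evenly around the ring."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return [i * ring_size // node_count for i in range(node_count)]


tokens = evenly_spaced_tokens(6)
doubled = evenly_spaced_tokens(12)
for node, token in enumerate(tokens):
    print(node, token)
```

Under this scheme, doubling the node count leaves every existing token in place (`doubled[::2] == tokens`), which is one simple way to picture growing a ring without moving the original nodes.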
  
ETL for Cassandra
•  Data is de-normalized over many clusters!
•  Too many to restore from backups for ETL
•  Solution – read backup files using Hadoop
•  Aegisthus
–  http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html
– High throughput raw SSTable processing
– Re-normalizes many clusters to a consistent view
– Extract, Transform, then Load into Teradata
  
Cloud Native Big Data
Size the cluster to the data
Size the cluster to the questions
Never wait for space or answers
  
Netflix Dataoven
[Diagram labels: Data Warehouse, Over 2 Petabytes; Ursula; Aegisthus; Data Pipelines; From cloud Services, ~100 Billion Events/day; From C*, Terabytes of Dimension data; Hadoop Clusters – AWS EMR: 1300 nodes, 800 nodes, Multiple 150 nodes Nightly; RDS Metadata; Gateways; Tools]
  
Global Architecture

Local Client Traffic to Cassandra
Synchronous Replication Across Zones
Asynchronous Replication Across Regions
  
Astyanax Cassandra Client for Java
Available at http://github.com/netflix
•  Features
–  Complete abstraction of connection pool from RPC protocol
–  Fluent Style API
–  Operation retry with backoff
–  Token aware
•  Recipes
–  Distributed row lock (without zookeeper)
–  Multi-region row lock
–  Uniqueness constraint
–  Multi-row uniqueness constraint
–  Chunked and multi-threaded large file storage
–  Reverse index search
–  All rows query
–  Durable message queue
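"Operation retry with backoff" in the feature list can be sketched generically. The code below is an illustration in Python and does not use or reproduce the Astyanax API (which is Java); `retry_with_backoff`, its parameters and the `flaky_write` operation are all hypothetical.

```python
import time


def retry_with_backoff(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call operation() until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))


# Usage: an operation that fails twice, then succeeds.
calls = {"count": 0}


def flaky_write():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("node unavailable")
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = retry_with_backoff(flaky_write, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the example fast to run; the recorded delays show the wait doubling between attempts.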
  
Astyanax - Cassandra Write Data Flows
Single Region, Multiple Availability Zone, Token Aware
[Diagram: Token Aware Clients; six Cassandra nodes with Disks, two each in Zone A, Zone B and Zone C; arrows numbered 1 to 4 correspond to the steps below]
1.  Client writes to local coordinator
2.  Coordinator writes to other zones
3.  Nodes return ack
4.  Data written to internal commit log disks (no more than 10 seconds later)
If a node goes offline, hinted handoff completes the write when the node comes back up.
Requests can choose to wait for one node, a quorum, or all nodes to ack the write
SSTable disk writes and compactions occur asynchronously
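The choice between waiting for one node, a quorum, or all nodes comes down to how many acks a write needs. A minimal sketch, assuming a replication factor of 3 (one replica per zone) and the usual Cassandra quorum definition of floor(RF / 2) + 1; the function names are hypothetical.

```python
def required_acks(consistency_level, replication_factor):
    """Number of replica acks a write waits for at a given consistency level."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {consistency_level}")


def write_succeeds(acks_received, consistency_level, replication_factor=3):
    """True when enough replicas have acknowledged the write."""
    return acks_received >= required_acks(consistency_level, replication_factor)


for level in ("ONE", "QUORUM", "ALL"):
    print(level, required_acks(level, 3))
```

With three replicas a quorum is two acks, so a quorum write can still complete while one zone is unavailable, whereas a write at ALL cannot.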
  
Data Flows for Multi-Region Writes
Token Aware, Consistency Level = Local Quorum
[Diagram: US Clients and EU Clients, each with six Cassandra nodes with Disks, two each in Zone A, Zone B and Zone C; link labelled 100+ms latency; arrows numbered 1 to 6 correspond to the steps below]
1.  Client writes to local replicas
2.  Local write acks returned to Client which continues when 2 of 3 local nodes are committed
3.  Local coordinator writes to remote coordinator.
4.  When data arrives, remote coordinator node acks and copies to other remote zones
5.  Remote nodes ack to local coordinator
6.  Data flushed to internal commit log disks (no more than 10 seconds later)
If a node or region goes offline, hinted handoff completes the write when the node comes back up.
Nightly global compare and repair jobs ensure everything stays consistent.
  
Cross Region Use Cases
•  Geographic Isolation
– US to Europe replication of subscriber data
– Read intensive, low update rate
– Production use since late 2011
•  Redundancy for regional failover
– US East to US West replication of everything
– Includes write intensive data, high update rate
– Testing now
  
Benchmarking Global Cassandra
Write intensive test of cross region capacity
16 x hi1.4xlarge SSD nodes per zone = 96 total (3 zones in each of 2 regions)
[Diagram: Cassandra Replicas in Zone A, Zone B and Zone C in each of US-West-2 Region - Oregon and US-East-1 Region - Virginia. Labels: Test Load (twice); Validation Load; Inter-Zone Traffic; Inter-Region Traffic; 1 Million writes CL.ONE; 1 Million reads CL.ONE with no Data loss; S3]
  
Copying 18TB from East to West
Cassandra bootstrap 9.3 Gbit/s single threaded 48 nodes to 48 nodes
Thanks to boundary.com for these network analysis plots
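The node counts and the scale of the copy can be checked with a little arithmetic. This is a back-of-envelope sketch only: the slides do not state how long the copy took, and the calculation assumes decimal terabytes and that the quoted 9.3 Gbit/s was sustained for the whole transfer.

```python
NODES_PER_ZONE = 16
ZONES_PER_REGION = 3
REGIONS = 2

nodes_per_region = NODES_PER_ZONE * ZONES_PER_REGION  # the "48 nodes to 48 nodes"
total_nodes = nodes_per_region * REGIONS              # the "96 total"

DATA_BYTES = 18 * 10**12       # 18TB, assuming decimal terabytes
RATE_BITS_PER_S = 9.3 * 10**9  # 9.3 Gbit/s, assumed sustained throughout

seconds = DATA_BYTES * 8 / RATE_BITS_PER_S
hours = seconds / 3600
print(nodes_per_region, total_nodes, round(hours, 1))
```

Under those assumptions the transfer would take a little over four hours; any ramp-up or throttling would make the real figure longer.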
  
Inter Region Traffic Test
Verified at desired capacity, no problems, 339 MB/s, 83ms latency
  
Ramp Up Load Until It Breaks!
Unmodified tuning, dropping client data at 1.93GB/s inter region traffic
Spare CPU, IOPS, Network, just need some Cassandra tuning for more
  
Managing Multi-Region Availability
[Diagram: UltraDNS, DynECT DNS and AWS Route53; two sets of Regional Load Balancers, each with Cassandra Replicas in Zone A, Zone B and Zone C]
Denominator – manage traffic via multiple DNS providers
Denominator
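The idea of managing traffic through several DNS providers behind one abstraction can be sketched as follows. This is not Denominator's API (Denominator is a Java library); `InMemoryDnsProvider`, `point_traffic_at` and the hostnames are hypothetical, and nothing here makes a network call.

```python
class InMemoryDnsProvider:
    """Stand-in for one DNS provider's API (hypothetical, no network calls)."""

    def __init__(self, name):
        self.name = name
        self.records = {}

    def set_cname(self, hostname, target):
        self.records[hostname] = target


def point_traffic_at(providers, hostname, target):
    """Apply the same record change through every provider."""
    for provider in providers:
        provider.set_cname(hostname, target)


providers = [InMemoryDnsProvider(n) for n in ("UltraDNS", "DynECT", "Route53")]
point_traffic_at(providers, "api.example.com", "us-west-2-elb.example.com")
print({p.name: p.records for p in providers})
```

Because every provider exposes the same small interface, a regional failover becomes one call rather than three provider-specific procedures.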
  
Boosting the @NetflixOSS Ecosystem
See netflix.github.com
  
Judges
Aino Corry, Program Chair for Qcon/GOTO
Martin Fowler, Chief Scientist Thoughtworks
Simon Wardley, Strategist
Yury Izrailevsky, VP Cloud Netflix
Werner Vogels, CTO Amazon
Joe Weinman, SVP Telx, Author “Cloudonomics”
  
What do you win?
One winner in each of the 10 categories
Ticket and expenses to attend AWS Re:Invent 2013 in Las Vegas
A Trophy
  
A Cloud Native Open Source Platform
See netflix.github.com
  
Netflix Platform Evolution
2009-2010: Bleeding Edge Innovation
2011-2012: Common Pattern
2013-2014: Shared Pattern
Netflix ended up several years ahead of the industry, but it’s becoming commoditized now

Goals
Establish our solutions as Best Practices / Standards
Hire, Retain and Engage Top Engineers
Build up Netflix Technology Brand
Benefit from a shared ecosystem
  
How does it all fit together?

Example Application – RSS Reader
  
NetflixOSS Continuous Build and Deployment
[Diagram labels: Github NetflixOSS Source; AWS Base AMI; Maven Central; Cloudbees Jenkins; Aminator Bakery; Dynaslave AWS Build Slaves; Asgard (+ Frigga) Console; AWS Baked AMIs; Odin Orchestration API; AWS Account]
  
NetflixOSS Services Scope
[Diagram labels, in order. AWS Account: Asgard Console; Archaius Config Service; Cross region Priam C*; Pytheas Dashboards; Atlas Monitoring; Genie, Lipstick Hadoop Services; AWS Usage Cost Monitoring. Multiple AWS Regions: Eureka Registry; Exhibitor ZK; Edda History; Simian Army; Zuul Traffic Mgr. 3 AWS Zones: Application Clusters, Autoscale Groups, Instances; Priam Cassandra Persistent Storage; Evcache Memcached Ephemeral Storage]
  
NetflixOSS Instance Libraries
Initialization
• Baked AMI – Tomcat, Apache, your code
• Governator – Guice based dependency injection
• Archaius – dynamic configuration properties client
• Eureka - service registration client
Service Requests
• Karyon - Base Server for inbound requests
• RxJava – Reactive pattern
• Hystrix/Turbine – dependencies and real-time status
• Ribbon - REST Client for outbound calls
Data Access
• Astyanax – Cassandra client and pattern library
• Evcache – Zone aware Memcached client
• Curator – Zookeeper patterns
• Denominator – DNS routing abstraction
Logging
• Blitz4j – non-blocking logging
• Servo – metrics export for autoscaling
• Atlas – high volume instrumentation
  
Dashboards with Pytheas (Explorers)
http://techblog.netflix.com/2013/05/announcing-pytheas.html
•  Cassandra Explorer
– Browse clusters, keyspaces, column families
•  Base Server Explorer
– Browse service endpoints configuration, perf
•  Anything else you want to build…

[Screenshots: Cassandra Explorer; Cassandra Clusters]
  
AWS Usage (coming soon)
Reservation-aware cost monitoring and reporting
  
[Diagram labels: More Use Cases; More Features; Better portability; Higher availability; Easier to deploy; Contributions from end users; Contributions from vendors]
  
	
  
What’s Coming Next?
Functionality and scale now, portability coming
Moving from parts to a platform in 2013
Netflix is fostering a cloud native ecosystem
Rapid Evolution - Low MTBIAMSH
(Mean Time Between Idea And Making Stuff Happen)
  
Takeaway
NetflixOSS makes it easier for everyone to become Cloud Native
@adrianco #netflixcloud @NetflixOSS
  
Slideshare NetflixOSS Details
•  Lightning Talks Feb S1E1
–  http://www.slideshare.net/RuslanMeshenberg/netflixoss-open-house-lightning-talks
•  Asgard In Depth Feb S1E1
–  http://www.slideshare.net/joesondow/asgard-overview-from-netflix-oss-open-house
•  Lightning Talks March S1E2
–  http://www.slideshare.net/RuslanMeshenberg/netflixoss-meetup-lightning-talks-and-roadmap
•  Security Architecture
–  http://www.slideshare.net/jason_chan/
•  Cost Aware Cloud Architectures – with Jinesh Varia of AWS
–  http://www.slideshare.net/AmazonWebServices/building-costaware-architectures-jinesh-varia-aws-and-adrian-cockroft-netflix
  	
  
Amazon Cloud Terminology Reference
See http://aws.amazon.com/ This is not a full list of Amazon Web Service features
•  AWS – Amazon Web Services (common name for Amazon cloud)
•  AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
•  EC2 – Elastic Compute Cloud
–  Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
–  Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept.
–  Reserved Instances – pre-paid to reduce cost for long term usage
–  Availability Zone – datacenter with own power and cooling hosting cloud instances
–  Region – group of Avail Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan, SA-Brazil, US-Gov
•  ASG – Auto Scaling Group (instances booting from the same AMI)
•  S3 – Simple Storage Service (http access)
•  EBS – Elastic Block Store (network disk filesystem can be mounted on an instance)
•  RDS – Relational Database Service (managed MySQL master and slaves)
•  DynamoDB/SDB – Simple Data Base (hosted http based NoSQL datastore, DynamoDB replaces SDB)
•  SQS – Simple Queue Service (http based message queue)
•  SNS – Simple Notification Service (http and email based topics and messages)
•  EMR – Elastic Map Reduce (automatically managed Hadoop cluster)
•  ELB – Elastic Load Balancer
•  EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
•  VPC – Virtual Private Cloud (single tenant, more flexible network and security constructs)
•  DirectConnect – secure pipe from AWS VPC to external datacenter
•  IAM – Identity and Access Management (fine grain role based security keys)
  

Más contenido relacionado

La actualidad más candente

AWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at NetflixAWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at NetflixAdrian Cockcroft
 
Netflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and OpsNetflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and OpsAdrian Cockcroft
 
SV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformSV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformAdrian Cockcroft
 
Netflix Velocity Conference 2011
Netflix Velocity Conference 2011Netflix Velocity Conference 2011
Netflix Velocity Conference 2011Adrian Cockcroft
 
Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)Adrian Cockcroft
 
Netflix Global Cloud Architecture
Netflix Global Cloud ArchitectureNetflix Global Cloud Architecture
Netflix Global Cloud ArchitectureAdrian Cockcroft
 
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Adrian Cockcroft
 
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...Adrian Cockcroft
 
Intuit CTOF 2011 - Netflix for Mobile in the Cloud
Intuit CTOF 2011 - Netflix for Mobile in the CloudIntuit CTOF 2011 - Netflix for Mobile in the Cloud
Intuit CTOF 2011 - Netflix for Mobile in the CloudSid Anand
 
Architectures for High Availability - QConSF
Architectures for High Availability - QConSFArchitectures for High Availability - QConSF
Architectures for High Availability - QConSFAdrian Cockcroft
 
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesAdrian Cockcroft
 
Asgard, the Grails App that Deploys Netflix to the Cloud
Asgard, the Grails App that Deploys Netflix to the CloudAsgard, the Grails App that Deploys Netflix to the Cloud
Asgard, the Grails App that Deploys Netflix to the CloudJoe Sondow
 
(ISM301) Engineering Netflix Global Operations In The Cloud
(ISM301) Engineering Netflix Global Operations In The Cloud(ISM301) Engineering Netflix Global Operations In The Cloud
(ISM301) Engineering Netflix Global Operations In The CloudAmazon Web Services
 
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012Amazon Web Services
 
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Adrian Cockcroft
 

La actualidad más candente (20)

AWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at NetflixAWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at Netflix
 
Netflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and OpsNetflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and Ops
 
SV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformSV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source Platform
 
Netflix in the Cloud
Netflix in the CloudNetflix in the Cloud
Netflix in the Cloud
 
NetflixOSS Meetup
NetflixOSS MeetupNetflixOSS Meetup
NetflixOSS Meetup
 
Netflix Velocity Conference 2011
Netflix Velocity Conference 2011Netflix Velocity Conference 2011
Netflix Velocity Conference 2011
 
Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)
 
Netflix Global Cloud Architecture
Netflix Global Cloud ArchitectureNetflix Global Cloud Architecture
Netflix Global Cloud Architecture
 
Netflix and Open Source
Netflix and Open SourceNetflix and Open Source
Netflix and Open Source
 
Netflix in the cloud 2011
Netflix in the cloud 2011Netflix in the cloud 2011
Netflix in the cloud 2011
 
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
 
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
 
Intuit CTOF 2011 - Netflix for Mobile in the Cloud
Intuit CTOF 2011 - Netflix for Mobile in the CloudIntuit CTOF 2011 - Netflix for Mobile in the Cloud
Intuit CTOF 2011 - Netflix for Mobile in the Cloud
 
Architectures for High Availability - QConSF
Architectures for High Availability - QConSFArchitectures for High Availability - QConSF
Architectures for High Availability - QConSF
 
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
 
Svc 202-netflix-open-source
Svc 202-netflix-open-sourceSvc 202-netflix-open-source
Svc 202-netflix-open-source
 
Asgard, the Grails App that Deploys Netflix to the Cloud
Asgard, the Grails App that Deploys Netflix to the CloudAsgard, the Grails App that Deploys Netflix to the Cloud
Asgard, the Grails App that Deploys Netflix to the Cloud
 
(ISM301) Engineering Netflix Global Operations In The Cloud
(ISM301) Engineering Netflix Global Operations In The Cloud(ISM301) Engineering Netflix Global Operations In The Cloud
(ISM301) Engineering Netflix Global Operations In The Cloud
 
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012
 
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
 

Netflix Global Applications - NoSQL Search Roadshow

  • 1. Globally Distributed Cloud Native Applications at Netflix
       June 2013
       Adrian Cockcroft
       @adrianco #netflixcloud @NetflixOSS
       http://www.linkedin.com/in/adriancockcroft
  • 2. Cloud Native
       Global Architecture
       NetflixOSS Components
  • 4. We are Engineers
       We solve hard problems
       We build amazing and complex things
       We fix things when they break
  • 5. We strive for perfection
       Perfect code
       Perfect hardware
       Perfectly operated
  • 6. But perfection takes too long…
       So we compromise
       Time to market vs. Quality
       Utopia remains out of reach
  • 7. Where time to market wins big
       Making a land-grab
       Disrupting competitors (OODA)
       Anything delivered as web services
  • 8. How Soon?
       Code features in days instead of months
       Get hardware in minutes instead of weeks
       Incident response in seconds instead of hours
  • 9. Tipping the Balance
       (diagram labels: Utopia, Dystopia, Inefficient, Broken, Dynamic, Sooner, Cheaper, Better, Static)
  • 10. A new engineering challenge
       Construct a highly agile and highly available service from ephemeral and often broken components
  • 12. Netflix Streaming
       A Cloud Native Application based on an open source platform
  • 13. Netflix Member Web Site Home Page
       Personalization Driven – How Does It Work?
  • 14. How Netflix Streaming Works
       (diagram labels: Customer Device (PC, PS3, TV…); Web Site or Discovery API; User Data; Personalization; Streaming API; DRM; QoS Logging; OpenConnect CDN Boxes; CDN Management and Steering; Content Encoding; Consumer Electronics; AWS Cloud Services; CDN Edge Locations)
  • 15. (chart labels: Amazon Video 1.31%; 18x Prime; 25x Prime; Nov 2012; Streaming Bandwidth; March 2013; Mean Bandwidth +39% 6mo)
  • 16. Real Web Server Dependencies Flow
       (Netflix Home page business transaction as seen by AppDynamics)
       (diagram labels: Start Here; memcached; Cassandra; Web service; S3 bucket; Personalization movie group choosers (for US, Canada and Latam))
       Each icon is three to a few hundred instances across three AWS zones
  • 17. Component Micro-Services
       Test With Chaos Monkey, Latency Monkey
  • 18. Three Balanced Availability Zones
       Test with Chaos Gorilla
       (diagram: Load Balancers in front of Zone A, Zone B and Zone C, each holding Cassandra and Evcache Replicas)
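A rough way to see why the three zones need to be balanced and to carry headroom: if traffic is spread evenly and one zone disappears (as in a Chaos Gorilla test), the survivors must absorb its share. The sketch below is only illustrative arithmetic; the function names are hypothetical and nothing here is taken from Netflix tooling.

```python
def load_multiplier_after_zone_loss(zones: int, failed: int = 1) -> float:
    """Factor by which per-zone load grows when `failed` zones are lost.

    Assumes traffic is spread evenly over all zones before the failure
    and evenly over the surviving zones afterwards.
    """
    if zones <= 0 or failed < 0 or failed >= zones:
        raise ValueError("need at least one surviving zone")
    return zones / (zones - failed)


def max_safe_utilization(zones: int, failed: int = 1) -> float:
    """Highest steady-state utilization that still fits after the failure."""
    return 1.0 / load_multiplier_after_zone_loss(zones, failed)


if __name__ == "__main__":
    # Three zones, one lost: each survivor carries 1.5x its usual load,
    # so steady-state utilization should stay below roughly two thirds.
    print(load_multiplier_after_zone_loss(3))
    print(round(max_safe_utilization(3), 3))
```

With three zones this gives a 1.5x load increase on each surviving zone, which is the headroom a "run on 2 out of 3 zones" plan has to budget for.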
  • 19. Triple Replicated Persistence
       Cassandra maintenance affects individual replicas
       (diagram: Load Balancers in front of Zone A, Zone B and Zone C, each holding Cassandra and Evcache Replicas)
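The point that maintenance only touches individual replicas can be made concrete with Cassandra's quorum rule, where a quorum is a majority of the replicas. The slide does not say which consistency level is in use, so the quorum assumption and the helper names below are illustrative only.

```python
def quorum(replication_factor: int) -> int:
    """Cassandra-style quorum: a strict majority of the replicas."""
    if replication_factor <= 0:
        raise ValueError("replication factor must be positive")
    return replication_factor // 2 + 1


def can_serve(replication_factor: int, replicas_down: int, required: int) -> bool:
    """True if enough replicas remain up to satisfy `required` acknowledgements."""
    return replication_factor - replicas_down >= required


if __name__ == "__main__":
    rf = 3  # one replica per availability zone, as in the diagram
    print(quorum(rf))                    # replicas needed for a quorum
    print(can_serve(rf, 1, quorum(rf)))  # one replica in maintenance
    print(can_serve(rf, 2, quorum(rf)))  # two replicas unavailable
```

With one replica per zone, taking a single replica (or a whole zone) out of service still leaves two of three, which is enough for quorum reads and writes under this assumption.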
  • 20. Isolated Regions
       (diagram: US-East Load Balancers in front of Cassandra Replicas in Zone A, Zone B and Zone C; EU-West Load Balancers in front of Cassandra Replicas in Zone A, Zone B and Zone C)
  • 21. Failure Modes and Effects
       Failure Mode | Probability | Current Mitigation Plan
       Application Failure | High | Automatic degraded response
       AWS Region Failure | Low | Switch traffic between regions
       AWS Zone Failure | Medium | Continue to run on 2 out of 3 zones
       Datacenter Failure | Medium | Migrate more functions to cloud
       Data store failure | Low | Restore from S3 backups
       S3 failure | Low | Restore from remote archive
       Until we got really good at mitigating high and medium probability failures, the ROI for mitigating regional failures didn't make sense. Working on it now.
  • 22. Antifragile Testing
       http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
       • Chaos Monkey makes sure systems are resilient
         – Kill individual instances without customer impact
       • Chaos Gorilla shuts down entire zone
         – Run in production once every 3 months
       • Latency Monkey
         – Injects extra latency and error return codes
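The idea behind killing individual instances can be sketched as a small in-memory simulation: remove one random member of a service group and check that enough instances remain. This is not the Chaos Monkey code or its API; the instance ids, the `min_healthy` threshold and the function names are all hypothetical.

```python
import random


def kill_random_instance(group: set, rng: random.Random) -> str:
    """Remove one randomly chosen instance id from the group and return it."""
    if not group:
        raise ValueError("no instances left to terminate")
    victim = rng.choice(sorted(group))  # sort so the choice is reproducible per seed
    group.remove(victim)
    return victim


def survives(group: set, min_healthy: int) -> bool:
    """The service is considered healthy while enough instances remain."""
    return len(group) >= min_healthy


if __name__ == "__main__":
    rng = random.Random(2013)
    instances = {f"i-{n:04d}" for n in range(6)}  # made-up instance ids
    victim = kill_random_instance(instances, rng)
    print(f"terminated {victim}; healthy: {survives(instances, min_healthy=4)}")
```

A real run terminates cloud instances and relies on auto scaling to replace them; the simulation only captures the invariant being tested, namely that losing any single instance leaves the service above its minimum.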
  • 23. Monkeys
       Edda – Configuration History
       http://techblog.netflix.com/2012/11/edda-learn-stories-of-your-cloud.html
       (diagram: Edda collects from AWS (Instances, ASGs, etc.), Eureka (Services metadata) and AppDynamics (Request flow))
  • 24. Edda Query Examples
       Find any instances that have ever had a specific public IP address:
         $ curl "http://edda/api/v2/view/instances;publicIpAddress=1.2.3.4;_since=0"
         ["i-0123456789","i-012345678a","i-012345678b"]
       Show the most recent change to a security group:
         $ curl "http://edda/api/v2/aws/securityGroups/sg-0123456789;_diff;_all;_limit=2"
         --- /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351040779810
         +++ /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351044093504
         @@ -1,33 +1,33 @@
          {
          …
            "ipRanges" : [
              "10.10.1.1/32",
              "10.10.1.2/32",
         +    "10.10.1.3/32",
         -    "10.10.1.4/32"
          …
          }
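Both queries above follow the same shape: a collection path, an optional resource id, then `;`-separated matrix arguments, some with values and some bare flags. A minimal sketch of a URL builder for that shape follows; `edda_url` is a hypothetical helper rather than part of any Edda client, and it does no URL encoding or network access.

```python
from typing import Mapping, Optional


def edda_url(base: str,
             collection: str,
             resource_id: Optional[str] = None,
             matrix: Optional[Mapping[str, object]] = None) -> str:
    """Build an Edda-style URL with matrix arguments.

    A matrix value of None produces a bare flag (e.g. ";_diff");
    any other value produces ";key=value".
    """
    url = f"{base}/api/v2/{collection}"
    if resource_id is not None:
        url += f"/{resource_id}"
    for key, value in (matrix or {}).items():
        url += f";{key}" if value is None else f";{key}={value}"
    return url


if __name__ == "__main__":
    # Rebuilds the two URLs from the slide.
    print(edda_url("http://edda", "view/instances",
                   matrix={"publicIpAddress": "1.2.3.4", "_since": 0}))
    print(edda_url("http://edda", "aws/securityGroups", "sg-0123456789",
                   matrix={"_diff": None, "_all": None, "_limit": 2}))
```

The resulting strings can be passed to curl or any HTTP client; the host name `edda` is the placeholder used on the slide.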
  • 25. Highly Available Storage
      A highly scalable, available and durable deployment pattern based on Apache Cassandra
  • 26. Single Function Micro-Service Pattern
      One keyspace, replaces a single table or materialized view
      Many Different Single-Function REST Clients
      Stateless Data Access REST Service, Astyanax Cassandra Client
      Single function Cassandra Cluster Managed by Priam, between 6 and 144 nodes
      Optional Datacenter Update Flow
      AppDynamics Service Flow Visualization: each icon represents a horizontally scaled service of three to hundreds of instances deployed over three availability zones
      Over 50 Cassandra clusters, over 1000 nodes, over 30TB backup, over 1M writes/s/cluster
  • 27. Stateless Micro-Service Architecture
      Linux Base AMI (CentOS or Ubuntu)
      Optional Apache frontend, memcached, non-java apps
      Monitoring: Log rotation to S3, AppDynamics machineagent, Epic/Atlas
      Java (JDK 6 or 7): AppDynamics appagent monitoring, GC and thread dump logging
      Tomcat: Application war file, base servlet, platform, client interface jars, Astyanax; Healthcheck, status servlets, JMX interface, Servo autoscale
  • 28. Cassandra Instance Architecture
      Linux Base AMI (CentOS or Ubuntu)
      Tomcat and Priam on JDK: Healthcheck, Status
      Monitoring: AppDynamics machineagent, Epic/Atlas
      Java (JDK 7): AppDynamics appagent monitoring, GC and thread dump logging
      Cassandra Server
      Local Ephemeral Disk Space – 2TB of SSD or 1.6TB disk holding Commit log and SSTables
  • 29. Priam – Cassandra Automation
      Available at http://github.com/netflix
      • Netflix Platform Tomcat Code
      • Zero touch auto-configuration
      • State management for Cassandra JVM
      • Token allocation and assignment
      • Broken node auto-replacement
      • Full and incremental backup to S3
      • Restore sequencing from S3
      • Grow/Shrink Cassandra “ring”
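Slide 29 lists token allocation and growing the ring among the tasks Priam automates. The sketch below shows only the generic idea of evenly spaced tokens and is not Priam's actual algorithm, which the slide does not describe. It assumes the RandomPartitioner token space of 0 to 2^127; the constant and function names are hypothetical.

```python
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space


def evenly_spaced_tokens(node_count):
    """Return one token per node, spaced evenly around the ring."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * RING_SIZE // node_count for i in range(node_count)]


six = evenly_spaced_tokens(6)
twelve = evenly_spaced_tokens(12)
# Doubling the ring keeps every existing token in place,
# so only the new nodes need to stream data.
print(set(six) <= set(twelve))  # True
```

Growing by doubling is one common way to extend a ring without moving existing nodes, which is why the example compares a 6-node and a 12-node layout.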
  • 30. ETL for Cassandra
      • Data is de-normalized over many clusters!
      • Too many to restore from backups for ETL
      • Solution – read backup files using Hadoop
      • Aegisthus
        – http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html
        – High throughput raw SSTable processing
        – Re-normalizes many clusters to a consistent view
        – Extract, Transform, then Load into Teradata
  • 31. Cloud Native Big Data
      Size the cluster to the data
      Size the cluster to the questions
      Never wait for space or answers
  • 32. Netflix Dataoven
      Data Pipelines: Ursula, from cloud services, ~100 Billion Events/day; Aegisthus, from C*, Terabytes of Dimension data
      Data Warehouse: Over 2 Petabytes
      Hadoop Clusters – AWS EMR: 1300 nodes, 800 nodes, Multiple 150 nodes Nightly
      RDS Metadata; Gateways; Tools
  • 33. Global Architecture
      Local Client Traffic to Cassandra
      Synchronous Replication Across Zones
      Asynchronous Replication Across Regions
  • 34. Astyanax Cassandra Client for Java
      Available at http://github.com/netflix
      • Features
        – Complete abstraction of connection pool from RPC protocol
        – Fluent Style API
        – Operation retry with backoff
        – Token aware
      • Recipes
        – Distributed row lock (without zookeeper)
        – Multi-region row lock
        – Uniqueness constraint
        – Multi-row uniqueness constraint
        – Chunked and multi-threaded large file storage
        – Reverse index search
        – All rows query
        – Durable message queue
  • 35. Astyanax - Cassandra Write Data Flows
      Single Region, Multiple Availability Zone, Token Aware
      Diagram: Token Aware Clients writing to Cassandra nodes (with disks) in Zone A, Zone B and Zone C
      1. Client writes to local coordinator
      2. Coordinator writes to other zones
      3. Nodes return ack
      4. Data written to internal commit log disks (no more than 10 seconds later)
      If a node goes offline, hinted handoff completes the write when the node comes back up.
      Requests can choose to wait for one node, a quorum, or all nodes to ack the write.
      SSTable disk writes and compactions occur asynchronously.
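Slide 35 says a request can wait for one node, a quorum, or all nodes to acknowledge a write. The sketch below shows how many acks each choice requires for a given replication factor, using the usual majority definition of a quorum. The function name and the string labels are illustrative and are not the Astyanax API.

```python
def acks_required(consistency, replication_factor):
    """Number of replica acks a write waits for at each consistency choice."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if consistency == "ONE":
        return 1
    if consistency == "QUORUM":
        return replication_factor // 2 + 1  # strict majority of replicas
    if consistency == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency choice: {consistency}")


# With one replica per zone across three zones (replication factor 3):
for level in ("ONE", "QUORUM", "ALL"):
    print(level, acks_required(level, 3))
```

With three replicas, a quorum write tolerates one slow or offline node, which matches the slide's point that hinted handoff can complete the write later.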
  • 36. Data Flows for Multi-Region Writes
      Token Aware, Consistency Level = Local Quorum
      Diagram: US Clients and EU Clients, each region with Cassandra nodes (with disks) in Zone A, Zone B and Zone C; 100+ms latency between regions
      1. Client writes to local replicas
      2. Local write acks returned to Client which continues when 2 of 3 local nodes are committed
      3. Local coordinator writes to remote coordinator.
      4. When data arrives, remote coordinator node acks and copies to other remote zones
      5. Remote nodes ack to local coordinator
      6. Data flushed to internal commit log disks (no more than 10 seconds later)
      If a node or region goes offline, hinted handoff completes the write when the node comes back up. Nightly global compare and repair jobs ensure everything stays consistent.
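Step 2 on slide 36 is why a Local Quorum write does not pay the 100+ms cross-region latency: the client continues once 2 of 3 local replicas have acknowledged. The sketch below compares that with waiting for every replica in both regions. The per-replica timings are invented for illustration, with only the 100+ms inter-region figure taken from the slide, and the function names are hypothetical.

```python
def quorum_size(replicas):
    """Strict majority of a replica set."""
    return replicas // 2 + 1


def local_quorum_latency_ms(local_ack_ms):
    """Time until a majority of local replicas have acked."""
    needed = quorum_size(len(local_ack_ms))
    return sorted(local_ack_ms)[needed - 1]


def all_replicas_latency_ms(local_ack_ms, remote_ack_ms):
    """Time until every replica in both regions has acked."""
    return max(local_ack_ms + remote_ack_ms)


local = [1.5, 2.0, 8.0]          # hypothetical local ack times in ms
remote = [101.0, 103.0, 110.0]   # hypothetical, consistent with 100+ms between regions

print(local_quorum_latency_ms(local))          # 2.0
print(all_replicas_latency_ms(local, remote))  # 110.0
```

The slowest local replica and all remote replicas fall outside the client's critical path, and the remote copies arrive asynchronously as in steps 3 to 5.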
  • 37. Cross Region Use Cases
      • Geographic Isolation
        – US to Europe replication of subscriber data
        – Read intensive, low update rate
        – Production use since late 2011
      • Redundancy for regional failover
        – US East to US West replication of everything
        – Includes write intensive data, high update rate
        – Testing now
  • 38. Benchmarking Global Cassandra
      Write intensive test of cross region capacity
      16 x hi1.4xlarge SSD nodes per zone = 96 total
      US-West-2 Region - Oregon: Cassandra Replicas in Zone A, Zone B, Zone C
      US-East-1 Region - Virginia: Cassandra Replicas in Zone A, Zone B, Zone C
      1 Million writes CL.ONE; 1 Million reads CL.ONE with no data loss
      Diagram labels: Test Load (two), Validation Load, Inter-Zone Traffic, Inter-Region Traffic, S3
  • 39. Copying 18TB from East to West
      Cassandra bootstrap: 9.3 Gbit/s, single threaded, 48 nodes to 48 nodes
      Thanks to boundary.com for these network analysis plots
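The figures on slide 39 allow a rough estimate of how long the copy would take. The slide does not state the duration, so the calculation below is only a back-of-envelope sketch. It assumes decimal terabytes and that 9.3 Gbit/s is the aggregate rate, sustained for the whole transfer; the function name is hypothetical.

```python
def transfer_hours(size_terabytes, rate_gbit_per_s):
    """Hours needed to move size_terabytes at a sustained rate_gbit_per_s."""
    bits = size_terabytes * 1e12 * 8          # decimal TB to bits
    seconds = bits / (rate_gbit_per_s * 1e9)  # Gbit/s to bit/s
    return seconds / 3600


hours = transfer_hours(18, 9.3)
print(f"{hours:.1f} hours")  # 4.3 hours
```

If the rate were a peak rather than a sustained figure, the real copy would have taken longer than this estimate.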
  • 40. Inter Region Traffic Test
      Verified at desired capacity, no problems, 339 MB/s, 83ms latency
  • 41. Ramp Up Load Until It Breaks!
      Unmodified tuning, dropping client data at 1.93GB/s inter region traffic
      Spare CPU, IOPS, Network, just need some Cassandra tuning for more
  • 42. Managing Multi-Region Availability
      Denominator – manage traffic via multiple DNS providers
      DNS providers: UltraDNS, DynECT DNS, AWS Route53
      Two regions, each with Regional Load Balancers and Cassandra Replicas in Zone A, Zone B, Zone C
  • 43. Boosting the @NetflixOSS Ecosystem
      See netflix.github.com
  • 45. Judges
      Aino Corry, Program Chair for Qcon/GOTO
      Martin Fowler, Chief Scientist, Thoughtworks
      Simon Wardley, Strategist
      Yury Izrailevsky, VP Cloud, Netflix
      Werner Vogels, CTO, Amazon
      Joe Weinman, SVP Telx, Author “Cloudonomics”
  • 46. What do you win?
      One winner in each of the 10 categories
      Ticket and expenses to attend AWS Re:Invent 2013 in Las Vegas
      A Trophy
  • 47. A Cloud Native Open Source Platform
      See netflix.github.com
  • 48. Netflix Platform Evolution
      2009-2010: Bleeding Edge Innovation
      2011-2012: Common Pattern
      2013-2014: Shared Pattern
      Netflix ended up several years ahead of the industry, but it’s becoming commoditized now
  • 49. Goals
      Establish our solutions as Best Practices / Standards
      Hire, Retain and Engage Top Engineers
      Build up Netflix Technology Brand
      Benefit from a shared ecosystem
  • 50. How does it all fit together?
  • 51. Example Application – RSS Reader
  • 52. NetflixOSS Continuous Build and Deployment
      Diagram labels: Github NetflixOSS Source; AWS Base AMI; Maven Central; Cloudbees Jenkins; Aminator Bakery; Dynaslave AWS Build Slaves; Asgard (+ Frigga) Console; AWS Baked AMIs; Odin Orchestration API; AWS Account
  • 53. NetflixOSS Services Scope
      AWS Account: Asgard Console; Archaius Config Service; Cross region Priam C*; Pytheas Dashboards; Atlas Monitoring; Genie, Lipstick Hadoop Services; AWS Usage Cost Monitoring
      Multiple AWS Regions: Eureka Registry; Exhibitor ZK; Edda History; Simian Army; Zuul Traffic Mgr
      3 AWS Zones: Application Clusters, Autoscale Groups, Instances; Priam Cassandra Persistent Storage; Evcache Memcached Ephemeral Storage
  • 54. NetflixOSS Instance Libraries
      Initialization
        • Baked AMI – Tomcat, Apache, your code
        • Governator – Guice based dependency injection
        • Archaius – dynamic configuration properties client
        • Eureka - service registration client
      Service Requests
        • Karyon - Base Server for inbound requests
        • RxJava – Reactive pattern
        • Hystrix/Turbine – dependencies and real-time status
        • Ribbon - REST Client for outbound calls
      Data Access
        • Astyanax – Cassandra client and pattern library
        • Evcache – Zone aware Memcached client
        • Curator – Zookeeper patterns
        • Denominator – DNS routing abstraction
      Logging
        • Blitz4j – non-blocking logging
        • Servo – metrics export for autoscaling
        • Atlas – high volume instrumentation
  • 55. Dashboards with Pytheas (Explorers)
      http://techblog.netflix.com/2013/05/announcing-pytheas.html
      • Cassandra Explorer
        – Browse clusters, keyspaces, column families
      • Base Server Explorer
        – Browse service endpoints configuration, perf
      • Anything else you want to build…
  • 59. AWS Usage (coming soon)
      Reservation-aware cost monitoring and reporting
  • 60. What’s Coming Next?
      More Use Cases, More Features
      Better portability
      Higher availability
      Easier to deploy
      Contributions from end users
      Contributions from vendors
  • 61. Functionality and scale now, portability coming
      Moving from parts to a platform in 2013
      Netflix is fostering a cloud native ecosystem
      Rapid Evolution - Low MTBIAMSH (Mean Time Between Idea And Making Stuff Happen)
  • 62. Takeaway
      NetflixOSS makes it easier for everyone to become Cloud Native
      @adrianco #netflixcloud @NetflixOSS
  • 63. Slideshare NetflixOSS Details
      • Lightning Talks Feb S1E1
        – http://www.slideshare.net/RuslanMeshenberg/netflixoss-open-house-lightning-talks
      • Asgard In Depth Feb S1E1
        – http://www.slideshare.net/joesondow/asgard-overview-from-netflix-oss-open-house
      • Lightning Talks March S1E2
        – http://www.slideshare.net/RuslanMeshenberg/netflixoss-meetup-lightning-talks-and-roadmap
      • Security Architecture
        – http://www.slideshare.net/jason_chan/
      • Cost Aware Cloud Architectures – with Jinesh Varia of AWS
        – http://www.slideshare.net/AmazonWebServices/building-costaware-architectures-jinesh-varia-aws-and-adrian-cockroft-netflix
  • 64. Amazon Cloud Terminology Reference
      See http://aws.amazon.com/
      This is not a full list of Amazon Web Service features
      • AWS – Amazon Web Services (common name for Amazon cloud)
      • AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
      • EC2 – Elastic Compute Cloud
        – Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
        – Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept.
        – Reserved Instances – pre-paid to reduce cost for long term usage
        – Availability Zone – datacenter with own power and cooling hosting cloud instances
        – Region – group of Avail Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan, SA-Brazil, US-Gov
      • ASG – Auto Scaling Group (instances booting from the same AMI)
      • S3 – Simple Storage Service (http access)
      • EBS – Elastic Block Store (network disk filesystem can be mounted on an instance)
      • RDS – Relational Database Service (managed MySQL master and slaves)
      • DynamoDB/SDB – Simple Data Base (hosted http based NoSQL datastore, DynamoDB replaces SDB)
      • SQS – Simple Queue Service (http based message queue)
      • SNS – Simple Notification Service (http and email based topics and messages)
      • EMR – Elastic Map Reduce (automatically managed Hadoop cluster)
      • ELB – Elastic Load Balancer
      • EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
      • VPC – Virtual Private Cloud (single tenant, more flexible network and security constructs)
      • DirectConnect – secure pipe from AWS VPC to external datacenter
      • IAM – Identity and Access Management (fine grain role based security keys)