2. Tutorial Objectives
Provide an overview of the emerging cloud
industry, the jargon, the trends, and a
model to help sort through the mess
Dig into a couple of specific examples on
how to provision and operate a cloud
environment, conveying practical insight
Explore cloud computing architectures,
looking at whether they change traditional
system architectures
San Francisco 2008 2
3. About Your Presenter
Stuart Charlton
• Canadian,
now in San Francisco
Chief Architect, Elastra
• Responsible for technical
direction & long-term
product strategy
In prior lives...
• BEA Systems,
Rogers Communications,
Financial Services,
global training & consulting
Stu Says Stuff
http://stucharlton.com/blog
4. Agenda - Part 1
A Look at the Clouds
• (Good Luck) Defining Cloud Computing
• Qualities of a Cloud
• The Cloud Computing Industry - Late 2008
• A Cloud Reference Model
Amazon Web Services Tutorial
• Simple Storage Service (S3)
• Elastic Compute Cloud (EC2)
• Elastic Block Storage (EBS)
• Covering APIs, Tools, and Experiences
5. Agenda - Part 2
Managing & Operating Cloud Systems
• Whither IT Service Management?
• The Hope for Cloud Standards
• The Puppet Administrative System
• A Preview of Elastra Cloud Services
Cloud Architecture
• Common Patterns
• Integrating applications, networks, and data
• Scalability and Monitoring
Q&A and Open Discussion
6. Caveats
The technology is a (very) moving target
• Expect this to increase as the industry tries to
drive a new round of retooling & spending
• Lots to cover; we’ll try to scratch a reasonable
amount of surface
Much cloud technology is quite proprietary
• Too early to dive into committee-land
• Even if it’s open source, having only one distribution
may eventually be problematic
The “definition game” is only fun for so long
• Fondly recall the crisp and concise industry
definitions such as SOA, OO, Components, etc...
9. (Good Luck) Defining Cloud Computing
Software-as-a-Service
• “My customer resource management (CRM) system is
out on the Internet!”
Grids vs. Clouds
• Shared Virtual Resources
• Batch Jobs vs. Online Applications
• Different Approaches to State Management
Network Diagrams
• A service is “on a cloud somewhere”
Virtualization Platforms & APIs
• Hardware can be manipulated with software
10. Qualities of a Cloud
On-Demand
• Reduced need to provide demand forecasts in advance
• Demand trends are predicted by the provider
Usage-metered (i.e. an operating expense)
• Pay-by-the-drink or over time, not up front
Self-service
• Resources directly/indirectly reserved with a GUI or API
Elastic Scalability
• Grow or shrink resources as required
Mandatory Network
• The network is essential to consume the service
11. A Subset of the Cloud Landscape
Software Vendors
Mid-Size Providers
Large Providers
12. The Cloud Provider Continuum
“Retail Ecosystem”
• Closer to the Developer/User
• Platform-as-a-Service
“Supplier Ecosystem”
• Closer to the SysAdmin/Ops
• Infrastructure-as-a-Service
13. A Cloud Technology Reference Model
Begin with the Basic Data Center
Diagram layers:
• Facilities & Logistics
• Software & Hardware Infrastructure
• Testing, Monitoring, Diagnostics, and Verification
14. A Cloud Technology Reference Model
Add easy software access to:
• Elements - HW/SW/Network/Storage
Settings, Installations, and Configurations
• Resources - Reservations from a pool of
excess capacity in storage, computing, and
network
Diagram layers:
• Element Management
• Resource Management
• Facilities & Logistics
• Software & Hardware Infrastructure
• Testing, Monitoring, Diagnostics, and Verification
15. A Cloud Technology Reference Model
Add some visibility:
• A Web of Metadata
(What uses or contains what other things?)
• Lifecycle (When and how can things change?)
Diagram layers:
• Lifecycle (Birth, Growth, Failure, Recovery, Death)
• Web of Metadata: Categories, Capabilities, Configurations & Dependencies
• Element Management
• Resource Management
• Facilities & Logistics
• Software & Hardware Infrastructure
• Testing, Monitoring, Diagnostics, and Verification
16. A Cloud Technology Reference Model
Add some real-world context:
• Governance
(Who has authority / responsibility to change, and how?)
• Architecture Views (How are my concerns addressed?)
Diagram layers:
• Governance
• Architectural Views (e.g. scalability, availability, recovery,
data quality, security)
• Lifecycle (Birth, Growth, Failure, Recovery, Death)
• Web of Metadata: Categories, Capabilities, Configurations & Dependencies
• Testing, Monitoring, Diagnostics, and Verification
17. A Cloud Technology Reference Model
Diagram layers:
• Your Application
• Governance
• Architectural Views
• Lifecycle (Birth, Growth, Failure, Recovery, Death)
• Web of Metadata: Categories, Capabilities, Configurations & Dependencies
• Element Management
• Resource Management
• Facilities & Logistics
• Software & Hardware Infrastructure
• Testing, Monitoring, Diagnostics, and Verification
18. Infrastructure Clouds Start Here:
Your Problem:
• Your Application
• Governance
• Architectural Views
• Lifecycle (Birth, Growth, Failure, Recovery, Death)
• Web of Metadata: Categories, Capabilities, Configurations & Dependencies
• Testing, Monitoring, Diagnostics, and Verification
Their Problem:
• Facilities & Logistics
• Software & Hardware Infrastructure
Also shown in the infrastructure row:
• Element Management
• Resource Management
• Operating System Images
• Basic Monitoring
19. “Cloud Servers” Try to Extend Infra:
Your problem:
• Your Application
Cloud servers:
• Governance
• Architectural Views
• Lifecycle (Birth, Growth, Failure, Recovery, Death)
• Web of Metadata: Categories, Capabilities, Configurations & Dependencies
• Testing, Monitoring, Diagnostics, and Verification
Cloud Infra (private or public):
• Element Management, Resource Management (Split Responsibility)
• Basic Monitoring
• Facilities & Logistics
• Software & Hardware Infrastructure
20. Cloud Platforms, As Perceived Today
Your Application (Insert Code Here)
Application-Level Governance
lol, Monitoring
DON’T WORRY YOUR PRETTY HEAD,
WE HAVE THE REST UNDER CONTROL
21. How Cloud Platforms Likely Will Evolve
Diagram layers:
• Your Application
• App-Level Governance
• Scalability, Integration, Backup & Recovery, Security Views
• Application Lifecycle (Birth, Growth, Failure, Recovery, Death)
• Testing, Monitoring, Diagnostics, and Verification
BLACK BOX OF INTRIGUE
23. AWS Registration and Security
Create an AWS account
• aws.amazon.com
• Attachable to your existing Amazon.com account
Creating an Access Key ID and Secret Key
24. Simple Storage Service (S3)
Web-Based Media Storage
• Scalable, Redundant, Reliable, and Fast
• XML-Based Metadata over RESTful Web Interface
• Available over HTTP, HTTPS, and BitTorrent
Official 99.9% availability SLA (per month)
• 10% service credit when between 99% and 99.9%
• 25% service credit when less than 99%
Available in United States and Europe
Pricing (U.S.) - November 2008
• Storage Rates: starting at $0.15 per GB monthly
• Data Transfer Rates: $0.10 per GB inbound, $0.17 per GB outbound
• Request Rates: $0.01 per 10,000 GET requests; $0.01 per 1,000 PUT, POST, etc. requests
• Rates are reduced as volume increases (multi-TB)
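The SLA credit tiers on this slide can be expressed as a small lookup. This is only a sketch of the tiers as quoted here; the function name is hypothetical, and the treatment of uptime values that fall exactly on 99% or 99.9% is an assumption, not something the slide states.

```python
def s3_service_credit_percent(monthly_uptime_percent: float) -> int:
    """Return the service credit (percent of the monthly bill) for a given uptime.

    Tiers follow the slide: no credit at 99.9% or above, 10% between
    99% and 99.9%, 25% below 99%. Boundary handling is an assumption.
    """
    if not 0.0 <= monthly_uptime_percent <= 100.0:
        raise ValueError("uptime must be between 0 and 100")
    if monthly_uptime_percent >= 99.9:
        return 0   # SLA met, no credit
    if monthly_uptime_percent >= 99.0:
        return 10  # between 99% and 99.9%
    return 25      # less than 99%
```

For scale: a 30-day month has 43,200 minutes, so 99.9% availability still allows roughly 43 minutes of downtime before any credit applies.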
25. S3 Conceptual Model
S3 Bucket: QConPages
S3 Key: /2008-11-08/QCon.html
S3 Objects are protected by ACL
Mapped into:
https://QConPages.s3.amazonaws.com/2008-11-08/QCon.html
https://s3.amazonaws.com/QConPages/2008-11-08/QCon.html
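The bucket-and-key mapping above can be sketched as a helper that produces both URL forms. The function name is hypothetical, and percent-encoding the key with `urllib.parse.quote` is an assumption about how unusual characters should be handled.

```python
from typing import Tuple
from urllib.parse import quote


def s3_urls(bucket: str, key: str) -> Tuple[str, str]:
    """Return (virtual-hosted URL, path-style URL) for an S3 object."""
    # Drop the leading slash so the key joins cleanly, and escape
    # characters that are not URL-safe (slashes are kept as-is).
    clean_key = quote(key.lstrip("/"), safe="/")
    virtual_hosted = f"https://{bucket}.s3.amazonaws.com/{clean_key}"
    path_style = f"https://s3.amazonaws.com/{bucket}/{clean_key}"
    return virtual_hosted, path_style
```

Calling `s3_urls("QConPages", "/2008-11-08/QCon.html")` yields the two URLs shown on the slide.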
26. S3 RESTful Interactions
Creating Buckets as Resources
PUT /qconpages HTTP/1.1
Host: s3.amazonaws.com
Date: Mon, 17 Nov 2008 09:15:00 PST
Authorization: AWS <AccessKeyID:signature>
Content-Length: 0
Response
HTTP/1.1 200 OK
Location: /qconpages
Date: Mon, 17 Nov 2008 09:15:01 PST
Content-Length: 0
27. S3 RESTful Interactions
Writing objects in buckets
PUT /qconpages/QCon.html HTTP/1.1
Host: s3.amazonaws.com
Date: Mon, 17 Nov 2008 09:15:16 PST
Authorization: AWS <AccessKeyID:signature>
Content-Length: 104
Content-Type: text/html

<html>
<head>
<title>QCon San Francisco 2008</title>
</head>
<body><p>Welcome!</p></body>
</html>
28. S3 RESTful Interactions
Retrieving Objects
GET /HugeFile HTTP/1.1
Host: qconpages.s3.amazonaws.com
Date: Mon, 17 Nov 2008 09:15:16 PST
Accept: */*
Range: bytes=0-1048579
(Range is an optional, standard HTTP header for retrieving
subsets of an object or resuming broken transfers)
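As a rough illustration of how a client might use Range, the sketch below builds header values for a single range, for resuming from a byte offset, and for fetching a large object in fixed-size chunks. The function names are hypothetical; the `bytes=first-last` syntax (inclusive on both ends) is standard HTTP.

```python
from typing import Iterator, Optional


def range_header(first_byte: int, last_byte: Optional[int] = None) -> str:
    """Build a Range header value; both byte positions are inclusive."""
    if first_byte < 0 or (last_byte is not None and last_byte < first_byte):
        raise ValueError("invalid byte range")
    if last_byte is None:
        # Open-ended range: everything from first_byte on (resume a transfer).
        return f"bytes={first_byte}-"
    return f"bytes={first_byte}-{last_byte}"


def chunk_ranges(total_size: int, chunk_size: int) -> Iterator[str]:
    """Yield Range values that cover an object of total_size bytes."""
    for first in range(0, total_size, chunk_size):
        last = min(first + chunk_size, total_size) - 1
        yield range_header(first, last)
```

A resumed download that already holds 500 bytes would send `range_header(500)`, i.e. `bytes=500-`.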
29. Transfer Considerations
HTML Form Uploads
• Content type is multipart/form-data
• Hidden form fields can pass other parameters
Object Key, Authorization Signature, etc.
BitTorrent Access
• Request /bucket/key?torrent for .torrent file
• Object must be readable by anonymous users
• Other downloaders will contribute to the Torrent,
S3 will act as a seeder
30. AWS Authorization Format
Ensures that requests were not tampered with
and were authorized by the AWS account holder
• An HMAC-SHA1 algorithm applied to several
canonicalized HTTP headers and content
Passed as an Authorization header
Optionally can be passed as URI parameters for
pre-signed, expiry-based signatures
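A minimal sketch of the header-based form, assuming the S3 REST scheme of the time: the string to sign joins the verb, Content-MD5, Content-Type, Date, and resource path with newlines, and the Base64-encoded HMAC-SHA1 digest becomes the signature. Canonicalized `x-amz-` headers are left out for brevity, the function names are hypothetical, and the credentials in the usage note are placeholders.

```python
import base64
import hashlib
import hmac


def string_to_sign(verb: str, resource: str, date: str,
                   content_type: str = "", content_md5: str = "") -> str:
    """Join the signed request elements, one per line (no x-amz- headers)."""
    return "\n".join([verb, content_md5, content_type, date, resource])


def authorization_header(access_key_id: str, secret_key: str, verb: str,
                         resource: str, date: str,
                         content_type: str = "", content_md5: str = "") -> str:
    """Return an 'AWS <AccessKeyID>:<signature>' Authorization header value."""
    message = string_to_sign(verb, resource, date, content_type, content_md5)
    digest = hmac.new(secret_key.encode("utf-8"),
                      message.encode("utf-8"),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return f"AWS {access_key_id}:{signature}"
```

For example, `authorization_header("EXAMPLEKEYID", "example-secret", "PUT", "/qconpages/QCon.html", "Mon, 17 Nov 2008 09:15:16 PST", "text/html")` would sign the object upload shown earlier. Changing any signed element changes the signature, which is how tampering is detected.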
31. Elastic Compute Cloud (EC2)
Resizable Compute Capacity in the Cloud
CPU, Memory, Storage, and Network
• Storage is “ephemeral”: it is lost on termination
Supports Linux, OpenSolaris, and
Windows Server 2003
Free data transfer
• Between S3 and EC2
• Among EC2 instances
In/Outbound data transfer similar price to S3
Baseline CPU Speed is 1.0-1.2 GHz AMD Opteron
• a.k.a. EC2 Compute Unit (ECU)
32. EC2 Sizes
Size              Cores / Speed             Storage   Memory   Cost (*NIX)   Cost (Windows)
Small             1 Core, 1 ECU (32-bit)    160 GB    1.7 GB   $0.10/hr      $0.125/hr
Large             2 Core, 2 ECU (64-bit)    850 GB    7.5 GB   $0.40/hr      $0.50/hr
X-Large           4 Core, 2 ECU (64-bit)    1690 GB   15 GB    $0.80/hr      $1.00/hr
High CPU Medium   2 Core, 2.5 ECU (32-bit)  350 GB    1.7 GB   $0.20/hr      $0.30/hr
High CPU X-Large  8 Core, 2.5 ECU (64-bit)  1690 GB   7 GB     $0.80/hr      $1.20/hr
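Because instances are billed by the hour, estimating a fleet's cost is simple arithmetic. The sketch below uses the hourly rates from this slide; the dictionary keys and function name are hypothetical labels, and data transfer and storage charges are ignored.

```python
# Hourly rates in USD, taken from the slide. Keys are illustrative labels.
HOURLY_RATE_USD = {
    ("small", "unix"): 0.10, ("small", "windows"): 0.125,
    ("large", "unix"): 0.40, ("large", "windows"): 0.50,
    ("xlarge", "unix"): 0.80, ("xlarge", "windows"): 1.00,
    ("highcpu-medium", "unix"): 0.20, ("highcpu-medium", "windows"): 0.30,
    ("highcpu-xlarge", "unix"): 0.80, ("highcpu-xlarge", "windows"): 1.20,
}


def fleet_cost_usd(size: str, os_family: str, instances: int, hours: float) -> float:
    """Compute cost of running `instances` machines for `hours` each."""
    rate = HOURLY_RATE_USD[(size, os_family)]
    return rate * instances * hours
```

Ten Small *NIX instances running for a day come to about $24; the same fleet left on for a 30-day month is about $720.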
34. EC2 Authorization Keypairs
Amazon EC2 uses an X.509 Certificate and
Private Key pair to enable authorization
On Linux & UNIX:
• Passwordless-SSH
On Windows:
• Keypair is used to access administrator password
Generate your own (e.g. Elasticfox),
or use Amazon’s web interface
35. Image Management
Amazon Machine Images (AMIs)
• A copy of the OS filesystem, minus the kernel
• Chunked up into smaller pieces, uploaded to S3
• After uploading, can be registered with EC2
Library of AMIs available through EC2 API
• Amazon-provided AMIs
e.g. Fedora 8, Windows Server 2003
• Publicly-available 3rd Party AMIs
e.g. OpenSolaris, various Linux distros
• Paid-AMIs
• Private (your own) AMIs
36. Instance Management
Launching an AMI
• Select the min/max
number of instances
desired
• Choose security groups
• Choose instance size
• Ensure OS fits the size
(i.e. 32 vs 64-bit)
• Choose the registered
keypair for
authentication
37. Availability Zones
A grouping of the data centre infrastructure
that’s isolated from other infrastructure
• Could be in the same data centre, just redundant
power, HVAC, etc.
Generally, failures in one zone will not
impact the other zones (except for
catastrophic failure)
In future, regions will also be available for
planned disaster recovery.
38. EC2 Query API
Intuitive Functions
• Describe*
AvailabilityZones
Images
Instances
KeyPairs
SecurityGroups
• RunInstances
• TerminateInstances
Constructed via URI (not RESTful, though)
• https://ec2.amazonaws.com/?
Action=RunInstances&ImageId=ami-60a54009..
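A sketch of how such a query URI might be assembled. It only builds the unsigned query string: real requests also carry the access key, a timestamp, an API version, and a signature, all omitted here. The helper name is hypothetical; `MinCount` and `MaxCount` are the instance-count parameters of `RunInstances`.

```python
from urllib.parse import urlencode

EC2_ENDPOINT = "https://ec2.amazonaws.com/"


def ec2_query_url(action: str, **params: str) -> str:
    """Build an unsigned EC2 Query API URL for the given action."""
    query = {"Action": action}
    query.update(params)
    # urlencode percent-escapes values and joins them with '&'.
    return EC2_ENDPOINT + "?" + urlencode(query)


url = ec2_query_url("RunInstances", ImageId="ami-60a54009",
                    MinCount="1", MaxCount="1")
```

The action name travels as a query parameter rather than as an HTTP method on a resource, which is why the slide calls the API "not RESTful".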
39. Image Bundling
Bundling Images on Linux & UNIX
• ec2-bundle-vol utility run on the instance
• ec2-upload-bundle utility to send to S3
Bundling Images on Windows
• ec2-bundle-instance API wrapper command
40. Example of Launching an Instance
Using the Typica Toolkit (Java Wrapper)
http://code.google.com/p/typica/
// Assumes `ec2` is an initialized Typica EC2 client and
// `securityGroups` is a List<String> of security group names.
List<String> params = new ArrayList<String>();
List<ImageDescription> images = ec2.describeImages(params);
for (ImageDescription img : images) {
    if (img.getImageId().equals("ami-2a5fba43")) {
        ReservationDescription reservation =
            ec2.runInstances(img.getImageId(),
                             1 /*min*/,
                             1 /*max*/,
                             securityGroups,
                             "", "mykeypair");
    }
}
41. EC2 Security Groups
Virtual Group-Based Firewalls
in the EC2 Data Center
CIDR-based group firewall for
external clients (e.g. 0.0.0.0/0)
Diagram labels: Load Balancer, Web Security Group, App Server,
Database, Data Security Group
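The CIDR rule for external clients boils down to a containment check on the source address. The sketch below shows that check with Python's standard `ipaddress` module; it models only the matching logic, not the EC2 API, and the function name is hypothetical.

```python
import ipaddress
from typing import Iterable


def is_allowed(source_ip: str, allowed_cidrs: Iterable[str]) -> bool:
    """Return True if source_ip falls inside any of the allowed CIDR blocks."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)
```

A rule of `0.0.0.0/0` admits every IPv4 client, which suits a public load balancer; a database tier would normally be opened only to a narrower block or to another security group.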
42. EC2 Networking
Each instance is given a Public Dynamic Host:
• e.g. ec2-33-131-3-227.compute-1.amazonaws.com
And a Private Host for within EC2:
• e.g. domU-10-21-18-00-69-D5.compute-1.internal
Cross-Instance Traffic should almost always use
the Private Host
No UDP Broadcast or IP Multicast is allowed
Elastic IP
• Static public IP address, allocated within 24 hours
• Attaching an Elastic IP may take ~15 minutes
• Note that it asynchronously replaces your public
dynamic host name & IP address without warning
43. Elastic Block Storage
Persistent, highly-available, block storage
• (Similar experience to a SAN)
Released August 2008
Volumes from 1 GB to 1 TB
• Multiple volumes allowed
RAID striping allowed
(bandwidth constrained at ~100+ MB/sec)
Supports snapshots to S3 for later restore
• Snapshots are asynchronous and take a long time
• Restores, on the other hand, are relatively quick
44. Elastic Block Storage API
Create/DeleteVolume
Attach/DetachVolume
Create/DeleteSnapshot
DescribeVolumes
EBS Storage is normally provisioned very quickly
(seconds)
Initial writes will be slow, as with ephemeral stores
All EBS volumes must be formatted with a file
system prior to use
46. Agenda - Part 2
Managing & Operating Cloud Systems
• Whither IT Service Management?
• The Hope for Cloud Standards
• Tutorial - The Puppet Administrative System
• A Preview of Elastra Cloud Services
Cloud Architecture Topics
• Common Patterns
• Integrating applications, networks, and data
• Security (Identity, Privacy, etc.)
• Scalability and Monitoring
Q&A and Open Discussion
50. Dependency Management vs. Uniformity
The “Google Secret Sauce” Theory:
• Always available, scalable, fast
• Computing as fungible commodity
• Reliability is enabled by architecture
• But you have to rewrite your software
Does a seemingly magical architecture
reduce or eliminate the need for
configuration & dependency
management?
• If I spill this on a server, who
is affected, and by how much?
Does this architecture match classic
enterprise requirements?
51. EC2 is great, but...
That’s a lot of images!
That’s a heckuva lot of instances!
How do I change many machines at once?
• Scripts that wrap SSH?
Do I need to re-image every time I add/
update software?
How do I detect configuration drift?
52. The Puppet Administrative System
An Open Source Runtime System and
Domain-Specific Language (DSL) for
managing Linux, BSD, & UNIX servers
• Maintained by Reductive Labs since 2005
• Founded by Luke Kanies, ex-BladeLogic
Encapsulates cross-package installation,
configuration setting, permissions, etc. in a
transactional runtime
53. Puppet Architecture
Puppetmasterd
• Maintains central configuration repository
Puppetd
• Agent on each client, polls the puppetmaster
every 30 minutes (adjustable)
54. Puppet Manifest Example
class mysql::server {
  $mountpath = $mydc::constants::ebs_mount
  $datadir = "${mountpath}/mysql"
  package { "mysql-server":
    ensure => installed,
  }
  include amazon::ebs
  file { $datadir:
    ensure  => directory,
    require => Exec["Mount Device"]
  }
}
55. Puppet Sites & Nodes
node "web.qconsf.com" {
  include apachewebserver
}
node "mysql1", "mysql2" {
  include mysql::server
}
Declaratively Adds Infrastructure to Nodes
56. Security
Puppetmasterd provides a form of PKI for
deployments
• Clients are authenticated via keypairs
• Can act as a Self-Signed Certificate Authority or
use a registered certificate
The current encrypted XML-RPC transport is being
transitioned to RESTful HTTP in a future
release
57. Inventory and Drift Control
Puppet includes Facter, a system inventory
tool
• Returns facts about nodes
e.g. hostnames, kernel, IP addresses, etc.
• Facts can then be used in Puppet configurations
• Detects changes and updates information
58. Elastra Cloud Services
Today
• Load Balanced, Clustered & Recoverable MySQL,
PgCluster, and Apache Tomcat 5.5
• Turn-Key Deployment on Amazon EC2
• Private beta support for VMware or Eucalyptus
In Early 2009
• Elastra Cloud Suite v2.0
Enterprise cloud server for IT services management
• Open Cloud Services
Resource Provisioning API
Configuration Management API
Administrative Tools & Utilities
59. Elastra Design & Deploy Lifecycle
Diagram: a “Wire Funds” example
• Components shown: Wire Funds Web App, Msg Bus, Acct Wire Svc, Process, DB
• Technologies shown: Tomcat V 5.5, Mule ESB 1.6, WLS, Lombardi, MySQL
(further version labels: 10.1, 6)
Diagram labels: Application Design (ECML), Deployment Design (EDML),
Markup, Iterative Design GUI, Desired-State management interface,
For Each Role
60. Helping Drive a Collaborative IT Process
Diagram: Business Unit A, IT, and Business Unit B
• Business Unit A: Mortgage Application, Application Architect, ECML and EDML designs
• Business Unit B: Private Banking Application, Application Architect, ECML and EDML designs
• IT: Systems Architects and System Admins provide Reuse Mechanisms,
Standards, Best Practices, and Standard Infrastructure Images
• App A and App B each receive Configured Infrastructure Parameters and an Instance
• Business Unit Architects Focus On Business Logic
61. Lifecycle-Managed Architectures
Diagram components:
• PgCluster Load Balancer
• Load Balancing Connector
• PgCluster Data Component (two shown)
• Replication Connector
• PgCluster Replication Component
Attached policies:
• Scalability Policy
• Resource Allocation Strategy
• Monitoring Policy
63. Recurring Topics and Patterns
Some design decisions and tradeoffs are
continually associated with the cloud
Some designs are due to fundamentals
• e.g. CAP Tradeoffs
(Consistency, Availability, Partition Tolerance)
Others are due to out-of-date software
• Assuming a single machine
• ...On a local area network
• ...With reliable nodes
64. Availability > Consistency
Increasingly common way of handling higher
loads
Locks & distributed transactions reduce
availability
• If my data is locked, it’s not available!
A variety of techniques enable this
• Caching everywhere (e.g. Memcached, Gigaspaces)
• Distributed Replication (e.g. MySQL slaves)
• Compensating transactions
65. Stateless Web / Application Servers
What?
Servers do not maintain state between requests
(pushed to database or client)
Why?
• Scalability - smaller working set to manage;
session replication becomes hard at scale
• Reliability - easier to recover when there is no
conversational state
• Support - EC2 doesn’t support multicast for
session replication
Danger: Most enterprise web application
development still makes heavy use of sessions
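One common way to push conversational state to the client is a signed token that any server can verify without shared session storage. The sketch below is an illustration of that idea, not something the slides prescribe: the function names and the HMAC-SHA256 choice are assumptions, and the state is signed but not encrypted, so it must not hold secrets.

```python
import base64
import hashlib
import hmac
import json


def encode_state(state: dict, secret: bytes) -> str:
    """Serialize state and append an HMAC so the client cannot alter it."""
    payload = base64.urlsafe_b64encode(
        json.dumps(state, sort_keys=True).encode("utf-8")
    ).decode("ascii")
    signature = hmac.new(secret, payload.encode("ascii"), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"


def decode_state(token: str, secret: bytes) -> dict:
    """Verify the signature and return the state; raise ValueError if tampered."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(secret, payload.encode("ascii"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("session token failed verification")
    return json.loads(base64.urlsafe_b64decode(payload.encode("ascii")))
```

Because every server holding the secret can validate the token, a request can land on any instance, and instances can be added or terminated without session replication.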
66. Partitioned Databases
What?
• Partitioning, also known as Sharding, or (loosely)
Shared-Nothing, spreads the load across multiple
instances by having each manage a subset of data
Why?
• Scale-up breaks down fairly quickly when dealing with
spikes; scale out becomes the viable option
• Shared-disk databases tend to be commercial and require
high-end SANs
Danger:
• Cross-partition communication is very slow - must have
good data locality or heavily denormalize
• Doesn’t help scale “hot” write-intensive data!
• Quite unfamiliar to enterprises used to large-SMP Oracle
databases
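A minimal sketch of routing by partition key, assuming a simple hash-modulo scheme. This is one of several possible schemes (range-based and directory-based partitioning are alternatives); the function name and the use of MD5 as a stable, non-cryptographic hash are illustrative choices.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a partition key (e.g. a customer ID) to a shard index."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # Use a stable hash: Python's built-in hash() varies between processes.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

All rows sharing a key land on the same shard, which gives the data locality the slide asks for. The modulo step also shows one cost of the approach: changing `shard_count` remaps most keys, so growing the cluster means moving data.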
67. Stateless Workers
The most common case for elastic
scalability
• e.g. Animoto’s 50 -> 3600 -> 100 servers
Appropriate for computationally intensive
processing
Though much of Enterprise IT’s processing
needs are I/O-bound, not CPU-bound
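For stateless workers, the scaling decision can be as simple as sizing the pool to the backlog. The sketch below assumes a work queue whose depth is known and a rough per-worker throughput; the function name, parameters, and limits are hypothetical, not taken from any provider's API.

```python
import math


def desired_workers(queued_jobs: int, jobs_per_worker: int,
                    min_workers: int = 1, max_workers: int = 100) -> int:
    """Return how many workers to run for the current backlog."""
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    needed = math.ceil(queued_jobs / jobs_per_worker)
    # Clamp so the pool never drops below a floor or exceeds a budget cap.
    return max(min_workers, min(max_workers, needed))
```

Since workers hold no state, shrinking the pool is as safe as growing it: surplus instances can be terminated once the queue drains.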
68. Federated Identity
From Lookup to Assertions
SAML, WS-Federation, OAuth
Diagram labels: Public Cloud, Private Cloud, Other Cloud,
Identity, Delegated Identity
Major Feature of Windows Azure
69. Auto-Scale, Monitoring and Diagnosis
The Journey of Monitoring
• From Log Management & Search
• ... to Aggregation and Statistics
• ... to Event Correlation
• ... to Complex Event Analysis
How much depth you need depends on how
predictable or unique your application
design is!
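The event correlation stage can be sketched as pairing each symptom with candidate causes that occurred shortly before it. This is a deliberately naive time-window approach with hypothetical names and event data; real correlation engines also use topology and dependency metadata.

```python
from typing import List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, message)


def correlate(causes: List[Event], effects: List[Event],
              window_seconds: float) -> List[Tuple[str, str]]:
    """Pair each effect with every cause seen within the preceding window."""
    pairs = []
    for effect_time, effect_msg in effects:
        for cause_time, cause_msg in causes:
            # A candidate cause must precede the effect, but not by too long.
            if 0 <= effect_time - cause_time <= window_seconds:
                pairs.append((cause_msg, effect_msg))
    return pairs
```

For instance, an out-of-memory error logged at t=100 and a failed transaction at t=130 are paired under a 60-second window, while a failure at t=900 is left unexplained.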
70. Auto-Scale, Monitoring and Diagnosis
Diagram layers: Application Design, Deployment Design,
Configured Software Infrastructure, Monitoring Service,
Cloud Deployment, Monitoring Service, Virtualization Layer
Steps:
1. Aggregating Monitoring Data
2. Log Mining
3. Correlating Events for Diagnosis
Example: Failed Wire Transfers (effect) traced to
Out of Memory Errors (cause)
71. Conclusion
Cloud Computing comes in many shapes and sizes
• From Infrastructure
• ...to Middleware
• ...to Entire Platforms
Reduces Lead Time to Deploy Systems
• With varying degrees of visibility
The full impact on IT Management & Operations is
still unknown
• Chances are it won’t eliminate what we do today
Cloud architectures promote what were secondary
problems to a higher status (e.g. integration, security)
72. Thank You
Stuart Charlton
Chief Software Architect, Elastra