How DCIM Can Help Improve Availability & Reduce Costs

1
High Availability Mantra:
How DCIM Can Help

2
Today’s Topics
• High Availability Mantra Revisited
• Anatomy of a DCIM Software: GFS Crane
• How GFS Crane DCIM Delivers Higher Availability
• How GFS Crane DCIM Helps to Reduce Costs
• GFS Crane DCIM Case Studies

3
The High Availability Mantra Revisited
Amazon Data Centers (built to Tier 4 standards and with an expected availability of 99.995%) had two
outages in 2012 – each over 3 hours!
• Tier 3/Tier 4 ratings are defined only by hardware redundancy
• Glaring gaps in operating procedures to prevent fatal human errors
• Lack of purpose-built BCP software to predict failures
• Lack of chain of custody to detect root cause
Availability %              Downtime per year   Downtime per month*   Downtime per week
99% ("two nines")           3.65 days           7.20 hours            1.68 hours
99.5%                       1.83 days           3.60 hours            50.4 minutes
99.8%                       17.52 hours         86.23 minutes         20.16 minutes
99.9% ("three nines")       8.76 hours          43.8 minutes          10.1 minutes
99.95%                      4.38 hours          21.56 minutes         5.04 minutes
99.99% ("four nines")       52.56 minutes       4.32 minutes          1.01 minutes
99.999% ("five nines")      5.26 minutes        25.9 seconds          6.05 seconds
99.9999% ("six nines")      31.5 seconds        2.59 seconds          0.605 seconds
99.99999% ("seven nines")   3.15 seconds        0.259 seconds         0.0605 seconds
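The yearly and weekly figures in the table follow from simple arithmetic: allowed downtime is the unavailable fraction multiplied by the length of the period. A minimal Python sketch, assuming a 365-day year; the function and constant names are illustrative and not part of any product:

```python
HOURS_PER_YEAR = 365 * 24   # assumption: non-leap year
HOURS_PER_WEEK = 7 * 24


def downtime_hours(availability_pct: float, period_hours: float) -> float:
    """Hours of allowed downtime in a period at the given availability."""
    return (1 - availability_pct / 100) * period_hours


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        yearly = downtime_hours(pct, HOURS_PER_YEAR)
        weekly = downtime_hours(pct, HOURS_PER_WEEK)
        print(f"{pct}%: {yearly:.2f} h/year, {weekly * 60:.2f} min/week")
```

For 99.9% this gives 8.76 hours per year, matching the "three nines" row.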

4
Did You Know?
90% of DC Failures Are From Common Preventable Causes

5
Did You Know?
Average Downtime of an Online System: 36 hours per year.
That’s only 99.6% Uptime
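The uptime figure can be checked from the downtime. Assuming a 365-day year (8,760 hours), 36 hours of downtime corresponds to roughly 99.59% availability, which rounds to the 99.6% quoted. A short sketch, with an illustrative function name:

```python
def availability_pct(downtime_hours: float, period_hours: float = 365 * 24) -> float:
    """Availability percentage implied by a given amount of downtime."""
    return 100 * (1 - downtime_hours / period_hours)


print(round(availability_pct(36), 2))  # 99.59
```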

6
Did You Know?
75% of Businesses Without a BC Plan Fail Within 3 Years after a Major
Disruption in their IT Systems

7
Anatomy of a DCIM Software: GFS Crane

8
Improves Availability: Predictability, Visibility & Change Tracking
• Advanced alarm management and analytics help predict failures, shorten turnaround time, and improve availability and SLA compliance
• Consolidating alarms from different facilities enables centralized monitoring
• Improved visibility of the power chain and of the relationships among critical infrastructure components supports better impact analysis of device malfunctions or failures, and root cause analysis (RCA)
• Change tracking in the data center environment supports impact analysis of any change and root cause analysis of any outage caused by a change
Predictive Analytics
Visibility from Power Chain
Change Tracking

9
Improves Availability: Predictability from Proactive Alarms
Proactive real-time alarms
• Alarms on power, PUE and environmental conditions such as temperature, humidity, smoke, fire, water leak detection (WLD), door-open and motion
• Alarms can be sent by e-mail and SMS
Alarm Dashboard
• Alarms from multiple data centers are consolidated on a single dashboard
• Alarms can be analyzed by severity, type, source, duration, etc.
Advanced alarm management helps predict failures, shorten turnaround time, and improve availability and SLA compliance
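As an illustration of the alarm flow described above, the sketch below classifies sensor readings against thresholds and groups the resulting alarms from several sites by severity. It is not the GFS Crane implementation: the metric names, threshold values, and function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threshold:
    warning: float
    critical: float


# Hypothetical limits for illustration only; real limits are site-specific.
THRESHOLDS = {
    "temperature_c": Threshold(warning=27.0, critical=32.0),
    "humidity_pct": Threshold(warning=60.0, critical=80.0),
}


def evaluate(metric: str, value: float) -> Optional[str]:
    """Return 'critical', 'warning', or None for a sensor reading."""
    limit = THRESHOLDS.get(metric)
    if limit is None:
        return None  # no rule configured for this metric
    if value >= limit.critical:
        return "critical"
    if value >= limit.warning:
        return "warning"
    return None


def consolidate(readings):
    """Group alarms from several sites by severity for one dashboard view."""
    dashboard = {"critical": [], "warning": []}
    for site, metric, value in readings:
        severity = evaluate(metric, value)
        if severity:
            dashboard[severity].append((site, metric, value))
    return dashboard
```

Each reading is a `(site, metric, value)` tuple, so alarms from multiple data centers land in the same structure and can then be filtered by site or metric.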

10
Improves Availability: Visibility from Power Chain
Maps relationships among critical components of the electrical infrastructure
• Creates a power chain for the electrical infrastructure
• Maps asset relationships and redundancies, from the power source through to customers and applications
Asset Relationship Mapping
Improved visibility of the power chain and of the relationships among critical infrastructure components supports better impact analysis of device malfunctions or failures, and root cause analysis
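One way to picture impact analysis over a power chain is as a walk through a dependency graph in which an asset goes down only when every feed supplying it is down. The sketch below assumes a small, made-up topology; the asset names and the `FEEDS` structure are hypothetical and not taken from the product.

```python
# Hypothetical power chain: each asset lists the upstream assets feeding it.
# An asset with two feeds (A and B side) is redundant.
FEEDS = {
    "UPS-A": ["Utility"],
    "UPS-B": ["Utility"],
    "PDU-A": ["UPS-A"],
    "PDU-B": ["UPS-B"],
    "Rack-1": ["PDU-A", "PDU-B"],  # dual-fed rack
    "Rack-2": ["PDU-A"],           # single-fed rack
}


def impacted_assets(failed: str) -> set:
    """Return assets that lose all power feeds if `failed` goes down."""
    down = {failed}
    changed = True
    while changed:  # repeat until no new asset is found to be down
        changed = False
        for asset, upstream in FEEDS.items():
            if asset not in down and all(u in down for u in upstream):
                down.add(asset)
                changed = True
    return down - {failed}
```

In this topology a failure of `UPS-A` takes down `PDU-A` and the single-fed `Rack-2`, while the dual-fed `Rack-1` stays up on its B-side feed.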

11
Improves Availability: Change Tracking
• Maintains an audit trail of all Installation/Move/Add/Change activity in the data center
• Integration with an existing ITSM tool allows tracked changes to be routed through a workflow system for change approvals
Audit Trail of DC Configuration Changes
Tracking changes in the data center environment supports impact analysis of any change and root cause analysis of any outage caused by a change
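To show how an audit trail can support root cause analysis, the sketch below filters recorded changes down to those made to the affected assets shortly before an outage. The record fields, the 24-hour look-back window, and the function name are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeRecord:
    timestamp: datetime
    asset: str
    action: str   # e.g. "install", "move", "add", "change"
    user: str


def changes_before(audit_trail, outage_time, affected_assets, window_hours=24):
    """Return changes to affected assets made shortly before an outage."""
    start = outage_time - timedelta(hours=window_hours)
    return sorted(
        (c for c in audit_trail
         if c.asset in affected_assets and start <= c.timestamp <= outage_time),
        key=lambda c: c.timestamp,
        reverse=True,  # most recent change first
    )
```

The most recent change to an affected asset is listed first, since it is usually the first candidate to review.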

12
Reduces Cost: Capex & Opex
Reduces Capex
• Better visibility helps discover under-utilized computing capacity, which defers capex purchases
• Better visibility helps avoid stranded rack space and power capacity, maximizing utilization of available capacity
Reduces Opex
• Better monitoring and analytics reduce operating cost on power
• Automation of processes such as asset tracking, provisioning and monitoring improves productivity
• Rationalizing the asset base lowers maintenance costs such as equipment annual maintenance contracts (AMC)

13
Reduces CapEx: Monitoring IT Utilization
Visibility of hidden compute capacity
• Calculates the average utilization of all computing devices in the data center
• Identifies unused compute capacity
Under-utilized servers can be repurposed
• ‘Repurpose Candidates’ are identified from power consumption and utilization patterns, hardware specs and age, which helps defer new server hardware purchases
Hidden Computing Capacity
Repurpose Hardware
Discovery of hidden compute capacity defers capital investment in new server hardware and software licenses
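A simplified sketch of the screening idea: compute average utilization across servers and flag lightly used, reasonably young machines as repurpose candidates. It uses only CPU utilization and age, whereas the slide also mentions power consumption and hardware specs; the cut-off values and all names are illustrative assumptions, not product defaults.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Server:
    name: str
    cpu_utilization_pct: list  # sampled utilization readings
    age_years: float


def average_utilization(servers):
    """Average of each server's mean CPU utilization, in percent."""
    return mean(mean(s.cpu_utilization_pct) for s in servers)


def repurpose_candidates(servers, max_util_pct=10.0, max_age_years=4.0):
    """Flag lightly used servers that are still young enough to reuse.

    Both cut-offs are illustrative assumptions, not product defaults.
    """
    return [
        s.name for s in servers
        if mean(s.cpu_utilization_pct) < max_util_pct and s.age_years <= max_age_years
    ]
```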

14
Reduces Capex: Minimizing Stranded Capacities
Visibility of consumed power against maximum capacity in a rack
• Provides real-time information on the actual IT load in a rack
• Provides maximum power capacity
• Provides available power capacity
Visibility of occupied rack space against maximum available space
• Provides real-time information on occupied space in the rack, in rack units (RU)
• Provides maximum space capacity
• Provides available space capacity
Hidden Power Capacity
Hidden Space Capacity
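Available capacity is the maximum minus what is in use, and capacity becomes stranded when one resource runs out while the other is still free. A minimal sketch under those assumptions; the `Rack` fields, the minimum-deployment thresholds, and the function name are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Rack:
    name: str
    max_power_kw: float
    actual_load_kw: float
    total_ru: int
    occupied_ru: int

    @property
    def available_power_kw(self) -> float:
        return self.max_power_kw - self.actual_load_kw

    @property
    def available_ru(self) -> int:
        return self.total_ru - self.occupied_ru


def stranded(rack: Rack, min_power_kw: float = 0.5, min_ru: int = 2) -> str:
    """Report which resource is stranded because the other has run out.

    The minimums are illustrative assumptions about the smallest useful
    deployment, not values from the source.
    """
    power_left = rack.available_power_kw >= min_power_kw
    space_left = rack.available_ru >= min_ru
    if power_left and not space_left:
        return "power stranded (no space left)"
    if space_left and not power_left:
        return "space stranded (no power left)"
    return "none"
```

For example, a 42 RU rack with 22 RU free but only 0.2 kW of power headroom has stranded space: the empty slots cannot be used without more power.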

15
Reduces OpEx: Power Costs
Multi-level PUE Comparison
• Compares PUE calculated at multiple levels and identifies power distribution losses that can be rectified to improve efficiency and reduce OpEx on power
Detect Power Distribution Loss
• L1 PUE: UPS output
• L2 PUE: PDU output
• L3 PUE: Device-level reading
Detecting power distribution losses in the electrical infrastructure helps improve the energy efficiency of the data center and reduce operating cost on power
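PUE is total facility power divided by IT power, so measuring the IT load further down the chain (UPS output, then PDU output, then the device itself) yields a slightly higher PUE at each level, and the differences between measurement points are the distribution losses. A sketch of that comparison, with illustrative function names and made-up readings in the usage note:

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw


def multi_level_pue(total_kw, ups_output_kw, pdu_output_kw, device_kw):
    """PUE at three measurement points plus the losses between them."""
    return {
        "L1": pue(total_kw, ups_output_kw),
        "L2": pue(total_kw, pdu_output_kw),
        "L3": pue(total_kw, device_kw),
        "loss_ups_to_pdu_kw": ups_output_kw - pdu_output_kw,
        "loss_pdu_to_device_kw": pdu_output_kw - device_kw,
    }
```

With hypothetical readings of 200 kW total, 100 kW at UPS output, 96 kW at PDU output and 92 kW at the devices, L1 PUE is 2.0 and each stage of distribution loses 4 kW.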

16
Reduces Opex: Process Automation & Improved Productivity
Automated discovery and inventory of both IT and infrastructure assets
• Intelligent assets are automatically discovered using SNMP/IPMI
• A Manufacturer Repository contains information on static attributes of assets
• Asset data can be imported from spreadsheets or an asset management tool
• Single management console to manage IT and non-IT assets
• Maintenance management for assets uses plug-ins that send scheduler-based proactive alerts
• Workflow-based auto-provisioning improves speed and reduces errors
Advanced Asset Management

17
Reduces Opex: Asset Rationalization
Asset Rationalization
• The Asset Management module tracks and maintains an inventory of all assets (IT and non-IT) in the data center
• Helps identify legacy servers and replacement candidates
• Reduces AMC and space rental costs
Diagram labels: Asset Rationalization; Server Virtualization; Capacity Planning; Data Center Consolidation; GFS Crane DC DCIM; Legacy Data Center; Server & Rack Consolidation; Multiple Data Centers

18
How GFS Crane DCIM Helps
• Helps the data center manager avoid unnecessary over-provisioning
• Helps plan investments and new capacity
• Helps reduce capital costs
• Helps reduce power use and other operating costs
• Helps reduce the risk of failures through critical alerts
• Helps adapt to technical and business change more easily
• Supports improvement plans through real-time metrics and dashboards

19
GFS Crane DCIM Case Study 1: Financial Services
Industry: Project Financing & Mutual Funds
Data Center Location: India
Data Center Details: Tier III certified by 451 Research; energy-efficient ‘green’ data center certified by TÜV Rheinland
DCIM Implementation Date: January 2012
Business requirements driving DCIM implementation:
• Improve energy efficiency through better energy management
• Comply with Green Grid recommendations and adopt best practices in data center operations
• Improve data center availability and meet business SLA through better monitoring, failure prediction and faster turnaround time
Integration Touch Points:
• Power Systems: LT transformer panels, UPS, PDUs and distribution panels, busbar panels, multifunction energy meters
• Environmental Systems: PAC units, temperature and humidity probes
• Servers, network devices, storage devices
• Siemens Building Management System

20
GFS Crane DCIM Case Study 2: Telecom
Industry: Mobile Operator
Data Center Location: South Asia
Data Center Details: Multiple data centers spread across 4 locations, covering 8,500 sq. ft. of whitespace and housing 320 racks
DCIM Implementation Date: Ongoing
Business requirements driving DCIM implementation:
• Improve data center efficiency through better energy management
• Improve operational efficiency through better asset management, capacity planning and converged infrastructure monitoring capability
• Improve data center availability and meet business SLA through better monitoring, failure prediction and faster turnaround time
Integration Touch Points:
• Power Systems: LT transformer panels, UPS, AC & DC PDUs and distribution panels, busbar panels, multifunction energy meters
• Environmental Systems: PAC units, temperature and humidity probes
• Diesel generator, flow and level sensors
• IBM Netcool (ITSM), VESDA, ACS and IP surveillance

21
Thank You
http://www.greenfieldsoft.com
Email: sales@greenfieldsoft.com
See the other two presentations in this series:
- The Modern Data Center Topology: The High Availability Mantra
- Data Center Infrastructure Management: ERP for the Data Center Manager
