A First-Hand Look at What's New in HDP 2.3

© Hortonworks Inc. 2015
Tim E. Hall
VP, Product Management
Hortonworks
June 2015

Empowering More Organizations to
Drive Transformational Outcomes
Introducing Hortonworks® Data Platform 2.3

Retailer builds 360° view of its customers
Challenges
• Cost: Data silos led to duplicate storage expenses
• Customer: Data fragmentation (with as many as 15 different records
on the same customer) harmed service quality
• Supply chain: Mismatch between inventory and store-specific
demand led to inefficient carrying costs
Results
• Cost: Data offload and consolidation saved millions
• Customer: Single view of customer personalized promotions
• Supply chain: A single view fed by 12 legacy systems improved
visibility and streamlined inventory management
• Pricing: Optimization added $80 million in top-line revenue

Security company protects its customers from intrusions
Challenges
• Cost: Redundant storage systems cost many millions annually,
data retention limited to no more than two years
• Multi-tenancy: Unable to support simultaneous users with ad
hoc, data science and predictive analytics tasks
• Speed: Latencies created lags that attackers could exploit
Results
• Cost: Millions saved through elimination of redundant platforms
• Multi-tenancy: Concurrent jobs run in a private cloud
• Ingest: 105 million log events per minute
• Processing time: Time reduced from four hours to two seconds
• High availability: Zero downtime across rolling upgrades
“The recent transformation of
business and consumer
technologies has driven
pervasive mobility and an
explosion of data resulting in the
need for a new approach to
protecting devices, applications,
data and users.”
Company’s 2014 annual report

New Capabilities in Hortonworks Data Platform 2.3
Breakthrough User
Experience
Dramatic Improvement in the User Experience
HDP 2.3 eliminates much of the complexity of administering
Hadoop and improves developer productivity.
Enhanced Security
and Governance
Enhanced Security and Data Governance
HDP 2.3 delivers new encryption of data at rest, and
extends the data governance initiative with Apache™ Atlas.
Proactive Support
Extending the Value of a Hortonworks Subscription
Hortonworks® SmartSense™ adds proactive cluster monitoring,
enhancing Hortonworks’ award-winning support in key areas.
Apache is a trademark of the Apache Software Foundation.

New Capabilities in HDP 2.3
Breakthrough User
Experience
Enhanced Security
and Governance
Proactive Support

Ambari Views Framework
Goal: enable the delivery of custom UI experiences in Ambari Web
Developers can extend the Ambari Web interface
• Views expose custom UI features for Hadoop Services
Ambari Admins can entitle Views to Ambari Web users
• Entitlements framework for controlling access to Views

Views Framework vs. Views
• Views Framework: Core to Ambari
• Views: Built by Hortonworks, Community, Partners

View Components
• Serve client-side assets (such as HTML + JavaScript)
• Expose server-side resources (such as REST endpoints)
[Diagram: a VIEW's client-side assets (.js, html) sit in AMBARI WEB; its server-side resources (java) sit in AMBARI SERVER and connect via {rest} to Hadoop and other systems]

View Delivery
1. Develop the View (just like you would for a Web App)
2. Package as a View (basically a WAR)
3. Deploy the View into Ambari
4. Ambari Admins create + configure view instance(s) and give
access to users + groups
Develop → Package → Deploy → Create Instance(s)

Versions and Instances
• Deploy multiple versions and create multiple instances of a view
• Manage accessibility and usage
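
The relationship between deployed view versions, view instances, and user or group entitlements can be pictured with a small data model. This is only an illustrative sketch: the class names (`ViewRegistry`, `ViewVersion`, `ViewInstance`) and their methods are hypothetical and are not the Ambari API.

```python
from dataclasses import dataclass, field


@dataclass
class ViewInstance:
    """One configured instance of a view (hypothetical model, not Ambari's API)."""
    name: str
    allowed_users: set = field(default_factory=set)
    allowed_groups: set = field(default_factory=set)

    def can_use(self, user, groups=()):
        # Access is granted directly to the user or through any of their groups.
        return user in self.allowed_users or bool(self.allowed_groups & set(groups))


@dataclass
class ViewVersion:
    """One deployed version of a view; it may back several instances."""
    view_name: str
    version: str
    instances: dict = field(default_factory=dict)

    def create_instance(self, name):
        if name in self.instances:
            raise ValueError(f"instance {name!r} already exists")
        instance = ViewInstance(name)
        self.instances[name] = instance
        return instance


class ViewRegistry:
    """Tracks every deployed (view name, version) pair."""

    def __init__(self):
        self._versions = {}

    def deploy(self, view_name, version):
        key = (view_name, version)
        if key in self._versions:
            raise ValueError(f"{view_name} {version} is already deployed")
        self._versions[key] = ViewVersion(view_name, version)
        return self._versions[key]

    def versions_of(self, view_name):
        # Plain string sort; adequate for this illustration only.
        return sorted(v for (n, v) in self._versions if n == view_name)
```

For example, deploying versions 1.0.0 and 1.1.0 of a view side by side, creating an instance on one of them, and adding a group to `allowed_groups` mirrors the "deploy, create instance, give access" flow described above.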

Choice of Deployment Model
• For Hadoop Operators:
Deploy Views in an Ambari Server that is managing a Hadoop cluster
• For Data Workers:
Run Views in a “standalone” Ambari Server
Ambari Server + HADOOP (Store & Process): Operators manage the cluster, may have Views deployed
“Standalone” Ambari Server: Data Workers use the cluster and use a “standalone” Ambari Server for Views

Improved Ease of Use for the Hadoop Operator
Responsibilities include:
• Deploying Hadoop® clusters
• Managing cluster health
• Troubleshooting and resolving issues
Hadoop Operator
Simpler administration speeds
time to value
Easy Setup and Installation
Streamlined configuration experience
Customizable Dashboards
Track cluster health with KPIs and drill downs
Easier Provisioning and Faster Cluster
Formation
Cloudbreak simplifies provisioning. Ambari speeds
cluster formation with automated host discovery.

Hadoop Operator
New guided configurations
ease cluster setup

Ease installation and
configuration for HDFS,
YARN, Hive and HBase
Makes Key Configs Visible
Clearly displays the set of options
Recommends Settings
Suggests optimal ranges
Highlights Dependencies
Lets you visualize any impact on
dependent services
Hadoop Operator

Hadoop Operator
Fully customizable
dashboard shows cluster
KPIs

System Administrator
Hadoop operators can
configure dashboards to
show KPIs
Out-of-box Templates
Based on common best practices
Personalized Experience
Create new display widgets built from
Hadoop metrics. Add or remove
existing widgets.
Reusable and Shareable
Widget library allows other operators
to re-use community widgets

Demo: Operations

Host discovery makes cluster
expansion automatic, fast,
orderly and predictable
Faster
Expand clusters incrementally and automatically
as each new node becomes available
Easier
Pre-plan automatic expansion paths
Flexible for Cloud or On-premises
Discover hosts wherever they are
Ambari
Hadoop Operator
Host Discovery Eases Cluster Formation

Learn More about Ambari
Thursday, 3:10-3:50 – What’s New in Apache Ambari
with Sumit Mohanty & Yusaku Sako

Launch HDP on Leading Cloud Platforms
BI / Analytics
(Hive)
IoT Apps
(Storm, HBase, Hive)
Dev / Test
(all HDP services)
Data Science
(Spark)
Cloudbreak
1. Pick a Blueprint
2. Choose a Cloud
3. Launch HDP!
Example Ambari Blueprints:
IoT Apps, BI / Analytics, Data Science,
Dev / Test
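
An Ambari Blueprint is a JSON document that names the stack and lists the components placed on each host group. The sketch below builds such a document in Python. The blueprint name, host group names, and component selection are illustrative assumptions rather than anything shown in the presentation; a real BI / Analytics blueprint would carry more components and configuration.

```python
import json


def make_blueprint(name, stack_version, host_groups):
    """Build a blueprint dict.

    host_groups maps a group name to (cardinality, [component names]).
    The values used in the usage note below are illustrative assumptions.
    """
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "host_groups": [
            {
                "name": group,
                "cardinality": str(cardinality),
                "components": [{"name": component} for component in components],
            }
            for group, (cardinality, components) in host_groups.items()
        ],
    }


def components_in(blueprint):
    """Return the sorted set of component names used anywhere in the blueprint."""
    return sorted(
        {c["name"] for group in blueprint["host_groups"] for c in group["components"]}
    )


def to_json(blueprint):
    """Serialize the blueprint the way it would be saved to a file."""
    return json.dumps(blueprint, indent=2)
```

A call such as `make_blueprint("bi-analytics", "2.3", {"master": (1, ["NAMENODE", "HIVE_SERVER"]), "worker": (3, ["DATANODE"])})` yields a document that could then be handed to whichever tool provisions the cluster.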

Cloudbreak automates provisioning and
scaling clusters in the cloud
Hadoop Operator

Hadoop Operator
Cloudbreak automates cluster
provisioning and scaling for the cloud in
only 3 steps

Step 1: Choose your cloud provider –
Microsoft Azure, Amazon AWS, Google
Cloud Platform or OpenStack

Step 2: Enter your cloud credentials

Step 3: Pick Your Ambari Blueprint

Cloudbreak provides feedback while
cluster creation is in progress

Turn on auto-scaling and
set SLA policies
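
The slides do not show how a scaling policy is expressed, so the following is only a hypothetical sketch of the general idea behind threshold-based auto-scaling: when a metric crosses a threshold, add nodes, but stay inside configured bounds. The `ScalingPolicy` fields and the metric name are assumptions for illustration, not Cloudbreak's actual policy model.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical scale-up rule; field names are illustrative only."""
    metric: str        # name of the monitored metric
    threshold: float   # scale up when the metric exceeds this value
    adjustment: int    # number of nodes to add when triggered
    min_nodes: int     # lower bound on cluster size
    max_nodes: int     # upper bound on cluster size


def target_node_count(policy, current_nodes, metric_value):
    """Return the node count the cluster should move to under this policy."""
    if metric_value > policy.threshold:
        desired = current_nodes + policy.adjustment
    else:
        desired = current_nodes
    # Clamp the result so the cluster never leaves the configured bounds.
    return max(policy.min_nodes, min(policy.max_nodes, desired))
```

Clamping to `min_nodes` and `max_nodes` keeps a noisy metric from growing the cluster without limit.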

Hadoop Operator
Leverage re-usable
blueprints to provision
HDP in any environment
Public or Private Clouds
Dynamically set up public or private
cloud clusters from the web console
Automated Scaling
Manage elasticity requirements as
cluster demands grow
Choice of Many Clouds
Supports Microsoft Azure, AWS,
Google and OpenStack clouds

Learn More about Cloudbreak
Wednesday, 2:35-3:15 – One-click Hadoop Clusters - anywhere (using Docker)
with Janos Matyas

Preview URL: launch.hortonworks.com
Launch an HDP cluster
with only a few clicks
Easy Setup
With the leading public cloud
platforms: Microsoft Azure, AWS and
Google Cloud
Easy Exploration
Try out the latest features in HDP
Your Data
Use the newest cluster technologies
with your own familiar dataset

Advances for the Developer
Responsibilities include:
• Developing SQL queries
• Developing new Spark applications
• Implementing streaming data analytics
Developer
Develop Hadoop applications with
ease and speed
Visualization of SQL Queries
Streamlined user interface for Apache Hive
Improvements to Apache Spark on YARN
Machine Learning, Data Frame API, New SQL (Preview)
Enterprise Enhancements for Streaming
Fault tolerance, security, and rolling upgrades for
Apache Kafka and Apache Storm

Enhanced SQL Semantics and New SQL User View
The rich developer experience includes enhanced
SQL semantics and a new user interface
Enhanced SQL Semantics
Adds interval types in expressions and UNION support
SQL User View in Ambari
Write, debug and run Hive SQL queries
Performance Improvements
2.5x performance gain
Query Scheduling
Dynamically share resources for Hive queries
[Diagram: HDP architecture layers: Storage; YARN: Data Operating System; Resource Management; Governance; Security; Operations]
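
To make the UNION semantics concrete, the sketch below contrasts `UNION` (duplicates removed) with `UNION ALL` (duplicates kept). It uses Python's built-in SQLite purely as a stand-in so the example runs anywhere; it is not Hive, the table names are invented, and Hive syntax or behavior may differ in detail.

```python
import sqlite3


def union_demo():
    """Return (UNION result, UNION ALL result) for two small example tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE web_orders (customer TEXT);
        CREATE TABLE store_orders (customer TEXT);
        INSERT INTO web_orders VALUES ('alice'), ('bob');
        INSERT INTO store_orders VALUES ('bob'), ('carol');
        """
    )
    # UNION removes duplicate rows across both inputs.
    distinct_rows = [
        row[0]
        for row in conn.execute(
            "SELECT customer FROM web_orders "
            "UNION SELECT customer FROM store_orders ORDER BY customer"
        )
    ]
    # UNION ALL keeps every row, including the repeated customer.
    all_rows = [
        row[0]
        for row in conn.execute(
            "SELECT customer FROM web_orders "
            "UNION ALL SELECT customer FROM store_orders ORDER BY customer"
        )
    ]
    conn.close()
    return distinct_rows, all_rows
```

Here the customer who ordered through both channels appears once in the first result and twice in the second.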

Developer
New user interface enables fast &
easy SQL definition and execution.

New capabilities add dynamic access methods
to feature-rich Spark applications
Data Frame API
Enables common and easy interchange between Spark
components for data imports and exports
Machine Learning
Introduces multiclass classification, clustering,
frequent pattern-mining algorithms
Enterprise-Ready
Consistent operations, comprehensive security,
deployable anywhere
Spark SQL
[Tech Preview] A new module for structured data processing
in Spark
Improvements for Apache Spark on YARN

Stream analysis, scalable across the cluster
Nimbus High Availability
No single point of failure for stream processing job
management
Ease of Deployment
Quickly create stream processing pipelines
Rolling Upgrades
Update Storm to newer versions, with zero downtime
Enhanced Security for Kafka
Authorization via Ranger and authentication via Kerberos
Streaming Analysis Ready for Mainstream Adoption

Demo: Developer

HDP Security: Comprehensive, Complete, Extensible
Security in HDP is the most comprehensive, complete and extensible for Hadoop
Administration
Central management and consistent security
Only HDP delivers a single administrative console to
set policy across the entire cluster
Authentication
Authenticate users and systems
Authentication for perimeter and cluster; integrates with existing
Active Directory and LDAP solutions
Authorization
Provision access to data
Provides consistent authorization controls across all
Apache components within HDP
Audit
Maintain a record of data access
Maintains a record of data access events across all
components that is consistent and accessible
Data Protection
Protect data at rest and in motion
Encrypts data in motion and data at rest; refer to partner
encryption solutions for broader needs

Enhanced Security Capabilities in HDP 2.3
Administration: central management and consistent security
Project: Ranger
• Administer Kafka, Solr and multi-tenant YARN queues
• Support for custom plugins via Ranger and Knox stacks
Authentication: authenticate users and systems
Project: Knox
• Bi-directional SSL support for trust between clients and servers
• LDAP data caching reduces server load, improves performance
Authorization: provision access to data
Project: Ranger
• Authorization for Kafka, Solr and multi-tenant YARN queues
• Hooks for dynamic policy rules (e.g., by geo-location)
Audit: maintain a record of data access
Project: Atlas
• Scalable metadata service
• Hive integration leverages existing metadata
• UI: Hive table lineage and domain-specific search
Data Protection: protect data at rest and in motion
Projects: HDFS, Ranger
• HDFS transparent data encryption (for data at rest)
• Key management store (KMS) that’s robust and highly available

Demo: Security

Learn More about Ranger
Thursday, 3:10-3:50 – Securing Hadoop with Apache Ranger: Strategies and
Best Practices
with Selvamohan Neethiraj & Velmurugan Periasamy

Extending Data Governance to Hadoop
[Diagram labels: ETL / DQ, MDM, Archive; Traditional Data Systems]
Data Governance Requirements
Transparent
Governance standards and
protocols must be clearly defined
and available to all
Reproducible
Recreate the relevant data
landscape at a given point in time
Auditable
Trace all relevant events and assets
with appropriate historical lineage
Consistent
Compliance practices must be
consistent
Hadoop Data
Platform
Must snap into existing
data governance
frameworks and openly
exchange metadata
A group of companies dedicated to
meeting these requirements in the open
[Diagram labels: SCM, CRM, ERP; Holistic Data Governance; Business Analytics; Visualization & Dashboards]

Apache Atlas Is Now Included in HDP
Apache Atlas
Knowledge Store: Models, Type-System, Policy Rules, Taxonomies
Audit Store
Tag-based Policies
Data Lifecycle Management
Real-time Tag-based Access Control
REST API
Services: Search, Lineage, Exchange
Healthcare: HIPAA, HL7
Financial: SOX, Dodd-Frank
Energy: PPDM
Retail: PCI, PII
Other: CWM
Scalable Metadata Service
Agile Centralized Taxonomy – Enterprise/Business unit level
modeling with industry-specific vocabulary
Operational Metadata – Extend visibility into HDFS Path,
Hive DB, table, columns
REST API – Modern, flexible access to Atlas services
Hive Integration
Hive Metadata – Leverage existing metadata with import /
export capability and capture SQL runtime metrics directly
User Interface
Hive Table Lineage and Search DSL – Support for keyword,
faceted and free text searches
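
Table lineage is, at its core, a graph question: given a table, which tables did its data come from? The sketch below answers that with a breadth-first traversal over a plain dictionary. It is a toy model under stated assumptions: the `lineage` mapping and the table names are hypothetical, and nothing here calls or reproduces the Atlas API.

```python
from collections import deque


def upstream_tables(lineage, table):
    """Return every table that feeds `table`, directly or indirectly.

    `lineage` maps an output table to the list of tables it was built from
    (a hypothetical structure used only for this illustration).
    """
    seen = set()
    queue = deque([table])
    while queue:
        current = queue.popleft()
        for source in lineage.get(current, []):
            # Visit each source once so cycles cannot loop forever.
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return sorted(seen)
```

With `lineage = {"sales_summary": ["sales_clean"], "sales_clean": ["sales_raw", "stores"]}`, asking for the upstream tables of `sales_summary` walks back through `sales_clean` to the raw inputs.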

HDP Subscriptions Deliver
Global support coverage, 24x7x365
Hortonworks University self-paced learning
Premier Support: designated support engineer
Influence on the direction of the technology
The Hadoop Industry’s Best Subscription Value
[Chart: Hortonworks Support # tickets across project phases (Architecture & Development, Implementation, Production, Expansion), repeated for Project 2, Project 3 ... Project N]
From Architecture to Expansion
“Hortonworks loves
and lives open-source
innovation”

Hortonworks® SmartSense™ provides
comprehensive visibility into cluster issues
Hadoop Operator

Hortonworks® SmartSense™ makes
tailored recommendations based on
analysis of operational data
Hadoop Operator

Hortonworks® SmartSense™ solicits
feedback from Hadoop Operators to
optimize its recommendations
Hadoop Operator

Hadoop Operator
Hortonworks®
SmartSense™ enhances
the support subscription
Faster Case Resolution
Easily capture log files and metrics for
insight and resolution
Proactive Configuration
Via intelligent stream of cluster
analytics and data-driven
recommendations
Capacity Planning
Through proactive view into
customer’s cluster utilization

Hortonworks® SmartSense™ Resolves Issues Proactively
Integrated Customer Portal
Knowledge Base
On-Demand
Training
Customer Environment
• Any cloud
• Hybrid environment
• Multi-tenant
“5 out of 5” Enterprise Hadoop Support
Connection to the customer’s environment
via telephone or web support

In Summary: New in HDP
Breakthrough User
Experience
Enhanced Security
and Governance
Proactive Support
HDP 2.3 is a Major Step Forward for
Open Enterprise Hadoop®

Thank You. Questions?
