About Me
• Business Intelligence Consultant, in IT for 30 years
• Microsoft, Big Data Evangelist
• Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer
• Been a permanent employee, contractor, consultant, business owner
• Presenter at PASS Business Analytics Conference and PASS Summit
• MCSE: Data Platform and Business Intelligence
• MS: Architecting Microsoft Azure Solutions
• Blog at JamesSerra.com
• Former SQL Server MVP
• Author of book “Reporting with Microsoft SQL Server 2012”
Agenda
• Collect + Manage
• Transform + Analyze
• Visual + Decide
• Access Methods
• Product Groupings
• Modern Data Warehouse
• Sample architectures
The Microsoft Data Platform
Data
• Collect + manage: Relational, Non-relational, NoSQL, Streaming, Internal & external
• Transform + analyze: Orchestration, Information management, Complex event processing, Modeling, Machine learning
• Visualize + decide: Reports, Dashboards, Mobile, Natural language query, Applications
Collect + manage
Collect and manage diverse data types with breakthrough speed
• Secure, reliable performance
• Increase speed across all your data workloads
• Capture any data: structured, unstructured, and streaming
• Scale your platform quickly to meet changing demands
Who manages what?
• On Premises (Physical / Virtual): you scale, make resilient, and manage everything: networking, storage, servers, virtualization, O/S, middleware, runtime, data, and applications
• Infrastructure as a Service (Windows Azure Virtual Machines): networking, storage, servers, and virtualization are managed by Microsoft; you scale, make resilient, and manage the O/S, middleware, runtime, data, and applications
• Platform as a Service (Windows Azure Cloud Services): scale, resilience, and management by Microsoft; you manage the applications and data
• Software as a Service: scale, resilience, and management by Microsoft across the whole stack
SQL Server options
• Azure SQL Database has a max database size of 4 TB; Managed Instance has a max of 35 TB
• Potential total volume size of up to 64 TB (256 TB soon)
Benefits of the cloud
Agility
• Unlimited elastic scale
• Pay for what you need
Innovation
• Quick “Time to market”
• Fail fast
Risk
• Availability
• Reliability
• Security
Total cost of ownership calculator: https://www.tco.microsoft.com/
Our customer challenges
1. Increasing data volumes
2. Real-time business requests
3. New data sources and types
4. Cloud-born data
Non-Relational Data
Parallelism
MPP - Massively Parallel Processing
• Uses many separate CPUs running in parallel to execute a single program
• Shared Nothing: Each CPU has its own memory and disk (scale-out)
• Segments communicate over a high-speed network between nodes
SMP - Symmetric Multiprocessing
• Multiple CPUs used to complete individual processes simultaneously
• All CPUs share the same memory, disks, and network controllers (scale-up)
• All SQL Server implementations up until now have been SMP
• Mostly, the solution is housed on a shared SAN
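The following minimal Python sketch makes the shared-nothing idea concrete. It is a toy model, not a description of how PDW, APS, or SQL Data Warehouse is actually implemented. Rows are hash-distributed across a hypothetical number of nodes, each node aggregates only its own slice, and a control node merges the small partial results. In an SMP system, a single server would instead scan all rows from shared memory and disk.

```python
from collections import defaultdict

NUM_NODES = 4  # hypothetical number of compute nodes

def distribute(rows, key, num_nodes=NUM_NODES):
    """Assign each row to exactly one node by hashing the distribution column."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[hash(row[key]) % num_nodes].append(row)
    return nodes

def local_aggregate(node_rows):
    """Runs on one node using only that node's own rows (its own memory and disk)."""
    partial = defaultdict(float)
    for row in node_rows:
        partial[row["region"]] += row["amount"]
    return partial

def mpp_sum_by_region(rows):
    nodes = distribute(rows, key="order_id")
    # On real hardware these local aggregations would run in parallel.
    partials = [local_aggregate(n) for n in nodes]
    # The control node merges the partial results sent over the network.
    total = defaultdict(float)
    for partial in partials:
        for region, amount in partial.items():
            total[region] += amount
    return dict(total)

orders = [
    {"order_id": i, "region": "east" if i % 2 else "west", "amount": 10.0}
    for i in range(1, 101)
]
print(mpp_sum_by_region(orders))  # east and west each total 500.0
```

Only the per-node partial sums cross the network, not the detail rows. This is why the distribution column matters so much in MPP designs.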
DW SCALABILITY SPIDER CHART
(Chart axes: Data Volume, Query Data Volume, Query Complexity, Query Freedom, Query Concurrency, Mixed Workload, Data Freshness, Schema Sophistication)
• MPP: multidimensional scalability
• SMP: tunable in one dimension at the cost of other dimensions
The spiderweb depicts important attributes to consider when evaluating Data Warehousing options. Big Data support is the newest dimension.
Microsoft data platform solutions
Product | Category | Description | More Info
SQL Server 2016 | RDBMS | Earned top spot in Gartner’s Operational Database Magic Quadrant. JSON support. Linux TBD | https://www.microsoft.com/en-us/server-cloud/products/sql-server-2016/
SQL Database | RDBMS/DBaaS | Cloud-based service that is provisioned and scaled quickly. Has built-in high availability and disaster recovery. JSON support | https://azure.microsoft.com/en-us/services/sql-database/
SQL Data Warehouse | MPP RDBMS/DBaaS | Cloud-based service that handles relational big data. Provision and scale quickly. Can pause service to reduce cost | https://azure.microsoft.com/en-us/services/sql-data-warehouse/
Analytics Platform System (APS) | MPP RDBMS | Big data analytics appliance for high performance and seamless integration of all your data | https://www.microsoft.com/en-us/server-cloud/products/analytics-platform-system/
Azure Data Lake Store | Hadoop storage | Removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics | https://azure.microsoft.com/en-us/services/data-lake-store/
Azure Data Lake Analytics | On-demand analytics job service/Big Data-as-a-service | Cloud-based service that dynamically provisions resources so you can run queries on exabytes of data. Includes U-SQL, a new big data query language | https://azure.microsoft.com/en-us/services/data-lake-analytics/
HDInsight | PaaS Hadoop compute/Hadoop clusters-as-a-service | A managed Apache Hadoop, Spark, R, HBase, Kafka, and Storm cloud service made easy | https://azure.microsoft.com/en-us/services/hdinsight/
Azure Cosmos DB | PaaS NoSQL: Key-value, Column-family, Document, Graph | Globally distributed, massively scalable, multi-model, multi-API, low latency data service that can be used as an operational database or a hot data lake | https://azure.microsoft.com/en-us/services/cosmos-db/
Azure Table Storage | PaaS NoSQL: Key-value Store | Store large amounts of semi-structured data in the cloud | https://azure.microsoft.com/en-us/services/storage/tables/
Microsoft Big Data Portfolio
(Diagram axes: Sequential, Scale Up, and Scale Out + Across. Quadrant key: Relational vs. Non-relational, On-premises vs. Cloud. Layered across the products: Business intelligence, Machine learning analytics, Insights.)
• SQL Server Stretch
• Azure SQL Database
• SQL Server 2017
• SQL Server 2016 Fast Track
• Azure SQL DW
• ADLS & ADLA
• Cosmos DB
• HDInsight
• Hadoop
• Analytics Platform System
Microsoft has solutions covering and connecting all four quadrants, which is why SQL Server is one of the most utilized databases in the world.
Power of SQL Server 2017 on the platform of your choice
Platforms: Linux, Linux/Windows container, Windows
• Linux distributions including Red Hat Enterprise Linux (RHEL), Ubuntu, and SUSE Linux Enterprise Server (SLES)
• Docker: Windows & Linux containers
• Windows Server / Windows 10
• NEW: Speed query performance without tuning using new Adaptive Query Processing
• NEW: Maintain performance when making app changes with Automatic Plan Correction
Stretch SQL Server into Azure (Stretch DB)
Stretch cold data to Azure with remote query processing
(Diagram: an app queries SQL Server, which holds customer data, product data, and order history. Cold order history rows are stretched to Microsoft Azure and queried remotely.)
Order history (sample rows)
Name | SSN | Date
Jane Doe | cm61ba906fd | 2/28/2005
Jim Gray | ox7ff654ae6d | 3/18/2005
John Smith | i2y36cg776rg | 4/10/2005
Bill Brown | nx290pldo90l | 4/27/2005
Sue Daniels | ypo85ba616rj | 5/12/2005
Sarah Jones | bns51ra806fd | 5/22/2005
Jake Marks | mci12hh906fj | 6/07/2005
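The Stretch idea can be sketched conceptually in Python. This is an illustration only. It assumes a hypothetical `StretchedTable` class and a made-up cutoff date. In SQL Server, Stretch DB is configured on the table with T-SQL, and the remote query processing happens inside the engine, not in application code.

```python
from datetime import date

class StretchedTable:
    """Hypothetical model: hot rows stay local, cold rows move to a remote store."""

    def __init__(self, rows, is_cold):
        self.is_cold = is_cold   # predicate that defines which rows are "cold"
        self.local = list(rows)  # hot rows kept on-premises
        self.remote = []         # stands in for the Azure side

    def migrate(self):
        """Move rows that satisfy the cold predicate to the remote store."""
        self.remote.extend(r for r in self.local if self.is_cold(r))
        self.local = [r for r in self.local if not self.is_cold(r)]

    def query(self, predicate):
        """The app issues one query. It is evaluated on both sides and the results are combined."""
        hot = [r for r in self.local if predicate(r)]
        cold = [r for r in self.remote if predicate(r)]  # "remote query processing"
        return hot + cold

orders = StretchedTable(
    [
        ("Jane Doe", date(2005, 2, 28)),
        ("Jim Gray", date(2005, 3, 18)),
        ("Sarah Jones", date(2005, 5, 22)),
        ("Jake Marks", date(2005, 6, 7)),
    ],
    is_cold=lambda r: r[1] < date(2005, 5, 1),  # hypothetical cutoff
)
orders.migrate()
print(len(orders.local), len(orders.remote))  # 2 2
print(orders.query(lambda r: r[0].startswith("J")))
```

The application never needs to know where a row lives. That transparency is the point of stretching a table rather than archiving it.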
It can handle up to 384 cores and 24 TB of memory. It uses the HPE 3PAR StoreServ 8450 storage array, which consists of 192 SSD drives (480 GB/drive) for a total of 92 TB of disk space.
Options for data warehouse solutions
Balancing flexibility and choice (axes: Time to solution and Price, each from HIGH to LOW)
By yourself
• Installation, configuration, tuning and optimization
• Existing or procured hardware and support; procured software and support
• Offerings: SQL Server 2014/2016, Windows Server 2012 R2/2016, System Center 2012 R2/2016
With a reference architecture
• Installation, configuration, tuning and optimization
• Existing or procured hardware and support (optional, if you have hardware already); procured software and support
• Offerings: Private Cloud Fast Track, Data Warehouse Fast Track, Build or purchase
With an appliance
• Installation, tuning and optimization
• Procured appliance and support
• Offerings: Analytics Platform System
A workload-specific database system design and validation program for Microsoft partners and customers
Hardware system design
• Tight specifications for servers, storage, and networking
• Resource balanced and validated
• Latest-generation servers and storage, including solid-state disks (SSDs)
Database configuration
• Workload-specific
• Database architecture
• SQL Server settings
• Windows Server settings
• Performance guidance
Software
• SQL Server 2016 Enterprise
• Windows Server 2012 R2
(Diagram: processors, networking, servers, and storage running Windows Server 2012 R2 and SQL Server 2016)
https://www.microsoft.com/en-us/cloud-platform/data-warehouse-fast-track
Analytics Platform System (APS) for Big Data
Pre-Built Hardware + Software Appliance
• Co-engineered with HP, Dell, Quanta
• Scale-out, up to 100x performance increase
• Appliance installed in 1-2 days
• Support: Microsoft provides first-call support
• Hardware partner provides onsite break/fix support
Plug and play | Built-in best practices | Save time | On-premises solution
SQL Database Service
A relational database-as-a-service, fully managed by Microsoft.
For cloud-designed apps when near-zero administration and enterprise-grade capabilities are key.
Perfect for organizations looking to dramatically increase the DB:IT ratio.
Enhancements over SQL Server
• Create database in minutes
• HA built in
• DR with a few clicks
• Scale on the fly
• 99.99% SLA
• Point-in-time restore
• Database Advisor (recommendations: index tuning, parameterized queries, schema issues)
• Query performance insight
• Query store
• Auditing and threat detection
SQL Database (DBaaS) deployment options: Managed Instance, Singleton, Elastic Pool
Managed Instance: a flavor of SQL DB that is designed to provide easy app migration to a fully managed PaaS
• Unmatched app compatibility: fully-fledged SQL instance with nearly 100% compatibility with on-premises SQL Server
• Unmatched PaaS capabilities: learns and adapts with customer app
• Favorable business model: competitive, transparent, frictionless
Azure SQL Data Warehouse
A relational data warehouse-as-a-service, fully managed by Microsoft.
Industry’s first elastic cloud data warehouse with enterprise-grade capabilities.
Support your smallest to your largest data storage needs while handling queries up to 100x faster.
Azure Data Lake Store
A hyper-scale repository for Big Data analytics workloads
• Hadoop Distributed File System (HDFS) for the cloud
• No limits to scale
• Store any data in its native format
• Enterprise-grade access control, encryption at rest
• Optimized for analytic workload performance
Azure HDInsight
Hadoop and Spark as a Service on Azure
• Fully-managed Hadoop and Spark for the cloud
• 100% Open Source Hortonworks data platform
• Clusters up and running in minutes
• Managed, monitored and supported by Microsoft with the industry’s best SLA
• Familiar BI tools for analysis, or open source notebooks for interactive data science
• 63% lower TCO than deploying your own Hadoop on-premises*
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
Hortonworks Data Platform (HDP) 2.6
Simply put, Hortonworks ties all the open source products (22 of them) together, and HDP runs under the covers of HDInsight.
Azure Data Lake Analytics
A new distributed analytics service
• Job-as-a-service
• Distributed analytics service built on Apache YARN
• Elastic scale per query lets users focus on business goals, not configuring hardware
• Includes U-SQL, a language that unifies the benefits of SQL with the expressive power of C#
• Integrates with Visual Studio to develop, debug, and tune code faster
• Federated query across Azure data sources
• Enterprise-grade role based access control
Query data where it lives
Easily query data in multiple Azure data stores without moving it to a single store
Benefits
• Avoid moving large amounts of data across the network between stores (federated query/logical data warehouse)
• Single view of data irrespective of physical location
• Minimize data proliferation issues caused by maintaining multiple copies
• Single query language for all data
• Each data store maintains its own sovereignty
• Design choices based on the need
• Push SQL expressions to remote SQL sources
  • Filters, Joins
  • Remote queries: @result = SELECT * FROM EXTERNAL MyDataSource EXECUTE @"SELECT CustName FROM Customers WHERE ID = 1";
  • Federated queries: @result = SELECT CustName FROM EXTERNAL MyDataSource LOCATION "dbo.Customers" WHERE ID == 1;
(Diagram: U-SQL queries from Azure Data Lake Analytics reach Azure Storage Blobs, Azure SQL in VMs, Azure SQL DB, Azure SQL Data Warehouse, and Azure Data Lake Storage.)
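A Python analogy shows why pushing SQL expressions to the source pays off. This sketch is not U-SQL. An in-memory SQLite database stands in for a remote SQL source, and a list of dicts stands in for data in the lake. The filter runs inside the SQL source, so only matching rows come back, and the join happens without copying either store into the other.

```python
import sqlite3

# Stand-in for a remote SQL source (e.g. a customers database).
remote_db = sqlite3.connect(":memory:")
remote_db.execute("CREATE TABLE Customers (ID INTEGER PRIMARY KEY, CustName TEXT)")
remote_db.executemany(
    "INSERT INTO Customers (ID, CustName) VALUES (?, ?)",
    [(1, "Contoso"), (2, "Fabrikam"), (3, "Northwind")],
)

# Stand-in for order data that stays where it is in the lake.
lake_orders = [
    {"cust_id": 1, "total": 250.0},
    {"cust_id": 1, "total": 100.0},
    {"cust_id": 3, "total": 75.0},
]

def remote_query(sql, params=()):
    """Push the SQL expression down: the source evaluates it and returns only the result."""
    return remote_db.execute(sql, params).fetchall()

def federated_totals(customer_id):
    # 1. Filter pushdown: WHERE ID = ? is evaluated by the SQL source itself.
    rows = remote_query("SELECT ID, CustName FROM Customers WHERE ID = ?", (customer_id,))
    # 2. Combine the small remote result with the data that stayed in the lake.
    result = []
    for cust_id, name in rows:
        total = sum(o["total"] for o in lake_orders if o["cust_id"] == cust_id)
        result.append((name, total))
    return result

print(federated_totals(1))  # [('Contoso', 350.0)]
```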
Control vs. ease of use
Big data analytics
• IaaS Hadoop: any Hadoop technology (Azure Marketplace: HDP | CDH | MapR)
• Managed Hadoop: workload optimized, managed clusters (Azure HDInsight)
• Big Data as-a-service: specific apps in a multi-tenant form factor (Azure Data Lake Analytics)
Big data storage
• Azure Data Lake Store
• Azure Storage
Bringing Big Data to everybody
Accelerate the pace of innovation through a state-of-the-art cloud platform
(Chart axes: User Adoption; Cloud Big Data Solution)
Data lake is the center of a big data solution
A storage repository, usually Hadoop, that holds a vast amount of raw data in its native format until it is needed.
• Inexpensively store unlimited data
• Collect all data “just in case”
• Easy integration of differently-structured data
• Store data with no modeling: “Schema on read”
• Complements EDW
• Frees up expensive EDW resources
• Hadoop cluster offers faster ETL processing than SMP solutions
• Quick user access to data
• Data exploration to see if data is valuable before writing ETL and schema for a relational database
• Allows use of Hadoop tools such as ETL and extreme analytics
• Place to land IoT streaming data
• On-line archive for data warehouse data
• Easily scalable
• With Hadoop, high availability built in
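A short Python sketch illustrates “Schema on read” with made-up records and a hypothetical `read_with_schema` helper. Raw JSON lines are kept exactly as they arrived, and a schema is applied only when a consumer reads them. Records with missing or extra fields therefore do not block the load.

```python
import json

# Raw data lands in the lake as-is; no upfront modeling.
raw_lines = [
    '{"device": "d1", "temp": "21.5", "ts": "2017-06-01T10:00:00"}',
    '{"device": "d2", "temp": 19, "humidity": 40}',
    '{"device": "d3"}',
]

def read_with_schema(lines, schema):
    """Apply a schema at read time: project the requested columns and cast types.

    Missing fields become None instead of failing the read."""
    for line in lines:
        record = json.loads(line)
        yield {
            col: (cast(record[col]) if record.get(col) is not None else None)
            for col, cast in schema.items()
        }

# Different consumers can read the same raw lines with different schemas.
temperature_schema = {"device": str, "temp": float}
rows = list(read_with_schema(raw_lines, temperature_schema))
print(rows)
```

With schema on write, the second and third records would have to be fixed or rejected at load time. Here, each reader decides how to interpret them.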
• Descriptive Analytics: What happened?
• Diagnostic Analytics: Why did it happen?
• Predictive Analytics: What will happen?
• Prescriptive Analytics: How can we make it happen?
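Two of these four types are easy to contrast in code. The Python sketch below uses made-up monthly sales. Descriptive analytics summarizes what happened. Predictive analytics fits a simple least-squares trend line and extrapolates one month ahead. A real predictive model would normally be built with a tool such as Azure Machine Learning or R; this is only an illustration.

```python
# Made-up monthly sales for months 0..3.
sales = [100.0, 110.0, 120.0, 130.0]

# Descriptive analytics ("What happened?"): summarize the past.
total = sum(sales)
average = total / len(sales)

# Predictive analytics ("What will happen?"): fit y = a + b*x by least squares,
# then extrapolate to the next month.
xs = range(len(sales))
mean_x = sum(xs) / len(sales)
b = sum((x - mean_x) * (y - average) for x, y in zip(xs, sales)) / sum(
    (x - mean_x) ** 2 for x in xs
)
a = average - b * mean_x
forecast = a + b * len(sales)

print(total, average, forecast)  # 460.0 115.0 140.0
```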
Roles when using both Data Lake and DW
Data Lake/Hadoop (staging and processing environment)
• Batch reporting
• Data refinement/cleaning
• ETL workloads
• Store historical data
• Sandbox for data exploration
• One-time reports
• Data scientist workloads
• Quick results
Data Warehouse/RDBMS (serving and compliance environment)
• Low latency
• High number of users
• Additional security
• Large support for tools
• Easily create reports (Self-service BI)
• A data lake is just a glorified file folder with data files in it – how many end-users can accurately create reports from it?
Azure Cosmos DB
A globally distributed, massively scalable, multi-model database service
• Data models: Key-value, Column-family, Document, Graph
• APIs include Table API and MongoDB API
• Turnkey global distribution
• Elastic scale out of storage & throughput
• Guaranteed low latency at the 99th percentile
• Comprehensive SLAs
• Five well-defined consistency models
Relational Databases vs Non-Relational Databases (NoSQL) vs Hadoop
• RDBMS for enterprise OLTP and ACID compliance, or databases under 5 TB
• NoSQL for scaled OLTP and JSON documents
• Hadoop for big data analytics (OLAP) or Data Lake
(from my presentation “Relational Databases vs Non-Relational Databases”)
Azure Event Hubs
• Publish-subscribe data distribution
• Managed PaaS (Platform as a Service) solution
• Scales with your needs to millions of events per second
• Provides a durable buffer between event publishers and event consumers
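A toy model makes the “durable buffer between publishers and consumers” easier to see. The Python sketch below is not the Azure Event Hubs SDK. It only assumes the general model of partitioned append-only logs plus a read offset tracked per consumer group. A slow consumer, or a second consumer group, can therefore read the same events at its own pace.

```python
from collections import defaultdict

class ToyEventHub:
    """Hypothetical in-memory model of a partitioned event hub (NOT the Azure SDK)."""

    def __init__(self, partitions=2):
        # One append-only log per partition. Events stay in the log after they
        # are read, which is what makes the buffer "durable".
        self.logs = [[] for _ in range(partitions)]
        # Read position per consumer group, per partition.
        self.offsets = defaultdict(lambda: [0] * partitions)

    def publish(self, partition_key, body):
        # Events with the same key always go to the same partition, so their
        # relative order is preserved.
        p = hash(partition_key) % len(self.logs)
        self.logs[p].append(body)

    def receive(self, consumer_group, partition, max_events=10):
        start = self.offsets[consumer_group][partition]
        batch = self.logs[partition][start:start + max_events]
        # "Checkpoint": remember how far this consumer group has read.
        self.offsets[consumer_group][partition] = start + len(batch)
        return batch

hub = ToyEventHub(partitions=2)
for i in range(5):
    hub.publish("sensor-1", {"reading": i})

p = hash("sensor-1") % 2                          # partition holding sensor-1
print(hub.receive("dashboard", p, max_events=3))  # readings 0, 1, 2
print(hub.receive("dashboard", p))                # readings 3, 4
print(len(hub.receive("archive", p)))             # 5: separate consumer group
```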
Azure Stream Analytics
Process real-time data in Azure
• Consumes millions of real-time events from Event Hub collected from devices, sensors, infrastructure, and applications
• Performs time-sensitive analysis using a SQL-like language against multiple real-time streams and reference data
• Outputs to persistent stores, dashboards or back to devices
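A typical time-sensitive analysis is a windowed aggregate. Stream Analytics expresses this in its SQL-like language with windowing functions such as `TumblingWindow`. The Python sketch below only imitates the idea on made-up events with a hypothetical 10-second window, assigning each event to exactly one fixed, non-overlapping window. The [start, end) boundary convention used here is an assumption of the sketch and may differ from the service's.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # hypothetical window size

# (event time in seconds, device id, temperature): made-up data
events = [
    (1, "d1", 20.0), (4, "d1", 22.0), (9, "d2", 30.0),
    (12, "d1", 24.0), (15, "d2", 32.0), (21, "d1", 26.0),
]

def tumbling_avg(stream, size):
    """Average per (window_end, device) over fixed, non-overlapping windows.

    Each event falls into exactly one window [start, start + size)."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for ts, device, value in stream:
        window_end = (ts // size + 1) * size
        acc = sums[(window_end, device)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}

for (window_end, device), avg in tumbling_avg(events, WINDOW_SECONDS).items():
    print(window_end, device, avg)
```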
(Diagram of connected device types: Point of Service devices, self-checkout stations, kiosks, smartphones, slates/tablets, PCs/laptops, servers, digital signs, diagnostic equipment, remote medical monitors, logic controllers, specialized devices, thin clients, handhelds, security, POS terminals, automation devices, vending machines, Kinect, ATM)
• SQL Server on Linux (preview today, GA in mid-2017)
• Red Hat - Microsoft partnership (Nov 2015)
• Microsoft joins Eclipse Foundation (Mar 2016)
• HDInsight PaaS on Linux GA (Sep 2015)
• Azure Marketplace: 60% of all images in Azure Marketplace are based on Linux/OSS
• In partnership with the Linux Foundation, Microsoft releases the Microsoft Certified Solutions Associate (MCSA) Linux on Azure certification
• 493,141,677
• Microsoft Open Source Hub
• Ross Gardler: President, Apache Software Foundation
• Wim Coekaerts: Oracle’s “Mr Linux”
• 1 out of 4 VMs on Azure runs Linux, and that share is getting larger every day
  • 28.9% of all VMs are Linux
  • >50% of new VMs are Linux
Microsoft Products vs Hadoop/OSS Products
Microsoft Product | Hadoop/Open Source Software Product
Office365/Excel | OpenOffice/Calc
DocumentDB | MongoDB, HBase, Cassandra
SQL Database | SQLite, MySQL, PostgreSQL, MariaDB
Azure Data Lake Analytics/YARN | None
Azure VM/IaaS | OpenStack
Blob Storage | HDFS, Ceph (Note: these are distributed file systems, and Blob storage is not distributed)
Azure HBase | Apache HBase (Azure HBase is a service wrapped around Apache HBase), Apache Trafodion
Event Hub | Apache Kafka
Azure Stream Analytics | Apache Storm, Apache Spark, Twitter Heron
Power BI | Apache Zeppelin, Jupyter, Airbnb Caravel, Kibana
HDInsight | Hortonworks (pay), Cloudera (pay), MapR (pay)
Azure ML | Apache Mahout, Apache Spark MLlib
Microsoft R Open | R
SQL Data Warehouse | Apache Hive, Apache Drill, Presto
IoT Hub | Apache NiFi
Azure Data Factory | Apache Falcon, Apache Oozie, Airbnb Airflow
Azure Data Lake Storage/WebHDFS | HDFS Ozone
Azure Analysis Services/SSAS | Apache Kylin, Apache Lens, AtScale (pay)
SQL Server Reporting Services | None
Hadoop Indexes | Jethro Data (pay)
Azure Data Catalog | Apache Atlas
PolyBase | Apache Drill
Azure Search | Apache Solr, Elasticsearch (Azure Search built on ES)
Others | Apache Flink, Apache Ambari, Apache Ranger, Apache Knox
Note: Many of the Hadoop/OSS products are available in Azure
Transform + analyze
Transform and analyze data for anyone to access anywhere
Make sense of disparate data and prepare it for analysis
• Connect, combine, and refine any data
• Create data marts and publish reports
• Build and test predictive models
• Curate and catalog any data
Connect, combine, and refine any data
Integration, Data Quality and Master Data Services
• Rich support for ETL tasks
• Data cleansing and matching
• Manage master data structures
Connect any data and
all volumes in real time
• Social data
• SAP and Dynamics data
• Machine data
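Data cleansing and matching can be sketched in a few lines of Python. This is not how Data Quality Services works internally. It assumes simple normalization (trim, lowercase, collapse whitespace) and uses the standard library's `difflib.SequenceMatcher` similarity ratio with a hypothetical 0.85 threshold to group likely duplicates.

```python
import difflib
import re

def normalize(name):
    """Cleansing: trim, lowercase, and collapse repeated whitespace."""
    return re.sub(r"\s+", " ", name.strip().lower())

def match_duplicates(names, threshold=0.85):
    """Matching: add each name to the first cluster whose representative is at
    least `threshold` similar; otherwise start a new cluster."""
    clusters = []  # list of (representative, [original names])
    for name in names:
        clean = normalize(name)
        for rep, members in clusters:
            if difflib.SequenceMatcher(None, rep, clean).ratio() >= threshold:
                members.append(name)
                break
        else:
            clusters.append((clean, [name]))
    return clusters

customers = ["Jon Smith", "John Smith", "  JOHN  SMITH ", "Jane Doe"]
for rep, members in match_duplicates(customers):
    print(rep, "->", members)
```

The greedy, first-match clustering keeps the sketch short. Production matching usually compares more attributes than a name and lets a data steward review borderline matches.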
SQL Server Analysis Services
Azure Analysis Services
Azure Analysis Services is based on the proven analytics engine that has helped organizations turn complex data into a trusted, single source of truth for years.
• Built for hybrid data: access and model data on-premises, in the cloud, or both
• Interactive visualization: quick, highly interactive self-service data discovery with support for major data visualization tools
• Proven technology: powerful, proven tabular models built from SQL Server 2016 Analysis Services
• Cloud powered: easy to deploy, scale, and manage as a platform-as-a-service solution
SSAS/Azure Analysis Services Cubes
Reasons to report off cubes instead of the data warehouse:
• Semantic layer
• Handle many concurrent users
• Aggregating data for performance
• Multidimensional analysis
• No joins or relationships
• Hierarchies, KPIs
• Security
• Advanced time-calculations
• Slowly Changing Dimensions (SCD)
• Required for some reporting tools
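A small Python sketch illustrates the “aggregating data for performance” and “hierarchies” points. The data is hypothetical, and the approach is far simpler than SSAS storage. Totals are computed once at several levels of a Year → Month hierarchy during processing, so a report query becomes a dictionary lookup rather than a scan with joins over the detail rows.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, month, product, sales_amount)
facts = [
    (2016, 1, "Bikes", 100.0), (2016, 1, "Helmets", 20.0),
    (2016, 2, "Bikes", 150.0), (2017, 1, "Bikes", 200.0),
]

def build_aggregates(rows):
    """Precompute totals at the (year), (year, month), and (year, product) grains."""
    agg = defaultdict(float)
    for year, month, product, amount in rows:
        agg[("year", year)] += amount
        agg[("year_month", year, month)] += amount
        agg[("year_product", year, product)] += amount
    return dict(agg)

cube = build_aggregates(facts)        # done once, at processing time
print(cube[("year", 2016)])           # 270.0
print(cube[("year_month", 2016, 1)])  # 120.0
```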
Use the power of machine learning to predict future trends or behavior
Build and test predictive models
(Diagram: data in the cloud (HDInsight, SQL Server VM, SQL DB, blobs and tables) and local data (desktop files, Excel spreadsheets, other data files on a PC) feed an ML Studio workspace in the Microsoft Azure portal. The finished model is published as a Microsoft Azure Machine Learning API in minutes and consumed by devices, applications, and dashboards. Flow: business problem, modeling, deployment, business value.)
Azure Machine Learning
• Get started with just a browser: requires no provisioning; simply log on to your Azure subscription or try it for free at azure.com/ml
• Experience the power of choice: choose from hundreds of algorithms and packages from R and Python or drop in your own custom code; take advantage of business-tested algorithms from Xbox and Bing
• Deploy solutions in minutes: with the click of a button, deploy the finished model as a web service that can connect to any data, anywhere
• Connect to the world: brand and monetize solutions on our global Machine Learning Marketplace (https://datamarket.azure.com/)
Beyond business intelligence: machine intelligence
• Microsoft Azure Machine Learning Studio: modeling environment
• Microsoft Azure Machine Learning API service: model in production as a web service
• Microsoft Azure Machine Learning Marketplace: APIs and solutions for broad use
Microsoft R: R Open (community) and R Server (commercial), running on Windows, Linux, Hadoop, Teradata, and in SQL Server R Services
Enable enterprise-wide self-service data source registration and discovery
A metadata repository that allows users to register, enrich, understand, discover, and consume data sources
Delivers differentiated value through:
• Data source discovery, rather than data discovery
• Support for data from any source: structured and unstructured, on premises and in the cloud
• Publishing, discovery, and consumption through any tool
• Annotation crowdsourcing: empowering any user to capture and share their knowledge
All this while allowing IT to maintain control and oversight
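Here is a minimal sketch of such a metadata repository, using hypothetical `DataSource` and `Catalog` classes (this is not the Azure Data Catalog API). Users register where a data source lives rather than copying its data, other users enrich it with tags and descriptions, and discovery is a search over that metadata.

```python
from dataclasses import dataclass, field

@dataclass
class DataSource:
    name: str
    location: str  # where the data lives; the data itself is not copied
    kind: str      # e.g. "table", "file", "cube"
    tags: set = field(default_factory=set)
    descriptions: list = field(default_factory=list)  # crowdsourced notes

class Catalog:
    """Hypothetical in-memory metadata repository."""

    def __init__(self):
        self._sources = {}

    def register(self, source):
        self._sources[source.name] = source

    def annotate(self, name, tag=None, description=None):
        # Any user can enrich a registered source with what they know about it.
        src = self._sources[name]
        if tag:
            src.tags.add(tag)
        if description:
            src.descriptions.append(description)

    def search(self, term):
        # Discovery searches the metadata (names, tags, descriptions), not the data.
        term = term.lower()
        return sorted(
            s.name
            for s in self._sources.values()
            if term in s.name.lower()
            or term in {t.lower() for t in s.tags}
            or any(term in d.lower() for d in s.descriptions)
        )

catalog = Catalog()
catalog.register(DataSource("SalesDW.FactSales", "dw01/SalesDW", "table"))
catalog.register(DataSource("weblogs", "lake/raw/weblogs", "file"))
catalog.annotate("weblogs", tag="clickstream", description="Raw IIS logs, hourly")
print(catalog.search("clickstream"))  # ['weblogs']
```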
Azure Data Factory
Orchestrate trusted information production in Azure
• Connect to relational or non-relational data that is on-premises or in the cloud
• Orchestrate data movement & data processing
• Publish to Power BI users as a searchable data view
• Operationalize (schedule, manage, debug) workflows
• Lifecycle management, monitoring
Processing activities: C#, MapReduce, Hive, Pig, Stored Procedures, Azure Machine Learning
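At its core, orchestration means running activities in dependency order. The Python sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) with hypothetical activity names, loosely modeled on the activity types above. It is not the Data Factory pipeline format. A real service adds scheduling, retries, and monitoring on top.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each hypothetical activity maps to the set of activities it depends on.
pipeline = {
    "copy_onprem_sales": set(),
    "copy_web_logs": set(),
    "hive_cleanse_logs": {"copy_web_logs"},
    "stored_proc_load_dw": {"copy_onprem_sales", "hive_cleanse_logs"},
    "ml_batch_score": {"stored_proc_load_dw"},
}

def run(activity):
    # Placeholder: a real orchestrator would launch a copy, Hive, stored
    # procedure, or ML activity here and wait for it to finish.
    print("running", activity)
    return activity

# static_order() yields every activity only after all of its dependencies.
order = [run(activity) for activity in TopologicalSorter(pipeline).static_order()]
```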
Visualize + decide
Visualize data and make decisions quickly using everyday tools
• Discover, explore, and combine any data type or size, regardless of location
• Ask questions of data to visualize, analyze, and forecast
• Make faster decisions, share broadly, and access insights on any device
Power BI Overview
Power BI Platform
• Power BI Desktop: Prepare, Explore, Report, Share
• Power BI Service: Data refresh, Visualizations, Live dashboards, Content packs, Sharing & collaboration, Natural language query, Reports, Datasets
• </> Embed, extend, integrate
Data sources
• Cloud-based SaaS solutions, e.g. Marketo, Salesforce, QuickBooks, Google Analytics, …
• On-premises data, e.g. Analysis Services, SQL Server
• Organizational content packs: corporate data sources or external data services
• Azure services: Azure SQL, Stream Analytics…
• Excel and CSV files: workbook data, flat files
• Power BI Desktop files: data from files, databases, Azure, Online Services, and other sources
Power BI Desktop: Create Power BI Content
Connect to data and build reports for Power BI
Tools Defined
• Front-end (Excel) or Power BI Desktop
• Data shaping and cleanup, self-service ETL (Power Query)
• Data analysis (Power Pivot)
• Visualization and data discovery (Power View, Power Map)
• Dashboarding (Power BI Dashboard)
• Publishing and sharing (Power BI Service)
• Natural language query (Power BI Q&A)
• Mobile (Power BI for Mobile)
• Access on-premises data (Data Management Gateway (DMG), Analysis Services Connector)
• Power BI Service updated bi-weekly, Power BI Desktop updated monthly
SQL Server Reporting Services
www.botframework.com
Microsoft Cognitive Services
Give your apps a human side
Cognitive Services API Collection
Connect live to your on-premises data
Live Query & Scheduled Data Refresh
PolyBase
Query relational and non-relational data with T-SQL
In a preview this year, PolyBase will add support for Teradata, Oracle, SQL Server, MongoDB, and generic ODBC (Spark, Hive, Impala, DB2)
Compared with U-SQL: PolyBase is interactive while U-SQL is batch. PolyBase extends T-SQL onto data via views, while U-SQL natively operates on data, virtualizes access to other SQL data sources (no metadata needed), and supports more formats (JSON) and libraries/UDOs
PolyBase use cases
Cortana Intelligence Suite
Transform data into intelligent action
• Data sources: apps, sensors and devices, data
• Information Management: Event Hubs, Data Catalog, Data Factory
• Big Data Stores: SQL Data Warehouse, Data Lake Store
• Machine Learning and Analytics: HDInsight (Hadoop and Spark), Stream Analytics, Data Lake Analytics, Machine Learning
• Intelligence: Cortana, Bot Framework, Cognitive Services
• Dashboards & Visualizations: Power BI
• Action: people, automated systems, apps, web, mobile, bots
Example overall data flow and Architecture
• Event & data producers: web logs; IoT, mobile devices, etc.; social data
• Ingest and processing: Event Hubs, HDInsight, Azure Data Factory
• DW / long-term storage: Azure SQL DB, Azure Blob Storage, Analytics Platform System
• Predictive analytics: Azure Machine Learning (fraud detection, etc.)
• Present & decide: Power BI, web dashboards, mobile devices
The Data Management Platform for Analytics
• BI and analytics: self-service, corporate, collaboration, mobile, machine learning
  • Any BI tool: Dashboards | Reporting, Mobile BI | Cubes
  • Advanced analytics, any language: Machine Learning; Stream analytics | Cognitive | AI; .NET | Java | R | Python; Ruby | PHP | Scala
• Data enrichment and federated query: single query model; extract, transform, load; data quality; master data management; data virtualization
• Data management and processing: relational data (data warehousing) and non-relational data (Big Data processing); SQL Server as box software, appliances, and cloud
• Data sources: OLTP, ERP, CRM, LOB, devices, web, sensors, social
Lambda Architecture: Interactive Analytics Pipeline
• Data sources: social media, devices, web, media (on-premises and cloud)
• Pipeline: Data Sources → Ingest → Prepare (normalize, clean, etc.) → Analyze (stat analysis, ML, etc.) → Publish (for programmatic consumption, BI/visualization) → Consume (alerts, operational stats, insights)
• Data consumption (ingestion)
• Stream layer (data in motion): near real-time data analytics pipeline using Azure Stream Analytics
• Batch layer (data at rest): big data analytics pipeline using Azure Data Lake
• Presentation/serving layer: interactive analytics and predictive pipeline using Azure Data Factory
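The core of the Lambda architecture is that queries merge a batch view, recomputed from all data at rest, with a speed view, updated incrementally from data in motion. The Python sketch below shows that merge with made-up page-view events and hypothetical function names. In the pipelines above, the batch role would be played by Azure Data Lake and the speed role by Azure Stream Analytics.

```python
from collections import Counter

master_dataset = ["page_a", "page_b", "page_a"]  # immutable data at rest

def batch_layer(events):
    """Recompute the complete batch view from scratch (e.g. hourly)."""
    return Counter(events)

class SpeedLayer:
    """Incrementally counts events that arrived since the last batch run."""

    def __init__(self):
        self.view = Counter()

    def on_event(self, page):
        self.view[page] += 1

    def reset(self):
        self.view.clear()

def serving_layer(batch_view, realtime_view, page):
    """Answer a query by merging the batch view with the real-time view."""
    return batch_view[page] + realtime_view[page]

batch_view = batch_layer(master_dataset)
speed = SpeedLayer()
for page in ["page_a", "page_c"]:  # new events not yet in the batch view
    master_dataset.append(page)
    speed.on_event(page)
print(serving_layer(batch_view, speed.view, "page_a"))  # 3

# The next batch run absorbs the new events, and the speed layer starts over.
batch_view = batch_layer(master_dataset)
speed.reset()
print(serving_layer(batch_view, speed.view, "page_a"))  # still 3
```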
Base Architecture: Big Data Advanced Analytics Pipeline
• Pipeline: Data Sources → Ingest → Prepare (normalize, clean, etc.) → Analyze (stat analysis, ML, etc.) → Publish (for programmatic consumption, BI/visualization) → Consume (alerts, operational stats, insights)
• Data in motion: telemetry into Event Hub; Stream Analytics (real-time analytics); Machine Learning (anomaly detection); Power BI dashboard of live/real-time data stats, anomalies, and aggregates
• Data at rest: Azure Data Lake Storage; Azure Data Lake Analytics (Big Data processing); HDI custom ETL; aggregate/partition; Azure Storage Blob; Machine Learning (failure and RCA predictions); Azure SQL (predictions) with a dashboard of predictions/alerts; Azure SQL (COL + TACOPS) with a dashboard of operational stats FDS + SDS (shared with field ops, customers, MIS, and engineers); customer MIS
• Scheduled hourly transfer using Azure Data Factory
Schneider Electric Architecture
• Stages: Data sources → Ingest → Prepare → Analyze → Publish → Consume
• Data sources: simulated sensors and devices; on-premises data; blobs (reference data)
• Ingestion: Event hubs
• Speed (hot path): Event hubs, ASA Job Rule #1, Event hubs, ASA Job Rule #2, Machine Learning real-time scoring, aggregated data
• Batch (cold path): archived data and CSV data in Data Lake Store; Data Lake Analytics (flatten & metadata join; hourly, daily, monthly roll ups); Machine Learning batch scoring and offline training
• Presentation: Azure SQL Data Warehouse, Power BI, Cortana, Web/LOB dashboards
• Data Factory: move data, orchestrate, schedule, and monitor
Summary
Understand at a high level all the Microsoft data platform products
Q & A ?
James Serra, Big Data Evangelist
Email me at: JamesSerra3@gmail.com
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com (where this slide deck will be posted)

Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Intro to databricks delta lake
 Intro to databricks delta lake Intro to databricks delta lake
Intro to databricks delta lake
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data Solution
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Azure Data Factory v2
Azure Data Factory v2Azure Data Factory v2
Azure Data Factory v2
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene Polonichko
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar Presentation
 

Similar a Microsoft Data Platform - What's included

Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Martin Bém
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudJames Serra
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseJames Serra
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017James Serra
 
Modernization sql server 2016
Modernization   sql server 2016Modernization   sql server 2016
Modernization sql server 2016Kiki Noviandi
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure Antonios Chatzipavlis
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure Antonios Chatzipavlis
 
Should I move my database to the cloud?
Should I move my database to the cloud?Should I move my database to the cloud?
Should I move my database to the cloud?James Serra
 
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?James Serra
 
Building a scalable analytics environment to support diverse workloads
Building a scalable analytics environment to support diverse workloadsBuilding a scalable analytics environment to support diverse workloads
Building a scalable analytics environment to support diverse workloadsAlluxio, Inc.
 
DBP-010_Using Azure Data Services for Modern Data Applications
DBP-010_Using Azure Data Services for Modern Data ApplicationsDBP-010_Using Azure Data Services for Modern Data Applications
DBP-010_Using Azure Data Services for Modern Data Applicationsdecode2016
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)James Serra
 
Gs08 modernize your data platform with sql technologies wash dc
Gs08 modernize your data platform with sql technologies   wash dcGs08 modernize your data platform with sql technologies   wash dc
Gs08 modernize your data platform with sql technologies wash dcBob Ward
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Precisely
 
Trafodion overview
Trafodion overviewTrafodion overview
Trafodion overviewRohit Jain
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptxFedoRam1
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudDataWorks Summit
 

Similar a Microsoft Data Platform - What's included (20)

Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017
 
Modernization sql server 2016
Modernization   sql server 2016Modernization   sql server 2016
Modernization sql server 2016
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure
 
Designing a modern data warehouse in azure
Designing a modern data warehouse in azure   Designing a modern data warehouse in azure
Designing a modern data warehouse in azure
 
Should I move my database to the cloud?
Should I move my database to the cloud?Should I move my database to the cloud?
Should I move my database to the cloud?
 
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?
 
Building a scalable analytics environment to support diverse workloads
Building a scalable analytics environment to support diverse workloadsBuilding a scalable analytics environment to support diverse workloads
Building a scalable analytics environment to support diverse workloads
 
DBP-010_Using Azure Data Services for Modern Data Applications
DBP-010_Using Azure Data Services for Modern Data ApplicationsDBP-010_Using Azure Data Services for Modern Data Applications
DBP-010_Using Azure Data Services for Modern Data Applications
 
Introduction to Azure Data Lake
Introduction to Azure Data LakeIntroduction to Azure Data Lake
Introduction to Azure Data Lake
 
NoSQL_Night
NoSQL_NightNoSQL_Night
NoSQL_Night
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Gs08 modernize your data platform with sql technologies wash dc
Gs08 modernize your data platform with sql technologies   wash dcGs08 modernize your data platform with sql technologies   wash dc
Gs08 modernize your data platform with sql technologies wash dc
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 
Trafodion overview
Trafodion overviewTrafodion overview
Trafodion overview
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
 

Más de James Serra

Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric IntroductionJames Serra
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)James Serra
 
Power BI Overview, Deployment and Governance
Power BI Overview, Deployment and GovernancePower BI Overview, Deployment and Governance
Power BI Overview, Deployment and GovernanceJames Serra
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI OverviewJames Serra
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AIJames Serra
 
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...James Serra
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsJames Serra
 
How to build your career
How to build your careerHow to build your career
How to build your careerJames Serra
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?James Serra
 
Learning to present and becoming good at it
Learning to present and becoming good at itLearning to present and becoming good at it
Learning to present and becoming good at itJames Serra
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016James Serra
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB James Serra
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBaseJames Serra
 
Overview on Azure Machine Learning
Overview on Azure Machine LearningOverview on Azure Machine Learning
Overview on Azure Machine LearningJames Serra
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)James Serra
 
HA/DR options with SQL Server in Azure and hybrid
HA/DR options with SQL Server in Azure and hybridHA/DR options with SQL Server in Azure and hybrid
HA/DR options with SQL Server in Azure and hybridJames Serra
 

Más de James Serra (17)

Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric Introduction
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Power BI Overview, Deployment and Governance
Power BI Overview, Deployment and GovernancePower BI Overview, Deployment and Governance
Power BI Overview, Deployment and Governance
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI Overview
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AI
 
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data Solutions
 
How to build your career
How to build your careerHow to build your career
How to build your career
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
 
Learning to present and becoming good at it
Learning to present and becoming good at itLearning to present and becoming good at it
Learning to present and becoming good at it
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBase
 
Overview on Azure Machine Learning
Overview on Azure Machine LearningOverview on Azure Machine Learning
Overview on Azure Machine Learning
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)
 
HA/DR options with SQL Server in Azure and hybrid
HA/DR options with SQL Server in Azure and hybridHA/DR options with SQL Server in Azure and hybrid
HA/DR options with SQL Server in Azure and hybrid
 

Último

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Último (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

Microsoft Data Platform - What's included

  • 1.
  • 2. About Me
    - Business Intelligence Consultant, in IT for 30 years
    - Microsoft, Big Data Evangelist
    - Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer
    - Have been a permanent employee, contractor, consultant, and business owner
    - Presenter at PASS Business Analytics Conference and PASS Summit
    - MCSE: Data Platform and Business Intelligence
    - MS: Architecting Microsoft Azure Solutions
    - Blog at JamesSerra.com
    - Former SQL Server MVP
    - Author of the book "Reporting with Microsoft SQL Server 2012"
  • 3. Agenda
    - Collect + Manage
    - Transform + Analyze
    - Visualize + Decide
    - Access Methods
    - Product Groupings
    - Modern Data Warehouse
    - Sample architectures
  • 4. The Microsoft Data Platform
    - Collect + manage: Relational, Non-relational, NoSQL, Streaming, Internal & external data
    - Transform + analyze: Orchestration, Information management, Complex event processing, Modeling, Machine learning
    - Visualize + decide: Reports, Dashboards, Mobile, Natural language query, Applications
  • 5. Collect + manage: Collect and manage diverse data types with breakthrough speed
    - Secure, reliable performance
    - Increase speed across all your data workloads
    - Capture any data: structured, unstructured, and streaming
    - Scale your platform quickly to meet changing demands
  • 6.
  • 7. Who manages what?
    - On Premises (Physical / Virtual): you scale, make resilient, and manage the entire stack: Applications, Data, Runtime, Middleware, O/S, Virtualization, Servers, Storage, Networking
    - Infrastructure as a Service (Windows Azure Virtual Machines): Microsoft manages Virtualization, Servers, Storage, and Networking; you scale, make resilient, and manage Applications, Data, Runtime, Middleware, and O/S
    - Platform as a Service (Windows Azure Cloud Services): scale, resilience, and management by Microsoft for Runtime, Middleware, O/S, Virtualization, Servers, Storage, and Networking; you manage Applications and Data
    - Software as a Service: scale, resilience, and management by Microsoft for the entire stack
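The split of responsibilities on this slide can be written down as a small lookup table. The sketch below is a minimal Python illustration, assuming the conventional layer split (for IaaS, Microsoft runs virtualization, servers, storage, and networking; for PaaS, you manage only applications and data); the names `STACK`, `CUSTOMER_MANAGED`, and `responsibilities` are hypothetical and not part of any Azure API.

```python
# Layers of the stack, top to bottom, as listed on the slide.
STACK = ["Applications", "Data", "Runtime", "Middleware", "O/S",
         "Virtualization", "Servers", "Storage", "Networking"]

# Layers the customer scales, makes resilient, and manages under each model.
# Assumption: the conventional IaaS/PaaS/SaaS split, not read from the slide text.
CUSTOMER_MANAGED = {
    "On Premises": set(STACK),
    "IaaS": {"Applications", "Data", "Runtime", "Middleware", "O/S"},
    "PaaS": {"Applications", "Data"},
    "SaaS": set(),
}


def responsibilities(model: str) -> tuple[list[str], list[str]]:
    """Return (layers you manage, layers Microsoft manages), in stack order."""
    yours = CUSTOMER_MANAGED[model]
    you = [layer for layer in STACK if layer in yours]
    microsoft = [layer for layer in STACK if layer not in yours]
    return you, microsoft


if __name__ == "__main__":
    for model in CUSTOMER_MANAGED:
        you, ms = responsibilities(model)
        print(f"{model}: you manage {you or 'nothing'}; Microsoft manages {ms or 'nothing'}")
```

Keeping the stack as an ordered list and each model as a set makes the "who manages what" question a simple membership test, and every layer always lands on exactly one side.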
  • 8. SQL Server options
    - Azure SQL Database has a maximum database size of 4 TB; Managed Instance has a maximum of 35 TB
    - Potential total volume size of up to 64 TB, with 256 TB coming soon
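Using the limits quoted on this slide, a first-pass filter for which option can hold a database of a given size might look like the sketch below. It is an illustration only: the limits are the slide's figures at the time it was written (service limits change, so check current Azure documentation), the slide does not say which product the 64 TB volume figure applies to (here it is assumed to be SQL Server running in an Azure VM), and `options_for` is a hypothetical helper.

```python
# Maximum sizes in TB as stated on the slide. Assumption: the 64 TB
# "total volume size" refers to SQL Server running in an Azure VM.
MAX_SIZE_TB = {
    "Azure SQL Database": 4,
    "Azure SQL Managed Instance": 35,
    "SQL Server in an Azure VM (assumed)": 64,
}


def options_for(size_tb: float) -> list[str]:
    """Return the options whose stated maximum can hold a database of size_tb."""
    return [name for name, limit in MAX_SIZE_TB.items() if size_tb <= limit]


if __name__ == "__main__":
    for size in (2, 10, 50, 100):
        print(size, "TB ->", options_for(size) or "no single-database option")
```

Size is only one criterion; features, cost, and management overhead usually matter as much when picking among these options.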
  • 9. Benefits of the cloud
    - Agility: unlimited elastic scale; pay for what you need
    - Innovation: quick "time to market"; fail fast
    - Risk: availability, reliability, security
    - Total cost of ownership calculator: https://www.tco.microsoft.com/
  • 10. Our customer challenges
    1. Increasing data volumes
    2. Real-time business requests
    3. New data sources and types (non-relational data)
    4. Cloud-born data
  • 11. Parallelism
    MPP - Massively Parallel Processing
    - Uses many separate CPUs running in parallel to execute a single program
    - Shared nothing: each CPU has its own memory and disk (scale-out)
    - Segments communicate over a high-speed network between nodes
    SMP - Symmetric Multiprocessing
    - Multiple CPUs are used to complete individual processes simultaneously
    - All CPUs share the same memory, disks, and network controllers (scale-up)
    - All SQL Server implementations up until now have been SMP
    - The solution is usually housed on a shared SAN
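The shared-nothing idea behind MPP can be sketched in a few lines: rows are hash-distributed so each node owns a disjoint slice, each node aggregates only its own rows, and the partial results are combined at the end. This is a minimal Python illustration, not how SQL Server APS or SQL Data Warehouse is implemented; the function names, the choice of `crc32` as the distribution hash, and the sample columns are assumptions, and the per-node steps run sequentially here rather than in parallel.

```python
from collections import Counter
from zlib import crc32


def distribute(rows, num_nodes, key):
    """Hash-distribute rows so each node owns a disjoint slice (shared nothing)."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        # crc32 is stable across runs, unlike Python's salted str hash().
        bucket = crc32(str(row[key]).encode("utf-8")) % num_nodes
        nodes[bucket].append(row)
    return nodes


def local_sum(node_rows, group_col, value_col):
    """Partial aggregate that one node computes over its own rows only."""
    partial = Counter()
    for row in node_rows:
        partial[row[group_col]] += row[value_col]
    return partial


def mpp_sum(rows, num_nodes, dist_col, group_col, value_col):
    """Distribute on dist_col, aggregate per node, then combine partial results."""
    nodes = distribute(rows, num_nodes, dist_col)
    # On a real MPP system these would run at the same time on separate nodes;
    # here they run one after another for clarity.
    partials = [local_sum(n, group_col, value_col) for n in nodes]
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return dict(total)


def smp_sum(rows, group_col, value_col):
    """Single shared-memory aggregation over all rows, for comparison."""
    return dict(local_sum(rows, group_col, value_col))


if __name__ == "__main__":
    orders = [
        {"order_id": 1, "region": "East", "amount": 100},
        {"order_id": 2, "region": "West", "amount": 250},
        {"order_id": 3, "region": "East", "amount": 75},
        {"order_id": 4, "region": "North", "amount": 40},
        {"order_id": 5, "region": "West", "amount": 10},
    ]
    print(mpp_sum(orders, 4, "order_id", "region", "amount"))
    print(smp_sum(orders, "region", "amount"))
```

Distributing on `order_id` but grouping on `region` means the same region can appear on several nodes, which is why the final combine step is needed; both paths produce the same totals.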
  • 12. DW Scalability Spider Chart
    [Spider chart comparing data warehouse options across eight dimensions: Data Volume, Query Data Volume, Query Concurrency, Query Complexity, Query Freedom, Schema Sophistication, Data Freshness, and Mixed Workload]
    - MPP: multidimensional scalability
    - SMP: tunable in one dimension at the cost of the other dimensions
    The spiderweb depicts important attributes to consider when evaluating data warehousing options. Big Data support is the newest dimension.
  • 13. Microsoft data platform solutions
    - SQL Server 2016 (RDBMS): Earned top spot in Gartner's Operational Database Magic Quadrant. JSON support. Linux support TBD. More info: https://www.microsoft.com/en-us/server-cloud/products/sql-server-2016/
    - SQL Database (RDBMS/DBaaS): Cloud-based service that is provisioned and scaled quickly. Has built-in high availability and disaster recovery. JSON support. More info: https://azure.microsoft.com/en-us/services/sql-database/
    - SQL Data Warehouse (MPP RDBMS/DBaaS): Cloud-based service that handles relational big data. Provision and scale quickly. Can pause the service to reduce cost. More info: https://azure.microsoft.com/en-us/services/sql-data-warehouse/
    - Analytics Platform System (APS) (MPP RDBMS): Big data analytics appliance for high performance and seamless integration of all your data. More info: https://www.microsoft.com/en-us/server-cloud/products/analytics-platform-system/
    - Azure Data Lake Store (Hadoop storage): Removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics. More info: https://azure.microsoft.com/en-us/services/data-lake-store/
    - Azure Data Lake Analytics (on-demand analytics job service/Big Data-as-a-service): Cloud-based service that dynamically provisions resources so you can run queries on exabytes of data. Includes U-SQL, a new big data query language. More info: https://azure.microsoft.com/en-us/services/data-lake-analytics/
    - HDInsight (PaaS Hadoop compute/Hadoop clusters-as-a-service): A managed Apache Hadoop, Spark, R, HBase, Kafka, and Storm cloud service made easy. More info: https://azure.microsoft.com/en-us/services/hdinsight/
    - Azure Cosmos DB (PaaS NoSQL: key-value, column-family, document, graph): Globally distributed, massively scalable, multi-model, multi-API, low-latency data service that can be used as an operational database or a hot data lake. More info: https://azure.microsoft.com/en-us/services/cosmos-db/
    - Azure Table Storage (PaaS NoSQL: key-value store): Stores large amounts of semi-structured data in the cloud. More info: https://azure.microsoft.com/en-us/services/storage/tables/
  • 14. Microsoft Big Data Portfolio: SQL Server Stretch, business intelligence, machine learning analytics, insights, Azure SQL Database, SQL Server 2017, SQL Server 2016 Fast Track, Azure SQL DW, ADLS & ADLA, Cosmos DB, HDInsight Hadoop, and Analytics Platform System, charted from scale up to scale out. Key: relational vs. non-relational, on-premises vs. cloud. Microsoft has solutions covering and connecting all four quadrants; that's why SQL Server is one of the most utilized databases in the world.
  • 15. Power of SQL Server 2017 on the platform of your choice (Linux, Linux/Windows containers, Windows) • Linux distributions including Red Hat Enterprise Linux (RHEL), Ubuntu, and SUSE Linux Enterprise Server (SLES) • Docker: Windows & Linux containers • Windows Server / Windows 10 • New: speed up query performance without tuning using Adaptive Query Processing • New: maintain performance when making app changes with Automatic Plan Correction
  • 16. Stretch SQL Server into Azure (Stretch DB): stretch cold data to Azure with remote query processing. Example: an app queries a database holding customer data, product data, and order history (Name, SSN, Date). The older order history rows (2/28/2005 to 4/27/2005) are stretched to Microsoft Azure, and a query matching Jim Gray's 3/18/2005 order is answered from the Azure portion.
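A minimal Python sketch of the Stretch idea, assuming a simple date cutoff decides which rows count as cold. StretchedTable, RemoteStore, Order, and the cutoff are hypothetical names for illustration, not the Stretch DB API (the real feature is enabled and configured inside SQL Server). The caller queries one logical table, and the sketch hands the predicate to the remote portion to evaluate, echoing remote query processing. The sample rows reuse the names and dates from the slide.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Order:
    name: str
    order_date: date


class RemoteStore:
    """Hypothetical stand-in for the Azure side: predicates run where the data lives."""

    def __init__(self):
        self.rows = []

    def scan(self, predicate):
        # Only matching rows travel back over the "network".
        return [r for r in self.rows if predicate(r)]


class StretchedTable:
    """Hypothetical model: rows older than `cutoff` are stretched to the cloud."""

    def __init__(self, cutoff: date):
        self.cutoff = cutoff
        self.local = []               # hot rows that stay on-premises
        self.remote = RemoteStore()   # cold rows moved to Azure

    def insert(self, row: Order) -> None:
        if row.order_date < self.cutoff:
            self.remote.rows.append(row)
        else:
            self.local.append(row)

    def query(self, predicate):
        # One logical table: local matches plus remotely processed matches.
        return [r for r in self.local if predicate(r)] + self.remote.scan(predicate)


ORDERS = [
    Order("Jane Doe", date(2005, 2, 28)), Order("Jim Gray", date(2005, 3, 18)),
    Order("John Smith", date(2005, 4, 10)), Order("Bill Brown", date(2005, 4, 27)),
    Order("Sue Daniels", date(2005, 5, 12)), Order("Sarah Jones", date(2005, 5, 22)),
    Order("Jake Marks", date(2005, 6, 7)),
]

table = StretchedTable(cutoff=date(2005, 5, 1))
for order in ORDERS:
    table.insert(order)
```

The app never needs to know which side a row lives on; moving the cutoff only changes where rows are stored, not the query it issues.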
  • 17. It can handle up to 384 cores and 24 TB of memory! It uses the HPE 3PAR StoreServ 8450 storage array, which consists of 192 SSD drives (480 GB/drive) for a total of 92 TB of disk space (192 × 480 GB = 92,160 GB).
  • 18. Options for data warehouse solutions: balancing flexibility and choice. Time to solution runs from HIGH (by yourself) to LOW (with an appliance), with price rising toward HIGH for the appliance. • By yourself: installation, configuration, and tuning and optimization; existing or procured hardware and support; procured software and support. Offerings: SQL Server 2014/2016, Windows Server 2012 R2/2016, System Center 2012 R2/2016 • With a reference architecture: installation, configuration, and tuning and optimization; hardware optional, if you have it already; existing or procured hardware and support; procured software and support. Offerings: Private Cloud Fast Track, Data Warehouse Fast Track, build or purchase • With an appliance: installation, tuning and optimization; procured appliance and support. Offerings: Analytics Platform System
  • 19. Data Warehouse Fast Track: a workload-specific database system design and validation program for Microsoft partners and customers (https://www.microsoft.com/en-us/cloud-platform/data-warehouse-fast-track) • Hardware system design: tight specifications for servers, storage, and networking; resource balanced and validated; latest-generation servers and storage, including solid-state disks (SSDs) • Database configuration: workload-specific database architecture, SQL Server settings, Windows Server settings, performance guidance • Software: SQL Server 2016 Enterprise, Windows Server 2012 R2 • Stack: processors, networking, servers, storage, Windows Server 2012 R2, SQL Server 2016
  • 20. Analytics Platform System (APS) for Big Data: pre-built hardware + software appliance, an on-premises solution • Co-engineered with HP, Dell, Quanta • Scale-out, up to 100x performance increase • Appliance installed in 1-2 days • Support: Microsoft provides first-call support; the hardware partner provides onsite break/fix support • Plug and play • Built-in best practices • Saves time
  • 21. SQL Database Service A relational database-as-a-service, fully managed by Microsoft. For cloud-designed apps when near-zero administration and enterprise-grade capabilities are key. Perfect for organizations looking to dramatically increase the DB:IT ratio.
  • 22. Enhancements over SQL Server • Create database in minutes • HA built in • DR with a few clicks • Scale on the fly • 99.99% SLA • Point-in-time restore • Database Advisor (recommendations: index tuning, parameterized queries, schema issues) • Query performance insight • Query store • Auditing and threat detection
  • 23. SQL Database (DBaaS) Managed Instance: a flavor of SQL DB designed to provide easy app migration to a fully managed PaaS (alongside the Singleton and Elastic Pool options) • Unmatched app compatibility: a fully-fledged SQL instance with nearly 100% compatibility with on-premises SQL Server • Unmatched PaaS capabilities: learns and adapts with the customer app • Favorable business model: competitive, transparent, frictionless
  • 24. Azure SQL Data Warehouse: a relational data warehouse-as-a-service, fully managed by Microsoft. The industry's first elastic cloud data warehouse with enterprise-grade capabilities. Supports your smallest to your largest data storage needs while handling queries up to 100x faster.
  • 25. Azure Data Lake Store: a hyper-scale repository for Big Data analytics workloads • Hadoop Distributed File System (HDFS) for the cloud • No limits to scale • Store any data in its native format • Enterprise-grade access control, encryption at rest • Optimized for analytic workload performance
  • 26. Azure HDInsight: Hadoop and Spark as a service on Azure • Fully managed Hadoop and Spark for the cloud • 100% open source Hortonworks Data Platform • Clusters up and running in minutes • Managed, monitored, and supported by Microsoft with the industry's best SLA • Familiar BI tools for analysis, or open source notebooks for interactive data science • 63% lower TCO than deploying your own Hadoop on-premises* *IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
  • 27. Hortonworks Data Platform (HDP) 2.6: simply put, Hortonworks ties all the open source products (22 of them) together, and HDP is what runs under the covers of HDInsight
  • 28. Azure Data Lake Analytics: a new distributed analytics service • Job-as-a-service: distributed analytics service built on Apache YARN • Elastic scale per query lets users focus on business goals, not configuring hardware • Includes U-SQL, a language that unifies the benefits of SQL with the expressive power of C# • Integrates with Visual Studio to develop, debug, and tune code faster • Federated query across Azure data sources • Enterprise-grade role-based access control
  • 29. Query data where it lives: easily query data in multiple Azure data stores without moving it to a single store. Benefits: • Avoid moving large amounts of data across the network between stores (federated query/logical data warehouse) • Single view of data irrespective of physical location • Minimize data proliferation issues caused by maintaining multiple copies • Single query language for all data • Each data store maintains its own sovereignty • Design choices based on the need • Push SQL expressions (filters, joins) to remote SQL sources • Remote queries: SELECT * FROM EXTERNAL MyDataSource EXECUTE @"Select CustName from Customers WHERE ID=1"; • Federated queries: SELECT CustName FROM EXTERNAL MyDataSource LOCATION "dbo.Customers" WHERE ID == 1; Diagram: a U-SQL query in Azure Data Lake Analytics reaches Azure Storage Blobs, Azure SQL in VMs, Azure SQL DB, Azure SQL Data Warehouse, and Azure Data Lake Storage.
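To make the pushdown benefit concrete, here is a small Python sketch that uses an in-memory SQLite database (Python's standard sqlite3 module) as a stand-in for a remote SQL source. It is not U-SQL or the Data Lake Analytics API; the Customers table contents and the two helper functions are hypothetical. Both helpers return the same answer, but the pushdown version ships one row back instead of every row, which is the network saving the slide describes.

```python
import sqlite3


def make_remote_source() -> sqlite3.Connection:
    """Stand-in for a remote SQL store: an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers (ID INTEGER PRIMARY KEY, CustName TEXT)")
    conn.executemany(
        "INSERT INTO Customers (ID, CustName) VALUES (?, ?)",
        [(i, f"Customer {i}") for i in range(1, 10_001)],
    )
    return conn


def pull_then_filter(conn: sqlite3.Connection, cust_id: int):
    """No pushdown: every row crosses the 'network', then we filter locally."""
    rows = conn.execute("SELECT ID, CustName FROM Customers").fetchall()
    matches = [name for (row_id, name) in rows if row_id == cust_id]
    return matches, len(rows)  # (result, rows transferred)


def push_down(conn: sqlite3.Connection, cust_id: int):
    """Pushdown: the source evaluates the WHERE clause; only matches travel."""
    rows = conn.execute(
        "SELECT CustName FROM Customers WHERE ID = ?", (cust_id,)
    ).fetchall()
    return [name for (name,) in rows], len(rows)  # (result, rows transferred)


conn = make_remote_source()
print(pull_then_filter(conn, 1))  # (['Customer 1'], 10000)
print(push_down(conn, 1))         # (['Customer 1'], 1)
```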
  • 30. Bringing Big Data to everybody: accelerate the pace of innovation through a state-of-the-art cloud platform. Options trade control for ease of use and user adoption: • IaaS Hadoop: any Hadoop technology (HDP | CDH | MapR) from the Azure Marketplace • Managed Hadoop: workload-optimized, managed clusters (Azure HDInsight) • Big Data as-a-service: specific apps in a multi-tenant form factor (Azure Data Lake Analytics) • Big data storage: Azure Data Lake Store, Azure Storage • Big data analytics: Azure Data Lake Analytics, Azure HDInsight
  • 31. Cloud Big Data Solution
  • 32. Data lake is the center of a big data solution: a storage repository, usually Hadoop, that holds a vast amount of raw data in its native format until it is needed. • Inexpensively store unlimited data • Collect all data “just in case” • Easy integration of differently structured data • Store data with no modeling: “schema on read” • Complements the EDW • Frees up expensive EDW resources • A Hadoop cluster offers faster ETL processing than SMP solutions • Quick user access to data • Data exploration to see if data is valuable before writing ETL and schema for a relational database • Allows use of Hadoop tools such as ETL and extreme analytics • Place to land IoT streaming data • Online archive for data warehouse data • Easily scalable • With Hadoop, high availability is built in
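Schema on read can be sketched in a few lines of Python: the raw CSV text stays exactly as it landed, and each consumer applies its own column selection and types only when reading. The file contents, column names, and schemas here are made up for illustration; in a real data lake the raw files would sit in HDFS or Azure Data Lake Store and a tool such as Hive or U-SQL would apply the schema at query time.

```python
import csv
import io
from datetime import date

# Raw data lands in the lake exactly as it arrived: no table, no types.
RAW_FILE = """device_id,reading,taken_on
d1,21.5,2017-06-01
d2,19.0,2017-06-01
d1,22.1,2017-06-02
"""


def read_with_schema(raw: str, schema: dict):
    """Schema on read: columns are selected and typed only when the data is read."""
    for row in csv.DictReader(io.StringIO(raw)):
        yield {column: convert(row[column]) for column, convert in schema.items()}


# Two consumers project different schemas over the same untouched raw file.
TELEMETRY_SCHEMA = {"device_id": str, "reading": float, "taken_on": date.fromisoformat}
DEVICE_SCHEMA = {"device_id": str}

for record in read_with_schema(RAW_FILE, TELEMETRY_SCHEMA):
    print(record)
```

Contrast this with schema on write in a relational warehouse, where the table and its types must exist, and every row must conform, before the first load.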
  • 33. From data sources to analytics: • Descriptive analytics: What happened? • Diagnostic analytics: Why did it happen? • Predictive analytics: What will happen? • Prescriptive analytics: How can we make it happen?
  • 34. Roles when using both a data lake and a DW • Data Lake/Hadoop (staging and processing environment): batch reporting, data refinement/cleaning, ETL workloads, storing historical data, sandbox for data exploration, one-time reports, data scientist workloads, quick results • Data Warehouse/RDBMS (serving and compliance environment): low latency, high number of users, additional security, broad support for tools, easily create reports (self-service BI) • A data lake is just a glorified file folder with data files in it; how many end users can accurately create reports from it?
  • 35. Azure Cosmos DB: a globally distributed, massively scalable, multi-model database service • Data models: key-value (Table API), column-family, document (including the MongoDB API), graph • Turnkey global distribution • Elastic scale-out of storage and throughput • Guaranteed low latency at the 99th percentile • Comprehensive SLAs • Five well-defined consistency models
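The five consistency models are Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual. The toy Python model below contrasts Eventual and Session reads with one write region and one lagging replica: a write returns a sequence number that the client keeps as its session token, and a session read only uses the replica once it has caught up to that token, giving read-your-writes. This is a simplified illustration with hypothetical class names, not how Cosmos DB is implemented and not its SDK.

```python
class Replica:
    def __init__(self):
        self.data = {}
        self.lsn = 0  # sequence number of the last write applied here

    def apply(self, lsn, key, value):
        self.data[key] = value
        self.lsn = lsn


class ToyStore:
    """One write region plus one lagging read replica (hypothetical model)."""

    def __init__(self):
        self.primary = Replica()
        self.replica = Replica()
        self.pending = []  # writes not yet replicated, oldest first

    def write(self, key, value) -> int:
        lsn = self.primary.lsn + 1
        self.primary.apply(lsn, key, value)
        self.pending.append((lsn, key, value))
        return lsn  # the client keeps this as its session token

    def replicate_one(self):
        if self.pending:
            self.replica.apply(*self.pending.pop(0))

    def read_eventual(self, key):
        # Reads whatever the replica has; recent writes may be missing.
        return self.replica.data.get(key)

    def read_session(self, key, session_token: int):
        # Read-your-writes: use the replica only if it has caught up to the
        # client's last write; otherwise fall back to the primary.
        caught_up = self.replica.lsn >= session_token
        return (self.replica if caught_up else self.primary).data.get(key)


store = ToyStore()
token = store.write("cart", "book")
```

Right after the write, an eventual read can miss the new value while a session read with the token cannot; once replication catches up, both agree.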
  • 36. Relational Databases vs Non-Relational Databases (NoSQL) vs Hadoop • RDBMS for enterprise OLTP and ACID compliance, or databases under 5 TB • NoSQL for scaled OLTP and JSON documents • Hadoop for big data analytics (OLAP) or a data lake (from my presentation “Relational Databases vs Non-Relational Databases”)
  • 37. Azure Event Hubs • Publish-subscribe data distribution • Managed PaaS (Platform as a Service) solution • Scales with your needs to millions of events per second • Provides a durable buffer between event publishers and event consumers
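A hypothetical Python model of the publish-subscribe pattern described on this slide: events are appended to partitions chosen by a partition key, so events with the same key stay in order, and each consumer group tracks its own offset. Because events are retained rather than deleted on read, a slow consumer does not hold back a fast one, which is the durable-buffer role. This is not the Event Hubs SDK; the class, the method names, and the CRC32-based partitioning are assumptions for illustration.

```python
import zlib
from collections import defaultdict


class ToyEventHub:
    """Partitioned, append-only log (hypothetical model of the concept)."""

    def __init__(self, partition_count: int = 4):
        self.partitions = [[] for _ in range(partition_count)]
        # Each consumer group keeps its own read position per partition.
        self.offsets = defaultdict(lambda: [0] * partition_count)

    def publish(self, partition_key: str, event) -> int:
        # Same key -> same partition, so per-key ordering is preserved.
        # crc32 is used because Python's built-in str hash is randomized per process.
        p = zlib.crc32(partition_key.encode()) % len(self.partitions)
        self.partitions[p].append(event)
        return p

    def consume(self, group: str, partition: int, max_events: int = 10):
        # Read from this group's offset; events stay in the log for other groups.
        start = self.offsets[group][partition]
        batch = self.partitions[partition][start:start + max_events]
        self.offsets[group][partition] = start + len(batch)
        return batch


hub = ToyEventHub(partition_count=4)
p = hub.publish("sensor-1", "t=20")
hub.publish("sensor-1", "t=21")
```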
  • 38. Azure Stream Analytics: process real-time data in Azure • Consumes millions of real-time events from Event Hubs, collected from devices, sensors, infrastructure, and applications • Performs time-sensitive analysis using a SQL-like language against multiple real-time streams and reference data • Outputs to persistent stores, dashboards, or back to devices • Example event sources: point-of-service devices, self-checkout stations, kiosks, smartphones, slates/tablets, PCs/laptops, servers, digital signs, diagnostic equipment, remote medical monitors, logic controllers, specialized devices, thin clients, handhelds, security devices, POS terminals, automation devices, vending machines, Kinect, ATMs
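Time-sensitive analysis usually means windowed aggregation. In the Stream Analytics query language this is written with windowing functions such as GROUP BY TumblingWindow(second, 10); the Python function below is a minimal sketch of the same tumbling-window logic over an in-memory list, assuming events arrive as (timestamp_seconds, device_id, value) tuples. Unlike the real service, it does not handle late or out-of-order events, and it simply labels each window by its end time.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_seconds: int):
    """Average readings per device over fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_seconds, device_id, value) tuples.
    Returns {(window_end, device_id): average}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, device, value in events:
        # Each event falls in exactly one window, labeled by its end time.
        window_end = (ts // window_seconds + 1) * window_seconds
        key = (window_end, device)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


EVENTS = [(1, "kiosk", 10.0), (4, "kiosk", 20.0), (12, "kiosk", 30.0), (3, "atm", 5.0)]
print(tumbling_window_avg(EVENTS, window_seconds=10))
```

Tumbling windows never overlap, so every event is counted once; hopping and sliding windows relax that to produce smoother, overlapping results.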
  • 39. SQL Server on Linux (preview today, GA in mid-2017) • Red Hat and Microsoft partnership (Nov 2015) • Microsoft joins the Eclipse Foundation (Mar 2016) • HDInsight PaaS on Linux GA (Sep 2015) • Azure Marketplace: 60% of all images are based on Linux/OSS • In partnership with the Linux Foundation, Microsoft releases the Microsoft Certified Solutions Associate (MCSA) Linux on Azure certification • Microsoft Open Source Hub • Ross Gardler (President, Apache Software Foundation) and Wim Coekaerts (Oracle's "Mr. Linux") • 1 out of 4 VMs on Azure runs Linux, and the share is getting larger every day: 28.9% of all VMs are Linux, and more than 50% of new VMs are
  • 40. Microsoft Products vs Hadoop/OSS Products
Microsoft product | Hadoop/open source software product
Office365/Excel | OpenOffice/Calc
DocumentDB | MongoDB, HBase, Cassandra
SQL Database | SQLite, MySQL, PostgreSQL, MariaDB
Azure Data Lake Analytics/YARN | None
Azure VM/IaaS | OpenStack
Blob Storage | HDFS, Ceph (Note: these are distributed file systems, and Blob storage is not distributed)
Azure HBase | Apache HBase (Azure HBase is a service wrapped around Apache HBase), Apache Trafodion
Event Hub | Apache Kafka
Azure Stream Analytics | Apache Storm, Apache Spark, Twitter Heron
Power BI | Apache Zeppelin, Jupyter, Airbnb Caravel, Kibana
HDInsight | Hortonworks (pay), Cloudera (pay), MapR (pay)
Azure ML | Apache Mahout, Apache Spark MLlib
Microsoft R Open | R
SQL Data Warehouse | Apache Hive, Apache Drill, Presto
IoT Hub | Apache NiFi
Azure Data Factory | Apache Falcon, Apache Oozie, Airbnb Airflow
Azure Data Lake Storage/WebHDFS | HDFS, Ozone
Azure Analysis Services/SSAS | Apache Kylin, Apache Lens, AtScale (pay)
SQL Server Reporting Services | None
Hadoop Indexes | Jethro Data (pay)
Azure Data Catalog | Apache Atlas
PolyBase | Apache Drill
Azure Search | Apache Solr, Elasticsearch (Azure Search is built on Elasticsearch)
Others | Apache Flink, Apache Ambari, Apache Ranger, Apache Knox
Note: Many of the Hadoop/OSS products are available in Azure.
  • 41. Transform and analyze data for anyone to access anywhere • Connect, combine, and refine any data • Create data marts and publish reports • Build and test predictive models • Curate and catalog any data
  • 43. Connect, combine, and refine any data: make sense of disparate data and prepare it for analysis • Integration, Data Quality, and Master Data Services: rich support for ETL tasks; data cleansing and matching; manage master data structures • Connect any data and all volumes in real time: social data; SAP and Dynamics data; machine data
  • 45. Azure Analysis Services: based on the proven analytics engine that has helped organizations turn complex data into a trusted, single source of truth for years • Built for hybrid data: access and model data on-premises, in the cloud, or both • Interactive visualization: quick, highly interactive self-service data discovery with support for major data visualization tools • Proven technology: powerful, proven tabular models built from SQL Server 2016 Analysis Services • Cloud powered: easy to deploy, scale, and manage as a platform-as-a-service solution
  • 46. SSAS/Azure Analysis Services cubes: reasons to report off cubes instead of the data warehouse • Semantic layer • Handle many concurrent users • Aggregating data for performance • Multidimensional analysis • No joins or relationships • Hierarchies, KPIs • Security • Advanced time calculations • Slowly Changing Dimensions (SCD) • Required for some reporting tools
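To illustrate the "aggregating data for performance" point, the Python sketch below precomputes totals for every combination of year, quarter, and region (None stands for "All" at that level), so a summary query becomes a dictionary lookup instead of a scan of the fact rows. The data and function name are hypothetical; real Analysis Services engines are far more selective about what they precompute and store.

```python
from collections import defaultdict
from itertools import product

# Fact rows as a warehouse might store them: (year, quarter, region, sales).
FACTS = [
    (2016, "Q1", "East", 100), (2016, "Q2", "East", 150),
    (2016, "Q1", "West", 80), (2017, "Q1", "East", 120),
]


def build_cube(facts):
    """Pre-aggregate sales for every (year, quarter, region) combination,
    where None means 'All' at that level, so queries become lookups."""
    cube = defaultdict(int)
    for year, quarter, region, sales in facts:
        # Each fact contributes to 2 x 2 x 2 = 8 aggregate cells.
        for y, q, r in product((year, None), (quarter, None), (region, None)):
            cube[(y, q, r)] += sales
    return dict(cube)


cube = build_cube(FACTS)
print(cube[(2016, None, "East")])  # 2016 sales in the East region, all quarters
```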
  • 47. Build and test predictive models: use the power of machine learning to predict future trends or behavior • Process: business problem, modeling, deployment, business value • Data: cloud (HDInsight, SQL Server VM, SQL DB, blobs and tables) or local (desktop files, Excel spreadsheets, other data files on PC) • Modeling: ML Studio workspace with storage space, accessed through the Microsoft Azure portal • Deployment: publish an API in minutes with the Microsoft Azure Machine Learning API, consumed by devices, applications, dashboards, and the web
  • 48. Azure Machine Learning: beyond business intelligence, machine intelligence • Get started with just a browser: requires no provisioning; simply log on to your Azure subscription or try it for free at azure.com/ml • Experience the power of choice: choose from hundreds of algorithms and packages from R and Python, or drop in your own custom code; take advantage of business-tested algorithms from Xbox and Bing • Deploy solutions in minutes: with the click of a button, deploy the finished model as a web service that can connect to any data, anywhere • Connect to the world: brand and monetize solutions on our global Machine Learning Marketplace (https://datamarket.azure.com/) • Components: Microsoft Azure Machine Learning Studio (modeling environment), Microsoft Azure Machine Learning API service (model in production as a web service), Microsoft Azure Machine Learning Marketplace (APIs and solutions for broad use)
  • 49. R platforms: R Open (community) and R Server (commercial), running on Windows, Linux, Hadoop, and Teradata; SQL Server R Services
  • 50. Enable enterprise-wide self-service data source registration and discovery: a metadata repository that allows users to register, enrich, understand, discover, and consume data sources. Delivers differentiated value through • Data source discovery, rather than data discovery • Support for data from any source: structured and unstructured, on-premises and in the cloud • Publishing, discovery, and consumption through any tool • Annotation crowdsourcing: empowering any user to capture and share their knowledge, while allowing IT to maintain control and oversight
  • 51. Azure Data Factory: orchestrate trusted information production in Azure • Connect to relational or non-relational data that is on-premises or in the cloud • Orchestrate data movement and data processing (C#, MapReduce, Hive, Pig, stored procedures, Azure Machine Learning) • Publish to Power BI users as a searchable data view • Operationalize (schedule, manage, debug) workflows • Lifecycle management, monitoring
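Orchestration boils down to running activities in dependency order and stopping downstream work when something fails. The sketch below uses Python's standard graphlib module (Python 3.9+) to order a hypothetical four-activity pipeline; the activity names only echo the activity types on the slide and are not Data Factory's JSON pipeline format or API.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: activity -> set of activities it depends on.
PIPELINE = {
    "copy_onprem_sql_to_blob": set(),
    "hive_clean_logs": {"copy_onprem_sql_to_blob"},
    "ml_batch_scoring": {"hive_clean_logs"},
    "sproc_load_warehouse": {"hive_clean_logs", "ml_batch_scoring"},
}


def run_pipeline(pipeline, run_activity):
    """Run activities in dependency order; stop at the first failure,
    leaving downstream activities unrun. Returns (completed, failed_activity)."""
    completed = []
    for activity in TopologicalSorter(pipeline).static_order():
        if not run_activity(activity):
            return completed, activity
        completed.append(activity)
    return completed, None


done, failed = run_pipeline(PIPELINE, lambda activity: True)
print(done, failed)
```

Scheduling, retries, and monitoring sit on top of this ordering step; the `run_activity` callback is where a real orchestrator would dispatch work to Hive, a stored procedure, or a scoring service.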
  • 52. Visualize data and make decisions quickly using everyday tools • Discover, explore, and combine any data type or size, regardless of location • Ask questions of data to visualize, analyze, and forecast • Make faster decisions, share broadly, and access insights on any device
  • 54. Power BI overview • Power BI Desktop: prepare, explore, report, share • Power BI Service: data refresh, visualizations, live dashboards, content packs, sharing and collaboration, natural language query, reports, datasets; embed, extend, integrate • Data sources: cloud-based SaaS solutions (e.g., Marketo, Salesforce, QuickBooks, Google Analytics); on-premises data (e.g., Analysis Services, SQL Server); organizational content packs (corporate data sources or external data services); Azure services (Azure SQL, Stream Analytics, ...); Excel and CSV files (workbook data, flat files); Power BI Desktop files (data from files, databases, Azure, online services, and other sources)
  • 55. Power BI Desktop: create Power BI content. Connect to data and build reports for Power BI.
  • 59. Tools defined • Front end: Excel or Power BI Desktop • Data shaping and cleanup, self-service ETL (Power Query) • Data analysis (Power Pivot) • Visualization and data discovery (Power View, Power Map) • Dashboarding (Power BI Dashboard) • Publishing and sharing (Power BI Service) • Natural language query (Power BI Q&A) • Mobile (Power BI for Mobile) • Access on-premises data (Data Management Gateway (DMG), Analysis Services Connector) • Power BI Service is updated bi-weekly, Power BI Desktop monthly
  • 63. Microsoft Cognitive Services Give your apps a human side Cognitive Services API Collection
  • 64. Connect live to your on-premises data Live Query & Scheduled Data Refresh
  • 65. PolyBase: query relational and non-relational data with T-SQL • In a preview this year, PolyBase will add support for Teradata, Oracle, SQL Server, MongoDB, and generic ODBC (Spark, Hive, Impala, DB2) • vs. U-SQL: PolyBase is interactive while U-SQL is batch. PolyBase extends T-SQL onto data via external tables, while U-SQL natively operates on data, virtualizes access to other SQL data sources (no metadata needed), and supports more formats (JSON) and libraries/UDOs
  • 67. Cortana Intelligence Suite: transform data into intelligent action • Action: people, automated systems, apps (web, mobile, bots) • Intelligence: Cortana, Bot Framework, Cognitive Services • Dashboards & visualizations: Power BI • Information management: Event Hubs, Data Catalog, Data Factory • Machine learning and analytics: HDInsight (Hadoop and Spark), Stream Analytics, Data Lake Analytics, Machine Learning • Big data stores: SQL Data Warehouse, Data Lake Store • Data sources: apps, sensors and devices, data
  • 68. Example overall data flow and architecture • Event & data producers: web logs, IoT, mobile devices, etc., social data • Ingest: Event Hubs • Transform: Stream Analytics, HDInsight, Azure Data Factory • Storage: Azure SQL DB, Azure Blob Storage, DW / long-term storage (Analytics Platform System) • Predictive analytics: Azure Machine Learning (fraud detection, etc.) • Present & decide: Power BI, web dashboards, mobile devices
  • 69. BI and analytics: self-service, corporate, collaboration, mobile, machine learning • Data management and processing: single query model, extract-transform-load, data quality, master data management; data enrichment and federated query • Data sources: OLTP, ERP, CRM, LOB, devices, web, sensors, social, plus non-relational data • Delivered by SQL Server as box software, appliances, and cloud
  • 70. The Data Management Platform for Analytics • Any BI tool: dashboards, reporting, mobile BI, cubes • Advanced analytics: machine learning, stream analytics, cognitive, AI • Any language: .NET, Java, R, Python, Ruby, PHP, Scala • Big data processing and data warehousing over relational and non-relational data, with data virtualization • Data sources: OLTP, ERP, CRM, LOB, social media, devices, web, media • On-premises and cloud
  • 71. Lambda Architecture: Interactive Analytics Pipeline • Stages: data sources, ingest, prepare (normalize, clean, etc.), analyze (statistical analysis, ML, etc.), publish (for programmatic consumption, BI/visualization), consume (alerts, operational stats, insights) • Layers: data consumption (ingestion), stream layer (data in motion), batch layer (data at rest), presentation/serving layer
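The core lambda idea, a batch view recomputed from the full immutable history plus a speed-layer view covering recent events, merged by the serving layer at query time, can be sketched in Python as below. The class and method names are hypothetical, and the sketch assumes a batch run is instantaneous; in practice events that arrive while the batch job runs must stay in the speed layer until a later batch covers them.

```python
from collections import Counter


class LambdaCounts:
    """Toy lambda architecture for per-page view counts (hypothetical example)."""

    def __init__(self):
        self.master_log = []            # batch layer: immutable, append-only history
        self.batch_view = Counter()     # precomputed from the master log
        self.realtime_view = Counter()  # speed layer: events since the last batch run

    def ingest(self, page: str):
        # Every event goes to both layers.
        self.master_log.append(page)
        self.realtime_view[page] += 1

    def run_batch(self):
        # Recompute the batch view from all history, then reset the speed
        # layer, whose events are now covered by the batch view.
        self.batch_view = Counter(self.master_log)
        self.realtime_view.clear()

    def query(self, page: str) -> int:
        # Serving layer: merge the batch and real-time views.
        return self.batch_view[page] + self.realtime_view[page]


views = LambdaCounts()
for page in ["home", "home", "cart"]:
    views.ingest(page)
```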
  • 72. Base Architecture: Big Data Advanced Analytics Pipeline • Stages: data sources, ingest, prepare (normalize, clean, etc.), analyze (statistical analysis, ML, etc.), publish (for programmatic consumption, BI/visualization), consume (alerts, operational stats, insights) • Three pipelines: near-realtime data analytics using Azure Stream Analytics (data in motion); big data analytics using Azure Data Lake (data at rest); interactive analytics and predictions using Azure Data Factory (scheduled hourly transfer) • Components: telemetry, Event Hub, Stream Analytics (real-time analytics), Machine Learning (anomaly detection), Machine Learning (failure and RCA predictions), Azure Data Lake Storage, Azure Data Lake Analytics (big data processing), HDI custom ETL, aggregate/partition, Azure Storage Blob, Azure SQL (predictions), Azure SQL (COL + TACOPS), customer MIS • Outputs: Power BI dashboard of live/real-time data stats, anomalies, and aggregates; dashboard of predictions/alerts; dashboard of operational stats; FDS + SDS (shared with field ops, customers, MIS, and engineers)
  • 74. Schneider Electric Architecture • Stages: data sources, ingest, prepare, analyze, publish, consume • Layers: ingestion, speed, batch, presentation • Data sources: simulated sensors and devices, on-premises data • Hot path: Event Hubs, ASA (Azure Stream Analytics) jobs Rule #1 and Rule #2, Machine Learning real-time scoring • Cold path: archived, aggregated, and CSV data in Data Lake Store; Data Lake Analytics hourly, daily, and monthly roll-ups; Machine Learning batch scoring and offline training; blobs (reference data); flatten & metadata join • Data Factory: move data, orchestrate, schedule, and monitor • Publish and consume: Azure SQL Data Warehouse, Power BI, Cortana, web/LOB dashboards
  • 75. Summary Understand at a high level all the Microsoft data platform products
  • 76. Q & A ? James Serra, Big Data Evangelist Email me at: JamesSerra3@gmail.com Follow me at: @JamesSerra Link to me at: www.linkedin.com/in/JamesSerra Visit my blog at: JamesSerra.com (where this slide deck will be posted)