Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cluster - Codemotion Berlin 2018

2. About me - Luciano Resende
Open Source AI Platform Architect – IBM – CODAIT
• Senior Technical Staff Member at IBM, contributing to open source for over 10 years
• Currently contributing to: the Jupyter Notebook ecosystem, Apache Bahir, Apache Toree, and Apache Spark, among other projects related to AI/ML platforms
lresende@us.ibm.com
https://www.linkedin.com/in/lresende
@lresende1975
https://github.com/lresende
3. Open Source @ IBM: IBM Open Source Participation

• Learn: an open source program that touches 78,000 IBMers annually
• Consume: virtually all IBM products contain some open source (40,363 packages per year)
• Contribute: >62K open source certifications per year; ~10K IBM commits per month
• Connect: >1,000 active IBM contributors working in key open source projects
4. IBM Open Source Participation

IBM-generated open source innovation
• 137 Code Open (dWO) projects with 1,000+ GitHub projects
• 4 graduates to full open governance in the last year: Node-RED, OpenWhisk, SystemML, Blockchain Fabric
• developer.ibm.com/code/open/code/

Community
• IBM focused on 18 strategic communities
• Drive open governance in "Centers of Gravity"
• IBM leaders drive key technologies and assure freedom of action

The IBM OS Way is now open sourced
• Training, Recognition, Tooling
• Organization, Consuming, Contributing
5. Center for Open Source Data and AI Technologies (CODAIT)
codait.org

codait (French) = coder/coded
https://m.interglot.com/fr/en/codait

CODAIT aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise.

It is a relaunch of the Spark Technology Center (STC) to reflect the expanded mission.
6. Interactive Development with Jupyter Notebooks
7. Jupyter Notebooks
Notebooks are interactive computational environments in which you can combine code execution, rich text, mathematics, plots, and rich media.
8. Jupyter Notebooks
• The Notebook UI runs in the browser
• The Notebook Server serves the notebooks
• Kernels interpret/execute cell contents (see the sketch below)
  – They are responsible for code execution
  – They abstract away the different languages
  – Each has a 1:1 relationship with a notebook
  – They run and consume resources for as long as the notebook is running
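As a small illustration of that lifecycle, here is a sketch using the jupyter_client API; it assumes a local Python kernel named "python3" is installed, and the executed cell content is just an example:

# Sketch: roughly what the Notebook Server does for every notebook it serves.
from jupyter_client import KernelManager

km = KernelManager(kernel_name="python3")   # one kernel per notebook
km.start_kernel()                           # the kernel starts consuming resources

kc = km.client()
kc.start_channels()
kc.execute("print(6 * 7)")                  # cell contents are sent to the kernel for execution

kc.stop_channels()
km.shutdown_kernel()                        # resources are held until the kernel shuts down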
9. Analytics and Deep Learning Workloads
10. Analytics Workloads
Large amounts of data, shared across the organization in data lakes

Multiple workload types
- Data cleansing
- Data warehousing
- ML and insights
11. Deep Learning Workloads
Resource-intensive workloads, requiring expensive hardware (GPUs, TPUs)

Long-running training jobs
- A simple MNIST model takes over one hour to train WITHOUT a decent GPU
- Training other, not especially complex deep learning models can easily take over a day WITH GPUs
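To make the MNIST reference above concrete, here is a minimal Keras training sketch of the kind of job being discussed; TensorFlow/Keras availability, the architecture, and the epoch count are assumptions for illustration, not taken from the slides:

# Minimal MNIST training sketch (illustrative only): slow on a CPU, fast on a decent GPU.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))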
14. Analytic and AI Platforms
Large pool of shared computing resources
• Enterprise cloud, public cloud, or hybrid
• Shared data (data lakes / object storage)

Distributed consumers
• Notebooks running locally (on the user's laptop) or as a service (e.g. JupyterHub)

Different resource utilization patterns
• High number of idle resources
15. Limitations of Jupyter Notebook Stack
[Diagram: AI lifecycle (Gather Data → Analyze Data → Machine Learning → Deep Learning → Deploy Model → Maintain Model) and the open source stack behind it: the Python data science stack (Pandas, Scikit-Learn), Apache Spark, Jupyter, Keras + TensorFlow, Fabric for Deep Learning (FfDL), Mleap + PFA, and the Model Asset eXchange]
[Chart: maximum number of simultaneous kernels (4 GB heap each) stays at 8 whether the cluster has 4, 8, 12, or 16 nodes of 32 GB]
• Scalability
  – Jupyter kernels run as local processes
  – Resources are limited to what is available on the single node that runs all kernels and the associated Spark drivers
• Security
  – A single user shares the same privileges
  – Users can see and control each other's processes using Jupyter administrative utilities
16. Jupyter Enterprise Gateway
17. Jupyter Enterprise Gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across an Apache Spark or Kubernetes cluster for enterprise/cloud use cases.

Jupyter Enterprise Gateway at IBM Code
https://developer.ibm.com/code/openprojects/jupyter-enterprise-gateway/
Jupyter Enterprise Gateway source code at GitHub
https://github.com/jupyter-incubator/enterprise_gateway
Jupyter Enterprise Gateway documentation
http://jupyter-enterprise-gateway.readthedocs.io/en/latest/

Supported kernels and supported platforms (shown as logos on the slide; the platforms include Apache Spark/YARN, Kubernetes, and IBM Spectrum Conductor)
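As a rough sketch of how a client drives the gateway, the following uses the standard Jupyter kernels REST API that Enterprise Gateway exposes; the gateway URL and the kernelspec name are assumptions made for illustration:

# Sketch: start and stop a remote kernel through the gateway's Jupyter REST API.
# The gateway URL and the kernelspec name ("spark_python_kubernetes") are assumptions.
import requests

GATEWAY = "http://gateway-host:8888"

# List the kernelspecs known to the gateway
specs = requests.get(f"{GATEWAY}/api/kernelspecs").json()
print(list(specs["kernelspecs"]))

# Start a kernel; the gateway launches it on the configured resource manager
kernel = requests.post(f"{GATEWAY}/api/kernels",
                       json={"name": "spark_python_kubernetes"}).json()
print("kernel id:", kernel["id"])

# ... exchange messages over the /api/kernels/{id}/channels WebSocket, then clean up
requests.delete(f"{GATEWAY}/api/kernels/{kernel['id']}")

In practice the NB2KG extension shown later in this deck performs these calls for you, so notebook users never touch the REST API directly.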
18. Jupyter Enterprise Gateway Features
[Chart: maximum number of simultaneous kernels (4 GB heap each) now scales with cluster size: 16, 32, 48, and 64 kernels on 4, 8, 12, and 16 nodes of 32 GB]
Optimized resource allocation
– Utilize resources on all cluster nodes by running kernels as Spark applications in YARN cluster mode
– Pluggable architecture to enable support for additional resource managers

Enhanced security
– End-to-end secure communications
  • Secure socket communications
  • Encrypted HTTP communication using SSL

Multiuser support with user impersonation
– Enhance security and sandboxing by enabling user impersonation when running kernels (using Kerberos)
– Individual HDFS home folder for each notebook user
– Use the same user ID for notebook and batch jobs
19. Jupyter Notebooks and Kubernetes
21. Jupyter & Kubernetes
Kubernetes platform
- Containers provide a flexible way to deploy applications and are here to stay
- Containers simplify management of complicated and heterogeneous AI/deep learning infrastructure
- Kubernetes enables easy management of containerized applications and resources, with the benefit of elasticity and quality of service
Source: https://github.com/Langhalsdino/Kubernetes-GPU-Guide
22. Enterprise Gateway & Kubernetes
Before Jupyter Enterprise Gateway …
• Resources required for all kernels need to be allocated during Notebook Server pod creation
• Resources are limited to what is physically available on the host node that runs all kernels and their associated Spark drivers

After Jupyter Enterprise Gateway …
• The Gateway pod is very lightweight
• Each kernel runs in its own pod, providing isolation
• Kernel pods are built from community images: Spark-on-K8s, TensorFlow, Keras, etc.
23. Jupyter Enterprise Gateway - Kubernetes
[Diagram: Notebook Servers with the nb2kg extension talk to the Gateway pod; the Gateway launches vanilla kernel pods from community images and Spark-based kernel pods via Spark on K8s, with the container images defined in the kernelspec and data shared through a distributed file system]
25. Jupyter & Kubernetes

• Multi-user Enterprise Gateway pod
• Each kernel is launched in its own pod
• The kernel pod namespace is configurable
26. Enabling remote kernels

Jupyter kernels are configured by kernelspecs
• Each kernel has a corresponding kernel spec
• Stored in one of the Jupyter data paths
• $ jupyter kernelspec list
  /…/anaconda3/share/jupyter/kernels/python2/kernel.json
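The same discovery can be done programmatically; here is a small sketch using the jupyter_client API (the paths printed on your machine will differ):

# Sketch: enumerate installed kernelspecs, the same data `jupyter kernelspec list` prints.
from jupyter_client.kernelspec import KernelSpecManager

ksm = KernelSpecManager()
for name, spec in ksm.get_all_specs().items():
    # each entry points to the resource directory that contains kernel.json
    print(name, "->", spec["resource_dir"])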
27. Enabling remote kernels: Process Proxies mixed with Kernel Launchers

Process Proxy:
• Abstracts the kernel process represented by the Jupyter framework
• Pluggable class definition identified in the kernelspec (kernel.json)
• Manages the kernel lifecycle

Kernel Launcher:
• Embeds the target kernel
• Listens on the gateway communication port
• Conveys interrupt requests (via a local signal)
• Could be extended for additional communications
{
  "language": "python",
  "display_name": "Spark - Python (Kubernetes Mode)",
  "process_proxy": {
    "class_name": "enterprise_gateway.services.processproxies.k8s.KubernetesProcessProxy",
    "config": {
      "image_name": "elyra/kubernetes-kernel-py:dev",
      "executor_image_name": "elyra/kubernetes-kernel-py:dev",
      "port_range": "40000..42000"
    }
  },
  "env": {
    "SPARK_HOME": "/opt/spark",
    "SPARK_OPTS": "--master k8s://https://${KUBERNETES_SERVICE_HOST} --deploy-mode cluster --name …",
    …
  },
  "argv": [
    "/usr/local/share/jupyter/kernels/spark_python_kubernetes/bin/run.sh",
    "{connection_file}",
    "--RemoteProcessProxy.response-address",
    "{response_address}",
    "--RemoteProcessProxy.spark-context-initialization-mode",
    "lazy"
  ]
}
28. Jupyter Enterprise Gateway Components
[Architecture diagram: a Jupyter Notebook client (NB2KG extension / Lab extension) connects to the Jupyter Enterprise Gateway, which builds on the Jupyter Kernel Gateway and Jupyter Notebook server; its Remote Kernel Manager delegates to pluggable process proxies (Distributed Process Proxy, YARN Cluster Process Proxy, Kubernetes Process Proxy, Conductor Cluster Process Proxy) targeting the supported runtime platforms, including Spectrum Conductor and FfDL]
29. Jupyter Notebooks and Deep Learning Platforms
30. Deep Learning Platforms
Prohibitive costs
- Deep learning resources are too costly to be left locked and idle during interactive development

Deep learning platforms
- We have seen the rise of deep learning platforms that leverage containers and Kubernetes as the basis of their infrastructure
- Kubernetes enables deep learning platforms to easily share and restrict accelerated hardware (a sketch follows below)

[Diagram labels: Fabric for Deep Learning, IBM Watson Studio Deep Learning as a Service, batch-oriented development]
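To make the "share and restrict accelerated hardware" point concrete, here is a hedged sketch using the official Kubernetes Python client; the namespace, image, and script path are illustrative, and the nvidia.com/gpu resource assumes the NVIDIA device plugin is installed on the cluster:

# Sketch: request exactly one GPU for a training pod; the scheduler only places it
# on a node with a free GPU, which is how Kubernetes restricts the shared hardware.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running in-cluster

container = client.V1Container(
    name="trainer",
    image="tensorflow/tensorflow:latest-gpu",      # illustrative image
    command=["python", "/workspace/train.py"],     # illustrative training script
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1"}             # one GPU, no more
    ),
)
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="mnist-training", namespace="dl-jobs"),
    spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
)
client.CoreV1Api().create_namespaced_pod(namespace="dl-jobs", body=pod)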
31. Deep Learning Workspace
Streamline the data science user experience when coming from notebook/interactive development interfaces
• The current process includes multiple steps, one being decomposing the notebook into an application that needs to be submitted as a zip file to the deep learning runtime, which becomes a show stopper for data scientists adopting FfDL and DLaaS
32. Deep Learning Workspace

Streamline the deep learning application lifecycle
• Run local notebook experiments with small data samples, and seamlessly validate the experiments on deep learning environments
• IBM Cloud DLaaS, FfDL (open source), Kubeflow (open source)

Simplify productionalization of model training and serving from notebooks
• Enable running/scheduling notebooks on production environments as batch jobs (see the sketch below)
• Results can be made available via the updated notebook, or exported to HTML, PDF, and a few other formats
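One possible way to run a notebook as a batch job and export the result; papermill and nbconvert are assumptions here (the slides do not prescribe specific tools), and the file names and parameters are illustrative:

# Sketch: execute a notebook headlessly with parameters, then export it for sharing.
import subprocess
import papermill as pm

pm.execute_notebook(
    "train_experiment.ipynb",          # notebook developed interactively with sampled data
    "train_experiment_out.ipynb",      # executed copy with outputs filled in
    parameters={"data_path": "/data/full_dataset", "epochs": 50},
)

# Export the executed notebook to HTML (PDF and other formats work the same way)
subprocess.run(["jupyter", "nbconvert", "--to", "html", "train_experiment_out.ipynb"],
               check=True)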
Interactive development is done on commodity hardware with sampled data; training on the full dataset is then scheduled as batch jobs on deep learning infrastructure.
34. Deep Learning Workspace

• The user selects where to run the experiment
• The job is packaged and submitted on behalf of the user
• The user has access to a job console to monitor the experiment
35. Thank you!
@lresende1975