Deploying a Spark Big Data Computing Environment with Kubernetes

inwin stack

2018-05-10, Kubernetes Open Source Container Technology Forum 2018 (Kubernetes 開源容器技術論壇 2018), 楊曜佑 (inwinstack / 迎棧科技)

Getter, May 10
Deploying a Spark Big Data Computing Environment with Kubernetes

Who am I?
● Getter (楊曜佑)
○ inwinstack RD (Ready to Die) engineer
○ OpenStack integration & operation
○ K8S beginner

User: We need a Big Data solution!!
Okay….

About Big Data Solution
● Famous management tool -- Cloudera
○ Too big
○ Too difficult
○ User does not want it (Most Important)
● Famous container management tool -- K8S
○ Small
○ Simple
○ User wants it

Basic Hadoop MapReduce Components
● YARN
○ NodeManager
○ ResourceManager
● HDFS
○ NameNode
○ DataNode

Basic Spark Components
● Master
● Slave
● Storage

Spark on K8S Architecture
● https://github.com/kubernetes/examples/tree/master/staging/spark
○ spark-master-controller
○ spark-master-service
○ spark-worker-controller
○ spark-ui-proxy-controller
○ spark-ui-proxy-service

Spark on K8S Architecture
● Only one master
● Using nodeAffinity to keep the Worker and the Master off the same node
● Using podAntiAffinity to ensure each node has only one Worker
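The two scheduling rules above can be sketched as a worker manifest. This is an illustrative sketch, not the manifest used in the talk. The Deployment name, the `component: spark-worker` pod label, the `spark-role=master` node label, the replica count, and the output file name are all assumptions. `<spark-image>` is a placeholder to fill in.

```shell
#!/bin/sh
# Sketch only: write a worker Deployment that combines both affinity rules.
# Assumed names (not from the slides): node label spark-role=master,
# pod label component=spark-worker, file spark-worker-affinity.yaml.
cat > spark-worker-affinity.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-worker
spec:
  replicas: 2            # assumed: 3 nodes minus the one reserved for the master
  selector:
    matchLabels:
      component: spark-worker
  template:
    metadata:
      labels:
        component: spark-worker
    spec:
      affinity:
        # Keep workers off any node labelled as the master's node.
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: spark-role
                operator: NotIn
                values: ["master"]
        # Never place two worker pods on the same node.
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                component: spark-worker
            topologyKey: kubernetes.io/hostname
      containers:
      - name: spark-worker
        image: <spark-image>   # placeholder, fill in before applying
EOF
echo "wrote spark-worker-affinity.yaml"
```

Under these assumptions, the master's node would first be labelled with `kubectl label node <master-node> spark-role=master`, and the file applied with `kubectl apply -f spark-worker-affinity.yaml`. Because the anti-affinity rule is a hard requirement, any replica beyond the number of eligible nodes stays Pending.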

About storage
● HDFS
● Persistent Volumes
○ iSCSI
○ NFS
○ CephFS
○ RBD
○ Etc...
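As one possible illustration of the Persistent Volumes option, the sketch below writes an NFS-backed PersistentVolume and a matching claim. The resource names, the 10Gi size, the export path, and the output file name are assumptions for illustration. `<nfs-server-host>` is a placeholder, and the talk's actual storage configuration is not shown in the slides.

```shell
#!/bin/sh
# Sketch only: an NFS PersistentVolume plus a claim that Spark pods could mount.
# Assumed values (not from the slides): names, 10Gi capacity, /exports/spark-data.
cat > spark-nfs-volume.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: spark-data-nfs
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteMany          # NFS lets the master and all workers mount it read-write
  nfs:
    server: <nfs-server-host>   # placeholder, fill in before applying
    path: /exports/spark-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: spark-data
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: ""     # empty class: bind to a pre-created PV, no dynamic provisioning
  resources:
    requests:
      storage: 10Gi
EOF
echo "wrote spark-nfs-volume.yaml"
```

A pod would then reference the claim through a `persistentVolumeClaim` volume with `claimName: spark-data`. ReadWriteMany matters here because every worker reads the same input files.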

Environment
● 3 nodes
● K8S version v1.9.0
○ kubespray
○ calico
● Spark version 2.2.0

Simple performance comparison
● https://codait.github.io/spark-bench/ -- SparkPi
○ slices: 10000
■ Spark on K8S
■ Spark standalone
● Spark-example -- WordCount
○ Input file: 3 GB
■ Spark on K8S with NFS
■ Spark standalone with NFS

How it works
$ bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=<spark-image> \
    local:///path/to/examples.jar
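The command above has three site-specific values: the API server URL, the container image, and the executor count. The dry-run sketch below assembles the same command from environment variables and only prints it. The variable names, the `build_submit_cmd` helper, and the default host and image are hypothetical. The API server URL for a real cluster can usually be read from `kubectl cluster-info`.

```shell
#!/bin/sh
# Dry-run sketch: build the spark-submit command line and print it.
# Nothing is submitted. All defaults below are made-up placeholders.
K8S_APISERVER="${K8S_APISERVER:-https://k8s.example.internal:6443}"
SPARK_IMAGE="${SPARK_IMAGE:-registry.example.internal/spark:2.3.0}"
EXECUTORS="${EXECUTORS:-5}"
EXAMPLES_JAR="${EXAMPLES_JAR:-local:///path/to/examples.jar}"

# Print the full command on one line, arguments separated by spaces.
build_submit_cmd() {
  printf '%s ' bin/spark-submit \
    --master "k8s://${K8S_APISERVER}" \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf "spark.executor.instances=${EXECUTORS}" \
    --conf "spark.kubernetes.container.image=${SPARK_IMAGE}" \
    "${EXAMPLES_JAR}"
  printf '\n'
}

build_submit_cmd
```

Printing first makes it easy to review the `k8s://` master URL before running the command from the Spark distribution directory. The `local://` scheme refers to a jar that already exists inside the container image.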

Currently experimental...
● Client mode is not currently supported.
● Future Work
○ PySpark
○ R
○ Dynamic Executor Scaling
○ Local File Dependency Management
○ Spark Application Management
○ Job Queues and Resource Management

www.inwinstack.com
Thank You!
迎棧科技股份有限公司

3. Why use K8S?

6. Why use Spark?

16. Official support for Spark 2.3.0 on K8S
