Scientist meets web dev:
how Python became the language of data
Gaël Varoquaux
Very diverse community
This talk: a reflection on what
we have in common, Python
I am talking about
things you don’t understand (my science)
and things I don’t understand (web dev)
I actually did a PhD
in quantum physics
Hence I think I qualify as
a “scientist”
G Varoquaux 3
I now do computer science for neuroscience
Try to link neural activity to thoughts and cognition
We attack it as a machine learning problem
Python software: nilearn
G Varoquaux 5
On the way, we created
a machine-learning library:
scikit-learn
G Varoquaux 6
Data science with Python is hot
Huge success.
Cool.
Data science is THE thing.
Python is the go-to language
How did it happen?
We built scikit-learn, others built pandas, etc.,
but these were built on solid foundations
G Varoquaux 7
1 Scientists come from Jupiter
And web devs from Saturn?
And sysadmins from Neptune?
G Varoquaux 8
1 We’re different
numbers (in arrays)
arrays (of numbers)
arrays
arrays
strings
databases
object-oriented
programming
flow control
A bit of a culture gap
G Varoquaux 9
1 Let’s do something together: sort EuroPython site
205 talks:
How OpenStack makes Python better (and vice-versa)
Introduction to aiohttp
So you think your Python startup is worth $10 million...
SQLAlchemy as the backbone of a Data Science company
Learn Python The Fun Way
Scaling Microservices with Crossbar.io
If you can read this you don’t need glasses
Let’s find some common topics with data science
Anyone who has used Python to search text
for substring patterns has at least heard of
the regular expression module. Many of us
use it extensively for parsers and lexers,

The py.test tool presents a rapid and simple
way to write tests for your Python code. This
training gives a quick introduction with exercises
into some distinguishing features.

Chat with the core developers about how
to extend django CMS or how to integrate
your own apps seamlessly. Let’s talk about
your plugins, apphooks, toolbar extensions

import urllib2, bs4
import sklearn, wordcloud
G Varoquaux 10
1 Let’s do something together: sort EuroPython site
Crawl
- the schedule to get a list of titles and URLs
- talk pages to retrieve abstract and tags
bs4: beautiful soup, matchings on the DOM tree
Vectorize
Count each term in an abstract and compare it with its count in all docs (Ratio ≈ Term freq / All docs):
Term          Term freq   All docs   Ratio
a                    20       1321    .015
can                  10        540    .018
code                  4        208    .019
is                   14        964    .014
module                3        123    .023
profiling             2          7    .286
performance           1          6    .167
Python                9        191    .047
the                  18       1450    .012
TF-IDF in scikit-learn
sklearn.feature_extraction.text.TfidfVectorizer
G Varoquaux 11
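The ratio column above can be reproduced with a few lines of standard-library Python. This is only a minimal sketch: the three abstracts are made-up stand-ins for the crawled pages, the tokenizer is a naive regular expression, and the ratio is the plain "count in this document over count in all documents" from the table, not the exact weighting that TfidfVectorizer applies.

```python
import re
from collections import Counter

# Hypothetical stand-ins for crawled abstracts (not the real EuroPython data).
abstracts = [
    "Profiling Python code: the performance of a module",
    "The py.test tool is a simple way to test Python code",
    "How to extend the django CMS with a plugin",
]


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Term counts per document, and summed over all documents.
doc_counts = [Counter(tokenize(text)) for text in abstracts]
all_counts = Counter()
for counts in doc_counts:
    all_counts.update(counts)


def ratios(counts):
    """Count of each term in one document divided by its count in all documents."""
    return {term: n / all_counts[term] for term, n in counts.items()}


first_doc = ratios(doc_counts[0])
# Terms found only in the first abstract get a ratio of 1.0;
# terms shared with other abstracts get a smaller ratio.
for term, ratio in sorted(first_doc.items(), key=lambda kv: -kv[1]):
    print(f"{term:12s} {ratio:.3f}")
```

Common words such as "the" end up with a low ratio, while words specific to one abstract, such as "profiling", stand out.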
1 Let’s do something together: sort EuroPython site
[Figure: term-document matrix of counts; one axis indexes documents, the other terms (the, Python, performance, profiling, module, is, code, can, a); most entries are zero]
Term-document matrix
Can be a sparse matrix
G Varoquaux 12
1 Let’s do something together: sort EuroPython site
[Figure: the term-document matrix decomposed into a topics × terms matrix and a documents × topics matrix]
What terms are in a topic
What documents are in a topic
A matrix factorization
Often with non-negative constraints
sklearn.decomposition.NMF
G Varoquaux 13
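To make the factorization concrete, here is a small sketch of a non-negative matrix factorization written directly in NumPy. It uses the classic multiplicative-update rule rather than scikit-learn's own solver, and the count matrix, the number of topics and the iteration count are all made-up values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical term-document counts: 6 documents x 9 terms (made-up numbers).
X = rng.integers(0, 10, size=(6, 9)).astype(float)

n_topics = 3
eps = 1e-10  # avoids division by zero in the updates

# Random non-negative starting point.
W = rng.random((X.shape[0], n_topics))   # documents x topics
H = rng.random((n_topics, X.shape[1]))   # topics x terms

initial_error = np.linalg.norm(X - W @ H)

for _ in range(200):
    # Multiplicative updates: entries are only rescaled by non-negative
    # factors, so W and H stay non-negative throughout.
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)

final_error = np.linalg.norm(X - W @ H)
print(f"reconstruction error: {initial_error:.2f} -> {final_error:.2f}")
```

Each row of H says which terms make up a topic, and each row of W says how much of each topic a document contains. With scikit-learn installed, sklearn.decomposition.NMF offers the same kind of decomposition through fit_transform and the components_ attribute.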
1 Let’s do something together: sort EuroPython site
EuroPython abstracts
[Word clouds: Topic 1, Topic 2, Topic 3]
Add one of Python’s great templating engines
... get a usable website
https://gaelvaroquaux.github.io/my_topics/ep16
G Varoquaux 14
Want to try it?
$ pip install scikit-learn
...
ImportError: Numerical Python (NumPy) is not installed.
scikit-learn requires NumPy >= 1.6.1
C:> pip install numpy
...
error: Unable to find vcvarsall.bat
G Varoquaux 15
1 We’re different
Well
fast linear algebra
ATLAS (Fortran) 70x faster
libfortran.so.3 ??
you’re kidding me
Packaging is a major roadblock for scientific Python
A lot of compiled code + shared libraries
⇒ library + ABI compatibility issues
Progress:
- Manylinux wheels: PEP 513, RT. McGibbon, NJ. Smith
rely on a conservative core set of libs
- OpenBLAS: pure-C, fast linear algebra
G Varoquaux 16
1 We’re different
But working together gives us awesome things
Text mining ⇒ intelligent interfaces
G Varoquaux 17
2 The scientist’s view of code
Numerics versus control flow
Numerics versus databases
Numerics versus strings
Numerics versus the world
G Varoquaux 18
2 Why we love numpy
100 000 term frequency vs inverse doc frequency:
In [*]: %timeit [t * i for t, i in izip(tf, idf)]
100 loops, best of 3: 6.2 ms per loop
The numpy style:
In [*]: %timeit tf * idf
1000 loops, best of 3: 74.2 µs per loop
[Plot: time per element (1 ns to 1 µs) vs. number of elements (10^2 to 10^6), lists vs. numpy]
Array computing can be more readable
tf * idf
vs
[t * i for t, i in izip(tf, idf)]
G Varoquaux 19
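The two styles compared on this slide can be tried side by side. The sketch below assumes Python 3 (where the built-in zip replaces the Python 2 izip) and uses random numbers as stand-ins for real term and inverse document frequencies; timings are left out because they depend on the machine.

```python
import numpy as np

rng = np.random.default_rng(0)
n_terms = 100_000

# Made-up term frequencies and inverse document frequencies.
tf = rng.random(n_terms)
idf = rng.random(n_terms)

# Pure-Python style: one interpreted multiplication per element.
# (zip is the Python 3 counterpart of the Python 2 izip used on the slide.)
products_list = [t * i for t, i in zip(tf.tolist(), idf.tolist())]

# numpy style: a single vectorized call over the whole array.
products_array = tf * idf

print(np.allclose(products_list, products_array))
```

Both versions compute the same numbers; the difference lies in where the loop runs (the Python interpreter versus compiled code inside numpy).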
2 arrays are nothing but pointers
A numpy array = memory address + data type + shape + strides
[Diagram: a 2D grid of numbers annotated with shape 1, shape 2, stride 1 and stride 2, next to the same numbers laid out contiguously in memory]
Represents any regular data in a structured way:
how to access elements via pointer arithmetic (computing offsets)
Matches the memory model of numerical libraries
⇒ Enables copyless interactions
Numpy is really a memory model
G Varoquaux 20
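The four ingredients listed above can be inspected on any array. The following sketch uses a small made-up integer array; the helper element_offset is a hypothetical name introduced here to spell out the offset computation.

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)

# Strides are the byte steps needed to move by one index along each axis:
# 8-byte items, 4 items per row, so one row down is 32 bytes away.
print(a.shape, a.strides)


def element_offset(index, strides):
    """Byte offset of an element from the start of the buffer."""
    return sum(i * s for i, s in zip(index, strides))


# Locate element (2, 1) in the flat memory buffer from its strides alone.
flat = a.ravel()  # a view on the same contiguous memory
position = element_offset((2, 1), a.strides) // a.itemsize
print(flat[position] == a[2, 1])

# Transposing or slicing only changes shape and strides: no data is copied.
t = a.T
every_other_column = a[:, ::2]
print(t.strides, every_other_column.strides, np.shares_memory(a, t))
```

Because a view is just a new (shape, strides) pair over the same memory, libraries that understand this layout can exchange arrays without copying them.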
2 Array computing is fast
[Plot: time per element (1 ns to 1 µs) vs. number of elements (10^2 to 10^6), lists vs. numpy]
tf_idf = tf * idf
[Diagram: the CPU multiplies the tf and idf arrays element by element into tf_idf]
No type checking
Direct sequential memory access
Vector operations (SIMD)
G Varoquaux 21
2 Array computing is limited by CPU starvation
[Plot: time per element (1 ns to 5 ns) vs. number of elements (10^3 to 10^6), for numpy, numpy in-place and numexpr]
tf_idf = tf * idf
2x slowdown past a certain size
10^5 ∼ size of the CPU cache
Memory is much slower than CPU
It gets worse for complex expressions
tf_idf = tf * idf - 1
What’s going on:
1. tmp ← tf * idf
2. tf_idf ← tmp - 1
Big temporary: moving data in & out of cache
In [*]: %timeit tf * idf
10000 loops, best of 3: 74.2 µs per loop
In [*]: %timeit tf * idf - 1
1000 loops, best of 3: 418 µs per loop
In-place operations: reuse the allocation
In [*]: %timeit tmp = tf * idf; tmp -= 1
10000 loops, best of 3: 112 µs per loop
A compilation problem:
tf_idf = tf * idf - 1
should become
tf_idf = tf * idf
tf_idf -= 1
• Removing/reusing temporaries
• Operating on “chunks” that fit in cache
Addressed by numexpr, with string expressions
numexpr.evaluate('tf * idf - 1', locals())
Addressed by numba, with bytecode inspection
lazyarray
Similar problem to pagination with SQL queries
G Varoquaux 22
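The two remedies named on this slide, reusing temporaries and working on cache-sized chunks, can be sketched by hand in NumPy. This is an illustration of the idea only, not how numexpr or numba are implemented; the function name chunked_tf_idf and the chunk size are assumptions, and a Python-level loop adds overhead that a compiled engine avoids.

```python
import numpy as np

rng = np.random.default_rng(0)
tf = rng.random(1_000_000)
idf = rng.random(1_000_000)

# Naive expression: allocates a temporary for tf * idf, then a second array.
naive = tf * idf - 1

# In-place variant: the product buffer is allocated once and reused.
inplace = np.multiply(tf, idf)
inplace -= 1


def chunked_tf_idf(tf, idf, chunk_size=32_768):
    """Evaluate tf * idf - 1 block by block so each block can stay in cache.

    chunk_size is a guess; a good value depends on the CPU cache size.
    """
    out = np.empty_like(tf)
    for start in range(0, tf.size, chunk_size):
        block = slice(start, start + chunk_size)
        # Write the product directly into the output, then update it in place.
        np.multiply(tf[block], idf[block], out=out[block])
        out[block] -= 1
    return out


chunked = chunked_tf_idf(tf, idf)
print(np.array_equal(naive, inplace), np.array_equal(naive, chunked))
```

All three variants produce identical results; they differ only in how much memory traffic they generate.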
2 Array computing is limited by CPU starvation
tf_idf = tf * idf
Too small:
overhead
Too BIG:
Out of cache
BIG Data
$$$
G Varoquaux 23
2 Numerics versus control flow
What if there is an if
tf_idf = tf / idf
tf_idf[idf == 0] = 0
Suppose that we are looking at ages in a population:
ages[gender == 'male'].mean() - ages[gender == 'female'].mean()
This is really starting to look like databases
pandas: something in between arrays and
an in-memory database
Great for queries, less great for numerics.
G Varoquaux 24
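Both snippets on this slide replace an explicit "if" with a boolean mask. The sketch below runs them on tiny made-up arrays; it uses the where argument of np.divide so that the division is simply skipped where idf is zero, which is one possible way to write the same logic.

```python
import numpy as np

tf = np.array([3.0, 0.0, 2.0, 5.0])
idf = np.array([1.5, 0.0, 4.0, 0.0])

# Division with the "if" expressed as a mask: entries where idf == 0 stay at 0.
tf_idf = np.zeros_like(tf)
np.divide(tf, idf, out=tf_idf, where=idf != 0)

# Made-up population data.
ages = np.array([31, 45, 27, 52, 38])
gender = np.array(["male", "female", "female", "male", "male"])

# A boolean mask plays the role of a WHERE clause in a query.
age_gap = ages[gender == "male"].mean() - ages[gender == "female"].mean()
print(tf_idf, age_gap)
```

Selecting rows by a condition and then aggregating is exactly the kind of operation that pandas wraps in a higher-level, database-like interface.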
[Diagram, scientist’s side: installation problems / beautiful Python code / routines in Fortran or C++ / scalability]
[Diagram, web developer’s side: deployment problems / beautiful Python code / database in C++, Java, Erlang... / scalability]
Numpy is the scientist’s equivalent to an ORM
Gives speed with non-Python code
G Varoquaux 25
numerics vs databases
numerics efficient on regularly spaced data
But numpy creates cache misses for big arrays
⇒ Need to remove temporaries and chunk data
selection and grouping efficient with indexes or trees
⇒ Need to group queries
Compilation is unpythonic
A computation & query language? numexpr
I hate domain-specific languages (SQL)
Numpy is very expressive
PonyORM: Compiling Python to optimized SQL
Datascience with SQL: Ibis & Blaze
Spark: java-world “big data” rising star
combines distributed store
+ computing model
We (scikit-learn) are faster when data fits in RAM
G Varoquaux 26
Operations on chunks, or algorithms on chunks
Machine learning, data mining = numerics
[Diagram: blocks of numbers illustrating data processed chunk by chunk]
ETL (extract, transform, & load)
Multivariate statistics
Out-of-core operations not efficient: no data locality
On-line algorithms (streaming)
eg stochastic gradient descent
As in deep learning
G Varoquaux 27
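An on-line algorithm only ever needs one chunk of data in memory. The sketch below shows mini-batch stochastic gradient descent for a least-squares problem in plain NumPy; the data, the helper iter_chunks, the learning rate and the chunk size are all made up for illustration, and the chunks come from an in-memory array standing in for data streamed from disk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up, noise-free linear data: y = X @ true_w.
true_w = np.array([2.0, -3.0])
X = rng.standard_normal((1000, 2))
y = X @ true_w


def iter_chunks(X, y, chunk_size):
    """Yield successive chunks, as if they were streamed from disk."""
    for start in range(0, len(X), chunk_size):
        yield X[start:start + chunk_size], y[start:start + chunk_size]


w = np.zeros(2)
learning_rate = 0.1
for epoch in range(20):
    for X_chunk, y_chunk in iter_chunks(X, y, chunk_size=50):
        # Gradient of half the mean squared error on this chunk only.
        residual = X_chunk @ w - y_chunk
        gradient = X_chunk.T @ residual / len(X_chunk)
        w -= learning_rate * gradient

print(w)
```

The model is updated after every chunk, so the full dataset never has to fit in memory at once. scikit-learn exposes the same pattern through the partial_fit method of estimators such as SGDClassifier.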
Making the data-science magic happen
from sklearn import
[Diagram: term-document matrix decomposed into a topics × terms matrix and a documents × topics matrix]
Turning applied maths papers to robust code
High-level, readable, simple syntax reduces cognitive load
Thanks
G Varoquaux 28
3 Beyond numerics
Make #PyData great (again)
G Varoquaux 29
3 Data/computation flow is crucial
[Diagram: blocks of numbers illustrating data split into chunks that flow through a computation]
Data-flow engines are everywhere
dask pure-Python dynamic scheduler
static compiler parallel & distributed
theano expression analysis pure-Python
tensorflow C library distributed
Python should shine there:
reflexivity + metaprogramming + async
“Python is the best numerical language out there
because it’s not a numerical language.” – Nathaniel Smith
API challenging:
For algorithm design: no framework / inversion of control
G Varoquaux 30
3 Ingredients for future data flows
Distributed computation & Run-time analysis
Reflexivity is central
Debugging
Interactive work
Code analysis
Persistence
Parallel computing
Pickle
distribute code and data without data model
serialize intermediate results
deep hash of any data structure: joblib.hash
Very limited (eg no lambda #19272)
⇒ variants: dill, cloudpickle
joblib:
Simple parallel syntax:
Parallel(n_jobs=2)(delayed(sqrt)(i) for i in range(10))
Fast persistence:
joblib.dump(anything, 'filename.pkl.gz')
Primitive for out of core:
pointer = mem.cache(f).call_and_shelve(big_data)
• Non-invasive syntax / paradigm
• Fast on big numpy arrays
• Soon backend system (job broker and persistence)
Gets job management into algorithms (eg in scikit-learn)
G Varoquaux 31
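The strengths and limits of pickle mentioned on this slide are easy to observe with the standard library alone. The sketch below is a small illustration with made-up data; it catches both PicklingError and AttributeError because the exact exception raised for an unpicklable function can vary between Python versions.

```python
import math
import pickle

# A named, importable function is pickled by reference (module + name).
restored = pickle.loads(pickle.dumps(math.sqrt))
print(restored is math.sqrt)

# A lambda has no importable name, so the standard pickler rejects it.
square = lambda x: x * x
try:
    pickle.dumps(square)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
print(lambda_pickled)

# Plain data structures round-trip without any schema or data model.
intermediate = {"topic": 1, "weights": [0.2, 0.5, 0.3]}
print(pickle.loads(pickle.dumps(intermediate)) == intermediate)
```

The failure on the lambda is the limitation that variants such as dill and cloudpickle set out to lift.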
3 The Python VM is great
The simplicity of the VM is our strength
Software Transactional Memory... would be nice
But, I want to use foreign memory
Java gained jmalloc for foreign memory
Better garbage collection
Yes but, I easily plug into reference counting
A strength of Python is its clear C API
⇒ Easy foreign functionality
Cython: the best of C and Python
Add types for speed (numpy arrays as float*)
Call C to bind external libraries: surprisingly easy
no pointer arithmetic
An adaptation layer between Python VM and C
G Varoquaux 32
4 Working together
G Varoquaux 33
4 Scikit-learn is easy machine learning
As easy as py
from sklearn import svm
classifier = svm.SVC()
classifier.fit(X_train, Y_train)
Y_test = classifier.predict(X_test)
People love the encapsulation
classifier is a semi black box
The power of a simple object-oriented API
Documentation-driven development
High-level, readable, simple API reduces cognitive load
PyData loves Python in return
G Varoquaux 34
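To show how little the fit/predict convention asks of an object, here is a toy classifier that follows it. This is a hypothetical class written for illustration in plain NumPy, not code from scikit-learn: it assigns each sample to the class whose training mean is closest, and the training data is made up.

```python
import numpy as np


class NearestMeanClassifier:
    """Toy classifier: predicts the class whose mean is closest (hypothetical)."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # One mean vector per class, learned from the training data.
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Distance from every sample to every class mean.
        distances = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return self.classes_[distances.argmin(axis=1)]


X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
Y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.1, 0.0], [1.0, 0.9]])

classifier = NearestMeanClassifier()
classifier.fit(X_train, Y_train)
Y_test = classifier.predict(X_test)
print(Y_test)
```

The calling code is the same three lines as on the slide; only the class behind classifier changes, which is what makes the object a "semi black box".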
4 Difference is richness, but requires outreach
We all do different things
We can all benefit from others
though we don’t know how
Being didactic outside one’s community is crucial
Avoiding jargon (take that, machine learning)
Prioritizing information
“Simple is better than complex”
Students learning numerics don’t care about unicode
Build documentation upon very simple examples
Think Stack Overflow
Sphinx + Sphinx-gallery
G Varoquaux 35
@GaelVaroquaux
Scientist web dev: Python is the language for data
Python language & VM is perfect to manipulate
low-level constructs
with high-level wordings
Connects to other paradigms, eg C
Dynamism and reflexivity
⇒ meta-programming and debugging
Needs for compilation and dynamism:
a difficult balance
PEP 509: guards on run-time modification
PEP 510: function specialization
PyData will use DB and concurrency from web
PyData can give knowledge engineering + AI

Más contenido relacionado

La actualidad más candente

Python Programming - XIII. GUI Programming
Python Programming - XIII. GUI ProgrammingPython Programming - XIII. GUI Programming
Python Programming - XIII. GUI ProgrammingRanel Padon
 
SWIG Hello World
SWIG Hello WorldSWIG Hello World
SWIG Hello Worlde8xu
 
Python Scripting Tutorial for Beginners | Python Tutorial | Python Training |...
Último (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

Scientist and Web Dev Meet: How Python Became a Data Science Language

  • 1. Scientist meets web dev: how Python became the language of data. Gaël Varoquaux
  • 2. Very diverse community. This talk: a reflection on what we have in common, Python. I am talking about things you don’t understand (my science) and things I don’t understand (web dev).
  • 3. I actually did a PhD in quantum physics Hence I think I qualify as a “scientist” G Varoquaux 3
  • 5. I now do computer science for neuroscience: try to link neural activity to thoughts and cognition. We attack it as a machine learning problem. Python software: nilearn
  • 6. On the way, we created a machine-learning library: scikit-learn
  • 8. Data science with Python is hot. Huge success. Cool. Data science is THE thing. Python is the go-to language. How did it happen? We built scikit-learn, others pandas, etc., but these were built on solid foundations.
  • 9. 1 Scientists come from Jupiter. And web devs from Saturn? And sysadmins from Neptune?
  • 10. 1 We’re different. On one side: numbers (in arrays), arrays (of numbers), arrays, arrays. On the other: strings, databases, object-oriented programming, flow control. A bit of a culture gap.
  • 13. 1 Let’s do something together: sort EuroPython site. 205 talks, for example: “How OpenStack makes Python better (and vice-versa)”; “Introduction to aiohttp”; “So you think your Python startup is worth $10 million...”; “SQLAlchemy as the backbone of a Data Science company”; “Learn Python The Fun Way”; “Scaling Microservices with Crossbar.io”; “If you can read this you don’t need glasses”. Let’s find some common topics with data science. Excerpts from the abstracts: “Anyone who has used Python to search text for substring patterns has at least heard of the regular expression module. Many of us use it extensively for parsers and lexers,” / “The py.test tool presents a rapid and simple way to write tests for your Python code. This training gives a quick introduction with exercises into some distinguishing features.” / “Chat with the core developers about how to extend django CMS or how to integrate your own apps seamlessly. Lets talk about your plugins, apphooks, toolbar extensions”. Tools: import urllib2, bs4; import sklearn, wordcloud
  • 17. 1 Let’s do something together: sort EuroPython site. Crawl the schedule to get a list of titles and URLs, and the talk pages to retrieve abstracts and tags (bs4: Beautiful Soup, matching on the DOM tree). Vectorize the abstract text into term counts. Term (term freq in the document, count in all docs, ratio): a (20, 1321, .015); can (10, 540, .018); code (4, 208, .019); is (14, 964, .014); module (3, 123, .023); profiling (2, 7, .286); performance (1, 6, .167); Python (9, 191, .047); the (18, 1450, .012). TF-IDF in scikit-learn: sklearn.feature_extraction.text.TfidfVectorizer
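The ratio in the table above (count of a term in one document divided by its count over all documents) can be sketched in a few lines of plain Python. The three-document corpus and the helper names `term_counts` and `ratio` are made up for illustration. scikit-learn's `TfidfVectorizer` uses a different weighting (a smoothed logarithmic inverse document frequency with row normalization by default), so this is only a sketch of the idea, not of the library's formula.

```python
from collections import Counter

# Toy corpus standing in for the talk abstracts (invented for illustration).
docs = [
    "python profiling improves the performance of python code",
    "the module is a parser for the regular expression syntax",
    "the talk is about the code of a web site",
]

def term_counts(text):
    """Count how often each whitespace-separated term occurs."""
    return Counter(text.lower().split())

per_doc = [term_counts(d) for d in docs]
all_docs = sum(per_doc, Counter())  # term counts over the whole corpus

def ratio(doc_index):
    """Term count in one document divided by the term count in all documents."""
    return {t: n / all_docs[t] for t, n in per_doc[doc_index].items()}

scores = ratio(0)
# Terms sorted from most to least specific to the first document.
top_terms = sorted(scores, key=scores.get, reverse=True)
```

Common words such as "the" get a low ratio because they appear everywhere, while "profiling" gets a high one because it is concentrated in a single document.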
  • 18. 1 Let’s do something together: sort EuroPython site. [Figure: term-document matrix, with documents along one axis and the terms (the, Python, performance, profiling, module, is, code, can, a) along the other.] Term-document matrix.
  • 19. 1 Let’s do something together: sort EuroPython site. [Figure: the same term-document matrix with its zero entries left out.] Can be a sparse matrix.
  • 20. 1 Let’s do something together: sort EuroPython site. [Figure: the documents × terms matrix is decomposed into a documents × topics matrix and a topics × terms matrix.] What terms are in a topic; what documents are in a topic. A matrix factorization, often with non-negative constraints: sklearn.decomposition.NMF
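A non-negative factorization of a small term-document matrix can be sketched with numpy alone. The version below uses the classic multiplicative-update rule, which is an assumption made for brevity and not necessarily the solver behind `sklearn.decomposition.NMF`. The function name `nmf` and the toy matrix `V` are hypothetical.

```python
import numpy as np

def nmf(V, n_topics, n_iter=200, seed=0):
    """Factorize V (documents x terms) as W @ H with non-negative W and H."""
    rng = np.random.default_rng(seed)
    n_docs, n_terms = V.shape
    W = rng.random((n_docs, n_topics)) + 0.1   # documents x topics
    H = rng.random((n_topics, n_terms)) + 0.1  # topics x terms
    eps = 1e-9  # avoids division by zero
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy counts: documents 0-1 use terms 0-1, documents 2-3 use terms 2-3.
V = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 4],
], dtype=float)

W, H = nmf(V, n_topics=2)
error = np.linalg.norm(V - W @ H)
```

Rows of `H` answer "what terms are in a topic" and rows of `W` answer "what topics are in a document".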
  • 21. 1 Let’s do something together: sort EuroPython site. EuroPython abstracts: Topic 1
  • 22. 1 Let’s do something together: sort EuroPython site. EuroPython abstracts: Topic 2
  • 23. 1 Let’s do something together: sort EuroPython site. EuroPython abstracts: Topic 3
  • 25. 1 Let’s do something together: sort EuroPython site. EuroPython abstracts. Add one of Python’s great templating engines ... get a usable website: https://gaelvaroquaux.github.io/my_topics/ep16
  • 28. Want to try it? $ pip install scikit-learn ... ImportError: Numerical Python (NumPy) is not installed. scikit-learn requires NumPy >= 1.6.1. C:> pip install numpy ... error: Unable to find vcvarsall.bat
  • 31. 1 We’re different. Well, fast linear algebra: ATLAS (Fortran), 70x faster. libfortran.so.3?? You’re kidding me. Packaging is a major roadblock for scientific Python: a lot of compiled code + shared libraries ⇒ library + ABI compatibility issues. Progress: manylinux wheels (PEP 513, R.T. McGibbon, N.J. Smith), which rely on a conservative core set of libs; OpenBLAS: pure-C, fast linear algebra.
  • 32. 1 We’re different. But working together gives us awesome things: text mining ⇒ intelligent interfaces.
  • 33. 2 The scientist’s view of code. Numerics versus control flow. Numerics versus databases. Numerics versus strings. Numerics versus the world.
  • 35. 2 Why we love numpy. 100 000 term frequencies vs inverse doc frequencies: In [*]: %timeit [t * i for t, i in izip(tf, idf)] → 100 loops, best of 3: 6.2 ms per loop. The numpy style: In [*]: %timeit tf * idf → 1000 loops, best of 3: 74.2 µs per loop. [Figure: time per element (1 ns to 1 µs) against number of elements (10^2 to 10^6), lists vs numpy.] Array computing can be more readable: tf * idf vs [t * i for t, i in izip(tf, idf)]
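The two styles compared on this slide can be reproduced as follows. This is a minimal sketch with random data standing in for the real `tf` and `idf` vectors. The slide uses Python 2's `izip`; in Python 3 the built-in `zip` is already lazy, so it is used here instead. Timings are left out since they depend on the machine.

```python
import numpy as np

rng = np.random.default_rng(0)
tf = rng.random(100_000)   # stand-in for 100 000 term frequencies
idf = rng.random(100_000)  # stand-in for inverse document frequencies

# List style: one Python-level multiplication per element.
tf_list, idf_list = tf.tolist(), idf.tolist()
product_list = [t * i for t, i in zip(tf_list, idf_list)]

# numpy style: a single vectorized expression over the whole arrays.
product_numpy = tf * idf
```

In IPython, `%timeit` on each of the two expressions gives the kind of comparison shown on the slide.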
  • 37. 2 Arrays are nothing but pointers. A numpy array = memory address + data type + shape + strides. [Figure: a 2D block of numbers annotated with shape 1, shape 2, stride 1 and stride 2, next to the same numbers laid out linearly in memory.] Represents any regular data in a structured way: how to access elements via pointer arithmetic (computing offsets). Matches the memory model of numerical libraries ⇒ enables copyless interactions. Numpy is really a memory model.
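The shape and strides bookkeeping can be inspected directly from Python. The sketch below reads an element by computing its byte offset by hand, the way pointer arithmetic would. The helper name `element_at` is hypothetical, and the stride values in the comments assume 8-byte integers.

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)  # C-contiguous 3 x 4 array
flat = a.ravel()  # 1D view on the same memory block (no copy for a contiguous array)

def element_at(strides, i, j):
    """Read element (i, j) by computing its byte offset from the strides."""
    offset_bytes = i * strides[0] + j * strides[1]
    return flat[offset_bytes // a.itemsize]

# a.strides is (32, 8): skip 32 bytes to reach the next row, 8 for the next column.
t = a.T  # transposing only swaps shape and strides; the data is not copied
```

Because `t` shares memory with `a`, handing either one to a numerical routine involves no copy of the underlying buffer.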
  • 38. 2 Array computing is fast. [Figure: time per element (1 ns to 1 µs) against number of elements (10^2 to 10^6), lists vs numpy; diagram of the CPU computing tf_idf = tf * idf over contiguous arrays.] No type checking. Direct sequential memory access. Vector operations (SIMD).
  • 40. 2 Array computing is limited by CPU starvation. [Figure: time per element (1 ns to 5 ns) against number of elements (10^3 to 10^6) for tf_idf = tf * idf; diagram of the arrays passing through the CPU cache.] 2x slowdown past a certain size: 10^5 ∼ size of the CPU cache. Memory is much slower than CPU.
  • 41. 2 Array computing is limited by CPU starvation. [Figure: time per element (1 ns to 5 ns) against number of elements (10^3 to 10^6).] Memory is much slower than CPU. tf_idf = tf * idf - 1: it gets worse for complex expressions.
  • 44. 2 Array computing is limited by CPU starvation. tf_idf = tf * idf - 1. What’s going on: 1. tmp ← tf * idf; 2. tf_idf ← tmp - 1. Big temporary: moving data in & out of cache. In [*]: %timeit tf * idf → 10000 loops, best of 3: 74.2 µs per loop. In [*]: %timeit tf * idf - 1 → 1000 loops, best of 3: 418 µs per loop. In-place operations reuse the allocation: In [*]: %timeit tmp = tf * idf; tmp -= 1 → 10000 loops, best of 3: 112 µs per loop.
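The in-place variant from this slide, written out as a script. This is a sketch with random stand-in data. The last form, with an explicit `out=` buffer, is an addition not shown on the slide. Recent numpy releases can sometimes elide temporaries on their own, so the speed gap measured on the slide may not reproduce exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
tf = rng.random(100_000)
idf = rng.random(100_000)

# One expression: the product is stored in a temporary array,
# and the subtraction may then allocate a second array for the result.
tf_idf = tf * idf - 1

# In-place variant: the product is allocated once and then updated in place.
tmp = tf * idf
tmp -= 1

# Same idea with an explicit, preallocated output buffer.
out = np.empty_like(tf)
np.multiply(tf, idf, out=out)
np.subtract(out, 1, out=out)
```

All three variants compute the same values; only the number of large allocations differs.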
  • 46. 2 Array computing is limited by CPU starvation. [Figure: time per element (1 ns to 5 ns) against number of elements (10^3 to 10^6), numpy vs numpy in-place (tmp = tf * idf; tmp -= 1).] tf_idf = tf * idf - 1. What’s going on: 1. tmp ← tf * idf; 2. tf_idf ← tmp - 1. Big temporary: moving data in & out of cache. A compilation problem: turn tf_idf = tf * idf - 1 into tf_idf = tf * idf; tf_idf -= 1.
  • 47. 2 Array computing is limited by CPU starvation. [Figure: the same plot with a numexpr curve added.] A compilation problem: removing/reusing temporaries; operating on “chunks” that fit in cache. Addressed by numexpr, with string expressions: numexpr.evaluate('tf * idf - 1', locals())
  • 48. 2 Array computing is limited by CPU starvation. A compilation problem: removing/reusing temporaries; operating on “chunks” that fit in cache. Addressed by numexpr, with string expressions. Addressed by numba, with bytecode inspection. lazyarray. Similar problem to pagination with SQL queries.
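The "operate on chunks that fit in cache" idea can be illustrated by hand in numpy. This is only a sketch of the principle, not how numexpr or numba are implemented. The function name `chunked_tf_idf` and the default chunk size are arbitrary assumptions.

```python
import numpy as np

def chunked_tf_idf(tf, idf, chunk_size=4096):
    """Compute tf * idf - 1 block by block, so each block can stay in cache."""
    out = np.empty_like(tf)
    for start in range(0, tf.shape[0], chunk_size):
        block = slice(start, start + chunk_size)
        # Multiply straight into the output block, then update it in place:
        # no array-sized temporary is created.
        np.multiply(tf[block], idf[block], out=out[block])
        out[block] -= 1
    return out

rng = np.random.default_rng(0)
tf = rng.random(10_000)
idf = rng.random(10_000)
result = chunked_tf_idf(tf, idf)
```

A Python-level loop over chunks adds overhead of its own, which is why the slide frames this as a job for a compiler rather than for hand-written code.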
  • 49. 2 Array computing is limited by CPU starvation. tf_idf = tf * idf. Too small: overhead. Too BIG: out of cache. BIG Data $$$
  • 51. 2 Numerics versus control flow. What if there is an if? tf_idf = tf / idf; tf_idf[idf == 0] = 0. Suppose we are looking at ages in a population: ages[gender == 'male'].mean() - ages[gender == 'female'].mean(). This is really starting to look like databases. pandas: something in between arrays and an in-memory database. Great for queries, less great for numerics.
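Both masking idioms from this slide, as a runnable sketch. The sample values are invented. The `np.errstate` context manager is an addition that silences the division-by-zero warnings before the mask overwrites those entries.

```python
import numpy as np

tf = np.array([2.0, 0.0, 3.0, 1.0])
idf = np.array([4.0, 0.0, 0.0, 2.0])

# The "if" is expressed as a boolean mask instead of a Python branch.
with np.errstate(divide="ignore", invalid="ignore"):
    tf_idf = tf / idf
tf_idf[idf == 0] = 0  # replace the inf / nan entries produced by idf == 0

# Selection by a condition on another array: a query in all but name.
ages = np.array([31, 45, 27, 52, 38])
gender = np.array(["male", "female", "female", "male", "male"])
age_gap = ages[gender == "male"].mean() - ages[gender == "female"].mean()
```

The second expression is what a group-by query would compute, which is the point at which a tool like pandas becomes more convenient than raw arrays.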
  • 53. [Diagram, two parallel stacks. First stack: installation problems; beautiful Python code; routines in Fortran or C++; scalability. Second stack: deployment problems; beautiful Python code; database in C++, Java, Erlang...; scalability.] Numpy is the scientist’s equivalent to an ORM. Gives speed with non-Python code.
  • 57. numerics vs databases. Numerics: efficient on regularly spaced data. But numpy creates cache misses for big arrays ⇒ need to remove temporaries and chunk data. Selection and grouping: efficient with indexes or trees ⇒ need to group queries. Compilation is unpythonic. A computation & query language? numexpr. I hate domain-specific languages (SQL). Numpy is very expressive. PonyORM: compiling Python to optimized SQL. Data science with SQL: Ibis & Blaze.
  • 58. numerics vs databases. Numerics: efficient on regularly spaced data. But numpy creates cache misses for big arrays ⇒ need to remove temporaries and chunk data. Selection and grouping: efficient with indexes or trees ⇒ need to group queries. Spark: java-world “big data” rising star, combines distributed store + computing model. We (scikit-learn) are faster when data fits in RAM.
  • 61. Operations on chunks, or algorithms on chunks. Machine learning, data mining = numerics. [Figure: arrays of numbers split into chunks.] ETL (extract, transform, & load). Multivariate statistics. Out-of-core operations are not efficient: no data locality. On-line algorithms (streaming), e.g. stochastic gradient descent, as in deep learning.
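An "algorithm on chunks" can be sketched as stochastic gradient descent for linear regression that sees the data one chunk at a time and never holds the full dataset in memory. Everything here is hypothetical: the generator `chunks` simulates a data stream with noise-free targets, and the learning rate is picked by hand.

```python
import numpy as np

def chunks(n_chunks, chunk_size, w_true, seed=0):
    """Simulated data stream: yields (X, y) blocks with y = X @ w_true."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        X = rng.standard_normal((chunk_size, w_true.shape[0]))
        yield X, X @ w_true

def sgd_linear_regression(stream, n_features, learning_rate=0.1):
    """One pass over the stream, one gradient step per chunk."""
    w = np.zeros(n_features)
    for X, y in stream:
        gradient = X.T @ (X @ w - y) / X.shape[0]  # gradient of the mean squared error / 2
        w -= learning_rate * gradient
    return w

w_true = np.array([1.0, -2.0, 3.0])
w = sgd_linear_regression(chunks(200, 100, w_true), n_features=3)
```

Each chunk is used once and then discarded, so memory use is bounded by the chunk size rather than by the dataset size.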
  • 62. Making the data-science magic happen. from sklearn import [Figure: the documents × terms matrix decomposed into a documents × topics matrix and a topics × terms matrix.] What terms are in a topic; what documents are in a topic.
  • 63. Making the data-science magic happen. from sklearn import. Turning applied maths papers to robust code. High-level, readable, simple syntax reduces cognitive load. Thanks
  • 64. 3 Beyond numerics. Make #PyData great (again)
  • 65. 3 Data/computation flow is crucial. [Figure: arrays of numbers flowing through a computation in chunks.] Data-flow engines are everywhere: dask, pure-Python, dynamic scheduler, static compiler, parallel & distributed, theano, expression analysis, pure-Python, tensorflow, C library, distributed.
  • 66. 3 Data/computation flow is crucial. [Figure: arrays of numbers flowing through a computation in chunks.] Data-flow engines are everywhere. Python should shine there: reflexivity + metaprogramming + async. “Python is the best numerical language out there because it’s not a numerical language.” – Nathaniel Smith. API challenging: for algorithm design, no framework / inversion of control.
  • 68. 3 Ingredients for future data flows. Distributed computation & run-time analysis. Reflexivity is central: debugging, interactive work, code analysis, persistence, parallel computing. Pickle: distribute code and data without data model; serialize intermediate results; deep hash of any data structure (joblib.hash). Very limited (eg no lambda #19272) ⇒ variants: dill, cloudpickle.
  • 69. 3 Ingredients for future data flows. Distributed computation & run-time analysis. Reflexivity is central: debugging, interactive work, code analysis, persistence, parallel computing. Pickle: distribute code and data without data model; serialize intermediate results; deep hash of any data structure (joblib.hash). joblib: Simple parallel syntax: Parallel(n_jobs=2)(delayed(sqrt)(i) for i in range(10)). Fast persistence: joblib.dump(anything, 'filename.pkl.gz'). Primitive for out of core: pointer = mem.cache(f).call and shelves(big data). • Non-invasive syntax / paradigm • Fast on big numpy arrays • Soon backend system (job broker and persistence). Gets job management into algorithms (eg in scikit-learn).
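The three uses of pickle listed above (serializing intermediate results, hashing arbitrary structures, and the lambda limitation) can be tried with the standard library alone. The helper `deep_hash` only hashes the pickled bytes, in the spirit of `joblib.hash` and not a copy of it; pickled bytes are not guaranteed to be canonical for every object, so treat this as an illustration. `can_pickle` is likewise a hypothetical helper.

```python
import hashlib
import pickle

intermediate = {"topic": 1, "weights": [0.25, 0.5, 0.25]}

# Serialize an intermediate result and restore it.
payload = pickle.dumps(intermediate)
restored = pickle.loads(payload)

def deep_hash(obj):
    """Hash a nested data structure through its pickled representation."""
    return hashlib.md5(pickle.dumps(obj)).hexdigest()

def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        return False
    return True

# Functions are pickled by reference to their name, which a lambda lacks.
lambda_ok = can_pickle(lambda x: x + 1)
```

This limitation is what variants such as dill and cloudpickle work around by serializing the function's code itself.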
  • 70. 3 The Python VM is great The simplicity of the VM is our strength Software Transactional Memory... would be nice But, I want to use foreign memory Java gained jmalloc for foreign memory Better garbage collection Yes but, I easily plug into reference counting A strength of Python is its clear C API ⇒ Easy foreign functionality G Varoquaux 32
  • 71. 3 The Python VM is great. The simplicity of the VM is our strength. Software Transactional Memory... would be nice. But I want to use foreign memory (Java gained jmalloc for foreign memory). Better garbage collection? Yes, but I easily plug into reference counting. A strength of Python is its clear C API ⇒ Easy foreign functionality. Cython: the best of C and Python. Add types for speed (numpy arrays as float*). Call C to bind external libraries: surprisingly easy, no pointer arithmetic. An adaptation layer between Python VM and C.
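What "using foreign memory" means in practice can be sketched without Cython. In this illustration the standard `ctypes` module stands in for a C library: it allocates a C array of doubles, and a `memoryview` exposes that same memory to Python without copying. This is a swapped-in technique chosen so the example runs on the standard library alone, not the Cython approach of the slide.

```python
import ctypes

# Allocate a C array of four doubles (stand-in for memory owned by a C library).
c_buffer = (ctypes.c_double * 4)(1.0, 2.0, 3.0, 4.0)

# View the same memory from Python; casting through bytes gives a native
# 'd' (double) view that supports indexing. No data is copied.
view = memoryview(c_buffer).cast("B").cast("d")

# Writes through either handle are visible through the other.
view[0] = 10.0
c_buffer[3] = -1.0
```

Both names refer to one block of memory, so nothing is duplicated, and the block stays alive for as long as either object holds a reference to it.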
  • 72. 4 Working together
  • 73. 4 Scikit-learn is easy machine learning. As easy as py: from sklearn import svm; classifier = svm.SVC(); classifier.fit(X_train, Y_train); Y_test = classifier.predict(X_test). People love the encapsulation: classifier is a semi black box. The power of a simple object-oriented API. Documentation-driven development.
  • 74. 4 Scikit-learn is easy machine learning. As easy as py: from sklearn import svm; classifier = svm.SVC(); classifier.fit(X_train, Y_train); Y_test = classifier.predict(X_test). People love the encapsulation: classifier is a semi black box. The power of a simple object-oriented API. Documentation-driven development. High-level, readable, simple API reduces cognitive load. PyData loves Python in return.
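The appeal of the fit/predict convention is easier to see with a toy object that follows it. The class below is not scikit-learn code: `ToyCentroidClassifier` is a hypothetical pure-Python sketch that assigns each sample to the class with the nearest mean. It shows how learned state stays inside the object while the caller only sees two methods.

```python
class ToyCentroidClassifier:
    """Hypothetical nearest-class-mean classifier with a fit/predict API."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
            counts[label] = counts.get(label, 0) + 1
        # Learned state: one mean vector per class.
        self.centroids_ = {
            label: [s / counts[label] for s in acc]
            for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def squared_distance(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(
                self.centroids_,
                key=lambda label: squared_distance(row, self.centroids_[label]),
            )
            for row in X
        ]


X_train = [[0, 0], [0, 1], [5, 5], [6, 5]]
Y_train = ["a", "a", "b", "b"]
classifier = ToyCentroidClassifier().fit(X_train, Y_train)
Y_test = classifier.predict([[0.2, 0.4], [5.5, 4.9]])
```

The calling code has the same shape as the `svm.SVC` snippet on the slide, so this object could be replaced by another one with the same two methods without touching the rest of the script.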
  • 75. 4 Difference is richness. We all do different things. We can all benefit from others, though we don’t know how.
  • 76. 4 Difference is richness, but requires outreach. We all do different things. We can all benefit from others, though we don’t know how. Being didactic outside one’s community is crucial. Avoiding jargon (take that, machine learning). Prioritizing information: “Simple is better than complex”. Students learning numerics don’t care about unicode. Build documentation upon very simple examples: think Stack Overflow. Sphinx + Sphinx-gallery.
  • 77. @GaelVaroquaux Scientist web dev: Python is the language for data. Python language & VM is perfect to manipulate low-level constructs with high-level wordings. Connects to other paradigms, e.g. C.
  • 78. @GaelVaroquaux Scientist web dev: Python is the language for data Python language & VM is perfect to manipulate low-level constructs with high-level wordings Dynamism and reflexivity ⇒ meta-programming and debugging
  • 79. @GaelVaroquaux Scientist web dev: Python is the language for data. Python language & VM is perfect to manipulate low-level constructs with high-level wordings. Dynamism and reflexivity ⇒ meta-programming and debugging. Needs for compilation and dynamism: a difficult balance. PEP 509: guards on run-time modification. PEP 510: function specialization.
  • 80. @GaelVaroquaux Scientist web dev: Python is the language for data. Python language & VM is perfect to manipulate low-level constructs with high-level wordings. Dynamism and reflexivity ⇒ meta-programming and debugging. Needs for compilation and dynamism. PyData will use DB and concurrency from web. PyData can give knowledge engineering + AI.