DISQUS
                         Building Scalable Web Apps



                                 David Cramer
                                    @zeeg




Tuesday, June 21, 2011
Agenda



                •        Terminology
                •        Common bottlenecks


                •        Building a scalable app
                         •   Architecting your database
                         •   Utilizing a Queue
                         •   The importance of an API



Performance vs. Scalability
                         “Performance measures the speed with
                         which a single request can be executed,
                         while scalability measures the ability of a
                         request to maintain its performance under
                         increasing load.”




                               (but we’re not just going to scale your code)


Sharding

                         “Database sharding is a method of
                         horizontally partitioning data by common
                         properties”




Denormalization
                         “Denormalization is the process of
                         attempting to optimize the performance of
                         a database by adding redundant data or by
                         grouping data.”




Common Bottlenecks




                •        Database (almost always)
                •        Caching, Invalidation
                •        Lack of metrics, lack of tests




Building Tweeter




Getting Started




                •        Pick a framework: Django, Flask, Pyramid
                •        Package your app; Repeatability
                •        Solve problems
                •        Invest in architecture




Let’s use Django




Django is...




                •        Fast (enough)
                •        Loaded with goodies
                •        Maintained
                •        Tested
                •        Used




Packaging Matters




setup.py

                     #!/usr/bin/env python
                     from setuptools import setup, find_packages

                     setup(
                         name='tweeter',
                         version='0.1',
                         packages=find_packages(),
                         install_requires=[
                             'Django==1.3',
                         ],
                         package_data={
                             'tweeter': [
                                 'static/*.*',
                                 'templates/*.*',
                             ],
                         },
                     )



setup.py (cont.)




                     $ mkvirtualenv tweeter
                     $ git clone git.example.com:tweeter.git
                     $ cd tweeter
                     $ python setup.py develop




setup.py (cont.)




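                     ## fabfile.py
                     from fabric.api import cd, run

                     def setup():
                         run('git clone git.example.com:tweeter.git')
                         # each run() is a separate shell, so run('cd tweeter')
                         # would not persist; cd() prefixes the next commands
                         with cd('tweeter'):
                             run('./bootstrap.sh')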



                     ## bootstrap.sh
                     #!/usr/bin/env bash
                     virtualenv env
                     env/bin/python setup.py develop




setup.py (cont.)



                     $ fab web setup

                     setup executed on web1
                     setup executed on web2
                     setup executed on web3
                     setup executed on web4
                     setup executed on web5
                     setup executed on web6
                     setup executed on web7
                     setup executed on web8
                     setup executed on web9
                     setup executed on web10




Database(s) First




Databases




          •     Usually core
          •     Common bottleneck
          •     Hard to change
          •     Tedious to scale

                                               http://www.flickr.com/photos/adesigna/3237575990/




What a tweet “looks” like




Modeling the data


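                     from django.contrib.auth.models import User
                     from django.db import models

                     class Tweet(models.Model):
                         user    = models.ForeignKey(User)
                         message = models.CharField(max_length=140)
                         date    = models.DateTimeField(auto_now_add=True)
                         parent  = models.ForeignKey('self', null=True)


                     class Relationship(models.Model):
                         # related_name avoids the reverse-accessor clash
                         # between two foreign keys to the same model
                         from_user = models.ForeignKey(User, related_name='following')
                         to_user   = models.ForeignKey(User, related_name='followers')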



                                    (Remember, bare bones!)


Public Timeline


                     -- public timeline
                     SELECT * FROM tweets
                       ORDER BY date DESC
                       LIMIT 100;




                •        Scales to the size of one physical machine
                •        Heavy index, long tail
                •        Easy to cache, invalidate




Following Timeline

                     -- tweets from people you follow
                     SELECT t.* FROM tweets AS t
                       JOIN relationships AS r
                         ON r.to_user_id = t.user_id
                       WHERE r.from_user_id = 1
                       ORDER BY t.date DESC
                       LIMIT 100;



                •        No vertical partitions
                •        Heavy index, long tail
                •        “Necessary evil” join
                •        Easy to cache, expensive to invalidate
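The following-timeline query can be tried end to end with the standard library's `sqlite3` module. This is a sketch: the table layout mirrors the slide's SQL, while the sample rows and the `following_timeline` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweets (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        message TEXT NOT NULL,
        date TEXT NOT NULL
    );
    CREATE TABLE relationships (
        from_user_id INTEGER NOT NULL,
        to_user_id INTEGER NOT NULL
    );
    -- indexes for the join and the per-author ordering
    CREATE INDEX rel_from ON relationships (from_user_id);
    CREATE INDEX tweets_user_date ON tweets (user_id, date);
""")
conn.executemany(
    "INSERT INTO tweets (user_id, message, date) VALUES (?, ?, ?)",
    [
        (2, "first from 2", "2011-06-21 10:00:00"),
        (3, "from 3", "2011-06-21 11:00:00"),
        (4, "not followed", "2011-06-21 12:00:00"),
        (2, "second from 2", "2011-06-21 13:00:00"),
    ],
)
# user 1 follows users 2 and 3
conn.executemany("INSERT INTO relationships VALUES (?, ?)", [(1, 2), (1, 3)])


def following_timeline(user_id, limit=100):
    return conn.execute(
        """
        SELECT t.message FROM tweets AS t
          JOIN relationships AS r ON r.to_user_id = t.user_id
         WHERE r.from_user_id = ?
         ORDER BY t.date DESC
         LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()
```

The join must gather tweets from every followed author before it can sort and apply `LIMIT`. Also, a single new tweet changes the cached result of every one of its author's followers, which is why this view is expensive to invalidate.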


Materializing Views




                     PUBLIC_TIMELINE = []

                     def on_tweet_creation(tweet):
                         global PUBLIC_TIMELINE

                         PUBLIC_TIMELINE.insert(0, tweet)

                     def get_latest_tweets(num=100):
                         return PUBLIC_TIMELINE[:num]




                                 Disclaimer: don’t try this at home
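The list above grows without bound and lives inside one process. Here is a minimal sketch that at least caps its size using `collections.deque(maxlen=...)`, which discards the oldest entry automatically once full. The cap is an assumed value. The view is still private to a single web process, which is the problem the shared Redis version addresses.

```python
from collections import deque
from itertools import islice

MAX_TIMELINE = 1000  # hypothetical cap on the materialized view

public_timeline = deque(maxlen=MAX_TIMELINE)


def on_tweet_creation(tweet):
    # newest first; once full, appendleft drops the oldest (rightmost) entry
    public_timeline.appendleft(tweet)


def get_latest_tweets(num=100):
    return list(islice(public_timeline, num))


for n in range(1500):
    on_tweet_creation(n)
```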


Introducing Redis


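                     import time

                     from redis import Redis

                     class PublicTimeline(object):
                         def __init__(self):
                             self.conn = Redis()
                             self.key = 'timeline:public'

                         def add(self, tweet):
                             # score = Unix timestamp with sub-second precision
                             score = (time.mktime(tweet.date.timetuple()) +
                                      tweet.date.microsecond / 1e6)
                             # redis-py >= 3.0 takes a {member: score} mapping
                             self.conn.zadd(self.key, {tweet.id: score})

                         def remove(self, tweet):
                             self.conn.zrem(self.key, tweet.id)

                         def list(self, offset=0, limit=-1):
                             # ZREVRANGE takes an inclusive stop index, not a count
                             stop = offset + limit - 1 if limit > 0 else -1
                             tweet_ids = self.conn.zrevrange(self.key, offset, stop)

                             return tweet_ids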
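To make the sorted-set semantics concrete without a Redis server, here is a small in-memory stand-in. `SortedSetStandIn` and `page` are hypothetical and not part of redis-py. The sketch shows the two details the timeline depends on: members come back ordered by score, and `ZREVRANGE` takes an inclusive stop index, so a page of `limit` items starting at `offset` ends at `offset + limit - 1`.

```python
class SortedSetStandIn(object):
    """Tiny in-memory model of a Redis sorted set (member -> score)."""

    def __init__(self):
        self.scores = {}

    def zadd(self, mapping):
        # re-adding a member just updates its score, as in Redis
        self.scores.update(mapping)

    def zrem(self, member):
        self.scores.pop(member, None)

    def zrevrange(self, start, stop):
        # highest score first; stop is inclusive, and only -1 ("to the end")
        # is supported as a negative index in this sketch
        ordered = sorted(self.scores, key=lambda m: self.scores[m], reverse=True)
        end = len(ordered) if stop == -1 else stop + 1
        return ordered[start:end]


def page(zset, offset=0, limit=-1):
    # translate (offset, limit) into ZREVRANGE's inclusive (start, stop)
    stop = offset + limit - 1 if limit > 0 else -1
    return zset.zrevrange(offset, stop)


timeline = SortedSetStandIn()
timeline.zadd({"t1": 100.0, "t2": 200.0, "t3": 300.0, "t4": 400.0})
timeline.zrem("t2")
```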




Cleaning Up




                     import time
                     from datetime import datetime, timedelta

                     class PublicTimeline(object):
                         def truncate(self):
                             # Remove entries older than 30 days
                             d30 = datetime.now() - timedelta(days=30)
                             cutoff = time.mktime(d30.timetuple())
                             self.conn.zremrangebyscore(self.key, '-inf', cutoff)
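The truncation logic can be checked without Redis. In this sketch a plain dict of member to score stands in for the sorted set, and `to_score` and `truncate` are hypothetical helpers. The comparison uses `<=` because `ZREMRANGEBYSCORE` bounds are inclusive by default.

```python
import time
from datetime import datetime, timedelta


def to_score(dt):
    # seconds since the epoch (local time) plus microseconds, as a float
    return time.mktime(dt.timetuple()) + dt.microsecond / 1e6


def truncate(scores, now, days=30):
    """Drop members older than `days`, like ZREMRANGEBYSCORE key -inf cutoff."""
    cutoff = to_score(now - timedelta(days=days))
    stale = [m for m, s in scores.items() if s <= cutoff]
    for member in stale:
        del scores[member]
    return len(stale)


now = datetime(2011, 6, 21, 12, 0, 0)
scores = {
    "old": to_score(now - timedelta(days=45)),
    "recent": to_score(now - timedelta(days=29)),
    "new": to_score(now - timedelta(minutes=5)),
}
removed = truncate(scores, now)
```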




Scaling Redis




                     from nydus.db import create_cluster

                     class PublicTimeline(object):
                         def __init__(self):
                             # create a cluster of 64 dbs; Redis provides 16 by
                             # default, so raise `databases` in redis.conf
                             self.conn = create_cluster({
                                 'engine': 'nydus.db.backends.redis.Redis',
                                 'router': 'nydus.db.routers.redis.PartitionRouter',
                                 'hosts': dict((n, {'db': n}) for n in range(64)),
                             })




Nydus


                     # create a cluster of Redis connections which
                     # partition reads/writes by key (hash(key) % size)

                     from nydus.db import create_cluster
                     conn = create_cluster({
                         'engine': 'nydus.db.backends.redis.Redis',
                         'router': 'nydus.db...redis.PartitionRouter',
                         'hosts': {
                             0: {'db': 0},
                         }
                     })

                     # maps to a single node
                     res = conn.incr('foo')
                     assert res == 1

                     # executes on all nodes
                     conn.flushdb()



                                        http://github.com/disqus/nydus
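Here is a minimal sketch of the routing idea described in the comments above. It is not nydus's implementation: `PartitionedCluster` is a hypothetical class, plain dicts stand in for Redis connections, and `zlib.crc32` replaces `hash()` so that routing is stable across processes. Keyed commands go to exactly one node, while keyless commands such as `flushdb` fan out to every node.

```python
import zlib


class PartitionedCluster(object):
    def __init__(self, size):
        # one dict per "node"; a real cluster would hold connections
        self.nodes = [dict() for _ in range(size)]

    def _node_for(self, key):
        # hash(key) % size, with a process-independent hash
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def incr(self, key):
        # keyed command: executes on exactly one node
        node = self._node_for(key)
        node[key] = node.get(key, 0) + 1
        return node[key]

    def flushdb(self):
        # keyless command: executes on all nodes
        for node in self.nodes:
            node.clear()


conn = PartitionedCluster(size=4)
first = conn.incr("foo")
second = conn.incr("foo")
```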


Vertical vs. Horizontal




Looking at the Cluster




                               sql-1-master             sql-1-slave




                                              redis-1


                         DB0       DB1         DB2         DB3        DB4




                         DB5       DB6         DB7         DB8        DB9




“Tomorrow’s” Cluster


                         sql-1-master         sql-1-users         sql-1-tweets



                                               redis-1


                             DB0        DB1      DB2        DB3      DB4




                                               redis-2


                             DB5        DB6      DB7        DB8      DB9




Asynchronous Tasks




In-Process Limitations




                     def on_tweet_creation(tweet):
                         # O(1) for public timeline
                         PublicTimeline.add(tweet)

                         # O(n) for users following author
                         follower_ids = Relationship.objects.filter(
                             to_user=tweet.user).values_list('from_user_id', flat=True)
                         for user_id in follower_ids:
                             FollowingTimeline.add(user_id, tweet)

                         # O(1) for profile timeline (my tweets)
                         ProfileTimeline.add(tweet.user_id, tweet)




In-Process Limitations (cont.)




                         # O(n) for users following author
                         # 7 MILLION writes for Ashton Kutcher
                         follower_ids = Relationship.objects.filter(
                             to_user=tweet.user).values_list('from_user_id', flat=True)
                         for user_id in follower_ids:
                             FollowingTimeline.add(user_id, tweet)
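One common mitigation, sketched here as an assumption rather than the talk's exact approach, is to split the follower list into fixed-size batches. The fan-out then becomes many small queued jobs instead of one loop of millions of writes. The `batches` helper and the batch size are hypothetical, and the follower count is scaled down from 7 million so the example runs quickly.

```python
from itertools import islice


def batches(iterable, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


follower_ids = range(25003)  # scaled-down stand-in for millions of follower IDs
jobs = list(batches(follower_ids, 1000))  # each list would become one queued task
```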




Introducing Celery




                     #!/usr/bin/env python
                     from setuptools import setup, find_packages

                     setup(
                         install_requires=[
                             'Django==1.3',
                             'django-celery==2.2.4',
                         ],
                         # ...
                     )




Introducing Celery (cont.)




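                     from celery.task import task

                     @task(exchange='tweet_creation')
                     def on_tweet_creation(tweet_dict):
                         # HACK: not the best idea
                         tweet = Tweet()
                         tweet.__dict__ = tweet_dict

                         # O(n) for users following author
                         follower_ids = Relationship.objects.filter(
                             to_user=tweet.user).values_list('from_user_id', flat=True)
                         for user_id in follower_ids:
                             FollowingTimeline.add(user_id, tweet)

                     # at the call site, after the tweet has been saved:
                     on_tweet_creation.delay(tweet.__dict__)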
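The queue/worker split can be sketched with only the standard library. This is a stand-in, not Celery's API. The request path enqueues just the tweet's ID and returns, and a background worker performs the O(n) fan-out. Passing the primary key instead of `__dict__` avoids the hack flagged above, because the worker re-reads current state. `TWEETS`, `FOLLOWERS`, and `following_timelines` are hypothetical in-memory stand-ins for the database and Redis.

```python
import queue
import threading
from collections import defaultdict

TWEETS = {10: {"user_id": 2, "message": "hello"}}  # stand-in for the tweets table
FOLLOWERS = {2: [1, 3, 4]}                          # author -> follower IDs
following_timelines = defaultdict(list)             # user -> tweet IDs, newest first

tasks = queue.Queue()


def worker():
    while True:
        tweet_id = tasks.get()
        try:
            tweet = TWEETS.get(tweet_id)
            if tweet is not None:  # the tweet may have been deleted since enqueue
                for user_id in FOLLOWERS.get(tweet["user_id"], []):
                    following_timelines[user_id].insert(0, tweet_id)
        finally:
            tasks.task_done()


threading.Thread(target=worker, daemon=True).start()

# request path: enqueue and return immediately
tasks.put(10)
tasks.join()  # demo only: wait until the worker has processed the job
```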




Bringing It Together

                import random

                from django.shortcuts import render

                def home(request):
                    "Shows the latest 100 tweets from your follow stream"

                    # tongue-in-cheek: fail 1 request in 10, Fail Whale style
                    if random.randint(0, 9) == 0:
                        return render(request, 'fail_whale.html')

                    ids = FollowingTimeline.list(
                        user_id=request.user.id,
                        limit=100,
                    )

                    res = dict((str(t.id), t) for t in
                               Tweet.objects.filter(id__in=ids))

                    tweets = []
                    for tweet_id in ids:
                        if tweet_id not in res:
                            continue
                        tweets.append(res[tweet_id])

                    return render(request, 'home.html', {'tweets': tweets})
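The reorder-and-skip loop in the view is a reusable step. The database returns rows in arbitrary order, so results are re-sequenced by the timeline's ID list, and IDs whose rows no longer exist are dropped. Here is a minimal standalone sketch: `order_by_ids` is a hypothetical helper, and a `namedtuple` stands in for the Django model.

```python
from collections import namedtuple


def order_by_ids(ids, objects):
    """Return objects in the order of `ids`, skipping IDs with no object."""
    by_id = {str(obj.id): obj for obj in objects}
    return [by_id[i] for i in ids if i in by_id]


Tweet = namedtuple("Tweet", "id message")  # stand-in for the Django model
rows = [Tweet(3, "c"), Tweet(1, "a")]      # unordered, and tweet 2 was deleted
ordered = order_by_ids(["3", "2", "1"], rows)
```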



Build an API




APIs




           •     PublicTimeline.list
           •     redis.zrange
           •     Tweet.objects.all()
           •     example.com/api/tweets/




Refactoring

                # Before: the timeline API returns tweet IDs
                def home(request):
                    "Shows the latest 100 tweets from your follow stream"

                    tweet_ids = FollowingTimeline.list(
                        user_id=request.user.id,
                        limit=100,
                    )



                # After: the timeline API returns Tweet objects
                def home(request):
                    "Shows the latest 100 tweets from your follow stream"

                    tweets = FollowingTimeline.list(
                        user_id=request.user.id,
                        limit=100,
                    )



Refactoring (cont.)




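                     class PublicTimeline(object):
                         def list(self, offset=0, limit=-1):
                             # ZREVRANGE takes an inclusive stop index, not a count
                             stop = offset + limit - 1 if limit > 0 else -1
                             ids = self.conn.zrevrange(self.key, offset, stop)

                             # key by str(id) so lookups match the IDs from Redis
                             cache = dict((str(t.id), t) for t in
                                          Tweet.objects.filter(id__in=ids))

                             # timeline order, skipping tweets that no longer exist
                             return [cache[i] for i in ids if i in cache]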




Optimization in the API

                 import pickle

                 class PublicTimeline(object):
                     def list(self, offset=0, limit=-1):
                         # ZREVRANGE takes an inclusive stop index, not a count
                         stop = offset + limit - 1 if limit > 0 else -1
                         ids = self.conn.zrevrange(self.list_key, offset, stop)

                         # pull objects from the object cache in Redis
                         # (pickled, one key per tweet)
                         cache = {}
                         for i in ids:
                             data = self.conn.get(self.hash_key(i))
                             cache[i] = pickle.loads(data) if data else None

                         if not all(cache.values()):
                             # fetch missing from database
                             missing = [i for i, c in cache.items() if not c]
                             m_cache = dict((str(t.id), t) for t in
                                            Tweet.objects.filter(id__in=missing))

                             # push missing back into cache
                             cache.update(m_cache)
                             for i, c in m_cache.items():
                                 self.conn.set(self.hash_key(i), pickle.dumps(c))

                         # return only results that still exist
                         return [cache[i] for i in ids if cache.get(i)]
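The read-through pattern above can be sketched independently of Redis and Django: check the object cache, load all misses in one batch, write them back, and drop anything still missing. Here `cache` is a plain dict standing in for Redis, and `load_many` stands in for the `id__in` query; both are assumptions made for illustration.

```python
def get_many(ids, cache, load_many):
    """Read-through fetch: cache first, one batched load for misses, write-back."""
    found = {i: cache[i] for i in ids if i in cache}
    missing = [i for i in ids if i not in found]
    if missing:
        loaded = load_many(missing)  # e.g. a single WHERE id IN (...) query
        cache.update(loaded)         # push misses back into the cache
        found.update(loaded)
    # preserve timeline order and ignore IDs the backing store no longer has
    return [found[i] for i in ids if i in found]


DATABASE = {1: "tweet 1", 2: "tweet 2", 3: "tweet 3"}
calls = []


def load_many(ids):
    calls.append(list(ids))
    return {i: DATABASE[i] for i in ids if i in DATABASE}


cache = {1: "tweet 1"}
first = get_many([3, 1, 2, 99], cache, load_many)
second = get_many([3, 1, 2], cache, load_many)
```

The second call never reaches the loader. An ID missing from the database (99 here) is not cached, so it would be looked up again on every request that still lists it.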



Optimization in the API (cont.)




                 def list(self, offset=0, limit=-1):
                     stop = offset + limit - 1 if limit > 0 else -1
                     ids = self.conn.zrevrange(self.list_key, offset, stop)

                     # pull objects from the object cache in Redis
                     cache = {}
                     for i in ids:
                         data = self.conn.get(self.hash_key(i))
                         cache[i] = pickle.loads(data) if data else None




                                      Store each object in its own key




Optimization in the API (cont.)




                         if not all(cache.values()):
                             # fetch missing from database
                             missing = [i for i, c in cache.items() if not c]
                             m_cache = dict((str(t.id), t) for t in
                                            Tweet.objects.filter(id__in=missing))




                                 Hit the database for misses




Optimization in the API (cont.)


                                    Store misses back in the cache

                             # push missing back into cache
                             cache.update(m_cache)
                             for i, c in m_cache.items():
                                 self.conn.set(self.hash_key(i), pickle.dumps(c))

                         # return only results that still exist
                         return [cache[i] for i in ids if cache.get(i)]




                          Ignore database misses



(In)validate the Cache




                 import pickle
                 import time

                 class PublicTimeline(object):
                    def add(self, tweet):
                         score = (time.mktime(tweet.date.timetuple()) +
                                  tweet.date.microsecond / 1e6)

                         # add the tweet into the object cache
                         self.conn.set(self.hash_key(tweet.id), pickle.dumps(tweet))

                         # add the tweet to the materialized view
                         # (redis-py >= 3.0 takes a {member: score} mapping)
                         self.conn.zadd(self.list_key, {tweet.id: score})




(In)validate the Cache




                 class PublicTimeline(object):
                    def remove(self, tweet):
                         # remove the tweet from the materialized view
                         self.conn.zrem(self.list_key, tweet.id)

                         # we COULD remove the tweet from the object cache
                         # (`del` is a Python keyword; redis-py names it delete)
                         self.conn.delete(self.hash_key(tweet.id))




Wrap Up




Reflection




                •        Use a framework!
                •        Start simple; grow naturally
                •        Scale can lead to performance
                         •   Not the other way around
                •        Consolidate entry points




Reflection (cont.)




                •        100 shards > 10; Rebalancing sucks
                         •   Use VMs
                •        Push to caches, don’t pull
                •        “Denormalize” counters, views
                •        Queue everything
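The "100 shards > 10" point can be checked numerically. With naive `hash % N` placement, growing from 10 to 11 servers moves a key unless `h % 10 == h % 11`. That holds for only 10 of every 110 hash values, so roughly 10/11 (about 91%) of keys move. If keys instead map to a fixed number of logical shards (for example, 100 small databases or VMs), only the shard-to-server table changes, and a new server takes over a few whole shards. This sketch assumes the shard count, server counts, and a simple "move the first few shards" rebalancing rule.

```python
import zlib

KEYS = ["user:%d" % n for n in range(10000)]
LOGICAL_SHARDS = 100  # fixed for the life of the system


def h(key):
    return zlib.crc32(key.encode("utf-8"))


def naive_moved_fraction(old_servers, new_servers):
    # server = hash % number_of_servers: changing the count remaps most keys
    moved = sum(1 for k in KEYS if h(k) % old_servers != h(k) % new_servers)
    return moved / len(KEYS)


def initial_placement(num_servers):
    # logical shard -> server, kept in a lookup table rather than recomputed
    return {s: s % num_servers for s in range(LOGICAL_SHARDS)}


def add_server(placement, new_server, count):
    """Hand `count` whole shards to `new_server`; all other shards stay put."""
    updated = dict(placement)
    for shard in sorted(placement)[:count]:
        updated[shard] = new_server
    return updated


def server_for(key, placement):
    # the key -> logical shard step never changes; only the table does
    return placement[h(key) % LOGICAL_SHARDS]


naive = naive_moved_fraction(10, 11)
before = initial_placement(10)
after = add_server(before, new_server=10, count=9)
logical_moved = sum(
    1 for k in KEYS if server_for(k, before) != server_for(k, after)
) / len(KEYS)
```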




Food for Thought




                •        Normalize object cache keys
                •        Application triggers directly to queue
                •        Rethink pagination
                •        Build with future-sharding in mind
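One possible reading of "rethink pagination", offered as an assumption rather than the talk's stated method, is cursor-based paging. `OFFSET` shifts when new tweets arrive and gets slower as the offset grows. A cursor instead pages by the score of the last item seen and requests only strictly older items. In Redis this corresponds to `ZREVRANGEBYSCORE key (cursor -inf LIMIT 0 n`. The in-memory `page_before` helper below is hypothetical and shows the logic.

```python
def page_before(entries, cursor=None, limit=3):
    """Return up to `limit` (score, id) pairs older than `cursor`, newest first.

    `entries` is a list of (score, tweet_id) pairs; `cursor` is the score of
    the last item on the previous page (None for the first page).
    """
    newest_first = sorted(entries, reverse=True)
    if cursor is not None:
        newest_first = [e for e in newest_first if e[0] < cursor]
    page = newest_first[:limit]
    next_cursor = page[-1][0] if page else None
    return page, next_cursor


timeline = [(float(s), "t%d" % s) for s in range(1, 8)]  # t1 oldest .. t7 newest
first, cursor = page_before(timeline)
timeline.append((8.0, "t8"))  # a new tweet arrives between page loads
second, _ = page_before(timeline, cursor)
```

With an offset of 3, the second page would have repeated `t5` after `t8` arrived; the cursor avoids that. In practice, equal scores need a tiebreaker such as the tweet ID.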




DISQUS
                           Questions?




                           psst, we’re hiring
                          jobs@disqus.com

 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

Building Scalable Web Apps

  • 1. DISQUS Building Scalable Web Apps David Cramer @zeeg Tuesday, June 21, 2011
  • 2. Agenda • Terminology • Common bottlenecks • Building a scalable app • Architecting your database • Utilizing a Queue • The importance of an API
  • 3. Performance vs. Scalability “Performance measures the speed with which a single request can be executed, while scalability measures the ability of a request to maintain its performance under increasing load.” (but we’re not just going to scale your code)
  • 4. Sharding “Database sharding is a method of horizontally partitioning data by common properties”
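A minimal sketch of partitioning rows by a common property, assuming the user ID is the partition key and a fixed shard count of 8 (both hypothetical choices, not from the talk). It hashes with `zlib.crc32` instead of Python's built-in `hash()`, because string hashing is randomized per process and every web server must route a given user to the same shard.

```python
import zlib

NUM_SHARDS = 8  # hypothetical shard count


def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index with a process-independent hash."""
    return zlib.crc32(str(user_id).encode("ascii")) % num_shards


def partition(rows, num_shards: int = NUM_SHARDS) -> dict:
    """Group rows by shard so every row sharing a user_id lands together."""
    shards = {n: [] for n in range(num_shards)}
    for row in rows:
        shards[shard_for(row["user_id"], num_shards)].append(row)
    return shards
```

Because all of a user's tweets share a shard, a profile timeline can be served from a single database.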
  • 5. Denormalization “Denormalization is the process of attempting to optimize the performance of a database by adding redundant data or by grouping data.”
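As a small illustration of adding redundant data, the sketch below keeps a follower count on each user row so reading it is a single primary-key lookup instead of a `COUNT(*)` over relationships. It uses the standard-library `sqlite3` module with hypothetical table and column names. The write path updates the normalized row and the redundant counter in one transaction so the two cannot drift apart.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY,
                            follower_count INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE relationships (from_user_id INTEGER, to_user_id INTEGER,
                                    PRIMARY KEY (from_user_id, to_user_id));
    """)
    return conn


def follow(conn: sqlite3.Connection, from_user_id: int, to_user_id: int) -> None:
    # The normalized fact and the redundant counter change together:
    # "with conn" commits on success and rolls back if either statement fails.
    with conn:
        conn.execute("INSERT INTO relationships VALUES (?, ?)",
                     (from_user_id, to_user_id))
        conn.execute("UPDATE users SET follower_count = follower_count + 1 "
                     "WHERE id = ?", (to_user_id,))


def follower_count(conn: sqlite3.Connection, user_id: int) -> int:
    # Read the denormalized value directly; no join or COUNT(*) needed.
    row = conn.execute("SELECT follower_count FROM users WHERE id = ?",
                       (user_id,)).fetchone()
    return row[0]
```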
  • 6. Common Bottlenecks • Database (almost always) • Caching, Invalidation • Lack of metrics, lack of tests
  • 8. Getting Started • Pick a framework: Django, Flask, Pyramid • Package your app; Repeatability • Solve problems • Invest in architecture
  • 11. Django is... • Fast (enough) • Loaded with goodies • Maintained • Tested • Used
  • 13. setup.py
        #!/usr/bin/env python
        from setuptools import setup, find_packages
        setup(
            name='tweeter',
            version='0.1',
            packages=find_packages(),
            install_requires=[
                'Django==1.3',
            ],
            package_data={
                'tweeter': [
                    'static/*.*',
                    'templates/*.*',
                ],
            },
        )
  • 14. setup.py (cont.)
        $ mkvirtualenv tweeter
        $ git clone git.example.com:tweeter.git
        $ cd tweeter
        $ python setup.py develop
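  • 15. setup.py (cont.)
        ## fabfile.py
        from fabric.api import cd, run
        def setup():
            run('git clone git.example.com:tweeter.git')
            # each run() gets a fresh shell, so run('cd tweeter') would not
            # persist; cd() prefixes the commands inside the block instead
            with cd('tweeter'):
                run('./bootstrap.sh')
        ## bootstrap.sh
        #!/usr/bin/env bash
        virtualenv env
        env/bin/python setup.py develop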
  • 16. setup.py (cont.)
        $ fab web setup
        setup executed on web1
        setup executed on web2
        setup executed on web3
        setup executed on web4
        setup executed on web5
        setup executed on web6
        setup executed on web7
        setup executed on web8
        setup executed on web9
        setup executed on web10
  • 18. Databases • Usually core • Common bottleneck • Hard to change • Tedious to scale http://www.flickr.com/photos/adesigna/3237575990/
  • 19. What a tweet “looks” like
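  • 20. Modeling the data
        from django.contrib.auth.models import User
        from django.db import models
        class Tweet(models.Model):
            user = models.ForeignKey(User, on_delete=models.CASCADE)
            message = models.CharField(max_length=140)
            date = models.DateTimeField(auto_now_add=True)
            parent = models.ForeignKey('self', null=True, on_delete=models.CASCADE)
        class Relationship(models.Model):
            # two foreign keys to User need distinct related_names
            from_user = models.ForeignKey(User, related_name='following',
                                          on_delete=models.CASCADE)
            to_user = models.ForeignKey(User, related_name='followers',
                                        on_delete=models.CASCADE)
      (Remember, bare bones!)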
  • 21. Public Timeline
        # public timeline
        SELECT * FROM tweets ORDER BY date DESC LIMIT 100;
      • Scales to the size of one physical machine • Heavy index, long tail • Easy to cache, invalidate
  • 22. Following Timeline
        # tweets from people you follow
        SELECT t.* FROM tweets AS t
        JOIN relationships AS r ON r.to_user_id = t.user_id
        WHERE r.from_user_id = 1
        ORDER BY t.date DESC
        LIMIT 100
      • No vertical partitions • Heavy index, long tail • “Necessary evil” join • Easy to cache, expensive to invalidate
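To try both timeline queries without a MySQL server, here is a sketch that runs them against an in-memory SQLite database via the standard-library `sqlite3` module. Table and column names follow the slides; the sample rows, the index, and the helper functions are hypothetical. The join keeps only tweets whose author appears as `to_user_id` in one of the reader's relationship rows.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tweets (id INTEGER PRIMARY KEY, user_id INTEGER, message TEXT, date TEXT);
CREATE TABLE relationships (from_user_id INTEGER, to_user_id INTEGER);
CREATE INDEX tweets_user_date ON tweets (user_id, date);
"""


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO tweets (user_id, message, date) VALUES (?, ?, ?)",
        [(2, "from 2", "2011-06-21 10:00"),
         (3, "from 3", "2011-06-21 11:00"),
         (4, "from 4", "2011-06-21 12:00")],
    )
    # user 1 follows users 2 and 3, but not 4
    conn.executemany("INSERT INTO relationships VALUES (?, ?)", [(1, 2), (1, 3)])
    return conn


def public_timeline(conn: sqlite3.Connection, limit: int = 100) -> list:
    rows = conn.execute(
        "SELECT message FROM tweets ORDER BY date DESC LIMIT ?", (limit,))
    return [r[0] for r in rows]


def following_timeline(conn: sqlite3.Connection, user_id: int, limit: int = 100) -> list:
    rows = conn.execute(
        """SELECT t.message FROM tweets AS t
           JOIN relationships AS r ON r.to_user_id = t.user_id
           WHERE r.from_user_id = ?
           ORDER BY t.date DESC LIMIT ?""", (user_id, limit))
    return [r[0] for r in rows]
```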
  • 23. Materializing Views
        PUBLIC_TIMELINE = []
        def on_tweet_creation(tweet):
            # insert() mutates the list in place, so no `global` is needed
            PUBLIC_TIMELINE.insert(0, tweet)
        def get_latest_tweets(num=100):
            return PUBLIC_TIMELINE[:num]
      Disclaimer: don’t try this at home
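The slide's list grows without bound. A slightly tidier in-process sketch, still for illustration only, caps the view with `collections.deque(maxlen=...)` so old entries fall off automatically; the class name and the cap of 100 are assumptions. It keeps the slide's core limitation: every web process holds its own private copy of the view.

```python
from collections import deque
from itertools import islice


class InProcessTimeline:
    """Newest-first view capped at `maxlen` items (hypothetical helper)."""

    def __init__(self, maxlen: int = 100):
        self._items = deque(maxlen=maxlen)

    def on_tweet_creation(self, tweet) -> None:
        # appendleft on a full deque silently drops the oldest item from the right
        self._items.appendleft(tweet)

    def get_latest_tweets(self, num: int = 100) -> list:
        return list(islice(self._items, num))
```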
  • 24. Introducing Redis
        import time
        from redis import Redis
        class PublicTimeline(object):
            def __init__(self):
                self.conn = Redis()
                self.key = 'timeline:public'
            def add(self, tweet):
                # score = Unix timestamp of the tweet, so newer tweets rank higher
                score = time.mktime(tweet.date.timetuple())
                self.conn.zadd(self.key, {tweet.id: score})
            def remove(self, tweet):
                self.conn.zrem(self.key, tweet.id)
            def list(self, offset=0, limit=-1):
                # ZREVRANGE takes an inclusive stop index, not a count
                stop = -1 if limit < 0 else offset + limit - 1
                tweet_ids = self.conn.zrevrange(self.key, offset, stop)
                return tweet_ids
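Redis is not needed to see how a sorted set serves the timeline. The pure-Python stand-in below mimics `ZADD`, `ZREM` and `ZREVRANGE` for a single set (it is not redis-py, and it ignores Redis' tie-breaking of equal scores by member). It also shows why a page of `limit` items needs the stop index `offset + limit - 1`: `ZREVRANGE key 0 9` returns ten members, because the stop index is inclusive.

```python
class FakeSortedSet:
    """Tiny in-memory stand-in for one Redis sorted set (illustration only)."""

    def __init__(self):
        self._scores = {}

    def zadd(self, mapping: dict) -> None:
        self._scores.update(mapping)

    def zrem(self, member) -> None:
        self._scores.pop(member, None)

    def zrevrange(self, start: int, stop: int) -> list:
        # Highest score first; like Redis, `stop` is inclusive and -1 means "last".
        ordered = sorted(self._scores, key=self._scores.get, reverse=True)
        end = None if stop == -1 else stop + 1
        return ordered[start:end]


def timeline_page(zset: FakeSortedSet, offset: int = 0, limit: int = -1) -> list:
    if limit == 0:
        return []  # otherwise stop would be -1 and mean "everything"
    # Convert a count into the inclusive stop index ZREVRANGE expects.
    stop = -1 if limit < 0 else offset + limit - 1
    return zset.zrevrange(offset, stop)
```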
  • 25. Cleaning Up
        import time
        from datetime import datetime, timedelta
        class PublicTimeline(object):
            def truncate(self):
                # Remove entries older than 30 days
                d30 = datetime.now() - timedelta(days=30)
                score = time.mktime(d30.timetuple())
                # ZREMRANGEBYSCORE key min max: drop everything scored at or before the cutoff
                self.conn.zremrangebyscore(self.key, '-inf', score)
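Truncation only works if scores are comparable timestamps. This standalone sketch applies the same rule to a plain dict of member to score (a hypothetical stand-in for the sorted set). It computes the cutoff as a Unix timestamp 30 days back and drops every member scored at or below it, which is what `ZREMRANGEBYSCORE key -inf <cutoff>` does on the server. The `now` parameter is an addition that makes the function testable.

```python
import time
from datetime import datetime, timedelta


def cutoff_score(now: datetime, days: int = 30) -> float:
    # Same conversion as the timeline's score: seconds since the epoch (local time).
    return time.mktime((now - timedelta(days=days)).timetuple())


def truncate(scores: dict, now: datetime, days: int = 30) -> int:
    """Remove members scored at or below the cutoff; return how many were removed."""
    cutoff = cutoff_score(now, days)
    stale = [member for member, score in scores.items() if score <= cutoff]
    for member in stale:
        del scores[member]
    return len(stale)
```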
  • 26. Scaling Redis
        from nydus.db import create_cluster
        class PublicTimeline(object):
            def __init__(self):
                # create a cluster of 64 logical Redis dbs
                # (Redis defaults to 16 databases; raise `databases` in redis.conf)
                self.conn = create_cluster({
                    'engine': 'nydus.db.backends.redis.Redis',
                    'router': 'nydus.db.routers.redis.PartitionRouter',
                    'hosts': dict((n, {'db': n}) for n in range(64)),
                })
  • 27. Nydus
        # create a cluster of Redis connections which
        # partition reads/writes by key (hash(key) % size)
        from nydus.db import create_cluster
        conn = create_cluster({
            'engine': 'nydus.db.backends.redis.Redis',
            'router': 'nydus.db...redis.PartitionRouter',
            'hosts': {
                0: {'db': 0},
            }
        })
        # maps to a single node
        res = conn.incr('foo')
        assert res == 1
        # executes on all nodes
        conn.flushdb()
      http://github.com/disqus/nydus
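The routing rule in the comment, `hash(key) % size`, fits in a few lines. The sketch below uses plain dicts as stand-in nodes and `zlib.crc32` as a stable hash. It is not Nydus' actual `PartitionRouter`, only the same idea: a single-key command like `incr` goes to exactly one node, while a keyless command like `flushdb` fans out to every node.

```python
import zlib


class DictNode:
    """Stand-in for one Redis database (illustration only)."""

    def __init__(self):
        self.data = {}

    def incr(self, key: str) -> int:
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

    def flushdb(self) -> None:
        self.data.clear()


class PartitionedCluster:
    def __init__(self, size: int):
        self.nodes = [DictNode() for _ in range(size)]

    def node_for(self, key: str) -> DictNode:
        # stable hash(key) % size, so every process routes a key the same way
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def incr(self, key: str) -> int:
        return self.node_for(key).incr(key)  # maps to a single node

    def flushdb(self) -> None:
        for node in self.nodes:  # executes on all nodes
            node.flushdb()
```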
  • 29. Looking at the Cluster: sql-1-master, sql-1-slave; redis-1: DB0, DB1, DB2, DB3, DB4, DB5, DB6, DB7, DB8, DB9
  • 30. “Tomorrow’s” Cluster: sql-1-master, sql-1-users, sql-1-tweets; redis-1: DB0, DB1, DB2, DB3, DB4; redis-2: DB5, DB6, DB7, DB8, DB9
  • 32. In-Process Limitations
        def on_tweet_creation(tweet):
            # O(1) writes for public timeline
            PublicTimeline.add(tweet)
            # O(n) writes for users following author
            for user_id in tweet.user.followers.values_list('from_user', flat=True):
                FollowingTimeline.add(user_id, tweet)
            # O(1) writes for profile timeline (my tweets)
            ProfileTimeline.add(tweet.user_id, tweet)
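Counting writes makes the limitation concrete. In this in-memory sketch (hypothetical dict-of-lists timelines and a follower map, not the talk's Redis classes), one tweet costs one write each to the public and profile timelines plus one write per follower. The fan-out loop therefore dominates as follower counts grow.

```python
from collections import defaultdict


class Timelines:
    def __init__(self, followers: dict):
        self.followers = followers            # author_id -> list of follower ids
        self.public = []                      # tweet ids, newest first
        self.following = defaultdict(list)    # reader_id -> tweet ids, newest first
        self.profile = defaultdict(list)      # author_id -> tweet ids, newest first

    def on_tweet_creation(self, tweet_id: int, author_id: int) -> int:
        """Push a tweet into every affected timeline; return the number of writes."""
        writes = 0
        self.public.insert(0, tweet_id)                       # O(1) writes
        writes += 1
        for user_id in self.followers.get(author_id, []):     # O(n) writes
            self.following[user_id].insert(0, tweet_id)
            writes += 1
        self.profile[author_id].insert(0, tweet_id)           # O(1) writes
        writes += 1
        return writes
```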
  • 33. In-Process Limitations (cont.)
        # O(n) writes for users following author
        # 7 MILLION writes for Ashton Kutcher
        for user_id in tweet.user.followers.values_list('from_user', flat=True):
            FollowingTimeline.add(user_id, tweet)
  • 34. Introducing Celery
        #!/usr/bin/env python
        from setuptools import setup, find_packages
        setup(
            install_requires=[
                'Django==1.3',
                'django-celery==2.2.4',
            ],
            # ...
        )
  • 35. Introducing Celery (cont.)
        @task(exchange='tweet_creation')
        def on_tweet_creation(tweet_dict):
            # HACK: not the best idea
            tweet = Tweet()
            tweet.__dict__ = tweet_dict
            # O(n) writes for users following author
            for user_id in tweet.user.followers.values_list('from_user', flat=True):
                FollowingTimeline.add(user_id, tweet)
        on_tweet_creation.delay(tweet.__dict__)
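Celery's job here is to move the O(n) loop out of the web request. The same shape can be shown with only the standard library. The request path enqueues the work and returns immediately, and a worker thread performs the fan-out later. This swaps in `queue.Queue` plus `threading` as a stand-in for Celery and has none of Celery's durability, retries, or multi-machine workers. Passing IDs rather than the object's `__dict__` is also a choice of this sketch, and the data structures are hypothetical.

```python
import queue
import threading
from collections import defaultdict

FOLLOWERS = {1: [2, 3]}                   # hypothetical author_id -> follower ids
FOLLOWING_TIMELINES = defaultdict(list)   # reader_id -> tweet ids, newest first
tasks = queue.Queue()


def on_tweet_creation(tweet_id: int, author_id: int) -> None:
    # O(n) fan-out, executed by the worker instead of the web request
    for user_id in FOLLOWERS.get(author_id, []):
        FOLLOWING_TIMELINES[user_id].insert(0, tweet_id)


def worker() -> None:
    while True:
        args = tasks.get()
        try:
            if args is None:  # sentinel: stop the worker
                break
            on_tweet_creation(*args)
        finally:
            tasks.task_done()


def create_tweet(tweet_id: int, author_id: int) -> None:
    # The request path only enqueues, analogous to on_tweet_creation.delay(...)
    tasks.put((tweet_id, author_id))
```

A worker is started with `threading.Thread(target=worker, daemon=True).start()` and stopped by putting `None` on the queue.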
  • 36. Bringing It Together
        import random
        from django.shortcuts import render
        def home(request):
            "Shows the latest 100 tweets from your follow stream"
            if random.randint(0, 9) == 0:
                return render(request, 'fail_whale.html')
            ids = FollowingTimeline.list(
                user_id=request.user.id,
                limit=100,
            )
            res = dict((str(t.id), t) for t in Tweet.objects.filter(id__in=ids))
            tweets = []
            for tweet_id in ids:
                if tweet_id not in res:
                    continue
                tweets.append(res[tweet_id])
            return render(request, 'home.html', {'tweets': tweets})
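The core of the view restores the timeline's order after an unordered `id__in` query and skips IDs whose rows no longer exist. Isolated as a pure function (the name `order_by_ids` and the `(id, message)` tuples are hypothetical), that step looks like this:

```python
from typing import Iterable, List, Sequence, Tuple


def order_by_ids(ids: Sequence[str], rows: Iterable[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Return rows in the order of `ids`, dropping IDs with no matching row."""
    # Key by str(id), as the view does, so integer primary keys match string IDs.
    by_id = {str(row[0]): row for row in rows}
    return [by_id[i] for i in ids if i in by_id]
```

Building the dict once keeps the reordering linear in the number of IDs instead of scanning the rows for each ID.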
  • 37. Build an API
  • 38. APIs • PublicTimeline.list • redis.zrange • Tweet.objects.all() • example.com/api/tweets/
  • 39. Refactoring
        # before: the view gets back tweet IDs
        def home(request):
            "Shows the latest 100 tweets from your follow stream"
            tweet_ids = FollowingTimeline.list(
                user_id=request.user.id,
                limit=100,
            )
        # after: the API hands back Tweet objects
        def home(request):
            "Shows the latest 100 tweets from your follow stream"
            tweets = FollowingTimeline.list(
                user_id=request.user.id,
                limit=100,
            )
  • 40. Refactoring (cont.)
        class PublicTimeline(object):
            def list(self, offset=0, limit=-1):
                stop = -1 if limit < 0 else offset + limit - 1
                # redis-py returns bytes by default; decode to match str(t.id)
                ids = [i.decode() for i in self.conn.zrevrange(self.key, offset, stop)]
                cache = dict((str(t.id), t) for t in Tweet.objects.filter(id__in=ids))
                # keep Redis order, skipping tweets deleted from the database
                return [cache[i] for i in ids if i in cache]
  • 41. Optimization in the API
        import pickle
        class PublicTimeline(object):
            def list(self, offset=0, limit=-1):
                stop = -1 if limit < 0 else offset + limit - 1
                ids = [i.decode() for i in self.conn.zrevrange(self.list_key, offset, stop)]
                # pull objects from the object cache in Redis (one key per tweet)
                cache = {}
                for i in ids:
                    data = self.conn.get(self.hash_key(i))
                    cache[i] = pickle.loads(data) if data else None
                if not all(cache.values()):
                    # fetch missing from database
                    missing = [i for i, c in cache.items() if not c]
                    m_cache = dict((str(t.id), t) for t in Tweet.objects.filter(id__in=missing))
                    # push missing back into cache
                    cache.update(m_cache)
                    for i, c in m_cache.items():
                        self.conn.set(self.hash_key(i), pickle.dumps(c))
                # return only results that still exist
                return [cache[i] for i in ids if cache.get(i)]
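The read path above is a read-through cache with backfill. Below is a dependency-free sketch of the same steps. One dict stands in for Redis' per-object keys and another for the database, the names are hypothetical, and the query counter exists only to show that a repeated read never reaches the database.

```python
class ReadThroughCache:
    def __init__(self, database: dict):
        self.database = database   # stand-in for Tweet.objects: id -> object
        self.cache = {}            # stand-in for one Redis key per object
        self.db_queries = 0

    def get_many(self, ids: list) -> list:
        found = {i: self.cache.get(i) for i in ids}
        missing = [i for i, obj in found.items() if obj is None]
        if missing:
            # one batched lookup for all misses, like filter(id__in=missing)
            self.db_queries += 1
            fetched = {i: self.database[i] for i in missing if i in self.database}
            found.update(fetched)
            self.cache.update(fetched)  # push misses back into the cache
        # keep the requested order; skip IDs missing from the database too
        return [found[i] for i in ids if found.get(i) is not None]
```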
  • 42. Optimization in the API (cont.)
        def list(self, offset=0, limit=-1):
            stop = -1 if limit < 0 else offset + limit - 1
            ids = [i.decode() for i in self.conn.zrevrange(self.list_key, offset, stop)]
            # pull objects from the object cache in Redis (one key per tweet)
            cache = {}
            for i in ids:
                data = self.conn.get(self.hash_key(i))
                cache[i] = pickle.loads(data) if data else None
      Store each object in its own key
  • 43. Optimization in the API (cont.)
            if not all(cache.values()):
                # fetch missing from database
                missing = [i for i, c in cache.items() if not c]
                m_cache = dict((str(t.id), t) for t in Tweet.objects.filter(id__in=missing))
      Hit the database for misses
  • 44. Optimization in the API (cont.)
      Store misses back in the cache
                # push missing back into cache
                cache.update(m_cache)
                for i, c in m_cache.items():
                    self.conn.set(self.hash_key(i), pickle.dumps(c))
            # return only results that still exist
            return [cache[i] for i in ids if cache.get(i)]
      Ignore database misses
  • 45. (In)validate the Cache
        import pickle
        import time
        class PublicTimeline(object):
            def add(self, tweet):
                score = time.mktime(tweet.date.timetuple())
                # add the tweet into the object cache
                self.conn.set(self.hash_key(tweet.id), pickle.dumps(tweet))
                # add the tweet to the materialized view
                self.conn.zadd(self.list_key, {tweet.id: score})
  • 46. (In)validate the Cache
        class PublicTimeline(object):
            def remove(self, tweet):
                # remove the tweet from the materialized view
                self.conn.zrem(self.list_key, tweet.id)
                # we COULD remove the tweet from the object cache
                self.conn.delete(self.hash_key(tweet.id))
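The write side keeps the view and the object cache in step. `add` pushes the object into the cache and its ID into the ordered view. `remove` takes the ID out of the view, which alone hides the tweet from readers; also deleting the cached object, as the slide's last line does, reclaims memory. A dict-based sketch of that pairing (hypothetical names, no Redis):

```python
class CachedTimeline:
    def __init__(self):
        self.objects = {}   # object cache: tweet id -> tweet
        self.view = {}      # materialized view: tweet id -> score

    def add(self, tweet_id: int, tweet: str, score: float) -> None:
        self.objects[tweet_id] = tweet      # push into the cache on write
        self.view[tweet_id] = score

    def remove(self, tweet_id: int) -> None:
        self.view.pop(tweet_id, None)       # readers stop seeing it immediately
        self.objects.pop(tweet_id, None)    # optional: reclaim the cached copy

    def list(self) -> list:
        ordered = sorted(self.view, key=self.view.get, reverse=True)
        return [self.objects[i] for i in ordered if i in self.objects]
```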
  • 48. Reflection • Use a framework! • Start simple; grow naturally • Scale can lead to performance • Not the other way around • Consolidate entry points
  • 49. Reflection (cont.) • 100 shards > 10; Rebalancing sucks • Use VMs • Push to caches, don’t pull • “Denormalize” counters, views • Queue everything
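One way to read "100 shards > 10" is to fix a generous number of logical shards up front and map them onto however many physical hosts exist. Growing then means moving whole shards between hosts, never rehashing keys; with `hash(key) % num_hosts`, changing the host count remaps most keys. The sketch below assumes 100 logical shards and a simple placement policy of its own, reusing host names from the cluster slides. It moves just enough shards to a new host to even out the load.

```python
import zlib
from collections import Counter

NUM_LOGICAL_SHARDS = 100  # fixed forever, so keys never need rehashing


def logical_shard(key: str) -> int:
    return zlib.crc32(key.encode("utf-8")) % NUM_LOGICAL_SHARDS


def initial_placement(hosts: list) -> dict:
    # spread logical shards round-robin across the starting hosts
    return {s: hosts[s % len(hosts)] for s in range(NUM_LOGICAL_SHARDS)}


def add_host(placement: dict, new_host: str) -> int:
    """Move just enough whole shards onto new_host to even out load; return moves."""
    hosts = set(placement.values()) | {new_host}
    target = NUM_LOGICAL_SHARDS // len(hosts)
    load = Counter(placement.values())
    moved = 0
    for shard in sorted(placement):
        if moved >= target:
            break
        host = placement[shard]
        if host != new_host and load[host] > target:
            placement[shard] = new_host
            load[host] -= 1
            load[new_host] += 1
            moved += 1
    return moved
```

Going from two hosts to three moves 33 of the 100 shards, and each key stays in the same logical shard throughout.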
  • 50. Food for Thought • Normalize object cache keys • Application triggers directly to queue • Rethink pagination • Build with future-sharding in mind
  • 51. DISQUS Questions? psst, we’re hiring jobs@disqus.com