Distributed systems


About me


Distributed systems: definition

  • A set of autonomous, interconnected services
  • Different codebases
  • Each service with its own database


Distributed systems

  1. Why / when?
  2. Building it
  3. RPCs
  4. Keep it running


Why?

  • In the beginning, a clean, neat design
autoslave_apps.png


Why?

  • Two years later
    • New features
    • Refactoring
    • Auditing, logging
    • Marketing & sales


Why?

  • Two years later
models-extract.png


Let's split it!

multiparc.png


When?

New product/feature:

  • Not an incremental improvement
  • Strongly disconnected from the core one
  • Need to launch fast


When?

New installations of main product:

  • Under another brand
  • In another datacenter
  • With a separate database (legal concerns)
  • Operated by the same (ops) team


When?

Technical improvements:

  • Scaling / sharding
  • Geographic expansion
  • Unbundling and upgrading


Building the distributed system


Where should you start?

Extract generic code

  • misc, util, tools
  • Everything that isn't business-specific
  • And some basic business components, too (common models, UI, etc.)

Put it into separate modules => adapt your deployment process for fast-moving internal dependencies


Providing an integrated UX

Use a single, coherent visual design:

  • Extract your common UI templates
  • Add inter-product headers
  • Normalize your asset pipeline (bower, grunt)


Providing an integrated UX

Deploy an efficient Single-Sign-On solution:

  • SSO, not "centralized authentication"
  • Log in once, for all services
  • OpenID-like for web customers
  • Kerberos-based for internal users, if you can


Interlude: Kerberos

  • Efficient, secure authentication.
  • Available with MS Active Directory, or with the open-source MIT implementation
  • Works with Mac OS, Linux, Windows

Concept:

  1. Authenticate against a secure authentication server ("KDC")
  2. Receive a ticket
  3. Connect to a server
  4. Mutual ticket validation


Interlude: Kerberos

Also works for websites

=> Django receives the authenticated username in REMOTE_USER, set by the web server (available as request.META['REMOTE_USER'])
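
A minimal settings sketch for this setup, assuming the front web server has already performed the Kerberos negotiation and sets REMOTE_USER. The middleware and backend named below ship with Django; every other setting is left out.

```python
# settings.py (fragment)
MIDDLEWARE = [
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    # Must come after AuthenticationMiddleware: logs in the user
    # named in REMOTE_USER.
    "django.contrib.auth.middleware.RemoteUserMiddleware",
]

AUTHENTICATION_BACKENDS = [
    # Trusts the username handed over by the web server.
    "django.contrib.auth.backends.RemoteUserBackend",
]
```

Only enable this behind a web server that always sets REMOTE_USER itself; otherwise the value cannot be trusted.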


A few rules for splitting

  • Keep a single "source of truth"
  • Split per business domain, not per object/table
  • Think about redundancy / availability


Where do you begin?

Pick a loosely coupled, isolated module:

  • ACL management
  • Registration
  • Finance & billing


Splitting

Avoid rewriting the whole module before splitting:

  1. Enforce the target API within the source code, running on the same host
  2. Start the remote module, running the exact same code
  3. Add a dispatcher that calls both local and remote modules
  4. Once data is properly feeding the new module, switch to "full remote" mode
  5. Remove the dispatcher
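
The dispatcher of steps 3 to 5 could be sketched as follows. All names (Dispatcher, Billing, the three modes) are hypothetical; the "remote" object stands in for whatever RPC client is eventually used.

```python
import logging

logger = logging.getLogger("dispatcher")


class Dispatcher:
    """Calls the local module, the remote one, or both, behind one API."""

    MODES = ("local", "both", "remote")

    def __init__(self, local, remote, mode="local"):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.local = local
        self.remote = remote
        self.mode = mode

    def call(self, method, *args, **kwargs):
        if self.mode == "remote":
            return getattr(self.remote, method)(*args, **kwargs)
        result = getattr(self.local, method)(*args, **kwargs)
        if self.mode == "both":
            # The local answer stays authoritative; the remote call only
            # feeds the new module and lets us compare results.
            try:
                shadow = getattr(self.remote, method)(*args, **kwargs)
                if shadow != result:
                    logger.warning("mismatch on %s: %r != %r", method, result, shadow)
            except Exception:
                logger.exception("remote call %s failed", method)
        return result


class Billing:
    """Hypothetical module being split out; same code on both sides."""

    def __init__(self):
        self.invoices = []

    def add_invoice(self, amount):
        self.invoices.append(amount)
        return len(self.invoices)


local, remote = Billing(), Billing()
dispatcher = Dispatcher(local, remote, mode="both")
dispatcher.call("add_invoice", 10)   # written to both modules
dispatcher.mode = "remote"           # step 4: "full remote" mode
dispatcher.call("add_invoice", 20)   # written to the remote module only
```

In "both" mode a remote failure is only logged, so the new module can be unstable without affecting users.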


Remote procedure calls


RPC: Choosing a technology

At first, avoid conforming to a strict REST API:

  • You know exactly which methods you need
  • Avoids lots of performance issues
  • Easier to port from internal function calls


RPC: Choosing a technology

Important points:

  • Authentication & security
  • Changing the schema
  • Debugging and logging


RPC: Security

Don't trust the frontend:

  • Require the caller to provide a "proof of identity" for the end user
  • Check authorization against local ACLs
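
One possible shape for these two checks, as a sketch only: the proof of identity is an HMAC-signed username and the ACL is a local dictionary. The key, the token format and all function names are assumptions; a real deployment would rather forward a Kerberos ticket or a signed, expiring token.

```python
import hashlib
import hmac

SHARED_KEY = b"demo-key-change-me"  # hypothetical; load real keys from configuration


def sign_identity(username, key=SHARED_KEY):
    """Issued by the authentication service: 'username:signature'."""
    sig = hmac.new(key, username.encode(), hashlib.sha256).hexdigest()
    return f"{username}:{sig}"


def verify_identity(token, key=SHARED_KEY):
    """Return the username if the signature is valid, else None."""
    username, _, sig = token.rpartition(":")
    expected = hmac.new(key, username.encode(), hashlib.sha256).hexdigest()
    if hmac.compare_digest(sig.encode(), expected.encode()):
        return username
    return None


# Authorization is decided by the backend, from its own data.
LOCAL_ACL = {
    "alice": {"invoice.read", "invoice.create"},
    "bob": {"invoice.read"},
}


def handle_call(token, permission):
    user = verify_identity(token)
    if user is None:
        raise PermissionError("invalid proof of identity")
    if permission not in LOCAL_ACL.get(user, set()):
        raise PermissionError(f"{user} lacks {permission}")
    return f"{permission} granted to {user}"
```

The sketch has no expiry or replay protection; it only shows that the backend verifies the proof itself instead of trusting a username sent by the frontend.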


RPC: Changing the schema

You'll go through several iterations of the API.

Avoid complex migrations:

  1. Deploy client compatible with old and new contract
  2. Deploy server with both old and new contract
  3. Ask client to switch to new contract
  4. Deploy client and server without old contract

Simpler:

  1. Deploy new server, with new (backwards-compatible) contract
  2. Deploy new client, aware of new features


RPC: Changing the schema

Choose your protocol:

  • Allows optional fields, with defaults
  • Doesn't care about unknown fields
  • Supports some path-like notion (/api/v2/resource/)
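
The first two properties can be illustrated with plain JSON. This is a sketch with hypothetical field names, not a full validation layer: optional fields get defaults, unknown fields are ignored.

```python
import json

REQUIRED = ("user_id",)
# Fields added after the first version of the contract, with their defaults.
OPTIONAL = {"currency": "EUR", "notify": False}


def parse_request(raw):
    data = json.loads(raw)
    missing = [name for name in REQUIRED if name not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    parsed = {name: data[name] for name in REQUIRED}
    for name, default in OPTIONAL.items():
        parsed[name] = data.get(name, default)
    # Unknown fields are dropped silently, so newer clients keep working.
    return parsed


old_client = parse_request('{"user_id": 1}')
new_client = parse_request('{"user_id": 1, "currency": "USD", "future_field": 3}')
```

With this behaviour, the "simpler" two-step deployment works: old clients still talk to the new server, and new clients do not break an older one.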


RPC: Choosing a technology

Three layers:

  • Transport (HTTP, plain TCP, ZeroMQ, AMQP, ...)
  • Serialization (JSON, msgpack, ...)
  • RPC (JSON-RPC, SOAP, ...)


RPC: Transport

Required features:

  • Load-balancing
  • Message-oriented (with binary support)
  • asyncio compatible
  • Optional: Request/response and asynchronous


RPC: Transport

Options:

  • HTTP: good at load-balancing, debugging; bad for binary data
  • AMQP: good at load-balancing, async, binary; bad for time-critical request/response
  • ZeroMQ: good at async, binary, perf; you need to roll your own load-balancing
  • Plain TCP: roll your own! - bad for message-oriented
  • Plain UDP: roll your own! - bad for request/response & monitoring


RPC: Serialization

Required features:

  • Complex objects
  • Efficient packing
  • Support for schema validation and updates
  • Optional: enums, datetime, custom types


RPC: Serialization

Options:

  • msgpack: good for packing; no schema, no custom types
  • protobuf: good for packing, schema; no custom types
  • JSON: bad packing; no schema or custom types
  • XML: strict schema, advanced types; bad packing
  • pickle: NO.
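
Why "NO" for pickle: loading a pickle can call arbitrary callables chosen by the sender. A harmless demonstration, with a hypothetical Payload class that only evaluates an arithmetic expression:

```python
import json
import pickle


class Payload:
    # pickle records the returned callable and arguments, then calls
    # them when the data is loaded.
    def __reduce__(self):
        return (eval, ("40 + 2",))


blob = pickle.dumps(Payload())
loaded = pickle.loads(blob)   # runs eval() on the receiving side

# JSON decoding only ever produces plain data
# (dict, list, str, numbers, booleans, None).
decoded = json.loads('{"amount": 42}')
```

Replace the expression with any other code and the receiving service runs it, which rules pickle out for data crossing a service boundary.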


RPC: application layer

Required features:

  • Authentication
  • Explicit schema
  • Unambiguous serialization

Provided by the implementation:

  • Logging/debug
  • Simple client calls, efficient server-side declaration
  • Optional: asyncio compatible


RPC: application layer

Options:

  • JSON-RPC: well-defined serialization; no auth, implicit schema ("any defined function")
  • XML-RPC: well-defined serialization; no auth, implicit schema ("any defined function")
  • SOAP/XML: auth, explicit schema; inconsistent serialization
  • ZeroRPC: couldn't find a spec; no auth, implicit schema ("any defined function")
  • JSON/REST: not a real RPC standard


RPC: Choosing a technology

Recommended technologies:

  • Start with a simple stack:

    HTTP transport, JSON serialization, JSON-RPC/REST-like protocol

  • Move to a more efficient stack for scaling/perf:

    ZeroMQ/AMQP transport, Protobuf serialization, <missing> protocol
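
A sketch of the simple stack's upper two layers: JSON serialization and a JSON-RPC 2.0 style handler. The transport is left out; handle() could be called from any HTTP view. The rpc_method decorator and the add function are hypothetical, and notifications and batch requests are not handled.

```python
import json

METHODS = {}


def rpc_method(func):
    """Explicitly expose a function, instead of 'any defined function'."""
    METHODS[func.__name__] = func
    return func


@rpc_method
def add(a, b):
    return a + b


def _error(request_id, code, message):
    return json.dumps({
        "jsonrpc": "2.0",
        "error": {"code": code, "message": message},
        "id": request_id,
    })


def handle(raw):
    """Take a JSON-RPC 2.0 request string, return a response string."""
    try:
        request = json.loads(raw)
    except ValueError:
        return _error(None, -32700, "Parse error")
    if not isinstance(request, dict) or "method" not in request:
        return _error(None, -32600, "Invalid Request")
    request_id = request.get("id")
    func = METHODS.get(request["method"])
    if func is None:
        return _error(request_id, -32601, "Method not found")
    params = request.get("params", [])
    try:
        # Named parameters arrive as an object, positional ones as an array.
        result = func(**params) if isinstance(params, dict) else func(*params)
    except TypeError:
        # Simplification: a TypeError raised inside the method is also
        # reported as invalid parameters.
        return _error(request_id, -32602, "Invalid params")
    return json.dumps({"jsonrpc": "2.0", "result": result, "id": request_id})


response = handle('{"jsonrpc": "2.0", "method": "add", "params": [1, 2], "id": 1}')
```

Because handle() works on strings, the same function can later sit behind a different transport without changes.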


Maintaining and testing


Keep it running

  • Reliability / redundancy
  • Managing deploys and upgrades
  • Splitting additional modules


Reliability / redundancy

Ensure that the role of each service is clear:

  • Is it the only source for some data, and must be queried every time?

    => High availability required

  • Is it an authoritative datastore, cached by client services?

    => Don't forget cache invalidation and automatic synchronization
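
For the second case, a client-side cache could look like this sketch: entries expire after a TTL (automatic resynchronization) and can be dropped explicitly when the authoritative service announces a change. CachedClient and its parameters are hypothetical names.

```python
import time


class CachedClient:
    """Client-side cache in front of an authoritative service."""

    def __init__(self, fetch, ttl=60.0, clock=time.monotonic):
        self._fetch = fetch      # callable hitting the authoritative service
        self._ttl = ttl
        self._clock = clock      # injectable, to make tests deterministic
        self._entries = {}       # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]
        value = self._fetch(key)
        self._entries[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key):
        """Called when the authoritative service announces a change."""
        self._entries.pop(key, None)
```

The TTL bounds how stale a value can get if an invalidation message is lost.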


Managing upgrades

New challenges:

  • Inter-system dependencies
  • Handling contract changes
  • Shared libraries


Upgrades: inter-system deps

  • Critical service: progressive upgrade

    => Play with load-balancers

  • Seldom used backends: up to you

  • Client services: progressive upgrade

    => Ensure you're still compatible with your upstream services
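
In practice the progressive part is usually done with load-balancer weights. The idea itself fits in a few lines; this sketch, with a hypothetical pick_backend function, sends a stable share of clients to the upgraded backend.

```python
import zlib


def pick_backend(client_id, new_share):
    """Route a stable share (0 to 100 percent) of clients to the upgraded backend."""
    # crc32 is deterministic, so a given client always lands in the same bucket.
    bucket = zlib.crc32(client_id.encode("utf-8")) % 100
    return "new" if bucket < new_share else "old"


clients = [f"client-{n}" for n in range(1000)]
for share in (0, 10, 50, 100):
    upgraded = sum(pick_backend(c, share) == "new" for c in clients)
    print(f"target {share:3d}%: {upgraded} of {len(clients)} clients on the new version")
```

Raising the share never moves a client back to the old version, which keeps a rollout easy to reason about.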


Upgrades: Contracts

  • Small changes: deploy new server then new clients

    => Needs more steps for strict schemas

  • Big changes: increase the API version number (new code path)

    => Monitor old service for non-upgraded clients
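
A sketch of the "big change" case, with hypothetical paths and handlers: both versions are served side by side, and a counter shows whether anyone still calls the old one.

```python
from collections import Counter

calls_per_version = Counter()


def get_user_v1(user_id):
    return {"id": user_id, "name": "Ada Lovelace"}


def get_user_v2(user_id):
    # Incompatible change: "name" was split into two fields.
    return {"id": user_id, "first_name": "Ada", "last_name": "Lovelace"}


ROUTES = {
    "/api/v1/user/": get_user_v1,
    "/api/v2/user/": get_user_v2,
}


def dispatch(path, user_id):
    handler = ROUTES[path]                 # KeyError for unknown paths
    version = path.split("/")[2]           # "v1" or "v2"
    calls_per_version[version] += 1
    return handler(user_id)


dispatch("/api/v1/user/", 1)
dispatch("/api/v2/user/", 1)
dispatch("/api/v2/user/", 2)
```

Once the v1 counter stays at zero for long enough, the old code path can be removed.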


Upgrades: shared libraries

Upgrades to various libraries:

  • Security

    => Deploy as soon as possible

  • New major features

    => Integrate into the next major upgrade

  • Minor improvements

    => For the next minor upgrade


Upgrades: shared libraries

Release often, if only to upgrade dependencies

  • Avoids using obsolete code
  • Helps with future feature-based upgrades
  • Fewer versions of libraries in production => easier to track and fix bugs

Full range of tests:

  • Current prod branch with pinned dependencies
  • Current prod branch with latest version of dependencies
  • Current master with pinned dependencies
  • Current master with latest version of dependencies


Testing


Testing

Different levels:

  • Server-side contracts

    => Proper input validation

  • Isolated clients

    => Runs faster

  • Full-system tests

    => Most accurate


Testing the server

  • Connect a fake client with an in-memory transport
    • Tests the whole stack (including errors, logging)
    • Fast
  • Send already deserialized objects to actual methods
    • Skips some layers
    • Slightly faster
    • Easier tests on returned values
  • Connect an actual client with a real transport
    • Closer to real setups
    • Allows some failure modes (client disconnects in flight)
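
The first option can be sketched like this. The server, transport and client classes are hypothetical stand-ins; the point is that the client goes through serialization and error handling, but no socket is opened.

```python
import json


class PingServer:
    """Stand-in server: takes request bytes, returns response bytes."""

    def handle(self, payload: bytes) -> bytes:
        request = json.loads(payload)
        if request.get("method") != "ping":
            return json.dumps({"error": "unknown method"}).encode()
        return json.dumps({"result": "pong"}).encode()


class InMemoryTransport:
    """Delivers bytes straight to the server object, without any socket."""

    def __init__(self, server):
        self.server = server
        self.sent = []           # kept for assertions on the wire format

    def send(self, payload: bytes) -> bytes:
        self.sent.append(payload)
        return self.server.handle(payload)


class Client:
    def __init__(self, transport):
        self.transport = transport

    def call(self, method):
        raw = self.transport.send(json.dumps({"method": method}).encode())
        response = json.loads(raw)
        if "error" in response:
            raise RuntimeError(response["error"])
        return response["result"]


transport = InMemoryTransport(PingServer())
client = Client(transport)
answer = client.call("ping")
```

Swapping InMemoryTransport for a real transport with the same send() method gives the third option without touching the test bodies.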


Multi-threaded tests

  • Required when network is involved (until aiodjango? :P)
  • Used by some production code (management commands)
  • Much slower due to TransactionTestCase

Solution?

django-shareddb: offload all DB operations to a single thread

=> Wrap LiveServerTestCase tests in transactions!


Multi-system tests

  • Allows for a complete test of the whole system
  • Provides a basis for testing failure-mode handling

But:

  • Rather complex to set up
  • Lack of proper tooling
  • Kind of like giant Selenium tests, but with multiple servers and databases


Multi-system tests

Prototype: fire / djfire.

  • Allows remote code execution across virtual environments
  • Provides hooks for synchronizing test cases, stopping daemons
  • Compatible with django-shareddb

Release: soon!


djfire

Example:

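# fire_tests is provided by the djfire prototype (not released yet).
class MyRemoteTest(fire_tests.FireTestCase):
    # Remote projects used by this test case, keyed by alias.
    fire_servers = {
        'proj': '../../remote.sock',
    }

    def test_user_factory(self):
        proj = self.fire_remotes['proj']
        # Modules are loaded in the remote project's environment.
        auth_factories = proj.module('django_fact.factories')
        auth_models = proj.module('django.contrib.auth.models')

        # Create a user remotely, then read it back through the remote ORM.
        user = auth_factories.UserFactory()
        user2 = auth_models.User.objects.get()

        self.assertEqual(user, user2)
        self.assertEqual(0, user2.groups.count())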


Questions?
