4.2. CouchDB Replication Protocol¶
The CouchDB Replication protocol is a protocol for synchronizing documents between two peers over HTTP/1.1.
4.2.1. Language¶
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.
4.2.2. Goals¶
The CouchDB Replication protocol synchronizes documents between two peers over HTTP/1.1.
In theory, the CouchDB protocol can be used between any products that implement it. The reference implementation, written in Erlang, is provided by the couch_replicator module available in Apache CouchDB.
The CouchDB replication protocol uses the CouchDB REST API, so it is based on HTTP and the Apache CouchDB MVCC data model. The primary goal of this specification is to describe the CouchDB replication algorithm.
4.2.3. Definitions¶
- ID:
- An identifier (could be a UUID) as described in RFC 4122
- Sequence:
- An ID provided by the changes feed. It can be numeric, but is not necessarily so.
- Revision:
- (to define)
- Document
- A document is a JSON entity with a unique ID and revision.
- Database
- A collection of documents with a unique URI
- URI
- A URI is defined by RFC 2396. It can be a URL as defined in RFC 1738.
- Source
- Database from which the Documents are replicated
- Target
- Database to which the Documents are replicated
- Checkpoint
- Last source sequence ID
4.2.4. Algorithm¶
- Get unique identifiers for the Source and Target based on their URIs if a replication task ID is not available.
- Save this identifier in a special Document named _local/<uniqueid> on the Target database. This document isn’t replicated. It will collect the last Source sequence ID, the Checkpoint, from the previous replication process.
- Get the Source changes feed by calling the /<source>/_changes URL, passing it the Checkpoint in the since parameter. The changes feed only returns a list of current revisions.
Note
This step can be done continuously using the feed=longpoll or feed=continuous parameters. The feed then keeps delivering changes as they occur.
- Collect a group of Document/Revision ID pairs from the changes feed and send them to the Target database on the /<target>/_revs_diff URL. The result will contain the list of revisions NOT in the Target.
- GET each revision from the Source database by calling the URL /<source>/<docid>?revs=true&open_revs=<revision>. This returns the document with its parent revisions. Also fetch any attachments that aren’t already stored on the Target. As an optimisation, you can use the HTTP multipart API to get them all.
- Collect a group of the revisions fetched in the previous step and store them on the Target database using the Bulk Docs API with the new_edits: false JSON property to preserve their revision IDs.
- After the group of revisions is stored on the Target, save the new Checkpoint on the Source.
Note
- Even if some revisions have been ignored, the sequence should still be taken into consideration for the Checkpoint.
- To compare the ordering of non-numeric sequences, keep an ordered list of the sequence IDs as they appear in the _changes feed and compare their indices.
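The steps above can be sketched with in-memory stand-ins in place of HTTP calls. Everything named here (MemoryDB, replication_id, replicate, the seq-N sequence format, hashing both URIs) is a hypothetical illustration and not the couch_replicator implementation. As simplifications, the toy changes list records every revision rather than only current ones, and the Checkpoint is written to both peers.

```python
import hashlib


class MemoryDB:
    """Toy stand-in for a database; this is not a CouchDB client."""

    def __init__(self, uri):
        self.uri = uri
        self.revisions = {}  # (doc_id, rev) -> document body
        self.changes = []    # ordered (sequence, doc_id, rev) entries
        self.local = {}      # _local documents, never replicated

    def write(self, doc_id, rev, body):
        """Store a revision under its given ID and log the change."""
        if (doc_id, rev) not in self.revisions:
            self.revisions[(doc_id, rev)] = body
            self.changes.append((f"seq-{len(self.changes) + 1}", doc_id, rev))

    def changes_since(self, checkpoint):
        """Return changes after `checkpoint`.

        Sequences are treated as opaque: their order is their index in the
        feed, never a string or numeric comparison.
        """
        seqs = [seq for seq, _, _ in self.changes]
        start = seqs.index(checkpoint) + 1 if checkpoint in seqs else 0
        return self.changes[start:]

    def revs_diff(self, pairs):
        """Return the (doc_id, rev) pairs this database does not have."""
        return [pair for pair in pairs if pair not in self.revisions]


def replication_id(source, target):
    """Derive a stable identifier from both URIs (hypothetical scheme)."""
    digest = hashlib.sha256(f"{source.uri}|{target.uri}".encode())
    return digest.hexdigest()[:16]


def replicate(source, target, batch_size=2):
    """Run one replication pass and return the new Checkpoint."""
    local_id = "_local/" + replication_id(source, target)
    checkpoint = target.local.get(local_id)
    changes = source.changes_since(checkpoint)
    for i in range(0, len(changes), batch_size):
        batch = changes[i:i + batch_size]
        missing = target.revs_diff([(doc_id, rev) for _, doc_id, rev in batch])
        for doc_id, rev in missing:
            target.write(doc_id, rev, source.revisions[(doc_id, rev)])
        # Advance the Checkpoint even if every revision in the batch was skipped.
        checkpoint = batch[-1][0]
        target.local[local_id] = checkpoint
        source.local[local_id] = checkpoint
    return checkpoint
```

Running replicate a second time with no new changes on the Source returns the stored Checkpoint without copying anything.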
4.2.5. Filter replication¶
The replication can be filtered by passing the filter parameter, with a function name, to the changes feed. This calls the function on each change. If the function returns true, the document is added to the feed.
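The effect of a filter can be illustrated in a few lines. This is only a sketch of the semantics in Python: filtered_changes, only_posts, and the shape of the change entries are hypothetical names chosen for the example, and in a real deployment the filter function lives on the server and is selected by name.

```python
def filtered_changes(changes, docs, filter_fn):
    """Keep only the changes whose document makes filter_fn return True."""
    return [change for change in changes if filter_fn(docs[change["id"]])]


def only_posts(doc):
    """Example filter: accept documents whose (assumed) type field is 'post'."""
    return doc.get("type") == "post"


docs = {
    "a": {"_id": "a", "type": "post"},
    "b": {"_id": "b", "type": "draft"},
}
changes = [{"seq": 1, "id": "a"}, {"seq": 2, "id": "b"}]
kept = filtered_changes(changes, docs, only_posts)
```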
4.2.6. Optimisations¶
- The system should run the steps in parallel to reduce latency.
- The number of revisions passed to steps 3 and 6 should be large enough to reduce bandwidth use and latency.
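One way to control how many revisions go into each request is to group the changes into fixed-size batches. The helper below is a minimal sketch; the name batches and the idea of a fixed size parameter are assumptions, and a suitable size would have to be tuned for the deployment.

```python
from itertools import islice


def batches(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Because it consumes the input lazily, the same helper works on a continuous changes feed as well as on a finite list.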
4.2.7. API Reference¶
- HEAD /{db} – Check Database existence
- POST /{db}/_ensure_full_commit – Ensure that all changes are stored on disk
- GET /{db}/_local/{id} – Read the last Checkpoint
- PUT /{db}/_local/{id} – Save a new Checkpoint
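As a rough illustration, the four calls above can be built with Python's standard library without sending them. The function name common_requests and the last_seq field in the Checkpoint body are hypothetical; this document does not define the layout of the Checkpoint document.

```python
import json
from urllib.request import Request


def common_requests(db_url, replication_id, checkpoint):
    """Build (but do not send) the requests shared by push and pull."""
    local_url = f"{db_url}/_local/{replication_id}"
    headers = {"Content-Type": "application/json"}
    # Hypothetical body: the field name "last_seq" is an assumption.
    body = json.dumps({"last_seq": checkpoint}).encode("utf-8")
    return [
        Request(db_url, method="HEAD"),
        Request(f"{db_url}/_ensure_full_commit", headers=headers, method="POST"),
        Request(local_url, method="GET"),
        Request(local_url, data=body, headers=headers, method="PUT"),
    ]


requests = common_requests("http://localhost:5984/db", "abc123", "seq-3")
```

Each Request can later be passed to urllib.request.urlopen, or the same URLs and methods can be used with any other HTTP client.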
Push Only¶
- PUT /{db} – Create Target if it does not exist and the option was provided
- POST /{db}/_revs_diff – Locate Revisions that are not known to the Target
- POST /{db}/_bulk_docs – Upload Revisions to the Target
- PUT /{db}/{docid}?new_edits=false – Upload a single Document with attachments to the Target
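The request bodies for the two POST calls can be sketched as plain dictionaries. The helper names are hypothetical. The shapes assume the CouchDB HTTP API conventions that _revs_diff takes a map from document ID to a list of revision IDs and that _bulk_docs takes a docs array alongside new_edits.

```python
import json


def revs_diff_body(pairs):
    """Group (doc_id, rev) pairs into a {doc_id: [rev, ...]} map."""
    body = {}
    for doc_id, rev in pairs:
        body.setdefault(doc_id, []).append(rev)
    return body


def bulk_docs_body(docs):
    """Wrap documents so the Target keeps their revision IDs unchanged."""
    return {"docs": list(docs), "new_edits": False}


pairs = [("a", "1-x"), ("a", "2-y"), ("b", "1-z")]
diff_payload = json.dumps(revs_diff_body(pairs))
bulk_payload = json.dumps(bulk_docs_body([{"_id": "a", "_rev": "2-y"}]))
```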
Pull Only¶
- GET /{db}/_changes – Locate changes on Source since the last pull. The request takes query parameters such as since, feed, and filter (see Algorithm and Filter replication).
- GET /{db}/{docid} – Retrieve a single Document from Source with attachments. The request uses the following query parameters:
  - open_revs=revid, where revid is the actual Document Revision at the moment of the pull request
  - revs=true
  - atts_since=lastrev
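The pull URLs can be assembled as in the sketch below. The helper names changes_url and document_url are hypothetical. The sketch also assumes the CouchDB HTTP API convention that open_revs and atts_since are passed as JSON arrays, which is why single revision IDs are wrapped in a list before encoding.

```python
import json
from urllib.parse import quote, urlencode


def changes_url(db_url, since=None, feed=None):
    """Build the _changes URL, adding only the parameters that were given."""
    params = {}
    if since is not None:
        params["since"] = since
    if feed is not None:
        params["feed"] = feed
    query = urlencode(params)
    return f"{db_url}/_changes" + (f"?{query}" if query else "")


def document_url(db_url, doc_id, revid, lastrev=None):
    """Build the URL that fetches one revision with its revision history."""
    params = {"revs": "true", "open_revs": json.dumps([revid])}
    if lastrev is not None:
        # Ask only for attachments added since this revision.
        params["atts_since"] = json.dumps([lastrev])
    return f"{db_url}/{quote(doc_id, safe='')}?{urlencode(params)}"


feed_url = changes_url("http://localhost:5984/db", since=42, feed="longpoll")
doc_url = document_url("http://localhost:5984/db", "doc1", "2-ghi", "1-abc")
```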