https://gdstechnology.blog.gov.uk/2015/10/13/guaranteeing-the-integrity-of-a-register/

Guaranteeing the integrity of a register

Last month we blogged about registers: authoritative lists you can trust. To recap, a register is a canonical source of data, with a clearly-defined custodian. You should be able to trust the integrity of the data that you get from a register.

If you get a record of a company from Companies House, you should be certain that nobody has tampered with the record. Further, you should be able to show it to someone else and demonstrate to them that this is an official record. Similarly, if a restaurant displays a food hygiene rating, you should be able to verify that this rating is genuinely what the Food Standards Agency has recorded for this restaurant. We are exploring mechanisms to achieve this through a digital proof of authenticity.

A digital proof of authenticity allows you to:

  • check that a record truly was created by the correct authority
  • ensure that a registrar never goes back on their word
  • copy a register and validate the integrity of its whole contents

There are a number of ways of achieving this but one we have been exploring is based around Google’s Certificate Transparency project. At its heart, Certificate Transparency depends on the creation of a digitally signed append-only log. The entries in the log are hashed together in a Merkle tree and the tree is signed. The registrar can append to the log by issuing a new signature. Consumers can request proof that a single entry appears in a particular log. Consumers can also request proof that the registrar has not rewritten history which the registrar can easily provide.

A worked example

Suppose that we are building a register of restaurant inspections. First, we collect some data:

ct-log1

This represents five inspection reports of particular establishments on particular dates. Then, we compute secure hashes of these inspections, labelled here a, b, c, d, and e:

ct-log1-hashes

We then combine these hashes into a Merkle tree, where each tree node is a secure hash of the two elements below it. We compute the hash of a and b combined, and call it g; we compute the hash of c and d, and call it h. We continue this process until we have built a single complete tree with a single head node m. Finally, we sign the tree head:

ct-log1-merkle

Creating a digital proof

Roy’s Rolls wants to display its inspection report on its premises. It also wants to present a digital proof of authenticity of the report, so it asks the register to provide one. To produce this proof, it’s worth asking: what is the minimum amount of information you need to validate that the signed tree contains the inspection report of Roy’s Rolls? You shouldn’t need to download all reports from the register and the whole corresponding Merkle tree in order to validate the digital proof for a single inspection.

If you are given the inspection report, you can compute its hash, e. If the register also tells you the value of k, you can hash k and e together to compute m. You can then check if the signature validly signs m. (You can take the value of k on trust because of hash function pre-image resistance.) This means you can ignore most of the tree when checking a single inspection report:

ct-log1-proof

In total, a digital proof of authenticity for the Roy’s Rolls inspection report consists of:

  • The signature (in this case, signature-5)
  • The number of items in the register. From this, you can work out the tree’s size and structure
  • The position of the report within the tree (in this case, it is in position 5). This tells you which path to compute through the tree
  • The intermediate Merkle tree hash values for siblings on the path from the report to the tree head (in this case, there is only one sibling node on the path: k)
  • Using a tree structure means that a single signature can sign the whole register, and the same signature can generate a digital proof on demand for any one entry within the register. A proof has a size in logarithmic proportion to the size of the whole register.

    Adding new reports to a register

    On 12th Feb 2015, Prima Doner is reinspected and found to have improved greatly, from two stars to four. We wish to add this new report to the register. To do this, we want to create a new Merkle tree which includes the new report:

    ct-merkle6

    The new Merkle tree is able to reuse large parts of the previous tree. In particular, everything from k and e downwards from the previous Merkle tree. The new parts of the Merkle tree are coloured in green.

    Keeping your word

    As new inspection reports are recorded, it is important that we don’t forget about the old reports. How can a user have confidence that a register is keeping all of its old records legitimately, and is not rewriting history? They can request a consistency proof, which demonstrates that the new Merkle tree has all of the inspection reports from the old tree, in the same order, with extra reports appended at the end.

    In this example, the consistency proof is a demonstration that the first five reports in the new tree are exactly the same as all five reports from the old tree. To demonstrate this, we need the hashes from the new tree which cover the first five reports; in this case, the hashes are k and e. We can then demonstrate that k and e cover the whole of the old tree:

    ct-merkle5-consistency

    And that k and e cover the first five reports of the new tree:

    ct-merkle6-consistency

    Just like the digital proof of authenticity for a single report, we do not need to download the whole tree to demonstrate the consistency of successive versions of the register.

    Conclusion

    Merkle trees are a promising technology for securing the integrity of a register. They provide efficient operations for generating proofs for single records, for adding new records to a register, and for providing guarantees that data is never lost from the register.

    We still have outstanding questions about integrity. There are some particularly thorny use cases which involve some sort of redaction: eg right to be forgotten, or spent criminal convictions. Balancing the needs of digital proofs with the needs of redaction is a complex subject. There will also be further needs that we haven’t even discovered yet. We are just beginning to understand how to guarantee the integrity of a register.

    You can follow Phil on Twitter, sign up now for email updates from this blog or subscribe to the feed.

    If this sounds like a good place to work, take a look at Working for GDS - we're usually in search of talented people to come and join the team.

5 comments

  1. Comment by Ross posted on

    This seems like a great approach, and a really interesting topic generally - is any of the exploratory code for registers available anywhere? I’d love to contribute.

    I do have a couple of questions about the use of certificate-transparency as the basis for this - clearly merkle trees are the way to go, but I’m also interested in why a blockchain (NOT bitcoin) approach wasn’t taken. Obviously they have problems of their own but given that it’s likely that neither solution is perfect, it would be an interesting read.

    My main concern with certificate transparency is that it doesn’t stop bad actors from entering data into the registry, it just makes it much easier to detect after the fact. The onus would then be on someone to keep checking the state of the registry to look for them. Would a solution where it was not possible to enter bad data at all not be preferable?

    Another question that I have about this approach generally, is ‘Who owns the network?’. With both approaches (blockchain and certificate transparency) ownership of the network is important - is it expected that for a government register all of the nodes in the network will run on government owned equipment? Surely it is safer when the public can take part in the network - albeit painful to have to manage and maintain their own nodes. Apologies if this has already been answered and I’ve missed it.

    Cheers

    Ross.

    Reply
    • Replies to Ross>

      Comment by Raphaelle Heaf posted on

      Hi Ross,

      Blockchains and Merkle trees are both technologies for guaranteeing the integrity of an append-only datastore. It’s exciting that lots of people are seeing the value of append-only datastores with guaranteed integrity.

      The registers we are considering at the moment have a clear custodian: that custodian would be responsible for appending updates to the register. This is distinct from a blockchain-based system where there are multiple actors who can append new updates.

      Reply
  2. Comment by Gareth Small posted on

    Sounds like a fun experiment, but I just don't think most members of the public give a damn whether the Food Hygiene score shown in the window has been doctored or not. You're solving a problem that doesn't exist.

    I think the real reason you're checking integrity of food scores and not something important like passports, is because merkle trees won't tell you when a food score has been superceded by another check, or a passport has been revoked. So it's useless until this is addressed.

    Reply
  3. Comment by J Short posted on

    One other potential "thorny use case" is that consumers generally want to know the most recent version of the report for a given vendor, not just that a given report was genuine at some point. If the Tall Ship drops to a 1 star rating this year, I want my smartphone app to state that the one it's displaying is "false". Also it's not really clear from the example why a standard database available through an open API would not be just as useful: what is the Merkle tree adding?

    Reply
  4. Comment by Fran posted on

    First of all, thank you so much for this blog.

    In relation to this topic, I agree with you that some problems arise in terms of rights to be forgotten and issues like that. Have you explored other approaches such as long term signing and revocation in case you need it?

    Cheers

    Reply

Leave a comment

We only ask for your email address so we know you're a real person