A blockchain represents a public commitment of ordered data. This data repository can be used to record votes in a variety of ways. Any interested party can access the blockchain to discover all votes, so long as they know how to differentiate votes from other blockchain activity.
Note that blockchain reorganizations or mining denial-of-service attacks can be used to remove or ignore transactions and therefore votes, preventing their inclusion in the blockchain’s main chain. But these attacks cannot be used to change a participant’s vote.
These attacks could affect an election that specifies a moment in time when polls close. However, reorganizations leave strong historical evidence of misbehavior in terms of proof-of-work, and denial-of-service attacks leave strong cotemporal evidence (e.g. any participant actively monitoring the network during the voting will see vote records posted to the network but not included in the blockchain).
One construction6 of a data carrying vote allows any entity to construct an arbitrary transaction that sends to a specific output contract. Spending this output in a new transaction is the act of voting. To satisfy the contract, the voter must push a public key P registered by the voting authority, a signature of the transaction using P, vote identifier and choice data, and the signature of that data using P.
The use of data signatures (BCH’s OP_CHECKDATASIGVERIFY) allows the vote to be safely placed in the transaction’s satisfier script (also called scriptSig) even though the satisfier script is not signed as part of the transaction’s signature.
tallies votes in a manner that admits succinct inclusion proofs
Extend a normal merkle tree at each node N with vote tallies of the sum of all the votes in the subtree rooted at N. The node hashes must concatenate the child hashes as in a normal Merkle tree and the vote tallies to ensure that the tallies cannot be modified without forcing the higher Merkle node hashes to change. The merkle tree root therefore contains the election results.
Providing a more formal construction:
Assume a vote with different yes/no decisions Dn, functions LeftChild and RightChild that return Merkle tree children, H is a cryptographic hash function, and the operator “|” is byte string concatenation:
Tally(N,Di)=Tally(LeftChild(N).Di+RightChild(N).Di)(1)
Merkle Node hash calculation:
H(N)=H(Tally(N,Dx))∣H(LeftChild(N))∣H(RightChild(N))(2)
The merkle tree leaf nodes are the hash of a vote, and participant information (such as the blockchain transaction or transaction hash, public key & signature) P, that proves the validity of the vote:
Tally(P,Di)={1 if vote "yes" for decision D_i0 otherwise(3)
Vn=H(Tally(P,Dx)∣P)(4)
The following diagram illustrates the merkle tree of votes. Vote tallies appear in brackets. For brevity, tallies have been omitted from many nodes, except those that are part of the merkle proof of vote V2, which is indicated in light blue.
So the Merkle proof is “V2[1,0], K[0,1], D[2,0], and C[2,2] at index 2”.
As with normal Merkle trees, specifying index 2 is required to communicate the data concatenation order at each level. To explain, note that the Merkle proof execution for the provided example begins as follows:
x←H(V2[1,0])(a1)
x←H(x∣K[0,1])(a2)
x←H(D[2,0]∣x),...(a3)
But how did the algorithm “know” to switch the order of concatenation in the last operation? The answer is by using the element index.
To explain, note that the path from the root to child nodes are labelled with a 0 or 1. Traversing any path and interpreting these a bits in a number results in the zero based element index of the vote in the tree, in this case its index 2 or 010 binary. Starting with the least significant bit, use each bit to communicate to the prover the concatenation order (e.g. H(current val, next) or H(next, current val)) of the hash at each tree level, ignoring the bottom-most level which does not concatenate any elements. For example the 2nd hash operation (a2, combining J and K) executes H(current value J | K[0,1]) because the first bit is 0, but the 3rd hash operation (a3, combining D and E) executes H( D[2,0] | current value E) – note the different order – because the next bit in the index is a 1. Therefore the index number of the merkle leaf communicates the prover in what order the proof elements must be combined.
An incorrectly specified index would result in the prover hashing data in the wrong order, yielding a different merkle root and therefore a failed merkle proof. So a correct merkle proof also proves a vote’s position in the tree.
Actually, this additional index data can be eliminated if the hash function is commutative5, if there is no other use for it. A simple commutative hash function CH based on cryptographic hash H is CH(x,y) = H(sort(x,y))
Since the total vote count is committed in the merkle tally tree, the merkle inclusion proof also commits to the total vote count. This prevents extra-depth attacks (although this is already prevented if merkle trees are correctly formulated – each vote’s hash MUST be included as the tree leaf, rather than the vote’s raw bytes.
The commitment to the total vote count also prevents ballot stuffing if participants independently know the vote count.
However, its is possible to create a dishonest merkle tally tree (or partial tree), and derive a correct-looking merkle inclusion proof that claims false tallies (by replacing some voters’ votes) in branches that the proof does not provide. But this merkle inclusion proof CANNOT have the same merkle root as the honest tree, and repeated queries for various merkle inclusion proofs on the dishonest tree will probabilistically reveal the dishonesty, since the creator will not be able to provide the full merkle path to non-existent votes.
As a vote progresses, a “merkle mountain range” style approach can be used to show that a voter’s vote is part of an ongoing tally. In the merkle mountain range, multiple merkle power-of-two trees are constructed, and whenever 2 trees contain the same # of elements, those two trees are joined together by making them the children of a new tree of 1 greater depth.
If vote hashes are sorted prior to inclusion in a merkle tally tree, the “merkle mountain range” approach cannot be used. However, there are several interesting properties.
Note that either the “data carrier” or “destination” voting architectures are possible using tokens.
A token shuffle is CoinJoin2, CashShuffle3, or an equivalent algorithm applied to the tokens that confer permission to participate in a vote, rather than currency.
A coin join is a very simple concept so is described briefly here:
A transaction may consist of many inputs and many outputs. Let us propose that many individuals contribute a single input of one token and receive a single output of one token (to a different address from their input) into a single transaction. If the order of the inputs does not correspond to the order of the outputs, and the input and output quantities are all the same, it is impossible to determine which input corresponds to which output. This process can be repeated with different partners to increase the pool of possible entities that may control a particular output.
The traditional coin shuffle suffers from a problem with varying input quantities: Its possible to connect inputs to outputs if participants include different quantities because each participants’ quantity must be preserved. However, this is not a problem for voting tokens since each participant has exactly 1.
Also note that actual construction of the multi-participant transaction may leak identity information or suffer denial-of-service attacks, so implementations are significantly more complex than this conceptual description.
Apply a cryptographic hash to the complete voting information and choice data to create an identifier that probabilistically uniquely identifies the vote session. Participants and network software can compare identifiers to ensure they have the correct data. The larger network can reject unknown identifiers which might notify participants of misbehaving or malicious software. Identifiers should be included in each participant’s vote so that the vote itself affirms its vote on a particular topic with the particular topic data. This prevents both malicious topic data and replay attacks.
The voting registration authority creates a number of fake votes and commits to those votes via a cryptographic hash. The fakes are issued during the election process. When the polls close, the fake votes are revealed and the cryptographic hash proves all the fakes. Revealing partial fakes results in a non-matching hash so it is not possible to affect the election outcome via a partial reveal.
Vote encryption would prevent miner blockchain attacks from denying votes for one candidate from entering the blockchain since the attacker cannot determine the contents of a vote until after the polls close and presumably the blockchain has advanced beyond the point which re-organization is affordable.
TBD
[1]: The Sybil Attack: https://www.microsoft.com/en-us/research/publication/the-sybil-attack/
[2]: CoinJoin: https://bitcointalk.org/?topic=279249
[3]: CoinShuffle: https://www.darrentapp.com/pdfs/coinshuffle.pdf, https://github.com/cashshuffle/spec
[5]: Commutative Merkle Trees: https://medium.com/@g.andrew.stone/tree-signature-variations-using-commutative-hash-trees-4a8a47d4f8ce
[6]: TBD: Dagur & Jorgen voting project & paper
[7]: In conversation with Dagur Valberg Johannsson