7. Catena: Efficient Non-equivocation via Bitcoin

The following content is
provided under a Creative Commons license. Your support will help
MIT OpenCourseWare continue to offer high quality
educational resources for free. To make a donation or to
view additional materials from hundreds of MIT courses,
visit MIT OpenCourseWare at ocw.mit.edu. ALIN TOMESCU: My name is Alin. I work in Stata, in
the Stata Center. I'm a PhD student
there in my fifth year. And today we're going
to be talking about one of our research
project called Catena. And Catena is a really
nice way of using bitcoin to build append-only logs. Bitcoin itself is
an append-only log. And a lot of people have been
using it to put data in it. And I'll describe a
really efficient way of doing that and
its applications. And if there is time, we'll talk
about a tax and maybe colored coins and some other stuff. We'll talk about the
what, the how and the why. And that's the overview
of the presentation. So let's talk about this
problem called the equivocation problem.

So what is this? In general, non-equivocation
means saying the same thing to everybody. So for example, if you
have a malicious service and you have Alice
and Bob, the service should say the same
thing to Alice and Bob. So it would make a
bunch of statements. Let's say s1, s2, s3 over
time and Alice and Bob would see all of
these statements. So this is very similar to
what bitcoin provides, right? In bitcoin you see block
one, block two, block three. And everybody agrees on these
blocks in sequence, right? Does that make sense? So this is non-equivocation
and in some sense this is what bitcoin
already offers.

And in general with
non-equivocation, what you might get is
some of these statements might actually be
false or incorrect. Non-equivocation
doesn't guarantee you that this statement is
a correct statement. But it just guarantees
you that everybody sees the same statements. And then they can detect
incorrect statements. In bitcoin you get
a little bit more. You actually know that
if this is a block, it's a valid block, assuming
there are enough blocks on top of it, right? So equivocation means saying
the same thing to everybody. So for example, this malicious
service at time four, he might show Bob a different
statement than Alice. So Bob sees s4 and
Alice sees s4 prime.

This is what happens
in bitcoin sometimes. And that's how you can
double spend in bitcoin by putting the transaction here
sending money to the merchant, and then putting
another transaction here sending money back to you. Right, are you guys
familiar with this? Yeah, OK. All right, so why
does this matter? Let me give you a silly example. Suppose we have Jimmy and we
have Jimmy's mom and Jimmy's dad, right? And Jimmy wants to
go outside and play. But he knows that mom and dad
usually don't let him play. So what he does is he
tells dad, hey dad, mom said I can go outside. Right, and then he
tells mom, hey mom. Dad said I can go outside. And let's say mom and dad
are in different rooms and they're watching soap
operas and they're not talking to one another. So they can actually
confirm that.

You know, mom can confirm
that dad really said that. And dad can really confirm
that mom said that. But they both trust Jimmy. So you see how equivocation
can be really problematic because now mom and
dad will say sure, go outside as long as the
other person said that, right? But let me give you a
more practical example. So let's look at something
called a public-key directory. A public-key directory allows
you to map user's public keys– a user name to a public key. Right, so here I have
the public key for Alice and here I have the
public key for Bob.

And they look up each other's
keys in this directory. And then they can set
up a secure channel. How many of you guys use
Whatsapp, for example? So the Whatsapp server has
a public-key directory. And when I want to
send you a message, I look up your phone
number in that directory and I get your
public key, right? If that directory equivocates,
the following thing can happen. What the directory can do is
it can create a new directory at time two where he puts
a fake public key for Bob and he shows this
to Alice, right? And at time two also, he creates
another directory for Bob where he puts a fake public
key for Alice, right? So now the problem
here is that when Alice checks in this directory,
she looks at her own public key to make sure she's
not impersonated.

And Alice looks in this
version and sees, OK. That is my public key. I'm good. She looks in this version, OK. This is my public key. I'm good. So now I'm ready to
use this directory. And I'll look up Bob and
I'll get his public key. But Alice will actually
get the wrong public key. Does everybody see that? And similarly Bob
will do the same. So Bob will look in his fork
of the directory, right? And he looks up his key
here and his key here. And he thinks he's OK. He's not impersonated. But in fact, Alice has
impersonated there. OK? And now as a result, they
will obtain fake keys for each other. And this man in the
middle, attacker who knows the
corresponding secret keys for these public
keys can basically read all of their
communications. Any questions about this? This is just one example
of how equivocation can be really disastrous. So in a public-key directory,
if you can equivocate, you can show fake
public keys for people and impersonate them. So in other words,
it's really important that Alice and Bob both
see the same directory.

Because if Bob saw
this directory, the same one Alice saw, then Bob
would notice that this is not the public key he had. He would notice his
first public key and then that there's a second one there. And then he would know
he's impersonated. And he could let's say,
talk to the New York Times, and say, look. This directory is
impersonating me. So in conclusion, equivocation
can be pretty bad. So this idea that you
say different things to different people can
be pretty disastrous.

And what Catena does
is it prevents that. So in general, if you have
this malicious service that is backed by Catena, if it
wants to say different things to different people.
it cannot do that. It has to show the same
thing to everybody. And the way we achieve that is
by building on top of bitcoin. And that's what we're going
to be talking about today. So any questions about
sort of the general setting of the problem and
our goals here? So let's move on then. So why does this matter? So this matters for a
bunch of other reasons, not just public-key directories
and secure messaging. It matters because when you want
to do secure software update, equivocation is a problem. And I'll talk about that later. So for example, at
some point bitcoin was concerned about
malicious bitcoin binaries being published on
the web and people like you and me
downloading those binaries and getting our coins stolen.

Right, and it turns out that
that's an equivocation problem. Somebody is equivocating, right? It's equivocating about
the bitcoin binary. It's showing us a
fake version and maybe other people the real version. Secure messaging,
like I said before, has applications here and. Not just secure messaging
but also the web. Like when you go
on Facebook.com, you're looking up
Facebook's public key. And if somebody lies to
you about that public key, you could be going to
a malicious service and you could be giving
them your Facebook password.

Does that make sense? And also it has
applications in the sense that if you have a way of
building a append-only log, you really have a way of
building a blockchain right for whatever purpose you want. And we'll talk
about that as well. So the 10,000 feet
view of the system is we built this bitcoin
based append-only log. And the way to think about
is that bitcoin is already an append-only log. It's just that it's kind
of inefficient to look in that log. If you want to pick certain
things from the Bitcoin blockchain, like you put your
certain bits and pieces of data there, you have to
kind of download the whole thing to make sure
you're not missing anything.

And I'll tell you why soon. So instead of
putting stuff naively in the Bitcoin
blockchain, we put it in a more principled way. And in a sense we get
a log and another log. We get our log and
the bitcoin log. And this generalizes to
other cryptocurrencies. Like you could do
this in light coin, for example, or
even in ethereum. Though I don't
think you guys yet talked about how the
ethereum blockchain works. And the cool thing about this
Catena log that we're building is that the Catena log is as
hard to fork as the bitcoin blockchain. If you want to fork our log,
you have to fork bitcoin.

However, unlike the
Bitcoin blockchain, Catena is super
efficient to verify. So in particular, remember,
I described the log in terms of the
statements in it. So if you have 10
statements, each statement will be 600 bytes to audit. So you don't have to
download the whole Bitcoin blockchain to make sure you're
not missing a statement. And you also have to download
80 bytes per bitcoin block. And we have a Java
implementation of this. And if you guys are curious,
you can go to my GitHub page, and take a look at the code. All right, so before we
start, I know you guys already know a lot about how
the Bitcoin blockchain, but it's important that I
reintroduce some terminology just so we're on the same page. So this is the
bitcoin blockchain. We have a bunch of
blocks connected by hash chain pointers. And we have Merkle
trees of transactions. And you know, these arrows
indicate hash pointers. It means that block n stores a
hash of block n minus 1 in it, right? So in that sense.

Block n has a hash pointer
to block n minus 1. All right, and like I said,
each tree has a Merkle tree. Each block has a Merkle tree. And everybody
agrees on this chain of blocks via proof of work
consensus, which all of you already know. And importantly, in the Merkle
trees, we have transactions. And transactions
can do two things. Right, a transaction can
mint coins, create new coins. So here I have transaction a. It created four coins by
storing them in an output. Are you familiar with outputs? So you guys already discussed
transaction outputs and inputs. So that's what we're going
over here again real quick.

So an output specifies
the number of coins and the public key of
the owner of those coins, as you already know. And the second purpose
of transactions is that they can transfer
coins, right, and pay fees in the process. So here if you have
a transaction b, this transaction b
might say, hey, here's a signature from the
owner of these coins here. Here's a signature
and transaction b. And here's the new
owner with public key b of three of those four coins.

And one of those coins
I'll just give it to the miners as
a transaction fee. Does this make sense? How many of you are with me? All right, any questions
about how transaction inputs and outputs work? So if you remember
in the input here, you have a hash pointer
to the output here. All right. So and the high
level idea here is that the output
is just the number of coins in a public key. And the input is a hash
pointer to an output plus a digital signature from
that output's public key, OK? And yeah, so what
happens here is that transaction b is spending
transaction a's first output. Right, that's the terminology
that we're going to use. And I think you
guys have already used terminology like this. And in addition, what we're
going to talk about today is the fact that in these
bitcoin transactions you can actually embed data.

And I think you touched briefly
on this concept of op return transaction outputs. Right, so this is an output that
sends coins to a public key. But here I can have an output
that sends coins to nobody. It just specifies some data. And in fact, I'll use that
data to specify the statements that I was talking
about earlier. So that the high
level point here is that you can embed data
in bitcoin transactions using these operations. And there's a bunch of other
ways to do it, actually. Like initially
what people did is they put the data
as the public key. They just set the
public key to the data.

And in that sense, they
kind of wasted bitcoins. Right, they said hey,
send these three bitcoins to this public key, which
is just some random data. But nobody would know the
corresponding secret key of that public key. So therefore those
coins would be burned. Did you did you cover
this in class already? Maybe, maybe a little bit. So but that's kind
of inefficient because if you
remember, the miners have to build this UTXO set.

And they have to
keep this output that has this bad public
key in their memory forever because nobody's
going to be able to spend it. So we don't want to do that. We want to build a nice system,
a system that treats bitcoin nicely, the bitcoin miners. So that's why we use
these return outputs. All right, so the
high level here is that Alice gives
Bob three bitcoins. And the miners collect
a bitcoin as a fee. And of course, you can
keep doing this, right? Like Bob can give Carol these
two bitcoins later by creating another transaction with an
input referring to that output and with an output specifying
Carol's public key.

All right, and the high level
idea of bitcoin is that you don't– you cannot
double spend coins. And what that means is that
a transaction output can only be referred to by a
single transaction input. So this thing right here in
bitcoin where I have two inputs spending an output
cannot happen, right? How many of you are
familiar with this already? OK, good. So actually this is
the essential trick that Catena leverages. And we'll talk about it soon. And yeah, the
moral of the story, the reason I told you all
of this is because you know, basically if you have proof of
work consensus in the bitcoin sense, you cannot
do double spends. So this thing right
here, that I said before, just cannot occur in the Bitcoin
blockchain unless you break the assumptions, right, unless
you have more mining power than you should. In that case, you either
have to tx2 or tx2 prime, but not both, right? And what Catena
really realizes is that if I put statements
in these transactions now, what that means is that I can
only have a second statement.

I cannot have two
second statements. I cannot equivocate about the
second statements if I just restrict my way of issuing
statements in this way. I put the first statement
in the transaction and the second
statement I put it in a transaction that
spends the first one. So does everybody agree that
if I do things in this way and I want to equivocate
about the second statement, I would have to double spend? So that's the key insight
behind our system. Any questions about this so far? You know it's really hard to
talk if you guys talk back, it's way easier.

Yeah? AUDIENCE: A question. And maybe, I think I'm
getting this right. If you're setting up
the first transaction, then you're adding data on? You're burning bitcoins
every time you're doing it, so at some point,
you're going to run out. ALIN TOMESCU: That's
an excellent question. So let's– we'll go over that. But the idea is
that in this output, I won't burn any bitcoins. I'll actually specify
my own public key here. And I'll just send the
bitcoins back to myself. And in the process I'll pay a
fee to issue this transaction. Does that make sense? Yeah? And we'll talk about it later. OK, so real quickly,
what did previous work do regarding this? So how many of you are
familiar with blockstack? 1, 2, only two people? OK how many of you are
familiar with Keybase? OK, so blockstack and Keybase
actually post statements in the Bitcoin blockchain.

And they're both
public directories. They map user names
to public keys. And for example, Keybase,
what Keybase does is they take this
Merkle route hash, and they put it in
the transaction. And then six hours later,
they take the new route hash, and they put it in another
transaction, and so on. Every six hours, they
post the transaction. But unfortunately, they don't
actually do what Catena does. So they don't have their new
transaction spend the old one. And as a result,
if you're trying to make sure you see all of
the statements for Keybase, you don't have a lot
of recourse other than just downloading each
block and looking in the block for all of the
relevant transactions. Another thing you could
do is you could sort of trust the bitcoin miners– the bitcoin full nodes to
filter the blocks for you. So you could contact a
bunch of bitcoin full nodes and say, look. I'm only interested
in transactions that have a certain IP
return prefix in the data. And they could do that for you. But unfortunately,
bitcoin full nodes could also lie to
you very easily.

And there is no cost
for them to lie to you. Everybody can be a
bitcoin full node. So then it becomes a very
bandwidth intensive process because you have to
ask a lot of full nodes to deal with the fact that
someone might lie to you. But you either need to
download full blocks to find let's say, a missing
statement like this one that a bitcoin full node
might hide from you, or you can trust the majority
of bitcoin full nodes to not hide statements, which
is not very good, right? So I don't want to
trust these full nodes because all of you guys
could run a full node right now in the bitcoin network. It doesn't cost you anything. And if I talk to your malicious
full node, I could be screwed.

So our work just says, look. Instead of issuing
transactions in sort of an uncorrelated fashion,
just do the following thing. Every transaction you issue
should spend the previous one. So as a result, if someone
wants to equivocate about the third
statement, they have to double spend
like I said before. Right? Yeah? AUDIENCE: So what's
the connection back to Keybase and blockstack again? ALIN TOMESCU: Yeah, so
Keybase and blockstack are public directories.

And what they do is they want
to prevent themselves from– is there a whiteboard
I can draw here? Yeah so, let's say, you know,
if you remember the picture from the beginning, I can
have a public directory that evolves over time, right? So this is v1 of the
directory, and it might have the right public
keys for Alice and Bob. But at v2, the
directory might do this. It might do v2. It might have Alice, Bob, but
then put a fake key for Alice.

And at V2 prime, it might
do Alice, Bob as before, and put a fake key for Bob. So in other words it keeps
the directory append-only. But it just adds fake public
keys for the right people in the right version. So here this one
is shown to Bob. And here this one
is shown to Alice. So now Alice will use
this fake key for Bob. So she'll encrypt a message
with b prime for Bob. And now the attacker can
easily decrypt this message because he has the secret key. So what the attacker can
do then is re-encrypt it with the right
public key for Bob and now he can read
Alice's messages.

And the whole idea is
that the attacker– you know, this b prime is the
public key, is pk b prime, let's say. But the attacker
has this sk b prime. He knows sk b prime because
the attacker put this in there. The attacker being really,
blockstack or Keybase. And of course, they're
not attackers in the sense that they want to be good guys. But they're going to be
compromised eventually. So they want to
prevent themselves from doing things like these. Is that sort of
answer your question? Yeah. All right, so yeah, a
really simple summary. If I had two slides to summarize
our work, this would be it. Right, they would be these two. Look, don't do things this way. Do them this way. All right, so let's see,
let's look a little bit at the design. So remember we have
these authorities that could equivocate about
statements they issued like blockstack and Keybase.

So what we propose is
look, these authorities can run a lock server,
a Catena lock server. And they start with some
funds locked in some output. And what they can
do first is they can issue this
genesis transaction to start a new lock. So for example, Keybase would
issue this genesis transaction starting the log of their
public directory Merkle routes. And this genesis
transaction can be thought of as the
public key of the log. Once you have this
genesis transaction, you know it's transaction ID. You can verify any
future statements, and you can implicitly prevent
equivocation about statements. And what the lock
server is going to do is it's going to
take these coins and send them back to the
server to answer your question.

So if there was a public
key, if these coins are owned by some
public key, they're just sent back to
same public key here and paying some
fees in the process. Right, so we're not
burning coins in the sense that we're just
paying fees that are miners, which we have to do. OK, so now what you
can do is if you want to issue the
first statements, you create a transaction. You send the coins from this
output to this other output. You pay some fees
in the process. You put your statement
in an op return output. And as a result, if this lock
server wants to equivocate, it has to again,
double spend here, which it cannot do unless
it has enough mining power.

So Keybase and
blockstack, if they were to use a system
like this, they could prevent themselves
from equivocating. And this can keep going, right? So you issue
another transaction. Spend the previous output. Put the new statement. Yes? AUDIENCE: This doesn't
seem like a new problem. How have authorities prevented
equivocation in the past? ALIN TOMESCU: This doesn't
seem like a new problem. How did authorities do it? The problem is not new. The problem is eternal. So you are correct there. How did they do it in the past? They just used a Byzantine
consensus algorithm. So in some sense this is what
we're doing here as well. We're just piggybacking
on top of Bitcoin's Byzantine consensus algorithm. AUDIENCE: So you're rolling down
to a newer Byzantine consensus algorithm basically. ALIN TOMESCU: Sure, I'm not
sure what rolling down means. But yeah, we're piggybacking
on top of bitcoin. The idea is that look,
a byzantine consensus is actually quite
complex to get right. We already have a publicly
verifiable business consensus algorithm. It's bitcoin. Why can't we use it to verify,
let's say, a log of statements super efficiently? So up until our work, people
didn't seem to do this.

So Keybase didn't do this. Blockstack didn't do this. They kind of forced it to
download the entire bitcoin block chain to verify let's
say, three statements. So in our case, you only have
to download a few kilobytes of data to verify
these three statements, assuming you have the
bitcoin block headers, right, which we think is
a step forward. And it's sort of
like the right way to use these systems
should be you know, the efficient way,
not the inefficient way. Because bandwidth
is expensive, right? Computation is cheap. Bandwidth is expensive. All right, so anyway the idea is
that if the lock server becomes malicious, if Keybase or
blockstack gets hacked, they cannot equivocate
about the third statement.

They can only issue one
unique third statement. And the advantages
are, you know, it's hard to fork this lock. It's hard to equivocate
about third statement. But it's efficient to verify. And I'll walk you through
how clients verify soon. The disadvantages
are that if I want to know that this is the
second statement in the log, I have to wait for
six more blocks to be built on top of this
statement's block, right? Just like in bitcoin, you
have to wait for six blocks to make sure a transaction
is confirmed, right? Why do you do that? The reason you do that
is because there could be another transaction
here, double spending because sometimes there are
accidental forks in bitcoin and things like that.

What are some
other disadvantages that you guys can point out? Yeah? AUDIENCE: It's going
to be expensive. ALIN TOMESCU: It's going
to be expensive to issue these trends. So Alin– Alin and I
share the same name. So Alin is pointing
out that you know, every time I issue
these statements, I have to pay a fee, right? And if you remember, the
fees were quite ridiculous in bitcoin. So that's a problem, right? So let's see, is
that the next thing? The next thing was
you have to issue– you can only issue a statement
every 10 minutes, right? So if you want to issue
statements really fast. you can't do that. All right, like Alin said you
have to pay bitcoin transaction fees. And the other problem is
that you don't get freshness in the sense that it's kind
of easy for this lock server to hide from you the
latest statement. You know, unless you have
a lot of these log servers and you ask many of them, hey,
what's the latest statement? And they show you back
the latest statement.

If there is just one log
server and it's compromised, it could always
pretend no, no, no. This is the latest statement. And if you don't trust it,
the best recourse you have is to download the full block
and look for the statement yourself. So you don't get freshness. Those are some disadvantages. Now let's look at how
clients audit this log? So I was claiming that
it's very efficient to get these statements
and make sure that no equivocation happened. So let's say, you
have a Catena client and you're running on your
phone with this client. And your goal is to get
that list of statements.

And there's the Catena log
server over there in the back. And there's the
bitcoin peer to peer network which at the moment
has about 11,000 nodes. And remember, I said
the first thing you need is the genesis transaction. Does everybody
sort of understand that if you get the wrong
Genesis transaction, you're completely
screwed, right? Because it's very
easy to equivocate if you have the wrong
Genesis transaction, right? I mean, you know there's the
right GTX here where you have, let's say, a s1.

And then you have s2 in their
own transactions, right? But if there's another GTX prime
here and you're using that one, you're going to get s1 prime,
s2 prime, different statements. So if Alice uses GTX
but Bob uses GTX prime, Alice and Bob are
back to square one. So in some sense, you might ask,
OK, so then what's the point? What have you solved here? I still need to get
this GTX, right? So what we claim is that this is
a step forward because you only have to do this once. Once you've got
this GTX, you can be sure you're never
equivocated to, right? Whereas in the
past, you would have to for each
individual statement, you'd have to do
additional checks to make sure you're
not being equivocated to, like you would have to
ask in a full node, let's say.

Right, so as long as you have
the right GTX, you're good. And how do you
get the right GTX? Well, usually you ship it with
the software on your phone. And there's some
problems there as well, like there is no problem
solved in computer science in some sense. But you know, we're trying
to make progress here. OK, so let's say you have
the right GTX because it got shipped with your software. Now the next thing you want to
do is get the block headers. So you have header
i, but there are some new headers being posted. Let's say the bitcoin peer to
peer network sends them to you. You have these headers.

You verify the proof
of work, right? So this only costs you 80
bytes per header, right? Does everyone see that
this is very cheap? So far, so far I have the GTX,
which is let's say 235 bytes. And now I'm downloading
some headers. And now I'm ready to ask
the log server what's the first statement
in the log, right? And what the log
server will do is he's going to reply
with the transaction with the statement, which is
600 bytes and the Merkle proof. So all of this is
600 bytes, actually. Right, and now what the
Catena client will do is he'll check the
Merkle proof against one of the headers so to see in
which headers does it fit. And then he'll also
check that the input here has a valid signature from the
public key in the output here.

All right, I want at least
one question about this. Yeah? AUDIENCE: Sorry I came in
late, but is the Catena client over there similar to the SPV? ALIN TOMESCU: Yeah,
so that exactly. It's an SPV client. Yeah, so the idea is
that we want SPV clients. We don't want these
mobile phone clients to download 150
gigabytes of data. We want them to download let's
say 40 megabytes worth of block headers, which they can discard
very quickly as they verify. And then we want
them to download 600 bytes per
statement, but still be sure that they saw all of
the statements in sequence, and that there was
no equivocation. Yeah? AUDIENCE: So in our
previous classes, we discussed, if you can have
a full node and an SPV node, do these sort of vulnerabilities
exist with the [INAUDIBLE] client? ALIN TOMESCU: So
which vulnerabilities that you did talk about? AUDIENCE: I forget. ALIN TOMESCU: Did
you talk about– AUDIENCE: There was just a
box that was like less secure. And then there was another box
that was something else bad. There was one about [INAUDIBLE]. The clients would
lie to you about– If you say, here's
some transactions or here are some
unspent outputs, then they could just tell
you something different.

ALIN TOMESCU: Yes. Yeah? Sorry. AUDIENCE: Well,
they can't tell you that transactions
exist or don't exist. They can just not tell you– ALIN TOMESCU: Yeah, they
can hide transactions from you, which gets back
to the freshness issue that we could discussed. They could also–
this block header could be an header
for an invalid block. But remember, that before
you accept this tx1, you wait for enough
proof of work, you wait for more block
headers on top of this guy to sort of get some
assurance that no, this was a valid block because a
bunch of other miners built on top of it.

Right? So as long as you're willing
to trust that the miners do the right thing, which
they have an incentive to do the right thing,
you should be good. But like you said,
what's your name? AUDIENCE: Anne. ALIN TOMESCU: Anne? Like Anne said, there are
actually bigger problems with SPV clients. And if there is time,
we can talk about it. But it's actually easier to
trick SPV clients to fork them. And there's something called a
generalized vector 76 attack. Have any of your guys
heard about this? So it's like a pre-mining
attack but it's a bit easier to pull off. Actually, it's a lot easier
to pull off on an SPV node than on a full node.

And if there's time at the
end, we can talk about it. If there isn't, you
can read our paper, which is online on my website. And you can read about
these pre-mining attacks that work easier for SPV nodes. But anyway, this can
keep going, right? You get block headers,
80 bytes each. You ask the log
server, hey what's the next statement in the log? You get a Merkle proof
in a transaction.

And then you verify
the Merkle proof. You put this transaction
in one of these blocks and you verify that it
spends the previous one. Right, and as a
result, you implicitly by doing this verification,
by checking that hey, this is a transaction
and a valid block. This block has enough
stuff built on top of it and this transaction
spends this guy here, you implicitly
prevent equivocation. Right, then you don't have
to download anything else. Right, you only have to
download these Merkle proofs and transactions in
these block headers. Whereas in previous work, you
could be missing these s1's, these s2's, they
could be hidden away in some other branch
of the Merkle tree. And you'd have to do
peer to peer bloom filtering on the full nodes. And those full nodes
could lie to you.

Yeah, so the bandwidth
is actually very small. So suppose we have
500k block headers– I think bitcoin has a
bit more right now– which are 80 bytes
each and we have 10,000 statements in this
log, which are 600 bytes each. Then we only need now
with 46 megabytes, right? What's the what's the
other way of doing it? You have to download
hundreds of gigabytes. All right, so let's talk about
scalability a little bit. So suppose the system
gets deployed widely. Let's say Whatsapp starts to
use the system to witness– to publish their public
directory in bitcoin, right? And everybody, a lot of
you here have Whatsapp. And this is you guys. Let's say there are 200,000
people using Whatsapp. I think there's
more like a billion. So what are they going to do? Remember that part of
the verification protocol is asking for
these block headers from the peer to
peer network, right? And in fact, if
you're SPV clients, you usually open up around
eight connections to the peer to peer network.

And if you're a full node in the
bitcoin peer to peer network, you usually have around
117 incoming connections. That that's how
much you support. You support 117 incoming
connections as a full node. So that means in total,
you support about a million incoming connections. So you know, this guy
supports a million. But we need about 1.6
million connections from these 200,000
clients, right? So it's a bit of
a problem if you deploy Catena and it
becomes wildly popular. Being a bit optimistic
here, but you know, let's say that happened, right? So how can we fix this? How can we avoid this problem
because in this case, what we would basically
be doing is we would be accidentally
DDOSing bitcoin. And we don't want to do that. Does everybody see that there's
a problem here, first of all? OK, so the idea is very simple. We just introduced something
called a header relay network. And what that means is
look, you don't really have to ask for these block
headers from the Bitcoin peer to peer network.

You could just outsource
these block headers anywhere because they're
publicly verifiable. Right, the block headers
have proof of work on them. So you can use volunteer
nodes that sort of push block headers to whoever
asks for them. You could use blockchain
explorers like blockchain.info, right? You could use Facebook. You could just post block
headers on Facebook. Right, like in a Facebook feed. You could use Twitter for that. You could use GitHub gists. You know, so you could– there's a lot of ways
to implement this. The simplest way is
have servers and have them send these headers
to whoever asks for them. So it's easy to
scale in that sense because if you now
ask these header relay network for the
block headers, you know, it's much easier to scale
this than to scale the bitcoin peer to peer network, which
has to do a bit more than just block headers.

They have to verify blocks. They have to verify signatures. Yeah? AUDIENCE: Did you
consider having the clients being peer to peer? ALIN TOMESCU: ,
Yes so another way. And I think we'll talk
about that in the paper– is you can implement the header
relay network as a peer to peer network on top of the clients. Yeah, so that's
another way to do it. There's some subtleties there
that you have to get right. But you can do it, I think. Yeah? AUDIENCE: Would you you expect
that if a company like Whatsapp decided adopt Catena, they
would run their own servers to make sure that there
was the capacity for it? ALIN TOMESCU: For the
header relay network? AUDIENCE: Yes. ALIN TOMESCU: I mean, I
would be just be speculating, wishful thinking. They could. There is some problem with this
header relay network as well. And we talk about
it in the paper because this had a really
network could withhold block headers from you. So you do have to distribute it. Like usually you don't
want to just trust Whatsapp who's also doing
the statements, who's also pushing the statements
in the blockchain.

You don't want to trust
them to also give you the block headers. You actually want to fetch
them from a different source that Whatsapp
doesn't collude with. Yeah, Anne, you had a question? AUDIENCE: Are there other
header relay networks that are deployed already? ALIN TOMESCU: Yeah, there
was actually one on ethereum. There's a smart
contract in ethereum, I think, that if you submit
bitcoin block headers to it, you get something back. And then you can just
query bitcoin block headers from the ethereum blockchain. Is anyone familiar with this? So I guess I didn't
include it here. But another way to do it is
to just publish the header is in an ethereum
smart contract. Yeah. So there's crazy ways
you could do this too. Yeah? AUDIENCE: Will that
one go out of gas? I don't know, my understanding
of ethereum is pretty decent, but would that smart contract
eventually run out of gas and not publish anymore? ALIN TOMESCU: So to fetch from
it, you don't need to pay gas. But I suspect to push
in it, I actually don't know who
funds that contract.

So I guess you fund that
when you push maybe. Maybe not because then you
also want something back. Why would you push? I'm not sure. But we can look at it after. Yeah. It's a good question. So anyway, even if this header
relay network is compromised, if you implement it
in the right way, you've distributed on a
sufficient number of parties, you can still get all of the
properties that you need to, meaning freshness for
the block headers. That's really the only
property that you need to. The header relay
network should always reply with the
latest block headers. So let's look at costs since
Alin was mentioning the costs. So to open a statement, you have
to issue a transaction, right? And the size of our transactions
are around 235 bytes. So and the fee as of December
13 was $16.24 for transactions if you guys remember
those great bitcoin times. I think it went up to
$40 at some point too. So it was it was really hard
for me to talk to people about this research back then.

But guess what,
the fees are today? So today, this
morning I checked. And there were $0.78, right? When we wrote the paper,
they were like $0.12. So you know, here I
am standing in front of you pitching our work. In two minutes they could be
back to $100, but who knows? Yes, you had a question. AUDIENCE: Maybe I'm wrong,
but in a transaction, you can have some outputs. Can you have several
statements in there? ALIN TOMESCU: So
that's a good question. So can you batch statements? And then the answer is yes. You can definitely
batch statements. What we've said so far
is in a transaction– in a Catena transaction,
you have this output.

And you have this
op return output where you put the
statement, right? And you know, it spends
a previous transaction. But as a matter of fact, what
you can do and some of you may already notice this. There is no reason to put
just one statement in here. There is some reason–
so the only reason is that it only fits 80 bytes. So you could put easily, let's
say, two or three statements in there if you hashed them
with the right hash function. But a better way to do
it is why didn't you put here a Merkle root hash? And then you can have
as many statements as you want in the leafs
of that Merkle tree, right? In fact, here you could
have I don't know, billions of statements.

So keep in mind,
you will only be able to issue billions of
statements every 10 minutes. But you can definitely
have billions of statements in a
single transaction if you just batch them. So now, remember the blockchain
will only store the root hash. This Merkle tree will be stored
by the log server perhaps or by a different party. They don't have to
be the same party. Does that make sense? Does that answer your question? Yeah. Right, that was my next point. Statements can be batched
with Merkle trees. OK, so let's talk about
the why since so far we've been talking abstractly
about these statements. But what could these
statements actually be. So let's look at a
secure software update. So how do you do
secure software update? An example attack on a
software update scheme is that somebody compromises
the bitcoin.org domain. And they change the bitcoin
binary to a malicious binary. And they wait for people to
install that malicious binary. And then they steal their coins. They steal their data.

They could execute
arbitrary code. And an example of this was
sort of this binary safety warning on the bitcoin website. At some point, they
were very concerned that a state actor is going
to mess with the DNS servers and redirect clients
to a different server and make them download
a bad bitcoin binary. So does the attack make sense? Does everybody see
why this is possible? You do need to sort of accept
that the DNS service that we currently have on the internet
is fundamentally flawed.

It's not built for security. So the typical defense the
typical defense for this is that the Bitcoin developers–
they sign the bitcoin binaries with some secret key. And they protect
that secret key. And then there's a
public key associated with the secret key that's
posted on the bitcoin website, right? And maybe some of you'll
notice that sometimes it's also very easy to change the
public key on the website, you know, if you
can just redirect the victim to another website. And another problem is that not
everyone checks the signature. Even if let's say you have the
public key on your computer, you know what the
right public key is, only if you're
like an expert user and you know how to use GPG,
you will check that signature, right? And the other problem
that's probably I think a much, much
bigger problem is that for the bitcoin
devs themselves, it's very hard to detect if
someone stole their secret key.

Like if I'm a state actor
and I break your computer and I steal your secret key, I
will sign this bitcoin binary. And I'll give it to
let's say, one guy. I'll give it to you. You know, and I'll just
target you individually because I'm really– I know you have
a lot of bitcoin. Right, and then the bitcoin devs
will never find out about it unless you know you kind
of realize what happened. Then you take your
bitcoin binary and you go with it
to the bitcoin devs. And they check the signature
on it and they say, oh wow. This is a valid signature
and we never signed it. So somebody must have
stolen our secret key. So this is really a bad– kind of the core
of the problem is that it's hard for
whoever publishes software to detect that
their secret key has been stolen– to detect
malicious signatures on their binaries. Does that make sense? So the solution, of course,
is you know, publish the signatures of bitcoin
binaries in a Catena log So now if there's a malicious
binary being published by a state actor, people
won't accept that binary unless it's in the
Catena log, which means people in the bitcoin
devs will see the same binary.

Right, so let me let me show
you what I mean with a picture. So we have this Catena
log for Bitcoin binaries. And let's say, the
first transaction has a hash of the bitcoin
0.001 tar file, right, the bitcoin binaries. And this hash here
is implicitly signed by the signature in
this input because it signs the whole transaction. So now if I put this
hash in a Catena log, I get a signature
on it for free. And now, if let's say, a state
actor compromises the log server, gets the
secret key, he can publish this second malicious
binary in the log, right? But what that malicious
state actor will want to do, he will want to hide this
from the Bitcoin devs and show it to all of you guys.

All right, so he'll
want to equivocate. So as a result, he will want to
create a different transaction with the right
bitcoin binary there, show this to the bitcoin devs
while showing this to you guys. All right, so the bitcoin
devs would think they're good. This is the binary they wanted
to publish while you guys would be using this malicious binary
published by the state actor. Of course this cannot happen
because in Catena you cannot equivocate. Right? Does everybody see this? Right, any questions about this? There has to be a
question on this. No? So this mechanism is called
software transparency. It's this idea that rather
than just downloading software like a crazy person from the
internet and installing it, we should just be
publishing these binaries in a log that everybody can see,
including the software vendors that created those binaries.

So in this way, if somebody
compromises a software vendor, that vendor can
notice that in the log there's a new version
for their software that they didn't publish. So you know, this isn't to say
that it'll prevent attacks. You know, what a state
actor can do anyway is they can just do this. They can post this
H2 prime in here, show it to you guys
including the bitcoin devs and still screw everyone over.

But at least these attacks
then go undetected anymore. All right, so it's a step
forward in that sense. Yes, so the idea
is that you have to double spend to equivocate. And the other
example that really– the reason I wanted
to start this research had to do with public
key distribution. So let's say we have
Alice and we have Bob. And they both have
their public keys. And I'm using this
letter b to denote Bob and his public
key and the letter a to denote Alice
and her public key. And they have their
corresponding secret keys, right? And Alice and Bob, they want
to chat securely, right? So they want to set
up a secure channel. And there's this directory
which stores their public keys. So this guy stores
Alice, pk Alice.

This guy stores Bob, pk Bob. All right, and the directory
gets updated over time, maybe Karl, Ellen
and Dan registered. And if you have
non-equivocation, if the attacker wants to
impersonate Alice and Bob, he kind of has to put
their public keys, the fake public keys
in the same directory, which means that when
Alice and Bob monitor– they check their
own public keys, they both notice they've
been impersonated, right? So again, the idea is
that you can detect. Now how can this
attacker still trick Alice to send an
encrypted message to Bob with Bob's fake public key? Is there a way even if
you have non-equivocation? So what's the attack? Even I have
non-equivocation and I claim that the
attacker can still get Alice to send a fake,
an encrypted message to Bob that the attacker can decrypt. What should the attacker do? So pretend that we are
here without the ability to equivocate.

So the attacker
cannot equivocate. But I claimed that
the attacker can still trick Alice into
sending a message to Bob that the attacker can read. So now it's time to see if
you guys paid attention. Somebody? Alin? Oh, you? AUDIENCE: Does the attacker
have to have the secret key? ALIN TOMESCU: No, no. He does not. Yeah. He does not have to
have the secret key. The attacker just
creates fake public keys. Here's a hint. AUDIENCE: If you only
changed one person to choose, they don't know that it's a
fake key so they could send it to a fake key for a bit? ALIN TOMESCU: Yeah, so
whose person should they attack or change the key for? AUDIENCE: Like if they
change Bob's, Alice will still think
Bob's is correct so she'll send it to the
fake Bob until Bob checks it.

ALIN TOMESCU: Exactly. So that's exactly right. So what's your name? AUDIENCE: Lucas. ALIN TOMESCU: Lucas. So what Lucas is saying is
look, even without equivocation, I had this directory at T1. I had another one at T2. But at T3 and both of these had
keys for Alice and Bob, right? But at T3, Lucas is saying look,
just put the fake key for Bob here. And that's it. Don't put a fake key for
Alice there, just for Bob.

And now when Alice looks up
this public key for Bob here, she sends a query to
the directory hey, what's Bob's public key? She gets back b prime,
which is equal to Bob pk Bob prime, right? Alice can't tell if that's
really Bob's fake public key. That's the reason she's
using the directory in the first place. She wants sort of a trustworthy
place to get it from. Bob can tell if Bob looks. But by the time Bob looks,
it might be too late. Alice might have already
encrypted a message, right? So again, what's the point
of doing all of this? It's not like you're
preventing attacks, right? And the point of
doing all of this is that you get transparency. Bob can detect, whereas
right now Bob has no hope. In fact, so you said, a
lot of you use Whatsapp. So you know in
Whatsapp, if you really want to be sure, so I
have a conversation here with Alin Dragos.

So, Alin, do you what to
to bring your phone here? Do you have Whatsapp? So if you really want
to be sure that you're talking to the real Alin
and not some other guy, you have to go on
this encryption tab. Can you tape this? So my phone is black and white. It's going through
a depression phase. I apologize. So you have to go here and
there's a code here, right? And Alin, can you
do the same thing? You know what I'm talking about? I really hope I don't get some
weird text message right now with the camera on the phone.

OK, so now with Alin's
phone, is that the same code? Can somebody tell me? I can't see it. AUDIENCE: It's hard to see. ALIN TOMESCU: OK, so
we have 27836 and yeah. So it's the same code. Can you see it on the camera? So now because we have
the same code here. With this code
here, it really is a hash of my public key
and Alin's public key. And if we've got the same hash
of both of our public keys, then we know we're
talking to one another. But we won't really know that's
the case until we actually meet in person and do
this exchange, right? So what this system
does instead is it allows Alin to check
his own public key and it allows me to
check my own public key. This way if we check
our own public key, we'll always know when we're
impersonated even though I might send an encrypted
message to the wrong Alin, Alin will eventually find out. It's a bit confusing because
we're both called Alin.

So does that sort
of makes sense? Yeah? AUDIENCE: So what do
you do when you realize that your key is the wrong key? ALIN TOMESCU: Good question. So that's really the
crucial question. What the hell can you do? Right, so this directory
impersonated you. In fact, you can't even– here's the problem. If you're Bob here and you
see this fake public key. And you go to the New
York Times and you say hey, New York Times,
this Whatsapp directory started impersonating me. And the New York Times
can go to the directory and say, hey directory, why
did you impersonate Bob? And the directory can say,
no, I did not impersonate Bob.

Bob really just ask
for a new public key. And this was the public
key that Bob gave me. And it's just a he said, they
said, kind of a thing, right? So it's really a sort
of an open research area to figure out what's the right
way to whistle blow here. So for example, one project
that we're trying to work on is there a way to track
the directory somehow so that when he does
stuff like this, you get a publicly verifiable
cryptographic proof that he really misbehaved, right? No, there's no
cryptographic proof. The fact that that
public key is there could have come from
the malicious directory or could have come
from an honest Bob who just changed his public key. So yeah. So again, a step forward
but we're not there. You know, it's just that there's
much more work to do here. And I think you
also had a question. AUDIENCE: Can't Alice just
ask Bob if this is you? ALIN TOMESCU: So that's a
chicken and an egg, right? So we have Alice.

We have Bob. And we have the attacker. Alice asks Bob, is
this your public key? You know, let's say, b prime. Let me make this more readable. So attacker, right? So Alice asks, hey Bob. Is this b prime your public key? The attacker changes it. Hey Bob, is b your public key? The attacker– Bob says yes. Attacker forwards
yes to Alice, right? Remember, Bob and Alice
don't have a secure channel. That's the problem we're trying
to solve with this directory, right? So the attacker can always
man in the middle people. Well, people like Alice and Bob.

If the attacker can man
in the middle everything, then there's really no hope. And we're living in a very sad,
sad world if that's the case. Yeah, it actually might
be the case but we'll see. Anyway, so yeah, so I claim here
that this is a step forward. But there's still
much work to do. All right, so we
get transparency. Bob can detect. He'll know. He won't be able to
convince anybody. But if a lot of Bobs
get compromised, you're still like in a
place where everybody knows that something's off and
will all stop using Whatsapp, for example. Right? By the way, Whatsapp
is a great tool. You should continue using it. I'm just saying it's
difficult to use right like if somebody really
wants to target it, they can play a lot of
tricks to still trick you. So yeah, so all right, we
already talked about this. And yeah, again if the director
can equivocate, then you know, all bets are off because now
Bob will look in this directory, he'll think he's
not impersonated.

Alice will look
in this directory. She'll think she's not
impersonated, right? So the reason we
started this research is because I really want to– my thesis is on building
these directories that are efficiently auditable and
have a hard time impersonating people. So that's why we decided
to look at how could you do this with Bitcoin. Yeah, so of course there's
one project called KeyChat that we're working on with
some high school students.

And we're using the key
based public key directory. And we're witnessing
it in a Catena log so that stuff like
that doesn't happen. OK, so now let's talk
about the blockchains. In general, people
nowadays like to say I need a blockchain for x, right? I need a blockchain for
supply chain management. I mean a blockchain
for cats, for whatever. I've heard a lot
of crazy stories. IOT, self-driving
cars, blah, blah, blah. And I think the right way
to think about blockchain is to never ever say
that word unless you use quotes, first of all. And second of all, to
understand what Byzantine state machine replication is. Right, and if you understand
what Byzantine state machine replication is or a
Byzantine consensus, you understand blockchain. And you understand all the hype. And then you can make some
progress in solving problems. In the sense that
what is blockchain? So what we're
doing here is we're doing a Byzantine
consensus algorithm.

We're agreeing on a
log of operations. Right, by the way, that's
what Catena does too by piggybacking on bitcoin. It agrees on a lot on
a log of operations. Right, but the other thing that
SMR or Byzantine consensus does is that it also
allows you to agree on the execution of
the ops in that log. So in Catena, you don't
agree on the execution, you just agree on
the statements. But there is no execution of
those statements in the sense that you can't build
another bitcoin on top of bitcoin in Catena
because you can't prevent double spends of
transactions that are Catena statements, right? Like the Catena statements,
you have to look in each one and tell if it's correct. So to detect a double spend in
a Catena backed cryptocurrency, you would have to download
all of the transactions and because you cannot execute
it like the bitcoin miners do and build this UTXO set.

I'm not sure this is
making a lot of sense. But let's put it another way. In bitcoin, you have
block one and then you have block two, right? And there's a hash pointer and
there's a bunch of transactions here, right? And remember that what prevents
me from double spending something here– I can have two
transactions in this block that double spend
the same one here. What prevents me from doing
that is exactly this execution stage, right? Because in the execution stage,
when I try to I execute this first transaction and I
mark this output as spent, when I execute the
second transaction, I cannot spend that
output anymore, right? In Catena, you can't do anything
like that with the statements. You just agree on
the statements.

In Catena, you would
put this transaction in the log, this one and then
that one and someone would have to detect that the second
one is a bad one by actually downloading it. That's kind of what
I'm trying to say here. So in general,
the way should you should be thinking
about blockchain is through the lens of
Byzantine consensus. And that'll get you
ahead of the curve in this overly
hyped space, right, because it's really just this. You're agreeing on
a log of operations. And then you're agreeing on the
execution of those operations according to some rules. The rules in bitcoin
are transaction cannot– there cannot be two inputs
spending the same output more or less. There's other things too. What that gives you
is it allows you to agree on a final state
which in bitcoin are the valid transactions.

That's the final state right. In ethereum, for
example, the final state are the valid transactions,
and the account balances of everything,
and the smart contract state for everything. And I guess you'll
learn about that later. So you can build
arbitrarily complex things with Byzantine consensus
or with blockchains. And the high level bit, if you
want to look at it another way, is that you have a program p,
right, which could be anything, could be a cryptocurrency. And then, instead of
running this program p on a single server s,
what do you do is you distribute it on a bunch of
servers, s1, s2, s3, s4, right? And now as a result, to
mess with this program p, it's not enough to
compromise one server, you have to compromise
a bunch of them.

Right? OK so, and some
of you might also be familiar with this term
permissioned blockchain. So when you distribute this
program o amongst n servers where n is equal let's
say, three f plus 1 and f is equal to 1 in
this particular case. In a permission blockchain,
this n is fixed, right? Once you've set n to
4, it has to stay 4. These servers have
to know one another. They need to know each
other's public keys. And only one of the servers,
f is equal to 1, can fail. If more than one server
fails, then all bets are off. Your program can start
doing arbitrary things.

In particular, if your
program is bitcoin, it can start double spending. I'm moving a little bit fast,
so I'll take some questions up until this point before I go on. Yes? AUDIENCE: In a
permissioned blockchain, do they use proof of work? ALIN TOMESCU: No,
you don't have to. And that's kind of
what the hype is about. This stuff, there is like– the first interesting
paper on this was 1976 or something like that.

So this is 40-30,
40-year-old research. We've known how to do
permissioned consensus– we used to call it Byzantine
consensus for 30 or 40 years, right? So there's nothing new there. It's just that it's very useful
nowadays to say blockchain then to say consensus
because then you get 10 more million from your
venture capitalist folks. AUDIENCE: But if you don't
have to do proof of work, do they ever do proof of work? ALIN TOMESCU: It would
be such a bad idea technically to do prefer
working a permissioned consensus algorithm. It just– completely
unnecessary, plus probably insecure too.

Yeah, so now the reason
you do proof of work is because in a
permissionless blockchain, this n is not fixed. n could go, let's say, n was 4. It could go to 8. Then it could go to 3. Then it could go to 12. In other words, people
are joining and leaving as they please. And the reason you
need proof of work in bitcoin, one way
you can look at it is that you're really turning
a permissioned consensus algorithm into a
permissionless one. And a consensus
algorithm is just voting. These n folks are just voting. And you need 2f plus 1 votes
to sort of move on, right? And if this n changes over time,
like if the n becomes bigger, it's very easy to take over
a majority of the voters. If I can just add fake voters
to a permissioned consensus algorithm, I can just take
over the consensus algorithm. In other words, I can take
over more than f nodes.

Right, so the trick
there is you have to prevent that from happening. And the only way to
prevent that is to say, look if you're going to join
and then make my n bigger, you better do some work. So that it's not
easy for you to join because if you're a bad guy
and you want to join, you know, you can do that
very easily unless I require you to do some work. So that's kind of
the trick in turning a permissioned
consensus algorithm into a permissionless one.

And in fact, the way these
permissioned animals work is completely
different than bitcoin. They are much more complex. Bitcoin is incredibly simple
as a consensus algorithm. If you ever read a
consensus algorithm paper, you know, it's a bit insane. Also to implement,
it's a bit insane. Bitcoin is very simple
to implement compared to these other things,
I mean, bitcoin is, of course, a complex
beast as well. But you should look at let's
say, practical Byzantine fault tolerant paper, PBFT, and
try and implement that. So anyway, why am I
telling you all of this? The reason I'm telling
you all of this is because if you want to
do a permissioned blockchain for whatever reason,
one way to do that is to use your favorite
Byzantine consensus algorithm.

So that would be– let's say pbft. This was 1999 from MIT. So you could use that. You could have a lot
of fun implementing it. Another thing you could do is
you could take your program p and just give it to an
ethereum smart contract. And you know, that the
ethereum smart contract, if the ethereum security
assumption holds, it will do the right thing. It will execute your
program p correctly, right? But the other thing that
you could do actually, is you could use Catena
to agree on these logs, on the log of operations
for your program. And then you could use another
2f plus 1 servers or replicas to do the execution
stuff so that you can agree on the final state. And this gives you a very simple
Byzantine consensus algorithm. So remember Catena doesn't
give you execution. It allows you to agree
on the log of ops. To get the execution, you'd
basically take a majority vote. If you see you have 2f
plus 1 replica servers and if you see f plus 1
votes on a final state, you know that's the right
state because only f of them are malicious.

So in fact, if you use Catena
with 2f plus 1 replicas, I claim that you can get sort of
a permissioned blockchain that sort of leverages the bitcoin
blockchain to do the agreement on the log of ops. So in that sense, it's sort
of a mix of a permissioned and permissionless. We haven't studied
this like we don't know what properties it would have. So that's future work. And if you don't need the
execution, for example, if all you're doing
is you're agreeing on a public key directory–
like here, there's no execution. This directory just is
supposed to stay append-only, we have some
research that allows you to prove that
every transition is an append-only directory.

And if you only
need execution, you can just use Catena
directly as I already told you guys for the
software transparency application for the public
key directory application. And if you want to do a
permissionless blockchain then, of course, you would
have to roll your own. But you have to proceed
with caution there, right? It's not an easy thing to do. OK, so let's conclude now.

What we did here is that we
enabled these applications to efficiently leverage
bitcoin's consensus, right? So clients can download
transactions selectively rather than full blockchain
and prevent equivocation. Right, and you only need
to get 46 megabytes instead of gigabytes from the
Bitcoin blockchain. So why does this matter? These are just I think
the three killer apps– secure software update,
public key directories– by the way, the
public directories also are applicable to https. So when you go on
Facebook, you have to get Facebook's public key.

The certificate authorities
that sign the public keys are often compromised. So they're often fake
search for Google, for big companies like that. And you might use it. But if you have a
public key directory, Facebook and Google
can immediately notice those fake sorts. It's a step forward. And for more, of course,
you can read our paper. It appeared in a IEEE
security and privacy 2017. And I'll post the
slide on GitHub too. So there are links there. Yeah so, again this is the high
level overview of everything that's previous
work and our work. The difference is very small. And now we can also
talk about other stuff. In fact, I have
more stuff to talk.

But before we do, I'd like to
have a discussion with you guys if you have questions. So? AUDIENCE: So could you implement
Catena in the actual bitcoin node? Would that be something
that they would want? It seems like it would
be a good feature to add? Or is it strictly separate? ALIN TOMESCU: Yeah, I
don't think you need to. That's the whole point, right? The whole point
of the research is how do we use bitcoin without
getting the miners to accept a new version of bitcoin,
without changing bitcoin in any way? So no, I don't think–
there's nothing to do really, we're just– we're taking bitcoin
as it is and we're piggybacking on top of it.

We couldn't change it. I mean, there's a lot of things
you can do in some sense. But then you get a
very different system. Very different, we can talk
about it more if you want. Yeah? AUDIENCE: You were talking about
how you can use this system to verify software
binaries and how you want this to run in
SPV modes on phones So how do you install
software on the phones say, through apple and the app store. Is there a way to sign the
binary that you actually get from the app store? ALIN TOMESCU: I think your
question is really about how do appstore binaries– how do you Verify
App store binaries? It's a chicken and an
egg in some sense right, is that what you're saying? Yeah, you're right. Eventually, I mean
in the best case, wishful thinking would
be to say that look, the app store does this
already for all of the binaries that they publish, allowing the
developers to make sure nobody is posting malicious binaries
on the app store for them.

Yes? AUDIENCE: Who do you
envision running it? ALIN TOMESCU: So I'd really
like to see Keybase run Catena, it seems like a
missed opportunity that they don't do this. I'm sure they have better stuff
to do but it's just really easily allow Keybase– let's say Keybase has
a mobile phone app, it would allow that mobile phone
app to verify the directory and get much, much,
much more security. You know, no equivocation as
long as nobody forks bitcoin. Since Keybase is already
publishing these digests but they cannot be audited
efficiently on a mobile phone. I mean they can but
not securely you know, because full nodes can lie. So there's a big problem with
everything I said so far. And nobody caught it. So one problem is
that what do you do when you run out of funds? Remember I said the log server
starts with two bitcoins. Let's say it issues
thousands of transactions, starts paying those $40 fees
and it runs out of funds. What do you do? Then Yeah? AUDIENCE: You can maybe
reload the new transactions with this transaction
[INAUDIBLE].. ALIN TOMESCU: What's your name? AUDIENCE: Raul.

ALIN TOMESCU: Raul. So Raul is saying
you can reload. And that's exactly right. We just have to change the
transaction format slightly. We talk about this
in the paper as well. But just to demonstrate
real quickly. Suppose, let's take a ridiculous
example which hopefully will never happen in bitcoin. But suppose that the
Bitcoin fee is 1 bitcoin. So now I have one bitcoin here. And I have s1 here. And now I have
zero bitcoins here. All right, so zero
bitcoins in this output. And maybe s2 here. So that would be terrible. Right, now I can't go on. Does everybody see
this as a problem? Right, so what Raul is saying
is look at another input here and make it take coins
from some other transaction whatever, 20 bitcoins.

And now you get
20 bitcoins here. Right, so you can easily
refund transactions. There is a bit more
subtlety there in the sense that you don't
want to join logs– let's say if you have two
logs, GTX and GTX prime for different applications. Right, and they start
issuing statements– s1, s2. You don't want to
be able to join– I'm sorry, s1, s1 prime– these two locks to a single
log for certain reasons right. But this doesn't actually
allow you to join them in the sense that– let's say you actually
do this and join them. Right, so let's say
this transaction here came from GTX prime, right? And there was an s1 prime here.

And you did this, right? So the problem is this is
no longer a valid Catena transaction for this log
because a valid Casino transaction– the
first input spends the previous
transactions output. But in this chain, it's
the second input that spends the previous output. So I cannot join logs. And this matters for
a bunch of reasons that we don't have to go into. Yeah, so we talked about the
about batching statements. I want to show you guys
some previous work, so how did some previous
work do this since there we seem to have a bit of time. OK, so there are
some previous work called liar, liar,
coins on fire. Have you guys seen this? So the idea here is that it
is a really nice piece of work and lots of people in
the bitcoin community already know about this.

And I think it was an idea
before the paper or maybe not, I'm not sure. But Tadge has a similar idea. So for example, suppose
we have this authority. Imagine this is a
Catena block server and it publishes a transaction
which locks to bitcoin and locks those bitcoins
to that public key. And this authority
sometimes will want to say two different things. It will want to say
s and s prime, right? And it will sign the statements
with their secret key. But the secret key
that the authority uses to sign statements is
also a bitcoin secret key. It's the same secret
key that the authority used to lock to bitcoins,
so $20,000 or something. I'm not sure if bitcoin
plummeted since yesterday, can never be sure. Does the setting make sense? So I have an authority. It issues statements
just like before. And it numbers them
with i, let's say. And we want to prevent this
authority from equivocating. We're not actually
going to prevent it. We're just going to
disincentivize it in the sense that if this authority
equivocates like this for the same statement
i, what I claim is that anybody can then
steal that authority's bitcoin because
equivocating like this reveals the secret key.

And the reason it
reveals the secret key is because the signature
shares the same i here. So how many of you
guys actually, you did cover Schnorr
signatures, right? So did Tadge talk about how
to do this with Schnorr? AUDIENCE: This is
the [INAUDIBLE].. ALIN TOMESCU: But
did Tadge cover it? AUDIENCE: Yes. ALIN TOMESCU: OK, great. So yeah so let's go
over that less briefly. So again, the idea is that
if someone observes these two signatures on conflicting
statements for the same i, there is this box where you
can put the two signatures and get back the secret key. And once you have
the secret key, we can spend this
transaction and get that authority's bitcoins. And then there's
a lot of details to get it right because
you might notice that if the authority does
this, before they do this, they might already be spending
this transaction themselves so as to prevent
you from taking it.

And details on how to prevent
the authority from doing that are in those paper. Yeah, and then you
know, whoever discovered this can spend those bitcoins. And the idea is that this
disincentivizes equivocation by locking these funds
under the secret key of the bad authority. But it does not prevent it. Right, so in Catena we
actually prevent equivocation. We say, if you
want to equivocate, you better fork bitcoin. Here they say, if you
want to equivocate, you're going to lose $20,000. Right? But you have to
understand like this could be a good authority
that locked $20,000 here. But the attackers are going
to be– they're not going to care about those $20,000. They're just going to
steal the secret key, equivocate and
then the authority is going to be
left without money. If the authority
is the attacker, then this makes sense. But if the attacker
is not the authority, then this makes less sense
because the authority sort of risking their bitcoins
on the assumption that no attacker can compromise
them, which you know, if you could do that
in computer science, I wouldn't be sitting
here talking to you guys.

OK, so now how do you do this? So do you remember Schnorr
signatures real quickly? An easy way to do this is
using Schnorr signatures. And I think– I could be wrong, but
I think this new SegWit update to bitcoin allows
Schnorr signatures, right? So with SegWit and
with Schnorr signature, as you can definitely do this. And the idea is that
Schnorr signature if you recall, is just k
plus h of m, g to the k s, where s is the secret key. Right, so this is a Schnorr
signature on m, right? How many recall this? Right, and of course,
it's not just this. I mean, the signature is
really this and this h of m, g to the k.

So it's these two things, right. But now I want to show you that
if I sign two different things, I can actually get s. But if I sign them
in a certain way– so remember what I said before
is I would like this authority to sign i m and i m prime. Right, and if the
authority does this, I claim that I can get
the secret key out. But I have to have the same i. So we relax this in a
sense that we're not going to use i here, we're just
going to use this g to the k. So if the authority uses
the same g to the k to sign, then we can extract
the secret key. And the way we can do that
is I'll just show an example, sig1 would be k plus
h of m1 g to the k.

And it'll be sig 1 and I
guess the associated hash E1 would be h m1, g to the k. Does everybody see this? And then sig2 would be k
plus h of m 2 g to the k. So again, oh, you guys are
clearly not paying attention. I forgot the secret key. E2 is hash of m2 g to the k. OK, so I'm using the same
g to the k and the same k. So now how can I
extract the secret key? Does anybody see a
solution to this? AUDIENCE: It's now just system
of two equations and two variables where k and s
are the unknown variables. ALIN TOMESCU: So I think
what Rahul always saying is that s is just sig1 minus
sig2 divided by e1 minus e2. Is that right? Yeah. See this, because if I
subtract this from this, I just get h m1– I'm not going to
have space here.

I get– let's say it here– h of m1, g to the k minus h
of m2 g into the k times s. All right? And now I take this
here in the denominator and I simplify and I get s. All right. So it turns out that
that's kind of the trick that this word leverage
is more or less. The only sort of caveat here
is that remember I said, there is a position
i for the statement. But now I'm saying, there's
no longer a position.

We have to use this g to the k. And how do we map positions
that are g to the k is another trick that you
have to do but you can do it. And if you want more details,
you can read the paper and I'm sure Tadge will tell you
even more details about this. I think the lightning
network also leverages this trick in some cases. All right now, let's talk a
little bit about some attacks if there is time.

The most interesting attack
would be the generalized vector 76 attack. So this is a screenshot
from a paper. Can everybody see this? So the generalized vector 76
attack is very interesting. So you have you have
an attacker, right? And remember, the goal of
the attacker is to replace– let's say this
TX1 with that TX2. It's going to show this TX1
to a merchant saying hey, I paid you money. And then it's going to show
this TX2 to the merchant sending the money back to himself. So the whole trick is to fork
the merchant on this side and then to switch him
back to that, right.

It's a sort of a pre-mining
attack and this attack is much easier to pull on SPV notes
because what the attacker does is– let's see– he works on the
secret chain with TX1 in it. And at some point, the main
chain might take him over, might win. So the attacker kind of
gives up and keeps trying. But at some point the
attacker gets ahead, right in the second chain. He gets ahead of the main chain. This stuff is not posted yet. And let's see this merchant only
for some reason because they're silly, they only need one
confirmation to accept the TX1. And for other silly
reasons, this merchant is an SPV merchant, right? Let's draw a figure here. So this merchant
is an SPV merchant that only needs one confirmation
to accept the transaction. So we have the attacker.

And we have let's
say, the SPV merchant. Right, and now the
attacker's sent you know, this was the main block
there, let's say, this was bi. And the attacker forked it. He put TX1 here. And then had a
confirmation on it, right? So he sends this
chain to the merchant. And the other miners
haven't found anything yet. Right, so the attacker
is a bit ahead. So he's pre-mining.

So far, so good. Are you all with me? And remember that he really
just is showing block headers. So these are not full blocks. He's just showing block headers. They're much smaller
blocks, 80 bytes. And the blocks are missing. All right, so when
he shows TX1, he is just showing like a Merkle
path to TX1 to the merchant. You guys with me? OK, so now he did
this to the merchant. And now what the
attacker is going to do, he's going to relax,
sit back, and post the TX2 the mines–
give TX2 to the miners.

Let's say the miners are here. Hes going to post
TX2 to the miners. And he'll stop
mining the attacker. He's not going to mine anymore. But he got the merchant
to accept the payment. And the merchant
shipped the goods. Maybe this is an
online purchase. And so now the miners, they will
find the next block bi plus 1. They'll put TX2 here. They'll find the next block
and then the next block. And they'll take over. Their chain will
be the main chain. And eventually
this merchant will hear about this main chain. And he'll just see the
headers, of course, because he's an SPV merchant. So now the SPV merchant
just got double spent. Does everybody see this? That I just double
spent the merchant? And sort of the
fundamental problem here is that the merchant
only received block headers.

So he cannot take these block
headers and broadcast them to the miners because miners
don't mine on top of block headers. Miners mine on top
of full blocks. So this merchant cannot has
no protection against this if he's doing SPV– if he's accepting SPV payments. If these were full blocks, what
the merchant could have done was the following. The merchant would
have would have received the full block here
with the transaction in it, right. So again the attacker got ahead. But now the merchant
because he saw a block, he's going to ship this
block with TX1 to the miners because he's a full node. Full nodes do that. When they hear about a
block, they broadcast it. So now the miners
know about this block. They're going to continue
mining on top of it. And in fact, the merchant
will send both blocks, right to the miners. So now miners will
continue building here. So now the attacker has a bit of
a tougher problem on his hands.

There is still a way to
trick even full nodes, even if this guy
is a full node, you can sort of leverage
some timing assumptions to still trick the full node. And the details are in
that paper over there. But that's one way– somebody was asking, I think
you were asking Anne, right– about SPV nodes and how
they're less secure. And this is one fundamental way
in which they're less secure. If you accept payments
with SPV nodes you're really playing with fire. You know, you do need a
sufficiently powerful attacker who can get ahead. But yeah. OK, so with that I think that
kind of concludes the lecture. Any final questions? All right, cool. Thank you guys..

As found on YouTube

You May Also Like