July 28, 2022

Introducing: Proof of SQL™

Whether you’re an on-chain developer or a business analyst and SQL rockstar, too much data is off-chain and off-site in a data warehouse. Cryptographers have promised that you can outsource your database queries, securely, to an untrusted server with verifiable computation. For the first time ever, this really is feasible in the Web3 environment.

Raw Data vs. Useful Data

Your company, your favorite blockchain, and your app are generating terabytes of data every day storing order histories, transactions, and user-generated content. The truth is, you don’t care about most of it!

Think about a Tableau dashboard versus a .csv file (or .xlsx if you prefer). In an intuitive, human sense, the dashboard carries a lot more usable information. It’s simply impossible to read 100,000 rows of comma-separated values and come away with a profound understanding of what that data means. This is why SQL exists, and why there’s an abundance of business analytics jobs. An aggregation is not only easier to understand; as far as a computer is concerned, it actually carries less information than the .csv file. If you aggregate the data into a simple visual, say a pie chart, then the computer doesn’t have to give you information about each individual row. All of that can stay at the data warehouse.

So, as an end user, the issue is that there’s a lot of data (more than you can store locally), you want some info extracted from it (much less information than the raw data), and you’re somewhat limited in bandwidth (the amount of data you can retrieve from the warehouse). So just send the pie chart!
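To make the size gap concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The orders table, its columns, and its values are all invented for illustration; nothing here reflects a real warehouse schema. It compares the bytes needed to ship every raw row against the bytes needed to ship a four-row aggregate.

```python
import csv
import io
import sqlite3

# Hypothetical order table; the schema and values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount INTEGER)")
regions = ["north", "south", "east", "west"]
rows = [(i, regions[i % 4], (i * 37) % 500) for i in range(100_000)]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


def csv_size(records):
    """Size in bytes of the records when serialized as CSV."""
    buf = io.StringIO()
    csv.writer(buf).writerows(records)
    return len(buf.getvalue().encode("utf-8"))


# The raw data: every row leaves the warehouse.
raw = conn.execute("SELECT * FROM orders").fetchall()

# The "pie chart": one total per region.
summary = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

raw_bytes = csv_size(raw)
summary_bytes = csv_size(summary)
print(f"raw rows: {len(raw)}, {raw_bytes} bytes")
print(f"summary rows: {len(summary)}, {summary_bytes} bytes")
```

The aggregate is tiny, but notice that the client has no way to check it without pulling the raw rows anyway. That gap is exactly the trust problem at stake.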

This is fine if your company owns the data warehouse, your servers and your organization haven’t been compromised, and there’s no reason for anyone to falsify the result. In the Web3 setting, we have a problem: how do we know that the metaphorical pie chart is correctly generated using the correct data?

Verifiable Computation & A Non-Explanation of Zero Knowledge Proofs

It turns out that this problem has been “solved” since the early ’90s. In theory. Or maybe not until 2008. Hard to say.

If you’ve ever read a crypto paper, you’ll understand that the right way to solve this problem is to dream up an absolutely crazy technical solution.
- Matthew Green on ZKPs

The solution that cryptographers have come up with is “delegated computation” or “verifiable computation” and it’s based on the theory of interactive proofs. We lean especially hard on techniques from Zero Knowledge Proofs (ZKPs), which are protocols for an untrusted prover to convince a verifier that a statement is true while leaking no additional information. Now, zero knowledge is great for other cryptographic protocols, such as digital signatures. Proof-of-SQL is not a ZKP. Quite simply, it doesn’t need to be. We’re not trying to hide the data in the data warehouse, we just want to avoid having to store it all locally.

In fact, research in interactive proofs has spawned numerous protocols so that we can pick and choose which properties we want and optimize for the most performant system for tamperproof SQL queries available.

Arithmetization? More Like (sc)Ary Math Evasion!

I would love to geek out and explain how we get from a SQL query to a cryptographic promise but I’ve been told not to go into that level of detail. The short answer is we turn it into math. I don’t mean “at Space and Time, we turn it into math”. I mean nobody really knows how to make these cryptographic guarantees about programs without some intermediate representation that looks a lot more like math than anything else. This step is called arithmetization, and it’s crucial to the efficiency of Proof-of-SQL.

Arithmetization can really make or break a zero knowledge proof. I mentioned digital signatures earlier, and for several decades, they were the only type of cryptographic proof that was actually used in the real world. Signatures are great and practical because their arithmetization works naturally with the same kind of math as other public key cryptography. I’m not saying the math is easy, but whether it’s based on modular arithmetic or elliptic curves, at least there’s no cost in translation. With anything else (say, turning a smart contract into a provable form) there’s a lot of overhead coming just from the arithmetization step.

Here’s one extreme example of inefficient arithmetization: cryptographers used to take a bit (the absolute smallest amount of data you can have, enough to store just ‘0’ or ‘1’, ‘yes’ or ‘no’, or whether a light switch is on or off) and represent it as a field element using 256 bits. Think about it: if you had something that was stored in a box 256 times larger than itself, you would want to ditch the box because it’s taking up too much space in your house.
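The arithmetic behind that box is easy to check. The sketch below is an illustration under stated assumptions, not a description of any particular proof system: the modulus 2**255 - 19 is chosen only because it is a well-known prime of roughly the right size, and the 32-byte storage width is assumed. It also shows the usual constraint for forcing a field element to be a bit, x * (x - 1) = 0.

```python
# Illustrative modulus: the prime 2**255 - 19. This choice is an assumption,
# made only because it is a well-known prime of about the right size.
P = 2**255 - 19
FIELD_ELEMENT_BITS = 256  # assumed storage width for one field element


def encode_bit(b: int) -> bytes:
    """Store a single bit as a full-width field element (32 bytes)."""
    if b not in (0, 1):
        raise ValueError("expected a bit")
    return (b % P).to_bytes(FIELD_ELEMENT_BITS // 8, "little")


def is_bit(x: int) -> bool:
    """Booleanity constraint: x * (x - 1) == 0 (mod P) only for 0 and 1."""
    return (x * (x - 1)) % P == 0


bits = [1, 0, 1, 1, 0, 0, 1, 0]          # one byte of actual payload
encoded = [encode_bit(b) for b in bits]  # eight field elements

payload_bits = len(bits)
stored_bits = sum(len(e) for e in encoded) * 8
overhead = stored_bits // payload_bits

print(f"payload: {payload_bits} bits, stored: {stored_bits} bits")
print(f"overhead factor: {overhead}x")
```

One byte of payload turns into 256 bytes of field elements, and every one of those elements also needs its own constraint just to prove it is a bit.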

The moral of the story is that the choice of intermediate representation is the most important design choice in reducing overhead. The closer your mathematical model is to the computation itself, the lower the “translation cost” incurred during the arithmetization step. So by targeting SQL specifically we can handle real-world use cases of sizes unimaginable with current general-purpose SNARGs. Database queries (or rather, the relational algebra which underpins them) have a nice characterization that makes them very amenable to cryptographic proofs, but “universal” SNARKs just aren’t equipped to take advantage of it.
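As a rough illustration of why a query can sit close to the math, consider a filtered SUM. This is a toy sketch only: this post does not describe how Proof-of-SQL arithmetizes queries, and the column names, the data, and the modulus below are all assumptions. The idea is that a WHERE clause can be written as a 0/1 selector column, which turns the whole query into a single sum of products over a field. No proof is generated here; the code only shows the intermediate representation.

```python
# Toy field modulus (the Mersenne prime 2**61 - 1); an assumption chosen
# for illustration only.
P = 2**61 - 1

# Hypothetical column data for:
#   SELECT SUM(amount) FROM t WHERE flagged = 1
amount = [120, 45, 300, 80, 15]
flagged = [1, 0, 1, 1, 0]  # the WHERE clause expressed as a 0/1 selector column


def filtered_sum(values, selector, p=P):
    """SUM with a WHERE clause written as sum(selector[i] * values[i]) mod p."""
    return sum(s * v for s, v in zip(selector, values)) % p


def selector_is_boolean(selector, p=P):
    """Each selector entry must satisfy s * (s - 1) == 0 (mod p)."""
    return all((s * (s - 1)) % p == 0 for s in selector)


# The arithmetized form of the query.
claimed = filtered_sum(amount, flagged)

# The same query evaluated the ordinary way, for comparison.
direct = sum(a for a, f in zip(amount, flagged) if f == 1)

print(claimed, direct, selector_is_boolean(flagged))
```

Here the mathematical statement has one term per row, which is about as small as the query itself. Compiling the same query through a general-purpose circuit would typically add many more constraints per row.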

No Secret Sauce (That I’d Tell You About)

As much fun as it would be to publish our trade secrets on our blog… we don’t have that many. Space and Time is not in the business of making grand promises that we can’t back up. While tamper-proof database queries are a fantastical promise, we came up with a natural solution in Proof-of-SQL because we know:

  • You don’t want tons of data, you want meaningful analytics, verifiably outsourced
  • Cryptographic innovation has brought that goal within striking distance
  • Domain-specific arithmetization brings us over the finish line

With a mix of standard methods and relentless optimization, the math and cryptography experts at Space and Time are almost ready to release Proof-of-SQL, making the distinction between on-chain data and off-chain data practically irrelevant.