July 28, 2022

Making SQL Queries from a Smart Contract

Space and Time continues to push the boundaries of querying relational databases through a smart contract.

The blockchain's ability to provide trust, security, privacy, visibility, traceability and decentralization gave way to a bevy of benefits which have transformed and transmuted an array of industries including finance, real estate, and gaming. These advancements have led to the term Web3, describing a new era of internet. But at its core, the goal of the blockchain is to just store information. This means the blockchain is simply a new type of database. While this new database format does have many advantages, it is limited in the amount of data it can store, and the amount of data it has stored. In contrast there is a deep world of traditional databases with the ability to store massive amounts of data. These more traditional databases hold the most important insights garnished from all the largest corporations since the 1960s. As such, it is time to bring these old databases and the valuable information they store into the modern age by connecting them to the blockchain. This article will discuss the reasons and ways to make an SQL query from a smart contract. 

What is a Smart Contract?

To begin connecting traditional databases to the Web3 ecosystem we begin with a smart contract. A smart contract is a program stored directly on the blockchain that is run when certain conditions are met. Smart contracts are used for token creation, NFT creation, trading automation, lending, borrowing, staking and all other processes occurring on the blockchain.

What is an SQL query?

SQL stands for Structured Query Language and is the language used on relational databases. A relational database is a collection of information that is organized by pre-defined relationships for fast access. An SQL query is a piece of code used to access these relational databases. Over the last 2 years the Bitcoin blockchain ledger has increased the amount of information stored by about 50G per year. SQL Server Standard Edition however has an upper limit of 524 Petabytes (5.24e+8 Gigabytes).  Understanding the vast amount of data is stored in a relational format further elucidates the need for a connection to the blockchain. 

Use Cases

Now that we understand what we are connecting, let's discuss some specific reasons to connect. 

One reason to call a relational database from a smart contract is to preserve privacy of information. Storing information on the blockchain means it is viewable by the public. This aids in the benefits of visibility and trust, but sometimes we want to check information on the blockchain against private records. For example we may have a password entered by a user into a smart contract. We want the smart contract to check if the password is correct, but it can’t check against passwords stored on the blockchain because then the stored passwords would not be private. So instead the contract could write a query that checks if the entered password is inside the relational database and performs the function only if it is. 

Another common use case is to quickly ascertain specific information from large storages of data for use on the blockchain. For example, there may be a smart contract that uses on-chain data and sentiment data to decide if it should make a trade. The contract needs to grab aggregated sentiment information from social media channels. Storing all the information necessary to aggregate the sentiment on the smart contract would not be feasible. So the data must be stored in a relational database, aggregated, and called with an SQL query. The smart contract only wants to make sentiment based trades at certain times based on on-chain data. So the contract needs to reach out with a query when the on-chain conditions are correct, rather than a backend script reaching out to the smart contract with the sentiment data. 

Options

There are several ways to get information between a smart contract and relational database. For example, one could write a backend node.js script, that uses an api to access a database, return the information and use the ethers.js library to call a smart contract function passing in the data as a variable. This way is indirect however, and requires additional backend infrastructure. 

There are also libraries used to write javascript within a smart contract. Through this method one could theoretically write the api and SQL call within the smart contract using javascript. However, these libraries were found to be outdated, non-functional and/or cost inefficient. 

How to Make and SQL Query From a Smart Contract

The current best option for making an SQL query from a smart contract is with the aid of Chainlink. The smart contract passes in a string of the query as a variable into a function that makes a request to a Chainlink node. The Chainlink node can then make a call to a custom external adapter that accesses the database and sends the results back to the smart contract. In this way a query is written directly in a smart contract and the result is returned from a relational database. 

Solidity, the language used to write smart contracts is statically typed meaning the type of the data must be explicitly defined. To combat this one can convert all the data into strings before sending it back to the smart contract. 

It is important to note that the contract can only receive a limited size result, due to the inherent storage constraints of the blockchain. 

What About Consensus 

While it is possible to pass in off-chain data as a parameter of a function, in order for a smart contract itself to retrieve off-chain data it needs to make the request to an oracle. An oracle is essentially an on-chain API that can be used to query information into the smart contract. 

The blockchain needs every node in the network to get the same result when replaying every transaction. If a smart contract were to call an API for price data, the price data would change each day and the results would be different when the transaction were replayed causing consensus to fail. Oracles solve this problem by storing the data on the blockchain, so that any node replaying the transaction will use the same data to pass consensus. Chainlink is a decentralized oracle network allowing us to access the off-chain relational data.

The next step towards full integration between SQL queries and smart contracts is to ensure SQL queries are tamperproof.

Summary

The bounds of the blockchain are ever expanding as it is continually incorporated into a multitude of technologies and industries. With the groundwork laid by decentralized oracle networks, Space and Time continues to push the boundaries of querying relational databases through a smart contract. Many of the innovations made possible by the direct integration between the new blockchain database and traditional databases will be discovered through inquisitive readers such as yourself.

Happy querying!