Space and Time Infinite Precision
What it is and why we built it
One of the challenges of building a Web3-native data warehouse is the extreme numeric size and precision required: we have seen gas fees, gas limits, token rewards, and similar values require numeric precision reaching 72 digits or above. Traditional data warehouses like Oracle or Snowflake offer Decimal data types that top out at 38 digits, falling back to Float data types for larger numbers. The problem with Float data types is their lack of precision, which in Web3 translates to a lack of fidelity when aggregating value transfer between parties. Alternatively, some Web3 data providers store these very large numbers in character fields like VarChar, at the expense of basic mathematical functions. Users are left to choose between full-precision numbers and the ability to perform basic calculations.
To enable high-fidelity Web3 use-cases, Space and Time built the Big Decimal data type to support Web3 extreme size and precision while still preserving mathematical functionality.
How it works
Space and Time receives blockchain data from multiple endpoints, then sends it to Kafka topics from which the data warehouse consumes data in real time and writes it to tables. Separately, we’ve established an ETL process to bulk load historical blockchain data into the data warehouse. This gives users access to all data from block 1, as well as the data being added to the chain in real time. We strive to keep the latency between blockchain events and that data being reflected in Space and Time as low as possible.
Rather than storing blockchain data with a JSON-like schema, we ingest blockchain tables into a particular structured schema that makes it easier for developers to work with the data. It’s much easier for a dev to write apps and queries on top of our relational schema, rather than a simple blob of JSON.
Because Space and Time is an HTAP data warehouse, it’s composed of two different query engines: the OLTP (transactional, row-based) engine and the OLAP (analytic, column-based) engine. From the user perspective, there’s no bifurcation: you simply run a query and get the result. Space and Time handles the routing to the appropriate engine in the Validator layer. When a user runs a transactional query, it’s served from a row-format database with extremely low latency. If the query router determines that the user ran an analytic query, it’s served from the OLAP engine, which acts on a columnar storage format to enable faster query response times and less data transfer between storage and the query engine.
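As a sketch of what this looks like in practice (table and column names are illustrative, not actual Space and Time schemas), both statements below are submitted the same way; the router would likely send the first, a single-row lookup, to the OLTP engine and the second, a scan-heavy aggregation, to the OLAP engine:

```sql
-- Likely routed to the row-based OLTP engine: a point lookup on one key.
SELECT BALANCE
FROM   ETHEREUM.WALLET_BALANCES
WHERE  WALLET_ADDRESS = '0xabc123';

-- Likely routed to the columnar OLAP engine: a full-table aggregation.
SELECT COUNT(*), SUM(BALANCE)
FROM   ETHEREUM.WALLET_BALANCES;
```

In both cases the user writes ordinary SQL; the engine choice is made by the Validator layer, not by the user.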
As we built our blockchain indexing service, we realized that blockchain data (specifically financial data, like wallet balances) can carry precision up to 78 digits. While this level of precision is supported natively by our OLTP engine, the OLAP engine doesn’t support precision beyond 38 or scale beyond 18. And even when the data is stored in a string format, it isn’t possible to perform calculations (even basic mathematical operations like addition and subtraction) on it without losing precision. Because of this, we created the Big Decimal data type, which supports infinite precision, allowing users to work with blockchain data in a meaningful way without losing precision.
How to use it
To create a table with decimal columns for fractional values:
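A minimal sketch, assuming a hypothetical wallet_balances table (every name other than the balance column is illustrative; check the Space and Time docs for the exact DDL syntax):

```sql
-- Hypothetical table; the balance column holds fractional Big Decimal values.
CREATE TABLE ETHEREUM.WALLET_BALANCES (
    WALLET_ADDRESS VARCHAR NOT NULL,
    BLOCK_NUMBER   BIGINT,
    BALANCE        DECIMAL(78, 10)  -- precision 78, scale 10
);
```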
The balance field above can contain a decimal value with a precision (total number of digits) of 78 and a scale of 10, meaning up to 10 digits after the decimal point.
To create a table with decimal columns for whole numbers:
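A sketch along the same lines (the table and column names are illustrative, assumed here for the example):

```sql
-- Hypothetical table for whole-number values such as wei amounts.
CREATE TABLE ETHEREUM.GAS_USED (
    TRANSACTION_HASH VARCHAR NOT NULL,
    GAS_USED         DECIMAL(78, 0)  -- scale 0: whole numbers only
);
```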
Note that the scale above is 0, which allows us to store whole numbers in the field. You can define scale and precision arbitrarily, as long as scale does not exceed precision. Big Decimal supports the following operations:
- Binary arithmetic: [+, -, *, /, %]
- Comparison operators: [>, <, <=, >=, <=>]
- Aggregate (grouping) functions: [Avg, Min, Max, Count, Sum]
- Null check
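Assuming a table like the wallet_balances sketch above (illustrative names), these operations combine in ordinary SQL:

```sql
-- Arithmetic, comparison, aggregation, and null checks on a Big Decimal column.
SELECT WALLET_ADDRESS,
       SUM(BALANCE)                AS TOTAL_BALANCE,
       AVG(BALANCE)                AS AVG_BALANCE,
       MAX(BALANCE) - MIN(BALANCE) AS BALANCE_RANGE
FROM   ETHEREUM.WALLET_BALANCES
WHERE  BALANCE IS NOT NULL
  AND  BALANCE > 0
GROUP  BY WALLET_ADDRESS;
```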
Note: For binary operations, if the leading operand’s data type is not Big Decimal, the query engine will truncate and round the Big Decimal operand to match, resulting in a loss of precision.
For example, the following expression causes precision loss because the leading operand, 1, is not a Big Decimal:
1 - 3000000089999919191919190000000000000000.2000000
To avoid losing precision, use cast:
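A sketch of the cast (the DECIMAL(78, 7) target here is an assumption for illustration; choose a precision and scale wide enough for both operands):

```sql
-- Casting the leading operand to Big Decimal keeps the subtraction at full precision.
SELECT CAST(1 AS DECIMAL(78, 7)) - 3000000089999919191919190000000000000000.2000000;
```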
What it enables
A Web3-native data type enables high-fidelity Web3 analytic use cases that would not otherwise be possible at full precision:
- Gas optimization is a major concern for most smart contract authors, but how does a developer know what ‘good’ looks like? Where is the breaking point at which self-funded users start to look for alternatives? Precise data warehouse/big data analysis makes it easy to understand how the ecosystem of gas consumption changes over time, and helps devs place their gas usage in a broader industry context, whether DeFi, DEX, or gaming.
- Similarly, using ML models to predict future gas fees allows both devs and users to understand the future likelihood of various price points. This helps users planning heavy usage make the most efficient selections, and helps devs looking to provide lower-cost alternatives.
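As a sketch of the gas analysis above (a hypothetical ethereum.transactions table; all names are assumptions for illustration), full-precision fee aggregation might look like:

```sql
-- Daily average and total gas fees, in wei, per contract, at full precision.
SELECT TO_ADDRESS               AS CONTRACT_ADDRESS,
       DATE(BLOCK_TIMESTAMP)    AS DAY,
       AVG(GAS_USED * GAS_PRICE) AS AVG_FEE_WEI,
       SUM(GAS_USED * GAS_PRICE) AS TOTAL_FEE_WEI
FROM   ETHEREUM.TRANSACTIONS
GROUP  BY TO_ADDRESS, DATE(BLOCK_TIMESTAMP);
```

Because GAS_USED and GAS_PRICE would be Big Decimal columns here, the products and sums stay exact rather than drifting the way Float arithmetic would.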