Space and Time Infinite Precision

Product announcement.

Stephen Hilton

Head of Solutions

Rishitesh Mishra

Director of Engineering

What it is and why we built it

One of the challenges of building a Web3-native data warehouse is the extreme numeric size and precision required: we have seen gas fees, gas limits, token rewards, etc. require numeric scale and precision reaching 72 digits or above. Traditional data warehouses like Oracle or Snowflake have Decimal data types that top out at 38 digits, falling back to Float data types for larger numbers. The challenge with Float data types is a lack of precision, which in Web3 translates to a lack of fidelity when aggregating value transfer between parties. Alternatively, some Web3 data providers store these very large numbers in character fields like VarChar, at the expense of basic mathematical functions. Users are left to choose between full-precision numbers and the ability to perform basic calculations.
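The fidelity gap is easy to demonstrate. The sketch below is illustrative Python using the standard-library `decimal` module (not Space and Time code): a 47-significant-digit token amount round-trips exactly through an arbitrary-precision decimal, but not through a 64-bit float.

```python
from decimal import Decimal

# A hypothetical 47-significant-digit token amount.
balance = "3000000089999919191919190000000000000000.2000000"

# A 64-bit float keeps only ~15-17 significant digits; the rest are mangled.
as_float = float(balance)

# An arbitrary-precision decimal preserves every digit exactly.
as_decimal = Decimal(balance)

print(f"{as_float:.7f}")   # low-order digits are lost
print(as_decimal)          # prints the value back verbatim
```

This is exactly the trade-off described above: the float stays computable but loses fidelity, while a string would keep the digits but lose arithmetic.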

To enable high-fidelity Web3 use-cases, Space and Time built the Big Decimal data type to support Web3's extreme size and precision requirements while still preserving mathematical functionality.

How it works

Space and Time receives blockchain data from multiple endpoints, then sends it to Kafka topics from which the data warehouse consumes data in real time and writes it to tables. Separately, we’ve established an ETL process to bulk load historical blockchain data into the data warehouse. This gives users access not only to all data from block 1, but also to the data being added to the chain in real time. We strive to keep the latency between blockchain events and that data being reflected in Space and Time as low as possible.

Rather than storing blockchain data with a JSON-like schema, we ingest blockchain tables into a particular structured schema that makes it easier for developers to work with the data. It’s much easier for a dev to write apps and queries on top of our relational schema, rather than a simple blob of JSON. 

Because Space and Time is an HTAP data warehouse, it’s composed of two different query engines: the OLTP (transactional, row-based) engine and the OLAP (analytic, column-based) engine. From the user perspective, there’s no bifurcation—you simply run a query and get the result. Space and Time handles the routing to the appropriate engine in the Validator layer. When a user runs a transactional query, it’s served from a row-format database with extremely low latency. If the query router determines that the user ran an analytic query, it’s served from the OLAP engine, which acts on a columnar storage format to enable faster query response time and less data transfer between storage and the query engine.
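As a toy illustration of the routing idea (hypothetical Python, not the actual Validator-layer router, which is far more sophisticated), a router might classify queries by the shape of the SQL:

```python
# Substrings that suggest an analytic (scan/aggregate) workload.
# These hints are illustrative assumptions, not Space and Time's real heuristics.
ANALYTIC_HINTS = ("group by", "sum(", "avg(", "count(", "min(", "max(", " join ")

def route(sql: str) -> str:
    """Return which engine a query would be routed to in this toy model."""
    lowered = sql.lower()
    if any(hint in lowered for hint in ANALYTIC_HINTS):
        return "OLAP"   # columnar engine: scans and aggregations
    return "OLTP"       # row engine: low-latency point reads and writes

print(route("SELECT balance FROM ETH.FUNGIBLETOKEN_WALLET WHERE key = 42"))
print(route("SELECT SUM(balance) FROM ETH.FUNGIBLETOKEN_WALLET"))
```

A point lookup lands on the row engine; an aggregation lands on the columnar engine—mirroring the transparent routing described above.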

As we built our blockchain indexing service, we realized that blockchain data (specifically financial data, like wallet balances) can carry precision up to 78 digits. While this level of precision is supported natively by our OLTP engine, the OLAP engine doesn’t support precision beyond 38 or scale beyond 18. Even if the data is stored in a string format, it isn’t possible to perform calculations (even basic mathematical operations like addition and subtraction) on it without losing precision. Because of this, we created the Big Decimal data type, which supports infinite precision, allowing the user to work with blockchain data in a meaningful way without losing precision.
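Aggregation is where this matters most in practice. The sketch below (standard-library Python as an analogy, not the engine itself) sums ten large hypothetical balances: a float total silently drops the fractional part, while an arbitrary-precision decimal total is exact to the last digit.

```python
from decimal import Decimal, getcontext

getcontext().prec = 80  # headroom for 78-digit Web3 values

# Ten hypothetical wallet balances, each wider than a 64-bit float can represent.
balances = ["1000000000000000000000000000.000000001"] * 10

float_total = sum(float(b) for b in balances)      # fractional parts vanish
decimal_total = sum(Decimal(b) for b in balances)  # exact to the last digit

print(decimal_total)
```

The float total is off by the entire aggregated fraction—the "lack of fidelity when aggregating value transfer" described earlier.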

How to use it

To create a table with decimal columns for fractional values:

CREATE TABLE ETH.FUNGIBLETOKEN_WALLET (key INTEGER, balance DECIMAL(78, 10))

The balance field above can contain a decimal value with a precision (total number of digits) of 78 and a scale of 10, meaning up to 10 digits may appear after the decimal point.

To create a table with decimal columns for whole numbers:

CREATE TABLE ETH.FUNGIBLETOKEN_WALLET (key INTEGER, balance DECIMAL(78, 0))

Note that the scale above is 0, which lets us store whole numbers in the field. You can define precision and scale as needed for your data.
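What precision and scale actually bound can be sketched in plain Python (a hypothetical helper, not the engine's implementation), using the `decimal` module's exact digit representation:

```python
from decimal import Decimal

def fits(value: str, precision: int, scale: int) -> bool:
    """Check whether a value fits a hypothetical DECIMAL(precision, scale) column."""
    _, digits, exponent = Decimal(value).as_tuple()
    frac_digits = max(0, -exponent)               # digits after the decimal point
    int_digits = max(0, len(digits) + exponent)   # digits before it
    # Scale caps the fractional digits; precision caps the total digits.
    return frac_digits <= scale and int_digits + frac_digits <= precision

print(fits("3000000089999919191919190000000000000000.2000000", 78, 10))  # True
print(fits("1.23456789012", 78, 10))  # False: 11 fractional digits > scale 10
```

The first value has 47 total digits and 7 fractional digits, so it fits DECIMAL(78, 10); the second exceeds the scale despite being tiny.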

Supported Operations

  • Binary arithmetic: +, -, *, /, %
  • Comparison operators: >, <, <=, >=, <=>
  • Aggregate functions: AVG, MIN, MAX, COUNT, SUM
  • Null checks

Note: For binary operations, if the leading operand’s data type is not Big Decimal, the query engine will truncate and round the Big Decimal operand to match it, resulting in a loss of precision.

e.g. The following expression will cause precision loss because 1, the leading operand, is not a Big Decimal:

1 - 3000000089999919191919190000000000000000.2000000

To avoid losing precision, use cast:

cast(1 as decimal(48, 7)) - 3000000089999919191919190000000000000000.2000000
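The effect of operating at too small a precision can be reproduced with Python's decimal contexts (an analogy under the assumption that a fixed-precision engine rounds results the same way; this is not the query engine's code): the same subtraction is lossy at 38 significant digits and exact at 48.

```python
from decimal import Decimal, localcontext

big = Decimal("3000000089999919191919190000000000000000.2000000")

# At 38 significant digits (a typical non-Big-Decimal cap), the result is rounded.
with localcontext() as ctx:
    ctx.prec = 38
    lossy = Decimal(1) - big

# With enough precision for every digit of the result, the subtraction is exact.
with localcontext() as ctx:
    ctx.prec = 48
    exact = Decimal(1) - big

print(lossy)
print(exact)
```

The exact result needs 47 significant digits, so any context below that—like an operand typed at ordinary decimal precision—must round it away.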

What it enables

A Web3-native data type enables high-fidelity Web3 analytic use-cases that would not otherwise be possible at full precision:

  • Gas optimization is a major concern for most smart contract authors, but how does a developer know what ‘good’ looks like? Where is the break-point at which self-funded users start to look for alternatives? Precise data warehouse/big data analysis makes it easy to understand how the ecosystem of gas consumption changes over time, and helps devs place their gas usage into a broader industry context, whether DeFi, DEX, or gaming.
      
  • Similarly, using ML models to predict future gas fees allows both devs and users to understand the likelihood of various future price points. This helps users planning heavy usage choose the most efficient option, and helps devs looking to provide lower-cost alternatives.

Stephen Hilton

Head of Solutions

Stephen Hilton is Head of Solutions for Space and Time, bringing 20 years of expertise in analytics, data processing, and technical leadership, including 10 years building value-focused, customer-facing engineering teams that deliver innovative customer experiences. Stephen most recently worked for Teradata as the Global Director of Strategy, Analytics, and Automation, building out the Solution Engineering and Customer Success organizations while working closely with the largest customers to ensure high strategic alignment and engagement. Based in the Bay Area, Stephen spent 14 years in hands-on management and consulting roles with Silicon Valley digital media and financial firms, from small start-ups to trillion-dollar tech giants, helping companies balance bleeding-edge technology exploration and innovation with a pragmatic, value-based approach to deployment.

Rishitesh Mishra

Director of Engineering

Rishitesh is a tenured software engineer with extensive experience working on databases and associated technologies. He specializes in distributed databases, scalable systems, analytics, and transaction processing.