How we maintain data correctness at institutional scale

We built an agent that investigates every data report autonomously, allowing us to maintain data correctness across 100+ blockchains, 1,200+ protocols, and 3,000+ tokenized assets.

Token Terminal


Onchain data changes constantly. Protocols upgrade contracts, rename events, and introduce fee mechanisms. Every change is a potential discrepancy between what the data shows and what is happening onchain. We call the work of finding and fixing these discrepancies data investigations.

We track financial and usage metrics across 100+ blockchains, 1,200+ protocols, and 3,000+ tokenized assets. A team of fewer than 20 people manages petabytes of onchain data and runs more than 30,000 data models daily. At this scale, data investigations come in constantly, and each one can take hours.

We built an autonomous agent that runs these investigations 24/7, which allows us to maintain data correctness at institutional scale.

Classify and trace

Tickets arrive two ways: our monitoring flags an anomaly, or a client reports something directly. The agent classifies each as a data discrepancy, a methodology question, or a new integration request, and routes it accordingly.
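This routing step can be sketched as a small classifier. The keyword rules, the `Ticket` shape, and the category names below are illustrative assumptions, not the production logic:

```python
from dataclasses import dataclass

CATEGORIES = ("data_discrepancy", "methodology_question", "new_integration")

@dataclass
class Ticket:
    source: str  # "monitoring" or "client"
    text: str

def classify(ticket: Ticket) -> str:
    """Route a ticket to one of three categories with simple keyword rules."""
    t = ticket.text.lower()
    if any(k in t for k in ("new version", "deployed on", "just launched")):
        return "new_integration"
    if any(k in t for k in ("how do you count", "methodology", "why is revenue")):
        return "methodology_question"
    # Default: an observed mismatch between our data and onchain reality.
    return "data_discrepancy"
```

A real classifier would be far richer than keyword matching; the sketch only shows the shape of the decision.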

A data discrepancy might trace to a liquidity pool migration on a DEX or a stale oracle on a lending protocol. A methodology question might relate to how to count revenue when a protocol routes fees to both a DAO treasury and token holders. A new integration request comes in when a protocol launches a new version or deploys on a new chain, or a tokenized asset launches.

After the classification step, a pipeline tracer maps every model between the final metric and our raw blockchain data in seconds. It gives the agent a dependency map: which models to query, which contracts to verify, and where the issue is most likely to be. Without it, every investigation would start blind.
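A minimal sketch of such a tracer, assuming the model lineage is available as an in-memory mapping from each model to its upstream dependencies (the `DEPS` graph below is a simplified, hypothetical slice, not our real lineage):

```python
from collections import deque

# Hypothetical dependency graph: model -> upstream dependencies.
# Nodes with no dependencies are raw source tables.
DEPS = {
    "near_metrics_revenue": ["int_near_gas_revenue_per_day", "int_near_intents_revenue"],
    "int_near_gas_revenue_per_day": ["fct_near_transactions"],
    "fct_near_transactions": ["tt-blockchain.near.transactions"],
    "int_near_intents_revenue": ["fct_near_receipts"],
    "fct_near_receipts": ["tt-blockchain.near.receipts"],
}

def trace(metric: str) -> dict:
    """Walk a metric's upstream DAG back to its raw sources, recording depth."""
    seen: dict[str, int] = {}
    sources: list[str] = []
    queue = deque([(metric, 0)])
    while queue:
        node, depth = queue.popleft()
        if node in seen:
            continue
        seen[node] = depth
        upstream = DEPS.get(node, [])
        if not upstream:
            sources.append(node)  # no dependencies => raw source table
        for up in upstream:
            queue.append((up, depth + 1))
    return {
        "models": [n for n in seen if n not in sources],
        "sources": sources,
        "depth": max(seen.values()),
    }
```

Breadth-first traversal visits each model once, so even a deep metric resolves to its dependency map in milliseconds.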

Most tickets still need human judgment due to novel protocol mechanisms, ambiguous fee structures, or questions requiring domain expertise. These route to a team member with a full triage summary, including the classification, affected pipeline models, and suggested next steps. They start with context, not from scratch.

Tickets with clear precedent in our codebase, like a renamed event or a known data pattern, proceed to autonomous investigation.

near_metrics_revenue (tt-business-intelligence.project_near)
7 models · 6 sources · depth 2

near_metrics_revenue (Metric)
int_near_gas_revenue_per_day (Intermediate)
fct_near_transactions (Fact)
tt-blockchain.near.transactions (Source)
fct_near_receipts (Fact)
tt-blockchain.near.receipts (Source)
int_near_intents_revenue (Intermediate) ← source of issue
fct_near_intents_treasury_balances (Fact, new model)
tt-blockchain.near.account_changes (Source)
fct_near_near_prices (Fact)
tt-abstractions.coingecko_private.coingecko_daily_market_data (Source)

The pipeline tracer maps a metric back through every transformation layer to its raw blockchain source.

Investigate and fix

Each investigation runs in its own fresh context window. A single investigation can involve dozens of file reads, query results, and dependency maps. Fresh context per ticket keeps the tenth investigation as sharp as the first.

For each ticket, the agent follows the same sequence:

  • Read the protocol registry to understand the protocol's setup: which contracts are deployed, on which chains, and which metrics they feed
  • Query the data warehouse against the models the tracer identified
  • Verify state onchain by reading contract state directly: token supplies, oracle prices, event signatures
  • Write the fix and post findings to the ticket: what happened, why, and the proposed resolution
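The sequence above can be sketched as a single function. The `registry`, `query_warehouse`, and `read_onchain` callables are stand-ins for the real interfaces, which the post does not specify:

```python
def investigate(ticket: dict, registry: dict, query_warehouse, read_onchain) -> dict:
    """Follow the sequence: registry -> warehouse -> onchain verification -> findings."""
    setup = registry[ticket["protocol"]]           # step 1: contracts, chains, metrics fed
    reported = query_warehouse(ticket["metric"])   # step 2: what our models currently say
    actual = read_onchain(setup["contracts"])      # step 3: what the chain itself says
    return {                                       # step 4: findings posted to the ticket
        "what": f"{ticket['metric']} reports {reported}, onchain shows {actual}",
        "discrepancy": reported != actual,
    }
```

In the real system step 4 also includes the proposed fix; the sketch stops at detecting the disagreement.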

For example, a lending protocol upgrades its proxy and renames an interest accrual event. The agent traces the missing data to the decoder reading the old event signature, writes a fix to add the new one, and opens a pull request.
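A hedged sketch of that fix pattern, assuming the decoder is driven by a per-protocol list of event signatures (the `DECODER_EVENTS` config and both signature names are hypothetical):

```python
# Hypothetical decoder config: the event signatures each protocol's decoder matches.
DECODER_EVENTS: dict[str, list[str]] = {
    "lend_protocol": ["AccrueInterest(address,uint256,uint256)"],
}

def add_event_signature(protocol: str, signature: str) -> list[str]:
    """The fix pattern: append the renamed event while keeping the old one,
    so historical data before the proxy upgrade still decodes."""
    events = DECODER_EVENTS.setdefault(protocol, [])
    if signature not in events:
        events.append(signature)
    return events
```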

Or a new token launches with treasury addresses that inflate the raw total supply. The agent compares the onchain supply against what the registry expects, identifies the addresses to exclude, and updates the configuration.
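That comparison might look like the following sketch, where `candidates` are known treasury or team addresses and the greedy largest-first rule is our own illustrative assumption, not the agent's actual heuristic:

```python
def addresses_to_exclude(raw_total: float, expected: float,
                         candidates: dict[str, float]) -> list[str]:
    """Pick candidate addresses whose balances explain the gap between the
    raw onchain total supply and the supply the registry expects."""
    gap = raw_total - expected
    excluded: list[str] = []
    # Exclude the largest candidate holders first until the gap is covered.
    for addr, balance in sorted(candidates.items(), key=lambda kv: -kv[1]):
        if gap <= 0:
            break
        excluded.append(addr)
        gap -= balance
    return excluded
```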

What the agent handles autonomously grows over time. Every reviewed investigation encodes a new pattern.

CX-1409 · NEAR Intents revenue · 28 min total

00:00 · Ticket arrives: Client reports NEAR revenue shows $0 for the Intents product. External benchmark disagrees.
00:02 · Classified: Data discrepancy → revenue metric → NEAR Intents product. Routed to autonomous investigation.
00:05 · Pipeline traced: Maps near_metrics_revenue → 7 models, 6 sources, depth 2. Identifies int_near_intents_revenue as the suspect.
00:08 · Registry checked: NEAR Intents has two treasury accounts, 1csfundsadmin.sputnik-dao.near and fefundsadmin.sputnik-dao.near.
00:12 · Warehouse queried: Treasury balance deltas from tt-blockchain.near.account_changes. Inflow: ~$3.4K/day.
00:18 · Verified onchain: Cross-referenced with the revenue.near.org API. Exact match on 7 consecutive days.
00:25 · Fix written: New model fct_near_intents_treasury_balances. Rewired the intermediate model to read treasury balance deltas.
00:28 · PR opened, findings posted: PR #23093 with 3 new models. Full investigation posted to the ticket.

A single investigation from ticket to pull request, showing the agent's sequence across 28 minutes.

What this enables

Autonomous investigation requires full pipeline access. The agent works with the same data our team does every day: raw blockchain data, decoded contract events, an in-house price feed, and the final metric transformations.

The years spent building that pipeline shaped how we build on top of it. Every classification rule, every investigation pattern, every fix reflects accumulated knowledge of how onchain data actually behaves: how protocols upgrade, how oracles fail, how token supplies drift. That knowledge is what the agent runs on.

Data coverage, depth, and quality scale independently of team size.

