How Assay rates exchange quality

Eleven observable metrics across four dimensions, computed daily from public market data. No exchange cooperation required, no self-reported figures used as inputs, no proprietary feeds.

Summary

Assay rates crypto exchanges on the quality and integrity of their spot market data. Each day, public trade data and order book snapshots from ten venues are ingested, eleven metrics across four dimensions are computed, and a composite score and band assignment per exchange are published. Scores refresh every 24 hours.

The design is motivated by a single constraint: participants choosing an exchange venue commit six- to seven-figure sums on the strength of data the venue provides about itself. Existing alternatives are either self-reported (CoinMarketCap, CoinGecko), opaque (CER.live), or gated behind enterprise pricing (Kaiko). Assay fills the gap with an audit-grade, methodology-transparent layer priced for the people making those decisions.

This document describes what is measured (in full) and how scores are computed (in full). Specific threshold calibration parameters are held proprietary for the reasons set out in § What stays proprietary.

At a glance

Exchanges covered: 10 global spot venues
Trading pairs: BTC/USDT, ETH/USDT (venue-native equivalents where applicable)
Data source: Public REST and WebSocket APIs only
Update frequency: Daily, approximately 23:30 UTC
Window: Trailing 24 hours (00:00-24:00 UTC)
Minimum history: 60 days of peer baseline calibration before publication

The four dimensions

Each dimension answers one question a listing lead needs to answer about a venue. Together, the four dimensions cover the failure modes documented in the academic and industry literature on exchange quality.

D1: Volume authenticity (3 metrics, weight 30%)
Does the reported trading volume follow patterns consistent with organic market activity?
Metrics: M01, M02, M03

D2: Order book quality (3 metrics, weight 25%)
Is the displayed liquidity reflected in observable trading at size?
Metrics: M04, M05, M06

D3: Price formation integrity (2 metrics, weight 25%)
Do observed trades produce price dynamics consistent with informed market participation?
Metrics: M07, M08 (M09 deferred)

D4: Cross-venue consistency (2 metrics, weight 20%)
Do this venue's aggregate characteristics align with its observed position in the peer basket?
Metrics: M10, M11

Weights reflect relative importance to the buyer use case: volume authenticity carries the highest weight because wash trading is the single signal most likely to mislead a listing decision, and the metrics in this dimension are the hardest for an exchange to game.

The eleven metrics

Each metric is specified below with its question, rationale, inputs, score mapping, and cost-to-replicate analysis. Metrics are computed independently for each (exchange, pair, day) combination. Where a metric is not applicable to a venue, it is marked as not applicable and excluded from dimension aggregation.

M01 Trade size distribution (Benford-adjusted)

Question

Does the distribution of trade sizes match what organic trading produces?

Rationale

Organic trading produces trade sizes that follow a power-law-like distribution with a characteristic tail. Automated or bot-driven volume tends to produce concentrations at round numbers (0.1 BTC, 1.0 BTC, 10.0 BTC) or unusually uniform size distributions. A leading-digit test against Benford's Law captures both failure modes without assuming a "normal" venue size profile.

What this catches and what it does not. M01 is sensitive to unsophisticated trade-size manipulation: round-number bias, uniform-size bot output, and arithmetically generated streams that ignore the natural leading-digit distribution. Conformance with Benford's Law on its own does not rule out sophisticated wash trading. A counterparty-aware system that constructs synthetic trades with properly power-distributed sizes can pass this metric while still being artificial. M01 is one of three components of the volume-authenticity dimension and is read alongside M02 and M03; agreement across all three is what establishes a high dimension score.

Inputs

Data source: Public trades, 24h rolling window
Field: Trade size in base asset
Outlier treatment: Winsorised at 99.9th percentile
Minimum sample: 1,000 trades; below this the metric is not scored for that day
Statistic: Pearson chi-squared against Benford expected leading-digit frequencies, normalised by sample size: χ²/N. Normalisation makes the statistic sample-size invariant; raw chi-squared scales linearly with N and would otherwise rank exchanges by trading volume rather than distribution shape.

Score mapping

Piecewise linear in χ²/N: lower per-trade divergence maps to a higher score. Anchors:

χ²/N   Score
0.00   100
0.05   80
0.15   50
0.35   20
0.65   0
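Because the M01 anchors are published, the full path from raw trade sizes to a score can be sketched end to end. A minimal illustration in Python; function names are illustrative, not the production implementation:

```python
from collections import Counter
import math

# Benford expected leading-digit frequencies for digits 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Published M01 anchors: (chi2/N, score) pairs, linear between anchors.
M01_ANCHORS = [(0.00, 100), (0.05, 80), (0.15, 50), (0.35, 20), (0.65, 0)]

def leading_digit(x: float) -> int:
    """First significant digit of a positive trade size."""
    s = f"{x:.12e}"  # scientific notation, e.g. '2.500000000000e-01'
    return int(s[0])

def chi2_per_trade(sizes) -> float:
    """Pearson chi-squared of leading digits vs Benford, divided by N."""
    n = len(sizes)
    counts = Counter(leading_digit(s) for s in sizes)
    chi2 = sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p) for d, p in BENFORD.items()
    )
    return chi2 / n

def m01_score(chi2_n: float) -> float:
    """Piecewise-linear interpolation over the published anchors."""
    xs, ys = zip(*M01_ANCHORS)
    if chi2_n <= xs[0]:
        return ys[0]
    for (x0, y0), (x1, y1) in zip(M01_ANCHORS, M01_ANCHORS[1:]):
        if chi2_n <= x1:
            return y0 + (y1 - y0) * (chi2_n - x0) / (x1 - x0)
    return ys[-1]
```

A stream of identical round-number trades, for example, concentrates all leading digits on one value and drives χ²/N well past the final anchor, mapping to a score of 0.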

Cost to replicate

Matching this distribution requires generating synthetic trades with properly power-distributed sizes — a non-trivial engineering requirement. Simple automation with uniform or round sizes produces detectable leading-digit patterns.

M02 Volume-to-order-book-depth ratio

Question

Does the reported 24-hour volume plausibly pass through the observable order book?

Rationale

An exchange reporting $1B daily BTC/USDT volume against a $50k order book implies a book turnover rate far outside anything observed at reference venues. Reference venues show a ratio of 24h volume to mean bid+ask depth within ±2% of mid that sits in a consistent empirical range. Values outside this range, in either direction, are atypical relative to the peer basket.

Inputs

Volume: 24h USD notional, computed from own trade data (exchange-reported ticker is not used)
Depth: Time-averaged ±2% book depth across 1,440 one-minute snapshots
Coverage threshold: Below 50% snapshot coverage, the metric is not scored for that day

Score mapping

Non-monotonic in the raw ratio R: both unusually low and unusually high R are atypical, and the typical zone sits in the middle of the empirical peer-basket range. One of two metrics in the spec with this shape (the other is M10).
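The raw ratio R can be assembled from public data alone. A sketch under assumed data shapes (helper names and structures are illustrative; production thresholds remain proprietary):

```python
def usd_notional_24h(trades):
    """24h USD notional from own trade data: sum of price * size."""
    return sum(price * size for price, size in trades)

def depth_within_band(book, mid, band=0.02):
    """USD depth on both sides within +/- `band` of mid for one snapshot.

    `book` is a pair of {price: size} dicts: (bids, asks)."""
    bids, asks = book
    lo, hi = mid * (1 - band), mid * (1 + band)
    bid_usd = sum(p * q for p, q in bids.items() if p >= lo)
    ask_usd = sum(p * q for p, q in asks.items() if p <= hi)
    return bid_usd + ask_usd

def volume_depth_ratio(trades, snapshots):
    """R = 24h volume / time-averaged +/-2% depth.

    `snapshots` is a list of (book, mid) pairs; 1,440 per day in production."""
    avg_depth = sum(depth_within_band(b, m) for b, m in snapshots) / len(snapshots)
    return usd_notional_24h(trades) / avg_depth
```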

Cost to replicate

Matching both ends of this ratio requires maintaining real depth at all times. Deep books carry a direct cost: market-maker incentives, inventory risk, and capital that could otherwise earn yield. This is the metric most difficult to match without the underlying market fundamentals.

M03 Trade interval entropy

Question

Are trades arriving at times consistent with a Poisson-like arrival process?

Rationale

Organic trading produces trade arrival times that follow approximately a Poisson or Hawkes process — bursty, self-exciting, with heavy-tailed inter-arrival times. Bot-driven flow often produces either suspiciously regular intervals (exact 1-second spacings, periodic patterns) or unnaturally uniform distributions. Shannon entropy of the inter-arrival distribution captures both failure modes.

Inputs

Data source: Trade timestamps for the target pair, 24h window
Bucketing: Logarithmic: 0-100ms, 100ms-1s, 1s-10s, 10s-100s, 100s+
Minimum sample: 5,000 trades; below this the metric is not scored for that day
Peer baseline: Empirical distribution of trade-interval entropy observed across the reference peer basket, recalibrated periodically

Score mapping

Monotonic in the absolute z-score of entropy against the peer baseline: both abnormally regular and abnormally uniform arrival patterns map to a lower score. Typical: within 2σ of peer mean. Atypical: beyond 2σ in either direction.
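The bucketing and entropy computation follow directly from the spec; a minimal sketch, with the peer mean and σ as placeholders for the proprietary baseline:

```python
import math

# Logarithmic inter-arrival bucket edges from the spec, in milliseconds.
BUCKET_EDGES_MS = [100, 1_000, 10_000, 100_000]  # 0-100ms, ..., then 100s+

def bucket_index(gap_ms: float) -> int:
    for i, edge in enumerate(BUCKET_EDGES_MS):
        if gap_ms < edge:
            return i
    return len(BUCKET_EDGES_MS)  # the 100s+ bucket

def interval_entropy(timestamps_ms) -> float:
    """Shannon entropy (bits) of the bucketed inter-arrival distribution."""
    gaps = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    counts = [0] * (len(BUCKET_EDGES_MS) + 1)
    for g in gaps:
        counts[bucket_index(g)] += 1
    n = len(gaps)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def m03_typical(entropy: float, peer_mean: float, peer_std: float) -> bool:
    """Typical if within 2 sigma of the peer-basket mean, either direction."""
    return abs(entropy - peer_mean) / peer_std <= 2.0
```

Exact 1-second spacings collapse every gap into a single bucket and drive the entropy to zero, which is the regularity failure mode the metric targets.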

Cost to replicate

Matching a Poisson-like arrival distribution requires sophisticated bot design that injects timing variability deliberately. Simpler automation tends to produce detectable regularity at the millisecond or second scale.

M04 Effective spread

Question

What does it actually cost to trade?

Rationale

Quoted spread (best ask − best bid) can be very narrow while the venue is quiet. Effective spread — the realised cost of market orders relative to mid-price — measures actual execution cost against actual trades.

Inputs

Mid-price reference: 1-minute order book snapshots
Trade direction: Buy/sell from feed where provided; Lee-Ready inferred otherwise
Aggregation: Volume-weighted average over 24h, in basis points
Low-volume fallback: Widen window to 7-day rolling for statistical power

Score mapping

Monotonic in effective spread: lower bps maps to a higher score. BTC/USDT benchmarks: under 5 bps is excellent, 5-15 bps typical, above 50 bps atypical. Exact thresholds are anchored to the live peer distribution and recalibrated periodically.
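A minimal sketch of the volume-weighted effective-spread computation; the trade-tuple shape is an illustrative assumption:

```python
def effective_spread_bps(trades):
    """Volume-weighted effective spread in basis points.

    `trades` is a list of (price, size, mid, side) where side is +1 for a
    buyer-initiated trade and -1 for a seller-initiated one (from the feed,
    or Lee-Ready inferred)."""
    num = 0.0  # notional-weighted spread sum
    den = 0.0  # total notional
    for price, size, mid, side in trades:
        spread = 2.0 * side * (price - mid) / mid  # as a fraction of mid
        notional = price * size
        num += spread * notional
        den += notional
    return 1e4 * num / den
```

A buy filled 5 cents above a $100 mid and a sell filled 5 cents below it both realise the same 10 bps cost, which is what the side term captures.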

Cost to replicate

Effective spread reflects actual execution cost. A low effective spread without lowering fees or subsidising market makers is not easily replicated — both substitutes carry direct costs.

M05 Order book slope (Kyle's λ proxy)

Question

How much does the book absorb? What is the price impact of size?

Rationale

A deep book has a gradual slope: a $1M sell order moves price incrementally. A book with thin layers beyond the inside quote shows a steep slope at size. The relationship between executed size and price impact is a proxy for Kyle's lambda, computable from public order book data alone.

Inputs

Snapshots: 1-minute frequency, full depth within ±5% of mid
Standard sizes: $10k, $100k, $1M notional on each side
Outlier treatment: Sizes winsorised at 99.9th percentile

Score mapping

Monotonic in λ expressed as slippage-bps for a standard $100k trade: lower λ maps to a higher score. Typical range 10-50 bps; values above 200 bps are atypical.
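The slippage-bps figure for a standard notional reduces to walking one side of a snapshot. A sketch for the buy side, under an assumed book shape:

```python
def slippage_bps(asks, mid, notional_usd=100_000.0):
    """Slippage in bps for a market buy of `notional_usd`, walking the asks.

    `asks` is a list of (price, size) levels sorted by price ascending.
    Returns None when the book cannot absorb the full notional."""
    remaining = notional_usd
    cost = 0.0  # USD paid
    qty = 0.0   # base asset bought
    for price, size in asks:
        level_usd = price * size
        take = min(remaining, level_usd)
        cost += take
        qty += take / price
        remaining -= take
        if remaining <= 0:
            break
    if remaining > 0:
        return None
    avg_price = cost / qty
    return 1e4 * (avg_price - mid) / mid
```

The sell side mirrors this by walking the bids downward; the production metric averages the two.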

Cost to replicate

Matching this requires maintaining real depth across multiple price levels. The capital cost scales with the square of the depth advertised.

M06 Quote stability and update rate

Question

Are quotes being maintained, or is the book updating at rates disproportionate to actual trading?

Rationale

Quote update rates disproportionate to trade rates can indicate rapid cancel-and-replace cycles that make displayed depth difficult to hit in practice. High update-to-trade ratios are associated with low effective fill rates in several academic studies of exchange microstructure.

Inputs

Order book stream: WebSocket add/cancel/modify events from the continuous WS consumer
Peer baseline: Empirical distribution of the quote-update-to-trade ratio observed across the reference peer basket, recalibrated periodically
Coverage threshold: If WebSocket sequence gaps cover more than 50% of the day, the metric is not scored for that day

Score mapping

Monotonic in the absolute z-score against the peer baseline: deviations from the baseline ratio in either direction map to a lower score. Typical: within 2σ. Atypical: beyond 2σ.

Cost to replicate

A lower update ratio requires more stable quoting, which uses market-maker capital and inventory tolerance.

M07 Cross-venue price deviation

Question

Does this venue's price track the global market?

Rationale

A venue's mid-price for BTC/USDT should track the volume-weighted global mid within a tight tolerance, given arbitrage incentives. Persistent deviations may reflect lower arbitrage activity, stale feeds, or isolated price formation on that venue.

Inputs

Sampling: Mid-price every 60 seconds, target exchange and reference basket
Reference basket: Binance, Coinbase, Kraken, OKX (volume-weighted), excluding target if a basket member
Cross-rate: USD/USDT translated via Kraken USD VWAP / Binance USDT VWAP at each timestamp

Score mapping

Monotonic in mean absolute deviation from the reference basket: lower MAD maps to a higher score. Typical: MAD under 5 bps and tail under 30 bps. Atypical: MAD above 50 bps, or persistent autocorrelation above 0.5.
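The MAD statistic is a direct computation over aligned 60-second samples. A sketch, with an illustrative volume-weighted basket mid:

```python
def basket_mid(mids, volumes):
    """Volume-weighted mid across the reference basket at one timestamp."""
    return sum(m * v for m, v in zip(mids, volumes)) / sum(volumes)

def mad_bps(target_mids, basket_mids):
    """Mean absolute deviation of the target mid from the reference-basket
    mid, in basis points, over aligned 60-second samples."""
    devs = [abs(t - b) / b * 1e4 for t, b in zip(target_mids, basket_mids)]
    return sum(devs) / len(devs)
```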

Cost to replicate

Keeping prices aligned with the global market requires arbitrage linkage. Without underlying liquidity to support two-way arbitrage, alignment is difficult to maintain.

M08 Mid-price reversion dynamics

Question

After large trades, does price revert in a way consistent with real market impact?

Rationale

In real markets, informed trades produce permanent impact and uninformed trades produce temporary impact that reverts. If trades at a venue show near-complete reversion within seconds regardless of size, this is inconsistent with the mix of informed and uninformed flow observed at reference venues.

Inputs

Trade filter: Prints above approximately $50k notional
Mid-price series: 1-second resolution, around each large-trade event
Peer baseline: Empirical distribution of the full-reversion fraction observed across the reference peer basket, recalibrated periodically
Minimum sample: 50 large trades per day; else 7-day rolling fallback

Score mapping

Monotonic in distance above the peer reference reversion fraction: reference venues show 20-40% full reversion, and values above that band map to lower scores. Atypical above ~70% — inconsistent with the mix of informed flow observed at the peer basket.
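One way to operationalise "full reversion" is a tolerance band around the pre-trade mid. A sketch; the tolerance and horizon below are illustrative assumptions, not published parameters:

```python
def reversion_fraction(events, horizon_s=10, tol_bps=1.0):
    """Fraction of large-trade events whose mid fully reverts.

    `events` is a list of (pre_mid, mids_after), where `mids_after` is the
    1-second mid series following the print. An event counts as fully
    reverted when the mid comes back within `tol_bps` of the pre-trade mid
    inside the horizon."""
    reverted = 0
    for pre_mid, mids_after in events:
        for mid in mids_after[:horizon_s]:
            if abs(mid - pre_mid) / pre_mid * 1e4 <= tol_bps:
                reverted += 1
                break
    return reverted / len(events)
```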

Cost to replicate

Matching this pattern requires trades that carry genuine price impact, which means actually moving the book and taking position risk.

M09 Funding rate / basis sanity

What it measures: For exchanges offering perpetual futures, whether the funding rate and basis stay within no-arbitrage bounds relative to spot. Systematic drift indicates limited arbitrage activity between spot and derivatives legs.

Why it is not scored in v1: Scoring M09 consistently across all ten covered venues requires a dedicated derivatives data pipeline. Three of the current venues are spot-only (Coinbase, Kraken, Gate.io) and would always return not applicable, making cross-venue comparison misleading. M09 is computed internally and will be published in a future methodology version once derivatives coverage is consistent.

M10 Volume share vs liquidity share

Question

Does this exchange's share of global volume match its share of global liquidity?

Rationale

A venue's share of global volume and its share of global depth should be of similar order. If a venue shows 15% of global BTC/USDT volume but 2% of global order book depth at ±2%, the ratio is unusual relative to peers.

Inputs

Volume: 24h notional, computed from own trade data (exchange ticker is not used)
Reference volume: Aggregate 24h across the peer basket, excluding target if a basket member
Depth: Time-averaged ±2% depth for target and peers

Score mapping

Non-monotonic in R = vol_share / depth_share: both unusually low and unusually high R are atypical. Typical: R in [0.7, 1.5]. Atypical: outside [0.3, 5] in either direction. The other non-monotonic metric in the spec (alongside M02).
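The ratio and the qualitative zones quoted above can be sketched directly; the zone boundaries are the illustrative figures from this entry, not the calibrated production thresholds:

```python
def share_ratio(own_volume, peer_volumes, own_depth, peer_depths):
    """R = volume share / depth share, target vs the peer basket."""
    vol_share = own_volume / (own_volume + sum(peer_volumes))
    depth_share = own_depth / (own_depth + sum(peer_depths))
    return vol_share / depth_share

def m10_zone(r: float) -> str:
    """Illustrative zones from the methodology text; production thresholds
    are calibrated from the live peer distribution."""
    if 0.7 <= r <= 1.5:
        return "typical"
    if r < 0.3 or r > 5.0:
        return "atypical"
    return "borderline"
```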

Cost to replicate

This is the measurement hardest to match without the underlying fundamentals. Matching both ends requires real volume and real depth simultaneously.

M11 Price leadership and contribution to price discovery

Question

Does this exchange lead global price discovery or lag it?

Rationale

Venues with informed flow tend to lead price moves — price discovery happens there first and others follow. Venues with predominantly uninformed flow lag. Hasbrouck's information share and Gonzalo-Granger contribution to common factor quantify this directly. The v1 implementation uses a lighter lead-lag correlation proxy of equivalent intent.

Inputs

Mid-price series: 1-second resolution, target + peer basket (excluding target if a member), 24h window
Peer baseline: Empirical distribution of price-leadership and information-share values observed across the reference peer basket, recalibrated periodically

Score mapping

Conditional rather than purely monotonic: information share above 10% of global is typical for large venues, below 2% typical for small venues. A volume share above 5% combined with an information share below 1% is atypical — it suggests volume without discovery.
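A minimal version of the lead-lag correlation proxy: correlate the target's lagged returns with the basket's current returns, and vice versa. This sketches the stated intent, not the production estimator:

```python
def returns(mids):
    return [(b - a) / a for a, b in zip(mids, mids[1:])]

def corr(xs, ys):
    """Pearson correlation of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def lead_lag_score(target_mids, basket_mids, lag=1):
    """Positive when the target's lagged returns correlate with the basket's
    current returns more strongly than the reverse, i.e. the target leads."""
    rt, rb = returns(target_mids), returns(basket_mids)
    leads = corr(rt[:-lag], rb[lag:])  # target at t vs basket at t+lag
    lags_ = corr(rb[:-lag], rt[lag:])  # basket at t vs target at t+lag
    return leads - lags_
```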

Cost to replicate

Matching this requires attracting informed traders, which compounds over time and is difficult to short-circuit.

Composite scoring

Code  Dimension                  Metrics                  Weight
D1    Volume authenticity        M01, M02, M03            30%
D2    Order book quality         M04, M05, M06            25%
D3    Price formation integrity  M07, M08 (M09 deferred)  25%
D4    Cross-venue consistency    M10, M11                 20%

Step 1 — Metric normalisation

Each raw metric value is converted to a 0-100 score using piecewise linear mappings defined in a versioned mapping file. Higher score means higher quality. Every score row is stamped with the mapping_version active at computation time, so historical scores remain interpretable after methodology updates.

Step 2 — Dimension scores

The dimension score is the arithmetic mean across that dimension's metrics that returned status='ok'.

Step 3 — Composite score

The composite is a weighted mean across dimensions: 0.30·D1 + 0.25·D2 + 0.25·D3 + 0.20·D4. A composite is only published when every dimension has at least one scoring metric.

Step 4 — Band assignment

Band  Composite score
1     ≥ 85
2     75 - 85
3     60 - 75
4     45 - 60
5     < 45

D3 (Price Formation Integrity) averages over M07 and M08 only in v1; funding rate and basis metrics applicable to venues offering perpetual futures are planned for a future methodology version.
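Steps 2 through 4 compose mechanically. A sketch using the published weights and band boundaries; the status strings are illustrative:

```python
WEIGHTS = {"D1": 0.30, "D2": 0.25, "D3": 0.25, "D4": 0.20}

# (lower bound, band); anything below the last bound is band 5.
BANDS = [(85, 1), (75, 2), (60, 3), (45, 4)]

def dimension_score(metric_scores):
    """Arithmetic mean over metrics with status='ok'.

    `metric_scores` is a list of (score, status) tuples."""
    ok = [s for s, status in metric_scores if status == "ok"]
    return sum(ok) / len(ok) if ok else None

def composite(dim_scores):
    """Weighted mean; published only when every dimension scored."""
    if any(dim_scores.get(d) is None for d in WEIGHTS):
        return None
    return sum(WEIGHTS[d] * dim_scores[d] for d in WEIGHTS)

def band(score: float) -> int:
    for lower, b in BANDS:
        if score >= lower:
            return b
    return 5
```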

What stays proprietary

Most calibration parameters are not published in full. The exceptions are listed first.

  • M01 score-mapping anchors are public. The five-anchor piecewise-linear curve from χ²/N to score is set out in the M01 entry above. Published following operator review; intended as a worked example of the calibration shape and as a transparency commitment for the metric most often cited in commission discussions.

The remainder of the calibration set remains proprietary:

  • Score-mapping anchors for M02 through M11. The score ranges shown qualitatively in each metric entry are illustrative; production thresholds are calibrated from observed distributions across the peer basket.
  • Peer baseline calibration parameters for M03, M06, M08, and M11.
  • Per-metric outlier rules beyond the published 99.9% winsorisation.

Threshold calibration uses observed distributions across a reference peer basket and is recalibrated periodically. Full calibration parameters are available to data licence customers; the baseline_version and mapping_version stamped on each score row allow any customer to reproduce any historical score exactly.

Data sources

All 10 covered exchanges are ingested via their public REST and WebSocket APIs: no authentication, no paid feeds, no exchange cooperation at any stage.

USD-quoted venues (Coinbase, Kraken) are handled in venue-native pairs for single-venue metrics. For cross-venue comparison, USD prices are translated via the Kraken USD VWAP / Binance USDT VWAP basis computed at each timestamp.
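One reading of the cross-rate translation: the ratio of a Kraken USD VWAP to a Binance USDT VWAP for the same asset implies a USD/USDT rate at that timestamp. A sketch under that assumption; the direction convention is illustrative, not confirmed by the source:

```python
def usd_to_usdt(price_usd, kraken_btc_usd_vwap, binance_btc_usdt_vwap):
    """Translate a USD-quoted price into USDT terms via the implied
    USD/USDT cross-rate at the same timestamp.

    If BTC trades at `kraken_btc_usd_vwap` USD and `binance_btc_usdt_vwap`
    USDT, one USD is worth (usdt_vwap / usd_vwap) USDT."""
    return price_usd * binance_btc_usdt_vwap / kraken_btc_usd_vwap
```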

Limitations

Assay scores are narrow by design. They describe market data integrity, not broader aspects of an exchange's operation. The following are not measured and should not be inferred:

  • Custody security or solvency
  • Regulatory standing or licensing status
  • User experience, customer support quality, or dispute handling
  • Fiat on-ramp / off-ramp quality
  • Derivatives market quality (v1 is spot-only)
  • Token-level listing quality for pairs other than BTC and ETH

A high Assay score does not imply safety. A low Assay score reflects market data characteristics outside the typical range — it does not imply intent.

Funding rate and basis sanity (M09) is computed internally but not published in v1. It applies only to venues offering perpetual futures and is absent for spot-only exchanges; a consistent treatment across all covered venues requires a dedicated derivatives data pipeline, which is planned for a future release.

Version history

Version  Notes
v0.1.1   M09 (funding rate / basis sanity) deferred from public scoring pending derivatives data pipeline.
v0.1.0   Initial release. 10 exchanges, 11 metrics, 2 pairs.