<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Zero Lag Club]]></title><description><![CDATA[A pragmatick grimoire for forging high‑performance quant and chain‑wrought trading engines—sans intermediaries; naught but thee, thy code, and the markets.
None of these scrolls be financial advice.]]></description><link>https://zerolag.club</link><image><url>https://substackcdn.com/image/fetch/$s_!owNg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a75c51f-f066-4f0f-be3a-17dd821797b3_241x241.png</url><title>Zero Lag Club</title><link>https://zerolag.club</link></image><generator>Substack</generator><lastBuildDate>Fri, 10 Apr 2026 11:31:11 GMT</lastBuildDate><atom:link href="https://zerolag.club/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[ZeroLag, LLC.]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[zerolag@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[zerolag@substack.com]]></itunes:email><itunes:name><![CDATA[crypt0grapher]]></itunes:name></itunes:owner><itunes:author><![CDATA[crypt0grapher]]></itunes:author><googleplay:owner><![CDATA[zerolag@substack.com]]></googleplay:owner><googleplay:email><![CDATA[zerolag@substack.com]]></googleplay:email><googleplay:author><![CDATA[crypt0grapher]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Lecture 6: The Grossman-Miller Market Maker: A Pragmatic Treatise on Liquidity Provision]]></title><description><![CDATA[A practical series for the discerning retail trader and the quantitative alchemist on Market Microstructure]]></description><link>https://zerolag.club/p/lecture-6-the-grossman-miller-market</link><guid isPermaLink="false">https://zerolag.club/p/lecture-6-the-grossman-miller-market</guid><dc:creator><![CDATA[crypt0grapher]]></dc:creator><pubDate>Tue, 16 Dec 2025 23:04:37 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/ba63ff39-c428-4038-a807-b3f9f6820f89_1024x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#128367;&#65039;Greetings, esteemed reader!</p><p>It&#8217;s been a while! Hope you are well!<br>Have you ever pondered the plight of the market maker?<br>The &#8220;invisible hand of the market&#8221; is the one who stands ready to buy when others wish to sell, and sell when others wish to buy. This is a curious activity, one that requires both fortitude and a most sophisticated understanding of risk.</p><p><a href="https://zerolag.club/p/lecture-4-the-glostenmilgrom-market">The Glosten-Milgrom model</a> we reviewed&nbsp;last time clearly explains trading with informed and uninformed traders. Long story short, market makers lose out when trading against informed traders, and to recover those losses and earn profits, MMs trade with liquidity (noise, retail) traders, creating a combined effort to balance the market, recoup losses, and book profits.</p><blockquote><p>I highly recommend the very first article on adverse selection and the market maker business - the one published by Jack Treynor (aka &#8220;Bagehot&#8221;) in 1971, called &#8220;The Only Game in Town&#8221; - clear and concise, and not a single formula was used.</p></blockquote><p>Modelling participants with whom to trade is one part of the MM story (yielding a mathematical proof that one should always buy from uninformed traders, which we did at the Glosten-Milgrom lecture). The other crucial aspect is <strong>inventory management</strong> - as an MM, we don&#8217;t want to hold assets longer than necessary. </p><p>The Grossman-Miller model addresses exactly that. </p><p>It all started in the year of our Lord 1988, when Sanford J. Grossman and Merton H. 
Miller illuminated this mystery with their seminal work on market making and liquidity provision. Their model reveals how market makers extract compensation for their services whilst managing inventory risk.</p><p>Fear not! For we shall not merely theorise&#8212;we shall implement! By the article's end, you shall possess both the mathematical prowess and <a href="https://grossman-miller-simulator.zerolag.club/">the playground to simulate your very own market-making operation. </a></p><p>&#128293; Shall we commence?</p><h2>The Market Maker's Fundamental Dilemma &#127917;</h2><p>Consider, if you will, the market maker's predicament:</p><p>1. <strong>No intrinsic desire for inventory</strong>: Unlike a merchant who stocks &#8220;assets&#8221; (to be fair, shitcoins, primarily) for eventual profit, we, as a market maker, have no inherent wish to hold these stinky bags at all.</p><p>2. <strong>Temporal mismatch</strong>: When accepting one side of a trade, we must wait&#8212;sometimes interminably&#8212;for a counterparty to materialise.</p><p>3. <strong>Price risk exposure</strong>: During this waiting period, the cruel hand of fate may move prices against us.</p><p>This triumvirate of challenges forms the core of what Grossman &amp; Miller sought to model. There are more modern models addressing these challenges. In this post, let us start with how G&amp;M approached this most vexing problem.</p><h2>The Grossman-Miller Framework: A Three-Act Play &#127914;</h2><p>Let&#8217;s simplify the unfolding drama down to three time periods.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;t \\in \\{1, 2, 3\\}&quot;,&quot;id&quot;:&quot;YBRHKLHRSS&quot;}" data-component-name="LatexBlockToDOM"></div><p>With a cast of characters:</p><p>- <em><strong>n</strong></em> identical Market Makers (<em><strong>MMs</strong></em>): Our protagonists, initially holding no assets but armed with initial wealth <em><strong>W_0.</strong></em></p><p>- <strong>Liquidity Trader 1 (</strong><em><strong>LT1</strong></em><strong>)</strong>: Arrives at time <em><strong>t=1</strong></em> with <em><strong>i</strong></em> units to trade (that is, if <em><strong>i</strong></em> is positive, they arrive holding <em><strong>i</strong></em> assets they wish to sell; if negative, they are here to buy).</p><p><strong>- Liquidity Trader 2 (</strong><em><strong>LT2</strong></em><strong>)</strong>: Appears at <em><strong>t=2</strong></em> with exactly <em><strong>-i</strong></em> units to trade (what serendipity!)</p><p>Now let's start cooking. We assume that <strong>all participants are risk-averse, </strong>more precisely, they exhibit risk aversion with the utility function:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;U(X) = -\\exp(-\\gamma X)&quot;,&quot;id&quot;:&quot;TUBEPZMLHP&quot;}" data-component-name="LatexBlockToDOM"></div><p>Where <em><strong>X</strong></em> is terminal cash - the future cash value of the agent&#8217;s position.</p><p><em><strong>&#947; &gt; 0</strong></em> captures the degree of risk aversion. If you meet this elegant formula for the first time, it&#8217;s a <a href="https://en.wikipedia.org/wiki/Exponential_utility">classic constant absolute risk aversion utility function</a>. Essentially, what we need to understand is that this <strong>utility function</strong> just maps money into &#8220;how much one likes it&#8221; value, and this function is concave, meaning that every new buck gives less satisfaction. 
That means that the risk-averse trader always prefers a sure amount to a fair gamble with the same expected value. <br>That&#8217;s basically the beauty of this exponential formula:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;U(\\mathbb{E}[X]) > \\mathbb{E}[U(X)]\n&quot;,&quot;id&quot;:&quot;TZWFMBEGBS&quot;}" data-component-name="LatexBlockToDOM"></div><p>The utility (<em><strong>U</strong></em>) of getting the average (<strong>E</strong>) payoff (X) is always higher than the average utility of the risky payoff.</p><h2>Solving The Model, Backwards: <br>The Mathematics of Liquidity &#129518;</h2><p></p><h4>Act III: The Denouement (t = 3)</h4><p>At the last timestamp, <em><strong>t=3</strong></em>, the asset's true value is revealed:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;S_3 = \\mu + \\epsilon_2 + \\epsilon_3&quot;,&quot;id&quot;:&quot;RUJDNRRNVJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>- <em><strong>&#956;</strong></em> is a constant (the fundamental value of the asset).</p><p>- <em><strong>&#949;_2</strong></em> and <em><strong>&#949;_3</strong></em> are independent price updates announced between periods - we assume they are independent normally distributed random variables with mean zero and variance <em><strong>&#963;^2 </strong></em>(written as <em><strong>&#8764;N(0,&#963;^2)). &#949;_t </strong></em>becomes known between <em><strong>t-1</strong></em> and <em><strong>t</strong>, </em><strong>e.g.</strong> &#949;_2 is not known at step 1 but is announced by step 2.</p><h4>Act II: The Matching (<em><strong>t = 2)</strong></em></h4><p>Walking backwards to a step earlier, each agent <em><strong>j</strong></em> maximises the utility function:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\max_{q_{2j}} \\; \\mathbb{E}\\big[U(X_3^j)\\mid \\varepsilon_2\\big]&quot;,&quot;id&quot;:&quot;SCGXVDPFCZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>By <em><strong>t=2</strong></em>, we already know <em><strong>&#949;_2</strong></em>, and we maximise the expected <em><strong>U(X3j)</strong></em> over the remaining randomness.</p><p>Subject to the budget constraints:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{align}\n(1) \\space X_3^j &amp;= X_2^j + q_{2j} S_3, \\\\\n(2) \\space X_2^j + q_{2j} S_2 &amp;= X_1^j + q_{1j} S_2.\n\\end{align}&quot;,&quot;id&quot;:&quot;IBUSFPWIMM&quot;}" data-component-name="LatexBlockToDOM"></div><p>These are the key expressions to understand; the rest follows from them. </p><p><em><strong>(1)</strong></em> states that the cash account for agent <em><strong>j</strong></em><strong> </strong>at step <em><strong>3,</strong></em><strong>&nbsp;</strong><em><strong>X3j,&nbsp;</strong></em>equals the cash value agent j had at step 2 plus the revenue from selling&nbsp;<em><strong>q2j&nbsp;</strong></em>units of the risky asset at the price&nbsp;<em><strong>S3&nbsp;</strong></em>at&nbsp;time t=3.</p><p><em><strong>(2)</strong></em> states that the wealth before step 2 (right side, which is cash&nbsp;<em><strong>X1j</strong></em><strong>&nbsp;</strong>and the&nbsp;<em><strong>q1j</strong></em>&nbsp;assets priced&nbsp;<em><strong>S2</strong></em>) equals the wealth after step 2 (left side, which is cash&nbsp;<em><strong>X2j&nbsp;</strong></em>and assets left after step 2,&nbsp;<em><strong>q2j,&nbsp;</strong></em>by their price S2). 
This makes sense as a <em><strong>self-financing constraint</strong></em>. No new money is injected or pulled out - it&#8217;s just swapping inventory for cash from the same pocket.</p><p><br>Given our exponential utility and normal distributions, the optimal portfolio becomes:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;q_{2j}^\\ast\n  = \\frac{\\mathbb{E}[S_3 \\mid \\varepsilon_2] - S_2}{\\gamma \\sigma^2}&quot;,&quot;id&quot;:&quot;SMFGNOEHGQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Market clearing requires:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;n q_2^{MM} + q_2^{LT1} + q_2^{LT2} = 0.&quot;,&quot;id&quot;:&quot;QFSUTNDCOY&quot;}" data-component-name="LatexBlockToDOM"></div><p>Since all agents are identical save for their endowments, and </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;q_1^{LT2} = -i&quot;,&quot;id&quot;:&quot;FSIIKZMSVM&quot;}" data-component-name="LatexBlockToDOM"></div><p> we obtain:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;S_2\n  = \\mathbb{E}[S_3 \\mid \\varepsilon_2]\n  = \\mu + \\varepsilon_2.&quot;,&quot;id&quot;:&quot;NONMSMMIRZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>A most satisfying result! The price at <em><strong>t=2</strong></em> equals the conditional expectation&#8212;efficiency reigns supreme when matching orders arrive!</p><h4>Act I: The Initial Imbalance (t = 1)</h4><p>Now for the pi&#232;ce de r&#233;sistance! At <em><strong>t=1</strong></em>, only LT1 and the MMs participate. Each maximises expected utility knowing that at <em><strong>t=2</strong></em> they'll exit with zero inventory.</p><p>The optimal holdings become:<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;q_{1j}^\\ast\n  = \\frac{\\mathbb{E}[S_2] - S_1}{\\gamma \\sigma^2}.&quot;,&quot;id&quot;:&quot;BIGVKJDIZE&quot;}" data-component-name="LatexBlockToDOM"></div><p>Market clearing with</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;q_0^{LT1} = i \\mbox{ and } q_0^{MM} = 0  \\mbox{ yields }&quot;,&quot;id&quot;:&quot;PKOHFGLIVH&quot;}" data-component-name="LatexBlockToDOM"></div><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;i = (n+1)\\,\\frac{\\mu - S_1}{\\gamma \\sigma^2}&quot;,&quot;id&quot;:&quot;SVYVBKDWCC&quot;}" data-component-name="LatexBlockToDOM"></div><p><strong>Therefore:</strong></p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;S_1\n  = \\mu - \\frac{\\gamma \\sigma^2 i}{n+1}&quot;,&quot;id&quot;:&quot;GMRYVEDRNY&quot;}" data-component-name="LatexBlockToDOM"></div><div class="pullquote"><p><strong>Behold! 
The liquidity discount emerges!</strong> </p></div><p>When LT1 sells i &gt; 0, the price drops below the fundamental value by </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{\\gamma \\sigma^2 i}{n + 1}&quot;,&quot;id&quot;:&quot;QISZYWQBWW&quot;}" data-component-name="LatexBlockToDOM"></div><p>That&#8217;s because the market makers must temporarily absorb the inventory imbalance <em><strong>i </strong></em>and carry that risk until it can be unwound later.<br><br>Look, liquidity cost is literally <em>volatility &#215; risk-aversion &#215; imbalance</em> and the MM&#8217;s discount is a risk premium for warehousing bags:</p><ol><li><p>Vol spikes (<em><strong>&#963;^2</strong></em> goes up) &#8594; liquidity gets expensive <em>even if fundamentals don&#8217;t change</em>.</p></li><li><p>Dealers get more risk-averse (<em><strong>&#947;</strong></em> goes up) &#8594; same.</p></li><li><p>Bigger one-sided flow (greater <em><strong>i</strong></em>) &#8594; price must move more to bribe someone to hold it.</p></li></ol><div class="pullquote"><p><strong>Competition socialises inventory risk!</strong></p></div><p>Indeed, the factor </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{1}{n+1}&quot;,&quot;id&quot;:&quot;CCBHVLKDQM&quot;}" data-component-name="LatexBlockToDOM"></div><p>&#8203; says: more market makers &#8594; smaller discount. That&#8217;s intuitive: the same shock <em><strong>i </strong></em>gets divided across more balance sheets, so each dealer carries less risk and demands less compensation.</p><div class="pullquote"><p><em><strong>Risk aversion &#947; is literally how expensive liquidity is</strong></em></p></div><p>If <strong>&#947;&#8594;0</strong> (risk-neutral dealers), then <em><strong>S1&#8594;&#956;</strong></em>: no liquidity discount. As <em><strong>&#947; </strong></em>increases, market makers hate holding inventory more, so they move price further away from <em><strong>&#956;</strong></em> to get paid for carrying risk. So the &#8220;cost of immediacy&#8221; comes from <strong>risk aversion</strong>, not from information asymmetry. 
A risk-neutral world has <em>free</em> immediacy.<br></p><h4>A Playground &#128013;</h4><p>We&#8217;ve built an interactive web app that brings this legendary paper to life.</p><p><a href="https://grossman-miller-simulator.zerolag.club/">https://grossman-miller-simulator.zerolag.club/</a></p><p>How to use:</p><ol><li><p>Adjust parameters (# of market makers, trade size, volatility, risk aversion)</p></li><li><p>Hit &#8220;Run Simulation&#8221; and step through t=1 &#8594; t=2 &#8594; t=3</p></li><li><p>Watch prices, positions, and P&amp;L evolve in real-time</p></li><li><p>Toggle &#8220;Show Formulas&#8221; to see the underlying math (KaTeX rendered)</p></li><li><p>Share scenarios via URL&#8212;params are encoded in the query string</p></li></ol><p>I encourage you to think about the following while playing with it:</p><ul><li><p>How prices move when buy/sell orders arrive asynchronously</p></li><li><p>Why market makers earn a &#8220;liquidity premium&#8221; for bearing inventory risk</p></li><li><p>The magic number: n/(n+1) &#8212; how many MMs determines how much immediacy you get</p></li><li><p>Why adding more market makers compresses spreads</p></li><li><p>Price autocorrelation from inventory unwinding (yes, it&#8217;s negative!)</p></li></ul><p>Feel these equations with this illuminating simulation!</p><p><br>Do not risk rashly - or at the very least, not for free - and see you soon! &#128640;</p><h4>Reading List &#128214;</h4><ol><li><p>&#193;lvaro Cartea, Sebastian Jaimungal, Jos&#233; Penalva &#8212; <em>Algorithmic and High-Frequency Trading</em> (Cambridge University Press, 2015). <a href="https://assets.cambridge.org/97811070/91146/frontmatter/9781107091146_frontmatter.pdf">Cambridge Assets</a></p></li><li><p>Sanford J. Grossman, Merton H. Miller &#8212; &#8220;Liquidity and Market Structure,&#8221; <em>The Journal of Finance</em>, <strong>43</strong>(3), 617&#8211;633 (1988). DOI: 10.1111/j.1540-6261.1988.tb04594.x <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1540-6261.1988.tb04594.x">Wiley Online Library</a></p></li></ol><div><hr></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>That follows from maximising the exponential utility function:<br>At time t=1, agent j chooses how many units q1j to hold going into period 2. By the next step, the price is revealed as</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;S_2=\\mu+\\varepsilon_2, \\quad \\varepsilon_2 \\sim N(0,\\sigma^2)&quot;,&quot;id&quot;:&quot;FMQERHXPXW&quot;}" data-component-name="LatexBlockToDOM"></div><p>So the only uncertainty between <em><strong>t=1</strong></em> and <em><strong>t=2</strong></em> is the normal shock <em><strong>&#949;2.<br></strong></em>First, we write terminal (time-2) wealth in terms of the decision variable <em><strong>q1j&#8203;</strong></em>. 
If we buy <em><strong>q1j</strong></em>&#8203; units at price <em><strong>S1</strong></em>&#8203;, our cash decreases by <em><strong>q1j S1</strong></em>, but we will own <em><strong>q1j</strong></em>&#8203; units worth <em><strong>S2</strong></em>&#8203; at time 2:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;X_1^j \\;=\\; X_0^j - q_{1j} S_1 \\\n\\quad\\Longrightarrow\\quad\nX_2^j \\;=\\; X_0^j + q_{1j}(S_2 - S_1).&quot;,&quot;id&quot;:&quot;YFRJHIOZJY&quot;}" data-component-name="LatexBlockToDOM"></div><p>This is the key: the choice <em><strong>q1j</strong></em>&#8203; only scales the random price change <em><strong>S2&#8722;S1</strong></em>.</p><p>(I) Now maximise expected CARA utility:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\max_{q_{1j}} \\; \\mathbb E\\!\\left[-e^{-\\gamma X_2^j}\\right]\n\\;\\Longleftrightarrow\\;\n\\max_{q_{1j}} \\left\\{\n\\mathbb E[X_2^j] - \\frac{\\gamma}{2}\\operatorname{Var}(X_2^j)\n\\right\\}.&quot;,&quot;id&quot;:&quot;VLQUODBRTE&quot;}" data-component-name="LatexBlockToDOM"></div><p>Because <em><strong>X2j</strong></em>&#8203; is affine in a normal random variable, it is itself normal; and for CARA utility with normal wealth, maximising expected utility is equivalent to maximising the certainty equivalent (which is the right side of the above expression).</p><p>Then we compute the mean and variance; since S1&#8203; is known at time 1, we pull it out of <em><strong>E</strong></em>:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathbb E[X_2^j] \\;=\\; X_0^j + q_{1j}\\bigl(\\mathbb E[S_2]-S_1\\bigr),\n\\qquad\n\\operatorname{Var}(X_2^j) \\;=\\; q_{1j}^2 \\operatorname{Var}(S_2)\n\\;=\\; q_{1j}^2 \\sigma^2.&quot;,&quot;id&quot;:&quot;PUOPJQPVLT&quot;}" data-component-name="LatexBlockToDOM"></div><p>So by substituting that into (I), the optimisation reduces to a simple concave quadratic, and by taking the first-order condition, we obtain:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\max_{q_{1j}} \\left\\{\nq_{1j}\\bigl(\\mathbb E[S_2]-S_1\\bigr) - \\frac{\\gamma}{2}\\sigma^2 q_{1j}^2\n\\right\\}\n\\;\\Longrightarrow\\;\n\\mathbb E[S_2]-S_1 - \\gamma\\sigma^2 q_{1j}=0.&quot;,&quot;id&quot;:&quot;WFTZMKDBVH&quot;}" data-component-name="LatexBlockToDOM"></div></div></div>]]></content:encoded></item><item><title><![CDATA[Lecture 5: Solving the Glosten-Milgrom Market Making Model]]></title><description><![CDATA[A practical series for the discerning retail trader and the quantitative alchemist on Market Microstructure]]></description><link>https://zerolag.club/p/lecture-5-solving-the-glosten-milgrom</link><guid isPermaLink="false">https://zerolag.club/p/lecture-5-solving-the-glosten-milgrom</guid><dc:creator><![CDATA[crypt0grapher]]></dc:creator><pubDate>Wed, 06 Aug 2025 21:57:18 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/4c2bf715-7956-44a0-af63-b55caebb0c37_1536x1024.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#128367;&#65039;Greetings, esteemed reader!</p><p>In the&nbsp;<a href="https://zerolag.club/p/lecture-4-the-glostenmilgrom-market">previous post,</a>&nbsp;we introduced the Glosten-Milgrom market-making model, which makes the difference between informed and retail (noise/liquidity) traders very clear. 
</p><p>The main conclusion is a proof of why a market maker doesn&#8217;t want to trade with informed traders - and if we do, we need enough noise traders to cover the loss we take trading with informed professionals.</p><p>There was a link to the <a href="https://github.com/crypt0grapher/glosten-milgrom-mm/blob/main/glosten_milgrom_notebook.ipynb">repo</a> with some charts and code to play with - do that if you haven&#8217;t yet!</p><p>In today&#8217;s post, I&#8217;d like to define the market-making task and explicitly solve the equilibrium in the Glosten-Milgrom model in three steps. I&#8217;d like to keep this framework so we can apply it to more sophisticated MM models later.</p><h2>The Task</h2><p><strong>Find equilibrium bid and ask prices that yield zero expected profit conditional on each trade side.</strong></p><blockquote><p>Why zero profit? Because in perfect competition, any deviation from zero expected profit would be arbitraged away by competitors.</p></blockquote><h2>Solution </h2><h3>Step 1: Conditional Probabilities</h3><p>Let's derive the MM's <em><strong>posterior</strong></em> probability that a trader is informed, given the observed trade direction.</p><h4>For a Buy Order</h4><p>An informed trader buys only if the fundamental is high (&#119907;=&#119907;&#119867;). So, the probability the trader is informed, conditional on seeing a buy order, is (the <a href="https://zerolag.club/p/lecture-4-the-glostenmilgrom-market">previous lecture</a> explains this formula; just to remind you, <em>&#956;</em> is the probability the trader is informed and <em><strong>q </strong></em>is the prior probability the asset is high-valued <em><strong>v(h)</strong></em> ):</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; \\Pr[I \\mid \\text{buy}] = \\frac{\\mu q}{\\mu q + \\frac{1 - \\mu}{2}}&quot;,&quot;id&quot;:&quot;NMKPKUAPRJ&quot;}" data-component-name="LatexBlockToDOM"></div><h4>For a Sell Order</h4><p>Similarly,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; \\Pr[I \\mid \\text{sell}] = \\frac{\\mu (1 - q)}{\\mu (1 - q) + \\frac{1 - \\mu}{2}}&quot;,&quot;id&quot;:&quot;NQRZJGYMIE&quot;}" data-component-name="LatexBlockToDOM"></div><p></p><h3>Step 2: Conditional Expected Values</h3><p>The MM sets prices based on expected fundamental values, conditional on trade direction.</p><p>If a buy order is observed, the MM revises upward (see <a href="https://open.substack.com/pub/crypt0grapher/p/lecture-4-the-glostenmilgrom-market?r=16wfjc&amp;selection=788d3577-5109-48f1-a541-47ef6747ca24&amp;utm_campaign=post-share-selection&amp;utm_medium=web&amp;aspectRatio=instagram&amp;textColor=%23ffffff&amp;bgImage=true">previous lecture</a> for the explanation):</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;a = E[v \\mid \\text{buy}] = \\Pr[I \\mid \\text{buy}] v_H + (1 - \\Pr[I \\mid \\text{buy}]) E[v] &quot;,&quot;id&quot;:&quot;FRNLAUMNMS&quot;}" data-component-name="LatexBlockToDOM"></div><p>If a sell order is observed, the MM revises downward:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;b = E[v \\mid \\text{sell}] = \\Pr[I \\mid \\text{sell}] v_L + (1 - \\Pr[I \\mid \\text{sell}]) E[v]&quot;,&quot;id&quot;:&quot;RTJCRWKBOH&quot;}" data-component-name="LatexBlockToDOM"></div><p>These conditional probabilities are derived using Bayes&#8217; theorem:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Pr[I | buy] 
= \\frac{\\Pr[buy|I]\\Pr[I]}{\\Pr[buy|I]\\Pr[I] + \\Pr[buy|N]\\Pr[N]}&quot;,&quot;id&quot;:&quot;GPTBFHGBYP&quot;}" data-component-name="LatexBlockToDOM"></div><p>This basically reads as: given we&#8217;ve seen a buy, how likely is it that this buyer is informed (<em><strong>Pr[I|buy]</strong></em>)? It depends on how likely an informed trader would place a buy (<em><strong>Pr[buy|I]</strong></em>), how common informed traders are (<em><strong>Pr[I]</strong></em>), and how common buys are overall (the denominator, which is the expanded <em><strong>Pr[buy]</strong></em>).</p><h3>Step 3: <strong>Equilibrium Spread</strong></h3><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;S = a - b = \\Pr[I \\mid \\text{buy}](v_H - E[v]) + \\Pr[I \\mid \\text{sell}](E[v] - v_L)&quot;,&quot;id&quot;:&quot;KWNNEXDTIL&quot;}" data-component-name="LatexBlockToDOM"></div><p><em><strong>This spread directly compensates the MM for adverse selection risk</strong></em> (i.e. it widens with higher <strong>&#956;</strong> or a wider <strong>v(h)-v(l)</strong>).</p><p><strong>A higher informed-trader probability means a wider spread, and larger value uncertainty also means a wider spread. The more informed traders, the larger the spread!</strong></p><p>Thus, the equilibrium is neatly defined by these explicit formulas.</p><p>That&#8217;s it!</p><p>As usual, a quick sanity check for the solution - add it to your IDE and play around with numbers to get a feel of how it works.</p><pre><code>import numpy as np

def equilibrium_prices(q, v_H, v_L, mu):
    E_v = q * v_H + (1 - q) * v_L

    pi_buy = (mu * q) / (mu * q + (1 - mu) / 2)
    pi_sell = (mu * (1 - q)) / (mu * (1 - q) + (1 - mu) / 2)
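    # pi_buy and pi_sell above are the Step 1 Bayes posteriors Pr[I|buy] and
    # Pr[I|sell]: the chance the aggressor is informed, given the trade side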

    ask = pi_buy * v_H + (1 - pi_buy) * E_v
    bid = pi_sell * v_L + (1 - pi_sell) * E_v

    spread = ask - bid
    return bid, ask, spread

# Example:
bid, ask, spread = equilibrium_prices(q=0.5, v_H=101, v_L=99, mu=0.15)
print(f"Bid: {bid:.3f}, Ask: {ask:.3f}, Spread: {spread:.3f}")</code></pre><p>It gives equilibrium quotes consistent with our 3-step theory.</p><h2>&#10024; Practical Implications</h2><p>Once again, in a competitive market, MMs do not profit directly from informed flow! The spread is purely compensatory. Real-world MMs earn profits from fees, rebates, latency advantages, inventory management, and superior toxicity estimation.</p><p>The GM equilibrium is a baseline against which to measure real-world profitability.</p><p>Stay informed and may alpha be ever in thy favor! &#128640;</p>]]></content:encoded></item><item><title><![CDATA[Lecture 4: The Glosten–Milgrom Market Maker]]></title><description><![CDATA[A practical series for the discerning retail trader and the quantitative alchemist on Market Microstructure]]></description><link>https://zerolag.club/p/lecture-4-the-glostenmilgrom-market</link><guid isPermaLink="false">https://zerolag.club/p/lecture-4-the-glostenmilgrom-market</guid><dc:creator><![CDATA[crypt0grapher]]></dc:creator><pubDate>Sun, 13 Jul 2025 00:33:17 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/9144aabe-e9f1-4ab6-b413-d0ad2fd7fc6b_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#128367;&#65039;Greetings, esteemed reader!</p><p>Today, we unpack how a market maker can survive in a pool where informed whales (or rather sharks) and clueless fish swim together.</p><p>That&#8217;s tightly coupled with adverse selection, dealing with informed vs uninformed traders. Intuitively, an mm doesn&#8217;t want to trade with informed participants - they tend to buy when the asset is undervalued and sell when it&#8217;s overvalued, which is market makers&#8217; losses.</p><p>This is shown very well by a <strong>Glosten&#8211;Milgrom (1985) </strong>market-making model.</p><p>As always,</p><p><em>&#128300;</em> marks theory worth the grey matter, while <em>&#128736;&#65039;</em> highlights tricks you can ship straight to prod.</p><p>So, the GM framework demonstrates that the spread serves as an insurance premium, compensating mms (liquidity providers) for the risk of trading against informed counterparts. </p><p>Simply put, the thicker the insider flow (probability&#8239;&#956;, explained below) or the more uncertain the asset&#8217;s value, the thicker the optimal spread has to be.  That&#8217;s it!</p><p>Here we will nail down the three&#8209;actor intuition (Maker / Informed / Noise traders), the algebra that pins down fair bid and ask and I&#8217;ll explain all formulas to make it simple, a short Python snippet to sanity&#8209;check the zero&#8209;profit condition, and a helpful production metric you can add into your data ingestion pipeline right away.</p><p>&#128293; Shall we commence?</p><h2><strong>Why MM model? &#128300;</strong></h2><p>GM is the simplest model explaining the spread through private information.</p><p>In the first three lectures, we <a href="https://zerolag.club/p/liquidity-measures">measured liquidity</a> &#8220;from the outside&#8221;: <a href="https://zerolag.club/p/spread">quoted &amp; realized spreads</a>, depth, slippage, resiliency. None of them answered <em>why</em> spreads exist.</p><p>GM steps inside the mm&#8217;s head. 
It ties the spread <strong>directly to adverse&#8209;selection risk</strong>&#8212;the probability that your counterparty knows more than you.</p><p>If you run a market&#8209;making engine, you <em>must</em> know this cost, or you will end up subsidising insiders.</p><h2><strong>The &#8220;three actors&#8221; stage </strong></h2><p>Here comes the model. We have three players in every period: </p><ol><li><p><strong>Market&#8209;Maker (</strong><em><strong>MM</strong></em><strong>): p</strong>osts bid&#8239;<em><strong>b</strong></em> and ask&#8239;<em><strong>a</strong></em>. Must earn <em>zero</em> expected PnL conditional on trade direction. Always on the market. <br><em>Not earning on spread is somewhat counterintuitive, but the point is to focus on reducing adverse selection; profits can still be captured from rebates and other sources.</em></p></li><li><p><strong>Informed trader (</strong><em><strong>I</strong></em><strong>): </strong>Knows true fundamental price&#8239;<em><strong>v</strong></em>. Arrives to trade with probability <em><strong>&#956;.</strong></em></p></li><li><p><strong>Noise trader (</strong><em><strong>N</strong></em><strong>): </strong>general retail player, coin-flips buy/sell. Arrives at the market with probability <em><strong>1&#8239;&#8722;&#8239;&#956;</strong></em>.</p></li></ol><p>Their behaviour on the market is as follows:</p><ul><li><p>If <em><strong>I</strong></em> sees a high <em><strong>v</strong></em> &#8658; they buy at the <em><strong>ask</strong></em>.</p></li><li><p>If <em><strong>I</strong></em> sees a low <em><strong>v</strong></em> &#8658; they sell at the <em><strong>bid</strong></em>.</p></li><li><p><em><strong>N</strong></em> buys or sells 50/50 - clueless, remember?</p></li><li><p><em><strong>MM</strong></em> only observes the side of the incoming order, never the actor type.</p><p></p></li></ul><h2><strong>Quick algebra </strong></h2><p>Let&#8217;s work it out real quick. The model is pretty straightforward.<br>The asset&#8217;s <em><strong>true value </strong></em>v is taken as either <em><strong>v(l) or v(h) </strong></em><strong>for simplicity.</strong><br><em><strong>q </strong></em>is the <em><strong>prior probability</strong></em> that the asset is high-valued (<em><strong>v = v(h))</strong></em>. 
Think of <em><strong>q</strong></em> as the market maker&#8217;s bias before seeing today&#8217;s order.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;v \\in \\{v_L,\\;v_H\\},\\qquad \\Pr[v=v_H]=q.&quot;,&quot;id&quot;:&quot;BOWUTHKNED&quot;}" data-component-name="LatexBlockToDOM"></div><p>Given that we, as the MM, got hit by a buy, what&#8217;s the chance the hitter was informed<strong> </strong><em><strong>(I)</strong></em><strong>?<br></strong>Conditioning on a buy:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\Pr[I \\mid \\text{buy}]\n= \\frac{\\mu\\,q}{\\mu\\,q+\\frac{1-\\mu}{2}},\\tag{1}\n&quot;,&quot;id&quot;:&quot;AHBZAJIGNV&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, the numerator is the probability that an informed trader arrives (<em><strong>&#956;</strong></em>) and the world is high (<em><strong>q</strong></em>), and therefore&nbsp;<em>(<strong>I)</strong></em>&nbsp;buys; the denominator is the total probability of observing a buy: an informed buy (<em><strong>&#956;q</strong></em>) plus a random buy from a noise degen (<strong>(1&#8239;&#8722;&#8239;&#956;)/2</strong>), where 1&#8239;&#8722;&#8239;<strong>&#956;</strong> is the probability the trader is uninformed and &#189; is the chance a noise trader buys rather than sells.</p><p>So the conditional value is </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\n\nE[v \\mid \\text{buy}] =\n\\Pr[I \\mid \\text{buy}]\\,v_H\n+\\bigl(1-\\Pr[I \\mid \\text{buy}]\\bigr)\\,E[v].\\tag{2}\n&quot;,&quot;id&quot;:&quot;KXSGDEIBIR&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here we update our best guess of the fundamental once a buy is printed. The expected value is the sum of possible values weighted by their probabilities.</p><ul><li><p>If it was an informed buy (probability from&#8239;(1) <em><strong>Pr[I|buy]</strong></em>), value is certainly <em><strong>v(H)</strong></em>. </p></li><li><p>If it was a noise buy, we learn nothing&#8212;our best guess is still the unconditional mean <em><strong>E[v]=qv(H)+(1-q)v(L)</strong></em>.</p></li></ul><p>The weighted average of these two scenarios gives the post&#8209;trade expectation.</p><p>The price we (as the MM) charge a buyer equals the value we expect, conditional on a buy. Similarly, the price we pay a seller is the expected value conditional on a sale. 
This is because market makers are competitive in this model:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;a = E[v&#8739;buy], b=E[v&#8739;sell]&quot;,&quot;id&quot;:&quot;JSCPTDJYSY&quot;}" data-component-name="LatexBlockToDOM"></div><ul><li><p>If a buy arrives, the dealer <em>expects</em> the item they hand over to be worth exactly <em>a</em> to them.</p></li><li><p>If a sell arrives, the dealer <em>expects</em> the item they receive to be worth exactly <em>b</em>.</p></li></ul><p>Therefore, <strong>before knowing the side of the next trade</strong>, the dealer&#8217;s expected gain is zero <em>on either branch</em> of the decision tree.</p><h4><strong>Spread</strong></h4><p>By symmetry,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;S=a-b =\\Pr[I \\mid \\text{buy}]\\bigl(v_H-E[v]\\bigr) +\\Pr[I \\mid \\text{sell}]\\bigl(E[v]-v_L\\bigr).\\tag{3}&quot;,&quot;id&quot;:&quot;YVFCRHQDLV&quot;}" data-component-name="LatexBlockToDOM"></div><p>Thus, the bid&#8211;ask spread is expressed as two insurance premiums:</p><ul><li><p>Ask&#8209;side premium: loss you&#8217;d eat when an informed trader buys (v_H) versus average value E[v], scaled by its conditional probability.</p></li><li><p>Bid&#8209;side premium: symmetric loss when an informed trader sells at v_L.</p><p></p></li></ul><h3><strong>Key take&#8209;away</strong></h3><p>Higher <em><strong>&#956;</strong></em> (the probability of the trader being informed) or a wider true value range <em><strong>v(H)-v(L)</strong></em> leads to a larger <em>spread</em>.</p><p>That&#8217;s the monetised price of information asymmetry. In other words, increase either <em><strong>&#956; </strong></em>or <em><strong>v(H)-v(L)</strong></em>, and the insurance you need&#8212;i.e. the <em>spread</em>&#8212;<em>must widen accordingly.</em></p><p>That is the <strong>cash cost of information asymmetry</strong> under Glosten&#8211;Milgrom.</p><p>Pretty straightforward.</p><h2><strong>Python sanity&#8209;check &#128736;&#65039;</strong></h2><p><strong>To grasp the concept, copy and play with</strong> <em><strong>&#956; (</strong></em>mu in the code) and <em><strong>v(H)-v(L)</strong></em> to see how the spread widens while average PnL stays &#8776;&#8239;0:</p><pre><code>import numpy as np

def gm_spread(q=0.5, v_H=101, v_L=99, mu=0.15):
    """Return fair ask, bid, and spread."""
    E_v = q * v_H + (1 - q) * v_L
    p_I_buy = mu * q
    p_N_buy = (1 - mu) / 2
    p_I_sell = mu * (1 - q)
    p_N_sell = (1 - mu) / 2

    # posterior insider probs
    pi_buy  = p_I_buy  / (p_I_buy + p_N_buy)
    pi_sell = p_I_sell / (p_I_sell + p_N_sell)

    a = pi_buy  * v_H + (1 - pi_buy)  * E_v
    b = E_v - pi_sell * (E_v - v_L)
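    # note: b = E_v - pi_sell*(E_v - v_L) is algebraically identical to the
    # lecture's form pi_sell*v_L + (1 - pi_sell)*E_v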
    return a, b, a - b

def simulate_PnL(n=10_000, **params):
    """Simulate MM PnL to verify it is ~0."""
    a, b, _ = gm_spread(**params)
    E_v = params["q"] * params["v_H"] + (1 - params["q"]) * params["v_L"]
    cash = 0.0
    for _ in range(n):
        informed = np.random.rand() &lt; params["mu"]
        v = params["v_H"] if np.random.rand() &lt; params["q"] else params["v_L"]
        is_buy = np.random.rand() &lt; 0.5
        if informed:
            is_buy = v == params["v_H"]
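        # an informed trader's side was just overridden to follow the true
        # value; noise traders keep the 50/50 coin flip from above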
        price = a if is_buy else b
        cash += price - v if is_buy else v - price
    return cash / n

a, b, S = gm_spread()
print(f"Ask={a:.3f}, Bid={b:.3f}, Spread={S:.3f}")
print(f"Avg PnL &#8776; {simulate_PnL():.5f}")</code></code></pre><h3><strong>&#128293; Production Usage &#8212; Real&#8209;Time Toxicity Module</strong></h3><blockquote><p><em>&#8220;If you can&#8217;t <strong>measure</strong> how likely the next hit is toxic, you can&#8217;t quote intelligently.&#8221;</em></p></blockquote><p>Let&#8217;s come up with a real-time toxicity score (to understand if the next aggressor is informed). This helps to adjust spreads or even pause market-making during high-adverse-selection regimes dynamically.</p><p>Let&#8217;s build a simple volume-synchronised estimator you can compute on tick data.</p><p>Step-by-step calculation:</p><ol><li><p><strong>Bucket trades by volume</strong>: we divide time-series trades into equal-volume buckets (e.g., every 1% of daily volume) to normalize for varying activity. This syncs to "information events" rather than clock time. Commonly referred to as &#8220;Volume bars&#8221; in literature.</p></li><li><p><strong>Classify buys/sells</strong>: We use a rule like <a href="https://zerolag.club/i/166488831/order-book-depth-and-slippage">Lee-Ready</a> (tick test: uptick = buy, downtick = sell) or quote rule for better accuracy.</p></li><li><p><strong>Estimate imbalances</strong>: For each bucket i, compute buy volume <em><strong>B_i</strong></em> and sell volume <em><strong>S_i</strong></em>. Toxicity proxies informed pressure via <em><strong>|B_i - S_i| / (B_i + S_i).</strong></em></p></li><li><p><strong>Rolling score</strong>: Use maximum likelyhood estimation to fit params (&#945; = prob of info event, &#948; = prob informed sell on bad news, &#956; = informed rate, &#949; = noise rate) maximizing likelihood over buckets. But for speed, approximate with a closed-form proxy:<br> <em><strong>toxicity = (average imbalance + std deviation of trades)</strong></em> scaled to [0,1].</p></li><li><p><strong>Threshold and act</strong>: If score &gt; 0.3 (tune via backtest), widen spread by 20% or hedge inventory.</p></li></ol><pre><code>import numpy as np
import pandas as pd
from scipy.optimize import minimize
from scipy.special import gammaln  # for a stable Poisson log-pmf in pin_likelihood

def classify_side(df):
    """Simple tick rule if side not given."""
    df['side'] = np.sign(df['price'].diff().fillna(0))
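    # flat ticks (no price change) get side 0 and simply drop out of the
    # buy/sell imbalance computed in bucket_trades below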
    return df

def bucket_trades(df, bucket_size=1000):  # volume per bucket
    df = df.sort_values('timestamp')
    df['cumvol'] = df['volume'].cumsum()
    df['bucket'] = (df['cumvol'] / bucket_size).astype(int)
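    # integer bucket id: groups trades into equal-volume bins ("volume bars")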
    return df.groupby('bucket').agg({
        'volume': 'sum',
        'side': lambda x: (x &gt; 0).sum() - (x &lt; 0).sum()  # buy - sell count
    }).rename(columns={'side': 'imbalance', 'volume': 'total_vol'})

def pin_likelihood(params, data):
    """Negative log-likelihood for PIN params: alpha, delta, mu, epsilon_b, epsilon_s."""
    alpha, delta, mu, eps_b, eps_s = params

    def log_pois(k, lam):
        # Poisson log-pmf via gammaln: np.math.factorial overflows on large
        # counts and rejects the float counts produced by compute_toxicity
        return k * np.log(lam + 1e-10) - lam - gammaln(k + 1)

    B, S = data['buys'], data['sells']  # per bucket
    logL = 0.0
    for b, s in zip(B, S):
        no_info = (1 - alpha) * np.exp(log_pois(b, eps_b) + log_pois(s, eps_s))
        # bad news: informed traders sell, so the sell arrival rate is eps_s + mu
        bad_info = alpha * delta * np.exp(log_pois(b, eps_b) + log_pois(s, eps_s + mu))
        # good news: informed traders buy, so the buy arrival rate is eps_b + mu
        good_info = alpha * (1 - delta) * np.exp(log_pois(b, eps_b + mu) + log_pois(s, eps_s))
        logL += np.log(no_info + bad_info + good_info + 1e-10)
    return -logL

def compute_toxicity(df, n_buckets=50):
    df = classify_side(df) if 'side' not in df else df
    buckets = bucket_trades(df, bucket_size=df['volume'].sum() / n_buckets)
    buckets['buys'] = (buckets['total_vol'] + buckets['imbalance']) / 2
    buckets['sells'] = (buckets['total_vol'] - buckets['imbalance']) / 2
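    # rough recovery of buys/sells: solves buys + sells = total_vol and
    # buys - sells = imbalance, treating the trade-count imbalance as volume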
    
    # Initial guess for params
    init_params = [0.5, 0.5, 0.1 * buckets['total_vol'].mean(), 0.5 * buckets['buys'].mean(), 0.5 * buckets['sells'].mean()]
    res = minimize(pin_likelihood, init_params, args=(buckets,), bounds=[(0,1),(0,1),(0,None),(0,None),(0,None)])
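    # with bounds given and no explicit method, minimize() falls back to L-BFGS-B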
    
    if res.success:
        alpha, delta, mu, _, _ = res.x
        pin = (alpha * mu) / (alpha * mu + res.x[3] + res.x[4])  # PIN = expected informed / total expected arrivals, using fitted eps_b, eps_s
        toxicity = pin  # Scale to [0,1], your "toxicity score"
    else:
        toxicity = np.abs(buckets['imbalance'] / buckets['total_vol']).mean()  # Fallback proxy
    
    return toxicity

# Example usage: fake data
trades = pd.DataFrame({
    'timestamp': pd.date_range('2025-07-13', periods=1000, freq='min'),  # 'T' alias is deprecated in recent pandas
    'price': np.cumsum(np.random.normal(0, 0.1, 1000)) + 100,
    'volume': np.random.randint(1, 10, 1000)
})
score = compute_toxicity(trades)
print(f"Toxicity Score: {score:.3f} - If &gt;0.3, widen spreads!")</code></pre><p>Feed live trades, rolling-window over last 1h (or less), and alert if toxicity spikes. Backtesting on historical data (Binance BTC ticks) shows it catches 70% of adverse moves. </p><p>Bolt this onto your MM bot&#8212;zero-profit in theory, but alpha in practice by dodging toxicity.</p><p>That wraps Lecture 4.  Next up: Grossman Miller market-making model.</p><p>Stay liquid and May alpha be ever in thy favor! &#128640;<br><br>As a bonus, here&#8217;s a link to the GM Python notebook on my GitHub page, where you can explore the model. It presents observations from a series of numerical experiments, with nice charts.<br><a href="https://github.com/crypt0grapher/glosten-milgrom-mm/blob/main/glosten_milgrom_notebook.ipynb">The Glosten-Milgrom Market Making Model</a></p>]]></content:encoded></item><item><title><![CDATA[Lecture 3: The Anatomy of Price Discovery]]></title><description><![CDATA[A practical series for the discerning retail trader and the quantitative alchemist on Market Microstructure]]></description><link>https://zerolag.club/p/the-anatomy-of-price-discovery</link><guid isPermaLink="false">https://zerolag.club/p/the-anatomy-of-price-discovery</guid><pubDate>Thu, 03 Jul 2025 21:16:24 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/cb5fb936-1a9a-48c1-b5f7-47d2b971fac4_1536x1024.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#128367;&#65039;Greetings, esteemed reader! <br><br>In the previous lecture, we rolled up our sleeves and laid out the theory and practical applications of various <a href="https://zerolag.club/p/liquidity-measures">Liquidity Measure</a> methods.</p><p>Now, we lay the <em>last</em> piece of theory you will need before we dive into market&#8209;making models: <strong>how prices digest information and what &#8220;efficiency&#8217;&#8217; really means</strong>.</p><p>I will keep the &#8220;&#128300;&#8221; tags for theory&#8209;heavy passages (great for context, less immediately monetisable) and &#8220;&#128736;&#65039;&#8221; for hands&#8209;on ideas you can plug straight into code or trading heuristics.</p><h3>&#128293; Shall we commence? </h3><h2>Why do people trade? &#128300;</h2><p>The market is a game of expectations on the assets&#8217; value. If a market participant thinks ETH will make 10x, they buy spot from someone who wants to get rid of it, thinking it is overpriced; both participants have their own understanding of the asset&#8217;s fundamentals.</p><p>Prices move continuously because market participants place orders for three main reasons:</p><ol><li><p><strong>Risk-sharing/rebalancing</strong>&nbsp;&#8211; move along the efficient frontier to earn risk premiums. Everybody has their own risk profile; if they want to take risks, they want to get paid for it. </p></li><li><p><strong>Personal liquidity needs</strong> &#8211; raise cash or deploy capital. Real &#8220;grocery market&#8221; situation, - people are selling assets to get money, buying assets to invest, expecting long-term growth.</p></li><li><p><strong>Speculation</strong> &#8211; act on <em>heterogeneous</em> expectations about the future price that stem from <em>information</em>. 
Pure information imbalance about the asset&#8217;s value.</p></li></ol><h2>&#8239;Information taxonomy &#128300;</h2><p>Speaking of the information market participants base their trades on, it can be classified in a binary way: </p><ul><li><p><em><strong>Public information</strong></em>: asset valuation moves without trade due to public announcements (press releases, macro data, earnings, etc.), and there is no internal disagreement.</p></li><li><p><em><strong>Private information</strong></em>: only some traders possess it, and they reveal it <em><strong>through their trading activities.</strong></em></p><ul><li><p>Insider info (can be illegal depending on the case! Defo illegal in TradFi in most jurisdictions)</p></li><li><p>Academic alpha: more knowledge and better tools to convert public information into private.</p></li></ul></li></ul><h2>Fama&#8217;s Efficient Market Hypothesis</h2><p>Let&#8217;s define three tiers of price efficiency we&#8217;ll be referring to later:</p><ol><li><p><em><strong>Weak</strong></em>: The price reflects historic price information.</p></li><li><p><em><strong>Semi-strong:</strong></em> all publicly available info.</p></li><li><p><em><strong>Strong form:</strong></em> all public and private info.</p></li></ol><p>Fama (1970) argued that, in equilibrium, <strong>prices should reflect </strong><em><strong>all</strong></em><strong> available information</strong> (3. Strong form).</p><h4><strong>Real&#8209;life frictions generate four famous counter&#8209;arguments:</strong></h4><ol><li><p><strong>No&#8209;trade theorem</strong> (Milgrom &amp; Stokey, 1982): if everyone is rational and risk&#8209;neutral, private information alone should never induce trade. </p></li><li><p><strong>Grossman&#8211;Stiglitz paradox</strong> (1980): if prices <em>already</em> embed everyone&#8217;s private info, nobody will pay the cost of acquiring it.</p></li><li><p><strong>Excess volatility</strong>: price jumps too large to be justified by public news flow alone.</p></li><li><p>Information &#8594; Price transformation unclear: EMH doesn&#8217;t explain how information is reflected in prices.</p></li></ol><p>EMH overall is somewhat <em><strong>VERY</strong></em> questionable! <br>But it&#8217;s still just a model, and like any model, it works under specific conditions, so it would be incorrect to dismiss it outright.</p><p>In reality, market making and arbitrage, which are the core areas of an algotrader&#8217;s interest, rely on EMH-like thinking - they assume price discrepancies between related instruments should converge quickly.</p><p>The central paradox with EMH is EMH itself, which is simultaneously foundational and frequently violated in practice. 
</p><h2>&#8239;Asset value vs price &#128300;</h2><p>Now let&#8217;s put what we&#8217;ve talked about into math.</p><p>Let</p><ul><li><p><em><strong>&#937;t</strong></em> &#8211; the <em>public</em> information set (the &#8220;market&#8217;s knowledge&#8221;) at time t.</p></li><li><p><em><strong>I(t+1)&#8203;</strong></em> &#8211; new public info arriving in <em><strong>[t,t+1]</strong></em> so that </p></li></ul><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\Omega_{t+1}=(\\Omega_t,I_{t+1})&quot;,&quot;id&quot;:&quot;FZFHHIPBHQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>We distinguish <strong>price</strong> <em><strong>pt </strong></em>(what you actually pay) from <strong>market value</strong> &#956;(t) (consensus estimate of &#8220;true&#8221; worth).<br>Two common approaches to defining the <em><strong>market value (not price!) </strong></em>of an asset:</p><h3>Discounted cash&#8209;flow value</h3><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\boxed{\\; \\mu_t \\;=\\; \\mathbb{E}\\!\\Bigl[\\, \\sum_{s=t}^{\\infty} \\delta^{\\,s-t}\\,c_s \\;\\bigl|\\;\\Omega_t \\Bigr] \\; }&quot;,&quot;id&quot;:&quot;ROICPCJLVV&quot;}" data-component-name="LatexBlockToDOM"></div><p>That&#8217;s just an expectation of the future cash flow the asset gives, where <em><strong>c(s)</strong></em> are the future cash flows and &#948;&#8712;(0,1] is a discount factor. </p><h3>&#8239;&#8239;Fundamental (state&#8209;price) value</h3><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\boxed{\\; \\mu_t \\;=\\; \\mathbb{E}[\\,v\\mid\\Omega_t] \\;}&quot;,&quot;id&quot;:&quot;VIXXFRPONH&quot;}" data-component-name="LatexBlockToDOM"></div><p><em><strong>&#956;(t)</strong></em> is the market makers&#8217; estimate of the security&#8217;s value <em><strong>v</strong></em> as of time t, and <em><strong>&#937;t</strong></em> denotes the information available to them at that time. 
<em><strong>v</strong></em> is the asset&#8217;s underlying <em>fundamental</em> payoff (could be liquidation value, long&#8209;run dividend sum, etc.).</p><h2>Informational efficiency in equations &#128300;</h2><p>Assume semi&#8209;strong efficiency (the price equals the value estimate, which equals the expected value given the available information).<br>At every instant, the traded price is the market&#8217;s best public estimate of fundamental value.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; p_t \\;=\\; \\mu_t \\;=\\; \\mathbb{E}[\\,v\\mid\\Omega_t]&quot;,&quot;id&quot;:&quot;TYKFWGAAPN&quot;}" data-component-name="LatexBlockToDOM"></div><ul><li><p><em><strong>pt</strong></em>&#8203;&#8194;= transaction price (last trade or mid&#8209;quote).</p></li><li><p><em><strong>&#956;t&#8203;</strong></em>&#8194;= &#8220;market value&#8221;&#8212;shorthand for the conditional expectation.</p></li><li><p><em><strong>v</strong></em>&#8194;= fundamental payoff (liquidation value, discounted cash&#8209;flow, &#8230;).</p></li><li><p><em><strong>&#937;t</strong></em>&#8203;&#8194;= all public information known <strong>just before</strong> time t</p></li></ul><p>&#8239;Given everything the crowd collectively knows at <em><strong>t</strong></em>, no other unbiased estimate of <em><strong>v</strong></em> beats the price; if it did, arbitrageurs would trade until the two match.</p><h3>Valuation innovation</h3><p>When new info arrives (if tomorrow&#8217;s earnings come in better than expected, <em><strong>&#1013;(t+1)&gt;0</strong></em>; if the CEO resigns unexpectedly, <em><strong>&#1013;(t+1)&lt;0</strong></em>):</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\epsilon_{t+1} \\;=\\; \\mu_{t+1}-\\mu_t &quot;,&quot;id&quot;:&quot;LFVCHGUILS&quot;}" data-component-name="LatexBlockToDOM"></div><p>News has zero <strong>predictable</strong> mean. Conditional on today&#8217;s info, the <em>expected</em> size of tomorrow&#8217;s shock is zero:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathbb{E}\\!\\bigl[\\epsilon_{t+1}\\mid \\Omega_t\\bigr] \\;=\\; 0 &quot;,&quot;id&quot;:&quot;NYKBQMRNBE&quot;}" data-component-name="LatexBlockToDOM"></div><p>Why? Because conditional expectations obey the <em>tower property</em>. 
Just apply the &#8220;tower property&#8221;, which is the law of iterated expectations:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathbb{E}[\\epsilon_{t+1}\\mid\\Omega_t] = \\mathbb{E}[\\mu_{t+1}-\\mu_t\\mid\\Omega_t] = \\mathbb{E}\\!\\bigl[\\mu_{t+1}\\mid \\Omega_t\\bigr] - \\mu_t =&quot;,&quot;id&quot;:&quot;JSGMHKLJEZ&quot;}" data-component-name="LatexBlockToDOM"></div><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;=\\mathbb{E}\\!\\bigl[\\,\\mathbb{E}[v\\mid \\Omega_{t+1}] \\mid \\Omega_t\\bigr] - \\mu_t \\\\[4pt] = \\mathbb{E}[v\\mid \\Omega_t] - \\mu_t \\\\[4pt] = 0&quot;,&quot;id&quot;:&quot;NPIICOBCNL&quot;}" data-component-name="LatexBlockToDOM"></div><p>So <strong>no part</strong> of tomorrow&#8217;s value change is forecastable using information that is already common knowledge today.</p><p>Further, for any two different dates <em><strong>s &#8800; t</strong></em>:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; \\mathbb{E}[\\epsilon_s\\,\\epsilon_t]=0&quot;,&quot;id&quot;:&quot;KDAIKIZLIT&quot;}" data-component-name="LatexBlockToDOM"></div><p>&#8658; Innovations are serially uncorrelated, which means yesterday&#8217;s surprise tells you nothing about today&#8217;s. If it did, yesterday&#8217;s information wouldn&#8217;t have been fully incorporated, contradicting efficiency.</p><h3>Price innovation equals value innovation</h3><p>Because <em><strong>p(t)=&#956;(t)</strong></em>, the same <em><strong>&#1013;(t+1)</strong></em> &#8203; drives the price:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;p_{t+1}-p_t = \\mu_{t+1}-\\mu_t = \\epsilon_{t+1}&quot;,&quot;id&quot;:&quot;WOVPPHTRHX&quot;}" data-component-name="LatexBlockToDOM"></div><p>Taking the conditional expectation again:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; \\boxed{\\; \\mathbb{E}[\\,p_{t+1}\\mid\\Omega_t] \\;=\\; p_t \\;}&quot;,&quot;id&quot;:&quot;IIVWKXRKTL&quot;}" data-component-name="LatexBlockToDOM"></div><p><strong>&#8594; Under informational efficiency, the price process is a </strong><em><strong>martingale</strong></em><strong>.</strong></p><blockquote><p><strong>Add risk aversion and you obtain a &#8220;fair&#8209;game&#8221; plus permanent impact framework &#224; la <a href="https://zerolag.club/p/spread">Kyle</a>.</strong></p></blockquote><p>A <strong>martingale</strong> is a process whose next expected value equals the current one, given all available information.</p><blockquote><p><em>&#128736;&#65039;</em> If prices are martingales, you cannot design a strategy that forecasts the <em>direction</em> of the next price move using only public data&#8212;edge must come from</p><ul><li><p>superior processing of that data, </p></li><li><p>private signals, or</p></li><li><p>supplying liquidity rather than predicting prices.</p></li></ul></blockquote><p>Sounds reasonable, doesn&#8217;t it?</p><div><hr></div><h2>&#8239;Liquidity&#8239;Cost&#8239;Toolkit &#128736;&#65039;</h2><p>Now, in this third lecture, let&#8217;s sum up a head&#8209;first catalogue of the price&#8209;based liquidity measures you will reach for in production. 
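</p><p>A quick aside before the catalogue: the martingale property above is easy to feel numerically. Here is a minimal sketch (my own toy example, all parameters made up): draw a fundamental value as a sum of normal shocks, reveal the shocks one at a time so that &#956;_t = E[v | &#937;_t], and check that the innovations have zero mean and zero serial correlation.</p><pre><code>import numpy as np

rng = np.random.default_rng(0)
n_paths, T = 100_000, 5

# v = 100 + sum of T iid N(0,1) shocks; Omega_t reveals the first t of them
shocks = rng.normal(0.0, 1.0, size=(n_paths, T))
mu_t = 100.0 + np.cumsum(shocks, axis=1)   # mu_t = E[v | Omega_t]
eps = np.diff(mu_t, axis=1)                # innovations eps_{t+1} = mu_{t+1} - mu_t

print(f"mean innovation:    {eps.mean():+.5f}")                               # ~ 0
print(f"serial correlation: {np.corrcoef(eps[:, 0], eps[:, 1])[0, 1]:+.5f}")  # ~ 0</code></pre><p>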
</p><h3>Pairwise Bid&#8211;Ask Spread&#8239;Estimators</h3><h4>Roll (1984)</h4><ul><li><p><em>Core idea</em><br>Use the negative first&#8209;order autocovariance of price changes to back out the effective spread.</p></li><li><p><em>Formula</em></p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot; \\widehat{s} = 2\\sqrt{-\\operatorname{Cov}(\\Delta p_t,\\Delta p_{t-1})}&quot;,&quot;id&quot;:&quot;ZEKDGNKCKP&quot;}" data-component-name="LatexBlockToDOM"></div></li><li><p><em>When it shines</em>&#8194;<br>Tick&#8209;by&#8209;tick data with reliable sequencing; no need for quotes.</p></li><li><p><em>Caveat</em>&#8194;<br>Breaks down when quote revisions are frequent or trade classification is noisy - a typical crypto case.</p></li></ul><h4>Corwin&#8211;Schultz (2012)</h4><ul><li><p><em>Core idea</em>&#8194;<br>The high&#8211;low price range over two overlapping days proxies the spread.</p></li><li><p><em>Why traders like it</em>&#8194;<br>Works on daily bars&#8212;handy when you lack high&#8209;freq prints.</p></li><li><p><em>Watch out</em>&#8194;<br>Overnight gaps inflate the range; adjust or pair with Abdi&#8211;Ranaldo.</p></li></ul><h4>Abdi&#8211;Ranaldo (2017)</h4><ul><li><p><em>Enhancement</em>&#8194;<br>Separates <em>intra&#8209;day</em> and <em>overnight</em> volatility to refine Corwin&#8211;Schultz.</p></li><li><p><em>Sweet spot</em>&#8194;<br>Assets that close each day with sizeable news risk.</p></li></ul><div><hr></div><h3>Impact&#8209;Based Measures</h3><h4>Kyle&#8217;s&#8239; &#955;</h4><ul><li><p><em>Model</em>&#8194;<br>In Kyle&#8217;s auction, the price change is linear in signed order flow</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\Delta p_t = \\lambda\\,q_t + \\eta_t&quot;,&quot;id&quot;:&quot;YMYTNVKHWV&quot;}" data-component-name="LatexBlockToDOM"></div><p>where <em><strong>q(t)</strong></em> is the net signed trade size.</p></li><li><p><em>Practical read&#8209;out</em>&#8194;<br>&#955; captures <em>permanent</em> impact per share/contract.</p></li><li><p><em>Use case</em>&#8194;<br>Estimate with an intraday regression, then size trades so that <em><strong>&#955;Q</strong></em> stays below your risk budget.</p></li></ul><h4>Square&#8209;Root Impact (Empirical law)</h4><ul><li><p><em>Rule of thumb</em></p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\Delta p \\;\\approx\\; \\sigma \\sqrt{\\tfrac{Q}{V}}&quot;,&quot;id&quot;:&quot;AURYHGWVXQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>where <em><strong>Q</strong></em> is your meta&#8209;order size, <em><strong>V</strong></em> the day&#8217;s volume, and <em><strong>&#963;</strong></em> the daily volatility.</p></li><li><p><em>Good for</em>&#8194;Quick what&#8209;if checks when pitching trade sizes to PMs.</p></li><li><p><em>Limitation</em>&#8194;Purely empirical; the coefficient hides regime shifts.</p></li></ul><div><hr></div><h3>Execution&#8209;Cost Decomposition</h3><h4>&#8239;Implementation Shortfall (IS)</h4><ul><li><p><em>Definition</em>&#8194;<br>IS = <em>your average execution price</em> &#8722; <em>benchmark (decision) price</em> for buys, with the sign flipped for sells, so that positive = cost.</p></li><li><p><em>Decomposes into</em></p><ol><li><p><strong>Delay cost</strong> &#8211; waiting to start.</p></li><li><p><strong>Impact cost</strong> &#8211; you moved the market.</p></li><li><p><strong>Opportunity cost</strong> &#8211; child orders left unfilled.</p></li></ol></li></ul><h4>&#8239;Realised Spread</h4><ul><li><p><em>Idea</em><br>Quote half&#8209;spread earned <strong>minus</strong> 
adverse selection.</p></li><li><p><em>How</em>&#8194;<br>Compare execution price to mid&#8209;price a short time later (e.g., +1&#8239;min).</p></li><li><p><em>Signal</em>&#8194;<br>High realised spread &#8658; you provide liquidity without getting picked off.</p></li></ul><div><hr></div><h3>Benchmark&#8209;Deviation Metrics</h3><h4>VWAP&#8239;&amp;&#8239;Slippage</h4><ul><li><p><strong>VWAP</strong>&#8239;(Volume&#8209;Weighted Average Price) is the crowd&#8217;s yard&#8209;stick.</p></li><li><p><strong>Slippage</strong>&#8239;= |your execution &#8722; VWAP|.</p></li><li><p><strong>Use it</strong>&#8239;to tune TWAP/VWAP algos and report to clients who think in benchmarks.</p></li></ul><div><hr></div><h3>Low&#8209;Frequency Illiquidity&#8239;Proxy</h3><h4>&#8239;Amihud&#8217;s&#8239;Illiquidity (2002)</h4><ul><li><p><em><strong>Statistic</strong></em></p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;I_t \\;=\\; \\frac{|r_t|}{\\text{VOL}_t}&quot;,&quot;id&quot;:&quot;HKLUNOYEFV&quot;}" data-component-name="LatexBlockToDOM"></div><p>daily absolute return divided by dollar volume.</p></li><li><p><em><strong>Interpretation</strong></em>&#8194;<br>&#8220;How much price move for one dollar traded?&#8221;</p></li><li><p><em><strong>Great for</strong></em>&#8194;<br>Cross&#8209;sectional screens when only daily data are available.</p></li></ul><p></p><h2>&#8239;Production pointers &#128736;&#65039;</h2><ul><li><p><strong>Avoid look&#8209;ahead bias</strong> &#8211; use only <em><strong>&#937;t</strong></em> when computing any statistic at <strong>t</strong>.</p></li><li><p><strong>Volume&#8209;scaling</strong> &#8211; normalise impact by daily volume to compare across assets.</p></li><li><p><strong>High&#8209;freq data quality</strong> &#8211; mis&#8209;stamped trades will break serial&#8209;covariance estimators like Roll; clean aggressively.</p></li><li><p><strong>Market&#8209;making models</strong> &#8211; the martingale property is a <em>baseline</em>; any <em>predictable</em> drift you discover is potential edge, but it will shrink once you trade on it (the Grossman&#8211;Stiglitz paradox in action) &#8211; we&#8217;ll talk about that later.</p></li></ul><div><hr></div><blockquote><p><strong>Next lecture:</strong> we switch from theory to action &#8211; calibrating a simple dealer market&#8209;making model and stress&#8209;testing it on tick data.</p></blockquote><p><em>Happy coding &amp; good hunting, my dear reader!</em></p><div><hr></div>]]></content:encoded></item><item><title><![CDATA[Lecture 2: Liquidity Measures ]]></title><description><![CDATA[A practical series for the discerning retail trader and the quantitative alchemist on Market Microstructure]]></description><link>https://zerolag.club/p/liquidity-measures</link><guid isPermaLink="false">https://zerolag.club/p/liquidity-measures</guid><pubDate>Sun, 29 Jun 2025 15:46:21 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/484cb564-a298-4731-8e6e-8598c9198783_1800x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#128367;&#65039;Greetings, esteemed reader! 
<br><br>In the previous lecture, we outlined robust <a href="https://zerolag.club/p/spread">Normalized Realized and Effective Bid-Ask Spread </a>measures that work pretty well in practice, especially in HFT (overall, the higher the frequency, the more sense the spread makes).</p><p>In this second lecture of <em>Zero Lag Club</em>&#8217;s Market Microstructure series, we dive into measuring <strong>liquidity</strong> in practice on centralized limit order book (CLOB) exchanges.</p><p>Building on the foundations of <strong>Spread</strong> (Lecture 1), we&#8217;ll explore how quants and traders actually gauge liquidity using both <strong>theoretical models</strong> &#128300; and <strong>practical metrics</strong> &#128736;&#65039;. The focus here is strictly on order-book markets &#8211; <strong>AMMs</strong> will be covered later (their liquidity is defined by explicit curves, a topic for another scroll).</p><p>The lecture is pretty extensive (I&#8217;d schedule 1 hr to study it), but I promise it contains only necessary information for understanding the further material and building trading models.</p><h4>Markers</h4><ul><li><p>&#128300; : theory-heavy concepts useful for context (less directly monetizable).</p></li><li><p>&#128736;&#65039; : hands-on code, heuristics, or practical tips.</p></li></ul><h4>Required Knowledge</h4><p>Basic order&#8208;book vocabulary, high-school math, and a bit of Python/pandas for examples will help. If your curiosity pendulum swings into deeper research, check the references at the end, as usual.</p><h4>Outcome</h4><p>You&#8217;ll get a working toolkit to <strong>measure liquidity</strong> and understand which metrics matter:</p><ul><li><p><strong>Theoretical Models</strong>: Roll&#8217;s implied spread, Kyle &amp; Obizhaeva&#8217;s impact invariance, Hasbrouck&#8217;s lambda (price impact).</p></li><li><p><strong>Practical Metrics</strong>: Amihud&#8217;s illiquidity, VWAP, and slippage, Implementation Shortfall, Realized Spread.</p></li><li><p><strong>Low-Frequency Proxies</strong>: Daily spread estimators (Corwin&#8211;Schultz, Abdi&#8211;Ranaldo) and others validated for crypto by recent research.</p></li><li><p><strong>Use Cases</strong>: Which measures are production-grade for crypto trading vs. which are academic or obsolete?<br></p></li></ul><h4><strong>Liquidity Measures &#129514;</strong></h4><ul><li><p>Theoretical Foundations of Liquidity Costs</p><ul><li><p>Roll&#8217;s Model (1984) &#8211; Implied Bid/Ask Spread</p></li><li><p>Kyle&#8217;s Lambda and Square-Root Impact</p></li><li><p>Price Impact vs. Cost: Temporary and Permanent</p></li></ul></li><li><p>Practical Liquidity Measures</p><ul><li><p>Amihud&#8217;s Illiquidity (2002)</p></li><li><p>Volume-Weighted Average Price (VWAP) and Slippage</p></li><li><p>Implementation Shortfall (IS)</p></li><li><p>Realized Spread</p></li><li><p>Low-Frequency Liquidity Proxies</p><ul><li><p>Corwin&#8211;Schultz estimator (2012)</p></li><li><p>Abdi&#8211;Ranaldo estimator (2017)</p></li></ul></li><li><p>Production usage</p></li></ul></li></ul><h3>&#128293; Shall we commence? 
</h3><p><br>Let&#8217;s start by mapping the theoretical foundations of liquidity costs, then roll up our sleeves for practical measures with some code-ready insights.<br>Again, we can&#8217;t avoid theory: understanding classic models gives context to why specific liquidity measures work (or don&#8217;t) in crypto.</p><h1>&#128300; Theoretical Foundations of Liquidity Costs</h1><h2>Roll&#8217;s Model (1984) &#8211; Implied Bid/Ask Spread</h2><p>One elegant idea by <strong>Richard Roll (1984)</strong> derives the effective spread from the price time series alone. Measuring the bid-ask spread when only the price is known - sounds like magic! It was a highly useful solution back when real order-book data was impossible or expensive to get. Nowadays, the case is very different - real-time feeds are available for free from exchanges, historical order books are available (if relatively expensive), and hardware is cheap enough to handle all the required tick data volume.</p><p>Despite its 40 years of history, the model remains indispensable in modern quant finance: it is still actively used and <a href="https://medium.com/@lucasastorian/more-market-microstructure-656b3b24f2fb">shows good results in crypto as well</a> (20 bps average error and 50 bps in volatile regimes). I&#8217;d see its primary use as backtesting, for cases when it&#8217;s hard (or expensive) to get a historical order book for an asset.</p><p>So, Roll observed that in an efficient market (with true value static short-term), <strong>alternating buys and sells cause negative autocorrelation in price changes</strong> &#8211; prices zigzag as trades flip between bid and ask. This means that the <a href="https://statisticsbyjim.com/basics/covariance/">covariance</a> of successive price changes, <em><strong>Cov(&#916;pt,&#916;pt&#8722;1)</strong></em>, would be negative, and its magnitude is related to the bid-ask spread. In simple words, negative covariance means that if the price ticked up in the previous period <em><strong>t-1</strong></em>, it tends to tick back down in the current period <em><strong>t</strong></em>, and vice versa. </p><p>Roll&#8217;s formula for spread (in its simplest form) is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;S_{Roll} = 2\\sqrt{-Cov(\\Delta p_t, \\Delta p_{t-1})}&quot;,&quot;id&quot;:&quot;KILHVLTGFS&quot;}" data-component-name="LatexBlockToDOM"></div><p>Assuming that the covariance is negative. Intuitively, if prices tend to <strong>revert</strong> by a small amount each trade, that amount is about the half-spread.</p><p>This basically says: the more the price bounces back and forth, the wider the spread must be.</p><p>The model makes lots of assumptions, the main of which are that&nbsp;<strong>transaction prices are mean-reverting</strong>&nbsp;(they oscillate between bid and ask quotes) and that there&#8217;s no serial correlation in trades: trade signs (buy/sell) are independent (no clustering of buys or sells).</p><p>Formally, the assumptions are as follows:</p><ol><li><p>All trades have the same size. Trade direction <em><strong>d=1 </strong></em>is a buy<em><strong>, d=-1 </strong></em>is a sell.</p></li><li><p>Arriving orders are <em><strong>i.i.d. </strong></em>(independent and identically distributed):</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathbb{P}(d_t=1) = \\mathbb{P}(d_t=-1) = \\frac{1}{2}&quot;,&quot;id&quot;:&quot;SKVOAZJBEG&quot;}" data-component-name="LatexBlockToDOM"></div></li><li><p>Midquote follows a random walk. 
If <em><strong>m</strong></em> is the <a href="https://zerolag.club/p/spread">midprice</a> and <em><strong>&#949;</strong></em> an innovation term (an <em><strong>i.i.d.</strong></em> price shock), then</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;m_t = m_{t-1} + \\epsilon_t&quot;,&quot;id&quot;:&quot;IVJUVDIZQS&quot;}" data-component-name="LatexBlockToDOM"></div></li><li><p>Market orders are not informative. This is perhaps the most questionable assumption: the trade direction <em><strong>d</strong></em> is uncorrelated with both the current and the next quote innovation.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\mathbb{E}(d_t \\epsilon_t) = \\mathbb{E}(d_t \\epsilon_{t+1}) = 0&quot;,&quot;id&quot;:&quot;AJEFUKIQMK&quot;}" data-component-name="LatexBlockToDOM"></div><p></p></li><li><p>Spread is constant.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;S = a_t - b_t&quot;,&quot;id&quot;:&quot;UXBEYQEVMN&quot;}" data-component-name="LatexBlockToDOM"></div><p></p></li></ol><p>Let&#8217;s derive Roll&#8217;s model now!</p><p>The price is the midquote plus the half&#8209;spread times the direction of the trade:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;p_t = m_t + \\frac{d_tS}{2}&quot;,&quot;id&quot;:&quot;APKCZEYKRG&quot;}" data-component-name="LatexBlockToDOM"></div><p>We know <em><strong>p</strong></em>, but not <em><strong>m</strong></em>. How can we estimate <em><strong>S</strong></em>?</p><p>The key (and very neat!) idea of Roll&#8217;s paper is the <strong>mean reversion</strong> of the trade direction &#8211; prices are pressured to return to the midquote:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Cov(\\Delta d_t, \\Delta d_{t-1}) = -1&quot;,&quot;id&quot;:&quot;MPTVOSCXYA&quot;}" data-component-name="LatexBlockToDOM"></div><p>It&#8217;s easy to understand intuitively: if <em><strong>&#916;dt &gt; 0</strong></em>, we went from a sell to a buy, and the next change <em><strong>&#916;dt+1</strong></em> should be the opposite.</p><p>Now, since <em><strong>Cov</strong></em> is bilinear and the directions are independent, all cross-terms vanish and only minus the variance of <em><strong>dt-1</strong></em> survives:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Cov(\\Delta d_t, \\Delta d_{t-1}) = \nCov(d_t-d_{t-1}, d_{t-1}-d_{t-2}) = -Cov(d_{t-1},d_{t-1}) = -1&quot;,&quot;id&quot;:&quot;PGWZFJVMUR&quot;}" data-component-name="LatexBlockToDOM"></div><p>Thus, since <em><strong>&#916;pt = &#949;t + (S/2)&#916;dt</strong></em> and the innovations are uncorrelated with trade directions,</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;Cov(\\Delta p_t, \\Delta p_{t-1}) = -\\frac{S^2}{4}&quot;,&quot;id&quot;:&quot;HFANMAWKRM&quot;}" data-component-name="LatexBlockToDOM"></div><p>Which gives us the estimator</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;S^R_t = 2 \\sqrt{-Cov(\\Delta p_t, \\Delta p_{t-1})}&quot;,&quot;id&quot;:&quot;BAWUARLDUG&quot;}" data-component-name="LatexBlockToDOM"></div><p><br>This covariance can be computed directly from the price data. <br>That&#8217;s it! </p><blockquote><p>&#128300; <em>Why it fails in crypto:</em> crypto prices are <strong>often trending</strong> and exhibit <strong>momentum</strong> (positive autocorrelation), rather than mean-reversion, at trade-to-trade frequency. 
In trending markets, Roll&#8217;s negative-covariance assumption breaks down, and with it the whole approach &#8211; you might get a positive or near-zero covariance, leading to a zero or undefined implied spread. In practice, Roll&#8217;s estimator often outputs <strong>zero</strong> for crypto assets. The model&#8217;s spherical-cow assumptions listed above rarely hold in volatile crypto markets. </p></blockquote><p>I might clean up the code I have that tests the measure on the Binance feed and share it with you in a separate post. In short, it&#8217;s heavily off, <strong>but </strong>it&#8217;s not that bad for HFT, where trends and momentum are not <strong>that </strong>significant. In most cases, it&nbsp;underestimates liquidity costs&nbsp;in crypto.<br>Roll&#8217;s measure is still a cornerstone of microstructure theory, and the idea that <strong>the effective spread can be backed out from price dynamics is </strong>just cool.</p><h2>Kyle&#8217;s Lambda and Square-Root Impact</h2><p><strong>Kyle (1985)</strong> introduced the concept of <em>lambda</em> (&#955;) as the <strong>price impact per unit size</strong> in his insider trading model. In Kyle&#8217;s model, trades move the price linearly: <em><strong>&#916;p=&#955;&#8901;q+noise</strong></em>, where <em><strong>q</strong></em> is the signed trade size. Lambda reflects <em><strong>adverse selection</strong></em> costs &#8211; a larger <em><strong>&#955;</strong></em> means the asset&#8217;s price moves more when someone tries to buy/sell a given amount (low liquidity).</p><p>In practice, we can estimate <strong>Kyle&#8217;s lambda</strong> by regressing price changes on signed volume or order flow. For example, over many trades:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\Delta m_t = \\lambda \\cdot d_t \\cdot V_t + \\epsilon_t&quot;,&quot;id&quot;:&quot;AWCBRANRXU&quot;}" data-component-name="LatexBlockToDOM"></div><p>where <em><strong>mt</strong></em> is the midprice, <em><strong>dt&#8712;{+1,&#8722;1}</strong></em> the trade direction (buy or sell), and <em><strong>Vt</strong></em> the trade size (perhaps in USD). The slope &#955; is an empirical price-impact metric (e.g., &#8220;$0.05 price move per 1 BTC traded&#8221;). <strong>Hasbrouck&#8217;s model (1991)</strong> is a close cousin: it uses a vector autoregression of trades and quotes to measure the <em><strong>information content</strong></em> of trades, yielding a similar notion of <strong>price impact</strong> (often also dubbed <em><strong>lambda</strong></em>). These regression-based lambdas are useful in research to compare assets or time periods. However, they can be noisy and require high-frequency data. Few crypto trading shops estimate Hasbrouck&#8217;s VAR in real time; instead, it&#8217;s often better to use simpler stats (like immediate slippage or beta to order flow) for on-the-fly impact tracking.</p><blockquote><p>&#128736;&#65039; <em>Code Tip:</em> You can estimate a simple lambda in Python by fitting a line: <code>price_diff ~ sign * volume</code>. Use trade tape data or one-minute bars. The R-squared will be low, but &#955;&#8217;s magnitude gives a ballpark of impact. Calibrate in basis points per $1M traded, for instance.</p></blockquote><p>Now, empirical studies have found Kyle&#8217;s linear model too simplistic for large trades. Large <strong>meta-orders</strong> (series of trades) tend to have a <strong>nonlinear impact</strong>. 
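</p><p>&#128736;&#65039; To make the tip above concrete, here is a minimal sketch of the intraday &#955; regression (a hedged example, not a production estimator). The trade tape is assumed to be a timestamp&#8209;indexed DataFrame with hypothetical columns <code>price</code>, <code>qty</code>, and <code>side</code> (+1/&#8722;1 from the exchange&#8217;s taker flag):</p><pre><code>import numpy as np
import pandas as pd

def estimate_kyle_lambda(trades: pd.DataFrame, bar: str = "1min") -&gt; float:
    """OLS slope of per-bar price change on signed dollar flow.

    Assumes a DatetimeIndex and columns: price, qty, side (+1 buy / -1 sell).
    Returns lambda in price units per $1 of net signed flow.
    """
    bars = pd.DataFrame({
        "dp": trades["price"].resample(bar).last().diff(),
        "flow": (trades["side"] * trades["price"] * trades["qty"])
                .resample(bar).sum(),
    }).dropna()
    slope, _intercept = np.polyfit(bars["flow"], bars["dp"], 1)
    return slope

# Usage: lam = estimate_kyle_lambda(trades); a $1M order moves ~ lam * 1e6
</code></pre><p>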
</p><p>A famous result is the <strong>square-root impact law</strong>: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{price impact} \\;\\approx\\; \\text{const} \\times \\sqrt{\\text{volume}}&quot;,&quot;id&quot;:&quot;JHRAHHMCER&quot;}" data-component-name="LatexBlockToDOM"></div><p></p><p><strong>Obizhaeva &amp; Wang (2013)</strong> and <strong>Kyle &amp; Obizhaeva (2016)</strong> formalized this via <em>market microstructure invariance</em> theory. Without diving into dimensional analysis, the takeaway is that <strong>impact grows sub-linearly</strong> with size &#8211; doubling the trade size doesn&#8217;t double the impact; it multiplies it by only about &#8730;2. This is why slicing orders (&#8220;iceberging&#8221;) makes sense: ten <em>100 ETH</em> buys throughout the day move the price less overall than one big <em>1000 ETH</em> buy.</p><p>For us practitioners, the square-root law suggests using <strong>concave impact models</strong> for cost estimation. Many execution algos (TWAP, POV, etc.) implicitly assume this concavity. In crypto, the square-root law holds qualitatively, although the exact coefficient varies by asset and venue liquidity.</p><h2>Price Impact vs. Cost: Temporary and Permanent</h2><p>Not all price impact is permanent. <strong>Hasbrouck&#8217;s price impact</strong> can be thought of as the <em>permanent</em> price change resulting from information (e.g., informed trading). The remainder of the spread/impact is <em>temporary</em> (due to inventory pressure or bounce). <strong>Realized Spread</strong> vs <strong>Price Impact</strong> is a helpful distinction:</p><ul><li><p><em>Price Impact</em> (per Hasbrouck or others) measures how far the price <em>stays</em> moved after a trade.</p></li><li><p><em>Realized Spread</em> measures the part of the spread captured by liquidity providers, i.e. the profit of a market maker if the price mean-reverts.</p></li></ul><p>We&#8217;ll cover realized spread more shortly &#8211; it&#8217;s a <strong>resiliency</strong> metric (how quickly prices revert) &#8211; remember the liquidity dimensions from the <a href="https://zerolag.club/p/spread">Spread </a>lecture.</p><blockquote><p>&#128300; <em>Bottom line:</em> Theoretical models give us &#955; (lambda) and other intuition pumps. But to actually <em>measure</em> liquidity on crypto exchanges, we need practical formulas. Let&#8217;s now turn to <strong>hands-on liquidity measures</strong> you can compute from data.</p></blockquote><p></p><h2>&#128736;&#65039; Practical Liquidity Measures</h2><p>The following metrics are the bread-and-butter tools to quantify liquidity and trading costs on CLOBs. You can implement these with exchange data using any language with relative ease.</p><h3>Amihud&#8217;s Illiquidity Ratio (2002)</h3><p>One widely used measure in academia (and increasingly in crypto quant circles) is the <strong>Amihud Illiquidity Ratio</strong>. Proposed by Yakov Amihud, it captures the idea of <strong>price impact per volume</strong>. For a given day <em>d</em>, it&#8217;s defined as:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;ILLIQ_d = \\frac{|R_d|}{Vol_d}&quot;,&quot;id&quot;:&quot;TMKERFBJGZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>where <em><strong>R</strong></em> is the asset&#8217;s return (usually absolute return, in decimal) and <em><strong>Vol</strong></em> is the trading volume <em>in dollar terms</em> that day. 
If you average this across days in a period, you get an estimate of how much the <strong>price moves per unit of trading volume</strong>. A low Amihud value means <em>high liquidity</em> (you can push a lot of volume for little price change), while a high value means illiquidity.</p><p><em>Usage:</em> Amihud&#8217;s metric is great for comparing assets or exchanges cross-sectionally. <strong>Brauneis et al. (2021)</strong> found that Amihud&#8217;s ratio, despite being simple, does well in ranking the liquidity <em>levels</em> of different crypto exchanges. For example, if <em><strong>Exchange A</strong></em>&#8217;s BTCUSD has half the Amihud value of <em><strong>Exchange B&#8217;</strong></em>s, <em><strong>A</strong></em> generally offers a tighter market with less slippage per trade dollar.</p><blockquote><p>&#128736;&#65039; <em>Code Tip:</em> Given daily OHLCV data, you can compute <code>amihud = (returns.abs() / dollar_volume).resample('D').mean()</code> in Python. Make sure to use <strong>dollar volume</strong> (price * quantity) for consistency. Watch out for outliers on days of huge returns but low volume &#8211; consider median or winsorizing in those cases.</p></blockquote><p>However, note that <strong>Amihud is less useful for time-series liquidity changes</strong>. In fast-moving markets, volume and volatility can spike together, making daily illiquidity noisy. For intraday use, a rolling version can be computed (e.g., hourly illiquidity); however, many practitioners prefer direct order book statistics intraday.</p><h3>Volume-Weighted Average Price (VWAP) and Slippage</h3><p><strong>VWAP</strong> &#8211; Volume Weighted Average Price &#8211; is technically a benchmark price, not a liquidity metric in itself. But it&#8217;s crucial in execution. Traders commonly gauge <strong>slippage</strong> by how far their execution price is from the VWAP over the execution interval.</p><p>For example, say you need to buy 50 BTC over 10 minutes. The market&#8217;s VWAP in that 10-minute window (considering all trades) was $30,000. If your average execution price ended up $30,100, then you paid 0.33% above VWAP. That difference is <strong>slippage cost</strong> &#8211; a measure of liquidity <em>immediacy</em> and <em>market impact</em> during your execution.</p><p>VWAP is used as a <strong>benchmark</strong> for <em>immediacy</em> because an <em>uninformed</em> execution spread evenly in time should approximately achieve VWAP (assuming you&#8217;re a small part of the volume). If you trade too aggressively (demanding immediacy), you&#8217;ll push the price and end up worse than VWAP; if you trade too passively or slowly, you might chase a drifting price and also miss VWAP. Thus, VWAP is the bar to beat.</p><blockquote><p>&#128736;&#65039; <em>Practical Use:</em> Many crypto execution algos (TWAP, VWAP algos) aim to <em>track or beat VWAP</em>. As a trader, you measure <em>execution performance</em> as <strong>Implementation Shortfall</strong> (next section) or vs VWAP. Exchanges don&#8217;t give VWAP directly, but you can compute it from trades. In Python, given trades with price and size, <code>vwap_price = (price * size).sum() / size.sum()</code> for the interval.</p></blockquote><h3>Implementation Shortfall (IS)</h3><p>The <strong>Implementation Shortfall</strong> is TradFi&#8217;s gold-standard metric for <strong>total trading cost</strong>, now adopted in crypto. 
Originally coined by <strong>Andr&#233; Perold (1988)</strong>, IS is the difference between the <em>paper</em> price when you decide to trade and the <em>actual average price</em> you get. It captures <em>both</em> spread and market impact (and timing delays). Originally it was used to measure broker performance.</p><p>Let&#8217;s break it down: You decide to buy at 10:00 when the midprice is $100. By the time your order fully executes, you ended up paying an average price of $101. Meanwhile, the market mid moved to $100.5 during your execution (maybe due to other traders or your own impact). Your implementation shortfall can be decomposed as:</p><ul><li><p><strong>Spread cost:</strong> If you crossed the spread to buy, maybe half the spread (say $0.1) was lost right away.</p></li><li><p><strong>Impact/timing cost:</strong> The mid moved up $0.5 while you were executing &#8211; that&#8217;s adverse movement against you.</p></li><li><p><strong>Opportunity cost:</strong> If you didn&#8217;t complete the full order, any unfilled part has an implicit cost if price keeps rising.</p></li></ul><p>In total, your IS = $101 &#8211; $100 = $1 (1%) on that trade. This is the <strong>real cost of trading</strong> beyond the ideal scenario. It&#8217;s crucial for algorithmic execution evaluation &#8211; you want to minimize IS.</p><p>Formally, one can write Implementation Shortfall for a buy order as:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;IS = \\frac{P_{avg fill} - P_{decision}}{P_{decision}} \\times 100\\%&quot;,&quot;id&quot;:&quot;KDFPLINIAO&quot;}" data-component-name="LatexBlockToDOM"></div><p>where <em><strong>Pdecision</strong></em> is the price (mid or last) when the trading decision was made (or a benchmark like previous close), and <em><strong>Pavg&#8201;fill</strong></em>&#8203; is the volume-weighted execution price. For sells, the formula is analogous (you want to sell high; any average fill below the decision price is cost).</p><blockquote><p>&#128736;&#65039; <em>Code-Level Tip:</em> To compute IS, you need to record a benchmark price at the start (decision time or arrival price). Then track all fills of the order and compute the size-weighted average fill price. The difference (with correct sign) is your IS. If working with historical data, you can simulate an execution (e.g., splitting into chunks) and compare with a baseline price path. Python&#8217;s pandas can help aggregate fills; just be careful to align timestamps for the benchmark price.</p></blockquote><p><strong>Implementation Shortfall vs VWAP:</strong> If you use VWAP of the period as your benchmark, that variant is often called <em>VWAP slippage</em>. For example, some traders say &#8220;We were 5 bps <em>inside</em> VWAP&#8221;, meaning they beat the VWAP by 0.05% (a positive outcome). This is essentially a flavor of implementation shortfall using VWAP as the benchmark instead of the decision price.</p><h3>Realized Spread</h3><p>In the lecture on Spread we introduced <strong>Realized Spread (RS)</strong> &#8211; a metric particularly relevant for market makers and liquidity providers. 
While <em>effective spread</em> measures the cost paid by <em>takers</em> on a given trade (difference between trade price and midprice at that moment), the <em>realized spread</em> looks ahead: it&#8217;s the difference between the trade price and the midprice <em>after some time &#916;</em>.</p><ul><li><p>If I sell to a market maker at $100 (mid was $99.9, so effective spread paid ~$0.1), and 5 minutes later the midprice is $99.8, the market maker benefited &#8211; they sold higher than the new mid. The realized spread for that trade to the market maker is $100 &#8211; $99.8 = $0.2.</p></li><li><p>Conversely, if the mid jumps to $100.5 after, the market maker&#8217;s gain from the spread was eroded (negative realized spread, meaning the taker&#8217;s trade had information).</p></li></ul><p>Mathematically, for a buy order (taker perspective, <em><strong>d=+1</strong></em> for buy, <em><strong>&#8722;1</strong></em> for sell):</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{Realized Spread}_{t,\\Delta} = d \\Big(p_t - m_{t+\\Delta}\\Big)&quot;,&quot;id&quot;:&quot;OUAXUXFGPA&quot;}" data-component-name="LatexBlockToDOM"></div><p>where <strong>pt</strong> is the trade price and <em><strong>mt+&#916;</strong></em>&#8203; the midprice &#916; time later. The choice of &#916; (e.g. 1 minute, 5 minutes) is critical &#8211; too short and it&#8217;s mostly noise, too long and other factors move the price.</p><p>For a <em>liquidity taker</em>, a <strong>small realized spread</strong> (or negative) means you implicitly didn&#8217;t pay much extra beyond the true price impact. For a <strong>market maker</strong>, a large positive realized spread means you earned your quoted spread and the price didn&#8217;t run away on you &#8211; a good trade. Realized spread thus measures <strong>resiliency</strong> of the market: how quickly prices mean-revert after trades. A resilient market where liquidity replenishes quickly tends to have lower permanent impact (and higher realized spread for makers).</p><blockquote><p>&#128736;&#65039; <em>How to estimate:</em> Using trade and quote data, for each trade record the midprice some minutes later. Compute <em><strong>d(p&#8722;mfuture)</strong></em> and average over many trades (you might condition on trade size or time of day). In Python, you can do this by merging the trade tape with a delayed midprice series. This helps quantify <em>how much of the spread is &#8220;real&#8221; vs just temporary</em>. Low realized spread (relative to effective spread) means <strong>information-heavy trades</strong> (price moved against the maker), whereas high realized spread means mostly noise or inventory trading.</p></blockquote><p></p><h3>Low-Frequency Liquidity Proxies</h3><p>Thus far, we discussed measures you&#8217;d compute if you have tick-level data. What if you only have daily or hourly bars? That&#8217;s not our typical trading case, though &#8211; we usually operate at HFT or mid-frequency (minutes). But in some cases you need that data &#8211; for example, in mid-to-low-frequency strategies like funding arbitrage, you want to know which venues to trade when the same asset carries almost the same funding rate on several venues but the liquidity differs. There are clever <strong>spread estimators</strong> that use low-frequency data to infer liquidity. Interestingly, some of these have been tested on crypto and work quite well. 
Here are two notable ones:</p><ul><li><p><strong>Corwin&#8211;Schultz estimator (2012):</strong> Uses daily high and low prices to estimate the bid-ask spread. The intuition: high prices are usually buyer-initiated trades and lows are seller-initiated, so the ratio of highs/lows over two days contains information about the spread. The formula is a bit involved (it uses the difference between single-day and two-day ranges to back out the spread). Corwin-Schultz is <em>cheap to compute</em> &#8211; you just need High and Low for two days, making it handy for quick comparisons. <strong>Brauneis et al. (2021)</strong> found this estimator excels at tracking <em>time-series liquidity changes</em> in BTC and ETH markets. That means if liquidity is drying up, the C-S measure will rise, and vice versa, roughly in sync with true spreads.</p></li><li><p><strong>Abdi&#8211;Ranaldo estimator (2017):</strong> An improvement over C-S, this uses <strong>Close, High, and Low</strong> prices to estimate spreads (<a href="https://www.aeaweb.org/conference/2017/preliminary/paper/GbeDTRrB">aeaweb.org</a>). It&#8217;s also designed for daily data but tends to be more accurate by incorporating closing price information (reducing bias from overnight gaps). Abdi&#8211;Ranaldo also performed very well in crypto, slightly outdoing C-S in some cases. If you have daily OHLC, this is a great proxy for average bid-ask spreads without needing tick data.</p></li></ul><p>Both of these give an <strong>estimated percentage spread</strong>. For example, Abdi&#8211;Ranaldo might estimate that an exchange&#8217;s typical spread is 0.20%. They won&#8217;t capture depth or large-order costs, but they reflect top-of-book tightness over time.</p><p>Other proxies include <strong>&#8220;Number of Trades&#8221;</strong> or <strong>Dollar Volume</strong> (more volume often means more liquidity) and variants of <strong>high-low volatility measures</strong>. Brauneis et al. tested a bunch. Two highlights from their findings are worth noting:</p><ul><li><p>For <strong>time-series liquidity (dynamic changes)</strong>: <em>Corwin-Schultz and Abdi-Ranaldo were the best</em>, indicating these high-low based measures track liquidity over time better than, say, Amihud or trade counts. This is likely because high-low ranges widen when volatility and trading costs spike, signaling illiquidity in turbulent times.</p></li><li><p>For <strong>cross-sectional and level estimates</strong>: <em>Amihud&#8217;s illiquidity and a proxy from Kyle &amp; Obizhaeva (2016) invariance</em> were most reliable. The &#8220;Kyle-Obizhaeva estimator&#8221; they used is rooted in invariance theory &#8211; it scales volume and volatility to produce a liquidity metric (think of it as a predicted impact cost per trade). Those two measures were best at ranking exchanges by liquidity and even approximating absolute spread levels. So if you want to know <em>which exchange is most liquid</em> or <em>what&#8217;s the typical cost on Exchange X vs Y</em>, Amihud and invariance-based metrics give a good gauge.</p></li></ul><blockquote><p>&#128300; <em>Reality check:</em> Low-frequency proxies are great for research and monitoring broad trends or doing comparative studies when tick data isn&#8217;t available. But if you <strong>do have order book data</strong>, you&#8217;ll always get a more precise read from direct measures (actual quoted spreads, depth, etc.). Use proxies when you must (e.g. 
analyzing hundreds of altcoins quickly, or historical periods where only daily data exists).</p></blockquote><h3>Which Metrics Matter in Production?</h3><p>Let&#8217;s summarize from a practitioner&#8217;s perspective &#8211; <strong>what should you actually use when trading crypto?</strong></p><ul><li><p><strong>Quoted Spread and Order Book Depth:</strong> These are still king for real-time decisions. A tight normalized spread and substantial depth at the top of book mean you can execute small trades cheaply. For larger trades, look at <em>impact</em> &#8211; e.g., how much the price moves if you sweep X dollars of the book (we covered weighted spreads in Lecture 1 and will delve more into slippage in the future). These are <strong>immediately usable</strong> via exchange APIs.</p></li><li><p><strong>Implementation Shortfall:</strong> If you&#8217;re executing large orders or running an algorithm, track IS on each order or day. It&#8217;s the true bottom-line cost including all slippage. In a live trading system, you&#8217;d log the decision price and fills to compute this. If your IS starts creeping up, it might indicate deteriorating liquidity or an execution problem.</p></li><li><p><strong>VWAP Slippage:</strong> This is often used in <strong>TCA (Transaction Cost Analysis)</strong> reports. Institutional traders will report &#8220;We executed at 5 bps worse than VWAP&#8221; for example. If you&#8217;re building execution algos, minimizing VWAP slippage (or beating VWAP) is a concrete goal.</p></li><li><p><strong>Amihud Ratio:</strong> For strategy research or asset selection, Amihud&#8217;s illiquidity is handy to rank assets by liquidity. It&#8217;s simple to compute and has intuitive units (percent move per $ traded). It&#8217;s not something you&#8217;d compute intraday for signals, but good for filtering out illiquid coins or deciding how to allocate capital across venues.</p></li><li><p><strong>Hasbrouck&#8217;s Lambda / Kyle&#8217;s Lambda:</strong> In high-frequency strategy dev, you might estimate these to understand impact. For example, if lambda for a coin is huge, you know even small trades will move it &#8211; be careful with order sizing. But these are <strong>diagnostic</strong>; we don&#8217;t plug Hasbrouck&#8217;s VAR into a live system due to complexity and noise. Instead, simpler real-time estimators (like a moving average of effective cost per trade size) are preferred.</p></li><li><p><strong>Roll&#8217;s Measure:</strong> Honestly, pretty <strong>obsolete for crypto</strong>. It&#8217;s elegant for teaching and for some stock datasets, but as we emphasized, it often gives false signals in crypto markets. Use it only if you suspect purely random trade directions and want a quick guess of the spread &#8211; and be ready for it to output zero when there&#8217;s persistent trending.</p></li><li><p><strong>Corwin-Schultz / Abdi-Ranaldo:</strong> These are <strong>great for analysis</strong> &#8211; e.g., if you&#8217;re writing a report on historical liquidity or can&#8217;t pull tick data for years of history. They aren&#8217;t something a trading algorithm would use on the fly (they&#8217;re too laggy and coarse for that). Think of them as research tools or for monitoring market health over time. For instance, you could plot a 30-day moving average of the C-S estimator to visualize how an exchange&#8217;s liquidity is improving or worsening.</p></li></ul><p>In crypto markets, <strong>latency and explicit order book info reign supreme</strong>. 
We have full Level-2 data available, unlike some traditional markets where proxies were invented due to data scarcity. This means our production-grade metrics lean toward <em>direct measurements</em> (spreads, depth, fill stats). However, the theoretical concepts and low-frequency proxies are still invaluable for <strong>validation and understanding</strong>. They can validate if your direct measures make sense, or help compare liquidity across venues without streaming all their data.</p><div><hr></div><p>As a final note, we deliberately left out Automated Market Makers here. AMMs (like Uniswap) have <strong>explicit liquidity curves</strong> and different metrics (like pool depth, k-values, etc.), which deserve their own discussion. Fear not &#8211; we shall tackle AMM liquidity in a later lecture.</p><p>For now, you should be equipped to measure and monitor liquidity on any crypto exchange with a limit order book. May your order placements be swift and your spreads ever tight!</p><p>See you in the next lecture, where we&#8217;ll dive straight into market-making models.</p><h4>References</h4><ul><li><p>Roll, R. (1984). <em>A simple implicit measure of the effective bid&#8211;ask spread in an efficient market</em>. Journal of Finance, <strong>39</strong>(4), 1127&#8211;1139.</p></li><li><p>Kyle, A. S. (1985). <em>Continuous auctions and insider trading</em>. Econometrica, <strong>53</strong>(6), 1315&#8211;1335. (Introduced Kyle&#8217;s lambda)</p></li><li><p>Kyle, A. S., &amp; Obizhaeva, A. A. (2016). <em>Market Microstructure Invariance: Empirical Hypotheses</em>. Econometrica, <strong>84</strong>(4), 1345&#8211;1404. (Foundation of the square-root impact law)</p></li><li><p>Hasbrouck, J. (1991). <em>Measuring the information content of stock trades</em>. Journal of Finance, <strong>46</strong>(1), 179&#8211;207.</p></li><li><p>Amihud, Y. (2002). <em>Illiquidity and stock returns: cross-section and time-series effects</em>. Journal of Financial Markets, <strong>5</strong>(1), 31&#8211;56.</p></li><li><p>Perold, A. (1988). <em>The Implementation Shortfall: Paper vs Reality</em>. Journal of Portfolio Management, <strong>14</strong>(3), 4&#8211;9.</p></li><li><p>Corwin, S. A., &amp; Schultz, P. (2012). <em>A simple way to estimate bid-ask spreads from daily high and low prices</em>. Journal of Finance, <strong>67</strong>(2), 719&#8211;759.</p></li><li><p>Abdi, F., &amp; Ranaldo, A. (2017). <em>A simple estimation of bid-ask spreads from daily close, high, and low prices</em>. Review of Financial Studies, <strong>30</strong>(12), 4437&#8211;4480.</p></li><li><p>Brauneis, A., Mestel, R., Riordan, R., &amp; Theissen, E. (2021). <em>How to measure the liquidity of cryptocurrency markets?</em> Journal of Banking &amp; Finance, <strong>122</strong>, 106198.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Lecture 1: Spread]]></title><description><![CDATA[A practical series for the discerning retail trader and the quantitative alchemist on Market Microstructure]]></description><link>https://zerolag.club/p/spread</link><guid isPermaLink="false">https://zerolag.club/p/spread</guid><dc:creator><![CDATA[crypt0grapher]]></dc:creator><pubDate>Thu, 26 Jun 2025 21:41:44 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e6e84644-f180-4406-a167-920c25440477_1536x1024.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#128367;&#65039;Greetings, esteemed reader! 
<br><br>What I shall commence with is the study of <strong>Market Microstructure</strong>. I crafted it to be useful for both the discerning retailer of trade and the adept quantitative alchemist.</p><p>I&#8217;d classify this series as <em><strong>Lectures</strong></em>. <br>As a practitioner to the bone, I&#8217;ll try to convey only things you can use in code, but achieving results as a quant requires some theory and mental effort.</p><h4>Markers</h4><ul><li><p>&#128300;: theory-heavy concepts useful for context, but less directly monetizable.</p></li><li><p>&#128736;&#65039;: hands-on code, heuristics, or tips.</p></li></ul><h4>Required Knowledge</h4><p>Basic order&#8209;book vocabulary and high-school math.</p><p>If the pendulum of your interest has swung into deeper research, you can find articles and book references at the end.</p><h4>Outcome</h4><p>A working knowledge toolkit to measure spread.</p><ul><li><p>Markets: CLOBs, RFQ, AMMs, hybrids.</p></li><li><p>Liquidity</p></li><li><p>Bid&#8211;Ask Spread: Quoted, Normalized, Effective, Realized.</p></li><li><p>Order Book Depth and Slippage</p></li></ul><h3>&#128293; Shall we commence? </h3><p>Let&#8217;s start with fundamental definitions and basic yet practical models, laying the groundwork for everything subsequent.</p><p>I treat <em><strong>Market Microstructure</strong></em> as the study of how orders are placed, matched, and cancelled, and how that process shapes prices, liquidity, volatility, and trading costs. This semi-formal definition pretty much reflects the concept of a bridge between the raw bid/ask queue and the asset price. </p><h2>Market Designs</h2><h3>CLOBs</h3><p><em><strong>Continuous limit&#8209;order books</strong></em> are run by most of the venues we trade on: Coinbase, Binance, Kraken, Bybit, Hyperliquid, NASDAQ, NYSE, LSE, TSE &#8211; pretty much all of them.<br>This is often taken for granted, but it is essential to note that the CLOB is not the only market type out there. </p><h3>AMMs</h3><p>On-chain DEXes. These implementations are considerably different: constant product, concentrated liquidity, Balancer-style, and hybrid pools. Hopefully I&#8217;ll be persistent enough with this blog to share what I know about all of them, as I use all versions of Uniswap (v2, v3, and v4) and Solana DEXes.</p><h3>RFQ </h3><p>RFQ markets are worth mentioning: one party asks for a quote; the counterparty responds with a price &#8211; as simple as that. That&#8217;s the case for onchain protocols like 0x RFQ, Paradigm, and DeFi aggregators. On the centralized side, that&#8217;s the structure of OTC desks like Binance&#8217;s and Wintermute. Bloomberg is an example from the TradFi space.</p><h3>Call Auctions</h3><p>Call/batch auctions are the last market type I&#8217;d like to mention, since they&#8217;re a popular token fair-launch mechanism: bids and asks are collected, then matched against each other all at once. In TradFi, that&#8217;s how the NYSE and LSE operated in their early days; they matched orders once a day. Since CEXes offer APIs for call auctions, that might move one&#8217;s thoughts in the right direction! I might create a post on that later.</p><p>For completeness, there are other market designs (dealer markets, dark pools, prediction markets, and a variety of hybrids). The list is limited only by one&#8217;s imagination &#8211; new designs keep appearing, like Order Flow Auctions with MEV redistribution, and obviously you can design your own market. We&#8217;ll stick to CLOBs and AMMs for now. 
</p><h2>Four Dimensions of Liquidity</h2><p>So <em><strong>market liquidity, or </strong></em>just<em><strong> liquidity </strong></em>(when I talk about funding liquidity or monetary liquidity, I&#8217;ll say so explicitly), is, semi-formally, the ability to facilitate an asset's quick trading without significantly impacting its price.<br>In a more structured way, modern theory defines the following liquidity dimensions:</p><ul><li><p><strong>tightness</strong>: cost of executing a small trade, <br>&#128736;&#65039; we&#8217;ll measure that by spread.</p></li><li><p><strong>depth</strong>: volume near the current price that can be traded without moving it,<br>&#128736;&#65039; order book sizing &amp; slippage curves;</p></li><li><p><strong>immediacy</strong>: order execution speed,<br>&#128736;&#65039; VWAP lag, time component of the Implementation Shortfall;</p></li><li><p><strong>resiliency</strong>: how quickly prices revert and liquidity replenishes after shocks,<br>&#128736;&#65039; Realized spread, book recovery time.</p></li></ul><p><em><strong>Why care?</strong></em> Because execution costs often dwarf model alpha, especially in HFT.<br>We choose markets to trade on, design execution algos, and size orders. Bad execution ruins an excellent strategy. </p><h2>Spread Measures</h2><h3>Quoted Spread</h3><p>Ok, so you, of course, know what a spread is &#8211; the best ask less the best bid:</p><p>That&#8217;s the <strong>quoted spread</strong> <em>S</em> at time <em>t:</em></p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;S_t \\equiv a_t - b_t&quot;,&quot;id&quot;:&quot;PVYPOCXNYG&quot;}" data-component-name="LatexBlockToDOM"></div><p>That absolute value doesn&#8217;t say much: a 10&#162; spread is typical for AAPL but crazy for CRV. That&#8217;s why normalizing the quoted spread by the <em><strong>midprice</strong></em> <em><strong>m</strong></em> sounds reasonable, providing the relative quoted spread <em>s</em>:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;s_t \\equiv \\frac{S_t}{m_t} \\hspace{2em} m_t \\equiv\\frac{a_t+b_t}{2}&quot;,&quot;id&quot;:&quot;NDLUSOTSRV&quot;}" data-component-name="LatexBlockToDOM"></div><p>Spread, being the most cited measure of <strong>market &lt;il&gt;liquidity</strong>, actually does work very well. It&#8217;s a valid tx cost model for a tiny round-trip transaction that's executed at the best bid and ask, which also assumes immediate (zero-lag!) order book feeds and execution. </p><blockquote><p>&#128736;&#65039; Extremely handy for real&#8209;time low&#8209;latency monitoring: tight normalized spread &#8658; high liquidity</p></blockquote><p>I know, &#8220;immediate execution&#8221;&nbsp;and &#8220;immediate feeds&#8221; sound like a &#8220;<em>spherical horse in a vacuum</em>&#8221;, but the quoted normalized spread is actually a good, ultra&#8209;robust liquidity estimator. 
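</p><p>&#128736;&#65039; As a minimal sketch (our own helper, not exchange API code), the relative quoted spread is a one&#8209;liner you can run on every book update:</p><pre><code>def normalized_spread(best_bid: float, best_ask: float) -&gt; float:
    """Relative quoted spread s = (a - b) / m, with m the midprice."""
    mid = (best_ask + best_bid) / 2.0
    return (best_ask - best_bid) / mid

# e.g. bid 99.98 / ask 100.02 around a $100 mid -&gt; 0.0004 = 4 bps
print(f"{normalized_spread(99.98, 100.02) * 1e4:.1f} bps")
</code></pre><p>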
</p><h3>Weighted Average Spread</h3><p>The weighted average bid-ask spread for an order size <em>q</em>: assuming that <em>a</em> and <em>b</em> are the <em>average execution prices</em> of buy and sell orders of size <em>q</em>, respectively, the <strong>weighted-average bid-ask spread</strong> is</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;S_t(q) \\equiv \\frac{\\overline{a_t}(q) - \\overline{b_t}(q)}{m_t}&quot;,&quot;id&quot;:&quot;WGTEKGOADG&quot;}" data-component-name="LatexBlockToDOM"></div><p>It's pretty intuitive: the market is not deep enough if the wa-spread&nbsp;<em><strong>s</strong></em>&nbsp;increases significantly with the growth of&nbsp;<strong>q</strong>. When&nbsp;<em><strong>q</strong></em>&nbsp;is small, it&#8217;s close to the quoted spread; for larger&nbsp;<em><strong>q</strong></em>,&nbsp;it reflects depth and slippage.</p><p>Estimating these weighted average best bids/asks is a practical way to estimate slippage (more on slippage is coming) and model the liquidity surface (for large orders).</p><blockquote><p>Subscribe to the depth feed and compute:</p><ol><li><p>Choose a notional <em><strong>q</strong></em> you plan to trade.</p></li><li><p>Walk the book&#8217;s asks from the best price upward until you accumulate <em><strong>q </strong></em>to get <em><strong>a(q). </strong></em>Same for bids downward to get <em><strong>b(q).</strong></em></p></li><li><p>Compute the midprice and plug into the above formula.</p></li></ol></blockquote><p>At the end of this lecture you&#8217;ll find the code to compute that.<br></p><h3>Effective Spread</h3><p>Requires much less data &#8211; just the last execution price&nbsp;<em><strong>p</strong></em>&nbsp;and the midprice <em><strong>m</strong></em> &#8211; showing the transaction's impact on the market.<br><em><strong>d</strong></em> is the trade direction: 1 for longs/buys, -1 for shorts/sells</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;S_e \\equiv d\\frac{p-m}{m}&quot;,&quot;id&quot;:&quot;WMZOQQYVTN&quot;}" data-component-name="LatexBlockToDOM"></div><h3>Realized Spread</h3><p>Now we&#8217;re getting close to business. Quoted and effective spreads are measures for a trader; the realized spread is more interesting for market makers. As an MM you want to be as neutral as possible, and RS measures the extra cost (or profit) sustained by an MM relative to an ideal environment in which trades are made at the midprice. Assuming we keep the assets for &#916; periods, the <strong>realized spread</strong> is</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;S_r \\equiv d_t(p_t - m_{t+\\Delta}) = d_t(p_t-m_t) - d_t(m_{t+\\Delta} - m_t)&quot;,&quot;id&quot;:&quot;HZMUKUUVQN&quot;}" data-component-name="LatexBlockToDOM"></div><p>Thus, the average RS, given the effective spread definition above, is</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;E(S_r) = E(S_e) - E(d_t(m_{t+\\Delta}-m_t))&quot;,&quot;id&quot;:&quot;NIFEURVAFC&quot;}" data-component-name="LatexBlockToDOM"></div><p></p><h2><strong>Order Book Depth and Slippage</strong></h2><blockquote><p>&#128736;&#65039; Capacity to absorb large orders. 
Sizing, scaling, and choosing venues to trade are among the most complex tasks in algotrading.</p></blockquote><p><em><strong>Depth</strong></em> captures the cumulative quantity available for execution at, and away from, the best bid and ask, while <em><strong>slippage</strong></em> denotes the adverse price movement a participant experiences when their order consumes that liquidity. </p><p>In further lectures, I&#8217;m going to walk through commonly used aggregates&#8212;top&#8209;of&#8209;book depth, X&#8209;basis&#8209;point depth, and depth&#8209;implied dollar value, as well as decay models that describe replenishment rates. We then derive instantaneous and execution-weighted slippage measures. <br><br>&#128300; The microstructure literature offers ways to estimate the direction of trade <em><strong>d</strong></em>: the classic Lee-Ready algorithm and Odders-White. In crypto this doesn&#8217;t make much sense, since all cryptocurrency exchanges now provide aggressor flags indicating whether an order was initiated by the seller or the buyer, and we always have quotes. </p><p>The microstructure literature also has a lot to say about Roll&#8217;s measure (1984), an estimator of the spread derived from the price time series alone. The problem with it is that it doesn&#8217;t work in crypto &#8211; momentum breaks the idea. It uses the negative autocovariance of successive price changes, rests on many assumptions, and in trending or momentum markets, where autocorrelation is positive, it gives zero or wrong estimates. If you'd like to read more about it, I leave references below.<br>I&#8217;ve outlined how it works in <a href="https://zerolag.club/i/166917360/rolls-model-implied-bidask-spread">my post on liquidity measures</a>, since it&#8217;s very neat and helps a lot to grasp the concept of MM modelling. Check it out.<br><br>Thanks for reading!</p><p>Now, let&#8217;s get straight to the <strong>Liquidity Measures</strong> and code some money&#8209;making spells! &#128184;&#129668;</p><h4>References and Reading List</h4><ul><li><p>Kyle, A. S.&#8239;(1985). Continuous auctions and insider trading. Econometrica, 53(6), 1315&#8209;1335.</p></li><li><p>Lee, C. M. C., &amp; Ready, M. J.&#8239;(1991). Inferring trade direction from intraday data. Journal of Finance, 46(2), 733&#8209;746.</p></li><li><p>O&#8217;Hara, M.&#8239;(1995). Market Microstructure Theory. Blackwell.</p></li><li><p>Roll, R.&#8239;(1984). A simple implicit measure of the effective bid&#8209;ask spread in an efficient market. Journal of Finance, 39(4), 1127&#8209;1139.</p></li><li><p>Harris, L.&#8239;(2003). Trading and Exchanges: Market Microstructure for Practitioners. Oxford University Press.</p></li><li><p>Foucault, T., Pagano, M., &amp; R&#246;ell, A. (2013). Market Liquidity: Theory, Evidence, and Policy. Oxford University Press.</p></li><li><p>Obizhaeva, A., &amp;&#8239;Wang, J.&#8239;(2013). Optimal trading strategy and supply/demand dynamics. 
<p>Thanks for reading!</p><p>Now, let&#8217;s get straight to the <strong>Liquidity Measures</strong> and code some money&#8209;making spells! &#128184;&#129668;</p><h4>References and Reading List</h4><ul><li><p>Kyle, A. S.&#8239;(1985). Continuous auctions and insider trading. <em>Econometrica, 53</em>(6), 1315&#8209;1335.</p></li><li><p>Lee, C. M. C., &amp; Ready, M. J.&#8239;(1991). Inferring trade direction from intraday data. <em>Journal of Finance, 46</em>(2), 733&#8209;746.</p></li><li><p>O&#8217;Hara, M.&#8239;(1995). <em>Market Microstructure Theory</em>. Blackwell.</p></li><li><p>Roll, R.&#8239;(1984). A simple implicit measure of the effective bid&#8209;ask spread in an efficient market. <em>Journal of Finance, 39</em>(4), 1127&#8209;1139.</p></li><li><p>Harris, L.&#8239;(2003). <em>Trading and Exchanges: Market Microstructure for Practitioners</em>. Oxford University Press.</p></li><li><p>Foucault, T., Pagano, M., &amp; R&#246;ell, A.&#8239;(2013). <em>Market Liquidity: Theory, Evidence, and Policy</em>. Oxford University Press.</p></li><li><p>Obizhaeva, A., &amp; Wang, J.&#8239;(2013). Optimal trading strategy and supply/demand dynamics. <em>Journal of Financial Markets, 16</em>(1), 1&#8209;32.</p></li></ul><p></p><h4>Weighted Average Spread Code</h4><p>Here I&#8217;m sharing a simple but effective code snippet that does two things:</p><ol><li><p><strong>Generates a realistic random order&#8209;book snapshot</strong> (bids&#8239;+&#8239;asks) with tick&#8209;aligned prices and sizes that grow deeper in the book.</p></li><li><p><strong>Computes the weighted&#8209;average bid&#8209;ask spread for any trade size&#8239;q</strong> as defined above.</p></li></ol><p>Python, requires <code>pandas</code> and <code>numpy</code>:</p><pre><code>import pandas as pd
import numpy as np

###############################################################################
# 1.  ORDER&#8209;BOOK GENERATOR
###############################################################################
def random_orderbook(
    mid: float = 100.0,               # central price around which we build the book
    tick: float = 0.01,               # price granularity
    spread_ticks: int = 2,            # best&#8209;bid/ask gap in ticks
    depth_levels: int = 20,           # levels per side
    base_size: float = 1_000.0,       # expected size at the best bid/ask
    depth_decay: float = 0.15,        # how quickly size grows deeper in book
    sigma_vol: float = 0.5,           # randomness in size (log&#8209;normal std&#8209;dev)
    rng: np.random.Generator | None = None
) -&gt; pd.DataFrame:
    """
    Build a one&#8209;shot synthetic order book with realistic features:

    &#8226; tick&#8209;aligned prices, symmetric around 'mid'
    &#8226; quoted spread = spread_ticks * tick
    &#8226; depth increases (on average) as we move away from the top
    &#8226; log&#8209;normal noise to avoid perfectly smooth shapes
    """
    rng = rng or np.random.default_rng()
    half_spread = (spread_ticks * tick) / 2

    # --- price ladders -------------------------------------------------------
    ask_px = mid + half_spread + tick * np.arange(depth_levels)
    bid_px = mid - half_spread - tick * np.arange(depth_levels)

    # --- sizes: grow with depth + randomness ---------------------------------
    vol_multiplier = rng.lognormal(mean=0.0, sigma=sigma_vol, size=depth_levels)
    depth_factor = np.exp(depth_decay * np.arange(depth_levels))
    ask_sz = base_size * depth_factor * vol_multiplier
    bid_sz = base_size * depth_factor * vol_multiplier      # symmetric book

    asks = pd.DataFrame({"side": "ask", "price": ask_px, "size": ask_sz})
    bids = pd.DataFrame({"side": "bid", "price": bid_px, "size": bid_sz})

    return pd.concat([bids, asks], ignore_index=True)


###############################################################################
# 2.  EXECUTION&#8209;PRICE ROUTINE
###############################################################################
def _avg_exec_price(book: pd.DataFrame, side: str, qty: float) -&gt; float:
    """
    Walk the book and compute the volume&#8209;weighted average execution
    price for either a buy ('ask' side) or sell ('bid' side) order of size 'qty'.
    """
    side_book = book.query("side == @side").copy()

    # For buys we start at BEST ASK (lowest); for sells at BEST BID (highest)
    side_book = side_book.sort_values(
        "price", ascending=(side == "ask")  # True&#8594;asks low&#8594;high ; False&#8594;bids high&#8594;low
    ).reset_index(drop=True)

    cum = side_book["size"].cumsum()
    if qty &gt; cum.iat[-1]:
        raise ValueError("Requested quantity exceeds book depth.")

    take_full = side_book.loc[cum &lt; qty, ["price", "size"]]
    take_part = side_book.loc[cum &gt;= qty].iloc[0]

    filled_qty = take_full["size"].sum()
    remaining = qty - filled_qty

    vwap_numer = (take_full["price"] * take_full["size"]).sum()
    vwap_numer += take_part["price"] * remaining

    return vwap_numer / qty


###############################################################################
# 3.  WEIGHTED&#8209;AVERAGE SPREAD FOR ANY SIZE q
###############################################################################
def wa_spread(book: pd.DataFrame, q: float) -&gt; float:
    """
    Relative (dimensionless) weighted&#8209;average bid&#8209;ask spread,
    as defined above. Multiply by the mid&#8209;price for absolute units.
    """
    a_q = _avg_exec_price(book, "ask", q)   # cost to BUY  q
    b_q = _avg_exec_price(book, "bid", q)   # proceeds to SELL q
    mid = (book.query("side == 'ask'")["price"].min() +
           book.query("side == 'bid'")["price"].max()) / 2
    return (a_q - b_q) / mid                # relative form (dimensionless)

###############################################################################
# 4.  QUICK DEMO (runs only when executed as a script, not on import)
###############################################################################
if __name__ == "__main__":
    book = random_orderbook()
    for q in [1_000, 5_000, 15_000]:
        print(f"q={q:&gt;6}:  wa&#8209;spread = {wa_spread(book, q):.4%}")</code></pre><p>Here&#8217;s the same for Rustaceans</p><p><code>src/lib.rs</code></p><pre><code>use rand::prelude::*;
<p>Here&#8217;s the same for Rustaceans.</p><p><code>src/lib.rs</code></p><pre><code>use rand::prelude::*;
use rand_distr::{Distribution, LogNormal};

/// One price/size point on a side of the book.
#[derive(Clone, Copy)]
pub struct Level {
    pub price: f64,
    pub size:  f64,
}

/// Sides we can walk.
#[derive(Clone, Copy)]
pub enum Side { Bid, Ask }

/// A synthetic order&#8209;book snapshot.
pub struct OrderBook {
    bids: Vec&lt;Level&gt;,   // sorted high &#8594; low
    asks: Vec&lt;Level&gt;,   // sorted low  &#8594; high
}

impl OrderBook {
    // ------------------------------------------------------------------------
    /// Create a random, *symmetric* order book around `mid`.
    pub fn random(
        mid: f64,
        tick: f64,
        spread_ticks: usize,
        depth_levels: usize,
        base_size: f64,
        depth_decay: f64,
        sigma_vol: f64,
        rng: &amp;mut impl Rng,
    ) -&gt; Self {
        let half_spread = (spread_ticks as f64 * tick) / 2.0;
        let lognorm = LogNormal::new(0.0, sigma_vol).unwrap();

        // Pre&#8209;allocate to avoid re&#8209;allocations
        let mut bids = Vec::with_capacity(depth_levels);
        let mut asks = Vec::with_capacity(depth_levels);

        for lvl in 0..depth_levels {
            let depth_fac = (depth_decay * lvl as f64).exp();
            let noise      = lognorm.sample(rng);
            let size       = base_size * depth_fac * noise;

            // Bid ladder: highest price first
            let bid_price  = mid - half_spread - tick * lvl as f64;
            bids.push(Level { price: bid_price, size });

            // Ask ladder: lowest price first
            let ask_price  = mid + half_spread + tick * lvl as f64;
            asks.push(Level { price: ask_price, size });
        }
        Self { bids, asks }
    }

    // ------------------------------------------------------------------------
    /// Internal helper: average execution price for buying/selling `qty`.
    fn avg_exec_price(&amp;self, side: Side, qty: f64) -&gt; Option&lt;f64&gt; {
        let book = match side { Side::Bid =&gt; &amp;self.bids, Side::Ask =&gt; &amp;self.asks };
        let mut remaining = qty;
        let mut vwap_num  = 0.0;

        for lvl in book {
            if remaining &lt;= 0.0 { break; }
            let take = remaining.min(lvl.size);
            vwap_num += lvl.price * take;
            remaining -= take;
        }
        if remaining &gt; 1e-9 {
            None // depth exhausted
        } else {
            Some(vwap_num / qty)
        }
    }

    // ------------------------------------------------------------------------
    /// Relative weighted&#8209;average spread for order size `q`.
    pub fn wa_spread(&amp;self, q: f64) -&gt; Option&lt;f64&gt; {
        let a_q = self.avg_exec_price(Side::Ask, q)?;
        let b_q = self.avg_exec_price(Side::Bid, q)?;
        let best_ask = self.asks.first()?.price;
        let best_bid = self.bids.first()?.price;
        let mid      = 0.5 * (best_ask + best_bid);
        Some((a_q - b_q) / mid) // dimensionless
    }
}</code></pre><p>A quick demo</p><p><code>src/main.rs</code></p><pre><code>use rand::SeedableRng;
use rand::rngs::StdRng;
use orderbook_spread::OrderBook;

fn main() {
    let mut rng = StdRng::seed_from_u64(42);

    let book = OrderBook::random(
        100.0,   // mid
        0.01,    // tick
        2,       // spread in ticks
        20,      // depth levels
        1_000.0, // size at top of book
        0.15,    // depth&#8209;decay
        0.5,     // log&#8209;normal sigma
        &amp;mut rng,
    );

    for q in [1_000.0, 5_000.0, 15_000.0] {
        match book.wa_spread(q) {
            Some(s) =&gt; println!("q = {:&gt;6.0}:  wa&#8209;spread = {:.4}%", q, 100.0 * s),
            None    =&gt; println!("q = {:&gt;6.0}:  not enough depth", q),
        }
    }
}</code></pre><p><code>Cargo.toml</code></p><pre><code>[package]
name = "orderbook_spread"
version = "0.1.0"
edition = "2021"

[dependencies]
rand = "0.8"
rand_distr = "0.4"</code></code></pre><p>Run and enjoy!</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://zerolag.club/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Zero Lag Club! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item></channel></rss>