Rust Practice: Maintaining Lighter Exchange Local Orderbook — A Note on Going from 220µs to 492ns
Often, when we want to use an exchange's data for local analysis or other tasks, we need to consume a continuous stream of market information over a Websocket and build an Orderbook locally. This article is a practice note from my perspective as a Rust newbie.
Raw data
The target exchange for this practice is Lighter. After establishing a Websocket connection, you can start subscribing by sending the following message:
{
  "type": "subscribe",
  "channel": "order_book/{MARKET_INDEX}"
}
A sample of the received data is shown below. After a successful subscription, the first message is a full Orderbook snapshot. It has been trimmed here; in the real subscribed/order_book snapshot, asks and bids each contain over 2,500 entries.
{
  "channel": "order_book:1",
  "offset": 12837513,
  "order_book": {
    "code": 0,
    "asks": [
      { "price": "87194.4", "size": "1.03736" },
      { "price": "87194.5", "size": "0.02980" }
    ],
    "bids": [
      { "price": "87194.0", "size": "0.00020" },
      { "price": "87191.6", "size": "0.03676" }
    ],
    "offset": 12837513,
    "nonce": 3901590619
  },
  "timestamp": 1766059985139,
  "type": "subscribed/order_book"
}
The structure of subsequent update/order_book data is similar to this:
{
  "channel": "order_book:1",
  "offset": 12837556,
  "order_book": {
    "code": 0,
    "asks": [
      { "price": "87208.2", "size": "0.00000" }
    ],
    "bids": [
      { "price": "87173.2", "size": "0.01389" }
    ],
    "offset": 12837556,
    "nonce": 3901590736
  },
  "timestamp": 1766059985417,
  "type": "update/order_book"
}
As you would expect, the first push is a full snapshot of the orderbook, and subsequent pushes are incremental updates. A size of "0.00000" means the corresponding price level must be deleted from the orderbook. In addition, you must strictly validate the auto-incrementing offset: if even a single message is missed, you have to disconnect and rebuild from a fresh snapshot.
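To make that concrete, here is a minimal sketch of the gap check, under one assumption of mine: that each message's offset is exactly the previous offset plus one (the actual contract is defined by Lighter's docs).

```rust
/// Hypothetical gap detector (names are mine, not from the Lighter API).
/// Assumes each push's `offset` must be exactly `last + 1`; anything
/// else means we missed data and must resync from a fresh snapshot.
#[derive(Default)]
pub struct OffsetGuard {
    last: Option<u64>,
}

impl OffsetGuard {
    /// Returns `true` if the update may be applied; `false` means
    /// disconnect, re-subscribe, and rebuild from the next snapshot.
    pub fn check(&mut self, offset: u64) -> bool {
        let ok = match self.last {
            None => true, // first message: the snapshot itself
            Some(last) => offset == last + 1,
        };
        if ok {
            self.last = Some(offset);
        }
        ok
    }
}

fn main() {
    let mut guard = OffsetGuard::default();
    assert!(guard.check(12837513)); // snapshot
    assert!(guard.check(12837514)); // contiguous update
    assert!(!guard.check(12837516)); // gap! time to resync
}
```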
Cargo run
Unless stated otherwise, all of the code below is run with plain cargo run, not cargo run --release.
First attempt: 220µs
As a novice, the more intuitive idea is definitely—let me just get it running first!
The schematic diagram is as follows (Gemini drawing):
+-----------------------------------------------------------------------+
| 🧠 Core Idea: "Intuitive Mapping" (Intuitive) |
| "Whatever the JSON looks like, I'll define that Struct and save it!" |
+-----------------------------------------------------------------------+
|
| 1. JSON Input (Incoming Data)
V
+--------------------------+
| JSON Object |
| { |
| "type": "...", | serde automatic deserialization
| "bids": [...], | ========================>
| "asks": [...] |
| } |
+--------------------------+
|
| 2. Rust Memory Layout
V
+-------------------------------------------------------------+
| struct LighterOrderBook (Root) |
+-------------------------------------------------------------+
| - channel: Option<String> |
| - offset: Option<u64> |
| - type_: String |
| - timestamp: Option<u64> |
| |
| [!] Nested Layer |
| - order_book: Option -------------------------------------> +
+-------------------------------------------------------------+
|
+---------------------------------------------------+
|
v
+-------------------------------------------------------------+
| struct LighterOrderBookData |
+-------------------------------------------------------------+
| - code: u32 |
| |
| [!] List -> Vec |
| - asks: Vec<LighterOrderBookEntry> ---+ |
| - bids: Vec<LighterOrderBookEntry> ---|---+ |
+-------------------------------------------------------------+
| |
+-----------------------------+ |
| |
v v
+----------------------------+ +----------------------------+
| Entry (Ask) | | Entry (Bid) |
+----------------------------+ +----------------------------+
| - price: String (Heap 🐢) | | - price: String (Heap 🐢) |
| - size: String (Heap 🐢) | | - size: String (Heap 🐢) |
+----------------------------+ +----------------------------+
^ ^
| |
+--- "Since it's a List, I'll save two Lists and update locally" ---+
So the immediate, brainless thought is to create a Rust struct that mirrors the JSON structure. Aren't `bids` and `asks` just two lists? Then I'll just store two lists and update them locally, done.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
pub struct LighterOrderBookEntry {
    pub price: String,
    pub size: String,
}

#[derive(Debug, Deserialize)]
pub struct LighterOrderBookData {
    pub code: u32,
    pub asks: Vec<LighterOrderBookEntry>,
    pub bids: Vec<LighterOrderBookEntry>,
}

#[derive(Debug, Deserialize)]
pub struct LighterOrderBook {
    pub channel: Option<String>,
    pub offset: Option<u64>,
    pub order_book: Option<LighterOrderBookData>,
    #[serde(rename = "type")]
    pub type_: String,
    pub timestamp: Option<u64>,
}
Then, for every piece of data received from the Websocket, we go and update our local Orderbook:
fn update_order_book_data(
    order_book_data: &mut LighterOrderBookData,
    updated_entries: &LighterOrderBookData,
) {
    for update_order in &updated_entries.asks {
        let new_size_float: f64 = update_order.size.parse().unwrap();
        if new_size_float == 0.0 {
            // size == "0.00000": delete this price level
            order_book_data
                .asks
                .retain(|order| order.price != update_order.price);
        } else if let Some(existing_order) = order_book_data
            .asks
            .iter_mut()
            .find(|order| order.price == update_order.price)
        {
            existing_order.size = update_order.size.clone();
        } else {
            order_book_data.asks.push(LighterOrderBookEntry {
                price: update_order.price.clone(),
                size: update_order.size.clone(),
            });
        }
    }
    for update_order in &updated_entries.bids {
        // bids work the same way
    }
}
Then let’s monitor the time consumption here:
let start_time = Instant::now();
update_order_book_data(order_book_data, update_order_book);
let duration = start_time.elapsed();

total_parsed_txs += 1;
total_parse_time_cost += duration;
let average_parse_time_cost = total_parse_time_cost / total_parsed_txs;

println!("update_order_book_data time cost: {:?}", duration);
println!(
    "update_order_book_data average time cost: {:?}",
    average_parse_time_cost
);
The results are as follows:
update_order_book_data average time cost: 225.065µs
update_order_book_data time cost: 181.152µs
update_order_book_data average time cost: 224.999µs
update_order_book_data time cost: 503.89µs
update_order_book_data average time cost: 225.416µs
update_order_book_data time cost: 115.778µs
update_order_book_data average time cost: 225.252µs
update_order_book_data time cost: 80.782µs
update_order_book_data average time cost: 225.037µs
update_order_book_data time cost: 79.84µs
update_order_book_data average time cost: 224.821µs
update_order_book_data time cost: 85.672µs
update_order_book_data average time cost: 224.614µs
update_order_book_data time cost: 320.204µs
update_order_book_data average time cost: 224.756µs
Here is one piece of good news and another piece of "good news". Good news #1 is that, for a Rust super-newbie like me, this code runs and the Orderbook updates correctly (confirmed via logs elsewhere). The second piece of "good news" is that an average of around 225µs is painfully slow, which means there is plenty of room to optimize!
BTreeMap refined: 20µs (4µs average)
But obviously, this implementation is very clumsy. We noted earlier that asks and bids each hold over 2,500 entries, so operations like retain and .iter_mut().find(), which traverse the whole Vec and shuffle memory on every single update, are pretty deadly.
So, let’s not be misled by the structure returned by the exchange. Let’s optimize our storage approach here. Instead of storing each Price-Size pair locally as an element of a Vec, we put them into a BTreeMap.
We will modify the storage structure to use the price as the key. At the same time, to avoid the pitfall that comparing prices as Strings does not order them numerically (which would break picking out the spread, i.e. the lowest ask and the highest bid), we introduce the ordered-float crate: f64 itself does not implement Ord (because of NaN), so it cannot be a BTreeMap key directly. Our local Orderbook definition is now as follows:
use std::collections::BTreeMap;

use ordered_float::OrderedFloat;
use serde::Deserialize;

#[derive(Debug, Deserialize, Default)]
pub struct FastLighterOrderBookData {
    pub code: u32,
    // Key: price, Value: size
    pub asks: BTreeMap<OrderedFloat<f64>, String>,
    pub bids: BTreeMap<OrderedFloat<f64>, String>,
}
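To see why String keys would mis-sort prices, here is a tiny std-only demonstration (the OrderedFloat key used above is the fix):

```rust
use std::collections::BTreeMap;

fn main() {
    // A String-keyed map sorts lexicographically: '9' > '1', so the
    // price "9.5" sorts AFTER "10.0" and the "lowest ask" is wrong.
    let asks: BTreeMap<&str, f64> = BTreeMap::from([("10.0", 1.0), ("9.5", 2.0)]);
    let lowest = *asks.keys().next().unwrap();
    assert_eq!(lowest, "10.0"); // numerically wrong "best ask"!

    // f64 can't be a key directly (it isn't Ord, because of NaN);
    // ordered-float's OrderedFloat<f64> wrapper supplies a total order,
    // which is why the article pulls in that crate.
    assert!(9.5_f64 < 10.0_f64);
}
```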
This way, our function for parsing + updating the local Orderbook can be significantly simplified:
fn update_order_book_data(
    local_order_book_data: &mut FastLighterOrderBookData,
    updated_entries: &LighterOrderBookData,
) {
    for update_order_entry in &updated_entries.asks {
        let new_size_float: f64 = update_order_entry.size.parse().unwrap();
        let new_price_float_key = OrderedFloat(update_order_entry.price.parse().unwrap());
        if new_size_float == 0.0 {
            local_order_book_data.asks.remove(&new_price_float_key);
        } else {
            local_order_book_data
                .asks
                .insert(new_price_float_key, update_order_entry.size.clone());
        }
    }
    for update_order_entry in &updated_entries.bids {
        let new_size_float: f64 = update_order_entry.size.parse().unwrap();
        let new_price_float_key = OrderedFloat(update_order_entry.price.parse().unwrap());
        if new_size_float == 0.0 {
            local_order_book_data.bids.remove(&new_price_float_key);
        } else {
            local_order_book_data
                .bids
                .insert(new_price_float_key, update_order_entry.size.clone());
        }
    }
}
No more iterating through arrays or copying memory. Let’s run it for a while and see the effect:
update_order_book_data average time cost: 18.807µs
update_order_book_data time cost: 13.515µs
update_order_book_data average time cost: 18.795µs
update_order_book_data time cost: 12.674µs
update_order_book_data average time cost: 18.782µs
update_order_book_data time cost: 33.072µs
update_order_book_data average time cost: 18.812µs
update_order_book_data time cost: 9.097µs
update_order_book_data average time cost: 18.792µs
update_order_book_data time cost: 27.842µs
update_order_book_data average time cost: 18.811µs
Alright, a 10x performance improvement just like that!
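One side benefit of the sorted map is worth spelling out: the BBO can be read straight off the two ends of the map. A minimal sketch, using integer prices-in-ticks as keys purely to keep the example dependency-free (the article's version keys by OrderedFloat<f64> in exactly the same way):

```rust
use std::collections::BTreeMap;

/// Best ask = first (lowest) key of asks; best bid = last (highest)
/// key of bids. Each lookup is O(log n) with no full scan.
fn bbo(
    asks: &BTreeMap<u64, f64>,
    bids: &BTreeMap<u64, f64>,
) -> (Option<(u64, f64)>, Option<(u64, f64)>) {
    let best_ask = asks.iter().next().map(|(p, s)| (*p, *s));
    let best_bid = bids.iter().next_back().map(|(p, s)| (*p, *s));
    (best_ask, best_bid)
}

fn main() {
    // Prices in 0.1 ticks, mirroring the snapshot sample above.
    let asks = BTreeMap::from([(871944, 1.03736), (871945, 0.02980)]);
    let bids = BTreeMap::from([(871940, 0.00020), (871916, 0.03676)]);
    let (best_ask, best_bid) = bbo(&asks, &bids);
    assert_eq!(best_ask, Some((871944, 1.03736)));
    assert_eq!(best_bid, Some((871940, 0.00020)));
}
```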
Ratio
The magnitude here is already quite small. To get more precise statistics and not be affected by the actual number of updates, let’s modify the statistics part of the code:
let update_order_book_total_length =
    update_order_book.asks.len() + update_order_book.bids.len();

let start_time = Instant::now();
update_order_book_data(&mut local_lighter_order_book, update_order_book);
let duration = start_time.elapsed();

let ratio_duration = duration / update_order_book_total_length.try_into().unwrap();
total_parsed_txs += 1;
total_parse_time_cost += ratio_duration;
let average_parse_time_cost = total_parse_time_cost / total_parsed_txs;

println!(
    "update_order_book_data weighted time cost: {:?}, actual total time: {:?}",
    ratio_duration, duration
);
println!(
    "update_order_book_data weighted average time cost: {:?}",
    average_parse_time_cost
);
Each measured time is divided by the number of bid/ask entries the update actually touched, yielding a weighted (per-entry) time cost. Re-running the code above, the data is as follows:
update_order_book_data weighted time cost: 5.48µs, actual total time: 5.48µs
update_order_book_data weighted average time cost: 3.994µs
update_order_book_data weighted time cost: 1.77µs, actual total time: 37.181µs
update_order_book_data weighted average time cost: 3.989µs
update_order_book_data weighted time cost: 1.38µs, actual total time: 48.311µs
update_order_book_data weighted average time cost: 3.982µs
update_order_book_data weighted time cost: 1.32µs, actual total time: 33.002µs
update_order_book_data weighted average time cost: 3.976µs
update_order_book_data weighted time cost: 1.188µs, actual total time: 30.898µs
update_order_book_data weighted average time cost: 3.969µs
String to f64: 12µs (2.9µs average)
Now there is another obvious place that can be optimized, which is here:
BTreeMap<OrderedFloat<f64>, String>
Since the data provided by the exchange are strings, I instinctively reached for String. However, as the Rust documentation puts it, "A String is a wrapper over a Vec<u8>." Its contents live on the heap, and for a program that modifies data at high frequency, allocating on the hot path is definitely not a good idea:
if you are allocating, you are losing
So, a small optimization here is to change everything to an f64 structure, as follows:
#[derive(Debug, Deserialize, Default)]
pub struct FastLighterOrderBookData {
    pub code: u32,
    // Key: price, Value: size
    pub asks: BTreeMap<OrderedFloat<f64>, f64>,
    pub bids: BTreeMap<OrderedFloat<f64>, f64>,
}
Then, every time WS data is received, convert String to f64 and store it. The running effect is now as follows:
update_order_book_data weighted time cost: 3.354µs, actual total time: 20.128µs
update_order_book_data weighted average time cost: 2.909µs
update_order_book_data weighted time cost: 2.995µs, actual total time: 5.991µs
update_order_book_data weighted average time cost: 2.91µs
update_order_book_data weighted time cost: 4.114µs, actual total time: 12.343µs
update_order_book_data weighted average time cost: 2.914µs
It steadily dropped further from 4µs to 2.9µs!
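The change itself is just moving the parse to the ingest boundary: each entry's price and size strings become f64 once, right as the WS message arrives, and nothing downstream touches a heap-allocated String. A sketch of that conversion (the LevelOp enum and parse_entry function are names I made up for illustration):

```rust
/// What one update entry means for the book, once parsed.
#[derive(Debug, PartialEq)]
enum LevelOp {
    Remove { price: f64 },            // size was "0.00000"
    Set { price: f64, size: f64 },    // upsert this level
}

/// Parse an exchange entry's "price"/"size" strings exactly once.
/// Returns None on malformed numbers instead of unwrap-panicking.
fn parse_entry(price: &str, size: &str) -> Option<LevelOp> {
    let price: f64 = price.parse().ok()?;
    let size: f64 = size.parse().ok()?;
    Some(if size == 0.0 {
        LevelOp::Remove { price }
    } else {
        LevelOp::Set { price, size }
    })
}

fn main() {
    // Entries from the update/order_book sample above.
    assert_eq!(
        parse_entry("87208.2", "0.00000"),
        Some(LevelOp::Remove { price: 87208.2 })
    );
    assert_eq!(
        parse_entry("87173.2", "0.01389"),
        Some(LevelOp::Set { price: 87173.2, size: 0.01389 })
    );
}
```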

Tick Table: 990ns average
Using BTreeMap has already vastly improved our speed, but the tree structure and pointers behind BTreeMap are still too slow. I asked Gemini to draw an ASCII chart to help everyone understand:
+------------------+
| Stack |
+------------------+
| root_ptr ----------------+
+------------------+ | (1. Pointer Jump)
v
+---------------------------+
| Heap Addr: 0x1000 (Root) | <--- 1st Memory Read
|---------------------------| (Time ~70ns)
| Keys: [88000, 89000...] |
| Children Ptrs: |
| [0x5000, 0x9000...] |
+------------+--------------+
|
| (2. Pointer Jump - Pointer Chasing)
v
+---------------------------+
| Heap Addr: 0x9000 (Node) | <--- 2nd Memory Read
|---------------------------| (Time ~70ns)
| Keys: [88700, 88800...] |
| Children Ptrs: |
| [0x2000, 0x3000...] |
+------------+--------------+
|
| (3. Pointer Jump - Pointer Chasing)
v
+---------------------------+
| Heap Addr: 0x3000 (Leaf) | <--- 3rd Memory Read
|---------------------------| (Time ~70ns)
| Key: 88797.9 |
| Value: 0.03268 | <--- Finally found data
+---------------------------+
When we look up the price 88797.9, the CPU must play “hopscotch,” jumping from one memory address to another completely unrelated memory address. Every Pointer Chasing can cause a Cache Miss, forcing the CPU to stop and wait for the slow main memory to respond.
The idea for further optimization now is to change the tree structure operation to an array operation—the Tick Table.
const TICK_SIZE_MULTIPLIER: f64 = 10.0; // corresponds to tick_size 0.1
const MAX_PRICE_CAPACITY: usize = 2_000_000;

#[derive(Debug, Deserialize, Default)]
pub struct FlatLighterOrderBookData {
    pub code: u32,
    // {"price":"88797.9","size":"0.03268"}
    // -> asks[887979] = 0.03268
    pub asks: Vec<f64>,
    pub bids: Vec<f64>,
    pub best_ask_idx: usize,
    pub best_bid_idx: usize,
}
The principle is as follows: multiply the price by TICK_SIZE_MULTIPLIER (the reciprocal of the tick size) and round it to an integer, which becomes the array index; the size is stored directly as the array value. For example, since we are handling BTC here and the exchange's prices are precise to 0.1 (see above), we multiply by 10 to get the index. At the same time, we maintain two cursors, best_ask_idx and best_bid_idx, so the BBO (Best Bid and Offer) is always at hand.
The schematic is as follows:
| Array Index (Index) | Restored Price (Price) | Asks Array (Size) | Bids Array (Size) | Cursor State (Cursors) |
|---|---|---|---|---|
| 887981 | 88798.1 | 0.55000 | 0.00000 | |
| 887980 | 88798.0 | 1.02000 | 0.00000 | ⬅️ best_ask_idx (Sell 1) |
| 887979 | 88797.9 | 0.00000 | 0.03268 | ⬅️ best_bid_idx (Buy 1) |
| 887978 | 88797.8 | 0.00000 | 0.15000 | |
| 887977 | 88797.7 | 0.00000 | 0.00000 | (Empty slot, size is 0) |
At this time, updating the price is simply updating directly within the array:
#[inline(always)]
fn price_to_index(price: f64) -> usize {
    // Adding 0.5 handles floating point precision error (epsilon),
    // e.g. 0.999999 -> 1.0
    (price * TICK_SIZE_MULTIPLIER + 0.5) as usize
}

pub fn update(&mut self, price: f64, size: f64, is_bid: bool) {
    let price_idx = Self::price_to_index(price);
    if price_idx >= MAX_PRICE_CAPACITY {
        eprintln!("Error: Price {} out of bounds", price);
        return;
    }
    if is_bid {
        self.bids[price_idx] = size;
        if size > 0.0 {
            if price_idx > self.best_bid_idx {
                self.best_bid_idx = price_idx;
            }
        } else if price_idx == self.best_bid_idx {
            // The best bid was just deleted (size == 0 at best_bid_idx):
            // scan downward for the next non-empty level.
            let mut scan_price_idx = price_idx;
            loop {
                if scan_price_idx == 0 {
                    self.best_bid_idx = 0; // liquidity is gone
                    break;
                }
                scan_price_idx -= 1;
                if self.bids[scan_price_idx] > 0.0 {
                    self.best_bid_idx = scan_price_idx;
                    break;
                }
            }
        }
    } else {
        self.asks[price_idx] = size;
        // The rest is similar to bids above
    }
}
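For completeness, the elided asks branch mirrors the bids branch with the scan direction flipped. Below is my sketch of it as a standalone function; note it assumes best_ask_idx starts at the top of the array as an "empty book" sentinel, which a Default-derived struct would not do on its own:

```rust
/// Sketch of the asks branch (my code, not the article's). The
/// "no asks" sentinel is `asks.len() - 1`, so any real ask improves it.
fn update_ask(asks: &mut [f64], best_ask_idx: &mut usize, price_idx: usize, size: f64) {
    asks[price_idx] = size;
    if size > 0.0 {
        if price_idx < *best_ask_idx {
            *best_ask_idx = price_idx; // a new lowest ask
        }
    } else if price_idx == *best_ask_idx {
        // The best ask was just deleted: scan upward for the next level.
        let mut i = price_idx;
        loop {
            if i == asks.len() - 1 {
                *best_ask_idx = asks.len() - 1; // ask liquidity is gone
                break;
            }
            i += 1;
            if asks[i] > 0.0 {
                *best_ask_idx = i;
                break;
            }
        }
    }
}

fn main() {
    let mut asks = vec![0.0_f64; 100];
    let mut best = 99; // sentinel: no asks yet
    update_ask(&mut asks, &mut best, 42, 1.5);
    assert_eq!(best, 42);
    update_ask(&mut asks, &mut best, 40, 0.3);
    assert_eq!(best, 40); // lower ask becomes best
    update_ask(&mut asks, &mut best, 40, 0.0); // delete the best ask
    assert_eq!(best, 42); // scanned up to the next live level
}
```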
Let’s run it again to see the time consumption this time.
update_order_book_data weighted time cost: 618ns, actual total time: 2.475µs
update_order_book_data weighted average time cost: 959ns
update_order_book_data weighted time cost: 1.245µs, actual total time: 4.98µs
update_order_book_data weighted average time cost: 963ns
update_order_book_data weighted time cost: 2.274µs, actual total time: 2.274µs
update_order_book_data weighted average time cost: 978ns
update_order_book_data weighted time cost: 1.132µs, actual total time: 2.264µs
update_order_book_data weighted average time cost: 980ns
update_order_book_data weighted time cost: 1.613µs, actual total time: 1.613µs
update_order_book_data weighted average time cost: 987ns
update_order_book_data weighted time cost: 663ns, actual total time: 2.655µs
update_order_book_data weighted average time cost: 984ns
update_order_book_data weighted time cost: 1.703µs, actual total time: 1.703µs
update_order_book_data weighted average time cost: 992ns
We have further reduced the update time to around 990ns!
Finally, let’s cargo run --release and check the results:
update_order_book_data weighted time cost: 576ns, actual total time: 4.608µs
update_order_book_data weighted average time cost: 477ns
update_order_book_data weighted time cost: 1.052µs, actual total time: 1.052µs
update_order_book_data weighted average time cost: 483ns
update_order_book_data weighted time cost: 127ns, actual total time: 1.273µs
update_order_book_data weighted average time cost: 479ns
update_order_book_data weighted time cost: 360ns, actual total time: 1.443µs
update_order_book_data weighted average time cost: 478ns
update_order_book_data weighted time cost: 1.372µs, actual total time: 1.372µs
update_order_book_data weighted average time cost: 488ns
update_order_book_data weighted time cost: 947ns, actual total time: 1.894µs
update_order_book_data weighted average time cost: 492ns
492ns!
Summary
We went step by step from mimicking the exchange's asks/bids arrays, to switching to a BTreeMap, to the small String -> f64 optimization, and finally to the Tick Table. Starting from a program that merely looked usable, I picked up a fair amount of Rust along the way (plus some things, like the Tokio runtime, that did not make it into this article).
Of course, we also have to look back occasionally and ask how much value the optimization really has. After all, per https://apidocs.lighter.xyz/docs/account-types:
Premium Account (Opt-in) – suitable for HFT, the lowest latency on Lighter.
- Fees: 0.002% maker, 0.02% taker
- Maker/cancel latency: 0ms
- Taker latency: 150ms
- Part of the volume quota program

Standard Account (Default) – suitable for retail and latency-insensitive traders.
- Fees: 0 maker / 0 taker
- Taker latency: 300ms
- Maker latency: 200ms
- Cancel latency: 100ms
How much?! 150ms?! Suitable for HFT?!
