Why Your Feature Store Has a Freshness Ceiling

Dani Lang - Director of Product Marketing
Linda Zhou - Marketing Manager
March 5, 2026

Intro

A fraud model has milliseconds to decide whether something is legitimate. A signup could be genuine or automated. A transaction could be a real purchase or a stolen card. To make that call, the model pulls together signals: device fingerprints, behavioral patterns, network risk data. In machine learning systems, these signals are features.

Features are typically managed by a feature store: the infrastructure that computes, stores, and serves them to models in production. But the signals that features represent change constantly. A device fingerprint that was valid an hour ago may no longer reflect the current session.

What is feature freshness?

Feature freshness is the gap between when the underlying data changes and when an updated feature is available at inference time. In contexts like fraud detection, even 100 milliseconds of staleness can mean the difference between catching a fake account and letting it through.

Feature freshness directly impacts the quality of predictions in systems that rely on real-time signals.

Examples of freshness-sensitive ML applications include:

  • Fraud detection, where a decision must reflect the current transaction and session
  • Signup and account abuse, where automated accounts need to be caught at the moment they are created
  • Credit underwriting and lending, where decisions depend on up-to-date income and bureau data

Freshness as an architecture problem

Most teams, when they notice stale features, look at the serving layer first. They tighten freshness windows, shorten sync intervals between offline and online stores, and run batch pipelines more frequently. These are reasonable instincts since the serving layer is the most visible and most tunable part of the system.

But this treats freshness as a configuration problem: a matter of tuning settings without examining the assumptions baked into the architecture itself. If tuning settings do not change the value your model sees, your architecture has a freshness ceiling.

To understand why those fixes hit a ceiling, let's look at how most feature stores actually work.

Where traditional feature stores fall short on freshness

In a typical ML system, getting a feature to a model involves a long chain of steps. Raw data gets extracted from source systems, transformed, aggregated, and written to an offline store. A batch pipeline, usually running on a schedule (say hourly or daily), handles this processing. The results are then synced to an online store, which is what the model actually reads from at inference time.

The feature store, in most implementations, sits at the end of this chain: primarily a storage and serving layer that returns pre-computed values produced by upstream systems.

This is a storage-first, or cache-first, architecture. It carries a set of assumptions:

  • All data must be moved to one place before features can be computed.
  • Features are computed in batch jobs, not at serving time.
  • Values must be materialized and stored before they can be served.
  • Training and inference run on separate infrastructure, with feature logic often rewritten between the two.

These assumptions create limitations that no amount of tuning can remove.

  1. There is a ceiling on freshness you cannot exceed. The online store can only serve what has already been materialized from the most recent pipeline run. Tightening staleness settings does not make the value fresher. It simply re-reads the same data. If the upstream batch has not run, the feature cannot reflect new changes. The architecture defines the freshness ceiling.
  2. Pushing freshness lower means escalating complexity. When serving-layer fixes aren't enough, teams move upstream: shorter batch intervals, streaming pipelines layered onto batch, synchronization logic, and monitoring drift between online and offline stores. Each workaround adds cost and operational burden.
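To put rough numbers on the ceiling: in a batch architecture, worst-case staleness is approximately the batch interval plus the pipeline runtime plus the sync delay. The figures below are illustrative, not a benchmark.

```python
def worst_case_staleness_s(batch_interval_s: float,
                           pipeline_runtime_s: float,
                           sync_delay_s: float) -> float:
    """An event landing just after a run begins waits out the full interval,
    then the next run's processing time, then the sync to the online store."""
    return batch_interval_s + pipeline_runtime_s + sync_delay_s

# Hourly batch, 10-minute pipeline, 1-minute sync: ~71 minutes worst case.
# No serving-layer staleness setting changes this number.
ceiling = worst_case_staleness_s(3600, 600, 60)
```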

What freshness requires

To minimize the lag between when source data changes and when updated features are available at inference, you need to move beyond tuning the last mile:

  • Compute where the data lives. Rather than waiting for data to be extracted, staged, and moved through a pipeline before any computation can begin, compute features at the source.
  • Support on-demand computation. The system should be able to execute feature logic at query time, rather than only returning whatever was last written to the store.
  • Treat materialization as an optimization, not a requirement. Cache when it helps latency. Do not depend on it for correctness.

This is the shift from storage-first to compute-first architecture. In a compute-first system, freshness is bounded by source latency and execution time, not by materialization schedules.

Tradeoffs

When designing a feature store for real-time ML, there are a few tradeoffs worth thinking through.

Materialized vs. on-demand

Pre-computing features can be faster to serve but inherently stale. On-demand computation is fresh but adds compute at query time.

Most systems force this choice at the architecture level. A compute-first system lets you make it per feature.

A transaction sum can be materialized into time-bucketed aggregations for performance, while a continuous buffer fills in the gap between the last backfill and now with live data.
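A hedged sketch of that hybrid, assuming 10-minute buckets keyed by their start timestamp (buckets straddling the window edge are simply dropped here; a real system would handle partial overlap):

```python
def windowed_count(bucket_counts: dict[float, int],
                   live_event_ts: list[float],
                   now: float,
                   window_s: float,
                   bucket_s: float = 600.0) -> int:
    """Pre-materialized bucket totals inside the window, plus a live tail
    of raw events newer than the last completed bucket."""
    window_start = now - window_s
    last_complete = (now // bucket_s) * bucket_s  # buckets before this are final
    total = sum(count for start, count in bucket_counts.items()
                if start >= window_start and start + bucket_s <= last_complete)
    # Live tail: events since the last complete bucket, still inside the window.
    total += sum(1 for t in live_event_ts
                 if last_complete <= t <= now and t >= window_start)
    return total
```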

Latency vs. correctness

Serving a cached value is fast. Serving the right value may take a few more milliseconds. In domains like fraud, abuse, and underwriting, correctness wins.

Batch, streaming, and real-time

Some features don't need real-time freshness. A 30-day aggregate can be batch-computed daily. Others need to reflect changes within minutes, making them natural fits for streaming. And some must be computed at the moment of inference.

The architecture should support all three without requiring separate systems or duplicated logic.
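One way to express that is a per-feature freshness policy inside a single system. The feature names and tier assignments below are illustrative:

```python
from enum import Enum

class Freshness(Enum):
    BATCH = "batch"          # recomputed on a schedule (e.g. daily)
    STREAMING = "streaming"  # updated within minutes as events arrive
    ON_DEMAND = "on_demand"  # computed at the moment of inference

# Hypothetical per-feature policy: one system, three tiers, no duplicated logic.
FEATURE_POLICY = {
    "spend_30d_avg": Freshness.BATCH,        # slow-moving 30-day aggregate
    "txn_count_1h": Freshness.STREAMING,     # minutes-level freshness
    "device_risk_now": Freshness.ON_DEMAND,  # must reflect the current session
}

def serving_strategy(feature: str) -> Freshness:
    return FEATURE_POLICY[feature]
```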

How Chalk approaches freshness

Chalk is a feature store built around computation, sometimes called a feature engine. Where traditional feature stores are storage-first, Chalk is compute-first.

It stores, serves, and reuses features across training and production. But it can also compute features on demand at query time. When you query a feature, Chalk can execute the function directly, traverse dependencies, fetch fresh data, and run transformations on the fly.

Several architectural choices make this possible.

Source-agnostic. Chalk connects directly to underlying data sources, whether that's a database, an API, or a Kafka stream. Data doesn't need to be staged or pre-loaded before features can be computed.

Federated. Chalk fetches data where it lives instead of requiring centralization. A lending decision might combine internal transaction history with real-time income data from Plaid and bureau data from TransUnion in a single query.

[Diagram: Chalk architecture]

Unified online and offline definitions. The same Python feature definitions run in both training and production contexts. This eliminates training-serving skew by design.

On-demand computation. Chalk's query planner dynamically builds an optimized execution plan based on feature dependencies and available sources at query time, rather than relying on what was last written to the store.
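The planning step can be sketched as a dependency-graph walk: collect only the subgraph the requested feature needs, then order it for execution. The graph below is hypothetical, and the sketch omits optimization entirely:

```python
from graphlib import TopologicalSorter

# Hypothetical feature dependency graph: each feature lists the inputs it needs.
DEPS: dict[str, set[str]] = {
    "fraud_score": {"txn_velocity", "device_risk"},
    "txn_velocity": {"raw_transactions"},
    "device_risk": {"raw_device_events"},
    "raw_transactions": set(),
    "raw_device_events": set(),
}

def execution_plan(target: str) -> list[str]:
    """Walk only the subgraph the target needs, in dependency order."""
    needed: dict[str, set[str]] = {}
    stack = [target]
    while stack:
        node = stack.pop()
        if node not in needed:
            needed[node] = DEPS[node]
            stack.extend(DEPS[node])
    return list(TopologicalSorter(needed).static_order())
```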

[Diagram: Chalk's query planner]

Materialization when it helps, on-demand when it matters. Chalk can pre-compute and cache where latency requires it, while always retaining the ability to compute fresh values when correctness demands it.

The result is a system where materialization is an optimization, not the source of truth.

transaction_count: Windowed[int] = windowed(
    "30m", "12h", "6d",
    # Pre-aggregate into 10-minute buckets for fast serving.
    materialization={"bucket_duration": "10m"},
    # Count transactions between the window start and now.
    expression=_.transactions[
        _.timestamp <= _.chalk_now,
        _.timestamp > _.chalk_window
    ].count(),
)

For example, a rolling 30-minute transaction count can be partially materialized for performance, with streaming data from Kafka updating the aggregation as new events arrive.

The full path from source data to served feature runs in single-digit milliseconds, even across heterogeneous sources.

Where computation happens defines your freshness ceiling

When computation and serving are unified rather than separated by layers of ETL, freshness stops being something you fight for and becomes something the architecture enables by default.

If tightening your freshness window or lowering staleness does not change your feature values, your architecture is telling you something.

If you are evaluating how your feature platform handles real-time decisions, start by asking: Where does computation happen? If the answer is “upstream in a batch job,” you already know where your freshness ceiling lives.

To go deeper into how compute-first feature systems work in practice, explore our Architectures and reach out to our FDEs.

Want to stay up-to-date with Chalk?

Subscribe for updates on what we’re building (and shipping!) at Chalk