mne_rt.protocols.RLProtocol

class mne_rt.protocols.RLProtocol(direction: str = 'up', initial_threshold: float = 0.0, target_hit_rate: float = 0.7, lr: float = 0.05, epsilon: float = 0.05, smoothing: float = 0.0, history_len: int = 50, warmup_windows: int = 20, rng_seed: int | None = None)

Bases: object

Adaptive neurofeedback (NF) protocol with reinforcement-learning threshold updates.

Adjusts the decision threshold after every evaluation to maintain a target hit rate. For direction="up" the update rule is as follows (the sign is flipped for direction="down"; see Notes):

threshold += lr * (hit_rate - target_hit_rate) * running_std

Unlike ThresholdProtocol (which also has an adaptive mode), this protocol tracks a rolling hit rate over a fixed-length window and scales each update by the running standard deviation of recent values. It can also apply epsilon-greedy exploration: with probability epsilon, a reward is given regardless of the threshold. Exploration trials do not count toward the hit rate used for threshold updates.

During the first warmup_windows calls to evaluate() the threshold is frozen and crossed is always False.
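To make the update rule concrete, here is a minimal sketch of one update step for direction="up". It is not the library's implementation: the helper rl_update and the deque-based histories are hypothetical, the use of the population standard deviation is an assumption (the documentation does not say which estimator is used), and warmup and exploration are left out.

```python
import statistics
from collections import deque


def rl_update(threshold, recent_values, recent_hits, lr=0.05, target_hit_rate=0.7):
    """One step of the documented rule for direction="up" (hypothetical helper).

    Both histories are assumed to be non-empty.
    """
    hit_rate = sum(recent_hits) / len(recent_hits)
    # Assumption: population standard deviation of the recent values.
    running_std = statistics.pstdev(recent_values)
    return threshold + lr * (hit_rate - target_hit_rate) * running_std


# Fixed-length rolling windows (maxlen plays the role of history_len).
recent_values = deque([0.2, 0.6, 0.8, 1.0, 1.4], maxlen=50)
recent_hits = deque([v > 0.5 for v in recent_values], maxlen=50)  # 4 of 5 are hits

# The hit rate (0.8) is above the target (0.7), so the threshold rises slightly.
new_threshold = rl_update(0.5, recent_values, recent_hits)
```

Because the step is scaled by the running standard deviation, the same lr yields proportionally larger moves for a noisier feature.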

Parameters:
direction : {"up", "down"}

"up": reward when value > threshold (e.g., enhance alpha power). "down": reward when value < threshold (e.g., suppress beta power). Default is "up".

initial_threshold : float

Starting decision threshold. Default is 0.0.

target_hit_rate : float

Desired proportion of non-exploration windows that cross the threshold. Must be strictly in (0, 1). Default is 0.70.

lr : float

Learning rate for threshold updates. Must be > 0. Default is 0.05.

epsilon : float

Exploration probability: on each call to evaluate() after warmup, the chance that a reward is given regardless of the threshold. Must be in [0, 1). Default is 0.05.

smoothing : float

EMA smoothing coefficient applied to the raw input before thresholding. Must be in [0, 1). 0.0 disables smoothing. Applied as: smoothed = (1 - smoothing) * new + smoothing * prev. Default is 0.0.

history_len : int

Rolling-window length for hit-rate and running-std estimation. Must be >= 10. Default is 50.

warmup_windows : int

Number of initial evaluations used solely to seed the rolling statistics before any reward can be issued or any threshold update is applied. Must be >= 1. Default is 20.

rng_seed : int | None

Seed for the NumPy random generator used for epsilon draws. Default is None (non-deterministic).

Raises:
ValueError

If any parameter is outside its valid range.
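The documented ranges can be summarised as a set of checks. The sketch below only illustrates those ranges: check_rl_params is a hypothetical helper, not part of mne_rt, and the error messages are invented for the example.

```python
def check_rl_params(target_hit_rate=0.7, lr=0.05, epsilon=0.05,
                    smoothing=0.0, history_len=50, warmup_windows=20):
    """Raise ValueError if a value falls outside its documented range."""
    if not 0.0 < target_hit_rate < 1.0:
        raise ValueError("target_hit_rate must be strictly in (0, 1)")
    if not lr > 0.0:
        raise ValueError("lr must be > 0")
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must be in [0, 1)")
    if not 0.0 <= smoothing < 1.0:
        raise ValueError("smoothing must be in [0, 1)")
    if history_len < 10:
        raise ValueError("history_len must be >= 10")
    if warmup_windows < 1:
        raise ValueError("warmup_windows must be >= 1")


# The documented defaults pass all checks.
check_rl_params()
```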

Notes

The update rule is direction-aware: when direction="up" a higher threshold raises difficulty; when direction="down" a lower threshold raises difficulty. The sign of the update is therefore flipped for "down" protocols.
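The sign flip described in this note can be sketched as follows. This is an illustration under the assumption that the flip is a simple multiplication by -1 for direction="down"; directed_update is a hypothetical helper, not a library function.

```python
def directed_update(threshold, hit_rate, target_hit_rate, lr, running_std, direction):
    """Apply the update rule with a direction-dependent sign (hypothetical helper)."""
    sign = 1.0 if direction == "up" else -1.0
    return threshold + sign * lr * (hit_rate - target_hit_rate) * running_std


# Hit rate above target: the task should get harder in both directions.
up = directed_update(0.5, 0.9, 0.7, lr=0.05, running_std=2.0, direction="up")
down = directed_update(0.5, 0.9, 0.7, lr=0.05, running_std=2.0, direction="down")
# "up" raises the threshold (harder to exceed);
# "down" lowers it (harder to fall below).
```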

Examples

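RL-adaptive alpha-up protocol targeting a 70% hit rate:

from mne_rt.protocols import RLProtocol

proto = RLProtocol(
    direction="up",
    initial_threshold=0.5,
    target_hit_rate=0.70,
    lr=0.05,
    epsilon=0.05,
)
# nf_stream (an iterable of float NF values) and send_reward are supplied by the caller.
for value in nf_stream:
    crossed, magnitude = proto.evaluate(value)
    if crossed:  # True whenever a reward is issued, including exploration rewards
        send_reward(magnitude)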

Added in version 1.0.0.

__init__(direction: str = 'up', initial_threshold: float = 0.0, target_hit_rate: float = 0.7, lr: float = 0.05, epsilon: float = 0.05, smoothing: float = 0.0, history_len: int = 50, warmup_windows: int = 20, rng_seed: int | None = None) -> None

Methods

__init__([direction, initial_threshold, ...])

evaluate(value)

Evaluate one NF value and return (crossed, magnitude).

reset()

Reset all adaptive state to initial conditions.

Attributes

hit_rate

Rolling hit rate over non-exploration evaluations (0–1).

n_evaluated

Total number of evaluations since init or last reset().

n_explored

Number of exploration trials (epsilon draws) since init or reset.

threshold

Current decision threshold.

evaluate(value: float) -> tuple[bool, float]

Evaluate one NF value and return (crossed, magnitude).

Applies optional EMA smoothing, checks the current threshold, draws for epsilon-greedy exploration, updates the rolling hit history (exploration draws excluded), and then applies the RL threshold update. During warmup, all rewards and threshold updates are suppressed.
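The smoothing step uses the formula given for the smoothing parameter, smoothed = (1 - smoothing) * new + smoothing * prev. A minimal sketch of that step, assuming prev is the previous smoothed value and that the first sample passes through unchanged (the documentation does not state how the first value is seeded); ema_step is a hypothetical helper:

```python
def ema_step(new, prev, smoothing):
    """One EMA step; prev is None before the first sample (assumption)."""
    if prev is None:
        return new
    return (1.0 - smoothing) * new + smoothing * prev


smoothed = None
trace = []
for raw in [1.0, 0.0, 0.0]:
    smoothed = ema_step(raw, smoothed, smoothing=0.5)
    trace.append(smoothed)
# trace is [1.0, 0.5, 0.25]: the initial spike decays instead of vanishing at once.
```

With smoothing=0.0 the formula reduces to smoothed = new, which is why 0.0 disables smoothing.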

Parameters:
value : float

Current NF feature value.

Returns:
crossed : bool

True if a reward is issued. May be True due to exploration even when the threshold was not crossed. Always False during warmup.

magnitude : float

Absolute distance from the current threshold, normalised by the running standard deviation. 0.0 when not rewarded.
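As a sketch of how the returned magnitude could be computed from the description above: reward_magnitude is a hypothetical helper, and returning 0.0 when the running standard deviation is zero is an assumption, since the documentation does not cover that case.

```python
def reward_magnitude(value, threshold, running_std, rewarded):
    """Distance from the threshold in units of running std (hypothetical helper)."""
    if not rewarded:
        return 0.0  # documented: 0.0 when not rewarded
    if running_std <= 0.0:
        return 0.0  # assumption: avoid dividing by zero
    return abs(value - threshold) / running_std


# A value 1.0 above the threshold, with a running std of 2.0,
# is half a standard deviation away.
magnitude = reward_magnitude(1.5, 0.5, running_std=2.0, rewarded=True)
```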

Notes

Exploration trials (where the reward is given due to epsilon-greedy) are counted in n_explored but are not recorded in the hit history used for the threshold-update rule.
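The bookkeeping described here can be sketched as below. This is an illustration, not the library's code: run_trials is a hypothetical helper, the standard-library random module stands in for the NumPy generator to keep the example self-contained, the threshold is held fixed, and every epsilon draw is treated as an exploration trial whether or not the value would have crossed the threshold (an assumption).

```python
import random
from collections import deque


def run_trials(values, threshold, epsilon, seed=0, history_len=50):
    """Return (rewards, hit_rate, n_explored) for direction="up" with a fixed threshold."""
    rng = random.Random(seed)
    hits = deque(maxlen=history_len)  # rolling hit history, exploration excluded
    n_explored = 0
    rewards = []
    for value in values:
        if rng.random() < epsilon:
            # Exploration trial: rewarded and counted, but not recorded in hits.
            n_explored += 1
            rewards.append(True)
            continue
        crossed = value > threshold
        hits.append(crossed)
        rewards.append(crossed)
    # The hit rate is 0.0 until a non-exploration evaluation is recorded.
    hit_rate = sum(hits) / len(hits) if hits else 0.0
    return rewards, hit_rate, n_explored


# With epsilon=0.0 no exploration occurs, so every trial enters the hit history.
rewards, hit_rate, n_explored = run_trials([0.2, 0.8, 0.9, 0.1], threshold=0.5, epsilon=0.0)
```

Keeping exploration rewards out of the history means the threshold update responds only to rewards that were actually earned.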

property hit_rate: float

Rolling hit rate over non-exploration evaluations (0–1).

Returns 0.0 before any non-exploration evaluations are recorded.

property n_evaluated: int

Total number of evaluations since init or last reset().

property n_explored: int

Number of exploration trials (epsilon draws) since init or reset.

reset() -> None

Reset all adaptive state to initial conditions.

Restores the threshold to initial_threshold, clears the rolling histories, and resets the counters and the smoothed value. All constructor parameters (lr, epsilon, target_hit_rate, etc.) are preserved.

property threshold: float

Current decision threshold.