mne_rt.protocols.RLProtocol

class mne_rt.protocols.RLProtocol(direction: str = 'up', initial_threshold: float = 0.0, target_hit_rate: float = 0.7, lr: float = 0.05, epsilon: float = 0.05, smoothing: float = 0.0, history_len: int = 50, warmup_windows: int = 20, rng_seed: int | None = None)

Bases: object

Adaptive NF protocol with reinforcement-learning threshold updates.

Adjusts the decision threshold after every evaluation to maintain a target hit rate using the update rule:

threshold += lr * (hit_rate - target_hit_rate) * running_std

Unlike ThresholdProtocol (which also has an adaptive mode), this protocol tracks a rolling hit rate in a fixed-length window, scales updates by the running standard deviation of recent values, and optionally applies epsilon-greedy exploration: with probability epsilon a reward is given regardless of the threshold. Exploration trials do not count toward the hit-rate used for threshold updates.

During the first warmup_windows calls to evaluate() the threshold is frozen and crossed is always False.

Parameters:

direction ({"up", "down"}): "up" -> reward when value > threshold (e.g., enhance alpha power). "down" -> reward when value < threshold (e.g., suppress beta power). Default is "up".
initial_threshold (float): Starting decision threshold. Default is 0.0.
target_hit_rate (float): Desired proportion of non-exploration windows that cross the threshold. Must be strictly in (0, 1). Default is 0.70.
lr (float): Learning rate for threshold updates. Must be > 0. Default is 0.05.
epsilon (float): Exploration probability: the chance, on each post-warmup call to evaluate(), of issuing a reward regardless of the threshold. Must be in [0, 1). Default is 0.05.
smoothing (float): EMA smoothing coefficient applied to the raw input before thresholding. Must be in [0, 1). 0.0 disables smoothing. Applied as: smoothed = (1 - smoothing) * new + smoothing * prev. Default is 0.0.
history_len (int): Rolling-window length for hit-rate and running-std estimation. Must be >= 10. Default is 50.
warmup_windows (int): Number of initial evaluations used solely to seed the rolling statistics before any reward can be issued or any threshold update is applied. Must be >= 1. Default is 20.
rng_seed (int | None): Seed for the NumPy random generator used for epsilon draws. Default is None (non-deterministic).
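
The smoothing formula given for the smoothing parameter can be illustrated with a small standalone sketch. The helper name ema_smooth is hypothetical and is not part of mne_rt; the sketch also assumes that the first sample passes through unchanged, which the documentation above does not specify.

```python
def ema_smooth(values, smoothing=0.0):
    """Apply smoothed = (1 - smoothing) * new + smoothing * prev to a sequence."""
    if not 0.0 <= smoothing < 1.0:
        raise ValueError("smoothing must be in [0, 1)")
    smoothed = []
    prev = None  # assumption: the first sample is used as-is
    for new in values:
        if prev is None:
            current = new
        else:
            current = (1.0 - smoothing) * new + smoothing * prev
        smoothed.append(current)
        prev = current
    return smoothed


raw = [1.0, 0.0, 0.0, 1.0]
print(ema_smooth(raw, smoothing=0.0))  # smoothing disabled: values pass through
print(ema_smooth(raw, smoothing=0.5))  # each output mixes in half of the previous one
```

Larger smoothing values weight the previous smoothed value more heavily, so the thresholded signal reacts more slowly to sudden changes in the raw feature.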

Raises:

ValueError: If any parameter is outside its valid range.
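
As a rough illustration of the ranges listed under Parameters, the checks could look like the following. This is a sketch only: validate_rl_params and is_valid are hypothetical names, and the actual error messages and check order in mne_rt may differ.

```python
def validate_rl_params(direction="up", target_hit_rate=0.7, lr=0.05,
                       epsilon=0.05, smoothing=0.0, history_len=50,
                       warmup_windows=20):
    """Raise ValueError if any argument violates its documented range."""
    if direction not in ("up", "down"):
        raise ValueError(f"direction must be 'up' or 'down', got {direction!r}")
    if not 0.0 < target_hit_rate < 1.0:
        raise ValueError("target_hit_rate must be strictly in (0, 1)")
    if not lr > 0.0:
        raise ValueError("lr must be > 0")
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must be in [0, 1)")
    if not 0.0 <= smoothing < 1.0:
        raise ValueError("smoothing must be in [0, 1)")
    if history_len < 10:
        raise ValueError("history_len must be >= 10")
    if warmup_windows < 1:
        raise ValueError("warmup_windows must be >= 1")


def is_valid(**params):
    """Convenience wrapper: True if the arguments pass validation."""
    try:
        validate_rl_params(**params)
    except ValueError:
        return False
    return True


print(is_valid())                     # defaults from the class signature
print(is_valid(target_hit_rate=1.0))  # upper bound is excluded
print(is_valid(history_len=9))        # below the documented minimum
```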

Notes

The update rule is direction-aware. With direction="up", raising the threshold increases difficulty; with direction="down", lowering it does. The sign of the update is therefore flipped for "down" protocols, so a hit rate above target lowers the threshold instead of raising it.
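
A minimal sketch of one direction-aware update step, based on the rule stated above. The function rl_threshold_update is hypothetical and only illustrates the arithmetic; it is not the library's implementation.

```python
def rl_threshold_update(threshold, hit_rate, target_hit_rate, lr,
                        running_std, direction="up"):
    """Return the threshold after one update step."""
    step = lr * (hit_rate - target_hit_rate) * running_std
    if direction == "up":
        # Too many hits: raise the threshold to make rewards harder.
        return threshold + step
    if direction == "down":
        # Sign flipped: too many hits lowers the threshold instead.
        return threshold - step
    raise ValueError("direction must be 'up' or 'down'")


# Hit rate (0.9) exceeds the target (0.7), so difficulty increases either way.
up = rl_threshold_update(0.5, 0.9, 0.7, lr=0.05, running_std=2.0, direction="up")
down = rl_threshold_update(0.5, 0.9, 0.7, lr=0.05, running_std=2.0, direction="down")
print(up > 0.5, down < 0.5)
```

Scaling the step by running_std keeps the update proportional to the spread of recent values, so the same lr behaves similarly for features on different scales.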

Examples

RL-adaptive alpha-up protocol targeting 70 % hit rate:

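from mne_rt.protocols import RLProtocol

proto = RLProtocol(
    direction="up",
    initial_threshold=0.5,
    target_hit_rate=0.70,
    lr=0.05,
    epsilon=0.05,
)
# nf_stream and send_reward are placeholders for your own
# feature source and feedback callback.
for value in nf_stream:
    crossed, magnitude = proto.evaluate(value)
    if crossed:  # always False during the first warmup_windows calls
        send_reward(magnitude)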

Added in version 1.0.0.

__init__(direction: str = 'up', initial_threshold: float = 0.0, target_hit_rate: float = 0.7, lr: float = 0.05, epsilon: float = 0.05, smoothing: float = 0.0, history_len: int = 50, warmup_windows: int = 20, rng_seed: int | None = None) → None

Methods

`__init__`([direction, initial_threshold, ...])
`evaluate`(value)	Evaluate one NF value and return (crossed, magnitude).
`reset`()	Reset all adaptive state to initial conditions.

Attributes

`hit_rate`	Rolling hit rate over non-exploration evaluations (0–1).
`n_evaluated`	Total number of evaluations since init or last `reset()`.
`n_explored`	Number of exploration trials (epsilon draws) since init or reset.
`threshold`	Current decision threshold.

evaluate(value: float) → tuple[bool, float]

Evaluate one NF value and return (crossed, magnitude).

Applies optional EMA smoothing, checks the smoothed value against the current threshold, draws for epsilon-greedy exploration, updates the rolling hit history (exploration draws excluded), and then applies the RL threshold update. During the warmup period, all rewards and threshold updates are suppressed.

Parameters:

value (float): Current NF feature value.

Returns:

crossed (bool): True if a reward is issued. May be True due to exploration even when the threshold was not crossed. Always False during warmup.
magnitude (float): Absolute distance from the current threshold, normalised by the running standard deviation. 0.0 when not rewarded.

Notes

Exploration trials (where the reward is given due to epsilon-greedy) are counted in n_explored but are not recorded in the hit history used for the threshold-update rule.
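
The reward decision described above can be sketched as a pure function. Everything here is an illustrative reading of the text, not the library's code: decide_reward is a hypothetical name, the explore flag stands in for the outcome of the epsilon draw, and the fallback used when running_std is zero is an assumption.

```python
def decide_reward(value, threshold, running_std, direction="up",
                  in_warmup=False, explore=False):
    """Return (crossed, magnitude, is_exploration) for one evaluation."""
    if in_warmup:
        # Warmup suppresses every reward, including exploration rewards.
        return False, 0.0, False
    if direction == "up":
        hit = value > threshold
    else:
        hit = value < threshold
    crossed = hit or explore
    if not crossed:
        return False, 0.0, False
    # Assumption: use the raw distance if the running std is not positive.
    scale = running_std if running_std > 0 else 1.0
    magnitude = abs(value - threshold) / scale
    # Assumption: any trial on which the epsilon draw fired counts as exploration.
    return True, magnitude, explore


print(decide_reward(1.5, 1.0, 0.5))                  # genuine threshold crossing
print(decide_reward(0.5, 1.0, 0.5, explore=True))    # rewarded by exploration only
print(decide_reward(1.5, 1.0, 0.5, in_warmup=True))  # suppressed during warmup
```

A caller following this reading would increment its exploration counter and skip the hit-history update whenever the third element is True.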

property hit_rate: float

Rolling hit rate over non-exploration evaluations (0–1).

Returns 0.0 before any non-exploration evaluations are recorded.
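
One way to picture a rolling hit rate that ignores exploration trials is a fixed-length deque. The RollingHitRate class below is a hypothetical stand-in written for illustration; it does not reproduce mne_rt internals.

```python
from collections import deque


class RollingHitRate:
    """Track the fraction of hits over the last history_len recorded trials."""

    def __init__(self, history_len=50):
        # A deque with maxlen silently drops the oldest entry when full.
        self._hits = deque(maxlen=history_len)

    def record(self, hit, explored=False):
        """Store one outcome; exploration trials are skipped."""
        if not explored:
            self._hits.append(bool(hit))

    @property
    def hit_rate(self):
        """Return 0.0 until at least one non-exploration trial is recorded."""
        if not self._hits:
            return 0.0
        return sum(self._hits) / len(self._hits)


tracker = RollingHitRate(history_len=4)
print(tracker.hit_rate)               # nothing recorded yet
tracker.record(True)
tracker.record(False)
tracker.record(True, explored=True)   # ignored by the rolling window
print(tracker.hit_rate)
```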

property n_evaluated: int

Total number of evaluations since init or last reset().

property n_explored: int

Number of exploration trials (epsilon draws) since init or reset.

reset() → None

Reset all adaptive state to initial conditions.

Restores the threshold to initial_threshold, clears the rolling histories, resets counters and the smoothed value. All constructor parameters (lr, epsilon, target_hit_rate, etc.) are preserved.

property threshold: float

Current decision threshold.