An Explorer object is used by an Agent: it receives the current state and the action proposed by the controller Module, and returns an explorative action that is executed instead of the given action.
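The interface described above can be sketched as follows. This is an illustrative outline, not PyBrain's actual class (which is a Module with additional machinery); the method name activate(state, action) is assumed here:

```python
class Explorer:
    """Minimal sketch of the explorer interface: given the current state
    and the controller's proposed action, return the action that should
    actually be executed."""

    def activate(self, state, action):
        """Return an explorative action to execute instead of `action`."""
        raise NotImplementedError


class PassThroughExplorer(Explorer):
    """Degenerate explorer that never explores: it returns the
    controller's proposed action unchanged."""

    def activate(self, state, action):
        return action
```

A concrete explorer overrides activate() to inject randomness, as the classes below do.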
A continuous explorer that perturbs the resulting action with additive, normally distributed random noise. The exploration has a parameter sigma (one per action dimension), which is related to the distribution's standard deviation. To allow sigma to take negative values, the actual standard deviation is obtained by transforming sigma with the expln() function (see pybrain.tools.functions).
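The noise mechanism can be sketched in a few lines. The expln definition below (exp(x) for x <= 0, x + 1 for x > 0, which is positive, continuous, and smooth at 0) is an assumption about the transformation's shape, and the helper names are illustrative, not PyBrain's API:

```python
import math
import random

def expln(x):
    """Map an unbounded parameter to a positive value: exp(x) for
    x <= 0, x + 1 for x > 0.  Continuous and differentiable at 0,
    so sigma may be trained without a positivity constraint."""
    return math.exp(x) if x <= 0 else x + 1.0

def explore(action, sigma, rng=random):
    """Perturb each action component with Gaussian noise whose
    standard deviation is expln(sigma[i])."""
    return [a + rng.gauss(0.0, expln(s)) for a, s in zip(action, sigma)]
```

Very negative sigma values drive expln(sigma) toward 0, so the explorative action approaches the original action.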
Discrete explorers choose one of the available actions from the set of actions. In order to know which actions are available and which one to choose, discrete explorers need access to the module (which has to be of class ActionValueTable).
A discrete explorer that executes the original policy in most cases, but sometimes returns a random action (drawn uniformly) instead. The randomness is controlled by a parameter epsilon with 0 <= epsilon <= 1. The closer epsilon gets to 0, the more greedily (and less exploratively) the agent behaves.
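The selection rule itself is small enough to sketch directly. This is a generic epsilon-greedy implementation, not PyBrain's code (the library's explorer also decays epsilon over time, a detail omitted here):

```python
import random

def epsilon_greedy(greedy_action, num_actions, epsilon, rng=random):
    """With probability epsilon return a uniformly drawn action index;
    otherwise return the controller's greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(num_actions)
    return greedy_action
```

With epsilon = 0 the agent is purely greedy; with epsilon = 1 it acts uniformly at random.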
A discrete explorer that executes actions with a probability that depends on their action values. The Boltzmann explorer has a parameter tau (the temperature). For high tau, all actions are nearly equiprobable; for tau close to 0, the action selection becomes greedy.
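The underlying rule is a softmax over the action values. The sketch below illustrates that rule under the standard exp(Q(a)/tau) formulation; it is not PyBrain's exact implementation:

```python
import math
import random

def boltzmann_select(action_values, tau, rng=random):
    """Sample an action index with probability proportional to
    exp(Q(a) / tau).  High tau flattens the distribution toward
    uniform; tau near 0 concentrates it on the greedy action."""
    # subtract the max value before exponentiating, for numerical stability
    m = max(action_values)
    weights = [math.exp((q - m) / tau) for q in action_values]
    total = sum(weights)
    r = rng.random() * total
    cum = 0.0
    for i, w in enumerate(weights):
        cum += w
        if r < cum:
            return i
    return len(weights) - 1  # guard against floating-point rounding
```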
A discrete explorer that directly manipulates the action-value estimator (table or network) and either keeps the changes fixed for one full episode (in the episodic case) or slowly changes them over time.
TODO: currently only implemented for the episodic case.
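The episodic variant can be sketched as follows: one random perturbation of the value estimates is drawn at the start of each episode and held fixed until the next one, so the perturbed greedy policy stays consistent within an episode. All class and method names here are illustrative assumptions, not PyBrain's API:

```python
import random

class EpisodicValuePerturber:
    """Sketch of state-dependent exploration over a tabular estimator:
    draw one Gaussian offset per (state, action) entry at the start of
    each episode and keep it fixed for the whole episode."""

    def __init__(self, num_states, num_actions, scale=0.1, rng=random):
        self.num_states = num_states
        self.num_actions = num_actions
        self.scale = scale
        self.rng = rng
        self.new_episode()

    def new_episode(self):
        # resample the fixed perturbation for the coming episode
        self.offsets = [
            [self.rng.gauss(0.0, self.scale) for _ in range(self.num_actions)]
            for _ in range(self.num_states)
        ]

    def perturbed_values(self, table, state):
        """Return the value estimates for `state` with this episode's
        fixed offsets added; table[state][action] holds the originals."""
        return [q + o for q, o in zip(table[state], self.offsets[state])]
```

Acting greedily on perturbed_values() yields exploration that is consistent within an episode, unlike per-step random noise.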