Simple Spread¶
This environment is part of the MPE environments. Please read that page first for general information.
| Import | `from mpe2 import simple_spread_v3` |
|---|---|
| Actions | Discrete/Continuous |
| Parallel API | Yes |
| Manual Control | No |
| Agents | `agents= [agent_0, agent_1, agent_2]` |
| Agents | 3 |
| Action Shape | (5) |
| Action Values | Discrete(5)/Box(0.0, 1.0, (5)) |
| Observation Shape | (18) |
| Observation Values | (-inf, inf) |
| State Shape | (54,) |
| State Values | (-inf, inf) |
This environment has N agents and N landmarks (default N=3). At a high level, agents must learn to cover all of the landmarks while avoiding collisions.
More specifically, all agents are globally rewarded based on how far the closest agent is from each landmark (the sum of the minimum distances). Locally, agents are penalized if they collide with other agents (-1 for each collision). The relative weights of these two rewards can be controlled with the local_ratio parameter.
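The reward structure described above can be sketched as follows. This is an illustration of the description, not the environment's actual implementation; in particular, the collision radius used here is a made-up placeholder, and the real environment blends the terms per agent.

```python
import numpy as np

def spread_reward(agent_pos, landmark_pos, local_ratio=0.5, collision_dist=0.3):
    """Sketch of the simple_spread reward described above.

    Global term: negated sum over landmarks of the distance to the
    closest agent. Local term: -1 per colliding pair of agents.
    collision_dist is an illustrative placeholder, not the
    environment's exact collision radius.
    """
    # Pairwise agent-to-landmark distances: shape (n_agents, n_landmarks)
    dists = np.linalg.norm(agent_pos[:, None, :] - landmark_pos[None, :, :], axis=-1)
    global_reward = -dists.min(axis=0).sum()  # closest agent per landmark

    # Collision penalty: -1 for each pair of agents closer than collision_dist
    agent_dists = np.linalg.norm(agent_pos[:, None, :] - agent_pos[None, :, :], axis=-1)
    n = len(agent_pos)
    colliding = (agent_dists < collision_dist) & ~np.eye(n, dtype=bool)
    local_reward = -(colliding.sum() // 2)  # each pair counted twice above

    # Blend local and global terms with local_ratio
    return local_ratio * local_reward + (1 - local_ratio) * global_reward
```

With every agent sitting exactly on a distinct landmark and no collisions, both terms are zero and the blended reward is zero; collisions then subtract `local_ratio` per colliding pair.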
Agent observations: [self_vel, self_pos, landmark_rel_positions, other_agent_rel_positions, communication]
Agent action space: [no_action, move_left, move_right, move_down, move_up]
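For the default N=3, the observation layout above accounts for the (18,) shape in the table: 2 (self_vel) + 2 (self_pos) + 3×2 (landmark_rel_positions) + 2×2 (other_agent_rel_positions) + 2×2 (communication) = 18. A hypothetical sketch assembling such a flat vector (`build_observation` is an illustrative helper, not part of the library):

```python
import numpy as np

def build_observation(self_vel, self_pos, landmark_rel, other_rel, comm):
    """Hypothetical assembly of the flat observation described above."""
    return np.concatenate([
        self_vel,             # (2,) own velocity
        self_pos,             # (2,) own position
        landmark_rel.ravel(),  # (N*2,) one relative position per landmark
        other_rel.ravel(),     # ((N-1)*2,) one relative position per other agent
        comm.ravel(),          # ((N-1)*2,) communication from other agents
    ])

obs = build_observation(
    self_vel=np.zeros(2),
    self_pos=np.zeros(2),
    landmark_rel=np.zeros((3, 2)),
    other_rel=np.zeros((2, 2)),
    comm=np.zeros((2, 2)),
)
print(obs.shape)  # (18,)
```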
Arguments¶
simple_spread_v3.env(N=3, local_ratio=0.5, max_cycles=25, continuous_actions=False, dynamic_rescaling=False, curriculum=False, terminate_on_success=False, num_agent_neighbors=None, num_landmark_neighbors=None)
N: number of agents and landmarks
local_ratio: Weight applied to the local reward versus the global reward. The global reward weight is always 1 - local_ratio.
max_cycles: number of frames (a step for each agent) until the game terminates
continuous_actions: Whether agent action spaces are discrete (default) or continuous
dynamic_rescaling: Whether to rescale the size of agents and landmarks based on the screen size
curriculum: Whether to enable curriculum learning mode. When enabled, training proceeds through
stages that gradually increase task difficulty. Use env.unwrapped.advance_curriculum() to move
to the next stage, or env.unwrapped.set_curriculum_stage(n) to jump to a specific stage.
Curriculum stages:
Stage 0: Agents receive no collision penalty — focus purely on covering landmarks.
Stage 1: Collision penalty is restored — agents must cover landmarks while avoiding each other.
To scale the number of agents/landmarks across stages, recreate the environment with a larger N
and reset the curriculum stage accordingly.
terminate_on_success: When True, the episode terminates as soon as every landmark is covered
by at least one agent (an agent is within distance 0.1 of the landmark). This gives a stronger
training signal than always running to max_cycles, and pairs naturally with curriculum learning.
num_agent_neighbors: Partial observability. Maximum number of other agents each agent
observes, selected by Euclidean distance (nearest first). Observation slots beyond the
available count are zero-padded so the observation shape remains fixed. Communication signals
are also filtered to the same set of nearest agents. None (default) = full observability.
simple_spread is generally solvable under partial observability, as agents can learn
locally greedy covering policies without needing global information.
num_landmark_neighbors: Partial observability. Maximum number of landmarks each agent
observes, selected by Euclidean distance (nearest first). Zero-padded to a fixed size.
None (default) = full observability.
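The nearest-first, zero-padded selection described for num_agent_neighbors and num_landmark_neighbors can be sketched like this (an illustration of the scheme as described above, not the library's internal code):

```python
import numpy as np

def k_nearest_rel_positions(self_pos, other_pos, k):
    """Keep the k nearest entities by Euclidean distance, zero-pad the rest.

    The output always has k slots, so the observation shape stays fixed
    even when fewer than k entities are available.
    """
    rel = other_pos - self_pos                        # relative positions
    order = np.argsort(np.linalg.norm(rel, axis=1))   # nearest first
    out = np.zeros((k, rel.shape[1]))
    kept = min(k, len(rel))
    out[:kept] = rel[order[:kept]]                    # fill nearest-first
    return out
```

For example, with an agent at the origin and neighbors at x = 3, 1, and 2, `k=2` keeps the entries at x = 1 and x = 2 in that order, while `k=4` returns a fixed 4-slot array whose last slot is zero-padded.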
API¶
- class mpe2.simple_spread.simple_spread.env(**kwargs)¶
- class mpe2.simple_spread.simple_spread.raw_env(N=3, local_ratio=0.5, max_cycles=25, continuous_actions=False, render_mode=None, dynamic_rescaling=False, benchmark_data=False, curriculum=False, terminate_on_success=False, num_agent_neighbors=None, num_landmark_neighbors=None)¶
- advance_curriculum()¶
Advance to the next curriculum stage. No-op if already at the final stage.
- property curriculum_stage¶
Current curriculum stage (0-indexed). Only meaningful when curriculum=True.
- set_curriculum_stage(stage)¶
Jump to a specific curriculum stage (0-indexed, clamped to valid range).
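The stage methods above imply straightforward bookkeeping. A plausible sketch of the advance/clamp semantics, assuming only the two stages listed earlier (`CurriculumTracker` is an illustrative stand-in, not a class from the library):

```python
class CurriculumTracker:
    """Illustrative stand-in for the curriculum bookkeeping described above."""

    def __init__(self, num_stages=2):
        self.num_stages = num_stages
        self._stage = 0

    @property
    def curriculum_stage(self):
        """Current curriculum stage (0-indexed)."""
        return self._stage

    def advance_curriculum(self):
        """Advance to the next stage; no-op once at the final stage."""
        self._stage = min(self._stage + 1, self.num_stages - 1)

    def set_curriculum_stage(self, stage):
        """Jump to a stage, clamped to the valid range [0, num_stages - 1]."""
        self._stage = max(0, min(stage, self.num_stages - 1))
```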