Simple Tag¶
This environment is part of the MPE environments. Please read that page first for general information.
| Import | `from mpe2 import simple_tag_v3` |
|---|---|
| Actions | Discrete/Continuous |
| Parallel API | Yes |
| Manual Control | No |
| Agents | `agents= [adversary_0, adversary_1, adversary_2, agent_0]` |
| Agents | 4 |
| Action Shape | (5) |
| Action Values | Discrete(5) / Box(0.0, 1.0, (5,)) |
| Observation Shape | (14), (16) |
| Observation Values | (-inf, inf) |
| State Shape | (62,) |
| State Values | (-inf, inf) |
This is a predator-prey environment. Good agents (green) are faster and receive a negative reward for being hit by adversaries (red) (-10 for each collision). Adversaries are slower and are rewarded for hitting good agents (+10 for each collision). Obstacles (large black circles) block the way. By default, there is 1 good agent, 3 adversaries and 2 obstacles.
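The collision rewards can be sketched as follows. This is a minimal illustration assuming the usual MPE convention that two circular entities collide when the distance between their centres is below the sum of their radii; the helper names are illustrative, not the environment's internals:

```python
import numpy as np

def is_collision(pos_a, pos_b, size_a, size_b):
    """True if two circular entities overlap (standard MPE collision rule)."""
    dist = np.linalg.norm(np.asarray(pos_a) - np.asarray(pos_b))
    return dist < size_a + size_b

def tag_rewards(num_collisions):
    """Per-step collision rewards: -10 per hit for the good agent, +10 for adversaries."""
    return -10 * num_collisions, 10 * num_collisions
```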
So that good agents don’t run off to infinity, they are also penalized for leaving the area, via the following function:
```python
import numpy as np

def bound(x):
    # x is the absolute value of one coordinate of the agent's position
    if x < 0.9:
        return 0
    if x < 1.0:
        return (x - 0.9) * 10
    return min(np.exp(2 * x - 2), 10)
```
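As a usage sketch, the penalty is summed over the absolute coordinates of the agent's position, so the total exit penalty is zero inside the central region and grows near and beyond the boundary (the `exit_penalty` helper is illustrative, not the environment's internal name):

```python
import numpy as np

def bound(x):
    # x is the absolute value of one coordinate of the agent's position
    if x < 0.9:
        return 0
    if x < 1.0:
        return (x - 0.9) * 10
    return min(np.exp(2 * x - 2), 10)

def exit_penalty(self_pos):
    """Total boundary penalty: bound() applied per coordinate of |position|."""
    return sum(bound(abs(p)) for p in self_pos)
```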
Agent and adversary observations: [self_vel, self_pos, landmark_rel_positions, other_agent_rel_positions, other_agent_velocities]
Agent and adversary action space: [no_action, move_left, move_right, move_down, move_up]
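As a sanity check, the observation shapes (14) and (16) follow from the component layout above with the defaults (1 good agent, 3 adversaries, 2 obstacles). A small sketch, assuming velocity entries are included only for good agents among the other agents, which matches the listed shapes:

```python
def obs_dim(num_good, num_adversaries, num_obstacles, is_adversary):
    """Observation length from the component layout:
    [self_vel, self_pos, landmark_rel_positions,
     other_agent_rel_positions, other_agent_velocities]."""
    n_other = num_good + num_adversaries - 1        # all agents except self
    n_other_good = num_good - (0 if is_adversary else 1)  # good agents among others
    return (
        2                     # self_vel
        + 2                   # self_pos
        + 2 * num_obstacles   # landmark_rel_positions
        + 2 * n_other         # other_agent_rel_positions
        + 2 * n_other_good    # other_agent_velocities (good agents only)
    )
```

With the defaults, the good agent observes 14 values, each adversary 16, and the global state concatenates all observations: 14 + 3 × 16 = 62, matching the State Shape.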
Arguments¶
```python
simple_tag_v3.env(num_good=1, num_adversaries=3, num_obstacles=2, max_cycles=25, continuous_actions=False, dynamic_rescaling=False, curriculum=False, terminate_on_success=False, num_agent_neighbors=None, num_landmark_neighbors=None)
```
num_good: number of good agents
num_adversaries: number of adversaries
num_obstacles: number of obstacles
max_cycles: number of frames (a step for each agent) until game terminates
continuous_actions: Whether agent action spaces are discrete (default) or continuous
dynamic_rescaling: Whether to rescale the size of agents and landmarks based on the screen size
curriculum: Whether to enable curriculum learning mode. When enabled, prey (good agents) start
slow and become faster as stages advance, making them progressively harder to catch. Use
env.unwrapped.advance_curriculum() to move to the next stage, or
env.unwrapped.set_curriculum_stage(n) to jump to a specific stage. Stage changes take effect
on the next env.reset().
num_agent_neighbors: Partial observability. Maximum number of other agents each agent
observes, selected by Euclidean distance (nearest first). Observation slots beyond the
available count are zero-padded so the observation shape remains fixed. None (default)
restores full observability (all agents observed) and preserves backwards-compatibility.
Under partial observability, velocity information is restricted to good agents visible within the neighbor
window; velocity slots for adversaries or padded slots are zero.
num_landmark_neighbors: Partial observability. Maximum number of landmarks (obstacles)
each agent observes, selected by Euclidean distance (nearest first). Zero-padded to a fixed
size when fewer landmarks are available. None (default) = full observability.
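The nearest-first selection with zero padding described for num_agent_neighbors and num_landmark_neighbors can be sketched as follows; the helper name is illustrative, not the environment's internal API:

```python
import numpy as np

def nearest_with_padding(self_pos, other_positions, k):
    """Relative positions of the k nearest entities, zero-padded to k slots
    so the observation shape stays fixed."""
    self_pos = np.asarray(self_pos, dtype=float)
    rel = np.asarray(other_positions, dtype=float) - self_pos
    order = np.argsort(np.linalg.norm(rel, axis=1))[:k]  # nearest first
    out = np.zeros((k, 2))
    out[: len(order)] = rel[order]
    return out
```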
Curriculum stages (prey max_speed / accel as a fraction of the full values, 1.3 / 4.0):
Stage 0: 50% speed — prey is slow and easy to catch.
Stage 1: 75% speed — prey moves at moderate pace.
Stage 2: 100% speed — prey moves at full speed (normal game difficulty).
To scale the number of agents across stages, recreate the environment with updated num_good /
num_adversaries values and reset the curriculum stage accordingly.
terminate_on_success: When True, the episode terminates as soon as every good agent is
simultaneously caught (colliding with at least one adversary).
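The stage scaling above can be sketched as a lookup; the fractions and full values (max_speed 1.3, accel 4.0) are from the stage list, while the function and constant names are illustrative:

```python
# Prey speed fractions per curriculum stage, as documented above.
STAGE_FRACTIONS = [0.50, 0.75, 1.00]
FULL_MAX_SPEED, FULL_ACCEL = 1.3, 4.0

def prey_params(stage):
    """(max_speed, accel) for a curriculum stage, clamped to the valid range."""
    frac = STAGE_FRACTIONS[max(0, min(stage, len(STAGE_FRACTIONS) - 1))]
    return FULL_MAX_SPEED * frac, FULL_ACCEL * frac
```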
API¶
- class mpe2.simple_tag.simple_tag.env(**kwargs)¶
- class mpe2.simple_tag.simple_tag.raw_env(num_good=1, num_adversaries=3, num_obstacles=2, max_cycles=25, continuous_actions=False, render_mode=None, dynamic_rescaling=False, benchmark_data=False, curriculum=False, terminate_on_success=False, num_agent_neighbors=None, num_landmark_neighbors=None)¶
- advance_curriculum()¶
Advance to the next curriculum stage. Takes effect on the next env.reset().
- property curriculum_stage¶
Current curriculum stage (0-indexed). Only meaningful when curriculum=True.
- set_curriculum_stage(stage)¶
Jump to a specific curriculum stage (0-indexed, clamped to valid range). Takes effect on the next env.reset().