Simple Adversary
This environment is part of the MPE environments. Please read that page first for general information.
| Import             | `from mpe2 import simple_adversary_v3` |
|--------------------|----------------------------------------|
| Actions            | Discrete/Continuous                    |
| Parallel API       | Yes                                    |
| Manual Control     | No                                     |
| Agents             | `agents= ['adversary_0', 'agent_0', 'agent_1']` |
| Agents             | 3                                      |
| Action Shape       | (5)                                    |
| Action Values      | Discrete(5)/Box(0.0, 1.0, (5))         |
| Observation Shape  | (8), (10)                              |
| Observation Values | (-inf, inf)                            |
| State Shape        | (28,)                                  |
| State Values       | (-inf, inf)                            |
In this environment, there is 1 adversary (red), N good agents (green), and N landmarks (default N=2). All agents observe the positions of the landmarks and of the other agents. One landmark is the ‘target landmark’ (colored green). Good agents are rewarded based on how close the closest of them is to the target landmark, and penalized based on how close the adversary is to the target landmark. The adversary is rewarded based on its distance to the target, but it does not know which landmark is the target. All rewards are unscaled Euclidean distance (see the main MPE documentation for average distance). This means good agents have to learn to ‘split up’ and cover all landmarks to deceive the adversary.
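To make the reward structure concrete, here is a minimal sketch of the logic described above. It is illustrative only, not the source implementation; the function names and the position arguments are assumptions for the example.

```python
import numpy as np


def dist(a, b):
    # Unscaled Euclidean distance between two 2-D positions.
    return np.linalg.norm(a - b)


def rewards(good_positions, adversary_position, target_position):
    """Schematic reward logic for simple_adversary (illustrative sketch).

    Good agents share a team reward: it is higher when the closest good
    agent is near the target and the adversary is far from it. The
    adversary is rewarded for approaching the target, which it cannot
    distinguish from the decoy landmarks.
    """
    closest_good = min(dist(p, target_position) for p in good_positions)
    adversary_dist = dist(adversary_position, target_position)
    good_reward = adversary_dist - closest_good  # cover the target, keep the adversary away
    adversary_reward = -adversary_dist           # approach the (unknown) target
    return good_reward, adversary_reward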
Agent observation space: [goal_rel_position, landmark_rel_position, other_agent_rel_positions]
Adversary observation space: [landmark_rel_position, other_agents_rel_positions]
Agent action space: [no_action, move_left, move_right, move_down, move_up]
Adversary action space: [no_action, move_left, move_right, move_down, move_up]
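For reference, a minimal random-action loop using the standard PettingZoo AEC API (the random policy is a placeholder for a trained one):

```python
from mpe2 import simple_adversary_v3

env = simple_adversary_v3.env(render_mode=None)
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    if termination or truncation:
        action = None  # finished agents must step with None
    else:
        action = env.action_space(agent).sample()  # random policy placeholder
    env.step(action)
env.close()
```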
Arguments
simple_adversary_v3.env(N=2, max_cycles=25, continuous_actions=False, dynamic_rescaling=False, num_agent_neighbors=None, num_landmark_neighbors=None)
N: number of good agents and landmarks
max_cycles: number of frames (a step for each agent) until the game terminates
continuous_actions: whether agent action spaces are discrete (default) or continuous
dynamic_rescaling: whether to rescale the size of agents and landmarks based on the screen size
num_agent_neighbors: partial observability. Maximum number of other agents each agent observes, selected by Euclidean distance (nearest first). Observation slots beyond the available count are zero-padded so the observation shape remains fixed. None (default) = full observability.
**Warning: solvability under partial observability (PO) is not guaranteed for simple_adversary.** The core task requires good agents to *split up* and cover all N landmarks simultaneously to confuse the adversary. Under tight PO constraints, good agents may not observe every landmark or every other good agent, making coordinated coverage much harder or impossible. Investigation and task-specific tuning are recommended before applying PO to this environment.
num_landmark_neighbors: partial observability. Maximum number of landmarks each agent observes, selected by Euclidean distance (nearest first); zero-padded to a fixed size. Note: the goal landmark’s relative position is always included in good agents’ observations regardless of this setting (it is private, 2-D information, not a positional slot). None (default) = full observability.
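As a concrete illustration, the sketch below constructs the environment with default settings and with tightened PO arguments, then prints the per-agent observation shapes. The exact shapes under PO depend on the zero-padding scheme described above, so they are printed rather than asserted.

```python
from mpe2 import simple_adversary_v3

# Full observability (defaults): adversary observes (8,), good agents (10,).
env = simple_adversary_v3.env(N=2, max_cycles=25)
env.reset(seed=0)
for agent in env.agents:
    print(agent, env.observation_space(agent).shape)

# Partial observability: each agent sees at most 1 other agent and 1 landmark.
po_env = simple_adversary_v3.env(N=2, num_agent_neighbors=1, num_landmark_neighbors=1)
po_env.reset(seed=0)
for agent in po_env.agents:
    print(agent, po_env.observation_space(agent).shape)
```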
API
- class mpe2.simple_adversary.simple_adversary.env(**kwargs)
- class mpe2.simple_adversary.simple_adversary.raw_env(N=2, max_cycles=25, continuous_actions=False, render_mode=None, dynamic_rescaling=False, benchmark_data=False, num_agent_neighbors=None, num_landmark_neighbors=None)
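Since the environment supports the Parallel API, a simultaneous-step loop is also possible. This assumes mpe2 follows the standard PettingZoo pattern of exposing `parallel_env` alongside `env`:

```python
from mpe2 import simple_adversary_v3

parallel_env = simple_adversary_v3.parallel_env(N=2, max_cycles=25)
observations, infos = parallel_env.reset(seed=42)

while parallel_env.agents:
    # One random action per live agent, submitted simultaneously.
    actions = {a: parallel_env.action_space(a).sample() for a in parallel_env.agents}
    observations, rewards, terminations, truncations, infos = parallel_env.step(actions)
parallel_env.close()
```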