# Collect Treasure
This environment is part of the MPE environments. Please read that page first for general information.
| Import | `from mpe2 import collect_treasure_v1` |
|---|---|
| Actions | Discrete/Continuous |
| Parallel API | Yes |
| Manual Control | No |
| Agents | 8 (default: 6 collectors + 2 deposits) |
| Action Shape | (5,) |
| Action Values | Discrete(5)/Box(0.0, 1.0, (5,)) |
| Observation Shape | (86,) for collectors, (84,) for deposits (default config) |
| Observation Values | (-inf, inf) |
| State Shape | (684,) (default config) |
| State Values | (-inf, inf) |
A cooperative multi-agent task in which collector agents must pick up treasures and deliver them to the matching deposit agent.
There are two types of agents:

- **Collectors** (grey by default): roam the environment to pick up treasure landmarks and carry them to the appropriate deposit agent. A collector can hold at most one treasure at a time. When holding a treasure it changes color to match the treasure's type.
- **Deposits** (dark-tinted, color-coded by type): goal zones. Each deposit accepts only one treasure type. Deposits try to position themselves near collectors that are carrying matching treasure.
Treasures are landmarks that appear at random positions. When a collector touches a treasure (is within collision distance), the collector picks it up and the treasure disappears. The treasure then respawns at a new random location on the next step. When a collector carrying a treasure touches the matching deposit agent, the treasure is delivered, the collector’s inventory is cleared, and it turns grey again.
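The pickup/delivery cycle described above can be sketched as a small state machine. This is an illustrative simplification, not the actual mpe2 implementation; the `Collector` class and its method names are hypothetical:

```python
class Collector:
    """Minimal sketch of the collector inventory cycle described above."""

    def __init__(self):
        self.holding = None  # treasure type currently carried, or None

    def try_pickup(self, treasure_type):
        """Touching a live treasure while empty-handed picks it up."""
        if self.holding is None:
            self.holding = treasure_type  # color changes to match the type
            return True  # the treasure disappears and respawns next step
        return False  # already carrying something: at most one at a time

    def try_deliver(self, deposit_type):
        """Touching the matching deposit delivers the treasure."""
        if self.holding is not None and self.holding == deposit_type:
            self.holding = None  # inventory cleared, collector turns grey
            return True
        return False  # empty-handed or wrong deposit type
```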
Reward structure (shared globally + shaped locally):

- +5 global reward each time a collector is touching a live treasure while not holding anything
- +5 global reward each time a collector is touching its matching deposit while carrying the right treasure type
- -0.1 * distance shaping for collectors (toward the nearest treasure or matching deposit)
- -0.1 * distance shaping for deposits (toward collectors carrying matching treasure, or toward the centroid of all collectors if none are carrying their type)
- -5 per pair of collectors that are overlapping (collision penalty)
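Putting the rules above together, a single step's reward could be sketched as follows. This is a reimplementation of the description for illustration only, not the mpe2 source; the function names are hypothetical, while the +5, -0.1, and -5 constants come from the list above:

```python
import math

def global_reward(pickup_touches, delivery_touches, collision_pairs):
    """Shared global reward for one step:
    +5 per empty-handed collector touching a live treasure,
    +5 per collector touching its matching deposit with the right treasure,
    -5 per pair of overlapping collectors."""
    return 5 * pickup_touches + 5 * delivery_touches - 5 * collision_pairs

def shaping_penalty(agent_pos, target_pos):
    """Local shaping term: -0.1 times the distance to the relevant target
    (nearest treasure or matching deposit for a collector)."""
    return -0.1 * math.dist(agent_pos, target_pos)
```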
Observations: Each agent observes its own position and velocity. Collectors additionally observe a one-hot encoding of what they are holding. Every agent then sees all other agents sorted by distance, each described by relative position, velocity, and a 4-element encoding [deposit_type_0, deposit_type_1, holding_type_0, holding_type_1]. Finally, every agent observes all treasures sorted by distance, each described by relative position and a one-hot type encoding. Dead (just-picked-up) treasures are shown with zero relative position and zero type encoding.
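With the default configuration (8 agents, 6 treasures, 2 treasure types), the shapes in the table above can be reproduced from this layout. This is a sanity-check sketch, assuming a 2-element holding encoding for collectors and that the state is the concatenation of all agent observations (both consistent with the listed sizes, but not verified against mpe2 internals):

```python
def obs_size(is_collector, n_agents=8, n_treasures=6, n_types=2):
    self_features = 2 + 2                      # own position + velocity
    holding = n_types if is_collector else 0   # one-hot of carried treasure type
    others = (n_agents - 1) * (2 + 2 + 4)      # rel. position, velocity, 4-element encoding
    treasures = n_treasures * (2 + n_types)    # rel. position + one-hot type
    return self_features + holding + others + treasures

collector = obs_size(True)                # 86, matching the table
deposit = obs_size(False)                 # 84, matching the table
state = 6 * collector + 2 * deposit       # 684, matching the state shape
```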
Agent action space: [no_action, move_left, move_right, move_down, move_up]
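Following the order listed above, each discrete action index selects a movement direction. The unit-vector mapping below is a convention of MPE-style environments and is shown for orientation only; the actual force magnitudes are internal to the environment:

```python
# Discrete action index -> movement direction (dx, dy),
# in the order [no_action, move_left, move_right, move_down, move_up].
ACTIONS = {
    0: (0, 0),   # no_action
    1: (-1, 0),  # move_left
    2: (1, 0),   # move_right
    3: (0, -1),  # move_down
    4: (0, 1),   # move_up
}
```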
## Arguments
```python
collect_treasure_v1.env(
    num_collectors=6,
    num_deposits=2,
    num_treasures=6,
    max_cycles=25,
    continuous_actions=False,
    dynamic_rescaling=False,
)
```
`num_collectors`: number of collector agents

`num_deposits`: number of deposit agents (also sets the number of treasure types)

`num_treasures`: number of treasure landmarks in the world

`max_cycles`: number of frames (a step for each agent) until game terminates

`continuous_actions`: whether agent action spaces are discrete (default) or continuous

`dynamic_rescaling`: whether to rescale the size of agents and landmarks based on the screen size
## API

`class mpe2.collect_treasure.collect_treasure.env(**kwargs)`

`class mpe2.collect_treasure.collect_treasure.raw_env(num_collectors=6, num_deposits=2, num_treasures=6, max_cycles=25, continuous_actions=False, render_mode=None, dynamic_rescaling=False, benchmark_data=False)`