# Collect Treasure
This environment is part of the MPE environments. Please read that page first for general information.
| Import | `from mpe2 import collect_treasure_v1` |
|---|---|
| Actions | Discrete/Continuous |
| Parallel API | Yes |
| Manual Control | No |
| Agents | 8 (default: 6 collectors + 2 deposits) |
| Action Shape | (5,) |
| Action Values | Discrete(5)/Box(0.0, 1.0, (5,)) |
| Observation Shape | (86,) for collectors, (84,) for deposits (default config) |
| Observation Values | (-inf, inf) |
| State Shape | (684,) (default config) |
| State Values | (-inf, inf) |
A cooperative multi-agent task in which collector agents must pick up treasures and deliver them to the matching deposit agent.
There are two types of agents:

- **Collectors** (grey by default): roam the environment to pick up treasure landmarks and carry them to the appropriate deposit agent. A collector can hold at most one treasure at a time. When holding a treasure it changes color to match the treasure's type.
- **Deposits** (dark-tinted, color-coded by type): goal zones. Each deposit accepts only one treasure type. Deposits try to position themselves near collectors that are carrying matching treasure.
Treasures are landmarks that appear at random positions. When a collector touches a treasure (is within collision distance), the collector picks it up and the treasure disappears. The treasure then respawns at a new random location on the next step. When a collector carrying a treasure touches the matching deposit agent, the treasure is delivered, the collector’s inventory is cleared, and it turns grey again.
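The pickup/delivery cycle described above can be sketched as a small state machine. This is an illustrative simplification, not the actual mpe2 implementation; the `Collector` class and its method names are hypothetical:

```python
class Collector:
    """Minimal sketch of the collector inventory cycle described above."""

    def __init__(self):
        self.holding = None  # treasure type currently carried, or None

    def try_pickup(self, treasure_type):
        """Touching a live treasure while empty-handed picks it up."""
        if self.holding is None:
            self.holding = treasure_type  # color changes to match the type
            return True  # the treasure disappears and respawns next step
        return False  # already carrying something: at most one at a time

    def try_deliver(self, deposit_type):
        """Touching the matching deposit delivers the treasure."""
        if self.holding is not None and self.holding == deposit_type:
            self.holding = None  # inventory cleared, collector turns grey
            return True
        return False  # empty-handed or wrong deposit type
```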
Reward structure (shared globally + shaped locally):

- +5 global reward each time a collector is touching a live treasure while not holding anything
- +5 global reward each time a collector is touching its matching deposit while carrying the right treasure type
- -0.1 * distance shaping for collectors (toward the nearest treasure or matching deposit)
- -0.1 * distance shaping for deposits (toward collectors carrying matching treasure, or toward the centroid of all collectors if none are carrying their type)
- -5 per pair of collectors that are overlapping (collision penalty)
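Putting the rules above together, a single step's reward could be sketched as follows. This is a reimplementation of the description for illustration only, not the mpe2 source; the function names are hypothetical, while the +5, -0.1, and -5 constants come from the list above:

```python
import math

def global_reward(pickup_touches, delivery_touches, collision_pairs):
    """Shared global reward for one step:
    +5 per empty-handed collector touching a live treasure,
    +5 per collector touching its matching deposit with the right treasure,
    -5 per pair of overlapping collectors."""
    return 5 * pickup_touches + 5 * delivery_touches - 5 * collision_pairs

def shaping_penalty(agent_pos, target_pos):
    """Local shaping term: -0.1 times the distance to the relevant target
    (nearest treasure or matching deposit for a collector)."""
    return -0.1 * math.dist(agent_pos, target_pos)
```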
Observations: Each agent observes its own position and velocity. Collectors additionally observe a one-hot encoding of what they are holding. Every agent then sees all other agents sorted by distance, each described by relative position, velocity, and a 4-element encoding [deposit_type_0, deposit_type_1, holding_type_0, holding_type_1]. Finally, every agent observes all treasures sorted by distance, each described by relative position and a one-hot type encoding. Dead (just-picked-up) treasures are shown with zero relative position and zero type encoding.
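With the default configuration (8 agents, 6 treasures, 2 treasure types), the shapes in the table above can be reproduced from this layout. This is a sanity-check sketch, assuming a 2-element holding encoding for collectors and that the state is the concatenation of all agent observations (both consistent with the listed sizes, but not verified against mpe2 internals):

```python
def obs_size(is_collector, n_agents=8, n_treasures=6, n_types=2):
    self_features = 2 + 2                      # own position + velocity
    holding = n_types if is_collector else 0   # one-hot of carried treasure type
    others = (n_agents - 1) * (2 + 2 + 4)      # rel. position, velocity, 4-element encoding
    treasures = n_treasures * (2 + n_types)    # rel. position + one-hot type
    return self_features + holding + others + treasures

collector = obs_size(True)                # 86, matching the table
deposit = obs_size(False)                 # 84, matching the table
state = 6 * collector + 2 * deposit       # 684, matching the state shape
```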
Agent action space: [no_action, move_left, move_right, move_down, move_up]
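Following the order listed above, each discrete action index selects a movement direction. The unit-vector mapping below is a convention of MPE-style environments and is shown for orientation only; the actual force magnitudes are internal to the environment:

```python
# Discrete action index -> movement direction (dx, dy),
# in the order [no_action, move_left, move_right, move_down, move_up].
ACTIONS = {
    0: (0, 0),   # no_action
    1: (-1, 0),  # move_left
    2: (1, 0),   # move_right
    3: (0, -1),  # move_down
    4: (0, 1),   # move_up
}
```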
## Arguments
```python
collect_treasure_v1.env(
    num_collectors=6,
    num_deposits=2,
    num_treasures=6,
    max_cycles=25,
    continuous_actions=False,
    dynamic_rescaling=False,
)
```
`num_collectors`: number of collector agents

`num_deposits`: number of deposit agents (also sets the number of treasure types)

`num_treasures`: number of treasure landmarks in the world

`max_cycles`: number of frames (a step for each agent) until game terminates

`continuous_actions`: whether agent action spaces are discrete (default) or continuous

`dynamic_rescaling`: whether to rescale the size of agents and landmarks based on the screen size
## API

`class mpe2.collect_treasure.collect_treasure.env(**kwargs)`

`class mpe2.collect_treasure.collect_treasure.raw_env(num_collectors=6, num_deposits=2, num_treasures=6, max_cycles=25, continuous_actions=False, render_mode=None, dynamic_rescaling=False, benchmark_data=False)`