Collect Treasure

This environment is part of the MPE environments. Please read that page first for general information.

Import: from mpe2 import collect_treasure_v1

Actions: Discrete/Continuous

Parallel API: Yes

Manual Control: No

Agents: agents= [collector_0, ..., collector_5, deposit_0, deposit_1]

Number of Agents: 8 (default: 6 collectors + 2 deposits)

Action Shape: (5,)

Action Values: Discrete(5)/Box(0.0, 1.0, (5,))

Observation Shape: (86,) for collectors, (84,) for deposits (default config)

Observation Values: (-inf, inf)

State Shape: (684,) (default config)

State Values: (-inf, inf)

A cooperative multi-agent task in which collector agents must pick up treasures and deliver them to the matching deposit agent.

There are two types of agents:

  • Collectors (grey by default): roam the environment to pick up treasure landmarks and carry them to the appropriate deposit agent. A collector can hold at most one treasure at a time. When holding a treasure it changes color to match the treasure’s type.

  • Deposits (dark-tinted, color-coded by type): goal zones. Each deposit accepts only one treasure type. Deposits try to position themselves near collectors that are carrying matching treasure.

Treasures are landmarks that appear at random positions. When a collector touches a treasure (is within collision distance), the collector picks it up and the treasure disappears. The treasure then respawns at a new random location on the next step. When a collector carrying a treasure touches the matching deposit agent, the treasure is delivered, the collector’s inventory is cleared, and it turns grey again.
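The pickup/delivery cycle described above can be sketched as a tiny state machine (a simplified illustration, not the environment's internals; the class and method names are hypothetical):

```python
class Collector:
    """Toy model of collector inventory: hold at most one treasure."""

    def __init__(self):
        self.holding = None  # treasure type currently carried, or None

    def touch_treasure(self, treasure_type):
        """Picking up succeeds only when empty-handed."""
        if self.holding is None:
            self.holding = treasure_type  # treasure disappears, respawns next step
            return True
        return False

    def touch_deposit(self, deposit_type):
        """Delivery succeeds only at the matching deposit."""
        if self.holding == deposit_type:
            self.holding = None  # inventory cleared, collector turns grey again
            return True
        return False


c = Collector()
assert c.touch_treasure(0)      # picks up a type-0 treasure
assert not c.touch_treasure(1)  # already carrying, cannot pick up another
assert not c.touch_deposit(1)   # wrong deposit type, delivery refused
assert c.touch_deposit(0)       # delivered; collector is empty again
```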

Reward structure (shared globally + shaped locally):

  • +5 global reward each time a collector is touching a live treasure while not holding anything

  • +5 global reward each time a collector is touching its matching deposit while carrying the right treasure type

  • -0.1 * distance shaping for collectors (toward nearest treasure or matching deposit)

  • -0.1 * distance shaping for deposits (toward collectors carrying matching treasure, or toward the centroid of all collectors if none are carrying their type)

  • -5 per pair of collectors that are overlapping (collision penalty)
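As a toy illustration of the shared portion of this reward (not the actual mpe2 source; the per-agent distance shaping terms are omitted), the global reward for one step could be tallied like this, with hypothetical event counts as inputs:

```python
def global_reward(pickups, deliveries, collector_collision_pairs):
    """Toy tally of the shared global reward for one step.

    pickups: collectors touching a live treasure while empty-handed (+5 each)
    deliveries: collectors touching their matching deposit while carrying
        the right treasure type (+5 each)
    collector_collision_pairs: overlapping collector pairs (-5 each)
    """
    return 5.0 * pickups + 5.0 * deliveries - 5.0 * collector_collision_pairs


print(global_reward(pickups=2, deliveries=1, collector_collision_pairs=1))  # 10.0
```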

Observations: Each agent observes its own position and velocity. Collectors additionally observe a one-hot encoding of what they are holding. Every agent then sees all other agents sorted by distance, each described by relative position, velocity, and a 4-element encoding [deposit_type_0, deposit_type_1, holding_type_0, holding_type_1]. Finally, every agent observes all treasures sorted by distance, each described by relative position and a one-hot type encoding. Dead (just-picked-up) treasures are shown with zero relative position and zero type encoding.
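The sizes in the table follow from this layout. A small sketch (pure arithmetic, independent of the library, and assuming the global state is the concatenation of all agent observations, which is consistent with the default (684,)) that reproduces the default figures:

```python
def obs_sizes(num_collectors=6, num_deposits=2, num_treasures=6):
    """Derive observation and state sizes from the layout described above."""
    n_agents = num_collectors + num_deposits
    self_features = 2 + 2                 # own position + own velocity
    holding = num_deposits                # collectors only: one-hot of held type
    # per other agent: rel. position, velocity, and the type/holding encoding
    # ([deposit_type..., holding_type...]; 4 elements with the default 2 deposits)
    per_other = 2 + 2 + 2 * num_deposits
    per_treasure = 2 + num_deposits       # rel. position + one-hot type
    others = (n_agents - 1) * per_other
    treasures = num_treasures * per_treasure
    collector = self_features + holding + others + treasures
    deposit = self_features + others + treasures
    state = num_collectors * collector + num_deposits * deposit
    return collector, deposit, state


print(obs_sizes())  # (86, 84, 684) for the default configuration
```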

Agent action space: [no_action, move_left, move_right, move_down, move_up]

Arguments

collect_treasure_v1.env(
    num_collectors=6,
    num_deposits=2,
    num_treasures=6,
    max_cycles=25,
    continuous_actions=False,
    dynamic_rescaling=False,
)

num_collectors: number of collector agents

num_deposits: number of deposit agents (also sets the number of treasure types)

num_treasures: number of treasure landmarks in the world

max_cycles: number of frames (a step for each agent) until the game terminates

continuous_actions: whether agent action spaces are discrete (default) or continuous

dynamic_rescaling: whether to rescale the size of agents and landmarks based on the screen size

API

class mpe2.collect_treasure.collect_treasure.env(**kwargs)
class mpe2.collect_treasure.collect_treasure.raw_env(num_collectors=6, num_deposits=2, num_treasures=6, max_cycles=25, continuous_actions=False, render_mode=None, dynamic_rescaling=False, benchmark_data=False)