Gym Environment

This guide explains the Gymnasium-compatible environment, its observation and action spaces, render modes, integration patterns with RL libraries, and customization hooks. It also covers seeding, recording, and common troubleshooting.

Contents

  • Overview and capabilities
  • Initialization and configuration
  • Observation and action spaces
  • Reset/step/render lifecycle
  • Seeding and determinism
  • Using custom generators and renderers
  • Wrappers, vectorization, and RL integration
  • Recording videos and logging
  • Reward shaping and termination logic
  • Examples (end-to-end)
  • Troubleshooting

Overview and capabilities

  • Class: grid_universe.gym_env.GridUniverseEnv

  • Compatible with Gymnasium (supports reset, step, render, action/observation spaces).

  • Observation includes:

    • An RGBA image (texture-rendered snapshot of State).

    • A structured info dict (agent health/effects/inventory, score/phase/turn, config metadata, message string).

  • Reward:

    • Default is the change in score since the previous step (delta score).

  • Termination and truncation:

    • terminated is True if the environment reaches a “win” condition.

    • truncated is True if the agent reaches a “lose” condition (e.g., the agent dies). Treat this as a failure terminal in RL loops.

  • Rendering:

    • Texture mode returns a PIL.Image (RGBA).

    • Human mode shows a window via PIL.Image.show() (blocking view on some platforms).

Initialization and configuration

from grid_universe.gym_env import GridUniverseEnv
from grid_universe.examples.maze import generate as maze_generate

env = GridUniverseEnv(
    render_mode="texture",           # "texture" returns PIL.Image in render(); "human" shows a window
    render_resolution=640,           # width in pixels (height derived from grid aspect)
    render_texture_map=None,         # use default texture map unless overridden
    initial_state_fn=maze_generate,  # a callable returning a State (maze.generate is the default)
    width=9,
    height=9,
    seed=123,                        # forwarded to the generator
    # You can pass any kwargs accepted by the initial_state_fn (maze.generate by default)
    num_required_items=1,
    num_rewardable_items=1,
    num_portals=1,
    num_doors=1,
    wall_percentage=0.8,
)
obs, info = env.reset()

Observation and action spaces

  • Spaces are defined in __init__.

  • You can choose the observation representation via observation_type constructor arg:

    • observation_type="image" (default): Observation is a dict with image and info (described below). This is the RL‑friendly numeric space (Gym Dict).
    • observation_type="level": Observation is a reconstructed authoring‑time Level object (see levels.grid.Level). This exposes full symbolic structure (entities, nested inventory/effects, wiring refs) each step. The observation space becomes a placeholder (Discrete(1)) because Gym cannot natively specify arbitrary Python objects. Use this mode for research / planning algorithms needing structured world models rather than standard deep RL libraries.

    Example:

    env = GridUniverseEnv(observation_type="level", width=7, height=7)
    level_obs, _ = env.reset()   # level_obs is a Level instance
    

  • With image mode the observation is a dict with:

    • image: Box(low=0, high=255, shape=(H, W, 4), dtype=uint8)

    • info: Dict with nested dicts for agent, status, config, and message (see below)

  • agent sub-dict:

    • health: { health: int | -1, max_health: int | -1 }

    • effects: sequence of effect entries:

      • id: int

      • type: "", "IMMUNITY", "PHASING", "SPEED"

      • limit_type: "" | "TIME" | "USAGE"

      • limit_amount: int (or -1)

      • multiplier: int (SPEED only; -1 otherwise)

    • inventory: sequence of item entries:

      • id: int

      • type: "item" | "key" | "core" | "coin"

      • key_id: str ("" if N/A)

      • appearance_name: str ("" if unknown)

  • status sub-dict:

    • score: int

    • phase: "ongoing" | "win" | "lose"

    • turn: int

  • config sub-dict:

    • move_fn: str (function name)

    • objective_fn: str (function name)

    • seed: int (or -1)

    • width: int

    • height: int

  • message field:

    • A free-form text string (empty string if None) for narrative or task hints.

  • Action space:

    • Discrete(len(Action)) with Action enum order (UP, DOWN, LEFT, RIGHT, USE_KEY, PICK_UP, WAIT)

    • Index mapping (see the snippet after this list):

      • 0 → UP

      • 1 → DOWN

      • 2 → LEFT

      • 3 → RIGHT

      • 4 → USE_KEY

      • 5 → PICK_UP

      • 6 → WAIT
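
Reading the observation and mapping an action index to its enum member (the Action import path below is an assumption; adjust it to wherever the enum is defined in your installation):

import numpy as np
from grid_universe.actions import Action  # assumed module path for the Action enum

obs, info = env.reset()                  # env built as in the initialization example
print(obs["image"].shape)                # (H, W, 4) uint8 RGBA
print(obs["info"]["status"]["phase"])    # "ongoing" | "win" | "lose"
print(obs["info"]["config"]["move_fn"])  # movement function name

action = np.int64(3)
print(list(Action)[int(action)])         # index 3 -> Action.RIGHT (enum order)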

Reset/step/render lifecycle

  • reset(seed=None, options=None):

    • Generates a new State using initial_state_fn(**kwargs) from constructor.

    • Selects the first agent_id by default.

    • Builds or reuses TextureRenderer for rendering.

    • Returns observation dict and info dict (currently empty).

  • step(action: np.integer):

    • Converts integer action to Action enum via ordering.

    • Applies grid_universe.step.step to advance the State.

    • Computes reward = new_score − old_score.

    • terminated = state.win; truncated = state.lose.

    • Returns (obs, reward, terminated, truncated, info).

  • render(mode=None):

    • Returns a PIL.Image if mode (or render_mode) is "texture".

    • Calls img.show() and returns None if "human".

    • Raises NotImplementedError for unknown mode.

  • close():

    • No-op (reserved for compatibility).

Minimal loop:

import numpy as np
from grid_universe.gym_env import GridUniverseEnv
from grid_universe.examples.maze import generate as maze_generate

env = GridUniverseEnv(initial_state_fn=maze_generate, render_mode="texture", width=7, height=7, seed=1)
obs, info = env.reset()

done = False
while not done:
    action = np.int64(0)  # UP
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

img = env.render()  # PIL.Image when render_mode="texture"
img.save("episode_last_frame.png")

Seeding and determinism

  • Pass seed to GridUniverseEnv to forward to the generator (examples.maze.generate), which sets State.seed.

  • Some movement functions (e.g., windy_move_fn) use per‑turn RNG derived from (state.seed, state.turn). The renderer’s directory‑based texture choice uses a seed derived from state.seed to pick a file deterministically per state.

  • For strict reproducibility:

    • Fix the environment seed in constructor.

    • Avoid non-deterministic policies during tests, or seed the policy RNG.

  • Gymnasium’s reset(seed=...) parameter: the current env ignores the seed passed to reset and uses the constructor’s seed forwarded to the generator. If you need per-episode seeding, set it at construction time, or supply a custom initial_state_fn that manages seeds itself, as in the sketch below.
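
A minimal sketch of that workaround (SeededGenerator is an illustrative name, not a library class): wrap the generator in a callable that advances its own seed on every call, so each reset yields a fresh but reproducible layout.

from grid_universe.examples.maze import generate as maze_generate
from grid_universe.gym_env import GridUniverseEnv

class SeededGenerator:
    """Illustrative helper: overrides the seed on every call so each
    reset produces a new, reproducible layout."""
    def __init__(self, start_seed: int):
        self.next_seed = start_seed

    def __call__(self, **kwargs):
        kwargs["seed"] = self.next_seed  # override any constructor-level seed
        self.next_seed += 1
        return maze_generate(**kwargs)

env = GridUniverseEnv(initial_state_fn=SeededGenerator(1000), width=7, height=7)
obs, info = env.reset()  # layout from seed 1000; the next reset uses 1001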

Using custom generators and renderers

  • Custom initial_state_fn:

    • Provide a function that returns a grid_universe.state.State, and pass it as initial_state_fn.

    • All extra kwargs are forwarded to that function.

Example replacing the generator:

from typing import Optional
from grid_universe.state import State
from grid_universe.moves import default_move_fn
from grid_universe.objectives import default_objective_fn
from grid_universe.levels.grid import Level
from grid_universe.levels.factories import create_floor, create_agent, create_exit
from grid_universe.levels.convert import to_state
from grid_universe.gym_env import GridUniverseEnv

def my_small_world(width=5, height=5, seed: Optional[int] = None) -> State:
    lvl = Level(width, height, move_fn=default_move_fn, objective_fn=default_objective_fn, seed=seed)
    for y in range(height):
        for x in range(width):
            lvl.add((x, y), create_floor())
    lvl.add((1, 1), create_agent())
    lvl.add((width - 2, height - 2), create_exit())
    return to_state(lvl)

env = GridUniverseEnv(initial_state_fn=my_small_world, width=6, height=4, seed=99, render_mode="texture")
obs, info = env.reset()

  • Custom texture map or resolution:

    • The env stores a TextureRenderer internally. To customize rendering globally, supply render_resolution and render_texture_map in the constructor:

from grid_universe.renderer.texture import DEFAULT_TEXTURE_MAP

env = GridUniverseEnv(
    render_mode="texture",
    render_resolution=800,
    render_texture_map=DEFAULT_TEXTURE_MAP,  # or a customized mapping
    width=9, height=9, seed=7,
)

Wrappers, vectorization, and RL integration

  • Gym wrappers:

    • You can wrap GridUniverseEnv with standard Gymnasium wrappers (FrameStack, GrayScaleObservation, ResizeObservation, etc.). Note that many wrappers expect a plain numeric array observation; here the image lives under obs["image"] rather than being returned by render() on step, so wrap the observation key you need.

  • Observation key selection:

    • Use a custom wrapper to replace obs with obs["image"] or to add embeddings of info if your agent expects a simpler observation.

Example wrapper to expose only the image as observation (works only with observation_type="image"):

import gymnasium as gym
import numpy as np

class ImageOnlyWrapper(gym.ObservationWrapper):
    def __init__(self, env: gym.Env):
        super().__init__(env)
        h, w = self.observation_space["image"].shape[:2]
        self.observation_space = gym.spaces.Box(low=0, high=255, shape=(h, w, 4), dtype=np.uint8)

    def observation(self, observation):
        return observation["image"]

# Usage:
# env = ImageOnlyWrapper(GridUniverseEnv(...))

  • Vectorized envs:

    • Use Gymnasium’s SyncVectorEnv/AsyncVectorEnv to run multiple instances. Ensure each one has its own seed and independent initial_state_fn kwargs (see the sketch after this list).

  • RL libraries:

    • Stable-Baselines3: Works with Gymnasium compatibility wrappers. Ensure the observation is a Box; consider ImageOnlyWrapper above.

    • CleanRL / RLlib: Similar considerations; ensure observation shape/dtype matches the algorithm’s expectations.
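
A minimal vectorization sketch under those constraints, reusing the ImageOnlyWrapper defined above so each batched observation is a plain array:

from gymnasium.vector import SyncVectorEnv
from grid_universe.gym_env import GridUniverseEnv
from grid_universe.examples.maze import generate as maze_generate

def make_env(seed: int):
    def _thunk():
        env = GridUniverseEnv(initial_state_fn=maze_generate, render_mode="texture",
                              width=7, height=7, seed=seed)
        return ImageOnlyWrapper(env)  # wrapper from the example above
    return _thunk

vec_env = SyncVectorEnv([make_env(s) for s in (1, 2, 3, 4)])  # one seed per instance
obs, infos = vec_env.reset()  # observations batched over the 4 environments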

Recording videos and logging

  • Because render(mode="texture") returns a PIL.Image, you can manually record frames and assemble a GIF or MP4.

Record frames:

import numpy as np

frames = []
obs, info = env.reset()
done = False
while not done:
    frames.append(env.render())  # PIL.Image per frame (render_mode="texture")
    obs, reward, terminated, truncated, info = env.step(np.int64(0))  # e.g., always UP
    done = terminated or truncated
frames.append(env.render())  # final frame

Save a GIF (Pillow):

from PIL import Image

frames_rgba = [im.convert("RGBA") for im in frames]
frames_rgba[0].save(
    "episode.gif",
    save_all=True,
    append_images=frames_rgba[1:],
    duration=200,
    loop=0,
)

Save MP4 (imageio-ffmpeg):

import imageio
import numpy as np

# imageio's classic API provides get_writer; the v3 API (imageio.v3) does not.
with imageio.get_writer("episode.mp4", fps=5) as w:
    for im in frames:
        w.append_data(np.array(im.convert("RGB")))

Reward shaping and termination logic

  • Reward:

    • Default reward is delta score per step. To change this, wrap the env and post-process the reward, or subclass and override the step logic (see the wrapper sketch at the end of this section).

  • Termination:

    • terminated (win) is True if the objective function is satisfied.

    • truncated (lose) is True if the agent dies or a lose condition is set. Treat both as episode terminals in RL loops.

  • Shaping strategies:

    • Add dense signals (e.g., distance-to-goal) via a wrapper; do not modify State directly. Keep the environment source reward consistent and add shaped terms externally for clarity and reproducibility.
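
The wrapper sketch referenced above (StepPenaltyWrapper is an illustrative name, not a library class); it layers a dense term on top of the env’s delta-score reward without touching State:

import gymnasium as gym

class StepPenaltyWrapper(gym.Wrapper):
    """Illustrative shaping wrapper: subtracts a small per-step penalty
    from the env's delta-score reward, leaving State untouched."""
    def __init__(self, env: gym.Env, penalty: float = 0.01):
        super().__init__(env)
        self.penalty = penalty

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, reward - self.penalty, terminated, truncated, info

# Usage:
# env = StepPenaltyWrapper(GridUniverseEnv(...), penalty=0.01)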

Examples (end-to-end)

Basic random policy loop:

from grid_universe.gym_env import GridUniverseEnv
from grid_universe.examples.maze import generate as maze_generate

env = GridUniverseEnv(initial_state_fn=maze_generate, render_mode="texture", width=7, height=7, seed=3)
obs, info = env.reset()
done = False

while not done:
    action = env.action_space.sample()  # random discrete action (np.int64)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

env.render().save("random_last.png")

Stable-Baselines3 (with image-only observations):

# pip install stable-baselines3[extra] gymnasium
import gymnasium as gym
import numpy as np
from stable_baselines3 import PPO
from grid_universe.gym_env import GridUniverseEnv
from grid_universe.examples.maze import generate as maze_generate

class ImageOnlyWrapper(gym.ObservationWrapper):
    def __init__(self, env):
        super().__init__(env)
        h, w = self.observation_space["image"].shape[:2]
        self.observation_space = gym.spaces.Box(0, 255, shape=(h, w, 4), dtype=np.uint8)
    def observation(self, observation):
        return observation["image"]

def make_env():
    base = GridUniverseEnv(initial_state_fn=maze_generate, render_mode="texture", width=9, height=9, seed=7)
    return ImageOnlyWrapper(base)

env = make_env()
model = PPO("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=1000)

Troubleshooting

  • “Render returns None”:

    • render_mode must be "texture" to return a PIL.Image. In "human" mode, render calls show() and returns None.

  • “Observations are not images”:

    • The observation is a dict with "image" and "info". If your agent expects only an array, wrap to extract obs["image"].

  • “Episodes never end”:

    • Confirm your objective function can be satisfied (or a lose condition can occur). Check obs["info"]["status"]["phase"] and "turn".

  • “Multiple envs have identical layouts”:

    • Provide distinct seeds per env instance or pass per-env kwargs to the generator.

  • “High rendering cost”:

    • Reuse env instances across episodes.

    • Reduce render_resolution.

    • Consider skipping render() during training and only render evaluation episodes.

  • “Texture map not applied”:

    • Ensure render_texture_map is passed at construction, and that asset paths resolve under asset_root.

  • “Gym wrapper errors about spaces”:

    • The base observation is a Dict space; ensure your wrappers adapt it to the expected shape (e.g., ImageOnlyWrapper).