Gym Environment¶
This guide explains the Gymnasium-compatible environment, its observation and action spaces, render modes, integration patterns with RL libraries, and customization hooks. It also covers seeding, recording, and common troubleshooting.
Contents
- Overview and capabilities
- Initialization and configuration
- Observation and action spaces
- Reset/step/render lifecycle
- Seeding and determinism
- Using custom generators and renderers
- Wrappers, vectorization, and RL integration
- Recording videos and logging
- Reward shaping and termination logic
- Examples (end-to-end)
- Troubleshooting
Overview and capabilities¶
- Class: grid_universe.gym_env.GridUniverseEnv
- Compatible with Gymnasium (supports reset, step, render, action/observation spaces).
- Observation includes:
  - An RGBA image (texture-rendered snapshot of State).
  - A structured info dict (agent health/effects/inventory, score/phase/turn, config metadata, message string).
- Reward:
  - Default is the change in score since the previous step (delta score).
- Termination and truncation:
  - terminated is True if the environment reaches a “win” condition.
  - truncated is True if the agent reaches a “lose” condition (e.g., dead). Use this as a failure terminal for RL.
- Rendering:
  - Texture mode returns a PIL.Image (RGBA).
  - Human mode shows a window via PIL.Image.show() (blocking view on some platforms).
Initialization and configuration¶
from grid_universe.gym_env import GridUniverseEnv
from grid_universe.examples.maze import generate as maze_generate

env = GridUniverseEnv(
    render_mode="texture",           # "texture" returns PIL.Image from render(); "human" shows a window
    render_resolution=640,           # width in pixels (height derived from grid aspect)
    render_texture_map=None,         # use default texture map unless overridden
    initial_state_fn=maze_generate,  # required: a callable returning a State
    width=9,
    height=9,
    seed=123,                        # forwarded to the generator
    # You can pass any kwargs accepted by the initial_state_fn (maze.generate by default):
    num_required_items=1,
    num_rewardable_items=1,
    num_portals=1,
    num_doors=1,
    wall_percentage=0.8,
)
obs, info = env.reset()
Observation and action spaces¶
- Spaces are defined in __init__.
- You can choose the observation representation via the observation_type constructor arg:
  - observation_type="image" (default): the observation is a dict with image and info (described below). This is the RL‑friendly numeric space (Gym Dict).
  - observation_type="level": the observation is a reconstructed authoring‑time Level object (see levels.grid.Level). This exposes the full symbolic structure (entities, nested inventory/effects, wiring refs) each step. The observation space becomes a placeholder (Discrete(1)) because Gym cannot natively specify arbitrary Python objects. Use this mode for research/planning algorithms that need structured world models rather than standard deep RL libraries.

Example:

env = GridUniverseEnv(observation_type="level", width=7, height=7)
level_obs, _ = env.reset()  # level_obs is a Level instance

- With image mode the observation is a dict with:
  - image: Box(low=0, high=255, shape=(H, W, 4), dtype=uint8)
  - info: Dict with nested dicts for agent, status, config, and message (see below)
- agent sub-dict:
  - health: { health: int | -1, max_health: int | -1 }
  - effects: sequence of effect entries:
    - id: int
    - type: "", "IMMUNITY", "PHASING", "SPEED"
    - limit_type: "", "TIME", "USAGE"
    - limit_amount: int (or -1)
    - multiplier: int (SPEED only; -1 otherwise)
  - inventory: sequence of item entries:
    - id: int
    - type: "item" | "key" | "core" | "coin"
    - key_id: str ("" if N/A)
    - appearance_name: str ("" if unknown)
- status sub-dict:
  - score: int
  - phase: "ongoing" | "win" | "lose"
  - turn: int
- config sub-dict:
  - move_fn: str (function name)
  - objective_fn: str (function name)
  - seed: int (or -1)
  - width: int
  - height: int
- message field:
  - A free-form text string (empty string if None) for narrative or task hints.
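The nested info dict can be inspected directly each step; a minimal sketch reading a few of the fields listed above:

obs, info = env.reset()

# Episode status (field names as documented above)
status = obs["info"]["status"]
print(status["phase"], status["score"], status["turn"])  # e.g. "ongoing", 0, 0

# Agent health and inventory
agent = obs["info"]["agent"]
print(agent["health"])  # {"health": ..., "max_health": ...}
for item in agent["inventory"]:
    print(item["type"], item["appearance_name"])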
- Action space:
  - Discrete(len(Action)) with the Action enum order (UP, DOWN, LEFT, RIGHT, USE_KEY, PICK_UP, WAIT)
  - Index mapping:
    - 0 → UP
    - 1 → DOWN
    - 2 → LEFT
    - 3 → RIGHT
    - 4 → USE_KEY
    - 5 → PICK_UP
    - 6 → WAIT
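Actions are passed to step() as integer indices following this mapping; a minimal sketch (the uppercase names are just local constants for readability, not part of the API):

import numpy as np

UP, PICK_UP, WAIT = 0, 5, 6  # indices from the mapping above

obs, reward, terminated, truncated, info = env.step(np.int64(UP))
obs, reward, terminated, truncated, info = env.step(np.int64(PICK_UP))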
Reset/step/render lifecycle¶
- reset(seed=None, options=None):
  - Generates a new State using initial_state_fn(**kwargs) from the constructor.
  - Selects the first agent_id by default.
  - Builds or reuses the TextureRenderer for rendering.
  - Returns the observation dict and an info dict (currently empty).
- step(action: np.integer):
  - Converts the integer action to the Action enum via ordering.
  - Applies grid_universe.step.step to advance the State.
  - Computes reward = new_score − old_score.
  - terminated = state.win; truncated = state.lose.
  - Returns (obs, reward, terminated, truncated, info).
- render(mode=None):
  - Returns a PIL.Image if mode (or render_mode) is "texture".
  - Calls img.show() and returns None if "human".
  - Raises NotImplementedError for an unknown mode.
- close():
  - No-op (reserved for compatibility).
Minimal loop:
import numpy as np
from grid_universe.gym_env import GridUniverseEnv
from grid_universe.examples.maze import generate as maze_generate

env = GridUniverseEnv(initial_state_fn=maze_generate, render_mode="texture", width=7, height=7, seed=1)
obs, info = env.reset()

done = False
while not done:
    action = np.int64(0)  # UP
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

img = env.render()  # PIL.Image when render_mode="texture"
img.save("episode_last_frame.png")
Seeding and determinism¶
- Pass seed to GridUniverseEnv to forward it to the generator (examples.maze.generate), which sets State.seed.
- Some movement functions (e.g., windy_move_fn) use per‑turn RNG derived from (state.seed, state.turn). The renderer’s directory‑based texture choice uses a seed derived from state.seed to pick a file deterministically per state.
- For strict reproducibility:
  - Fix the environment seed in the constructor.
  - Avoid non-deterministic policies during tests, or seed the policy RNG.
- Gymnasium’s reset(seed=...) param: the current env ignores the reset call’s seed and uses the constructor’s seed forwarded to the generator. If you need per-episode seeding, set it at construction time, or implement a custom initial_state_fn that reads a seed passed via options and propagate it in reset (see the construction-time sketch below).
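A minimal sketch of the construction-time approach, creating a fresh env per episode with its own seed (an illustrative pattern, not a built-in helper):

from grid_universe.gym_env import GridUniverseEnv
from grid_universe.examples.maze import generate as maze_generate

def make_seeded_env(seed: int) -> GridUniverseEnv:
    # The constructor seed is forwarded to the generator, so each env gets its own layout.
    return GridUniverseEnv(initial_state_fn=maze_generate, render_mode="texture",
                           width=7, height=7, seed=seed)

for episode_seed in (101, 102, 103):
    env = make_seeded_env(episode_seed)
    obs, info = env.reset()
    # ... run the episode ...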
Using custom generators and renderers¶
- Custom initial_state_fn:
  - Provide a function that returns a grid_universe.state.State, and pass it as initial_state_fn.
  - All extra kwargs are forwarded to that function.
- Example replacing the generator:
from typing import Optional

from grid_universe.state import State
from grid_universe.moves import default_move_fn
from grid_universe.objectives import default_objective_fn
from grid_universe.levels.grid import Level
from grid_universe.levels.factories import create_floor, create_agent, create_exit
from grid_universe.levels.convert import to_state
from grid_universe.gym_env import GridUniverseEnv

def my_small_world(width=5, height=5, seed: Optional[int] = None) -> State:
    lvl = Level(width, height, move_fn=default_move_fn, objective_fn=default_objective_fn, seed=seed)
    for y in range(height):
        for x in range(width):
            lvl.add((x, y), create_floor())
    lvl.add((1, 1), create_agent())
    lvl.add((width - 2, height - 2), create_exit())
    return to_state(lvl)

env = GridUniverseEnv(initial_state_fn=my_small_world, width=6, height=4, seed=99, render_mode="texture")
obs, info = env.reset()
- Custom texture map or resolution:
  - The env stores a TextureRenderer internally. To customize rendering globally, supply render_resolution and render_texture_map in the constructor.
from grid_universe.renderer.texture import DEFAULT_TEXTURE_MAP

env = GridUniverseEnv(
    render_mode="texture",
    render_resolution=800,
    render_texture_map=DEFAULT_TEXTURE_MAP,  # or a customized mapping
    width=9, height=9, seed=7,
)
Wrappers, vectorization, and RL integration¶
- Gym wrappers:
  - You can wrap GridUniverseEnv with standard Gymnasium wrappers (FrameStack, GrayScaleObservation, ResizeObservation, etc.). Many wrappers expect numeric arrays from render; here the image is provided via obs["image"], not returned by render() on step, so wrap the observation key you need.
- Observation key selection:
  - Use a custom wrapper to replace obs with obs["image"], or to add embeddings of info if your agent expects a simpler observation.

Example wrapper to expose only the image as the observation (works only with observation_type="image"):
import gymnasium as gym
import numpy as np

class ImageOnlyWrapper(gym.ObservationWrapper):
    def __init__(self, env: gym.Env):
        super().__init__(env)
        h, w = self.observation_space["image"].shape[:2]
        self.observation_space = gym.spaces.Box(low=0, high=255, shape=(h, w, 4), dtype=np.uint8)

    def observation(self, observation):
        return observation["image"]

# Usage:
# env = ImageOnlyWrapper(GridUniverseEnv(...))
- Vectorized envs:
  - Use Gymnasium’s SyncVectorEnv / AsyncVectorEnv to run multiple instances. Ensure each one has its own seed and independent initial_state_fn kwargs; see the sketch below.
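A minimal sketch using Gymnasium’s SyncVectorEnv with per-instance seeds (assuming the env’s Dict observation space batches cleanly; seeds and sizes here are arbitrary):

import gymnasium as gym
from grid_universe.gym_env import GridUniverseEnv
from grid_universe.examples.maze import generate as maze_generate

def make_env(seed: int):
    def _thunk():
        return GridUniverseEnv(initial_state_fn=maze_generate, render_mode="texture",
                               width=7, height=7, seed=seed)
    return _thunk

# Four instances, each with its own seed so layouts differ.
envs = gym.vector.SyncVectorEnv([make_env(s) for s in (1, 2, 3, 4)])
obs, info = envs.reset()
obs, rewards, terminateds, truncateds, infos = envs.step(envs.action_space.sample())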
- RL libraries:
  - Stable-Baselines3: works with Gymnasium compatibility wrappers. Ensure the observation is a Box; consider the ImageOnlyWrapper above.
  - CleanRL / RLlib: similar considerations; ensure the observation shape/dtype matches the algorithm’s expectations.
Recording videos and logging¶
- Because render(mode="texture") returns a PIL.Image, you can manually record frames and assemble a GIF or MP4.

Record frames:
frames = []
obs, info = env.reset()
done = False
while not done:
    frames.append(env.render())
    obs, reward, terminated, truncated, info = env.step(np.int64(0))
    done = terminated or truncated
frames.append(env.render())  # final frame
Save a GIF (Pillow):
from PIL import Image

frames_rgba = [im.convert("RGBA") for im in frames]
frames_rgba[0].save(
    "episode.gif",
    save_all=True,
    append_images=frames_rgba[1:],
    duration=200,
    loop=0,
)
Save MP4 (imageio-ffmpeg):
import imageio.v2 as iio  # the get_writer API lives in imageio's v2 interface
import numpy as np

with iio.get_writer("episode.mp4", fps=5) as w:
    for im in frames:
        w.append_data(np.array(im.convert("RGB")))
Reward shaping and termination logic¶
- Reward:
  - Default reward is the delta score per step. To change this, wrap the env and post-process the reward, or replace the step logic by subclassing.
- Termination:
  - terminated (win) is True if the objective function is satisfied.
  - truncated (lose) is True if the agent dies or a lose condition is set. Treat both as episode terminal in RL loops.
- Shaping strategies:
  - Add dense signals (e.g., distance-to-goal) via a wrapper; do not modify State directly. Keep the environment’s source reward consistent and add shaped terms externally for clarity and reproducibility (see the sketch below).
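For example, a shaping wrapper can adjust the reward without touching State; a minimal sketch that subtracts a small per-step penalty (the penalty value is arbitrary and purely illustrative):

import gymnasium as gym

class StepPenaltyWrapper(gym.Wrapper):
    """Adds a constant per-step penalty on top of the env's delta-score reward."""

    def __init__(self, env: gym.Env, penalty: float = 0.01):
        super().__init__(env)
        self.penalty = penalty

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, reward - self.penalty, terminated, truncated, info

# Usage:
# env = StepPenaltyWrapper(GridUniverseEnv(...), penalty=0.05)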
Examples (end-to-end)¶
Basic random policy loop:
import numpy as np
import gymnasium as gym
from grid_universe.gym_env import GridUniverseEnv
from grid_universe.examples.maze import generate as maze_generate

env = GridUniverseEnv(initial_state_fn=maze_generate, render_mode="texture", width=7, height=7, seed=3)
obs, info = env.reset()

done = False
while not done:
    action = env.action_space.sample().astype(np.int64)  # random discrete action
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        done = True

env.render().save("random_last.png")
Stable-Baselines3 (with image-only observations):
# pip install stable-baselines3[extra] gymnasium
import gymnasium as gym
import numpy as np
from stable_baselines3 import PPO

from grid_universe.gym_env import GridUniverseEnv
from grid_universe.examples.maze import generate as maze_generate

class ImageOnlyWrapper(gym.ObservationWrapper):
    def __init__(self, env):
        super().__init__(env)
        h, w = self.observation_space["image"].shape[:2]
        self.observation_space = gym.spaces.Box(0, 255, shape=(h, w, 4), dtype=np.uint8)

    def observation(self, observation):
        return observation["image"]

def make_env():
    base = GridUniverseEnv(initial_state_fn=maze_generate, render_mode="texture", width=9, height=9, seed=7)
    return ImageOnlyWrapper(base)

env = make_env()
model = PPO("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=1000)
Troubleshooting¶
- “Render returns None”:
  - render_mode must be "texture" to return a PIL.Image. In "human" mode, it calls show() and returns None.
- “Observations are not images”:
  - The observation is a dict with "image" and "info". If your agent expects only an array, wrap the env to extract obs["image"].
- “Episodes never end”:
  - Confirm your objective function can be satisfied (or a lose condition can occur). Check obs["info"]["status"]["phase"] and "turn".
- “Multiple envs have identical layouts”:
  - Provide distinct seeds per env instance, or pass per-env kwargs to the generator.
- “High rendering cost”:
  - Reuse env instances across episodes.
  - Reduce render_resolution.
  - Consider skipping render() during training and rendering only evaluation episodes.
- “Texture map not applied”:
  - Ensure render_texture_map is passed at construction, and that asset paths resolve under asset_root.
- “Gym wrapper errors about spaces”:
  - The base observation is a Dict space; ensure your wrappers adapt it to the expected shape (e.g., ImageOnlyWrapper).