Lean Repl Cryogenics
Circa November, 2025Contributors: Nehal Patel (G-Research)
Introduction
Commonly used Lean 4 Read-Eval-Print-Loops (REPL’s) such as the Lean Community repl (“repl” henceforth) and Pantograph provide machine-to-machine interfaces for incremental, but back-trackable, theorem proving. Endeavors such a Morph offer branch, clone, rewind at enterprise scale.
It occurred to me that cloning running REPL’s with CRIU might be a good middle ground between the checkpointing capabilities built into repl on the one hand and Morph on the other.
This post pulls that thread…
Let’s first start with an existence proof, which demonstrates
- launching a repl (
lake env .../bin/repl) - executing repl commands that
import Mathliband define a theorem with asorry - “freezing” the repl process to disk (without having to kill it)
pdum-criu shell freeze --dir ... --pid ... - rematerializing (“thawing”) the frozen image into an independent, running process:
pdum-criu shell thaw --dir ... - performing a “freeze & thaw” in a single step (
pdum-criu shell beam ...)
Prelude
repl let’s one enter JSON-formatted commands one-by-one and examining the internal Lean 4 state along the way, much in the way one can do in VS Code and the Lean InfoView.pickling state to disk. This pickled state may then be loaded by multiple new repl’s enabling parallel search starting from a common state.
CRIU is a Linux system utility that allows users to leverage kernel capabilities to checkpoint a process tree and restore it later. Filled with Russianpickle.
habemus-papadum-criu is a high-level library and CLI that wraps CRIU to simplify common workflows for freezing and thawing running processes.
pickle vs. CRIU
We will compare checkpointing and restoring repl state to disk, first measuring the performance of the pickle command built into repl and then using CRIU’s checkpointing capabilities. A useful example to keep in mind is a scenario where one has evolved a proof to a certain state and now would like to efficiently spawn multiple copies of the repl from that state to explore search paths in parallel.
repl’s pickle
Here are the raw timings for pickle:
| Command | Time (seconds) |
|---|---|
{"cmd":"import Mathlib\nopen BigOperators\nopen Real\nopen Nat"} |
3.155 |
{"pickleTo":"pickle.olean","env":0} |
0.001 |
{"unpickleEnvFrom":"pickle.olean"} |
2.367 |
The size of pickle.olean is 1.5 KB.
Code
Timing code:
#!/usr/bin/env python3
import contextlib
import json
import subprocess
import sys
import time
REPL_CMD = ["lake", "env", "../../.lake/build/bin/repl"]
def read_blocks():
chunk = []
for line in sys.stdin:
if line.strip() == "":
if chunk:
yield "".join(chunk)
chunk = []
continue
chunk.append(line)
if chunk:
yield "".join(chunk)
def main():
proc = subprocess.Popen(
REPL_CMD, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
)
try:
for block in read_blocks():
text = block.rstrip("\n")
if not text:
continue
obj = json.loads(text)
flat = json.dumps(obj, separators=(",", ":"))
start = time.perf_counter()
proc.stdin.write(text + "\n\n")
proc.stdin.flush()
while True:
line = proc.stdout.readline()
if line == "":
raise RuntimeError("REPL terminated")
if line.strip() == "":
break
elapsed = time.perf_counter() - start
print(f"{elapsed:.3f} {flat}", flush=True)
finally:
if proc.stdin:
proc.stdin.close()
with contextlib.suppress(ProcessLookupError):
proc.terminate()
with contextlib.suppress(subprocess.TimeoutExpired, ProcessLookupError):
proc.wait(timeout=5)
if __name__ == "__main__":
import os
os.chdir("repl/test/Mathlib")
main()
CRIU
Here are the timings for CRIU:
| Step | Time (seconds) |
|---|---|
| import Mathlib | 3.211 |
| freeze to disk | 0.829 |
| thaw from disk | 0.510 |
Image size: 455 MB
Code
#!/usr/bin/env python3
"""Launch, freeze, and thaw a goblin to measure restore latency."""
from __future__ import annotations
import os
import shlex
import shutil
import subprocess
import sys
import time
from pathlib import Path
from pdum.criu import goblins
EXECUTABLE = ["lake", "env", str(Path("repl/.lake/build/bin/repl").resolve())]
WORKDIR = Path("repl/test/Mathlib").resolve()
IMAGES_DIR = Path("/tmp/time-demo-image")
THAW_PIDFILE = IMAGES_DIR / "thaw.pid"
THAW_LOG = IMAGES_DIR / "thaw.log"
PRIME_COMMAND = '{"cmd": "import Mathlib\\nopen BigOperators\\nopen Real\\nopen Nat"}'
PAYLOAD_TEXT = '{"cmd": "def f := 37"}' #there is no way to know when that is done, so we send a simple command and wait for the reponse
def _launch_process(command: list[str], workdir: Path) -> subprocess.Popen[bytes]:
"""Launch the target process with pipe-based stdio."""
print(f"Launching {command} (cwd={workdir})")
env = {**os.environ, "UV_USE_IO_URING": "0"}
proc = subprocess.Popen(
command,
cwd=str(workdir),
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
start_new_session=True,
env=env,
)
if proc.stdin is None or proc.stdout is None or proc.stderr is None:
raise RuntimeError("failed to capture stdio pipes for the target process")
return proc
def _send_command_read_response(
writer,
reader,
command: str,
*,
stage: str,
) -> list[str]:
"""Write ``command`` and collect stdout until a blank line terminates the response."""
if writer is None or reader is None:
raise RuntimeError(f"{stage} pipes are not available")
payload = (command.rstrip("\n") + "\n\n").encode("utf-8")
print(f"[{stage}] Sending payload: {command!r}")
writer.write(payload)
writer.flush()
lines: list[str] = []
while True:
raw = reader.readline()
if not raw:
raise RuntimeError(f"{stage} stdout closed before emitting a blank line")
line = raw.decode("utf-8", errors="replace").rstrip("\n")
print(f"[{stage}] Response: {line}")
if line == "":
break
lines.append(line)
return lines
def _prime_process(proc: subprocess.Popen[bytes], command: str) -> None:
"""Send a priming command and ensure the process responds fully."""
_send_command_read_response(proc.stdin, proc.stdout, command, stage="prime")
def _freeze_process(proc: subprocess.Popen[bytes]) -> Path:
"""Freeze the launched process into the requested images directory."""
print(f"Freezing PID {proc.pid} into {IMAGES_DIR}")
log_path = goblins.freeze(
proc.pid,
IMAGES_DIR,
leave_running=False,
shell_job=False,
)
print(f"Freeze complete (log: {log_path})")
return log_path
def _cleanup_process(proc: subprocess.Popen[bytes]) -> None:
"""Best-effort termination of the original process."""
for stream in (proc.stdin, proc.stdout, proc.stderr):
if stream is None:
continue
try:
stream.close()
except Exception:
pass
try:
proc.terminate()
except OSError:
return
try:
proc.wait(timeout=5)
except Exception:
pass
def prepare_images(
executable: list[str],
workdir: Path,
command: str,
) -> tuple[Path, float, float]:
"""Launch the target, issue a command, and freeze its state."""
start_launch = time.perf_counter()
proc = _launch_process(executable, workdir)
try:
_prime_process(proc, command)
prime_done = time.perf_counter()
launch_prime_elapsed = prime_done - start_launch
freeze_start = time.perf_counter()
log_path = _freeze_process(proc)
freeze_elapsed = time.perf_counter() - freeze_start
return log_path, launch_prime_elapsed, freeze_elapsed
finally:
_cleanup_process(proc)
def measure_thaw(images_dir: Path, message: str) -> float:
"""Thaw a goblin, send a message, and time until a blank line appears."""
start = time.perf_counter()
pidfile = THAW_PIDFILE
log_path = THAW_LOG
if pidfile.exists():
pidfile.unlink()
if log_path.exists():
log_path.unlink()
goblin = goblins.thaw(
images_dir,
shell_job=False,
detach=True,
pidfile=pidfile,
log_path=log_path,
)
#print(f"Waiting for PID file {pidfile}")
#pid = _wait_for_pidfile(pidfile)
#print(f"Thawed goblin PID: {pid} (log: {log_path})")
try:
_send_command_read_response(goblin.stdin, goblin.stdout, message, stage="thaw")
finally:
goblin.close()
return time.perf_counter() - start
def _wait_for_pidfile(pidfile: Path, timeout: float = 10.0) -> int:
"""Poll ``pidfile`` until CRIU writes the restored PID."""
deadline = time.time() + timeout
while time.time() < deadline:
try:
data = pidfile.read_text().strip()
except FileNotFoundError:
time.sleep(0.05)
continue
if data:
return int(data)
time.sleep(0.05)
raise TimeoutError(f"timed out waiting for pidfile {pidfile}")
def main() -> None:
images_dir = IMAGES_DIR
executable = EXECUTABLE
workdir = WORKDIR.expanduser().resolve()
try:
if images_dir.exists():
print(f"Removing existing images directory {images_dir}")
shutil.rmtree(images_dir)
images_dir.mkdir(parents=True, exist_ok=True)
except OSError as exc:
print(f"error: failed to create images directory: {exc}", file=sys.stderr)
raise SystemExit(1) from exc
print(f"Preparing goblin checkpoint in {images_dir}")
try:
log_path, launch_prime_elapsed, freeze_elapsed = prepare_images(executable, workdir, PRIME_COMMAND)
print(f"Startup + prime elapsed: {launch_prime_elapsed:.3f}s")
print(f"Freeze log written to {log_path}")
print(f"Freeze duration: {freeze_elapsed:.3f}s")
print(f"Thawing goblin from {images_dir}")
thaw_elapsed = measure_thaw(images_dir, PAYLOAD_TEXT)
print(f"Thaw + response elapsed: {thaw_elapsed:.3f}s")
except Exception as exc: # pragma: no cover - demo CLI
print(f"error: {exc}", file=sys.stderr)
raise SystemExit(1) from exc
finally:
if images_dir.exists():
print(f"Images directory usage:")
os.system(f"du -sh {shlex.quote(str(images_dir))}")
else:
print(f"Images directory {images_dir} does not exist; skipping usage report")
if __name__ == "__main__":
main()
To freeze repl with CRIU, it must be launched with UV_USE_IO_URING=0 (cf. #2131)
The unpickle / thaw timings demonstrate the major differences between the two approaches. unpickling esssentially replays the prior the session commands while CRIU is loading pages of memory from disk. This simple example, which only imports Mathlib, is not necessarily indicative of realistic scenarios, but this post aims mostly to demonstrate CRIU so we will leave it here for now.
So what?
Not too much, to be honest. CRIU is certainly a fun parlor trick to keep in one’s grimoire, and it is
quite nice to see that repl and the Lean4 machinery are largely compatible with its approach (just don’t forget
UV_USE_IO_URING=0). It’s possible there are scenarios and / or customized point applications that might benefit from this technique. But it has also been my (admittedly limited) experience that the bitter lesson for AI & Lean is that elegant and efficient MCTS gets trounced by simply teaching codex/claude-code how to use echo "import Mathlib\n..." | lake env lean --stdin and that Mathlib lives in .lake/packages so it can grep away.
And also…
CRIU has many advanced features: for instance page, servers and methods for creating efficient incremental snapshots. Using the habemus-papadum-criu library is always going to be a Faustian bargain. Feel free to use it to “have a play,” but make sure you graduate onto learning the details if you decide to use CRIU for anything substantial.
Cheers
If you have questions, comments, corrections or would like to collaborate, please don’t hesitate to reach out.