Video Depth Anything¶

Annolid includes Video Depth Anything integration for estimating per-frame depth from a loaded video. The GUI path is the normal workflow, and the same runtime is available from Python through annolid.depth.run_video_depth_anything.

Use this workflow when you need a depth sidecar for review, visualization, point cloud export, or later 2D-to-3D workflows such as FlyBody.

Requirements¶

Video Depth Anything needs the GUI plus the model runtime dependencies used by Annolid's ML workflows, including PyTorch and huggingface-hub for checkpoint downloads. Use an environment created by the one-line installer, or install the GUI and ML extras:

pip install "annolid[gui,ml]"

For script-only use without the desktop GUI, pip install "annolid[ml]" is enough.

If you are working from a local checkout:

pip install -e ".[gui,ml]"

For EXR export, also install OpenEXR and Imath in the active environment.

Checkpoints¶

Annolid stores Video Depth Anything checkpoints under:

annolid/depth/checkpoints

The GUI auto-downloads the selected checkpoint on first use. If you want to prefetch checkpoints before a GUI session, use the bundled downloader:

python -m annolid.depth.download_weights --model video_depth_anything_vitl

Useful downloader commands:

python -m annolid.depth.download_weights --list
python -m annolid.depth.download_weights --all
python -m annolid.depth.download_weights --model metric_video_depth_anything_vitb

Existing checkpoint files are not downloaded again. If you use an authenticated Hugging Face endpoint, set HF_HUB_TOKEN before running the downloader.

GUI Workflow¶

Open a video in Annolid.
Open View -> Depth Settings....
Choose the encoder, resolution, frame limit, target FPS, metric-depth option, and output artifacts.
Open View -> Video Depth Anything....
Watch progress in the status bar and review the live depth overlay on the canvas.
Inspect the generated depth.ndjson and any optional rendered outputs.

Streaming mode is enabled by default. It processes one frame at a time, keeps memory use lower on long videos, and emits records incrementally.

Python Workflow¶

For batch jobs, call the same runtime directly:

from pathlib import Path

from annolid.depth import run_video_depth_anything

input_video = Path("videos/mice.mp4")
output_dir = Path("outputs/mice_depth")

result = run_video_depth_anything(
    input_video=str(input_video),
    output_dir=str(output_dir),
    encoder="vitb",
    max_res=900,
    max_len=500,
    target_fps=15,
    save_depth_video=True,
    save_depth_frames=True,
    save_point_clouds=True,
)

print(result["depth_ndjson"])

Outputs¶

Every run writes outputs to the selected output directory. The primary artifact is:

depth.ndjson

Each line is one frame record. The depth map is stored as a base64-encoded uint16 PNG under otherData.depth_map, with scale metadata for converting quantized values back to floating-point depth values. Metric depth is in metric units only when the metric-depth model is enabled.

Optional outputs include:

<video_stem>_vis.mp4 when save_depth_video=True
depth_frames/ when save_depth_frames=True
point_clouds/*.csv when save_point_clouds=True
<video_stem>_depths.npz when save_npz=True
<video_stem>_depths_exr/ when save_exr=True

Point-cloud CSVs contain x, y, z, and intensity columns. When region labels are enabled and matching Annolid annotation files are available, they can also include a region column.

Decode a Depth Record¶

Use the scale metadata from each record to convert the stored PNG back to a floating-point depth array:

import base64
import io
import json
from pathlib import Path

import numpy as np
from PIL import Image

record = json.loads(Path("outputs/mice_depth/depth.ndjson").read_text().splitlines()[0])
payload = record["otherData"]["depth_map"]
scale = payload["scale"]

image = Image.open(io.BytesIO(base64.b64decode(payload["image_data"])))
depth = (np.asarray(image, dtype=np.float32) / 65535.0) * (
    scale["max"] - scale["min"]
) + scale["min"]

Settings Reference¶

Setting	Effect
Encoder	`vits`, `vitb`, or `vitl`; larger encoders can preserve more detail but use more memory.
Max resolution	Downscales each frame so the longer side does not exceed this value before inference.
Max frames	Limits how many frames are processed; `-1` means unlimited.
Target FPS	Samples frames to a target FPS; `-1` keeps the source rate.
Metric depth model	Uses `metric_video_depth_anything_*` checkpoints and metric-depth scaling.
FP32 inference	Forces float32 inference; Annolid also forces FP32 on non-CUDA devices for stable values.
Grayscale overlay	Uses grayscale rendering for GUI preview and saved depth frames.
Save depth video	Writes `<video_stem>_vis.mp4`.
Save depth frames	Writes rendered PNGs under `depth_frames/`.
Save point clouds	Writes XYZ-intensity CSV files under `point_clouds/`.
Save NPZ / EXR	Writes array stacks for downstream processing; EXR requires `OpenEXR` and `Imath`.

Troubleshooting¶

Problem	What to check
Import error for `huggingface_hub`	Install the ML runtime with `pip install "annolid[ml]"` or `pip install -e ".[gui,ml]"`.
First run appears slow	Confirm the selected checkpoint is downloading and the active environment has network access.
CUDA runs out of memory	Use a smaller encoder, lower max resolution, limit frames, or run on CPU.
EXR export fails	Install `OpenEXR` and `Imath`, or disable EXR output.
No `depth.ndjson` appears	Check the status message and logs for video decoding, checkpoint, or output-directory errors.

For related downstream 3D workflows, see Simulation and FlyBody.