Video Depth Anything
Annolid includes Video Depth Anything integration for estimating per-frame depth
from a loaded video. The GUI path is the normal workflow, and the same runtime is
available from Python through annolid.depth.run_video_depth_anything.
Use this workflow when you need a depth sidecar for review, visualization, point cloud export, or later 2D-to-3D workflows such as FlyBody.
Requirements
The GUI workflow needs the Annolid GUI plus the model runtime dependencies used by
Annolid's ML workflows, including PyTorch and huggingface-hub for checkpoint
downloads. Use an environment created by the one-line installer, or install the
GUI and ML extras:
pip install "annolid[gui,ml]"
For script-only use without the desktop GUI, pip install "annolid[ml]" is
enough.
If you are working from a local checkout:
pip install -e ".[gui,ml]"
For EXR export, also install OpenEXR and Imath in the active environment.
Checkpoints
Annolid stores Video Depth Anything checkpoints under:
annolid/depth/checkpoints
The GUI auto-downloads the selected checkpoint on first use. If you want to prefetch checkpoints before a GUI session, use the bundled downloader:
python -m annolid.depth.download_weights --model video_depth_anything_vitl
Useful downloader commands:
python -m annolid.depth.download_weights --list
python -m annolid.depth.download_weights --all
python -m annolid.depth.download_weights --model metric_video_depth_anything_vitb
Existing checkpoint files are not downloaded again. If you use an authenticated
Hugging Face endpoint, set HF_HUB_TOKEN before running the downloader.
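To prefetch several checkpoints in one step, the downloader can be driven from a small script. The following is a minimal sketch that only uses the `--model` flag and the model names shown above; the helper names `downloader_command` and `prefetch` are hypothetical, and the script defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess
import sys


def downloader_command(model: str) -> list[str]:
    """Build the documented downloader invocation for one checkpoint name."""
    return [sys.executable, "-m", "annolid.depth.download_weights", "--model", model]


def prefetch(models: list[str], dry_run: bool = True) -> list[list[str]]:
    """Collect one downloader command per model; execute them when dry_run is False."""
    commands = [downloader_command(model) for model in models]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if the downloader exits non-zero.
            subprocess.run(command, check=True)
    return commands


# Model names are the ones used in the commands above; --list shows all of them.
for command in prefetch(["video_depth_anything_vitl", "metric_video_depth_anything_vitb"]):
    print(" ".join(command))
```

Call `prefetch(..., dry_run=False)` in an environment where `annolid[ml]` is installed to run the downloads.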
GUI Workflow
- Open a video in Annolid.
- Open View -> Depth Settings....
- Choose the encoder, resolution, frame limit, target FPS, metric-depth option, and output artifacts.
- Open View -> Video Depth Anything....
- Watch progress in the status bar and review the live depth overlay on the canvas.
- Inspect the generated depth.ndjson and any optional rendered outputs.
Streaming mode is enabled by default. It processes one frame at a time, keeps memory use lower on long videos, and emits records incrementally.
Python Workflow
For batch jobs, call the same runtime directly:
from pathlib import Path
from annolid.depth import run_video_depth_anything
input_video = Path("videos/mice.mp4")
output_dir = Path("outputs/mice_depth")
result = run_video_depth_anything(
    input_video=str(input_video),
    output_dir=str(output_dir),
    encoder="vitb",
    max_res=900,
    max_len=500,
    target_fps=15,
    save_depth_video=True,
    save_depth_frames=True,
    save_point_clouds=True,
)
print(result["depth_ndjson"])
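For several videos, the single call above can be wrapped in a loop. This is a rough sketch rather than an Annolid API: `run_batch` is a hypothetical helper, the per-video directory naming is an arbitrary choice, and it assumes that keyword arguments not passed here have defaults. The runtime is passed in as `runner` so the loop can be exercised without loading a model.

```python
from pathlib import Path
from typing import Callable, Iterable


def run_batch(videos: Iterable[Path], output_root: Path, runner: Callable[..., dict]) -> dict:
    """Call runner once per video, giving each video its own output directory."""
    results = {}
    for video in videos:
        output_dir = output_root / f"{video.stem}_depth"
        output_dir.mkdir(parents=True, exist_ok=True)
        # Keyword names mirror the single-video example; other options are left
        # to the runtime defaults (an assumption).
        results[video.name] = runner(
            input_video=str(video),
            output_dir=str(output_dir),
            encoder="vitb",
            max_res=900,
            max_len=500,
            target_fps=15,
        )
    return results


# Usage (requires annolid[ml]):
# from annolid.depth import run_video_depth_anything
# videos = sorted(Path("videos").glob("*.mp4"))
# for name, result in run_batch(videos, Path("outputs"), run_video_depth_anything).items():
#     print(name, result["depth_ndjson"])
```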
Outputs
Every run writes outputs to the selected output directory. The primary artifact is:
depth.ndjson
Each line is one frame record. The depth map is stored as a base64-encoded
uint16 PNG under otherData.depth_map, with scale metadata for converting
quantized values back to floating-point depth values. Depth values are in metric
units only when the metric-depth model is enabled; otherwise they are relative.
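The effect of 16-bit quantization can be illustrated with a small round trip. The `dequantize` function follows the decoding formula used later on this page; `quantize` is simply its assumed inverse and may not match how Annolid encodes values internally. The scale values are hypothetical.

```python
def quantize(depth: float, d_min: float, d_max: float) -> int:
    """Map a depth in [d_min, d_max] to a uint16 code (assumed inverse of the decoder)."""
    if d_max <= d_min:
        return 0
    fraction = (depth - d_min) / (d_max - d_min)
    return max(0, min(65535, round(fraction * 65535)))


def dequantize(code: int, d_min: float, d_max: float) -> float:
    """Recover a floating-point depth from a uint16 code and the scale metadata."""
    return (code / 65535.0) * (d_max - d_min) + d_min


d_min, d_max = 0.5, 4.5          # hypothetical scale values for one frame
step = (d_max - d_min) / 65535   # depth difference between adjacent codes
code = quantize(2.0, d_min, d_max)
restored = dequantize(code, d_min, d_max)
print(code, restored, step)
# The round-trip error is bounded by half a quantization step.
assert abs(restored - 2.0) <= step / 2
```

The wider the per-frame depth range, the coarser each step becomes, which is why the scale metadata has to be read from each record rather than assumed constant.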
Optional outputs include:
- <video_stem>_vis.mp4 when save_depth_video=True
- depth_frames/ when save_depth_frames=True
- point_clouds/*.csv when save_point_clouds=True
- <video_stem>_depths.npz when save_npz=True
- <video_stem>_depths_exr/ when save_exr=True
Point-cloud CSVs contain x, y, z, and intensity columns. When region
labels are enabled and matching Annolid annotation files are available, they can
also include a region column.
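A point-cloud CSV can be read with the standard library alone. This sketch assumes each file starts with a header row naming the columns exactly `x`, `y`, `z`, `intensity`, and optionally `region`; `load_point_cloud` is a hypothetical helper, not part of Annolid.

```python
import csv
from pathlib import Path


def load_point_cloud(csv_path: Path) -> list[dict]:
    """Read one point-cloud CSV into a list of dicts with float coordinates."""
    points = []
    with csv_path.open(newline="") as handle:
        for row in csv.DictReader(handle):
            point = {key: float(row[key]) for key in ("x", "y", "z", "intensity")}
            # The region column is only present when region labels were enabled.
            if row.get("region"):
                point["region"] = row["region"]
            points.append(point)
    return points
```

Pass it the path of any file under point_clouds/ in the output directory. For large clouds, loading the numeric columns into a NumPy array is likely to be faster.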
Decode a Depth Record
Use the scale metadata from each record to convert the stored PNG back to a floating-point depth array:
import base64
import io
import json
from pathlib import Path
import numpy as np
from PIL import Image
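# Read the first frame record from the NDJSON file.
record = json.loads(Path("outputs/mice_depth/depth.ndjson").read_text().splitlines()[0])
payload = record["otherData"]["depth_map"]
scale = payload["scale"]
# Decode the base64 payload into the 16-bit PNG image.
image = Image.open(io.BytesIO(base64.b64decode(payload["image_data"])))
# Map the 0..65535 codes back to the [min, max] depth range.
depth = (np.asarray(image, dtype=np.float32) / 65535.0) * (
    scale["max"] - scale["min"]
) + scale["min"]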
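On long videos it can be preferable to read depth.ndjson one line at a time instead of loading the whole file. The sketch below uses only the standard library and the record fields shown above (`otherData.depth_map.scale`); `iter_records` and `scale_ranges` are hypothetical helper names. It skips the PNG decoding and only collects the per-frame scale range.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_records(ndjson_path: Path) -> Iterator[dict]:
    """Yield one parsed frame record per non-empty line, reading lazily."""
    with ndjson_path.open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def scale_ranges(ndjson_path: Path) -> list[tuple[float, float]]:
    """Collect the (min, max) scale of every record that carries a depth map."""
    ranges = []
    for record in iter_records(ndjson_path):
        payload = record.get("otherData", {}).get("depth_map")
        if payload is None:
            continue  # skip records without a depth map
        scale = payload["scale"]
        ranges.append((float(scale["min"]), float(scale["max"])))
    return ranges
```

The same `iter_records` generator can feed the decoding steps above when the full depth arrays are needed frame by frame.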
Settings Reference
| Setting | Effect |
|---|---|
| Encoder | vits, vitb, or vitl; larger encoders can preserve more detail but use more memory. |
| Max resolution | Downscales each frame so the longer side does not exceed this value before inference. |
| Max frames | Limits how many frames are processed; -1 means unlimited. |
| Target FPS | Samples frames to a target FPS; -1 keeps the source rate. |
| Metric depth model | Uses metric_video_depth_anything_* checkpoints and metric-depth scaling. |
| FP32 inference | Forces float32 inference; Annolid also forces FP32 on non-CUDA devices for stable values. |
| Grayscale overlay | Uses grayscale rendering for GUI preview and saved depth frames. |
| Save depth video | Writes <video_stem>_vis.mp4. |
| Save depth frames | Writes rendered PNGs under depth_frames/. |
| Save point clouds | Writes XYZ-intensity CSV files under point_clouds/. |
| Save NPZ / EXR | Writes array stacks for downstream processing; EXR requires OpenEXR and Imath. |
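To get a feel for how Max resolution and Target FPS reduce the workload, the arithmetic can be sketched as follows. This illustrates the behavior described in the table, not Annolid's actual implementation: both function names are hypothetical, and the real runtime may round differently or align sizes to the model's input requirements.

```python
def fit_longer_side(width: int, height: int, max_res: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_res, keeping aspect ratio."""
    longer = max(width, height)
    if max_res <= 0 or longer <= max_res:
        return width, height
    factor = max_res / longer
    return max(1, round(width * factor)), max(1, round(height * factor))


def frame_stride(source_fps: float, target_fps: float) -> int:
    """Keep every n-th frame; -1 (or a target at or above the source rate) keeps all frames."""
    if target_fps <= 0 or target_fps >= source_fps:
        return 1
    return max(1, round(source_fps / target_fps))


print(fit_longer_side(1920, 1080, 900))  # (900, 506)
print(frame_stride(30.0, 15.0))          # 2
```

With these example values, a 1920x1080 video at 30 FPS would be processed at roughly a fifth of the pixels per frame and half the frames.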
Troubleshooting
| Problem | What to check |
|---|---|
| Import error for huggingface_hub | Install the ML runtime with pip install "annolid[ml]" or pip install -e ".[gui,ml]". |
| First run appears slow | Confirm the selected checkpoint is downloading and the active environment has network access. |
| CUDA runs out of memory | Use a smaller encoder, lower max resolution, limit frames, or run on CPU. |
| EXR export fails | Install OpenEXR and Imath, or disable EXR output. |
| No depth.ndjson appears | Check the status message and logs for video decoding, checkpoint, or output-directory errors. |
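Several of the problems above come down to a missing package in the active environment. A quick check, sketched here with the standard library, reports whether the relevant modules can be found without importing them. The module names are the usual import names for PyTorch, huggingface-hub, OpenEXR, and Imath; `check_modules` is a hypothetical helper.

```python
import importlib.util


def check_modules(names: list[str]) -> dict[str, bool]:
    """Report whether each top-level module can be found in the active environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# torch and huggingface_hub back the ML runtime; OpenEXR and Imath are only
# needed for EXR export.
for name, found in check_modules(["torch", "huggingface_hub", "OpenEXR", "Imath"]).items():
    print(f"{name}: {'ok' if found else 'missing'}")
```

Run it with the same Python interpreter that launches Annolid, since a package installed in a different environment will still show as missing.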
For related downstream 3D workflows, see Simulation and FlyBody.