- Valkyrie: scalable, cloud-native infrastructure for evaluating AI agents across benchmarks.
- FormalProofBench: benchmark for evaluating whether frontier models can write graduate-level Lean proofs that pass formal verification.
- Marin: open lab for building fully open foundation models with transparent experiments, data, and training code.
- REAL: benchmark for autonomous agents on deterministic simulations of real websites.
- FAIR principles for AI models: paper in Scientific Data proposing practical FAIR (findable, accessible, interoperable, reusable) principles for AI models, with an application to accelerated high-energy diffraction microscopy.
- TensorRT tutorial: optimizing deep learning inference with NVIDIA TensorRT (video).
- Meta Sparse: estimating task relationships using subnetwork similarity (report).