Work | Nikil Ravi

Valkyrie: scalable, cloud-native infrastructure for evaluating AI agents across benchmarks.
FormalProofBench: benchmark for evaluating whether frontier models can write graduate-level Lean proofs that pass formal verification.
Marin: open lab for building fully open foundation models with transparent experiments, data, and training code.
REAL: benchmark for autonomous agents on deterministic simulations of real websites.
G-Mixup for Graph Data Augmentation: Stanford CS224W project and tutorial implementing graphon-based mixup in PyTorch Geometric.
FAIR principles for AI models: Scientific Data paper proposing practical FAIR principles for AI models, with an application to accelerated high-energy diffraction microscopy.
TensorRT tutorial: optimizing deep learning inference with NVIDIA TensorRT (video).
Meta Sparse: estimating task relationships using subnetwork similarity (report).