Pico

A Modular Language Model Development Toolkit

Building small language models remains more of an art than a science. We believe it shouldn't be.

Pico provides a lightweight, modular framework for systematic, hypothesis-driven research. Built around two core libraries, pico-train for model training and pico-analyze for in-depth analysis, it gives researchers a sandbox for developing and testing new ideas.

Training Made Easy

pico-train makes training language models simple and efficient.

With pico-train, you can train language models of various sizes with minimal configuration. The framework handles the complexities of distributed training, gradient accumulation, and checkpoint management, allowing researchers to focus on experimenting with model architectures and training paradigms.
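
To give a sense of what that means in practice, here is a minimal sketch of the plumbing pico-train handles for you: a plain PyTorch training step with gradient accumulation and periodic checkpointing. The placeholder model, batch source, and accumulation factor are assumptions for illustration, not part of Pico's actual API.

    import torch

    # Placeholder model and optimizer; in pico-train these come from the run configuration.
    model = torch.nn.Linear(512, 512)
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    accum_steps = 4  # accumulate gradients over 4 micro-batches

    def training_step(micro_batches, step):
        optimizer.zero_grad()
        for inputs, targets in micro_batches:
            loss = torch.nn.functional.mse_loss(model(inputs), targets)
            (loss / accum_steps).backward()  # scale so the accumulated gradient matches a full batch
        optimizer.step()
        if step % 1000 == 0:  # periodic checkpoint management
            torch.save({"model": model.state_dict(),
                        "optimizer": optimizer.state_dict(),
                        "step": step},
                       f"checkpoint_{step}.pt")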

Small-Scale Focus

Train and study models from 1M to 1B parameters, making experimentation with training paradigms practical and accessible.

Advanced Checkpointing

Access model activations, gradients, and other rich information throughout training for mechanistic interpretability research.
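
As an illustration of the kind of information such checkpoints can carry, the sketch below uses standard PyTorch forward hooks to record per-layer activations and gradients alongside the model weights. The placeholder model, layer selection, and file layout are assumptions for illustration, not Pico's actual storage format.

    import torch

    # Placeholder model standing in for a small transformer.
    model = torch.nn.Sequential(
        torch.nn.Linear(512, 512),
        torch.nn.GELU(),
        torch.nn.Linear(512, 512),
    )

    activations = {}

    def save_activation(name):
        def hook(module, inputs, output):
            # Detach so the stored tensors do not keep the autograd graph alive.
            activations[name] = output.detach().cpu()
        return hook

    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Linear):
            module.register_forward_hook(save_activation(name))

    batch = torch.randn(8, 512)
    model(batch).sum().backward()

    gradients = {name: param.grad.detach().cpu()
                 for name, param in model.named_parameters()
                 if param.grad is not None}

    # Bundle weights, activations, and gradients into one analysis-friendly checkpoint.
    torch.save({"state_dict": model.state_dict(),
                "activations": activations,
                "gradients": gradients},
               "checkpoint_step_0.pt")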

Easy Retraining

Simple, modular codebase designed for researchers to modify and retrain the entire model suite with custom training paradigms.

PyTorch Lightning

Built on PyTorch Lightning for efficient, scalable training with minimal boilerplate code.
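
For readers unfamiliar with Lightning, the sketch below shows the shape of a minimal LightningModule and Trainer. The tiny linear model and synthetic data are placeholders for illustration, not Pico's actual training module.

    import torch
    import lightning as L  # PyTorch Lightning 2.x

    class TinyLM(L.LightningModule):
        def __init__(self):
            super().__init__()
            self.net = torch.nn.Linear(512, 512)  # placeholder for a transformer

        def training_step(self, batch, batch_idx):
            inputs, targets = batch
            loss = torch.nn.functional.mse_loss(self.net(inputs), targets)
            self.log("train_loss", loss)
            return loss

        def configure_optimizers(self):
            return torch.optim.AdamW(self.parameters(), lr=3e-4)

    # Synthetic data stands in for a tokenized corpus.
    dataset = torch.utils.data.TensorDataset(torch.randn(256, 512), torch.randn(256, 512))
    loader = torch.utils.data.DataLoader(dataset, batch_size=32)

    # Lightning handles device placement, logging, and checkpointing.
    trainer = L.Trainer(max_epochs=1, accelerator="auto", logger=False)
    trainer.fit(TinyLM(), loader)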

Minimal Dependencies

Lightweight framework with only essential dependencies, making it easy to install and modify.

Research Ready

Designed with researchers in mind, providing tools and flexibility needed for academic exploration.

Learning Dynamics Revealed

pico-analyze provides comprehensive tooling to capture and analyze training metrics, enabling researchers to understand how models learn.

Out of the box, it includes:

  • Convergence Rates

    Compute layer convergence rates across model sizes using automatically stored activation checkpoints.

  • Effective Rank

    Analyze dimensional utilization across layers to understand how models distribute complexity and identify potential bottlenecks.

  • Gradient Magnitude

    Track how gradient magnitudes evolve during training to understand optimization dynamics and identify potential training instabilities.

  • Model Sparsity

    Measure the percentage of near-zero weights in models to understand pruning potential and efficiency.

Using the checkpoints saved by pico-train, pico-analyze extracts critical insights about model behavior throughout training. These insights can help identify optimization issues and guide architectural improvements.
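
To make two of these metrics concrete, the sketch below computes an entropy-based effective rank and a near-zero sparsity score for a single checkpointed weight matrix. The entropy formulation of effective rank (Roy & Vetterli, 2007) and the 1e-4 threshold are illustrative choices that may differ from what pico-analyze implements; the checkpoint path and parameter key are likewise placeholders.

    import torch

    def effective_rank(weight: torch.Tensor) -> float:
        """Entropy-based effective rank of a 2D weight matrix (Roy & Vetterli, 2007)."""
        s = torch.linalg.svdvals(weight.float())
        p = s / s.sum()                         # treat singular values as a distribution
        entropy = -(p * torch.log(p + 1e-12)).sum()
        return torch.exp(entropy).item()        # ranges from 1 up to min(weight.shape)

    def near_zero_sparsity(weight: torch.Tensor, threshold: float = 1e-4) -> float:
        """Fraction of weights whose magnitude falls below the threshold."""
        return (weight.abs() < threshold).float().mean().item()

    # Placeholder checkpoint path and parameter key.
    checkpoint = torch.load("checkpoint_step_0.pt", map_location="cpu")
    weight = checkpoint["state_dict"]["0.weight"]
    print(f"effective rank: {effective_rank(weight):.1f}, "
          f"sparsity: {near_zero_sparsity(weight):.3f}")

Tracking numbers like these across the checkpoints saved during training is what turns them into learning-dynamics curves rather than single-point measurements.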

Built with ❤ by the Pico team

Code and artifacts are licensed under the Apache License 2.0