MLOps Academy · Lesson

Why Git Alone Cannot Version Data

The limits of Git for gigabyte-sized datasets.

Code Is Tiny, Data Is Huge

Git was built for source code: small text files that change line by line. Your datasets are often gigabytes of binary data, a totally different beast.

Git Stores Every Version

Git keeps a full snapshot of each file version forever. Commit a 2 GB file ten times and your repo can balloon toward 20 GB of history. 😬

All lessons in this course

  1. Why Git Alone Cannot Version Data
  2. Initialize DVC and Track a Dataset
  3. Push Data to Remote Storage
  4. Roll Back to an Earlier Dataset
← Back to MLOps Academy