Pandas & NumPy Academy · Lesson

Scheduling and Logging Pipeline Runs

Run your pipeline as a Python script from the command line, log start and end times, and use cron or a scheduler for automation.

From Notebook to Script

A pipeline that runs only when a developer opens a notebook and executes it by hand delivers value only as often as someone remembers to run it. To run automatically every day, the pipeline must be structured as a Python script that can be executed from the command line: python pipeline.py. This requires three things, the pillars of a production script: an if __name__ == '__main__': entry point, command-line argument parsing, and proper logging.

# pipeline.py
import argparse
import logging
import pandas as pd

def main(config_path):
    logging.info('Starting pipeline with config: %s', config_path)
    # ... run ETL steps ...
    logging.info('Pipeline complete.')

if __name__ == '__main__':
    # Without this call the root logger stays at WARNING and the INFO
    # messages in main() are silently dropped.
    logging.basicConfig(level=logging.INFO)
    parser = argparse.ArgumentParser()
    parser.add_argument('--config', default='config.json')
    args = parser.parse_args()
    main(args.config)
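
The argument parsing above can also be moved into its own function, which makes it possible to check without a shell. The following is a minimal sketch, not part of the lesson's script: the function name parse_args and the file name prod.json are made up for illustration, while --config and its default come from the listing above.

```python
import argparse

def parse_args(argv=None):
    """Parse command-line options.

    argv=None makes argparse read sys.argv[1:], which is the normal
    behaviour when the script is started from the command line.
    """
    parser = argparse.ArgumentParser(description='Run the pipeline.')
    parser.add_argument('--config', default='config.json',
                        help='path to the pipeline config file')
    return parser.parse_args(argv)

# Passing an explicit list simulates a command line.
default_args = parse_args([])
custom_args = parse_args(['--config', 'prod.json'])  # hypothetical file name

print(default_args.config)
print(custom_args.config)
```

With this split, the entry point would call parse_args() with no arguments, and the same function can be called with a list wherever a specific command line needs to be simulated.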

Configuring Python Logging

Python's built-in logging module, not print() statements, is the right tool for pipeline logs. Passing a FileHandler and a StreamHandler to logging.basicConfig() sends every message to both a log file and the console. Log at the INFO level for normal progress and at ERROR for failures. File-based logs persist after the process exits, which is essential for debugging scheduled runs that no one was watching.

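import logging
from datetime import date

# One log file per calendar day, e.g. pipeline_2024-01-31.log
log_file = f'pipeline_{date.today()}.log'

# basicConfig has no effect if the root logger already has handlers,
# so call it once, before the first log message.
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(message)s',
    handlers=[
        logging.FileHandler(log_file),  # persists after the process exits
        logging.StreamHandler()         # echoes to the console (stderr)
    ]
)

logging.info('Logger configured.')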
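
To show how the INFO and ERROR levels might be used around a whole run, here is a minimal sketch of a wrapper that logs the start, the end, and the elapsed time, and records any failure with its traceback. The name run_logged and the two demo steps are hypothetical and not part of the lesson's pipeline. The sketch logs to the console only, so that it stays self-contained.

```python
import logging
import time

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(message)s',
)

def run_logged(step):
    """Run `step` (any zero-argument callable) and log how it went.

    Returns 0 on success and 1 on failure, mirroring the usual
    convention for process exit codes.
    """
    logging.info('Pipeline started.')
    started = time.perf_counter()
    try:
        step()
    except Exception:
        # logging.exception logs at ERROR level and appends the traceback.
        logging.exception('Pipeline failed after %.2f s',
                          time.perf_counter() - started)
        return 1
    logging.info('Pipeline finished in %.2f s',
                 time.perf_counter() - started)
    return 0

def demo_ok():
    """Stand-in for a pipeline run that succeeds."""

def demo_broken():
    """Stand-in for a pipeline run that fails."""
    raise ValueError('simulated failure')

ok_code = run_logged(demo_ok)
failed_code = run_logged(demo_broken)
```

The %(asctime)s field in the format string supplies the start and end timestamps. In a script's entry point the return value could be passed to sys.exit(), so that cron or another scheduler sees a nonzero exit status when a run fails.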
