Pandas & NumPy Academy · Lesson

Scheduling and Logging Pipeline Runs

Run your pipeline as a Python script from the command line, log start and end times, and use cron or a scheduler for automation.

From Notebook to Script

A pipeline that runs only when a developer opens a notebook and executes it by hand delivers value only as often as someone remembers to run it. To run automatically every day, the pipeline must be structured as a Python script that can be executed from the command line: python pipeline.py. This requires three things: an if __name__ == '__main__': entry point, command-line argument parsing, and proper logging. These are the three pillars of a production script.

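# pipeline.py
import argparse
import logging
import pandas as pd

def main(config_path):
    logging.info(f'Starting pipeline with config: {config_path}')
    # ... run ETL steps ...
    logging.info('Pipeline complete.')

if __name__ == '__main__':
    # The root logger defaults to WARNING, so without this call
    # the INFO messages in main() would be silently dropped.
    logging.basicConfig(level=logging.INFO)
    # Read the config path from the command line: python pipeline.py --config <path>
    parser = argparse.ArgumentParser()
    parser.add_argument('--config', default='config.json')
    args = parser.parse_args()
    main(args.config)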
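
The lesson summary mentions logging start and end times. The sketch below shows one possible way to do that. The wrapper run_timed, the logger name "pipeline", and the placeholder step run_etl are hypothetical names introduced for illustration; they are not part of the lesson's pipeline.py.

```python
import logging
import time
from datetime import datetime

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("pipeline")


def run_timed(step, *args, **kwargs):
    """Run `step`, logging its start time, end time, and elapsed seconds."""
    started = datetime.now()
    t0 = time.perf_counter()
    logger.info("Run started at %s", started.isoformat(timespec="seconds"))
    try:
        return step(*args, **kwargs)
    finally:
        # The finally clause runs on success and on failure,
        # so the end time is logged even if the step raises.
        elapsed = time.perf_counter() - t0
        finished = datetime.now()
        logger.info(
            "Run finished at %s (%.2f s elapsed)",
            finished.isoformat(timespec="seconds"),
            elapsed,
        )


def run_etl(config_path):
    """Placeholder for the real ETL steps (assumption for this sketch)."""
    return f"processed {config_path}"


if __name__ == "__main__":
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    run_timed(run_etl, "config.json")
```

Measuring the duration with time.perf_counter() rather than subtracting wall-clock timestamps keeps the elapsed figure unaffected by system clock adjustments during the run.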

Configuring Python Logging

Python's built-in logging module is the right tool for pipeline logs; print() statements are not, because they carry no timestamp or severity level and are lost unless the output is redirected. Configure the root logger with both console and file output by passing two handlers to logging.basicConfig(). Log at the INFO level for normal progress and at ERROR for failures. File-based logs persist after the process exits, which is essential for debugging scheduled runs that no one was watching.

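import logging
from datetime import date

# One log file per calendar day: pipeline_YYYY-MM-DD.log
log_file = f'pipeline_{date.today()}.log'

# basicConfig only takes effect if the root logger has no handlers yet,
# so call it once, before the first log message.
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(message)s',
    handlers=[
        logging.FileHandler(log_file),   # persists after the process exits
        logging.StreamHandler()          # console output (stderr by default)
    ]
)

logging.info('Logger configured.')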
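
The text above recommends ERROR for failures. A minimal sketch of that idea follows, assuming a hypothetical helper run_safely and a stand-in failing_step (both invented for illustration). It uses logging.exception(), which logs at the ERROR level and appends the traceback, and it returns an exit code that a scheduler could inspect.

```python
import logging


def run_safely(step):
    """Run `step`; return 0 on success, 1 on failure.

    Failures are logged at ERROR level with the full traceback.
    """
    try:
        step()
    except Exception:
        # logging.exception logs at ERROR and includes the traceback;
        # it is meant to be called from inside an except block.
        logging.exception("Pipeline failed.")
        return 1
    logging.info("Pipeline complete.")
    return 0


def failing_step():
    """Stand-in for an ETL step that goes wrong (assumption for this sketch)."""
    raise ValueError("bad input file")


# In a script, the return value could be passed to sys.exit(), for example
# sys.exit(run_safely(some_step)), so the process exits non-zero on failure.
```

For the scheduling itself, a crontab entry such as 0 6 * * * cd /path/to/project && python pipeline.py --config config.json would run the script daily at 06:00. The directory and file names in that entry are placeholders, not values from this lesson.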
