[Shorts-2] Papermill=>Adding Parameters to Python Notebooks & executing them like a function


Python notebooks are great for experimenting and ideating, because you can test ideas quickly. But before you realize it, you end up writing the entire code in a notebook. The biggest pain point is then converting code spread over 40-50 cells into a Python function so that it can be run in a loop. This is where Papermill helps.

Let us understand this with a sample script:

Figure 1: Sample Script

Problem: If we want to run the entire script for multiple files, say names = ["def.csv","ghi.csv","abc.csv"], we have two options:

  1. Push all the code into a function with name as the argument, OR
  2. Restart & run the notebook, changing the value of the `name` variable by hand for every file.

Papermill Solution

  1. Papermill asks you to tag the cell that holds the variables you want to treat as parameters. You can tag that cell the following way:
Figure 2: Select the option to tag cells
Figure 3: Name the cell as `parameters`
Figure 4: Cell tagged as `parameters`
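
The tag is not only a UI detail: it is saved in the cell's metadata inside the .ipynb file, which is plain JSON. As a rough sketch, assuming the standard notebook layout where each cell has a `metadata.tags` list, you can check which cells carry the `parameters` tag. The helper name `cells_with_tag` and the tiny in-memory notebook are made up for illustration.

```python
from typing import List


def cells_with_tag(notebook: dict, tag: str) -> List[int]:
    """Return the indices of all cells whose metadata lists `tag`."""
    return [
        index
        for index, cell in enumerate(notebook.get("cells", []))
        if tag in cell.get("metadata", {}).get("tags", [])
    ]


# Hypothetical two-cell notebook, reduced to the fields that matter here.
# A real notebook could be loaded with json.load() on the .ipynb file.
demo_notebook = {
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": ['name = "abc.csv"'],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["print(name)"],
        },
    ]
}

tagged = cells_with_tag(demo_notebook, "parameters")
print(tagged)
```

If the list comes back empty for your own notebook, the tag was probably not saved, and the notebook should be re-saved after tagging.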

Now use the following code to execute the notebook with different arguments:

import papermill as pm

names = ["abc.csv","bcd.csv","efg.csv"]

for name in names:
    pm.execute_notebook(
        'papermill-in.ipynb',  # input notebook
        f'out_pm_{name}.ipynb',  # output notebook, one per name
        parameters=dict(name=name)  # values injected into the notebook
    )

The code above executes the notebook once per name, injecting the parameters each time and writing a separate output notebook. You can look at the injected parameters in the output notebooks. For example, in out_pm_bcd.csv.ipynb:

Figure 5: Injected Parameters by papermill
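
You can also check the injected values without opening the output notebook. Papermill adds a new cell tagged `injected-parameters` right after the `parameters` cell, so a minimal sketch only needs to find that tag in the notebook JSON. The helper name `injected_parameters_source` and the in-memory notebook below are hypothetical stand-ins for a real output file.

```python
def injected_parameters_source(notebook: dict) -> str:
    """Return the source of the first cell tagged `injected-parameters`."""
    for cell in notebook.get("cells", []):
        if "injected-parameters" in cell.get("metadata", {}).get("tags", []):
            source = cell.get("source", "")
            # Cell source may be stored as a list of lines or as one string.
            return "".join(source) if isinstance(source, list) else source
    return ""  # no injected cell found


# Hypothetical stand-in for an output notebook; a real one could be loaded
# with json.load() on a file such as out_pm_bcd.csv.ipynb.
demo_output = {
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": ['name = "abc.csv"'],
        },
        {
            "cell_type": "code",
            "metadata": {"tags": ["injected-parameters"]},
            "source": ["# Parameters\n", 'name = "bcd.csv"\n'],
        },
    ]
}

injected = injected_parameters_source(demo_output)
print(injected)
```

Because the injected cell runs after the original `parameters` cell, its values override the defaults you wrote in the notebook.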
