[Shorts-2] Papermill=>Adding Parameters to Python Notebooks & executing them like a function
Python notebooks are great when you are experimenting or ideating, because you can quickly test your ideas. Before you realize it, you end up writing the entire code in a notebook. The biggest pain point is then converting code spread over 40-50 cells into a Python function so that you can run it in a loop. This is where Papermill helps.
Let us understand this with a sample script:
[Image: sample script]
Problem
If we want to run the entire script for multiple names, e.g. names = ["def.csv","ghi.csv","abc.csv"], we have two options:
- Push all the code into a function with name as the argument, OR
- Restart & Run the notebook, changing the variable name for every file.
Papermill Solution
- Papermill asks you to tag the cell that holds your parameters with the tag parameters. You can tag your variables' cell in the following way:
[Screenshots: tagging the variables' cell in the notebook]
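In the notebook file itself (an .ipynb file is JSON), a tag is stored in the cell's metadata under a tags list. The sketch below is only an illustration of that structure: the sample notebook dictionary and the helper name cells_with_tag are hypothetical, and it uses no Papermill code.

```python
def cells_with_tag(notebook, tag):
    """Return the indices of cells whose metadata carries the given tag."""
    return [
        i
        for i, cell in enumerate(notebook.get("cells", []))
        if tag in cell.get("metadata", {}).get("tags", [])
    ]


# Hypothetical, stripped-down notebook: the first cell is tagged "parameters".
sample_notebook = {
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": 'name = "abc.csv"',
        },
        {"cell_type": "code", "metadata": {}, "source": "print(name)"},
    ]
}

print(cells_with_tag(sample_notebook, "parameters"))
```

To run the same check on a real notebook, you could load the file with json.load and pass the resulting dictionary to the helper before handing the notebook to Papermill.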
Now, use the following code to execute the notebook with different arguments:
```python
import papermill as pm

names = ["abc.csv", "bcd.csv", "efg.csv"]

# Run the notebook once per file name
for name in names:
    pm.execute_notebook(
        'papermill-in.ipynb',           # input notebook
        f'out_pm_{name}.ipynb',         # output notebook
        parameters=dict(name=name),     # parameters to inject
    )
```
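Note that the output name is built from the full file name, so the outputs are called out_pm_abc.csv.ipynb and so on. If you would rather drop the .csv part, one possible approach is sketched below. The helper output_notebook_name and its prefix argument are hypothetical names chosen for this example, not part of Papermill.

```python
from pathlib import Path


def output_notebook_name(name, prefix="out_pm_"):
    """Build an output notebook filename from a data file name (hypothetical helper)."""
    # Path(...).stem drops the final extension: "abc.csv" -> "abc"
    return f"{prefix}{Path(name).stem}.ipynb"


for name in ["abc.csv", "bcd.csv", "efg.csv"]:
    print(name, "->", output_notebook_name(name))
```

One caveat of this choice: two inputs that differ only in their extension would map to the same output notebook, so keeping the full name is the safer default.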
The above code executes the notebook once for each name, injecting the parameters each time. You can see the injected parameters in the output notebooks, for example in out_pm_bcd.csv.ipynb:
[Screenshot: injected parameters in the output notebook]
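To make the idea of injection concrete, here is a simplified sketch of what happens conceptually: a new cell with the overriding values is placed right after the cell tagged parameters, so the defaults are reassigned before the rest of the notebook runs. This is an illustration on a plain dictionary, not Papermill's actual implementation, and the function inject_parameters is a hypothetical name.

```python
import copy


def inject_parameters(notebook, parameters):
    """Return a copy of the notebook with an extra cell that overrides the defaults.

    Simplified illustration only; not Papermill's actual implementation.
    """
    nb = copy.deepcopy(notebook)
    # Find the first cell tagged "parameters"; -1 means no such cell exists,
    # in which case this sketch inserts the new cell at the top.
    index = next(
        (
            i
            for i, cell in enumerate(nb["cells"])
            if "parameters" in cell.get("metadata", {}).get("tags", [])
        ),
        -1,
    )
    # Render each parameter as a Python assignment, e.g. name = 'bcd.csv'
    source = "# Parameters\n" + "\n".join(
        f"{key} = {value!r}" for key, value in parameters.items()
    )
    new_cell = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": source,
    }
    nb["cells"].insert(index + 1, new_cell)
    return nb


# Hypothetical notebook with a default value in the tagged cell.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": 'name = "abc.csv"',
        },
        {"cell_type": "code", "metadata": {}, "source": "print(name)"},
    ]
}

result = inject_parameters(notebook, {"name": "bcd.csv"})
for cell in result["cells"]:
    print(cell["source"])
    print("---")
```

Because the injected cell comes after the tagged one, the value in the tagged cell acts like a default argument, and the injected value acts like the argument passed in a function call.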