Comparing and Merging Jupyter Notebooks: A Guide to Using nbdime
Introduction
Jupyter Notebooks are now a ubiquitous tool among data scientists, researchers, and developers alike, providing an incredible interactive computing platform, data visualization tool, and collaborative environment. Yet, version control for .ipynb files can prove especially difficult. Unlike text files, Jupyter Notebooks exist in a JSON-based format, containing not just code and markdown but also metadata, execution output, and cell structure. This complexity also makes it challenging for conventional version control tools, such as git diff, to track and present changes effectively.
When several collaborators edit the same notebook, discrepancies frequently arise from concurrent edits, differing execution outputs, or changes to the internal structure. Resolving them manually is frustrating, time-consuming, and error-prone. This is where nbdime (Notebook Diff and Merge) comes in: a specialized tool that addresses these issues through diffing and merging designed specifically for Jupyter Notebooks. In this blog, we’ll dive into how nbdime simplifies the process of diffing (identifying changes) and merging (combining changes) in Jupyter Notebooks. We’ll explore:
How nbdime provides clear, structured comparisons of .ipynb files, highlighting differences in code, markdown, outputs, and metadata.
Strategies for effectively merging conflicting changes, even in complex scenarios.
A step-by-step guide to using nbdime, both via the command line and its user-friendly web interface.
By the end of this blog, you’ll have a solid understanding of how to leverage nbdime to streamline collaboration and version control for Jupyter Notebooks, ensuring smoother workflows and more efficient teamwork.
Why Use nbdime?
Traditional diffing and merging tools handle Jupyter notebooks as plain text, which makes it hard to interpret differences in code, outputs, and metadata. This usually leads to incomprehensible diffs, particularly for visual components such as plots and rich outputs. nbdime is built specifically for Jupyter notebooks, offering structured diffs that point out significant changes in code cells, outputs, and metadata. It also supports three-way merging, which helps resolve conflicts efficiently without compromising the notebook structure. Traditional diffing tools often produce output like the following:
Fig: Output generated by diffing two Jupyter notebooks using conventional diffing tools.
If we look closely at the image, we can observe that traditional diffing tools simply compare the text and report each differing line as it appears, without regard to the notebook structure or its outputs. nbdime, however, presents the differences interactively, with changes in code, outputs, and metadata shown in a structured manner. Additionally, it offers strong merging functionality, enabling users to fix conflicts without compromising the integrity of Jupyter notebooks. We will see in later sections how nbdime handles the merging of Jupyter files.
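To make the limitation of text-based diffing concrete, here is a minimal sketch using only Python's standard library. It assumes a simplified notebook structure (the `make_notebook` helper and its fields are illustrative, not taken from nbdime) and shows that re-running an unchanged cell still produces a diff at the JSON level.

```python
import difflib
import json


def make_notebook(source, execution_count):
    """Build a minimal nbformat-4 style notebook dict with one code cell."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [],
                "source": source,
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def text_diff(old_nb, new_nb):
    """Return a unified diff of the serialized JSON, as a plain text tool sees it."""
    old_lines = json.dumps(old_nb, indent=1, sort_keys=True).splitlines()
    new_lines = json.dumps(new_nb, indent=1, sort_keys=True).splitlines()
    return list(
        difflib.unified_diff(old_lines, new_lines, "old.ipynb", "new.ipynb", lineterm="")
    )


# Same code, re-run later: only the execution counter changed.
old = make_notebook("print(a + b)", execution_count=1)
new = make_notebook("print(a + b)", execution_count=7)

diff_lines = text_diff(old, new)
print("\n".join(diff_lines))
```

The code itself is identical in both versions, yet the text diff reports a change. This kind of noise is what a notebook-aware tool is meant to separate from real edits.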
Installation
To merge and diff Jupyter notebooks efficiently, we first need to install the nbdime package. nbdime is a Python package, and its installation is straightforward. Below is a step-by-step guide to get you started.
Step 1:- Ensure Python and pip are Installed
nbdime requires Python and pip (Python’s package installer). If you don’t already have them, download and install Python from python.org. pip is included by default with Python 3.4 and later.
Step 2:- To use nbdime, install it via pip:
pip install nbdime
This command installs nbdime and its dependencies. If you prefer to install it in a specific environment (e.g., using conda), activate your environment first.
Step 3:- Verify installation:
After installation, verify that nbdime is installed correctly by running
nbdime --version
This should display the installed version of nbdime, confirming that the installation was successful.
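The same check can be done from inside Python, which is handy in a notebook or a setup script. This is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is illustrative.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("nbdime")
if version is None:
    print("nbdime is not installed in this environment")
else:
    print(f"nbdime {version} is installed")
```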
Step 4:- Integrate nbdime with Git (Optional)
To integrate nbdime with Git for seamless version control, run the following command:
nbdime config-git --enable
This sets up nbdime as the default diff and merge tool for Jupyter Notebooks in your Git configuration.
Step 5:- Start Using nbdime
Now you can use nbdime to compare and merge notebooks directly from the command line or through its web interface. For example, to compare two notebooks, use:
python -m nbdime diff diffing_1.ipynb diffing_2.ipynb
Or, to launch the web-based diff viewer, run:
python -m nbdime diff-web diffing_1.ipynb diffing_2.ipynb
You can similarly merge files with the command:
python -m nbdime merge merging_1.ipynb merging_2.ipynb --out merged_output1.ipynb
Or, to launch the web-based merge viewer, run:
python -m nbdime merge-web base.ipynb merging_1.ipynb merging_2.ipynb
Key Features and the Command-Line Commands Explained
1. Notebook-Specific Diffing
- Intelligent Comparison: Unlike traditional diff tools that treat notebooks as plain JSON, nbdime understands the structure of Jupyter Notebooks. It can intelligently compare:
○ Code cells
○ Markdown cells
○ Outputs (e.g., plots, tables, and text)
○ Metadata (e.g., cell execution order, kernel information)
- Context-Awareness: It preserves the notebook’s structure and readability, making it easier to track changes.
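The idea of comparing notebooks cell by cell, rather than line by line through the raw JSON, can be sketched with Python's `difflib`. This is only an illustration of the concept under simplifying assumptions (cells reduced to their source strings, a hypothetical `compare_cells` helper); it is not how nbdime itself is implemented.

```python
import difflib


def compare_cells(old_sources, new_sources):
    """Classify runs of cells as equal, replaced, deleted, or inserted.

    Each argument is a list of cell source strings. Returns a list of
    (tag, old_cells, new_cells) tuples built from difflib's opcodes.
    """
    matcher = difflib.SequenceMatcher(a=old_sources, b=new_sources, autojunk=False)
    return [
        (tag, old_sources[i1:i2], new_sources[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


old_cells = ["import pandas as pd", "a = 10\nb = 11\nprint(a + b)", "plt.show()"]
new_cells = ["import pandas as pd", "a = 10\nb = 12\nprint(a + b)", "plt.show()"]

changes = compare_cells(old_cells, new_cells)
for tag, before, after in changes:
    print(tag, before, after)
```

Treating each cell as one unit keeps the report aligned with how a reader thinks about the notebook: the second cell changed, the others did not.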
2. Command-Line Interface (CLI)
nbdiff (nbdime diff): Compare two notebooks and display the differences in the terminal.
python -m nbdime diff diffing_1.ipynb diffing_2.ipynb
Let’s break the command down:
python -m nbdime
This part of the command tells Python to run the nbdime module as a script.
-m stands for “module” and allows you to run a Python module directly from the command line.
nbdime is the module that provides the diffing and merging functionality for Jupyter Notebooks.
diff
This is the subcommand provided by nbdime to compare two notebooks.
It tells nbdime to perform a diff operation (i.e., show the differences between the two files).
diffing_1.ipynb diffing_2.ipynb
These are the two Jupyter Notebook files you want to compare.
diffing_1.ipynb is the first notebook file.
diffing_2.ipynb is the second notebook file.
nbdime will compare these two files and display the differences.
To compare notebooks in a web-based diff view, we use the command:
python -m nbdime diff-web diffing_1.ipynb diffing_2.ipynb
This command breaks down the same way as the terminal-based one; the only difference is the subcommand:
diff-web
This is the subcommand provided by nbdime to launch a web-based diff viewer.
Unlike the diff command, which outputs differences in the terminal, diff-web opens a visual, interactive web interface for comparing notebooks.
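If you want to launch these commands from a Python script, the pieces described above map directly onto an argument list. The helper below is a hypothetical convenience wrapper (the name `nbdime_command` is not part of nbdime); it only assembles the command and leaves running it to `subprocess`.

```python
import sys


def nbdime_command(subcommand, *notebooks):
    """Assemble the argument list for `python -m nbdime <subcommand> <files...>`."""
    allowed = {"diff", "diff-web", "merge", "merge-web"}
    if subcommand not in allowed:
        raise ValueError(f"unsupported subcommand: {subcommand!r}")
    # sys.executable points at the interpreter running this script, so the
    # command uses the same environment in which nbdime was installed.
    return [sys.executable, "-m", "nbdime", subcommand, *map(str, notebooks)]


terminal_diff = nbdime_command("diff", "diffing_1.ipynb", "diffing_2.ipynb")
web_diff = nbdime_command("diff-web", "diffing_1.ipynb", "diffing_2.ipynb")
print(" ".join(terminal_diff[1:]))
print(" ".join(web_diff[1:]))

# To actually run one of them (requires nbdime and both notebook files):
# import subprocess
# subprocess.run(terminal_diff, check=True)
```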
nbmerge (nbdime merge): Merge changes from one notebook into another.
python -m nbdime merge merging_1.ipynb merging_2.ipynb --out merged_output1.ipynb
The python -m nbdime part means the same in all cases; what changes here is:
merge
This is the subcommand provided by nbdime to merge two notebooks.
It tells nbdime to perform a merge operation (i.e., combine changes from both notebooks into a single notebook).
merging_1.ipynb merging_2.ipynb
These are the two Jupyter Notebook files you want to merge.
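merging_1.ipynb is the first notebook file (here treated as the “base” notebook).
merging_2.ipynb is the second notebook file (containing the changes to be merged into the base notebook).
Note that nbdime’s documented merge signature is three-way (base, local, remote). If your installed version rejects the two-file form, pass the common ancestor explicitly as the first argument, as in the merge-web example below.
--out merged_output1.ipynb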
This is an optional flag that specifies the output file where the merged notebook will be saved.
merged_output1.ipynb is the name of the output file. If this flag is not provided, the merged notebook will be printed to the terminal.
To perform web-based merging:
python -m nbdime merge-web base.ipynb merging_1.ipynb merging_2.ipynb
merge-web
This is the subcommand provided by nbdime to launch a web-based merge viewer.
Unlike the merge command, which performs the merge directly in the terminal, merge-web opens a visual, interactive web interface for merging notebooks.
base.ipynb merging_1.ipynb merging_2.ipynb
These are the three Jupyter Notebook files involved in the merge operation:
○ base.ipynb: The common ancestor or base version of the notebook.
○ merging_1.ipynb: The first modified version of the notebook.
○ merging_2.ipynb: The second modified version of the notebook.
nbdime will compare the changes in merging_1.ipynb and merging_2.ipynb relative to base.ipynb and merge them.
The web based merging involves three files and implies the use of three way merging, where the base notebook is shown alongside the two modified versions, making it easy to understand changes and resolve conflicts.
The rest of the command works the same way as in terminal-based merging.
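The logic behind a three-way merge can be sketched in a few lines. The example below is a deliberately simplified illustration, not nbdime's algorithm: it assumes all three versions have the same number of cells and compares whole cell sources, and the function name `three_way_merge` is made up for this post.

```python
def three_way_merge(base, local, remote):
    """Merge three equal-length lists of cell sources, cell by cell.

    Returns (merged, conflicts): merged holds the chosen source for each
    cell, and conflicts lists the indices where both sides changed the
    same cell differently (the local version is kept as a placeholder).
    """
    if not (len(base) == len(local) == len(remote)):
        raise ValueError("this sketch only handles notebooks with the same number of cells")

    merged, conflicts = [], []
    for index, (base_src, local_src, remote_src) in enumerate(zip(base, local, remote)):
        if local_src == remote_src:      # both sides agree (or neither changed)
            merged.append(local_src)
        elif local_src == base_src:      # only remote changed this cell
            merged.append(remote_src)
        elif remote_src == base_src:     # only local changed this cell
            merged.append(local_src)
        else:                            # both changed it differently
            merged.append(local_src)
            conflicts.append(index)
    return merged, conflicts


base = ["a = 5", "print(y - x)", "plt.ylim(2, 3)"]
local = ["a = 5", "print(y - x)", "plt.ylim(0, 3)"]
remote = ["a = 50", "print(x + y)", "plt.ylim(2, 6)"]

merged, conflicts = three_way_merge(base, local, remote)
print(merged)
print("conflicting cells:", conflicts)
```

The base version is what lets the merge tell "only one side changed this cell" apart from "both sides changed it", which a two-way comparison cannot do.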
3. Web-Based Interface
Interactive Visualization: Launch a web-based diff viewer for a more intuitive and user-friendly experience.
Side-by-Side Comparison: View differences between notebooks in a clean, side-by-side layout.
Conflict Resolution: Easily resolve merge conflicts in a visual interface.
4. Git Integration
- Seamless Version Control: nbdime can be configured as the default diff and merge tool for Jupyter Notebooks in Git.
nbdime config-git --enable
- Git Diff and Merge: Automatically use nbdime for git diff and git merge operations on .ipynb files.
5. Output Comparison
Output-Aware Diffing: nbdime can compare notebook outputs (e.g., plots, tables, and text) in addition to code and markdown.
Output Filtering: Optionally ignore outputs during diffing to focus on code and markdown changes.
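To see why ignoring outputs is useful, here is a standard-library sketch that clears outputs and execution counters before comparing two notebook dicts. It does not call nbdime; the helpers `strip_outputs` and `code_cell` are illustrative. nbdime exposes its own switches for this (its help text lists options such as --ignore-outputs), so run `nbdime diff --help` to see what your version supports.

```python
import copy


def strip_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs and counters cleared."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def code_cell(source, count, text):
    """Hypothetical helper: one code cell with a single stream output."""
    return {
        "cell_type": "code",
        "execution_count": count,
        "metadata": {},
        "outputs": [{"output_type": "stream", "name": "stdout", "text": text}],
        "source": source,
    }


first_run = {"cells": [code_cell("print(a + b)", 1, "21\n")],
             "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
second_run = {"cells": [code_cell("print(a + b)", 5, "21\n")],
              "metadata": {}, "nbformat": 4, "nbformat_minor": 4}

print(first_run == second_run)                                # False
print(strip_outputs(first_run) == strip_outputs(second_run))  # True
```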
6. Customizable Diffing
Configurable Settings: Customize how nbdime handles diffs, such as ignoring metadata or specific cell types.
Filtering Options: Exclude certain cells or outputs from the diff process.
7. Cross-Platform Support
Works Everywhere: nbdime is compatible with Linux, macOS, and Windows.
Easy Installation: Install via pip (pip install nbdime) or conda (conda install -c conda-forge nbdime).
8. Lightweight and Fast
Efficient Performance: Designed to handle large notebooks efficiently, even with complex outputs.
Minimal Dependencies: Lightweight and easy to integrate into existing workflows.
9. Open Source and Actively Maintained
Community-Driven: nbdime is an open-source project with active development and community support.
Extensible: Developers can extend its functionality to suit specific needs.
10. Conflict Resolution
Merge Conflicts: nbdime provides tools to resolve conflicts during notebook merges, ensuring smooth collaboration.
Interactive Resolution: Use the web interface to resolve conflicts interactively.
Code Examples
Let’s work through some examples.
First, for diffing, create a Jupyter notebook named diffing_1.ipynb with the cells below:
import matplotlib.pyplot as plt
import pandas as pd
a=10
b=11
print(a+b)
# Output: 21
x=25
y=13
print(x+y)
# Output: 38
data=pd.DataFrame({
"X" : [1,2,3,4,5],
"Y" : [1,2,3,4,5]
})
plt.plot(data['X'],data['Y'],label="Equation of line: y=x")
plt.grid()
plt.legend()
plt.show()
Now create a second Jupyter notebook for diffing, named diffing_2.ipynb:
import matplotlib.pyplot as plt
import pandas as pd
a=10
b=11
print(a+b)
# Output: 21
x=25
y=13
print(x+y)
# Output: 38
data=pd.DataFrame({
"X" : [1,2,3,4,5],
"Y" : [5,4,3,2,1]
})
plt.plot(data['X'],data['Y'],label="Equation of line: y=-x")
plt.grid()
plt.legend()
plt.show()
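If you would rather generate the two notebooks from a script than type the cells by hand, the following sketch writes them as minimal nbformat 4 JSON using only the standard library. The helper names and the choice of nbformat_minor 4 are assumptions made for this example; the cells are written unexecuted, so they contain no outputs until you run them.

```python
import json
from pathlib import Path


def code_cell(source):
    """Return an unexecuted nbformat-4 code cell holding the given source."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


def write_notebook(path, sources):
    """Write a minimal notebook with one code cell per source string."""
    notebook = {
        "cells": [code_cell(src) for src in sources],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }
    Path(path).write_text(json.dumps(notebook, indent=1), encoding="utf-8")


# Template for the plotting cell; {y} and {label} differ between the notebooks.
PLOT_CELL = (
    'data = pd.DataFrame({{"X": [1, 2, 3, 4, 5], "Y": {y}}})\n'
    'plt.plot(data["X"], data["Y"], label="Equation of line: {label}")\n'
    "plt.grid()\nplt.legend()\nplt.show()"
)
SHARED = [
    "import matplotlib.pyplot as plt\nimport pandas as pd",
    "a = 10\nb = 11\nprint(a + b)",
    "x = 25\ny = 13\nprint(x + y)",
]

write_notebook("diffing_1.ipynb", SHARED + [PLOT_CELL.format(y=[1, 2, 3, 4, 5], label="y=x")])
write_notebook("diffing_2.ipynb", SHARED + [PLOT_CELL.format(y=[5, 4, 3, 2, 1], label="y=-x")])
```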
Now we can diff the files using:
python -m nbdime diff diffing_1.ipynb diffing_2.ipynb
Output:
Fig: Output resembling what we get after diffing in the terminal
Now let’s get the web-based diff output:
python -m nbdime diff-web diffing_1.ipynb diffing_2.ipynb
Output:
Fig: Web-based output for diffing of Jupyter notebooks
Here we can clearly see the result of diffing the two notebooks. nbdime gives us a more interactive and user-friendly view.
Now let’s try merging files.
First, create a file named merging_1.ipynb:
import pandas as pd
import matplotlib.pyplot as plt
a=5
b=6
print(a+b)
# Output: 11
x=10
y=11
print(y-x)
# Output: 1
data=pd.DataFrame({
"x": [1,9],
"y": [2,3]
})
plt.plot(data["x"],data["y"])
plt.grid()
plt.xlim(1,9)
plt.ylim(2,3)
plt.show()
Now create another file named merging_2.ipynb:
import pandas as pd
import matplotlib.pyplot as plt
a=5
b=6
print(a+b)
# Output: 11
x=10
y=11
print(x+y)
# Output: 21
data=pd.DataFrame({
"x": [1,9],
"y": [6,2]
})
plt.plot(data["x"],data["y"])
plt.grid()
plt.xlim(1,9)
plt.ylim(2,6)
plt.show()
Terminal-based merging writes the result directly to a new notebook file. Run the following command in the terminal:
python -m nbdime merge merging_1.ipynb merging_2.ipynb --out merged_output1.ipynb
Output:
Fig: Output of merging using nbdime in the terminal window
Here we can see that a file named merged_output1.ipynb has been created and the merged notebook is stored in it. The locations where conflicts occurred are clearly marked so that they can be resolved manually.
Now let’s try the web-based merge.
Since web-based merging is a three-way merge, we need a base file. Here we simply copy merging_1.ipynb, so that it acts as both the base file and one of the modified files.
This can be done by running the following command in the terminal (copy is the Windows command; on macOS or Linux, use cp instead):
copy merging_1.ipynb base.ipynb
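The same copy can be made in a platform-independent way from Python with `shutil.copyfile`. The sketch below wraps it in a small illustrative helper (`make_base_copy` is a made-up name) and demonstrates it in a scratch directory with a stand-in notebook file; in practice you would call it on your real merging_1.ipynb.

```python
import shutil
import tempfile
from pathlib import Path


def make_base_copy(modified, base):
    """Copy the notebook at `modified` to `base`, overwriting any existing file."""
    shutil.copyfile(modified, base)
    return Path(base)


# Demonstration in a scratch directory with a stand-in notebook file.
with tempfile.TemporaryDirectory() as scratch:
    original = Path(scratch) / "merging_1.ipynb"
    original.write_text(
        '{"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 4}',
        encoding="utf-8",
    )
    base_copy = make_base_copy(original, Path(scratch) / "base.ipynb")
    same_content = base_copy.read_text(encoding="utf-8") == original.read_text(encoding="utf-8")
    print(base_copy.name, same_content)
```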
Now let’s run the web-based merge command:
python -m nbdime merge-web base.ipynb merging_1.ipynb merging_2.ipynb
Output:
Fig: Web-based output for merging of Jupyter notebooks
nbdime presents the merge of the two Jupyter notebooks in an interface that makes it easier to compare and resolve the conflicts.
Use Cases
1. Merge Conflict Resolution in Team Workflows
If several team members collaborate on the same Jupyter notebook, manual merging of changes can be tricky because of JSON-based conflicts.
nbdime’s three-way merge intelligently merges edits without risking lost work.
It indicates code changes, markdown edits, and output changes, making it easier to resolve conflicts.
2. Readable and Clean Diffs for Code Review
In collaborative projects, comparing changes in Jupyter notebooks is challenging with traditional diff tools that show raw JSON.
nbdime offers a cell-by-cell structured diff, making it easy for teams to comprehend changes in code, text, and outputs.
This enhances peer reviews and accelerates the approval process.
3. Effective Collaboration with Git Integration
Teams working with Git version control can set nbdime as the default diff and merge tool for notebooks.
This makes it possible for developers, data scientists, and researchers to follow changes easily without having to work with unstructured JSON diffs.
This also helps with feature branch management, parallel development, and reproducibility.
4. Experimentation and Model Iterations Tracking
Various members in data science and machine learning teams might make adjustments to models, datasets, or parameters within the same notebook.
nbdime makes comparisons across various versions easy, such that no critical changes are missed.
This is especially helpful for monitoring progress across several rounds of analysis.
5. Avoiding Accidental Overwrites and Lost Work
In collaborative projects, two authors may edit the same notebook at the same time.
Without nbdime, combining changes could result in lost changes because of JSON conflicts.
nbdime’s structured merging preserves all contributions, avoiding unnecessary rework.
Conclusion
nbdime is a game-changer for managing Jupyter Notebooks, offering notebook-aware diffing and merging that traditional tools can’t match. Whether you’re collaborating, resolving conflicts, or automating workflows, nbdime helps keep version control and day-to-day work running smoothly. Its command-line tools and web-based interface make it easy to track changes and merge notebooks intelligently. For anyone working with Jupyter Notebooks, nbdime is an essential tool to enhance productivity and collaboration. For a better understanding of the library, visit the nbdime documentation.