3  Data Science Tools for Reporting

3.1 Documentation and Reporting Tools

As a data scientist, sharing your findings clearly is just as important as the analysis itself. Now that we have our analytics platforms set up, let’s explore tools for creating reports, documentation, and presentations.

3.1.1 Markdown: The Foundation of Documentation

Markdown is a lightweight markup language that’s easy to read and write. It forms the basis of many documentation systems.

Markdown’s simplicity and widespread support have made it the de facto standard for documentation in data science projects.

3.1.1.1 Basic Markdown Syntax

# Heading 1
## Heading 2
### Heading 3

**Bold text**
*Italic text*

[Link text](https://example.com)

![Alt text for an image](image.jpg)

- Bullet point 1
- Bullet point 2

1. Numbered item 1
2. Numbered item 2

Table:
| Column 1 | Column 2 |
|----------|----------|
| Cell 1   | Cell 2   |

> This is a blockquote

`Inline code`

```python
# Code block
print("Hello, world!")
```

Markdown is designed to be readable even in its raw form. The syntax is intuitive—for example, surrounding text with asterisks makes it italic, and using hash symbols creates headings of different levels.

Many platforms interpret Markdown, including GitHub, Jupyter notebooks, and the documentation tools we’ll discuss next.
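To see why this syntax is so easy for tools to support, here is a deliberately minimal (and incomplete) sketch of how a converter might map a few of the constructs above to HTML. The function name `mini_markdown` and the regex approach are illustrative only; real renderers such as GitHub's or the `markdown` Python package handle far more edge cases.

```python
import re

def mini_markdown(text: str) -> str:
    """Convert a tiny subset of Markdown to HTML (illustrative only)."""
    html = text
    # Headings: leading '#' characters map to <h1>..<h3>
    html = re.sub(r"^### (.+)$", r"<h3>\1</h3>", html, flags=re.M)
    html = re.sub(r"^## (.+)$", r"<h2>\1</h2>", html, flags=re.M)
    html = re.sub(r"^# (.+)$", r"<h1>\1</h1>", html, flags=re.M)
    # Bold before italic, so '**' is not consumed as two single '*'
    html = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", html)
    html = re.sub(r"\*(.+?)\*", r"<em>\1</em>", html)
    # Links: [text](url)
    html = re.sub(r"\[(.+?)\]\((.+?)\)", r'<a href="\2">\1</a>', html)
    # Inline code: `code`
    html = re.sub(r"`(.+?)`", r"<code>\1</code>", html)
    return html

print(mini_markdown("## Results\nSee **bold**, *italic*, and [docs](https://example.com)."))
```

Each rule is a direct syntax-to-tag mapping, which is exactly what keeps Markdown readable in its raw form.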

3.1.2 R Markdown

R Markdown combines R code, output, and narrative text in a single document that can be rendered to HTML, PDF, Word, and other formats.

The concept of “literate programming” behind R Markdown was first proposed by computer scientist Donald Knuth in 1984, and it has become a cornerstone of reproducible research in data science.

3.1.2.1 Installing and Using R Markdown

If you’ve installed R and RStudio as described earlier, R Markdown is just a package installation away:

install.packages("rmarkdown")

To create your first R Markdown document:

  1. In RStudio, go to File → New File → R Markdown
  2. Fill in the title and author information
  3. Choose an output format (HTML, PDF, or Word)
  4. Click “OK”

RStudio creates a template document with examples of text, code chunks, and plots. This template is extremely helpful because it shows you the basic structure of an R Markdown document right away—you don’t have to start from scratch.

A typical R Markdown document consists of three components:

  1. YAML Header: Contains metadata like title, author, and output format
  2. Text: Written in Markdown for narratives, explanations, and interpretations
  3. Code Chunks: R code that can be executed to perform analysis and create outputs

For example:

---
title: "My First Data Analysis"
author: "Your Name"
date: "2025-04-30"
output: html_document
---

# Introduction

This analysis explores the relationship between variables X (carat) and Y (price).

## Data Import and Cleaning

```{r setup, eval=FALSE}
# load the diamonds dataset from ggplot2
data(diamonds, package = "ggplot2")

# Create a smaller sample of the diamonds dataset
set.seed(123)  # For reproducibility
my_data <- diamonds |>
  dplyr::sample_n(1000) |>
  dplyr::select(
    X = carat,
    Y = price,
    cut = cut,
    color = color,
    clarity = clarity
  )

# Display the first few rows
head(my_data)
```

## Data Visualization

```{r visualization, eval=FALSE}
ggplot2::ggplot(my_data, ggplot2::aes(x = X, y = Y)) +
  ggplot2::geom_point() +
  ggplot2::geom_smooth(method = "lm") +
  ggplot2::labs(title = "Relationship between X and Y")
```
Note

We’ve used the namespace convention (package::function()) in the code above rather than loading each package with library(). This is a matter of preference, but the benefits include:

  • Avoids loading the full package with library()
  • Prevents naming conflicts (e.g. dplyr::filter() vs. stats::filter())
  • Keeps dependencies explicit and localised right next to each call

When you click the “Knit” button in RStudio, the R code in the chunks is executed, and the results (including plots and tables) are embedded in the output document. The reason this is so powerful is that it combines your code, results, and narrative explanation in a single, reproducible document. If your data changes, you simply re-knit the document to update all results automatically.

R Markdown has become a standard in reproducible research because it creates a direct connection between your data, analysis, and conclusions. This connection makes your work more transparent and reliable, as anyone can follow your exact steps and see how you reached your conclusions.

3.1.3 Jupyter Notebooks for Documentation

We’ve already covered Jupyter notebooks for Python development, but they’re also excellent documentation tools. Like R Markdown, they combine code, output, and narrative text.

3.1.3.1 Exporting Jupyter Notebooks

Jupyter notebooks can be exported to various formats:

  1. In a notebook, go to File → Download as
  2. Choose from options like HTML, PDF, Markdown, etc.

Alternatively, you can use nbconvert from the command line:

jupyter nbconvert --to html my_notebook.ipynb

The ability to export notebooks is particularly valuable because it allows you to write your analysis once and then distribute it in whatever format your audience needs. For example, you might use the PDF format for a formal report to stakeholders, HTML for sharing on a website, or Markdown for including in a GitHub repository.
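Part of what makes this flexibility possible is that an .ipynb file is just JSON, so tools like nbconvert can re-target it easily. A small standard-library sketch, with a hand-built dictionary standing in for a real notebook file:

```python
import json

# A minimal stand-in for the JSON structure of a real .ipynb file
notebook_json = json.dumps({
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "source": ["# Analysis\n"]},
        {"cell_type": "code", "source": ["print('hello')\n"], "outputs": []},
    ],
})

def extract_code(nb_text: str) -> list:
    """Pull the source of every code cell out of a notebook's JSON."""
    nb = json.loads(nb_text)
    return ["".join(c["source"]) for c in nb["cells"] if c["cell_type"] == "code"]

print(extract_code(notebook_json))
```

Exporters like nbconvert walk this same cell list, rendering markdown cells as text and code cells (plus their stored outputs) into the target format.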

3.1.3.2 Jupyter Book

For larger documentation projects, Jupyter Book builds on the notebook format to create complete books:

# Install Jupyter Book
pip install jupyter-book

# Create a new book project
jupyter-book create my-book

# Build the book
jupyter-book build my-book/

Jupyter Book organises multiple notebooks and markdown files into a cohesive book with navigation, search, and cross-references. This is especially useful for comprehensive documentation, tutorials, or course materials. The resulting books have a professional appearance with a table of contents, navigation panel, and consistent styling throughout.

3.1.4 Quarto: The Next Generation of Literate Programming

Quarto is a newer system that works with both Python and R, unifying the best aspects of R Markdown and Jupyter notebooks.

# Install Quarto CLI from https://quarto.org/docs/get-started/

# Create a new Quarto project (website, book, manuscript, etc.)
quarto create project default my-project

# Or, for a standalone one-off document, simply create a file
# named document.qmd in your editor (no command needed).

# Render a document to HTML (or PDF, docx, etc.)
quarto render document.qmd

Quarto represents an evolution in documentation tools because it provides a unified system for creating computational documents with multiple programming languages. This is particularly valuable if you work with both Python and R, as you can maintain a consistent documentation approach across all your projects.

The key advantage of Quarto is its language-agnostic design—you can mix Python, R, Julia, and other languages in a single document, which reflects the reality of many data science workflows where different tools are used for different tasks.

3.1.4.1 Quarto Dashboards

Since Quarto 1.4 (released in 2024), Quarto can render a document directly into a dashboard layout with rows, columns, value boxes, and tab sets, with no Shiny, Dash, or Streamlit required for the static case. If all you need is a periodic refresh of a dashboard view over your data, this is by far the lightest way to get there: you write a normal .qmd file with a format: dashboard YAML key, and Quarto handles the layout.

---
title: "Sales Overview"
format: dashboard
---

```{r}
#| title: "Revenue by Quarter"
ggplot(sales, aes(quarter, revenue)) + geom_col()
```

For interactive behaviour (filters, user inputs), Quarto dashboards can embed Shiny, Observable JS, or even Python/R running in the browser via webR/Pyodide. If your dashboards are read-only and refreshed on a schedule, this alone replaces much of what used to require a Shiny or Dash server.

3.1.4.2 Parameterised Reports

One of the highest-value Quarto (and R Markdown) features for a business analytics audience is parameterised reports: a single template that can be rendered with different inputs to produce many tailored outputs. For example, a monthly sales report that takes region and month parameters can be rendered once for every region without copy-pasting the document.

---
title: "Sales Report"
format: html
params:
  region: "South Africa"
  month: "2026-03"
---

Inside the document you refer to params$region (R, via the knitr engine); with the Jupyter engine, parameters are injected as ordinary Python variables (papermill-style), so you use region directly. Render with:

quarto render sales.qmd -P region:"EU" -P month:"2026-03"

This single pattern replaces a surprising amount of the ad-hoc “one notebook per client” sprawl that plagues data science teams.
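A short driver script makes the batch-rendering step concrete. This sketch only assembles the commands (the region list is invented, and sales.qmd is the template from above); assuming the Quarto CLI is on your PATH, you would swap the print for subprocess.run to execute them:

```python
import shlex

regions = ["EU", "US", "ZA"]  # hypothetical region list
month = "2026-03"

def render_command(region: str, month: str) -> list:
    """Build the argv for one parameterised render of sales.qmd."""
    return [
        "quarto", "render", "sales.qmd",
        "-P", "region:" + region,
        "-P", "month:" + month,
        "--output", "sales-" + region + "-" + month + ".html",
    ]

for r in regions:
    print(shlex.join(render_command(r, month)))
```

One template plus one small loop produces every regional report, which is precisely what eliminates the copy-paste sprawl.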

3.1.5 LaTeX for Professional Document Creation

When creating data science reports that require a professional appearance, particularly for academic or formal business contexts, LaTeX provides powerful typesetting capabilities. While Markdown is excellent for simple documents, LaTeX excels at complex formatting, mathematical equations, and producing publication-quality PDFs.

3.1.5.1 Why LaTeX for Data Scientists?

LaTeX offers several advantages for data science documentation:

  1. Professional typesetting: Produces publication-quality documents with consistent formatting
  2. Exceptional math support: Renders complex equations with beautiful typography
  3. Advanced layout control: Provides precise control over document structure and appearance
  4. Bibliography management: Integrates with citation systems like BibTeX
  5. Reproducibility: Separates content from presentation in a plain text format that works with version control

LaTeX documents, particularly those with programmatically generated figures, tend to be more reproducible than those created with proprietary document formats.

3.1.5.2 Getting Started with LaTeX

LaTeX works differently from word processors—you write plain text with special commands, then compile it to produce a PDF. For data science, you don’t need to install a full LaTeX distribution, as Quarto and R Markdown can handle the compilation process.

3.1.5.3 Installing LaTeX for Quarto and R Markdown

The easiest way to install LaTeX for use with Quarto or R Markdown is to use TinyTeX, a lightweight LaTeX distribution:

In R:

install.packages("tinytex")
tinytex::install_tinytex()

In the command line with Quarto:

quarto install tinytex

TinyTeX is designed specifically for R Markdown and Quarto users. It installs only the essential LaTeX packages (around 150MB) compared to full distributions (several GB), and it automatically installs additional packages as needed when you render documents.

3.1.5.4 LaTeX Basics for Data Scientists

Let’s explore the essential LaTeX elements you’ll need for data science documentation:

3.1.5.5 Document Structure

A basic LaTeX document structure looks like this:

\documentclass{article}
\usepackage{graphicx}  % For images
\usepackage{amsmath}   % For advanced math
\usepackage{booktabs}  % For professional tables

\title{Analysis of Customer Purchasing Patterns}
\author{Your Name}
\date{\today}

\begin{document}

\maketitle
\tableofcontents

\section{Introduction}
This report analyses...

\section{Methodology}
\subsection{Data Collection}
We collected data from...

\section{Results}
The results show...

\section{Conclusion}
In conclusion...

\end{document}

When using Quarto or R Markdown, you won’t write this structure directly. Instead, it’s generated based on your YAML header and document content.

3.1.5.6 Mathematical Equations

LaTeX shines when it comes to mathematical notation. Here are examples of common equation formats:

Inline equations use single dollar signs:

The model accuracy is $\alpha = 0.95$, which exceeds our threshold.

Display equations use double dollar signs:

$$
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
$$

Equation arrays for multi-line equations:

\begin{align}
Y &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon \\
&= \beta_0 + \sum_{i=1}^{2} \beta_i X_i + \epsilon
\end{align}

Some common math symbols in data science:

| Description | LaTeX Code | Result |
|-------------|------------|--------|
| Summation | \sum_{i=1}^{n} | \(\sum_{i=1}^{n}\) |
| Product | \prod_{i=1}^{n} | \(\prod_{i=1}^{n}\) |
| Fraction | \frac{a}{b} | \(\frac{a}{b}\) |
| Square root | \sqrt{x} | \(\sqrt{x}\) |
| Bar (mean) | \bar{X} | \(\bar{X}\) |
| Hat (estimate) | \hat{\beta} | \(\hat{\beta}\) |
| Greek letters | \alpha, \beta, \gamma | \(\alpha, \beta, \gamma\) |
| Infinity | \infty | \(\infty\) |
| Approximately equal | \approx | \(\approx\) |
| Distribution | X \sim N(\mu, \sigma^2) | \(X \sim N(\mu, \sigma^2)\) |

3.1.5.7 Tables

LaTeX can create publication-quality tables. The booktabs package is recommended for professional-looking tables with proper spacing:

\begin{table}[htbp]
\centering
\caption{Model Performance Comparison}
\begin{tabular}{lrrr}
\toprule
Model & Accuracy & Precision & Recall \\
\midrule
Random Forest & 0.92 & 0.89 & 0.94 \\
XGBoost & 0.95 & 0.92 & 0.91 \\
Neural Network & 0.90 & 0.87 & 0.92 \\
\bottomrule
\end{tabular}
\end{table}

3.1.5.8 Figures

To include figures with proper captioning and referencing:

\begin{figure}[htbp]
\centering
\includegraphics[width=0.8\textwidth]{histogram.png}
\caption{Distribution of customer spending by category}
\label{fig:spending-dist}
\end{figure}

As shown in Figure \ref{fig:spending-dist}, the distribution is right-skewed.

3.1.5.9 Using LaTeX with Quarto

Quarto makes it easy to incorporate LaTeX features while keeping your document source readable. Here’s how to configure Quarto for PDF output using LaTeX:

3.1.5.9.1 YAML Configuration

In your Quarto YAML header, specify PDF output with LaTeX options:

---
title: "Analysis Report"
author: "Your Name"
format:
  pdf:
    documentclass: article
    geometry:
      - margin=1in
    fontfamily: libertinus
    colorlinks: true
    number-sections: true
    fig-width: 7
    fig-height: 5
    cite-method: biblatex
    biblio-style: apa
---
3.1.5.9.2 Customising PDF Output

You can further customise the LaTeX template by:

  1. Including raw LaTeX: Use the raw attribute to include LaTeX commands

    ```{=latex}
    \begin{center}
    \large\textbf{Confidential Report}
    \end{center}
    ```
  2. Adding LaTeX packages: Include additional packages in the YAML

    format:
      pdf:
        include-in-header: 
          text: |
            \usepackage{siunitx}
            \usepackage{algorithm2e}
  3. Using a custom template: Create your own template for full control

    format:
      pdf:
        template: custom-template.tex
3.1.5.9.3 Equations in Quarto

Quarto supports LaTeX math syntax directly:

The linear regression model can be represented as:

$$
y_i = \beta_0 + \beta_1 x_i + \epsilon_i
$$

where $\epsilon_i \sim N(0, \sigma^2)$.
3.1.5.9.4 Citations and Bibliography

For managing citations, create a BibTeX file (e.g., references.bib):

@article{knuth84,
  author = {Knuth, Donald E.},
  title = {Literate Programming},
  year = {1984},
  journal = {The Computer Journal},
  volume = {27},
  number = {2},
  pages = {97--111}
}

Then cite in your Quarto document:

Literate programming [@knuth84] combines documentation and code.

And configure in YAML:

bibliography: references.bib
csl: ieee.csl  # Citation style

3.1.6 Advanced LaTeX Features for Data Science

3.1.6.1 Algorithm Description

The algorithm2e package helps document computational methods:

\begin{algorithm}[H]
\SetAlgoLined
\KwData{Training data $X$, target values $y$}
\KwResult{Trained model $M$}
Split data into training and validation sets\;
Initialize model $M$ with random weights\;
\For{each epoch}{
    \For{each batch}{
        Compute predictions $\hat{y}$\;
        Calculate loss $L(y, \hat{y})$\;
        Update model weights using gradient descent\;
    }
    Evaluate on validation set\;
    \If{early stopping condition met}{
        break\;
    }
}
\caption{Training Neural Network with Early Stopping}
\end{algorithm}

3.1.6.2 Professional Tables with Statistical Significance

For reporting analysis results with significance levels:

\begin{table}[htbp]
\centering
\caption{Regression Results}
\begin{tabular}{lrrrr}
\toprule
Variable & Coefficient & Std. Error & t-statistic & p-value \\
\midrule
Intercept & 23.45 & 2.14 & 10.96 & $<0.001^{***}$ \\
Age & -0.32 & 0.05 & -6.4 & $<0.001^{***}$ \\
Income & 0.015 & 0.004 & 3.75 & $0.002^{**}$ \\
Education & 1.86 & 0.72 & 2.58 & $0.018^{*}$ \\
\bottomrule
\multicolumn{5}{l}{\scriptsize{$^{*}p<0.05$; $^{**}p<0.01$; $^{***}p<0.001$}} \\
\end{tabular}
\end{table}

3.1.6.3 Multi-part Figures

For comparing visualisations side by side:

\begin{figure}[htbp]
\centering
\begin{subfigure}{0.48\textwidth}
    \includegraphics[width=\textwidth]{model1_results.png}
    \caption{Linear Model Performance}
    \label{fig:model1}
\end{subfigure}
\hfill
\begin{subfigure}{0.48\textwidth}
    \includegraphics[width=\textwidth]{model2_results.png}
    \caption{Neural Network Performance}
    \label{fig:model2}
\end{subfigure}
\caption{Performance comparison of predictive models}
\label{fig:models-comparison}
\end{figure}

3.1.7 LaTeX in R Markdown

If you’re using R Markdown instead of Quarto, the approach is similar:

---
title: "Statistical Analysis Report"
author: "Your Name"
output:
  pdf_document:
    toc: true
    number_sections: true
    fig_caption: true
    keep_tex: true  # Useful for debugging
    includes:
      in_header: preamble.tex
---

The preamble.tex file can contain additional LaTeX packages and configurations:

% preamble.tex
\usepackage{booktabs}
\usepackage{longtable}
\usepackage{array}
\usepackage{multirow}
\usepackage{wrapfig}
\usepackage{float}
\usepackage{colortbl}
\usepackage{pdflscape}
\usepackage{tabu}
\usepackage{threeparttable}
\usepackage{threeparttablex}
\usepackage[normalem]{ulem}
\usepackage{makecell}
\usepackage{xcolor}

3.1.8 Troubleshooting LaTeX Issues

LaTeX can sometimes produce cryptic error messages. Here are solutions to common issues:

3.1.8.1 Missing Packages

If you get an error about a missing package when rendering:

! LaTeX Error: File 'tikz.sty' not found.

With TinyTeX, you can install the TeX Live package that provides the missing file (tikz.sty ships in the pgf package):

tinytex::tlmgr_install("pgf")

TinyTeX also attempts to install missing packages automatically when you render through R Markdown or Quarto. If that isn’t working, enable verbose output to see what it’s doing:

options(tinytex.verbose = TRUE)

3.1.8.2 Figure Placement

If figures aren’t appearing where expected:

\begin{figure}[!htbp]  % The ! makes LaTeX try harder to respect placement

3.1.8.3 Large Tables Spanning Multiple Pages

For large tables that need to span pages:

\begin{longtable}{lrrr}
\caption{Comprehensive Model Results}\\
\toprule
Model & Accuracy & Precision & Recall \\
\midrule
\endhead
% Table contents...
\bottomrule
\end{longtable}

3.1.8.4 PDF Compilation Hangs

If compilation seems to hang, it might be waiting for user input due to an error. Try:

# Force LaTeX to continue past errors instead of prompting for input
pdflatex -interaction=nonstopmode document.tex

3.1.9 Conclusion

LaTeX has been the gold standard for scientific documentation for decades, and for good reason: it is the backbone of academic publishing, technical reports, and mathematical typesetting. Both Quarto and R Markdown render PDFs through a LaTeX engine by default, so when you generate a PDF from either tool you’re ultimately leveraging LaTeX’s sophisticated typesetting.

While LaTeX provides unmatched power and precision for creating professional data science documents, especially when mathematical notation is involved, there is undeniably a learning curve. The integration with Quarto and R Markdown has made LaTeX more accessible by handling much of the complexity behind the scenes, allowing you to focus on content rather than typesetting commands.

3.1.9.1 The Rise of Modern Alternatives: Typst

However, the document preparation landscape is evolving. Newer tools like Typst are emerging as modern alternatives that aim to simplify the traditional LaTeX workflow while maintaining high-quality output. Typst offers several advantages:

Simpler Syntax: Where LaTeX might require complex commands, Typst uses more intuitive markup:

// Typst syntax
= Introduction
== Subsection

$x = (a + b) / c$  // Math notation

#figure(
  image("plot.png"),
  caption: "Sample Plot"
)

Compare this to equivalent LaTeX:

% LaTeX syntax
\section{Introduction}
\subsection{Subsection}

$x = \frac{a + b}{c}$

\begin{figure}
  \includegraphics{plot.png}
  \caption{Sample Plot}
\end{figure}

Faster Compilation: Typst compiles documents significantly faster than LaTeX, making it more suitable for iterative document development.

Better Error Messages: When something goes wrong, Typst provides clearer, more actionable error messages compared to LaTeX’s often cryptic feedback.

Modern Design: Built from the ground up with modern document needs in mind, including better handling of digital-first workflows.

3.1.9.2 Choosing Your Path Forward

For data scientists starting their journey, here’s how to think about these tools:

Choose LaTeX when:

  • Working in academic environments where LaTeX is expected
  • Creating documents with complex mathematical notation
  • Collaborating with teams already using LaTeX workflows
  • You need the ecosystem of specialised packages LaTeX offers

Consider Typst when:

  • You want faster iteration cycles during document development
  • You prefer more modern, readable syntax
  • You’re starting fresh and don’t have legacy LaTeX requirements
  • You want to avoid LaTeX’s steep learning curve

The Quarto Advantage: One of Quarto’s strengths is that it abstracts away many of these decisions. You can often switch between PDF engines (including Typst, supported natively since Quarto 1.4 via format: typst) without changing your content, giving you flexibility as the ecosystem evolves.

3.1.9.3 Looking Ahead

As you progress in your data science career, investing time in understanding document preparation will pay dividends when creating reports, papers, or presentations that require precise typesetting and mathematical expressions. Whether you choose the established power of LaTeX or explore newer alternatives like Typst, start with the basics and gradually incorporate more advanced features as your needs grow.

The key is to pick the tool that best fits your current workflow and requirements, knowing that the fundamental principles of good document structure and clear communication remain constant regardless of the underlying technology.

3.1.10 Creating Technical Documentation

For more complex projects, specialised documentation tools may be needed:

3.1.10.1 MkDocs: Simple Documentation with Markdown

MkDocs creates a documentation website from Markdown files:

# Install MkDocs
pip install mkdocs

# Create a new project
mkdocs new my-documentation

# Serve the documentation locally
cd my-documentation
mkdocs serve

MkDocs is focused on simplicity and readability. It generates a clean, responsive website from your Markdown files, with navigation, search, and themes. This makes it an excellent choice for project documentation that needs to be accessible to users or team members.

3.1.10.2 Sphinx: Comprehensive Documentation

Sphinx is a more powerful documentation tool widely used in the Python ecosystem:

# Install Sphinx
pip install sphinx

# Create a new documentation project
sphinx-quickstart docs

# Build the documentation
cd docs
make html

Sphinx offers advanced features like automatic API documentation generation, cross-referencing, and multiple output formats. It’s the system behind the official documentation for Python itself and many major libraries like NumPy, pandas, and scikit-learn.

The reason Sphinx has become the standard for Python documentation is its powerful extension system and its ability to generate API documentation automatically from docstrings in your code. This means you can document your functions and classes directly in your code, and Sphinx will extract and format that information into comprehensive documentation.
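The docstring-extraction idea is easy to see with the standard library alone: inspect.getdoc pulls the same text that Sphinx’s autodoc extension renders into API pages. The function below is a made-up example written in the NumPy docstring style that Sphinx (with the napoleon extension) understands:

```python
import inspect

def clean_prices(prices, max_price=1e6):
    """Drop non-positive and implausibly large prices.

    Parameters
    ----------
    prices : list of float
        Raw price observations.
    max_price : float, optional
        Upper bound above which a price is treated as a data error.

    Returns
    -------
    list of float
        The prices that passed both checks.
    """
    return [p for p in prices if 0 < p <= max_price]

# autodoc reads exactly this text when building the API pages
print(inspect.getdoc(clean_prices).splitlines()[0])
```

Because the documentation lives next to the implementation, it is far more likely to stay accurate as the code evolves.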

3.2 Reproducible Reports - Working with Data

When using external data files in Quarto projects, it’s important to understand how to handle file paths properly to ensure reproducibility across different environments.

3.2.1 Common Issues with File Paths

The error 'my_data.csv' does not exist in current working directory is a common issue when transitioning between different editing environments like VS Code and RStudio. This happens because:

  1. Different IDEs may have different default working directories
  2. Quarto’s rendering process often sets the working directory to the chapter’s location
  3. Absolute file paths won’t work when others try to run your code

3.2.2 Project-Relative Paths with the here Package

The here package provides an elegant solution by creating paths relative to your project root:

library(tidyverse)
library(here)

# Load data using project-relative path
data <- read_csv(here("data", "my_data.csv"))
head(data)

The here() function automatically detects your project root (usually where your .Rproj file is located) and constructs paths relative to that location. This ensures consistent file access regardless of:

  • Which IDE you’re using
  • Where the current chapter file is located
  • The current working directory during rendering

To implement this approach:

  1. Create a data folder in your project root
  2. Store all your datasets in this folder
  3. Use here("data", "filename.csv") to reference them
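Python has no built-in equivalent of here (the third-party pyprojroot package fills that role), but the same idea can be sketched with pathlib: walk upwards from a starting directory until a marker such as .git or _quarto.yml is found, then build paths from that root. The marker names below are common conventions, not an exhaustive list (the real here package also recognises .Rproj files):

```python
from pathlib import Path

def find_project_root(start, markers=(".git", ".here", "_quarto.yml")):
    """Walk up from `start` until a directory containing a marker is found."""
    start = Path(start)
    for candidate in [start, *start.parents]:
        if any((candidate / m).exists() for m in markers):
            return candidate
    raise FileNotFoundError("No project root marker found above " + str(start))

# Usage sketch (assumes the script lives somewhere inside the project):
# root = find_project_root(Path(__file__).resolve().parent)
# data_path = root / "data" / "my_data.csv"
```

As with here(), paths built this way work the same regardless of which directory the rendering process happens to start in.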

3.2.3 Alternative: Built-in Datasets

For maximum reproducibility, consider using built-in datasets that come with R packages:

# Load a dataset from a package
data(diamonds, package = "ggplot2")

# Display the first few rows
head(diamonds)

Using built-in datasets eliminates file path issues entirely, as these datasets are available to anyone who has the package installed. This is ideal for examples and tutorials where the specific data isn’t crucial.

3.2.4 Creating Sample Data Programmatically

Another reproducible approach is to generate sample data within your code:

# Create synthetic data
set.seed(0491)  # For reproducibility
synthetic_data <- tibble(
  id = 1:20,
  value_x = rnorm(20),
  value_y = value_x * 2 + rnorm(20, sd = 0.5),
  category = sample(LETTERS[1:4], 20, replace = TRUE)
)

# Display the data
head(synthetic_data)

This approach works well for illustrative examples and ensures anyone can run your code without any external files.
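The same pattern works in Python with nothing beyond the standard library; a seeded random.Random generator plays the role of set.seed, and the structure mirrors the R example above:

```python
import random

def make_synthetic(n=20, seed=123):
    """Generate a reproducible toy dataset: value_y is roughly 2 * value_x plus noise."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    rows = []
    for i in range(1, n + 1):
        x = rng.gauss(0, 1)
        rows.append({
            "id": i,
            "value_x": x,
            "value_y": x * 2 + rng.gauss(0, 0.5),
            "category": rng.choice("ABCD"),
        })
    return rows

data = make_synthetic()
print(len(data), data[0]["id"])
```

Because the seed is fixed, every run (on any machine) produces exactly the same dataset, which is the whole point for reproducible examples.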

3.2.5 Remote Data with Caching

For real-world datasets that are too large to include in packages, you can fetch them from reliable URLs:

# URL to a stable dataset (ggplot2's default branch is 'main', not 'master')
url <- "https://raw.githubusercontent.com/tidyverse/ggplot2/main/data-raw/diamonds.csv"

# Download and read the data
remote_data <- readr::read_csv(url)

# Display the data
head(remote_data)

Adding cache: true to the chunk options (written #| cache: true at the top of the chunk in Quarto) tells Quarto to save the results and only re-execute the chunk when its code changes, which prevents unnecessary downloads on every render.
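Outside Quarto, the same download-once-reuse-locally idea can be sketched in a few lines of Python. The fetcher argument is a hypothetical hook so the network call can be swapped out or tested without touching the internet:

```python
from pathlib import Path
from urllib.request import urlopen

def fetch_cached(url, cache_path, fetcher=None):
    """Return cached bytes if present; otherwise download and cache them."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        return cache_path.read_bytes()
    if fetcher is None:
        fetcher = lambda u: urlopen(u).read()  # real network call
    data = fetcher(url)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_bytes(data)
    return data

# Usage sketch:
# csv_bytes = fetch_cached(url, Path("cache") / "diamonds.csv")
```

The first call pays the download cost; every subsequent call reads the local copy, which is essentially what Quarto’s chunk cache does for you automatically.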

3.2.6 Best Practices for Documentation

Effective documentation follows certain principles:

  1. Start early: Document as you go rather than treating it as an afterthought
  2. Be consistent: Use the same style and terminology throughout
  3. Include examples: Show how to use your code or analysis
  4. Consider your audience: Technical details for peers, higher-level explanations for stakeholders
  5. Update regularly: Keep documentation in sync with your code

Projects with comprehensive documentation tend to have fewer defects and require less maintenance effort. Well-documented data science projects are also more likely to be reproducible and reusable by others.

The practice of documenting your work isn’t just about helping others understand what you’ve done—it also helps you think more clearly about your own process. By explaining your choices and methods in writing, you often gain new insights and identify potential improvements in your approach.

3.3 Conclusion

This chapter has walked through the document side of data science output: Markdown and Quarto for literate programming, LaTeX (and newer alternatives like Typst) for publication-quality typesetting, parameterised reports for templating, and reproducible data-loading patterns so your reports can be rendered from any machine.

What we haven’t covered yet are the charts that live inside these documents and the interactive applications that extend beyond them:

  • The Data Visualisation chapter covers the static and lightly-interactive plotting libraries you’ll embed in reports (matplotlib, seaborn, plotly, ggplot2), plus Mermaid for code-based diagrams.
  • The Web Development for Data Scientists chapter covers the tools for when “rerun the render” isn’t enough and stakeholders need to poke at the data themselves: Shiny, Dash, Streamlit, and Flask.

Together these three chapters form a progression: from static documents that communicate findings, to visualisations that make those findings immediate, to interactive applications that invite exploration.