6  Web Development for Data Scientists

6.1 Web Development Fundamentals for Data Scientists

As a data scientist, you’ll often need to share your work through web applications, dashboards, or APIs. Understanding web development basics helps you create more effective and accessible data products while giving you more control over your projects. The deployment tools discussed earlier (such as Shiny or Quarto) are largely wrappers for lower-level web technologies like HTML and CSS. These tools handle the heavy lifting for us, but what if we want our HTML Quarto report to have a custom theme? This becomes possible with a basic understanding of web development.

6.1.1 Why Web Development for Data Scientists?

Web development skills are increasingly important for data scientists because:

  1. Sharing Results: Web interfaces make your analysis accessible to non-technical stakeholders
  2. Interactive Visualisations: Web technologies enable rich, interactive data exploration
  3. Model Deployment: Web APIs allow your models to be integrated into larger systems
  4. Data Collection: Web applications can facilitate data gathering and annotation
  5. Professional Completeness: Deploying your own analysis closes the loop and lets you deliver a complete end-to-end solution

These skills become more valuable as your career advances and you take on responsibility for delivering complete solutions rather than isolated analyses.

6.1.2 HTML, CSS, and JavaScript Basics

These three technologies form the foundation of web development:

  • HTML: Structures the content of web pages
  • CSS: Controls the appearance and layout
  • JavaScript: Adds interactivity and dynamic behaviour

Let’s create a simple web page that displays a data visualisation:

  1. Create a file named index.html:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Data Visualization Example</title>
    <link rel="stylesheet" href="styles.css">
    <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
</head>
<body>
    <div class="container">
        <h1>Sales Data Analysis</h1>
        <div class="chart-container">
            <canvas id="salesChart"></canvas>
        </div>
        <div class="summary">
            <h2>Key Findings</h2>
            <ul>
                <li>Q4 had the highest sales, driven by holiday promotions</li>
                <li>Product A consistently outperformed other products</li>
                <li>Year-over-year growth was 15.3%</li>
            </ul>
        </div>
    </div>
    <script src="script.js"></script>
</body>
</html>
  2. Create a file named styles.css:
body {
    font-family: Arial, sans-serif;
    line-height: 1.6;
    color: #333;
    margin: 0;
    padding: 0;
    background-color: #f5f5f5;
}

.container {
    max-width: 1000px;
    margin: 0 auto;
    padding: 20px;
    background-color: white;
    box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);
}

h1 {
    color: #2c3e50;
    text-align: center;
    margin-bottom: 30px;
}

.chart-container {
    margin-bottom: 30px;
    height: 400px;
}

.summary {
    border-top: 1px solid #ddd;
    padding-top: 20px;
}

h2 {
    color: #2c3e50;
}

ul {
    padding-left: 20px;
}
  3. Create a file named script.js:
// Sample data
const salesData = {
    labels: ['Q1', 'Q2', 'Q3', 'Q4'],
    datasets: [
        {
            label: 'Product A',
            data: [12, 19, 15, 28],
            backgroundColor: 'rgba(54, 162, 235, 0.2)',
            borderColor: 'rgba(54, 162, 235, 1)',
            borderWidth: 1
        },
        {
            label: 'Product B',
            data: [10, 15, 12, 25],
            backgroundColor: 'rgba(255, 99, 132, 0.2)',
            borderColor: 'rgba(255, 99, 132, 1)',
            borderWidth: 1
        },
        {
            label: 'Product C',
            data: [8, 10, 14, 20],
            backgroundColor: 'rgba(75, 192, 192, 0.2)',
            borderColor: 'rgba(75, 192, 192, 1)',
            borderWidth: 1
        }
    ]
};

// Get the canvas element
const ctx = document.getElementById('salesChart').getContext('2d');

// Create the chart
const salesChart = new Chart(ctx, {
    type: 'bar',
    data: salesData,
    options: {
        responsive: true,
        maintainAspectRatio: false,
        scales: {
            y: {
                beginAtZero: true,
                title: {
                    display: true,
                    text: 'Sales (millions)'
                }
            }
        }
    }
});
  4. Open index.html in a web browser

This example demonstrates how to create a web page with a chart using Chart.js, a popular JavaScript visualisation library. The HTML provides structure, CSS handles styling, and JavaScript creates the interactive chart. I would stress that, as a data scientist, you do not need to be able to write the above web page from scratch. Rather, become familiar with the structure and the languages. That way, when a tool hands you raw HTML, CSS, or JavaScript output, you can find the parts that matter to you and change them effectively.
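One way to get familiar with the structure of a page, without reading every line, is to list its elements programmatically. The sketch below uses Python's standard-library html.parser to collect every tag that carries an id or a class. The OutlineParser class name and the shortened page string are illustrative assumptions, not part of the example above.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect (tag, id, class) for every element that has an id or class."""

    def __init__(self):
        super().__init__()
        self.hooks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "id" in attrs or "class" in attrs:
            self.hooks.append((tag, attrs.get("id"), attrs.get("class")))


# A shortened copy of the body of index.html (illustrative).
page = """
<div class="container">
    <h1>Sales Data Analysis</h1>
    <div class="chart-container">
        <canvas id="salesChart"></canvas>
    </div>
    <div class="summary"><h2>Key Findings</h2></div>
</div>
"""

parser = OutlineParser()
parser.feed(page)
for tag, element_id, css_class in parser.hooks:
    print(f"<{tag}> id={element_id} class={css_class}")
```

The ids and classes it lists are the hooks the other two files rely on: styles.css targets .container, .chart-container, and .summary, and script.js looks up the canvas by its salesChart id.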

6.1.3 Interactive Dashboards and Web Frameworks

Most data scientists don’t build web apps from raw HTML/CSS/JavaScript. They use a framework that wraps the web machinery and lets them stay in Python or R. There are four tools worth knowing, each optimised for a different situation:

  • Shiny (R, also available for Python): reactive dashboards, strongest when you live in R and need server-side reactivity
  • Dash (Python): Plotly’s callback-based framework, good when you want explicit wiring between inputs and outputs
  • Streamlit (Python): script-based, fastest path from analysis to a demo
  • Flask / FastAPI (Python): lower-level web frameworks, for when you’re building a custom API or a page with specific backend logic rather than a dashboard

If your problem is “give stakeholders a dashboard where they can filter this chart,” pick Shiny, Dash, or Streamlit. If it’s “expose this trained model as a JSON endpoint that another service can call,” Flask or FastAPI is the right level. We cover all four below.

The examples in this section reference placeholder files like sales_data.csv. These aren’t shipped with the book; substitute your own file, a built-in dataset (ggplot2::diamonds, sns.load_dataset("tips"), or similar), or a stable public CSV. The framework mechanics don’t depend on which data you use.
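If you just want something to run the examples against, a placeholder file can be generated in a few lines. The sketch below writes a sales_data.csv with the quarter and product_a columns that the Flask example later in this section reads. The figures reuse the sample values from script.js and are illustrative only, and the product_b and product_c column names are assumptions.

```python
import csv
from pathlib import Path

# Illustrative values only: the quarterly figures reuse the sample data
# from script.js; the product_b and product_c column names are assumptions.
rows = [
    {"quarter": "Q1", "product_a": 12, "product_b": 10, "product_c": 8},
    {"quarter": "Q2", "product_a": 19, "product_b": 15, "product_c": 10},
    {"quarter": "Q3", "product_a": 15, "product_b": 12, "product_c": 14},
    {"quarter": "Q4", "product_a": 28, "product_b": 25, "product_c": 20},
]

path = Path("sales_data.csv")
with path.open("w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)

# Read the file back to confirm its shape.
with path.open(newline="", encoding="utf-8") as f:
    loaded = list(csv.DictReader(f))

print(f"Wrote {len(loaded)} rows with columns {list(loaded[0])}")
```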

6.1.3.1 Shiny: Interactive Web Applications with R

Shiny allows you to build interactive web applications entirely in R, without requiring knowledge of HTML, CSS, or JavaScript.

# Install Shiny if needed
install.packages("shiny")

A simple Shiny app consists of two components:

  1. UI (User Interface): Defines what the user sees
  2. Server: Contains the logic that responds to user input

Here’s a basic example:

library(shiny)
library(ggplot2)
library(dplyr)

# Define UI
ui <- fluidPage(
  titlePanel("Diamond Explorer"),

  sidebarLayout(
    sidebarPanel(
      sliderInput("carat_range",
                  "Carat Range:",
                  min = 0.2,
                  max = 5.0,
                  value = c(0.5, 3.0)),

      selectInput("cut",
                  "Cut Quality:",
                  choices = c("All", unique(as.character(diamonds$cut))),
                  selected = "All")
    ),

    mainPanel(
      plotOutput("scatterplot"),
      tableOutput("summary_table")
    )
  )
)

# Define server logic
server <- function(input, output) {

  # Filter data based on inputs
  filtered_data <- reactive({
    data <- diamonds

    # Filter by carat
    data <- data %>%
      filter(carat >= input$carat_range[1] & carat <= input$carat_range[2])

    # Filter by cut if not "All"
    if (input$cut != "All") {
      data <- data %>% filter(cut == input$cut)
    }

    data
  })

  # Create scatter plot
  output$scatterplot <- renderPlot({
    ggplot(filtered_data(), aes(x = carat, y = price, color = cut)) +
      geom_point(alpha = 0.5) +
      theme_minimal() +
      labs(title = "Diamond Price vs. Carat",
           x = "Carat",
           y = "Price (USD)")
  })

  # Create summary table
  output$summary_table <- renderTable({
    filtered_data() %>%
      group_by(cut) %>%
      summarize(
        Count = n(),
        `Avg Price` = round(mean(price), 2),
        `Avg Carat` = round(mean(carat), 2)
      )
  })
}

# Run the application
shinyApp(ui = ui, server = server)

What makes Shiny powerful is its reactivity system, which automatically updates outputs when inputs change. You specify relationships between inputs and outputs, and the framework figures out what to recompute when something changes. It’s similar to how a spreadsheet automatically recalculates formulas when a cell’s value changes: you don’t write “when the dropdown changes, rerun the plot”, you say “the plot depends on the filtered data, which depends on the dropdown”, and Shiny does the rest.
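The spreadsheet analogy can be made concrete with a toy dependency graph. The sketch below is plain Python and is not how Shiny is implemented. The Input and Reactive classes are hypothetical names invented for illustration, and the carat values are made up. It only shows the idea that changing an input marks everything downstream as stale, and stale values are recomputed the next time they are read.

```python
class Input:
    """A user-controlled value, such as a slider or dropdown."""

    def __init__(self, value):
        self.value = value
        self.dependents = []

    def get(self):
        return self.value

    def set(self, value):
        self.value = value
        for dependent in self.dependents:
            dependent.invalidate()


class Reactive:
    """A value computed from other values, recomputed only when stale."""

    def __init__(self, compute, *sources):
        self.compute = compute
        self.sources = sources
        self.dependents = []
        self.stale = True
        self.cached = None
        self.runs = 0  # how many times compute has actually run
        for source in sources:
            source.dependents.append(self)

    def invalidate(self):
        self.stale = True
        for dependent in self.dependents:
            dependent.invalidate()

    def get(self):
        if self.stale:
            self.cached = self.compute(*(s.get() for s in self.sources))
            self.stale = False
            self.runs += 1
        return self.cached


carats = [0.3, 0.9, 1.5, 2.4]  # illustrative data
max_carat = Input(1.0)
filtered = Reactive(lambda limit: [c for c in carats if c <= limit], max_carat)
plot_title = Reactive(lambda data: f"{len(data)} diamonds shown", filtered)

print(plot_title.get())  # 2 diamonds shown
max_carat.set(2.0)       # nothing is recomputed yet, only marked stale
print(plot_title.get())  # 3 diamonds shown
```

Nowhere does the code say "when max_carat changes, rebuild the title". The title declares that it depends on the filtered data, and the update follows from that declaration.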

Shiny is also now available for Python via Shiny for Python, which offers the same reactive model with Python syntax. It is a useful option if you like the Shiny mental model but your team uses Python.

6.1.3.2 Dash: Interactive Web Applications with Python

Dash is Python’s equivalent to Shiny, created by the makers of Plotly:

# Install Dash and pandas (run in a terminal; Plotly is installed with Dash)
pip install dash pandas

A simple Dash app follows a similar structure to Shiny:

import dash
from dash import dcc, html, dash_table, Input, Output
import plotly.express as px

# Load data - using built-in dataset for reproducibility
df = px.data.iris()

# Initialize app
app = dash.Dash(__name__)

# Define layout
app.layout = html.Div([
    html.H1("Iris Dataset Explorer"),

    html.Div([
        html.Div([
            html.Label("Select Species:"),
            dcc.Dropdown(
                id='species-dropdown',
                options=[{'label': 'All', 'value': 'all'}] +
                        [{'label': i, 'value': i} for i in df['species'].unique()],
                value='all'
            ),

            html.Label("Select Y-axis:"),
            dcc.RadioItems(
                id='y-axis',
                options=[
                    {'label': 'Sepal Width', 'value': 'sepal_width'},
                    {'label': 'Petal Length', 'value': 'petal_length'},
                    {'label': 'Petal Width', 'value': 'petal_width'}
                ],
                value='sepal_width'
            )
        ], style={'width': '25%', 'padding': '20px'}),

        html.Div([
            dcc.Graph(id='scatter-plot')
        ], style={'width': '75%'})
    ], style={'display': 'flex'}),

    html.Div([
        html.H3("Data Summary"),
        dash_table.DataTable(
            id='summary-table',
            style_cell={'textAlign': 'left'},
            style_header={
                'backgroundColor': 'lightgrey',
                'fontWeight': 'bold'
            }
        )
    ])
])

# Define callbacks
@app.callback(
    [Output('scatter-plot', 'figure'),
     Output('summary-table', 'data'),
     Output('summary-table', 'columns')],
    [Input('species-dropdown', 'value'),
     Input('y-axis', 'value')]
)
def update_graph_and_table(selected_species, y_axis):
    # Filter data
    if selected_species == 'all':
        filtered_df = df
    else:
        filtered_df = df[df['species'] == selected_species]

    # Create figure
    fig = px.scatter(
        filtered_df,
        x='sepal_length',
        y=y_axis,
        color='species',
        title=f'Sepal Length vs {y_axis.replace("_", " ").title()}'
    )

    # Create summary table
    summary_df = filtered_df.groupby('species').agg({
        'sepal_length': ['mean', 'std'],
        'sepal_width': ['mean', 'std'],
        'petal_length': ['mean', 'std'],
        'petal_width': ['mean', 'std']
    }).reset_index()

    # Flatten the multi-index
    summary_df.columns = ['_'.join(col).strip('_') for col in summary_df.columns.values]

    # Format table
    table_data = summary_df.to_dict('records')
    columns = [{"name": col.replace('_', ' ').title(), "id": col} for col in summary_df.columns]

    return fig, table_data, columns

# Run app
if __name__ == '__main__':
    # debug=True must NEVER be used in production.
    app.run(debug=False)

Dash uses Plotly for visualisations and React.js for the UI, resulting in modern, responsive applications without requiring front-end experience.

Unlike Shiny’s reactive programming model, Dash uses an explicit callback-based approach. You define functions that take specific inputs and produce specific outputs, with the Dash framework wiring them together. This approach may feel more familiar to Python programmers used to callback-based frameworks.
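To see what "explicit wiring" means without any framework, here is a toy callback registry in plain Python. It is a sketch of the pattern only and does not use or reproduce Dash's internals. The callback decorator, the fire function, and the scatter-title and row-count ids are hypothetical names. The row counts are the 50 observations per species in the iris dataset.

```python
callbacks = {}  # maps an input id to the (output id, function) pairs wired to it


def callback(output_id, input_id):
    """Register a function that computes output_id from input_id."""
    def register(func):
        callbacks.setdefault(input_id, []).append((output_id, func))
        return func
    return register


@callback(output_id="scatter-title", input_id="species-dropdown")
def update_title(species):
    return "All species" if species == "all" else f"Species: {species}"


@callback(output_id="row-count", input_id="species-dropdown")
def update_count(species):
    counts = {"setosa": 50, "versicolor": 50, "virginica": 50}
    return sum(counts.values()) if species == "all" else counts[species]


def fire(input_id, value):
    """Simulate a user changing an input: run every callback wired to it."""
    return {
        output_id: func(value)
        for output_id, func in callbacks.get(input_id, [])
    }


print(fire("species-dropdown", "setosa"))
```

Every link between an input and an output is spelled out in a decorator, which makes the data flow easy to audit but means you maintain that wiring by hand as the app grows.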

6.1.3.3 Streamlit: Rapid Application Development

Streamlit simplifies interactive app creation even further with a minimal, straightforward API. Here’s a simple Streamlit app:

import streamlit as st
import pandas as pd
import plotly.express as px
import seaborn as sns

# Set page title
st.set_page_config(page_title="Data Explorer", page_icon="📊")

# Add a title
st.title("Interactive Data Explorer")

# Add sidebar with dataset options
st.sidebar.header("Settings")
dataset_name = st.sidebar.selectbox(
    "Select Dataset",
    options=["Iris", "Diamonds", "Gapminder"]
)

# Load data based on selection - using built-in datasets for reproducibility
@st.cache_data
def load_data(dataset):
    if dataset == "Iris":
        return sns.load_dataset("iris")
    elif dataset == "Diamonds":
        return sns.load_dataset("diamonds").sample(n=1000, random_state=42)
    else:  # Gapminder
        return px.data.gapminder()

df = load_data(dataset_name)

# Display basic dataset information
st.header(f"{dataset_name} Dataset")

tab1, tab2, tab3 = st.tabs(["📋 Data", "📈 Visualization", "📊 Summary"])

with tab1:
    st.subheader("Raw Data")
    st.dataframe(df.head(100))

with tab2:
    st.subheader("Data Visualization")
    numeric_cols = df.select_dtypes("number").columns.tolist()
    if numeric_cols:
        x_var = st.selectbox("X variable", options=numeric_cols)
        y_var = st.selectbox("Y variable", options=numeric_cols, index=min(1, len(numeric_cols) - 1))
        fig = px.scatter(df, x=x_var, y=y_var, title=f"{x_var} vs {y_var}")
        st.plotly_chart(fig, use_container_width=True)

with tab3:
    st.subheader("Statistical Summary")
    st.dataframe(df.describe())

Streamlit’s appeal lies in its simplicity. Instead of defining callbacks between inputs and outputs (as in Dash) or declaring reactive dependencies (as in Shiny), the entire script runs from top to bottom whenever any input changes. You write a straightforward Python script that builds the UI linearly, and Streamlit reruns it on every interaction. This procedural approach is very intuitive for beginners and allows for rapid prototyping, though it can become less efficient for complex applications where you don’t want to recompute everything on every keystroke. That is why the example wraps load_data in @st.cache_data: the script reruns, but the expensive load is reused.
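The interplay between rerunning and caching can be sketched without Streamlit at all. Below, functools.lru_cache stands in for a caching decorator and run_script stands in for one top-to-bottom pass over an app. Both are simplifications for illustration, and Streamlit's own caching differs in its details. The dataset names and sizes are made up.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def load_data(dataset):
    """Stand-in for an expensive load; the result is cached per dataset name."""
    return tuple(range(5)) if dataset == "small" else tuple(range(1000))


def run_script(widget_state):
    """One top-to-bottom pass over the 'script', as on each interaction."""
    data = load_data(widget_state["dataset"])
    shown = data[: widget_state["rows"]]
    return f"{widget_state['dataset']}: showing {len(shown)} of {len(data)} rows"


# Three interactions: two slider moves, then a dataset switch.
for state in (
    {"dataset": "small", "rows": 3},
    {"dataset": "small", "rows": 5},
    {"dataset": "large", "rows": 5},
):
    print(run_script(state))

info = load_data.cache_info()
print(f"script runs: 3, actual loads: {info.misses}, cache hits: {info.hits}")
```

The script body ran three times, but the data was loaded only twice: once per distinct dataset. Anything not behind a cache is recomputed on every pass.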

6.1.3.4 Flask: A Lightweight Framework for Custom Backends

Shiny, Dash, and Streamlit are great for dashboards, but sometimes you need a plain web endpoint: for example, exposing a trained model as a JSON API that other applications can call, or serving a custom HTML page that mixes pandas-generated content with JavaScript on the front end. Flask is the most common lightweight choice for that role. (FastAPI is a modern alternative with built-in data validation via Pydantic and automatic OpenAPI documentation; worth considering for any new API project in 2026.)

import os
from flask import Flask, render_template
import pandas as pd
import json

app = Flask(__name__)

@app.route('/')
def index():
    # Load and process data
    df = pd.read_csv('sales_data.csv')

    # Convert data to JSON for JavaScript
    chart_data = {
        'labels': df['quarter'].tolist(),
        'datasets': [
            {
                'label': 'Product A',
                'data': df['product_a'].tolist(),
                'backgroundColor': 'rgba(54, 162, 235, 0.2)',
                'borderColor': 'rgba(54, 162, 235, 1)',
                'borderWidth': 1
            },
            # Other products...
        ]
    }

    return render_template('index.html', chart_data=json.dumps(chart_data))

if __name__ == '__main__':
    # Only enable debug mode when a FLASK_DEBUG env var is set.
    # Never run with debug=True in production: it exposes an
    # interactive Python console to anyone who can reach the app.
    app.run(debug=os.environ.get("FLASK_DEBUG") == "1")

Flask is well-suited for data scientists because it lets you use your Python data-processing code alongside a web server with very little overhead. The deployment chapter shows a full Flask-based ML model API example.
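Whichever framework serves the request, the core of a model endpoint is a function that turns a JSON request body into a JSON response. The sketch below keeps that logic framework-agnostic so it can be tested without a server. The handle_predict function, the field names, the status codes, and the fixed linear rule inside predict are all hypothetical placeholders, not a trained model or any framework's API.

```python
import json

REQUIRED = ("sepal_length", "sepal_width")  # assumed input fields


def predict(features):
    """Placeholder model: replace with a call to your trained model."""
    return 0.5 * features["sepal_length"] + 0.25 * features["sepal_width"]


def handle_predict(body):
    """Turn a raw JSON request body into a (status_code, JSON string) pair."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "body is not valid JSON"})

    if not isinstance(payload, dict):
        return 400, json.dumps({"error": "body must be a JSON object"})

    missing = [name for name in REQUIRED if name not in payload]
    if missing:
        return 422, json.dumps({"error": f"missing fields: {missing}"})

    # type() rather than isinstance() so that true/false are rejected too.
    invalid = [name for name in REQUIRED if type(payload[name]) not in (int, float)]
    if invalid:
        return 422, json.dumps({"error": f"non-numeric fields: {invalid}"})

    return 200, json.dumps({"prediction": predict(payload)})


print(handle_predict('{"sepal_length": 5.0, "sepal_width": 3.0}'))
print(handle_predict('{"sepal_length": 5.0}'))
```

A Flask or FastAPI route would then be a thin wrapper that passes the request body to handle_predict and returns its result. With FastAPI, most of the validation shown here would instead be declared as a Pydantic model.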

6.1.3.5 Which one should you pick?

You want to…                                                  | Pick
Build a dashboard in R                                        | Shiny
Build a dashboard in Python with explicit input→output wiring | Dash
Build a dashboard in Python as fast as possible               | Streamlit
Expose a model or dataset as a JSON API                       | Flask or FastAPI
Render a static report with a few interactive elements        | Quarto dashboards (see the Reporting chapter)

Don’t get paralysed by the choice. Any of the first three will produce a working dashboard in an afternoon; the right answer is usually “whichever one your team already uses.”

6.1.4 Deploying Web Applications

Once you’ve built your application, you’ll need to deploy it for others to access. Rather than duplicating the material here, the next chapter (Deploying Data Science Projects) walks through full deployment workflows for Flask, Dash, and Shiny applications on modern platforms like Render, Google Cloud Run, Posit Connect Cloud, and shinyapps.io.

The short version: Flask apps are typically deployed by containerising them with Docker and pushing the image to a platform-as-a-service provider; Shiny apps are most easily deployed via shinyapps.io or Posit Connect Cloud directly from RStudio. The best choice depends on your specific needs, including factors like:

  • Expected traffic volume
  • Security requirements
  • Budget constraints
  • Integration with other systems
  • Need for custom domains or SSL

6.2 Conclusion

Web development skills complement your data science toolkit by enabling you to share your work more effectively. A light understanding of HTML, CSS, and JavaScript lets you customise reports and debug rendering issues. The frameworks in this chapter (Shiny, Dash, Streamlit, and Flask) abstract away most of the rest so you can focus on the analysis rather than the plumbing.

The ability to put an interactive application in front of a stakeholder is what turns a clever notebook into something the business actually uses. In the next chapter, we’ll explore deployment in more depth, covering various platforms and strategies for making your applications accessible to stakeholders.