As a data scientist, you’ll often need to share your work through web applications, dashboards, or APIs. Understanding web development basics helps you create more effective and accessible data products while giving you more control of your projects. The deployment tools discussed earlier (such as Shiny or Quarto) are largely wrappers for lower-level web technologies like HTML and CSS. These tools handle the heavy lifting for us, but what if we wanted our HTML Quarto report to have a custom theme? This becomes possible with a basic understanding of web development.
Web development skills are increasingly important for data scientists because:
Web development skills become increasingly valuable as you advance in your data science career, particularly when you need to deliver complete end-to-end solutions.
Three technologies form the foundation of web development: HTML (structure), CSS (styling), and JavaScript (interactivity).
Let’s create a simple web page that displays a data visualisation:
index.html:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Data Visualization Example</title>
<link rel="stylesheet" href="styles.css">
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
</head>
<body>
<div class="container">
<h1>Sales Data Analysis</h1>
<div class="chart-container">
<canvas id="salesChart"></canvas>
</div>
<div class="summary">
<h2>Key Findings</h2>
<ul>
<li>Q4 had the highest sales, driven by holiday promotions</li>
<li>Product A consistently outperformed other products</li>
<li>Year-over-year growth was 15.3%</li>
</ul>
</div>
</div>
<script src="script.js"></script>
</body>
</html>
styles.css:
body {
font-family: Arial, sans-serif;
line-height: 1.6;
color: #333;
margin: 0;
padding: 0;
background-color: #f5f5f5;
}
.container {
max-width: 1000px;
margin: 0 auto;
padding: 20px;
background-color: white;
box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);
}
h1 {
color: #2c3e50;
text-align: center;
margin-bottom: 30px;
}
.chart-container {
margin-bottom: 30px;
height: 400px;
}
.summary {
border-top: 1px solid #ddd;
padding-top: 20px;
}
h2 {
color: #2c3e50;
}
ul {
padding-left: 20px;
}
script.js:
// Sample data
const salesData = {
labels: ['Q1', 'Q2', 'Q3', 'Q4'],
datasets: [
{
label: 'Product A',
data: [12, 19, 15, 28],
backgroundColor: 'rgba(54, 162, 235, 0.2)',
borderColor: 'rgba(54, 162, 235, 1)',
borderWidth: 1
},
{
label: 'Product B',
data: [10, 15, 12, 25],
backgroundColor: 'rgba(255, 99, 132, 0.2)',
borderColor: 'rgba(255, 99, 132, 1)',
borderWidth: 1
},
{
label: 'Product C',
data: [8, 10, 14, 20],
backgroundColor: 'rgba(75, 192, 192, 0.2)',
borderColor: 'rgba(75, 192, 192, 1)',
borderWidth: 1
}
]
};
// Get the canvas element
const ctx = document.getElementById('salesChart').getContext('2d');
// Create the chart
const salesChart = new Chart(ctx, {
type: 'bar',
data: salesData,
options: {
responsive: true,
maintainAspectRatio: false,
scales: {
y: {
beginAtZero: true,
title: {
display: true,
text: 'Sales (millions)'
}
}
}
}
});
To view the page, save the three files in the same folder and open index.html in a web browser.
This example demonstrates how to create a web page with a chart using Chart.js, a popular JavaScript visualisation library. The HTML provides structure, CSS handles styling, and JavaScript creates the interactive chart. I would stress that, as a data scientist, you do not need to be able to write the above web page from scratch. Rather, become familiar with the structure and the languages. That way, when you’re presented with raw output, you can find the parts that are useful to you and change them effectively.
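When the numbers already live in Python, the `salesData` object does not have to be typed by hand. The following is a minimal sketch using only the standard library; the helper name `build_chart_data` is made up for illustration, and the figures and colours are simply the sample values from script.js above.

```python
import json

# Sample values copied from script.js; the RGB triples match its colours.
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]
SALES = {
    "Product A": ([12, 19, 15, 28], (54, 162, 235)),
    "Product B": ([10, 15, 12, 25], (255, 99, 132)),
    "Product C": ([8, 10, 14, 20], (75, 192, 192)),
}


def build_chart_data(labels, sales):
    """Return a dict shaped like the salesData object used in script.js."""
    datasets = []
    for name, (values, (r, g, b)) in sales.items():
        datasets.append({
            "label": name,
            "data": values,
            "backgroundColor": f"rgba({r}, {g}, {b}, 0.2)",
            "borderColor": f"rgba({r}, {g}, {b}, 1)",
            "borderWidth": 1,
        })
    return {"labels": labels, "datasets": datasets}


chart_data = build_chart_data(QUARTERS, SALES)

# Emit a JavaScript assignment that could stand in for the literal in script.js.
js_snippet = "const salesData = " + json.dumps(chart_data, indent=2) + ";"
print(js_snippet)
```

Because JSON is valid JavaScript object syntax, the printed text can be pasted into script.js in place of the hand-written object.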
Most data scientists don’t build web apps from raw HTML/CSS/JavaScript. They use a framework that wraps the web machinery and lets them stay in Python or R. There are four tools worth knowing, each optimised for a different situation: Shiny, Dash, Streamlit, and Flask.
If your problem is “give stakeholders a dashboard where they can filter this chart,” pick Shiny, Dash, or Streamlit. If it’s “expose this trained model as a JSON endpoint that another service can call,” Flask or FastAPI is the right level. We cover all four below.
The examples in this section reference placeholder files like sales_data.csv. These aren’t shipped with the book; substitute your own file, a built-in dataset (ggplot2::diamonds, sns.load_dataset("tips"), or similar), or a stable public CSV. The framework mechanics don’t depend on which data you use.
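If you have no suitable file to hand, a placeholder can be generated in a few lines. This is a sketch under assumptions: the column names `quarter` and `product_a` are the ones the Flask example later reads, while `product_b` and `product_c` are guessed names for the remaining products, and the numbers are the sample sales figures from the Chart.js example. The helper `write_placeholder_csv` is hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# `quarter` and `product_a` match the Flask example; the other two
# column names are assumptions made for this placeholder file.
ROWS = [
    {"quarter": "Q1", "product_a": 12, "product_b": 10, "product_c": 8},
    {"quarter": "Q2", "product_a": 19, "product_b": 15, "product_c": 10},
    {"quarter": "Q3", "product_a": 15, "product_b": 12, "product_c": 14},
    {"quarter": "Q4", "product_a": 28, "product_b": 25, "product_c": 20},
]


def write_placeholder_csv(path):
    """Write the sample rows to `path` and return it as a Path."""
    path = Path(path)
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(ROWS[0]))
        writer.writeheader()
        writer.writerows(ROWS)
    return path


# Demonstrate in a temporary directory so no existing file is overwritten.
demo_dir = Path(tempfile.mkdtemp())
csv_path = write_placeholder_csv(demo_dir / "sales_data.csv")
print(csv_path.read_text())
```

To use it with the examples, call `write_placeholder_csv("sales_data.csv")` from the folder that holds your app.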
Shiny allows you to build interactive web applications entirely in R, without requiring knowledge of HTML, CSS, or JavaScript.
# Install Shiny if needed
install.packages("shiny")
A simple Shiny app consists of two components: a UI object that defines the layout and inputs, and a server function that computes the outputs.
Here’s a basic example:
library(shiny)
library(ggplot2)
library(dplyr)

# Define UI
ui <- fluidPage(
  titlePanel("Diamond Explorer"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("carat_range",
                  "Carat Range:",
                  min = 0.2,
                  max = 5.0,
                  value = c(0.5, 3.0)),
      selectInput("cut",
                  "Cut Quality:",
                  choices = c("All", unique(as.character(diamonds$cut))),
                  selected = "All")
    ),
    mainPanel(
      plotOutput("scatterplot"),
      tableOutput("summary_table")
    )
  )
)

# Define server logic
server <- function(input, output) {
  # Filter data based on inputs
  filtered_data <- reactive({
    data <- diamonds
    # Filter by carat
    data <- data %>%
      filter(carat >= input$carat_range[1] & carat <= input$carat_range[2])
    # Filter by cut if not "All"
    if (input$cut != "All") {
      data <- data %>% filter(cut == input$cut)
    }
    data
  })

  # Create scatter plot
  output$scatterplot <- renderPlot({
    ggplot(filtered_data(), aes(x = carat, y = price, color = cut)) +
      geom_point(alpha = 0.5) +
      theme_minimal() +
      labs(title = "Diamond Price vs. Carat",
           x = "Carat",
           y = "Price (USD)")
  })

  # Create summary table
  output$summary_table <- renderTable({
    filtered_data() %>%
      group_by(cut) %>%
      summarize(
        Count = n(),
        `Avg Price` = round(mean(price), 2),
        `Avg Carat` = round(mean(carat), 2)
      )
  })
}

# Run the application
shinyApp(ui = ui, server = server)
What makes Shiny powerful is its reactivity system, which automatically updates outputs when inputs change. You specify relationships between inputs and outputs, and the framework figures out what to recompute when something changes. It’s similar to how a spreadsheet automatically recalculates formulas when a cell’s value changes: you don’t write “when the dropdown changes, rerun the plot”, you say “the plot depends on the filtered data, which depends on the dropdown”, and Shiny does the rest.
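The spreadsheet analogy can be made concrete with a toy model. The sketch below is not Shiny's actual implementation; the class names `ReactiveValue` and `Reactive` and the four-row dataset are invented for illustration. It only shows the core idea: a computation records which inputs it read, caches its result, and is marked stale when one of those inputs changes.

```python
_stack = []  # computations currently running, innermost last


def _track(source):
    """Record that the computation now running depends on `source`."""
    if _stack:
        source._dependents.add(_stack[-1])


class ReactiveValue:
    """An input: holds a value and remembers which computations read it."""

    def __init__(self, value):
        self._value = value
        self._dependents = set()

    def get(self):
        _track(self)
        return self._value

    def set(self, value):
        if value != self._value:
            self._value = value
            for dep in list(self._dependents):
                dep.invalidate()


class Reactive:
    """A cached computation that reruns only after something it read changed."""

    def __init__(self, func):
        self._func = func
        self._dependents = set()
        self._stale = True
        self._cache = None
        self.run_count = 0

    def __call__(self):
        _track(self)
        if self._stale:
            _stack.append(self)
            try:
                self._cache = self._func()
            finally:
                _stack.pop()
            self._stale = False
            self.run_count += 1
        return self._cache

    def invalidate(self):
        if not self._stale:
            self._stale = True
            for dep in list(self._dependents):
                dep.invalidate()


# Made-up (cut, carat) rows standing in for the diamonds data.
DIAMONDS = [("Ideal", 0.5), ("Good", 1.2), ("Ideal", 2.0), ("Fair", 3.5)]
cut = ReactiveValue("All")
carat_max = ReactiveValue(3.0)


@Reactive
def filtered_data():
    selected, limit = cut.get(), carat_max.get()
    return [d for d in DIAMONDS
            if d[1] <= limit and (selected == "All" or d[0] == selected)]


@Reactive
def summary():
    return f"{len(filtered_data())} diamonds"


print(summary())                # computes filtered_data, then summary
summary()                       # nothing changed, so the cache is reused
cut.set("Ideal")                # marks filtered_data and summary as stale
print(summary())                # both recompute
print(filtered_data.run_count)  # the filter ran twice, not three times
```

Nothing in the example says "when `cut` changes, rerun `summary`". The dependency is discovered because `summary` called `filtered_data`, which read `cut`.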
Shiny is also now available for Python via Shiny for Python, which offers the same reactive model with Python syntax. It is a useful option if you like the Shiny mental model but your team uses Python.
Dash is Python’s equivalent to Shiny, created by the makers of Plotly:
# Install Dash
pip install dash dash-bootstrap-components
A simple Dash app follows a similar structure to Shiny:
import dash
from dash import dcc, html, dash_table
from dash.dependencies import Input, Output
import plotly.express as px
import pandas as pd

# Load data - using built-in dataset for reproducibility
df = px.data.iris()

# Initialize app
app = dash.Dash(__name__)

# Define layout
app.layout = html.Div([
    html.H1("Iris Dataset Explorer"),
    html.Div([
        html.Div([
            html.Label("Select Species:"),
            dcc.Dropdown(
                id='species-dropdown',
                options=[{'label': 'All', 'value': 'all'}] +
                        [{'label': i, 'value': i} for i in df['species'].unique()],
                value='all'
            ),
            html.Label("Select Y-axis:"),
            dcc.RadioItems(
                id='y-axis',
                options=[
                    {'label': 'Sepal Width', 'value': 'sepal_width'},
                    {'label': 'Petal Length', 'value': 'petal_length'},
                    {'label': 'Petal Width', 'value': 'petal_width'}
                ],
                value='sepal_width'
            )
        ], style={'width': '25%', 'padding': '20px'}),
        html.Div([
            dcc.Graph(id='scatter-plot')
        ], style={'width': '75%'})
    ], style={'display': 'flex'}),
    html.Div([
        html.H3("Data Summary"),
        dash_table.DataTable(
            id='summary-table',
            style_cell={'textAlign': 'left'},
            style_header={
                'backgroundColor': 'lightgrey',
                'fontWeight': 'bold'
            }
        )
    ])
])


# Define callbacks
@app.callback(
    [Output('scatter-plot', 'figure'),
     Output('summary-table', 'data'),
     Output('summary-table', 'columns')],
    [Input('species-dropdown', 'value'),
     Input('y-axis', 'value')]
)
def update_graph_and_table(selected_species, y_axis):
    # Filter data
    if selected_species == 'all':
        filtered_df = df
    else:
        filtered_df = df[df['species'] == selected_species]

    # Create figure
    fig = px.scatter(
        filtered_df,
        x='sepal_length',
        y=y_axis,
        color='species',
        title=f'Sepal Length vs {y_axis.replace("_", " ").title()}'
    )

    # Create summary table
    summary_df = filtered_df.groupby('species').agg({
        'sepal_length': ['mean', 'std'],
        'sepal_width': ['mean', 'std'],
        'petal_length': ['mean', 'std'],
        'petal_width': ['mean', 'std']
    }).reset_index()

    # Flatten the multi-index
    summary_df.columns = ['_'.join(col).strip('_') for col in summary_df.columns.values]

    # Format table
    table_data = summary_df.to_dict('records')
    columns = [{"name": col.replace('_', ' ').title(), "id": col} for col in summary_df.columns]
    return fig, table_data, columns


# Run app
if __name__ == '__main__':
    # debug=True must NEVER be used in production.
    app.run(debug=False)
Dash uses Plotly for visualisations and React.js for the UI, resulting in modern, responsive applications without requiring front-end experience.
Unlike Shiny’s reactive programming model, Dash uses an explicit callback-based approach. You define functions that take specific inputs and produce specific outputs, with the Dash framework wiring them together. This approach may feel more familiar to Python programmers used to callback-based frameworks.
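To see what "explicit wiring" means without installing anything, here is a toy registry in plain Python. It is a sketch of the idea only and not how Dash is implemented; `MiniApp`, `set_input`, and the `plot-title` component are invented names. A decorator records which inputs feed which output, and changing an input reruns every callback that listed it.

```python
class MiniApp:
    """Toy stand-in for a callback-based framework."""

    def __init__(self):
        self.state = {}       # component id -> current value
        self._callbacks = []  # (output id, input ids, function)

    def callback(self, output, inputs):
        """Register `func` as the way to compute `output` from `inputs`."""
        def decorator(func):
            self._callbacks.append((output, tuple(inputs), func))
            return func
        return decorator

    def set_input(self, component_id, value):
        """Simulate a user changing a widget, then fire matching callbacks."""
        self.state[component_id] = value
        for output, inputs, func in self._callbacks:
            if component_id in inputs:
                args = [self.state.get(i) for i in inputs]
                self.state[output] = func(*args)


app = MiniApp()
app.state.update({"species-dropdown": "all", "y-axis": "sepal_width"})


@app.callback(output="plot-title", inputs=["species-dropdown", "y-axis"])
def update_title(selected_species, y_axis):
    pretty = y_axis.replace("_", " ").title()
    return f"Sepal Length vs {pretty} ({selected_species})"


app.set_input("y-axis", "petal_length")
print(app.state["plot-title"])
```

The contrast with the reactive model is that the dependency list is written out by hand in the decorator rather than discovered while the function runs.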
Streamlit simplifies interactive app creation even further with a minimal, straightforward API. Here’s a simple Streamlit app:
import streamlit as st
import pandas as pd
import plotly.express as px
import seaborn as sns

# Set page title
st.set_page_config(page_title="Data Explorer", page_icon="📊")

# Add a title
st.title("Interactive Data Explorer")

# Add sidebar with dataset options
st.sidebar.header("Settings")
dataset_name = st.sidebar.selectbox(
    "Select Dataset",
    options=["Iris", "Diamonds", "Gapminder"]
)


# Load data based on selection - using built-in datasets for reproducibility
@st.cache_data
def load_data(dataset):
    if dataset == "Iris":
        return sns.load_dataset("iris")
    elif dataset == "Diamonds":
        return sns.load_dataset("diamonds").sample(n=1000, random_state=42)
    else:  # Gapminder
        return px.data.gapminder()


df = load_data(dataset_name)

# Display basic dataset information
st.header(f"{dataset_name} Dataset")
tab1, tab2, tab3 = st.tabs(["📋 Data", "📈 Visualization", "📊 Summary"])

with tab1:
    st.subheader("Raw Data")
    st.dataframe(df.head(100))

with tab2:
    st.subheader("Data Visualization")
    numeric_cols = df.select_dtypes("number").columns.tolist()
    if numeric_cols:
        x_var = st.selectbox("X variable", options=numeric_cols)
        y_var = st.selectbox("Y variable", options=numeric_cols, index=min(1, len(numeric_cols) - 1))
        fig = px.scatter(df, x=x_var, y=y_var, title=f"{x_var} vs {y_var}")
        st.plotly_chart(fig, use_container_width=True)

with tab3:
    st.subheader("Statistical Summary")
    st.dataframe(df.describe())
Streamlit’s appeal lies in its simplicity. Instead of wiring inputs to outputs through callbacks (as in Dash) or reactive expressions (as in Shiny), the entire script runs from top to bottom whenever any input changes. You write a straightforward Python script that builds the UI linearly, and Streamlit reruns it on every interaction. This procedural approach is very intuitive for beginners and allows for rapid prototyping, though it can become less efficient for complex applications where you don’t want to recompute everything on every keystroke.
Shiny, Dash, and Streamlit are great for dashboards, but sometimes you need a plain web endpoint: for example, exposing a trained model as a JSON API that other applications can call, or serving a custom HTML page that mixes pandas-generated content with JavaScript on the front end. Flask is the most common lightweight choice for that role. (FastAPI is a modern alternative with built-in data validation via Pydantic and automatic OpenAPI documentation; worth considering for any new API project in 2026.)
import os
from flask import Flask, render_template
import pandas as pd
import json

app = Flask(__name__)


@app.route('/')
def index():
    # Load and process data
    df = pd.read_csv('sales_data.csv')

    # Convert data to JSON for JavaScript
    chart_data = {
        'labels': df['quarter'].tolist(),
        'datasets': [
            {
                'label': 'Product A',
                'data': df['product_a'].tolist(),
                'backgroundColor': 'rgba(54, 162, 235, 0.2)',
                'borderColor': 'rgba(54, 162, 235, 1)',
                'borderWidth': 1
            },
            # Other products...
        ]
    }
    return render_template('index.html', chart_data=json.dumps(chart_data))


if __name__ == '__main__':
    # Only enable debug mode when a FLASK_DEBUG env var is set.
    # Never run with debug=True in production: it exposes an
    # interactive Python console to anyone who can reach the app.
    app.run(debug=os.environ.get("FLASK_DEBUG") == "1")
Flask is well-suited for data scientists because it lets you use your Python data-processing code alongside a web server with very little overhead. The deployment chapter shows a full Flask-based ML model API example.
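Flask applications are WSGI applications, so it can help to see roughly what the framework handles for you. The sketch below uses only the standard library rather than Flask itself. The `/predict` route, the `carat` parameter, and the `predict_price` function (a made-up linear rule standing in for a trained model) are all assumptions for illustration.

```python
import json
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def predict_price(carat):
    """Hypothetical stand-in for a trained model."""
    return round(4000.0 * carat, 2)


def application(environ, start_response):
    """Answer GET /predict?carat=<number> with a JSON body."""
    if environ.get("PATH_INFO") != "/predict":
        status, payload = "404 Not Found", {"error": "unknown path"}
    else:
        query = parse_qs(environ.get("QUERY_STRING", ""))
        try:
            carat = float(query["carat"][0])
            status = "200 OK"
            payload = {"carat": carat, "price": predict_price(carat)}
        except (KeyError, ValueError):
            status = "400 Bad Request"
            payload = {"error": "carat must be a number"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def call(path, query=""):
    """Invoke the app in-process, without opening a network port."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"], environ["QUERY_STRING"] = path, query
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = application(environ, start_response)
    return captured["status"], json.loads(b"".join(chunks))


print(call("/predict", "carat=1.5"))
# To serve it locally instead:
#   from wsgiref.simple_server import make_server
#   make_server("127.0.0.1", 8000, application).serve_forever()
```

Routing, query parsing, status codes, and JSON encoding are each a line or two here; Flask wraps these in decorators and helpers, which is why the equivalent Flask view is so much shorter.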
| You want to… | Pick |
|---|---|
| Build a dashboard in R | Shiny |
| Build a dashboard in Python with explicit input→output wiring | Dash |
| Build a dashboard in Python as fast as possible | Streamlit |
| Expose a model or dataset as a JSON API | Flask or FastAPI |
| Render a static report with a few interactive elements | Quarto dashboards (see the Reporting chapter) |
Don’t get paralysed by the choice. Any of the first three will produce a working dashboard in an afternoon; the right answer is usually “whichever one your team already uses.”
Once you’ve built your application, you’ll need to deploy it for others to access. Rather than duplicating the material here, the next chapter (Deploying Data Science Projects) walks through full deployment workflows for Flask, Dash, and Shiny applications on modern platforms like Render, Google Cloud Run, Posit Connect Cloud, and shinyapps.io.
The short version: Flask apps are typically deployed by containerising them with Docker and pushing the image to a platform-as-a-service provider; Shiny apps are most easily deployed via shinyapps.io or Posit Connect Cloud directly from RStudio. The best choice depends on your specific needs, including factors like:
Web development skills complement your data science toolkit by enabling you to share your work more effectively. A light understanding of HTML, CSS, and JavaScript lets you customise reports and debug rendering issues. The frameworks in this chapter (Shiny, Dash, Streamlit, and Flask) abstract away most of the rest so you can focus on the analysis rather than the plumbing.
The ability to put an interactive application in front of a stakeholder is what turns a clever notebook into something the business actually uses. In the next chapter, we’ll explore deployment in more depth, covering various platforms and strategies for making your applications accessible to stakeholders.