5 Web Development for Data Scientists
5.1 Web Development Fundamentals for Data Scientists
As a data scientist, you’ll often need to share your work through web applications, dashboards, or APIs. Understanding web development basics helps you create more effective and accessible data products while giving you more control over your projects. The deployment tools discussed earlier (such as Shiny or Quarto) are largely wrappers for lower-level web technologies like HTML and CSS. These tools handle the heavy lifting for us, but what if we wanted our HTML Quarto report to have a custom theme? This becomes possible with a basic understanding of web development.
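To make the "wrapper" idea concrete, here is a minimal sketch of the kind of work a reporting tool does under the hood: it turns data into an HTML string. The function name `rows_to_html_table` and the sample rows are hypothetical and chosen only for illustration; real tools such as Quarto generate far richer markup.

```python
from html import escape


def rows_to_html_table(headers, rows):
    """Build a plain HTML table from a list of headers and a list of rows.

    Every value is passed through html.escape so that characters such as
    '<' or '&' in the data cannot be mistaken for markup.
    """
    head = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


# Hypothetical sample data, for illustration only
print(rows_to_html_table(["Quarter", "Sales"], [["Q1", 12], ["Q2", 19]]))
```

Knowing that the output is ordinary HTML is what makes it possible to restyle it later with your own CSS.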
5.1.1 Why Web Development for Data Scientists?
Web development skills are increasingly important for data scientists because:
- Sharing Results: Web interfaces make your analysis accessible to non-technical stakeholders
- Interactive Visualizations: Web technologies enable rich, interactive data exploration
- Model Deployment: Web APIs allow your models to be integrated into larger systems
- Data Collection: Web applications can facilitate data gathering and annotation
- Professional Completeness: Deploying your own analysis closes the loop, letting you deliver a complete end-to-end solution
These skills become more valuable as you advance in your data science career and take on responsibility for whole projects.
5.1.2 HTML, CSS, and JavaScript Basics
These three technologies form the foundation of web development:
- HTML: Structures the content of web pages
- CSS: Controls the appearance and layout
- JavaScript: Adds interactivity and dynamic behavior
Let’s create a simple web page that displays a data visualization:
- Create a file named index.html:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Data Visualization Example</title>
<link rel="stylesheet" href="styles.css">
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
</head>
<body>
<div class="container">
<h1>Sales Data Analysis</h1>
<div class="chart-container">
<canvas id="salesChart"></canvas>
</div>
<div class="summary">
<h2>Key Findings</h2>
<ul>
<li>Q4 had the highest sales, driven by holiday promotions</li>
<li>Product A consistently outperformed other products</li>
<li>Year-over-year growth was 15.3%</li>
</ul>
</div>
</div>
<script src="script.js"></script>
</body>
</html>
- Create a file named styles.css:
body {
font-family: Arial, sans-serif;
line-height: 1.6;
color: #333;
margin: 0;
padding: 0;
background-color: #f5f5f5;
}
.container {
max-width: 1000px;
margin: 0 auto;
padding: 20px;
background-color: white;
box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);
}
h1 {
color: #2c3e50;
text-align: center;
margin-bottom: 30px;
}
.chart-container {
margin-bottom: 30px;
height: 400px;
}
.summary {
border-top: 1px solid #ddd;
padding-top: 20px;
}
h2 {
color: #2c3e50;
}
ul {
padding-left: 20px;
}
- Create a file named script.js:
// Sample data
const salesData = {
labels: ['Q1', 'Q2', 'Q3', 'Q4'],
datasets: [
{
label: 'Product A',
data: [12, 19, 15, 28],
backgroundColor: 'rgba(54, 162, 235, 0.2)',
borderColor: 'rgba(54, 162, 235, 1)',
borderWidth: 1
},
{
label: 'Product B',
data: [10, 15, 12, 25],
backgroundColor: 'rgba(255, 99, 132, 0.2)',
borderColor: 'rgba(255, 99, 132, 1)',
borderWidth: 1
},
{
label: 'Product C',
data: [8, 10, 14, 20],
backgroundColor: 'rgba(75, 192, 192, 0.2)',
borderColor: 'rgba(75, 192, 192, 1)',
borderWidth: 1
}
]
};
// Get the canvas element
const ctx = document.getElementById('salesChart').getContext('2d');
// Create the chart
const salesChart = new Chart(ctx, {
type: 'bar',
data: salesData,
options: {
responsive: true,
maintainAspectRatio: false,
scales: {
y: {
beginAtZero: true,
title: {
display: true,
text: 'Sales (millions)'
}
}
}
}
});
- Open index.html in a web browser
This example demonstrates how to create a web page with a chart using Chart.js, a popular JavaScript visualization library. The HTML provides structure, CSS handles styling, and JavaScript creates the interactive chart. I would stress that, as a data scientist, you do not need to be able to write the above web page from scratch. Rather, become familiar with the structure and language. That way, when you’re presented with raw output, you can find the things that are useful for you and be able to make changes effectively.
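As a small illustration of "finding the things that are useful" in raw HTML, the sketch below uses Python's standard-library `html.parser` module to pull the heading and list items out of a fragment of the page above. The class name `TagTextCollector` is hypothetical, and the approach assumes simple, non-nested tags; a dedicated parsing library would be more robust for real pages.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text found inside every occurrence of the chosen tags."""

    def __init__(self, wanted_tags):
        super().__init__()
        self.wanted_tags = set(wanted_tags)
        self._open_tag = None  # tag currently being read, if any
        self.found = []        # list of (tag, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted_tags:
            self._open_tag = tag
            self.found.append((tag, ""))

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the last entry
        if self._open_tag is not None:
            tag, text = self.found[-1]
            self.found[-1] = (tag, text + data)

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            self._open_tag = None


# A fragment of the index.html file shown earlier
page = """
<div class="summary">
<h2>Key Findings</h2>
<ul>
<li>Q4 had the highest sales, driven by holiday promotions</li>
<li>Year-over-year growth was 15.3%</li>
</ul>
</div>
"""

collector = TagTextCollector({"h2", "li"})
collector.feed(page)
for tag, text in collector.found:
    print(f"{tag}: {text.strip()}")
```

The same pattern works on HTML produced by other tools, as long as you know which tags hold the content you care about.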
5.1.3 Web Frameworks for Data Scientists
While you can build websites from scratch, frameworks simplify the process. Here are some popular options for data scientists:
5.1.3.1 Flask (Python)
Flask is a lightweight web framework that’s easy to learn and works well for data science applications:
from flask import Flask, render_template
import pandas as pd
import json
app = Flask(__name__)
@app.route('/')
def index():
    # Load and process data
    df = pd.read_csv('sales_data.csv')
    # Convert data to JSON for JavaScript
    chart_data = {
        'labels': df['quarter'].tolist(),
        'datasets': [
            {
                'label': 'Product A',
                'data': df['product_a'].tolist(),
                'backgroundColor': 'rgba(54, 162, 235, 0.2)',
                'borderColor': 'rgba(54, 162, 235, 1)',
                'borderWidth': 1
            },
            # Other products...
        ]
    }
    return render_template('index.html', chart_data=json.dumps(chart_data))
if __name__ == '__main__':
    app.run(debug=True)
Flask is particularly well-suited for data scientists because it allows you to use your Python data processing code alongside a web server. It’s lightweight, which means there’s not a lot of overhead to learn, and it integrates easily with data science libraries like pandas, scikit-learn, and more.
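The data-to-JSON step in the route above can be tried on its own, without a running server. The sketch below is a simplified stand-in rather than the route itself: it uses the standard-library `csv` module instead of pandas so that it runs with no extra dependencies, the function name `build_chart_data` is hypothetical, and the CSV text is an assumed layout that borrows the column names from the Flask example and the Product A values from script.js.

```python
import csv
import io
import json


def build_chart_data(csv_text):
    """Turn CSV text with 'quarter' and 'product_a' columns into the
    dictionary layout that Chart.js expects for its `data` option."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        "labels": [row["quarter"] for row in rows],
        "datasets": [
            {
                "label": "Product A",
                # csv yields strings, so convert to numbers explicitly
                "data": [float(row["product_a"]) for row in rows],
            }
        ],
    }


# Assumed file contents, for illustration only
sample_csv = "quarter,product_a\nQ1,12\nQ2,19\nQ3,15\nQ4,28\n"

chart_json = json.dumps(build_chart_data(sample_csv))
print(chart_json)
```

Keeping this step in a plain function makes it easy to check the shape of the data before wiring it into a template.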
5.1.3.2 Shiny (R)
We covered Shiny earlier in the data visualization section. It’s worth noting again as a complete web framework for R users:
library(shiny)
library(ggplot2)
library(dplyr)
# Load data
sales_data <- read.csv("sales_data.csv")
# Define UI
ui <- fluidPage(
titlePanel("Sales Data Analysis"),
sidebarLayout(
sidebarPanel(
selectInput("product", "Select Product:",
choices = c("All", "Product A", "Product B", "Product C"))
),
mainPanel(
plotOutput("salesPlot"),
h3("Key Findings"),
verbatimTextOutput("summary")
)
)
)
# Define server logic
server <- function(input, output) {
# Filter data based on input
filtered_data <- reactive({
if (input$product == "All") {
return(sales_data)
} else {
return(sales_data %>% filter(product == input$product))
}
})
# Create plot
output$salesPlot <- renderPlot({
ggplot(filtered_data(), aes(x = quarter, y = sales, fill = product)) +
geom_bar(stat = "identity", position = "dodge") +
theme_minimal() +
labs(title = "Quarterly Sales", y = "Sales (millions)")
})
# Generate summary
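output$summary <- renderText({
data <- filtered_data()
# Sum across products within each quarter so the figures stay correct when "All" is selected
quarterly <- data %>%
group_by(quarter) %>%
summarise(sales = sum(sales)) %>%
arrange(quarter)
paste0(
"Total Sales: ", sum(quarterly$sales), " million\n",
"Average per Quarter: ", round(mean(quarterly$sales), 2), " million\n",
"Growth Rate (first to last quarter): ", round((last(quarterly$sales) / first(quarterly$sales) - 1) * 100, 1), "%"
)
})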
}
# Run the application
shinyApp(ui = ui, server = server)
Shiny is notable for how little web development knowledge it requires. You can create interactive web applications using almost entirely R code, without needing to learn HTML, CSS, or JavaScript.
5.1.4 Deploying Web Applications
Once you’ve built your application, you’ll need to deploy it for others to access:
5.1.4.1 Deployment Options for Flask
Heroku: Platform as a Service (its free tier was discontinued in November 2022, so a paid plan is required)
# Install the Heroku CLI
# Install gunicorn so it is captured in requirements.txt (the Procfile below depends on it)
pip install gunicorn
# Create a requirements.txt file
pip freeze > requirements.txt
# Create a Procfile
echo "web: gunicorn app:app" > Procfile
# Deploy
git init
git add .
git commit -m "Initial commit"
heroku create
git push heroku main
PythonAnywhere: Python-specific hosting
- Sign up for an account
- Upload your files
- Set up a web app with Flask
AWS, GCP, or Azure: More complex but scalable
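The Procfile entry `web: gunicorn app:app` shown above tells Gunicorn to import a module named `app` and serve the WSGI callable named `app` inside it. A Flask application object is one such callable. The sketch below, written with only the standard library, shows roughly what that interface looks like; the `/health` path and the helper `call_app` are hypothetical and exist only to illustrate the idea.

```python
import json


def app(environ, start_response):
    """A bare WSGI callable: the interface a server such as Gunicorn calls."""
    if environ.get("PATH_INFO") == "/health":
        status, payload = "200 OK", {"status": "ok"}
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(path):
    """Invoke the callable directly, roughly the way a WSGI server would."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    chunks = app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response)
    return captured["status"], json.loads(b"".join(chunks))


print(call_app("/health"))
```

In practice Flask builds this callable for you, which is why the Procfile can point straight at the `app` object created in app.py.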
5.1.4.2 Deployment Options for Shiny
shinyapps.io: Hosting service from Posit (formerly RStudio)
# Install the rsconnect package
install.packages("rsconnect")
# Configure your account
rsconnect::setAccountInfo(name="youraccount", token="TOKEN", secret="SECRET")
# Deploy the app
rsconnect::deployApp(appDir = "path/to/app")
Shiny Server: Self-hosted option (can be installed on cloud VMs)
These deployment options range from simple services designed specifically for data science applications to more general-purpose cloud platforms. The best choice depends on your specific needs, including factors like:
- Expected traffic volume
- Security requirements
- Budget constraints
- Integration with other systems
- Need for custom domains or SSL
5.2 Conclusion
Web development skills complement your data science toolkit by enabling you to share your work more effectively. While you don’t need to become a full-stack developer, understanding the basics of HTML, CSS, and JavaScript helps you customize reports, debug rendering issues, and create more polished data products.
The frameworks we’ve covered—Flask for Python and Shiny for R—abstract away much of the complexity, allowing you to focus on your analysis rather than web infrastructure. As you grow more comfortable with these tools, you’ll find that the ability to deploy interactive applications significantly increases the impact of your data science work.
In the next chapter, we’ll explore deployment in more depth, covering various platforms and strategies for making your applications accessible to stakeholders.