5  Web Development for Data Scientists

5.1 Web Development Fundamentals for Data Scientists

As a data scientist, you’ll often need to share your work through web applications, dashboards, or APIs. Understanding web development basics helps you create more effective and accessible data products while giving you more control of your projects. The deployment tools discussed earlier (such as Shiny or Quarto) are largely wrappers for lower-level web technologies like HTML and CSS. These tools handle the heavy lifting for us, but what if we wanted our HTML Quarto report to have a custom theme? This becomes possible with a basic understanding of web development.

5.1.1 Why Web Development for Data Scientists?

Web development skills are increasingly important for data scientists because:

  1. Sharing Results: Web interfaces make your analysis accessible to non-technical stakeholders
  2. Interactive Visualizations: Web technologies enable rich, interactive data exploration
  3. Model Deployment: Web APIs allow your models to be integrated into larger systems
  4. Data Collection: Web applications can facilitate data gathering and annotation
  5. Professional Completeness: Being able to deploy your analysis closes the loop in being able to deliver a complete end-to-end solution.

Web development skills become increasingly valuable as you advance in your data science career, particularly when you need to deliver complete end-to-end solutions.

5.1.2 HTML, CSS, and JavaScript Basics

These three technologies form the foundation of web development:

  • HTML: Structures the content of web pages
  • CSS: Controls the appearance and layout
  • JavaScript: Adds interactivity and dynamic behavior

Let’s create a simple web page that displays a data visualization:

  1. Create a file named index.html:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Data Visualization Example</title>
    <link rel="stylesheet" href="styles.css">
    <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
</head>
<body>
    <div class="container">
        <h1>Sales Data Analysis</h1>
        <div class="chart-container">
            <canvas id="salesChart"></canvas>
        </div>
        <div class="summary">
            <h2>Key Findings</h2>
            <ul>
                <li>Q4 had the highest sales, driven by holiday promotions</li>
                <li>Product A consistently outperformed other products</li>
                <li>Year-over-year growth was 15.3%</li>
            </ul>
        </div>
    </div>
    <script src="script.js"></script>
</body>
</html>
  1. Create a file named styles.css:
body {
    font-family: Arial, sans-serif;
    line-height: 1.6;
    color: #333;
    margin: 0;
    padding: 0;
    background-color: #f5f5f5;
}

.container {
    max-width: 1000px;
    margin: 0 auto;
    padding: 20px;
    background-color: white;
    box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);
}

h1 {
    color: #2c3e50;
    text-align: center;
    margin-bottom: 30px;
}

.chart-container {
    margin-bottom: 30px;
    height: 400px;
}

.summary {
    border-top: 1px solid #ddd;
    padding-top: 20px;
}

h2 {
    color: #2c3e50;
}

ul {
    padding-left: 20px;
}
  1. Create a file named script.js:
// Sample data
const salesData = {
    labels: ['Q1', 'Q2', 'Q3', 'Q4'],
    datasets: [
        {
            label: 'Product A',
            data: [12, 19, 15, 28],
            backgroundColor: 'rgba(54, 162, 235, 0.2)',
            borderColor: 'rgba(54, 162, 235, 1)',
            borderWidth: 1
        },
        {
            label: 'Product B',
            data: [10, 15, 12, 25],
            backgroundColor: 'rgba(255, 99, 132, 0.2)',
            borderColor: 'rgba(255, 99, 132, 1)',
            borderWidth: 1
        },
        {
            label: 'Product C',
            data: [8, 10, 14, 20],
            backgroundColor: 'rgba(75, 192, 192, 0.2)',
            borderColor: 'rgba(75, 192, 192, 1)',
            borderWidth: 1
        }
    ]
};

// Get the canvas element
const ctx = document.getElementById('salesChart').getContext('2d');

// Create the chart
const salesChart = new Chart(ctx, {
    type: 'bar',
    data: salesData,
    options: {
        responsive: true,
        maintainAspectRatio: false,
        scales: {
            y: {
                beginAtZero: true,
                title: {
                    display: true,
                    text: 'Sales (millions)'
                }
            }
        }
    }
});
  1. Open index.html in a web browser

This example demonstrates how to create a web page with a chart using Chart.js, a popular JavaScript visualization library. The HTML provides structure, CSS handles styling, and JavaScript creates the interactive chart. I would stress that, as a data scientist, you do not need to be able to write the above web page from scratch. Rather, become familiar with the structure and language. That way, when you’re presented with raw output, you can find the things that are useful for you and be able to make changes effectively.

5.1.3 Web Frameworks for Data Scientists

While you can build websites from scratch, frameworks simplify the process. Here are some popular options for data scientists:

5.1.3.1 Flask (Python)

Flask is a lightweight web framework that’s easy to learn and works well for data science applications:

from flask import Flask, render_template
import pandas as pd
import json

app = Flask(__name__)

@app.route('/')
def index():
    # Load and process data
    df = pd.read_csv('sales_data.csv')
    
    # Convert data to JSON for JavaScript
    chart_data = {
        'labels': df['quarter'].tolist(),
        'datasets': [
            {
                'label': 'Product A',
                'data': df['product_a'].tolist(),
                'backgroundColor': 'rgba(54, 162, 235, 0.2)',
                'borderColor': 'rgba(54, 162, 235, 1)',
                'borderWidth': 1
            },
            # Other products...
        ]
    }
    
    return render_template('index.html', chart_data=json.dumps(chart_data))

if __name__ == '__main__':
    app.run(debug=True)

Flask is particularly well-suited for data scientists because it allows you to use your Python data processing code alongside a web server. It’s lightweight, which means there’s not a lot of overhead to learn, and it integrates easily with data science libraries like pandas, scikit-learn, and more.

5.1.3.2 Shiny (R)

We covered Shiny earlier in the data visualization section. It’s worth noting again as a complete web framework for R users:

library(shiny)
library(ggplot2)
library(dplyr)

# Load data
sales_data <- read.csv("sales_data.csv")

# Define UI
ui <- fluidPage(
  titlePanel("Sales Data Analysis"),
  
  sidebarLayout(
    sidebarPanel(
      selectInput("product", "Select Product:",
                  choices = c("All", "Product A", "Product B", "Product C"))
    ),
    
    mainPanel(
      plotOutput("salesPlot"),
      h3("Key Findings"),
      verbatimTextOutput("summary")
    )
  )
)

# Define server logic
server <- function(input, output) {
  
  # Filter data based on input
  filtered_data <- reactive({
    if (input$product == "All") {
      return(sales_data)
    } else {
      return(sales_data %>% filter(product == input$product))
    }
  })
  
  # Create plot
  output$salesPlot <- renderPlot({
    ggplot(filtered_data(), aes(x = quarter, y = sales, fill = product)) +
      geom_bar(stat = "identity", position = "dodge") +
      theme_minimal() +
      labs(title = "Quarterly Sales", y = "Sales (millions)")
  })
  
  # Generate summary
  output$summary <- renderText({
    data <- filtered_data()
    paste(
      "Total Sales:", sum(data$sales), "million\n",
      "Average per Quarter:", round(mean(data$sales), 2), "million\n",
      "Growth Rate:", paste0(round((data$sales[4] / data$sales[1] - 1) * 100, 1), "%")
    )
  })
}

# Run the application
shinyApp(ui = ui, server = server)

Shiny is notable for how little web development knowledge it requires. You can create interactive web applications using almost entirely R code, without needing to learn HTML, CSS, or JavaScript.

5.1.4 Deploying Web Applications

Once you’ve built your application, you’ll need to deploy it for others to access:

5.1.4.1 Deployment Options for Flask

  1. Heroku: Platform as a Service with a free tier

    # Install the Heroku CLI
    # Create a requirements.txt file
    pip freeze > requirements.txt
    
    # Create a Procfile
    echo "web: gunicorn app:app" > Procfile
    
    # Deploy
    git init
    git add .
    git commit -m "Initial commit"
    heroku create
    git push heroku main
  2. PythonAnywhere: Python-specific hosting

    • Sign up for an account
    • Upload your files
    • Set up a web app with Flask
  3. AWS, GCP, or Azure: More complex but scalable

5.1.4.2 Deployment Options for Shiny

  1. shinyapps.io: RStudio’s hosting service

    # Install the rsconnect package
    install.packages("rsconnect")
    
    # Configure your account
    rsconnect::setAccountInfo(name="youraccount", token="TOKEN", secret="SECRET")
    
    # Deploy the app
    rsconnect::deployApp(appDir = "path/to/app")
  2. Shiny Server: Self-hosted option (can be installed on cloud VMs)

These deployment options range from simple services designed specifically for data science applications to more general-purpose cloud platforms. The best choice depends on your specific needs, including factors like:

  • Expected traffic volume
  • Security requirements
  • Budget constraints
  • Integration with other systems
  • Need for custom domains or SSL

5.2 Conclusion

Web development skills complement your data science toolkit by enabling you to share your work more effectively. While you don’t need to become a full-stack developer, understanding the basics of HTML, CSS, and JavaScript helps you customize reports, debug rendering issues, and create more polished data products.

The frameworks we’ve covered—Flask for Python and Shiny for R—abstract away much of the complexity, allowing you to focus on your analysis rather than web infrastructure. As you grow more comfortable with these tools, you’ll find that the ability to deploy interactive applications significantly increases the impact of your data science work.

In the next chapter, we’ll explore deployment in more depth, covering various platforms and strategies for making your applications accessible to stakeholders.