7  Deploying Data Science Projects

7.1 Understanding Deployment for Data Science

After developing your data science project, the next crucial step is deployment—making your work accessible to others. Deployment can mean different things depending on your project: publishing an analysis report (using the documentation tools from the Reporting chapter), sharing an interactive dashboard (like the Shiny and Dash applications we explored in previous chapters), or creating an API for a machine learning model.

7.1.1 Why Deployment Matters

Deployment is often overlooked in data science education, but it’s critical for several reasons:

  1. Impact: Even the most insightful analysis has no impact if it remains on your computer
  2. Collaboration: Deployment enables others to interact with your work
  3. Reproducibility: Properly deployed projects document the environment and dependencies
  4. Professional growth: Deployment skills significantly enhance your value as a data scientist

Data scientists who can effectively deploy their work are more likely to see their projects create real business value.

7.1.2 Static vs. Dynamic Deployment

Before selecting a deployment platform, it’s important to understand the fundamental difference between static and dynamic content:

7.1.2.1 Static Content

Static content doesn’t change based on user input and is pre-generated:

  • HTML reports from R Markdown, Jupyter notebooks, or Quarto
  • Documentation sites
  • Fixed visualisations and dashboards

Advantages:

  • Simpler to deploy
  • More secure
  • Lower hosting costs
  • Better performance

7.1.2.2 Dynamic Applications

Dynamic applications respond to user input and may perform calculations:

  • Interactive Shiny or Dash dashboards
  • Machine learning model APIs
  • Data exploration tools

Advantages:

  • Interactive user experience
  • Real-time calculations
  • Ability to handle user-specific data
  • More flexible functionality

7.1.3 Deployment Requirements by Project Type

Different data science projects have specific deployment requirements:

| Project Type | Interactivity | Computation | Data Access | Suitable Platforms |
| --- | --- | --- | --- | --- |
| Analysis reports | None | None | None | GitHub Pages, Netlify, Vercel, Quarto Pub |
| Interactive visualisations | Medium | Low | Static | GitHub Pages (with JavaScript), Netlify |
| Dashboards | High | Medium | Often dynamic | Render, Fly.io, Railway, shinyapps.io, Posit Connect Cloud |
| ML model APIs | Low | High | May need database | Google Cloud Run, AWS App Runner, Azure Container Apps |

Understanding these requirements helps you choose the most appropriate deployment strategy.

7.2 Deployment Platforms for Data Science

Let’s examine the most relevant deployment options for data scientists, focusing on ease of use, cost, and suitability for different project types.

7.2.1 Static Site Deployment Options

7.2.1.1 GitHub Pages

GitHub Pages offers free hosting for static content directly from your GitHub repository:

  • Best for: HTML reports, documentation, simple visualisations
  • Setup complexity: Low
  • Cost: Free
  • Limitations: Only static content, 1GB repository limit

Quick setup:

# Assuming you have a GitHub repository
# 1. Create a gh-pages branch
git checkout -b gh-pages

# 2. Add your static HTML files
git add .
git commit -m "Add website files"

# 3. Push to GitHub
git push origin gh-pages

# Your site will be available at: https://username.github.io/repository

For automated deployment with GitHub Actions, create a file at .github/workflows/publish.yml:

name: Deploy to GitHub Pages

on:
  push:
    branches: [main]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install dependencies
        run: npm ci

      - name: Build
        run: npm run build

      - name: Deploy
        uses: JamesIves/github-pages-deploy-action@v4
        with:
          folder: build

GitHub publishes new major versions of its actions/* building blocks regularly. Check github.com/actions and pin to the current major version when you set up the workflow.

7.2.1.2 Netlify

Netlify provides more advanced features for static sites:

  • Best for: Static sites that require a build process
  • Setup complexity: Low to medium
  • Cost: Free tier with generous limits, paid plans start at $19/month
  • Limitations: Limited build minutes on free tier

Quick setup:

  1. Sign up at netlify.com
  2. Connect your GitHub repository
  3. Configure build settings:
    • Build command (e.g., quarto render or jupyter nbconvert)
    • Publish directory (e.g., _site or output)

Netlify automatically rebuilds your site when you push changes to your repository.
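The same settings can be committed alongside your code in a netlify.toml file, so the build configuration is versioned with the project rather than living only in the dashboard. A minimal sketch, assuming a Quarto project that renders into _site:

```toml
# netlify.toml — committed to the repository root.
[build]
  command = "quarto render"   # how Netlify builds the site
  publish = "_site"           # which directory to serve
```

Settings in netlify.toml take precedence over the dashboard, which makes build changes reviewable in pull requests.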

7.2.1.3 Vercel

Vercel is a cloud platform that specialises in frontend frameworks and static sites, with excellent support for modern web technologies and serverless functions. Originally created by the makers of Next.js, Vercel has become popular for its speed and developer experience.

  • Best for: Static sites with interactive elements, data visualisations with JavaScript, projects using modern web frameworks
  • Setup complexity: Low to medium
  • Cost: Generous free tier, paid plans start at $20/month per team member
  • Limitations: Optimised for frontend applications, limited backend capabilities compared to full cloud platforms

Vercel excels at deploying static content that includes interactive JavaScript components, making it ideal for data science projects that combine static analysis with interactive visualisations. Unlike traditional static hosts, Vercel can also run serverless functions, allowing you to add dynamic capabilities without managing servers.

Quick setup:

The simplest way to deploy to Vercel is through their web interface:

  1. Sign up at vercel.com
  2. Connect your GitHub, GitLab, or Bitbucket repository
  3. Vercel automatically detects your project type and configures build settings
  4. Click “Deploy” - your site will be live in minutes

For command-line deployment, install the Vercel CLI:

# Install Vercel CLI globally
npm install -g vercel

# From your project directory
vercel

# Follow the prompts to link your project
# Your site will be deployed and you'll get a URL

Configuration for data science projects:

Create a vercel.json file in your project root to customise the build process:

{
  "buildCommand": "quarto render",
  "outputDirectory": "_site",
  "installCommand": "npm install",
  "functions": {
    "api/*.py": {
      "runtime": "python3.12"
    }
  }
}

This configuration tells Vercel to use Quarto to build your site (common for data science documentation), specifies where the built files are located, and enables Python serverless functions for any dynamic features you might need.

Example use case: Vercel is particularly well-suited for deploying interactive data visualisations created with modern JavaScript libraries. For instance, if you create visualisations using Observable Plot or D3.js alongside your static analysis, Vercel can host both the static content and any serverless functions needed for data processing.
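To make the serverless-function side concrete, here is a minimal sketch following Vercel’s Python convention of placing a BaseHTTPRequestHandler subclass named handler under the api/ directory. The file name api/hello.py and the response payload are illustrative:

```python
# api/hello.py — Vercel routes requests to /api/hello to this file and
# invokes the class named `handler`. This sketch returns a small JSON body.
import json
from http.server import BaseHTTPRequestHandler


class handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"message": "hello from a serverless function"})
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))
```

This keeps light data-processing endpoints next to your static site without running a server of your own.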

Why choose Vercel over alternatives:

  • Speed: Vercel’s global CDN ensures fast loading times worldwide
  • Automatic optimisation: Images and assets are automatically optimised
  • Preview deployments: Every pull request gets its own preview URL for testing
  • Serverless functions: Add dynamic capabilities without complex backend setup
  • Analytics: Built-in web analytics to understand how users interact with your deployed projects

7.2.1.4 Quarto Pub

If you’re using Quarto for your documents, Quarto Pub offers simple publishing:

  • Best for: Quarto documents and websites
  • Setup complexity: Very low
  • Cost: Free for public content
  • Limitations: Limited to Quarto projects

Quick setup:

# Install Quarto CLI from https://quarto.org/
# From your Quarto project directory:
quarto publish

7.2.2 Dynamic Application Deployment

Note: A note on Heroku

Heroku used to be the default recommendation for deploying small Python and R web apps, and many older tutorials still mention its free tier. Heroku discontinued its free product tiers in November 2022 and now charges for all dynos. It’s still a perfectly good platform, but for readers looking for a free starting point the recommendations below (Render, Fly.io, Railway) are better aligned with that goal.

7.2.2.1 Render

Render is the most direct successor to old-Heroku for data science workloads, with a generous free tier for experimentation:

  • Best for: Python and R web applications, Dockerised dashboards
  • Setup complexity: Medium
  • Cost: Free tier for experimentation (services sleep after inactivity); paid plans from $7/month
  • Limitations: Free-tier services sleep when idle and have limited compute hours

Setup for a Python web application:

  1. Create a requirements.txt file:
flask==3.0.3
pandas==2.2.3
matplotlib==3.9.2
gunicorn==23.0.0
  2. Push your project to a GitHub repository
  3. Sign up at render.com
  4. Connect your GitHub repository
  5. Create a new Web Service and configure it:
    • Environment: Python
    • Build Command: pip install -r requirements.txt
    • Start Command: gunicorn app:app
  6. Add any required environment variables (API keys, database URLs) in the Environment tab. Never commit secrets to your repo.

Render builds and deploys your app on every push to the configured branch, so there’s no separate git push render main step.

7.2.2.2 Fly.io and Railway

Two other platforms worth knowing for small to mid-sized Python/R deployments:

  • Fly.io: Deploys Docker containers globally on a pay-as-you-go basis. Good fit once you’re comfortable with Docker (covered in the next chapter); the flyctl command-line tool handles the build, push, and deploy steps with one command.
  • Railway: Similar developer experience to Render, with generous trial credits. Good Postgres and cron support out of the box.

All three platforms use essentially the same mental model: connect a Git repo, describe how to build the app (either requirements.txt + a start command or a Dockerfile), and let the platform handle HTTPS, logging, and redeploys on push.
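On Fly.io, for example, that description lives in a fly.toml file, which flyctl launch generates for you. A minimal hand-written sketch for a Dockerised app listening on port 8080 (the app name and region are illustrative):

```toml
# fly.toml — generated by `flyctl launch`, committed with the project.
app = "my-dashboard"
primary_region = "lhr"

[http_service]
  internal_port = 8080   # the port your container listens on
  force_https = true     # Fly terminates TLS and redirects HTTP
```

After that, flyctl deploy builds the Docker image and rolls it out; flyctl logs streams the running app’s output.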

7.2.2.3 shinyapps.io

For R Shiny applications, shinyapps.io offers the simplest deployment option:

  • Best for: R Shiny applications
  • Setup complexity: Low
  • Cost: Free tier (5 apps, 25 hours/month), paid plans start at $9/month
  • Limitations: Limited monthly active hours on free tier

Deployment from RStudio:

The easiest path is to use the Publish button in RStudio (the blue arrow next to the Run button in the editor for app.R). RStudio walks you through linking your shinyapps.io account and deploys the app with one click, with no token handling in code.

If you prefer to do it from the console, install rsconnect and keep your credentials out of the script. Read them from environment variables that you set in your OS or in a .Renviron file that’s listed in .gitignore:

install.packages("rsconnect")

# Configure your account (one-time setup)
# Set SHINYAPPS_TOKEN and SHINYAPPS_SECRET in your environment first,
# e.g. in ~/.Renviron (which should NOT be committed to git).
rsconnect::setAccountInfo(
  name   = "your-account-name",
  token  = Sys.getenv("SHINYAPPS_TOKEN"),
  secret = Sys.getenv("SHINYAPPS_SECRET")
)

# Deploy your app
rsconnect::deployApp(
  appDir  = "path/to/your/app",
  appName = "my-shiny-app",
  account = "your-account-name"
)

Note: shinyapps.io alternatives

Since 2024, Posit Connect Cloud offers a modern free tier for deploying Shiny, Quarto, Streamlit, and Dash content directly from a Git repository, and is worth a look if shinyapps.io feels dated. For fully client-side Shiny apps (no server required), Shinylive runs R or Python Shiny apps entirely in the browser via WebAssembly and can be hosted on any static host like GitHub Pages.

7.2.3 Cloud Platform Deployment

For more complex or production-level deployments, cloud platforms offer greater flexibility and scalability:

7.2.3.1 Google Cloud Run

Cloud Run is ideal for containerised applications:

  • Best for: Containerised applications that need to scale
  • Setup complexity: Medium to high
  • Cost: Pay-per-use with generous free tier
  • Limitations: Requires Docker knowledge

Deployment steps:

# Build your Docker image
docker build -t gcr.io/your-project/app-name .

# Push to Google Container Registry
docker push gcr.io/your-project/app-name

# Deploy to Cloud Run
gcloud run deploy app-name \
  --image gcr.io/your-project/app-name \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated

7.2.3.2 AWS Elastic Beanstalk

Elastic Beanstalk handles the infrastructure for your applications:

  • Best for: Production-level web applications
  • Setup complexity: Medium to high
  • Cost: Pay for underlying resources
  • Limitations: More complex setup

Deployment with the AWS CLI:

# Initialize Elastic Beanstalk in your project
eb init -p python-3.12 my-app --region us-west-2

# Create an environment
eb create my-app-env

# Deploy your application
eb deploy

7.3 Step-by-Step Deployment Guides

Let’s walk through complete deployment workflows for common data science scenarios.

Note: A note on the example data and model files

The examples below reference placeholder files like sales_data.csv, my_data.csv, and model.pkl. These aren’t shipped with the book — they stand in for your own data or trained model. When you follow along, either point the code at one of your own files, substitute a built-in dataset (e.g. ggplot2::diamonds for R, sns.load_dataset("tips") for Python), or pull a stable public CSV from a URL as shown in the Reporting and Visualisation chapters. The deployment mechanics don’t depend on which file you use.

7.3.1 Deploying a Data Science Report to GitHub Pages

This example shows how to publish an analysis report created with Quarto:

  1. Create your Quarto document:
---
title: "Sales Analysis Report"
author: "Your Name"
format: html
---

## Executive Summary

Our analysis shows a 15% increase in Q4 sales compared to the previous year.

```{r}
#| echo: false
#| warning: false
library(ggplot2)
library(dplyr)
library(here)

# Load data
sales <- read.csv(here("data", "my_data.csv"))

# Create visualisation
ggplot(sales, aes(x = Product, y = Sales, fill = Product)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  labs(title = "Product Comparison")
```
  2. Set up a GitHub repository for your project

  3. Create a GitHub Actions workflow file at .github/workflows/publish.yml:

name: Publish Quarto Site

on:
  push:
    branches: [main]

jobs:
  build-deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Set up Quarto
        uses: quarto-dev/quarto-actions/setup@v2

      - name: Install R
        uses: r-lib/actions/setup-r@v2
        with:
          r-version: '4.4.0'

      - name: Install R Dependencies
        uses: r-lib/actions/setup-r-dependencies@v2
        with:
          packages:
            any::knitr
            any::rmarkdown
            any::ggplot2
            any::dplyr

      - name: Render and Publish
        uses: quarto-dev/quarto-actions/publish@v2
        with:
          target: gh-pages
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  4. Push your changes to GitHub:
git add .
git commit -m "Add analysis report and GitHub Actions workflow"
git push origin main
  5. Enable GitHub Pages in your repository settings, selecting the gh-pages branch as the source

Your report will be automatically published each time you push changes to your repository, making it easy to share with stakeholders.

7.3.2 Deploying a Dash Dashboard to Render

This example demonstrates deploying an interactive Python dashboard:

  1. Create your Dash application (app.py):
import dash
from dash import dcc, html
from dash.dependencies import Input, Output
import pandas as pd
import plotly.express as px

# Load data
df = pd.read_csv('sales_data.csv')

# Initialize app
app = dash.Dash(__name__, title="Sales Dashboard")
server = app.server  # For Render deployment

# Create layout
app.layout = html.Div([
    html.H1("Sales Performance Dashboard"),
    
    html.Div([
        html.Label("Select Year:"),
        dcc.Dropdown(
            id='year-filter',
            options=[{'label': str(year), 'value': year} 
                     for year in sorted(df['year'].unique())],
            value=df['year'].max(),
            clearable=False
        )
    ], style={'width': '30%', 'margin': '20px'}),
    
    dcc.Graph(id='sales-graph')
])

# Create callback
@app.callback(
    Output('sales-graph', 'figure'),
    Input('year-filter', 'value')
)
def update_graph(selected_year):
    filtered_df = df[df['year'] == selected_year]
    
    fig = px.bar(
        filtered_df, 
        x='quarter', 
        y='sales',
        color='product',
        barmode='group',
        title=f'Quarterly Sales by Product ({selected_year})'
    )
    
    return fig

if __name__ == '__main__':
    # debug=True must NEVER be used in production. It exposes
    # an interactive Python console to anyone who can reach the app.
    # Render runs `gunicorn` directly in production, so this block
    # is only used when you run `python app.py` locally.
    app.run(debug=False)
  2. Create a requirements.txt file:
dash==2.18.2
pandas==2.2.3
plotly==5.24.1
gunicorn==23.0.0
  3. Create a minimal Dockerfile:
FROM python:3.12-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD gunicorn app:server -b 0.0.0.0:$PORT
  4. Sign up for Render and connect your GitHub repository

  5. Create a new Web Service on Render with these settings:

    • Name: your-dashboard-name
    • Environment: Docker
    • Build Command: (leave empty when using Dockerfile)
    • Start Command: (leave empty when using Dockerfile)
  6. Deploy your application

Your interactive dashboard will be available at the URL provided by Render.

7.3.3 Deploying a Shiny Application to shinyapps.io

This example shows how to deploy an R Shiny dashboard:

  1. Create a Shiny app directory with app.R:
library(shiny)
library(ggplot2)
library(dplyr)
library(here)

# Load data
sales <- read.csv(here("data", "my_data.csv"))

# UI
ui <- fluidPage(
  titlePanel("Sales Analysis Dashboard"),
  
  sidebarLayout(
    sidebarPanel(
      selectInput("Date", "Select Date:",
                  choices = unique(sales$Date),
                  selected = max(sales$Date)),
      
      checkboxGroupInput("Products", "Select Products:",
                         choices = unique(sales$Product),
                         selected = unique(sales$Product)[1])
    ),
    
    mainPanel(
      plotOutput("salesPlot"),
      dataTableOutput("salesTable")
    )
  )
)

# Server
server <- function(input, output) {
  
  filtered_data <- reactive({
    sales %>%
      filter(Date == input$Date,
             Product %in% input$Products)
  })
  
  output$salesPlot <- renderPlot({
    ggplot(filtered_data(), aes(x = Date, y = Sales, fill = Product)) +
      geom_bar(stat = "identity", position = "dodge") +
      theme_minimal() +
      labs(title = paste("Sales for", input$Date))
  })
  
  output$salesTable <- renderDataTable({
    filtered_data() %>%
      group_by(Product) %>%
      summarize(Total = sum(Sales),
                Average = mean(Sales))
  })
}

# Run the application
shinyApp(ui = ui, server = server)
  2. Install and configure the rsconnect package:
install.packages("rsconnect")

# Set up your account (one-time setup).
# Keep credentials out of the script: read the token and secret from
# environment variables set in ~/.Renviron (not committed to git),
# as recommended earlier in the chapter.
rsconnect::setAccountInfo(
  name   = "your-account-name",  # Your shinyapps.io username
  token  = Sys.getenv("SHINYAPPS_TOKEN"),
  secret = Sys.getenv("SHINYAPPS_SECRET")
)
  3. Deploy your application:
rsconnect::deployApp(
  appDir = "path/to/your/app",  # Directory containing app.R
  appName = "sales-dashboard",  # Name for your deployed app
  account = "your-account-name" # Your shinyapps.io username
)
  4. Share the provided URL with your stakeholders

The deployed Shiny app will be available at https://your-account-name.shinyapps.io/sales-dashboard/.

7.3.4 Deploying a Machine Learning Model API

This example demonstrates deploying a machine learning model as an API:

  1. Create a Flask API for your model (app.py):
import os
import pickle

from flask import Flask, request, jsonify
import pandas as pd

# Initialize Flask app
app = Flask(__name__)

# Load the pre-trained model
with open('model.pkl', 'rb') as file:
    model = pickle.load(file)

@app.route('/predict', methods=['POST'])
def predict():
    try:
        # Get JSON data from request
        data = request.get_json(force=True)
        if not isinstance(data, dict):
            return jsonify({
                'status': 'error',
                'message': 'Request body must be a JSON object of feature: value pairs',
            }), 400

        # Convert to DataFrame
        input_data = pd.DataFrame(data, index=[0])

        # Make prediction
        prediction = model.predict(input_data)[0]

        # Return prediction as JSON
        return jsonify({
            'status': 'success',
            'prediction': float(prediction),
            'input_data': data,
        })

    except Exception as e:
        return jsonify({
            'status': 'error',
            'message': str(e),
        }), 400

@app.route('/health', methods=['GET'])
def health():
    return jsonify({'status': 'healthy'})

if __name__ == '__main__':
    # In production, gunicorn runs the app directly. This block is
    # only used for local development.
    app.run(host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))

For production use you would want stricter input validation (for example with Pydantic or Marshmallow) and authentication. The try/except above only catches shape errors, not adversarial input.
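As a sketch of what stricter validation looks like, the check below rejects payloads with missing, unexpected, or non-numeric features using only the standard library. The feature names are illustrative; in a real service a library like Pydantic would replace this hand-rolled version:

```python
# Hand-rolled request validation for a /predict-style endpoint.
# EXPECTED_FEATURES is illustrative — list your model's real inputs.
EXPECTED_FEATURES = {"feature1", "feature2", "feature3"}


def validate_payload(data):
    """Return a list of problems; an empty list means the payload is valid."""
    if not isinstance(data, dict):
        return ["Request body must be a JSON object"]
    errors = []
    missing = EXPECTED_FEATURES - data.keys()
    extra = data.keys() - EXPECTED_FEATURES
    if missing:
        errors.append(f"Missing features: {sorted(missing)}")
    if extra:
        errors.append(f"Unexpected features: {sorted(extra)}")
    for name in EXPECTED_FEATURES & data.keys():
        # bool is a subclass of int, so exclude it explicitly
        if not isinstance(data[name], (int, float)) or isinstance(data[name], bool):
            errors.append(f"Feature {name!r} must be a number")
    return errors
```

In the Flask route you would call validate_payload(data) first and return a 400 with the error list before the request ever reaches the model.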

  2. Create a requirements.txt file:
flask==3.0.3
pandas==2.2.3
scikit-learn==1.5.2
gunicorn==23.0.0
  3. Create a Dockerfile:
FROM python:3.12-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD gunicorn --bind 0.0.0.0:$PORT app:app
  4. Deploy to Google Cloud Run:
# Build the container
gcloud builds submit --tag gcr.io/your-project/model-api

# Deploy to Cloud Run
gcloud run deploy model-api \
  --image gcr.io/your-project/model-api \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated

Warning: --allow-unauthenticated exposes the endpoint to the public internet

For a quick demo this is fine, but don’t leave a model API wide open to the internet without rate limiting and authentication. For production use, drop --allow-unauthenticated and require callers to present an identity token (gcloud auth print-identity-token), or put an API gateway in front of the service. At minimum, monitor the Cloud Run request count and set budget alerts so you notice unexpected traffic early.

  5. Test your API:
curl -X POST \
  https://model-api-xxxx-xx.a.run.app/predict \
  -H "Content-Type: application/json" \
  -d '{"feature1": 0.5, "feature2": 0.8, "feature3": 1.2}'

This API allows other applications to easily access your machine learning model’s predictions.

7.4 Deployment Best Practices

Regardless of the platform you choose, these best practices will help ensure successful deployments:

7.4.1 Environment Management

  1. Use environment files: Include requirements.txt for Python or renv.lock for R
  2. Specify exact versions: Use pandas==2.2.3 rather than pandas>=2.2
  3. Minimise dependencies: Include only what you need to reduce deployment size
  4. Test in a clean environment: Verify your environment files are complete

7.4.2 Security Considerations

  1. Never commit secrets: Keep API keys, database passwords, and access tokens in environment variables, platform-managed secret stores (Render Environment, Fly.io secrets, GCP Secret Manager, AWS Secrets Manager), or a local .env file that is listed in .gitignore. If a secret ever reaches a repository, whether public or private, treat it as compromised and rotate it immediately rather than trying to scrub git history.
  2. Set up proper authentication: Restrict access to sensitive applications. Don’t leave --allow-unauthenticated endpoints, default passwords, or debug=True enabled in anything publicly reachable.
  3. Implement input validation: Protect against malicious inputs, especially for any endpoint that touches a database or a machine learning model.
  4. Use HTTPS: Every modern platform-as-a-service (Render, Fly, Railway, Vercel, Cloud Run) issues free TLS certificates automatically; use them. Custom domains should always be served over HTTPS.
  5. Regularly update dependencies: Address security vulnerabilities. Tools like GitHub Dependabot or pip-audit can flag known CVEs in your requirements.txt.
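GitHub Dependabot is configured with a small file at .github/dependabot.yml. A minimal sketch for a Python project that checks requirements.txt in the repository root once a week:

```yaml
# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "pip"   # scans requirements.txt / pyproject.toml
    directory: "/"             # location of the manifest
    schedule:
      interval: "weekly"
```

Dependabot then opens pull requests when a pinned dependency has a newer release or a known vulnerability.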

7.4.3 Observability: Logging, Health Checks, and Monitoring

Once your application is deployed, you need to know when it breaks:

  1. Emit structured logs to stdout/stderr: Modern platforms capture anything your app prints and surface it in their dashboards. Use Python’s logging module (or lgr / futile.logger in R) rather than print, and log in JSON where possible so the logs are searchable.

  2. Expose a /health endpoint: A simple route that returns HTTP 200 when the app is alive lets the platform (and uptime monitors like UptimeRobot or BetterStack) restart unhealthy instances automatically. The ML API example in this chapter already does this.

  3. Add a Docker HEALTHCHECK in your Dockerfile so the container itself reports its status:

    HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
      CMD curl -fsS http://localhost:${PORT:-8080}/health || exit 1
  4. Track errors separately: Services like Sentry or Better Stack give you grouped stack traces and notifications when something goes wrong. Both have generous free tiers for small projects.

  5. Set budget alerts: On pay-per-use platforms (Cloud Run, AWS, GCP) configure a monthly budget with email alerts before you deploy anything that autoscales.
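Point 1 can be done with nothing beyond Python’s standard library: a custom formatter that renders each log record as one JSON object per line. The field names here are a common choice, not a standard:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "time": self.formatTime(record),
        })


handler = logging.StreamHandler()  # writes to stderr, which platforms capture
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("dashboard started")
```

Because every line is valid JSON, platform log viewers can filter on level or message instead of grepping free text.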

7.4.4 Performance Optimisation

  1. Optimise data loading: Load data efficiently or use databases for large datasets
  2. Implement caching: Cache results of expensive computations
  3. Monitor resource usage: Keep track of memory and CPU utilisation
  4. Implement pagination: For large datasets, display data in manageable chunks
  5. Consider asynchronous processing: Use background tasks for long-running computations
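Point 2 can be as simple as memoising a pure function. A sketch with functools.lru_cache, where the slow function stands in for a real aggregation or model call:

```python
import time
from functools import lru_cache


@lru_cache(maxsize=128)
def quarterly_totals(year):
    """Stand-in for an expensive query or computation.

    Repeat calls with the same year return the cached result instantly.
    """
    time.sleep(0.1)  # simulate slow work
    return {"year": year, "total": year * 1000}  # illustrative result


quarterly_totals(2024)  # slow: computed
quarterly_totals(2024)  # fast: served from the cache
```

lru_cache only helps for hashable arguments within a single process; in a multi-worker deployment you would reach for Redis or your framework’s caching layer, and you should always cap maxsize so memory cannot grow without bound.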

7.4.5 Documentation

  1. Create a README: Document deployment steps and dependencies
  2. Add usage examples: Show how to interact with your deployed application
  3. Include contact information: Let users know who to contact for support
  4. Provide version information: Display the current version of your application
  5. Document API endpoints: If applicable, describe available API endpoints

7.5 Troubleshooting Common Deployment Issues

7.5.1 Platform-Specific Issues

7.5.1.1 GitHub Pages

| Issue | Solution |
| --- | --- |
| Changes not showing up | Check that you’re pushing to the correct branch |
| Build failures | Review the GitHub Actions logs for errors |
| Custom domain not working | Verify DNS settings and the CNAME file |

7.5.1.2 Render / Fly.io

| Issue | Solution |
| --- | --- |
| Application crash | Check the service’s Logs tab in the dashboard (flyctl logs on Fly.io) |
| Build failures | Ensure dependencies are pinned in requirements.txt and that the build command is correct |
| Free service sleeping when idle | Use periodic health-check pings, or upgrade to a paid tier that stays warm |

7.5.1.3 shinyapps.io

| Issue | Solution |
| --- | --- |
| Package installation failures | Use renv (the successor to packrat) to manage dependencies |
| Application timeout | Optimise data loading and computation |
| Deployment failures | Check the rsconnect logs in RStudio |

7.5.2 General Deployment Issues

  1. Missing dependencies:
    • Review error logs to identify missing packages
    • Ensure all dependencies are listed in your environment files
    • Test your application in a clean environment
  2. Environment variable problems:
    • Verify environment variables are set correctly
    • Check for typos in variable names
    • Use platform-specific ways to set environment variables
  3. File path issues:
    • Use relative paths instead of absolute paths
    • Be mindful of case sensitivity on Linux servers
    • Use appropriate path separators for the deployment platform
  4. Permission problems:
    • Ensure application has necessary permissions to read/write files
    • Check file and directory permissions
    • Use platform-specific storage solutions for persistent data
  5. Memory limitations:
    • Optimise data loading to reduce memory usage
    • Use streaming approaches for large datasets
    • Upgrade to a plan with more resources if necessary
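For the file-path issues above, Python’s pathlib sidesteps most separator and working-directory problems by resolving paths relative to the script itself. A sketch (the data/ subdirectory and file name are illustrative):

```python
from pathlib import Path

# Resolve paths relative to this file, not the process's working
# directory, so the app finds its data wherever the platform starts it.
APP_DIR = Path(__file__).resolve().parent
DATA_FILE = APP_DIR / "data" / "sales.csv"

# The `/` operator joins path segments; pathlib uses the right
# separator on every platform.
print(DATA_FILE.name)
```

R’s here package, used in the Shiny examples earlier in this chapter, plays the same role on the R side.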

7.6 Conclusion

Effective deployment is crucial for sharing your data science work with stakeholders and making it accessible to users. By understanding the different deployment options and following best practices, you can ensure your projects have the impact they deserve.

Remember that deployment is not a one-time task but an ongoing process. As your projects evolve, you’ll need to update your deployed applications, monitor their performance, and address any issues that arise.

In the next chapter, we’ll explore how to optimise your entire data science workflow, from development to deployment, to maximise your productivity and impact.