Computational Biology

Author

Isabel Duarte

Published

March 1, 2026


Computational Biology course | Intro to R data analysis (Module 1)
Masters in Health Sciences - Disease Mechanisms
Faculdade de Medicina e Ciências Biomédicas - Universidade do Algarve, Faro, Portugal
Isabel Duarte | giduarte at ualg dot pt | Website: http://iduarte.eu/


General Information

Learning outcomes | Knowledge, Skills, and Competences

Students will acquire basic knowledge of how to apply computational analysis techniques to biological data using R.

The following topics will be addressed in the ‘Intro to R data analysis’ (Module 1):

  • Introduction to R programming;
  • Descriptive statistics using R;
  • Brief exploratory data analysis of a biological dataset.


Evaluation

The grade for Module 1: Intro to R data analysis is the sum of the following two evaluation criteria:

  • A. Performance in class measured by completing the tutorials and assignments, participation in class, and collaboration with classmates: 5 points.

  • B. One final written exam with one section of multiple-choice questions, plus one section in which students write R code for simple programming tasks: 15 points.

Students who wish to improve their final grade will be assessed with:

  • One individual assignment, to be completed at home, that must be presented and discussed in a 30-minute individual oral exam: 20 points.


Class documents

Additional files required for exercises


Prerequisites

To attend these classes, students should be familiar with the following basic statistical concepts:

  • Basic concepts in statistics: Univariate and Bivariate analysis
  • Categorical data (Nominal or Ordinal) vs Numeric data (Discrete or Continuous)
  • Descriptive/Exploratory studies: Mean, Median, Min, Max, Standard deviation, Variance, Mode, Interquartile range
  • Linear regression and Correlation coefficient (Pearson and Spearman)
  • Inferential studies
  • Parametric vs Non-Parametric tests
  • Z-score (Standard score)
  • Hypothesis testing (Null hypothesis and Alternative hypothesis)
  • Unilateral vs Bilateral tests
  • P-value


Syllabus

  • 1. Brief recap of basic statistics concepts
  • 2. Introduction to R
    • Introduction to R programming
    • Descriptive statistics in R
    • Hypothesis testing in R
    • Statistical significance in R
  • 3. Mini-project: Exploratory data analysis using R
    • Tidy data concept: How to organize data into tidy tables
    • Visualization of descriptive statistics, and variable distributions
    • Principal Component Analysis (PCA) for dimensionality reduction
    • Fitting simple linear models
    • Finding and visualizing correlations
    • Strategies to derive knowledge from data
    • Others (according to students' requests)
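
As an informal illustration of the kind of analysis listed under the mini-project, here is a minimal sketch in base R. It assumes the built-in iris dataset as a stand-in for the course's biological dataset, which is not specified here; the object names (two_species, tt, r, fit, pca) are hypothetical and chosen only for this example.

```r
# Assumption: the built-in iris dataset stands in for the course dataset.
data(iris)

# Hypothesis testing: two-sided Welch t-test comparing sepal length
# between two species (the grouping factor must have exactly 2 levels).
two_species <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))
tt <- t.test(Sepal.Length ~ Species, data = two_species)
tt$p.value

# Correlation between two numeric variables (Pearson coefficient).
r <- cor(iris$Petal.Length, iris$Petal.Width, method = "pearson")
r

# Simple linear model: petal width as a function of petal length.
fit <- lm(Petal.Width ~ Petal.Length, data = iris)
coef(fit)

# PCA on the four numeric columns, centred and scaled to unit variance.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
summary(pca)   # proportion of variance explained by each component
head(pca$x)    # coordinates of the first observations on the components
```

Each step returns an ordinary R object, so the results can be inspected with str() or passed on to plotting functions.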


Pedagogical goals

At the end of Module 1, the students will be able to:

  • 1. Biostatistics:
    1. Identify the type of a variable (Numeric - Continuous or Discrete, Categorical - Ordinal or Nominal);
    2. Formulate hypotheses for hypothesis testing (t-test);
    3. Decide between bilateral and unilateral testing;
    4. Calculate and interpret the p-value of a test.
  • 2. Introduction to R:
    1. Create an RStudio project;
    2. Install packages from major repositories, namely CRAN and Bioconductor;
    3. Identify 4 types of data structures available in R: Vectors, Matrices, Data frames, and Lists;
    4. Recognize the 4 main vector data types: Logical (TRUE or FALSE), Numeric (e.g. 1,2,3…), Character (e.g. “Universidade”, “do”, “Algarve”), and Complex (e.g. 3+2i);
    5. Obtain help regarding R functions (using ? or help);
    6. Create vectors;
    7. Assign results to named variables using the assignment operators <- and = ;
    8. Convert between data types;
    9. Understand vectorized arithmetic, i.e. operations between vectors are applied element-wise;
    10. Understand vector recycling, i.e. if an operation is performed on vectors of different lengths, the elements of the shorter vector are reused from the beginning;
    11. Construct code iterations using for loops;
    12. Construct conditional statements (if statements);
    13. Load a dataset in R;
    14. Inspect the data loaded;
    15. Obtain information about the dimensions of the dataset, such as the number of rows and number of columns;
    16. Subset a dataset based on row/column number (with []), or based on column name (with $);
    17. Obtain summary statistics on the dataset (mean, maximum, minimum, quartiles, standard deviation, and variance);
    18. Graphically explore the data with boxplots and histograms;
    19. Export results to a file (data analyses and figures);
    20. Save the workspace with all analysis results in an .RData file;
    21. Understand hypothesis testing: interpreting the p-value.
  • 3. Mini-project: Exploratory data analysis using R
    1. Inspect and assess the structure and quality of a dataset;
    2. Perform structured exploratory data analysis using appropriate visualisations;
    3. Interpret distributions, group differences, and correlations in biological terms;
    4. Recognise multivariate patterns and potential confounding effects;
    5. Translate exploratory findings into hypotheses for downstream modelling.
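
Several of the R goals above can be previewed in a few lines of base R. The sketch below is only an illustration, not course material: the variable names are hypothetical, and the built-in iris dataset is assumed as an example data frame.

```r
# Vectorized arithmetic: the operation is applied element-wise.
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30, 40)
sums <- a + b            # 11 22 33 44

# Vector recycling: the shorter vector is reused from the beginning.
short <- c(1, 2)
recycled <- a * short    # 1*1, 2*2, 3*1, 4*2 -> 1 4 3 8

# The four main vector data types, and conversion between types.
class(TRUE)              # "logical"
class(3.14)              # "numeric"
class("Algarve")         # "character"
class(3+2i)              # "complex"
converted <- as.numeric("42") + 1   # 43

# A for loop combined with an if statement: keep the even values of a.
evens <- c()
for (value in a) {
  if (value %% 2 == 0) {
    evens <- c(evens, value)
  }
}
evens                    # 2 4

# Inspecting and subsetting a data frame (assumption: built-in iris data).
dim(iris)                # 150 rows, 5 columns
iris[1:3, 1:2]           # rows 1 to 3, columns 1 and 2, using []
head(iris$Species)       # one column selected by name, using $
summary(iris$Sepal.Length)   # minimum, quartiles, mean, maximum
```

Help on any function used here is available with ?, for example ?summary.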

Bibliography

