Week 3 | Identify variables & Tidy data
📅 Date: 20 March, 2025
⏰ Time: 09:30h - 11:30h
📖 Synopsis: Extract experimental details from text, format tidy tables, and reverse engineer study design from tidy tables.
Experimental Designs
1. Controlled Experiment
- Description: Researcher manipulates one or more variables while keeping others constant.
- Example: Adjusting pH levels in aquaria and measuring coral growth.
- Strengths: Clear cause-effect relationships, high internal validity.
- Limitations: May lack realism in natural environments.
2. Observational Study
- Description: No manipulation; variables are observed as they naturally occur.
- Example: Comparing fish diversity in protected vs. unprotected areas.
- Types:
- Cross-sectional: Single point in time.
- Longitudinal (Time-series): Data collected over time.
- Strengths: Realistic, practical.
- Limitations: Cannot infer causation definitively.
3. Correlational Study
- Description: Measures the relationship between variables without manipulation.
- Example: Monitoring algal biomass and oxygen levels to detect patterns.
- Strengths: Good for detecting associations.
- Limitations: No causal conclusions.
4. Randomized Controlled Trial (RCT)
- Description: Random assignment of subjects/units to treatments and control.
- Example: Randomly assigning noise levels to dolphin enclosures.
- Strengths: Minimizes bias, strong causal inference.
- Limitations: Logistically complex in the field.
5. Factorial Design
- Description: Tests multiple independent variables simultaneously.
- Example: Testing combined effects of temperature and salinity on zooplankton.
- Strengths: Identifies interaction effects between variables.
- Limitations: Requires more samples.
6. Before-After Control-Impact (BACI) Design
- Description: Compares conditions before and after an impact at both affected and control sites.
- Example: Measuring water quality before and after coastal development.
- Strengths: Controls for natural variability.
- Limitations: Requires baseline data.
7. Natural Experiment
- Description: Utilizes naturally occurring events as treatments.
- Example: Studying fish populations after a hurricane.
- Strengths: High ecological validity.
- Limitations: Limited control over variables.
Goal | This exercise aims at training students to critically extract key experimental details from textual descriptions.
Instructions | Read each narrative description carefully. For each one, identify:
- Experimental Design (check the Experimental Designs tab)
- Independent Variable(s)
- Dependent Variable(s)
- Control Variable(s) (if applicable)
Example 1: Ocean Acidification on Coral Growth
Researchers want to test how increased ocean acidification affects the growth of a common coral species. They set up three aquaria with seawater at different pH levels: 8.1 (current ocean pH), 7.8, and 7.5. Identical coral fragments are placed in each aquarium, and all tanks are kept under the same light and temperature conditions. After six months, they measure the calcification rate of the corals.
- Experimental Design: Controlled laboratory experiment with three treatment groups.
- Independent Variable: pH level of seawater (8.1, 7.8, 7.5)
- Dependent Variable: Coral calcification rate (growth)
- Control Variables: Light intensity, water temperature, coral species and fragment size, nutrient levels.
Example 2: Marine Protected Areas and Fish Biodiversity
A team surveys fish species diversity in 10 marine protected areas (MPAs) and compares it with 10 nearby unprotected coastal sites. Surveys are conducted using the same underwater visual census method at all locations during the same season.
- Experimental Design: Observational comparative study (between protected and unprotected areas)
- Independent Variable: Protection status (MPA vs. unprotected site)
- Dependent Variable: Fish species diversity (number of species observed)
- Control Variables: Survey method, season, location depth, time of day surveyed.
Example 3: Plastic Pollution Impact on Zooplankton
Scientists investigate whether microplastic concentration affects zooplankton reproduction. They prepare four tanks with different microplastic concentrations: 0 particles/mL, 10 particles/mL, 100 particles/mL, and 1,000 particles/mL. Each tank contains the same number of zooplankton individuals and is kept at constant temperature, salinity, and food availability. After two weeks, they count the number of offspring produced.
- Experimental Design: Controlled laboratory experiment with graded treatments.
- Independent Variable: Microplastic concentration (0, 10, 100, 1,000 particles/mL)
- Dependent Variable: Number of zooplankton offspring
- Control Variables: Temperature, salinity, food availability, initial zooplankton number/species.
Example 4: Algal Bloom Effects on Oxygen Levels
To examine the effect of algal blooms on oxygen levels, researchers monitor dissolved oxygen in a coastal bay over six months. They record algal biomass, water temperature, and nutrient concentrations weekly. They look for correlations between algal biomass and oxygen concentration.
- Experimental Design: Observational time-series study
- Independent Variable: Algal biomass (measured continuously)
- Dependent Variable: Dissolved oxygen concentration
- Control Variables: Recording temperature and nutrient levels allows adjustment for confounding variables.
Example 5: Noise Pollution Impact on Dolphin Communication
Marine biologists play varying levels of ship noise recordings in a controlled enclosure housing a pod of dolphins. For each noise level (low, medium, high), they observe and record the frequency and duration of dolphin whistles. Sessions are randomized, and the dolphins have rest periods between exposures.
- Experimental Design: Randomized controlled experiment
- Independent Variable: Ship noise level (low, medium, high)
- Dependent Variables: Frequency and duration of dolphin whistles
- Control Variables: Same dolphin pod, rest periods, enclosure size, time of day.
Goal | To bridge experimental design concepts with practical data skills.
Instructions | After identifying the independent, dependent, and control variables from each “narrative”, describe how the data would be organized in a tidy data table. Specifically:
- Columns: What variables are measured or recorded?
- Rows: What does each row represent (unit of observation)?
- Variable types: Are variables numeric or categorical?
- Control variables: How are they documented?
Example 1: Ocean Acidification on Coral Growth
Tidy Table Example:
tank_id | pH | calcification_rate | light_intensity | temp_C | coral_species | nutrient_conc |
---|---|---|---|---|---|---|
Tank_1 | 8.1 | 2.3 | 200 | 25 | Acropora sp. | 3 |
Tank_2 | 7.8 | 1.9 | 200 | 25 | Acropora sp. | 3 |
Tank_3 | 7.5 | 1.4 | 200 | 25 | Acropora sp. | 3 |
Each row = one coral fragment after six months.
Example 2: Marine Protected Areas and Fish Biodiversity
Tidy Table Example:
site_id | protection_status | fish_species_count | survey_method | season | depth_m | time_of_day |
---|---|---|---|---|---|---|
MPA_1 | MPA | 25 | Visual Census | Spring | 10 | Morning |
MPA_2 | MPA | 30 | Visual Census | Spring | 12 | Morning |
Site_1 | Unprotected | 18 | Visual Census | Spring | 11 | Morning |
Site_2 | Unprotected | 20 | Visual Census | Spring | 10 | Morning |
Each row = one survey at one site.
Example 3: Plastic Pollution Impact on Zooplankton
Tidy Table Example:
tank_id | microplastic_conc | offspring_count | temp_C | salinity | food_available | zooplankton_sp | initial_count |
---|---|---|---|---|---|---|---|
Tank_A | 0 | 45 | 22 | 35 | High | Copepod sp. | 50 |
Tank_B | 10 | 40 | 22 | 35 | High | Copepod sp. | 50 |
Tank_C | 100 | 28 | 22 | 35 | High | Copepod sp. | 50 |
Tank_D | 1000 | 12 | 22 | 35 | High | Copepod sp. | 50 |
Each row = one tank’s measurement.
Example 4: Algal Bloom Effects on Oxygen Levels
Tidy Table Example:
date | algal_biomass | dissolved_oxygen | water_temp_C | nutrient_conc |
---|---|---|---|---|
2024-01-01 | 12.5 | 6.8 | 18.2 | 3.1 |
2024-01-08 | 15.3 | 5.4 | 18.4 | 3.2 |
2024-01-15 | 20.1 | 4.7 | 18.6 | 3.0 |
Control variables (temperature and nutrients) are recorded alongside the key variables to control for confounding effects.
Example 5: Noise Pollution Impact on Dolphin Communication
Tidy Table Example:
session_id | noise_level | whistle_freq | whistle_duration | rest_period_min | pod_id | enclosure_size_m | time_of_day |
---|---|---|---|---|---|---|---|
Sess_1 | Low | 7500 | 2.1 | 30 | Pod_1 | 50 | Morning |
Sess_2 | Medium | 7100 | 1.8 | 30 | Pod_1 | 50 | Morning |
Sess_3 | High | 6800 | 1.4 | 30 | Pod_1 | 50 | Morning |
Each row = one noise exposure session.
Goal | To “reverse engineer” the experimental design and research question from a tidy table to build critical thinking skills.
Instructions | In this exercise, you are given tidy data tables. Your task:
- Interpret: What scientific question might the researchers be asking?
- Identify Variables: Which variables are:
- Independent (manipulated or compared)
- Dependent (measured outcome)
- Control (kept constant or recorded for adjustment)
- Describe Experimental Design: Is it observational, controlled, randomized?
Example 1
tank_id | light_intensity | nutrient_level | temp_C | seaweed_species | biomass_increase |
---|---|---|---|---|---|
Tank_1 | 100 | Low | 18 | Ulva sp. | 4.5 |
Tank_2 | 100 | High | 18 | Ulva sp. | 8.1 |
Tank_3 | 200 | Low | 18 | Ulva sp. | 6.2 |
Tank_4 | 200 | High | 18 | Ulva sp. | 11.4 |
Possible Research Question:
How do light intensity and nutrient levels affect seaweed biomass increase under controlled conditions?
Experimental Design:
Controlled laboratory factorial experiment (2x2 design: light × nutrient).
Variables:
- Independent Variables:
light_intensity
(100, 200),nutrient_level
(Low, High) - Dependent Variable:
biomass_increase
- Control Variables:
temperature
,seaweed_species
(constant across tanks)
Example 2
site_id | fishing_pressure | depth_m | habitat_type | survey_method | shark_count |
---|---|---|---|---|---|
Site_A | High | 25 | Reef | Baited Camera | 3 |
Site_B | Low | 25 | Reef | Baited Camera | 8 |
Site_C | High | 30 | Sandy | Baited Camera | 2 |
Site_D | Low | 30 | Sandy | Baited Camera | 6 |
Possible Research Question:
Is shark abundance lower in areas with high fishing pressure compared to low fishing pressure areas?
Experimental Design:
Observational comparative study (across sites).
Variables:
- Independent Variable:
fishing_pressure
(High, Low) - Dependent Variable:
shark_count
- Control Variables:
depth_m
,habitat_type
,survey_method
(standardized across sites)
Example 3
sample_id | salinity | temp_C | sample_depth_m | chlorophyll_conc | plankton_sp_count |
---|---|---|---|---|---|
Sample_1 | 30 | 22 | 5 | 2.5 | 15 |
Sample_2 | 35 | 22 | 5 | 3.1 | 18 |
Sample_3 | 40 | 22 | 5 | 4.2 | 12 |
Sample_4 | 45 | 22 | 5 | 4.8 | 9 |
Possible Research Question:
How does salinity influence plankton species diversity in coastal waters?
Experimental Design:
Observational gradient study.
Variables:
- Independent Variable:
salinity
(continuous gradient) - Dependent Variable:
plankton_species_count
- Control Variables:
temperature
,sample_depth_m
,chlorophyll_concentration
(recorded)
1. Experimental Design & Variables
- What information helps you distinguish between independent, dependent, and control variables?
- How can you tell whether a study is controlled, randomized, or observational based solely on the dataset?
- Why is it important to clearly document control variables, even if they are kept constant?
- In some examples, more than one independent variable is manipulated. How would you recognize and interpret this factorial design?
- How might confounding variables influence the results if not properly controlled?
2. Tidy Data Structure
- Why is it useful to structure experimental data in a tidy format (one variable per column, one observation per row)?
- What could go wrong in analysis or visualization if data were not kept in tidy format?
- How do identifiers like
tank_id
,site_id
, orsession_id
help maintain data clarity and traceability?
3. Critical Interpretation
- Could there be alternative research questions addressed with the same dataset? If so, how?
- Are there any additional variables you might want to collect to strengthen the experimental design or account for potential confounders?
- If you encountered missing or inconsistent data in these tables, how would it affect your interpretation?
- How do replication and sample size considerations appear (or not appear) in tidy datasets like these?
- How could these tables be prepared for statistical modeling (e.g., regression analysis)?
4. Future Application
- In your own research, how would you apply what you’ve learned about tidy data and variable types to plan data collection?
- What challenges might arise when moving from narrative descriptions of experiments to data analysis-ready datasets in real-world projects?