Introduction

This R Markdown document accompanies the tutorial manuscript Latent Transition Analysis with Auxiliary Variables: A Demonstration of the ML 3-Step and BCH Methods in Mplus using data from the Longitudinal Study of American Youth (LSAY). Its purpose is to provide a fully reproducible, step-by-step implementation of the multi-step LTA workflow described in the paper using Mplus and the MplusAutomation package in R.

The focus of this document is operational rather than conceptual. For each phase of the workflow, we specify:

  1. the goal of the step,

  2. the required Mplus and R inputs (e.g., data objects, model outputs, SVALUES, BCH weights),

  3. the corresponding Mplus or R syntax, and

  4. the diagnostic checks needed to confirm correct estimation.


Workflow Overview

The full workflow is summarized below. Each phase corresponds directly to the steps outlined in the manuscript, but the focus here is on reproducible implementation and inspection of intermediate outputs.

Phase Step(s) from Manuscript What You Do Output / Purpose
0. Data Setup and Preparation Load packages, import LSAY data, inspect indicator coding Data ready for enumeration
1. Enumeration 1.1 Latent class enumeration at each wave (1–6 classes) Determine K for each wave; check plausibility of aligning classes
Estimate chosen K-class LCAs separately at each wave using OPTSEED + SVALUES and plot item response probabilties Establish consistent class ordering and evaluate class similarity across time to verify that classes are conceptually similar.
2. Testing for Measurement Invariance 2.1 Joint configural model (no transitions) Evaluate whether classes can be aligned across waves
2.2 Joint full measurement invariance (no transitions) Test equality of item-class parameters; assess need for partial MI
2.3 Compare model fit to the unconstrained configural model using an LRT test A tool to statistically compare the two models
2.4 Decide Whether to Proceed with an LTA Under Invariance If the invariant (or partially invariant) model is still substantively interpretable and classes retain the same meaning, proceed to Phase 3.
3. Final LTA – ML 3-Step Step 1 Generate time-specific posteriors from invariant measurement model Posterior class probabilities for each wave
Step 2 Merge posteriors; compute classification-error corrections Data ready for unconditional transitions
Step 3a Estimate unconditional LTA transitions Transition model based on invariant class structure
Step 3b Add covariates/distal outcomes with correction Corrected auxiliary-variable effects
3. Final LTA – BCH Step 1 Estimate invariant LTA; save BCH weights Individual-level BCH weights
Step 2 Extract BCH weights Data ready for unconditional transitions
Step 3a Estimate unconditional LTA transitions Transition model
Step 3b Add covariates/distals using BCH Corrected auxiliary-variable effects

Before estimating the latent class structure, we begin by loading the necessary packages and preparing the dataset to ensure the indicators and sample structure are ready for LTA.


Data Setup and Preparation

Purpose

Before beginning, we prepare the LSAY dataset for all subsequent stages of the LTA workflow. This includes loading required packages, defining the project root, importing the dataset, converting special codes to missing values, and verifying that indicators and covariates are correctly formatted. These checks ensure that enumeration, measurement-invariance testing, and multi-step auxiliary-variable models can be estimated without data-related errors.

Load Required Packages

# Installation required when using Mplus version 8.6
# devtools::install_github("michaelhallquist/MplusAutomation")
#if (!requireNamespace("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
#BiocManager::install("rhdf5")

library(MplusAutomation)
library(rhdf5)
library(tidyverse)   
library(haven)
library(here)            
library(glue)            
library(janitor)            
library(gt) 
library(naniar)
library(psych)
library(modelsummary)
library(cowplot)
library(patchwork)

Set Working Directory

here::i_am("lta_mi_workflow.Rmd")

Using here() finds your project’s files, based on the current working directory at the time when the package is loaded.


Import LSAY Data

lsay_data <- read_csv(here("data", "lsay_data.csv"), na = c("9999", "-99", "-98", "-96")) %>%
  mutate(across(everything(), as.numeric))

#summary(lsay_data)
#describe(lsay_data)

The dataset is imported with LSAY missing-value codes converted to NA, and all variables are coerced to numeric. Binary indicators used for LCA/LTA must be coded 0/1, so this step ensures that item formatting is correct before enumeration.


Descriptive Statistics

These summaries confirm (a) valid 0/1 coding for all indicators, (b) non-zero variance at each wave, and (c) acceptable levels of missingness for variables used later in auxiliary-variable models. Only distributional confirmation is needed here; no substantive interpretation is included in the supplement.


Indicators

# Define survey questions
all_questions <- c(
  "AB39A", "AB39H", "AB39I", "AB39K", "AB39L", "AB39M", "AB39T", "AB39U", "AB39W", "AB39X", # 7th grade
  "GA32A", "GA32H", "GA32I", "GA32K", "GA32L", "GA33A", "GA33H", "GA33I", "GA33K", "GA33L", # 10th grade
  "KA46A", "KA46H", "KA46I", "KA46K", "KA46L", "KA47A", "KA47H", "KA47I", "KA47K", "KA47L"  # 12th grade
)

# Function to compute stats (count, mean, SD, range)
compute_stats <- function(data, question, grade, question_name) {
  data %>%
    summarise(
      Grade = grade,
      Count = sum(!is.na(.data[[question]])),
      Mean = mean(.data[[question]], na.rm = TRUE),
      SD = sd(.data[[question]], na.rm = TRUE),
      Min = min(.data[[question]], na.rm = TRUE),
      Max = max(.data[[question]], na.rm = TRUE)
    ) %>%
    mutate(Question = question_name)
}

# Define question names and mappings
table_setup <- tibble(
  question_code = all_questions,
  grade = rep(c(7, 10, 12), each = 10),
  question_name = rep(
    c(
      "I enjoy math",
      "Math is useful in everyday problems",
      "Math helps a person think logically",
      "It is important to know math to get a good job",
      "I will use math in many ways as an adult",
      "I enjoy science",
      "Science is useful in everyday problems",
      "Science helps a person think logically",
      "It is important to know science to get a good job",
      "I will use science in many ways as an adult"
    ),
    times = 3
  )
)

# Compute stats for all questions
table1_data <- pmap_dfr(
  list(table_setup$question_code, table_setup$grade, table_setup$question_name),
  ~compute_stats(lsay_data, ..1, ..2, ..3)
) %>%
  mutate(
    Mean = round(Mean, 2),
    SD = round(SD, 2),
    Min = round(Min, 2),
    Max = round(Max, 2)
  ) %>%
  arrange(match(Question, table_setup$question_name), Grade) %>%
  select(Question, Grade, Count, Mean, SD, Min, Max)

# Build table
table1_gt <- table1_data %>%
  gt(groupname_col = "Question") %>%
  tab_header(
    title = "Table 1. Descriptive Statistics for Mathematics and Science Attitudinal Survey Items Included in Analyses"
  ) %>%
  cols_label(
    Grade = "Grade",
    Count = "N",
    Mean = "Prop.",
    SD = "SD",
    Min = "Min",
    Max = "Max"
  ) %>%
  fmt_number(
    columns = c(Mean, SD),
    decimals = 2
  )

# Show table
table1_gt
Table 1. Descriptive Statistics for Mathematics and Science Attitudinal Survey Items Included in Analyses
Grade N Prop. SD Min Max
I enjoy math
7 1882 0.69 0.46 0 1
10 1532 0.63 0.48 0 1
12 1120 0.57 0.50 0 1
Math is useful in everyday problems
7 1851 0.72 0.45 0 1
10 1520 0.65 0.48 0 1
12 1111 0.67 0.47 0 1
Math helps a person think logically
7 1847 0.65 0.48 0 1
10 1517 0.69 0.46 0 1
12 1108 0.71 0.45 0 1
It is important to know math to get a good job
7 1854 0.77 0.42 0 1
10 1517 0.68 0.47 0 1
12 1105 0.61 0.49 0 1
I will use math in many ways as an adult
7 1857 0.75 0.43 0 1
10 1525 0.65 0.48 0 1
12 1104 0.65 0.48 0 1
I enjoy science
7 1873 0.62 0.48 0 1
10 1526 0.59 0.49 0 1
12 1105 0.55 0.50 0 1
Science is useful in everyday problems
7 1840 0.40 0.49 0 1
10 1516 0.43 0.50 0 1
12 1099 0.48 0.50 0 1
Science helps a person think logically
7 1850 0.49 0.50 0 1
10 1516 0.53 0.50 0 1
12 1100 0.56 0.50 0 1
It is important to know science to get a good job
7 1857 0.40 0.49 0 1
10 1518 0.43 0.50 0 1
12 1099 0.38 0.49 0 1
I will use science in many ways as an adult
7 1873 0.46 0.50 0 1
10 1524 0.42 0.49 0 1
12 1103 0.44 0.50 0 1

Data Verification Summary

All attitudinal indicators fall within the expected 0–1 range and show adequate variability across Grades 7, 10, and 12. No recoding or item removal is required before proceeding to enumeration.


Covariates

# Define covariates
covariates <- c(
  "MINORITY", "FEMALE"
)

# Function to compute stats (count, mean, SD, range, % missing)
compute_stats <- function(data, question, question_name) {
  total_n <- nrow(data)
  missing_n <- sum(is.na(data[[question]]))
  percent_missing <- (missing_n / total_n) * 100
  
  data %>%
    summarise(
      Count = sum(!is.na(.data[[question]])),
      Mean = mean(.data[[question]], na.rm = TRUE),
      SD = sd(.data[[question]], na.rm = TRUE),
      Min = min(.data[[question]], na.rm = TRUE),
      Max = max(.data[[question]], na.rm = TRUE)
    ) %>%
    mutate(
      Question = question_name,
      PercentMissing = round(percent_missing, 2)
    )
}

# Define question names and mappings
table_setup <- tibble(
  question_code = covariates,
  question_name = c(
    "Minority",
    "Gender"
  )
)

# Compute stats for all questions
table1_data <- pmap_dfr(
  list(table_setup$question_code, table_setup$question_name),
  ~compute_stats(lsay_data, ..1, ..2)
) %>%
  mutate(
    Mean = round(Mean, 2),
    SD = round(SD, 2),
    Min = round(Min, 2),
    Max = round(Max, 2)
  ) %>%
  select(Question, Count, PercentMissing, Mean, SD, Min, Max)

# Build table
table1_gt <- table1_data %>%
  gt() %>%
  tab_header(
    title = "Table 1. Descriptive Statistics for Covariates Included in Analyses"
  ) %>%
  cols_label(
    Count = "N",
    PercentMissing = "% Missing",
    Mean = "Mean or Proportion",
    SD = "SD",
    Min = "Min",
    Max = "Max"
  ) %>%
  fmt_number(
    columns = c(PercentMissing, Mean, SD, Min, Max),
    decimals = 2
  )

# Show table
table1_gt
Table 1. Descriptive Statistics for Covariates Included in Analyses
Question N % Missing Mean or Proportion SD Min Max
Minority 1824 4.60 0.17 0.38 0.00 1.00
Gender 1912 0.00 0.51 0.50 0.00 1.00

Data Verification Summary

Covariates show valid ranges, expected missingness patterns, and sufficient variability for use in auxiliary-variable models. All covariates can be included as planned under full-information maximum likelihood.


Phase 1: Latent Class Enumeration at Each Timepoint (Exploratory Stage)

Purpose

Phase 1 evaluates the number of latent classes at each timepoint (Grades 7, 10, and 12) using independent latent class analyses. Enumeration is conducted separately at each wave to determine the number of classes supported by the data without imposing any longitudinal constraints. This step establishes whether a common class structure is empirically plausible across time, which is a prerequisite for evaluating longitudinal measurement invariance in Phase 2.

What This Phase Produces

Phase 1 produces the following diagnostic outputs:

  • 1–6 class model solutions at each wave

  • Model-fit indices (BIC, aBIC, CAIC, AWE)

  • Likelihood-ratio tests (TECH11, TECH14)

  • Convergence diagnostics and replication rates

  • Estimated class proportions

  • Final selected four-class solutions at each wave

  • Item probability plots for all three selected four-class solutions


Workflow Overview

This phase proceeds in three diagnostic stages:

  1. Enumeration of 1–6 class models at each wave

  2. Extraction and comparison of model-fit diagnostics

  3. Independent re-estimation of the selected four-class solution at each wave with fixed class ordering

Each stage serves as a structural validation step prior to measurement-invariance testing.


Time 1 Enumeration (Grade 7)

Syntax Walkthrough – Enumeration at All Three Timepoints

This syntax uses mplusObject() and mplusModeler() from the MplusAutomation package to programmatically generate the full set of 1–6 class LCA models at all three waves (Grade 7, Grade 10, Grade 12). Each wave uses the same enumeration structure, differing only in the indicator set used and the folder where results are saved.

Key elements:

  • lapply(1:6, …) loops over class sizes k = 1…6 so that each wave estimates six models with identical specifications.

  • mplusObject()builds a complete Mplus input file for each iteration, allowing R to write the .inp automatically without manual editing.

  • For each wave, the VARIABLE block assigns that wave’s ten attitudinal indicators to both categorical and usevar, which instructs Mplus to treat them as ordered categorical and restrict estimation to those indicators.

  • The line classes = c({k}); dynamically inserts the current class number so that each iteration creates the proper 1-class, 2-class, … up to 6-class solution.

  • The ANALYSIS block includes:

    • estimator = mlr for robust ML estimation.

    • type = mixture specifying latent class analysis.

    • starts = 500 100, meaning Mplus begins estimation from 500 random starting value sets and refines the best 100. This approach substantially reduces the likelihood of converging to a poor local optimum and helps ensure the best-fitting solution is recovered for each k.

    • processors = 12 enabling parallel computation where available.

  • The OUTPUT block requests enumeration diagnostics, including TECH11 and TECH14 likelihood-ratio tests, sample statistics, and residuals.

  • mplusModeler() writes the .inp, exports the .dat file, calls Mplus, and returns the results to R, allowing automated batch fitting of all enumeration models.

t1_enum <- lapply(1:6, function(k) { 
  enum_t1  <- mplusObject(                 
    
   TITLE = glue("Class {k} Time 1"), 
  
   VARIABLE = glue( 
    "categorical = AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X;
     usevar = AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X;
     
     classes = c({k});"),
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 500 100;           
    processors = 12;",
  
  OUTPUT = "sampstat residual tech11 tech14;",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)

enum_t1_fit <- mplusModeler(enum_t1,
                 dataout=here("phase_1", "t1", "t1.dat"), 
                 modelout=glue(here("phase_1", "t1", "c{k}_lca_t1.inp")),
                 check=TRUE, run = TRUE, hashfilename = FALSE)
})

The first enumeration evaluates 1–6 class models for the ten Grade 7 indicators. Diagnostics include IC trends (BIC/aBIC), TECH11/TECH14 tests, model convergence, and estimated class proportions.

Time 2 Enumeration (Grade 10)

t2_enum <- lapply(1:6, function(k) { 
  enum_t2  <- mplusObject(                 
    
   TITLE = glue("Class {k} Time 2"), 
  
   VARIABLE = glue( 
    "categorical = GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L;
     usevar = GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L;
     
     classes = c({k});"),
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 500 100;           
    processors = 12;",
  
  OUTPUT = "sampstat residual tech11 tech14;",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)

enum_t2_fit <- mplusModeler(enum_t2,
                 dataout=here("phase_1", "t2", "t2.dat"), 
                 modelout=glue(here("phase_1", "t2", "c{k}_lca_t2.inp")),
                 check=TRUE, run = TRUE, hashfilename = FALSE)
})

The same enumeration procedure is applied to the Grade 10 indicators. Estimating these models independently allows direct comparison of class recovery and structure relative to Time 1.

Time 3 Enumeration (Grade 12)

t3_enum <- lapply(1:6, function(k) { 
  enum_t3  <- mplusObject(                 
    
   TITLE = glue("Class {k} Time 3"), 
  
   VARIABLE = glue( 
    "categorical = KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;
     usevar = KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;
     
     classes = c({k});"),
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 500 100;           
    processors = 12;",
  
  OUTPUT = "sampstat residual tech11 tech14;",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)

enum_t3_fit <- mplusModeler(enum_t3,
                 dataout=here("phase_1", "t3", "t3.dat"), 
                 modelout=glue(here("phase_1", "t3", "c{k}_lca_t3.inp")),
                 check=TRUE, run = TRUE, hashfilename = FALSE)
})

Enumeration is completed for Grade 12. Agreement in class number and stability across all three waves provides initial empirical support for considering a longitudinal model.


Extracting and Summarizing Model Fit

After enumeration, Mplus output files are programmatically parsed to extract fit indices and model diagnostics for all 1–6 class models at each wave. These summaries provide the empirical basis for class-number selection and stability assessment.

Syntax Walkthrough: Model-Fit Extraction (Time 1, Time 2, Time 3)

This syntax automates the extraction of model-fit information from all enumeration models at each wave (Grade 7, Grade 10, Grade 12). The same structure is repeated for each timepoint so the resulting fit-index tables are fully comparable across waves.

Key elements:

  • source(here("functions", "extract_mplus_info.R")) loads the custom function extract_mplus_info_extended(), which:

    • reads an individual Mplus .out file;

    • parses the model summary;

    • extracts key enumeration statistics including log-likelihood, BIC, aBIC, entropy, TECH11/14 p-values (if present), and sample size;

    • returns those values as a tidy one-row dataframe.
      This lets R pull the relevant fit information directly from Mplus output without manual inspection.

  • source(here("functions","enum_table.R")) loads enum_table_full(), which:

    • takes the list of Mplus models loaded via readModels();

    • extracts class-specific and model-level fit information;

    • formats it into a clean, presentation-ready enumeration summary table.
      This provides the standardized fit-index tables shown in the walkthrough.

  • output_dir <- here("phase_1", "t1") specifies which wave’s enumeration folder to process.

  • list.files() identifies all .out files generated during enumeration for that wave. This ensures the script automatically captures the 1–6 class models.

  • map_dfr(output_files, extract_mplus_info_extended) applies the custom extractor function to every model and stacks the results into a single tidy dataframe (final_data). This dataframe contains all fit indices for that wave in one place.

  • unique(final_data$Sample_Size) verifies the sample size used in the enumeration models.

  • readModels() loads all Mplus model objects from the wave directory so the full output—thresholds, summaries, and statistics—is available in R.

  • enum_table_full() formats the full set of models into a publication-quality enumeration table summarizing fit indices across the 1–6 class solutions.

This automated workflow extracts, collates, and displays model-fit information for all enumeration models at each timepoint, removing the need to manually inspect Mplus output and ensuring consistent reporting across waves.


Time 1

source(here("functions", "extract_mplus_info.R"))
source(here("functions","enum_table.R"))

# Define the directory where all of the .out files are located.
output_dir <- here("phase_1","t1")

# Get all .out files
output_files <- list.files(output_dir, pattern = "\\.out$", full.names = TRUE)

# Process all .out files into one dataframe
final_data <- map_dfr(output_files, extract_mplus_info_extended)

# Extract Sample_Size from final_data
sample_size <- unique(final_data$Sample_Size)

output_enum_t1 <- readModels(here("phase_1","t1"), quiet = TRUE)
enum_table_full(output_enum_t1)
Model Fit Summary Table1
Classes Par LL % Converged % Replicated
Model Fit Indices
LRTs
Smallest Class
BIC aBIC CAIC AWE VLMR BLRT n (%)
Class 1 Time 1 10 −11,803.43 100% 100% 23,682.28 23,650.51 23,692.28 23,787.70 1886 (100%)
Class 2 Time 1 21 −10,418.76 100% 100% 20,995.91 20,929.19 21,016.91 21,217.29 <.001 <.001 782 (41.5%)
Class 3 Time 1 32 −10,165.87 36% 100% 20,573.10 20,471.44 20,605.10 20,910.45 0.00 <.001 384 (20.4%)
Class 4 Time 1 43 −10,042.97 64% 98% 20,410.26 20,273.64 20,453.26 20,863.57 0.00 <.001 390 (20.7%)
Class 5 Time 1 54 −9,969.24 66% 47% 20,345.76 20,174.20 20,399.76 20,915.04 0.04 <.001 175 (9.3%)
Class 6 Time 1 65 −9,915.15 42% 67% 20,320.54 20,114.04 20,385.54 21,005.78 0.01 <.001 180 (9.5%)
1 Note. Par = Parameters; LL = model log likelihood; BIC = Bayesian information criterion; aBIC = sample size adjusted BIC; CAIC = consistent Akaike information criterion; AWE = approximate weight of evidence criterion; BLRT = bootstrapped likelihood ratio test p-value; VLMR = Vuong-Lo-Mendell-Rubin adjusted likelihood ratio test p-value; cmPk = approximate correct model probability.

IC Plot

Syntax Walkthrough: Information-Criterion (IC) Plots

These sections generate visualizations of BIC/aBIC trends across the 1–6 class models.

Key elements:

  • ic_plot_lca.R defines a custom helper function that extracts information criteria from the readModels() output and produces a line plot across class sizes.

  • The call ic_plot(output_enum_t1) (and the T2/T3 versions) creates wave-specific plots showing improvements in fit as classes increase.

  • These plots help identify diminishing returns (elbows) that guide class selection.

source(here("functions","ic_plot_lca.R"))
ic_plot(output_enum_t1)


Time 2

source(here("functions", "extract_mplus_info.R"))
source(here("functions","enum_table.R"))

# Define the directory where all of the .out files are located.
output_dir <- here("phase_1","t2")

# Get all .out files
output_files <- list.files(output_dir, pattern = "\\.out$", full.names = TRUE)

# Process all .out files into one dataframe
final_data <- map_dfr(output_files, extract_mplus_info_extended)

# Extract Sample_Size from final_data
sample_size <- unique(final_data$Sample_Size)

output_enum_t2 <- readModels(here("phase_1","t2"), quiet = TRUE)
enum_table_full(output_enum_t2)
Model Fit Summary Table1
Classes Par LL % Converged % Replicated
Model Fit Indices
LRTs
Smallest Class
BIC aBIC CAIC AWE VLMR BLRT n (%)
Class 1 Time 2 10 −10,072.93 100% 100% 20,219.21 20,187.44 20,229.21 20,322.56 1534 (100%)
Class 2 Time 2 21 −8,428.38 100% 100% 17,010.82 16,944.10 17,031.82 17,227.86 <.001 <.001 658 (42.9%)
Class 3 Time 2 32 −8,067.61 53% 100% 16,369.96 16,268.31 16,401.96 16,700.70 <.001 <.001 297 (19.4%)
Class 4 Time 2 43 −7,905.53 39% 100% 16,126.50 15,989.90 16,169.50 16,570.93 <.001 <.001 290 (18.9%)
Class 5 Time 2 54 −7,845.44 82% 100% 16,087.01 15,915.46 16,141.01 16,645.13 0.01 <.001 220 (14.3%)
Class 6 Time 2 65 −7,806.99 44% 36% 16,090.79 15,884.30 16,155.79 16,762.61 0.02 <.001 130 (8.5%)
1 Note. Par = Parameters; LL = model log likelihood; BIC = Bayesian information criterion; aBIC = sample size adjusted BIC; CAIC = consistent Akaike information criterion; AWE = approximate weight of evidence criterion; BLRT = bootstrapped likelihood ratio test p-value; VLMR = Vuong-Lo-Mendell-Rubin adjusted likelihood ratio test p-value; cmPk = approximate correct model probability.

IC Plot

source(here("functions","ic_plot_lca.R"))
ic_plot(output_enum_t2)


Time 3

source(here("functions", "extract_mplus_info.R"))
source(here("functions","enum_table.R"))

# Define the directory where all of the .out files are located.
output_dir <- here("phase_1","t3")

# Get all .out files
output_files <- list.files(output_dir, pattern = "\\.out$", full.names = TRUE)

# Process all .out files into one dataframe
final_data <- map_dfr(output_files, extract_mplus_info_extended)

# Extract Sample_Size from final_data
sample_size <- unique(final_data$Sample_Size)

output_enum_t3 <- readModels(here("phase_1","t3"), quiet = TRUE)
enum_table_full(output_enum_t3)
Model Fit Summary Table1
Classes Par LL % Converged % Replicated
Model Fit Indices
LRTs
Smallest Class
BIC aBIC CAIC AWE VLMR BLRT n (%)
Class 1 Time 3 10 −7,349.13 100% 100% 14,768.49 14,736.72 14,778.49 14,868.72 1122 (100%)
Class 2 Time 3 21 −5,976.60 100% 100% 12,100.68 12,033.98 12,121.68 12,311.16 <.001 <.001 534 (47.5%)
Class 3 Time 3 32 −5,670.60 98% 99% 11,565.92 11,464.28 11,597.92 11,886.65 <.001 <.001 203 (18.1%)
Class 4 Time 3 43 −5,543.62 36% 97% 11,389.22 11,252.64 11,432.22 11,820.20 0.00 <.001 219 (19.5%)
Class 5 Time 3 54 −5,483.67 63% 62% 11,346.57 11,175.06 11,400.57 11,887.81 0.00 <.001 131 (11.7%)
Class 6 Time 3 65 −5,444.06 42% 64% 11,344.60 11,138.14 11,409.60 11,996.08 0.09 <.001 81 (7.2%)
1 Note. Par = Parameters; LL = model log likelihood; BIC = Bayesian information criterion; aBIC = sample size adjusted BIC; CAIC = consistent Akaike information criterion; AWE = approximate weight of evidence criterion; BLRT = bootstrapped likelihood ratio test p-value; VLMR = Vuong-Lo-Mendell-Rubin adjusted likelihood ratio test p-value; cmPk = approximate correct model probability.

IC Plot

source(here("functions","ic_plot_lca.R"))
ic_plot(output_enum_t3)

Diagnostic Summary: Enumeration Fit

  • For Time 1, model-fit indices improved through the four-class solution, with stable convergence and acceptable class proportions.

  • For Time 2 and Time 3, information criteria again declined through the four-class solution and stabilized thereafter.

  • Convergence diagnostics and class-size distributions remained adequate at all three waves for the four-class solution.

Together, these diagnostics support selecting four classes at each timepoint for all subsequent phases.


1.5 Plotting Class Probabilities

Once the four-class solution is selected at each wave, class-specific response probabilities are visualized as an additional diagnostic check on class distinctiveness and shape similarity.

Syntax Walkthrough: Class Probabilitiy Plots

These plots visualize the conditional response patterns for the selected class solution at each wave.

Key elements:

  • plot_lca() is a custom plotting function sourced from your functions/ directory that takes a readModels() output object and generates class-specific probabilities

  • The models c4_lca_t1.out, c4_lca_t2.out, and c4_lca_t3.out are selected manually because enumeration suggested that 4 classes were optimal.

  • (plot_lca(t1) | plot_lca(t2) | plot_lca(t3)) uses the patchwork operator (|) to place the three probability plots side-by-side.

  • These plots allow researchers to visually inspect:

    • class distinctiveness,

    • similarity of class shapes across waves,

    • and the feasibility of linking classes over time.

This visualization is essential before moving to Phase 2 (invariance testing).


Plot LCA:

source(here("functions","plot_lca.R"))

step1_t1 <- readModels(here("phase_1","t1"))
step1_t2 <- readModels(here("phase_1","t2"))
step1_t3 <- readModels(here("phase_1","t3"))

t1 <- step1_t1$c4_lca_t1.out
t2 <- step1_t2$c4_lca_t2.out
t3 <- step1_t3$c4_lca_t3.out


(plot_lca(t1) | plot_lca(t2) | plot_lca(t3))

Class labels used here:
Class 1: Very Positive, Class 2: Qualified Positive, Class 3: Neutral, Class 4: Less Positive

Diagnostic Summary: Class Consistency

The four-class probability plots confirm that:

  • Each wave yields four clearly differentiated classes

  • No class displays extreme degeneration or redundancy

  • Classes are sufficiently stable across waves

  • No structural inconsistencies are observed

These diagnostics indicate that the same four-class structure is empirically recoverable across Grades 7, 10, and 12.


1.6 Estimate LCAs Independently at Each Time Point to Reorder Classes

In this step, the selected four-class solution is re-estimated independently at each wave with:

  • Fixed number of classes (four)

  • Optimized class ordering using optseed

  • Supplied svalues() to enforce consistent class numbering across waves

  • Random starts disabled (starts = 0)

This step does not change the class structure. Its sole purpose is to ensure stable class labeling across waves prior to joint longitudinal modeling.

Syntax Walkthrough: Independent LCAs at Each Time Point (Time 1, Time 2, Time 3)

These syntax blocks estimate the selected four-class model separately at each wave. Because the basic LCA structure is identical to the enumeration syntax used in Phase 1, we rely on the earlier walkthrough and focus here on what is new.

Key elements:

  • The same MplusAutomation workflow (mplusObject(), mplusModeler()) used in Phase 1 is applied here, but each model now fixes the number of classes at four, based on enumeration decisions.

  • Each wave uses its own set of categorical indicators (AB* for Grade 7, GA* for Grade 10, KA*for Grade 12), paralleling the Phase 1 structure.

  • starts = 0 is used because the enumeration step has already established the class structure. This disables random start searches and forces the model to proceed directly from the provided starting values.

  • optseed = ###### fixes the optimization seed so the class ordering remains stable across waves. This is critical for later longitudinal alignment and reduces risk of label switching.

  • The svalues() option supplies ordered starting values to maintain consistent class numbering (e.g., 1 = Very Positive, 4 = Less Positive).
    These svalues differ across waves because the enumeration results differ; each set is wave-specific.

  • All other syntax elements (VARIABLE, ANALYSIS, OUTPUT, data export, Mplus execution) operate exactly as described in the Phase 1 walkthrough.

Together, these steps generate three wave-specific four-class models that serve as the baseline for evaluating measurement invariance in the next phase.

Time 1 (Reorder)

lca_t1  <- mplusObject(                 
    
  TITLE = "Class-3_Time1", 
  
  VARIABLE =
   "categorical = AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X;
         usevar = AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X;
         !useobs = patw2==0; 
    
    classes = c(4);",
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 0;    
    optseed = 937588;",
  
  OUTPUT = "TECH1 TECH8 TECH14 svalues(2 3 4 1);",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)


lca_t1_fit <- mplusModeler(lca_t1,
                 dataout=here("phase_1","reordered","t1_lca.dat"), 
                 modelout=here("phase_1","reordered","t1_lca_step1.inp"),
                 check=TRUE, run = TRUE, hashfilename = FALSE)

Time 2 (Reorder)

lca_t2  <- mplusObject(                 
    
  TITLE = "Class-3_Time2", 
  
  VARIABLE =
   "categorical = GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L;
         usevar = GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L;
         !useobs = patw4==0; 
    
    classes = c(4);",
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 0;    
    optseed = 264935;",
  
  OUTPUT = "TECH1 TECH8 TECH14 svalues(3 1 4 2);",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)


lca_t2_fit <- mplusModeler(lca_t2,
                 dataout=here("phase_1","reordered","t2_lca.dat"), 
                 modelout=here("phase_1","reordered","t2_lca_step1.inp"),
                 check=TRUE, run = TRUE, hashfilename = FALSE)

Time 3 (Reorder)

lca_t3  <- mplusObject(                 
    
  TITLE = "Class-3_Time3", 
  
  VARIABLE =
   "categorical = KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;
         usevar = KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;
         !useobs = patw6==0; 
    
    classes = c(4);",
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 0;    
    optseed = 366706;",
  
  OUTPUT = "TECH1 TECH8 TECH14 svalues(1 4 3 2);",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)


lca_t3_fit <- mplusModeler(lca_t3,
                 dataout=here("phase_1","reordered","t3_lca.dat"), 
                 modelout=here("phase_1","reordered","t3_lca_step1.inp"),
                 check=TRUE, run = TRUE, hashfilename = FALSE)

Diagnostic Class Plots (Post-Reordering)

We then visualize the estimated classes from all three waves in a side-by-side display. Probability plots provide an intuitive check on whether the same substantive patterns emerge across time. Specifically, we look for:

  • similar shapes or relative endorsement levels within classes,

  • consistent ordering of classes across waves

Strong similarity across the three sets of plots suggests that placing equality constraints on item-response probabilities in later phases is plausible.


Syntax Walkthrough: Plotting Independent LCA Probabilities

This plotting code is similar to the enumeration plots in Phase 1, but introduces several elements that require a separate walkthrough.

Key elements:

  • readModels() is used to read the single selected 4-class model for each wave (Time 1, Time 2, Time 3).
    Unlike Phase 1, where readModels() loads an entire folder of c1–c6 models, here we load specific .out files created during the beginning of Phase 1.

  • plot_lca() is a custom plotting function (sourced from functions/plot_lca.R).
    In Phase 1 it was applied once per time point to visualize enumeration candidates; here, it is used to create final probability plots from the selected 4-class models.

  • The syntax extracts each model explicitly:
    t1 <- readModels(“t1_lca_step1.out”)
    t2 <- readModels(“t2_lca_step1.out”)
    t3 <- readModels(“t3_lca_step1.out”)
    This ensures that the plotted probabilities represent the final estimation for each wave—not the enumeration set.

  • The expression (plot_lca(t1) | plot_lca(t2) | plot_lca(t3)) uses the patchwork package to align the three probability plots in a single row.
    This allows direct visual comparison of class shape, ordering, and interpretation across waves.

  • No additional arguments are needed because class labels, indicator names, and formatting are handled internally by plot_lca().

This workflow creates the three-wave class comparison used to assess whether class shapes remain stable enough to justify longitudinal measurement invariance in Phase 2.

source(here("functions","plot_lca.R"))
t1 <- readModels(here("phase_1","reordered","t1_lca_step1.out"))
t2 <- readModels(here("phase_1","reordered","t2_lca_step1.out"))
t3 <- readModels(here("phase_1","reordered","t3_lca_step1.out"))


(plot_lca(t1) | plot_lca(t2) | plot_lca(t3))

Diagnostic Summary: Pre–Measurement-Invariance Validation

  • A four-class solution is consistently recoverable at Grades 7, 10, and 12.

  • Estimated classes remain structurally comparable across waves following independent reordering.

  • No classes exhibits redundancy or instability after enforcing consistent class labels.

  • Class proportions remain substantively viable at all three waves.

These diagnostics confirm that a longitudinal measurement-invariance model is empirically defensible. No model adjustments are required prior to proceeding to Phase 2.

Completion of Phase 1

Phase 1 confirms that a stable four-class measurement structure is recoverable independently at each timepoint and that classes are sufficiently consistent to support formal evaluation of longitudinal measurement invariance. With enumeration and class ordering validated, the workflow proceeds to Phase 2, where joint configural and invariant latent class models are estimated.


Phase 2: Testing for Measurement Invariance

Purpose

Phase 2 evaluates whether the same latent class measurement model can be meaningfully applied across Grades 7, 10, and 12. This is accomplished by comparing:

  • a non-invariant joint model, in which all item–class thresholds are freely estimated at each wave, and

  • a fully invariant joint model, in which all item–class thresholds are constrained to be equal across waves.

This phase determines whether a common measurement structure is empirically defensible across time. The outcome of this phase governs whether full invariance, partial invariance, or non-invariance must be carried forward into auxiliary-variable modeling.

Important Clarification on the Role of Measurement Invariance in LTA

Establishing longitudinal measurement invariance is not a prerequisite for estimating a latent transition model itself. An LTA model can always be estimated under a non-invariant measurement structure, in which item–class thresholds are freely estimated at each wave. In such cases, transitions are interpreted with respect to time-specific latent classes, rather than assuming stable class meaning across time.

Measurement invariance becomes essential only when the analytic goal requires:

  • interpreting transitions as stability and movement among the same latent constructs over time, and

If full invariance is not supported in this phase, researchers may:

  • proceed with a non-invariant LTA using freely estimated thresholds, or

  • pursue a partial invariance solution by selectively freeing non-invariant item parameters.

The workflow presented here focuses on the invariant-measurement pathway because we focus on LTA with invariance. Full invariance is not needed to estimate and LTA model.

What This Phase Produces

Phase 2 produces the following diagnostic outputs:

  • A joint non-invariant LTA measurement model

  • A joint fully invariant LTA measurement model

  • Visual comparison of invariant vs. non-invariant latent class models

  • A Satorra–Bentler scaled χ² nested difference test

  • A formal decision regarding:

    • full invariance,

    • partial invariance, or

    • non-invariance

These outputs establish whether longitudinal transitions can be modeled under a constrained measurement structure.


Workflow Overview

This phase proceeds in three validation steps:

  1. Estimate a non-invariant joint configural model (baseline)

  2. Estimate a fully constrained invariant measurement model

  3. Formally evaluate invariance using nested model testing and diagnostics


Estimate the Non-Invariant Joint Configural Model (No Transitions)

The non-invariant joint model serves as the unconstrained baseline for evaluating measurement invariance. In this model:

  • Three latent class variables (one per wave) are estimated jointly.

  • All item–class thresholds are freely estimated across time.

  • No equality constraints are imposed.

  • No transition regressions are specified.

This model provides the parent likelihood required for nested model comparison with the invariant specification.

Syntax Walkthrough: Non-Invariant LTA Model

This model introduces the first joint longitudinal specification in the tutorial. Because the basic MplusAutomation structure (mplusObject(), mplusModeler()) was already described in Phases 1–2, this walkthrough focuses only on elements that are new or unique to the non-invariant LTA model.

Key elements:
• Three latent class variables are defined simultaneously using classes = c1(4) c2(4) c3(4);, one for each wave.
• The usev and categorical fields now contain all indicators from Grades 7, 10, and 12, reflecting the joint estimation across waves.
• The MODEL blocks use syntax such as [AB39A$1–AB39X$1] to freely estimate thresholds for each indicator across all classes within a wave.
• No equality constraints appear anywhere in the MODEL block, which is what makes this the fully non-invariant baseline model.
• The ANALYSIS block uses starts = 1000 200 to increase the likelihood of reaching the global solution. Joint LTA models are more prone to local maxima than single-wave LCAs, so the high number of random and final starts is necessary.
• All other workflow components (VARIABLE structure, Mplus syntax generation, model export, and execution) follow the same logic already covered in Phase 1.

This model serves as the unconstrained parent model required for Satorra–Bentler nested comparison testing.

lta_non_inv <- mplusObject(
  
  TITLE = 
    "Non-Invariant LTA Model", 
  
  VARIABLE = 
     "idvariable=CASENUM;
     
      usev =        
      AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X
      GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L
      KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;

      categorical = 
      AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X
      GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L
      KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;

      classes = c1(4) c2(4) c3(4);",
    
  ANALYSIS = 
     "estimator = mlr;
      type = mixture;
      starts = 500 200;",

  MODEL = 
     "%overall%

  MODEL c1:
     %c1#1%
     [AB39A$1-AB39X$1];
     %c1#2%
     [AB39A$1-AB39X$1];
     %c1#3%
     [AB39A$1-AB39X$1];
     %c1#4%
     [AB39A$1-AB39X$1];
     
  MODEL c2:
     %c2#1%
     [GA32A$1-GA33L$1];
     %c2#2%
     [GA32A$1-GA33L$1];
     %c2#3%
     [GA32A$1-GA33L$1];
     %c2#4%
     [GA32A$1-GA33L$1];
     
  MODEL c3:
     %c3#1%
     [KA46A$1-KA47L$1];
     %c3#2%
     [KA46A$1-KA47L$1];
     %c3#3%
     [KA46A$1-KA47L$1];
     %c3#4%
     [KA46A$1-KA47L$1];",

  OUTPUT = "svalues;",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)

lta_non_inv_fit <- mplusModeler(lta_non_inv,
                     dataout=here("phase_2", "lta.dat"),
                     modelout=here("phase_2", "noninvariant_lta.inp"),
                     check=TRUE, run = TRUE, hashfilename = FALSE)

After estimation, this model is retained strictly as the reference model for nested testing.


Estimate the Full Measurement-Invariance Model

The invariant model imposes full longitudinal measurement invariance by fixing all item–class thresholds to identical values across Grades 7, 10, and 12.

Under this model:

  • All wave-specific measurement parameters are constrained.

  • Only structural parameters (class proportions and transitions, when added later) remain free.

  • Because all thresholds are fixed, random-start searches are disabled (starts = 0).

This model nests within the non-invariant model and provides the constrained target for formal invariance testing.

Syntax Walkthrough: Invariant LTA Model

This model imposes full measurement invariance by fixing all item thresholds to values obtained from the invariant single-wave measurement model.

Key elements:
• Thresholds are fixed using statements such as ab39a$1*-0.88317, one per item and per class, within each wave-specific MODEL block.
• The structure mirrors the non-invariant syntax, but all free thresholds are replaced with fixed values.
• Because measurement parameters are fixed, starts = 0 is used in the ANALYSIS block—there is no need for random start searches.
• The invariant model estimates only structural parameters: class proportions at each wave and transition probabilities (c2 ON c1, c3 ON c2 in later models).
• Because this model nestles inside the non-invariant model (same structure, fewer free parameters), it can be directly compared using the Satorra–Bentler test.

This model provides the required nested structure for evaluating whether full measurement invariance is plausible.

lta_inv <- mplusObject(
  
  TITLE = 
    "Invariant LTA Model", 
  
  VARIABLE = 
     "idvariable=CASENUM;
     
      usev =        
      AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X
      GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L
      KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;

      categorical = 
      AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X
      GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L
      KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;

      classes = c1(4) c2(4) c3(4);",
    
  ANALYSIS = 
     "estimator = mlr;
      type = mixture;
      starts =  0;
      processors=10;",

  MODEL = 
     "
     %overall%
     
  MODEL c1:
     %c1#1%
     [AB39A$1-AB39X$1](1-10);
     %c1#2%
     [AB39A$1-AB39X$1](11-20);
     %c1#3%
     [AB39A$1-AB39X$1](21-30);
     %c1#4%
     [AB39A$1-AB39X$1](31-40);
     
  MODEL c2:
     %c2#1%
     [GA32A$1-GA33L$1](1-10);
     %c2#2%
     [GA32A$1-GA33L$1](11-20);
     %c2#3%
     [GA32A$1-GA33L$1](21-30);
     %c2#4%
     [GA32A$1-GA33L$1](31-40);
     
  MODEL c3:
     %c3#1%
     [KA46A$1-KA47L$1](1-10);
     %c3#2%
     [KA46A$1-KA47L$1](11-20);
     %c3#3%
     [KA46A$1-KA47L$1](21-30);
     %c3#4%
     [KA46A$1-KA47L$1](31-40);",

  OUTPUT = "svalues;", 
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)

lta_inv_fit <- mplusModeler(lta_inv,
                     dataout=here("phase_2", "lta.dat"),
                     modelout=here("phase_2", "invariant_lta.inp"),
                     check=TRUE, run = TRUE, hashfilename = FALSE)

This model yields the measurement structure used for all downstream auxiliary-variable analyses if invariance is retained.


Plotting the Non-Invariant and Invariant Models

Class probability plots are generated for both the non-invariant and invariant solutions to support visual diagnostic evaluation of the invariance constraints.

The plots are used to verify:

  • preservation of overall class shape,

  • absence of class collapse or reversals,

  • stability of class ordering, and

  • absence of constraint-induced distortion.

Syntax Walkthrough: Plotting Non-Invariant and Invariant Models

Key elements:
readModels() loads both .out files from the Phase 2 directory.
plot_lta() (from your functions/ directory) is then used to generate class probability plots for each model.
• Calling plot_lta(inv_mod$invariant.lta.out) displays invariant classes; plot_lta(inv_mod$non.invariant.lta.out) displays the non-invariant ones.
• This visualization allows quick assessment of whether constraining item thresholds alters class shapes in a meaningful way.

These plots support the conceptual justification for moving forward with invariance—or identifying where partial invariance might be required.

source(here("functions","plot_lta.R"))

inv_mod <- readModels(here("phase_2"), quiet = TRUE)

plot_lta(inv_mod$invariant_lta.out)

plot_lta(inv_mod$noninvariant_lta.out)

Diagnostic Summary: Class Stability Under Invariance Constraints

  • The invariant model reproduces the same four-class structural configuration as the non-invariant model.

  • No meaningful differences under invariance constraints.

  • The constrained classes remain visually interpretable and structurally aligned with the unconstrained solution.

  • Minor item-level deviations observed in the non-invariant model do not change the interpretation of the classes. These diagnostics indicate that imposing full longitudinal measurement invariance does not compromise class recoverability or structural interpretability.


Nested Model Testing: Satorra–Bentler Scaled χ² Difference Test

A Satorra–Bentler scaled chi-square difference test is used as one way to compare nested model. The Satorra-Bentler adjustment is needed because models were estimated using ML with robust standard errors (MLR):

  • the fully invariant model (nested), and

  • the non-invariant model (comparison).

This test evaluates whether the invariance constraints introduce statistically meaningful degradation in model fit.

Syntax Walkthrough: Satorra–Bentler Scaled χ² Difference Test

This section computes the adjusted chi-square difference for nested model comparison between the invariant and non-invariant LTA models.

Key elements:
readModels() extracts log-likelihoods (LL), number of parameters (Parameters), and scaling correction factors (LLCorrectionFactor) for each model.
• The likelihood-ratio term is computed using lr = -2*(L0 - L1).
• The adjusted scaling factor is computed using cd = ((p0*c0) - (p1*c1)) / (p0 - p1).
• The Satorra–Bentler test statistic is then TRd = lr / cd.
• Degrees of freedom are computed as abs(p0 - p1).
• The final p-value is obtained via pchisq(TRd, df, lower.tail = FALSE).

A non-significant p-value indicates that the invariant model does not fit significantly worse than the non-invariant model, supporting the decision to carry invariant measurement parameters forward into auxiliary-variable modeling.

# *0 = null or nested model & *1 = comparison  or parent model
lta_models <- readModels(here("phase_2"), quiet = TRUE)

# Log Likelihood Values
L0 <- lta_models$invariant_lta.out$summaries$LL
L1 <- lta_models$noninvariant_lta.out$summaries$LL

# LRT equation
lr <- -2*(L0-L1) 

# Parameters
p0 <- lta_models$invariant_lta.out$summaries$Parameters
p1 <- lta_models$noninvariant_lta.out$summaries$Parameters

# Scaling Correction Factors
c0 <- lta_models$invariant_lta.out$summaries$LLCorrectionFactor
c1 <- lta_models$noninvariant_lta.out$summaries$LLCorrectionFactor

# Difference Test Scaling correction
cd <- ((p0*c0)-(p1*c1))/(p0-p1)

# Chi-square difference test(TRd)
TRd <- (lr)/(cd)

# Degrees of freedom
df <- abs(p0 - p1)

# Significance test
(p_diff <- pchisq(TRd, df, lower.tail=FALSE))
## [1] 2.294394e-40

Diagnostic Summary: Nested Model Test of Invariance

  • The Satorra–Bentler scaled difference test indicates a statistically significant likelihood difference between the non-invariant and invariant models. This implies that, based on this test, there is not support of the fully invariant model.

  • Given the large sample size, this result is not surprising and reflects the high sensitivity of likelihood-based tests to even trivial deviations from invariance.

  • Visual class diagnostics and structural stability assessments indicate that the invariant model preserves the same four-class configuration without substantive distortion.

Taken together, despite the significant LL tests, the statistical and structural diagnostics support retention of the fully invariant model for longitudinal transition modeling.

Alternatively: Nested Model Testing Using compareModels in MplusAutomation

Nested model comparisons can also be conducted directly through the compareModels() function in MplusAutomation, which automates the extraction of likelihood values, parameters, and scaling correction factors needed for robust chi-square difference testing under MLR estimation.

invisible(capture.output(
  comparison <- compareModels(lta_models$invariant_lta.out, lta_models$noninvariant_lta.out, diffTest=TRUE)
))

# p-value
comparison$diffTest$MLR_LL$p
## [1] 2.294394e-40
#BIC
comparison$summaries
##                     Title Observations Estimator Parameters        LL      AIC
## 1     Invariant LTA Model         1907       MLR         49 -23700.74 47499.48
## 2 Non-Invariant LTA Model         1907       MLR        129 -23492.12 47242.24
##        BIC
## 1 47771.59
## 2 47958.62

Diagnostic Summary: compareModels

  • Results from compareModels() indicate that the scaled chi-square difference between the non-invariant and invariant models is statistically significant (p < .001)

  • However, the BIC supports the fully invariant model.

  • These results are shown in the first set of output.

Completion of Phase 2

Phase 2 evaluates whether a common longitudinal measurement structure can be defensibly imposed across Grades 7, 10, and 12. Visual diagnostics and nested model testing jointly indicate that a fully invariant four-class measurement model is supported and full invariance does not cause class distortions or instability. With longitudinal measurement invariance validated for the purposes of auxiliary-variable modeling, the workflow proceeds to Phase 3, where invariant measurement parameters are carried forward to generate classification-error–corrected posterior probabilities and implement the ML three-step and BCH auxiliary-variable procedures.


Phase 3: Apply Multi-Step Estimation Methods


ML Three-Step Procedure

The ML three-step method is a model-based correction procedure that allows auxiliary variables to be incorporated into latent class and latent transition models while accounting for classification uncertainty. The method proceeds by:

  1. generating posterior class membership under a fixed measurement model,

  2. translating classification uncertainty into fixed logits, and

  3. estimating structural relationships while holding the measurement model constant.

This approach ensures that auxiliary-variable effects are estimated relative to stable, invariant latent classes.


Purpose

Phase 3 implements multi-step estimation methods for incorporating auxiliary variables into the latent transition analysis while preserving the invariant measurement structure established in Phase 2. The central objective of this phase is to estimate:

  • longitudinal transition dynamics across Grades 7, 10, and 12, and

  • covariate-adjusted effects on class membership and transitions,

without allowing auxiliary variables to distort the latent class definitions.

This phase focuses first on the ML three-step procedure, which corrects for classification error by fixing logits derived from an invariant measurement model. This ensures that auxiliary variables influence the structural components of the model (class membership and transitions), but do not redefine the latent classes themselves.

A parallel BCH-based estimation strategy is introduced later as an alternative auxiliary-variable framework under invariant measurement.


What This Phase Produces

Phase 3 produces the following structural outputs:

  • Fixed-threshold LCAs at each timepoint

  • Posterior class probabilities and modal class assignments under invariant measurement

  • Classification-error logits for all three waves

  • A baseline (unconditional) LTA transition model

  • A covariate-adjusted LTA transition model

  • Longitudinal transition probability plots

  • Covariate effect estimates on class membership and transitions

These outputs collectively establish a classification-error–corrected longitudinal transition model under invariant measurement.


Workflow Overview

This phase proceeds in three structural steps:

  1. Re-estimate wave-specific LCAs with invariant thresholds fixed

  2. Extract classification-error logits and merge posterior data

  3. Estimate unconditional and covariate-adjusted LTAs with fixed logits

Each step transfers information from the invariant measurement model into the longitudinal structural model.


ML STEP 1: Re-Estimate Wave-Specific LCAs With Invariant Thresholds Fixed

In Step 1, each single-wave LCA is re-estimated while fixing all item–class thresholds to the invariant values obtained in Phase 2. These fixed thresholds enforce the invariant measurement model at each wave, ensuring that class meaning is identical across time.

With thresholds fixed:

  • only class proportions are freely estimated, and

  • each model saves:

    • posterior class probabilities (CPROB), and

    • most-likely class assignments (MLCC).

These saved quantities form the basis for computing classification-error logits in Step 2.

Conceptual role of Step 1:
To regenerate posterior classification under invariant measurement conditions so that classification uncertainty can be carried forward into transition modeling.

Syntax Walkthrough: Step 1 – Re-Estimate Wave-Specific LCAs With Fixed Thresholds

In Step 1, each wave’s LCA is re-estimated while fixing all threshold parameters to the invariant values obtained in Phase 2. This enforces the invariant measurement model at each wave and ensures that class meaning is identical across time.

Key elements:
• All thresholds are fixed using invariant values (checked earlier in Phase 2).
• With thresholds fixed, only class proportions are estimated.
• Each model uses starts = 0 because no random-search is needed when parameters are fixed.
• Each model saves posterior information using Save = cprob;, producing variables like CPROB1CPROB4 and MLCC.
• These saved data are required for logits extraction in Step 2.
• The fixed-threshold blocks consist exclusively of fixed statements such as [ab39a$1*-1.53361], ensuring measurement invariance.
• The only difference across waves is in the indicator sets (AB*, GA*, KA*). • OPTIONAL: We have rearranged the class labels in Step 1 by renaming the class specific statements (e.g., %C1#2%)


Time 1: Fixed-Threshold LCA With Saved CPROBS

inv_t1  <- mplusObject(                 
    
  TITLE = "Time 1 - Invariant LCA", 
  
  VARIABLE =
   "idvariable=CASENUM;
    categorical = AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X; 
         usevar = AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X; 
    
    classes = c1(4);
   
    auxiliary = MINORITY, FEMALE, MATHG7, MATHG10, MATHG12;",
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 0;",
  
  MODEL = 
     "%overall%

     %C1#1%

     [ ab39a$1@-1.53313 ] (1);
     [ ab39h$1@-2.90401 ] (2);
     [ ab39i$1@-2.99575 ] (3);
     [ ab39k$1@-2.83359 ] (4);
     [ ab39l$1@-3.85594 ] (5);
     [ ab39m$1@-2.16372 ] (6);
     [ ab39t$1@-2.13876 ] (7);
     [ ab39u$1@-2.48348 ] (8);
     [ ab39w$1@-1.81515 ] (9);
     [ ab39x$1@-2.35163 ] (10);

     %C1#3%

     [ ab39a$1@-0.06349 ] (11);
     [ ab39h$1@-0.07976 ] (12);
     [ ab39i$1@-0.52495 ] (13);
     [ ab39k$1@-0.01216 ] (14);
     [ ab39l$1@0.25484 ] (15);
     [ ab39m$1@-0.83350 ] (16);
     [ ab39t$1@0.07961 ] (17);
     [ ab39u$1@-0.45274 ] (18);
     [ ab39w$1@0.51840 ] (19);
     [ ab39x$1@0.22569 ] (20);

     %C1#2%

     [ ab39a$1@-0.88292 ] (21);
     [ ab39h$1@-1.67761 ] (22);
     [ ab39i$1@-0.87642 ] (23);
     [ ab39k$1@-1.94409 ] (24);
     [ ab39l$1@-2.31259 ] (25);
     [ ab39m$1@0.42887 ] (26);
     [ ab39t$1@2.29606 ] (27);
     [ ab39u$1@1.02908 ] (28);
     [ ab39w$1@1.97179 ] (29);
     [ ab39x$1@1.80988 ] (30);

     %C1#4%

     [ ab39a$1@0.74744 ] (31);
     [ ab39h$1@2.01287 ] (32);
     [ ab39i$1@1.67957 ] (33);
     [ ab39k$1@1.54221 ] (34);
     [ ab39l$1@2.34965 ] (35);
     [ ab39m$1@1.38217 ] (36);
     [ ab39t$1@3.73153 ] (37);
     [ ab39u$1@3.09927 ] (38);
     [ ab39w$1@3.69495 ] (39);
     [ ab39x$1@3.68803 ] (40);",
  
  OUTPUT = "TECH1 TECH8 TECH14;",
  
  SAVEDATA = 
   "File=t1_inv_cprobs.dat;
    Save=cprob;",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)


inv_t1_fit <- mplusModeler(inv_t1,
                 dataout=here("phase_3","step_1_ml", "t1_inv.dat"), 
                 modelout=here("phase_3","step_1_ml", "t1_inv_lca.inp"),
                 check=TRUE, run = TRUE, hashfilename = FALSE)

Time 2: Fixed-Threshold LCA With Saved CPROBS

inv_t2  <- mplusObject(                 
    
  TITLE = "Time 2 - Invariant LCA", 
  
  VARIABLE =
   "idvariable=CASENUM;
    categorical = GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L;
         usevar = GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L;
    
    classes = c2(4);
   
    auxiliary = MINORITY, FEMALE, MATHG7, MATHG10, MATHG12;",
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 0;",
  
  MODEL = 
     "%overall%

     %C2#1%

     [ ga32a$1@-1.53313 ] (1);
     [ ga32h$1@-2.90401 ] (2);
     [ ga32i$1@-2.99575 ] (3);
     [ ga32k$1@-2.83359 ] (4);
     [ ga32l$1@-3.85594 ] (5);
     [ ga33a$1@-2.16372 ] (6);
     [ ga33h$1@-2.13876 ] (7);
     [ ga33i$1@-2.48348 ] (8);
     [ ga33k$1@-1.81515 ] (9);
     [ ga33l$1@-2.35163 ] (10);

     %C2#3%

     [ ga32a$1@-0.06349 ] (11);
     [ ga32h$1@-0.07976 ] (12);
     [ ga32i$1@-0.52495 ] (13);
     [ ga32k$1@-0.01216 ] (14);
     [ ga32l$1@0.25484 ] (15);
     [ ga33a$1@-0.83350 ] (16);
     [ ga33h$1@0.07961 ] (17);
     [ ga33i$1@-0.45274 ] (18);
     [ ga33k$1@0.51840 ] (19);
     [ ga33l$1@0.22569 ] (20);

     %C2#2%

     [ ga32a$1@-0.88292 ] (21);
     [ ga32h$1@-1.67761 ] (22);
     [ ga32i$1@-0.87642 ] (23);
     [ ga32k$1@-1.94409 ] (24);
     [ ga32l$1@-2.31259 ] (25);
     [ ga33a$1@0.42887 ] (26);
     [ ga33h$1@2.29606 ] (27);
     [ ga33i$1@1.02908 ] (28);
     [ ga33k$1@1.97179 ] (29);
     [ ga33l$1@1.80988 ] (30);

     %C2#4%

     [ ga32a$1@0.74744 ] (31);
     [ ga32h$1@2.01287 ] (32);
     [ ga32i$1@1.67957 ] (33);
     [ ga32k$1@1.54221 ] (34);
     [ ga32l$1@2.34965 ] (35);
     [ ga33a$1@1.38217 ] (36);
     [ ga33h$1@3.73153 ] (37);
     [ ga33i$1@3.09927 ] (38);
     [ ga33k$1@3.69495 ] (39);
     [ ga33l$1@3.68803 ] (40);
",
  
  OUTPUT = "TECH1 TECH8 TECH14;",
  
  SAVEDATA = 
   "File=t2_inv_cprobs.dat;
    Save=cprob;",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)


inv_t2_fit <- mplusModeler(inv_t2,
                 dataout=here("phase_3","step_1_ml", "t2_inv.dat"), 
                 modelout=here("phase_3","step_1_ml", "t2_inv_lca.inp"),
                 check=TRUE, run = TRUE, hashfilename = FALSE)

Time 3: Fixed-Threshold LCA With Saved CPROBS

inv_t3  <- mplusObject(                 
    
  TITLE = "Time 3 - Invariant LCA", 
  
  VARIABLE =
   "idvariable=CASENUM;
    categorical = KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;
         usevar = KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;
    
    classes = c3(4);
   
    auxiliary = MINORITY, FEMALE, MATHG7, MATHG10, MATHG12;",
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 0;   ",
  
    MODEL = 
     "%overall%

     %C3#1%

     [ ka46a$1@-1.53313 ] (1);
     [ ka46h$1@-2.90401 ] (2);
     [ ka46i$1@-2.99575 ] (3);
     [ ka46k$1@-2.83359 ] (4);
     [ ka46l$1@-3.85594 ] (5);
     [ ka47a$1@-2.16372 ] (6);
     [ ka47h$1@-2.13876 ] (7);
     [ ka47i$1@-2.48348 ] (8);
     [ ka47k$1@-1.81515 ] (9);
     [ ka47l$1@-2.35163 ] (10);

     %C3#2%

     [ ka46a$1@-0.06349 ] (11);
     [ ka46h$1@-0.07976 ] (12);
     [ ka46i$1@-0.52495 ] (13);
     [ ka46k$1@-0.01216 ] (14);
     [ ka46l$1@0.25484 ] (15);
     [ ka47a$1@-0.83350 ] (16);
     [ ka47h$1@0.07961 ] (17);
     [ ka47i$1@-0.45274 ] (18);
     [ ka47k$1@0.51840 ] (19);
     [ ka47l$1@0.22569 ] (20);

     %C3#3%

     [ ka46a$1@-0.88292 ] (21);
     [ ka46h$1@-1.67761 ] (22);
     [ ka46i$1@-0.87642 ] (23);
     [ ka46k$1@-1.94409 ] (24);
     [ ka46l$1@-2.31259 ] (25);
     [ ka47a$1@0.42887 ] (26);
     [ ka47h$1@2.29606 ] (27);
     [ ka47i$1@1.02908 ] (28);
     [ ka47k$1@1.97179 ] (29);
     [ ka47l$1@1.80988 ] (30);

     %C3#4%

     [ ka46a$1@0.74744 ] (31);
     [ ka46h$1@2.01287 ] (32);
     [ ka46i$1@1.67957 ] (33);
     [ ka46k$1@1.54221 ] (34);
     [ ka46l$1@2.34965 ] (35);
     [ ka47a$1@1.38217 ] (36);
     [ ka47h$1@3.73153 ] (37);
     [ ka47i$1@3.09927 ] (38);
     [ ka47k$1@3.69495 ] (39);
     [ ka47l$1@3.68803 ] (40);",
  
  OUTPUT = "TECH1 TECH8 TECH14;",
  
  SAVEDATA = 
   "File=t3_inv_cprobs.dat;
    Save=cprob;",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)


inv_t3_fit <- mplusModeler(inv_t3,
                 dataout=here("phase_3","step_1_ml", "t3_inv.dat"), 
                 modelout=here("phase_3","step_1_ml", "t3_inv_lca.inp"),
                 check=TRUE, run = TRUE, hashfilename = FALSE)

Plotting the Fixed-Threshold LCAs

After fitting the fixed-threshold LCAs, class probabilities are visualized to confirm that invariant measurement has been correctly transferred into each wave’s model.

Syntax Walkthrough: Plotting the Fixed-Threshold LCAs

After Step 1 models are estimated, the three invariant LCAs are visualized to confirm that fixing thresholds produces class probabilities that remain aligned across all timepoints.

Key elements:
• Uses plot_lca() applied to each wave’s .out file.
• Side-by-side display checks that class shapes remain consistent with the invariant structure.
• This is a diagnostic step before logits extraction.

source(here("functions","plot_lca.R"))

t1 <- readModels(here("phase_3","step_1_ml","t1_inv_lca.out"))
t2 <- readModels(here("phase_3","step_1_ml","t2_inv_lca.out"))
t3 <- readModels(here("phase_3","step_1_ml","t3_inv_lca.out"))

(plot_lca(t1) | plot_lca(t2) | plot_lca(t3))

Diagnostic Summary: Fixed-Threshold Measurement Verification

  • The fixed-threshold LCAs reproduce the same four-class structure at all three timepoints.

  • Relative ordering of class endorsement remains stable across Grades 7, 10, and 12.

  • Class proportions remain substantively stable under invariant specification.

These diagnostics confirm that fixing invariant thresholds does not compromise class recoverability and that the invariant measurement model is correctly transferred into the three-step framework.


ML STEP 2: Extract Logits and Merge Saved Posterior Data

Step 2 reconstructs the classification-error structure implied by the invariant LCAs and prepares the dataset required for three-step LTA estimation.

Two components are extracted:

  1. Classification-error logits, which encode misclassification uncertainty and will be fixed in all subsequent LTA models.

  2. Posterior and modal classification data, where saved CPROB and MLCC variables from each wave are merged into a single wide dataset (savedata_t123) containing:

    • nominal class indicators (n1, n2, n3), and

    • posterior class probabilities at each timepoint.

Conceptual role of Step 2:
To transfer probabilistic classification uncertainty from the measurement model into the structural transition model without allowing auxiliary variables to distort class definitions.


Syntax Walkthrough: Step 2 – Extract Logits and Merge Posterior Data

Step 2 extracts classification-error logits and merges the saved CPROB and MLCC variables across all waves.

Key elements:
• Logits for each timepoint are extracted from class_counts$logitProbs.mostLikely.
• These logits encode misclassification uncertainty and will be fixed in all later LTA models.
• Saved CPROB and MLCC variables are renamed (e.g., cprob11, n1) for consistency across waves.
• Data are merged by CASENUM into a single wide dataset (savedata_t123).
• Missing values coded as 999 are converted to proper NAs.
• This merged dataset provides the nominal class indicators (n1, n2, n3) required for LTA estimation.


Extract Classification-Error Logits

inv_step4 <- readModels(here("phase_3","step_1_ml"), quiet = TRUE)

logits_t1 <- as.data.frame(inv_step4$t1_inv_lca.out$class_counts$logitProbs.mostLikely)
logits_t2 <- as.data.frame(inv_step4$t2_inv_lca.out$class_counts$logitProbs.mostLikely)
logits_t3 <- as.data.frame(inv_step4$t3_inv_lca.out$class_counts$logitProbs.mostLikely)

Extract and Merge Saved Posterior Probabilities

savedata_t1 <- as.data.frame(inv_step4$t1_inv_lca.out$savedata) %>%
  rename(cprob11=CPROB1,cprob12=CPROB2,cprob13=CPROB3,cprob14=CPROB4,n1=MLCC1)

savedata_t2 <- as.data.frame(inv_step4$t2_inv_lca.out$savedata) %>%
  rename(cprob21=CPROB1,cprob22=CPROB2,cprob23=CPROB3,cprob24=CPROB4,n2=MLCC2)

savedata_t3 <- as.data.frame(inv_step4$t3_inv_lca.out$savedata) %>%
  rename(cprob31=CPROB1,cprob32=CPROB2,cprob33=CPROB3,cprob34=CPROB4,n3=MLCC3)

savedata_t123 <- savedata_t1 %>%
  full_join(savedata_t2, by = "CASENUM") %>%
  full_join(savedata_t3, by = "CASENUM") %>%
  clean_names() %>% relocate(casenum)

summary(savedata_t123)
##     casenum         ab39a            ab39h            ab39i      
##  Min.   :1004   Min.   :0.0000   Min.   :0.0000   Min.   :0.000  
##  1st Qu.:2500   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.000  
##  Median :3973   Median :1.0000   Median :1.0000   Median :1.000  
##  Mean   :3991   Mean   :0.6881   Mean   :0.7218   Mean   :0.653  
##  3rd Qu.:5505   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.000  
##  Max.   :6937   Max.   :1.0000   Max.   :1.0000   Max.   :1.000  
##                 NA's   :25       NA's   :56       NA's   :60     
##      ab39k            ab39l            ab39m            ab39t       
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:1.0000   1st Qu.:1.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :1.0000   Median :1.0000   Median :1.0000   Median :0.0000  
##  Mean   :0.7681   Mean   :0.7512   Mean   :0.6236   Mean   :0.3984  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##  NA's   :53       NA's   :50       NA's   :34       NA's   :67      
##      ab39u            ab39w            ab39x          minority_x    
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.4924   Mean   :0.3996   Mean   :0.4645   Mean   :0.1704  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##  NA's   :57       NA's   :50       NA's   :34       NA's   :105     
##     female_x         mathg7_x       mathg10_x       mathg12_x    
##  Min.   :0.0000   Min.   :28.93   Min.   :30.63   Min.   :27.01  
##  1st Qu.:0.0000   1st Qu.:44.84   1st Qu.:58.58   1st Qu.:63.40  
##  Median :1.0000   Median :53.09   Median :67.71   Median :72.19  
##  Mean   :0.5175   Mean   :52.58   Mean   :66.15   Mean   :71.20  
##  3rd Qu.:1.0000   3rd Qu.:59.86   3rd Qu.:75.27   3rd Qu.:81.42  
##  Max.   :1.0000   Max.   :85.07   Max.   :95.26   Max.   :99.30  
##  NA's   :21       NA's   :42      NA's   :532     NA's   :1085   
##     cprob11          cprob12          cprob13          cprob14      
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0020   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0030   1st Qu.:0.0090   1st Qu.:0.0000  
##  Median :0.0080   Median :0.1405   Median :0.0640   Median :0.0000  
##  Mean   :0.3169   Mean   :0.3572   Mean   :0.2105   Mean   :0.1155  
##  3rd Qu.:0.8530   3rd Qu.:0.8210   3rd Qu.:0.2752   3rd Qu.:0.0130  
##  Max.   :0.9980   Max.   :0.9950   Max.   :0.9990   Max.   :0.9960  
##  NA's   :21       NA's   :21       NA's   :21       NA's   :21      
##        n1            ga32a            ga32h            ga32i       
##  Min.   :1.000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :2.000   Median :1.0000   Median :1.0000   Median :1.0000  
##  Mean   :2.093   Mean   :0.6299   Mean   :0.6493   Mean   :0.6862  
##  3rd Qu.:3.000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :4.000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##  NA's   :21      NA's   :375      NA's   :387      NA's   :390     
##      ga32k           ga32l            ga33a            ga33h       
##  Min.   :0.000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :1.000   Median :1.0000   Median :1.0000   Median :0.0000  
##  Mean   :0.679   Mean   :0.6466   Mean   :0.5917   Mean   :0.4347  
##  3rd Qu.:1.000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##  NA's   :390     NA's   :382      NA's   :381      NA's   :391     
##      ga33i            ga33k            ga33l          minority_y    
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :1.0000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.5251   Mean   :0.4289   Mean   :0.4199   Mean   :0.1632  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##  NA's   :391      NA's   :389      NA's   :383      NA's   :430     
##     female_y         mathg7_y       mathg10_y       mathg12_y    
##  Min.   :0.0000   Min.   :28.93   Min.   :30.63   Min.   :27.01  
##  1st Qu.:0.0000   1st Qu.:46.10   1st Qu.:59.09   1st Qu.:64.00  
##  Median :1.0000   Median :53.57   Median :67.86   Median :72.56  
##  Mean   :0.5254   Mean   :53.20   Mean   :66.41   Mean   :71.63  
##  3rd Qu.:1.0000   3rd Qu.:60.09   3rd Qu.:75.30   3rd Qu.:81.73  
##  Max.   :1.0000   Max.   :85.07   Max.   :95.26   Max.   :99.30  
##  NA's   :373      NA's   :390     NA's   :557     NA's   :1128   
##     cprob21          cprob22          cprob23          cprob24      
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0020   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0055   1st Qu.:0.0000  
##  Median :0.0035   Median :0.0110   Median :0.0430   Median :0.0000  
##  Mean   :0.3343   Mean   :0.2391   Mean   :0.2287   Mean   :0.1979  
##  3rd Qu.:0.9640   3rd Qu.:0.4000   3rd Qu.:0.3210   3rd Qu.:0.1230  
##  Max.   :0.9980   Max.   :0.9920   Max.   :0.9990   Max.   :0.9980  
##  NA's   :373      NA's   :373      NA's   :373      NA's   :373     
##        n2            ka46a           ka46h            ka46i       
##  Min.   :1.000   Min.   :0.000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:1.000   1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :2.000   Median :1.000   Median :1.0000   Median :1.0000  
##  Mean   :2.275   Mean   :0.567   Mean   :0.6733   Mean   :0.7121  
##  3rd Qu.:3.000   3rd Qu.:1.000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :4.000   Max.   :1.000   Max.   :1.0000   Max.   :1.0000  
##  NA's   :373     NA's   :787     NA's   :796      NA's   :799     
##      ka46k            ka46l            ka47a           ka47h      
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.000   Min.   :0.000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:0.000  
##  Median :1.0000   Median :1.0000   Median :1.000   Median :0.000  
##  Mean   :0.6109   Mean   :0.6513   Mean   :0.552   Mean   :0.485  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.000   3rd Qu.:1.000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.000   Max.   :1.000  
##  NA's   :802      NA's   :803      NA's   :802     NA's   :808    
##      ka47i            ka47k            ka47l           minority     
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :1.0000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.5645   Mean   :0.3831   Mean   :0.4433   Mean   :0.1513  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##  NA's   :807      NA's   :808      NA's   :804      NA's   :823     
##      female           mathg7         mathg10         mathg12     
##  Min.   :0.0000   Min.   :28.93   Min.   :30.63   Min.   :27.01  
##  1st Qu.:0.0000   1st Qu.:47.05   1st Qu.:60.66   1st Qu.:63.95  
##  Median :1.0000   Median :54.23   Median :68.83   Median :72.31  
##  Mean   :0.5169   Mean   :54.05   Mean   :67.49   Mean   :71.59  
##  3rd Qu.:1.0000   3rd Qu.:60.80   3rd Qu.:76.10   3rd Qu.:81.66  
##  Max.   :1.0000   Max.   :85.07   Max.   :95.26   Max.   :99.30  
##  NA's   :785      NA's   :793     NA's   :955     NA's   :1137   
##     cprob31          cprob32          cprob33          cprob34      
##  Min.   :0.0000   Min.   :0.0020   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0020   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0060   Median :0.0400   Median :0.0060   Median :0.0000  
##  Mean   :0.3511   Mean   :0.2298   Mean   :0.2066   Mean   :0.2125  
##  3rd Qu.:0.9650   3rd Qu.:0.3095   3rd Qu.:0.2480   3rd Qu.:0.2005  
##  Max.   :0.9980   Max.   :0.9990   Max.   :0.9910   Max.   :0.9980  
##  NA's   :785      NA's   :785      NA's   :785      NA's   :785     
##        n3       
##  Min.   :1.000  
##  1st Qu.:1.000  
##  Median :2.000  
##  Mean   :2.291  
##  3rd Qu.:3.000  
##  Max.   :4.000  
##  NA's   :785
#describe(savedata_t123)

Diagnostic Summary: Classification-Error Transfer

  • Classification-error logits were successfully extracted for all three timepoints.

  • Posterior class probabilities (CPROB) and modal assignments (MLCC) were merged into a unified wide dataset.

  • Nominal class indicators (n1, n2, n3) now encode probabilistic class membership with known misclassification uncertainty.

  • Identifier alignment and missing-value handling were verified.

These diagnostics confirm that classification uncertainty has been properly reconstructed for three-step LTA estimation.


ML STEP 3a: Estimate the Unconditional LTA Model With Fixed Logits

In Step 3a, the baseline longitudinal transition model is estimated under the ML three-step framework.

In this model:

  • the nominal indicators (n1, n2, n3) act as imperfect measurements of the latent class variables (c1, c2, c3),

  • all classification-error logits are fixed to their Step 2 values, and

  • only the structural transition regressions (c2 ON c1, c3 ON c2) are freely estimated.

By fixing the logits, this model produces transition estimates that are explicitly corrected for classification error while preserving invariant measurement.

Conceptual role of Step 3a:
To establish the baseline pattern of latent transitions before introducing auxiliary variables.


The purpose of this step is to confirm that class sizes and transition patterns match expectations from the earlier models before adding auxiliary variables.


Syntax Walkthrough: Step 3a – Unconditional LTA With Fixed Logits

Step 3a estimates the unconditional transition model with logits fixed.

Key elements:
• The nominal indicators are declared using nominal = n1 n2 n3;.
• Logits are fixed using statements such as [n1#1@{logits_t1[1,1]}].
• Only the structural regression paths (c2 ON c1;, c3 ON c2;) are estimated—no measurement parameters vary.
• Fixing logits ensures classification error is properly accounted for, and class definitions remain identical to those in the invariant model.
• This model establishes the baseline transition structure before adding covariates.

unc_LTA  <- mplusObject(
  TITLE = "Step3a of 3-step", 
  
  VARIABLE = 
 "idvariable =casenum;
 
  nominal=n1 n2 n3;
  
  usevar = n1 n2 n3;

  classes = c1(4) c2(4) c3(4);
 
  auxiliary = MINORITY, FEMALE, MATHG7, MATHG10, MATHG12;",
  
  ANALYSIS = 
 "estimator = mlr; 
  type = mixture; 
  starts = 0;",
  
  MODEL = glue(
  "%overall%
  c2 on c1; 
  c3 on c2;

  Model c1:
    %c1#1%
      [n1#1@{logits_t1[1,1]}];
      [n1#2@{logits_t1[1,2]}];
      [n1#3@{logits_t1[1,3]}];
    %c1#2%
      [n1#1@{logits_t1[2,1]}];
      [n1#2@{logits_t1[2,2]}];
      [n1#3@{logits_t1[2,3]}];
    %c1#3%
      [n1#1@{logits_t1[3,1]}];
      [n1#2@{logits_t1[3,2]}];
      [n1#3@{logits_t1[3,3]}];
    %c1#4%
      [n1#1@{logits_t1[4,1]}];
      [n1#2@{logits_t1[4,2]}];
      [n1#3@{logits_t1[4,3]}];
 
  Model c2:
    %c2#1%
      [n2#1@{logits_t2[1,1]}];
      [n2#2@{logits_t2[1,2]}];
      [n2#3@{logits_t2[1,3]}];
    %c2#2%
      [n2#1@{logits_t2[2,1]}];
      [n2#2@{logits_t2[2,2]}];
      [n2#3@{logits_t2[2,3]}];
    %c2#3%
      [n2#1@{logits_t2[3,1]}];
      [n2#2@{logits_t2[3,2]}];
      [n2#3@{logits_t2[3,3]}];
    %c2#4%
      [n2#1@{logits_t2[4,1]}];
      [n2#2@{logits_t2[4,2]}];
      [n2#3@{logits_t2[4,3]}];
  
  Model c3:
    %c3#1%
      [n3#1@{logits_t3[1,1]}];
      [n3#2@{logits_t3[1,2]}];
      [n3#3@{logits_t3[1,3]}];
    %c3#2%
      [n3#1@{logits_t3[2,1]}];
      [n3#2@{logits_t3[2,2]}];
      [n3#3@{logits_t3[2,3]}];
    %c3#3%
      [n3#1@{logits_t3[3,1]}];
      [n3#2@{logits_t3[3,2]}];
      [n3#3@{logits_t3[3,3]}];
    %c3#4%
      [n3#1@{logits_t3[4,1]}];
      [n3#2@{logits_t3[4,2]}];
      [n3#3@{logits_t3[4,3]}];"),
 
  OUTPUT = "svalues;",
  
  usevariables = colnames(savedata_t123), 
  rdata = savedata_t123)

unc_LTA_fit <- mplusModeler(unc_LTA, 
                 dataout=here("phase_3","step_3a_ml", "t123_LTA.dat"), 
                 modelout=here("phase_3","step_3a_ml", "unconditional_LTA.inp"), 
                 check=TRUE, run = TRUE, hashfilename = FALSE)

Plotting Transition Probabilities (Unconditional Model)

Plotting the unconditional model helps verify that transition patterns are stable, interpretable, and consistent with expectations from earlier phases. This serves as a diagnostic check prior to adding auxiliary variables.


Syntax Walkthrough: Plotting Transition Probabilities (Unconditional Model)

The unconditional LTA transition plot is used to confirm that movement between classes is interpretable and aligns with expectations from previous phases.

Key elements:
• Uses plot_transition() and custom class/timepoint labels.
• Provides a clear depiction of how students move across latent classes from Grade 7 to Grade 12.
• Helps verify that transitions remain coherent before adding auxiliary variables.

source(here("functions","plot_transition.R"))

lta_model <- readModels(here("phase_3","step_3a_ml", "unconditional_LTA.out"))

plot_transition(
  model_name = lta_model,
  facet_labels = c(`1` = "Very Positive", `2` = "Qualified Positive", `3` = "Neutral", `4` = "Less Positive"),
  timepoint_labels = c(`1` = "Grade 7", `2` = "Grade 10", `3` = "Grade 12"),
  class_labels = c(`1` = "Very Positive", `2` = "Qualified Positive", `3` = "Neutral", `4` = "Less Positive")
) +
  labs(title = "Unconditional LTA (ML Three-Step Method)")

ggsave(here("figures", "unconditional_lta_ml.jpeg"), dpi="retina", height = 6, width = 10, bg = "white",  units="in")

Diagnostic Summary: Unconditional Transition Structure

  • Transition probabilities exhibit substantial longitudinal stability, particularly within the Very Positive and Qualified Positive classes.

  • The Neutral class shows the greatest fluidity, with transitions distributed across adjacent classes.

  • The Less Positive class demonstrates modest upward mobility across waves.

  • No degenerate or implausible transition patterns are observed.

These results confirm that longitudinal transitions are structurally coherent under the invariant three-step framework and provide a valid baseline prior to introducing covariates.


ML STEP 3b: Estimate the Covariate-Adjusted LTA Model With Fixed Logits

In Step 3b, auxiliary variables are added as predictors of:

  • baseline class membership, and

  • longitudinal transitions.

The classification-error logits remain fixed. As a result:

  • covariates may influence class membership and transitions,

  • but cannot modify the measurement structure or redefine latent classes.

This guarantees that covariate effects are interpreted relative to a stable, invariant latent class framework.

Syntax Walkthrough: Step 3b – Covariate-Adjusted LTA (Fixed Logits)

In Step 3b, covariates are added while logits remain fixed, preserving the measurement model.

Key elements:
• Covariates predict baseline class membership and transitions using statements like c1 ON MINORITY FEMALE.
• Logits remain fixed, so covariates cannot redefine classes or alter thresholds.
• This separation—covariates affect transitions, not measurement—is the core benefit of the ML three-step method.
• The output provides effects of each predictor relative to the reference class (“Less Positive”).

con_LTA  <- mplusObject(
  TITLE = "Step4b of 3-step", 
  
  VARIABLE = 
 "idvariable =casenum;
 
  nominal=n1 n2 n3;
  
  usevar = n1 n2 n3;

  classes = c1(4) c2(4) c3(4);
 
  usevar = MINORITY, FEMALE;",
  
  ANALYSIS = 
 "estimator = mlr; 
  type = mixture; 
  starts = 0;",
  
  MODEL = glue(
  "%overall%
  
  c1 ON MINORITY Female;
  c2 ON c1 MINORITY Female;
  c3 ON c2 MINORITY Female;

  Model c1:
    %c1#1%
      [n1#1@{logits_t1[1,1]}];
      [n1#2@{logits_t1[1,2]}];
      [n1#3@{logits_t1[1,3]}];
    %c1#2%
      [n1#1@{logits_t1[2,1]}];
      [n1#2@{logits_t1[2,2]}];
      [n1#3@{logits_t1[2,3]}];
    %c1#3%
      [n1#1@{logits_t1[3,1]}];
      [n1#2@{logits_t1[3,2]}];
      [n1#3@{logits_t1[3,3]}];
    %c1#4%
      [n1#1@{logits_t1[4,1]}];
      [n1#2@{logits_t1[4,2]}];
      [n1#3@{logits_t1[4,3]}];
 
  Model c2:
    %c2#1%
      [n2#1@{logits_t2[1,1]}];
      [n2#2@{logits_t2[1,2]}];
      [n2#3@{logits_t2[1,3]}];
    %c2#2%
      [n2#1@{logits_t2[2,1]}];
      [n2#2@{logits_t2[2,2]}];
      [n2#3@{logits_t2[2,3]}];
    %c2#3%
      [n2#1@{logits_t2[3,1]}];
      [n2#2@{logits_t2[3,2]}];
      [n2#3@{logits_t2[3,3]}];
    %c2#4%
      [n2#1@{logits_t2[4,1]}];
      [n2#2@{logits_t2[4,2]}];
      [n2#3@{logits_t2[4,3]}];
  
  Model c3:
    %c3#1%
      [n3#1@{logits_t3[1,1]}];
      [n3#2@{logits_t3[1,2]}];
      [n3#3@{logits_t3[1,3]}];
    %c3#2%
      [n3#1@{logits_t3[2,1]}];
      [n3#2@{logits_t3[2,2]}];
      [n3#3@{logits_t3[2,3]}];
    %c3#3%
      [n3#1@{logits_t3[3,1]}];
      [n3#2@{logits_t3[3,2]}];
      [n3#3@{logits_t3[3,3]}];
    %c3#4%
      [n3#1@{logits_t3[4,1]}];
      [n3#2@{logits_t3[4,2]}];
      [n3#3@{logits_t3[4,3]}];"),
 
  OUTPUT = "svalues;",
  
  usevariables = colnames(savedata_t123), 
  rdata = savedata_t123)

con_LTA_fit <- mplusModeler(con_LTA, 
                 dataout=here("phase_3","step_3b_ml", "t123_LTA.dat"), 
                 modelout=here("phase_3","step_3b_ml", "LTA_covariates.inp"), 
                 check=TRUE, run = TRUE, hashfilename = FALSE)

Covariate Effects Table

The covariate table summarizes the effects of each predictor relative to the reference class (“Less Positive”). Effect sizes, standard errors, and significance tests are extracted directly from the Mplus output.


Syntax Walkthrough: Covariate Effects Table

The covariate table summarizes how MINORITY and FEMALE relate to class membership at each timepoint.

Key elements:
• Extracts unstandardized logits from the parameter table.
• Formats estimates as estimate (se) with significance markers.
• Reference class is “Less Positive.”
• Uses gt() for clean presentation in the final tutorial.

lta_cov_model <- readModels(here("phase_3","step_3b_ml", "LTA_covariates.out"))

# REFERENCE CLASS 4
cov <- as.data.frame(lta_cov_model[["parameters"]][["unstandardized"]]) %>%
  filter(param %in% c("MINORITY", "FEMALE")) %>% 
  mutate(param = case_when(
            param == "FEMALE" ~ "Gender",
            param == "MINORITY" ~ "URM"),
    se = paste0("(", format(round(se,2), nsmall =2), ")")) %>% 
  separate(paramHeader, into = c("Time", "Class"), sep = "#") %>% 
  mutate(Class = case_when(
            Class == "1.ON" ~ "Very Positive",
            Class == "2.ON" ~ "Qualified Positive",
            Class == "3.ON" ~ "Neutral"),
         Time = case_when(
            Time == "C1" ~ "7th Grade (T1)",
            Time == "C2" ~ "10th Grade (T2)",
            Time == "C3" ~ "12th Grade (T3)",
         )
         ) %>% 
  unite(estimate, est, se, sep = " ") %>% 
  select(Time:pval, -est_se) %>% 
  mutate(pval = ifelse(pval<0.001, paste0("<.001*"),
                       ifelse(pval<0.05, paste0(scales::number(pval, accuracy = .001), "*"),
                              scales::number(pval, accuracy = .001))))

# Create table

cov_m1 <- cov %>% 
  group_by(param, Class) %>% 
  gt() %>% 
  tab_header(
    title = "Relations Between the Covariates and Latent Class (ML Three-Step Method)") %>%
  tab_footnote(
    footnote = md(
      "Reference Group: Less Positive"
    ),
locations = cells_title()
  ) %>% 
  cols_label(
    param = md("Covariate"),
    estimate = md("Estimate (*se*)"),
    pval = md("*p*-value")) %>% 
  sub_missing(1:3,
              missing_text = "") %>%
  sub_values(values = c(999.000), replacement = "-") %>% 
  cols_align(align = "center") %>% 
  opt_align_table_header(align = "left") %>% 
  gt::tab_options(table.font.names = "serif") 

cov_m1
Relations Between the Covariates and Latent Class (ML Three-Step Method)1
Time Estimate (se) p-value
URM - Very Positive
7th Grade (T1) 0.295 (0.40) 0.459
10th Grade (T2) 0.169 (0.33) 0.611
12th Grade (T3) 0.403 (0.37) 0.273
Gender - Very Positive
7th Grade (T1) -0.398 (0.28) 0.150
10th Grade (T2) -0.124 (0.22) 0.580
12th Grade (T3) 0.155 (0.22) 0.482
URM - Qualified Positive
7th Grade (T1) -0.174 (0.43) 0.683
10th Grade (T2) 0.429 (0.34) 0.208
12th Grade (T3) 0.818 (0.39) 0.034*
Gender - Qualified Positive
7th Grade (T1) 0.252 (0.29) 0.380
10th Grade (T2) 0.22 (0.25) 0.379
12th Grade (T3) 0.414 (0.26) 0.105
URM - Neutral
7th Grade (T1) 0.509 (0.47) 0.279
10th Grade (T2) 0.164 (0.39) 0.673
12th Grade (T3) 1.123 (0.40) 0.005*
Gender - Neutral
7th Grade (T1) -0.228 (0.34) 0.508
10th Grade (T2) -0.195 (0.27) 0.472
12th Grade (T3) 0.641 (0.26) 0.015*
1 Reference Group: Less Positive
gtsave(cov_m1, here("figures", "cov_table_ml.png"))

Diagnostic Summary: Covariate-Adjusted Class Membership

  • Across Grades 7 and 10, covariate effects on class membership are small and largely non-significant.

  • By Grade 12, URM students show higher odds of membership in the Qualified Positive and Neutral classes relative to the Less Positive class.

  • Female students show elevated odds of Qualified Positive membership at Grade 12.

  • No covariate significantly differentiates membership in the Very Positive class.

These results indicate that demographic differentiation in STEM attitudes is minimal in early adolescence but becomes more pronounced by the end of high school.

Completion of Phase 3 ML Three-Step

Phase 3 implements the full ML three-step latent transition framework under a longitudinally invariant measurement structure. Fixed-threshold LCAs confirm correct transfer of invariant measurement, classification-error logits formally encode posterior uncertainty, and unconditional and covariate-adjusted LTAs establish a stable developmental transition structure across Grades 7, 10, and 12. With classification error properly controlled and auxiliary-variable effects estimated without contaminating the measurement model, the workflow is now positioned to proceed to the BCH auxiliary-variable framework as an alternative estimation strategy.


BCH Method

Apply BCH Training Weights for Classification-Error Correction

Purpose

This phase implements the BCH auxiliary-variable framework as an alternative multi-step strategy for estimating latent transition models under longitudinal measurement invariance. Unlike the ML three-step method, which corrects for classification error using fixed logits, the BCH approach uses classification-error–adjusted training weights derived from a single, fully invariant latent transition model.

These BCH training weights carry forward full classification uncertainty while ensuring that:

  • auxiliary variables may influence class membership and transitions,

  • but cannot alter the underlying measurement model or redefine latent classes.

The BCH approach therefore provides an independent verification of auxiliary-variable effects under invariant measurement using a fundamentally different correction mechanism than the ML three-step procedure.


What This Phase Produces

The BCH variant of Phase 3 produces:

  • A single fully invariant LTA model with BCH specification

  • 64 BCH training weights for each individual (4 × 4 × 4 class combinations)

  • A baseline (unconditional) BCH-corrected LTA model

  • A covariate-adjusted BCH-corrected LTA model

  • Longitudinal BCH transition probability plots

  • A BCH-corrected covariate effects table

Together, these outputs establish a classification-error–corrected transition model under BCH weighting, fully parallel to the ML three-step results.


Workflow Overview

This phase proceeds in four structured steps:

  1. Estimate a fully invariant BCH LTA model to generate training weights

  2. Extract BCH weights from the saved data

  3. Estimate the unconditional LTA using BCH weights only

  4. Estimate the covariate-adjusted LTA using BCH weights

Each step transfers classification uncertainty from the invariant measurement model into the BCH-corrected structural model.


BCH Step 1: Estimate the Invariant LTA Model and Compute BCH Training Weights

In Step 1, a single, fully invariant latent transition model is estimated across all three time points. All item–class thresholds are fixed to the invariant values established in Phase 2, enforcing strict longitudinal measurement invariance.

Auxiliary variables are included under the BCH specification, but do not influence class membership or transitions at this stage. Instead, they are used solely to generate BCH training weights.

From this model, Mplus produces:

  • invariant threshold parameters (fixed),

  • unconditional class proportions,

  • BCH-corrected item-response classes, and

  • 64 BCHW* variables representing all possible C1 × C2 × C3 class combinations.

These BCH weights encode classification uncertainty and will serve as the only class indicators for all subsequent BCH transition models.

Conceptual role of BCH Step 1:
To generate a single set of posterior-probability-based training weights that preserve invariant class meaning while fully encoding classification error.


Syntax Walkthrough: BCH Step 1 – Invariant LTA With BCH Weights

This syntax estimates a single fully invariant LTA model and computes BCH training weights.
Key elements:
• Thresholds are fixed to invariant values; no measurement parameters are estimated.
• Auxiliary variables are listed under auxiliary = ... but do not affect the model; they trigger BCH weight generation.
• A single model generates BCHW1–BCHW64, one for each possible c1 × c2 × c3 class combination.
starts = 0 is used because all measurement parameters are fixed.
• Output includes invariant classes and BCH training weights for all individuals. • OPTIONAL: We have rearranged the class labels in Step 1 by renaming the class specific statements (e.g., %C1#2%)

step1_bch  <- mplusObject(
  TITLE = "Step 1 - BCH Method", 
  VARIABLE = 
     "idvariable=CASENUM;
     
      usev =        
      AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X
      GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L
      KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;

      categorical = 
      AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X
      GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L
      KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;

      classes = c1(4) c2(4) c3(4);
      
      auxiliary = MINORITY, FEMALE, MATHG7, MATHG10, MATHG12;",
  
  ANALYSIS = 
   "estimator = mlr; 
    type = mixture;
    starts = 0;",

  MODEL = 
     "%overall%
     
  MODEL C1:

     %C1#1%

     [ ab39a$1@-1.53313 ] (1);
     [ ab39h$1@-2.90401 ] (2);
     [ ab39i$1@-2.99575 ] (3);
     [ ab39k$1@-2.83359 ] (4);
     [ ab39l$1@-3.85594 ] (5);
     [ ab39m$1@-2.16372 ] (6);
     [ ab39t$1@-2.13876 ] (7);
     [ ab39u$1@-2.48348 ] (8);
     [ ab39w$1@-1.81515 ] (9);
     [ ab39x$1@-2.35163 ] (10);

     %C1#3%

     [ ab39a$1@-0.06349 ] (11);
     [ ab39h$1@-0.07976 ] (12);
     [ ab39i$1@-0.52495 ] (13);
     [ ab39k$1@-0.01216 ] (14);
     [ ab39l$1@0.25484 ] (15);
     [ ab39m$1@-0.83350 ] (16);
     [ ab39t$1@0.07961 ] (17);
     [ ab39u$1@-0.45274 ] (18);
     [ ab39w$1@0.51840 ] (19);
     [ ab39x$1@0.22569 ] (20);

     %C1#2%

     [ ab39a$1@-0.88292 ] (21);
     [ ab39h$1@-1.67761 ] (22);
     [ ab39i$1@-0.87642 ] (23);
     [ ab39k$1@-1.94409 ] (24);
     [ ab39l$1@-2.31259 ] (25);
     [ ab39m$1@0.42887 ] (26);
     [ ab39t$1@2.29606 ] (27);
     [ ab39u$1@1.02908 ] (28);
     [ ab39w$1@1.97179 ] (29);
     [ ab39x$1@1.80988 ] (30);

     %C1#4%

     [ ab39a$1@0.74744 ] (31);
     [ ab39h$1@2.01287 ] (32);
     [ ab39i$1@1.67957 ] (33);
     [ ab39k$1@1.54221 ] (34);
     [ ab39l$1@2.34965 ] (35);
     [ ab39m$1@1.38217 ] (36);
     [ ab39t$1@3.73153 ] (37);
     [ ab39u$1@3.09927 ] (38);
     [ ab39w$1@3.69495 ] (39);
     [ ab39x$1@3.68803 ] (40);

  MODEL C2:

     %C2#1%

     [ ga32a$1@-1.53313 ] (1);
     [ ga32h$1@-2.90401 ] (2);
     [ ga32i$1@-2.99575 ] (3);
     [ ga32k$1@-2.83359 ] (4);
     [ ga32l$1@-3.85594 ] (5);
     [ ga33a$1@-2.16372 ] (6);
     [ ga33h$1@-2.13876 ] (7);
     [ ga33i$1@-2.48348 ] (8);
     [ ga33k$1@-1.81515 ] (9);
     [ ga33l$1@-2.35163 ] (10);

     %C2#3%

     [ ga32a$1@-0.06349 ] (11);
     [ ga32h$1@-0.07976 ] (12);
     [ ga32i$1@-0.52495 ] (13);
     [ ga32k$1@-0.01216 ] (14);
     [ ga32l$1@0.25484 ] (15);
     [ ga33a$1@-0.83350 ] (16);
     [ ga33h$1@0.07961 ] (17);
     [ ga33i$1@-0.45274 ] (18);
     [ ga33k$1@0.51840 ] (19);
     [ ga33l$1@0.22569 ] (20);

     %C2#2%

     [ ga32a$1@-0.88292 ] (21);
     [ ga32h$1@-1.67761 ] (22);
     [ ga32i$1@-0.87642 ] (23);
     [ ga32k$1@-1.94409 ] (24);
     [ ga32l$1@-2.31259 ] (25);
     [ ga33a$1@0.42887 ] (26);
     [ ga33h$1@2.29606 ] (27);
     [ ga33i$1@1.02908 ] (28);
     [ ga33k$1@1.97179 ] (29);
     [ ga33l$1@1.80988 ] (30);

     %C2#4%

     [ ga32a$1@0.74744 ] (31);
     [ ga32h$1@2.01287 ] (32);
     [ ga32i$1@1.67957 ] (33);
     [ ga32k$1@1.54221 ] (34);
     [ ga32l$1@2.34965 ] (35);
     [ ga33a$1@1.38217 ] (36);
     [ ga33h$1@3.73153 ] (37);
     [ ga33i$1@3.09927 ] (38);
     [ ga33k$1@3.69495 ] (39);
     [ ga33l$1@3.68803 ] (40);

  MODEL C3:

     %C3#1%

     [ ka46a$1@-1.53313 ] (1);
     [ ka46h$1@-2.90401 ] (2);
     [ ka46i$1@-2.99575 ] (3);
     [ ka46k$1@-2.83359 ] (4);
     [ ka46l$1@-3.85594 ] (5);
     [ ka47a$1@-2.16372 ] (6);
     [ ka47h$1@-2.13876 ] (7);
     [ ka47i$1@-2.48348 ] (8);
     [ ka47k$1@-1.81515 ] (9);
     [ ka47l$1@-2.35163 ] (10);

     %C3#3%

     [ ka46a$1@-0.06349 ] (11);
     [ ka46h$1@-0.07976 ] (12);
     [ ka46i$1@-0.52495 ] (13);
     [ ka46k$1@-0.01216 ] (14);
     [ ka46l$1@0.25484 ] (15);
     [ ka47a$1@-0.83350 ] (16);
     [ ka47h$1@0.07961 ] (17);
     [ ka47i$1@-0.45274 ] (18);
     [ ka47k$1@0.51840 ] (19);
     [ ka47l$1@0.22569 ] (20);

     %C3#2%

     [ ka46a$1@-0.88292 ] (21);
     [ ka46h$1@-1.67761 ] (22);
     [ ka46i$1@-0.87642 ] (23);
     [ ka46k$1@-1.94409 ] (24);
     [ ka46l$1@-2.31259 ] (25);
     [ ka47a$1@0.42887 ] (26);
     [ ka47h$1@2.29606 ] (27);
     [ ka47i$1@1.02908 ] (28);
     [ ka47k$1@1.97179 ] (29);
     [ ka47l$1@1.80988 ] (30);

     %C3#4%

     [ ka46a$1@0.74744 ] (31);
     [ ka46h$1@2.01287 ] (32);
     [ ka46i$1@1.67957 ] (33);
     [ ka46k$1@1.54221 ] (34);
     [ ka46l$1@2.34965 ] (35);
     [ ka47a$1@1.38217 ] (36);
     [ ka47h$1@3.73153 ] (37);
     [ ka47i$1@3.09927 ] (38);
     [ ka47k$1@3.69495 ] (39);
     [ ka47l$1@3.68803 ] (40);",
  
  SAVEDATA = 
   "File=bch_savedata.dat;
    Save=bchweights;",

  OUTPUT = "sampstat svalues",
  
  usevariables = colnames(lsay_data),
  rdata = lsay_data)

step1_fit_bch <- mplusModeler(step1_bch,
                            dataout=here("phase_3", "step_1_bch", "step1.dat"),
                            modelout=here("phase_3", "step_1_bch", "step1_bch.inp") ,
                            check=TRUE, run = TRUE, hashfilename = FALSE)

Diagnostic Plot of the Invariant BCH Model

The invariant BCH class probability plot provides a visual verification that fixing thresholds across all three waves preserves the expected four-class measurement structure.

Syntax Walkthrough: BCH Diagnostic Plot

This plot visualizes the invariant BCH model.
Key elements:
• Uses the same plot_lta() function as Phase 2.
• Confirms that fixed thresholds produce stable class classes.
• Serves as a check that the BCH model encoded the correct measurement structure.

source(here("functions","plot_lta.R"))

inv_bch <- readModels(here("phase_3","step_1_bch"), quiet = TRUE)

plot_lta(inv_bch)

Diagnostic Summary: BCH Invariant Measurement Verification

  • The invariant BCH model reproduces the same four-class configuration observed in Phase 2.

  • No meaningful differences in classes are observed under BCH specification.

  • Class ordering and shape remain fully aligned with the invariant ML solution.

  • All four classes remain substantively interpretable across time.

These diagnostics confirm that BCH training weights are generated under the correct invariant measurement structure.


BCH Step 2: Extract BCH weights

After estimation of the invariant BCH model, the full BCH weight matrix is extracted from the saved data.

Each individual receives 64 weights, corresponding to all possible class combinations across Grades 7, 10, and 12. These weights encode classification uncertainty for every longitudinal class path.

No other class indicators are used in the BCH framework:

  • no logits,

  • no posterior probabilities, and

  • no modal class assignments.

Conceptual role of BCH Step 2:
To construct the complete BCH training-weight representation of probabilistic class membership for downstream structural modeling.


Syntax Walkthrough: BCH Step 2 – Extract BCH Weights

This extracts the 64 BCH weight variables from the saved data.
Key elements:
• Saved data includes variables BCHW1–BCHW64.
• These weights encode classification error for every class path.
• These 64 variables are the only class indicators used in BCH LTA models.
• No logits, CPROBs, or modal assignments are used in BCH.

bch_savedata <- as.data.frame(inv_bch$savedata)
#summary(bch_savedata)

Diagnostic Summary: BCH Weight Extraction

  • All 64 BCH training-weight variables were successfully extracted from the saved data.

  • Each weight corresponds to a unique C1 × C2 × C3 class path.

  • No logits, posterior probabilities, or modal classifications are used in the BCH framework.

  • The BCH dataset is correctly structured for unconditional and covariate-adjusted LTA estimation.

These diagnostics confirm that classification uncertainty has been fully transferred into the BCH training-weight system.


BCH Step 3a: Estimate the Unconditional LTA Model Using BCH Weights

In Step 3a, the unconditional latent transition model is estimated using only BCH training weights as indicators of latent class membership. Because the weights already encode classification error:

  • no logits, thresholds, or CPROB variables are required, and

  • measurement remains implicitly fixed through the training weights.

This model freely estimates:

  • class proportions at each time point, and

  • transitions from C1 → C2 and C2 → C3.

This step is directly parallel to Step 3a of the ML three-step method, replacing logits with BCH weights.

Conceptual role of BCH Step 3a:
To estimate baseline longitudinal transitions under full BCH-corrected classification uncertainty.

Syntax Walkthrough: BCH Step 3a – Unconditional LTA Using BCH Weights

This syntax estimates the unconditional transition model using only BCH weights.
Key elements:
usevar = BCHW1–BCHW64 includes all training-weight indicators.
training = BCHW1–BCHW64(bch) tells Mplus to use BCH corrections.
• No thresholds or logits appear; measurement is fixed implicitly through the weights.
• The structural portion (c2 ON c1, c3 ON c2) is estimated freely.
• Mirrors the ML three-step unconditional model but uses BCH weights instead of logits.

step3_bch  <- mplusObject(
  TITLE = "Step 3 - BCH Method", 
  
  VARIABLE = 
 "usevar = BCHW1-BCHW64;
  
  training = BCHW1-BCHW64(bch);
 
  classes = c1(4) c2(4) c3(4);" ,
  
  ANALYSIS = 
 "type = mixture; 
  starts = 0;",
 
  MODEL = glue(
  "%overall%
  
  c2 ON c1;
  c3 ON c2;"),

  usevariables = colnames(bch_savedata), 
  rdata = bch_savedata)

step3_fit_bch <- mplusModeler(step3_bch,
               dataout=here("phase_3", "step_3a_bch", "step3.dat"), 
               modelout=here("phase_3", "step_3a_bch", "unconditional_bch.inp"), 
               check=TRUE, run = TRUE, hashfilename = FALSE)

Plotting BCH Transition Probabilities

The following visualization helps confirm that the BCH-estimated transitions are stable, interpretable, and consistent with the fixed measurement structure.


Syntax Walkthrough: Plotting BCH Transition Probabilities

This plot displays transition probabilities from the unconditional BCH LTA.
Key elements:
• Uses the same plot_transition() function as earlier phases.
• Checks that transition patterns are coherent under BCH correction.
• Ensures BCH and ML patterns are comparable before adding covariates.

source(here("functions","plot_transition.R"))

lta_model <- readModels(here("phase_3", "step_3a_bch", "unconditional_bch.out"))

plot_transition(
  model_name = lta_model,
  facet_labels = c(`1` = "Very Positive", `2` = "Qualified Positive", `3` = "Neutral", `4` = "Less Positive"),
  timepoint_labels = c(`1` = "Grade 7", `2` = "Grade 10", `3` = "Grade 12"),
  class_labels = c(`1` = "Very Positive", `2` = "Qualified Positive", `3` = "Neutral", `4` = "Less Positive")
) +
  labs(title = "Unconditional LTA (BCH Method)")

ggsave(here("figures", "unconditional_lta_bch.jpeg"), dpi="retina", height = 6, width = 10, bg = "white",  units="in")

Diagnostic Summary: Unconditional BCH Transition Structure

  • The Very Positive class shows the highest degree of stability across all transitions.

  • The Qualified Positive class remains moderately stable with limited upward and downward mobility.

  • The Neutral class exhibits the greatest degree of flexibility across waves.

  • The Less Positive class shows both persistence and gradual upward movement.

  • No degenerate or implausible transitions are observed under BCH correction.

These transition patterns mirror the ML three-step results, confirming that BCH-adjusted transitions are structurally coherent.


BCH Step 3b: Estimate the Covariate-Adjusted LTA Model Using BCH Weights

In the final BCH step, auxiliary covariates are added as predictors of:

  • initial class membership, and

  • longitudinal transitions.

Because BCH training weights remain fixed:

  • covariates may influence class membership and transitions,

  • but cannot alter the underlying measurement model or redefine latent classes.

This guarantees that covariate effects are interpreted relative to a stable, invariant latent class framework.

Syntax Walkthrough: BCH Step 3b – Covariate-Adjusted LTA With BCH Weights

This fits the covariate-adjusted LTA model while keeping measurement fixed through the BCH weights.
Key elements:
usevar = BCHW1–BCHW64 MINORITY FEMALE adds covariates.
training = BCHW1–BCHW64(bch) ensures covariates cannot redefine class measurement.
• Covariates appear in structural regressions (e.g., c1 ON MINORITY FEMALE).
• Transition regressions include both covariates and prior class variables (e.g., c2 ON c1 MINORITY FEMALE).
• BCH weights provide all classification-error correction.

step3_bch  <- mplusObject(
  TITLE = "Step 3 - BCH Method", 
  
  VARIABLE = 
 "usevar = BCHW1-BCHW64 MINORITY, FEMALE;
  
  training = BCHW1-BCHW64(bch);
 
  classes = c1(4) c2(4) c3(4);" ,
  
  ANALYSIS = 
 "estimator = mlr; 
  type = mixture; 
  starts = 0;",
  
  MODEL =
  glue(
 " %OVERALL%
 
  c1 ON MINORITY Female;
  c2 ON c1 MINORITY Female;
  c3 ON c2 MINORITY Female;"),
  
  usevariables = colnames(bch_savedata), 
  rdata = bch_savedata)

step3_fit_bch <- mplusModeler(step3_bch,
               dataout=here("phase_3", "step_3b_bch", "step3.dat"), 
               modelout=here("phase_3", "step_3b_bch", "bch_covariates.inp"), 
               check=TRUE, run = TRUE, hashfilename = FALSE)

Covariate Table: BCH Method

The table below summarizes the effects of covariates on class membership in each time point, relative to the reference class (Less Positive). Estimates incorporate BCH-adjusted classification error and are directly comparable to those from the ML method.


Syntax Walkthrough: Covariate Table (BCH)

This table summarizes covariate effects under BCH correction.
Key elements:
• Extracts unstandardized estimates from the Mplus output.
• Relabels classes and timepoints for interpretability.
• Displays estimate (se) and p-values with significance markers.
• Reference class is “Less Positive,” matching earlier phases.
• BCH-corrected effects are directly comparable to ML three-step results.

lta_cov_model <- readModels(here("phase_3", "step_3b_bch", "bch_covariates.out"))

# REFERENCE CLASS 4
cov <- as.data.frame(lta_cov_model[["parameters"]][["unstandardized"]]) %>%
  filter(param %in% c("MINORITY", "FEMALE")) %>% 
  mutate(param = case_when(
            param == "FEMALE" ~ "Gender",
            param == "MINORITY" ~ "URM"),
    se = paste0("(", format(round(se,2), nsmall =2), ")")) %>% 
  separate(paramHeader, into = c("Time", "Class"), sep = "#") %>% 
  mutate(Class = case_when(
            Class == "1.ON" ~ "Very Positive",
            Class == "2.ON" ~ "Qualified Positive",
            Class == "3.ON" ~ "Neutral"),
         Time = case_when(
            Time == "C1" ~ "7th Grade (T1)",
            Time == "C2" ~ "10th Grade (T2)",
            Time == "C3" ~ "12th Grade (T3)",
         )
         ) %>% 
  unite(estimate, est, se, sep = " ") %>% 
  select(Time:pval, -est_se) %>% 
  mutate(pval = ifelse(pval<0.001, paste0("<.001*"),
                       ifelse(pval<0.05, paste0(scales::number(pval, accuracy = .001), "*"),
                              scales::number(pval, accuracy = .001))))


# Create table

cov_m1 <- cov %>% 
  group_by(param, Class) %>% 
  gt() %>% 
  tab_header(
    title = "Relations Between the Covariates and Latent Class (BCH Method)") %>%
  tab_footnote(
    footnote = md(
      "Reference Group: Less Positive"
    ),
locations = cells_title()
  ) %>% 
  cols_label(
    param = md("Covariate"),
    estimate = md("Estimate (*se*)"),
    pval = md("*p*-value")) %>% 
  sub_missing(1:3,
              missing_text = "") %>%
  sub_values(values = c(999.000), replacement = "-") %>% 
  cols_align(align = "center") %>% 
  opt_align_table_header(align = "left") %>% 
  gt::tab_options(table.font.names = "serif") 

cov_m1
Relations Between the Covariates and Latent Class (BCH Method)1
Time Estimate (se) p-value
URM - Very Positive
7th Grade (T1) 0.225 (0.27) 0.406
10th Grade (T2) 0.442 (0.27) 0.100
12th Grade (T3) 0.809 (0.60) 0.175
Gender - Very Positive
7th Grade (T1) -0.655 (0.19) 0.001*
10th Grade (T2) -0.091 (0.20) 0.641
12th Grade (T3) 0.338 (0.45) 0.457
URM - Qualified Positive
7th Grade (T1) 0.096 (0.28) 0.736
10th Grade (T2) 0.444 (0.27) 0.105
12th Grade (T3) 1.099 (0.40) 0.006*
Gender - Qualified Positive
7th Grade (T1) 0.1 (0.20) 0.621
10th Grade (T2) 0.411 (0.20) 0.039*
12th Grade (T3) 0.623 (0.27) 0.022*
URM - Neutral
7th Grade (T1) 0.515 (0.32) 0.103
10th Grade (T2) 0.145 (0.31) 0.640
12th Grade (T3) 0.822 (0.40) 0.039*
Gender - Neutral
7th Grade (T1) -0.431 (0.24) 0.069
10th Grade (T2) 0.062 (0.21) 0.771
12th Grade (T3) 0.372 (0.26) 0.152
1 Reference Group: Less Positive
gtsave(cov_m1, here("figures", "cov_table_bch.png"))

Diagnostic Summary: BCH Covariate-Adjusted Class Membership

  • At Grades 7 and 10, demographic covariate effects remain small and largely non-significant.

  • By Grade 12, URM students show higher odds of membership in the Qualified Positive and Neutral classes relative to the Less Positive class.

  • Female students show elevated odds of Qualified Positive membership at Grades 10 and 12.

  • No consistent covariate effects emerge for the Very Positive class.

These results parallel the ML three-step findings and confirm that demographic differentiation in STEM attitudes becomes more pronounced in late adolescence.