This R Markdown document accompanies the tutorial manuscript Latent Transition Analysis with Auxiliary Variables: A Demonstration of the ML 3-Step and BCH Methods in Mplus using data from the Longitudinal Study of American Youth (LSAY). Its purpose is to provide a fully reproducible, step-by-step implementation of the multi-step LTA workflow described in the paper using Mplus and the MplusAutomation package in R.
The focus of this document is operational rather than conceptual. For each phase of the workflow, we specify:
the goal of the step,
the required Mplus and R inputs (e.g., data objects, model outputs, SVALUES, BCH weights),
the corresponding Mplus or R syntax, and
the diagnostic checks needed to confirm correct estimation.
The full workflow is summarized below. Each phase corresponds directly to the steps outlined in the manuscript, but the focus here is on reproducible implementation and inspection of intermediate outputs.
| Phase | Step(s) from Manuscript | What You Do | Output / Purpose |
|---|---|---|---|
| 0. Data Setup and Preparation | — | Load packages, import LSAY data, inspect indicator coding | Data ready for enumeration |
| 1. Enumeration | 1.1 | Latent class enumeration at each wave (1–6 classes) | Determine K for each wave; check plausibility of aligning classes |
| Estimate chosen K-class LCAs separately at each wave using OPTSEED + SVALUES and plot item response probabilties | Establish consistent class ordering and evaluate class similarity across time to verify that classes are conceptually similar. | ||
| 2. Testing for Measurement Invariance | 2.1 | Joint configural model (no transitions) | Evaluate whether classes can be aligned across waves |
| 2.2 | Joint full measurement invariance (no transitions) | Test equality of item-class parameters; assess need for partial MI | |
| 2.3 | Compare model fit to the unconstrained configural model using an LRT test | A tool to statistically compare the two models | |
| 2.4 | Decide Whether to Proceed with an LTA Under Invariance | If the invariant (or partially invariant) model is still substantively interpretable and classes retain the same meaning, proceed to Phase 3. | |
| 3. Final LTA – ML 3-Step | Step 1 | Generate time-specific posteriors from invariant measurement model | Posterior class probabilities for each wave |
| Step 2 | Merge posteriors; compute classification-error corrections | Data ready for unconditional transitions | |
| Step 3a | Estimate unconditional LTA transitions | Transition model based on invariant class structure | |
| Step 3b | Add covariates/distal outcomes with correction | Corrected auxiliary-variable effects | |
| 3. Final LTA – BCH | Step 1 | Estimate invariant LTA; save BCH weights | Individual-level BCH weights |
| Step 2 | Extract BCH weights | Data ready for unconditional transitions | |
| Step 3a | Estimate unconditional LTA transitions | Transition model | |
| Step 3b | Add covariates/distals using BCH | Corrected auxiliary-variable effects |
Before estimating the latent class structure, we begin by loading the necessary packages and preparing the dataset to ensure the indicators and sample structure are ready for LTA.
Before beginning, we prepare the LSAY dataset for all subsequent stages of the LTA workflow. This includes loading required packages, defining the project root, importing the dataset, converting special codes to missing values, and verifying that indicators and covariates are correctly formatted. These checks ensure that enumeration, measurement-invariance testing, and multi-step auxiliary-variable models can be estimated without data-related errors.
# Installation required when using Mplus version 8.6
# devtools::install_github("michaelhallquist/MplusAutomation")
#if (!requireNamespace("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
#BiocManager::install("rhdf5")
library(MplusAutomation)
library(rhdf5)
library(tidyverse)
library(haven)
library(here)
library(glue)
library(janitor)
library(gt)
library(naniar)
library(psych)
library(modelsummary)
library(cowplot)
library(patchwork)
here::i_am("lta_mi_workflow.Rmd")
Using here() finds your project’s files, based on the
current working directory at the time when the package is loaded.
lsay_data <- read_csv(here("data", "lsay_data.csv"), na = c("9999", "-99", "-98", "-96")) %>%
mutate(across(everything(), as.numeric))
#summary(lsay_data)
#describe(lsay_data)
The dataset is imported with LSAY missing-value codes converted to
NA, and all variables are coerced to numeric. Binary
indicators used for LCA/LTA must be coded 0/1, so this step ensures that
item formatting is correct before enumeration.
These summaries confirm (a) valid 0/1 coding for all indicators, (b) non-zero variance at each wave, and (c) acceptable levels of missingness for variables used later in auxiliary-variable models. Only distributional confirmation is needed here; no substantive interpretation is included in the supplement.
# Define survey questions
all_questions <- c(
"AB39A", "AB39H", "AB39I", "AB39K", "AB39L", "AB39M", "AB39T", "AB39U", "AB39W", "AB39X", # 7th grade
"GA32A", "GA32H", "GA32I", "GA32K", "GA32L", "GA33A", "GA33H", "GA33I", "GA33K", "GA33L", # 10th grade
"KA46A", "KA46H", "KA46I", "KA46K", "KA46L", "KA47A", "KA47H", "KA47I", "KA47K", "KA47L" # 12th grade
)
# Function to compute stats (count, mean, SD, range)
compute_stats <- function(data, question, grade, question_name) {
data %>%
summarise(
Grade = grade,
Count = sum(!is.na(.data[[question]])),
Mean = mean(.data[[question]], na.rm = TRUE),
SD = sd(.data[[question]], na.rm = TRUE),
Min = min(.data[[question]], na.rm = TRUE),
Max = max(.data[[question]], na.rm = TRUE)
) %>%
mutate(Question = question_name)
}
# Define question names and mappings
table_setup <- tibble(
question_code = all_questions,
grade = rep(c(7, 10, 12), each = 10),
question_name = rep(
c(
"I enjoy math",
"Math is useful in everyday problems",
"Math helps a person think logically",
"It is important to know math to get a good job",
"I will use math in many ways as an adult",
"I enjoy science",
"Science is useful in everyday problems",
"Science helps a person think logically",
"It is important to know science to get a good job",
"I will use science in many ways as an adult"
),
times = 3
)
)
# Compute stats for all questions
table1_data <- pmap_dfr(
list(table_setup$question_code, table_setup$grade, table_setup$question_name),
~compute_stats(lsay_data, ..1, ..2, ..3)
) %>%
mutate(
Mean = round(Mean, 2),
SD = round(SD, 2),
Min = round(Min, 2),
Max = round(Max, 2)
) %>%
arrange(match(Question, table_setup$question_name), Grade) %>%
select(Question, Grade, Count, Mean, SD, Min, Max)
# Build table
table1_gt <- table1_data %>%
gt(groupname_col = "Question") %>%
tab_header(
title = "Table 1. Descriptive Statistics for Mathematics and Science Attitudinal Survey Items Included in Analyses"
) %>%
cols_label(
Grade = "Grade",
Count = "N",
Mean = "Prop.",
SD = "SD",
Min = "Min",
Max = "Max"
) %>%
fmt_number(
columns = c(Mean, SD),
decimals = 2
)
# Show table
table1_gt
| Table 1. Descriptive Statistics for Mathematics and Science Attitudinal Survey Items Included in Analyses | |||||
| Grade | N | Prop. | SD | Min | Max |
|---|---|---|---|---|---|
| I enjoy math | |||||
| 7 | 1882 | 0.69 | 0.46 | 0 | 1 |
| 10 | 1532 | 0.63 | 0.48 | 0 | 1 |
| 12 | 1120 | 0.57 | 0.50 | 0 | 1 |
| Math is useful in everyday problems | |||||
| 7 | 1851 | 0.72 | 0.45 | 0 | 1 |
| 10 | 1520 | 0.65 | 0.48 | 0 | 1 |
| 12 | 1111 | 0.67 | 0.47 | 0 | 1 |
| Math helps a person think logically | |||||
| 7 | 1847 | 0.65 | 0.48 | 0 | 1 |
| 10 | 1517 | 0.69 | 0.46 | 0 | 1 |
| 12 | 1108 | 0.71 | 0.45 | 0 | 1 |
| It is important to know math to get a good job | |||||
| 7 | 1854 | 0.77 | 0.42 | 0 | 1 |
| 10 | 1517 | 0.68 | 0.47 | 0 | 1 |
| 12 | 1105 | 0.61 | 0.49 | 0 | 1 |
| I will use math in many ways as an adult | |||||
| 7 | 1857 | 0.75 | 0.43 | 0 | 1 |
| 10 | 1525 | 0.65 | 0.48 | 0 | 1 |
| 12 | 1104 | 0.65 | 0.48 | 0 | 1 |
| I enjoy science | |||||
| 7 | 1873 | 0.62 | 0.48 | 0 | 1 |
| 10 | 1526 | 0.59 | 0.49 | 0 | 1 |
| 12 | 1105 | 0.55 | 0.50 | 0 | 1 |
| Science is useful in everyday problems | |||||
| 7 | 1840 | 0.40 | 0.49 | 0 | 1 |
| 10 | 1516 | 0.43 | 0.50 | 0 | 1 |
| 12 | 1099 | 0.48 | 0.50 | 0 | 1 |
| Science helps a person think logically | |||||
| 7 | 1850 | 0.49 | 0.50 | 0 | 1 |
| 10 | 1516 | 0.53 | 0.50 | 0 | 1 |
| 12 | 1100 | 0.56 | 0.50 | 0 | 1 |
| It is important to know science to get a good job | |||||
| 7 | 1857 | 0.40 | 0.49 | 0 | 1 |
| 10 | 1518 | 0.43 | 0.50 | 0 | 1 |
| 12 | 1099 | 0.38 | 0.49 | 0 | 1 |
| I will use science in many ways as an adult | |||||
| 7 | 1873 | 0.46 | 0.50 | 0 | 1 |
| 10 | 1524 | 0.42 | 0.49 | 0 | 1 |
| 12 | 1103 | 0.44 | 0.50 | 0 | 1 |
Data Verification Summary
All attitudinal indicators fall within the expected 0–1 range and show adequate variability across Grades 7, 10, and 12. No recoding or item removal is required before proceeding to enumeration.
# Define covariates
covariates <- c(
"MINORITY", "FEMALE"
)
# Function to compute stats (count, mean, SD, range, % missing)
compute_stats <- function(data, question, question_name) {
total_n <- nrow(data)
missing_n <- sum(is.na(data[[question]]))
percent_missing <- (missing_n / total_n) * 100
data %>%
summarise(
Count = sum(!is.na(.data[[question]])),
Mean = mean(.data[[question]], na.rm = TRUE),
SD = sd(.data[[question]], na.rm = TRUE),
Min = min(.data[[question]], na.rm = TRUE),
Max = max(.data[[question]], na.rm = TRUE)
) %>%
mutate(
Question = question_name,
PercentMissing = round(percent_missing, 2)
)
}
# Define question names and mappings
table_setup <- tibble(
question_code = covariates,
question_name = c(
"Minority",
"Gender"
)
)
# Compute stats for all questions
table1_data <- pmap_dfr(
list(table_setup$question_code, table_setup$question_name),
~compute_stats(lsay_data, ..1, ..2)
) %>%
mutate(
Mean = round(Mean, 2),
SD = round(SD, 2),
Min = round(Min, 2),
Max = round(Max, 2)
) %>%
select(Question, Count, PercentMissing, Mean, SD, Min, Max)
# Build table
table1_gt <- table1_data %>%
gt() %>%
tab_header(
title = "Table 1. Descriptive Statistics for Covariates Included in Analyses"
) %>%
cols_label(
Count = "N",
PercentMissing = "% Missing",
Mean = "Mean or Proportion",
SD = "SD",
Min = "Min",
Max = "Max"
) %>%
fmt_number(
columns = c(PercentMissing, Mean, SD, Min, Max),
decimals = 2
)
# Show table
table1_gt
| Table 1. Descriptive Statistics for Covariates Included in Analyses | ||||||
| Question | N | % Missing | Mean or Proportion | SD | Min | Max |
|---|---|---|---|---|---|---|
| Minority | 1824 | 4.60 | 0.17 | 0.38 | 0.00 | 1.00 |
| Gender | 1912 | 0.00 | 0.51 | 0.50 | 0.00 | 1.00 |
Data Verification Summary
Covariates show valid ranges, expected missingness patterns, and sufficient variability for use in auxiliary-variable models. All covariates can be included as planned under full-information maximum likelihood.
Phase 1 evaluates the number of latent classes at each timepoint (Grades 7, 10, and 12) using independent latent class analyses. Enumeration is conducted separately at each wave to determine the number of classes supported by the data without imposing any longitudinal constraints. This step establishes whether a common class structure is empirically plausible across time, which is a prerequisite for evaluating longitudinal measurement invariance in Phase 2.
Phase 1 produces the following diagnostic outputs:
1–6 class model solutions at each wave
Model-fit indices (BIC, aBIC, CAIC, AWE)
Likelihood-ratio tests (TECH11, TECH14)
Convergence diagnostics and replication rates
Estimated class proportions
Final selected four-class solutions at each wave
Item probability plots for all three selected four-class solutions
This phase proceeds in three diagnostic stages:
Enumeration of 1–6 class models at each wave
Extraction and comparison of model-fit diagnostics
Independent re-estimation of the selected four-class solution at each wave with fixed class ordering
Each stage serves as a structural validation step prior to measurement-invariance testing.
Syntax Walkthrough – Enumeration at All Three Timepoints
This syntax uses mplusObject() and
mplusModeler() from the MplusAutomation
package to programmatically generate the full set of 1–6 class LCA
models at all three waves (Grade 7, Grade 10, Grade
12). Each wave uses the same enumeration structure, differing only in
the indicator set used and the folder where results are saved.
Key elements:
lapply(1:6, …) loops over class sizes
k = 1…6 so that each wave estimates six models with
identical specifications.
mplusObject()builds a complete Mplus input file for
each iteration, allowing R to write the .inp automatically
without manual editing.
For each wave, the VARIABLE block assigns that
wave’s ten attitudinal indicators to both categorical and
usevar, which instructs Mplus to treat them as ordered
categorical and restrict estimation to those indicators.
The line classes = c({k}); dynamically inserts the
current class number so that each iteration creates the proper 1-class,
2-class, … up to 6-class solution.
The ANALYSIS block includes:
estimator = mlr for robust ML estimation.
type = mixture specifying latent class
analysis.
starts = 500 100, meaning Mplus begins estimation
from 500 random starting value sets and refines the
best 100. This approach substantially reduces the
likelihood of converging to a poor local optimum and helps ensure the
best-fitting solution is recovered for each k.
processors = 12 enabling parallel computation where
available.
The OUTPUT block requests enumeration diagnostics,
including TECH11 and TECH14 likelihood-ratio tests, sample statistics,
and residuals.
mplusModeler() writes the .inp, exports
the .dat file, calls Mplus, and returns the results to R,
allowing automated batch fitting of all enumeration models.
t1_enum <- lapply(1:6, function(k) {
enum_t1 <- mplusObject(
TITLE = glue("Class {k} Time 1"),
VARIABLE = glue(
"categorical = AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X;
usevar = AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X;
classes = c({k});"),
ANALYSIS =
"estimator = mlr;
type = mixture;
starts = 500 100;
processors = 12;",
OUTPUT = "sampstat residual tech11 tech14;",
usevariables = colnames(lsay_data),
rdata = lsay_data)
enum_t1_fit <- mplusModeler(enum_t1,
dataout=here("phase_1", "t1", "t1.dat"),
modelout=glue(here("phase_1", "t1", "c{k}_lca_t1.inp")),
check=TRUE, run = TRUE, hashfilename = FALSE)
})
The first enumeration evaluates 1–6 class models for the ten Grade 7 indicators. Diagnostics include IC trends (BIC/aBIC), TECH11/TECH14 tests, model convergence, and estimated class proportions.
t2_enum <- lapply(1:6, function(k) {
enum_t2 <- mplusObject(
TITLE = glue("Class {k} Time 2"),
VARIABLE = glue(
"categorical = GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L;
usevar = GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L;
classes = c({k});"),
ANALYSIS =
"estimator = mlr;
type = mixture;
starts = 500 100;
processors = 12;",
OUTPUT = "sampstat residual tech11 tech14;",
usevariables = colnames(lsay_data),
rdata = lsay_data)
enum_t2_fit <- mplusModeler(enum_t2,
dataout=here("phase_1", "t2", "t2.dat"),
modelout=glue(here("phase_1", "t2", "c{k}_lca_t2.inp")),
check=TRUE, run = TRUE, hashfilename = FALSE)
})
The same enumeration procedure is applied to the Grade 10 indicators. Estimating these models independently allows direct comparison of class recovery and structure relative to Time 1.
t3_enum <- lapply(1:6, function(k) {
enum_t3 <- mplusObject(
TITLE = glue("Class {k} Time 3"),
VARIABLE = glue(
"categorical = KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;
usevar = KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;
classes = c({k});"),
ANALYSIS =
"estimator = mlr;
type = mixture;
starts = 500 100;
processors = 12;",
OUTPUT = "sampstat residual tech11 tech14;",
usevariables = colnames(lsay_data),
rdata = lsay_data)
enum_t3_fit <- mplusModeler(enum_t3,
dataout=here("phase_1", "t3", "t3.dat"),
modelout=glue(here("phase_1", "t3", "c{k}_lca_t3.inp")),
check=TRUE, run = TRUE, hashfilename = FALSE)
})
Enumeration is completed for Grade 12. Agreement in class number and stability across all three waves provides initial empirical support for considering a longitudinal model.
After enumeration, Mplus output files are programmatically parsed to extract fit indices and model diagnostics for all 1–6 class models at each wave. These summaries provide the empirical basis for class-number selection and stability assessment.
Syntax Walkthrough: Model-Fit Extraction (Time 1, Time 2, Time 3)
This syntax automates the extraction of model-fit information from all enumeration models at each wave (Grade 7, Grade 10, Grade 12). The same structure is repeated for each timepoint so the resulting fit-index tables are fully comparable across waves.
Key elements:
source(here("functions", "extract_mplus_info.R"))
loads the custom function
extract_mplus_info_extended(), which:
reads an individual Mplus .out file;
parses the model summary;
extracts key enumeration statistics including log-likelihood, BIC, aBIC, entropy, TECH11/14 p-values (if present), and sample size;
returns those values as a tidy one-row dataframe.
This lets R pull the relevant fit information directly from Mplus output
without manual inspection.
source(here("functions","enum_table.R")) loads
enum_table_full(), which:
takes the list of Mplus models loaded via
readModels();
extracts class-specific and model-level fit information;
formats it into a clean, presentation-ready enumeration summary
table.
This provides the standardized fit-index tables shown in the
walkthrough.
output_dir <- here("phase_1", "t1") specifies
which wave’s enumeration folder to process.
list.files() identifies all .out files
generated during enumeration for that wave. This ensures the script
automatically captures the 1–6 class models.
map_dfr(output_files, extract_mplus_info_extended)
applies the custom extractor function to every model and stacks the
results into a single tidy dataframe (final_data). This
dataframe contains all fit indices for that wave in one place.
unique(final_data$Sample_Size) verifies the sample
size used in the enumeration models.
readModels() loads all Mplus model objects from the
wave directory so the full output—thresholds, summaries, and
statistics—is available in R.
enum_table_full() formats the full set of models
into a publication-quality enumeration table summarizing fit indices
across the 1–6 class solutions.
This automated workflow extracts, collates, and displays model-fit information for all enumeration models at each timepoint, removing the need to manually inspect Mplus output and ensuring consistent reporting across waves.
source(here("functions", "extract_mplus_info.R"))
source(here("functions","enum_table.R"))
# Define the directory where all of the .out files are located.
output_dir <- here("phase_1","t1")
# Get all .out files
output_files <- list.files(output_dir, pattern = "\\.out$", full.names = TRUE)
# Process all .out files into one dataframe
final_data <- map_dfr(output_files, extract_mplus_info_extended)
# Extract Sample_Size from final_data
sample_size <- unique(final_data$Sample_Size)
output_enum_t1 <- readModels(here("phase_1","t1"), quiet = TRUE)
enum_table_full(output_enum_t1)
| Model Fit Summary Table1 | |||||||||||
| Classes | Par | LL | % Converged | % Replicated |
Model Fit Indices
|
LRTs
|
Smallest Class
|
||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| BIC | aBIC | CAIC | AWE | VLMR | BLRT | n (%) | |||||
| Class 1 Time 1 | 10 | −11,803.43 | 100% | 100% | 23,682.28 | 23,650.51 | 23,692.28 | 23,787.70 | – | – | 1886 (100%) |
| Class 2 Time 1 | 21 | −10,418.76 | 100% | 100% | 20,995.91 | 20,929.19 | 21,016.91 | 21,217.29 | <.001 | <.001 | 782 (41.5%) |
| Class 3 Time 1 | 32 | −10,165.87 | 36% | 100% | 20,573.10 | 20,471.44 | 20,605.10 | 20,910.45 | 0.00 | <.001 | 384 (20.4%) |
| Class 4 Time 1 | 43 | −10,042.97 | 64% | 98% | 20,410.26 | 20,273.64 | 20,453.26 | 20,863.57 | 0.00 | <.001 | 390 (20.7%) |
| Class 5 Time 1 | 54 | −9,969.24 | 66% | 47% | 20,345.76 | 20,174.20 | 20,399.76 | 20,915.04 | 0.04 | <.001 | 175 (9.3%) |
| Class 6 Time 1 | 65 | −9,915.15 | 42% | 67% | 20,320.54 | 20,114.04 | 20,385.54 | 21,005.78 | 0.01 | <.001 | 180 (9.5%) |
| 1 Note. Par = Parameters; LL = model log likelihood; BIC = Bayesian information criterion; aBIC = sample size adjusted BIC; CAIC = consistent Akaike information criterion; AWE = approximate weight of evidence criterion; BLRT = bootstrapped likelihood ratio test p-value; VLMR = Vuong-Lo-Mendell-Rubin adjusted likelihood ratio test p-value; cmPk = approximate correct model probability. | |||||||||||
IC Plot
Syntax Walkthrough: Information-Criterion (IC) Plots
These sections generate visualizations of BIC/aBIC trends across the 1–6 class models.
Key elements:
ic_plot_lca.R defines a custom helper function that
extracts information criteria from the readModels() output
and produces a line plot across class sizes.
The call ic_plot(output_enum_t1) (and the T2/T3
versions) creates wave-specific plots showing improvements in fit as
classes increase.
These plots help identify diminishing returns (elbows) that guide class selection.
source(here("functions","ic_plot_lca.R"))
ic_plot(output_enum_t1)
source(here("functions", "extract_mplus_info.R"))
source(here("functions","enum_table.R"))
# Define the directory where all of the .out files are located.
output_dir <- here("phase_1","t2")
# Get all .out files
output_files <- list.files(output_dir, pattern = "\\.out$", full.names = TRUE)
# Process all .out files into one dataframe
final_data <- map_dfr(output_files, extract_mplus_info_extended)
# Extract Sample_Size from final_data
sample_size <- unique(final_data$Sample_Size)
output_enum_t2 <- readModels(here("phase_1","t2"), quiet = TRUE)
enum_table_full(output_enum_t2)
| Model Fit Summary Table1 | |||||||||||
| Classes | Par | LL | % Converged | % Replicated |
Model Fit Indices
|
LRTs
|
Smallest Class
|
||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| BIC | aBIC | CAIC | AWE | VLMR | BLRT | n (%) | |||||
| Class 1 Time 2 | 10 | −10,072.93 | 100% | 100% | 20,219.21 | 20,187.44 | 20,229.21 | 20,322.56 | – | – | 1534 (100%) |
| Class 2 Time 2 | 21 | −8,428.38 | 100% | 100% | 17,010.82 | 16,944.10 | 17,031.82 | 17,227.86 | <.001 | <.001 | 658 (42.9%) |
| Class 3 Time 2 | 32 | −8,067.61 | 53% | 100% | 16,369.96 | 16,268.31 | 16,401.96 | 16,700.70 | <.001 | <.001 | 297 (19.4%) |
| Class 4 Time 2 | 43 | −7,905.53 | 39% | 100% | 16,126.50 | 15,989.90 | 16,169.50 | 16,570.93 | <.001 | <.001 | 290 (18.9%) |
| Class 5 Time 2 | 54 | −7,845.44 | 82% | 100% | 16,087.01 | 15,915.46 | 16,141.01 | 16,645.13 | 0.01 | <.001 | 220 (14.3%) |
| Class 6 Time 2 | 65 | −7,806.99 | 44% | 36% | 16,090.79 | 15,884.30 | 16,155.79 | 16,762.61 | 0.02 | <.001 | 130 (8.5%) |
| 1 Note. Par = Parameters; LL = model log likelihood; BIC = Bayesian information criterion; aBIC = sample size adjusted BIC; CAIC = consistent Akaike information criterion; AWE = approximate weight of evidence criterion; BLRT = bootstrapped likelihood ratio test p-value; VLMR = Vuong-Lo-Mendell-Rubin adjusted likelihood ratio test p-value; cmPk = approximate correct model probability. | |||||||||||
IC Plot
source(here("functions","ic_plot_lca.R"))
ic_plot(output_enum_t2)
source(here("functions", "extract_mplus_info.R"))
source(here("functions","enum_table.R"))
# Define the directory where all of the .out files are located.
output_dir <- here("phase_1","t3")
# Get all .out files
output_files <- list.files(output_dir, pattern = "\\.out$", full.names = TRUE)
# Process all .out files into one dataframe
final_data <- map_dfr(output_files, extract_mplus_info_extended)
# Extract Sample_Size from final_data
sample_size <- unique(final_data$Sample_Size)
output_enum_t3 <- readModels(here("phase_1","t3"), quiet = TRUE)
enum_table_full(output_enum_t3)
| Model Fit Summary Table1 | |||||||||||
| Classes | Par | LL | % Converged | % Replicated |
Model Fit Indices
|
LRTs
|
Smallest Class
|
||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| BIC | aBIC | CAIC | AWE | VLMR | BLRT | n (%) | |||||
| Class 1 Time 3 | 10 | −7,349.13 | 100% | 100% | 14,768.49 | 14,736.72 | 14,778.49 | 14,868.72 | – | – | 1122 (100%) |
| Class 2 Time 3 | 21 | −5,976.60 | 100% | 100% | 12,100.68 | 12,033.98 | 12,121.68 | 12,311.16 | <.001 | <.001 | 534 (47.5%) |
| Class 3 Time 3 | 32 | −5,670.60 | 98% | 99% | 11,565.92 | 11,464.28 | 11,597.92 | 11,886.65 | <.001 | <.001 | 203 (18.1%) |
| Class 4 Time 3 | 43 | −5,543.62 | 36% | 97% | 11,389.22 | 11,252.64 | 11,432.22 | 11,820.20 | 0.00 | <.001 | 219 (19.5%) |
| Class 5 Time 3 | 54 | −5,483.67 | 63% | 62% | 11,346.57 | 11,175.06 | 11,400.57 | 11,887.81 | 0.00 | <.001 | 131 (11.7%) |
| Class 6 Time 3 | 65 | −5,444.06 | 42% | 64% | 11,344.60 | 11,138.14 | 11,409.60 | 11,996.08 | 0.09 | <.001 | 81 (7.2%) |
| 1 Note. Par = Parameters; LL = model log likelihood; BIC = Bayesian information criterion; aBIC = sample size adjusted BIC; CAIC = consistent Akaike information criterion; AWE = approximate weight of evidence criterion; BLRT = bootstrapped likelihood ratio test p-value; VLMR = Vuong-Lo-Mendell-Rubin adjusted likelihood ratio test p-value; cmPk = approximate correct model probability. | |||||||||||
IC Plot
source(here("functions","ic_plot_lca.R"))
ic_plot(output_enum_t3)
Diagnostic Summary: Enumeration Fit
For Time 1, model-fit indices improved through the four-class solution, with stable convergence and acceptable class proportions.
For Time 2 and Time 3, information criteria again declined through the four-class solution and stabilized thereafter.
Convergence diagnostics and class-size distributions remained adequate at all three waves for the four-class solution.
Together, these diagnostics support selecting four classes at each timepoint for all subsequent phases.
Once the four-class solution is selected at each wave, class-specific response probabilities are visualized as an additional diagnostic check on class distinctiveness and shape similarity.
Syntax Walkthrough: Class Probabilitiy Plots
These plots visualize the conditional response patterns for the selected class solution at each wave.
Key elements:
plot_lca() is a custom plotting function sourced
from your functions/ directory that takes a
readModels() output object and generates class-specific
probabilities
The models c4_lca_t1.out,
c4_lca_t2.out, and c4_lca_t3.out are selected
manually because enumeration suggested that 4 classes were
optimal.
(plot_lca(t1) | plot_lca(t2) | plot_lca(t3)) uses
the patchwork operator (|) to place the
three probability plots side-by-side.
These plots allow researchers to visually inspect:
class distinctiveness,
similarity of class shapes across waves,
and the feasibility of linking classes over time.
This visualization is essential before moving to Phase 2 (invariance testing).
Plot LCA:
source(here("functions","plot_lca.R"))
step1_t1 <- readModels(here("phase_1","t1"))
step1_t2 <- readModels(here("phase_1","t2"))
step1_t3 <- readModels(here("phase_1","t3"))
t1 <- step1_t1$c4_lca_t1.out
t2 <- step1_t2$c4_lca_t2.out
t3 <- step1_t3$c4_lca_t3.out
(plot_lca(t1) | plot_lca(t2) | plot_lca(t3))
Class labels used here:
Class 1: Very Positive, Class 2: Qualified
Positive, Class 3: Neutral, Class 4:
Less Positive
Diagnostic Summary: Class Consistency
The four-class probability plots confirm that:
Each wave yields four clearly differentiated classes
No class displays extreme degeneration or redundancy
Classes are sufficiently stable across waves
No structural inconsistencies are observed
These diagnostics indicate that the same four-class structure is empirically recoverable across Grades 7, 10, and 12.
In this step, the selected four-class solution is re-estimated independently at each wave with:
Fixed number of classes (four)
Optimized class ordering using optseed
Supplied svalues() to enforce consistent class
numbering across waves
Random starts disabled (starts = 0)
This step does not change the class structure. Its sole purpose is to ensure stable class labeling across waves prior to joint longitudinal modeling.
Syntax Walkthrough: Independent LCAs at Each Time Point (Time 1, Time 2, Time 3)
These syntax blocks estimate the selected four-class model separately at each wave. Because the basic LCA structure is identical to the enumeration syntax used in Phase 1, we rely on the earlier walkthrough and focus here on what is new.
Key elements:
The same MplusAutomation workflow (mplusObject(),
mplusModeler()) used in Phase 1 is applied here, but each
model now fixes the number of classes at four, based on enumeration
decisions.
Each wave uses its own set of categorical indicators
(AB* for Grade 7, GA* for Grade 10,
KA*for Grade 12), paralleling the Phase 1
structure.
starts = 0 is used because the enumeration step has
already established the class structure. This disables random start
searches and forces the model to proceed directly from the provided
starting values.
optseed = ###### fixes the optimization seed so the
class ordering remains stable across waves. This is critical for later
longitudinal alignment and reduces risk of label switching.
The svalues() option supplies ordered starting
values to maintain consistent class numbering (e.g., 1 = Very Positive,
4 = Less Positive).
These svalues differ across waves because the enumeration results
differ; each set is wave-specific.
All other syntax elements (VARIABLE, ANALYSIS, OUTPUT, data export, Mplus execution) operate exactly as described in the Phase 1 walkthrough.
Together, these steps generate three wave-specific four-class models that serve as the baseline for evaluating measurement invariance in the next phase.
lca_t1 <- mplusObject(
TITLE = "Class-3_Time1",
VARIABLE =
"categorical = AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X;
usevar = AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X;
!useobs = patw2==0;
classes = c(4);",
ANALYSIS =
"estimator = mlr;
type = mixture;
starts = 0;
optseed = 937588;",
OUTPUT = "TECH1 TECH8 TECH14 svalues(2 3 4 1);",
usevariables = colnames(lsay_data),
rdata = lsay_data)
lca_t1_fit <- mplusModeler(lca_t1,
dataout=here("phase_1","reordered","t1_lca.dat"),
modelout=here("phase_1","reordered","t1_lca_step1.inp"),
check=TRUE, run = TRUE, hashfilename = FALSE)
lca_t2 <- mplusObject(
TITLE = "Class-3_Time2",
VARIABLE =
"categorical = GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L;
usevar = GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L;
!useobs = patw4==0;
classes = c(4);",
ANALYSIS =
"estimator = mlr;
type = mixture;
starts = 0;
optseed = 264935;",
OUTPUT = "TECH1 TECH8 TECH14 svalues(3 1 4 2);",
usevariables = colnames(lsay_data),
rdata = lsay_data)
lca_t2_fit <- mplusModeler(lca_t2,
dataout=here("phase_1","reordered","t2_lca.dat"),
modelout=here("phase_1","reordered","t2_lca_step1.inp"),
check=TRUE, run = TRUE, hashfilename = FALSE)
lca_t3 <- mplusObject(
TITLE = "Class-3_Time3",
VARIABLE =
"categorical = KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;
usevar = KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;
!useobs = patw6==0;
classes = c(4);",
ANALYSIS =
"estimator = mlr;
type = mixture;
starts = 0;
optseed = 366706;",
OUTPUT = "TECH1 TECH8 TECH14 svalues(1 4 3 2);",
usevariables = colnames(lsay_data),
rdata = lsay_data)
lca_t3_fit <- mplusModeler(lca_t3,
dataout=here("phase_1","reordered","t3_lca.dat"),
modelout=here("phase_1","reordered","t3_lca_step1.inp"),
check=TRUE, run = TRUE, hashfilename = FALSE)
We then visualize the estimated classes from all three waves in a side-by-side display. Probability plots provide an intuitive check on whether the same substantive patterns emerge across time. Specifically, we look for:
similar shapes or relative endorsement levels within classes,
consistent ordering of classes across waves
Strong similarity across the three sets of plots suggests that placing equality constraints on item-response probabilities in later phases is plausible.
Syntax Walkthrough: Plotting Independent LCA Probabilities
This plotting code is similar to the enumeration plots in Phase 1, but introduces several elements that require a separate walkthrough.
Key elements:
readModels() is used to read the single selected 4-class
model for each wave (Time 1, Time 2, Time 3).
Unlike Phase 1, where readModels() loads an entire folder of c1–c6
models, here we load specific .out files created during the beginning of
Phase 1.
plot_lca() is a custom plotting function (sourced from
functions/plot_lca.R).
In Phase 1 it was applied once per time point to visualize enumeration
candidates; here, it is used to create final probability plots from the
selected 4-class models.
The syntax extracts each model explicitly:
t1 <- readModels(“t1_lca_step1.out”)
t2 <- readModels(“t2_lca_step1.out”)
t3 <- readModels(“t3_lca_step1.out”)
This ensures that the plotted probabilities represent the final
estimation for each wave—not the enumeration set.
The expression (plot_lca(t1) | plot_lca(t2) | plot_lca(t3)) uses
the patchwork package to align the three probability plots in a single
row.
This allows direct visual comparison of class shape, ordering, and
interpretation across waves.
No additional arguments are needed because class labels, indicator names, and formatting are handled internally by plot_lca().
This workflow creates the three-wave class comparison used to assess whether class shapes remain stable enough to justify longitudinal measurement invariance in Phase 2.
source(here("functions","plot_lca.R"))
t1 <- readModels(here("phase_1","reordered","t1_lca_step1.out"))
t2 <- readModels(here("phase_1","reordered","t2_lca_step1.out"))
t3 <- readModels(here("phase_1","reordered","t3_lca_step1.out"))
(plot_lca(t1) | plot_lca(t2) | plot_lca(t3))
Diagnostic Summary: Pre–Measurement-Invariance Validation
A four-class solution is consistently recoverable at Grades 7, 10, and 12.
Estimated classes remain structurally comparable across waves following independent reordering.
No classes exhibits redundancy or instability after enforcing consistent class labels.
Class proportions remain substantively viable at all three waves.
These diagnostics confirm that a longitudinal measurement-invariance model is empirically defensible. No model adjustments are required prior to proceeding to Phase 2.
Phase 1 confirms that a stable four-class measurement structure is recoverable independently at each timepoint and that classes are sufficiently consistent to support formal evaluation of longitudinal measurement invariance. With enumeration and class ordering validated, the workflow proceeds to Phase 2, where joint configural and invariant latent class models are estimated.
Phase 2 evaluates whether the same latent class measurement model can be meaningfully applied across Grades 7, 10, and 12. This is accomplished by comparing:
a non-invariant joint model, in which all item–class thresholds are freely estimated at each wave, and
a fully invariant joint model, in which all item–class thresholds are constrained to be equal across waves.
This phase determines whether a common measurement structure is empirically defensible across time. The outcome of this phase governs whether full invariance, partial invariance, or non-invariance must be carried forward into auxiliary-variable modeling.
Important Clarification on the Role of Measurement Invariance in LTA
Establishing longitudinal measurement invariance is not a prerequisite for estimating a latent transition model itself. An LTA model can always be estimated under a non-invariant measurement structure, in which item–class thresholds are freely estimated at each wave. In such cases, transitions are interpreted with respect to time-specific latent classes, rather than assuming stable class meaning across time.
Measurement invariance becomes essential only when the analytic goal requires:
If full invariance is not supported in this phase, researchers may:
proceed with a non-invariant LTA using freely estimated thresholds, or
pursue a partial invariance solution by selectively freeing non-invariant item parameters.
The workflow presented here focuses on the invariant-measurement pathway because we focus on LTA with invariance. Full invariance is not needed to estimate and LTA model.
Phase 2 produces the following diagnostic outputs:
A joint non-invariant LTA measurement model
A joint fully invariant LTA measurement model
Visual comparison of invariant vs. non-invariant latent class models
A Satorra–Bentler scaled χ² nested difference test
A formal decision regarding:
full invariance,
partial invariance, or
non-invariance
These outputs establish whether longitudinal transitions can be modeled under a constrained measurement structure.
This phase proceeds in three validation steps:
Estimate a non-invariant joint configural model (baseline)
Estimate a fully constrained invariant measurement model
Formally evaluate invariance using nested model testing and diagnostics
The non-invariant joint model serves as the unconstrained baseline for evaluating measurement invariance. In this model:
Three latent class variables (one per wave) are estimated jointly.
All item–class thresholds are freely estimated across time.
No equality constraints are imposed.
No transition regressions are specified.
This model provides the parent likelihood required for nested model comparison with the invariant specification.
Syntax Walkthrough: Non-Invariant LTA Model
This model introduces the first joint longitudinal specification in
the tutorial. Because the basic MplusAutomation structure
(mplusObject(), mplusModeler()) was already
described in Phases 1–2, this walkthrough focuses only on elements that
are new or unique to the non-invariant LTA model.
Key elements:
• Three latent class variables are defined simultaneously using
classes = c1(4) c2(4) c3(4);, one for each wave.
• The usev and categorical fields now contain
all indicators from Grades 7, 10, and 12, reflecting the joint
estimation across waves.
• The MODEL blocks use syntax such as [AB39A$1–AB39X$1] to
freely estimate thresholds for each indicator across
all classes within a wave.
• No equality constraints appear anywhere in the MODEL block, which is
what makes this the fully non-invariant baseline model.
• The ANALYSIS block uses starts = 1000 200 to increase the
likelihood of reaching the global solution. Joint LTA models are more
prone to local maxima than single-wave LCAs, so the high number of
random and final starts is necessary.
• All other workflow components (VARIABLE structure, Mplus syntax
generation, model export, and execution) follow the same logic already
covered in Phase 1.
This model serves as the unconstrained parent model required for Satorra–Bentler nested comparison testing.
lta_non_inv <- mplusObject(
TITLE =
"Non-Invariant LTA Model",
VARIABLE =
"idvariable=CASENUM;
usev =
AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X
GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L
KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;
categorical =
AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X
GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L
KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;
classes = c1(4) c2(4) c3(4);",
ANALYSIS =
"estimator = mlr;
type = mixture;
starts = 500 200;",
MODEL =
"%overall%
MODEL c1:
%c1#1%
[AB39A$1-AB39X$1];
%c1#2%
[AB39A$1-AB39X$1];
%c1#3%
[AB39A$1-AB39X$1];
%c1#4%
[AB39A$1-AB39X$1];
MODEL c2:
%c2#1%
[GA32A$1-GA33L$1];
%c2#2%
[GA32A$1-GA33L$1];
%c2#3%
[GA32A$1-GA33L$1];
%c2#4%
[GA32A$1-GA33L$1];
MODEL c3:
%c3#1%
[KA46A$1-KA47L$1];
%c3#2%
[KA46A$1-KA47L$1];
%c3#3%
[KA46A$1-KA47L$1];
%c3#4%
[KA46A$1-KA47L$1];",
OUTPUT = "svalues;",
usevariables = colnames(lsay_data),
rdata = lsay_data)
lta_non_inv_fit <- mplusModeler(lta_non_inv,
dataout=here("phase_2", "lta.dat"),
modelout=here("phase_2", "noninvariant_lta.inp"),
check=TRUE, run = TRUE, hashfilename = FALSE)
After estimation, this model is retained strictly as the reference model for nested testing.
The invariant model imposes full longitudinal measurement invariance by fixing all item–class thresholds to identical values across Grades 7, 10, and 12.
Under this model:
All wave-specific measurement parameters are constrained.
Only structural parameters (class proportions and transitions, when added later) remain free.
Because all thresholds are fixed, random-start searches are
disabled (starts = 0).
This model nests within the non-invariant model and provides the constrained target for formal invariance testing.
Syntax Walkthrough: Invariant LTA Model
This model imposes full measurement invariance by fixing all item thresholds to values obtained from the invariant single-wave measurement model.
Key elements:
• Thresholds are fixed using statements such as
ab39a$1*-0.88317, one per item and per class, within each
wave-specific MODEL block.
• The structure mirrors the non-invariant syntax, but all free
thresholds are replaced with fixed values.
• Because measurement parameters are fixed, starts = 0 is
used in the ANALYSIS block—there is no need for random start
searches.
• The invariant model estimates only structural parameters: class
proportions at each wave and transition probabilities
(c2 ON c1, c3 ON c2 in later models).
• Because this model nestles inside the non-invariant model (same
structure, fewer free parameters), it can be directly compared using the
Satorra–Bentler test.
This model provides the required nested structure for evaluating whether full measurement invariance is plausible.
lta_inv <- mplusObject(
TITLE =
"Invariant LTA Model",
VARIABLE =
"idvariable=CASENUM;
usev =
AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X
GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L
KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;
categorical =
AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X
GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L
KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;
classes = c1(4) c2(4) c3(4);",
ANALYSIS =
"estimator = mlr;
type = mixture;
starts = 0;
processors=10;",
MODEL =
"
%overall%
MODEL c1:
%c1#1%
[AB39A$1-AB39X$1](1-10);
%c1#2%
[AB39A$1-AB39X$1](11-20);
%c1#3%
[AB39A$1-AB39X$1](21-30);
%c1#4%
[AB39A$1-AB39X$1](31-40);
MODEL c2:
%c2#1%
[GA32A$1-GA33L$1](1-10);
%c2#2%
[GA32A$1-GA33L$1](11-20);
%c2#3%
[GA32A$1-GA33L$1](21-30);
%c2#4%
[GA32A$1-GA33L$1](31-40);
MODEL c3:
%c3#1%
[KA46A$1-KA47L$1](1-10);
%c3#2%
[KA46A$1-KA47L$1](11-20);
%c3#3%
[KA46A$1-KA47L$1](21-30);
%c3#4%
[KA46A$1-KA47L$1](31-40);",
OUTPUT = "svalues;",
usevariables = colnames(lsay_data),
rdata = lsay_data)
lta_inv_fit <- mplusModeler(lta_inv,
dataout=here("phase_2", "lta.dat"),
modelout=here("phase_2", "invariant_lta.inp"),
check=TRUE, run = TRUE, hashfilename = FALSE)
This model yields the measurement structure used for all downstream auxiliary-variable analyses if invariance is retained.
Class probability plots are generated for both the non-invariant and invariant solutions to support visual diagnostic evaluation of the invariance constraints.
The plots are used to verify:
preservation of overall class shape,
absence of class collapse or reversals,
stability of class ordering, and
absence of constraint-induced distortion.
Syntax Walkthrough: Plotting Non-Invariant and Invariant Models
Key elements:
• readModels() loads both .out files from the
Phase 2 directory.
• plot_lta() (from your functions/ directory)
is then used to generate class probability plots for each model.
• Calling plot_lta(inv_mod$invariant.lta.out) displays
invariant classes; plot_lta(inv_mod$non.invariant.lta.out)
displays the non-invariant ones.
• This visualization allows quick assessment of whether constraining
item thresholds alters class shapes in a meaningful way.
These plots support the conceptual justification for moving forward with invariance—or identifying where partial invariance might be required.
source(here("functions","plot_lta.R"))
inv_mod <- readModels(here("phase_2"), quiet = TRUE)
plot_lta(inv_mod$invariant_lta.out)
plot_lta(inv_mod$noninvariant_lta.out)
Diagnostic Summary: Class Stability Under Invariance Constraints
The invariant model reproduces the same four-class structural configuration as the non-invariant model.
No meaningful differences under invariance constraints.
The constrained classes remain visually interpretable and structurally aligned with the unconstrained solution.
Minor item-level deviations observed in the non-invariant model do not change the interpretation of the classes. These diagnostics indicate that imposing full longitudinal measurement invariance does not compromise class recoverability or structural interpretability.
A Satorra–Bentler scaled chi-square difference test is used as one way to compare nested model. The Satorra-Bentler adjustment is needed because models were estimated using ML with robust standard errors (MLR):
the fully invariant model (nested), and
the non-invariant model (comparison).
This test evaluates whether the invariance constraints introduce statistically meaningful degradation in model fit.
Syntax Walkthrough: Satorra–Bentler Scaled χ² Difference Test
This section computes the adjusted chi-square difference for nested model comparison between the invariant and non-invariant LTA models.
Key elements:
• readModels() extracts log-likelihoods (LL),
number of parameters (Parameters), and scaling correction
factors (LLCorrectionFactor) for each model.
• The likelihood-ratio term is computed using
lr = -2*(L0 - L1).
• The adjusted scaling factor is computed using
cd = ((p0*c0) - (p1*c1)) / (p0 - p1).
• The Satorra–Bentler test statistic is then
TRd = lr / cd.
• Degrees of freedom are computed as abs(p0 - p1).
• The final p-value is obtained via
pchisq(TRd, df, lower.tail = FALSE).
A non-significant p-value indicates that the invariant model does not fit significantly worse than the non-invariant model, supporting the decision to carry invariant measurement parameters forward into auxiliary-variable modeling.
# *0 = null or nested model & *1 = comparison or parent model
lta_models <- readModels(here("phase_2"), quiet = TRUE)
# Log Likelihood Values
L0 <- lta_models$invariant_lta.out$summaries$LL
L1 <- lta_models$noninvariant_lta.out$summaries$LL
# LRT equation
lr <- -2*(L0-L1)
# Parameters
p0 <- lta_models$invariant_lta.out$summaries$Parameters
p1 <- lta_models$noninvariant_lta.out$summaries$Parameters
# Scaling Correction Factors
c0 <- lta_models$invariant_lta.out$summaries$LLCorrectionFactor
c1 <- lta_models$noninvariant_lta.out$summaries$LLCorrectionFactor
# Difference Test Scaling correction
cd <- ((p0*c0)-(p1*c1))/(p0-p1)
# Chi-square difference test(TRd)
TRd <- (lr)/(cd)
# Degrees of freedom
df <- abs(p0 - p1)
# Significance test
(p_diff <- pchisq(TRd, df, lower.tail=FALSE))
## [1] 2.294394e-40
Diagnostic Summary: Nested Model Test of Invariance
The Satorra–Bentler scaled difference test indicates a statistically significant likelihood difference between the non-invariant and invariant models. This implies that, based on this test, there is not support of the fully invariant model.
Given the large sample size, this result is not surprising and reflects the high sensitivity of likelihood-based tests to even trivial deviations from invariance.
Visual class diagnostics and structural stability assessments indicate that the invariant model preserves the same four-class configuration without substantive distortion.
Taken together, despite the significant LL tests, the statistical and structural diagnostics support retention of the fully invariant model for longitudinal transition modeling.
compareModels
in MplusAutomationNested model comparisons can also be conducted directly through the compareModels() function in MplusAutomation, which automates the extraction of likelihood values, parameters, and scaling correction factors needed for robust chi-square difference testing under MLR estimation.
invisible(capture.output(
comparison <- compareModels(lta_models$invariant_lta.out, lta_models$noninvariant_lta.out, diffTest=TRUE)
))
# p-value
comparison$diffTest$MLR_LL$p
## [1] 2.294394e-40
#BIC
comparison$summaries
## Title Observations Estimator Parameters LL AIC
## 1 Invariant LTA Model 1907 MLR 49 -23700.74 47499.48
## 2 Non-Invariant LTA Model 1907 MLR 129 -23492.12 47242.24
## BIC
## 1 47771.59
## 2 47958.62
Diagnostic Summary: compareModels
Results from compareModels() indicate that the scaled chi-square difference between the non-invariant and invariant models is statistically significant (p < .001)
However, the BIC supports the fully invariant model.
These results are shown in the first set of output.
Phase 2 evaluates whether a common longitudinal measurement structure can be defensibly imposed across Grades 7, 10, and 12. Visual diagnostics and nested model testing jointly indicate that a fully invariant four-class measurement model is supported and full invariance does not cause class distortions or instability. With longitudinal measurement invariance validated for the purposes of auxiliary-variable modeling, the workflow proceeds to Phase 3, where invariant measurement parameters are carried forward to generate classification-error–corrected posterior probabilities and implement the ML three-step and BCH auxiliary-variable procedures.
The ML three-step method is a model-based correction procedure that allows auxiliary variables to be incorporated into latent class and latent transition models while accounting for classification uncertainty. The method proceeds by:
generating posterior class membership under a fixed measurement model,
translating classification uncertainty into fixed logits, and
estimating structural relationships while holding the measurement model constant.
This approach ensures that auxiliary-variable effects are estimated relative to stable, invariant latent classes.
Phase 3 implements multi-step estimation methods for incorporating auxiliary variables into the latent transition analysis while preserving the invariant measurement structure established in Phase 2. The central objective of this phase is to estimate:
longitudinal transition dynamics across Grades 7, 10, and 12, and
covariate-adjusted effects on class membership and transitions,
without allowing auxiliary variables to distort the latent class definitions.
This phase focuses first on the ML three-step procedure, which corrects for classification error by fixing logits derived from an invariant measurement model. This ensures that auxiliary variables influence the structural components of the model (class membership and transitions), but do not redefine the latent classes themselves.
A parallel BCH-based estimation strategy is introduced later as an alternative auxiliary-variable framework under invariant measurement.
Phase 3 produces the following structural outputs:
Fixed-threshold LCAs at each timepoint
Posterior class probabilities and modal class assignments under invariant measurement
Classification-error logits for all three waves
A baseline (unconditional) LTA transition model
A covariate-adjusted LTA transition model
Longitudinal transition probability plots
Covariate effect estimates on class membership and transitions
These outputs collectively establish a classification-error–corrected longitudinal transition model under invariant measurement.
This phase proceeds in three structural steps:
Re-estimate wave-specific LCAs with invariant thresholds fixed
Extract classification-error logits and merge posterior data
Estimate unconditional and covariate-adjusted LTAs with fixed logits
Each step transfers information from the invariant measurement model into the longitudinal structural model.
In Step 1, each single-wave LCA is re-estimated while fixing all item–class thresholds to the invariant values obtained in Phase 2. These fixed thresholds enforce the invariant measurement model at each wave, ensuring that class meaning is identical across time.
With thresholds fixed:
only class proportions are freely estimated, and
each model saves:
posterior class probabilities (CPROB), and
most-likely class assignments (MLCC).
These saved quantities form the basis for computing classification-error logits in Step 2.
Conceptual role of Step 1:
To regenerate posterior classification under invariant measurement conditions so that classification uncertainty can be carried forward into transition modeling.
Syntax Walkthrough: Step 1 – Re-Estimate Wave-Specific LCAs With Fixed Thresholds
In Step 1, each wave’s LCA is re-estimated while fixing all threshold parameters to the invariant values obtained in Phase 2. This enforces the invariant measurement model at each wave and ensures that class meaning is identical across time.
Key elements:
• All thresholds are fixed using invariant values (checked earlier in
Phase 2).
• With thresholds fixed, only class proportions are estimated.
• Each model uses starts = 0 because no random-search is
needed when parameters are fixed.
• Each model saves posterior information using
Save = cprob;, producing variables like
CPROB1–CPROB4 and MLCC.
• These saved data are required for logits extraction in Step 2.
• The fixed-threshold blocks consist exclusively of fixed statements
such as [ab39a$1*-1.53361], ensuring measurement
invariance.
• The only difference across waves is in the indicator sets
(AB*, GA*, KA*). • OPTIONAL: We
have rearranged the class labels in Step 1 by renaming the class
specific statements (e.g., %C1#2%)
Time 1: Fixed-Threshold LCA With Saved
CPROBS
inv_t1 <- mplusObject(
TITLE = "Time 1 - Invariant LCA",
VARIABLE =
"idvariable=CASENUM;
categorical = AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X;
usevar = AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X;
classes = c1(4);
auxiliary = MINORITY, FEMALE, MATHG7, MATHG10, MATHG12;",
ANALYSIS =
"estimator = mlr;
type = mixture;
starts = 0;",
MODEL =
"%overall%
%C1#1%
[ ab39a$1@-1.53313 ] (1);
[ ab39h$1@-2.90401 ] (2);
[ ab39i$1@-2.99575 ] (3);
[ ab39k$1@-2.83359 ] (4);
[ ab39l$1@-3.85594 ] (5);
[ ab39m$1@-2.16372 ] (6);
[ ab39t$1@-2.13876 ] (7);
[ ab39u$1@-2.48348 ] (8);
[ ab39w$1@-1.81515 ] (9);
[ ab39x$1@-2.35163 ] (10);
%C1#3%
[ ab39a$1@-0.06349 ] (11);
[ ab39h$1@-0.07976 ] (12);
[ ab39i$1@-0.52495 ] (13);
[ ab39k$1@-0.01216 ] (14);
[ ab39l$1@0.25484 ] (15);
[ ab39m$1@-0.83350 ] (16);
[ ab39t$1@0.07961 ] (17);
[ ab39u$1@-0.45274 ] (18);
[ ab39w$1@0.51840 ] (19);
[ ab39x$1@0.22569 ] (20);
%C1#2%
[ ab39a$1@-0.88292 ] (21);
[ ab39h$1@-1.67761 ] (22);
[ ab39i$1@-0.87642 ] (23);
[ ab39k$1@-1.94409 ] (24);
[ ab39l$1@-2.31259 ] (25);
[ ab39m$1@0.42887 ] (26);
[ ab39t$1@2.29606 ] (27);
[ ab39u$1@1.02908 ] (28);
[ ab39w$1@1.97179 ] (29);
[ ab39x$1@1.80988 ] (30);
%C1#4%
[ ab39a$1@0.74744 ] (31);
[ ab39h$1@2.01287 ] (32);
[ ab39i$1@1.67957 ] (33);
[ ab39k$1@1.54221 ] (34);
[ ab39l$1@2.34965 ] (35);
[ ab39m$1@1.38217 ] (36);
[ ab39t$1@3.73153 ] (37);
[ ab39u$1@3.09927 ] (38);
[ ab39w$1@3.69495 ] (39);
[ ab39x$1@3.68803 ] (40);",
OUTPUT = "TECH1 TECH8 TECH14;",
SAVEDATA =
"File=t1_inv_cprobs.dat;
Save=cprob;",
usevariables = colnames(lsay_data),
rdata = lsay_data)
inv_t1_fit <- mplusModeler(inv_t1,
dataout=here("phase_3","step_1_ml", "t1_inv.dat"),
modelout=here("phase_3","step_1_ml", "t1_inv_lca.inp"),
check=TRUE, run = TRUE, hashfilename = FALSE)
Time 2: Fixed-Threshold LCA With Saved
CPROBS
inv_t2 <- mplusObject(
TITLE = "Time 2 - Invariant LCA",
VARIABLE =
"idvariable=CASENUM;
categorical = GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L;
usevar = GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L;
classes = c2(4);
auxiliary = MINORITY, FEMALE, MATHG7, MATHG10, MATHG12;",
ANALYSIS =
"estimator = mlr;
type = mixture;
starts = 0;",
MODEL =
"%overall%
%C2#1%
[ ga32a$1@-1.53313 ] (1);
[ ga32h$1@-2.90401 ] (2);
[ ga32i$1@-2.99575 ] (3);
[ ga32k$1@-2.83359 ] (4);
[ ga32l$1@-3.85594 ] (5);
[ ga33a$1@-2.16372 ] (6);
[ ga33h$1@-2.13876 ] (7);
[ ga33i$1@-2.48348 ] (8);
[ ga33k$1@-1.81515 ] (9);
[ ga33l$1@-2.35163 ] (10);
%C2#3%
[ ga32a$1@-0.06349 ] (11);
[ ga32h$1@-0.07976 ] (12);
[ ga32i$1@-0.52495 ] (13);
[ ga32k$1@-0.01216 ] (14);
[ ga32l$1@0.25484 ] (15);
[ ga33a$1@-0.83350 ] (16);
[ ga33h$1@0.07961 ] (17);
[ ga33i$1@-0.45274 ] (18);
[ ga33k$1@0.51840 ] (19);
[ ga33l$1@0.22569 ] (20);
%C2#2%
[ ga32a$1@-0.88292 ] (21);
[ ga32h$1@-1.67761 ] (22);
[ ga32i$1@-0.87642 ] (23);
[ ga32k$1@-1.94409 ] (24);
[ ga32l$1@-2.31259 ] (25);
[ ga33a$1@0.42887 ] (26);
[ ga33h$1@2.29606 ] (27);
[ ga33i$1@1.02908 ] (28);
[ ga33k$1@1.97179 ] (29);
[ ga33l$1@1.80988 ] (30);
%C2#4%
[ ga32a$1@0.74744 ] (31);
[ ga32h$1@2.01287 ] (32);
[ ga32i$1@1.67957 ] (33);
[ ga32k$1@1.54221 ] (34);
[ ga32l$1@2.34965 ] (35);
[ ga33a$1@1.38217 ] (36);
[ ga33h$1@3.73153 ] (37);
[ ga33i$1@3.09927 ] (38);
[ ga33k$1@3.69495 ] (39);
[ ga33l$1@3.68803 ] (40);
",
OUTPUT = "TECH1 TECH8 TECH14;",
SAVEDATA =
"File=t2_inv_cprobs.dat;
Save=cprob;",
usevariables = colnames(lsay_data),
rdata = lsay_data)
inv_t2_fit <- mplusModeler(inv_t2,
dataout=here("phase_3","step_1_ml", "t2_inv.dat"),
modelout=here("phase_3","step_1_ml", "t2_inv_lca.inp"),
check=TRUE, run = TRUE, hashfilename = FALSE)
Time 3: Fixed-Threshold LCA With Saved
CPROBS
inv_t3 <- mplusObject(
TITLE = "Time 3 - Invariant LCA",
VARIABLE =
"idvariable=CASENUM;
categorical = KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;
usevar = KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;
classes = c3(4);
auxiliary = MINORITY, FEMALE, MATHG7, MATHG10, MATHG12;",
ANALYSIS =
"estimator = mlr;
type = mixture;
starts = 0; ",
MODEL =
"%overall%
%C3#1%
[ ka46a$1@-1.53313 ] (1);
[ ka46h$1@-2.90401 ] (2);
[ ka46i$1@-2.99575 ] (3);
[ ka46k$1@-2.83359 ] (4);
[ ka46l$1@-3.85594 ] (5);
[ ka47a$1@-2.16372 ] (6);
[ ka47h$1@-2.13876 ] (7);
[ ka47i$1@-2.48348 ] (8);
[ ka47k$1@-1.81515 ] (9);
[ ka47l$1@-2.35163 ] (10);
%C3#2%
[ ka46a$1@-0.06349 ] (11);
[ ka46h$1@-0.07976 ] (12);
[ ka46i$1@-0.52495 ] (13);
[ ka46k$1@-0.01216 ] (14);
[ ka46l$1@0.25484 ] (15);
[ ka47a$1@-0.83350 ] (16);
[ ka47h$1@0.07961 ] (17);
[ ka47i$1@-0.45274 ] (18);
[ ka47k$1@0.51840 ] (19);
[ ka47l$1@0.22569 ] (20);
%C3#3%
[ ka46a$1@-0.88292 ] (21);
[ ka46h$1@-1.67761 ] (22);
[ ka46i$1@-0.87642 ] (23);
[ ka46k$1@-1.94409 ] (24);
[ ka46l$1@-2.31259 ] (25);
[ ka47a$1@0.42887 ] (26);
[ ka47h$1@2.29606 ] (27);
[ ka47i$1@1.02908 ] (28);
[ ka47k$1@1.97179 ] (29);
[ ka47l$1@1.80988 ] (30);
%C3#4%
[ ka46a$1@0.74744 ] (31);
[ ka46h$1@2.01287 ] (32);
[ ka46i$1@1.67957 ] (33);
[ ka46k$1@1.54221 ] (34);
[ ka46l$1@2.34965 ] (35);
[ ka47a$1@1.38217 ] (36);
[ ka47h$1@3.73153 ] (37);
[ ka47i$1@3.09927 ] (38);
[ ka47k$1@3.69495 ] (39);
[ ka47l$1@3.68803 ] (40);",
OUTPUT = "TECH1 TECH8 TECH14;",
SAVEDATA =
"File=t3_inv_cprobs.dat;
Save=cprob;",
usevariables = colnames(lsay_data),
rdata = lsay_data)
inv_t3_fit <- mplusModeler(inv_t3,
dataout=here("phase_3","step_1_ml", "t3_inv.dat"),
modelout=here("phase_3","step_1_ml", "t3_inv_lca.inp"),
check=TRUE, run = TRUE, hashfilename = FALSE)
After fitting the fixed-threshold LCAs, class probabilities are visualized to confirm that invariant measurement has been correctly transferred into each wave’s model.
Syntax Walkthrough: Plotting the Fixed-Threshold LCAs
After Step 1 models are estimated, the three invariant LCAs are visualized to confirm that fixing thresholds produces class probabilities that remain aligned across all timepoints.
Key elements:
• Uses plot_lca() applied to each wave’s .out
file.
• Side-by-side display checks that class shapes remain consistent with
the invariant structure.
• This is a diagnostic step before logits extraction.
source(here("functions","plot_lca.R"))
t1 <- readModels(here("phase_3","step_1_ml","t1_inv_lca.out"))
t2 <- readModels(here("phase_3","step_1_ml","t2_inv_lca.out"))
t3 <- readModels(here("phase_3","step_1_ml","t3_inv_lca.out"))
(plot_lca(t1) | plot_lca(t2) | plot_lca(t3))
Diagnostic Summary: Fixed-Threshold Measurement Verification
The fixed-threshold LCAs reproduce the same four-class structure at all three timepoints.
Relative ordering of class endorsement remains stable across Grades 7, 10, and 12.
Class proportions remain substantively stable under invariant specification.
These diagnostics confirm that fixing invariant thresholds does not compromise class recoverability and that the invariant measurement model is correctly transferred into the three-step framework.
Step 2 reconstructs the classification-error structure implied by the invariant LCAs and prepares the dataset required for three-step LTA estimation.
Two components are extracted:
Classification-error logits, which encode misclassification uncertainty and will be fixed in all subsequent LTA models.
Posterior and modal classification data, where
saved CPROB and MLCC variables from each wave
are merged into a single wide dataset (savedata_t123)
containing:
nominal class indicators (n1, n2,
n3), and
posterior class probabilities at each timepoint.
Conceptual role of Step 2:
To transfer probabilistic classification uncertainty from the measurement model into the structural transition model without allowing auxiliary variables to distort class definitions.
Syntax Walkthrough: Step 2 – Extract Logits and Merge Posterior Data
Step 2 extracts classification-error logits and merges the saved
CPROB and MLCC variables across all waves.
Key elements:
• Logits for each timepoint are extracted from
class_counts$logitProbs.mostLikely.
• These logits encode misclassification uncertainty and will be fixed in
all later LTA models.
• Saved CPROB and MLCC variables are renamed
(e.g., cprob11, n1) for consistency across
waves.
• Data are merged by CASENUM into a single wide dataset
(savedata_t123).
• Missing values coded as 999 are converted to proper
NAs.
• This merged dataset provides the nominal class indicators
(n1, n2, n3) required for LTA
estimation.
Extract Classification-Error Logits
inv_step4 <- readModels(here("phase_3","step_1_ml"), quiet = TRUE)
logits_t1 <- as.data.frame(inv_step4$t1_inv_lca.out$class_counts$logitProbs.mostLikely)
logits_t2 <- as.data.frame(inv_step4$t2_inv_lca.out$class_counts$logitProbs.mostLikely)
logits_t3 <- as.data.frame(inv_step4$t3_inv_lca.out$class_counts$logitProbs.mostLikely)
Extract and Merge Saved Posterior Probabilities
savedata_t1 <- as.data.frame(inv_step4$t1_inv_lca.out$savedata) %>%
rename(cprob11=CPROB1,cprob12=CPROB2,cprob13=CPROB3,cprob14=CPROB4,n1=MLCC1)
savedata_t2 <- as.data.frame(inv_step4$t2_inv_lca.out$savedata) %>%
rename(cprob21=CPROB1,cprob22=CPROB2,cprob23=CPROB3,cprob24=CPROB4,n2=MLCC2)
savedata_t3 <- as.data.frame(inv_step4$t3_inv_lca.out$savedata) %>%
rename(cprob31=CPROB1,cprob32=CPROB2,cprob33=CPROB3,cprob34=CPROB4,n3=MLCC3)
savedata_t123 <- savedata_t1 %>%
full_join(savedata_t2, by = "CASENUM") %>%
full_join(savedata_t3, by = "CASENUM") %>%
clean_names() %>% relocate(casenum)
summary(savedata_t123)
## casenum ab39a ab39h ab39i
## Min. :1004 Min. :0.0000 Min. :0.0000 Min. :0.000
## 1st Qu.:2500 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.000
## Median :3973 Median :1.0000 Median :1.0000 Median :1.000
## Mean :3991 Mean :0.6881 Mean :0.7218 Mean :0.653
## 3rd Qu.:5505 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.000
## Max. :6937 Max. :1.0000 Max. :1.0000 Max. :1.000
## NA's :25 NA's :56 NA's :60
## ab39k ab39l ab39m ab39t
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.0000 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :0.0000
## Mean :0.7681 Mean :0.7512 Mean :0.6236 Mean :0.3984
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## NA's :53 NA's :50 NA's :34 NA's :67
## ab39u ab39w ab39x minority_x
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.4924 Mean :0.3996 Mean :0.4645 Mean :0.1704
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## NA's :57 NA's :50 NA's :34 NA's :105
## female_x mathg7_x mathg10_x mathg12_x
## Min. :0.0000 Min. :28.93 Min. :30.63 Min. :27.01
## 1st Qu.:0.0000 1st Qu.:44.84 1st Qu.:58.58 1st Qu.:63.40
## Median :1.0000 Median :53.09 Median :67.71 Median :72.19
## Mean :0.5175 Mean :52.58 Mean :66.15 Mean :71.20
## 3rd Qu.:1.0000 3rd Qu.:59.86 3rd Qu.:75.27 3rd Qu.:81.42
## Max. :1.0000 Max. :85.07 Max. :95.26 Max. :99.30
## NA's :21 NA's :42 NA's :532 NA's :1085
## cprob11 cprob12 cprob13 cprob14
## Min. :0.0000 Min. :0.0000 Min. :0.0020 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0030 1st Qu.:0.0090 1st Qu.:0.0000
## Median :0.0080 Median :0.1405 Median :0.0640 Median :0.0000
## Mean :0.3169 Mean :0.3572 Mean :0.2105 Mean :0.1155
## 3rd Qu.:0.8530 3rd Qu.:0.8210 3rd Qu.:0.2752 3rd Qu.:0.0130
## Max. :0.9980 Max. :0.9950 Max. :0.9990 Max. :0.9960
## NA's :21 NA's :21 NA's :21 NA's :21
## n1 ga32a ga32h ga32i
## Min. :1.000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :2.000 Median :1.0000 Median :1.0000 Median :1.0000
## Mean :2.093 Mean :0.6299 Mean :0.6493 Mean :0.6862
## 3rd Qu.:3.000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :4.000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## NA's :21 NA's :375 NA's :387 NA's :390
## ga32k ga32l ga33a ga33h
## Min. :0.000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.000 Median :1.0000 Median :1.0000 Median :0.0000
## Mean :0.679 Mean :0.6466 Mean :0.5917 Mean :0.4347
## 3rd Qu.:1.000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## NA's :390 NA's :382 NA's :381 NA's :391
## ga33i ga33k ga33l minority_y
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.5251 Mean :0.4289 Mean :0.4199 Mean :0.1632
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## NA's :391 NA's :389 NA's :383 NA's :430
## female_y mathg7_y mathg10_y mathg12_y
## Min. :0.0000 Min. :28.93 Min. :30.63 Min. :27.01
## 1st Qu.:0.0000 1st Qu.:46.10 1st Qu.:59.09 1st Qu.:64.00
## Median :1.0000 Median :53.57 Median :67.86 Median :72.56
## Mean :0.5254 Mean :53.20 Mean :66.41 Mean :71.63
## 3rd Qu.:1.0000 3rd Qu.:60.09 3rd Qu.:75.30 3rd Qu.:81.73
## Max. :1.0000 Max. :85.07 Max. :95.26 Max. :99.30
## NA's :373 NA's :390 NA's :557 NA's :1128
## cprob21 cprob22 cprob23 cprob24
## Min. :0.0000 Min. :0.0000 Min. :0.0020 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0055 1st Qu.:0.0000
## Median :0.0035 Median :0.0110 Median :0.0430 Median :0.0000
## Mean :0.3343 Mean :0.2391 Mean :0.2287 Mean :0.1979
## 3rd Qu.:0.9640 3rd Qu.:0.4000 3rd Qu.:0.3210 3rd Qu.:0.1230
## Max. :0.9980 Max. :0.9920 Max. :0.9990 Max. :0.9980
## NA's :373 NA's :373 NA's :373 NA's :373
## n2 ka46a ka46h ka46i
## Min. :1.000 Min. :0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :2.000 Median :1.000 Median :1.0000 Median :1.0000
## Mean :2.275 Mean :0.567 Mean :0.6733 Mean :0.7121
## 3rd Qu.:3.000 3rd Qu.:1.000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :4.000 Max. :1.000 Max. :1.0000 Max. :1.0000
## NA's :373 NA's :787 NA's :796 NA's :799
## ka46k ka46l ka47a ka47h
## Min. :0.0000 Min. :0.0000 Min. :0.000 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.000
## Median :1.0000 Median :1.0000 Median :1.000 Median :0.000
## Mean :0.6109 Mean :0.6513 Mean :0.552 Mean :0.485
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.000 3rd Qu.:1.000
## Max. :1.0000 Max. :1.0000 Max. :1.000 Max. :1.000
## NA's :802 NA's :803 NA's :802 NA's :808
## ka47i ka47k ka47l minority
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.5645 Mean :0.3831 Mean :0.4433 Mean :0.1513
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## NA's :807 NA's :808 NA's :804 NA's :823
## female mathg7 mathg10 mathg12
## Min. :0.0000 Min. :28.93 Min. :30.63 Min. :27.01
## 1st Qu.:0.0000 1st Qu.:47.05 1st Qu.:60.66 1st Qu.:63.95
## Median :1.0000 Median :54.23 Median :68.83 Median :72.31
## Mean :0.5169 Mean :54.05 Mean :67.49 Mean :71.59
## 3rd Qu.:1.0000 3rd Qu.:60.80 3rd Qu.:76.10 3rd Qu.:81.66
## Max. :1.0000 Max. :85.07 Max. :95.26 Max. :99.30
## NA's :785 NA's :793 NA's :955 NA's :1137
## cprob31 cprob32 cprob33 cprob34
## Min. :0.0000 Min. :0.0020 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0020 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0060 Median :0.0400 Median :0.0060 Median :0.0000
## Mean :0.3511 Mean :0.2298 Mean :0.2066 Mean :0.2125
## 3rd Qu.:0.9650 3rd Qu.:0.3095 3rd Qu.:0.2480 3rd Qu.:0.2005
## Max. :0.9980 Max. :0.9990 Max. :0.9910 Max. :0.9980
## NA's :785 NA's :785 NA's :785 NA's :785
## n3
## Min. :1.000
## 1st Qu.:1.000
## Median :2.000
## Mean :2.291
## 3rd Qu.:3.000
## Max. :4.000
## NA's :785
#describe(savedata_t123)
Diagnostic Summary: Classification-Error Transfer
Classification-error logits were successfully extracted for all three timepoints.
Posterior class probabilities (CPROB) and modal
assignments (MLCC) were merged into a unified wide
dataset.
Nominal class indicators (n1, n2,
n3) now encode probabilistic class membership with known
misclassification uncertainty.
Identifier alignment and missing-value handling were verified.
These diagnostics confirm that classification uncertainty has been properly reconstructed for three-step LTA estimation.
In Step 3a, the baseline longitudinal transition model is estimated under the ML three-step framework.
In this model:
the nominal indicators (n1, n2,
n3) act as imperfect measurements of the latent class
variables (c1, c2, c3),
all classification-error logits are fixed to their Step 2 values, and
only the structural transition regressions
(c2 ON c1, c3 ON c2) are freely
estimated.
By fixing the logits, this model produces transition estimates that are explicitly corrected for classification error while preserving invariant measurement.
Conceptual role of Step 3a:
To establish the baseline pattern of latent transitions before introducing auxiliary variables.
The purpose of this step is to confirm that class sizes and transition patterns match expectations from the earlier models before adding auxiliary variables.
Syntax Walkthrough: Step 3a – Unconditional LTA With Fixed Logits
Step 3a estimates the unconditional transition model with logits fixed.
Key elements:
• The nominal indicators are declared using
nominal = n1 n2 n3;.
• Logits are fixed using statements such as
[n1#1@{logits_t1[1,1]}].
• Only the structural regression paths (c2 ON c1;,
c3 ON c2;) are estimated—no measurement parameters
vary.
• Fixing logits ensures classification error is properly accounted for,
and class definitions remain identical to those in the invariant
model.
• This model establishes the baseline transition structure before adding
covariates.
unc_LTA <- mplusObject(
TITLE = "Step3a of 3-step",
VARIABLE =
"idvariable =casenum;
nominal=n1 n2 n3;
usevar = n1 n2 n3;
classes = c1(4) c2(4) c3(4);
auxiliary = MINORITY, FEMALE, MATHG7, MATHG10, MATHG12;",
ANALYSIS =
"estimator = mlr;
type = mixture;
starts = 0;",
MODEL = glue(
"%overall%
c2 on c1;
c3 on c2;
Model c1:
%c1#1%
[n1#1@{logits_t1[1,1]}];
[n1#2@{logits_t1[1,2]}];
[n1#3@{logits_t1[1,3]}];
%c1#2%
[n1#1@{logits_t1[2,1]}];
[n1#2@{logits_t1[2,2]}];
[n1#3@{logits_t1[2,3]}];
%c1#3%
[n1#1@{logits_t1[3,1]}];
[n1#2@{logits_t1[3,2]}];
[n1#3@{logits_t1[3,3]}];
%c1#4%
[n1#1@{logits_t1[4,1]}];
[n1#2@{logits_t1[4,2]}];
[n1#3@{logits_t1[4,3]}];
Model c2:
%c2#1%
[n2#1@{logits_t2[1,1]}];
[n2#2@{logits_t2[1,2]}];
[n2#3@{logits_t2[1,3]}];
%c2#2%
[n2#1@{logits_t2[2,1]}];
[n2#2@{logits_t2[2,2]}];
[n2#3@{logits_t2[2,3]}];
%c2#3%
[n2#1@{logits_t2[3,1]}];
[n2#2@{logits_t2[3,2]}];
[n2#3@{logits_t2[3,3]}];
%c2#4%
[n2#1@{logits_t2[4,1]}];
[n2#2@{logits_t2[4,2]}];
[n2#3@{logits_t2[4,3]}];
Model c3:
%c3#1%
[n3#1@{logits_t3[1,1]}];
[n3#2@{logits_t3[1,2]}];
[n3#3@{logits_t3[1,3]}];
%c3#2%
[n3#1@{logits_t3[2,1]}];
[n3#2@{logits_t3[2,2]}];
[n3#3@{logits_t3[2,3]}];
%c3#3%
[n3#1@{logits_t3[3,1]}];
[n3#2@{logits_t3[3,2]}];
[n3#3@{logits_t3[3,3]}];
%c3#4%
[n3#1@{logits_t3[4,1]}];
[n3#2@{logits_t3[4,2]}];
[n3#3@{logits_t3[4,3]}];"),
OUTPUT = "svalues;",
usevariables = colnames(savedata_t123),
rdata = savedata_t123)
unc_LTA_fit <- mplusModeler(unc_LTA,
dataout=here("phase_3","step_3a_ml", "t123_LTA.dat"),
modelout=here("phase_3","step_3a_ml", "unconditional_LTA.inp"),
check=TRUE, run = TRUE, hashfilename = FALSE)
Plotting the unconditional model helps verify that transition patterns are stable, interpretable, and consistent with expectations from earlier phases. This serves as a diagnostic check prior to adding auxiliary variables.
Syntax Walkthrough: Plotting Transition Probabilities (Unconditional Model)
The unconditional LTA transition plot is used to confirm that movement between classes is interpretable and aligns with expectations from previous phases.
Key elements:
• Uses plot_transition() and custom class/timepoint
labels.
• Provides a clear depiction of how students move across latent classes
from Grade 7 to Grade 12.
• Helps verify that transitions remain coherent before adding auxiliary
variables.
source(here("functions","plot_transition.R"))
lta_model <- readModels(here("phase_3","step_3a_ml", "unconditional_LTA.out"))
plot_transition(
model_name = lta_model,
facet_labels = c(`1` = "Very Positive", `2` = "Qualified Positive", `3` = "Neutral", `4` = "Less Positive"),
timepoint_labels = c(`1` = "Grade 7", `2` = "Grade 10", `3` = "Grade 12"),
class_labels = c(`1` = "Very Positive", `2` = "Qualified Positive", `3` = "Neutral", `4` = "Less Positive")
) +
labs(title = "Unconditional LTA (ML Three-Step Method)")
ggsave(here("figures", "unconditional_lta_ml.jpeg"), dpi="retina", height = 6, width = 10, bg = "white", units="in")
Diagnostic Summary: Unconditional Transition Structure
Transition probabilities exhibit substantial longitudinal stability, particularly within the Very Positive and Qualified Positive classes.
The Neutral class shows the greatest fluidity, with transitions distributed across adjacent classes.
The Less Positive class demonstrates modest upward mobility across waves.
No degenerate or implausible transition patterns are observed.
These results confirm that longitudinal transitions are structurally coherent under the invariant three-step framework and provide a valid baseline prior to introducing covariates.
In Step 3b, auxiliary variables are added as predictors of:
baseline class membership, and
longitudinal transitions.
The classification-error logits remain fixed. As a result:
covariates may influence class membership and transitions,
but cannot modify the measurement structure or redefine latent classes.
This guarantees that covariate effects are interpreted relative to a stable, invariant latent class framework.
Syntax Walkthrough: Step 3b – Covariate-Adjusted LTA (Fixed Logits)
In Step 3b, covariates are added while logits remain fixed, preserving the measurement model.
Key elements:
• Covariates predict baseline class membership and transitions using
statements like c1 ON MINORITY FEMALE.
• Logits remain fixed, so covariates cannot redefine classes or alter
thresholds.
• This separation—covariates affect transitions, not measurement—is the
core benefit of the ML three-step method.
• The output provides effects of each predictor relative to the
reference class (“Less Positive”).
con_LTA <- mplusObject(
TITLE = "Step4b of 3-step",
VARIABLE =
"idvariable =casenum;
nominal=n1 n2 n3;
usevar = n1 n2 n3;
classes = c1(4) c2(4) c3(4);
usevar = MINORITY, FEMALE;",
ANALYSIS =
"estimator = mlr;
type = mixture;
starts = 0;",
MODEL = glue(
"%overall%
c1 ON MINORITY Female;
c2 ON c1 MINORITY Female;
c3 ON c2 MINORITY Female;
Model c1:
%c1#1%
[n1#1@{logits_t1[1,1]}];
[n1#2@{logits_t1[1,2]}];
[n1#3@{logits_t1[1,3]}];
%c1#2%
[n1#1@{logits_t1[2,1]}];
[n1#2@{logits_t1[2,2]}];
[n1#3@{logits_t1[2,3]}];
%c1#3%
[n1#1@{logits_t1[3,1]}];
[n1#2@{logits_t1[3,2]}];
[n1#3@{logits_t1[3,3]}];
%c1#4%
[n1#1@{logits_t1[4,1]}];
[n1#2@{logits_t1[4,2]}];
[n1#3@{logits_t1[4,3]}];
Model c2:
%c2#1%
[n2#1@{logits_t2[1,1]}];
[n2#2@{logits_t2[1,2]}];
[n2#3@{logits_t2[1,3]}];
%c2#2%
[n2#1@{logits_t2[2,1]}];
[n2#2@{logits_t2[2,2]}];
[n2#3@{logits_t2[2,3]}];
%c2#3%
[n2#1@{logits_t2[3,1]}];
[n2#2@{logits_t2[3,2]}];
[n2#3@{logits_t2[3,3]}];
%c2#4%
[n2#1@{logits_t2[4,1]}];
[n2#2@{logits_t2[4,2]}];
[n2#3@{logits_t2[4,3]}];
Model c3:
%c3#1%
[n3#1@{logits_t3[1,1]}];
[n3#2@{logits_t3[1,2]}];
[n3#3@{logits_t3[1,3]}];
%c3#2%
[n3#1@{logits_t3[2,1]}];
[n3#2@{logits_t3[2,2]}];
[n3#3@{logits_t3[2,3]}];
%c3#3%
[n3#1@{logits_t3[3,1]}];
[n3#2@{logits_t3[3,2]}];
[n3#3@{logits_t3[3,3]}];
%c3#4%
[n3#1@{logits_t3[4,1]}];
[n3#2@{logits_t3[4,2]}];
[n3#3@{logits_t3[4,3]}];"),
OUTPUT = "svalues;",
usevariables = colnames(savedata_t123),
rdata = savedata_t123)
con_LTA_fit <- mplusModeler(con_LTA,
dataout=here("phase_3","step_3b_ml", "t123_LTA.dat"),
modelout=here("phase_3","step_3b_ml", "LTA_covariates.inp"),
check=TRUE, run = TRUE, hashfilename = FALSE)
The covariate table summarizes the effects of each predictor relative to the reference class (“Less Positive”). Effect sizes, standard errors, and significance tests are extracted directly from the Mplus output.
Syntax Walkthrough: Covariate Effects Table
The covariate table summarizes how MINORITY and FEMALE relate to class membership at each timepoint.
Key elements:
• Extracts unstandardized logits from the parameter table.
• Formats estimates as estimate (se) with significance
markers.
• Reference class is “Less Positive.”
• Uses gt() for clean presentation in the final
tutorial.
lta_cov_model <- readModels(here("phase_3","step_3b_ml", "LTA_covariates.out"))
# REFERENCE CLASS 4
cov <- as.data.frame(lta_cov_model[["parameters"]][["unstandardized"]]) %>%
filter(param %in% c("MINORITY", "FEMALE")) %>%
mutate(param = case_when(
param == "FEMALE" ~ "Gender",
param == "MINORITY" ~ "URM"),
se = paste0("(", format(round(se,2), nsmall =2), ")")) %>%
separate(paramHeader, into = c("Time", "Class"), sep = "#") %>%
mutate(Class = case_when(
Class == "1.ON" ~ "Very Positive",
Class == "2.ON" ~ "Qualified Positive",
Class == "3.ON" ~ "Neutral"),
Time = case_when(
Time == "C1" ~ "7th Grade (T1)",
Time == "C2" ~ "10th Grade (T2)",
Time == "C3" ~ "12th Grade (T3)",
)
) %>%
unite(estimate, est, se, sep = " ") %>%
select(Time:pval, -est_se) %>%
mutate(pval = ifelse(pval<0.001, paste0("<.001*"),
ifelse(pval<0.05, paste0(scales::number(pval, accuracy = .001), "*"),
scales::number(pval, accuracy = .001))))
# Create table
cov_m1 <- cov %>%
group_by(param, Class) %>%
gt() %>%
tab_header(
title = "Relations Between the Covariates and Latent Class (ML Three-Step Method)") %>%
tab_footnote(
footnote = md(
"Reference Group: Less Positive"
),
locations = cells_title()
) %>%
cols_label(
param = md("Covariate"),
estimate = md("Estimate (*se*)"),
pval = md("*p*-value")) %>%
sub_missing(1:3,
missing_text = "") %>%
sub_values(values = c(999.000), replacement = "-") %>%
cols_align(align = "center") %>%
opt_align_table_header(align = "left") %>%
gt::tab_options(table.font.names = "serif")
cov_m1
| Relations Between the Covariates and Latent Class (ML Three-Step Method)1 | ||
| Time | Estimate (se) | p-value |
|---|---|---|
| URM - Very Positive | ||
| 7th Grade (T1) | 0.295 (0.40) | 0.459 |
| 10th Grade (T2) | 0.169 (0.33) | 0.611 |
| 12th Grade (T3) | 0.403 (0.37) | 0.273 |
| Gender - Very Positive | ||
| 7th Grade (T1) | -0.398 (0.28) | 0.150 |
| 10th Grade (T2) | -0.124 (0.22) | 0.580 |
| 12th Grade (T3) | 0.155 (0.22) | 0.482 |
| URM - Qualified Positive | ||
| 7th Grade (T1) | -0.174 (0.43) | 0.683 |
| 10th Grade (T2) | 0.429 (0.34) | 0.208 |
| 12th Grade (T3) | 0.818 (0.39) | 0.034* |
| Gender - Qualified Positive | ||
| 7th Grade (T1) | 0.252 (0.29) | 0.380 |
| 10th Grade (T2) | 0.22 (0.25) | 0.379 |
| 12th Grade (T3) | 0.414 (0.26) | 0.105 |
| URM - Neutral | ||
| 7th Grade (T1) | 0.509 (0.47) | 0.279 |
| 10th Grade (T2) | 0.164 (0.39) | 0.673 |
| 12th Grade (T3) | 1.123 (0.40) | 0.005* |
| Gender - Neutral | ||
| 7th Grade (T1) | -0.228 (0.34) | 0.508 |
| 10th Grade (T2) | -0.195 (0.27) | 0.472 |
| 12th Grade (T3) | 0.641 (0.26) | 0.015* |
| 1 Reference Group: Less Positive | ||
gtsave(cov_m1, here("figures", "cov_table_ml.png"))
Diagnostic Summary: Covariate-Adjusted Class Membership
Across Grades 7 and 10, covariate effects on class membership are small and largely non-significant.
By Grade 12, URM students show higher odds of membership in the Qualified Positive and Neutral classes relative to the Less Positive class.
Female students show elevated odds of Qualified Positive membership at Grade 12.
No covariate significantly differentiates membership in the Very Positive class.
These results indicate that demographic differentiation in STEM attitudes is minimal in early adolescence but becomes more pronounced by the end of high school.
Phase 3 implements the full ML three-step latent transition framework under a longitudinally invariant measurement structure. Fixed-threshold LCAs confirm correct transfer of invariant measurement, classification-error logits formally encode posterior uncertainty, and unconditional and covariate-adjusted LTAs establish a stable developmental transition structure across Grades 7, 10, and 12. With classification error properly controlled and auxiliary-variable effects estimated without contaminating the measurement model, the workflow is now positioned to proceed to the BCH auxiliary-variable framework as an alternative estimation strategy.
Apply BCH Training Weights for Classification-Error Correction
This phase implements the BCH auxiliary-variable framework as an alternative multi-step strategy for estimating latent transition models under longitudinal measurement invariance. Unlike the ML three-step method, which corrects for classification error using fixed logits, the BCH approach uses classification-error–adjusted training weights derived from a single, fully invariant latent transition model.
These BCH training weights carry forward full classification uncertainty while ensuring that:
auxiliary variables may influence class membership and transitions,
but cannot alter the underlying measurement model or redefine latent classes.
The BCH approach therefore provides an independent verification of auxiliary-variable effects under invariant measurement using a fundamentally different correction mechanism than the ML three-step procedure.
The BCH variant of Phase 3 produces:
A single fully invariant LTA model with BCH specification
64 BCH training weights for each individual (4 × 4 × 4 class combinations)
A baseline (unconditional) BCH-corrected LTA model
A covariate-adjusted BCH-corrected LTA model
Longitudinal BCH transition probability plots
A BCH-corrected covariate effects table
Together, these outputs establish a classification-error–corrected transition model under BCH weighting, fully parallel to the ML three-step results.
This phase proceeds in four structured steps:
Estimate a fully invariant BCH LTA model to generate training weights
Extract BCH weights from the saved data
Estimate the unconditional LTA using BCH weights only
Estimate the covariate-adjusted LTA using BCH weights
Each step transfers classification uncertainty from the invariant measurement model into the BCH-corrected structural model.
In Step 1, a single, fully invariant latent transition model is estimated across all three time points. All item–class thresholds are fixed to the invariant values established in Phase 2, enforcing strict longitudinal measurement invariance.
Auxiliary variables are included under the BCH specification, but do not influence class membership or transitions at this stage. Instead, they are used solely to generate BCH training weights.
From this model, Mplus produces:
invariant threshold parameters (fixed),
unconditional class proportions,
BCH-corrected item-response classes, and
64 BCHW* variables representing all possible
C1 × C2 × C3 class combinations.
These BCH weights encode classification uncertainty and will serve as the only class indicators for all subsequent BCH transition models.
Conceptual role of BCH Step 1:
To generate a single set of posterior-probability-based training weights that preserve invariant class meaning while fully encoding classification error.
Syntax Walkthrough: BCH Step 1 – Invariant LTA With BCH Weights
This syntax estimates a single fully invariant LTA model and computes
BCH training weights.
Key elements:
• Thresholds are fixed to invariant values; no measurement parameters
are estimated.
• Auxiliary variables are listed under auxiliary = ... but
do not affect the model; they trigger BCH weight generation.
• A single model generates BCHW1–BCHW64, one for each
possible c1 × c2 × c3 class combination.
• starts = 0 is used because all measurement parameters are
fixed.
• Output includes invariant classes and BCH training weights for all
individuals. • OPTIONAL: We have rearranged the class labels in Step 1
by renaming the class specific statements (e.g.,
%C1#2%)
step1_bch <- mplusObject(
TITLE = "Step 1 - BCH Method",
VARIABLE =
"idvariable=CASENUM;
usev =
AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X
GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L
KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;
categorical =
AB39A AB39H AB39I AB39K AB39L AB39M AB39T AB39U AB39W AB39X
GA32A GA32H GA32I GA32K GA32L GA33A GA33H GA33I GA33K GA33L
KA46A KA46H KA46I KA46K KA46L KA47A KA47H KA47I KA47K KA47L;
classes = c1(4) c2(4) c3(4);
auxiliary = MINORITY, FEMALE, MATHG7, MATHG10, MATHG12;",
ANALYSIS =
"estimator = mlr;
type = mixture;
starts = 0;",
MODEL =
"%overall%
MODEL C1:
%C1#1%
[ ab39a$1@-1.53313 ] (1);
[ ab39h$1@-2.90401 ] (2);
[ ab39i$1@-2.99575 ] (3);
[ ab39k$1@-2.83359 ] (4);
[ ab39l$1@-3.85594 ] (5);
[ ab39m$1@-2.16372 ] (6);
[ ab39t$1@-2.13876 ] (7);
[ ab39u$1@-2.48348 ] (8);
[ ab39w$1@-1.81515 ] (9);
[ ab39x$1@-2.35163 ] (10);
%C1#3%
[ ab39a$1@-0.06349 ] (11);
[ ab39h$1@-0.07976 ] (12);
[ ab39i$1@-0.52495 ] (13);
[ ab39k$1@-0.01216 ] (14);
[ ab39l$1@0.25484 ] (15);
[ ab39m$1@-0.83350 ] (16);
[ ab39t$1@0.07961 ] (17);
[ ab39u$1@-0.45274 ] (18);
[ ab39w$1@0.51840 ] (19);
[ ab39x$1@0.22569 ] (20);
%C1#2%
[ ab39a$1@-0.88292 ] (21);
[ ab39h$1@-1.67761 ] (22);
[ ab39i$1@-0.87642 ] (23);
[ ab39k$1@-1.94409 ] (24);
[ ab39l$1@-2.31259 ] (25);
[ ab39m$1@0.42887 ] (26);
[ ab39t$1@2.29606 ] (27);
[ ab39u$1@1.02908 ] (28);
[ ab39w$1@1.97179 ] (29);
[ ab39x$1@1.80988 ] (30);
%C1#4%
[ ab39a$1@0.74744 ] (31);
[ ab39h$1@2.01287 ] (32);
[ ab39i$1@1.67957 ] (33);
[ ab39k$1@1.54221 ] (34);
[ ab39l$1@2.34965 ] (35);
[ ab39m$1@1.38217 ] (36);
[ ab39t$1@3.73153 ] (37);
[ ab39u$1@3.09927 ] (38);
[ ab39w$1@3.69495 ] (39);
[ ab39x$1@3.68803 ] (40);
MODEL C2:
%C2#1%
[ ga32a$1@-1.53313 ] (1);
[ ga32h$1@-2.90401 ] (2);
[ ga32i$1@-2.99575 ] (3);
[ ga32k$1@-2.83359 ] (4);
[ ga32l$1@-3.85594 ] (5);
[ ga33a$1@-2.16372 ] (6);
[ ga33h$1@-2.13876 ] (7);
[ ga33i$1@-2.48348 ] (8);
[ ga33k$1@-1.81515 ] (9);
[ ga33l$1@-2.35163 ] (10);
%C2#3%
[ ga32a$1@-0.06349 ] (11);
[ ga32h$1@-0.07976 ] (12);
[ ga32i$1@-0.52495 ] (13);
[ ga32k$1@-0.01216 ] (14);
[ ga32l$1@0.25484 ] (15);
[ ga33a$1@-0.83350 ] (16);
[ ga33h$1@0.07961 ] (17);
[ ga33i$1@-0.45274 ] (18);
[ ga33k$1@0.51840 ] (19);
[ ga33l$1@0.22569 ] (20);
%C2#2%
[ ga32a$1@-0.88292 ] (21);
[ ga32h$1@-1.67761 ] (22);
[ ga32i$1@-0.87642 ] (23);
[ ga32k$1@-1.94409 ] (24);
[ ga32l$1@-2.31259 ] (25);
[ ga33a$1@0.42887 ] (26);
[ ga33h$1@2.29606 ] (27);
[ ga33i$1@1.02908 ] (28);
[ ga33k$1@1.97179 ] (29);
[ ga33l$1@1.80988 ] (30);
%C2#4%
[ ga32a$1@0.74744 ] (31);
[ ga32h$1@2.01287 ] (32);
[ ga32i$1@1.67957 ] (33);
[ ga32k$1@1.54221 ] (34);
[ ga32l$1@2.34965 ] (35);
[ ga33a$1@1.38217 ] (36);
[ ga33h$1@3.73153 ] (37);
[ ga33i$1@3.09927 ] (38);
[ ga33k$1@3.69495 ] (39);
[ ga33l$1@3.68803 ] (40);
MODEL C3:
%C3#1%
[ ka46a$1@-1.53313 ] (1);
[ ka46h$1@-2.90401 ] (2);
[ ka46i$1@-2.99575 ] (3);
[ ka46k$1@-2.83359 ] (4);
[ ka46l$1@-3.85594 ] (5);
[ ka47a$1@-2.16372 ] (6);
[ ka47h$1@-2.13876 ] (7);
[ ka47i$1@-2.48348 ] (8);
[ ka47k$1@-1.81515 ] (9);
[ ka47l$1@-2.35163 ] (10);
%C3#3%
[ ka46a$1@-0.06349 ] (11);
[ ka46h$1@-0.07976 ] (12);
[ ka46i$1@-0.52495 ] (13);
[ ka46k$1@-0.01216 ] (14);
[ ka46l$1@0.25484 ] (15);
[ ka47a$1@-0.83350 ] (16);
[ ka47h$1@0.07961 ] (17);
[ ka47i$1@-0.45274 ] (18);
[ ka47k$1@0.51840 ] (19);
[ ka47l$1@0.22569 ] (20);
%C3#2%
[ ka46a$1@-0.88292 ] (21);
[ ka46h$1@-1.67761 ] (22);
[ ka46i$1@-0.87642 ] (23);
[ ka46k$1@-1.94409 ] (24);
[ ka46l$1@-2.31259 ] (25);
[ ka47a$1@0.42887 ] (26);
[ ka47h$1@2.29606 ] (27);
[ ka47i$1@1.02908 ] (28);
[ ka47k$1@1.97179 ] (29);
[ ka47l$1@1.80988 ] (30);
%C3#4%
[ ka46a$1@0.74744 ] (31);
[ ka46h$1@2.01287 ] (32);
[ ka46i$1@1.67957 ] (33);
[ ka46k$1@1.54221 ] (34);
[ ka46l$1@2.34965 ] (35);
[ ka47a$1@1.38217 ] (36);
[ ka47h$1@3.73153 ] (37);
[ ka47i$1@3.09927 ] (38);
[ ka47k$1@3.69495 ] (39);
[ ka47l$1@3.68803 ] (40);",
SAVEDATA =
"File=bch_savedata.dat;
Save=bchweights;",
OUTPUT = "sampstat svalues",
usevariables = colnames(lsay_data),
rdata = lsay_data)
step1_fit_bch <- mplusModeler(step1_bch,
dataout=here("phase_3", "step_1_bch", "step1.dat"),
modelout=here("phase_3", "step_1_bch", "step1_bch.inp") ,
check=TRUE, run = TRUE, hashfilename = FALSE)
The invariant BCH class probability plot provides a visual verification that fixing thresholds across all three waves preserves the expected four-class measurement structure.
Syntax Walkthrough: BCH Diagnostic Plot
This plot visualizes the invariant BCH model.
Key elements:
• Uses the same plot_lta() function as Phase 2.
• Confirms that fixed thresholds produce stable class classes.
• Serves as a check that the BCH model encoded the correct measurement
structure.
source(here("functions","plot_lta.R"))
inv_bch <- readModels(here("phase_3","step_1_bch"), quiet = TRUE)
plot_lta(inv_bch)
Diagnostic Summary: BCH Invariant Measurement Verification
The invariant BCH model reproduces the same four-class configuration observed in Phase 2.
No meaningful differences in classes are observed under BCH specification.
Class ordering and shape remain fully aligned with the invariant ML solution.
All four classes remain substantively interpretable across time.
These diagnostics confirm that BCH training weights are generated under the correct invariant measurement structure.
After estimation of the invariant BCH model, the full BCH weight matrix is extracted from the saved data.
Each individual receives 64 weights, corresponding to all possible class combinations across Grades 7, 10, and 12. These weights encode classification uncertainty for every longitudinal class path.
No other class indicators are used in the BCH framework:
no logits,
no posterior probabilities, and
no modal class assignments.
Conceptual role of BCH Step 2:
To construct the complete BCH training-weight representation of probabilistic class membership for downstream structural modeling.
Syntax Walkthrough: BCH Step 2 – Extract BCH Weights
This extracts the 64 BCH weight variables from the saved data.
Key elements:
• Saved data includes variables BCHW1–BCHW64.
• These weights encode classification error for every class path.
• These 64 variables are the only class indicators used in BCH LTA
models.
• No logits, CPROBs, or modal assignments are used in
BCH.
bch_savedata <- as.data.frame(inv_bch$savedata)
#summary(bch_savedata)
Diagnostic Summary: BCH Weight Extraction
All 64 BCH training-weight variables were successfully extracted from the saved data.
Each weight corresponds to a unique C1 × C2 × C3
class path.
No logits, posterior probabilities, or modal classifications are used in the BCH framework.
The BCH dataset is correctly structured for unconditional and covariate-adjusted LTA estimation.
These diagnostics confirm that classification uncertainty has been fully transferred into the BCH training-weight system.
In Step 3a, the unconditional latent transition model is estimated using only BCH training weights as indicators of latent class membership. Because the weights already encode classification error:
no logits, thresholds, or CPROB variables are required, and
measurement remains implicitly fixed through the training weights.
This model freely estimates:
class proportions at each time point, and
transitions from C1 → C2 and
C2 → C3.
This step is directly parallel to Step 3a of the ML three-step method, replacing logits with BCH weights.
Conceptual role of BCH Step 3a:
To estimate baseline longitudinal transitions under full BCH-corrected classification uncertainty.
Syntax Walkthrough: BCH Step 3a – Unconditional LTA Using BCH Weights
This syntax estimates the unconditional transition model using only
BCH weights.
Key elements:
• usevar = BCHW1–BCHW64 includes all training-weight
indicators.
• training = BCHW1–BCHW64(bch) tells Mplus to use BCH
corrections.
• No thresholds or logits appear; measurement is fixed implicitly
through the weights.
• The structural portion (c2 ON c1, c3 ON c2)
is estimated freely.
• Mirrors the ML three-step unconditional model but uses BCH weights
instead of logits.
step3_bch <- mplusObject(
TITLE = "Step 3 - BCH Method",
VARIABLE =
"usevar = BCHW1-BCHW64;
training = BCHW1-BCHW64(bch);
classes = c1(4) c2(4) c3(4);" ,
ANALYSIS =
"type = mixture;
starts = 0;",
MODEL = glue(
"%overall%
c2 ON c1;
c3 ON c2;"),
usevariables = colnames(bch_savedata),
rdata = bch_savedata)
step3_fit_bch <- mplusModeler(step3_bch,
dataout=here("phase_3", "step_3a_bch", "step3.dat"),
modelout=here("phase_3", "step_3a_bch", "unconditional_bch.inp"),
check=TRUE, run = TRUE, hashfilename = FALSE)
The following visualization helps confirm that the BCH-estimated transitions are stable, interpretable, and consistent with the fixed measurement structure.
Syntax Walkthrough: Plotting BCH Transition Probabilities
This plot displays transition probabilities from the unconditional
BCH LTA.
Key elements:
• Uses the same plot_transition() function as earlier
phases.
• Checks that transition patterns are coherent under BCH
correction.
• Ensures BCH and ML patterns are comparable before adding
covariates.
source(here("functions","plot_transition.R"))
lta_model <- readModels(here("phase_3", "step_3a_bch", "unconditional_bch.out"))
plot_transition(
model_name = lta_model,
facet_labels = c(`1` = "Very Positive", `2` = "Qualified Positive", `3` = "Neutral", `4` = "Less Positive"),
timepoint_labels = c(`1` = "Grade 7", `2` = "Grade 10", `3` = "Grade 12"),
class_labels = c(`1` = "Very Positive", `2` = "Qualified Positive", `3` = "Neutral", `4` = "Less Positive")
) +
labs(title = "Unconditional LTA (BCH Method)")
ggsave(here("figures", "unconditional_lta_bch.jpeg"), dpi="retina", height = 6, width = 10, bg = "white", units="in")
Diagnostic Summary: Unconditional BCH Transition Structure
The Very Positive class shows the highest degree of stability across all transitions.
The Qualified Positive class remains moderately stable with limited upward and downward mobility.
The Neutral class exhibits the greatest degree of flexibility across waves.
The Less Positive class shows both persistence and gradual upward movement.
No degenerate or implausible transitions are observed under BCH correction.
These transition patterns mirror the ML three-step results, confirming that BCH-adjusted transitions are structurally coherent.
In the final BCH step, auxiliary covariates are added as predictors of:
initial class membership, and
longitudinal transitions.
Because BCH training weights remain fixed:
covariates may influence class membership and transitions,
but cannot alter the underlying measurement model or redefine latent classes.
This guarantees that covariate effects are interpreted relative to a stable, invariant latent class framework.
Syntax Walkthrough: BCH Step 3b – Covariate-Adjusted LTA With BCH Weights
This fits the covariate-adjusted LTA model while keeping measurement
fixed through the BCH weights.
Key elements:
• usevar = BCHW1–BCHW64 MINORITY FEMALE adds
covariates.
• training = BCHW1–BCHW64(bch) ensures covariates cannot
redefine class measurement.
• Covariates appear in structural regressions (e.g.,
c1 ON MINORITY FEMALE).
• Transition regressions include both covariates and prior class
variables (e.g., c2 ON c1 MINORITY FEMALE).
• BCH weights provide all classification-error correction.
step3_bch <- mplusObject(
TITLE = "Step 3 - BCH Method",
VARIABLE =
"usevar = BCHW1-BCHW64 MINORITY, FEMALE;
training = BCHW1-BCHW64(bch);
classes = c1(4) c2(4) c3(4);" ,
ANALYSIS =
"estimator = mlr;
type = mixture;
starts = 0;",
MODEL =
glue(
" %OVERALL%
c1 ON MINORITY Female;
c2 ON c1 MINORITY Female;
c3 ON c2 MINORITY Female;"),
usevariables = colnames(bch_savedata),
rdata = bch_savedata)
step3_fit_bch <- mplusModeler(step3_bch,
dataout=here("phase_3", "step_3b_bch", "step3.dat"),
modelout=here("phase_3", "step_3b_bch", "bch_covariates.inp"),
check=TRUE, run = TRUE, hashfilename = FALSE)
The table below summarizes the effects of covariates on class membership in each time point, relative to the reference class (Less Positive). Estimates incorporate BCH-adjusted classification error and are directly comparable to those from the ML method.
Syntax Walkthrough: Covariate Table (BCH)
This table summarizes covariate effects under BCH correction.
Key elements:
• Extracts unstandardized estimates from the Mplus output.
• Relabels classes and timepoints for interpretability.
• Displays estimate (se) and p-values with significance
markers.
• Reference class is “Less Positive,” matching earlier phases.
• BCH-corrected effects are directly comparable to ML three-step
results.
lta_cov_model <- readModels(here("phase_3", "step_3b_bch", "bch_covariates.out"))
# REFERENCE CLASS 4
cov <- as.data.frame(lta_cov_model[["parameters"]][["unstandardized"]]) %>%
filter(param %in% c("MINORITY", "FEMALE")) %>%
mutate(param = case_when(
param == "FEMALE" ~ "Gender",
param == "MINORITY" ~ "URM"),
se = paste0("(", format(round(se,2), nsmall =2), ")")) %>%
separate(paramHeader, into = c("Time", "Class"), sep = "#") %>%
mutate(Class = case_when(
Class == "1.ON" ~ "Very Positive",
Class == "2.ON" ~ "Qualified Positive",
Class == "3.ON" ~ "Neutral"),
Time = case_when(
Time == "C1" ~ "7th Grade (T1)",
Time == "C2" ~ "10th Grade (T2)",
Time == "C3" ~ "12th Grade (T3)",
)
) %>%
unite(estimate, est, se, sep = " ") %>%
select(Time:pval, -est_se) %>%
mutate(pval = ifelse(pval<0.001, paste0("<.001*"),
ifelse(pval<0.05, paste0(scales::number(pval, accuracy = .001), "*"),
scales::number(pval, accuracy = .001))))
# Create table
cov_m1 <- cov %>%
group_by(param, Class) %>%
gt() %>%
tab_header(
title = "Relations Between the Covariates and Latent Class (BCH Method)") %>%
tab_footnote(
footnote = md(
"Reference Group: Less Positive"
),
locations = cells_title()
) %>%
cols_label(
param = md("Covariate"),
estimate = md("Estimate (*se*)"),
pval = md("*p*-value")) %>%
sub_missing(1:3,
missing_text = "") %>%
sub_values(values = c(999.000), replacement = "-") %>%
cols_align(align = "center") %>%
opt_align_table_header(align = "left") %>%
gt::tab_options(table.font.names = "serif")
cov_m1
| Relations Between the Covariates and Latent Class (BCH Method)1 | ||
| Time | Estimate (se) | p-value |
|---|---|---|
| URM - Very Positive | ||
| 7th Grade (T1) | 0.225 (0.27) | 0.406 |
| 10th Grade (T2) | 0.442 (0.27) | 0.100 |
| 12th Grade (T3) | 0.809 (0.60) | 0.175 |
| Gender - Very Positive | ||
| 7th Grade (T1) | -0.655 (0.19) | 0.001* |
| 10th Grade (T2) | -0.091 (0.20) | 0.641 |
| 12th Grade (T3) | 0.338 (0.45) | 0.457 |
| URM - Qualified Positive | ||
| 7th Grade (T1) | 0.096 (0.28) | 0.736 |
| 10th Grade (T2) | 0.444 (0.27) | 0.105 |
| 12th Grade (T3) | 1.099 (0.40) | 0.006* |
| Gender - Qualified Positive | ||
| 7th Grade (T1) | 0.1 (0.20) | 0.621 |
| 10th Grade (T2) | 0.411 (0.20) | 0.039* |
| 12th Grade (T3) | 0.623 (0.27) | 0.022* |
| URM - Neutral | ||
| 7th Grade (T1) | 0.515 (0.32) | 0.103 |
| 10th Grade (T2) | 0.145 (0.31) | 0.640 |
| 12th Grade (T3) | 0.822 (0.40) | 0.039* |
| Gender - Neutral | ||
| 7th Grade (T1) | -0.431 (0.24) | 0.069 |
| 10th Grade (T2) | 0.062 (0.21) | 0.771 |
| 12th Grade (T3) | 0.372 (0.26) | 0.152 |
| 1 Reference Group: Less Positive | ||
gtsave(cov_m1, here("figures", "cov_table_bch.png"))
Diagnostic Summary: BCH Covariate-Adjusted Class Membership
At Grades 7 and 10, demographic covariate effects remain small and largely non-significant.
By Grade 12, URM students show higher odds of membership in the Qualified Positive and Neutral classes relative to the Less Positive class.
Female students show elevated odds of Qualified Positive membership at Grades 10 and 12.
No consistent covariate effects emerge for the Very Positive class.
These results parallel the ML three-step findings and confirm that demographic differentiation in STEM attitudes becomes more pronounced in late adolescence.