| Title: | Simplified Statistical Analysis Tools for Social Science |
|---|---|
| Description: | Provides simplified tools for common statistical analyses used in social science research and teaching, with output formatted in the style of SPSS and Stata. Analysis functions include descriptive statistics, frequency tables, t-tests, ANOVA, correlations, chi-square tests, cross-tabulations, linear regression, logistic regression, and Cronbach's alpha, with built-in model diagnostics and publication-oriented plotting. Supporting tools handle data management (recoding, labelling, filtering, dummy coding, scale construction), automatic detection and handling of coded missing values, and unified data import and export for SPSS, Stata, SAS, Excel, CSV, and R formats. Functions accept unquoted variable names and formula syntax for ease of use, handle haven-labelled variables automatically, and return results invisibly while printing user-friendly tables. The package stays close enough to base R conventions that users learn transferable skills rather than a private dialect, supporting a smooth transition from SPSS, Stata, or SAS into the broader R ecosystem. |
| Authors: | Jeff Ackerman [aut, cre] |
| Maintainer: | Jeff Ackerman <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.9.96 |
| Built: | 2026-07-01 09:39:53 UTC |
| Source: | https://github.com/JMA61/jstats |
A small synthetic mental-health and wellbeing intervention sample, used as
the package's messy-data companion to community. Where
community is the clean default, clinic deliberately carries
undeclared missing-value codes, a column whose value labels were stripped on
import, and an imperfect scale item, so the declare-and-clean workflow has
realistic material to work on. The 70 clients and 16 variables echo the
teaching structure of community – an interaction, a null variable,
non-overlapping missingness, a recode dichotomy, a clean logistic outcome, a
multi-category variable, and a Likert battery – in a psychology setting. The
data are synthetic, but the relationships among the variables are realistic.
clinic
A data frame with 70 rows and 16 variables:
Client ID, character ("C001", "C002", ...).
Perceived stress (integer, 0-40). Carries SPSS-style missing values (-99 Refused, -98 Don't know).
Perceived social support (integer, 0-24); the buffering partner in the Stress-by-SocialSupport interaction on Flourishing.
Average nightly sleep in hours. Carries SPSS-style missing values (-99 Refused, -98 Don't know), placed on cases that do not overlap the Stress codes, so a model using both predictors drops more cases than either alone.
Flourishing score (integer, 0-100); built with a Stress-by-SocialSupport interaction (the buffering hypothesis).
Daily screen time in hours; deliberately near-independent of the other variables.
Received therapy before the study, dichotomy coded 1/2 (1 Yes, 2 No); recode to 0/1 before use as a logistic-regression outcome.
Sought professional help during the study, dichotomy coded 0/1 (0 No, 1 Yes). A clean logistic-regression outcome.
Currently taking medication, dichotomy coded 0/1 (0 No, 1 Yes). Carries an SPSS-style missing value (-99 Refused).
Treatment condition, 4 categories (1 Control, 2 CBT, 3 Mindfulness, 4 Support group); has a modest effect on Flourishing.
Mood rating (integer, 1-10). Arrives "dirty": literal -99 (Refused) and -98 (Don't know) codes are present in the data with NO missing-value declaration, the state of play after a CSV or Excel import. The package's jdeclare_udm() demonstration variable: summary statistics are poisoned until the codes are declared.
"I felt calm and relaxed." 5-point Likert (1 Not at all to 5 Extremely); reverse-keyed (the variable label ends in " R"). Reverse-code before scale scoring.
"I worried about many different things." 5-point Likert. Arrives with literal -99/-98 codes present in the data and NO missing-value declaration – the undeclared contrast to Anxiety4.
"I felt afraid for no clear reason." 5-point Likert; arrives with its value labels stripped (a plain numeric column, as after a CSV import that dropped the labels).
"I had trouble controlling my worry." 5-point Likert. Carries properly declared SPSS-style missing values (-99 Refused, -98 Don't know) – the declared contrast to Anxiety2.
"I felt restless or on edge." 5-point Likert; weakly loaded (a Cronbach's-alpha drop candidate when scoring the scale).
The five Anxiety items form a single scale, with one deliberate problem per
item: Anxiety1 is reverse-keyed; Anxiety2 carries literal -99/-98 codes that
are not declared as missing; Anxiety3 arrives with its value labels stripped;
Anxiety4 carries the same -99/-98 codes but properly declared (the clean
contrast to Anxiety2); and Anxiety5 is the weak item that scale-reliability
output flags for dropping. Stress and SleepHours carry SPSS-style missing
values on non-overlapping cases, so listwise deletion across both reduces the
analysis sample below the per-variable counts. MoodRating and Anxiety2 are
the two columns whose -99/-98 codes arrive undeclared, awaiting
jdeclare_udm(). The Stress-by-SocialSupport interaction on Flourishing
is the buffering hypothesis (higher social support weakens the negative
association between stress and flourishing), and treatment Condition has a
modest effect on Flourishing.
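The two effects described above, undeclared codes distorting a summary and non-overlapping missingness shrinking the listwise sample, can be sketched in base R. The vectors below are made-up illustrative values, not rows from clinic, and the masking step uses plain NA assignment as a stand-in for a formal declaration.

```r
# Illustrative values only (not taken from clinic).
mood <- c(7, 5, -99, 8, 6, -98, 4, 9)

# Undeclared codes are treated as real numbers and drag the mean down.
raw_mean <- mean(mood)

# Masking the codes as NA gives a sensible summary of the valid ratings.
mood_clean <- replace(mood, mood %in% c(-99, -98), NA)
clean_mean <- mean(mood_clean, na.rm = TRUE)

# Non-overlapping missingness: each predictor alone keeps 6 of 8 cases,
# but listwise deletion across both keeps only 4.
stress <- c(12, NA, 20, 31, NA, 18, 25, 9)
sleep  <- c(7.5, 6, NA, 8, 6.5, 7, NA, 5)
n_stress <- sum(!is.na(stress))
n_sleep  <- sum(!is.na(sleep))
n_both   <- sum(complete.cases(stress, sleep))
```

Here the raw mean is negative even though every valid rating lies between 1 and 10, which is the kind of result that signals undeclared codes.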
Synthetic data generated by
data-raw/clinic_data_generator.R (random seed 20260614).
community, the clean default example dataset.
A small synthetic survey dataset used throughout the package as a runnable example. It backs the function examples, serves as a teaching dataset for new users, and demonstrates cross-platform save and load behavior. The 100 respondents and 15 variables are chosen to exercise the kinds of data social-science users actually have: Likert scales, dichotomies, a multi-category variable, continuous measures, and SPSS-style user-defined missing values. The data are synthetic, but the relationships among the variables are realistic.
community
A data frame with 100 rows and 15 variables:
Respondent ID, character ("R001", "R002", ...).
Annual income (USD). Carries SPSS-style missing values (-99 Refused, -98 Don't know).
Highest education level, 5 categories (1 Some high school, 2 High school graduate, 3 Some college, 4 Bachelor's degree, 5 Graduate degree). Carries SPSS-style missing values (-99 Refused, -98 Don't know).
Age in years (integer, 18-80).
Flourishing score (integer, 0-100); built with an Income-by-Age interaction.
Volunteered in past year, dichotomy coded 0/1 (0 No, 1 Yes).
Owns home, dichotomy coded 1/2 (1 Yes, 2 No); recode to 0/1 before use as a logistic-regression outcome.
Current smoker, dichotomy coded 0/1 (0 No, 1 Yes). Carries an SPSS-style missing value (-99 Refused).
Daily commute time in minutes (integer); deliberately near-independent of the other variables.
Region of residence, 4 categories (1 North, 2 South, 3 East, 4 West).
"Climate change is a serious threat." 5-point Likert (1 Strongly Disagree to 5 Strongly Agree). Carries SPSS-style missing values (-99 Refused, -98 Don't know).
"Concern about the environment is exaggerated." 5-point Likert; reverse-keyed (the variable label ends in " R"). Reverse-code before scale scoring.
"Government should do more for the environment." 5-point Likert. Carries SPSS-style missing values (-99 Refused, -98 Don't know).
"I would pay more for environmentally friendly products." 5-point Likert.
"Pollution is a major cause of public health problems." 5-point Likert; weakly loaded (a Cronbach's-alpha drop candidate when scoring the scale).
community is the clean default example dataset. For a companion
dataset that deliberately carries undeclared missing-value codes, stripped
value labels, and an imperfect scale – the material the data-cleaning
workflow operates on – see clinic.
The five Environment items form a single attitude scale: item 2 is
reverse-keyed, and item 5 is the weak item that scale-reliability output
flags for dropping. Income, Education, Smoker, Environment1, and
Environment3 carry SPSS-style missing values, with the codes placed on
partly non-overlapping cases so that listwise deletion visibly reduces the
analysis sample below the per-variable counts. All of community's
missing-value codes are properly declared; for a dataset with undeclared
codes and other deliberate data-cleaning problems, see clinic.
Synthetic data generated by
data-raw/community_data_generator.R (random seed 20260605).
clinic, the messy-data companion dataset.
Computes Cronbach's alpha and prints SPSS-style reliability output including a case processing summary, overall alpha, item statistics, and item-total statistics with alpha-if-item-deleted. Built from scratch with no external package dependencies beyond base R. Handles haven-labelled variables automatically. Detects potentially reverse-coded or misfit items.
jalpha(data, ..., subset = NULL, variable.id = NULL, value.id = NULL, case.processing.detail = NULL, digits = NULL)
data |
A data frame. |
... |
Unquoted variable names (scale items) within |
subset |
An optional unquoted logical expression (e.g.
|
variable.id |
Character or NULL. Variable label display mode: one of
|
value.id |
Not supported by |
case.processing.detail |
Per-call override of the Case
Processing Summary detail tier: one of |
digits |
Integer or NULL. Number of decimal places for continuous
statistics in the output tables (range 0-7; |
A red "Reliability Analysis" title is printed first, followed by the case processing summary, overall alpha, item statistics, and item-total statistics.
Invisibly returns a list of class jst_alpha containing:
alpha (Cronbach's alpha), n_items, n_used,
n_excluded, item_statistics (the item statistics data frame),
item_total_statistics (the item-total statistics data frame),
and sample_info (pipeline and missing data counts).
jstats for the package overview,
workflow conventions, and complete function listing.
# With explicit data frame
jalpha(community, Environment1, Environment2, Environment3, Environment4, Environment5)
# Using juse() default
juse(community)
jalpha(Environment1, Environment2, Environment3, Environment4, Environment5)
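For readers who want to see what the overall alpha and the alpha-if-item-deleted column measure, here is a minimal base R sketch of the standard Cronbach's alpha formula. The helper names `cronbach_alpha()` and `alpha_if_deleted()` are hypothetical, the data are simulated, and this is not jalpha()'s source.

```r
# Cronbach's alpha for a data frame of items, using listwise deletion.
# Hypothetical helper for illustration only.
cronbach_alpha <- function(items) {
  items <- items[complete.cases(items), , drop = FALSE]
  k <- ncol(items)
  item_vars <- vapply(items, var, numeric(1))   # variance of each item
  total_var <- var(rowSums(items))              # variance of the sum score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Alpha recomputed with each item left out in turn.
alpha_if_deleted <- function(items) {
  vapply(seq_along(items),
         function(j) cronbach_alpha(items[, -j, drop = FALSE]),
         numeric(1))
}

# Simulated scale: three items share a common trait, the fourth does not.
set.seed(1)
trait <- rnorm(200)
items <- data.frame(
  q1 = trait + rnorm(200),
  q2 = trait + rnorm(200),
  q3 = trait + rnorm(200),
  q4 = rnorm(200)            # unrelated item: the drop candidate
)
alpha_all  <- cronbach_alpha(items)
alpha_drop <- alpha_if_deleted(items)
```

An item whose alpha-if-deleted value exceeds the overall alpha, as the unrelated fourth item does here, is the kind of weak item the reliability output flags.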
Runs a one-way ANOVA and prints a formatted group descriptives table followed by an ANOVA table. By default, runs the traditional ANOVA assuming equal variances. Optional parameters provide post-hoc tests, effect size, Levene's test, and confidence intervals. Set welch = TRUE for the Welch correction when equal variances cannot be assumed. Handles haven-labelled, numeric, and factor grouping variables. For haven-labelled variables, numeric codes are displayed alongside labels in the group descriptives table.
jaov(formula, data, welch = FALSE, posthoc = NULL, effect.size = NULL, levene = NULL, ci = NULL, subset = NULL, variable.id = NULL, value.id = NULL, case.processing.detail = NULL, full = FALSE, digits = NULL)
formula |
A formula of the form |
data |
A data frame containing variables referenced in |
welch |
Logical. If FALSE (default), runs traditional ANOVA. If TRUE, runs Welch's ANOVA (does not assume equal variances). |
posthoc |
Logical or NULL. If TRUE, prints Tukey HSD pairwise comparisons.
Not available when welch = TRUE. If NULL (default), defers to
|
effect.size |
Logical or NULL. If TRUE, prints eta-squared. If NULL
(default), defers to |
levene |
Logical or NULL. If TRUE, prints Levene's test for homogeneity
of variance. If NULL (default), defers to |
ci |
Logical or NULL. If TRUE, adds 95% confidence intervals to the
group descriptives table. If NULL (default), defers to |
subset |
An optional unquoted logical expression (e.g.
|
variable.id |
Character or NULL. Variable label display mode: one of
|
value.id |
Character or NULL. Value-label display mode for the
group descriptives rows: |
case.processing.detail |
Per-call override of the Case
Processing Summary detail tier: one of |
full |
Logical. If TRUE, turns on posthoc, effect.size, levene, and ci all at once. Does not override explicit FALSE values. |
digits |
Integer or NULL. Number of decimal places for continuous
statistics in the output tables (range 0-7; |
A red title identifying the test type is printed first, followed by variable labels (if present), then the results tables.
Invisibly returns a list of class jst_anova containing:
model (the aov or oneway.test object),
model_frame (the analysis data frame used for plotting),
test_type, formula, descriptives, f,
df1, df2, p, eta_squared, n, and
sample_info (pipeline and missing data counts).
jstats for the package overview,
workflow conventions, and complete function listing.
# With explicit data frame
jaov(WellbeingScore ~ Region, data = community)
jaov(WellbeingScore ~ Region, data = community, welch = TRUE)
jaov(WellbeingScore ~ Region, data = community, full = TRUE)
# Using juse() default
juse(community)
jaov(WellbeingScore ~ Region)
jaov(WellbeingScore ~ Region, full = TRUE)
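The returned model is described as an aov or oneway.test object, so the underlying statistics can be sketched with base R alone. The example below uses simulated data and assumes nothing about how jaov() computes its tables; it only shows the standard quantities the options refer to.

```r
set.seed(42)
d <- data.frame(
  score = c(rnorm(30, 50, 10), rnorm(30, 55, 10), rnorm(30, 60, 10)),
  group = factor(rep(c("A", "B", "C"), each = 30))
)

# Traditional one-way ANOVA (assumes equal variances).
fit <- aov(score ~ group, data = d)
tab <- summary(fit)[[1]]

# Eta-squared = between-groups sum of squares / total sum of squares.
ss <- tab[["Sum Sq"]]
eta_squared <- ss[1] / sum(ss)

# Welch's ANOVA (does not assume equal variances).
welch <- oneway.test(score ~ group, data = d, var.equal = FALSE)

# Tukey HSD pairwise comparisons, built on the traditional model.
tukey <- TukeyHSD(fit)
```

Tukey HSD is computed from the pooled-variance model, which is consistent with post-hoc tests being unavailable when welch = TRUE.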
javg() computes the mean of values across multiple variables for each
case (row) in the data frame. This is typically used to create scale means
from a set of related items.
By default, cases with any missing values receive NA. Use the
min.valid argument to allow partial means — for example,
min.valid = 1 computes the mean of available values as long as
at least one item is non-missing.
By default, the denominator is the number of non-missing values for each
case. Use fixed = TRUE to always divide by the total number of
variables regardless of missing values.
Variables can be listed individually or using colon notation to select a
range of consecutive columns (e.g. Attitude1:Attitude6).
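The interaction between min.valid and fixed is easiest to see on a tiny example. The helper below, `row_mean()`, is a hypothetical base R sketch of the rules described above, not javg()'s source.

```r
# Hypothetical helper: row-wise mean with a minimum-valid rule and an
# optional fixed denominator.
row_mean <- function(items, min.valid = ncol(items), fixed = FALSE) {
  items <- as.matrix(items)
  n_valid <- rowSums(!is.na(items))          # non-missing items per case
  total <- rowSums(items, na.rm = TRUE)      # sum of available items
  denom <- if (fixed) ncol(items) else n_valid
  out <- total / denom
  out[n_valid < min.valid] <- NA             # too few valid items
  out
}

items <- data.frame(
  a = c(4, 2, NA, NA),
  b = c(5, NA, 3, NA),
  c = c(3, 4, 3, NA)
)
strict     <- row_mean(items)                               # 4 NA NA NA
partial    <- row_mean(items, min.valid = 2)                # 4  3  3 NA
fixed_mean <- row_mean(items, min.valid = 2, fixed = TRUE)  # 4  2  2 NA
```

With fixed = TRUE a case with one missing item is divided by 3 rather than 2, so its mean is pulled toward zero; that is appropriate only when a missing item should count as contributing nothing.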
javg(data, ..., min.valid = NULL, fixed = FALSE, var.label = NULL)
data |
A data frame, or omit to use the |
... |
Unquoted variable names. Use colon notation (e.g.
|
min.valid |
Integer (optional). The minimum number of non-missing
values required to compute a mean. If a case has fewer non-missing
values, the result is |
fixed |
Logical. If |
var.label |
Character string (optional). A variable label to attach to the result. If omitted, an auto-generated label is used. |
A numeric vector the same length as nrow(data), suitable for
assigning to a new column:
MyData$ScaleMean <- javg(Var1, Var2, Var3).
jsum for computing row-wise sums.
jstats for the package overview,
workflow conventions, and complete function listing.
# Set the default data frame (so you can omit it in function calls)
juse(community)
# Mean of three variables (all must be non-missing)
community$EnvAvg <- javg(Environment1, Environment3, Environment4)
# Mean with partial data allowed (at least 2 non-missing)
community$EnvAvg <- javg(Environment1, Environment3, Environment4, min.valid = 2)
# Mean using colon range for consecutive columns
community$ScaleMean <- javg(Environment1:Environment5)
# Mix colon ranges and explicit names (e.g. after reverse-coding an item)
community$Environment2R <- jrecode(community, Environment2, map = "1=5; 2=4; 3=3; 4=2; 5=1")
community$ScaleMean <- javg(Environment1, Environment2R, Environment3:Environment5)
# Fixed denominator (always divide by total number of variables)
community$EnvAvg <- javg(Environment1, Environment3, Environment4, min.valid = 2, fixed = TRUE)
# With a custom variable label
community$ScaleMean <- javg(Environment1:Environment5, var.label = "Environment Scale Mean")
# With an explicit data frame (instead of using juse default)
community$EnvAvg <- javg(community, Environment1, Environment3, Environment4)
# Not normally needed. You'd clear a default or registration only to
# undo a mistake, or -- as in this example -- to reset state for testing.
juse(NULL)
jcomplete() registers a set of variables and activates a listwise
deletion filter that excludes any case with a missing value on any of
the registered variables. This ensures that all subsequent analyses use
the same set of complete cases, which is essential when preliminary
analyses need to match the N of a final regression model.
The setting is stored per dataset, so switching juse() between
datasets preserves each dataset's setting independently.
The jcomplete filter applies whenever the matching dataset is used,
regardless of whether it was supplied via juse() or specified
explicitly in a function call. To bypass temporarily without losing
the setting, use jcomplete(off) before the analysis and
jcomplete(on) afterward. This matches the SPSS USE ALL /
FILTER convention.
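The idea of one shared listwise filter can be illustrated in base R with complete.cases(). The data frame below is made up for the example, and the code shows only the concept, not how jcomplete() stores or applies its filter.

```r
d <- data.frame(
  income    = c(40, NA, 55, 62, 38, NA),
  education = c(3, 4, NA, 5, 2, 1),
  age       = c(25, 31, 47, 52, NA, 60)
)

# One shared filter: keep rows complete on every listed variable.
keep <- complete.cases(d[, c("income", "education", "age")])

# Without the filter, a summary of age uses every non-missing age.
n_age_alone <- sum(!is.na(d$age))

# With the filter, every analysis uses the same cases.
n_filtered <- sum(keep)
mean_age_filtered <- mean(d$age[keep])
```

Descriptives computed on the filtered rows then report the same N as a regression that uses all three variables.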
jcomplete(data, ..., preview = FALSE, console = FALSE, non.deletes = FALSE)
data |
A data frame. If omitted, uses the default set by
|
... |
Unquoted variable names to include in the listwise check. |
preview |
Logical. If |
console |
Logical or numeric. Print the dropped rows to the console.
|
non.deletes |
Logical. If |
Invisibly returns NULL. When a preview is requested,
invisibly returns the previewed data frame instead, so it can be
captured (e.g. jcomplete_rows <- jcomplete(preview = TRUE)).
jstats for the package overview,
workflow conventions, and complete function listing.
juse(community)
jcomplete(Income, Education, Age)
jdesc(Age)                                         # Uses only complete cases on those 3 vars
jcomplete(Income, Education, Age, preview = TRUE)  # Set and preview together
jcomplete(preview = TRUE)                          # Preview the already-set filter (viewer)
jcomplete(preview = TRUE, non.deletes = TRUE)      # Viewer shows all cases
jcomplete(console = 10)                            # Console only -- first 10 dropped rows
jcomplete(preview = TRUE, console = 25)            # Viewer and console
jcomplete(off)                                     # Deactivate
jcomplete(on)                                      # Reactivate
jcomplete()                                        # Check status
jcomplete(NULL)                                    # Clear entirely
# Not normally needed. You'd clear a default or registration only to
# undo a mistake, or -- as in this example -- to reset state for testing.
juse(NULL)
jconvert() provides a single entry point for changing how user-defined
missing values (UDMs) are represented on the columns of a data
frame already in memory. Three target formats are supported: SPSS-style
(na_values on haven_labelled_spss), Stata-style
(tagged_na on haven_labelled), and base R (declarations
stripped, declared cells converted to plain NA). Replaces
jstrip_udm() (retired in v0.9.5); the base R target is the strip
behavior.
jconvert(data, to = NULL, ..., vars = NULL, udm.notice = TRUE)
data |
A data frame, or omitted to use the |
to |
One of |
... |
Optional unquoted variable names. When supplied, only the
listed variables are scanned. Mutually exclusive with |
vars |
Alternative scope-by-vector path: a character vector of
variable names. Mutually exclusive with |
udm.notice |
Logical; |
The three target formats:
to = "baseR": Strip all UDM declarations and convert
declared cells to plain NA. For SPSS-form columns
(na_values / na_range on
haven_labelled_spss), masks declared codes to NA and
removes the attributes; value labels are preserved so the column
can still round-trip through jsave() with original
labeling. For columns carrying Stata-style missing values
(tagged_na markers), uses haven::zap_missing() to
convert them to plain NAs.
to = "spss": Convert Stata-style or SAS-style missing
values to SPSS-form numeric codes. Letter tags map to numeric
codes via joptions("udm.convention.codes") (default
-99, -98, -97):
.a -> codes[1], .b -> codes[2], and so on. SAS-style
(uppercase) tags are case-corrected to Stata-style (lowercase)
before the numeric mapping – for round-trip purposes the package
treats .A and .a as the same conceptual marker, and
mixed-case columns collapse to a single lowercase marker (SPSS has
no parallel uppercase convention). The notification's per-column
display shows the original (pre-correction) tag for SAS-corrected
columns – e.g. .A "Refused" -> -99 – so the user-visible
mapping reflects what was actually in the data on input. Letter
tags beyond .d (after case correction) are refused with
guidance to use jrecode() for manual mapping.
to = "stata": Convert SPSS-form numeric codes to
Stata-style missing values. Letter tags are assigned by ordering
rather than by convention: each column's own declared
na_values codes are sorted by absolute value descending
(ties broken with more-negative-first), then mapped
.a, .b, .c in that order. Convention codes are NOT
consulted for this direction;
they only govern the reverse (Stata to SPSS) mapping. Round-trip
conversions are not guaranteed to preserve the original numeric
codes (e.g. SPSS c(-1, 9) -> Stata .a, .b -> SPSS
c(-99, -98) loses the original numbers), but the value
labels survive intact and the missingness semantics are preserved.
Range-based SPSS missings (na_range) are out of cross-format
scope; columns with na_range are refused with guidance to
enumerate the range in SPSS first. Columns with more than 4
distinct na_values codes are also refused (matches the
4-code cap on Stata letter-tag mapping).
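The ordering rule for to = "stata" can be sketched in a few lines of base R. The helper `assign_stata_tags()` is hypothetical and works on bare numeric codes; it illustrates the sort order and the 4-code cap described above rather than reproducing jconvert()'s implementation.

```r
# Hypothetical helper: map a column's declared numeric codes to letter tags.
assign_stata_tags <- function(codes) {
  codes <- unique(codes)
  if (length(codes) == 0) return(character(0))
  if (length(codes) > 4) stop("more than 4 distinct codes; refusing to map")
  # Sort by absolute value descending; ties go to the more negative code.
  ordered <- codes[order(-abs(codes), codes)]
  setNames(paste0(".", letters[seq_along(ordered)]), ordered)
}

tags_standard <- assign_stata_tags(c(-98, -99))   # -99 -> .a, -98 -> .b
tags_custom   <- assign_stata_tags(c(-1, 9))      # 9 -> .a, -1 -> .b
tags_tie      <- assign_stata_tags(c(9, -9, 99))  # 99 -> .a, -9 -> .b, 9 -> .c
```

Because the tags depend only on each column's own codes, two columns with different code sets can both receive .a for different original numbers, which is why the original codes are not recoverable on the way back.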
Pre-flight checks for to = "spss" include a collision check:
if a column's target numeric code (e.g. -99 for .a) is
present as genuine data in the column, the call errors before any
data is touched. The error message lists every colliding column and
presents three resolution paths: change the convention codes via
joptions(udm.convention.codes = ...), scope the call via
vars = c(...) to exclude affected columns, or recode the real-data
values via jrecode() first. Atomicity applies to every
error mode – the entire jconvert() call either succeeds or
errors before mutating the data frame.
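The check-everything-first pattern behind the collision check can be sketched as follows. Both helpers are hypothetical, and the conversion step is deliberately simplified: plain NA cells stand in for tagged missing values and are all mapped to the first code.

```r
# Hypothetical helper: which columns already contain a target code as data?
find_collisions <- function(data, vars, codes = c(-99, -98, -97)) {
  hit <- vapply(vars, function(v) any(data[[v]] %in% codes), logical(1))
  vars[hit]
}

# All-or-nothing: check every column first, and only then modify anything.
safe_convert <- function(data, vars, codes = c(-99, -98, -97)) {
  colliding <- find_collisions(data, vars, codes)
  if (length(colliding) > 0) {
    stop("codes collide with real data in: ",
         paste(colliding, collapse = ", "))
  }
  # Simplification: every NA becomes the first code.
  data[vars] <- lapply(data[vars], function(x) replace(x, is.na(x), codes[1]))
  data
}

d <- data.frame(
  temp_change = c(-99, 3, NA, 12),   # -99 is a genuine value here
  income      = c(40, NA, 55, 62)
)
colliding <- find_collisions(d, c("temp_change", "income"))
converted <- safe_convert(d, "income")   # scoping the call avoids the collision
```

Calling `safe_convert(d, c("temp_change", "income"))` stops with an error before either column is changed, mirroring the scoping resolution path described above.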
Pattern A – value labels suggest missingness but no formal
declaration. When a column has no formal UDM declaration but carries
value labels matching the package's missing-label wordlist (e.g.
"Refused", "Don't know", "Not applicable"),
jconvert() skips the column and surfaces it in the
notification with the affected value/label pairs. To formalise these
as UDMs use jdeclare_udm(); to leave them as ordinary data, no
action is needed.
The data frame with the requested conversions applied, returned
invisibly. As with jrelabel() and jrecode(), the user
must assign the return value back to retain the changes.
jload for the load-time strip alternative
(preserve.udm = FALSE); joptions for setting
the default convention and convention codes session-wide.
# community ships with SPSS-form UDMs (Income, Education, Smoker,
# Environment1, Environment3), so the conversions run on it directly.
# Strip UDMs from every applicable variable:
df <- jconvert(community, to = "baseR")
# Convert SPSS-form UDMs to Stata-style missing values:
df <- jconvert(community, to = "stata")
# Scope by unquoted names:
df <- jconvert(community, to = "baseR", Income, Education)
# Scope by character vector (alternative form):
df <- jconvert(community, to = "baseR", vars = c("Income", "Education"))
# Suppress the notification (e.g. inside a script):
df <- jconvert(community, to = "baseR", udm.notice = FALSE)
## Not run:
# Convert with target inferred from joptions:
joptions(missing.convention = "spss")
df <- jconvert(df)  # converts any Stata-form columns to SPSS
## End(Not run)
Copies a data frame to a new name AND clones any classification registrations (jnumeric / jcount / jdummy) attached to it, so the copy behaves the same as the original under later analysis calls. A plain assignment (newdata <- mydata) copies the data but not the registrations, because registrations live in a name-keyed session notebook rather than on the data object; jcopy() is the verb that keeps the two together across a rename or copy.
jcopy(data, name, overwrite = FALSE, quiet = FALSE)
data |
The source data frame (unquoted). May be omitted when a juse() default is set, in which case the default frame is the source. |
name |
The destination name (unquoted) the copy is assigned to. When a single name is given it is read as the destination, not the source. |
overwrite |
Logical; if FALSE (the default) and the destination name already exists in your environment, an interactive session asks before overwriting. |
quiet |
Logical; if TRUE, suppress the confirmation message. |
Like jload(), jcopy() cannot see the name on the left of an assignment, so the new name is supplied as an argument. The destination name is unquoted, and a single name is always taken as the destination, with the source coming from the juse() default:
jcopy(mydata, newdata) – copy mydata to
newdata.
jcopy(newdata) – copy the juse() default frame to
newdata.
Registrations travel only when the source frame carries them; copying an unregistered frame just copies the data. The copy is independent of the original.
Invisibly NULL. Called for its side effect: the copy is assigned into
the calling environment under name, and its registrations are cloned
onto that name.
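Why a plain assignment loses registrations follows from the name-keyed storage. The toy registry below is a hypothetical sketch of that idea in base R; it takes quoted names for simplicity, and jstats' actual session notebook is not shown in this documentation.

```r
# Hypothetical name-keyed registry: registrations are stored under the
# data frame's NAME, not on the data frame object itself.
registry <- new.env()

register <- function(frame_name, variable, kind) {
  entry <- lookup(frame_name)
  entry[[variable]] <- kind
  assign(frame_name, entry, envir = registry)
  invisible(NULL)
}

lookup <- function(frame_name) {
  entry <- registry[[frame_name]]
  if (is.null(entry)) list() else entry
}

copy_with_registrations <- function(from, to, envir = parent.frame()) {
  assign(to, get(from, envir = envir), envir = envir)   # copy the data
  assign(to, lookup(from), envir = registry)            # clone registrations
  invisible(NULL)
}

mydata <- data.frame(region = c(1, 2, 3))
register("mydata", "region", "dummy")

plain <- mydata                               # data copied, registry untouched
copy_with_registrations("mydata", "cloned")   # data and registrations copied
```

Looking up "plain" in the registry finds nothing, while "cloned" carries the same entry as "mydata".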
## Not run:
jdummy(community, Region)   # register a classification on community
jcopy(community, survey)    # survey carries Region's registration
juse(community)
jcopy(survey2)              # copy the default (community) to survey2
## End(Not run)
Computes pairwise correlations and prints a formatted lower-triangle correlation matrix showing r, p values, and pairwise N for each pair. Supports Pearson (default), Spearman, and Kendall methods. Handles haven-labelled and factor variables with numeric levels. Warns when variables may be categorical rather than continuous.
jcorr(data, ..., method = "pearson", subset = NULL, variable.id = NULL, numeric = NULL, categorical = NULL, count = NULL, value.id = NULL, layout = NULL, case.processing.detail = NULL, digits = NULL)
data |
A data frame. |
... |
Unquoted variable names within |
method |
Character. Correlation method: "pearson" (default), "spearman", or "kendall". |
subset |
An optional unquoted logical expression (e.g.
|
variable.id |
Character or NULL. Variable label display mode: one of
|
numeric |
Optional character vector of variable names to treat as
continuous for this call (the per-call counterpart of |
categorical |
Not supported by |
count |
Optional character vector of variable names to treat as counts
for this call (the per-call counterpart of |
value.id |
Not supported by |
layout |
Character or NULL. How each correlation cell is laid out
when three or more variables are given: |
case.processing.detail |
Per-call override of the Case
Processing Summary detail tier: one of |
digits |
Integer or NULL. Number of decimal places for continuous
statistics in the output tables (range 0-7; |
A red title identifying the correlation method is printed first, followed by variable labels (if present), then the matrix.
Invisibly returns a list of class jst_corr containing:
r (correlation matrix), p (p-value matrix),
n (pairwise N matrix), method, model_frame (the
analysis data frame used for plotting), and sample_info
(pipeline and missing data counts).
jstats for the package overview,
workflow conventions, and complete function listing.
# With explicit data frame
jcorr(community, Income, Age, WellbeingScore)
jcorr(community, Income, Age, WellbeingScore, method = "spearman")
# Using juse() default
juse(community)
jcorr(Income, Age, WellbeingScore)
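A lower-triangle matrix of r, p, and pairwise N can be assembled in base R with cor.test(). The helper `pairwise_corr()` below is a hypothetical sketch on simulated data; it shows what pairwise deletion means, not how jcorr() is implemented.

```r
# Hypothetical helper: lower-triangle r, p, and N using pairwise deletion.
pairwise_corr <- function(data, method = "pearson") {
  vars <- names(data)
  k <- length(vars)
  r <- p <- n <- matrix(NA_real_, k, k, dimnames = list(vars, vars))
  for (i in seq_len(k)) {
    for (j in seq_len(i - 1)) {
      ok <- complete.cases(data[[i]], data[[j]])   # cases valid on this pair
      test <- cor.test(data[[i]][ok], data[[j]][ok], method = method)
      r[i, j] <- unname(test$estimate)
      p[i, j] <- test$p.value
      n[i, j] <- sum(ok)
    }
  }
  list(r = r, p = p, n = n)
}

set.seed(7)
d <- data.frame(x = rnorm(50), y = rnorm(50), z = rnorm(50))
d$y <- d$y + 0.6 * d$x
d$x[1:5] <- NA      # missing on different cases than z
d$z[46:50] <- NA
res <- pairwise_corr(d)
```

Because each pair uses its own complete cases, the N for x with z (40) is smaller than the N for either variable paired with y (45).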
jcount() tells jstats to treat one or more variables as count
variables (non-negative whole-number tallies). A count is numeric-like – it
passes wherever a numeric variable does and shows mean/median in
jscreen – and additionally carries count semantics: it is the
asserted signal behind the count-regression caveat in jlm and
the routing target for future count-model functions. Unlike the structural
guess, jcount accepts counts of any range, including those outside the
automatic small-range detection (e.g. a 0-30 victimization count).
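As a rough illustration of what "non-negative whole-number tallies" means in practice, the check below tests a vector against that definition. `is_count_like()` is a hypothetical helper, not the structural guess jstats applies, and it places no limit on the range of values.

```r
# Hypothetical check: are all non-missing values non-negative whole numbers?
is_count_like <- function(x) {
  x <- x[!is.na(x)]
  is.numeric(x) && length(x) > 0 && all(x >= 0) && all(x == round(x))
}

arrests <- c(0, 1, 2, 0, 3, 1, 0, 12)
ok_count    <- is_count_like(arrests)          # TRUE
ok_fraction <- is_count_like(c(1.5, 2, 3))     # FALSE: not whole numbers
ok_negative <- is_count_like(c(-1, 0, 2))      # FALSE: negative value
```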
jcount(data, ..., remove = FALSE, clear.all = FALSE)
data |
A data frame, or omitted to use the |
... |
One or more unquoted variable names to register. |
remove |
Logical; if |
clear.all |
Logical; if |
A variable carries exactly one registered intent at a time, so registering it as a count clears any prior dummy or numeric registration. Registration changes no data and assigns nothing. It is stored for the session, keyed by the data frame's name; save the data frame in R format (.rds) to keep it across sessions.
invisible(NULL). Called for its side effect on the session
registration notebook.
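As a rough illustration of what "non-negative whole-number tallies" means as a property of the data, the sketch below tests a vector for it. The helper looks_like_count is hypothetical; the source does not say whether jcount() itself validates values before registering them.

```r
# Hypothetical check (not part of jstats): TRUE when every non-missing
# value is a non-negative whole number.
looks_like_count <- function(x) {
  x <- x[!is.na(x)]
  is.numeric(x) && length(x) > 0 && all(x >= 0) && all(x == floor(x))
}

arrests <- c(0, 1, 2, 0, 3, 1, 0, 12)
looks_like_count(arrests)         # TRUE
looks_like_count(c(1.5, 2, 3))    # FALSE: fractional value
looks_like_count(c(-1, 0, 2))     # FALSE: negative value
```

A check like this says nothing about range, which is consistent with the point above that a registered count may lie outside any automatic small-range detection.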
df <- data.frame(arrests = c(0, 1, 2, 0, 3, 1, 0, 12),
                 age = c(21, 34, 45, 29, 51, 38, 26, 60))
jcount(df, arrests)                  # treat as a count (here 0-12)
jcount(df, arrests, remove = TRUE)
jcount()                             # list all registrations
jcount(df, NULL)                     # clear df's count registrations
jcount(clear.all = TRUE)             # clear every frame's count registrations
Produces a cross-tabulation of two categorical variables, showing observed frequencies and row percentages by default. Column percentages, expected frequencies, adjusted standardized residuals, and a chi-square test of independence are available via arguments. Handles haven-labelled, numeric, factor, and character variables. For haven-labelled variables, numeric codes are displayed alongside labels.
jcrosstab(
  formula,
  data,
  chisq = FALSE,
  expected = FALSE,
  row.pct = TRUE,
  col.pct = FALSE,
  residuals = "none",
  subset = NULL,
  variable.id = NULL,
  value.id = NULL,
  case.processing.detail = NULL,
  digits = NULL
)
formula |
A formula of the form |
data |
A data frame containing variables referenced in |
chisq |
Logical. If TRUE, prints the chi-square test of independence below the cross-tabulation. Default is FALSE. |
expected |
Logical. If TRUE, prints expected frequencies alongside observed. Default is FALSE. |
row.pct |
Logical. If TRUE (default), shows row percentages. |
col.pct |
Logical. If TRUE, shows column percentages. Default is FALSE. |
residuals |
Character. Cell residuals to display: |
subset |
An optional unquoted logical expression (e.g.
|
variable.id |
Character or NULL. Variable label display mode: one of
|
value.id |
Character or NULL. Value-label display mode for both
table axes: |
case.processing.detail |
Per-call override of the Case
Processing Summary detail tier: one of |
digits |
Integer or NULL. Number of decimal places for continuous
statistics in the output tables (range 0-7; |
A red "Cross-Tabulation" title is printed first, followed by variable labels (if present), then the table and optional test results.
Invisibly returns a list of class jst_crosstab containing:
observed (observed frequency table), expected (expected
frequency table), adjusted_residuals (matrix of adjusted
standardized residuals), n (total N), model_frame (the
analysis data frame used for plotting), sample_info (pipeline and
missing data counts), and if chisq = TRUE: chi_square,
df, and p.
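For readers who want to see where these quantities come from, the following base-R sketch reproduces them with stats::chisq.test() on an invented 3 x 2 table. It illustrates the statistics only; the table values and object names are assumptions, and nothing here is taken from jcrosstab()'s internals.

```r
# Invented observed frequencies; row and column names are illustrative.
observed <- matrix(c(30, 10,
                     20, 20,
                     10, 30),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(Education = c("Low", "Medium", "High"),
                                   Volunteer = c("No", "Yes")))

test <- stats::chisq.test(observed, correct = FALSE)

row_pct <- prop.table(observed, margin = 1) * 100  # row percentages
col_pct <- prop.table(observed, margin = 2) * 100  # column percentages

test$expected    # expected frequencies under independence
test$stdres      # adjusted standardized residuals
test$statistic   # chi-square statistic
test$parameter   # degrees of freedom
test$p.value     # p-value
```

In base R the adjusted standardized residuals are the stdres component; the residuals component holds the unadjusted Pearson residuals, which are a different quantity.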
jstats for the package overview,
workflow conventions, and complete function listing.
# Cross-tabulation only
jcrosstab(Education ~ Volunteer, data = community)

# With chi-square test
jcrosstab(Education ~ Volunteer, data = community, chisq = TRUE)

# With expected frequencies and column percentages
jcrosstab(Education ~ Volunteer, data = community,
          expected = TRUE, col.pct = TRUE)

# With adjusted standardized residuals (interpretation note at full output)
jcrosstab(Education ~ Volunteer, data = community, residuals = "adjusted")

# Using juse() default
juse(community)
jcrosstab(Education ~ Volunteer)
jcrosstab(Education ~ Volunteer, chisq = TRUE)
Read-side companion to joptions(data.dir = ...): returns the
currently configured data folder as a string, for use in scripts that need
the path itself (building a file path, checking existence, cleaning up test
files) without reaching into package-internal option names.
jdata_dir(default = ".")
default |
Value returned when no data folder is configured. Defaults
to |
joptions() prints the folder but returns invisible(NULL);
jdata_dir() returns it as a value. When no folder is configured, the
default is returned (".", the working directory, by default),
so the result drops straight into file.path. Pass
default = NULL to detect the unconfigured state explicitly.
A length-one character string (the configured folder, or
default); or default unchanged when it is NULL.
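The fallback behaviour described above can be mimicked with base-R options. This is a stand-in for illustration only: get_data_dir and the option name demo.data.dir are invented, and the option name jstats uses internally is deliberately not assumed.

```r
# Hypothetical stand-in for the documented behaviour; "demo.data.dir" is an
# invented option name, not the one jstats uses.
get_data_dir <- function(default = ".") {
  configured <- getOption("demo.data.dir")
  if (is.null(configured)) default else configured
}

options(demo.data.dir = NULL)                # nothing configured
get_data_dir()                               # "."
is.null(get_data_dir(default = NULL))        # TRUE: unconfigured state detected
file.path(get_data_dir(), "community.rds")   # "./community.rds"

options(demo.data.dir = "Data")              # folder configured
file.path(get_data_dir(), "community.rds")   # "Data/community.rds"

options(demo.data.dir = NULL)                # reset the invented option
```

Returning "." rather than NULL by default is what lets the result be passed straight to file.path() without a guard.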
joptions to set the folder; jload and
jsave, which resolve files against it.
## Not run:
joptions(data.dir = "Data")
jdata_dir()                                    # "Data"
f <- file.path(jdata_dir(), "community.rds")   # build a path in that folder
if (file.exists(f)) file.remove(f)
jdata_dir(default = NULL)                      # NULL if nothing configured
## End(Not run)
jdeclare_udm() declares one or more user-defined missing
values (UDMs) on a variable. UDMs are specific data values –
typically negative codes such as -99 or Stata-style tagged
markers such as .a – that indicate why a value is
missing (refused, don't know, not applicable, etc.) rather than
simply that it is missing. Once declared, UDM cells are
automatically excluded from analyses but remain visible in the data
for diagnostic purposes (see jfreq()).
The function operates in declarative mode: each call states the
column's complete UDM set. A second call to jdeclare_udm() on
the same column replaces, not augments, the prior declaration. This
matches SPSS's MISSING VALUES and Stata's mvdecode
semantics. When prior UDMs are dropped, a note lists them so the
destructive aspect of the replacement is not silent.
jdeclare_udm(
  data,
  var,
  codes = NULL,
  labels = NULL,
  convention = NULL,
  udm.notice = TRUE
)
data |
A data frame containing the variable. |
var |
The variable to declare UDMs on (unquoted, e.g.
|
codes |
Numeric vector of code values to declare as UDMs. Accepts two forms:
Under Stata convention, code values may be Stata-style missing-value markers
created with |
labels |
Optional. A quoted string in the form
|
convention |
Optional. One of |
udm.notice |
Logical. When |
The data frame, with the specified variable updated to carry the declared UDMs.
Under SPSS convention, codes are declared as numeric values via the
column's na_values attribute (haven's representation of
SPSS-form UDMs). The data cells themselves are unchanged; only the
metadata that flags certain values as missing is added.
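The idea that a declaration adds metadata while leaving the cells alone can be sketched with a plain numeric vector and a bare attribute. This imitates the concept only: a real haven-labelled column also carries a class and other attributes, and the helper valid_values is hypothetical.

```r
mood <- c(4, 2, -99, 5, -98, 3)

# "Declaring": attach metadata only; the stored values are untouched.
attr(mood, "na_values") <- c(-99, -98)

# Hypothetical helper: drop declared codes at analysis time.
valid_values <- function(x) {
  declared <- attr(x, "na_values")
  vals <- as.vector(x)            # plain values, attributes dropped
  vals[!(vals %in% declared)]
}

mean(as.vector(mood))             # -30.5, poisoned by the codes
mean(valid_values(mood))          # 3.5, codes excluded

# A second declaration states the complete set again, so it replaces
# the first rather than adding to it.
attr(mood, "na_values") <- -99
valid_values(mood)                # -98 is treated as an ordinary value again
```

Because only the attribute changes, the original codes stay visible in the data for diagnostic purposes.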
Under Stata convention with Stata-style missing-value input, the function attaches value labels to existing Stata-style missing-value cells on the column.
Under Stata convention with numeric input, the function converts
matching cells to Stata-style missing-value markers. The
mapping is ordering-based: codes sorted by absolute value
descending, more-negative-first as tie-breaker, then assigned
.a, .b, .c, .d in that order. The
assignment proceeds independently of joptions("udm.convention.codes")
(which only governs the reverse Stata-to-SPSS direction). A
conversion note in the standard/full joutput tier shows the
Stata-style equivalent for future calls.
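The ordering rule above (absolute value descending, more negative first on ties) can be sketched in base R as follows. The helper map_codes_to_markers is hypothetical, and the four-code limit is an assumption based on the source naming only .a through .d.

```r
# Hypothetical helper illustrating the documented ordering rule.
map_codes_to_markers <- function(codes) {
  codes <- unique(codes)
  # The source names only .a-.d; behaviour beyond four codes is not stated.
  stopifnot(length(codes) <= 4)
  # Primary key: |code| descending. Tie-breaker: more negative first.
  ord <- order(-abs(codes), codes)
  stats::setNames(paste0(".", letters[seq_along(codes)]), codes[ord])
}

map_codes_to_markers(c(-98, -99))       # -99 -> .a, -98 -> .b
map_codes_to_markers(c(99, -98, -99))   # -99 -> .a, 99 -> .b, -98 -> .c
```

Sorting on the codes themselves, rather than on the order they were supplied, means the same set of codes always maps to the same markers.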
A single data frame may carry both SPSS-form and Stata-form UDM
columns. In-memory analysis and display tolerate the mix without
issue (each column renders in its native form). The constraint
shows up at file-export time: .sav cannot
represent Stata-style missing values; .dta cannot represent SPSS-form
na_values declarations; .xpt can represent neither
form. jsave() checks the data frame
against the destination format before writing and stops with an error pointing to
jconvert() when the mix is incompatible. The
post-declaration mismatch notice printed at the end of this
function's output alerts you early if a single-column
declaration ends up out of step with the rest of its data frame.
jrecode, jconvert,
joptions, jstats
# clinic$MoodRating arrives "dirty": -99/-98 sit in the data as
# ordinary numbers (the state after a CSV or Excel import), so summary
# statistics are poisoned until the codes are declared missing.
df <- clinic
jdesc(df, MoodRating)   # mean dragged far down by -99/-98

# SPSS form: declare -99 and -98 as UDMs with labels
df <- jdeclare_udm(df, MoodRating, codes = c(-99, -98),
                   labels = "-99=Refused; -98=Don't know")
jdesc(df, MoodRating)   # codes now excluded as missing

# Equivalent using named codes (one step instead of codes + labels)
df2 <- jdeclare_udm(clinic, MoodRating,
                    codes = c("Refused" = -99, "Don't know" = -98))

# Stata-style: label Stata-style missing-value cells. The jrecode() call
# turns the literal codes into tagged cells; jdeclare_udm() labels them.
df3 <- clinic
df3$Mood2 <- jrecode(df3, MoodRating, map = "-99=.a; -98=.b; else=copy",
                     convention = "stata")
df3 <- jdeclare_udm(df3, Mood2,
                    codes = c("Refused" = haven::tagged_na("a"),
                              "Don't know" = haven::tagged_na("b")))
Computes basic descriptive statistics (N, non-missing, min, max, mean, SD) for one or more variables in a data frame. Prints a formatted table and invisibly returns the underlying results as a data frame.
jdesc(
  data,
  ...,
  by = NULL,
  subset = NULL,
  variable.id = NULL,
  numeric = NULL,
  categorical = NULL,
  count = NULL,
  value.id = NULL,
  case.processing.detail = NULL,
  digits = NULL
)
data |
A data frame, or a numeric vector. |
... |
Unquoted variable names within |
by |
An optional unquoted grouping variable name. When provided, descriptives are computed separately for each group, with a separate titled table per dependent variable. |
subset |
An optional unquoted logical expression (e.g.
|
variable.id |
Character or NULL. Variable label display mode: one of
|
numeric |
Optional character vector of variable names to treat as
continuous for this call (the per-call counterpart of |
categorical |
Not supported by |
count |
Optional character vector of variable names to treat as counts
for this call (the per-call counterpart of |
value.id |
Character or NULL. Value-label display mode for the
grouped descriptive headers (the |
case.processing.detail |
Per-call override of the Case
Processing Summary detail tier: one of |
digits |
Integer or NULL. Number of decimal places for continuous
statistics in the output tables (range 0-7; |
Output is structured consistently with jfreq(): a red title is
printed first, followed by a block showing the type and variable label
(or "None" if no label is present) for each variable, then a single blank
line before the table. For multiple variables, one type/label entry is
printed per variable before the shared table.
Summarizes numeric, haven-labelled, logical, numeric-coded factor, and
numeric-looking character variables. Variables that cannot be summarized
— text factors, text character variables, and date/time variables —
are skipped with a warning directing the user to jfreq() (date/time
variables are not supported here). When every requested variable is
unsummarizable, jdesc() stops with an error. Also accepts a simple numeric
vector. Supports grouped descriptives via the by parameter.
Haven-labelled variables are reported as haven_labelled (Categorical)
in the type line; the uninformative vctrs_vctr class is suppressed.
Invisibly returns a list of class jst_desc containing:
descriptives (data frame of statistics, or NULL for grouped output),
and sample_info (pipeline and missing data counts). Also
prints a formatted table to the console.
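The statistics listed for jdesc() (N, non-missing, min, max, mean, SD) correspond to a few base-R calls, sketched below. The helper describe_one and its column names are invented, and reading N as the total number of cases including missing ones is an assumption about the table's layout.

```r
# Hypothetical helper (not part of jstats). "N" is assumed to mean all
# cases, "NonMissing" the cases that enter the statistics.
describe_one <- function(x) {
  valid <- x[!is.na(x)]
  data.frame(
    N = length(x),
    NonMissing = length(valid),
    Min = min(valid),
    Max = max(valid),
    Mean = mean(valid),
    SD = stats::sd(valid)
  )
}

age <- c(23, 35, 41, NA, 29, 52)
describe_one(age)

# Grouped version, similar in spirit to the by argument
group <- c("No", "Yes", "No", "Yes", "No", "Yes")
grouped <- do.call(rbind, lapply(split(age, group), describe_one))
grouped
```

Splitting first and summarizing each piece keeps the grouped table in the same shape as the ungrouped one, with one row per group.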
jstats for the package overview,
workflow conventions, and complete function listing.
# With explicit data frame
jdesc(community, Age)
jdesc(community, Income, Age, WellbeingScore)
jdesc(community, WellbeingScore, by = Volunteer)

# Using juse() default
juse(community)
jdesc(Age)
jdesc(Income, Age, WellbeingScore)
jdesc(WellbeingScore, by = Volunteer)

# With a vector directly
jdesc(community$Age)
jdummy() registers a categorical variable so that jlm()
automatically expands it into dummy (indicator) variables when it appears
in a regression formula. The original data frame is never modified. Several
variables can be registered in one call; the ref setting then applies
to each of them.
Registrations are stored per dataset, so switching juse() between
datasets preserves each dataset's registrations independently.
jdummy(
  data,
  ...,
  ref = "first",
  show = FALSE,
  remove = FALSE,
  clear.all = FALSE,
  max.categories = 20L
)
data |
A data frame, or omit to use the |
... |
One or more unquoted variable names to register. Omit (along
with data) to display all current registrations. A lone |
ref |
The reference category (excluded from the regression model).
Can be a numeric code, a quoted label name, or |
show |
Logical. If |
remove |
Logical. If |
clear.all |
Logical. If |
max.categories |
Integer. Maximum number of categories a variable may
have to be dummy-coded; a variable with more raises an error. Raise it to
dummy-code a higher-cardinality variable. Default |
Invisibly returns NULL. Called for its side effect.
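What dummy expansion with a reference category amounts to can be seen in base R with a factor and stats::model.matrix(). This sketch shows the general technique, not how jdummy() or jlm() build their columns; the toy region values are invented.

```r
region <- factor(c("East", "West", "North", "South", "West", "East"),
                 levels = c("North", "South", "East", "West"))

# First level as reference: no column is created for "North".
first_ref <- stats::model.matrix(~ region)
colnames(first_ref)

# A different reference category: make "East" the baseline instead.
region_east <- stats::relevel(region, ref = "East")
east_ref <- stats::model.matrix(~ region_east)
colnames(east_ref)

# The original factor is left as it was.
levels(region)
```

The reference category is the one without a column, so every dummy coefficient is read as a difference from that baseline.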
jstats for the package overview,
workflow conventions, and complete function listing.
juse(community)
jdummy(Region)                   # Register, first category as reference
jdummy(Region, Education)        # Register several at once
jdummy(Region, ref = "last")     # Last category as reference
jdummy(Region, ref = 4)          # Reference by numeric code
jdummy(Region, ref = "East")     # Reference by value label
jdummy(Region, show = TRUE)      # Show coding scheme
jdummy(Region, show = "all")     # Full scheme (for many categories)
jdummy()                         # Show all registrations
jdummy(Region, remove = TRUE)    # Remove one registration
jdummy(community, NULL)          # Clear community's dummy registrations
jdummy(NULL)                     # Clear the default frame's (or ask)
jdummy(clear.all = TRUE)         # Clear every frame's dummy registrations

# Not normally needed. You'd clear a default or registration only to
# undo a mistake, or -- as in this example -- to reset state for testing.
juse(NULL)
Prints an SPSS-style frequency table (Freq, Total %, Valid %, Cum. %) for each variable supplied. Designed for use with unquoted variable names, and also accepts a plain vector.
jfreq(
  data,
  ...,
  subset = NULL,
  variable.id = NULL,
  value.id = NULL,
  case.processing.detail = NULL
)
data |
A data frame, or a vector. |
... |
Unquoted variable name(s) within |
subset |
An optional unquoted logical expression (e.g.
|
variable.id |
Character or NULL. Variable label display mode: one of
|
value.id |
Character or NULL. Value-label display mode for the
frequency-table valid rows: |
case.processing.detail |
Accepted for API symmetry. jfreq's Case Processing Summary is top-table only (no missing-data breakdown), so this argument has no effect; per-variable code detail already appears in each variable's frequency table. |
Output is structured consistently with jdesc(): a single red
"Frequencies" title is printed first, followed by the default-data note
(if a juse() default was used), any pipeline messages, and the Case
Processing Summary (when at least one pipeline stage was active for
this call). Each variable then gets its own block consisting of the
variable name on its own line, indented Type and Variable label lines
(suppressed when joutput()'s variable.id toggle is off),
a blank line, and the frequency table. The frequency table ends with
a Total row showing the post-pipeline N.
For haven-labelled variables, value labels and numeric codes are combined
in the frequency table rows (e.g. 1: Strongly Oppose). The type
line reports haven_labelled (Categorical) and suppresses the
uninformative vctrs_vctr class. Variable labels are shown for all
variable types, not only haven-labelled ones.
Invisibly returns a list of class jst_freq containing:
frequencies (named list of data frames, one per variable) and
sample_info (pipeline and missing data counts).
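The four columns of the frequency table (Freq, Total %, Valid %, Cum. %) can be reproduced with base R as sketched below. The helper freq_table is hypothetical, and accumulating Cum. % over Valid % follows the usual SPSS convention, which is assumed rather than confirmed for jfreq().

```r
# Hypothetical helper (not part of jstats).
freq_table <- function(x) {
  counts <- table(x, useNA = "no")   # valid categories only
  n_total <- length(x)               # includes missing cases
  n_valid <- sum(counts)
  out <- data.frame(
    Value = names(counts),
    Freq = as.integer(counts),
    TotalPct = as.numeric(counts) / n_total * 100,
    ValidPct = as.numeric(counts) / n_valid * 100
  )
  out$CumPct <- cumsum(out$ValidPct)  # assumed: cumulative over Valid %
  out
}

region <- c("East", "West", "East", NA, "North", "East", "West", NA)
ft <- freq_table(region)
ft
```

Total % and Valid % differ only when cases are missing: here "East" is 3 of 8 cases overall but 3 of 6 valid cases.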
jstats for the package overview,
workflow conventions, and complete function listing.
# With explicit data frame
jfreq(community, Region)
jfreq(community, Region, Education)

# Using juse() default
juse(community)
jfreq(Region)
jfreq(Region, Education)

# With a vector directly
jfreq(community$Region)
jlikert() declares one or more value-labelled variables as Likert
items – ordered response scales (for example 1 = Strongly disagree through
5 = Strongly agree). It is the ordered-scale counterpart to
jdummy (categorical), jnumeric (continuous), and
jcount (count): a variable carries exactly one registered
intent at a time, so registering it as Likert clears any prior numeric,
count, or dummy registration on it.
jlikert(data, ..., remove = FALSE, clear.all = FALSE)
data |
A data frame, or omitted to use the |
... |
One or more unquoted variable names to register, or a single
|
remove |
Logical; if TRUE, remove the named variables' Likert registrations instead of adding them. |
clear.all |
Logical; if TRUE, clear Likert registrations on every data frame. |
Scope – display only. The Likert intent refines reporting, not
analysis. It sets the variable's sub-class to "Likert" in
jscreen's Variable Types table, marking it as an ordered scale
rather than a generic N-category variable. It does NOT change how any
analysis treats the variable (there is no order-aware modelling), and it does
not by itself change jplot output – a value-labelled
small-range variable already plots as an ordered, labelled bar regardless of
this registration.
Like the other registration verbs, registrations are session-scoped and keyed
by data-frame name; save the frame in R format (.rds) with
jsave to keep them across sessions.
Clearing mirrors the other registration verbs:
jlikert(data, NULL) – clear this frame's Likert
registrations.
jlikert(NULL) – clear the juse() default frame (or the
sole frame carrying Likert registrations; if several do, it asks rather
than clearing them all).
jlikert(clear.all = TRUE) – clear every frame.
jlikert() with no arguments prints the current registration status.
Invisibly NULL. Called for its side effect on the session registry.
jnumeric, jcount, jdummy,
jscreen
jlikert(community, Environment1, Environment2)    # declare two Likert items
jscreen(community)                                # Sub-class shows "Likert"
jlikert(community, Environment1, remove = TRUE)   # undo one
jlikert(community, NULL)                          # clear the registrations
Fits a linear model using stats::lm() and prints SPSS-style output,
including unstandardized coefficients, standard errors, t values, p values,
and standardized coefficients (beta). Standardized coefficients are left
blank for the intercept and for dummy-coded categorical terms.
jlm(
  formula,
  data,
  subset = NULL,
  variable.id = NULL,
  numeric = NULL,
  categorical = NULL,
  count = NULL,
  ci = NULL,
  std = "regular",
  diagnostics = NULL,
  ref.categories = NULL,
  full = FALSE,
  case.processing.detail = NULL,
  digits = NULL,
  ...,
  value.id = NULL
)
formula |
A model formula, e.g. |
data |
A data frame containing variables referenced in |
subset |
An optional unquoted logical expression (e.g.
|
variable.id |
Character or NULL. Variable label display mode: one of
|
numeric |
Optional character vector of variable names that should be
treated as continuous (numeric) even if they have value labels. For
example, |
categorical |
Optional character vector of variable names that should
be treated as categorical even if they lack value labels. For example,
|
count |
Optional character vector of variable names to treat as counts
for this call (the per-call counterpart of |
ci |
Logical or NULL. If TRUE, appends a 95% confidence interval for
each unstandardized coefficient (b) at the right of the coefficient table.
If NULL (default), defers to |
std |
Character. Controls the standardized-coefficient column. One of
|
diagnostics |
Logical, character vector, or NULL. If TRUE, prints VIF
table and diagnostic plots. If a character vector, specifies which
diagnostics to show: |
ref.categories |
Logical or NULL. Per-call override for showing the
reference-categories block (the baseline level dropped from each set of
dummy variables). |
full |
Logical. If TRUE, turns on the coefficient confidence interval and diagnostics. Does not override explicit FALSE values. |
case.processing.detail |
Per-call override of the Case
Processing Summary detail tier: one of |
digits |
Integer or NULL. Number of decimal places for continuous
statistics in the output tables (range 0-7; |
... |
Reserved for argument-name checking. Passing |
value.id |
Character or NULL. Value-label display mode for the dummy
category rows in the Coefficients table: one of |
Also prints key model summary information (R-squared, adjusted R-squared, residual standard error, F-test, sums of squares, and N). If any coefficients are dropped due to perfect collinearity, a warning message is printed.
A red "Linear Regression" title is printed first, followed by variable labels (if present), then the coefficient table and model fit statistics.
Handling of variables:
Variables registered with jdummy() are expanded into dummy
variables using the registered reference category.
Unregistered haven-labelled variables with value labels are
automatically treated as categorical (converted to factors). The
first category is used as the reference, and an informational
message suggests using jdummy() for control over the
reference category.
Haven-labelled variables without value labels are treated as continuous (converted to numeric).
The numeric argument overrides auto-detection for variables
that have value labels but should be treated as continuous (e.g. Age
with labels like "18 years", "19 years").
The categorical argument forces variables without value
labels (or plain numeric variables) to be treated as categorical
(e.g. a numeric Program variable coded 1–4 from a CSV file).
The dependent variable is always modelled as numeric. Naming it in
numeric or count does not change that; it only asserts the
DV's role so the count / categorical-like note is silenced
(numeric) or stated definitively (count).
Invisibly returns a list of class jst_lm containing:
The fitted lm object.
Character string linear.
The model frame used to fit the model.
The formula after dummy expansion.
Formatted coefficient table (data frame); includes
95% CI Lower / Upper columns when ci is on.
Flat data frame of raw, full-precision
coefficient statistics (one row per coefficient): term (machine
key), b, SE, t, df, p, beta,
and ci_lower / ci_upper bounds (present regardless of the
ci display toggle). Carries beta_standardization and
outcome attributes.
List of raw, full-precision fit statistics (R-squared, adjusted R-squared, residual SE, F with its dfs and p-value, residual df, and N).
R-squared value.
Adjusted R-squared value.
Residual standard error.
Named numeric vector with F value, df1, df2, and p.
Named numeric vector (regression, residual, total).
Number of observations used in the model.
Names of dummy variable columns created by
jdummy() registrations.
Reference category descriptions for all categorical variables in the model.
Named numeric vector of VIF values, or NULL for bivariate.
Pipeline and missing data counts.
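The standardized coefficients (beta) and VIF values returned above follow conventional textbook definitions, sketched here with stats::lm() on simulated data. Whether jlm() computes them in exactly this way, including under its default std setting, is an assumption; the variable names and simulated values are invented.

```r
set.seed(1)
n <- 100
income <- rnorm(n, mean = 50, sd = 10)
age <- rnorm(n, mean = 40, sd = 12)
wellbeing <- 10 + 0.3 * income - 0.1 * age + rnorm(n, sd = 5)
toy <- data.frame(wellbeing, income, age)

fit <- stats::lm(wellbeing ~ income + age, data = toy)
b <- stats::coef(fit)[-1]            # slopes, intercept dropped

# Conventional definition: beta = b * sd(predictor) / sd(outcome)
beta <- b * sapply(toy[names(b)], stats::sd) / stats::sd(toy$wellbeing)

# The same values come from refitting on z-scored variables.
fit_z <- stats::lm(wellbeing ~ income + age,
                   data = as.data.frame(scale(toy)))
same <- all.equal(unname(beta), unname(stats::coef(fit_z)[-1]))

# VIF for one predictor: 1 / (1 - R^2 from regressing it on the others)
r2_income <- summary(stats::lm(income ~ age, data = toy))$r.squared
vif_income <- 1 / (1 - r2_income)
```

With a single predictor there is nothing to regress it on, which fits the NULL VIF documented for the bivariate case.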
jstats for the package overview,
workflow conventions, and complete function listing.
# With explicit data frame (named argument)
jlm(WellbeingScore ~ Income + Age, data = community)

# With explicit data frame (positional argument)
jlm(WellbeingScore ~ Income + Age, community)

# Using juse() default
juse(community)
jlm(WellbeingScore ~ Income + Age)

# CATEGORICAL PREDICTORS
#
# Per-call: categorical = ... applies for one call only and does not
# persist. Useful for a quick one-off analysis.
jlm(WellbeingScore ~ Region + Age, categorical = "Region")

# The recommended approach for repeated analyses: register the variable
# with jdummy() before running jlm(). This sets the categorical
# treatment persistently, so subsequent jlm() calls (and other
# analyses) use the same coding without re-specifying.
jdummy(community, Region)
jlm(WellbeingScore ~ Region + Age)

# To choose a non-default reference category:
jdummy(community, Region, ref = "West")
jlm(WellbeingScore ~ Region + Age)

# FORCING NUMERIC TREATMENT
#
# Use numeric = ... when a variable has value labels (haven_labelled)
# but you want it treated as a continuous score (e.g., a Likert
# scale you want the slope-per-unit interpretation for).
jlm(WellbeingScore ~ Age + Education, numeric = "Education")

# Multiple overrides at once
jlm(WellbeingScore ~ Education + Environment4 + Smoker,
    numeric = c("Education", "Environment4"),
    categorical = "Smoker")

# Not normally needed. You'd clear a default or registration only to
# undo a mistake, or -- as in this example -- to reset state for testing.
jdummy(community, NULL)
juse(NULL)
jload() reads a data file and assigns it as a data frame in your
environment. Supports SPSS (.sav), Stata (.dta), SAS
(.sas7bdat, .xpt), Excel (.xlsx, .xls),
CSV (.csv), and R's native .rds format.
The file format is determined entirely by the file extension —
jload() reads the extension (e.g. .sav, .dta,
.xlsx) and uses the appropriate reader automatically.
By default, jload() looks for the file in the working
directory. If a data folder is configured with
joptions(data.dir = ...), that folder is searched first. If a
full file path is provided, it is used directly.
The data frame is automatically named after the file (without the
extension). Use the name argument to specify a different name.
jload(
  file,
  name = NULL,
  use = FALSE,
  overwrite = FALSE,
  package = FALSE,
  check.missing = TRUE,
  sheet = NULL,
  preserve.udm = TRUE,
  udm.notice = NULL,
  quiet = FALSE
)
file |
Character string. The filename (e.g. |
name |
Character string (optional). The name to assign the data frame in your environment. If omitted, the name is derived from the filename. |
use |
Logical. If |
overwrite |
Logical. If |
package |
Logical. If |
check.missing |
Logical. If |
sheet |
For Excel files only. The sheet to read — either a sheet
name (character) or sheet number (integer). Defaults to the first sheet.
If the file has multiple sheets and |
preserve.udm |
Logical. If |
udm.notice |
Per-call override for the user-defined missing value (UDM) notification frequency.
|
quiet |
Logical; default FALSE. When TRUE, suppresses jload()'s informational messages (the directory-resolution note, file found, load summary, default-data note, and the UDM narrative, overriding udm.notice). Errors, warnings, the multi-sheet advisory, and the overwrite prompt are still shown. |
File paths:
Use forward slashes (/) in file paths. If you copy a path from
Windows File Explorer, replace each backslash with a forward slash.
Inside an R string a single backslash starts an escape sequence, so a
pasted Windows path fails unless the backslashes are replaced or doubled.
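As a small illustration of the backslash problem, the base R snippet below converts a pasted Windows path to forward slashes. It relies on raw string syntax, which requires R 4.0 or later; the path itself is the one used in the examples for this function.

```r
# A path pasted from Windows File Explorer. The raw string r"( ... )"
# keeps the backslashes literally (R >= 4.0 only).
pasted <- r"(C:\Projects\Data\community.dta)"

# Replace every backslash with a forward slash. With fixed = TRUE the
# pattern "\\" is a single literal backslash, not a regular expression.
fixed_path <- gsub("\\", "/", pasted, fixed = TRUE)
fixed_path
```

The result can be passed to jload() as a full file path.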
File search order:
If the path contains a directory separator (/), the path
is used directly.
If the path is a bare filename, jload() checks:
(a) the folder named by joptions("data.dir") if it is set
and exists; (b) the working directory.
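The search order above can be sketched in base R as follows. `resolve_data_file()` is a hypothetical stand-in written for illustration; it mirrors the documented order (explicit path, then the configured data folder, then the working directory) but is not the package's code.

```r
# Hypothetical helper illustrating the documented search order.
resolve_data_file <- function(file, data_dir = NULL) {
  # A directory separator means the path is used directly.
  if (grepl("/", file, fixed = TRUE)) return(file)

  # Bare filename: (a) the data folder if set and present, (b) the working directory.
  candidates <- c(
    if (!is.null(data_dir) && dir.exists(data_dir)) file.path(data_dir, file),
    file.path(getwd(), file)
  )
  found <- candidates[file.exists(candidates)]
  if (length(found) == 0) stop("File not found: ", file)
  found[1]
}

# Demonstration with a throwaway folder standing in for data.dir.
demo_dir <- file.path(tempdir(), "jload_demo")
dir.create(demo_dir, showWarnings = FALSE)
saveRDS(data.frame(x = 1:3), file.path(demo_dir, "community.rds"))

resolved <- resolve_data_file("community.rds", data_dir = demo_dir)
direct   <- resolve_data_file("some/folder/community.rds")  # returned unchanged
```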
Auto-naming:
The data frame name is derived from the filename by stripping the
extension. If the resulting name starts with a digit (which R does not
allow as a variable name), you must supply the name argument.
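A minimal base R sketch of the auto-naming rule is shown below. `derive_name()` is a hypothetical helper; it uses make.names() as the validity check, which is somewhat stricter than the single rule stated above (it also rejects names containing spaces, for example).

```r
# Hypothetical helper: derive an object name from a file path by
# dropping the folder and the extension, then check that it is a
# syntactically valid R name.
derive_name <- function(file) {
  name <- tools::file_path_sans_ext(basename(file))
  if (make.names(name) != name) {
    stop("'", name, "' is not a valid R name; supply the name argument.")
  }
  name
}

auto_name <- derive_name("C:/Projects/Data/community.dta")

# A filename starting with a digit fails the check.
digit_case <- tryCatch(derive_name("2020survey.sav"),
                       error = function(e) "needs name =")
```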
Excel files:
Excel files (.xlsx, .xls) do not contain variable or
value labels. The data will be loaded as plain numeric, character, or
logical columns. Use jrelabel() to add labels after loading
if needed.
Coded missing values:
When check.missing = TRUE, the function scans numeric variables
for values that appear to be coded missing values. Only whole-number
values are considered, because coded missing values are conventionally
whole numbers such as -99 or 999. Two detection methods are used:
For SPSS files, user-defined missing values stored in the file metadata are reported with high confidence.
A heuristic scan detects negative values among otherwise positive data and extreme outlier values (5x the range of other values).
Detected values are reported but not changed. Use jrecode
to convert them to NA if needed.
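To make the heuristic concrete, here is a rough base R sketch. It is one possible reading of the two rules above, not the package's algorithm: in particular, "5x the range of other values" is interpreted as "more than five range-widths beyond the remaining values", which is an assumption. `flag_coded_missing()` and the sample vector are invented for illustration.

```r
# Hypothetical heuristic: flag whole-number values that are either
# negative among otherwise non-negative data, or lie far outside the
# range of the remaining values.
flag_coded_missing <- function(x, multiplier = 5) {
  vals <- sort(unique(x[!is.na(x)]))
  vals <- vals[vals == round(vals)]            # whole numbers only
  flagged <- numeric(0)
  for (v in vals) {
    rest <- x[!is.na(x) & x != v]              # every other observed value
    if (length(rest) < 2) next
    rng <- diff(range(rest))
    negative_among_positive <- v < 0 && all(rest >= 0)
    far_outside <- rng > 0 &&
      (v > max(rest) + multiplier * rng || v < min(rest) - multiplier * rng)
    if (negative_among_positive || far_outside) flagged <- c(flagged, v)
  }
  flagged
}

item <- c(3, 5, 4, 2, 5, -99, 1, 4, 999, 3)   # made-up 1-5 item with two codes
flagged <- flag_coded_missing(item)

# Reporting is separate from changing: conversion is an explicit step.
item_clean <- item
item_clean[item_clean %in% flagged] <- NA
```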
Invisibly returns the loaded data frame. The primary effect is assigning the data frame in the calling environment.
jstats for the package overview,
workflow conventions, and complete function listing.
## Not run:
# SPSS
jload("community.sav")
jload("community.sav", use = TRUE)
jload("community.sav", name = "MySurvey")

# Stata
jload("community.dta")

# SAS
jload("community.sas7bdat")
jload("community.xpt")

# Excel
jload("community.xlsx")
jload("community.xlsx", sheet = "Wave2")
jload("community.xlsx", sheet = 2)

# CSV and R native
jload("community.csv")
jload("community.rds")

# Extension omitted -- jload searches for a matching file automatically
jload("community")

# Full file path
jload("C:/Projects/Data/community.dta")

# Quiet load (e.g. in a .Rprofile or startup script): suppresses the
# informational messages while still loading. Errors and warnings still show.
jload("community.rds", name = "MyData", quiet = TRUE)
## End(Not run)
Fits a binary logistic regression using stats::glm() with
family = binomial and prints formatted output including an omnibus
model test, model summary statistics, and a coefficients table with
odds ratios (Exp(B)).
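For readers who want to see what sits underneath, the sketch below builds a comparable coefficient table directly from stats::glm(). It uses the built-in mtcars data (vs as the 0/1 outcome) because the package's community data may not be available, and the column layout only imitates the one described for this function.

```r
# Base R sketch: binary logistic regression plus an SPSS-style
# coefficient table. mtcars$vs is already coded 0/1.
fit <- glm(vs ~ mpg, data = mtcars, family = binomial)
est <- coef(summary(fit))

coef_table <- data.frame(
  term  = rownames(est),
  b     = est[, "Estimate"],
  SE    = est[, "Std. Error"],
  Wald  = (est[, "Estimate"] / est[, "Std. Error"])^2,  # squared z statistic
  df    = 1,
  p     = est[, "Pr(>|z|)"],
  exp_b = exp(est[, "Estimate"]),                        # odds ratio, Exp(B)
  row.names = NULL
)
coef_table
```

The Wald statistic here is the square of glm's z value, so its chi-square p-value on 1 degree of freedom matches the two-sided p-value that summary() reports.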
jlogistic(
  formula,
  data,
  subset = NULL,
  variable.id = NULL,
  numeric = NULL,
  categorical = NULL,
  count = NULL,
  ci = NULL,
  classification = FALSE,
  diagnostics = NULL,
  ref.categories = NULL,
  full = FALSE,
  case.processing.detail = NULL,
  digits = NULL,
  ...,
  value.id = NULL
)
formula |
A model formula, e.g. |
data |
A data frame containing variables referenced in |
subset |
An optional unquoted logical expression (e.g.
|
variable.id |
Character or NULL. Variable label display mode: one of
|
numeric |
Optional character vector of variable names to treat as continuous even if they have value labels. |
categorical |
Optional character vector of variable names to treat as categorical even if they lack value labels. |
count |
Optional character vector of independent-variable names to
treat as counts for this call (the per-call counterpart of
|
ci |
Logical or NULL. If TRUE, adds 95% confidence intervals for
Exp(B). If NULL (default), defers to |
classification |
Logical. If TRUE, prints a classification table showing predicted vs observed outcomes. Default is FALSE. |
diagnostics |
Logical, character vector, or NULL. If TRUE, prints
VIF table. If a character vector, |
ref.categories |
Logical or NULL. Per-call override for showing the
reference-categories block (the baseline level dropped from each set of
dummy variables). |
full |
Logical. If TRUE, turns on ci, classification, and diagnostics. Does not override explicit FALSE values. |
case.processing.detail |
Per-call override of the Case
Processing Summary detail tier: one of |
digits |
Integer or NULL. Number of decimal places for continuous
statistics in the output tables (range 0-7; |
... |
Reserved for argument-name checking. Passing |
value.id |
Character or NULL. Value-label display mode for the dummy
category rows in the Coefficients table: one of |
The dependent variable must be coded 0/1. If it is not, the function
stops with a clear error message and suggests the appropriate
jrecode() command.
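The 0/1 requirement can be checked and satisfied in plain R as well. The sketch below uses a made-up 1/2-coded vector; `is_01()` is a hypothetical helper, and the ifelse() call is a base R equivalent of the recode map "1=1; 2=0".

```r
# Made-up dichotomy coded 1 = Yes, 2 = No, with one missing value.
owns_home <- c(1, 2, 2, 1, NA, 2)

# Hypothetical check: are all non-missing values 0 or 1?
is_01 <- function(x) all(x[!is.na(x)] %in% c(0, 1))
before <- is_01(owns_home)

# Recode 1 -> 1 and 2 -> 0; NA stays NA.
owns_home01 <- ifelse(owns_home == 1, 1, 0)
after <- is_01(owns_home01)
```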
Handles haven-labelled variables, registered dummy variables via
jdummy(), and the numeric/categorical overrides
in the same way as jlm().
Invisibly returns a list of class jst_logistic containing:
The fitted glm object.
Character string logistic.
The model frame used to fit the model.
The formula after dummy expansion.
Formatted coefficient table (data frame).
Flat data frame of raw, full-precision
coefficient statistics (one row per coefficient): term (machine
key, shared with jlm), b, SE, Wald, df,
p, exp_b, and exp_ci_lower / exp_ci_upper
odds-ratio CI bounds (present regardless of the ci display
toggle). Carries an outcome attribute.
List of raw, full-precision model-level fit statistics:
ll_model, ll_null, deviance, null_deviance,
the omnibus likelihood-ratio test (chi_sq, omnibus_df,
omnibus_p), Cox & Snell and Nagelkerke pseudo R-squared
(cox_snell_r2, nagelkerke_r2), aic, and n.
Nagelkerke pseudo R-squared.
Cox & Snell pseudo R-squared.
-2 Log Likelihood.
Akaike Information Criterion.
Named vector: chi_square, df, p.
Number of observations.
Character string describing what the model predicts.
Names of dummy variable columns.
Reference category descriptions.
Named numeric vector of VIF values, or NULL.
Pipeline and missing data counts.
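The model-level statistics listed above follow standard formulas and can be reproduced from a plain glm fit. The sketch below again uses mtcars as stand-in data and borrows the element names from the list above; it is an independent calculation, not an extract of the package.

```r
fit      <- glm(vs ~ mpg, data = mtcars, family = binomial)
null_fit <- glm(vs ~ 1,   data = mtcars, family = binomial)

n        <- nobs(fit)
ll_model <- as.numeric(logLik(fit))
ll_null  <- as.numeric(logLik(null_fit))

# Omnibus likelihood-ratio test: drop in deviance from the null model.
chi_sq     <- fit$null.deviance - fit$deviance
omnibus_df <- fit$df.null - fit$df.residual
omnibus_p  <- pchisq(chi_sq, omnibus_df, lower.tail = FALSE)

# Pseudo R-squared measures.
cox_snell_r2  <- 1 - exp((2 / n) * (ll_null - ll_model))
nagelkerke_r2 <- cox_snell_r2 / (1 - exp((2 / n) * ll_null))

fit_stats <- list(ll_model = ll_model, ll_null = ll_null,
                  deviance = fit$deviance, null_deviance = fit$null.deviance,
                  chi_sq = chi_sq, omnibus_df = omnibus_df, omnibus_p = omnibus_p,
                  cox_snell_r2 = cox_snell_r2, nagelkerke_r2 = nagelkerke_r2,
                  aic = AIC(fit), n = n)
```

Nagelkerke's measure rescales Cox & Snell by its maximum attainable value, which is why it is always at least as large.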
jstats for the package overview,
workflow conventions, and complete function listing.
# With explicit data frame -- Volunteer is already coded 0/1
jlogistic(Volunteer ~ Income + Age, data = community)

# A 1/2-coded dichotomy (Yes = 1, No = 2) must be recoded to 0/1 first
df <- community
df$OwnsHome01 <- jrecode(df, OwnsHome, map = "1=1; 2=0",
                         labels = "0=No; 1=Yes")
jlogistic(OwnsHome01 ~ Income + Age, data = df)

# Using juse() default
juse(community)
jlogistic(Volunteer ~ Income + Age)

# CATEGORICAL PREDICTORS
#
# Per-call: categorical = ... applies for one call only and does not
# persist.
jlogistic(Volunteer ~ Region + Age, categorical = "Region")

# The recommended approach for repeated analyses: register the variable
# with jdummy() before running jlogistic(). This sets categorical
# treatment persistently across subsequent analyses.
jdummy(community, Region)
jlogistic(Volunteer ~ Region + Age)

# To choose a non-default reference category:
jdummy(community, Region, ref = "West")
jlogistic(Volunteer ~ Region + Age)

# FORCING NUMERIC TREATMENT
#
# Use numeric = ... when a labelled variable should enter as a score.
jlogistic(Volunteer ~ Age + Education, numeric = "Education")

# Not normally needed. You'd clear a default or registration only to
# undo a mistake, or -- as in this example -- to reset state for testing.
jdummy(community, NULL)
juse(NULL)
jnumeric() tells jstats to treat one or more variables as numeric
(continuous) wherever their analysis class matters, overriding the package's
automatic structural guess. It is the counterpart to jdummy
(categorical) and jcount (count): a variable carries exactly
one registered intent at a time, so registering it as numeric clears any
prior dummy or count registration. Registration changes no data and assigns
nothing – you do not write df <- jnumeric(...). It is stored for the
session, keyed by the data frame's name; save the data frame in R format
(.rds) to keep it across sessions.
jnumeric(data, ..., remove = FALSE, clear.all = FALSE)
data |
A data frame, or omitted to use the |
... |
One or more unquoted variable names to register. |
remove |
Logical; if |
clear.all |
Logical; if |
The typical use is a small-range whole number that the structural classifier would treat as categorical (e.g. a 0-6 attitude item) but that you want analyzed as a continuous score.
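The practical difference between the two treatments is easy to see with base R's lm(). The sketch below simulates a 0-6 item (the data are invented) and fits it once as a continuous score and once as a factor; it illustrates the modelling choice only and does not use the registration mechanism.

```r
set.seed(1)
attitude_item <- sample(0:6, 200, replace = TRUE)   # simulated 0-6 item
outcome <- 2 + 0.5 * attitude_item + rnorm(200)

# Continuous treatment: one slope, read as change per scale point.
as_numeric <- lm(outcome ~ attitude_item)

# Categorical treatment: one dummy coefficient per non-reference level.
as_categorical <- lm(outcome ~ factor(attitude_item))

n_coef_numeric     <- length(coef(as_numeric))
n_coef_categorical <- length(coef(as_categorical))
```

The numeric fit estimates an intercept and a single slope, while the categorical fit estimates an intercept plus one coefficient for every level other than the reference.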
invisible(NULL). Called for its side effect on the session
registration notebook.
# Treat a labelled Likert item as a continuous score (slope-per-unit)
jnumeric(community, Environment2)                 # one labelled 1-5 item
jnumeric(community, Environment2, Environment4)   # several at once
jnumeric(community, Environment2, remove = TRUE)  # unregister one
jnumeric()                                        # list all registrations
jnumeric(community, NULL)    # clear community's numeric registrations
jnumeric(clear.all = TRUE)   # clear every frame's numeric registrations
Controls session-wide settings that affect how the package handles
missing-value information and related conventions. joptions
complements joutput: joutput governs output verbosity and
tiering, while joptions holds session-wide conventions plus a small number
of per-function display defaults (currently the jcorr() cell
layout). Settings are read fresh on each function call:
changing a setting after data has been loaded does not retroactively
transform data already in memory. jconvert is the
explicit transform path for data already in the workspace.
joptions(
  missing.convention = NULL,
  udm.convention.codes = NULL,
  data.dir = NULL,
  corr.layout = NULL,
  quiet = FALSE
)
missing.convention |
One of |
udm.convention.codes |
Numeric vector, length 1 to 3. See Slots. |
data.dir |
Character string (length 1), or |
corr.layout |
One of |
quiet |
Logical; default FALSE. When TRUE, joptions() applies the change silently, suppressing both the status panel and the convention nudge. A bare joptions() status query always prints regardless of quiet. |
Invisibly returns NULL. Called for the side effect of
updating session options and printing the status panel.
Character, length 1. One of "none",
"spss", or "stata". Default: "none".
"none" preserves loaded data as-is (no automatic conversion
between user-defined missing value (UDM) representations at load time). "spss" or
"stata" opts into load-time auto-conversion via
jload, and also supplies the target convention for
fresh UDM declarations on columns with no existing convention.
Numeric vector, length 1 to 3, whole
numbers, no duplicates. Sign unconstrained. Default:
c(-99, -98, -97). The recommended UDM code set used
by jconvert when translating Stata-style missing values
(.a, .b, .c, .d) into SPSS-form
numeric codes, and by the load-time diagnostic for
convention-matched detection.
Character string (length 1), or NULL. Default:
NULL. When NULL, jsave writes
bare-filename saves to the working directory and jload
searches the working directory. When set, names a folder (relative
to the working directory) used as both the save target for
bare-filename saves and as the first directory searched on
bare-filename loads. The folder is auto-created on first save if
it doesn't already exist (nested paths are created in full).
To clear a previously-set folder back to this default, pass
data.dir = "" (an empty string); passing
data.dir = NULL leaves the current setting unchanged
(see Call patterns). Filenames containing a directory
separator (/) bypass this setting and are taken literally.
Character, length 1. One of "wide" or
"stacked". Default: "wide". The default cell layout for
jcorr when three or more variables are correlated:
"wide" puts r and p on one line with N beneath; "stacked"
stacks r, p, and N on three lines for a narrower table that fits more
variables. A per-call layout argument to jcorr()
overrides this. It lives here rather than in joutput
because it is specific to one function's output, not a tiered
analysis-content toggle.
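The constraints stated for udm.convention.codes (numeric, length 1 to 3, whole numbers, no duplicates, any sign) can be expressed as a short validation routine. `check_udm_codes()` is a hypothetical helper written for illustration; how joptions() validates its input internally is not documented here.

```r
# Hypothetical validator for a UDM code set.
check_udm_codes <- function(codes) {
  stopifnot(
    is.numeric(codes),
    length(codes) >= 1, length(codes) <= 3,
    !anyNA(codes),
    all(codes == round(codes)),   # whole numbers only
    !anyDuplicated(codes)         # no repeated codes
  )
  invisible(codes)
}

ok_default <- check_udm_codes(c(-99, -98, -97))   # the documented default
ok_mixed   <- check_udm_codes(c(-9, 999))         # sign is unconstrained

# Each of these violates one rule and raises an error.
bad_duplicate <- tryCatch(check_udm_codes(c(-99, -99)), error = function(e) "rejected")
bad_fraction  <- tryCatch(check_udm_codes(1.5),         error = function(e) "rejected")
bad_length    <- tryCatch(check_udm_codes(1:4),         error = function(e) "rejected")
```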
joptions()Print the current settings panel.
joptions(NULL)Reset all slots to defaults, then print the panel.
joptions(slot = value, ...)Set one or more slots,
then print the panel. Passing slot = NULL as a named
argument leaves that slot at its current value – useful for
setting one slot without touching another. To reset a single
slot to its default, pass the default value explicitly (e.g.
joptions(missing.convention = "none")). Because
data.dir's default is NULL – which already means
"leave alone" – it is cleared instead with data.dir = "".
Setting missing.convention to "spss" or "stata"
triggers a one-time scan of globalenv() for data frames whose
predominant UDM convention differs from the newly-set value. When
mismatches exist, a one-line notice lists the affected data frames
and suggests jconvert for alignment. The notice is
informational; nothing is changed. Plain data frames with no
UDM-bearing columns – including the course datasets in their
standard form – do not trigger the notice.
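The call patterns described earlier, where a NULL argument leaves a slot unchanged and an empty string clears data.dir, follow a common R idiom. The sketch below reproduces that idiom with a plain list; `set_opts()` is a hypothetical function covering only two slots, not the package's implementation.

```r
# Current settings held in a plain list (defaults as documented).
settings <- list(missing.convention = "none", data.dir = NULL)

# Hypothetical setter: NULL means "leave this slot alone".
set_opts <- function(settings, missing.convention = NULL, data.dir = NULL) {
  if (!is.null(missing.convention)) {
    settings$missing.convention <- missing.convention
  }
  if (!is.null(data.dir)) {
    # "" is the explicit clear signal, because NULL already means "unchanged".
    # list(NULL) keeps the slot in the list instead of deleting it.
    settings["data.dir"] <- list(if (identical(data.dir, "")) NULL else data.dir)
  }
  settings
}

s1 <- set_opts(settings, data.dir = "Data")          # set the folder
s2 <- set_opts(s1, missing.convention = "spss")      # data.dir untouched
s3 <- set_opts(s2, data.dir = "")                    # folder cleared
```

Using a sentinel such as "" is one way to distinguish "reset this" from "not supplied" when the default value is itself NULL.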
joutput for output-verbosity settings;
jstats for the package overview.
joptions()                                     # show current settings
joptions(missing.convention = "spss")          # set, panel, nudge
joptions(udm.convention.codes = c(-99, -98))   # set, panel, no nudge
joptions(data.dir = "Data")                    # set save/load folder
joptions(missing.convention = "stata",
         udm.convention.codes = c(-99, -98, -97))  # set both
joptions(missing.convention = "spss",
         udm.convention.codes = NULL)          # set mc, leave codes
joptions(NULL)                                 # reset all to defaults
Controls what analysis functions display by default. Three preset levels are available, and individual toggles can override specific settings within any level. Per-call arguments on analysis functions always take precedence over joutput() settings.
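The precedence order described above (per-call argument, then individual toggle, then the level's preset) can be sketched as a small lookup. Everything in this block is illustrative: `resolve_setting()` is hypothetical, and the preset values for "standard" and "full" are inferred from the comments in this function's examples, not taken from the package source.

```r
# Assumed presets for two of the levels, for one setting only.
level_defaults <- list(
  standard = list(regression.ci = FALSE),
  full     = list(regression.ci = TRUE)
)

# Hypothetical resolver: the first non-NULL source wins.
resolve_setting <- function(name, level, toggles = list(), per_call = NULL) {
  if (!is.null(per_call)) return(per_call)                   # 1. per-call argument
  if (!is.null(toggles[[name]])) return(toggles[[name]])     # 2. session toggle
  level_defaults[[level]][[name]]                            # 3. level preset
}

from_level  <- resolve_setting("regression.ci", "standard")
from_toggle <- resolve_setting("regression.ci", "standard",
                               toggles = list(regression.ci = TRUE))
from_call   <- resolve_setting("regression.ci", "full", per_call = FALSE)
```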
joutput(
  level,
  effect.size = NULL,
  regression.ci = NULL,
  means.ci = NULL,
  levene = NULL,
  posthoc = NULL,
  diagnostics = NULL,
  case.processing = NULL,
  case.processing.detail = NULL,
  variable.id = NULL,
  value.id = NULL,
  ref.categories = NULL,
  udm.notice = NULL,
  digits = NULL,
  quiet = FALSE
)
level |
Character. One of
|
effect.size |
Logical or NULL. Override the level's default for effect size display. |
regression.ci |
Logical or NULL. Override the level's default for
confidence intervals on regression coefficients ( |
means.ci |
Logical or NULL. Override the level's default for
confidence intervals on means and mean differences ( |
levene |
Logical or NULL. Override the level's default for Levene's test display. |
posthoc |
Logical or NULL. Override the level's default for post-hoc test display (jaov only). |
diagnostics |
Logical or NULL. Override the level's default for regression diagnostic output (jlm only). |
case.processing |
Three-state toggle. |
case.processing.detail |
Detail tier for the Case Processing
Summary's missing-data breakdown: |
variable.id |
Character or NULL. Variable label display mode, one
of |
value.id |
Character or NULL. Value-label display mode for the
categorical levels that appear in |
ref.categories |
Logical or NULL. Override the level's default for the reference categories block (registered dummies). |
udm.notice |
Three-state toggle controlling the user-defined
missing-value (UDM) notification emitted by |
digits |
Integer or NULL. Number of decimal places shown for
continuous statistics in the analysis-function output tables
(range 0-7; |
quiet |
Logical; default FALSE. When TRUE, joutput() applies the level/toggle change silently (the status panel is not printed). A bare joutput() status query always prints regardless of quiet. |
Invisibly returns NULL. Called for its side effect of setting session options.
jstats for the package overview,
workflow conventions, and complete function listing.
joutput("standard")                        # effect sizes + means/diff CIs (jt, jaov)
joutput("standard", regression.ci = TRUE)  # also show jlm/jlogistic coefficient CIs
joutput("full")                            # everything
joutput()                                  # show current settings
joutput(NULL)                              # reset to defaults
Unified plotting function. It can be called in three ways: with a result object, with a formula, or with a data frame followed by variable names.
jplot(x, which = "core", ...)

## Default S3 method:
jplot(
  x,
  ...,
  by = NULL,
  type = NULL,
  line = FALSE,
  equation = TRUE,
  r2 = TRUE,
  band = "ci",
  subset = NULL,
  labels = NULL,
  numeric = NULL,
  categorical = NULL,
  count = NULL
)

## S3 method for class 'jst_lm'
jplot(
  x,
  which = "core",
  focal = NULL,
  at = "zero",
  equation = TRUE,
  r2 = TRUE,
  ...
)

## S3 method for class 'jst_logistic'
jplot(x, which = "core", focal = NULL, at = "zero", ...)

## S3 method for class 'jst_ttest'
jplot(x, which = "core", ...)

## S3 method for class 'jst_anova'
jplot(x, which = "core", ...)

## S3 method for class 'jst_corr'
jplot(x, which = "core", ...)

## S3 method for class 'jst_crosstab'
jplot(x, which = "core", ...)

## S3 method for class 'jst_desc'
jplot(x, which = "core", ...)

## S3 method for class 'jst_freq'
jplot(x, which = "core", ...)
x |
A result object from one of the package's analysis functions (result-object form), or a data frame (data-first form). |
which |
Character vector. |
... |
Additional arguments: for the result-object form these are passed to class-specific methods; for the data-first form these are unquoted variable names (1 or 2). |
by |
Unquoted variable name for group-coloring (data-first form). |
type |
Character. Plot type override for the data-first form. One
of |
line |
Controls a line overlay on data-first scatter plots. One of
|
equation |
Logical. If TRUE (default), displays the equation in the
subtitle for |
r2 |
Logical. If TRUE (default), displays R-squared in the subtitle alongside the equation. |
band |
Character. Uncertainty band type for |
subset |
Optional unquoted logical expression to filter cases for this call only (data-first form). |
labels |
Character or NULL. Variable label display mode (data-first
and formula forms): one of |
numeric |
Optional character vector of plotted-variable names to treat
as continuous for this call (the per-call counterpart of
|
categorical |
Optional character vector of plotted-variable names to
treat as categorical for this call (the per-call counterpart of
|
count |
Optional character vector of plotted-variable names to treat
as counts for this call (the per-call counterpart of |
focal |
Unquoted name of the independent variable to place on the
x-axis for |
at |
Character string or named list specifying where non-focal
independent variables are held when drawing the fitted line in
|
Result-object form: Pass a result object returned by one of the package's analysis functions. Produces appropriate plots for each class of result (see valid plot names below).
Formula form (for plots that distinguish DV from IV): Pass a
formula as the first argument, followed optionally by a data frame. Used
for scatterplots and boxplots, consistent with the formula syntax of
jlm(), jaov(), and jt(). The DV on the left of
~ goes on the y-axis; the IV on the right goes on the x-axis. Only
single-IV formulas are supported here; for multi-IV models, fit with
jlm() and pass the result to jplot().
Variable-list form (for distributions and counts): Pass a data frame followed by one or two unquoted variable names. Used for histograms (1 numeric), bar charts (1 categorical), and grouped bar charts (2 categorical). Calls that would otherwise auto-detect to a scatter or boxplot produce a helpful error directing you to the formula form.
Supports pipeline integration (jsubset, jcomplete, per-call
subset), grouping via by = , and regression lines with
equation/R-squared/band annotations.
Valid plot names by class (for the result-object form):
jst_lm: fit, predicted, effects,
coef, vif, residuals, qq,
scale, cooks, leverage
jst_logistic: probability, roc,
calibration, binned, cooks, leverage,
coef, vif
jst_ttest, jst_anova: box
jst_corr: heatmap, scatter (scatter requires
exactly 2 variables in the correlation)
jst_crosstab: bar
The shortcut keyword core (default) produces a curated default
set for the class; all produces every plot the class supports.
Valid plot types for the data-first form: histogram, bar,
scatter, box, grouped_bar.
Valid line values: FALSE (default), TRUE (alias for
lm), lm, loess, connect.
Valid band values: ci (default confidence band around the
regression line, flares at the ends), pi (prediction interval for
individual observations, wider), see (constant-width +/- t*SEE
band illustrating the homoskedasticity assumption), none (no band).
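The three band types correspond to quantities that base R can compute directly, which may help when interpreting the plots. The sketch below uses lm() and predict() on mtcars as stand-in data; it shows the underlying arithmetic, and the assumption that jplot() computes its bands the same way is not confirmed by this documentation.

```r
fit  <- lm(mpg ~ wt, data = mtcars)
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50))

# ci: confidence band for the regression line (narrowest near the mean of x).
ci_band <- predict(fit, grid, interval = "confidence")

# pi: prediction interval for individual observations (wider everywhere).
pi_band <- predict(fit, grid, interval = "prediction")

# see: constant-width band of +/- t * SEE around the fitted line.
t_crit <- qt(0.975, df.residual(fit))
see    <- sigma(fit)                      # standard error of the estimate
see_band <- cbind(fit = ci_band[, "fit"],
                  lwr = ci_band[, "fit"] - t_crit * see,
                  upr = ci_band[, "fit"] + t_crit * see)

band_width <- function(b) b[, "upr"] - b[, "lwr"]
```

Comparing `band_width()` across the three matrices shows the flare of the confidence band at the ends of the x range and the constant width of the SEE band.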
For the result-object form, invisibly returns a single ggplot object
if one plot is produced, or a named list of ggplot objects if several
are produced. For the data-first form, invisibly returns the ggplot
object.
jplot(default): the default method: a scatter or box plot from a formula (DV ~ IV), or a histogram or bar chart from a data frame and one or more variables.
jplot(jst_lm): diagnostic, coefficient (forest), and fitted-effect plots for a jlm() linear-regression result.
jplot(jst_logistic): predicted-probability (S-curve) and coefficient plots for a jlogistic() result.
jplot(jst_ttest): a group-comparison box plot for a jt() result, with the group means marked.
jplot(jst_anova): a group-comparison box plot for a jaov() result, with the group means marked.
jplot(jst_corr): a heat-map of the correlation matrix for a jcorr() result, or a scatter plot for a single pair.
jplot(jst_crosstab): a grouped bar chart of cell counts for a jcrosstab() result.
jplot(jst_desc): (planned) direct plotting of a jdesc() result is not yet available; this method points you to the data-first form, for example jplot(data, Variable).
jplot(jst_freq): (planned) direct plotting of a jfreq() result is not yet available; this method points you to the data-first form, for example jplot(data, Variable).
jstats for the package overview,
workflow conventions, and complete function listing.
# Result-object form
m <- jlm(WellbeingScore ~ Income + Age, community)
jplot(m)                                   # core diagnostics + fit plot
jplot(m, which = "coef")                   # coefficient forest plot
jplot(m, which = "fit", focal = Age, at = "mean")

# Formula form (scatter and box)
jplot(WellbeingScore ~ Income, community)               # scatter
jplot(WellbeingScore ~ Income, community, line = "lm")  # + regression line
jplot(WellbeingScore ~ Income, community, line = "lm", band = "see")
jplot(WellbeingScore ~ Income, community, by = Volunteer, line = "lm")

# Boxplot: assert the grouping variable as categorical (labelled
# variables otherwise enter numerically; jdummy() registration also works)
jplot(WellbeingScore ~ Region, community, categorical = "Region")

# Variable-list form (distributions and counts)
jplot(community, Age)                      # histogram
jplot(community, Region)                   # bar chart
jplot(community, Region, Volunteer,        # grouped bar chart
      categorical = c("Region", "Volunteer"))

# Using juse() default (formula form; omit the data frame)
juse(community)
jplot(WellbeingScore ~ Income)               # scatter
jplot(WellbeingScore ~ Income, line = "lm")  # + regression line
jrecode() recodes a variable using a simple map string that specifies
how old values should be converted to new values. It is designed for
situations where you need to collapse categories, change numeric codes,
or recode dichotomies. Variable and value labels are handled automatically.
Map and labels rules can also produce missing values: plain system NA
via the NA / System / SYSMIS aliases, or
Stata-style tagged missing values (.a through .z) when
the active convention is Stata. See Missing values in the map
below for the canonical patterns under each convention.
jrecode(data, orig.var, map, labels = NULL, convention = NULL)
data |
A data frame containing the original variable. |
orig.var |
The variable to recode (unquoted, e.g. |
map |
A quoted string specifying the recode rules, using the
format An optional
Individual values can also be mapped to system NA using the same
aliases: Under Stata convention, values can be mapped to Stata-style missing-value tokens:
Examples:
|
labels |
Optional. A quoted string specifying value labels for the
new variable, using the format The left side of each rule may be a numeric code or, under Stata
convention, a Stata-style missing-value token ( If omitted, the function attempts to transfer value labels automatically from the original variable. This works when the original variable has value labels and the mapping is one-to-one (no categories are collapsed). When categories are collapsed, labels cannot be transferred automatically and a note is printed. Example: |
convention |
Optional. One of When |
The function accepts haven-labelled, plain numeric, and factor variables.
The variable label from the original variable is carried across automatically with "(recoded)" appended. If the original variable has no variable label, the variable name is used instead.
Value labels are handled in three ways, in order of priority:
If labels is supplied, those labels are used as-is.
If labels is omitted and the original variable has value
labels, they are automatically transferred to the new codes — provided
the mapping is one-to-one (no collapsing). For example, recoding 1/2 to
1/0 will carry "Yes" and "No" across to the new codes automatically.
If categories are collapsed (multiple old values map to one new value), automatic transfer is not possible and a note is printed directing you to supply labels manually.
NA values in the original variable are always set to NA in the new variable,
regardless of the else setting.
Values that appear to be coded missing values (e.g. -99, -9, 999) from SPSS
or another package are automatically detected and set to NA, even when
else=copy is used. A note is printed when this occurs.
If the map does not include an else clause and there are unmapped
values in the variable, the function stops with a message listing the
unmapped values so you can fix the map before proceeding.
If the map specifies values that do not exist in the original variable, a warning is issued (but the function continues). This helps catch typos in the map string.
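To show how a map string of this shape can drive a recode, here is a deliberately small base R sketch. `recode_with_map()` is hypothetical and handles only numeric rules plus else=copy or else=<number>; it ignores labels, coded-missing detection, and the missing-value aliases, so it should be read as an illustration of the rule semantics and not as the package's parser.

```r
# Hypothetical mini-parser for maps such as "1,2=1; 3=2; 4,5=3".
recode_with_map <- function(x, map) {
  rules <- trimws(strsplit(map, ";", fixed = TRUE)[[1]])
  rules <- rules[nzchar(rules)]

  out       <- rep(NA_real_, length(x))
  mapped    <- is.na(x)          # NA in the input stays NA in the output
  else_rule <- NULL

  for (rule in rules) {
    parts <- trimws(strsplit(rule, "=", fixed = TRUE)[[1]])
    if (parts[1] == "else") {
      else_rule <- parts[2]
      next
    }
    old <- as.numeric(trimws(strsplit(parts[1], ",", fixed = TRUE)[[1]]))
    hit <- !is.na(x) & x %in% old
    out[hit] <- as.numeric(parts[2])
    mapped   <- mapped | hit
  }

  if (any(!mapped)) {
    if (is.null(else_rule)) {
      # No else clause: stop and list what the map failed to cover.
      stop("Unmapped values: ",
           paste(sort(unique(x[!mapped])), collapse = ", "))
    }
    if (else_rule == "copy") {
      out[!mapped] <- x[!mapped]
    } else {
      out[!mapped] <- as.numeric(else_rule)
    }
  }
  out
}

education <- c(1, 2, 3, 4, 5, NA)
collapsed <- recode_with_map(education, "1,2=1; 3=2; 4,5=3")
copied    <- recode_with_map(c(1, 2, 7), "1=1; 2=0; else=copy")
unmapped  <- tryCatch(recode_with_map(c(1, 2, 9), "1=1; 2=0"),
                      error = function(e) conditionMessage(e))
```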
Missing values in the map. The package supports two conventions
for representing user-defined missing values (UDMs), and the syntax for
producing UDMs from jrecode() depends on which one is active:
Under SPSS convention (the default), UDMs are real numeric codes carrying metadata that flags them as missing. The two-step canonical pattern is:
df$EducR <- jrecode(df, Education,
  map = "1,2=1; 3=2; 4,5=3; -99,-98=-99",
  labels = "1=High school or less; 2=Some college; 3=Degree")
df <- jdeclare_udm(df, EducR, codes = c(Refused = -99))
The jrecode() call assigns the numeric sentinel -99; the
subsequent jdeclare_udm() call attaches the label and flags
-99 as missing. Labeling -99 inside the labels
argument is unnecessary — jdeclare_udm() owns that label.
Under Stata convention, UDMs are typed missing cells marked
with Stata-style tags (.a through .z). The single-call
canonical pattern is:
df$EducR <- jrecode(df, Education,
  map = "1,2=1; 3=2; 4,5=3; else=.a",
  labels = "1=High school or less; 2=Some college; 3=Degree; .a=Refused")
Under Stata convention, jdeclare_udm() is not needed for this
pattern — jrecode() handles both the value recoding and the
Stata-style missing-value labeling in one call.
Writing Stata-style missing-value tokens while the active convention is SPSS raises an
informative error that echoes the user's call rewritten in SPSS-style
syntax. Switching the convention session-wide is one line:
joptions(missing.convention = "stata").
A haven_labelled vector with the recoded values, variable
label, and (if supplied or auto-transferred) value labels applied. Assign
this to a new column in your data frame:
MyData$AgeGroupR <- jrecode(MyData, AgeGroup, map = "...")
jdeclare_udm for declaring user-defined missing
values on a column after a recode (the SPSS-style canonical pattern).
jrelabel for applying labels to an existing variable
after a recode.
joptions for the session-level
missing.convention setting.
jstats for the package overview,
workflow conventions, and complete function listing.
# Recode with explicit labels (a 1/2 dichotomy to 0/1)
df <- community
df$OwnsHome01 <- jrecode(df, OwnsHome, map = "1=1; 2=0",
  labels = "0=No; 1=Yes")

# Collapse categories (must supply labels)
df$RegionR <- jrecode(df, Region, map = "1,2=1; 3,4=2",
  labels = "1=North or South; 2=East or West")

# Use else=copy to carry unspecified values across unchanged
df$EducR <- jrecode(df, Education, map = "5=4; else=copy",
  labels = "4=Bachelor's degree or higher")

# Use else=NA to deliberately drop unspecified values to system NA
df$EducR2 <- jrecode(df, Education, map = "4=1; 5=1; else=NA",
  labels = "1=College degree")

# Convert a specific coded missing value to system NA
df$EducR3 <- jrecode(df, Education, map = "-99=System; else=copy")

# Stata convention: Stata-style missing-value tokens in map and labels
# (single call; convention = "stata" scopes the choice to this call only)
df$EducR4 <- jrecode(df, Education, map = "1,2=1; 3,4,5=2; else=.a",
  labels = "1=No college; 2=College; .a=Refused", convention = "stata")

# Using juse() default
juse(df)
df$RegionR2 <- jrecode(Region, map = "1,2=1; 3,4=2",
  labels = "1=North or South; 2=East or West")
jrelabel() attaches a variable label and/or value labels to any
variable in a data frame. It is designed as a simple label applicator —
it does not recode values or compare variables. Use it to add labels after
a recode, to fix missing labels, or to label any variable that needs them.
The function accepts haven-labelled, plain numeric, factor, and character
variables. The output is always a haven_labelled vector, which is
compatible with all jstats functions.
Both the labels and var.label arguments are optional. If
neither is supplied, the function returns the variable unchanged as a
haven_labelled vector.
If the variable already has labels, they are silently overwritten when new labels are provided.
jrelabel(data, var, labels = NULL, var.label = NULL)
data |
A data frame containing the variable. |
var |
The variable to label (unquoted, e.g. |
labels |
Optional. A quoted string specifying value labels using the
format Examples:
|
var.label |
Optional. A quoted string to use as the variable label
(the description shown by |
A haven_labelled vector with the requested labels applied.
Assign this back to a column in your data frame:
MyData$VarName <- jrelabel(MyData, VarName, ...)
jrecode for recoding values with optional labels
in a single step.
jstats for the package overview,
workflow conventions, and complete function listing.
# Add value labels after a recode
df <- data.frame(Status = c(1, 2, 1, 2, 1, 2))
df$StatusR <- ifelse(df$Status == 1, 1, 0)
df$StatusR <- jrelabel(df, StatusR, labels = "1=Yes; 0=No",
  var.label = "Status (recoded)")

# Add just a variable label
df$StatusR <- jrelabel(df, StatusR, var.label = "Employment Status")

# Add just value labels
df$StatusR <- jrelabel(df, StatusR, labels = "1=Yes; 0=No")

# Using juse() default
juse(df)
df$StatusR <- jrelabel(StatusR, labels = "1=Active; 0=Inactive")
jsave() writes a data frame to a file. It supports SPSS (.sav),
Stata (.dta), SAS interchange (.xpt), Excel (.xlsx),
CSV (.csv), and R's native .rds format.
The file format is determined entirely by the file extension you
provide — for example, "mydata.sav" saves as SPSS,
"mydata.dta" saves as Stata, and "mydata.xlsx" saves
as Excel. Changing the extension changes the format.
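Extension-based dispatch of this kind can be sketched in base R with tools::file_ext(). The helper name `format_from_extension()` is hypothetical, and lower-casing the extension before matching is an assumption rather than documented jsave() behaviour.

```r
# Hypothetical helper, not part of jstats: map a file extension to a format name.
format_from_extension <- function(file) {
  ext <- tolower(tools::file_ext(file))   # assumption: matching ignores case
  switch(ext,
    sav  = "SPSS",
    dta  = "Stata",
    xpt  = "SAS interchange",
    xlsx = "Excel",
    csv  = "CSV",
    rds  = "R native",
    stop("Unsupported file extension: '", ext, "'")
  )
}

format_from_extension("mydata.sav")    # "SPSS"
format_from_extension("mydata.dta")    # "Stata"
format_from_extension("mydata.xlsx")   # "Excel"
```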
By default, when file is a bare filename, jsave() writes it to the working
directory, matching base R's saveRDS() and write.csv().
To save into a subfolder instead, set joptions(data.dir = "...")
once per session (or in .Rprofile). Filenames containing a
directory separator (/) bypass this setting and are taken
literally.
If the data argument is omitted, the default data frame set by
juse() is used.
jsave(data, file, overwrite = FALSE, preserve.udm = TRUE)
data |
A data frame (unquoted). If omitted, uses the default set by
|
file |
Character string. The filename with extension (e.g.
|
overwrite |
Logical. If |
preserve.udm |
Logical. If |
File paths:
Use forward slashes (/) in file paths. If you copy a path from
Windows File Explorer, replace the backslashes with forward slashes.
In an R string a single backslash starts an escape sequence, so a pasted Windows path will not work unchanged.
File location:
If the path contains a directory separator, the file is saved to that exact location.
If the path is a bare filename and joptions("data.dir")
is set, the file is saved to that folder (auto-created if it
doesn't yet exist).
If the path is a bare filename and joptions("data.dir")
is unset (the default), the file is saved to the working
directory.
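The three location rules above can be written as one small decision function. This is a sketch under stated assumptions: `resolve_save_path()` is a hypothetical helper, and its `data_dir` argument stands in for the value of joptions("data.dir"), with NULL meaning the option is unset.

```r
# Hypothetical helper, not part of jstats.
# `data_dir` stands in for joptions("data.dir"); NULL means unset.
resolve_save_path <- function(file, data_dir = NULL) {
  has_separator <- grepl("/", file, fixed = TRUE)
  if (has_separator || is.null(data_dir)) {
    return(file)   # literal path, or bare name relative to the working directory
  }
  if (!dir.exists(data_dir)) {
    dir.create(data_dir, recursive = TRUE)   # create the folder on first use
  }
  file.path(data_dir, file)
}

demo_dir <- file.path(tempdir(), "jstats-demo")

resolve_save_path("community.sav")                            # bare name, option unset
resolve_save_path("out/community.sav", data_dir = demo_dir)   # separator: taken literally
resolve_save_path("community.sav", data_dir = demo_dir)       # placed under demo_dir
```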
Format notes:
SPSS (.sav) and Stata (.dta) preserve variable
labels and value labels.
Excel (.xlsx) and CSV (.csv) do not preserve
variable or value labels.
R native (.rds) preserves the data frame exactly as it
exists in R, including all attributes.
Stata files are written as version 14 format.
Legacy Excel format (.xls) is not supported for saving.
Use .xlsx instead.
Invisibly returns NULL. Called for its side effect of
writing a file to disk.
jstats for the package overview,
workflow conventions, and complete function listing.
# A runnable save into R's session temporary folder
jsave(community, file.path(tempdir(), "community.sav"), overwrite = TRUE)

## Not run:
# The file extension determines the format ---
# the same data frame can be saved in any supported format
jsave(community, "community.sav")    # SPSS
jsave(community, "community.xlsx")   # Excel
jsave(community, "community.csv")    # CSV
jsave(community, "community.rds")    # R native

# Stata and SAS formats cannot carry community's SPSS-form missing-value
# declarations -- convert first (jsave() pre-flights this and says so)
jsave(jconvert(community, to = "stata"), "community.dta")   # Stata
jsave(jconvert(community, to = "baseR"), "community.xpt")   # SAS interchange

# Using juse() default
jsave(, "community.sav")

# Full file path
jsave(community, "C:/Output/community.sav")
## End(Not run)
Provides a quick overview of a data frame for screening. A red "Data Screening" title is printed first, then a short header block (case and variable counts, cases with missing data, variables with outliers), followed by up to three tables: a Variable Types table (Base R storage type, the jstats analysis-role class, an optional sub-class, an optional classification source, distinct-value counts, and optional central-tendency columns), a Missing Data & Outliers table, and – when variable labels are shown – a Variable Labels table last. Handles haven-labelled and date/time variables gracefully.
jscreen(
  data,
  ...,
  outlier.sd = 3,
  subset = NULL,
  variable.id = NULL,
  value.id = NULL,
  types = TRUE,
  issues = TRUE,
  r.type = FALSE,
  stats = FALSE,
  digits = NULL
)
data |
A data frame. |
... |
Optional unquoted variable names to screen. If omitted, all variables in the data frame are screened. |
outlier.sd |
Numeric. Number of standard deviations from the mean to flag as potential outliers (Numeric-class variables only). Default is 3. |
subset |
An optional unquoted logical expression (e.g.
|
variable.id |
Character or NULL. Variable label display mode: one of
|
value.id |
Not supported by |
types |
Logical. If TRUE (default), prints the Variable Types table. Set FALSE to suppress it. |
issues |
Logical. If TRUE (default), prints the Missing Data &
Outliers table, which lists only the variables that actually have
missing data or flagged outliers (clean variables are omitted). Set
FALSE to suppress the table entirely. Suppressing |
r.type |
Logical. If TRUE, adds a "Base R Type" column (numeric / haven_labelled / factor / character / date-time) to the Variable Types table, showing each variable's storage type alongside its jstats class. Default is FALSE: the storage type is expert detail (its main signal is "this variable carries value labels / came from SPSS or Stata"), so it is opt-in rather than shown by default. The returned data frame always includes it regardless of this setting. |
stats |
Logical or character. Adds central-tendency columns to the
Variable Types table for numeric-like variables. FALSE (default) shows
none; TRUE shows both Mean and Median; |
digits |
Integer or NULL. Number of decimal places for the Mean and
Median columns. NULL (default) defers to |
The jstats Class column reports how the package treats each variable in analyses (Numeric, Categorical, Numbers-as-text, Date-time, Unsupported), in contrast to the Base R Type column's storage view; the same classification gates outlier screening, so only Numeric-class variables are SD-screened and the Outliers cell is left blank for the rest. Zero counts are shown blank so only affected variables carry numbers; a column (or the whole Missing/Outliers table) is omitted entirely when nothing is flagged, and the header count lines explain the omission.
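The SD-based outlier screen mentioned above can be illustrated in base R. This is a minimal sketch, not the jstats implementation: `count_sd_outliers()` is a hypothetical helper, and the use of the sample standard deviation with a strict "greater than" comparison is an assumption.

```r
# Hypothetical helper, not part of jstats: count values lying more than
# `outlier.sd` sample standard deviations from the mean.
count_sd_outliers <- function(x, outlier.sd = 3) {
  x <- x[!is.na(x)]                 # missing values are not screened
  z <- (x - mean(x)) / sd(x)        # distance from the mean in SD units
  sum(abs(z) > outlier.sd)
}

scores <- c(rep(10, 50), 100)       # fifty identical values and one extreme value

count_sd_outliers(scores)                     # 1: only the value 100 is flagged
count_sd_outliers(scores, outlier.sd = 2.5)   # 1: a lower threshold flags at least as many
count_sd_outliers(c(1, 2, 3, 4, 5))           # 0: nothing beyond 3 SDs
```

Because a single extreme value also inflates the mean and the standard deviation, an SD rule of this kind is best read as a prompt to inspect the variable, not as a definitive classification.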
When at least one variable's class comes from a registration (jnumeric,
jcount, or jdummy) rather than the structural guess, a Source column
appears. It reads as an exception-marker: "registered" is shown against
the registered variables and the structurally classified rows are left
blank, so the registrations stand out at a glance. (The returned data
frame still records the literal tier for every row.) Set stats = TRUE (or "mean" / "median") to add
central-tendency columns for the numeric-like variables: Numeric and Count
variables show Mean and Median, while a numeric dichotomy shows the raw
mean of its stored codes and a blank median. A numeric dichotomy coded
other than 0/1 (e.g. the 1/2 Group-4 coding) is flagged with a "*" on its
sub-class cell, since its raw mean is not a proportion; the marker shows
even when stats is off, surfacing the recode need.
When variable names are supplied, only those variables are screened. When
omitted, all variables in the data frame are screened. If a subset
expression references variables not already in the screening list, they
are included automatically.
Invisibly returns a data frame of the screening results, with one
row per variable and columns including the Base R type, the jstats
Class and SubClass, the classification Source
("registered" or "structural"), distinct-value count, missing count
and percentage, the outlier count (NA for non-Numeric variables), and
the Mean and Median (NA where not meaningful: Median is NA
for dichotomies, and both are NA for non-numeric-like variables). The
returned values are the raw counts; only the printed tables blank zeros
and omit clean rows.
jstats for the package overview,
workflow conventions, and complete function listing.
# With explicit data frame
jscreen(community)
jscreen(community, outlier.sd = 2.5)

# Show the Base R storage type column
jscreen(community, r.type = TRUE)

# Add Mean and Median columns for numeric-like variables
jscreen(community, stats = TRUE)

# Suppress tables (header block only)
jscreen(community, types = FALSE, issues = FALSE)

# Using juse() default
juse(community)
jscreen()
jscreen(Income, Age, WellbeingScore)
jscreen(Income, Age, WellbeingScore, subset = Volunteer == 1)
jsubset() sets a persistent case-selection expression that is
applied automatically by jstats analysis functions when the
default data frame (set by juse()) is in use. This is analogous
to the SPSS FILTER command.
The expression is stored per dataset, so switching juse() between
datasets preserves each dataset's setting independently.
The expression applies whenever the matching dataset is used, regardless
of whether it was supplied via juse() or specified explicitly in
a function call. To bypass it temporarily without losing it, use
jsubset(off) before the analysis and jsubset(on) afterward.
This matches the SPSS FILTER / USE ALL convention.
Expressions use standard R logical operators: ==, !=,
<, <=, >, >=, & (AND), | (OR),
! (NOT), xor() (XOR), and %in%. Using = for
equality or the SPSS-style keywords AND/OR/NOT will
produce a helpful error suggesting the correct R syntax.
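For readers coming from SPSS, the operators listed above behave as follows in plain base R. The data frame is made up for the example, and evaluating a stored expression with eval() is only an illustration of the idea; how jsubset() stores and applies its expression internally is not shown here.

```r
# Base-R illustration with made-up data; jsubset() itself is not called.
people <- data.frame(
  Age    = c(25, 38, 41, 52, 33),
  Region = c(1, 2, 3, 4, 2),
  Score  = c(60, 45, 70, 55, 80)
)

# An unevaluated expression can be kept and applied to the data later.
keep_expr <- quote(Age < 40 & Score > 50)
keep      <- eval(keep_expr, people)

people[keep, ]                                        # rows 1 and 5
people[people$Region %in% c(1, 2), ]                  # rows 1, 2 and 5
people[!(people$Age >= 40) | people$Score == 70, ]    # rows 1, 2, 3 and 5
```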
jsubset(data, expr)
data |
Optional data frame. If supplied, the expression is stored
on that dataset specifically. If omitted, the dataset set by
|
expr |
A logical expression (e.g.
If |
Invisibly returns NULL. Called for its side effect.
jstats for the package overview,
workflow conventions, and complete function listing.
juse(community)
jsubset(Age < 40)                          # Set using juse default
jsubset(community, Age < 40)               # Explicit dataset
jsubset(Age < 40 & WellbeingScore > 50)    # Compound condition
jsubset(off)                               # Deactivate
jsubset(on)                                # Reactivate
jsubset()                                  # Check status
jsubset(NULL)                              # Clear entirely

# Not normally needed. You'd clear a default or registration only to
# undo a mistake, or -- as in this example -- to reset state for testing.
juse(NULL)
jsum() computes the sum of values across multiple variables for each
case (row) in the data frame. This is typically used to create composite
scores from a set of related items (e.g. summing 6 survey items into a
total scale score).
By default, cases with any missing values receive NA. Use the
min.valid argument to allow partial sums — for example,
min.valid = 1 returns the sum of available values as long as at
least one item is non-missing.
Variables can be listed individually or using colon notation to select a
range of consecutive columns (e.g. Attitude1:Attitude6).
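The min.valid rule described above can be reproduced in base R, which may help when checking a composite score by hand. This is a sketch, not the jstats implementation: `row_sum_min_valid()` is a hypothetical helper and the item values are made up.

```r
# Hypothetical helper, not part of jstats.
row_sum_min_valid <- function(items, min.valid = NULL) {
  items   <- as.matrix(items)
  n_valid <- rowSums(!is.na(items))       # non-missing items per case
  if (is.null(min.valid)) {
    min.valid <- ncol(items)              # default: every item must be present
  }
  totals <- rowSums(items, na.rm = TRUE)  # sum of the available values
  totals[n_valid < min.valid] <- NA       # too few valid items: no score
  totals
}

items <- data.frame(
  Attitude1 = c(4, 2, NA, NA),
  Attitude2 = c(5, NA, NA, 3),
  Attitude3 = c(3, 4, NA, 1)
)

row_sum_min_valid(items)                  # 12 NA NA NA
row_sum_min_valid(items, min.valid = 2)   # 12  6 NA  4
row_sum_min_valid(items, min.valid = 1)   # 12  6 NA  4
```

Note that a partial sum is on a different scale from a complete one (a case with two of three items can never reach the three-item maximum), which is worth keeping in mind when choosing min.valid.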
jsum(data, ..., min.valid = NULL, var.label = NULL)
data |
A data frame, or omit to use the |
... |
Unquoted variable names. Use colon notation (e.g.
|
min.valid |
Integer (optional). The minimum number of non-missing
values required to compute a sum. If a case has fewer non-missing
values, the result is |
var.label |
Character string (optional). A variable label to attach to the result. If omitted, an auto-generated label is used. |
A numeric vector with one element per row of data (length nrow(data)),
suitable for assigning to a new column:
MyData$Total <- jsum(Var1, Var2, Var3)
javg for computing row-wise means.
jstats for the package overview,
workflow conventions, and complete function listing.
# Set the default data frame (so you can omit it in function calls)
juse(community)

# Sum three variables (all must be non-missing)
community$EnvTotal <- jsum(Environment1, Environment3, Environment4)

# Sum with partial data allowed (at least 2 non-missing)
community$EnvTotal <- jsum(Environment1, Environment3, Environment4, min.valid = 2)

# Sum using colon range for consecutive columns
community$EnvTotal <- jsum(Environment1:Environment5)

# Mix colon ranges and explicit names (e.g. after reverse-coding an item)
community$Environment2R <- jrecode(community, Environment2, map = "1=5; 2=4; 3=3; 4=2; 5=1")
community$ScaleTotal <- jsum(Environment1, Environment2R, Environment3:Environment5)

# With a custom variable label
community$ScaleTotal <- jsum(Environment1:Environment5, var.label = "Environment Scale Total")

# With an explicit data frame (instead of using juse default)
community$EnvTotal <- jsum(community, Environment1, Environment3, Environment4)

# Not normally needed. You'd clear a default or registration only to
# undo a mistake, or -- as in this example -- to reset state for testing.
juse(NULL)
Runs a t-test and prints formatted group descriptives and test results. By default, it runs the traditional Student's independent-samples t-test, which assumes equal variances. Optional arguments provide Welch's correction, a paired-samples test, an effect size (Cohen's d), Levene's test, and a confidence interval for the mean difference. Handles haven-labelled, numeric, and factor grouping variables. For haven-labelled variables, numeric codes are displayed alongside labels in the group descriptives table.
jt(
  formula,
  data,
  paired = FALSE,
  welch = FALSE,
  effect.size = NULL,
  levene = NULL,
  ci = NULL,
  subset = NULL,
  variable.id = NULL,
  value.id = NULL,
  case.processing.detail = NULL,
  full = FALSE,
  digits = NULL
)
formula |
A formula of the form |
data |
A data frame containing variables referenced in |
paired |
Logical. If TRUE, runs a paired samples t-test. The two groups must have equal sample sizes. Default is FALSE. |
welch |
Logical. If FALSE (default), runs Student's t-test (equal variances assumed). If TRUE, runs Welch's t-test. Ignored when paired = TRUE. |
effect.size |
Logical or NULL. If TRUE, prints Cohen's d. If NULL
(default), defers to |
levene |
Logical or NULL. If TRUE, prints Levene's test for homogeneity
of variance. Ignored when paired = TRUE. If NULL (default), defers to
|
ci |
Logical or NULL. If TRUE, adds 95% confidence interval for the
mean difference. If NULL (default), defers to |
subset |
An optional unquoted logical expression (e.g.
|
variable.id |
Character or NULL. Variable label display mode: one of
|
value.id |
Character or NULL. Value-label display mode for the
group descriptives rows: |
case.processing.detail |
Per-call override of the Case
Processing Summary detail tier: one of |
full |
Logical. If TRUE, turns on effect.size, levene, and ci all at once. Does not override explicit FALSE values. |
digits |
Integer or NULL. Number of decimal places for continuous
statistics in the output tables (range 0-7; |
A red title identifying the test type is printed first, followed by variable labels (if present), then the results tables.
Invisibly returns a list of class jst_ttest containing:
model (the t.test result), model_frame (the analysis
data frame used for plotting), test_type, formula,
descriptives, t, df, p, mean_difference,
ci (95% CI), cohens_d, d_label, n, and
sample_info (pipeline and missing data counts).
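Since the returned model element is a t.test result, it can help to see the underlying base-R calls. The sketch below uses made-up data and does not call jt(). The Cohen's d shown uses the pooled standard deviation, which is one common definition and is an assumption here; the source does not state which formula jt() reports.

```r
# Base-R illustration with made-up data; jt() itself is not called.
wellbeing <- c(52, 61, 58, 49, 66, 70, 63, 68, 74, 59)
volunteer <- factor(rep(c("No", "Yes"), each = 5))

# Student's t-test (equal variances assumed).
student <- t.test(wellbeing ~ volunteer, var.equal = TRUE)

# Welch's t-test, which does not assume equal variances.
welch <- t.test(wellbeing ~ volunteer, var.equal = FALSE)

# Cohen's d from the pooled standard deviation (assumed definition).
group_stats <- function(x) c(n = length(x), mean = mean(x), var = var(x))
no  <- group_stats(wellbeing[volunteer == "No"])
yes <- group_stats(wellbeing[volunteer == "Yes"])
pooled_sd <- sqrt(((no["n"] - 1) * no["var"] + (yes["n"] - 1) * yes["var"]) /
                  (no["n"] + yes["n"] - 2))
cohens_d <- unname((no["mean"] - yes["mean"]) / pooled_sd)

student$statistic   # t statistic
student$parameter   # degrees of freedom: n1 + n2 - 2
welch$parameter     # Welch-Satterthwaite degrees of freedom
cohens_d
```

With equal variances assumed, the t statistic and this d are linked by t = d / sqrt(1/n1 + 1/n2), which gives a quick consistency check on printed output.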
jstats for the package overview,
workflow conventions, and complete function listing.
# With explicit data frame
jt(WellbeingScore ~ Volunteer, data = community)
jt(WellbeingScore ~ Volunteer, data = community, welch = TRUE)
jt(WellbeingScore ~ Volunteer, data = community, full = TRUE)

# Using juse() default
juse(community)
jt(WellbeingScore ~ Volunteer)
jt(WellbeingScore ~ Volunteer, full = TRUE)
jupdate() installs the most recent version of jstats. While jstats is
in its pre-release phase this downloads and installs the latest pre-built
version; once jstats reaches CRAN, the same command will update it the
ordinary way. Either way, you run one command instead of having to remember
an install line. It is safe to call from the console, a script, or a Quarto
document.
jupdate(ask = FALSE)
ask |
Logical. When |
The function checks for an internet connection first; if jstats is already up to date it says so and stops. The install runs in a separate R process so the copy of jstats loaded in your session does not lock its own files during the install (the usual cause of a failed update on Windows). After a successful update you restart R once to load the new version.
Invisibly returns NULL. Called for its side effect of installing the
update, and for the messages it prints.
## Not run:
jupdate()             # update without prompting
jupdate(ask = TRUE)   # confirm before updating
## End(Not run)
juse() sets a default data frame that will be used automatically
by all jstats functions when the data argument is omitted.
This reduces typing and makes interactive use more convenient.
juse(data)
data |
A data frame (unquoted). If omitted, prints the current
default. Use |
Invisibly returns NULL. Called for its side effect of
setting, displaying, or clearing the default data frame.
juse() stores the name of the data frame, not a copy
of the data. This means any changes you make to the data frame (adding
columns, recoding variables, etc.) are automatically reflected in
subsequent function calls. This differs from base R's attach(),
which creates a snapshot that can become stale after modifications.
juse() is the recommended approach for this package.
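The difference between storing a name and storing a copy can be seen in a few lines of base R. The object names below are made up, and this is an illustration of the idea only, not the jstats implementation.

```r
# Base-R illustration with made-up names; juse() itself is not called.
survey <- data.frame(Age = c(25, 38, 41))

stored_name <- "survey"   # what a name-based default keeps
stored_copy <- survey     # what a snapshot keeps

# Modify the data frame after the default was recorded.
survey$AgeGroup <- ifelse(survey$Age < 40, 1, 2)

names(get(stored_name))   # "Age" "AgeGroup": the lookup sees the new column
names(stored_copy)        # "Age": the snapshot is stale
```

The name is resolved again on every lookup, so later changes to the data frame are picked up without resetting the default.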
jstats for the package overview,
workflow conventions, and complete function listing.
juse(community)              # Set community as the default
juse()                       # Display current default
jdesc(Age, WellbeingScore)   # Uses community automatically
juse(NULL)                   # Clear the default