Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder with substantial clinical heterogeneity, especially in language and communication ability. Validated language outcome measures that are sensitive to true change are needed for this population. We used Natural Language Processing to analyze expressive language transcripts from 64 highly verbal children and young adults with ASD (age: 6–23 years, mean 12.8 years; 78.1% male), examining validity across language sampling contexts and test-retest reliability for six previously validated Automated Language Measures (ALMs): Mean Length of Utterance in Morphemes, Number of Distinct Word Roots, C-units per minute, unintelligible proportion, um rate, and repetition proportion. Three expressive language samples were collected at baseline and again 4 weeks later: interview tasks from the Autism Diagnostic Observation Schedule, Second Edition (ADOS-2) Modules 3 and 4, a conversation task, and a narration task. The influence of language sampling context on each ALM was estimated using either generalized linear mixed-effects models or generalized linear models, adjusted for age, sex, and IQ. Four-week test-retest reliability was evaluated using Lin’s Concordance Correlation Coefficient (CCC). The three sampling contexts were associated with significantly different distributions of each ALM (P < 0.001). With one exception (repetition proportion), the ALMs also showed good test-retest reliability (median CCC: 0.73–0.88) when measured within the same context. Taken together with our previous work establishing their construct validity, this study demonstrates further critical psychometric properties of ALMs and supports their potential as language outcome measures for ASD research.
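
For readers unfamiliar with the reliability statistic, Lin’s CCC for paired baseline and retest scores is commonly defined as 2·s_xy / (s_x² + s_y² + (x̄ − ȳ)²), so it penalizes both weak correlation and systematic shifts between sessions. The following is a minimal sketch of that textbook definition, not the study's analysis code. The function name `lin_ccc` and the example scores are hypothetical, and the sketch assumes the population (1/n) form of the variance and covariance.

```python
def lin_ccc(baseline, retest):
    """Lin's Concordance Correlation Coefficient for paired measurements.

    Assumption: uses population (1/n) moments, as in the standard
    textbook definition; this is an illustrative sketch only.
    """
    x = [float(v) for v in baseline]
    y = [float(v) for v in retest]
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")

    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Population variances and covariance (divide by n, not n - 1).
    var_x = sum((v - mean_x) ** 2 for v in x) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when all values are identical")
    return 2 * cov_xy / denom


# Hypothetical scores for one measure at baseline and at retest
# (made-up numbers, not data from the study).
baseline_scores = [4.1, 5.3, 6.0, 3.8, 7.2]
retest_scores = [4.4, 5.0, 6.3, 4.0, 6.9]
print(round(lin_ccc(baseline_scores, retest_scores), 3))
```

Unlike Pearson's correlation, the `(mean_x - mean_y) ** 2` term in the denominator lowers the coefficient when retest scores are uniformly higher or lower than baseline scores, which is why the CCC suits test-retest agreement.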