- Research
- Open access
Attenuated processing of vowels in the left temporal cortex predicts speech-in-noise perception deficit in children with autism
Journal of Neurodevelopmental Disorders volume 16, Article number: 67 (2024)
Abstract
Background
Difficulties with speech-in-noise perception in autism spectrum disorders (ASD) may be associated with impaired analysis of speech sounds, such as vowels, which represent the fundamental phoneme constituents of human speech. Vowels elicit early (< 100 ms) sustained processing negativity (SPN) in the auditory cortex that reflects the detection of an acoustic pattern based on the presence of formant structure and/or periodic envelope information (f0) and its transformation into an auditory “object”.
Methods
We used magnetoencephalography (MEG) and individual brain models to investigate whether SPN is altered in children with ASD and whether this deficit is associated with impairment in their ability to perceive speech in the background of noise. MEG was recorded while boys with ASD and typically developing boys passively listened to sounds that differed in the presence/absence of f0 periodicity and formant structure. Word-in-noise perception was assessed in the separate psychoacoustic experiment using stationary and amplitude modulated noise with varying signal-to-noise ratio.
Results
SPN was present in both groups with a similarly early onset. In children with ASD, the SPN associated with processing of formant structure was reduced, predominantly in the cortical areas lateral and medial to the primary auditory cortex, starting at ~150–200 ms after stimulus onset. In the left hemisphere, this deficit correlated with the impaired ability of children with ASD to recognize words in amplitude-modulated noise, but not in stationary noise.
Conclusions
These results suggest that perceptual grouping of vowel formants into phonemes is impaired in children with ASD and that, in the left hemisphere, this deficit contributes to their difficulties with speech perception in fluctuating background noise.
Background
Autism spectrum disorder (ASD) is a group of neurodevelopmental conditions characterized by social and communication impairments and repetitive and restricted behaviors and interests [1]. Up to 70% of children with ASD have language delay, although the exact figures vary depending on the age group investigated and the diagnostic criteria used [2,3,4,5,6]. Deficits in receptive and expressive language are associated with early ASD diagnosis [7] and worse outcome [4] and are frequently observed even in verbal children with ASD [2, 4]. While in many cases language deficits in ASD can be attributed to the general level of cognitive development [8] and/or social motivation [4, 9], there is strong reason to believe that atypical processing of auditory information may contribute to the observed deficits [10,11,12].
A common difficulty faced by people with ASD, even those with normal or above-normal IQ, is poor listening ability under suboptimal acoustic conditions, such as background noise, both in experimental settings (for a review see [13]) and in real life [14,15,16,17]. Several studies have linked speech perception in noise to the fidelity of temporal processing estimated with the frequency following response (FFR) at or above ~100 Hz [18,19,20,21]. Atypical FFR has also been found in ASD [22,23,24,25]. However, since FFR reflects both cortical and subcortical activity [26, 27], it remains unclear whether the impaired ability to perceive speech in noise in individuals with ASD is due to deficits at the subcortical level, at the level of the early auditory cortex, or is related to the processing of higher-order features of the speech signal in non-primary auditory cortices.
In this study, we investigated how processing of basic phonetic properties of speech sounds in auditory cortical areas contributes to deficits in speech-in-noise perception in autism. To do so, we investigated in children with autism and their typically developing peers the relationship between speech perception in noise and vowel processing using MEG, a technique that localizes the sources of electromagnetic responses in the cortex.
Focusing on vowels may be interesting in two respects. First, vowels represent the simplest and ontogenetically and phylogenetically oldest phoneme constituents of human speech. They are the first speech sounds to be produced by human infants [28]. Vowel-like sounds are present even in the vocal repertoire of non-human primates [29]. Thus, atypical vowel processing may have serious effects on speech perception in noise and on language skills in general.
The second reason relates to the acoustic properties of vowel sounds. Vowels are acoustic patterns characterized by formant structure and common periodicity. Detection of acoustic patterns is a rapid and automatic process subserved by the auditory cortex [30,31,32,33]. The combination of formants, i.e., peaks in the frequency spectrum, determines the identity of the vowel, and the periodicity of the amplitude envelope defines its pitch (i.e., fundamental frequency, f0). Extraction of these complex features is followed by processing of the vowel as an auditory “object” [34,35,36]. In the absence of linguistic context (i.e., when the vowel is not represented as part of a word), the spectral-temporal structure of the vowel remains the only auditory cue for the bottom-up grouping that governs its neural representation as a perceptually meaningful auditory object. There is some evidence that the ability to automatically group sound features is reduced in people with ASD [37,38,39], and it has been suggested that this deficit may contribute to their impaired speech perception in noisy environments, as the auditory system must rely on automatic grouping to effectively process speech [38]. However, whether automatic vowel processing in the auditory cortex in ASD is altered in a way that affects speech perception remains an open question.
The processing of auditory patterns and formation of auditory objects (figures) is associated with a sustained negative shift in neural current recorded with MEG/EEG [32, 33, 40,41,42,43]. The characteristics and neural basis of this sustained negative shift have been discussed in more detail in Stroganova et al. [44]. In particular, the sustained response has been shown to be significantly enhanced for vowels and periodicity [41, 45,46,47]. Here, to address this enhancement, measured as the difference between responses to the test and control conditions, and to emphasize its direction, which coincides with the direction of negative transient auditory responses, we will refer to it as the sustained processing negativity (SPN). It has been suggested that SPN reflects the persistent activity of “non-synchronized” neural populations [43, 48]. These neurons are most abundant in non-primary auditory areas, are highly sensitive to complex sound features such as combinations of certain spectral and temporal parameters, and are thought to support representation of meaningful auditory patterns, including species-specific vocalizations or other ecologically relevant sounds [30, 31, 36, 49,50,51]. The term “non-synchronized” refers to the ability of these neurons to sustain firing for hundreds of milliseconds, especially when driven by their preferred continuous stimuli. These neurons are thought to encode temporally integrated spectral information [30, 31, 52] and transform it from an acoustic to a perceptual dimension [31]. In the case of vowels, this transformation seems to be necessary to form an integrated phonetic representation.
The presence of bottom-up grouping cues characterizing vowel sounds, such as periodicity of the amplitude envelope, formant structure, and especially the combination of these features, leads to a rapid (within the first 80 ms) and long-lasting (~ 400 ms or longer) increase of the SPN in the auditory cortex [41, 43, 53]. The sensitivity of the SPN to bottom-up grouping signals and its ubiquitous presence in the neural response of children and adults suggests that it may serve as a candidate for detecting putative vowel processing dysfunction at the level of the auditory cortex in children with ASD.
Although previous EEG/MEG studies in ASD individuals have not examined vowel-induced SPN, many studies have investigated mismatch negativity/field (MMN/MMF) response to changes in vowels or syllables in the oddball paradigm. Despite considerable variation in results, their meta-analysis showed that individuals with ASD had increased MMN/MMF latencies and decreased MMN/MMF amplitudes in response to “different phoneme” deviations, but not to phoneme-duration or phoneme-pitch deviations [54], suggesting specific disturbance of phonetic processing. The MMN/MMF studies, however, had certain limitations. First, they have not identified auditory cortical areas involved in abnormal vowel processing in ASD, either due to limitations of sensor-level analysis or coarse localization of brain activity based on a template brain. Second, MMN/MMF studies typically focused on the peaks of event-related responses, perhaps missing differences in activation beyond these peaks. Third, for reasons still under debate, transient ERP/ERF peaks in children and adults differ in timing, polarity, and underlying neural processes [55,56,57], making it difficult to compare data across age groups. Fourth, since the auditory cortex is sensitive to any regular acoustic patterns [32, 33, 37], atypical MMN to vowels may reflect a deficit that is not specific to the processing of speech sounds.
In the present study, we aimed to clarify some of the previously unresolved questions regarding putative vowel processing deficit in ASD. First, we applied MEG imaging combined with individual brain models to localize cortical sources of atypical activity associated with analysis of vowels in children with ASD. Second, we tracked the entire timecourse of the vowel-related response in the auditory cortex rather than focusing on response maxima. Third, we focused on the SPN, which is present in both children and adults, overlapping with age-specific transient components of the auditory response, and may prove informative for testing putative vowel processing deficits in the developing brain. Fourth, to disentangle putative deficits related to the processing of formant structure (“speechness”) and periodicity of the vowel sound, we used synthetic vowels in which the formant structure and f0 were either preserved or modified so that they could be analyzed separately.
We next asked whether atypical vowel processing is associated with impairment in the ability of children with ASD to distinguish speech from background noise. Such a possibility is worth investigating given the psychoacoustic evidence for the importance of vowels for speech intelligibility [58] and the evidence for impaired speech-in-noise recognition in ASD individuals, including those with normal-range IQ and audiometrically normal hearing (for a review see [13]).
Although both words/syllables and sentences have previously been used to test speech-in-noise perception deficit in children with ASD, in the present study we decided to apply the “words-in-noise” (WiN) paradigm. Words are less taxing on working memory than sentences, and their repetition is less dependent on cognitive ability, allowing this test to be used for verbal children with below-average intelligence. Our recent study showed that the ability to recognise two-syllable high-frequency words in background noise in verbal children with ASD does not correlate with their IQ [59]. Unlike the recognition of sentences, which carry higher-order linguistic information (overall tonal contour, stress pattern, syntactic structure, etc.), the recognition of isolated words relies more on bottom-up grouping cues. Thus, the words-in-noise paradigm is better suited than the sentences-in-noise paradigm to investigate the perceptual effects of a possible dysfunction in the relatively low-level cortical grouping processes underlying vowel identification.
To compare WiN perception in children with ASD and TD children, we used two types of background noises: stationary (ST) and amplitude-modulated (AM). Dips (i.e., short periods with more favorable SNR) in the AM noise allow listeners to improve speech recognition. This improvement, referred to as masking release, is attributed to the listener’s ability to detect the target speech signal during the dips [60]. It has been suggested that atypically low masking release in individuals with ASD is due to a reduced ability to integrate fragments of auditory information into meaningful words or sentences [61, 62]. However, deficit in phonetic processing, especially of vowels, may also play a role, as the preserved spectral-temporal structure of vowels is particularly important for speech perception when speech is immersed in AM noise [58, 60].
To summarize, we hypothesized that processing of natural auditory objects (vowels) is atypical in children with ASD and that this deficit may contribute to their degraded perception of words in noise. To test this hypothesis, we recruited verbal children with ASD and TD children and examined group differences in SPN using magnetoencephalography (MEG). To find out processing of which vowel features is affected in ASD, we used synthetic periodic and non-periodic vowels, as well as complex periodic sounds that lacked vowel formant structure. We aimed to investigate when (in terms of processing timing) and where (in terms of auditory cortical regions) the timecourse of SPN in children with ASD diverges from the typical activation pattern. We then tested the relationship between atypical SPN activation patterns in children with ASD and their word recognition abilities in ST and AM noise. Given the importance of vowels for “dip listening”, we hypothesized that a vowel processing deficit in children with ASD, if present, should predominantly affect their performance in AM noise.
Materials and methods
Participants
The study included 35 boys with ASD aged 6.9–13.0 years and 39 typically developing (TD) boys of the same age range. The TD children were recruited through advertisements in the media. None of the TD participants had known neurological or psychiatric disorders. Twenty-two of the 39 TD participants were the same as in our recent developmental study comparing sustained negative responses to periodicity and formant composition of vowels in children and adults [43]. The participants with ASD were recruited through several sources (advertisements in the media, consulting centers, and an educational center affiliated with the Moscow State University of Psychology and Education).
The ASD diagnosis was confirmed by an experienced psychiatrist, an author of the current study (N.A.Y.), and was based on the Diagnostic and Statistical Manual of Mental Disorders (5th ed.) criteria as well as an interview with the child and parents/caregivers. In addition, parents of all children were asked to complete Russian translations of two parental questionnaires, the Social Responsiveness Scale for children [63] and the Social Communication Questionnaire (SCQ-Lifetime) [64]; most of them did so (Table 1). Intellectual abilities were assessed using the KABC-II test, with the Mental Processing Index (MPI) used as an IQ equivalent [65]. This index excludes tasks that require verbal concepts, verbal reasoning, and cultural knowledge and is recommended for evaluating non-verbal abilities in both TD children and children with ASD [66].
Hearing status of all participants was determined using pure-tone air-conduction audiometry with an AA-02 clinical audiometer (“Biomedilen”). Auditory sensitivity was tested at 500, 1000, 2000, and 4000 Hz, and the threshold averaged over the four frequencies was calculated separately for each ear. All participants had normal hearing (threshold < 20 dB HL) in both ears [67].
The Ethical Committee of the Moscow State University of Psychology and Education approved this investigation. All children gave verbal consent to participate in the study, and their caregivers gave written consent.
Assessment of the general level of language development
The language test incorporated 12 subtests from the Russian Child Language Assessment Battery, RuCLAB [68], which evaluates expressive and receptive skills in vocabulary (word production and comprehension), morphosyntax (sentence production and comprehension), and discourse (text production and comprehension). The audio material was provided by a professional female native Russian speaker and recorded in studio conditions. All test stimuli were delivered through the AutoRAT application [69]. The tests took place in a quiet and child-friendly environment. Before each subtest, participants received instructions and performed 2–3 practice trials, which were not included in the final analysis. The sequence of the test items was the same for all participants. During testing, the examiner completed a paper protocol. Additionally, the sessions were videotaped, adhering to ethical standards. The description of the test components and details on the scoring are provided in the Supplementary Methods and Table S1. To characterize the overall level of language proficiency, the arithmetic mean of scores on all subtests was calculated. This mean score is hereinafter referred to as the “total language score”.
Words-in-Noise (WiN) test
Stimuli
The Words-in-Noise (WiN) test includes 160 two-syllable lemmatized Russian nouns with high imagery (ability to evoke mental images). These words were frequently occurring words according to the List of Frequent Nouns [70] and/or the list of 300 words most frequently used in daily life [71], and corresponded to the age norm for children 6 years and older. The words were spoken by a 35-year-old woman with neutral, unemotional intonation and were recorded with studio recording equipment. The mean duration of the words was 694 ms (SD = 77 ms).
Mean loudness of each word was adjusted to match that of 45 dB SPL pink noise using the “spl” function of a third-party software package for MATLAB R2020a [72]. The masking noise was of two types, stationary pink noise (ST) and amplitude-modulated pink noise (AM), synthesized using the “pinknoise” and “ammod” functions of MATLAB R2020a (MathWorks, Inc.). AM noise was obtained by modulating ST noise with a 10 Hz sinusoidal function (i.e., the pink noise was interrupted 10 times per second); modulation started at 0° phase. The power spectral density of the masking noise decreases in proportion to frequency (1/f), i.e., by 3 dB per octave. The noise lasted 1 s and started 75 ms before word onset. The rise/fall of each 1-s noise signal was smoothed over 1 ms intervals. Words were presented at a fixed sound pressure level, while the level of the masking noise was varied to produce four signal-to-noise ratios (SNR): 0, -3, -6, and -9 dB.
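The noise synthesis and SNR mixing described above can be sketched in Python (the study itself used MATLAB’s “pinknoise” and “ammod”; the spectral-shaping approach, the function names below, and the assumption of 100% modulation depth are ours):

```python
import numpy as np

def pink_noise(n_samples, rng):
    """Approximate pink (1/f power) noise by shaping white noise in the
    frequency domain; normalized to unit RMS."""
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples)
    freqs[0] = freqs[1]                  # avoid division by zero at DC
    spectrum /= np.sqrt(freqs)           # power ~ 1/f => amplitude ~ 1/sqrt(f)
    noise = np.fft.irfft(spectrum, n=n_samples)
    return noise / np.sqrt(np.mean(noise ** 2))

def am_noise(n_samples, fs, mod_hz=10.0, rng=None):
    """Pink noise amplitude-modulated by a 10 Hz sinusoid starting at
    0 deg phase; full modulation depth is an assumption."""
    rng = np.random.default_rng(0) if rng is None else rng
    t = np.arange(n_samples) / fs
    envelope = 0.5 * (1.0 + np.sin(2 * np.pi * mod_hz * t))
    return envelope * pink_noise(n_samples, rng)

def mix_at_snr(word, noise, snr_db):
    """Keep the word level fixed and scale the noise to the requested SNR."""
    rms = lambda x: np.sqrt(np.mean(np.square(x)))
    return word + noise * rms(word) / (rms(noise) * 10 ** (snr_db / 20.0))
```

Note that `mix_at_snr` mirrors the design choice in the text: the word is left at its calibrated level and only the masker gain changes across the 0 to -9 dB SNR blocks.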
Testing procedure
The stimuli were presented through Sony WH-XB900N headphones with the noise-canceling function off. The headphones were calibrated using a CEM DT-815 sound level meter, with stationary pink noise at 45 dB taken as the reference. The experiment began with a training session during which 10 words were presented in random order against a background of ST or AM noise (+3 to -3 dB SNR). The participant was asked to repeat the word after each presentation; only an exact repetition was considered a correct answer. The training lasted until the child internalized the instruction, but no longer than 10 min. If the instruction was successfully learned, the experimenter proceeded to the main part of the test, which included four blocks in the sequence of 0, -3, -6, and -9 dB SNR, the 0 dB SNR condition being the easiest and the -9 dB SNR condition the most difficult. Each block contained 40 words, 20 for each noise type (ST and AM). The type of masking noise varied within a block in a pseudorandomized order (no more than three consecutive presentations of the same noise type). Words were selected randomly from the list and presented in one of the conditions; each word was presented only once. For each condition (4 SNR levels × 2 types of masking noise), the number of correctly recognized words was counted. At the end of the main part of the experiment, words that the participant had not repeated correctly were presented without noise; all children in this study were able to repeat the words presented without noise. The WiN test was administered to 30 (of 39) TD children and 29 (of 35) children with ASD. All TD and 27 of 29 ASD participants were able to complete the test. The full data (MEG, WiN test results, MPI IQ scores) were available for 26 participants with ASD; therefore, partial correlation analyses were conducted on this smaller sample.
Scoring
Although the words corresponded to the developmental level of the participants, some of them were particularly easy or difficult to recognise. We excluded from the analysis 18 words that were “easiest” or “most difficult” according to the accepted criteria (see Supplementary Methods). For each child, we then calculated the average percent of correct responses in each of the SNRs (0 dB, -3 dB, -6 dB, -9 dB) in the ST and AM noise conditions. Many participants of both groups (14 of 30 TD and 14 of 27 ASD) gave 0% correct responses in the most difficult -9 dB ST noise condition. Therefore, we excluded the -9 dB SNR condition from further analysis. For each child, we then calculated the average percent of correct responses across SNR levels (0 dB, -3 dB, -6 dB) in the ST and AM noise conditions (WiNst and WiNam scores, respectively).
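The scoring reduces to averaging percent correct over the retained SNR levels for each noise type, which can be sketched as follows (the trial-tuple layout is hypothetical, not the authors’ actual data format):

```python
import numpy as np

def win_scores(trials, snrs=(0, -3, -6)):
    """trials: iterable of (noise_type, snr_db, correct) tuples for one child.
    Returns mean percent correct across the retained SNR levels for each
    noise type, i.e., the WiNst ('ST') and WiNam ('AM') scores."""
    scores = {}
    for noise in ("ST", "AM"):
        per_snr = [
            100.0 * np.mean([c for nt, s, c in trials if nt == noise and s == snr])
            for snr in snrs
        ]
        scores[noise] = float(np.mean(per_snr))
    return scores
```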
MEG experiment
Stimuli
The experimental paradigm used in the present study is identical to that described in Orekhova et al. [43]. We used four types of synthetic vowel-like stimuli used by Uppenkamp et al. [73] and downloaded from ‘http://medi.uni-oldenburg.de/members/stefan/phonology_1/’. The downloaded stimuli had durations of ~ 400 ms and were combined to create stimuli of 812 ms duration. Five strong vowels were used: /a/ (caw, dawn), /e/ (ate, bait), /i/ (beat, peel), /o/ (coat, wrote) and /u/ (boot, pool).
The synthetic periodic vowels consisted of damped sinusoids repeated with a period of 12 ms, so that the fundamental frequency of each vowel was 83.3 Hz. The carrier frequencies of each vowel were kept fixed at the four lower formant frequencies, chosen in a typical range of an adult male speaker. These are further referred to as periodic vowels. These regular vowel stimuli were modified, as described below, to generate three other classes of stimuli: non-periodic vowels, periodic non-vowels, and non-periodic non-vowels. To violate periodicity, the start time of each damped sinusoid was jittered within ±6 ms relative to its start time in the original vowel, separately for each formant. Despite the degraded voice quality (hoarse voice), these non-periodic sounds were perceived as vowels. To violate formant constancy, the carrier frequency of each subsequent damped sinusoid was randomly chosen from a set of eight different formant frequencies used to produce the regular vowels, randomized separately for each formant (frequency ranges: formant 1 = 270–1300 Hz; formant 2 = 850–2260 Hz; formant 3 = 1750–3000 Hz; formant 4 = 3300–5500 Hz). Both periodic and non-periodic sounds with disrupted formant structure were perceived as noises rather than vowels (i.e., non-vowels). The following four stimulus types were presented during the experiment: (1) periodic vowels (/a/, /i/, /o/); (2) non-periodic vowels (/a/, /u/, /e/); (3) three variants of periodic non-vowels; and (4) three variants of non-periodic non-vowels. The spectral composition of these stimuli is given in Supplementary Figure S1 (see also [43]).
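A minimal sketch of this stimulus construction, assuming illustrative formant frequencies and damping constant (the actual stimuli were those of Uppenkamp et al. [73]; only the repetition period, f0, and ±6 ms jitter below come from the text):

```python
import numpy as np

F_EXAMPLE = (700, 1100, 2500, 3500)   # illustrative male-range formants, Hz

def synth_vowel(formants=F_EXAMPLE, dur=0.812, fs=44100, period=0.012,
                jitter=0.0, decay=0.004, rng=None):
    """Sum of damped sinusoids at the formant frequencies, restarted every
    `period` s (f0 = 1/0.012 s ~ 83.3 Hz). With jitter > 0, each burst onset
    is shifted by up to +/- jitter s, independently per formant, destroying
    periodicity while leaving the formant structure intact (the
    'non-periodic vowel' condition). `decay` is an assumed value."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = int(dur * fs)
    out = np.zeros(n)
    burst_t = np.arange(int(2 * period * fs)) / fs    # bursts may overlap
    for f in formants:
        env = np.exp(-burst_t / decay) * np.sin(2 * np.pi * f * burst_t)
        for start in np.arange(0.0, dur, period):
            i0 = int(round((start + rng.uniform(-jitter, jitter)) * fs))
            if i0 < 0:
                continue
            k = min(len(env), n - i0)
            if k > 0:
                out[i0:i0 + k] += env[:k]
    return out / np.max(np.abs(out))

periodic = synth_vowel()               # periodic vowel
aperiodic = synth_vowel(jitter=0.006)  # hoarse, non-periodic vowel
```

Randomizing the carrier frequency of each successive burst (rather than its timing) would give the non-vowel conditions in the same framework.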
Two hundred seventy stimuli of each of the four classes were presented, with three stimulus variants equally represented within each class (N = 90). All stimuli were presented in random order. Each stimulus lasted 812 ms, including rise/fall times of 10 ms each. The interstimulus intervals (ISIs) were randomly chosen from a range of 500 to 800 ms.
The non-periodic non-vowels were used as control stimuli. The contrasts of interest were (1) “non-periodic vowels versus non-periodic non-vowels”, (2) “periodic non-vowels versus non-periodic non-vowels” and (3) “periodic vowels versus non-periodic non-vowels”. By comparing these contrasts in the ASD and TD groups we investigated group differences in the processing of formant structure, periodicity/pitch, or a combination of these features in a synthetic vowel. The use of periodic vowels allowed us to test whether, in children with ASD, the combination of these features in the ‘normal’ vowel is disrupted even when processing of formant structure and periodicity is preserved.
Procedure
Participants were instructed to watch a silent video (movie/cartoon) of their choice and ignore the auditory stimuli. Stimuli were delivered binaurally via plastic ear tubes inserted into the ear channels. The tubes were attached to the MEG helmet to prevent possible noise from their contact with the subject’s clothing. The intensity was set at 90 dB SPL. The experiment included three blocks of 360 trials, each block lasting around 9 min with short breaks between blocks. If necessary, the parent/caregiver remained with the child in the MEG shielded room during the recording session.
MRI data acquisition and processing
In all participants with ASD and in 28 TD participants, a T1-weighted 3D-MPRAGE structural image was acquired on a Siemens Magnetom Verio 3 T scanner (Siemens Medical Systems, Erlangen, Germany) using the following parameters: TR 1780 ms, TE 2.78 ms, TI 900 ms, FA 9°, FOV 256 × 256 mm, matrix 320 × 320, 0.8 mm isotropic voxels, 224 sagittal slices. In 9 TD subjects, MRIs were acquired on a 1.5 T Philips Intera scanner, and in 2 TD subjects on a 1.5 T GE Brivo MR355/MR360. Cortical reconstructions and parcellations were generated using FreeSurfer v.7.4.1 [74, 75].
MEG data acquisition, preprocessing and source localization
MEG data were recorded at the Moscow Center for Neuro-cognitive Research (MEG-Center) using an Elekta VectorView Neuromag 306-channel MEG detector array (Helsinki, Finland) with 0.1–330 Hz filters and a 1000 Hz sampling frequency. Bad channels were visually detected and labeled, after which the signal was preprocessed with MaxFilter software (v.2.2) to reduce external noise using the temporal signal-space separation method (tSSS) and to compensate for head movements by repositioning the head at each time point to an “optimal” common position (head origin). This position was chosen individually for each participant as the one that yielded the smallest average shift across all data epochs after motion correction.
Further preprocessing steps were performed using MNE-Python software (v.1.4.1) [76]. The data were notch-filtered at 50 and 100 Hz and low-pass filtered at 110 Hz. Periods in which the peak-to-peak signal amplitude exceeded thresholds of 7e-12 T for magnetometers or 7e-10 T/m for gradiometers, as well as “flat” segments in which the signal amplitude was below 1e-15 T for magnetometers or 1e-13 T/m for gradiometers, were automatically excluded from further processing. To correct cardiac and eye-movement artifacts, we recorded ECG, vEOG, and hEOG and used the signal-space projection (SSP) method. Next, we excluded from the analysis data segments in which head rotation exceeded a threshold of 10 degrees/s along any of the three spatial axes, head velocity exceeded 4 cm/s in 3D space, or head position deviated from the origin position by more than 10 mm in 3D space. The data were then epoched from -0.2 s to 1 s relative to stimulus onset. The mean number of artifact-free data epochs was initially higher in the TD participants (TD: 997 vs ASD: 927, p < 0.05). To equalize this number, we randomly removed 70 epochs for each TD participant. The resulting mean number of clean epochs per subject and stimulus type was 231 (range 141–340) and 231 (range 131–347) for TD and ASD children, respectively. The epoched data were averaged separately for the four experimental conditions and then baseline-corrected by subtracting the mean amplitude in the -200 to 0 ms prestimulus interval.
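The amplitude-based epoch rejection and baseline correction can be illustrated with a pure-numpy sketch of the rule that MNE-Python applies through its `reject`/`flat` and `baseline` parameters (magnetometer thresholds shown; the function names are ours):

```python
import numpy as np

REJECT_MAG = 7e-12   # peak-to-peak ceiling, Tesla (from the text)
FLAT_MAG = 1e-15     # peak-to-peak floor, Tesla (from the text)

def keep_epoch(epoch, reject=REJECT_MAG, flat=FLAT_MAG):
    """epoch: (n_channels, n_times). Drop the epoch if any channel's
    peak-to-peak amplitude exceeds `reject` or stays below `flat`."""
    ptp = epoch.max(axis=1) - epoch.min(axis=1)
    return bool(np.all(ptp < reject) and np.all(ptp > flat))

def baseline_correct(epoch, times, t0=-0.2, t1=0.0):
    """Subtract each channel's mean amplitude over the prestimulus window."""
    mask = (times >= t0) & (times < t1)
    return epoch - epoch[:, mask].mean(axis=1, keepdims=True)
```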
To obtain the source model, the cortical surfaces reconstructed with the Freesurfer were triangulated using dense meshes with about 130,000 vertices in each hemisphere. The cortical mesh was then resampled to a grid of 4098 vertices per hemisphere, corresponding to a distance of about 4.9 mm between adjacent source points on the cortical surface.
To compute the forward solution, we used a single layer boundary element model (inner skull). Source reconstruction of the event-related fields was performed using the standardized low-resolution brain electromagnetic tomography (sLORETA) [77]. Noise covariance was estimated in the time interval from -200 to 0 ms relative to stimulus onset. To facilitate comparison between subjects, the individual sLORETA results were morphed to the fsaverage template brain provided by FreeSurfer.
MEG data analysis
Data analytic plan
First, to find out whether the SPN associated with processing of vowel periodicity and formant structure is present in both TD children and children with ASD, we compared responses to test stimuli (periodic vowels, non-periodic vowels, periodic non-vowels) and control stimuli (non-periodic non-vowels) separately in the TD and ASD groups. This was done in the sensor space using the global root mean square (RMS) signal and then in the source space using the sLORETA values in combination with a nonparametric permutation test with spatiotemporal threshold-free cluster enhancement (TFCE) [78]. The TFCE analysis was performed in left and right regions of interest (ROIs) broadly overlapping the auditory cortex. To analyze the temporal characteristics of the responses associated with processing of periodicity or formant structure, we compared timecourses of neural currents evoked by test and control stimuli in those sources where the effects identified by the TFCE analysis were most significant.
Second, we used TFCE cluster analysis in the ROIs to test for the group differences in differential responses (e.g. response to periodic vowel minus response to control stimulus). Then, we analyzed the timecourses of the group differences in the “most significant” sources identified by TFCE analysis.
Third, when group differences in differential responses were found, we tested for their correlation with WiN scores in children with ASD using partial correlation analysis to account for the effects of age and IQ.
Root mean square (RMS) analysis
For the sensor-level analysis, all subjects’ data were transformed to a common standard head position (to [0, 0, 45] mm, the default MaxFilter parameter). The RMS metric was computed from the signal of all gradiometer sensors and compared between the test (periodic vowels, periodic non-vowels, non-periodic vowels) and control (non-periodic non-vowels) conditions point-by-point in the 0–800 ms stimulation interval using paired t-tests.
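The two steps above, RMS over sensors followed by a pointwise paired comparison, can be sketched as (array layouts are assumed, not the authors’ actual pipeline):

```python
import numpy as np

def rms_timecourse(evoked):
    """evoked: (n_sensors, n_times) averaged response; RMS over all
    gradiometer channels at each time point."""
    return np.sqrt(np.mean(evoked ** 2, axis=0))

def paired_t(test, control):
    """Pointwise paired t-statistic for two (n_subjects, n_times) arrays,
    e.g., per-subject RMS in the test vs control condition."""
    d = test - control
    n = d.shape[0]
    return d.mean(axis=0) / (d.std(axis=0, ddof=1) / np.sqrt(n))
```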
Spatiotemporal cluster analysis
The spatiotemporal clustering analysis was performed in ROIs that were selected in the fsaverage template brain so as to broadly overlap the left and right auditory cortex and nearby areas (Fig. 3). These ROIs were identical to those used in our previous study [43]. The direction of the source current in the ROIs was aligned using the MNE-Python function “label_sign_flip”. We then verified that, in each participant and hemisphere, the timecourse averaged over all point sources and across conditions had a negative sign between 300 and 800 ms, consistent with the sustained negativity observed in response to these stimuli in the auditory cortex [41, 43], and a positive sign of the P100m component between 50 and 150 ms. The source current data were cropped to the 0–800 ms time window and downsampled to 500 Hz.
TFCE spatiotemporal cluster analysis was used to compare evoked source currents between groups. The TFCE procedure derives the spatiotemporal cluster-level statistics for each data point \(p\) by using a weighted average between the cluster extent \(e\) (i.e., number of connected above-threshold data points) and the cluster height \(h\) (i.e., the statistical value of the data point), calculated according to the formula:

\(TFCE(p)=\int_{h_0}^{h_p} e(h)^{E}\,h^{H}\,dh\)
where the default values \(E=0.5\) and \(H=2\) were applied [78], and the starting threshold \({h}_{0}=0\) and step size \(dh=0.4\) were set for a finer approximation. Permutation tests with TFCE effectively address the issue of multiple comparisons; in this study, we computed 5000 TFCE permutations. For each data point, a corrected p-value was calculated as the proportion of permutations in which the TFCE output was greater than or equal to the original TFCE output. The null hypothesis of no cluster of difference between the groups or conditions (within each group) was rejected at p < 0.05. As no interhemispheric spatial-adjacency-based clusters were expected, the “check_disjoint = True” option was applied, which is equivalent to running the test separately for each hemisphere.
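As an illustration of the enhancement step, the following minimal one-dimensional sketch applies the TFCE weighted sum with the parameters given above (E = 0.5, H = 2, h0 = 0, dh = 0.4). It is a didactic simplification over a 1-D statistic map, not the MNE-Python implementation used in the analysis.

```python
import numpy as np

def tfce_1d(stat, e_power=0.5, h_power=2.0, h0=0.0, dh=0.4):
    """Threshold-free cluster enhancement of a 1-D statistic map.
    At each threshold h, every above-threshold point is credited
    e(h)**E * h**H * dh, where e(h) is the extent of its cluster."""
    out = np.zeros_like(stat, dtype=float)
    h = h0 + dh
    while h <= stat.max() + 1e-12:
        idx = np.flatnonzero(stat >= h)
        if idx.size:
            # split contiguous runs of above-threshold points into clusters
            breaks = np.where(np.diff(idx) > 1)[0]
            for c in np.split(idx, breaks + 1):
                out[c] += (c.size ** e_power) * (h ** h_power) * dh
        h += dh
    return out

stat = np.array([0.0, 1.0, 3.0, 3.0, 1.0, 0.0, 4.0])
enhanced = tfce_1d(stat)
```

Points embedded in broad, tall clusters receive larger enhanced values than isolated or low peaks, which is what makes TFCE robust to the choice of a single cluster-forming threshold.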
To identify clusters showing significant differences between test and control stimuli we used one-sample permutation test with spatiotemporal TFCE (“stats.permutation_cluster_1samp_test” MNE Python function) separately in each group. In this test, the inputs were the differences between responses to test and control stimuli that were permuted within a single subject.
To identify clusters of significant Group × Condition interactions, we employed a two-sample permutation test with spatiotemporal TFCE (“stats.spatio_temporal_cluster_test” MNE Python function), where the inputs were the differences between responses to the test and control stimuli for each subject from the ASD and TD groups.
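The logic of the one-sample permutation test can be illustrated for a single data point: under the null hypothesis, the sign of each subject's test-minus-control difference is exchangeable, so a null distribution is built by random sign flips. This toy sketch uses simulated data and omits the spatiotemporal TFCE step.

```python
import numpy as np

def sign_flip_perm_test(diff, n_perm=2000, seed=0):
    """One-sample permutation test on per-subject condition differences
    (test minus control): under the null, each difference is equally
    likely to be positive or negative, so the null distribution is
    built by random sign flips. Returns a two-sided p-value."""
    rng = np.random.default_rng(seed)
    obs = abs(diff.mean())
    exceed = 0
    for _ in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=diff.size)
        if abs((signs * diff).mean()) >= obs:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# simulated differences with a consistent negative shift (an SPN-like effect)
rng = np.random.default_rng(1)
diff = rng.normal(-1.0, 1.0, size=30)
p = sign_flip_perm_test(diff)
```

In the two-sample (Group × Condition) case, group labels rather than signs are permuted; the MNE functions named above handle both designs together with the TFCE clustering.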
When clusters of significant differences were found, we examined timecourses of activity in the “most significant” sources within these clusters. In this case, we did not consider sources beyond the “STG + ” region (areas A1, 52, LBelt, PBelt, MBelt, RI, A4, TA2, PI and PoI1, according to the HCPMMP1 atlas [79]). All the STG + areas are structurally connected [80, 81] and participate in processing of auditory information [82,83,84]. On the other hand, activity observed in response to auditory stimuli in superior insula and superior temporal sulcus may reflect point spread from auditory cortical regions [85, 86] and inspection of the polarity of the current induced by auditory stimuli in our study confirms this (Figure S3).
To select the “most significant” sources for timecourse analysis, we applied an approach similar to that used in [87]. Specifically, we selected sources within the cluster whose average p-values were below the global average within this cluster. The sources were defined separately for the left and right hemispheres and for each comparison performed (i.e., test-vs-control and ASD-vs-TD contrasts).
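For a cluster with per-source average p-values, this selection rule reduces to a one-liner; the values below are hypothetical.

```python
import numpy as np

# per-source p-values averaged over the cluster's temporal extent
# (hypothetical values for a six-source cluster)
p_vals = np.array([0.001, 0.040, 0.015, 0.005, 0.045, 0.010])

# keep the sources whose average p-value falls below the cluster-wide mean
most_significant = np.flatnonzero(p_vals < p_vals.mean())
```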
To analyze correlations of the SPN with WiN scores in children with ASD, the activity of the “most significant” sources was averaged in the time interval that displayed strongest differences between TD and ASD groups. The temporal boundaries of this interval were restricted to the mean onset and end times of significant group differences in the “most significant” sources (see Supplementary Fig. S2). Using linear regression analysis, we then calculated the adjusted SPN (SPNadj) by regressing from the response to the test stimulus the magnitude of the response to the control (non-periodic non-vowels) stimulus and the square root of the mean number of averaged epochs per condition. Thus, by computing SPNadj, we accounted for large interindividual variability in the magnitude of auditory responses and for differences in the number of averaged epochs.
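A sketch of this adjustment, assuming per-subject response magnitudes and epoch counts (simulated here): the SPNadj values are the residuals of an ordinary-least-squares regression.

```python
import numpy as np

def adjusted_spn(test_amp, control_amp, n_epochs):
    """Residualize the response to the test stimulus on (i) the response
    to the control stimulus and (ii) the square root of the number of
    averaged epochs, via ordinary least squares. The residuals are the
    adjusted SPN (SPNadj)."""
    X = np.column_stack([np.ones_like(test_amp), control_amp, np.sqrt(n_epochs)])
    beta, *_ = np.linalg.lstsq(X, test_amp, rcond=None)
    return test_amp - X @ beta

# simulated per-subject response magnitudes and epoch counts
rng = np.random.default_rng(2)
n = 40
control_amp = rng.normal(0.0, 1.0, n)
n_epochs = rng.integers(60, 120, n).astype(float)
test_amp = 0.8 * control_amp + 0.1 * np.sqrt(n_epochs) + rng.normal(0.0, 0.5, n)

spn_adj = adjusted_spn(test_amp, control_amp, n_epochs)
```

By construction, the residuals are uncorrelated with the nuisance regressors, which is what removes the shared variance due to overall response amplitude and trial count.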
Statistical analysis
Nonparametric tests (Mann–Whitney U Test, Spearman correlations, or partial Spearman correlations) were used when the distribution differed significantly from normal according to Shapiro–Wilk’s W test (p < 0.05). Otherwise, parametric tests (t-test, Pearson correlations, Pearson partial correlations) were used. To calculate partial correlations we used the “pcor” function in R. Group- or condition-related differences in neural responses were assessed using TFCE cluster analysis (see above). In the case of multiple comparisons, the false discovery rate (FDR) method of Benjamini and Hochberg [88] was applied to correct p-values. The accepted significance level was p < 0.05. To analyze the effect of SNR (0 dB, -3 dB and -6 dB) and type of masking noise (ST, AM) on the difference in WiN scores between the TD and ASD participants, we employed the Mann–Whitney U Test and/or linear mixed model analysis using the “lmer” function in R.
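The partial correlations (computed in R with “pcor”) are equivalent to correlating OLS residuals after regressing out the covariates. A Python sketch with simulated data follows; note the p-value here is approximate, since the degrees of freedom are not reduced for the covariates.

```python
import numpy as np
from scipy import stats

def partial_pearson(x, y, covars):
    """Pearson partial correlation between x and y controlling for the
    columns of covars: regress both variables on the covariates and
    correlate the OLS residuals."""
    Z = np.column_stack([np.ones(len(x)), covars])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return stats.pearsonr(rx, ry)

# simulated data: two measures that correlate only through shared age
rng = np.random.default_rng(3)
n = 60
age = rng.uniform(7.0, 13.0, n)
iq = rng.normal(100.0, 15.0, n)
x = 2.0 * age + rng.normal(0.0, 0.5, n)   # e.g., an SPN-like measure
y = 2.0 * age + rng.normal(0.0, 0.5, n)   # e.g., a WiN-like score
r_raw = stats.pearsonr(x, y)[0]           # inflated by the age confound
r, p = partial_pearson(x, y, np.column_stack([age, iq]))
```

Controlling for age and IQ removes the spurious association: the raw correlation is large while the partial correlation is near zero.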
Results
Characteristics of the samples
The characteristics of the samples are summarized in Table 1. Participants with ASD compared to TD participants had significantly higher SRS and SCQ scores. In the case of the SRS questionnaire, all but 3 of the 34 tested subjects with an ASD diagnosis had total scores above the 60-point cut-off for ASD. For the SCQ questionnaire, all but 3 of the 32 tested ASD subjects scored above the 15-point cut-off. The three ASD subjects who scored below the cut-off on the SRS questionnaire scored above the cut-off on the SCQ questionnaire, and vice versa. In the TD sample, all but 5 children scored below the ASD threshold on the SRS questionnaire, and all but one scored below the ASD cut-off on the SCQ questionnaire. Given the sensitivity and specificity of the SRS and SCQ questionnaires [89, 90], these results suggest that our samples of ASD and TD children are very well separated on autistic traits.
Participants with ASD had significantly lower MPI IQ scores than TD participants. Eighteen ASD subjects (53%) had MPI IQ scores below 85, which corresponds to “below-average” intelligence [65]. This percentage of ASD children with below-average intelligence roughly corresponds to that reported in 8-year-olds with ASD in the USA in 2020 (61% [91]) and is slightly lower than that reported in Chinese children born in 2002–2008 and aged 6–12 years (~ 64% [92]).
All but two children with ASD had a history of language delay or impairment, defined by parents’ reports of a lack of two-word combinations at age three or the presence of language problems at the time of the diagnostic assessment. Six children with ASD experienced language regression at some point in their development.
Words in noise (WiN) test results in TD and ASD groups
Children with ASD recognized significantly fewer words than TD children in AM noise at all SNRs and in ST noise at -3 and -6 dB SNR (Fig. 1A). Both TD and ASD children performed better in AM than in ST noise, with the exception of the 0 dB SNR condition in the ASD group, in which no “masking release” (i.e., improved performance in AM compared to ST noise) was observed (Fig. 1B).
Mean percentage of correctly repeated words presented against a background of masking noise. A Comparison of performance in TD and ASD groups, separately for the stationary and amplitude-modulated noise conditions. B Comparison of performance in stationary and amplitude-modulated noise, separately for TD children and children with ASD. # p < 0.1, ** p < 0.01, *** p < 0.001 (Mann–Whitney U Test)
A linear mixed model with main effects of Group, noise Type (ST, AM) and noise Level (0, -3 and -6 dB SNR) and a random intercept for subject revealed a significant interaction between Group and Type (estimate = -0.10, SE = 0.046, t(270) = -2.21, p = 0.028). The presence of gaps in the noise resulted in lower masking release in the ASD group than in the TD group. Given this effect, as well as the previous literature showing different masking properties of ST and AM noise [61, 62], for each child we calculated WiNst and WiNam scores for ST and AM noise separately, as the percentage of correct responses averaged over SNR levels.
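The per-child WiNst and WiNam scores, and the masking release they imply, reduce to simple averages over SNR levels; the numbers below are illustrative only.

```python
import numpy as np

# toy percent-correct scores for three children at 0, -3, -6 dB SNR
# (hypothetical numbers, for illustration only)
st_scores = np.array([[80.0, 60.0, 40.0],
                      [85.0, 65.0, 45.0],
                      [75.0, 55.0, 35.0]])  # stationary (ST) noise
am_scores = np.array([[90.0, 75.0, 60.0],
                      [95.0, 80.0, 65.0],
                      [85.0, 70.0, 55.0]])  # amplitude-modulated (AM) noise

win_st = st_scores.mean(axis=1)    # WiNst: percent correct averaged over SNRs
win_am = am_scores.mean(axis=1)    # WiNam
masking_release = win_am - win_st  # benefit from the gaps in AM noise
```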
In both groups, WiNst and WiNam scores improved with age (Spearman correlations; WiNst: TD, R(N = 30) = 0.43, p = 0.02; ASD R(N = 26) = 0.50, p = 0.009; WiNam: TD R(N = 30) = 0.56, p = 0.001, ASD R(N = 26) = 0.51, p = 0.008). WiN scores did not correlate with IQ in either group (Spearman correlations; WiNst: TD R(N = 30) = 0.08, p = 0.65, ASD R(N = 26) = -0.27, p = 0.18; WiNam: TD R(N = 30) = 0.10, p = 0.58, ASD R(N = 26) = 0.02, p = 0.93).
To sum up, the ability to recognize words in noise was impaired in children with ASD compared to their TD peers, was independent of IQ, and improved with age.
WiN performance and general language abilities in TD and ASD groups
As expected, children with ASD had significantly lower total language scores than TD participants (Mann–Whitney U Test: Ntd = 29, Nasd = 32, Z = 6.2, p < 0.0001, \(\eta 2\) = 0.64). The scores improved with age in control participants (Spearman R (N = 29) = 0.74, p < 0.0001), but not in children with ASD (Spearman R (N = 32) = 0.21, p = 0.23). Unlike WiN scores, language total scores correlated with IQ (Spearman correlations, TD (N = 29): R = 0.35, p = 0.06; ASD (N = 31): R = 0.46, p = 0.008).
MEG results: sensor-level analysis
Figure 2 shows the grand average auditory evoked field waveforms expressed as root mean square (RMS) signals calculated over all 204 gradiometer channels. Compared to control stimuli, stimuli characterized by temporal regularity (periodic non-vowels), formant structure (non-periodic vowels), or a combination of these features (periodic vowels) caused a transient decrease in RMS around 100 ms after stimulus onset (the time range of the child P100m component), followed by a prolonged increase after ~ 150 ms that lasted up to 400 ms or longer. The RMS amplitude provides a measure of the magnetic field strength across the MEG sensors regardless of polarity and is blind to the direction of the condition-related difference in the cortical currents [93]. In our previous study, we showed that both the decreases and increases in RMS in response to vowel-like versus control stimuli were explained by a sustained negative shift of current in neural activity of the superior temporal cortex [43], i.e., the SPN. To account for current polarity, we further analyzed the data at the cortical source level.
Grand average RMS response waveforms and the RMS difference waves in ASD and TD groups. Zero point at the horizontal axis corresponds to the onset of 800-ms stimulus. RMS was calculated over all gradiometer channels. Black line denotes control condition (non-periodic non-vowels) and colored lines denote test conditions (periodic vowels, non-periodic vowels, periodic non-vowels). Dashed lines indicate differential responses. The asterisks under the dashed lines correspond to significant point-by-point differences between the test and control conditions: red color—control < test; blue color—control > test (paired t-test, p < 0.05, FDR corrected). The shaded areas mark 95% confidence intervals
MEG results: analysis of the source current
Effects of periodicity and formant structure in TD and ASD participants
To test whether children with ASD, similarly to TD children [43], respond to patterned acoustic input with an SPN in the auditory cortical areas, we applied spatiotemporal cluster analysis separately in the TD and ASD groups. SPN was defined as an increase in negativity in the evoked source current for the Test versus Control contrasts, i.e., a negative sign of the difference between vowel or complex periodic sounds (periodic non-vowels) and a control non-periodic non-vowel sound.
Cluster analysis was performed in the ROIs broadly overlapping the auditory cortex and nearby regions where auditory evoked activity was observed (Fig. 3). Before cluster analysis, the direction of dipole sources within the ROIs was adjusted to correspond to the dominant direction of source current in the auditory cortex (see Methods for details). In both groups, periodicity, formant structure, and their combination were associated with bilateral clusters of negative differential responses, i.e., SPN that lasted several hundred milliseconds (Fig. 3). SPN clusters encompassed primary and secondary auditory cortex and adjacent areas. In case of non-periodic vowels, the cluster of negative differences (test > control) was followed by (or co-existed with) a cluster of positive differences (test < control). In our previous study a similar pattern of differences between non-periodic vowel and control stimuli was observed in neurotypical individuals (see [43] for discussion). The temporal evolutions of the clusters of differences between test and control stimuli in the TD and ASD groups are shown in Supplementary videos S1-S6.
Significant clusters of the differences in the evoked source current between test (periodic vowel, non-periodic vowel, periodic non-vowel) and control (nonperiodic non-vowel) conditions in TD and ASD groups. Blue colors correspond to SPN, i.e. a more negative source current to the test compared to the control condition, red colors—to the opposite direction of the difference. Color intensity indicates the duration of the cluster. Black line indicates the border of the region used for cluster analysis. Note that the 798 ms point is the last time point analyzed
Figure 4 shows the sources in which the differences between periodic vowels and control stimuli identified by cluster analysis were most significant (Fig. 4A, B), and the mean timecourses in these sources (Fig. 4C, D). The respective differential responses (periodic vowels minus control stimuli) and their significance (i.e. difference from zero) are shown in Fig. 4E, F, G and H. Similar illustrations for non-periodic vowels and periodic non-vowels are given in the Supplementary materials (Figures S4 and S5).
Comparison of evoked responses to the periodic vowels and control stimuli (non-periodic non-vowels) in the “most significant” dipole sources identified by cluster analysis. A, B The “most significant” dipole sources within STG + region (outlined with a red contour) are marked by blue dots. Color shade (light blue to dark blue) indicates significance of the differences between test and control conditions in the respective point sources. The primary auditory cortex (A1) is outlined with a white contour. C, D Averaged neural current timecourses in the “most significant” sources. E, F The difference between timecourses of current evoked by the test and control stimuli. G, H T-statistics reflecting a pointwise comparison of the response timecourses to test and control stimuli. Significant differences (p < 0.05, FDR corrected) are marked in gray. Supplementary figures S4 and S5 show the similar results for non-periodic vowels and periodic non-vowels
In both groups and in response to all test stimuli, significant SPN (i.e. negative sign of the difference in the source current between responses to test and control stimuli) began earlier than 100 ms after stimulus onset and then persisted for approximately 500 ms. This timing is consistent with previous results indicating early neural discrimination of auditory patterns characterized by grouping cues [32, 45], which in our study were represented by frequency composition and/or periodicity. The group differences in the SPN are described in the next section.
Group differences in SPN related to the processing of periodicity and formant structure
TFCE cluster analysis yielded bilateral clusters of the group differences for differential responses (test – control condition) for non-periodic and periodic vowels (i.e., sounds characterized by the presence of formant structure), but not for periodic non-vowels. The spatial location of the clusters on the surface of the “inflated” brain is shown in Fig. 5. Supplementary Fig. S6 shows the same clusters projected onto the surface of white matter for a three-dimensional representation.
Differences between TD and ASD groups in sustained processing negativity (SPN) associated with periodic and non-periodic vowels. Central panels show the cortical localization of the TFCE clusters of significant group differences in differential responses (vowels minus control non-periodic non-vowel stimulus). The colored bar below the inflated surfaces indicates the temporal extent of sources belonging to the TFCE cluster. The left and right panels show the “most significant” dipole sources selected based on the probability of the SPN group differences within the STG + region delineated by the red dashed line. Color bars below the images indicate vertices’ p-values averaged over the temporal extent of the cluster
In all cases, group differences were driven by greater negativity in the TD compared to the ASD group. As shown in Fig. 5, the cortical localization of clusters of significant group differences was remarkably similar for periodic and non-periodic vowels. Within the STG + region, vertex sources with the highest significance and greatest temporal extent of the differences were located anteriorly and laterally or medially relative to the primary auditory cortex (area A1). In the left hemisphere, the most significant and long-lasting differences were observed in the anterior part of the parabelt auditory area A4 [79]. In the right hemisphere, the differences were most prominent in the posterior segment of the circular insula sulcus (pINS). Additionally, the group differences in SPN were localized to sources in the superior segment of the circular insula sulcus (sINS) and STS. However, the direction of the current in these sources (Figure S3), as well as their position directly above and below the STG + sources (see Figure S6), suggests that the group differences in sINS and STS are likely the result of point spread from the auditory cortical areas [85, 86].
To sum up, neural processing of sounds with vowel frequency composition differed between children with ASD and TD children in the non-primary auditory cortex of both hemispheres. These group differences in the SPN had a more restricted localization than the SPN itself (see Fig. 4). SPN associated with periodicity of non-vocal sound did not differ between the groups.
Current timecourses associated with processing of vowel formant structure: SPN and P3a-like response
The analysis of timecourses in the “most significant” sources identified by cluster analysis showed that group differences emerged from ~ 150–200 ms after stimulus onset and persisted for approximately 200 ms (Fig. 6E, F).
Timecourses of the group differences in differential responses to vowels in the “most significant” dipole sources identified by cluster analysis. A, B Group average timecourses of responses to vowels and control stimuli in TD and ASD groups. C, D Average timecourses of the differential responses (test stimulus—control stimulus) in TD and ASD groups. E, F Point-by-point comparison of timecourses of differential responses between the groups. Significant differences (p < 0.05, FDR corrected) are marked in gray
To ensure that the differences in the SPN between the groups were due to test rather than control stimuli, we averaged these responses across the “most significant” sources in the interval bounded by the mean onset and end times of significant group differences (see Methods and Supplementary Figure S2). No group differences were found in responses to control stimuli (all p’s > 0.3). The responses to periodic and non-periodic vowels were reduced in children with ASD compared to TD children (periodic vowel, left hemisphere: t(72) = 3.02, p = 0.003, Cohen’s d = 0.67, right hemisphere: t(72) = 4.44, p = 0.00003, Cohen’s d = 0.92; non-periodic, left hemisphere: t(72) = 1.92, p = 0.06, Cohen’s d = 0.44, right hemisphere: t(72) = 3.59, p = 0.0006, Cohen’s d = 0.78).
Inspection of Fig. 6 revealed a positive transient peak with a latency of ~ 300 ms that was most prominent in response to periodic vowels (and less so to non-periodic vowels) in the left-hemisphere. Considering recent intracranial evidence for left-hemispheric predominance of the P3a response associated with processing of novel and potentially salient speech sounds [94], this deflection might reflect an involuntary capture of attention to perceptually salient vowels presented in a sequence of less salient stimuli.
Group differences in this P3a-like response might obscure (or produce) group differences in SPN. To test this possibility, we performed an additional analysis. For each subject, we low-passed the timecourse signals at 10 Hz and estimated the P3a-like amplitude as the amplitude of the largest positive peak in the 200–400 ms range relative to the mean of the two nearest negative peaks. If a positive peak between 200 and 400 ms was absent or indistinct (i.e., its amplitude relative to the preceding or following negative peaks was below the RMS of the signal amplitude in the baseline period), the P3a-like peak was considered absent and its amplitude was set to zero.
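A sketch of this peak-detection procedure on a synthetic timecourse follows. The filter order, the peak-finding details, and the test signal are our assumptions; only the 10 Hz low-pass, the 200–400 ms window, the nearest-negative-peak reference, and the baseline-RMS criterion are taken from the text.

```python
import numpy as np
from scipy import signal

def p3a_amplitude(tc, sfreq=500.0, t0=0.0):
    """Estimate a P3a-like peak amplitude from a source timecourse.
    Low-pass the signal at 10 Hz, take the largest positive peak in
    200-400 ms, and measure it relative to the mean of the nearest
    negative peaks on either side; return 0 if no clear peak exists
    (relative amplitude below the RMS of the pre-stimulus baseline)."""
    sos = signal.butter(4, 10.0, btype="low", fs=sfreq, output="sos")
    lp = signal.sosfiltfilt(sos, tc)
    times = t0 + np.arange(len(tc)) / sfreq
    pos, _ = signal.find_peaks(lp)
    neg, _ = signal.find_peaks(-lp)
    window = pos[(times[pos] >= 0.2) & (times[pos] <= 0.4)]
    if window.size == 0:
        return 0.0
    peak = window[np.argmax(lp[window])]
    left, right = neg[neg < peak], neg[neg > peak]
    troughs = [lp[left[-1]] if left.size else lp[0],
               lp[right[0]] if right.size else lp[-1]]
    amp = lp[peak] - np.mean(troughs)
    baseline_rms = np.sqrt(np.mean(lp[times < 0.0] ** 2))
    return amp if amp >= baseline_rms else 0.0

# synthetic timecourse: -100..800 ms with a positive bump at ~300 ms
sfreq = 500.0
t = np.arange(-0.1, 0.8, 1.0 / sfreq)
tc = 2.0 * np.exp(-((t - 0.3) ** 2) / (2 * 0.03 ** 2)) \
     + 0.05 * np.sin(2 * np.pi * 3 * t)
amp = p3a_amplitude(tc, sfreq=sfreq, t0=-0.1)
```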
In the pooled sample of participants, the P3a-like response to non-periodic vowels was more frequently detected in the left than in the right hemisphere (69% and 50% respectively; Chi square = 4.9, p = 0.02). In the left hemisphere, it was more frequently detected in response to nonperiodic vowels than control stimuli (69% and 45% respectively; Chi square = 8.64, p = 0.003), while no such difference was found in the right hemisphere (50% vs 42%; Chi square = 0.95, p = 0.33). There were no group differences in the occurrence of P3a-like response to non-periodic vowels either in the left (TD: 64%, ASD: 74%; Chi square = 0.85, p = 0.36) or right (TD: 44%, ASD: 57%; Chi square = 1.2, p = 0.27) hemispheres, and its amplitude did not differ between the groups (Mann–Whitney U-test, left hemisphere: Z = 1.53, p = 0.12, \(\eta 2\) = 0.03; right hemisphere: Z = 1.29, p = 0.17, \(\eta 2\) = 0.02).
In response to periodic vowels, the P3a-like peak also occurred more frequently in the left than in the right hemisphere in the combined sample of participants (82% and 66% respectively; Chi square = 4.89, p = 0.03) and was more often present in response to periodic vowels than control stimuli (left: 82% and 46% respectively; Chi square = 0.85, p < 0.0001; right: 66% and 43%; Chi square = 7.8, p = 0.005). In response to periodic vowels, P3a-like response in the left hemisphere was more frequently detected in the ASD than in TD group (TD: 72%, ASD: 94%; Chi square = 6.07, p = 0.014), whereas no group differences were found in the right hemisphere (TD: 67%, ASD: 66%; Chi square = 0.01, p = 0.93). Amplitude of the P3a-like response to periodic vowels was higher in children with ASD than in TD children in the left hemisphere (Mann–Whitney U-test: Z = 2.93, p = 0.003, \(\eta 2\) = 0.12), but not in the right one (Mann–Whitney U-test: Z = 0.67, p = 0.50, \(\eta 2\) = 0.01).
The amplitude of the left-hemispheric P3a-like response to periodic vowels decreased with age in participants with ASD (Spearman R = -0.45, p = 0.007), but not in TD participants (Spearman R = -0.11, p = 0.52), although group differences in correlation coefficients did not reach the level of significance (Fisher’s Z = 1.54, p = 0.13). No correlations with age were found for the left-hemispheric P3a-like responses to non-periodic vowels (TD: R = -0.12, ASD: R = 0.03, n.s.).
To sum up, the SPN to vowel formant structure was reduced in children with ASD in the STG areas anterior and lateral to A1 and in the pINS. This reduction was observed in both hemispheres from ~ 150–200 ms to ~ 350–450 ms after stimulus onset. In the left hemisphere, periodic and non-periodic vowels evoked a P3a-like deflection that was superimposed on the sustained negativity. In response to periodic vowels, the left-hemispheric P3a-like response was present more frequently and with larger amplitude in the ASD group than in the TD group, whereas no group difference was found for the non-periodic vowels. These findings indicate that the P3a-like response might contribute to the diminished SPN to periodic vowels in children with ASD in the left hemisphere, but is unlikely to account for the significant reduction of the SPN associated with processing of non-periodic vowels. In the right hemisphere, the P3a-like response can hardly explain the group differences in SPN to either periodic or non-periodic vowels.
SPN to vowels predicts WiN scores in children with ASD
To test whether atypical SPN to vowels in children with ASD predicts their WiN scores, we applied partial correlation analysis, controlling for age and IQ. SPN was preliminarily adjusted (SPNadj) for the magnitude of the response to the control stimulus and for the number of averaged trials in the following two steps. First, for each subject, the timecourses of responses to vowel and control stimuli were averaged over the “most significant” sources in the time interval bounded by the mean start and end times of significant group differences (see Methods and Supplementary Figure S2). Second, we computed SPNadj as regression residuals after partialling out the magnitude of response to the control stimulus and the square root of the number of averaged epochs from the magnitude of response to the test stimulus.
The partial correlations of SPNadj with WiN scores are presented in Table 2, separately for periodic and non-periodic vowels. For both types of vowels, greater SPNadj in the left hemisphere was associated with better WiN scores in the AM noise (WiNam). Figure 7 illustrates the relationships between adjusted SPN to periodic and non-periodic vowels in the left hemisphere and WiNam performance in the ASD sample. No significant correlations were found for the WiNam in the right hemisphere or for the WiNst in either hemisphere. We also checked for the presence of the partial correlations between SPNadj and WiN scores in TD children. None of the correlations were significant (all p > 0.7, see Supplementary Table S3).
The relationship between WiNam scores and adjusted SPN to vowel stimuli in children with ASD. WiNam scores—percent of correctly repeated words in the amplitude modulated noise. Adjusted SPN— sustained processing negativity adjusted for the response to control stimuli and for the number of averaged epochs. WiNam scores were corrected for age
To assess the specificity of the correlations for WiNam, we compared coefficients of correlations of the left-hemispheric SPNadj with the WiNam and WiNst scores using Fisher’s Z statistics for dependent samples [95]. In the case of non-periodic vowels, the correlation was significantly higher for WiNam than WiNst scores (N = 26, Z = 2.35; p = 0.018). No significant difference was found for the periodic vowels (N = 26, Z = 1.60; p = 0.11).
We also assessed group differences in partial correlations between WiN and SPNadj scores using Fisher’s Z-statistics for independent samples. In the left hemisphere, correlations between WiNam and SPNadj scores differed significantly between the autism and TD groups (periodic vowels: Z = 2.19, p = 0.029; non-periodic vowels: Z = 2.23, p = 0.026).
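Fisher's Z test for independent samples transforms each correlation with the inverse hyperbolic tangent and scales the difference by its standard error. A sketch follows; the r and n values are illustrative, not the study's coefficients.

```python
import numpy as np
from scipy import stats

def fisher_z_independent(r1, n1, r2, n2):
    """Compare correlation coefficients from two independent samples:
    Fisher-transform each r (arctanh) and divide the difference by
    its standard error, sqrt(1/(n1-3) + 1/(n2-3))."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    se = np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2.0 * stats.norm.sf(abs(z))  # two-sided p from the normal tail
    return z, p

# illustrative values only: a correlation of 0.55 in one group (n = 26)
# vs -0.05 in another group (n = 30)
z, p = fisher_z_independent(0.55, 26, -0.05, 30)
```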
To test whether P3a-like response contributed to the correlation with the WiNam scores we repeated the partial correlation analysis including P3a-like response amplitude as an independent variable (Supplementary material, Table S2). The results suggest that the link between SPNadj and WiNam scores cannot be explained by the differences in the amplitude of the P3a-like component.
Children in the ASD sample had variable levels of IQ, which ranged from moderate intellectual disability to age-appropriate intellectual capacities (Table 1). Therefore, to evaluate the nonspecific effect of disorder severity, we tested whether the reduced SPNadj to vowels in children with ASD was related to their lower IQ scores. No significant Spearman correlations were found (periodic vowel, left hemisphere: R = -0.03, n.s.; periodic vowel, right hemisphere: R = -0.21, n.s.; non-periodic vowel, left hemisphere: R = 0.01, n.s.; non-periodic vowel, right hemisphere: R = -0.2, n.s.).
Discussion
We investigated whether cortical processing of isolated sounds characterized by vowel formant structure and/or periodicity (pitch) differs in the auditory cortex of children with ASD and TD children, and whether differences related to the processing of these key vowel features contribute to poor perception of words in noise in children with ASD. In both groups of children, the presence of periodicity and formant structure was associated with sustained processing negativity (SPN) – an early (starting before 100 ms) negative shift of current in the primary and non-primary auditory cortical areas that persisted for several hundred milliseconds. We found no evidence for atypical processing of periodicity (f0) of non-vocal spectrally complex sounds lacking formant composition in children with ASD. In contrast, the SPN evoked by vowel-like sounds characterized by formant structure was significantly reduced in ASD as compared with TD children, regardless of the periodicity of the sound. This SPN reduction emerged relatively late (around 150–200 ms after vowel onset) and was localized bilaterally to the auditory areas anterior to the primary auditory cortex (parabelt area A4 and/or the pINS cortex). In the left, but not in the right hemisphere, reduced SPN in response to vowels predicted poor recognition of words presented against AM noise. Overall, our results suggest that impaired processing of vowel formant composition in children with ASD contributes to their impaired ability to recover words from glimpses of speech interrupted by masking noise.
Attenuated processing of vowel formant structure in children with ASD
The presence of SPN in response to periodicity or formant structure is consistent with the previous findings in neurotypical children and adults [41, 43, 45] and extends these findings to children with ASD, at least those with phrasal speech. Similar sustained negative shifts of current were observed in several recent MEG and EEG studies in response to sounds that can be characterized as acoustic patterns distinguished on the basis of their temporal properties such as periodicity [46] or frequency composition, either static or coherently changing in time [32, 40, 42, 96,97,98]. It has been suggested that this negative shift reflects the fundamental cortical mechanism of automatic grouping in the auditory modality [98]. The early (< 100 ms) latency of the SPN in our study (Fig. 4) is consistent with evidence on the remarkably early sensitivity of human auditory cortex to acoustic patterns [32] and, in particular, to vowels [45].
Being well-recognized vocal sounds deeply shaped by the experience of verbal communication, vowels are unique auditory objects. Studies have repeatedly shown that certain areas of the secondary auditory cortex and adjacent regions show a preference for conspecific vocalizations in humans [99,100,101] and non-human primates [83]. Although still debated [100], it has been suggested that voices are similar to faces in many ways, as both are “special”, carry information about the personality and emotional state of the subject, and are processed in specialized cortical areas [99].
In this respect, the decrease of SPN evoked by periodic and non-periodic vowel-like sounds in children with ASD is a remarkable finding. This decrease can be attributed specifically to an attenuated response to formant composition rather than to the periodicity of the vowel amplitude envelope (fundamental frequency / pitch), as the latter auditory cue was absent in non-periodic vowels. Since children with ASD had normal SPNs to nonvocal sounds characterized by f0 periodicity, as well as normal responses to control nonperiodic nonvocal stimuli (see Fig. 6A and B), the reduced SPNs to vowels cannot be explained by a general decrease in response amplitude or non-specific deficit in auditory pattern processing.
Despite the early onset of the SPN (< 100 ms post-stimulus onset), its group differences emerged relatively late (> 150 ms post-stimulus onset) and were located predominantly in the non-primary auditory areas (Fig. 5). The spared functional activity of the primary auditory cortex in children with ASD in our study is in line with fMRI findings in individuals with ASD [102, 103]. It is also consistent with the results of Engineer et al. [104], who found in a mouse model of autism that non-primary auditory cortical areas are more vulnerable than the primary auditory cortex to prenatal factors leading to autism.
The presence of a typical SPN up to at least 150 ms after vowel onset (Fig. 6) suggests that the reduced activity in response to vowels in children with ASD is not inherited from earlier stages of analysis, such as tonotopic processing of formant frequencies [105], detection of harmonicity in complex sounds [106], or detection of an acoustic pattern [32]. On the other hand, the timing of the vowel-related SPN reduction generally agrees with the results of a meta-analysis of MMN/MMF studies, which concluded that responses to phoneme changes (either vowels or syllables) are reduced in individuals with ASD [54]. These considerations suggest that the processing deficit in children with ASD arises at the stage of phonetic analysis.
Notably, the time at which we observed a decrease in SPN in children with ASD coincides with the time at which categorization of isolated vowel sounds into distinct phoneme categories (e.g., /u/ vs. /a/) occurs (~ 175 ms post-stimulus onset) [107]. This stage is referred to as acoustic–phonetic mapping and distinguishes brain responses reflecting the true internalized percept of a vowel category from those that index acoustic properties of the vowel [107]. Deficits in phoneme category perception (e.g., relating vowels in the /y/–/i/ continuum to the category /y/ or /i/) have previously been reported in children with ASD, despite preserved or even superior phoneme discrimination abilities (e.g., judging two vowels in the /y/–/i/ continuum as the same or different) [108]. Therefore, it is likely that the neurofunctional abnormalities leading to decreased SPN in response to vowels in children with ASD reflect impaired acoustic–phonetic mapping necessary for phoneme categorization. This hypothesis is consistent with the observation that, in the left hemisphere, SPN reduction was most reliable and persistent in the mid-STG region located lateral to the primary auditory cortex in Heschl’s gyrus (parabelt area A4 according to HCPMMP1 [79]) (Fig. 5). This area has been suggested to be an initial STG waypoint of the ventral auditory stream – the auditory pathway optimized for recognition of acoustic patterns [36, 109, 110], especially those representing conspecific communication calls [111, 112]. In humans, this parabelt auditory region plays a crucial role in phoneme encoding [36, 113, 114]. A meta-analysis of neuroimaging studies of speech processing [36] concluded that phoneme recognition is associated with activation in the left mid-STG region, while the integration of phonemes into more complex patterns (i.e., words) is localized to the left anterior STG.
This conclusion received strong support in a recent study of a patient with extensive lesions of the bilateral STS and left anterior STG, which showed that the intact region of the mid-STG alone can effectively subserve explicit vowel categorization despite the presence of “pure word deafness” [115].
The SPN in children with ASD was also decreased in the temporo-insular regions medially adjacent to Heschl’s gyrus and extending in the anterior direction (Fig. 5). The pINS has strong structural and functional connections with the auditory cortex [81, 116,117,118,119] and is responsive to a wide variety of acoustic stimuli [82]. Yet, recordings of neuronal responses in humans [120] and non-human primates [83] have shown that the auditory region of the pINS is sensitive to conspecific vocalizations and can transmit the respective auditory information further to the anterior insula, which is involved in the evaluation of affective signals conveyed by vocal sounds. In the future, it would be interesting to test whether reduced SPN in pINS regions is associated with impaired recognition of emotions in the human voice in ASD [121].
Apart from auditory signals, the pINS contains neuronal representations of somatosensory, motor, visual, vestibular, and limbic signals and is thought to be involved in multisensory integration [122]. The right insula seems to be particularly important for audiovisual integration [123]. In this regard, it is noteworthy that the reduction in the vowel-induced SPN in our participants with ASD was strongest in the right pINS (Fig. 5). In the future, it would be interesting to investigate whether atypical activity or connectivity of the right pINS contributes to the severe deficit in audiovisual integration during phoneme recognition in children with ASD [124].
However, the effects found in the insula in our study must be interpreted with caution, because MEG localization error increases in deep structures such as the insula [85].
Suppressed processing of vowel formant structure is associated with word-in-noise perception difficulties in children with ASD
The reduced negativity underlying processing of formant structure in children with ASD predicted the severity of their word recognition problems in AM noise: the diminished SPN responses to vowels in the left hemisphere were associated with lower WiNam scores (Table 2, Fig. 7). This finding has several important implications for interpreting the vowel processing deficit and its impact on auditory speech recognition in ASD.
First, while WiN performance in children with ASD showed some developmental improvement throughout childhood, neither the children’s age nor their IQ could explain the correlations between reduced SPN and lowered WiNam scores (Table 2). The lack of correlation between WiN scores and IQ agrees well with previous findings on the presence of speech-in-noise recognition difficulties even in high-functioning individuals with ASD [13]. On the other hand, our results suggest that these problems may be caused, at least in part, by deficient vowel processing at the level of the non-primary auditory cortex.
The passive presentation of auditory stimuli and the presence of the SPN deficit already at ~ 150–200 ms after sound onset (i.e., at the preattentive stage of processing) make a potential contribution of higher-order factors such as voluntary attention or motivation unlikely. Yet, involuntary orienting of attention to auditory stimuli may still have influenced the differences between the ASD and TD groups. Indeed, P3a-like responses to periodic and nonperiodic vowels were observed in both TD and ASD children in our study, likely reflecting an involuntary shift of attention to perceptually salient speech stimuli [125]. These responses were left-lateralized, consistent with the left-hemispheric bias of the P3a novelty response to speech revealed in the auditory cortex during intracranial recordings in patients [94]. The presence of an elevated P3a-like response to periodic vowels in the left hemisphere in our participants with ASD suggests that their reduced negative responses to vowels are unlikely to be due to inattention to the auditory stream containing speech sounds, as was previously suggested [126]. On the contrary, their involuntary attention seems to be captured by perceptually salient periodic vowel stimuli to a greater degree than in TD children.
The excessive P3a-like response could contribute to the decrease in the left-hemispheric SPN to periodic vowels and to its correlation with WiNam scores in children with ASD, but it can hardly explain the general trend toward SPN reduction or the common correlation pattern for both periodic and non-periodic vowels. There are several arguments in support of this assumption. (1) No group differences in the P3a-like responses or distinct P3a-like peaks were observed in the right hemisphere, despite the prominent right-hemispheric SPN attenuation in the ASD vs TD group (Figs. 4 and 6). (2) In the case of non-periodic vowels, the group differences in vowel-related negativity started as early as ~ 150 ms (Fig. 6D), i.e., in the time interval when the P3a-like response is not yet evident. (3) The group differences in P3a-like amplitude and in the frequency of occurrence of the P3a-like peak were found for periodic vowels only, while in the ASD group the SPN was reduced for both periodic and nonperiodic vowels. (4) In the left hemisphere, the SPN was a better predictor of WiNam scores in children with ASD than the amplitude of the P3a-like component (Supplementary Table S2).
Although beyond the scope of this paper, the possible role of an enhanced left-hemispheric P3a-like response to speech sounds in autism deserves mention. Previous studies have shown that the P3a can be relatively independent of the antecedent negativity. For example, Torppa et al. [127] observed that children with cochlear implants had smaller MMN but larger P3a responses to speech sounds than children with normal hearing. Vlaskamp et al. [128] found that tone duration deviants elicited smaller MMN but larger P3a in children with ASD. It has been hypothesized that the larger P3a reflects increased recruitment of neural resources to compensate for less efficient automatic processing of salient sounds that lie outside the current attentional focus [128].
Second, despite the presence of altered vowel-evoked SPN in the auditory cortex of both hemispheres (Fig. 5), correlations with WiNam scores were found only in the left hemisphere (Table 2), indicating a specific relationship between the functional integrity of the left secondary auditory cortex and the ability to recognize words in fluctuating noise in children with ASD. Our previous study, which used the same stimuli to compare the SPN responses in neurotypical children and adults [43], showed that the left-hemispheric asymmetry in vowel-evoked SPN was present in adults but not in children, in whom SPN responses had equal amplitudes in both hemispheres. The correlation between left-hemispheric, but not right-hemispheric, SPN to vowels and WiNam scores in children with ASD suggests that some degree of left-hemispheric specialization for vowel processing is already present in childhood and possibly increases with age, driven by the need to integrate the encoding of vowel spectral composition with a predominantly left-lateralized language system (see [129] for the concept of graded hemispheric specialization).
The left STG region that most reliably distinguished ASD from TD participants in the present study is remarkably similar to the region reported to be sensitive to the intelligibility of sentences, which, in turn, depends on the slow temporal modulation of the speech signal at the level of syllables (3–4 Hz) [109]. Our results do not exclude a role of the left A4 region in sentence intelligibility, perhaps in the context of top-down interactions between phonetic and higher-level (lexical, syntactic, working memory, etc.) processes, but they suggest that this region is tuned to the spectro-temporal composition of vowels and that weakening of this tuning hinders WiNam task performance.
Third, while the atypical left-hemispheric processing of vowels in children with ASD correlated with WiNam scores, it did not correlate with WiNst scores (Table 2). This pattern of correlations suggests that impaired vowel processing interferes with the ability of listeners with ASD to use dips in the noise to capture acoustic cues. Psychoacoustic studies have shown that in subjects with normal hearing, information important for word recognition in fluctuating noise is conveyed through both the temporal fine structure (TFS) of vowels, i.e., the carrier frequencies of formants, and their common amplitude envelope (f0 / pitch) [58]. In our study, WiNam scores in children with ASD correlated with SPN evoked by both periodic and non-periodic vowels (Table 2). Since both stimulus types have a formant structure, but nonperiodic vowels lack f0, atypical processing of formant frequencies seems to be a crucial factor contributing to the difficulties in perceiving words in AM noise in children with ASD. However, the late (> 150 ms post-stimulus onset) occurrence of the SPN reduction and its location in the non-primary auditory cortex (left mid-STG region) suggest that the poor “dip listening” is due to insufficient grouping of formants into a “vowel object” rather than to impaired decoding of the TFS of individual formant frequencies. Consistent with this hypothesis, a recent study of older adults found that impaired central grouping of acoustic patterns is a major contributor to their deficits in processing speech in noise [130].
Our study has several limitations. First, we restricted the analysis to temporal cortical regions, where the amplitude of the response to sound is maximal, whereas important differences in the processing of linguistic stimuli in autism may also be observed outside the auditory cortex, such as in inferior frontal regions [131]. Second, we presented vowel stimuli, which are very special, overlearned conspecific auditory objects. It would be important to clarify whether the ASD-related deficit in SPN is specific to vowels or whether it is also observed for other auditory objects that have a constant or coherently changing frequency composition. Third, since we used simple words to test for speech-in-noise processing difficulties in ASD, one should be cautious about generalizing the findings to more complex linguistic constructions such as sentences. In the case of sentences, speech recognition in noise may be supported by prosody, the slow (syllable-rate) envelope of the speech signal [58], and higher-order semantic cues [132] that are absent or less important in the case of isolated words. Fourth, we did not control subjects’ attention to the auditory stimuli, so the possibility remains that differences in attention allocation could affect the results. Comparing responses in passive and active listening paradigms may help to clarify the role of attention in the observed differences in SPN. Fifth, only boys participated in this study. There are multiple sex differences in individuals with ASD (time of diagnosis, genetic burden, neurological and cognitive abnormalities) that may influence the variables investigated in this study [133]. Since our sample size did not allow us to analyze the effect of sex, we limited our sample to males, who constitute the majority of individuals diagnosed with ASD. More research is needed to determine whether the findings can be extended to girls with ASD.
Direction for future research
Word recognition in amplitude-modulated noise depends on multiple integrative processes occurring at different levels of the brain hierarchy and involving numerous feedforward, recurrent, and top-down interactions [134, 135]. In a highly heterogeneous ASD population, difficulties with speech perception in noise may arise for a variety of reasons attributable to impairments at different stages of the auditory pathway or at higher hierarchical levels. Thus, our results indicating a role for impaired processing of vowel formant structure in WiN perception deficits in children with ASD do not exclude the contribution of other factors. In some children with autism, poor WiN recognition may be due to deficits occurring already at the subcortical level [23,24,25, 136, 137], as indexed by the frequency following response to speech sounds [138]. In the future, it would be important to investigate whether impairments in the analysis of the temporal fine structure (TFS) of sound [139] in the brainstem and the deficit in cortical processes leading to the formation of an auditory object contribute independently to poor WiN performance in children with ASD. Our findings do not rule out the “cognitive” hypothesis, which, based on behavioral results, attributes poor masking release in individuals with ASD to a weakness of the domain-unspecific mechanisms that integrate glimpsed fragments into meaningful speech [61, 62]. However, since our study was not designed to test this hypothesis, additional neuroimaging research is warranted to address this issue.
Impaired speech-in-noise hearing is one of the central symptoms of auditory processing disorder (APD) — difficulties in recognizing and interpreting sounds that result from central auditory nervous system dysfunction [140] and are often seen in children with ASD and other neurodevelopmental disorders [141]. Determining at what level of speech signal analysis this dysfunction occurs is important for the development of effective and personalized interventions for auditory processing abnormalities, not only in ASD but also in other neurodevelopmental disorders. In this respect, our findings contribute to an emerging profile of children with developmental listening difficulties that may be caused by abnormal processing of speech at different levels of the central nervous system.
Conclusion
Our results suggest that a substantial proportion of children with ASD have altered functional integrity of non-primary auditory cortical areas involved in processing of the vowel formant composition. The localization and relatively late occurrence of this deficit suggest that it arises at the stage of integration of individual formants into phonetic objects – vowels – whereas no deficit was found in children with ASD at earlier stages of processing associated with the encoding of individual formants and/or the detection of a frequency pattern. In the left hemisphere, the neural deficit in vowel processing was associated with difficulty recovering words from glimpses of speech in fluctuating noise, a problem characteristic of children with autism. Thus, the impaired grouping of acoustic features into phonetic objects may have an adverse effect on the ability to recognize speech in fluctuating noise in children with autism.
Data availability
The de-identified individual-level raw data, study materials, and analysis code used to generate the results are publicly available at https://openneuro.org/datasets/ds005234.
Abbreviations
- ASD: Autism spectrum disorder
- AM: Amplitude-modulated [noise]
- EEG: Electroencephalography
- ERP/ERF: Event-related potential/field
- INS: Insula
- MEG: Magnetoencephalography
- MMN/MMF: Mismatch negativity/field
- MPI: Mental Processing Index
- ROI: Region of interest
- pINS: Posterior insula
- SCQ: Social Communication Questionnaire
- sLORETA: Standardized low-resolution brain electromagnetic tomography
- SNR: Signal-to-noise ratio
- SPN: Sustained processing negativity
- SPNadj: SPN adjusted for the response to the test stimulus and for the number of averaged epochs
- SRS: Social Responsiveness Scale
- ST: Stationary [noise]
- STG: Superior temporal gyrus
- STS: Superior temporal sulcus
- TD: Typically developing
- TFCE: Threshold-Free Cluster Enhancement
- WiN: Words in noise
- WiNam: Words in amplitude-modulated noise
- WiNst: Words in stationary noise
References
American Psychiatric Association. Diagnostic and statistical manual of mental disorders, 5th ed. 2013.
Eigsti I-M, de Marchena AB, Schuh JM, Kelley E. Language acquisition in autism spectrum disorders: a developmental review. Res Autism Spectr Disord. 2011;5:681–91.
Wodka EL, Mathy P, Kalb L. Predictors of phrase and fluent speech in children with autism and severe language delay. Pediatrics. 2013;131:e1128–34.
Mody M, Belliveau JW. Speech and language impairments in autism: insights from behavior and neuroimaging. N Am J Med Sci. 2013;5:157–61.
Pickles A, Anderson DK, Lord C. Heterogeneity and plasticity in the development of language: a 17-year follow-up of children referred early for possible autism. J Child Psychol Psychiatry. 2014;55:1354–62.
Tager-Flusberg H, Kasari C. Minimally verbal school-aged children with autism spectrum disorder: the neglected end of the spectrum. Autism Res. 2013;6:468–78.
Nitzan T, Koller J, Ilan M, Faroy M, Michaelovski A, Menashe I, et al. The importance of language delays as an early indicator of subsequent ASD diagnosis in public healthcare settings. J Autism Dev Disord. 2023;53:4535–44.
Mouga S, Correia BR, Café C, Duque F, Oliveira G. Language predictors in autism spectrum disorder: insights from neurodevelopmental profile in a longitudinal perspective. J Abnorm Child Psychol. 2020;48:149–61.
Chevallier C, Kohls G, Troiani V, Brodkin ES, Schultz RT. The social motivation theory of autism. Trends Cogn Sci. 2012;16:231–9.
O’Connor K. Auditory processing in autism spectrum disorder: a review. Neurosci Biobehav Rev. 2012;36:836–54.
DePape A-MR, Hall GBC, Tillmann B, Trainor LJ. Auditory processing in high-functioning adolescents with Autism Spectrum Disorder. PLoS One. 2012;7:e44084.
Gonçalves AM, Monteiro P. Autism Spectrum Disorder and auditory sensory alterations: a systematic review on the integrity of cognitive and neuronal functions related to auditory processing. J Neural Transm. 2023;130:325–408.
Ruiz Callejo D, Boets B. A systematic review on speech-in-noise perception in autism. Neurosci Biobehav Rev. 2023;154:105406.
Schafer EC, Mathews L, Mehta S, Hill M, Munoz A, Bishop R, et al. Personal FM systems for children with autism spectrum disorders (ASD) and/or attention-deficit hyperactivity disorder (ADHD): an initial investigation. J Commun Disord. 2013;46:30–52.
Schafer EC, Gopal KV, Mathews L, Thompson S, Kaiser K, McCullough S, et al. Effects of auditory training and remote microphone technology on the behavioral performance of children and young adults who have autism spectrum disorder. J Am Acad Audiol. 2019;30:431–43.
Xu S, Fan J, Zhang H, Zhang M, Zhao H, Jiang X, et al. Hearing assistive technology facilitates sentence-in-noise recognition in Chinese children with autism spectrum disorder. J Speech Lang Hear Res. 2023;66:2967–87.
Sturrock A, Guest H, Hanks G, Bendo G, Plack CJ, Gowen E. Chasing the conversation: autistic experiences of speech perception. Autism Dev Lang Impair. 2022;7:23969415221077532.
Anderson S, Skoe E, Chandrasekaran B, Zecker S, Kraus N. Brainstem correlates of speech-in-noise perception in children. Hear Res. 2010;270:151–7.
Hornickel J, Skoe E, Nicol T, Zecker S, Kraus N. Subcortical differentiation of stop consonants relates to reading and speech-in-noise perception. Proc Natl Acad Sci. 2009;106:13022–7.
Hornickel J, Zecker SG, Bradlow AR, Kraus N. Assistive listening devices drive neuroplasticity in children with dyslexia. Proc Natl Acad Sci U S A. 2012;109:16731–6.
Bidelman GM, Momtaz S. Subcortical rather than cortical sources of the frequency-following response (FFR) relate to speech-in-noise perception in normal-hearing listeners. Neurosci Lett. 2021;746:135664.
Russo NM, Skoe E, Trommer B, Nicol T, Zecker S, Bradlow A, et al. Deficient brainstem encoding of pitch in children with Autism Spectrum Disorders. Clin Neurophysiol. 2008;119:1720–31.
Otto-Meyer S, Krizman J, White-Schwoch T, Kraus N. Children with autism spectrum disorder have unstable neural responses to sound. Exp Brain Res. 2018;236:733–43.
Chen J, Liang C, Wei Z, Cui Z, Kong X, Dong C-J, et al. Atypical longitudinal development of speech-evoked auditory brainstem response in preschool children with autism spectrum disorders. Autism Res. 2019;12:1022–31.
Lau JCY, To CKS, Kwan JSK, Kang X, Losh M, Wong PCM. Lifelong tone language experience does not eliminate deficits in neural encoding of pitch in autism spectrum disorder. J Autism Dev Disord. 2021;51:3291–310.
Gorina-Careta N, Kurkela JLO, Hämäläinen J, Astikainen P, Escera C. Neural generators of the frequency-following response elicited to stimuli of low and high frequency: a magnetoencephalographic (MEG) study. Neuroimage. 2021;231:117866.
Coffey EBJ, Herholz SC, Chepesiuk AMP, Baillet S, Zatorre RJ. Cortical contributions to the auditory frequency-following response revealed by MEG. Nat Commun. 2016;7:11070.
Kent RD, Rountrey C. What acoustic studies tell us about vowels in developing and disordered speech. Am J Speech Lang Pathol. 2020;29:1749–78.
Boë LJ, Berthommier F, Legou T, Captier G, Kemp C, Sawallis TR, et al. Evidence of a vocalic proto-system in the baboon (Papio papio) suggests pre-hominin speech precursors. PLoS One. 2017;12:e0169321.
Wang X, Lu T, Snider RK, Liang L. Sustained firing in auditory cortex evoked by preferred stimuli. Nature. 2005;435:341–6.
Wang X. Cortical coding of auditory features. Annu Rev Neurosci. 2018;41:527–52.
Molloy K, Lavie N, Chait M. Auditory figure-ground segregation is impaired by high visual load. J Neurosci. 2019;39:1699–708.
Chait M. How the brain discovers structure in sound sequences. Acoust Sci Technol. 2020;41:48–53.
Bizley JK, Cohen YE. The what, where and how of auditory-object perception. Nat Rev Neurosci. 2013;14:693–707.
Brefczynski-Lewis JA, Lewis JW. Auditory object perception: a neurobiological model and prospective review. Neuropsychologia. 2017;105:223–42.
DeWitt I, Rauschecker JP. Phoneme and word recognition in the auditory ventral stream. Proc Natl Acad Sci U S A. 2012;109:E505–14.
Lin IF, Yamada T, Komine Y, Kato N, Kashino M. Enhanced segregation of concurrent sounds with similar spectral uncertainties in individuals with autism spectrum disorder. Sci Rep. 2015;5:10524.
Lin IF, Shirama A, Kato N, Kashino M. The singular nature of auditory and visual scene analysis in autism. Philos Trans R Soc Lond B Biol Sci. 2017;372:20160115.
Bharadwaj H, Mamashli F, Khan S, Singh R, Joseph RM, Losh A, et al. Cortical signatures of auditory object binding in children with autism spectrum disorder are anomalous in concordance with behavior and diagnosis. PLoS Biol. 2022;20:e3001541.
Barascud N, Pearce MT, Griffiths TD, Friston KJ, Chait M. Brain responses in humans reveal ideal observer-like sensitivity to complex acoustic patterns. Proc Natl Acad Sci U S A. 2016;113:E616–25.
Gutschalk A, Uppenkamp S. Sustained responses for pitch and vowels map to similar sites in human auditory cortex. Neuroimage. 2011;56:1578–87.
Teki S, Barascud N, Picard S, Payne C, Griffiths TD, Chait M. Neural correlates of auditory figure-ground segregation based on temporal coherence. Cereb Cortex. 2016;26:3669–80.
Orekhova EV, Fadeev KA, Goiaeva DE, Obukhova TS, Ovsiannikova TM, Prokofyev AO, et al. Different hemispheric lateralization for periodicity and formant structure of vowels in the auditory cortex and its changes between childhood and adulthood. Cortex. 2024;171:287–307.
Stroganova TA, Komarov KS, Sysoeva OV, Goiaeva DE, Obukhova TS, Ovsiannikova TM, et al. Left hemispheric deficit in the sustained neuromagnetic response to periodic click trains in children with ASD. Mol Autism. 2020;11:100.
Hewson-Stoate N, Schönwiesner M, Krumbholz K. Vowel processing evokes a large sustained response anterior to primary auditory cortex. Eur J Neurosci. 2006;24:2661–71.
Keceli S, Okamoto H, Kakigi R. Hierarchical neural encoding of temporal regularity in the human auditory cortex. Brain Topogr. 2015;28:459–70.
Keceli S, Inui K, Okamoto H, Otsuru N, Kakigi R. Auditory sustained field responses to periodic noise. BMC Neurosci. 2012;13:7.
Steinmann I, Gutschalk A. Sustained BOLD and theta activity in auditory cortex are related to slow stimulus fluctuations rather than to pitch. J Neurophysiol. 2012;107:3458–67.
Tian B, Rauschecker JP. Processing of frequency-modulated sounds in the lateral auditory belt cortex of the rhesus monkey. J Neurophysiol. 2004;92:2993–3013.
Walker KM, Schnupp JW, Hart-Schnupp SM, King AJ, Bizley JK. Pitch discrimination by ferrets for simple and complex sounds. J Acoust Soc Am. 2009;126:1321–35.
Walker KM, Bizley JK, King AJ, Schnupp JW. Cortical encoding of pitch: recent results and open questions. Hear Res. 2011;271:74–87.
Sadagopan S, Wang X. Level invariant representation of sounds by populations of neurons in primary auditory cortex. J Neurosci. 2008;28:3415–26.
Stroganova TA, Komarov KS, Goyaeva DE, Obukhova TS, Ovsyannikova TM, Prokofiev AO, et al. The effect of periodicity and “vowelness” of a sound on cortical auditory responses in children. IP Pavlov J High Nerv Act. 2021;71:563–77.
Chen TC, Hsieh MH, Lin YT, Chan PS, Cheng CH. Mismatch negativity to different deviant changes in autism spectrum disorders: a meta-analysis. Clin Neurophysiol. 2020;131:766–77.
Ponton C, Eggermont JJ, Khosla D, Kwong B, Don M. Maturation of human central auditory system activity: separating auditory evoked potentials by dipole source modeling. Clin Neurophysiol. 2002;113:407–20.
Parviainen T, Helenius P, Salmelin R. Children show hemispheric differences in the basic auditory response properties. Hum Brain Mapp. 2019;40:2699–710.
Orekhova EV, Butorina AV, Tsetlin MM, Novikova SI, Sokolov PA, Elam M, et al. Auditory magnetic response to clicks in children and adults: its components, hemispheric lateralization and repetition suppression effect. Brain Topogr. 2013;26:410–27.
Fogerty D, Humes LE. The role of vowel and consonant fundamental frequency, envelope, and temporal fine structure cues to the intelligibility of words and sentences. J Acoust Soc Am. 2012;131:1490–501.
Fadeev KA, Shvedovsky EF, Nikolayeva AY, et al. Difficulty with speech perception in the background of noise in children with autism spectrum disorders is not related to their level of intelligence. Clin Psychol Spec Educ. 2023;12:180–212.
Hopkins K, Moore BC. The contribution of temporal fine structure to the intelligibility of speech in steady and modulated noise. J Acoust Soc Am. 2009;125:442–6.
Groen WB, van Orsouw L, Huurne N, Swinkels S, van der Gaag RJ, Buitelaar JK, et al. Intact spectral but abnormal temporal processing of auditory stimuli in autism. J Autism Dev Disord. 2009;39:742–50.
Alcántara JI, Weisblatt EJ, Moore BC, Bolton PF. Speech-in-noise perception in high-functioning individuals with autism or Asperger’s syndrome. J Child Psychol Psychiatry. 2004;45:1107–14.
Constantino JN. Social responsiveness scale. In: Volkmar FR, editor. Encyclopedia of autism spectrum disorders. New York: Springer New York; 2013. p. 2919–29.
Berument SK, Rutter M, Lord C, Pickles A, Bailey A. Autism screening questionnaire: diagnostic validity. Br J Psychiatry. 1999;175:444–51.
Kaufman AS, Kaufman NL. KABC-II: Kaufman assessment battery for children. 2004.
Drozdick LW, Singer JK, Lichtenberger EO, Kaufman JC, Kaufman AS, Kaufman NL. The Kaufman Assessment Battery for Children—Second Edition and KABC-II Normative Update. In: Flanagan DP, McDonough EM, editors. Contemporary intellectual assessment: theories, tests, and issues. New York, NY: The Guilford Press; 2018. p. 333–59.
British Society of Audiology. Recommended procedure: pure-tone air-conduction and bone-conduction threshold audiometry with and without masking. 2018.
Lopukhina A, Chrabaszcz A, Khudyakova M, Korkina I, Yurchenko A, Dragoy O. Test for assessment of language development in Russian «KORABLIK». 2019.
Ivanova M, Dragoy O, Akinina J, et al. AutoRAT at your fingertips: introducing the new Russian aphasia test on a tablet. Front Psychol. 2016;2016:1.
Lyashevskaya ON, Sharov SA. Chastotnyi slovar’ sovremennogo russkogo yazyka (na materialakh Natsional’nogo korpusa russkogo yazyka) [Frequency Dictionary of the Modern Russian Language (based on the materials of the National Corpus of the Russian Language)]. Moscow: Azbukovnik; 2009.
Partington JW, Sundberg ML. The Assessment of basic language and learning skills. Pleasant Hill: Behavior Analysts, Inc.; 1998.
Greene C. Sound Pressure Level Calculator [Internet]. MATLAB Central File Exchange; 2016 [cited 2024 May 24]. Available from: https://www.mathworks.com/matlabcentral/fileexchange/35876-sound-pressurelevel-calculator
Uppenkamp S, Johnsrude IS, Norris D, Marslen-Wilson W, Patterson RD. Locating the initial stages of speech-sound processing in human temporal cortex. Neuroimage. 2006;31:1284–96.
Dale AM, Fischl B, Sereno MI. Cortical surface-based analysis: I. Segmentation and surface reconstruction. Neuroimage. 1999;9:179–94.
Fischl B, Sereno MI, Dale AM. Cortical surface-based analysis: II: Inflation, flattening, and a surface-based coordinate system. Neuroimage. 1999;9:195–207.
Gramfort A, Luessi M, Larson E, Engemann DA, Strohmeier D, Brodbeck C, et al. MEG and EEG data analysis with MNE-Python. Front Neurosci. 2013;7:267.
Pascual-Marqui RD. Standardized low-resolution brain electromagnetic tomography (sLORETA): technical details. Methods Find Exp Clin Pharmacol. 2002;24 Suppl D:5–12.
Smith SM, Nichols TE. Threshold-free cluster enhancement: addressing problems of smoothing, threshold dependence and localisation in cluster inference. Neuroimage. 2009;44:83–98.
Glasser MF, Coalson TS, Robinson EC, Hacker CD, Harwell J, Yacoub E, et al. A multi-modal parcellation of human cerebral cortex. Nature. 2016;536:171–8.
Baker CM, Burks JD, Briggs RG, Conner AK, Glenn CA, Robbins JM, et al. A connectomic atlas of the human cerebrum, chapter 5: the insula and opercular cortex. Oper Neurosurg (Hagerstown). 2018;15:S175–S244.
Rolls ET, Rauschecker JP, Deco G, Huang C-C, Feng J. Auditory cortical connectivity in humans. Cereb Cortex. 2023;33:6207–27.
Blenkmann AO, Collavini S, Lubell J, Llorens A, Funderud I, Ivanovic J, et al. Auditory deviance detection in the human insula: An intracranial EEG study. Cortex. 2019;121:189–200.
Remedios R, Logothetis NK, Kayser C. An auditory region in the primate insular cortex responding preferentially to vocal communication sounds. J Neurosci. 2009;29:1034–45.
Tamura Y, Kuriki S, Nakano T. Involvement of the left insula in the ecological validity of the human voice. Sci Rep. 2015;5:8799.
Hauk O, Stenroos M, Treder MS. Towards an objective evaluation of EEG/MEG source estimation methods - the linear approach. Neuroimage. 2022;255:119177.
Das D, Shaw ME, Hämäläinen MS, Dykstra AR, Doll L, Gutschalk A. A role for retro-splenial cortex in the task-related P3 network. Clin Neurophysiol. 2024;157:96–109.
Alho J, Bharadwaj H, Khan S, Mamashli F, Perrachione TK, Losh A, et al. Altered maturation and atypical cortical processing of spoken sentences in autism spectrum disorder. Prog Neurobiol. 2021;203:102077.
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodological). 1995;57:289–300.
Aldridge FJ, Gibbs VM, Schmidhofer K, Williams M. Investigating the clinical usefulness of the Social Responsiveness Scale (SRS) in a tertiary level, autism spectrum disorder specific assessment clinic. J Autism Dev Disord. 2012;42:294–300.
Corsello C, Hus V, Pickles A, Risi S, Cook EH Jr, Leventhal BL, et al. Between a ROC and a hard place: decision making and making decisions about using the SCQ. J Child Psychol Psychiatry. 2007;48:932–40.
Maenner MJ, Warren Z, Williams AR, Amoakohene E, Bakian AV, Bilder DA, et al. Prevalence and characteristics of autism spectrum disorder among children aged 8 years - autism and developmental disabilities monitoring network, 11 sites, United States, 2020. MMWR Surveill Summ. 2023;72:1–14.
Zhou H, Xu X, Yan W, Zou X, Wu L, Luo X, et al. Prevalence of autism spectrum disorder in China: a nationwide multi-center population-based study among children aged 6 to 12 years. Neurosci Bull. 2020;36:961–71.
Hämäläinen M, Hari R. Magnetoencephalographic characterization of dynamic brain activation: basic principles and methods of data collection and source analysis. In: Toga AW, Mazziotta JC, editors. Brain mapping: the methods. 2nd ed. San Diego: Academic Press; 2002. p. 227–53.
Nourski KV, Steinschneider M, Rhone AE, Dappen ER, Kawasaki H, Howard MA 3rd. Processing of auditory novelty in human cortex during a semantic categorization task. Hear Res. 2024;444:108972.
Lenhard W, Lenhard A. Hypothesis tests for comparing correlations. 2014. http://www.psychometrica.de/correlation.html. Accessed 25 May 2024.
Southwell R, Chait M. Enhanced deviant responses in patterned relative to random sound sequences. Cortex. 2018;109:92–103.
Hu M, Bianco R, Hidalgo AR, Chait M. Concurrent encoding of sequence predictability and event-evoked prediction error in unfolding auditory patterns. J Neurosci. 2024;44:e1894232024.
Tóth B, Kocsis Z, Háden GP, Szerafin Á, Shinn-Cunningham BG, Winkler I. EEG signatures accompanying auditory figure-ground segregation. Neuroimage. 2016;141:108–19.
Belin P, Fecteau S, Bédard C. Thinking the voice: neural correlates of voice perception. Trends Cogn Sci. 2004;8:129–35.
Bethmann A, Brechmann A. On the definition and interpretation of voice selective activation in the temporal cortex. Front Hum Neurosci. 2014;8:499.
Pernet CR, McAleer P, Latinus M, Gorgolewski KJ, Charest I, Bestelmeyer PE, et al. The human voice areas: spatial organization and inter-individual variability in temporal and extra-temporal cortices. Neuroimage. 2015;119:164–74.
Samson F, Hyde KL, Bertone A, Soulières I, Mendrek A, Ahad P, et al. Atypical processing of auditory temporal complexity in autistics. Neuropsychologia. 2011;49:546–55.
Lai G, Schneider HD, Schwarzenberger JC, Hirsch J. Speech stimulation during functional MR imaging as a potential indicator of autism. Radiology. 2011;260:521–30.
Engineer CT, Centanni TM, Im KW, Borland MS, Moreno NA, Carraway RS, et al. Degraded auditory processing in a rat model of autism limits the speech representation in non-primary auditory cortex. Dev Neurobiol. 2014;74:972–86.
Fisher JM, Dick FK, Levy DF, Wilson SM. Neural representation of vowel formants in tonotopic auditory cortex. Neuroimage. 2018;178:574–82.
Feng L, Wang X. Harmonic template neurons in primate auditory cortex underlying complex sound processing. Proc Natl Acad Sci U S A. 2017;114:E840–E848.
Bidelman GM, Moreno S, Alain C. Tracing the emergence of categorical speech perception in the human auditory system. Neuroimage. 2013;79:201–12.
You RS, Serniclaes W, Rider D, Chabane N. On the nature of the speech perception deficits in children with autism spectrum disorders. Res Dev Disabil. 2017;61:158–71.
Albouy P, Benjamin L, Morillon B, Zatorre RJ. Distinct sensitivity to spectrotemporal modulation supports brain asymmetry for speech and melody. Science. 2020;367:1043–7.
Romanski LM, Averbeck BB. The primate cortical auditory system and neural representation of conspecific vocalizations. Annu Rev Neurosci. 2009;32:315–46.
Rauschecker JP, Tian B, Hauser M. Processing of complex sounds in the macaque nonprimary auditory cortex. Science. 1995;268:111–4.
Rauschecker JP. Ventral and dorsal streams in the evolution of speech and language. Front Evol Neurosci. 2012;4:7.
Norman-Haignere S, Kanwisher NG, McDermott JH. Distinct cortical pathways for music and speech revealed by hypothesis-free voxel decomposition. Neuron. 2015;88:1281–96.
Khalighinejad B, Patel P, Herrero JL, Bickel S, Mehta AD, Mesgarani N. Functional characterization of human Heschl’s gyrus in response to natural speech. Neuroimage. 2021;235:118003.
Gutschalk A, Uppenkamp S, Riedel B, Bartsch A, Brandt T, Vogt-Schaden M. Pure word deafness with auditory object agnosia after bilateral lesion of the superior temporal sulcus. Cortex. 2015;73:24–35.
Dionisio S, Mayoglou L, Cho SM, Prime D, Flanigan PM, Lega B, et al. Connectivity of the human insula: a cortico-cortical evoked potential (CCEP) study. Cortex. 2019;120:419–42.
Zachlod D, Rüttgers B, Bludau S, Mohlberg H, Langner R, Zilles K, et al. Four new cytoarchitectonic areas surrounding the primary and early auditory cortex in human brains. Cortex. 2020;128:1–21.
Ghaziri J, Tucholka A, Girard G, Houde JC, Boucher O, Gilbert G, et al. The corticocortical structural connectivity of the human insula. Cereb Cortex. 2017;27:1216–28.
Uddin LQ, Nomi JS, Hébert-Seropian B, Ghaziri J, Boucher O. Structure and function of the human insula. J Clin Neurophysiol. 2017;34:300–6.
Zhang Y, Zhou W, Huang J, Hong B, Wang X. Neural correlates of perceived emotions in human insula and amygdala for auditory emotion recognition. Neuroimage. 2022;260:119502.
Schelinski S, von Kriegstein K. The relation between vocal pitch and vocal emotion recognition abilities in people with autism spectrum disorder and typical development. J Autism Dev Disord. 2019;49:68–82.
Bamiou DE, Musiek FE, Luxon LM. The insula (Island of Reil) and its role in auditory processing. Literature review. Brain Res Brain Res Rev. 2003;42:143–54.
Bushara KO, Grafman J, Hallett M. Neural correlates of auditory-visual stimulus onset asynchrony detection. J Neurosci. 2001;21:300–4.
Foxe JJ, Molholm S, Del Bene VA, Frey H-P, Russo NN, Blanco D, et al. Severe multisensory speech integration deficits in high-functioning school-aged children with Autism Spectrum Disorder (ASD) and their resolution during early adolescence. Cereb Cortex. 2015;25:298–312.
Escera C, Corral MJ. Role of mismatch negativity and novelty-P3 in involuntary auditory attention. J Psychophysiol. 2007;21:251–64.
Ceponiene R, Lepistö T, Shestakova A, Vanhala R, Alku P, Näätänen R, et al. Speech-sound-selective auditory impairment in children with autism: they can perceive but do not attend. Proc Natl Acad Sci U S A. 2003;100:5567–72.
Torppa R, Kuuluvainen S, Lipsanen J. The development of cortical processing of speech differs between children with cochlear implants and normal hearing and changes with parental singing. Front Neurosci. 2022;16:976767.
Vlaskamp C, Oranje B, Madsen GF, Møllegaard Jepsen JR, Durston S, Cantio C, et al. Auditory processing in autism spectrum disorder: mismatch negativity deficits. Autism Res. 2017;10:1857–65.
Behrmann M, Plaut DC. A vision of graded hemispheric specialization. Ann N Y Acad Sci. 2015;1359:30–46.
Holmes E, Griffiths TD. “Normal” hearing thresholds and fundamental auditory grouping processes predict difficulties with speech-in-noise perception. Sci Rep. 2019;9:16771.
Yoshimura Y, Kikuchi M, Hayashi N, Hiraishi H, Hasegawa C, Takahashi T, et al. Altered human voice processing in the frontal cortex and a developmental language delay in 3- to 5-year-old children with autism spectrum disorder. Sci Rep. 2017;7:17116.
Shi LF. Measuring effectiveness of semantic cues in degraded English sentences in non-native listeners. Int J Audiol. 2014;53:30–9.
Lai M-C, Lombardo MV, Auyeung B, Chakrabarti B, Baron-Cohen S. Sex/gender differences and autism: setting the scene for future research. J Am Acad Child Adolesc Psychiatry. 2015;54:11–24.
Felix RA 2nd, Gourévitch B, Portfors CV. Subcortical pathways: Towards a better understanding of auditory disorders. Hear Res. 2018;362:48–60.
Rauschecker JP. Dual stream models of auditory vocal communication. In: The Oxford handbook of voice perception. Oxford: Oxford University Press; 2018. p. 413.
Ramezani M, Lotfi Y, Moossavi A, Bakhshi E. Auditory brainstem response to speech in children with high functional autism spectrum disorder. Neurol Sci. 2019;40:121–5.
Tecoulesco L, Skoe E, Naigles LR. Phonetic discrimination mediates the relationship between auditory brainstem response stability and syntactic performance. Brain Lang. 2020;208:104810.
Krizman J, Kraus N. Analyzing the FFR: a tutorial for decoding the richness of auditory function. Hear Res. 2019;382:107779.
Moon IJ, Hong SH. What is temporal fine structure and why is it important? Korean J Audiol. 2014;18:1–7.
Aristidou IL, Hohman MH. Central auditory processing disorder. In: StatPearls. Treasure Island: StatPearls Publishing; 2023.
de Wit E, van Dijk P, Hanekamp S, Visser-Bochane MI, Steenbergen B, van der Schans CP, et al. Same or different: the overlap between children with auditory processing disorders and children with other developmental disorders: a systematic review. Ear Hear. 2018;39:1–19.
Acknowledgements
We sincerely thank the children and their families for participating in this study. The study was conducted at the unique research facility “Center for Neurocognitive Research (MEG-Center)” of MSUPE.
Funding
The study was funded within the framework of the state assignment of the Ministry of Education of the Russian Federation (№ 073-00037-24-01).
Author information
Authors and Affiliations
Contributions
Fadeev K.A.: Formal analysis, Data recording, Data curation, Investigation, Visualization, Software, Writing - original draft, Writing - review & editing; Romero Reyes I.V.: Formal analysis, Data curation, Investigation, Visualization, Software, Writing - original draft, Writing - review & editing; Goiaeva D.E.: Data recording, Investigation, Project administration, Writing - review & editing; Obukhova T.S.: Data recording, Investigation, Project administration, Writing - review & editing; Ovsiannikova T.M.: Data recording, Investigation, Project administration, Writing - review & editing; Prokofyev A.O.: Data recording, Investigation, Writing - review & editing; Rytikova A.M.: Investigation, Writing - review & editing; Novikov A.Y.: Investigation, Writing - review & editing; Kozunov V.V.: Investigation, Writing - review & editing; Stroganova T.A.: Conceptualization, Methodology, Funding acquisition, Writing - original draft, Writing - review & editing; Orekhova E.V.: Conceptualization, Methodology, Formal analysis, Visualization, Funding acquisition, Writing - original draft, Writing - review & editing.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
The Ethical Committee of the Moscow State University of Psychology and Education approved this investigation. All children gave verbal consent to participate in the study and their caregivers gave written consent to participate.
Consent for publication
All children gave verbal consent to participate in the study and their caregivers gave written consent for publication of anonymized data.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Supplementary Material 2.
Supplementary Material 3.
Supplementary Material 4.
Supplementary Material 5.
Supplementary Material 6.
Supplementary Material 7.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Fadeev, K.A., Romero Reyes, I.V., Goiaeva, D.E. et al. Attenuated processing of vowels in the left temporal cortex predicts speech-in-noise perception deficit in children with autism. J Neurodevelop Disord 16, 67 (2024). https://doi.org/10.1186/s11689-024-09585-2
DOI: https://doi.org/10.1186/s11689-024-09585-2