A systematic review on diagnostic procedures for specific language impairment: The sensitivity and specificity issues
Toktam Maleki Shahmahmood1, Shohreh Jalaie2, Zahra Soleymani3, Fatemeh Haresabadi4, Parvin Nemati5
1 Department of Speech Therapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran; Department of Speech Therapy, Faculty of Paramedical Sciences, Mashhad University of Medical Sciences, Mashhad, Iran
2 Department of Physiotherapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran
3 Department of Speech Therapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran
4 Department of Speech Therapy, Faculty of Paramedical Sciences, Mashhad University of Medical Sciences, Mashhad, Iran
5 Department of Psychology, Tuebingen University, Tuebingen, Germany
|Date of Submission||10-Jan-2016|
|Date of Decision||04-Mar-2016|
|Date of Acceptance||25-May-2016|
|Date of Web Publication||01-Sep-2016|
Department of Physiotherapy, School of Rehabilitation, Tehran University of Medical Sciences, Pich e Shemiran, Enghelab Ave., Tehran, 1148965141
Source of Support: None, Conflict of Interest: None
Background: Identification of children with specific language impairment (SLI) has been viewed as both a necessity and a challenge. Investigators and clinicians use different tests and measures for this purpose. Some of these tests/measures have good psychometric properties, but this is not sufficient for diagnostic purposes. A diagnostic procedure can be used to identify a specific population with confidence only when its sensitivity and specificity are acceptable. In this study, we searched for tests/measures with predefined sensitivity and specificity for distinguishing preschool children with SLI from their typically developing peers. Materials and Methods: A computerized search of bibliographic databases from 2000 to August 2015 was performed with the following keywords: "specific language impairment" or "SLI" and "primary language impairment" or "PLI" with at least one of the following: "diagnosis," "identification," "accuracy," "sensitivity," and "specificity." In addition, the related citations and reference lists of the selected articles were considered. Results: A review of the 23 included studies shows that the index measures vary in accuracy, with sensitivity ranging from 16% to 100% and specificity ranging from 14% to 100%. Conclusion: This variation in the sensitivity and specificity of different tests/measures confirms the need to attend to the diagnostic power of tests/measures before they are used as diagnostic tools. Further, the results indicate that there are some promising tests/measures whose performance in the diagnosis of SLI in preschool-aged children is supported by the available evidence, yet a reference standard for the diagnosis of SLI is still missing from the literature.
Keywords: Accuracy, diagnosis, preschool age, sensitivity, specific language impairment, specificity
|How to cite this article:|
Shahmahmood TM, Jalaie S, Soleymani Z, Haresabadi F, Nemati P. A systematic review on diagnostic procedures for specific language impairment: The sensitivity and specificity issues. J Res Med Sci 2016;21:67
|How to cite this URL:|
Shahmahmood TM, Jalaie S, Soleymani Z, Haresabadi F, Nemati P. A systematic review on diagnostic procedures for specific language impairment: The sensitivity and specificity issues. J Res Med Sci [serial online] 2016 [cited 2019 Jan 22];21:67. Available from: http://www.jmsjournal.net/text.asp?2016/21/1/67/189648
Introduction
Specific language impairment (SLI) is a developmental language disorder that occurs in the absence of obvious accompanying conditions such as mental retardation, neurological damage, and hearing or emotional impairment. Epidemiological evidence suggests that SLI represents the largest segment of language impairments, estimated at roughly 7% of the general population. For most children with SLI, the central area of impairment is grammar; nevertheless, the symptoms of this condition are highly heterogeneous among affected children. Moreover, individuals with SLI often show sets of symptoms that are similar to and overlap with those of other disorders such as dyslexia or autism. Because of these heterogeneous and overlapping symptoms, distinguishing young children with SLI from normally developing children and from children with other language disorders is a challenge for both clinicians and researchers, but it is also a necessity. Applying accurate diagnostic tests/measures is the first step in treatment planning and in carrying out epidemiologic research.
Diagnosis of SLI depends on both exclusionary and inclusionary criteria. Exclusionary criteria help to rule out other confounding conditions, and inclusionary criteria confirm the presence of a language disorder in children. Despite general agreement about exclusionary criteria among clinicians and researchers, there is no consensus about inclusionary criteria. A brief review of the literature indicates that the criteria for selecting SLI subjects vary among studies, from scores on standardized tests to assessment of the child's language in naturalistic contexts. The diagnostic performance of these tests/measures is a critical issue. Although some of these tests/measures have good psychometric properties (including validity and reliability) and are even capable of showing group differences between language-impaired and typically developing (TD) children, these properties are not enough to conclude that a test/measure can be introduced as a diagnostic tool. The diagnostic performance of a test must be further explored at the individual level rather than the group level, which includes determining its sensitivity and specificity at predefined cutoff point(s).
The sensitivity of a test is the degree to which children who were previously classified as having SLI (using a reference test) are correctly identified as affected by the test, and specificity is the degree to which children who were independently classified as having normal development are identified as unaffected by the test. According to Plante and Vance, sensitivity and specificity values of ≥90% are considered good, values of 80-89% are considered adequate, and values below 80% are considered unacceptable.
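The definitions and rating bands above can be illustrated with a minimal sketch. The function names and the counts in the usage lines are hypothetical and are not taken from any included study; values between 89% and 90% are assumed here to fall in the "adequate" band, which the stated criteria do not spell out.

```python
def sensitivity_pct(true_pos, false_neg):
    """Percentage of reference-standard SLI children flagged as impaired by the index test."""
    return 100.0 * true_pos / (true_pos + false_neg)


def specificity_pct(true_neg, false_pos):
    """Percentage of reference-standard TD children classified as unaffected by the index test."""
    return 100.0 * true_neg / (true_neg + false_pos)


def plante_vance_rating(percent):
    """Rate a sensitivity or specificity value using the bands attributed to Plante and Vance.

    Assumption: anything from 80% up to (but not including) 90% is treated as adequate.
    """
    if percent >= 90:
        return "good"
    if percent >= 80:
        return "adequate"
    return "unacceptable"


# Hypothetical counts for a sample of 20 SLI and 20 TD children.
sens = sensitivity_pct(true_pos=18, false_neg=2)
spec = specificity_pct(true_neg=17, false_pos=3)
print(plante_vance_rating(sens), plante_vance_rating(spec))
```

Both values have to be rated separately, because a test can be good on one dimension and unacceptable on the other.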
The sensitivity and specificity of a test depend entirely on the cutoff score used to draw the line between normal and impaired individuals. To confirm the existence of a language disorder in a client, many clinicians use an arbitrary cutoff score (e.g., 1.5 or 2 SD below the mean) for any language test. However, there are now substantial data demonstrating that this practice may not lead to accurate diagnoses because children with impaired language frequently do not obtain scores that fall below these commonly applied cutoff scores. The cutoff score derived for one test can differ significantly from that of another test even when the tests were validated on the same sample of children.
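The dependence on the cutoff can be sketched as follows. The scores below are invented purely for illustration (standard scores on a scale assumed to have mean 100 and SD 15) and do not come from any reviewed study; `classify_at_cutoff` is a hypothetical helper name.

```python
def classify_at_cutoff(sli_scores, td_scores, cutoff):
    """Return (sensitivity %, specificity %) when scores below `cutoff` are flagged as impaired."""
    true_pos = sum(1 for s in sli_scores if s < cutoff)
    true_neg = sum(1 for s in td_scores if s >= cutoff)
    return 100.0 * true_pos / len(sli_scores), 100.0 * true_neg / len(td_scores)


# Hypothetical standard scores; group membership is assumed to come from a reference standard.
sli_scores = [72, 78, 81, 84, 86, 88, 90, 92, 95, 99]
td_scores = [85, 91, 94, 97, 100, 102, 105, 108, 112, 118]

# 70 and 77.5 correspond to 2 SD and 1.5 SD below a mean of 100 with SD 15.
for cutoff in (70, 77.5, 85, 93):
    sens, spec = classify_at_cutoff(sli_scores, td_scores, cutoff)
    print(f"cutoff={cutoff}: sensitivity={sens:.0f}%, specificity={spec:.0f}%")
```

In this made-up sample, the conventional 1.5 SD and 2 SD cutoffs miss most of the SLI group, while a higher, empirically chosen cutoff trades some specificity for much better sensitivity.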
The purpose of this study is to review accuracy studies published in the last 15 years (until August 2015) that focused on determining the sensitivity and specificity of specific language tests/measures as inclusionary criteria for distinguishing preschool monolingual children with SLI from TD children. Because the preschool period is the most important period in the diagnostic process of SLI, and given the long-standing nature and variable clinical manifestations of this disorder during development, only accuracy studies on the preschool period were selected for this review. Research on the sensitivity and specificity of language measures in languages other than English was also included to examine whether there are shared language behaviors with good diagnostic accuracy for the identification of children with SLI in various languages and whether they can be introduced as universal clinical markers for this disorder. The main intention of this review is to identify linguistic tests/measures with acceptable sensitivity and specificity, without necessarily making clinical recommendations for the use of a particular test. Moreover, no attempt has been made to summarize data from test manuals or to evaluate the validity or reliability of the diagnostic procedures.
Materials and methods
Literature search strategies
A systematic computerized search was conducted in electronic databases, including MEDLINE via PubMed, Google Scholar, and Web of Science, and in publisher databases (Springer, Oxford, Thieme, ProQuest, and ScienceDirect) from 2000 through July 30, 2015. For the electronic search, we used the following keywords or MeSH subject headings: "specific language impairment" or "SLI" and "primary language impairment" or "PLI" with at least one of the following: "diagnosis," "identification," "accuracy," "sensitivity," and "specificity"; we used identical search terms in all resources. Our search strategy also included tracking the reference lists of all retrieved articles and hand searching of books. Professionals and authors were contacted by e-mail for more information. Studies identified in these ways were also incorporated into the decision-making process.
Inclusion and exclusion criteria
Based on structured guidelines for systematic reviews, we reviewed the last 15 years (until August 2015) for accuracy studies on distinguishing preschool children with SLI from their TD peers that were published in English-language journals. Accuracy studies on distinguishing SLI from other language impairments and studies on adults, toddlers (under 3 years of age), school-aged children, or bilingual subjects with SLI were excluded.
Study selection and eligibility criteria
The literature retrieval process and the screening of articles are illustrated in detail in [Figure 1]. After screening titles and abstracts and applying the inclusion and exclusion criteria, 28 articles potentially relevant to our questions were selected and 261 articles were removed. Studies that could not be excluded with certainty were then examined in detail in full text. In cases of doubt, a second investigator was consulted. The quality of studies meeting the inclusion criteria was appraised with the Critical Appraisal Skills Programme (CASP) diagnostic test study form. This form involves 12 questions that consider the following three broad issues for appraising a diagnostic test study:
- Are the results of the study valid?
- What are the results?
- Will the results help me and my patient/population?
Nine questions on this form are answered on a three-level scale (yes/cannot tell/no), and three questions (7, 8, and 12) require a descriptive answer. The first two questions are "screening questions" and can be answered quickly. If the answer to either of them is "no" or "cannot tell," it is not worth continuing to the remaining questions. We appraised all articles potentially relevant to this systematic review against these criteria. Studies that received a "yes" answer to the first two questions of the appraisal form and, in total, more than eight "yes" answers (a "cannot tell" answer to two questions was tolerated), with reported or at least calculable results (question 7) and acceptable results (question 8), were categorized as studies of high or moderate quality and remained in the pool of articles. Five articles were excluded by this criterion, and 23 articles remained for this systematic review. These steps, which are shown in [Figure 1], were conducted by the first author and reviewed twice. The author tried to prevent any bias related to the professional field or reputation of the studies' authors and any selective reporting among studies.
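The retention rule described above can be written as a small decision function. This is only a sketch under stated assumptions: the text does not say how "yes" answers are tallied across the 12 questions, so the example assumes that every question, including the descriptive questions 7, 8, and 12, is recorded as "yes", "cannot tell", or "no". The function name and data layout are hypothetical.

```python
YES, CANNOT_TELL, NO = "yes", "cannot tell", "no"


def passes_appraisal(answers, min_yes=9, max_cannot_tell=2):
    """Decide whether a study stays in the pool.

    `answers` maps question number (1-12) to YES, CANNOT_TELL, or NO.
    Assumption: descriptive questions are coded as YES when the results are
    reported or calculable (question 7) and acceptable (question 8).
    """
    # Screening questions 1 and 2 must both be answered "yes".
    if answers.get(1) != YES or answers.get(2) != YES:
        return False
    # Results must be reported or calculable (7) and acceptable (8).
    if answers.get(7) != YES or answers.get(8) != YES:
        return False
    yes_count = sum(1 for a in answers.values() if a == YES)
    cannot_tell_count = sum(1 for a in answers.values() if a == CANNOT_TELL)
    # "More than eight" yes answers, with at most two "cannot tell" answers.
    return yes_count >= min_yes and cannot_tell_count <= max_cannot_tell


# Hypothetical appraisal: ten "yes" answers and two "cannot tell" answers.
example = {q: YES for q in range(1, 13)}
example[5] = CANNOT_TELL
example[10] = CANNOT_TELL
print(passes_appraisal(example))
```

Keeping the thresholds as parameters makes it easy to see how sensitive the retained pool is to the exact reading of the rule.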
Data extraction and abstraction
[Table 1], [Table 2], and [Table 3] show the original data extracted from the included studies. [Table 1] and [Table 2] describe the characteristics of the subjects, index measure(s), and reference standard used in each included study. The sensitivity and specificity of the index tests/measures used for distinguishing preschool children with SLI from their TD peers are shown in [Table 3]. Because of the heterogeneity of the tests/measures and differences in the way in which similar tests/measures were implemented in different groups of children, statistical pooling was not possible.
|Table 1: Characteristics of included studies on English-speaking subjects|
|Table 2: Characteristics of included studies on subjects speaking languages other than English|
|Table 3: Diagnostic performance of the various tests/language measures in included studies|
Results
Among the included studies, 12 were conducted on English- or American English-speaking populations and the remaining 11 on speakers of other languages: three on Cantonese-speaking and three on Italian-speaking children, and one on each of the French-, Spanish-, Slovakian-, Hebrew-, and Persian-speaking populations.
The sample sizes of the studies ranged from 29 to 454 children. Participants in all studies were children with SLI and their age-matched TD peers. In six studies, younger language-matched TD peers were also included. The numbers of subjects in the SLI and TD groups are not the same in nine studies; however, only in two studies (2 and 34) are these differences large. Gender was not a significant factor in the studies, and no separate analysis was done for girls and boys, although in some studies the SLI and TD control groups were matched for gender. All children with SLI in the studies were receiving clinical services for their problem or were eligible for enrollment in speech-language services.
To categorize children initially as impaired or normal and to determine case status, authors need a reference standard. The reviewed studies show considerable variety in how case status is defined. Expert judgment was the most popular reference standard among the included studies. In only two studies was a standardized test with predefined sensitivity and specificity in the diagnosis of SLI used for sample categorization. The reference tests used in the articles are listed in [Table 1] and [Table 2].
The index tests/measures vary among studies. Eight of the included studies used one or more standardized tests as the index measure; seven of these eight were carried out on English-speaking populations. Five studies focused on language measures elicited from spontaneous language, and nine articles concentrated on linguistic or processing features extracted via language probes. In one study (2), a combination of all these elicitation methods was used as the index. Moreover, in one study (32), indexes were extracted from spontaneous language and from experimental measures separately in order to compare these two methods of extraction.
The majority of the included studies compare the performance of two or more diagnostic procedures applied to a single population, which makes it easier to judge the relative value of different procedures/measures. Moreover, three of the included studies (21, 25, and 26) were conducted on two separate populations of children with SLI with the aim of increasing the reliability of the estimated sensitivity and specificity of the index tests/measures.
The sensitivity and specificity of behavioral psycholinguistic measures/tests for distinguishing preschool children with SLI from their TD peers are shown in [Table 3]. The sensitivities of the tests or linguistic/processing measures range from 16% to 100%, and the specificities range from 14% to 100%.
The cutoff scores used for the index tests are shown in [Table 3]. Of the 23 papers reviewed, eight do not report the cutoff score. Some authors used more than one cutoff point for one index test, but we report only the score that the authors defined as the optimum cutoff point.
Discussion
Tests with higher sensitivity and specificity can detect true positives and true negatives more reliably. Moreover, since there is no single widely accepted "reference standard" for subject identification in the field of SLI, introducing tests or measures with empirical evidence of acceptable sensitivity and specificity is of great importance because they can then be used as reference tests in future studies or clinical practice.
As shown in the Results section, the index tests used in the included studies can generally be divided into two main categories: standardized language tests, which target different areas of language, and psycholinguistic features elicited from the children's linguistic or processing system via speech sample analysis or psycholinguistic probes.
A survey of the included studies shows that far more research is available on the diagnostic accuracy of standardized tests in English than in any other language. Accordingly, most of the included studies carried out on other languages (including Italian, Cantonese, Slovakian, Spanish, and Persian) sought linguistic or processing measures that can be introduced as clinical markers for SLI. This tendency may be related to the extensive research on SLI in English since the concept first emerged, so that the linguistic characteristics and deficits of English-speaking children with this disorder are better described than those of children with SLI who speak other languages, in which this field of study is relatively new. The availability of various well-standardized language assessments for English-speaking populations could be another factor. However, English-language investigators have focused on the diagnostic performance of psycholinguistic markers for distinguishing preschool children with SLI from TD children as much as on standardized tests; moreover, among the included studies on subjects who speak languages other than English, two used standardized language or language processing tests, or subtests of them, as index tests.
According to Plante and Vance's (1994) criteria for acceptable sensitivity and specificity, among the English standardized tests used as index tests in the included studies, the Renfrew Bus Story had adequate sensitivity but weak specificity. Hence, its use for identifying preschool children with SLI can result in over-identification of TD children as having SLI. The Grammar and Phonology Screening, the Structured Photographic Expressive Language Test (SPELT)-P2, and the SPELT-3 are tests of grammatical production, and all of them have good sensitivity and specificity for the diagnosis of preschool children with SLI. Vocabulary tests, including the Peabody Picture Vocabulary Test (PPVT-III) and the PPVT-IV, had unacceptable sensitivity and specificity levels, which make them inappropriate tools for identifying children with SLI. These results are consistent with those of a previous study by Gray et al. on the diagnostic accuracy of four vocabulary tests (including the PPVT-III), which showed that none of the vocabulary tests is an accurate measure for differential diagnosis of preschool English-speaking children with SLI. It is notable that the PPVT-IV is the newest version of the PPVT and, according to Betz et al., is the third most commonly used norm-referenced test among clinicians for the diagnosis of children with SLI in the United States. However, the results of these two studies show not only that, despite the known vocabulary deficits of children with SLI, these children are unlikely to score low on these commonly used vocabulary tests, but also that the newer test version is not superior to the older one in the diagnostic process. Hence, these results again confirm the importance of investigating the diagnostic performance of every linguistic test before it is applied for diagnostic purposes.
It should be noted that, contrary to the results of these two English studies, Thordardottir et al.'s study on the diagnostic power of the Échelle de vocabulaire en images Peabody (EVIP), the French version of the PPVT, shows that this vocabulary test has acceptable sensitivity and specificity for distinguishing preschool French-speaking children with SLI from their TD peers. However, because of these inconsistencies between studies, clinicians should be cautious about using the EVIP as the only diagnostic tool for detecting French-speaking children with SLI, and ideally the results of Thordardottir et al. should be replicated in another independent sample of French-speaking children.
Over the last two decades, research has consistently shown that English-speaking children with SLI score significantly lower than their age-matched TD peers, and even lower than younger language-matched TD peers, on tests of working memory such as nonword repetition (NWR), digit recall, and sentence repetition (SR). On the basis of a twin study, Bishop proposed that NWR can serve as a phenotypic marker of heritable language impairment. Dollaghan and Campbell then suggested that tasks such as NWR may serve as a method of identifying children with language impairments. Subsequently, many studies were conducted to investigate this suggestion (e.g., Conti-Ramsden, 2003; Conti-Ramsden, Botting and Faragher, 2001; Archibald and Gathercole, 2007). The present review shows that NWR is one of the tasks that has received much attention in the included studies.
Most studies conducted in English that investigated the potential of NWR as a clinical marker for SLI used one of two tests: the Children's Test of Nonword Repetition (CNRep) and the Nonword Repetition Test (NRT) (these tests have been compared in detail elsewhere; see 46 for a review). The NRT was used as an index test in two of the included studies. The results of Deevy et al.'s study imply good sensitivity and adequate specificity for this test. In contrast, the results of Oetting and Cleveland demonstrate that the NRT alone could not be used as an accurate diagnostic tool because of low sensitivity, although its diagnostic power increases when it is combined with scores from one other nonbiased assessment (comprehension subtest VI of the Stanford-Binet). The causes of the difference between the results of these two studies are not clear, but it could be attributed to different cutoff points, differences in the reference standard employed, differences in the age, cognitive characteristics, and severity of impairment of the participants, and the sample sizes of the studies.
Conti-Ramsden and Conti-Ramsden and Hesketh used the CNRep in their studies to evaluate phonological working memory (pWM) in preschool children with SLI and to determine the CNRep's accuracy in distinguishing these children from their TD peers. As shown in [Table 3], in both studies at the optimum cutoff point, the specificity of the CNRep was fair but the sensitivity was low. Although the results of these two studies do not allow us to introduce the CNRep as an appropriate screening test to identify preschool children with SLI, Gray's study shows that the CNRep had excellent sensitivity and specificity. Interestingly, Gray's study also shows that while the CNRep can be used as a diagnostic tool for SLI, the digit span task cannot. Gray used the previous version of the CNRep in her study. Moreover, the reasons given above could contribute to the variability of the results obtained in these studies.
Although it has been proposed that children's performance on NWR tasks permits accurate classification of children with SLI and same-age peers even when the children speak a nonstandard dialect of American English, given these conflicting results, NWR by itself cannot be confidently introduced as an adequate measure for the diagnosis of English-speaking preschool children with SLI. More studies with larger samples therefore seem necessary. It is worth noting that the results of Stokes et al.'s study on the diagnostic power of NWR and SR as language processing markers for SLI in Cantonese show that, unlike English-speaking children, Cantonese-speaking preschool children with SLI do not score significantly lower than their age-matched peers on the NWR task. Moreover, although SR was able to show group differences, at the individual level this task has good specificity but unacceptable sensitivity. Hence, the results of Stokes et al.'s study suggest either that there may be no limitation in pWM in Cantonese-speaking children with SLI or that the tasks used may require and assess skills other than pWM. Stokes et al. proposed that the poorer NWR capacity of English-speaking children with SLI might be related to weaker use of the redintegration strategy in word repetition. These results imply that the clinical accuracy of NWR tasks may be related not only to individual differences in language use and exposure but also to the language(s) tested. Hence, further cross-linguistic investigations of language processing strategies, including NWR, are clearly required.
Thordardottir et al. used tests of NWR, SR, following directions, and digit span as linguistic processing markers for differential diagnosis of 5-year-old French-speaking children with SLI. Their results show that, although the digit span test is not sensitive enough to detect French-speaking children with SLI, the diagnostic power of the other tests is adequate.
Kapalková et al. developed a fast and easily administered NWR task and determined the performance of this task and its different scoring methods in distinguishing between Slovak-speaking children with SLI and TD children. The NWR task used in their study differs from English NWR tasks in the number of items per length and in scoring methods. As can be seen in [Table 3], the whole-item scoring method (number of correctly repeated consonants) has good sensitivity and specificity, but the diagnostic performance of the vowel scoring method (number of correctly repeated vowels in addition to consonants) is not acceptable because of its sensitivity of 75%. Archibald and Gathercole and Kapalková et al. found that children repeat high word-like nonwords better than low word-like ones; this finding implies the influence of accumulated language knowledge on item repetition performance. The results of Dispaldro et al.'s study on real word repetition and NWR in normally developing children also confirm this finding. An interesting result of Dispaldro et al.'s study was the strength of real word repetition, rather than NWR, in predicting children's grammatical ability. Hence, Dispaldro et al. appraised the diagnostic performance of both real word repetition and NWR in differentiating between Italian-speaking children with and without SLI and raised the question of whether real word repetition could be as effective as NWR as a clinical marker for Italian-speaking children with SLI. The high diagnostic value of NWR for the identification of Italian preschool children with SLI had been noted in a previous study by Bortolini et al. As can be seen in [Table 3], both nonwords and real words show good to excellent sensitivity and specificity with both scoring methods.
Besides NWR and the other language processing measures, some linguistic features have also been examined and introduced by researchers as potential clinical markers for SLI in a variety of languages. For example, Rice and Wexler suggested that certain aspects of verb morphology, such as tense marking, are especially difficult for children with SLI and may constitute a clinical marker that can improve the identification of SLI.
Of the 23 included studies, 12 evaluated the diagnostic performance of linguistic measures elicited from spontaneous speech samples or linguistic probes as potential clinical markers. Among them, four studies were performed on English, three on Italian, two on Chinese (Cantonese), and one on each of French, Spanish, and Persian samples. Because the linguistic characteristics of different languages differ in many ways, the results of these studies cannot be combined.
To find potential clinical markers for SLI, researchers have chiefly focused on linguistic features that previous studies have consistently shown to be problematic for this group of language-impaired children, such as many tense/agreement morphemes in English. These morphemes include third-person singular -s, past tense -ed, copula and auxiliary forms of be (is, are, am), and auxiliary do (do, did, and does). Among the included English studies, two (23 and 25) investigated the diagnostic value of specific combinations of these morphemes extracted from spontaneous speech.
Gladfelter and Leonard evaluated the diagnostic accuracy of two composite measures of tense/agreement from spontaneous speech (the tense marker total and the productivity score developed by Hadley and Short, 2005) alongside the more traditional finite verb morphology composite (FVMC) adapted from Leonard et al. to determine whether these new composite measures could serve as better identifiers of children with SLI. The main difference between these measures is in the number of obligatory contexts found for each morpheme. The FVMC is the total number of tense/agreement morphemes actually produced divided by the number of obligatory contexts for all of these morphemes. In contrast, Hadley and Short's measures of spontaneous tense/agreement morpheme use emphasize the diversity of contexts in which these morphemes are used, by scoring diverse uses and excluding contexts that are often associated with nonanalyzed productions.
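The FVMC calculation described above can be sketched as a pooled proportion. The morpheme labels and counts below are hypothetical and are not taken from Gladfelter and Leonard or any other included study; the sketch also assumes that "actually produced" means produced in an obligatory context, and it expresses the result as a percentage.

```python
def fvmc_percent(morpheme_counts):
    """Pooled composite: morphemes produced divided by obligatory contexts, as a percentage.

    `morpheme_counts` maps a morpheme label to (produced, obligatory_contexts).
    Counts are pooled across morphemes before dividing, so morphemes with more
    obligatory contexts carry more weight.
    """
    total_produced = sum(produced for produced, _ in morpheme_counts.values())
    total_contexts = sum(contexts for _, contexts in morpheme_counts.values())
    if total_contexts == 0:
        raise ValueError("no obligatory contexts in the sample")
    return 100.0 * total_produced / total_contexts


# Hypothetical counts from one child's spontaneous speech sample.
sample = {
    "third-person singular -s": (6, 10),
    "past tense -ed": (4, 8),
    "copula be": (9, 12),
    "auxiliary be": (5, 10),
}
print(fvmc_percent(sample))
```

Because the counts are pooled rather than averaged per morpheme, a child who rarely attempts one morpheme is not penalized heavily for it, which is one way such a composite differs from measures that score the diversity of use.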
Bedore and Leonard's study on the diagnostic performance of the FVMC showed that this measure has acceptable sensitivity and specificity. The results of Gladfelter and Leonard show that, like the FVMC, these newly introduced measures seem largely successful in distinguishing 4- and 5-year-old children with SLI from their TD age-mates, but their diagnostic power does not exceed that of the FVMC. Furthermore, their results imply that the combination of the FVMC and the measures of Hadley and Short would seem to be most informative.
Souto et al. studied the diagnostic value of measures of the global and developmental level of a child's tense/agreement morpheme use. Their results show that, although the diagnostic values of the two types of measures that provide developmental levels of tense/agreement morpheme use are not satisfactory, the diagnostic accuracy of the traditional FVMC, which involves a smaller collection of tense/agreement morphemes but treats all of them equally, can be considered acceptable. Furthermore, their results show that, among the other global measures of grammatical accuracy studied (sentence point and the overall developmental sentence score [DSS]), sentence point could be introduced as a suitable tool for identifying 4- and 5-year-old children with SLI, but the diagnostic accuracy of the overall DSS is not acceptable. Hence, the results of this study indicate that different grammatical measures do not yield equivalent results for children with SLI.
The results of the Conti-Ramsden and the Conti-Ramsden and Hesketh studies on the diagnostic performance of grammatical marking (including tense marking and plural marking elicited via language probes) in identifying preschool children with SLI show that neither past tense marking nor noun plural marking has acceptable diagnostic accuracy. Furthermore, the results of Conti-Ramsden (2003) show that, although the combination of the past tense task and the CNRep could serve as a diagnostic tool for preschool children with SLI, neither of them has acceptable sensitivity on its own.
As mentioned previously, the most important traits of variables that can be labeled clinical markers are low within-group variability in the performance of children with SLI as a group and the total absence of overlap between the scores of the SLI and TD groups on these measures. However, the results of the included studies on the diagnostic performance of potential clinical markers for SLI in English again confirm the previously suggested idea that many measures that yield significant group differences do not necessarily meet the higher standard of reliably identifying language-impaired children individually.
Studies of the diagnostic accuracy of linguistic indexes in distinguishing Italian children with SLI from typically developing children have mainly focused on articles, clitics, and third-person plural inflections, separately or jointly, as the more problematic aspects of language for Italian-speaking children with SLI. [Table 3] provides a summary of the results. The results show some disagreement between the findings of Bortolini et al. (2002) and Bortolini et al. (2006). For example, third-person plural inflections, when considered alone, have acceptable diagnostic performance in one study but do not have sufficiently high sensitivity in the other, although specificity is quite good. Because the two studies were carried out under the same conditions and their subjects were similar in age and severity of language impairment, Bortolini et al. noted that the origin of these inconsistencies is not clear, but differences in IQ level may be a determining factor. Clitics have acceptable sensitivity and specificity in both studies. One outcome that can be seen in [Table 3] is the improvement of diagnostic accuracy when two or more measures are considered together; in other words, the values improved or remained essentially unchanged (did not become poorer) when the measures were used jointly. These results therefore suggest the value of considering measures together.
Fair to good discriminant accuracy has also been reported for grammatical markers in one study carried out in another Romance language, Spanish. This study was concerned with the utility of tense as a clinical marker of SLI, and the authors used two different methods of data extraction: experimental methods (an elicited production task and a grammaticality choice task) and spontaneous speech sample analysis (to extract six distinct indices of grammatical development). Their results show that Spanish-speaking children with SLI have problems with tense and that tense marking could be introduced as a potential clinical marker for SLI. Moreover, their results indicate that the elicited production test offers the best balance between sensitivity and specificity. Furthermore, some combined functions of experimental and spontaneous measures, such as mean length of utterance in morphemes (MLU-m) + grammaticality choice task or elicited production task + mean length of terminable unit, have good diagnostic performance.
Gross indexes from spontaneous speech (including MLU in words and MLU-m) did not achieve acceptable discriminant accuracy in Thordardottir et al.'s study of children speaking another Romance language, French. This study also examined the diagnostic power of a range of French standardized measures of language (including receptive vocabulary [EVIP], receptive grammar [Test for Auditory Comprehension of Language], and narrative production [Edmonton Narrative Norms Instrument]) and of language processing. As can be seen in [Table 3], with the exception of the narrative production indexes, the standardized measures of language and language processing provide accurate diagnostic tools for SLI in French.
Of the three included studies on Cantonese, those of Klee et al. and Wong et al. used as the index measure a composite variable made up of MLU, lexical diversity (D), and age. The results of Klee et al. show that this composite variable has excellent discriminative potential. Despite this good diagnostic performance, because the confidence intervals for sensitivity and specificity were wide, due in part to the sample size, Klee et al. cautioned that the accuracy of this measure must be re-examined in another independent sample of Cantonese-speaking children before it is recommended for clinical use. The aim of Wong et al.'s study was to replicate Klee et al.'s study in a second, independent sample of Cantonese-speaking children with and without SLI. Unlike the original study, Wong et al. found that this measure cannot be used as an accurate instrument for the diagnosis of SLI because neither the sensitivity nor the specificity value was acceptable. Hence, in light of these two studies, to be confident of the clinical usefulness of a diagnostic test or measure, it is helpful or even necessary to evaluate its diagnostic values in different studies on the target population.
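The effect of sample size on the confidence intervals for sensitivity and specificity can be illustrated with a short calculation. The following is a minimal sketch only: it assumes the Wilson score method for a binomial proportion (the reviewed studies may have used a different interval), and all counts are hypothetical rather than taken from Klee et al. or Wong et al. The function names are illustrative.

```python
import math


def sens_spec(tp, fn, tn, fp):
    """Sensitivity and specificity from a 2x2 table of counts."""
    sensitivity = tp / (tp + fn)   # impaired children correctly identified
    specificity = tn / (tn + fp)   # unimpaired children correctly identified
    return sensitivity, specificity


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts: the same observed sensitivity (about 93%)
# in a small sample and in a sample ten times larger.
for hits, group_size in [(14, 15), (140, 150)]:
    low, high = wilson_interval(hits, group_size)
    print(f"n={group_size}: sensitivity={hits / group_size:.2f}, "
          f"95% CI {low:.2f} to {high:.2f}")
```

With the smaller group the interval is several times wider, so a point estimate that looks excellent remains compatible with a much lower true value. This is one reason replication in an independent sample is informative.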
Finally, one of the included studies concerns the performance of language measures derived from play-based, conversational language samples in the diagnosis of Persian-speaking preschool children with SLI. The results of this study show that although the majority of measures extracted from the language samples were capable of differentiating children with and without SLI at the group level, only three of these measures exhibited good diagnostic performance at the individual level [Table 3].
Conclusion
The results of this study demonstrate that any test/measure that initially shows acceptable diagnostic power should subsequently be put to the test of replication in other accuracy studies on different samples. Among the included studies, only a few compared a single diagnostic measure across different samples. Moreover, few studies compared the performance of more than one diagnostic test/measure on a single sample of children. If more than one test is administered to the same population, comparative information can be obtained and the relative performance of the tests can be described. Hence, an important outcome of this study is the value of considering measures together to improve diagnostic accuracy.
In addition, the results particularly encourage cross-linguistic research. Tests that have been standardized on a specific population are not suitable for other populations, and specific linguistic or even processing measures are not applicable as diagnostic markers across different languages.
The results of this review also reveal that standardized tests vary in how sensitive they are to language impairment and that no single cutoff point is appropriate across tests. Notably, in most studies, the empirically derived cutoff score that provides the highest discriminative capacity is not the same as the statistically estimated cutoff point.
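The difference between an empirically derived cutoff and a conventional, statistically defined one can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any reviewed study: the cutoff is chosen by maximizing Youden's index (sensitivity + specificity - 1), the scores are hypothetical standard scores, and the fixed cutoff of 85 (1 SD below a mean of 100) is used only as an example of a statistically defined criterion.

```python
def sens_spec_at(cutoff, sli_scores, td_scores):
    """Sensitivity and specificity when score <= cutoff is called impaired."""
    sens = sum(s <= cutoff for s in sli_scores) / len(sli_scores)
    spec = sum(s > cutoff for s in td_scores) / len(td_scores)
    return sens, spec


def best_cutoff(sli_scores, td_scores):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's index.

    Every observed score is tried as a candidate cutoff; on ties the
    lowest candidate is kept.
    """
    best = None
    for c in sorted(set(sli_scores) | set(td_scores)):
        sens, spec = sens_spec_at(c, sli_scores, td_scores)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    _, cutoff, sens, spec = best
    return cutoff, sens, spec


# Hypothetical standard scores for the two groups (not real data).
sli = [70, 74, 78, 82, 85, 88, 90, 92]
td = [86, 91, 95, 98, 100, 104, 108, 112]

print("fixed cutoff 85:", sens_spec_at(85, sli, td))
print("empirical cutoff:", best_cutoff(sli, td))
```

In this made-up sample the fixed cutoff misses several children with SLI, while the empirically chosen cutoff trades a little specificity for higher sensitivity. A cutoff derived this way is tuned to one sample and would itself need checking in an independent sample.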
The final point that must be emphasized is the composition of the study samples. In all of the included studies, the number of subjects with SLI is nearly equal to the number of typically developing subjects, and the SLI group was mostly drawn from a clinically referred sample. Hence, the obtained values are not necessarily generalizable to the general population of preschool-aged children, where the prevalence of SLI is nearly 7%.
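One way to see why sample composition matters is to compute predictive values at different prevalences. The sketch below applies Bayes' theorem; the sensitivity and specificity of 0.90 are hypothetical, and only the 7% prevalence comes from the text. It does not model other effects of clinically referred samples, such as differences in severity.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must be strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive value undefined for these inputs")
    ppv = true_pos / (true_pos + false_pos)  # P(impaired | positive result)
    npv = true_neg / (true_neg + false_neg)  # P(unimpaired | negative result)
    return ppv, npv


# Hypothetical test accuracy; 0.50 mimics a balanced study sample,
# 0.07 is the population prevalence of SLI cited in the text.
for prevalence in (0.50, 0.07):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f}: PPV={ppv:.2f}, NPV={npv:.2f}")
```

Under these assumptions, a positive result is correct about 90% of the time in a balanced sample but only about 40% of the time at 7% prevalence, even though sensitivity and specificity are unchanged.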
Financial support and sponsorship
None.
Conflicts of interest
There are no conflicts of interest.
Authors' Contribution
- TM contributed to the conception and design of the work, conducting the study, drafting and revising the draft, and approval of the final version of the manuscript, and agreed to all aspects of the work
- ShJ contributed to the conception and design of the work, conducting the study, revising the draft, and approval of the final version of the manuscript, and agreed to all aspects of the work
- FH contributed to the conception of the work, revising the draft, and approval of the final version of the manuscript, and agreed to all aspects of the work
- ZS contributed to the conception of the work, revising the draft, and approval of the final version of the manuscript, and agreed to all aspects of the work
- PN contributed to the conception of the work, revising the draft, and approval of the final version of the manuscript, and agreed to all aspects of the work.
References
Leonard LB. Children with Specific Language Impairment. 1st ed. Massachusetts: MIT Press; 2000. p. 3-25.
Thordardottir E, Kehayia E, Mazer B, Lessard N, Majnemer A, Sutton A, et al. Sensitivity and specificity of French language and processing measures for the identification of primary language impairment at age 5. J Speech Lang Hear Res 2011;54:580-97.
Tomblin JB, Records NL, Buckwalter P, Zhang X, Smith E, O'Brien M. Prevalence of specific language impairment in kindergarten children. J Speech Lang Hear Res 1997;40:1245-60.
Maleki Shahmahmood T, Soleymani Z, Jalaei S. A comparison study in test of language development (TOLD) and speech samples between children with specific language impairment and their MLU matched group. Mod Rehabil 2009;2:25-33.
Maleki Shahmahmood T, Soleymani Z, Faghihzade S. The study of language performances of Persian children with specific language impairment. Audiology 2011;20:11-21.
van der Lely HK, Payne E, McClelland A. An investigation to validate the grammar and phonology screening (GAPS) test to identify children with specific language impairment. PLoS One 2011;6:e22432.
van der Lely HK, Marshall CR. Assessing component language deficits in the early detection of reading difficulty risk. J Learn Disabil 2010;43:357-68.
Maleki Shahmahmood T, Nakhostin Ansari N, Soleymani Z. Methods for identification of specific language impairment. Audiology 2014;23:1-18.
Tomblin JB, Records NL, Zhang X. A system for the diagnosis of specific language impairment in kindergarten children. J Speech Hear Res 1996;39:1284-94.
Spaulding TJ, Plante E, Farinella KA. Eligibility criteria for language impairment: Is the low end of normal always appropriate? Lang Speech Hear Serv Sch 2006;37:61-72.
Sackett DL, Haynes RB. Evidence base of clinical diagnosis; the architecture of diagnostic research. Br Med J 2002;324:539-41.
Kazemi Y, Stringer H, Klee T. Study of child language development and disorders in Iran: A systematic review of the literature. J Res Med Sci 2015;20:66-77.
Plante E, Vance R. Selection of preschool language tests: A data-based approach. Lang Speech Hear Serv Sch 1994;25:15-24.
Jaeschke R, Guyatt G, Sackett DL. Users' guides to the medical literature. III. How to use an article about a diagnostic test. A. Are the results of the study valid? Evidence-Based Medicine Working Group. JAMA 1994;271:389-91.
Conti-Ramsden G. Processing and linguistic markers in young children with specific language impairment (SLI). J Speech Lang Hear Res 2003;46:1029-37.
Conti-Ramsden G, Hesketh A. Risk markers for SLI: A study of young language-learning children. Int J Lang Commun Disord 2003;38:251-63.
Gray S. Diagnostic accuracy and test-retest reliability of non-word repetition and digit span tasks administered to preschool children with specific language impairment. J Commun Disord 2003;36:129-51.
Perona K, Plante E, Vance R. Diagnostic accuracy of the structured photographic expressive language test: Third edition (SPELT-3). Lang Speech Hear Serv Sch 2005;36:103-15.
Oetting JB, Cleveland LH. The clinical utility of nonword repetition for children living in the rural south of the US. Clin Linguist Phon 2006;20:553-61.
Pankratz ME, Plante E, Vance R, Insalaco DM. The diagnostic and predictive validity of the Renfrew Bus Story. Lang Speech Hear Serv Sch 2007;38:390-9.
Greenslade KJ, Plante E, Vance R. The diagnostic accuracy and construct validity of the structured photographic expressive language test - preschool: Second edition. Lang Speech Hear Serv Sch 2009;40:150-60.
Deevy P, Weil LW, Leonard LB, Goffman L. Extending use of the NRT to preschool-age children with and without specific language impairment. Lang Speech Hear Serv Sch 2010;41:277-88.
Gladfelter A, Leonard LB. Alternative tense and agreement morpheme measures for assessing grammatical deficits during the preschool period. J Speech Lang Hear Res 2013;56:542-52.
Spaulding TJ, Hosmer S, Schechtman C. Investigating the interchangeability and diagnostic utility of the PPVT-III and PPVT-IV for children with and without SLI. Int J Speech Lang Pathol 2013;15:453-62.
Souto SM, Leonard LB, Deevy P. Identifying risk for specific language impairment with narrow and global measures of grammar. Clin Linguist Phon 2014;28:741-56.
Bortolini U, Caselli MC, Deevy P, Leonard LB. Specific language impairment in Italian: The first steps in the search for a clinical marker. Int J Lang Commun Disord 2002;37:77-93.
Klee T, Stokes SF, Wong AM, Fletcher P, Gavin WJ. Utterance length and lexical diversity in Cantonese-speaking children with and without specific language impairment. J Speech Lang Hear Res 2004;47:1396-410.
Bortolini U, Arfé B, Caselli CM, Degasperi L, Deevy P, Leonard LB. Clinical markers for specific language impairment in Italian: The contribution of clitics and non-word repetition. Int J Lang Commun Disord 2006;41:695-712.
Stokes SF, Wong AM, Fletcher P, Leonard LB. Nonword repetition and sentence repetition as clinical markers of specific language impairment: The case of Cantonese. J Speech Lang Hear Res 2006;49:219-36.
Wong AM, Klee T, Stokes SF, Fletcher P, Leonard LB. Differentiating Cantonese-speaking preschool children with and without SLI using MLU and lexical diversity (D). J Speech Lang Hear Res 2010;53:794-9.
Dispaldro M, Leonard LB, Deevy P. Real-word and nonword repetition in Italian-speaking children with specific language impairment: A study of diagnostic accuracy. J Speech Lang Hear Res 2013;56:323-36.
Grinstead J, Baron A, Vega-Mendoza M, De la Mora J, Cantú-Sánchez M, Flores B. Tense marking and spontaneous speech measures in Spanish specific language impairment: A discriminant function analysis. J Speech Lang Hear Res 2013;56:352-63.
Kapalková S, Polišenská K, Vicenová Z. Non-word repetition performance in Slovak-speaking children with and without SLI: Novel scoring methods. Int J Lang Commun Disord 2013;48:78-89.
Katzenberger I, Meilijson S. Hebrew language assessment measure for preschool children: A comparison between typically developing children and children with specific language impairment. Lang Test 2014;31:19-38.
Kazemi Y, Klee T, Stringer H. Diagnostic accuracy of language sample measures with Persian-speaking preschool children. Clin Linguist Phon 2015;29:304-18.
Merrell AW, Plante E. Norm-referenced test interpretation in the diagnostic process. Lang Speech Hear Serv Sch 1997;28:50-8.
Crestani AH, Oliveira LD, Vendruscolo JF, Ramos-Souza AP. Specific language impairment: The relevance of the initial diagnosis. Rev CEFAC 2012;15:228-36.
Gray S, Plante E, Vance R, Henrichsen M. The diagnostic accuracy of four vocabulary tests administered to preschool-age children. Lang Speech Hear Serv Sch 1999;30:196-206.
Betz SK, Eickhoff JR, Sullivan SF. Factors influencing the selection of standardized tests for the diagnosis of specific language impairment. Lang Speech Hear Serv Sch 2013;44:133-46.
Dollaghan C, Campbell TF. Nonword repetition and child language impairment. J Speech Lang Hear Res 1998;41:1136-46.
Gathercole SE, Willis CS, Baddeley AD, Emslie H. The children's test of nonword repetition: A test of phonological working memory. Memory 1994;2:103-27.
Montgomery JW. Working memory and comprehension in children with specific language impairment: What we know so far. J Commun Disord 2003;36:221-31.
Montgomery JW, Magimairaj BM, Finney MC. Working memory and specific language impairment: An update on the relation and perspectives on assessment and treatment. Am J Speech Lang Pathol 2010;19:78-94.
Archibald LM, Joanisse MF. On the sensitivity and specificity of non-word repetition and sentence recall to language and memory impairments in children. J Speech Lang Hear Res 2009;52:899-914.
Conti-Ramsden G, Botting N, Faragher B. Psycholinguistic markers for specific language impairment (SLI). J Child Psychol Psychiatry 2001;42:741-8.
Archibald LM, Gathercole SE. Nonword repetition in specific language impairment: More than a phonological short-term memory deficit. Psychon Bull Rev 2007;14:919-24.
Campbell T, Dollaghan C, Needleman H, Janosky J. Reducing bias in language assessment: Processing-dependent measures. J Speech Lang Hear Res 1997;40:519-25.
Dispaldro M, Deevy P, Altoé G, Benelli B, Leonard LB. A cross-linguistic study of real-word and non-word repetition as predictors of grammatical competence in children with typical language development. Int J Lang Commun Disord 2011;46:564-78.
Rice ML, Wexler K. Toward tense as a clinical marker of specific language impairment in English-speaking children. J Speech Hear Res 1996;39:1239-57.
Simon-Cereijido G, Gutierrez-Clellen VF. Spontaneous language markers of Spanish language impairment. Appl Psycholinguist 2007;28:317-39.
Hadley PA, Short H. The onset of tense marking in children at risk for specific language impairment. J Speech Lang Hear Res 2005;48:1344-62.
Leonard LB, Miller C, Gerber E. Grammatical morphology and the lexicon in children with specific language impairment. J Speech Lang Hear Res 1999;42:678-89.
Bedore LM, Leonard LB. Specific language impairment and grammatical morphology: A discriminant function analysis. J Speech Lang Hear Res 1998;41:1185-92.
[Table 1], [Table 2], [Table 3]