Evidence of content validity and internal consistency for a reading
comprehension questionnaire administered to 10th-grade students
Genny Maritza Reyes Pihuave*
José Luis Rodríguez Flores*
Carlos Fernando Ayala Quinto*
Luis Arturo Escobar Moreno*



Introduction
Reading comprehension
is a core skill for academic learning, as it goes beyond the decoding of
written symbols to involve constructing meaning, identifying explicit
information, making inferences, and forming judgments about texts. Its
development has a direct impact on academic performance, participation in
school life, and access to knowledge across various subject areas.
International organizations have warned of the decline in reading skills
following the pandemic and the need to strengthen evidence-based educational
responses (UNESCO, 2021). Along the same lines, Duke et al. (2021) argue that
reading comprehension should be understood as a multidimensional process that
requires explicit instruction and rigorous assessment.
From a
curricular perspective, reading comprehension is not a homogeneous skill but a
progressive competency. In the Ecuadorian curriculum, reading performance is
typically organized into literal, inferential, and critical levels, allowing
for assessment ranging from the recognition of explicit information to the
formulation of interpretations and reasoned judgments (Ministry of Education of
Ecuador, 2016). Consequently, the quality of assessment depends not only on the
pedagogical use of the instruments but also on the extent to which they are
conceptually coherent and technically sound.
Recent
research has emphasized that instruments designed to assess reading
comprehension must provide explicit evidence of validity and reliability.
Cervantes Buenfil and Canto Herrera (2025), in a systematic review of validated
instruments for reading comprehension, found that psychometric support is a
recurring requirement for validating the interpretation of scores and the
educational decisions derived from them. Similarly, Trimiño-Pérez et al. (2024)
note that many available reading tests have weaknesses in their theoretical
foundation or validation procedures, which limits their representational
accuracy.
Another
relevant aspect relates to the format of the items. Much of the recent
literature continues to favor multiple-choice instruments, while open-ended
formats remain less common, despite their potential to capture more complex
comprehension processes. Cervantes Buenfil and Canto Herrera (2025) documented
the predominance of closed-ended response measures in this field. However,
evidence also shows that open-ended formats can achieve adequate technical
quality when constructed based on a clear conceptual structure and accompanied
by appropriate scoring procedures. Çelikgün et al. (2026), for example,
reported favorable psychometric indicators for a reading comprehension and
fluency assessment tool with open-ended questions. Furthermore, Menzala Peralta
et al. (2024) argue that analytical rubrics improve the objectivity, feedback,
and equity of assessment, which supports the use of open-ended items when
seeking to examine complex reading performance.
In this
context, the doctoral research from which this article derives designed a
reading comprehension questionnaire with open-ended questions and an analytical
rubric, organized around the literal, inferential, and critical levels. The
instrument was aimed at tenth-grade students in upper secondary education and
was administered in a real-world school setting. Although the original study
examined the effect of an educational intervention, this manuscript shifts the
focus to the assessment instrument itself. This decision is relevant because,
as Sedlmayr and Weissenbacher (2025) note, the interpretation of reading scores
depends not only on the construct but also on the conditions under which
comprehension is measured.
Therefore, the
purpose of this study was to determine the evidence of content validity and
internal consistency of a reading comprehension questionnaire among tenth-grade
students in upper basic education. Consistent with this, the guiding
question was as follows: What evidence of content validity and internal
consistency does a questionnaire present for assessing reading comprehension at
the literal, inferential, and critical levels among tenth-grade students in
upper basic education?
Materials and methods
The study employed a
quantitative approach and adopted an instrumental design with a descriptive
scope, aimed at gathering evidence of content validity and internal consistency
for a reading comprehension questionnaire. Although the instrument was originally
used in a quasi-experimental study with pretest and posttest measurements, the
focus of this article is not on the effectiveness of the educational
intervention, but rather on the technical quality and empirical performance of
the assessment instrument.
The research
context was an educational institution in Guayaquil, Ecuador. In the original
doctoral study, the population consisted of 152 students enrolled in the tenth
grade of upper basic education, distributed across four classes. The sample
comprised 76 students selected through non-probabilistic purposive sampling and
organized into two groups of 38 participants each. For the specific purposes of
this article, psychometric evidence for the instrument was obtained from three
sources: the evaluation by five expert judges, a pilot test with 30 students,
and the empirical administration of the questionnaire to the total sample at
two measurement points.
The instrument
consisted of a reading comprehension questionnaire comprising 20 open-ended
questions. Responses were scored using an analytical rubric. The questionnaire
was structured into three dimensions. The literal dimension included indicators
related to text comprehension, the identification of explicit information, and
the communication of information. The inferential dimension addressed
inferential comprehension, logical deduction, and the interpretation of the
meaning of words and phrases. The critical dimension included the evaluation of
information, the analysis of arguments and opinions, and the formulation of
judgments. The scoring system was organized with interpretation ranges both
overall and by dimension. At the general level, scores between 20 and 40
indicated low performance, between 41 and 60 indicated average performance, and
between 61 and 80 indicated high performance.
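To make the scoring logic concrete, the sketch below maps a total questionnaire score to its reported interpretation band. The assumption that each of the 20 open-ended items is scored from 1 to 4 with the analytical rubric is ours, inferred from the 20–80 range rather than stated in the source, and the function name is hypothetical.

```python
# Minimal sketch of the questionnaire's overall interpretation ranges.
# Assumption (ours): each of the 20 open-ended items is scored 1-4 with
# the analytical rubric, so totals fall between 20 and 80.

def overall_level(total_score: int) -> str:
    """Map a total score (20-80) to the reported performance band."""
    if not 20 <= total_score <= 80:
        raise ValueError("total score must lie between 20 and 80")
    if total_score <= 40:
        return "low"        # 20-40: low performance
    if total_score <= 60:
        return "average"    # 41-60: average performance
    return "high"           # 61-80: high performance

print(overall_level(55))  # -> average
```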
Content
validity was established through expert judgment. Five specialists with
doctoral degrees in education evaluated the questionnaire and rubric based on
four criteria: adequacy, clarity, coherence, and relevance. The values obtained
ranged from 0.92 to 1.00, and the overall coefficient estimated using Aiken’s V
was 0.96, indicating high agreement regarding the appropriateness of the
content. However, some validation sheets included specific comments on the
congruence of certain items with the corresponding indicators or dimensions, an
aspect that should be understood as part of the normal technical review process
for an educational instrument.
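For readers unfamiliar with the coefficient, Aiken's V for a single item rated by n judges on a c-point scale takes the standard form below. The worked example assumes a 4-point rating scale, which the source does not specify; it is an illustration, not the study's actual computation.

```latex
% Aiken's V for one item: n judges, c rating categories,
% S = sum over judges of (assigned rating - lowest possible rating).
\[
V = \frac{S}{n\,(c - 1)}
\]
% Illustration under an assumed 4-point scale with five judges,
% four rating the item 4 and one rating it 3:
\[
V = \frac{4(4-1) + 1(3-1)}{5\,(4-1)} = \frac{14}{15} \approx 0.93
\]
```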
Internal
consistency was estimated using a pilot test administered to 30 students.
Reliability was calculated using Cronbach’s alpha and McDonald’s omega. The
coefficients obtained were 0.906 and 0.911, respectively, which supports an
adequate degree of homogeneity among the questionnaire items.
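As a sketch of how these two coefficients are obtained (the simulated data and function names below are hypothetical, and in practice the one-factor loadings used for omega would come from a fitted factor model rather than be assumed):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def mcdonald_omega(loadings: np.ndarray) -> float:
    """Omega total from standardized one-factor loadings:
    (sum of loadings)^2 / ((sum of loadings)^2 + sum of uniquenesses)."""
    common = loadings.sum() ** 2
    unique = (1.0 - loadings**2).sum()
    return common / (common + unique)

# Hypothetical pilot-sized data: 30 students x 20 items scored 1-4.
rng = np.random.default_rng(0)
pilot = rng.integers(1, 5, size=(30, 20)).astype(float)
print(round(cronbach_alpha(pilot), 3))
print(round(mcdonald_omega(np.full(20, 0.6)), 3))  # illustrative loadings
```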
The procedure
was carried out in four stages. In the first stage, the questionnaire and the
analytical rubric were designed based on the theoretical and curricular
framework of reading comprehension at the literal, inferential, and critical
levels. In the second stage, the instrument was submitted to expert review for
content validation. In the third stage, a pilot test was conducted to estimate
internal consistency. In the fourth stage, the questionnaire was administered
as a pretest and posttest to the sample of 76 students from the original study,
allowing for the observation of its empirical performance under real school
conditions.
Data analysis
included Aiken’s V for content validity and Cronbach’s alpha and McDonald’s
omega for internal consistency. Statistical processing was performed using SPSS
v25 and Jamovi. Additionally, descriptive statistics were used to organize the
questionnaire results from the pretest and posttest applications, both globally
and by dimension. In this article, these descriptive and inferential data are
interpreted as complementary evidence of empirical functioning, rather than as
the central purpose of the study.
Throughout the
research process, the ethical principles of justice, autonomy, beneficence, and
non-maleficence were upheld. Because the participants were minors,
institutional authorization and informed consent from legal guardians were
obtained prior to the pilot test and the school-based administrations of the
instrument.
Results
The first significant
finding concerns the content validity of the questionnaire. The five expert
raters evaluated the instrument in terms of adequacy, clarity, coherence, and
relevance. Individual scores ranged from 0.92 to 1.00, and the overall Aiken’s
V reached 0.96. This result reflects a high level of agreement regarding the
questionnaire’s suitability as a measure of reading comprehension in the target
population. Furthermore, the validation process showed that the instrument was,
in general terms, well aligned with the dimensions it was intended to assess,
although some specific items received comments aimed at semantic refinement or
greater congruence with the corresponding indicator.
The second
relevant result pertains to internal consistency. The pilot test with 30
students yielded a Cronbach’s alpha of 0.906 and a McDonald’s omega of 0.911.
These coefficients indicate adequate internal consistency and suggest that the
set of items functions cohesively with respect to the general construct of
reading comprehension. The magnitude of both coefficients supports the
preliminary reliability of the instrument for educational assessment purposes.
The
administration of the pretest also provided evidence regarding the empirical
performance of the questionnaire. At the overall reading comprehension level,
the control group was distributed as follows: 31.6% at the low level and 68.4%
at the medium level, with no students at the high level. The experimental group
showed 57.9% at the low level and 42.1% at the medium level, also with no cases
at the high level. This distribution indicates that the questionnaire was able
to classify performance levels from the initial measurement.
By dimension,
the questionnaire also captured distinct patterns. In the literal dimension,
the control group had 26.3% at the low level, 63.2% at the medium level, and
10.5% at the high level, while the experimental group had 42.1% at the low
level, 52.6% at the medium level, and 5.3% at the high level. In the
inferential dimension, the control group recorded 39.5% at the low level, 55.3%
at the medium level, and 5.3% at the high level, while the experimental group
showed 63.2% at the low level and 36.8% at the medium level, with no cases at
the high level. In the critical dimension, the control group recorded 47.4% at
the low level and 52.6% at the medium level, while the experimental group had
76.3% of students at the low level, 21.1% at the medium level, and 2.6% at the
high level. These results suggest that the instrument was sensitive to
differentiated profiles across dimensions and groups.
In the
posttest, the questionnaire again classified performance levels clearly.
Overall, the experimental group had 65.79% of scores in the "achieves
learning" category and 23.68% in "masters learning," while the control group
remained primarily in "close to achieving learning" at 94.74%, with only 5.26%
reaching the "achieves learning" category. This pattern suggests
that the instrument was capable of recording substantial changes in reading
performance between the two measurement points.
At the
dimensional level, the literal dimension showed 71.05% of the experimental
group in “achieves learning” and 18.42% in “masters learning,” while the
control group remained primarily in “close to achieving learning” at 89.47%. In
the inferential dimension, the experimental group recorded 50.00% in “achieves
learning” and 34.21% in “masters learning,” while the control group again
concentrated 89.47% in “close to achieving learning.” In the critical
dimension, the experimental group scored 55.26% in “achieves learning” and
26.32% in “masters learning,” while the control group remained mostly in “close
to achieving learning” at 86.84%. These distributions provide empirical
evidence that the questionnaire discriminated reading performance both overall
and by dimension.
As
complementary inferential evidence, the original study reported that there were
no significant differences between the groups on the pretest, confirming
baseline homogeneity. In contrast, the posttest showed statistically
significant differences in the overall score, with t = -11.3, p = 0.01, and an
effect size of d = -2.59 in favor of the experimental group. At the dimensional
level, the posttest yielded U = 107, Z = -6.43, p < 0.001, and r = 0.74 for the
literal dimension; U = 199.5, Z = -5.53, p = 0.001, and r = 0.63 for the
inferential dimension; and a statistically significant advantage for the
experimental group in the critical dimension, with a reported effect size of r
= 0.67. Although these inferential data are not the focus of this article, they
provide additional evidence that the questionnaire was able to capture distinct
performance patterns in real educational settings.
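As a coherence check on these figures (assuming the conventional conversions, which the source does not state explicitly), the reported effect sizes can be recovered from the test statistics:

```latex
% Effect size from the standardized Mann-Whitney statistic, total N = 76:
\[
r = \frac{|Z|}{\sqrt{N}}: \quad \frac{6.43}{\sqrt{76}} \approx 0.74,
\qquad \frac{5.53}{\sqrt{76}} \approx 0.63
\]
% Cohen's d from the independent-samples t with n_1 = n_2 = 38:
\[
d = t\,\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
  = 11.3\,\sqrt{\frac{2}{38}} \approx 2.59
\]
```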
Discussion
The results of this study indicate that the reading comprehension
questionnaire provides favorable evidence of content validity and internal
consistency for tenth-grade students in upper basic education. The overall
Aiken's V of 0.96 reflects a high degree
of consensus among expert raters regarding the appropriateness of the
instrument’s content, while the reliability coefficients (α = 0.906; ω = 0.911) support the internal consistency of the item set. In this
regard, the study provides initial technical support for the use of the
questionnaire in the educational context for which it was designed.
These findings
align with research highlighting the importance of expert judgment and internal
consistency in the development of educational instruments. Galicia Alarcón et
al. (2017) argue that content validation through expert review is an essential
procedure for determining whether an instrument adequately represents the
variable it aims to measure. Similarly, Trimiño-Pérez et al. (2024) and
Çelikgün et al. (2026) show that technically sound reading assessment
instruments require conceptual clarity, systematic validation, and explicit
evidence of reliability.
The results
also align with recent studies on the validation of reading assessment tools.
Trimiño-Pérez et al. (2024) reported favorable psychometric evidence for the Primary
Reading Proficiency Test, including inter-rater agreement and adequate internal
consistency. Similarly, Çelikgün et al. (2026) found satisfactory alpha and
omega coefficients in a reading comprehension and fluency assessment tool with
open-ended questions. Although these studies were conducted with different
populations and at different educational levels, they converge with the
present research on the same core requirements: conceptual clarity,
systematic validation, and explicit reliability evidence.
An additional
strength of the present study lies in the use of open-ended questions
accompanied by an analytical rubric. This feature is relevant because it allows
for a more nuanced assessment of comprehension, including the retrieval of
explicit information, the generation of inferences, and the formulation of
judgments. Prahl and Schuele (2022) found favorable evidence regarding
construct validity for reading and listening comprehension methods that
incorporated open-ended tasks at the passage level, suggesting that this type
of format can capture complex comprehension processes. Likewise, Menzala
Peralta et al. (2024) emphasize that rubrics improve grading objectivity,
facilitate feedback, and promote equity in assessment. Consequently, the
structure of the instrument used in this study is consistent with current
arguments supporting more authentic and analytically rich forms of assessment.
However, the
validation process also requires critical scrutiny. Although the overall
content validity index was high, the expert review forms included observations
regarding the wording and dimensional congruence of some items. This aspect is
methodologically relevant because it shows that validation was not a mechanical
confirmation of the questionnaire, but rather a technical review that
identified aspects susceptible to refinement. This observation is consistent
with Yeatman et al. (2024), who emphasize that the quality of reading
assessment items depends on semantic precision, low ambiguity, and minimal
contamination by demands for peripheral knowledge. Therefore, the results of
the present study should be interpreted as evidence of strong overall adequacy,
but not as proof of perfect performance of all items at the individual level.
A significant
divergence emerges when comparing this study with the more psychometrically
developed literature. While this research provides clear evidence of content
validity and internal consistency, studies such as those by Trimiño-Pérez et
al. (2024) and Çelikgün et al. (2026) extend the validation through exploratory
or confirmatory factor analyses to examine the internal structure. Furthermore,
the systematic review by Cervantes Buenfil and Canto Herrera (2025) shows that
validated reading comprehension instruments typically report multiple forms of
evidence, including construct, convergent, and predictive validity. In
contrast, the present questionnaire still lacks these additional analyses. This
does not invalidate the instrument, but it does indicate that its validation
process remains incomplete and can be strengthened in future research.
Another point
requiring caution relates to the interpretation of pretest and posttest
results. In this article, the empirical performance of the questionnaire at two
administration points should not be interpreted as evidence of test-retest
stability, since an educational intervention took place between the two
measurements. Rather, these results indicate that the instrument was capable of
classifying performance levels and recording changes under real-world school
conditions. Sedlmayr and Weissenbacher (2025) note that reading assessment
results are sensitive to administration conditions and the way the task is
operationalized. Therefore, the inferences drawn from this study must remain
tied to the specific context in which the questionnaire was administered.
Overall, the
findings suggest that the main contribution of this study is not to establish
comprehensive psychometric validation, but to offer initial and functionally
useful technical evidence of a reading comprehension questionnaire designed for
a specific school context. Its value lies in combining a conceptual structure
aligned with the curriculum, open-ended questions, rubric-based scoring, expert
judgment, and satisfactory internal consistency. Future studies should
strengthen the questionnaire through internal structure analysis, convergent
validity procedures, and applications in larger and more diverse samples.
Conclusions
The reading
comprehension questionnaire analyzed in this study provides strong evidence of
content validity and internal consistency for use with tenth-grade students in
upper basic education. The expert review process revealed a high level of
agreement regarding the adequacy, clarity, coherence, and relevance of the
items, and the reliability coefficients supported the instrument’s suitability
for assessing reading comprehension in the study population.
The
organization of the questionnaire into literal, inferential, and critical
levels constitutes a methodological strength because it allows for the
assessment of reading comprehension from a progressive and differentiated
perspective. This structure makes it possible to distinguish between the
retrieval of explicit information, the drawing of inferences, and the
formulation of judgments.
The empirical
application of the instrument in a school setting also showed that the
questionnaire is effective for classifying performance levels and recording
variations in reading comprehension both overall and by dimension. In this
sense, the instrument not only has initial technical support but also empirical
evidence of use in real-world assessment conditions.
However, these
conclusions should be interpreted with caution. Although the results support
content validity and internal consistency, additional studies are still needed
to expand the psychometric evidence for the questionnaire through internal
structure analysis, comparisons with other measures of reading comprehension,
and applications in broader educational contexts. Therefore, the instrument can
be considered a relevant tool for assessing reading comprehension in the
studied population, but its future consolidation will require further
validation processes.
References
Cervantes Buenfil, A. A., & Canto Herrera, P. J. (2025). Análisis de instrumentos validados para evaluar la comprensión lectora en educación primaria: Una revisión sistemática (2015 a 2025). RIDE Revista Iberoamericana para la Investigación y el Desarrollo Educativo, 16(31). https://doi.org/10.23913/ride.v16i31.2633
Çelikgün, B., Akdaş, F., Eser, B. N., Minga, R., Ölçek, G., Gücüyener, C., & Edman, S. (2026). Development and validation of a reading comprehension and fluency screening assessment tool for children aged 7–10: Implications for audiological rehabilitation. Frontiers in Psychology, 17, 1759333. https://doi.org/10.3389/fpsyg.2026.1759333
Duke, N. K., Ward, A. E., & Pearson, P. D. (2021). The science of reading comprehension instruction. The Reading Teacher, 74(6), 663–672. https://doi.org/10.1002/trtr.1993
Galicia Alarcón, L. A., Balderrama Trápaga, J. A., & Edel Navarro, R. (2017). Validez de contenido por juicio de expertos: Propuesta de una herramienta virtual. Apertura, 9(2), 42–53. https://doi.org/10.32870/ap.v9n2.993
Menzala Peralta, R. M., Ortega Menzala, E., & Zanabria Vargas, E. (2024). Uso de la rúbrica en la educación: Una revisión sistemática. Horizontes. Revista de Investigación en Ciencias de la Educación, 8(34), 1727–1743.
Ministerio de Educación del Ecuador. (2016). Currículo de los niveles de educación obligatoria. Ministerio de Educación.
Prahl, A., & Schuele, C. M. (2022). A pilot study assessing listening comprehension and reading comprehension in children with Down syndrome: Construct validity from a multi-method perspective. Frontiers in Psychology, 13, 905273. https://doi.org/10.3389/fpsyg.2022.905273
Sedlmayr, P., & Weissenbacher, B. (2025). Reading comprehension assessment for student selection: Advantages of text availability in terms of validity. Frontiers in Education, 10, 1524561. https://doi.org/10.3389/feduc.2025.1524561
Trimiño-Pérez, L., Hurtado-Reina, J., Velasco, E., & Martinez, A. (2024). Design and validation of a Primary Reading Proficiency Test (PCL-P). Frontiers in Language Sciences, 3, 1471040. https://doi.org/10.3389/flang.2024.1471040
UNESCO. (2021, March 26). Cien millones más de niños sin las competencias mínimas de lectura debido a la COVID-19: La UNESCO reúne a los ministros de educación. UNESCO.
Yeatman, J. D., Tran, J. E., Burkhardt, A. K., Ma, W. A., Mitchell, J. L., Yablonski, M., Gijbels, L., Townley-Flores, C., & Richie-Halford, A. (2024). Development and validation of a rapid and precise online sentence reading efficiency assessment. Frontiers in Education, 9, 1494431. https://doi.org/10.3389/feduc.2024.1494431