The Digital Pedagogy Competence Scale (DiPeCoS): development and validation

Digital pedagogy is the intentional integration of technology into teaching and learning to build rich learning experiences. Given this potential and the pace of the digitization of education, it is important to define, assess and develop teachers’ digital pedagogical competence. Although there are several self-report measures that assess digital pedagogy competence, these do not include scenario-based tools. Scenario-based assessments allow the evaluation of knowledge and skills in real-world applications. We present here the Digital Pedagogy Competence Scale (DiPeCoS), a short, scenario-based tool that assesses a teacher’s digital pedagogy competence through choices made in real-world teaching and learning scenarios. An initial pool of ten items was reduced to create an eight-item scale using item response analysis, which was subsequently validated on 1,315 teachers in India. The DiPeCoS demonstrates unidimensionality, and its constituent items show acceptable levels of discrimination, difficulty and guessing parameters and reliability. Our results indicate that such a tool is valuable in assessing teachers’ digital pedagogy competence, and we hope it finds value in the field of digital pedagogical training and evaluation.


Introduction
The rapid advancement of technology in the twenty-first century places new demands on the teaching-learning experience and has broad implications, both in terms of what it means to teach, and how one teaches and creates a learning experience. Teachers today do not just require technological knowledge and information and communication technology (ICT) skills, they also must be able to leverage technology in modernizing teaching practice.
Efforts have been made at the international level to promote technology-enabled learning through ICT training for teachers. However, this approach does not consider the fact that the intentional use of technology by teachers to achieve learning goals requires a refreshed set of digital pedagogy competences (Fernández-Batanero et al., 2022; Koehler et al., 2013; Sailer et al., 2021).
Merely integrating technology into educational programmes with the aim of building digital skills does not fully utilize the power of technology to transform the learning experience. In other words, integrating more technology does not automatically lead to better learning outcomes (Sailer et al., 2021). In fact, results from the Organisation for Economic Co-operation and Development's (OECD) latest PISA assessment show a negative relationship between the intensity of technology use in classrooms and the digital reading, mathematics and science skills of fifteen-year-olds (OECD, 2021a). Moreover, the OECD's Digital Education Outlook report, launched after the COVID-19 pandemic, reveals that learners who spend more time posting work on their school's website, playing simulations at school using learning apps and websites, or doing homework on a school computer tend to perform worse in school assessments than those who do not (OECD, 2021b). The same report also points out that teachers do not have sufficient skills to make the most of digital technologies in school. This strengthens the idea that merely using technology does not guarantee better learning outcomes.
Research shows that 'learning occurs when access to technology is combined with relevant and engaging content, a well-articulated instructional model, effective teaching presence, learner support, and an enabling learning environment' (UNICEF, 2020). Such purposeful use of technology opens dynamic possibilities for curriculum, assessment and instruction, which contributes to learning that is inclusive, engaging and impactful.
However, such use of technology also requires teachers to be equipped with a whole new skill set and perspective connected to the application of digital competences in the areas of teaching and learning. In this paper, we refer to these skills and perspectives as a teacher's digital pedagogy competence and develop the Digital Pedagogy Competence Scale (DiPeCoS) to assess it.

Digital pedagogy
An analysis of the literature (Kivunja, 2013; Montebello, 2017; Sailin & Mahmor, 2018) on digital pedagogy highlights that mere integration of technology in teaching and learning does not qualify as digital pedagogy. Rather, the purpose of technology integration must be to enrich or enhance learning. Such an intentional use of technology warrants technological and pedagogical skills, as well as the ability to integrate both. Kivunja (2013) describes digital pedagogy as 'the art of teaching through computer-driven digital technologies, which enrich learning, teaching, assessment, and the whole curriculum'. Both Montebello (2017) and Sailin and Mahmor (2018) emphasize that digital pedagogy is the integration of technologies into teaching to enhance students' learning.
Given that the purpose of enhancing or enriching learning is at the centre of digital pedagogy, it is imperative to identify the factors that amplify learning experiences in order to define the domain of digital pedagogy. Extensive research has illustrated that factors such as individuals' perception of personal control, self-awareness of learning processes, personal goals and affective states significantly influence the quality and effectiveness of learning (Biggs, 1996; Pask, 1976). These factors are inherently individualistic, resulting in substantial variations among learners. Studies by Pekrun et al. (2017), Reeves (2004) and Rose and Meyer (2002) highlight significant diversity in learners' patterns, preferences and barriers, and establish that learners differ greatly in the way they learn. Consequently, a singular pedagogy, method or instructional practice cannot effectively cater to the diverse needs of all learners.
While the learning process is an individual one, the implementation of certain learning design principles can help cater to learner differences, thus optimizing the learning experience for a diverse range of learners. A prominent framework that guides the design of inclusive learning experiences is Universal Design for Learning (UDL) (CAST, 2018; Rose et al., 2005). UDL posits that by creating 'accessible' content and fostering an 'accessible' learning environment, the learning experience can be optimized for diverse learners, irrespective of their individual learning preferences and barriers (Rose & Meyer, 2002).
The concept of Universal Design (UD), introduced by Ronald Mace in the 1970s, emphasizes the design of products and environments that are accessible to all individuals without the need for specialized adaptations (Story & Mueller, 2001). In the context of learning, UD implies a design that improves and optimizes teaching and learning for all learners (CAST, 2018). UDL is created by incorporating multiple means of content delivery for learners to select from, offering diverse methods and avenues for engagement, expression and assessment. By allowing them to choose, learners are able to select the optimal method, pace and learning materials that align with their unique preferences and abilities, thus supporting them to meet learning and affective goals (Boothe et al., 2018; CAST, 2018).
Empirical research has shown promising results from UDL adoption in terms of academic performance and learner perceptions (Burgstahler, 2011; Rao et al., 2014). For example, the literature suggests a positive relationship between the application of UDL and student interest and engagement (Smith, 2012) as well as the potential of UDL to improve students' academic and social outcomes (Ok et al., 2017).
Three principles of UDL guide the design of learning that suits a wide variety of learners. These are:
1. Providing multiple means of engagement, which emphasizes the 'why' of learning and offers checkpoints on recruiting interest, sustaining effort and persistence, and self-regulation.
2. Providing multiple means of representation, which focuses on the 'what' of learning and captures checkpoints under perception, language and symbols, and comprehension.
3. Providing multiple means of action and expression, which focuses on the 'how' of learning and captures checkpoints under physical action, expression and communication, and executive function.
Since the three UDL principles aim to improve how information is accessed, processed and internalized by diverse learners, integrating digital tools in alignment with these principles automatically enables teachers to use technology to enhance the learning experience for a wide variety of learners. Research has shown that UDL is a useful framework for meaningful technology integration to support diverse learner needs and preferences (Rose & Meyer, 2002). For example, spelling and grammar-checking tools or word prediction tools can be used to support learners' expression. Features that make text accessible, such as speech-to-text or text-to-speech tools, font adjustment, provision of synonyms, definitions and hyperlinks to the meanings of unknown words, and the translation of text into various languages can be used to support reading and comprehension.
Multi-sensory tools like speech-to-text or text-to-speech software can help offer choice to learners and give them agency to support action and expression.
Using UDL as our reference framework, we propose that digital pedagogy leverages digital technology to foster inclusive and engaging learning experiences by presenting information in such a way that it can be perceived and comprehended by learners effectively. Moreover, it offers multiple strategies to engage learners so that they are motivated to learn, enabling them to navigate the learning environment and express what they know.

Existing scales to assess digital pedagogy competence
Many tools have been developed to evaluate a teacher's digital competence (see Table 1).
A teacher's digital pedagogy competence is a subset of the teacher's digital competence and focuses on the purposeful use of digital technology for teaching and learning.
According to the DigCompEdu Framework, teacher digital competence encompasses skilful application of technology in five areas. Besides teaching and learning, these areas include professional development, digital resources, assessment and feedback, and empowering learners (Redecker, 2017).
Table 1 (excerpt)
Twenty-two competences in six areas: social and professional commitment; digital resources; digital pedagogy; evaluation and feedback; empowerment of students; facilitating students' digital competence | Self-reflection, available digitally
SPTKTT (Survey of Preservice Teachers' Knowledge of Teaching and Technology) by Schmidt et al. (2009) | TPACK (Koehler et al., 2013; Mishra & Koehler, 2006) | Seven dimensions: technology knowledge (TK); content knowledge (CK); pedagogy knowledge (PK); pedagogical content knowledge (PCK); technological pedagogical knowledge (TPK); technological content knowledge (TCK); technological pedagogical content knowledge (TPACK)
Application of the three principles of UDL: providing multiple means of representation, multiple means of action and expression, and multiple means of engagement | Self-reflection, three-point Likert scale
Existing tools to evaluate a teacher's digital competence assess pedagogical skills, the application of technology in a range of classroom work, the educational institution and the community in the context of their own personal and professional development (Lázaro-Cantabrana et al., 2019). A literature review of existing tools¹ finds that most tools measure a teacher's digital competence, a concept that goes beyond digital pedagogy competence, and only a few tools focus on evaluating a teacher's digital pedagogy competence.
Two tools that are closely related to the concept of teachers' digital pedagogy competence are the Survey of Preservice Teachers' Knowledge of Teaching and Technology (SPTKTT) by Schmidt et al. (2009), which is based on the TPACK model (Mishra & Koehler, 2006), and the UDL self-assessment tool by the University of Waikato (2018), based on the UDL framework (CAST, 2015). Both tools use Likert scale-based items to encourage teachers to reflect on their pedagogical practices (e.g., 'I know how to select effective teaching approaches to guide student thinking and learning in mathematics' on the SPTKTT and 'I encourage students to express their learning in multiple ways (e.g., essay, or video blog, poster or presentation)' on the UDL tool). Both tools rely on self-reports of respondents' behaviours, beliefs, perceptions, attitudes or intentions, which are shown to be virtually uncorrelated with their on-the-job behaviour (Thalheimer, 2018). Responding on self-report scales is also often affected by social desirability bias (Van de Mortel, 2008), which can corrupt collected data. Thus, although self-report tools can be used for reflection and self-assessment that may promote learning and improvements in performance (Andrade & Valtcheva, 2009), they may not, and often do not, adequately assess a teacher's digital pedagogy competence.
Any tool that has been designed to measure competence must focus on the ability to apply knowledge and skills in real-life contexts. Thalheimer (2018) argues that assessing decision-making is better than gauging self-perception of skills or behaviours. He also argues that one way to evaluate realistic decision-making is by presenting learners with realistic scenarios and prodding them to make decisions that are similar to the types of decisions they will have to make on the job. It is this form of scenario-based assessment that efficiently counters the shortcomings of self-report tools. Scenario-based assessments are based on situated learning theory (Lave & Wenger, 1991), which focuses on the philosophy of 'situating learning and assessment in an authentic context.'

Purpose
There is a need for a tool that assesses the digital pedagogy competence of teachers which does not rely on self-reporting but rather evaluates teachers' competence based on their ability to identify effective digital pedagogies to foster inclusive and engaging learning experiences in realistic situations through scenario-based questions. The DiPeCoS for teachers addresses this need. The following sections present the results from our validation of the DiPeCoS.

Materials and methods
Participants
A total of 1,315 English-speaking Indian teachers completed the virtually delivered scale.
The participants, all of whom had received formal education in the English language, reported a mean age of 42.1 years. Thirty-nine per cent (N = 513) of participants reported their gender as female and 61% (N = 802) identified as male. Teachers reported a mean teaching experience of 14.77 years (SD = 10 years), with 3.6% of teachers operating in 'primary school', 28.5% in 'middle school' and 52.6% in 'high school'; 15.3% reported their place of work as 'others'. Participants who reported their work setting as 'others' included education counsellors, consultants and freelance teachers.

Item development
Given the DiPeCoS tool was based on the UDL framework, the goal was to assess the competence of teachers to purposefully use digital technologies to foster inclusive and engaging learning experiences by: 1) presenting information in such a way that it can be perceived and comprehended by learners effectively; 2) offering multiple strategies to engage learners so that they are motivated to learn; and 3) enabling learners to navigate the learning environment and express what they know. It was thus essential that through scenario-based items, DiPeCoS was able to evaluate the ability of teachers to cater to learner variability when they applied digital pedagogy for the above three purposes.
The items on the scale were designed so that the respondent did not need technical knowledge of any particular discipline, subject-specific pedagogies or subject-specific digital tools. This made it possible to assess teachers across multiple disciplines and grades.
Popular guidelines on item development were followed, such as avoiding complex language, double-barrelled items, jargon and technical terms, and keeping items unbiased with respect to forms of diversity such as reading levels (Devellis, 2012). In the interest of brevity, exceptionally lengthy scenarios were avoided, and the number of items was limited to ten.
Keeping in mind UDL principles, an initial set of items was created by a researcher with more than ten years' experience in teacher training, capacity-building and digital-technology-based educational programmes. These items were then reviewed by a team of three, comprising data analysts and researchers working in the field of education. Based on feedback, ten items were retained and explicitly mapped to the UDL framework, as described below.
Of the ten items on the original scale (see Table 2), four were designed to assess the use of digital technology to cater to learner variability while representing information; three focused on learner engagement; and three focused on the ability of learners to navigate learning environments and express what they know. This distribution was maintained to ensure that all three principles of UDL were more or less equally represented in the item set. Items 1, 3 and 6 assessed the teacher's proficiency in engaging and motivating diverse learners. This was done through questions that evaluated the strategic use of asynchronous (item 1) and synchronous (item 6) tools to foster collaboration, which have been identified as effective means of engaging learners (CAST, 2018). The third question (item 3) assessed the teacher's ability to provide personalized support to engage learners effectively. The subsequent set of items (4, 5, 8 and 10) aimed to assess the teacher's ability to present information in a manner conducive to optimal comprehension by diverse learners. This involved conceptualizing strategies to utilize technology in offering differentiated support (item 4), utilizing multiple media formats (visual and audio) to illustrate information (item 8), as well as possessing the skills to identify and utilize online information and resources for representation while adhering to copyright laws (items 5 and 10). The final set of questions (items 2, 7 and 9) focused on assessing the teacher's ability to leverage technology in facilitating learners' navigation of the learning environment and expression of knowledge. This encompassed proficiency in employing pedagogies that emphasize practice (item 7) and creation (item 2), which are known to assist learners in synthesizing knowledge in personally relevant ways (CAST, 2018).
Furthermore, these questions evaluated the teacher's capacity to offer multiple tools for expression (item 9), enabling learners to articulate what they know using a medium and tools of their choice (CAST, 2018).
The pedagogical decision of the respondent in each scenario serves a dominant purpose and at the same time enables other purposes. For instance, item 1 (A teacher completed a chapter through an online class and would now like all 40 students to write and share their key learning with each other. What might be the best way to ensure peer sharing?) was developed to assess the ability of a teacher to leverage technology to facilitate peer sharing, which is aligned with the UDL checkpoint of fostering collaboration and community within the principle of multiple means of engagement (CAST, 2018, Checkpoint 8.3). It also enables the other two principles of UDL: representation and action and expression.
Table 2 (excerpt)
Item 1 response options: (i) Hold a 2.5-hour online session where every student can read their write-up before the class; (ii) Ask a few students to share during the online class to reduce the length of the session; (iii) Ask all students to post their write-ups on a digital whiteboard for peers to see (correct response); (iv) Ask all students to send their write-ups to the teacher. Principles: E, R, A&E
Item 2: According to you, which of the following is the most critical element for a successful online class?
In an environmental sciences class, students are learning about different types of leaves. How would you best design the class?* (i) Students will use a concept mapping tool to highlight the similarities and differences (correct response); (ii) Students will highlight the similarities and differences between kinds of leaves through a write-up, a video or a graphic; (iii) Students will write a brief essay describing similarities and differences between the leaves in their notebooks
After principle-mapping, correct responses to scale items were deliberated upon. To explain how correct responses on the items were decided, we provide the following examples. Item 4 (Which of these methods would you use to get learners to use a new digital tool selected by you?) was designed to assess the teacher's capacity to facilitate comprehension among diverse learners through the provision of differentiated support. To ascertain this competence, teachers were presented with a scenario involving the narration of a story to teach language in a virtual setting. The response options provided were: playing a pre-recorded audio file, displaying a physical book through the camera or utilizing a digital storybook. Teachers who selected the third option were awarded a point on the scale, as they demonstrated their ability to leverage an appropriate digital tool (a digital storybook in this case) to deliver simultaneous audio and visual stimuli to learners.
This option was deemed to be more effective than option 2, as holding the book up to the camera would reduce the sensory appeal of the stimulus, potentially impacting learner attention and engagement negatively. Extensive research has indicated that recall of story language can be improved by concretizing words and phrases through images (Sadoski & Paivio, 1994). [...] options 1 and 3), the second option allows learners to express themselves using the art form.
Research indicates that facilitating activities and expressions that align with the learning goal yields superior learning outcomes (Biggs & Tang, 2011). In a similar way, correct options for all items on the scale were evaluated.
Following the mapping process, two experts with more than 20 years of research and programmatic experience in designing and conducting standard assessments, as well as digital and brief modes of assessment, examined the content validity of the tool. These experts also had extensive experience in researching and implementing educational programmes. After incorporating feedback delivered by the experts, the scale items were finalized. Table 2 contains item-wise mapping of the final ten items on the three principles of UDL, with two or more principles integrated in each setting and one dominant principle being assessed. These are marked E, R and A&E to represent the principles of: multiple means of engagement (E), multiple means of representation (R) and multiple means of action and expression (A&E) respectively. The dominant principle is highlighted in bold.

Data collection
Participants recruited for the validation of the tool were part of a wider intervention that included taking an online course on digital pedagogies. The scale developed in this study was utilized as a pre-post questionnaire to assess the impact of the course-based intervention. The results reported in this study were obtained from the pre-assessment responses.
Recruited participants were English-speaking teachers from English-medium schools across various parts of India, who filled out the questionnaire over a period of four months, from January to April 2022. Teachers working at private schools in urban areas, as well as government-run residential schools operating in various parts of rural India, participated in the study.
All participants were briefed about the aim of the study and were given the opportunity to resolve queries related to the study and the questionnaire during a 1.5-hour-long online workshop. Additionally, they were able to have their queries resolved via email communication at a later stage. After the workshop, participants created an account on Framerspace,² an interactive learning platform, where the questionnaire and the course were hosted. The questionnaire consisted of the scale described above, along with a short demographic form that collected demographic information such as gender, age, teaching experience and teaching profile. It is important to note that the participants were not given any information regarding the UDL framework prior to completing the tool. Furthermore, the scale itself did not explicitly reference the three core principles of UDL that underpin the ten items. This approach was undertaken to assess the existing proficiency of teachers in addressing learning differences through their instructional practices, specifically in terms of information presentation, learner engagement, navigation support and expression facilitation, without influencing their responses. Information on the time required to complete the assessment was provided. Informed consent was sought from the participants, and no personally identifiable information (such as name or email address) was collected.

Data analysis
We performed statistical analyses using R version 4.0.2.³ Specifically, an item response analysis was conducted to validate the developed scale. Item response theory (IRT) is a family of psychometric methods that examine whole tests as well as individual item properties. IRT has superseded classical test theory (CTT) techniques because of its advantages with regard to instrument validation, such as accounting for guessing and difficulty of items. The basic premise of IRT is that the probability of a response is a function of an underlying trait, continuum (latent dimension) or ability, denoted by theta (θ). Theta represents a person's true latent trait (e.g., in this case, digital pedagogy competence), standardized to follow the standard normal distribution with zero representing the average score (Baker, 2001). The primary reason for using IRT to validate new scales and modify existing scales is to measure how much of a latent trait one has. For example, IRT can be applied to investigate which items do not have enough reliable information about the construct being measured. IRT analyses can also differentiate item properties (e.g., discrimination and difficulty) among individuals across a much wider range of the construct at hand. If the analyses show that there is such a problem with some items, the researcher can remove/modify those items or add new items that help measure these parts of the construct, thereby providing information that can differentiate people across a much greater range of the latent trait and increasing the validity of the whole scale (Oishi, 2007).
To use IRT, we first tested basic assumptions pertaining to unidimensionality, local independence, monotonicity and differential item functioning (DIF). Unidimensionality (i.e., items in the scale load on only one latent factor) was tested using factor analysis.
Exploratory factor analysis (EFA) procedures such as eigenvalue extraction (Kaiser, 1960), scree test (Cattell, 1966) and parallel analysis (Horn, 1965) were used to test the presence of a unidimensional factor, and were affirmed using confirmatory factor analysis (CFA).
Monotonicity (i.e., the probability of giving the response that reflects a higher trait level on each item increases monotonically as the person's latent trait level rises) was tested using Mokken analysis, and differential item functioning (DIF) analysis was applied to investigate whether the item responses were invariant across gender.
Following assumption testing, we selected an appropriate IRT model. Since the items were scored on a dichotomous scale (zero for an incorrect response and one for a correct response), we chose a logistic IRT model. One of three common logistic models, the one-parameter logistic (1-PL) model estimates the probability of endorsing an item based on its difficulty, b, as compared to each person's trait level. Unlike the 2-PL model, in which discrimination is freely estimated, in this model discrimination (a) is equal among the items. Finally, the three-parameter logistic (3-PL) model adds the parameter c or the guessing parameter. Out of the three models, we chose the 3-PL model because 3-PL models are considered appropriate for multiple-choice tests (like the one developed in this study) where the probability of success from a very low-ability person on an item may be significantly higher than zero because of random guessing (Diamond & Evans, 1973).
Moreover, the 3-PL model is generally considered more suitable than the 1-PL and 2-PL models for cognitive tests.
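For reference, the three logistic models described above are commonly written in the following form. This is the standard textbook expression, offered as a sketch: the item subscript i and the absence of a scaling constant (some software multiplies the exponent by 1.7) are assumptions, since the exact parameterization used in the analysis is not stated.

```latex
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}}
```

Here P_i(θ) is the probability of a correct response to item i at ability level θ. Setting c_i = 0 gives the 2-PL model, and additionally constraining all a_i to a common value a gives the 1-PL model.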
Thus, using the 3-PL model, we calculated item parameters, and plotted item information curves (IIC) and item characteristic curves (ICC). Results of these analyses, complemented by the theoretical judgement of the researchers, were used to guide the removal of items and increase the overall validity of the scale. After we finalized the scale structure, we plotted the test information function to observe how the overall scale responded to individuals with different abilities. Finally, we performed a reliability analysis. All results reported in this paper use a significance level of α = 0.05.

Assumption testing
We received a total of 1,824 responses on the questionnaire. After removing duplicates, 1,315 entries remained. First, we performed EFA. Here, an unrestricted factor solution indicated that the magnitude of the first eigenvalue (4.11) was much greater than the magnitude of other eigenvalues (1.21, 0.91, 0.83, 0.72, 0.61, 0.56, 0.46, 0.36 and 0.18), hinting at a unidimensional scale structure. The ratio of the first to the second eigenvalue was also greater than three, which provided more preliminary evidence of unidimensionality (Sattelmayer et al., 2017). Results from the scree plot and parallel analysis also indicated single-factor solutions. A single-factor CFA model was then built to confirm unidimensionality in the scale. This model converged normally and demonstrated a good fit: χ²/df = 0.667, CFI = 0.998, TLI = 0.998 and RMSEA (90% C.I.) = 0.000 (0.000-0.066). All item loadings were significant at p < .05, except items 7 and 9. Local independence means that (a) the response to one item is not related to the response to any other item, and (b) the response to an item is every test-taker's independent decision, that is, there was no cheating or group work involved.
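The eigenvalue-ratio criterion mentioned above can be checked directly from the reported eigenvalues. The snippet below is a minimal illustration using only the values quoted in the text; the function name is ours and is not part of the original analysis, which was run in R.

```python
# Eigenvalues reported in the text for the unrestricted factor solution.
eigenvalues = [4.11, 1.21, 0.91, 0.83, 0.72, 0.61, 0.56, 0.46, 0.36, 0.18]


def first_to_second_ratio(values):
    """Return the ratio of the largest to the second-largest eigenvalue."""
    ordered = sorted(values, reverse=True)
    return ordered[0] / ordered[1]


ratio = first_to_second_ratio(eigenvalues)
print(f"first/second eigenvalue ratio: {ratio:.2f}")
print("ratio exceeds 3:", ratio > 3)
```

The ratio comes to roughly 3.4, which is consistent with the statement that it exceeds three.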
Results from Mokken analysis indicate that the response function of the probability of getting a correct response on each item increases when a person's latent trait increases for all items except item 9. Therefore, there is evidence of monotonicity in all items except item 9.
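The idea behind this monotonicity check can be illustrated with a simplified rest-score comparison: for each item, respondents are grouped by their total score on the remaining items, and the proportion answering the item correctly should not decrease from one group to the next. The sketch below uses hypothetical toy data and hypothetical function names; it is not the procedure used in the study, and full Mokken analyses typically also pool small rest-score groups and test violations for significance.

```python
from collections import defaultdict


def rest_score_proportions(responses, item):
    """Proportion correct on `item` for each rest score (score on the other items)."""
    groups = defaultdict(list)
    for row in responses:
        rest = sum(row) - row[item]
        groups[rest].append(row[item])
    return {rest: sum(v) / len(v) for rest, v in sorted(groups.items())}


def is_monotone(proportions):
    """True if proportions never decrease as the rest score increases."""
    values = list(proportions.values())
    return all(x <= y for x, y in zip(values, values[1:]))


# Hypothetical toy data: 6 respondents x 3 dichotomously scored items.
responses = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 1],
]

for item in range(3):
    props = rest_score_proportions(responses, item)
    print(f"item {item + 1}: {props} monotone={is_monotone(props)}")
```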
Results of DIF analysis indicate that all items were responded to similarly by both males and females. Next, we calculated different IRT parameters for items to examine the scale properties further.

The IRT model
We built a 3-PL IRT model, and calculated a, b and c parameters for each of the ten items.
All items fitted well (p > .05) and their model parameters are listed in Table 3. The discrimination parameter (a) values range from −∞ to +∞, but values typically fall in the range of 0 to +2.50. Item discrimination values of 0.01-0.34 are considered very low; 0.35-0.64 low; 0.65-1.34 moderate; 1.35-1.69 high; and 1.70 and above very high (Baker, 2001). Similarly, item difficulty (b) estimates vary from -4 to +4, where -4 represents the easiest items, 0 represents average difficulty and +4 represents the most difficult items. The guessing parameter (c) has a theoretical range of [0,1], but in practice, values below 0.35 are considered acceptable.
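The interpretive bands above can be expressed as a small lookup. The following sketch encodes the discrimination labels and the 0.35 guessing cut-off quoted in the text; the function names and the example parameter values are hypothetical and are not the Table 3 estimates.

```python
def discrimination_label(a):
    """Map a discrimination estimate to the verbal bands quoted in the text."""
    if a <= 0:
        return "none or negative"  # our own label; outside the quoted bands
    if a < 0.35:
        return "very low"
    if a < 0.65:
        return "low"
    if a < 1.35:
        return "moderate"
    if a < 1.70:
        return "high"
    return "very high"


def guessing_acceptable(c, cutoff=0.35):
    """True if the guessing parameter is below the cut-off used in the text."""
    return c < cutoff


# Hypothetical (a, b, c) values for illustration only.
example_items = {"item A": (1.80, -1.5, 0.10), "item B": (0.50, 0.8, 0.40)}

for name, (a, b, c) in example_items.items():
    difficulty = "easy" if b < 0 else "difficult"  # polarity of b
    guessing = "acceptable" if guessing_acceptable(c) else "high"
    print(name, discrimination_label(a), difficulty, f"guessing {guessing}")
```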
Broadly speaking, items 1, 2, 3, 4, 6, 8 and 10 are classified as 'easy' and items 5, 7 and 9 are classified as 'difficult' (based on the polarity of the difficulty parameter). Items 7 and 9 demonstrate high item guessing (c > 0.35). We also plotted ICC and IIC for the scale items (Figures 1 and 2, respectively). In both figures, theta (θ) represents a person's true latent trait (factor), standardized to follow a normal distribution, where zero represents the average score (Baker, 2001).
For the IIC, I(θ) represents the information function, which indicates how well each item contributes to score estimation precision (higher levels of information lead to more accurate score estimates). In the ICC, discrimination is defined as how well an item can differentiate between examinees having abilities below the item location and those having abilities above the item location. Consequently, ICC that are more 'spread out' indicate lower discriminability, ICC that lie farthest to the right on the plot indicate higher difficulty, and ICC with a non-zero lower asymptote (a positive y-intercept) indicate higher guessing probabilities. Similarly, IIC peak at or near the difficulty value (the point at which the item has the highest discrimination), with less information at ability levels farther from the difficulty estimate. As seen in Figure 1, items 5, 7 and 9 demonstrate higher levels of difficulty since they are positioned on the right-hand side of the figure, indicating that the probability of responding to these items correctly would be high only for individuals with high ability.
On the other hand, item 3 shows the lowest level of difficulty. Items with curves that are the least spread, for example, items 3, 6, 8, 9 and 10, indicate the highest levels of discrimination. Since items 7 and 9 also have significant positive y-intercepts, they have a higher probability of being guessed. These results coincide with those inferred from Table 3.
Some additional information can be gathered from Figure 2. For instance, item 3 peaks very high at an ability level of θ = -1.5 and also demonstrates a narrow IIC, indicating that the item provides most information about low-ability individuals. This is different from, say, item 4, which demonstrates a curve that is much wider spread and peaks at an ability level of θ = -1.
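The shape of an IIC follows directly from the item parameters. The sketch below uses the standard item information expression for the 3-PL model and the corresponding closed-form location of its maximum, again assuming the logistic metric without the 1.7 constant. Function names and parameter values are hypothetical, not those of Table 3.

```python
import math


def p_correct(theta, a, b, c):
    """3-PL probability of a correct response (logistic metric)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b, c):
    """3-PL item information at ability `theta`."""
    p = p_correct(theta, a, b, c)
    q = 1.0 - p
    return a ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2


def peak_location(a, b, c):
    """Ability level at which the 3-PL item information is largest.

    Equals b when c = 0 and shifts slightly above b as c grows.
    """
    return b + math.log((1.0 + math.sqrt(1.0 + 8.0 * c)) / 2.0) / a


# Hypothetical item with substantial guessing.
a, b, c = 1.5, 0.0, 0.375
theta_peak = peak_location(a, b, c)
info_at_peak = item_information(theta_peak, a, b, c)
info_at_b = item_information(b, a, b, c)
```

Two properties are visible here: a larger a gives a taller, narrower curve (information scales with a²), and a non-zero guessing parameter both lowers the curve and moves its peak slightly above the difficulty value.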

Fig. 1 ICC for the ten items forming the scale
We also plotted a test information function (TIF) for the overall scale (see Figure 3a), which is the sum of information functions of all items of the scale. As seen in the figure, the TIF is a bimodal curve, with two peaks at ability levels of θ = -1.4 and θ = 1.6. As much as possible, the TIF should be a unimodal curve centred around θ = 0 so that the scale serves as an unbiased assessment of low- and high-ability individuals.
Fig. 2 IIC for the ten items forming the scale. I(θ) represents the information function, or how well each item contributes to score estimation precision, and θ refers to the respondent's ability to respond to an item (standardized). Items 3 and 4 are annotated for reference in text (peak for item 3 at θ = -1.5 and for item 4 at θ = -1)
Fig. 3 TIF: a) before removing items 3 and 9, b) after removing items 3 and 9. Peaks annotated for reference in text (bimodal peaks at θ = -1.4, 1.6 before removing items 3 and 9 and unimodal peak at θ = -0.9 after removing items 3 and 9)
For item 3 (If a learner is consistently NOT submitting their assignments or attending live classes, what should a teacher do?), a retrospective analysis of responses revealed high discriminability among teachers. We speculate that the response to this item could be significantly affected by social desirability bias, as the response options (other than the correct one) provided on the item were 'morally incorrect' for any teacher to choose (see details in Table 2). While complete elimination of social desirability in a classroom is challenging, we propose minimizing its impact by articulating the question differently. Item 9 related to the use of the digital tool that could be best used for concept acquisition. In future, given the discriminability of item 9, it would be possible to have a graded scoring scheme, which means that the most appropriate response here would be scored as '3' and the least appropriate as '1'. This would ensure greater flexibility in responses.
We removed these two items and replotted the TIF. As seen in Figure 3b, the distribution is improved, demonstrating a unimodal peak around θ = -0.9, and is reasonably well spread. The eight-item scale was thus finalized.
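Because the TIF is the sum of the item information functions, the effect of dropping items can be explored with a simple grid search. The sketch below is illustrative only: the item parameters are hypothetical stand-ins (the fitted values are in Table 3), the function names are assumptions, and the logistic metric without the 1.7 constant is assumed.

```python
import math


def item_information(theta, a, b, c):
    """3-PL item information at ability `theta` (logistic metric)."""
    p = c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
    return a ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def tif(theta, items):
    """Test information: sum of item information over (a, b, c) tuples."""
    return sum(item_information(theta, a, b, c) for a, b, c in items)


def tif_peak(items, grid):
    """Grid point at which the test information is largest."""
    return max(grid, key=lambda t: tif(t, items))


# Ability grid from -4 to 4 in steps of 0.1.
grid = [i / 10 for i in range(-40, 41)]

# Hypothetical (a, b, c) triples keyed by an item label; NOT the Table 3 values.
pool = {
    "easy_sharp": (2.0, -1.5, 0.05),
    "easy_flat": (0.8, -1.0, 0.10),
    "mid": (1.2, 0.5, 0.20),
    "hard_guessy": (1.8, 1.6, 0.40),
}
dropped = {"easy_sharp", "hard_guessy"}  # hypothetical items to remove

full_peak = tif_peak(list(pool.values()), grid)
reduced = [params for name, params in pool.items() if name not in dropped]
reduced_peak = tif_peak(reduced, grid)
```

Plotting `tif` over the grid before and after removal gives curves analogous to the two panels of Figure 3, although the shapes obtained with these stand-in values will differ from the published ones.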

Reliability analysis
As shown in Figure 4, the scale demonstrates the highest reliability for individuals with ability levels around θ = -1. There is almost no reliable information about ability below −2.5 and above 2.00. The standard error increases quickly for both smaller and larger θ values. The marginal reliability for the scale is approximately 0.62.
Fig. 4 Reliability function of the scale; θ refers to the respondent's ability to respond to an item (standardized). Peak of the graph has been annotated for illustration purposes
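The link between information, standard error and reliability can be sketched as follows. The text does not state which reliability formula was used, so the code adopts one common convention as an assumption: with the latent trait scaled to unit variance, conditional reliability is I(θ)/(I(θ) + 1), and marginal reliability is its average over a standard normal ability distribution. Function names are hypothetical.

```python
import math


def standard_error(information):
    """Standard error of the ability estimate: 1 / sqrt(information)."""
    return 1.0 / math.sqrt(information)


def conditional_reliability(information):
    """Reliability at one ability level, assuming unit latent variance."""
    return information / (information + 1.0)


def marginal_reliability(info_fn, grid):
    """Average conditional reliability, weighted by a standard normal density.

    `info_fn` maps an ability value to test information; `grid` is an
    evenly spaced list of ability values. Weights are normalised to sum to 1.
    """
    weights = [math.exp(-0.5 * t * t) for t in grid]
    total = sum(weights)
    weighted = sum(
        w * conditional_reliability(info_fn(t)) for w, t in zip(weights, grid)
    )
    return weighted / total


grid = [i / 10 for i in range(-40, 41)]
# Constant information of 3.0 is used purely as a sanity check.
flat_reliability = marginal_reliability(lambda t: 3.0, grid)
```

Under this convention, a conditional reliability of 0.62 would correspond to an information value of about 1.6, and the standard error grows wherever the information curve falls away, as described for the extremes of the ability range.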

Discussion
The DiPeCoS is an eight-item scale that assesses teachers' digital pedagogy competence using scenario-based assessments situated in real-life learning and teaching contexts.
DiPeCoS items are mapped to the principles of UDL to evaluate the competence of teachers in: (a) providing information that can be effectively viewed and understood by learners with varying needs; (b) offering various strategies to ensure learner engagement; and (c) assisting learners in navigating the learning environment. It differs from other survey-based assessments that use self-reports and do not include any performance-based measures.
A 3-PL item response analysis was used to validate the scale with responses from 1,315 teachers from various parts of India. As shown in Table 3, all items fitted the item response model well (p > .05). However, the discrimination value of item 3 (If a learner is consistently NOT submitting their assignments or attending live classes, what should a teacher do?) was found to be low, while item 9 (In an environmental sciences class, students are learning about different types of leaves. How would you best design the class?) had issues with ambiguity and levels of 'difficulty'. Removing items 3 and 9 led to an improvement in the test information function of the scale, which demonstrated a well-spread unimodal peak around θ = -0.9 post removal (see Figure 3). The scale also demonstrated good reliability around ability levels of θ = -1 and a marginal reliability value of approximately 0.62 (see Figure 4).
DiPeCoS is an assessment tool in teacher training that has the potential to be useful across K-12 educational settings, regardless of the subject domain. This is particularly because the scale does not require respondents to possess technical knowledge of specific subjects, subject-specific pedagogies or digital tools, thus making it suitable for assessing the digital pedagogical competence of teachers from diverse disciplines. Even though it is agreed that teachers' competence to design effective technology-enabled learning experiences is influenced by their understanding of subject-specific digital pedagogies (Mishra & Koehler, 2006), relying solely on subject-specific pedagogies, without incorporating the principles of UDL, may overlook the diverse learning needs and preferences of students. Therefore, when applying digital pedagogies, it is crucial to consider the context of learner variability.
In this regard, the DiPeCoS assessment tool plays a vital role in evaluating teachers' capacity to effectively address the diverse needs and preferences of learners while utilizing technology to facilitate content representation, engagement, and action and expression.
The wisdom and advantages of creating assessment procedures that specifically encourage and support teacher development are being increasingly sought by assessment developers (National Education Association, 2010). As a result, it is important to view professional development and assessment as complementary aspects of the same process.
Regardless of the subject they teach, teachers will be able to utilize the DiPeCoS to evaluate their level of proficiency in digital pedagogy to address diverse learner needs and preferences. While the scenario-based questions in the tool could be tailored to develop more specialized subject-specific scales, benefits of the DiPeCoS include brevity and an easy-to-administer scale that can be used to gain some initial idea of the overall digital pedagogy competence of teachers in a school, province or country. This can help foster the professional growth of teachers through training. For example, first, the DiPeCoS score can be used to screen and identify educators with high and low overall digital pedagogy scores based on school, district and national benchmarks (percentile-based) in order to place them into tiered capacity development programmes. Second, the scale outcomes can guide the design of capacity-building workshops that address the identified gaps in digital pedagogy competence. Although individual scores for the E, R and A&E portions of the DiPeCoS cannot be calculated separately (as the scale has a unidimensional structure), scores on individual items can be examined to identify major areas of improvement in teachers' digital pedagogy competence. For instance, if most of the teachers in a school score low on item 4 (Which of these methods would you use to get learners to use a new digital tool selected by you?), the school can host targeted capacity-building workshops to help teachers identify strategies that offer support to learners requiring assistance without impeding the progress of other learners. Third, the scale outcomes can guide the design and revision of curricula for teacher training programmes. By identifying prevalent gaps in digital pedagogy competences, training modules can be integrated into the curriculum, ensuring that future teachers receive comprehensive instruction in utilizing technology effectively. 
Finally, based on the results of the scale, teachers can develop personal learning goals to address their specific needs and access relevant resources to build their skills. Such personalization can help optimize their learning experiences. The ultimate aim of the DiPeCoS is to help teachers, policy-makers and education administrators determine how to effectively support teachers inside classrooms.

Limitations and future directions
In future research, external measures, which were not included in this study to save time and avoid respondent fatigue (Morgado et al., 2018), could be used to establish external discriminant and convergent validity of the scale. It would also be useful to examine other significant psychometric properties of the scale, such as test-retest reliability. Also, although the assessment of respondents' decision-making with regard to pedagogical practices is a better measure of their digital pedagogical competence when compared with self-reports, it still does not predict the translation of these choices into real-life practice.
Supplementing DiPeCoS with an observation rubric as the next step may increase its efficiency in measuring teachers' application of digital pedagogical competence.
Scores on the scale were not presented to the respondents as the DiPeCoS was yet to be validated at the time the respondents undertook the study. Moreover, scores could not be shared with them after completing the validation of the scale as no personally identifiable information was collected to communicate with them after the study. However, in future deliveries of the scale, the authors plan to share scores from DiPeCoS with the participants in real time.
Finally, the methods, strategies and goals of digital pedagogy continue to evolve with the emergence of new digital technologies and their affordances. Like any tool designed to measure digital competence, this tool needs to be updated regularly to reflect new evidence and insights in the field of learning sciences that inform pedagogy as well as new advances in education technologies and their opportunities and constraints.
Abbreviations
1-PL: one-parameter logistic; 2-PL: two-parameter logistic; 3-PL: three-parameter logistic; A&E: action and expression; CI: confidence interval; CFA: confirmatory factor analysis; CFI: comparative fit index; CK: content knowledge; CTT: classical test theory; DIF: differential item functioning; DiPeCoS: Digital Pedagogy Competence Scale; E: engagement; EFA: exploratory factor analysis; ICC: item characteristic curve; ICT: information and communication technology; IIC: item information curve; IRT: item response theory; ISTE: International Society for Technology in Education; LMS: learning management system; NETS-T: National Educational Technology Standards for Teachers; OECD: Organisation for Economic Cooperation and Development; PCK: pedagogical content knowledge; PISA: Programme for International Student Assessment; PK: pedagogy knowledge; R: representation; RMSEA: root mean square error of approximation; SELFIE: Self-reflection on Effective Learning by Fostering the use of Innovative Educational technologies; SPTKTT: Survey of Preservice Teachers' Knowledge of Teaching and Technology; SRMR: standardized root mean squared residual; TCK: technological content knowledge; TDC: teacher's digital competence; TIF: test information function; TK: technology knowledge; TLI: Tucker-Lewis Index; TPACK: technological pedagogical content knowledge; TPK: technological pedagogical knowledge; UD: Universal Design; UDL: Universal Design for Learning; UNICEF: United Nations Children's Fund.
Endnotes
1 For example, the Self-reflection on Effective Learning by Fostering the use of Innovative Educational technologies (SELFIE) tool based on DigCompEdu (Redecker & Punie, 2017); the Teachers' Digital Competences Questionnaire based on the Common Digital Competence Framework for Teachers by INTEF (Tourón et al., 2018); the Wayfind Teacher Assessment based on the International Society for Technology in Education (ISTE) standards for teachers (Banister & Reinhart, 2013); and COMDID based on other frameworks of a teacher's digital competence