Cognitive and affective effects of teachers’ annotations and talking heads on asynchronous video lectures in a web development course

When it comes to asynchronous online learning, the literature recommends multimedia content like videos of lectures and demonstrations. However, the lack of emotional connection and the absence of teacher support in these video materials can be detrimental to student success. We proposed incorporating talking heads and annotations to alleviate these weaknesses. In this study, we investigated the cognitive and affective effects of integrating these solutions in asynchronous video lectures. Guided by the theoretical lens of Cognitive Theory of Multimedia Learning and Cognitive-Affective Theory of Learning with Media, we produced a total of 72 videos (average = four videos per subtopic) with a mean duration of 258 seconds (range = 193 to 318 seconds). To comparatively assess our video treatments (i.e., regular videos, videos with face, videos with annotation, or videos with face and annotation), we conducted an education-based cluster randomized controlled trial within a 14-week academic period with four cohorts of students enrolled in an introductory web design and development course. We recorded a total of 42,425 page views (212.13 page views per student) for all web browsing activities within the online learning platform. Moreover, 39.92% (16,935 views) of these page views were attributed to the video pages, which accumulated a total of 47,665 minutes of watch time. Our findings suggest that combining talking heads and annotations in asynchronous video lectures yielded the highest learning performance, longest watch time, and highest satisfaction, engagement, and attitude scores. These discoveries have significant implications for designing video lectures for online education to support students’ activities and engagement. Therefore, we conclude that academic institutions, curriculum developers, instructional designers, and educators should consider these findings before relocating face-to-face courses to online learning systems to maximize the benefits of video-based learning.


Introduction
Ready or not, educational institutions were compelled to shift from traditional face-to-face to online instruction during the lockdown period of the COVID-19 pandemic (Fung et al., 2022; Khusanov et al., 2022; Krishnaswami et al., 2022). Fortunately, there has been a positive perception and acceptance of online learning systems before (Fidalgo et al., 2020; Garcia, 2017) and during (Amir et al., 2020; Das et al., 2021; Khan et al., 2021) this global health crisis. Despite the pervasive familiarity with this mode of education, challenges in teaching online courses persist (e.g., Ruipérez-Valiente, 2022). In a synthesized literature review, Kebritchi et al. (2017) categorized these challenges into major topical themes, such as issues related to learners (e.g., expectations and readiness), instructors (e.g., teaching styles and the transition from face-to-face to online), and content development (e.g., the role of instructional strategies and integration of multimedia in content). The pedagogical patterns employed in in-person lessons also require revisions to accommodate the learning requirements in a virtual classroom (Ferri et al., 2020). Hence, there is an urgent necessity to develop new multimedia learning materials (pedagogical challenge), provide access to online learning infrastructure (technological challenge), and assist stay-at-home learners to establish a conducive learning environment (social challenge). For late adopters of online education, the suddenness of this shift to emergency remote education makes it difficult to respond to these challenges immediately and in the long term. Nonetheless, it has become the priority of the education sector to ensure that learning never stops (Mukhtar et al., 2020; Thomas & Rogers, 2020). Thus, schools have been engaged with the rapid relocation of all their face-to-face courses to their respective online learning management systems.
As classroom venues transition from traditional to virtual, learning style becomes very important because of its association with student success in distance education (Battalio, 2009; Zapalska & Brozik, 2006). Theorists affirmed that learning styles are a manifestation of individual differences in learning. Butler and Pinto-Zipp (2005) investigated students' learning styles for online instructional methods. Using the Gregorc Learning Styles Delineator, they discovered that Concrete-Sequential (structured, predictable, practical, thorough) and Concrete-Random (original, intuitive, investigative) were the most frequent single learning styles. Online learners also prefer instructional methods that emphasize convenience. This assertion is consistent with the three-year study by Cole et al. (2014), in which convenience was the most cited reason for student satisfaction in online instruction. Online learners also favor the asynchronous online learning style since it is unrestricted by time, place, or other classroom constraints, thereby fulfilling the promise of learning "anytime and anywhere" (Shahabadi & Uplane, 2015). For comparison, synchronous online learning employs time-bounded activities and meetings where each student virtually participates in class depending on the schedule (Malik et al., 2017). This learning mode may not be pragmatic for students with technological (e.g., unreliable internet connection and limited access due to gadget sharing), domestic (e.g., financial distress within the household and the necessity to work for an extra income), institutional (e.g., excessive cognitive load and activities), and individual barriers (e.g., difficulty adjusting learning styles). As a sustainable solution, an asynchronous mode of content delivery has been proposed in many studies, especially for developing countries with multifactorial and interrelated challenges (Baticulon et al., 2021; Garcia, 2022).
When it comes to asynchronous online learning, Koutsabasis et al. (2011) recommended multimedia content (e.g., videos of lectures and demonstrations). Educators used videos in various ways, including presenting situational challenges to encourage problem-solving, providing information in an engaging format, and producing supplementary materials for academic content (Malaina et al., 2018; Rasi & Poikela, 2016; Tsukuta et al., 2019). As videos become an integrated part of traditional courses and a cornerstone of many blended courses, researchers have been exploring the formula for creating successful educational videos for learning purposes, subsequently referred to as video-based learning (VBL).
For instance, Guo et al. (2014) launched the largest-scale study on how video production affects student engagement and uncovered influential factors, namely speaking rate and pre-production. Further, Bialowas and Steimel (2019) examined the ideal video length and found that short-form videos (around three minutes) could have more influence on student motivation and immediacy. This finding is supported later by the proliferation of learning videos in a form of nanolearning (Garcia, 2022). On the other hand, Brame (2016) reviewed the literature to establish principles and guidelines that can maximize student learning. The study arrived at various elements to consider, including cognitive load, student engagement, and active learning. All things considered, the delivery of lecture recordings and additional video materials has become a significant aspect of education as it allows for more flexibility in the teaching process and encourages self-directed and self-paced learning.
Despite several VBL studies, there are still challenges that teachers must overcome when creating asynchronous video lectures (subsequently referred to as video lectures). First, video lectures risk fostering passive learning: without direct contact with students, they do not promote active learner participation and collaborative learning (Yousef et al., 2015). In VBL and online learning, this is a significant point of inquiry to ascertain the validity and completeness of the existing principles and guidelines on constructing video lectures for online instruction. The literature has also reported problems with video lectures.
For example, Homer et al. (2008) accentuated that the absence of teacher support causes students to experience learning difficulties. Teachers need to offer such support to promote student comprehension of video lectures. Lai et al. (2020) reported that one method to encourage a deeper understanding of learning materials and thus student performance is video annotations. Another major concern with online education is the sense of isolation that jeopardizes students' ability to learn (Borup et al., 2012). To overcome this challenge, researchers proposed increasing the feeling of emotional connection in an online learning environment and balancing technology utilization with a human touch. Kizilcec et al. (2015) determined that a teacher's talking head has the benefits of social and other nonverbal cues, which could further assist students in feeling more connected. To measure the applicability of these proposed solutions for online instruction, we examined the cognitive (i.e., learning performance) and affective (e.g., watching behavior, attitude, engagement, and satisfaction) effects of incorporating annotations and talking heads in video lectures. From a large-scale perspective, determining these effects may provide a robust basis for academic institutions, curriculum developers, instructional designers, and educators on maximizing the benefits of VBL in online education. In the succeeding parts of the paper, we covered the theoretical underpinning of online instructional strategies, how the data were collected and analyzed, a discussion of the findings, and the conclusions, implications, and recommendations.

Literature review
Online instructional strategies and asynchronous learning
The promised benefits and efficacy of online instruction, from the convenience of online learning (student perspective) to the prospect of offering additional courses (institutional perspective), are widely discussed in the literature. For example, a meta-analysis asserted the importance of online instruction as a strategy to improve course access and flexibility in education institutions (Castro & Tumibay, 2021). However, Ferri et al. (2020) postulated that pedagogical patterns and instructional strategies typically used in in-person instruction require amendments to accommodate the unique learning requirements in a virtual environment.
Instructional strategies are methods and approaches used by teachers to provide conditions under which learning goals are accomplished. During the COVID-19 pandemic, Mahmood (2021) revisited various instructional strategies that deliver online education effectively in developing countries. An example of such an online instructional strategy is recorded video lectures that can provide anytime-anyplace access to learning materials, paving the way to an asynchronous learning mode (sometimes called self-paced learning).
For several years, the primary audience of online learning was students who deliberately selected this mode and those who could afford to create their virtual learning environments.
However, the COVID-19 pandemic pushed students with little to no resources to adjust to this type of learning (Garcia & Revano, 2022; Khusanov et al., 2022). The education sector must consequently rethink the most appropriate approach to implementing online teaching and learning. For instance, online courses necessitate synchronous web conferences where teachers and students mandatorily meet in a virtual space according to given schedules. In the case of working students trying to survive the pandemic, it is nearly impossible to attend synchronous online courses (Aristovnik et al., 2020). Conversely, students staying at home may have additional household responsibilities to support their parents working tirelessly just to provide financially during the pandemic. These barriers, and the added distractions from family members (e.g., younger siblings), may reduce the time for school interaction, review of learning materials, and completion of activities (Treceñe, 2022). In developing countries, a growing concern among learners is the unstable internet connection, which directly influences behavioral intention towards online learning (Garcia, 2017). Although not commonly accepted, educational institutions resort to asynchronous courses (e.g., the school provides recorded video lectures and students submit deliverables on their schedules within given time frames) as a viable response to these challenges.
Upon reviewing asynchronous and distance learning in the age of COVID-19, Brady and Pradhan (2020) pointed out two academic institutions that performed curricular changes to accommodate the unforeseen cancelation of in-person didactics. Both institutions acquired Cisco WebEx for video conferencing and recorded online sessions to allow asynchronous playback for participants unable to join at the designated time. They encouraged shifters to consider curricular assessments to ensure students learn the desired content. In medical and allied health professional education, Gupta et al. (2020) explored the use of asynchronous environment assessments. Drawn from the fact that assessment is an integral aspect of any teaching and learning system, especially during a pandemic (Fung et al., 2022), valid and fair asynchronous assessment procedures are compulsory when transitioning to an online mode of instruction. Although their findings may not apply to other disciplines, their study identified several assessment methods for an asynchronous environment (e.g., open-ended short-answer questions, problem-based questions, and more). Rapanta et al. (2020), on the other hand, suggested assigning more asynchronous collaborative and individual works to compensate for the consequence of teachers devoting more time to creating online learning materials. In another example, Ishak et al. (2020) examined the role of asynchronous online video lectures in a flipped classroom format using a mixed-method research design. They found that asynchronous instructional materials promoted students' intrinsic needs based on the self-determination theory (autonomy, relatedness, and competence).

VBL and asynchronous video lectures
Previous studies have evaluated the utilization of VBL materials in flipped, blended, and online classes as content-delivery tools. In a systematic review of VBL from 2008 to 2019, Sablić et al. (2020) categorized VBL literature into dimensions such as teachers' reflections and feedback, professional development, and student learning outcomes. Microteaching, a faculty development technique whereby teachers evaluate teaching session recordings, was one of the earliest applications of VBL as a feedback tool. Tripp and Rich (2012b) analyzed 63 studies that examined video reflection practices and arrived at six key dimensions, such as (1) reflection tasks, (2) guiding reflection, (3) individual or collaborative reflection, (4) video length, (5) number of reflections, and (6) the measuring reflection. From the teachers' perspective, reviewing recorded teaching sessions allows them to learn from the feedback (Christ et al., 2017) and increases preferences for changing their teaching style (Tripp & Rich, 2012a). For students, videos create a stimulating learning experience that promotes a deeper understanding of a topic. To construct educational videos that maximize student learning, Brame (2016) underscores three elements. First, cognitive load (or the amount of information that working memory can hold at one time) should be addressed by reducing extraneous load and enhancing germane load. Several techniques can be used to improve the cognitive load, including signaling (e.g., highlighting the most important keywords), segmenting (e.g., short videos), weeding (e.g., eliminating music), and match modality (e.g., Khan Academy-style tutorial videos). Another element is engagement which aims to raise the percentage of watched videos and social partnership between students and teachers. To foster engagement, making multiple videos for a lesson (or dividing a topic into subtopics) and using first-person narrative should be employed. 
Lastly, videos should promote active learning to drastically improve content knowledge, problem-solving abilities, and positive attitudes. Aside from guide questions, another recommendation to promote active learning is to use interactive features that give students control, such as movement through video and selecting predominant sections to review.
In the systematic review of VBL (Sablić et al., 2020), asynchronous video is an unpopular research topic. The opposite of live video, an asynchronous video is a pre-recorded video intended for watching after production. Live videos are also usable for asynchronous mode when purposely recorded for future usage. Cardall et al. (2008) performed a cross-sectional survey to compare student experience between live and video-recorded lectures.
They observed that live attendance remains the predominant method to watch lectures for various reasons: (1) lack of motivation to watch recorded videos, (2) to show appreciation to instructors, and (3) to feel as if they are getting more for their tuition money. Almost a decade later, Bahnson and Olejnikova (2017) replicated the study and found that students "really like" recorded videos, but there is no evidence to say that they prefer them more. In addition, student learning did not improve by substituting a self-paced, recorded module for live instruction. Then, during the COVID-19 era, Islam et al. (2020) repeated the study and learned that students now favor pre-recorded video lectures more than live lectures due to flexibility and convenience. They added that learning through video lectures depends on students' motivation, a barrier reported by Cardall et al. (2008) and a missing factor in the study of Bahnson and Olejnikova (2017). This impediment has been a challenge for many educational institutions, and the pandemic aggravates this vulnerability, resulting in students losing their motivation to learn (Bihu, 2022; Patricia Aguilera-Hermida, 2020; Tan, 2020). Notwithstanding, the change of heart from live videos to a recorded format among students indicates the consequentiality of continually rethinking and reevaluating the best way to incorporate videos in the teaching and learning process.

Common presentation styles of recorded video lessons
In many VBL studies, 'video' was the primary terminology for educational video materials.
However, there are various video styles (Figure 1) whose format and structure could affect the evaluation of educational interventions, thus creating inconsistencies in the literature. Therefore, exploring these lecture styles is necessary to establish the characteristics of such video materials, distinguish how they differ from one another, and permit teachers to select the most suitable video style according to their skills and preferences. Online lecture videos are rendered in various styles, including (1) narrated slide presentation, (2) presenter-only lecture, (3) live lecture capture, (4) picture-in-picture, (5) hand-drawn videos, and (6) screencasting. The first video lecture style, narrated slide presentation, depends on slide presentation software (e.g., Microsoft PowerPoint) supplemented with a teacher's voice-over explaining the information displayed on the screen. Excellent verbal communication skill is necessary for this video style since it is the only connection between teachers and students. Conversely, a presenter-only lecture uses a talking head (like a commentator on television), which is very effective for a presentation that requires an emotional connection. Aside from the art of communication, presenters must master the art of visual cues (e.g., good posture, body language, and eye contact). Unlike other styles, live lecture capture is recorded in a traditional classroom where a live audience is present.
The lecture is intended for the synchronous format but then recorded to allow asynchronous access. The primary advantage of this video style is the opportunity for teachers to directly interact with students and for students to raise questions while allowing absentees to catch up with the discussion. Another lecture style is picture-in-picture, which combines the narrated slide presentation and presenter-only lecture. Although it has the advantages of both video styles, picture-in-picture is one of the most complex formats since post-production is required (Chen & Wu, 2015). The inclusion of post-production means that video editing skills and knowledge are needed. On the other hand, hand-drawn videos are an explainer type of media that heavily rely on animated graphics drawn by hand on a physical whiteboard or digital drawing board (e.g., Khan-style learning videos). This video style has several advantages, such as using hand motion as a social cue that influences learners to work harder, providing information incrementally that is synchronized with the linear audio data pattern, and directing learners' attention to the vital part of the lesson (Chen & Thomas, 2020). Lastly, screencasting (or the digital recording of a computer screen) is one of the latest video styles and is used as a video walkthrough with audio narration to explain how things work. Unlike other video styles, screencasting requires software capable of recording a screen and an energetic voice track to compensate for the lack of emotional connection and interaction.
Numerous studies have investigated the impact of using video styles through comparative analysis. Chen and Thomas (2020) compared narrated slide presentations and hand-drawn videos in a laboratory setting. According to students, the hand-drawn video was the most engaging style. Cross et al. (2013) obtained similar findings where the majority expressed that a hand-drawn video is engaging and personal while a PowerPoint presentation is clear and legible, which adds value during lecture and review, respectively. Another comparative evaluation that involves a narrated slide presentation was the study of Chen and Wu (2015), which was compared with picture-in-picture and live lecture capture. According to their experimental evaluation, both video styles (i.e., picture-in-picture and live lecture capture) elicited significantly better learning performance than a narrated slide presentation. Still, the narrated slide presentation generated the most sustained attention and highest cognitive load among the three video styles. In another study, Sadik (2016) employed the live lecture capture and compared it with screencasting to supplement classroom lectures. According to students, screen recordings are better than live recordings in terms of video quality and usefulness. Aside from the study of Chen and Wu (2015), there is little evaluation on the employment of picture-in-picture that highlights a talking head on the video. Other studies investigated talking heads but not in the context of video style. For instance, Mohamad Ali and Hamdan (2016) assessed the effects of a talking head added to instructional materials by comparing actual human characters to two-dimensional characters. In their study, there were no video learning materials involved. The nearest earlier evaluation to our paper was the observational field study of Kizilcec et al. (2015), which compared video lectures with or without the instructor's face. 
Accordingly, this study replicates some parts of their protocol, with the primary differences being that it was conducted during emergency remote education and that the video learning materials include annotations.

Research design
The present study followed the educational Cluster Randomized Controlled Trial (C-RCT) approach, in which groups of individuals (in this instance, class sections of students) were randomly assigned to a treatment. Moberg and Kramer (2015) asserted that C-RCT is ideal for testing treatments taken on behalf of a group and when the nature of the intervention involves a high risk of contamination. One example of such contamination is the frequent contact between participants, which is likely to occur between students in an online channel.
During the pandemic, many studies emphasized the importance of social relationships and student connectedness (Hehir et al., 2021). Moreover, the participating university did not permit randomization at an individual level under the latest policy and student enrollment procedures. Nevertheless, we randomly assigned the treatment for each  Moreno, 2005). First, CTML offers a guideline for the creation of video lectures. According to this theory, the design of video lectures should not cause extraneous processing demand. It also suggests various guiding principles to follow, including coherence, signaling, segmenting, embodiment, and modality. On the other hand, CATLM provides a basis for the evaluation of our educational intervention. This theory asserts that motivational factors, where affect acts as the on/off switch, mediate cognitive processes involved in learning from multimedia materials. Consequently, in addition to student learning performance (cognitive factor), we also investigated affective factors, such as video watching behavior, engagement, attitude, and satisfaction, as part of the evaluation of educational treatments.
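The cluster-level assignment described in this section can be illustrated with a minimal Python sketch. It assumes four intact class sections and the four video treatments named in the study; the section labels, the function name `assign_clusters`, and the seed are hypothetical and are not taken from the authors' procedure.

```python
import random

# Hypothetical section labels; the treatment names follow the study's four video conditions.
SECTIONS = ["Section A", "Section B", "Section C", "Section D"]
TREATMENTS = [
    "regular videos",
    "videos with face",
    "videos with annotation",
    "videos with face and annotation",
]


def assign_clusters(sections, treatments, seed=None):
    """Randomly map each intact class section (cluster) to exactly one treatment."""
    if len(sections) != len(treatments):
        raise ValueError("this sketch expects one treatment per section")
    rng = random.Random(seed)      # a seed makes the allocation reproducible
    shuffled = list(treatments)    # copy so the caller's list is not mutated
    rng.shuffle(shuffled)
    return dict(zip(sections, shuffled))


if __name__ == "__main__":
    for section, treatment in assign_clusters(SECTIONS, TREATMENTS, seed=2021).items():
        print(f"{section}: {treatment}")
```

Because the unit of randomization is the section rather than the student, every student in a section receives the same video treatment, which is what limits contamination between conditions.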

Setting and sample
We carried out this educational intervention study for one semester from January to April of the 2020-2021 academic year at one of the largest universities in the Philippines. Like other educational institutions in the country, this university switched to emergency remote education as a response to the challenges of the COVID-19 pandemic. One of the unique features of the online learning platform in this university is the provision of recorded video lectures for all offered courses of all undergraduate degree programs. These video lectures were purposely created for asynchronous access to accommodate all students who cannot attend the synchronous meetings for the same reasons discussed in the literature. However, the narrated slide presentation was the only available video lecture style, and there were specific professors assigned to create videos for each course (according to specializations).
We invited one masterclass consisting of four sections with 50 students each enrolled in an introductory website design and development course as our pool of participants (n = 200).
Three teachers handled the masterclass, and collaborative teaching was the primary method of instruction. Synchronous meetings were twice a week (lecture and laboratory sessions) and two hours per meeting. Nonetheless, students were not required to attend synchronous meetings (except the orientation during the first meeting) since video lectures were already available in the learning management system. Student enrollment per class section was not controlled in this study but based on the procedures mandated by the university.

Video lectures
Although video lectures with a narrated slide presentation style were already available, we created new video lectures from scratch to allow uniformity in all treatments. Without this procedure, the video style of the other treatments would differ from that of the available videos, which may affect our evaluation. We followed the applicable guiding principles of CTML in the creation of video lectures. These principles include segmenting (information is presented in small user-paced segments), pre-training (key terms are presented before the actual lesson), modality (speech was used in the discussion), embodiment (an actual human was used as an agent for videos with a teacher's face), voice (an actual human voice was used instead of a robot-like voice from text-to-speech programs), coherence (removal of non-essential information), and personalization (presenting lessons in a conversational style). In recording the new video lectures, we followed the picture-in-picture video style, but with the screen (PowerPoint presentation) and video (talking head) recorded separately.
The recorded video lecture with the PowerPoint presentation and without the talking head served as the regular videos for G1. When combined (screen and video) in post-production, the new video materials served as the videos with face for G2. In another post-production, we added annotations to regular videos to form the videos with annotation for G3. Finally, we integrated the talking-head recordings made earlier into the treatment for G3 to form the videos with face and annotation for G4. All video materials underwent a review and approval stage with other subject matter experts teaching the same and related courses.
This requirement was necessary to ensure the correctness, completeness, and quality of video materials. Figure 2 shows video screenshots for each treatment at the same timestamp.
We followed the syllabus of the introductory web design and development course, which is composed of seven modules covering three web languages: HTML, CSS, and JavaScript.
Each module was divided into different subtopics (minimum of two and maximum of four).
Dividing a lesson into segments complies with CTML, which advises that video lectures should not cause extraneous processing demand (Mayer, 2005). From the 18 subtopics, we produced a total of 72 videos (average = four videos per subtopic) with a mean duration of 258 seconds (range = 193 to 318 seconds), which is slightly longer than the ideal video length (around three minutes) advised by Bialowas and Steimel (2019). Given the nature of the course, most videos were live coding demonstrations and hands-on exercises.

Research instruments
We examined the cognitive and affective effects of our educational interventions using several research instruments. Throughout the 14-week academic term, students answered ungraded formative assessments after every lesson, graded summative assessments after every two lessons, and a comprehensive final examination. To measure students' learning performance, we used course grades derived from their summative assessments (75%) and a final examination (25%). The content of the final examination was comparable to the pre-test given during course orientation. All these departmental assessments were designed and developed by a pool of teachers (n = 5) who are considered subject experts. Assessment scheduling and grading systems were all determined by the target university. In terms of watching behavior, we collected several video metrics such as view count (total number of video views), heatmap (how a student played the video), view-through rate (percentage of students who watched the video in its entirety), and watch time (how long a student watched the video). To capture these video performance metrics automatically, we purposely coded a custom Google Chrome extension to track, monitor, and save activities performed by students. This approach ensured that we could capture the data we needed and that the data were not stored in an external company's database. All participating students agreed to use a Google Chrome browser, turn on the developer mode, and install the extension.
Each visit, the corresponding activities (e.g., clicking the play button), and other important data (e.g., length of stay) were recorded by the extension and transmitted to our database.
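As an illustration of how the video metrics defined above could be derived from logged activity, the following Python sketch aggregates per-video view count, watch time, and view-through rate. The record layout (student, video, seconds watched, completion flag) and the function name `summarize_sessions` are hypothetical, since the extension's actual data format is not described; the view-through rate here is computed over students who opened the video at least once, which is also an assumption.

```python
from collections import defaultdict


def summarize_sessions(sessions):
    """Aggregate hypothetical session records into per-video metrics.

    Each record is (student_id, video_id, seconds_watched, completed),
    where `completed` is True if the video was watched in its entirety.
    """
    stats = defaultdict(
        lambda: {"views": 0, "seconds": 0, "viewers": set(), "finishers": set()}
    )
    for student_id, video_id, seconds_watched, completed in sessions:
        entry = stats[video_id]
        entry["views"] += 1                    # view count: every playback session
        entry["seconds"] += seconds_watched    # watch time: accumulated seconds
        entry["viewers"].add(student_id)
        if completed:
            entry["finishers"].add(student_id)

    return {
        video_id: {
            "view_count": entry["views"],
            "watch_time_min": entry["seconds"] / 60,
            # Assumption: percentage of viewing students who finished the video.
            "view_through_rate": 100 * len(entry["finishers"]) / len(entry["viewers"]),
        }
        for video_id, entry in stats.items()
    }


if __name__ == "__main__":
    demo = [
        ("s1", "v1", 258, True),
        ("s2", "v1", 120, False),
        ("s1", "v1", 60, False),
    ]
    print(summarize_sessions(demo))
```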
For privacy protection, all data were encrypted to prevent the identification of students at any level of use. We also limited the data collection within the video landing pages under a single domain name. In terms of the affective factors, we developed a survey instrument to measure students' engagement, satisfaction, and attitude. Utilizing the expert judgment methodology, the same pool of teachers assessed the initial instrument to improve content validity by checking the accuracy, completeness, and readability. To discern whether each item per scale was congruent with the construct, we employed content validity index testing.
The computation resulted in an average congruency percentage of 91, which was higher than the threshold of 90 percent. A pilot test was also conducted with students from the other masterclass of the same course (n = 28) to ensure the reliability and validity of the instrument. Using Cronbach's alpha, the computation resulted in 0.74 for engagement, 0.84 for attitude, and 0.78 for satisfaction. All Cronbach's alpha values were above the cutoff point of 0.7, indicating that the instrument was internally consistent. Sample questions include "I think that the video lectures improve my learning" for attitude, "When video lectures are available, the online class experience is much better" for satisfaction, and "I was fully concentrated while watching the video" for engagement.
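For readers unfamiliar with the two instrument checks reported above, the sketch below shows one common way to compute them. It assumes that the average congruency percentage is the mean, across experts, of the percentage of items each expert judged congruent, and it uses the standard Cronbach's alpha formula with sample variances. The data are made up for illustration; the study's own computations were not published in this form.

```python
from statistics import variance


def average_congruency(ratings):
    """Mean percentage of items each expert judged congruent (1) or not (0)."""
    per_expert = [100 * sum(r) / len(r) for r in ratings]
    return sum(per_expert) / len(per_expert)


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of respondents and columns of scale items."""
    k = len(responses[0])                  # number of items in the scale
    items = list(zip(*responses))          # transpose to per-item columns
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var / total_var)


# Made-up Likert responses (4 respondents x 3 items), for illustration only.
demo = [[4, 5, 4], [3, 3, 4], [5, 5, 5], [2, 3, 2]]
alpha = cronbach_alpha(demo)
is_consistent = alpha > 0.7                # cutoff used in the text
```

A scale passes the internal consistency check when `alpha` exceeds the 0.7 cutoff, as all three affective scales did in the pilot test.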

Data collection and analysis
During course orientation, students answered a pre-test questionnaire to ensure that prior knowledge regarding the subject matter was not significantly different among the groups.
We also used the results of this pre-test in a within-group comparison to determine whether there was a significant increase in the final examination (post-test) scores. Students completed the pre-test on January 14, 2021, and the post-test on April 9, 2021. During the post-test collection period, students likewise answered the survey questionnaire consisting of three affective constructs (attitude, satisfaction, and engagement) subjected to a between-group analysis. Moreover, we collected assessment scores for the learning performance analysis (cognitive effect) of all treatments. All students completed a confidentiality undertaking and informed consent form before starting their first lesson. We analyzed the collected data using the statistical software IBM SPSS Statistics version 26.0. We used descriptive statistics to report the demographic information and test the data distribution. For the statistical tests, we used the paired t-test for the within-group comparison of pre-test and post-test scores, one-way Analysis of Variance (ANOVA) for the between-group comparison of pre-test scores, and Multivariate Analysis of Variance (MANOVA) for the between-group comparison of learning performance across all recorded assessments.
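To make the first two tests concrete, the following minimal sketch computes the paired t statistic and the one-way ANOVA F statistic from their textbook definitions. It is not a reproduction of the SPSS analysis: it omits p-values and MANOVA, and the function names are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """t statistic for paired samples; degrees of freedom are len(pre) - 1."""
    diffs = [b - a for a, b in zip(pre, post)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

The paired test operates on each student's post-test minus pre-test difference, which is why it suits the within-group comparison, whereas the F statistic compares the spread of group means against the spread within groups, which suits the check for equal prior knowledge across cohorts.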

Results and discussion
The primary objective of this study was to investigate the cognitive and affective effects of video lectures with annotations and talking heads in asynchronous online learning. Using a C-RCT study design, we randomly assigned treatments to four groups of students for the 14-week educational intervention in an introductory web design and development course.
We examined their learning performance, watching behavior, satisfaction, engagement, and attitude to measure the effectiveness of annotations and talking heads. A demographic survey revealed that our participants were predominantly male (89.23%) with a mean age of 18.92 years. The mean scores of pre-tests among the four groups ranged from 38.6 to 54.2, and the one-way ANOVA revealed no significant difference in prior knowledge of the subject among the groups (F = 0.492, p = 0.765) before the intervention.

Learning performance
The first analysis concerning the cognitive effects of treatments was the comparison of pre-test and post-test scores within each group. Using paired t-tests, we found that the mean scores of G1 improved from 34.24 ± 5.21 to 63.29 ± 7.22 (p < 0.001), G2 improved from 43.16 ± 7.28 to 67.82 ± 5.37 (p < 0.001), G3 improved from 42.11 ± 7.11 to 81.29 ± 8.21 (p < 0.001), and G4 improved from 38.92 ± 8.31 to 82.97 ± 7.69 (p < 0.001). These within-group analyses are consistent with the current literature demonstrating the positive impact of VBL (Sablić et al., 2020). We also examined the learning performance of all groups in their summative assessments and comprehensive final examination. Using MANOVA, we found that there was a significant difference between treatments (see Table 1). G2 received the highest score in S1 (Introduction to Web Technologies), while G4 attained the highest scores in S2 (HTML), S3 (CSS), S4 (JavaScript), and FE. Further, G1 received the lowest scores in all activities.
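The size of each within-group improvement follows directly from the reported pre-test and post-test means. The short sketch below performs that arithmetic; it is purely descriptive and does not replace the significance tests.

```python
# Pre-test and post-test means as reported in the text (standard deviations omitted).
MEANS = {
    "G1": (34.24, 63.29),
    "G2": (43.16, 67.82),
    "G3": (42.11, 81.29),
    "G4": (38.92, 82.97),
}

# Mean gain per group, rounded to two decimals to match the reported precision.
gains = {g: round(post - pre, 2) for g, (pre, post) in MEANS.items()}

# Groups ordered from largest to smallest mean gain.
ranking = sorted(gains, key=gains.get, reverse=True)
```

With the reported means, the gains are 29.05 (G1), 24.66 (G2), 39.18 (G3), and 44.05 (G4) points.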
Although all groups improved significantly with video lectures, it is noteworthy that G4 outperformed the other groups in most of the recorded assessments (except S1). This finding indicates that combining annotations and talking heads in video lectures yields the highest positive impact on student learning performance compared to using each technique independently or not at all. The nature of the course and how it is ordinarily positioned as a programming course (even though it is technically not) in computing curricula (Park & Wiedenbeck, 2011) may have something to do with this finding. In a standard introductory web design and development course, students learn how to code in web languages. They often mistakenly identify HTML and CSS as programming languages, and the inclusion of a real web programming language (i.e., JavaScript) and coding activities (e.g., building a website) may explain why it is deemed a programming course. In computer programming education, there is a 'fear of coding' among novice students. This phenomenon causes low academic achievement and negative attitudes, especially when students are navigating this subject for the first time and alone (Garcia, 2021). Therefore, the cognitive and affective support of teachers in the form of annotations and talking heads plays a significant role, especially in learning complex topics online (S2-S4), but not as much in introductory lessons (S1).
As we transition to online instruction, the learning environment presents an opportunity to encourage independence and a sense of responsibility in students. However, the loss of human interaction that is fundamental for them becomes critical. One of the external factors influencing the negative feelings and perceptions towards the course is the availability of teachers. According to Rogerson and Scott (2010), teachers play a vital role in the student learning experience concerning this fear factor. In a study by Ferri et al. (2020) on remote teaching during a pandemic, students asserted that they "need to feel emotions, and that can not be given by a 100% remote experience". While we acknowledge that there is no substitute for proper teacher-student interaction, the mitigation of this problem may be attributed to talking heads in video lectures. A familiar talking head may have accentuated a parasocial interaction that decreases students' loneliness and fills their social needs. This social surrogacy is comparable to the illusory parasocial relationship between television personalities and viewers commonly discussed in the Uses and Gratifications Theory (Rubin, 2008). Such video lectures thus induce social presence even in a virtual environment, which leads to a more inviting online learning experience and reduced transactional distance. This is a major finding because a conducive digital learning space is a requirement, especially in a pandemic context (Lamsal, 2022). Meanwhile, Kizilcec et al. (2015) reported similar findings of increased social presence when students watch videos with their teacher's face. This video style is also associated with increased engagement and a positive attitude, which is illustrated in the subsequent discussion.
Another significant aspect to investigate is how teacher-generated annotations improved learning in an introductory web design and development course. Presently, the literature on video lectures with annotations in this course, or even in computer programming education, is scarce. Consequently, we scrutinized students' improved learning performance through the lens of computer languages that have commonalities with human languages (Connolly, 2001), where annotations have been thoroughly examined. Comparable to the practices in language education, we incorporated various annotation techniques and styles. Some of the strategies we utilized were digitally writing notes, explanations, comments, drawings, and other types of visual remarks (e.g., underlining parts of the code or highlighting sections of a web page). Earlier studies explored the effects of various multimedia annotations for second language acquisition, which is regarded as a computer-mediated communication that offers access to authentic language input (Akbulut, 2007; Yeh et al., 2017). In computer programming, there is a learning method called a top-down approach where students use code snippets to acquire language ability before moving to the details (i.e., grammar, data definition, vocabulary) of the language (Saito & Yamaura, 2013). An interesting finding from the MANOVA results supporting the similarity to language learning was that teacher-generated annotations worked significantly better on topics that contain web languages (S2, S3, S4) than on the foundational concepts (S1). It also goes back to the fundamental concepts of CTML that suggest learning occurs through a dual-coding process (e.g., a combination of verbal and non-verbal processing for encoding information; Mayer, 2005).

Video watching behavior
[...] minutes of watch time with 57% view-through rate. This could be explained by the fact that a talking head is more engaging (Guo et al., 2014) and that annotations made students pause and/or replay the video materials (Tseng, 2021). Figure 3 demonstrates how students played video material, where drop-offs indicate where they stopped paying attention and big spikes signify the sections of the media that were compelling enough to watch and replay.
All groups started with 100% attention in the first few seconds of the video. However, the heatmap shows that G1 lost engagement in the middle part of the video and possibly skipped to the end section to watch the summary and conclusion of the lessons. On the other hand, G2 and G3 performed almost similarly, while G4 retained attention in most parts.

Satisfaction, engagement, and attitude
For the affective factors, the between-group analysis (Figure 4) demonstrates that G4 has the highest mean score among the groups (4.27 ± 0.87), followed by G3 (3.90 ± 0.52) and G2 (3.91 ± 0.52) with nearly identical mean scores, and G1 with the lowest mean score (2.82 ± 1.06). Among these factors, only attitude was not significant. First, the positive impact on satisfaction is consistent with existing studies showing that educational videos are a vital instructional material that enhances learning satisfaction compared to traditional education (El-Sayed & El-Sayed, 2013) and text-based video-free online learning (Choi & Johnson, 2007). Furthermore, talking heads on lecture videos may have compensated for the lack of interaction in online instruction, which is the most cited reason for dissatisfaction with the online learning mode (Cole et al., 2014). The additional effort of the teacher to add video annotations may have caused students to appreciate the online course, which is similar to the findings of Draus et al. (2014), where students expressed their appreciation for teachers who devote more effort to an asynchronous online class. In the case of engagement, our result corroborates previous studies showing that talking heads cause students to like the lectures better (Kizilcec et al., 2015) and that video annotations can be favorable for enhancing student learning engagement (Tseng, 2021).
Meanwhile, Tseng (2021) also reported that annotations distracted some students from watching the videos, which does not seem to be the case in this study. Future study is still [...] (Sablić et al., 2020), video annotations (Chiu et al., 2018), and teachers' talking heads (Kizilcec et al., 2015). From a global perspective, this finding may be explained by the stronger effect of transitioning from onsite to online lectures due to the COVID-19 crisis on male students from less developed regions (similar to our participants), as determined by Aristovnik et al. (2020). For verification, another study should be conducted after the pandemic. Overall, both talking heads and annotations produce advantageous effects on the affective domain.

Conclusion
This paper brings attention to the efficacy of annotations and talking heads when integrated into asynchronous video lectures. To investigate the cognitive (i.e., learning performance) and affective (e.g., watching behavior, attitude, engagement, and satisfaction) effects, we conducted an educational-based cluster randomized controlled trial where four cohorts of students received different treatments (i.e., regular videos, videos with face, videos with annotation, or videos with face and annotation). After the 14-week intervention period, our major discoveries were as follows: (1) videos with talking heads and annotations yielded the highest learning performance, (2) the watch time of videos with annotations and talking heads was significantly longer, and (3) students from the G4 cohort expressed the highest satisfaction, engagement, and attitude scores. Our findings present a valuable opportunity for academic institutions, curriculum developers, instructional designers, and teachers who are transferring or will transfer their face-to-face courses to online learning systems to maximize the usage of VBL, especially in a time of global crisis. With pedagogical patterns in in-person classes demanding revisions to satisfy the learning requirements in a virtual classroom, our findings converge on some recommendations for designing and creating video lectures. First, we recommend including a teacher's talking head in response to the sense of isolation that threatens students' capacity to learn. This video style increases the feeling of emotional connection in an online learning environment and balances the use of such technology with the human touch. We also recommend incorporating annotations to foster better comprehension in video lessons. This technique compensates for the absence of teacher support that causes students to experience learning difficulties.
In the case of student attitude, it may be necessary for schools to offer various learning options, address learners' emotions, and foster intrinsic motivation through activities that encourage exploration.
Success notwithstanding, our findings must still be interpreted within their limitations. First, the recruitment of our participants was subject to the temporary policy and enrollment procedures precipitated by the pandemic. This restriction resulted in a small sample size that may limit the generalizability of our quantitative results. Furthermore, the topics covered by our video materials followed the syllabus of an introductory web design and development course. Our experiment may produce different findings when performed in other courses. Future studies could replicate our experiment in other disciplines to further demonstrate and validate the results. It is also important to note that the creator of our video lectures is technologically proficient in video production and editing, resulting in high-quality and professional videos. Guseva and Kauppinen (2018) highlight competencies and skills needed in creating effective educational videos, such as an understanding of the video production process, media quality, presentation, content, and visuals. These competencies denote that comprehensive training is needed by teachers who may lack these professional skills. In addition, faculty time requirements may be prohibitive, and creating high-quality asynchronous content could be more time- and labor-intensive than traditional didactics (Kraut et al., 2019). Thus, teachers need to evaluate specific student needs to determine the right balance between the effort spent on producing lecture videos and potential learning gains. One prospective solution to relieve teachers of these additional tasks is to hire external video editors. However, close supervision and collaboration with subject matter experts are necessary to guarantee the correctness and quality of video materials. Another consideration is how to present the talking head in the video. In our study, the talking head video focused on the upper human body from head to shoulder only.
The experiment could yield a different result if a full body was presented because of more life-like behaviors (e.g., gestures) as a visual cue. Future analyses could compare different presentations of talking heads and determine which one is the most effective. Meanwhile, one issue we faced with talking heads was their positioning on the video screen. The extensiveness of some of the content (e.g., source code) that we needed to present on the screen obliged us to reposition the talking head depending on the slide. This inconsistency could be distracting for some students and warrants further solutions. Finally, as mentioned in the methodology, there were still students who attended synchronous meetings and did not exclusively rely on video lectures, which could have affected the student learning performance.
Despite the teaching and learning difficulties precipitated by the pandemic, this global health crisis has forced educational innovation into the core of every academic institution.
It also presents an opportunity to identify new strategies and approaches that could leapfrog progress and respond to the issues during these challenging times. Ready or not, academic institutions will move forward by adjusting to a new educational environment.