Integrating Parsons puzzles within Scratch enables efficient computational thinking learning

A literature review revealed that students learning computational thinking (CT) via Scratch often require substantial teacher support. We surveyed grade 6-9 teachers to learn their perceptions of student engagement with CT and how well their needs are met by existing CT learning systems. The results led us to extend the trend of balancing Scratch's agency with structure, to better serve learners and reduce the burden on teachers aiming to learn and teach CT. In this paper, we review the architecture and implementation strategies developed to integrate Parsons Programming Puzzles (PPPs) with Scratch, and then analyze their effects on adults, who crucially influence the education of their children. The results from our pilot study suggest PPPs catalyze CT motivation, reduce extraneous cognitive load, and increase learning efficiency without jeopardizing performance on transfer tasks.


Introduction
In response to a crisis in computer science (CS) teacher certification and a deficit of student exposure to CS in grades K-12 (Leyzberg & Moretti, 2017; Wilson et al., 2010), governments are enacting policies requiring computational thinking (CT) in schools (Tamatea, 2019; The Royal Society, 2016; Whitehouse.gov, 2016). Additional argument (Wing, 2006, 2008) and evidence (Grover & Pea, 2013) provide rationale for ensuring children achieve CT competency during the formative cognitive and social development cycles throughout grades K-12. Parsons programming puzzles (PPPs), which enable learners to practice CT by assembling sets of mixed-up programming constructs into the correct order to form samples of well-written code focused on individual concepts, are one approach used to introduce CT efficiently (Ericson et al., 2018; Parsons & Haden, 2006).
These scaffolded program construction tasks facilitate learning the syntax and semantics underlying a CT concept. As the learner solves carefully designed single-solution puzzles, she arranges constructs from a curated assortment in a cycle of deliberate practice that exposes and addresses misconceptions (Emerson et al., 2020; Kaczmarczyk et al., 2010).
Among the correct code fragments, she might find distractors, which provoke cognitive conflict that reinforces learning (Karavirta et al., 2012). The partial suboptimal path distractor type, for example, might tempt a learner toward faulty progress without enabling her to solve the problem fully, thereby triggering recognition of a misconception and productive backtracking toward the correct solution (Harms et al., 2016). Researchers have hypothesized that distractors might be beneficial in PPPs for reasons similar to those leading to their inclusion in multiple choice tests (King et al., 2004; Parsons & Haden, 2006), such as the illumination of conceptual misunderstanding, flawed reasoning, or inconsistent reasoning.
Research indicates this structured approach to learning CT can lead to more efficient concept learning than alternatives such as learning by tutorial or writing/fixing code (Ericson et al., 2017; Harms et al., 2015; Zhi et al., 2019). To measure efficiency, researchers often leverage cognitive load theory, which helps to distinguish between the complexity of the material, the instructional design, and the strategies used for knowledge construction. Since PPPs provide constrained problem spaces, they can induce lower cognitive load than that experienced when writing code with open-ended agency in a realm notoriously challenging for novices (Fabic et al., 2018).
In the current study, we seek further evidence of their efficiency by integrating PPPs into Scratch, a block-based environment initially designed for informal learning that invites exploration, collaboration, and knowledge construction through personally meaningful creation (Maloney et al., 2010). K-8 teachers use Scratch more than any other coding language internationally (Rich et al., 2019), resulting in an ecosystem with over 92 million registered users (MIT Media Lab, n.d.) and more research focus than any other environment in K-12 from 2012 to 2018 (McGill & Decker, 2020). However, historical findings indicate Scratchers infrequently demonstrate skill increases over time (Scaffidi & Chambers, 2012), misconceive loops, variables, Booleans, nested conditionals, and procedures (Grover & Basu, 2017; Grover et al., 2018), and often adopt habits unaligned with accepted CS practice (Meerbaum-Salant et al., 2011). In a recent study of 74,830 Scratch projects, 45% contained at least one bug pattern (Frädrich et al., 2020). Instead of problem-solving algorithmically, Scratchers often engage in bricolage (Harel & Papert, 1991), bottom-up tinkering that does not necessarily prove productive (Dong et al., 2019).
To balance this agency with structure as recommended in Brennan (2013), and to encourage the development of desired habits when learning CT concepts without stifling learner creativity, researchers have designed external Scratch curricula (Brennan et al., 2014; Franklin et al., 2020), created introductory Scratch Microworlds with reduced functionality (Tsur & Rusk, 2018), and devised learning strategies based on the Use→Modify→Create pedagogy to scaffold instruction (Salac et al., 2020). We extend this trend by integrating into Scratch PPPs with explicit goals that offer gameful scoring targets and per-block feedback to disincentivize trial-and-error behavior and steer learners toward correct solutions. We reason that if learners can initially internalize CT concepts efficiently via PPPs, they can better deepen their understanding with heightened ownership and agency (Casanova et al., 2021) in less-restrictive, interest-driven projects such as those described in Kong et al. (2020) that embrace Scratch's roots in constructionism (Brennan et al., 2014). This strategy would enable the development of learning progressions through the cognitive, situated, and critical framings of CT documented in Kafai et al. (2020), so that skill building leads to creative expression and participation in fostering equitable, ethical computing for all.
To test this reasoning, we ran a pilot study targeting adults, who comprise a general population that might not only benefit from learning CT, but who might most effectively mobilize the advancement of teaching and learning CT for all. The study separates participants into one of three training conditions: 1) PPPs; 2) PPPs with distractors (PPPDs); 3) programming with access to all blocks and without feedback (limited-constraint-feedback, or LCF). Each successive condition offers the learner increasing agency by providing more block options from which to construct puzzle solutions. Condition 3, with block-move correctness feedback and scoring removed, most closely resembles the code writing experience native to Scratch. We investigated the following research questions: R1) what are the effects on motivation and cognitive load when training occurs via PPPs, PPPDs, and LCF? R2) what are the effects on learning efficiency for training via PPP, PPPD, and LCF? Although the 75-participant sample limits the number of statistically significant results, findings indicate: F1) participants self-report higher motivation when training via PPPs and PPPDs, and less extraneous cognitive load when training via PPPs than via PPPDs or via LCF; F2) participants training via PPPs and PPPDs experience increased learning efficiency compared with those training via LCF. We first review the background and required software development. We then document the study purpose, formative and summative evaluations, and results before previewing future work.

Background
Since PPPs emerged in the CS literature as a new form of program completion problem in 2006, the community has investigated their strengths and weaknesses. Strengths include: scaffolded support of syntax and semantics learning; solvers with prior experience perform better and need less time (Harms et al., 2016); quicker grading and less grading variability than code writing problems (Ericson et al., 2017); easier detection of learning differences between students compared to code writing and code fixing problems (Morrison et al., 2016); a moderate correlation between PPP proficiency and code writing proficiency in an exam setting (Denny et al., 2008); less completion time required than for code writing exercises with equivalent performance on transfer tasks (Ericson et al., 2017; Zhi et al., 2019); higher enjoyment and less completion time required than for tutorial users, with better performance on transfer tasks (Harms et al., 2016); and a lack of significant differences in performance across gender. Weaknesses include: constriction of the puzzle-design surface to maintain single-solution structure (not strictly required, but commonly enforced to maintain strengths); the invitation of trial-and-error behavior in PPPs with excessive corrective feedback (Helminen et al., 2013); and a potential ceiling effect when feedback guides most learners to solve PPPs correctly, resulting in the need to evaluate learner process in addition to product when assessing (Helminen et al., 2012; Villamor, 2020).
The community also has explored differences in learning outcomes resulting from using different PPP elements. Evidence suggests that 2D puzzles, in which the student must not only correctly order programming constructs but also indent them correctly, are more difficult than 1D puzzles (Ihantola & Karavirta, 2011). Similarly, PPPs that conceal the number of lines of code needed for each solution section, and those that include distractors, are more difficult, require more time to complete, and produce higher cognitive load during training than those that specify section sizes and those without distractors (Garner, 2007; Harms et al., 2015). Learning differences continue to emerge when researchers vary these elements.
For example, learners struggle more when distractors are randomly distributed among the correct code constructs than when they are paired with correct constructs (Denny et al., 2008).
To identify these strengths, weaknesses, and learning differences between PPP elements, researchers often leverage Cognitive Load Theory (CLT) (Sweller, 2010). According to CLT, the brain provides limited short-term memory and processing capability along with infinite long-term memory, and learning occurs via schema construction and elaboration that leads to automation. Construction ensues by combining new, single elements into one larger element, and elaboration follows by adding new elements to an existing, larger element. Through intensive practice, individuals can automate their processing of these larger elements so that they execute without controlled processing.
CLT helps distinguish characteristics of and between PPP systems by offering a framework with tools to measure the three types of cognitive load experienced: intrinsic, extraneous, and germane. The total number of interacting elements perceived by the learner determines intrinsic load (IL); the sometimes-impeding organization and presentation of the content determines extraneous load (EL); and the instructional features necessary for schema construction determine the germane load (GL). PPP designers aim to reduce extraneous load to free learners' capacity to contend with germane load when attempting to maximize learning efficiency. For example, the pairing of distractors with correct constructs might increase germane load by focusing student attention on the intended, misconception-revealing differences between two solution options, while also reducing extraneous load by eliminating the need to search for and identify the two relevant options amidst a random distribution of constructs.
To measure relative learning efficiency quantitatively across conditions, researchers calculate instructional and performance efficiency (van Gog & Paas, 2008). These calculations account for learners who compensate for an increase in mental load by committing more mental effort, thereby maintaining constant performance while load varies. The data recorded often include empirical estimates of mental effort during instruction (EI) and transfer (ET) tasks and the performance (P) on transfer tasks. The EI and P calculation measures the instructional efficiency of the learning process, while the ET and P calculation measures the performance efficiency of the learning outcome. For example, in a study that included interactive puzzles in the transfer phase, results indicate PPPs with randomly distributed distractors decrease performance efficiency (Harms et al., 2016). In our study, we measure instructional efficiency with a focus on learning process economy.
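In the standard formulation these calculations follow (Paas and Merrienboer, 1993; van Gog & Paas, 2008), effort and performance are first standardized across the sample and then combined; the exact operationalization used in any given study may differ in detail, so the following is a sketch of the conventional computation:

E = \frac{z_P - z_R}{\sqrt{2}}

where z_P is the standardized transfer-task performance, z_R is the standardized mental-effort measure (derived from EI for instructional efficiency or ET for performance efficiency), and E = 0 marks the line at which standardized effort equals standardized performance.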

Software development
To investigate our research questions, we modified Scratch to facilitate the design, play, and assessment of PPPs.Aligned with the gamification strategy described in Tahir et al. (2020), in which the game elements were added to SQL-Tutor, and similar to recent iSnap integrations offering progress panels and adaptive messages (Marwan et al., 2020;Zhi et al., 2019), we augmented Scratch to influence the behavior of learners.As shown in Figure 1, we first established a design mode which enables content developers to assign points to individual blocks and select blocks for inclusion in a new PPP palette.Equipped with this functionality, teachers can assign higher point values to blocks relevant to the CT concept studied and can isolate in a single palette blocks pertinent to the puzzle.
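To make the design-mode output concrete, the sketch below shows one plausible representation of a designed puzzle; the field names and structure are illustrative assumptions, not the schema of the deployed system.

```python
# Illustrative sketch of what design mode might produce (hypothetical field names).
puzzle = {
    "title": "Dance sequence",
    "time_limit_s": 500,                                  # auto-submit window used in the study
    "palette": [                                          # blocks selected for this puzzle's palette
        {"opcode": "event_whenflagclicked", "points": 1},
        {"opcode": "motion_movesteps", "points": 3},      # concept-relevant blocks earn more points
        {"opcode": "looks_sayforsecs", "points": 2},
    ],
    "solution": [                                         # the single correct block ordering
        "event_whenflagclicked",
        "motion_movesteps",
        "looks_sayforsecs",
    ],
    "distractors": ["motion_turnright"],                  # included only in the PPPD condition
}
```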
As presented in Figure 2, we next established a play mode which enables students to load PPPs in a manner that displays the designed animated elements on the Scratch stage, but none of the blocks authored as the solution in the scripts pane. Technical detail is reported in Sulaiman et al. (2019); relevant to this study is an assessment system that includes a gameful scoring algorithm intended to encourage deliberate practice and discourage trial-and-error behavior. Our early development attempts involved calculating the Manhattan distance of each block placed from its correct position and multiplying that by the points assigned to each block and the length of the sequence, combined with subtractions for errors. During testing, however, this strategy proved insufficient, as scores could confusingly decrease when a block placed incorrectly in a long sequence was moved to the correct place in a shorter sequence. The longest common subsequence feedback algorithm described in Karavirta et al. (2012) ultimately inspired our final approach; ours differs in that we leverage block points, use them and subsequence length as multipliers, and sum the multiples from all subsequences matching the single correct solution while also deducting for incorrectness in absolute position. The closer the participant is to the solution, the higher the score.
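The sketch below illustrates the scoring idea in simplified form; it is not the deployed algorithm, and the run-matching details, point lookups, and penalty constant are assumptions made for illustration.

```python
def score_attempt(attempt, solution, points, wrong_pos_penalty=1):
    """Simplified sketch of the scoring idea (illustrative, not the deployed algorithm).

    attempt and solution are lists of block ids; points maps block id -> assigned points.
    Each contiguous run of the attempt that also appears contiguously in the solution
    earns run_length * sum(block points), and a deduction is applied for every block
    that is not in its correct absolute position.
    """
    score = 0
    i = 0
    while i < len(attempt):
        # Find the longest run starting at attempt[i] that occurs contiguously in the solution.
        best = 0
        for j in range(len(solution)):
            k = 0
            while (i + k < len(attempt) and j + k < len(solution)
                   and attempt[i + k] == solution[j + k]):
                k += 1
            best = max(best, k)
        if best:
            run = attempt[i:i + best]
            score += best * sum(points.get(b, 0) for b in run)  # length x points multiplier
            i += best
        else:
            i += 1
    # Deduction for incorrectness in absolute position.
    misplaced = sum(1 for idx, b in enumerate(attempt)
                    if idx >= len(solution) or solution[idx] != b)
    return score - wrong_pos_penalty * misplaced

# Example: score_attempt(["when_flag", "move", "say"],
#                        ["when_flag", "move", "turn", "say"],
#                        {"when_flag": 1, "move": 3, "say": 2})  -> 9
```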
To reduce complexity in the scoring algorithm while still discouraging trial-and-error behavior, we simultaneously track a count we name meaningful moves, which increments when the learner drags a block from the palette to the scripts pane, connects existing sequences together, or disconnects a sequence into two. Other, less significant block actions, such as the repositioning of a block or sequence within the scripts pane, are discarded. Since we display this count (11) next to the score (27) and remaining time (3m 10s), as shown in Figure 2, we can encourage learners to achieve the highest score in the fewest moves and shortest time.
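A minimal sketch of the counting rule follows; the action labels are hypothetical, not the actual event names used in the implementation.

```python
# Sketch of the meaningful-move counter (action labels are illustrative).
MEANINGFUL_ACTIONS = {
    "drag_block_from_palette",   # palette -> scripts pane
    "connect_sequences",         # joining two existing sequences
    "split_sequence",            # disconnecting one sequence into two
}

def count_move(meaningful_moves, action):
    """Repositioning a block or sequence within the scripts pane does not count."""
    return meaningful_moves + 1 if action in MEANINGFUL_ACTIONS else meaningful_moves
```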
Additionally, we built auto-initialization and auto-execution functionality to reflect progress visually after each block placement during puzzle play. These mechanisms enable the display of gameful animations while an avatar presents per-block correctness feedback, and they concurrently disable Scratch features that might otherwise distract from CT learning, such as sprite editing controls. According to the feedback classification in Raubenheimer et al. (2021), this immediate correct-incorrect-distractor feedback is constructivist since it is problem- and instance-oriented, which has been correlated with significantly lower student failure rates than alternative types such as those that are solution- and instance-oriented. The auto-execution functionality also calculates completion progress so that the learner receives appropriate responses when she correctly solves the puzzle or the allotted time expires.
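One way to express the per-block feedback and the completion-progress calculation is sketched below; the function names and labels are illustrative assumptions, not the actual implementation.

```python
def per_block_feedback(block, position, solution, distractor_blocks):
    """Sketch of the immediate per-block feedback the avatar presents (labels illustrative)."""
    if block in distractor_blocks:
        return "distractor"   # the block exists only to provoke cognitive conflict
    if position < len(solution) and solution[position] == block:
        return "correct"
    return "incorrect"

def completion_progress(attempt, solution):
    """Fraction of solution positions currently filled with the correct block."""
    matches = sum(1 for i, b in enumerate(attempt[:len(solution)]) if solution[i] == b)
    return matches / len(solution)
```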
To help teachers and content developers organize learning, we wrapped these new features in Scratch within custom-built learning management tooling that facilitates the ingestion of class rosters, the structuring of learning paths in which games comprise quests which comprise missions, and the saving and loading of game/quest/mission progress. We intend this playful framing to further gamify the learning experience and nudge learners toward increased motivation, as described in Bovermann and Bastiaens (2020). For the motivation to be sustained beyond this learning experience, however, further progress in standardizing interoperability protocols between learning systems, similar to the protocol proposed in Brusilovsky et al. (2018), is necessary throughout the CS education community. The reference architecture we present in Sulaiman et al. (2019) is intended as one small step forward in that direction.
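As a sketch of the game/quest/mission hierarchy only (the class and field names are hypothetical), the learning-path structure could be modeled along these lines:

```python
from dataclasses import dataclass, field

# Illustrative model of the game -> quest -> mission learning path (names hypothetical).
@dataclass
class Mission:
    puzzle_id: str
    completed: bool = False

@dataclass
class Quest:
    title: str
    missions: list = field(default_factory=list)

@dataclass
class Game:
    title: str
    quests: list = field(default_factory=list)

    def progress(self):
        """Fraction of missions completed across all quests (used when saving/loading progress)."""
        missions = [m for q in self.quests for m in q.missions]
        return sum(m.completed for m in missions) / len(missions) if missions else 0.0
```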

Study purpose
This extended functionality positioned us to fill gaps in existing research. One study purpose was to explore adult use of CT learning-system functionality primarily designed for children. Recent research has: 1) found significant correlation of motivation and previous programming experience with self-efficacy and inclination toward a CS career in elementary students (Aivaloglou & Hermans, 2019); 2) indicated drag-and-drop programming can increase three CS motivational factors in middle school (Bush et al., 2020); 3) suggested computing experiences prior to university can affect the world-image of computing habits, perceptions, and attitudes which enable or inhibit pathways into CS (Schulte & Knobelsdorf, 2007); 4) identified a parental role framework to enable adults to choose productive strategies to promote and foster children's CT (Ohland et al., 2019); and 5) illuminated benefits of community commitment and a CS/CT-focused ecosystem inclusive of the home and community (Cao et al., 2020; DeLyser, 2018). Since demographic factors can drive communal values, and perceptions of how computing fulfills those values can affect sense of belonging and student retention (Lewis et al., 2019), we measure adult motivation and cognitive load while probing for attitudinal change that might influence the CT inclination of participants' children.
A second purpose was to further identify PPP elements that optimize learning efficiency, since the behavior of programming environments can affect novices' learning (Karvelas et al., 2020). While many researchers have hypothesized (Denny et al., 2008), and less often produced evidence (Ericson et al., 2018), that PPPs can result in more efficient learning than alternatives such as writing or fixing code, recently some have attempted to measure the contributions of various PPP elements (Kumar, 2017, 2019a, 2019b; Sirkia, 2016), including the effect of displaying the number of lines of code in puzzle solutions (no effect on pre-post improvement, more time spent), of pairing distractors with related variants (effective in the longer of two studied puzzles), of using single-character or mnemonic variable names (no significant differences), and of presenting program visualizations alongside PPPs (visualizations were used most by novices who sought feedback the most via multiple submissions). We measure PPP learning efficiency with and without distractors, while offering a comparison to programming with LCF. Derived from the literature, our hypotheses were: H1) PPP and PPPD training increase motivation and reduce extraneous cognitive load compared to training via programming with LCF; and H2) PPP training yields the highest learning efficiency.

Formative evaluation
As an early step in a roadmap of studies intended to explore the efficacy of adding intelligent and gameful systems to novice programming environments, and with an aim to reinforce construct validity, we engaged in a formative evaluation with grade 6-9 educators.
Through design thinking activities (Razzouk & Shute, 2012), including iterative surveys, interviews, and prototyping, we advanced our learning design technique, similar to the approach described in Kashmira and Mason (2020), which illuminates design thinking as a strategy useful for exploration, managing uncertainty, learning from failures, and empathizing with the needs of the learner when connecting learning objectives to learning design. Our goals included: 1) identifying the CT concepts receiving focus; 2) eliciting the pedagogical needs of practicing teachers; and 3) refining puzzle and feedback systems. We focus discussion here on goal 1.

Participants
The participants included 21 teachers from learning organizations such as Girls Who Code and codeHER, and 17 from U.S. schools. 11% had taught with Scratch for 2 to 4 years, 63% for 6 to 18 months, and 26% for less than 6 months. 92% taught CS with Scratch, but 16% also or alternatively taught math, and 16% taught science, language arts, or applied arts with Scratch. 34% of the teachers used Scratch for at least 51% of their curriculum, 29% used it for 26-50%, and 29% used Scratch for 11-15%.
Ihantola et al. (2016) highlights the concerning status quo in which most studies in the field focus on a single institution and a single course, without validation by subsequent replicating research, leading to limited understanding of the reasons results occur. To contribute replication results, and to identify the CT concepts receiving focus, we distributed a survey that included a question from a survey previously distributed to K-9 teachers in five European countries (Mannila et al., 2014). This question asks teachers to report their perceptions of student engagement in nine facets of CT. Since we targeted a narrower set of teachers in the U.S., it is perhaps unsurprising that the results do not match the earlier international study, in which teachers reported their students most frequently use CT concepts related to data (e.g., analysis). However, we present this finding to reinforce the replication concerns raised, and to underscore the challenges the community faces when attempting to disseminate CT globally.

CT concept engagement
Our findings in Figure 3 indicate teachers perceive their students engage with data CT concepts less than with others such as abstraction and algorithms. Aside from the differences in population samples, and the associated threat to internal validity due to implicit differences in curricula (Barendsen et al. (2015) notes a low ratio of data knowledge in K-9 U.S. CSTA materials, 2%, compared with the English national curriculum, 14%, English Computing at School, 16%, and Italian guidelines, 25%), an additional explanation for this contrast could be the respondent recruitment process, as we specifically targeted Scratch teachers, whereas the earlier study did not. Since the small sample introduces a threat to external validity, future studies could try to replicate these results while controlling for such differences, for example using the representation approach described in Grgurina et al. (2014), in which researchers elicited via interview and then charted teachers' PCK across eight categories.
Regardless, the lack of student engagement with data warrants investigation, as it is an alarming result for an increasingly data-driven society.

Study design
The formative evaluation helped us roadmap implementations, craft learning materials, and plan an initial summative evaluation. To produce evidence supporting answers to R1-2, we organized a 10-step between-subjects study via Amazon Mechanical Turk (Amazon Mechanical Turk, 2021), with the CT concept of sequences operating as the learning objective.
As depicted in Figure 4 and detailed in Table 1, the steps involved: 1) creation of credentials in the learning system and assignment to 1 of 3 conditions characterized in Table 2; 2) a background survey; 3) review of a 6-minute video tutorial on the UI and CT  2014), and to the intrinsic motivation Task Evaluation Questionnaire (TEQ) from self-determination theory (SDT), which is a validated 22-item Likert-scale measurement designed to reflect participant experience on four subscales: interest/enjoyment, perceived competence, perceived choice, and pressure/tension. In step 6, participants followed written instructions to guide their solving of four puzzles; instructions for the first puzzle included a graphical representation of the correct solution and an explanation of the behavior of each block used, for the purpose of familiarization. Each puzzle auto-submitted upon correct completion or after 500 seconds if the participant had not previously submitted an incorrect solution. We advised participants to complete steps 4, 6, and 8 without interruption and required completion of all steps within two hours. Protocol materials are publicly available in Integrating Parsons Puzzles with Scratch (2021). We randomly assigned participants to one of three conditions operating as the independent variable: 1) PPP training (PPP); 2) PPP with distractors training (PPPD); 3) training by solving puzzles with access to all blocks and without move correctness or score feedback (LCF). The dependent variables included time spent and performance on the pretests and posttests, time spent and block moves made in puzzles, and the cognitive load, programming attitude, and TEQ results.

Materials
Following guidance in Harms et al. (2015), we aimed to design motivating scenarios with memorable segments that provide a challenge without being tricky and that leave participants with a positive impression. To familiarize them, we included in the instructions for the first puzzle the solution and block-use descriptions. We also included more detailed instructions than typically found in PPPs, effectively resulting in a hybridization of the tutorial and PPP approaches described in Harms et al. (2016). We thought this approach might best minimize ambiguity and highly scaffold early learning of new CT concepts in the absence of an instructor. An example puzzle solution and the associated instructions are shown in Figure 5.
We tested and refined our materials in collaboration with a high school teacher, 16 of her freshman physics students with little prior exposure to CT, and eight undergraduates with diverse majors. Tests included trials of the surveys and puzzles, and think-alouds in which the participant would interact with puzzles while verbalizing her thoughts. Although we did not further formally assess validity and reliability, these results led to refinements such as puzzle theme modification, normalization of pre/posttest difficulty, and simplification of language used in survey questions.

Participants
In alignment with Wing's mobilizing declaration that CT is a "fundamental skill for everyone, not just for computer scientists" (Wing, 2006), and with the interventionist spirit of design-based research (Barab, 2014), we sought a learner population inclusive of those who might not otherwise encounter an opportunity to engage purposefully with CT but who might regardless influence its trajectory in the lives of children. The learning objective of the CT concept of sequences is suitable for this largely novice set of participants, as teachers often present this concept first in a CT curriculum (e.g., Brennan et al., 2014). By presenting our study as a Human Intelligence Task on Amazon Mechanical Turk, we recruited 75 adults from a general population of over 100K individuals (Difallaha et al., 2018), with varying educational experience (24% graduated high school, 60% earned an undergraduate degree, 16% earned a graduate degree) and the variety of self-reported programming experience presented in Figure 6. The sample comprises 46 men and 29 women sourced from eight countries including the U.S. (60%), India (20%), and Brazil (11%). As presented in Table 2, the backgrounds and self-reported programming experience of participants across conditions are largely homogeneous, with slightly higher average programming experience on a 0-10 scale reported for the PPPD condition (4.7) than LCF (3.7) and PPP (3.3), higher female participation in the PPPD condition (50%) than in PPP (35%) and LCF (32%), and lower U.S. representation in the LCF condition (50%) than in PPP (65%) and PPPD (67%). Additional participant demographic detail and all summative evaluation data are available in Integrating Parsons Puzzles with Scratch (2021).

Performance
Though we did not find significant training performance differences across conditions (H(2) = 0.853, p = .653), participants in the PPP and PPPD conditions interacted with the blocks significantly more (H(2) = 21.141, p < 0.001, ε² = 0.29). Using a Bonferroni-adjusted alpha of .017 (.05/3), we found significant differences between conditions PPP (M = 52.2) and LCF (M = 32.1), p = 0.001, and between PPPD (M = 57.9) and LCF, p < 0.001. The fewer block moves made by participants in the LCF condition indicate that some may have perceived the task as overwhelming enough to reduce the probability of exploratory programming behavior.
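For readers who wish to reproduce this style of analysis, the sketch below shows how such a comparison could be computed with SciPy. The arrays are placeholders rather than study data, and the pairwise test shown (Mann-Whitney U with the Bonferroni-adjusted alpha) is one common choice; the specific pairwise procedure used here is not restated in this excerpt.

```python
# Sketch of a Kruskal-Wallis omnibus test with Bonferroni-adjusted pairwise follow-ups.
from scipy.stats import kruskal, mannwhitneyu

ppp_moves = [48, 55, 60, 51]    # placeholder block-move counts per participant, not study data
pppd_moves = [59, 62, 50, 61]
lcf_moves = [30, 28, 35, 33]

H, p = kruskal(ppp_moves, pppd_moves, lcf_moves)
n = len(ppp_moves) + len(pppd_moves) + len(lcf_moves)
epsilon_squared = H * (n + 1) / (n ** 2 - 1)    # effect size for the Kruskal-Wallis test

alpha = 0.05 / 3                                # Bonferroni-adjusted threshold for three comparisons
pairs = {
    "PPP vs LCF": (ppp_moves, lcf_moves),
    "PPPD vs LCF": (pppd_moves, lcf_moves),
    "PPP vs PPPD": (ppp_moves, pppd_moves),
}
for label, (a, b) in pairs.items():
    _, p_pair = mannwhitneyu(a, b, alternative="two-sided")
    print(label, round(p_pair, 4), "significant" if p_pair < alpha else "n.s.")
```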
Although participants in each condition solved more posttest than pretest questions correctly (PPP: M = 0.65, PPPD: M = 0.82, LCF: M = 0.32), with those in the PPPD condition yielding the highest increase, there is no significant difference in performance gain across conditions (H(2) = 1.335, p = 0.513). This lack of transfer performance disparity between PPP and PPPD conditions ostensibly replicates findings in Harms et al. (2016), which found no significant difference in performance on transfer tasks for those training via PPPs and PPPDs. It is also similar to findings on PPP inter-problem and intra-problem adaptation in Ericson et al. (2018), in which no significant differences in learning gains occurred from pretest to immediate posttest across three conditions involving PPPDs and one involving code writing, which is similar to the LCF condition in our study.

Efficiency
To measure efficiency, we analyzed training and transfer task time across conditions. To emphasize the opportunity for efficient CT learning, we calculated instructional efficiency, using pre/posttest improvement to measure transfer performance and both time and cognitive load as measurements of mental effort during training, as recommended in Paas and Merrienboer (1993).
Figure 7 presents areas of high and low effectiveness separated by the effort line E=0. The chart depicts higher instructional efficiency for training with PPPs and PPPDs than with LCF. However, this result does not support H2, as the PPPD condition yielded the highest instructional efficiency. This result contrasts with findings in Harms et al. (2016), which found evidence of decreased learning efficiency from PPPDs when compared to PPPs, but it aligns with hypotheses regarding distractor learning benefits in Karavirta et al. (2012) and Parsons and Haden (2006) that propose distractors can facilitate the highlighting of both subtle and complex principles as well as edge cases and common misconceptions.
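The per-participant efficiency values summarized in Figure 7 could be computed along the lines of the sketch below; the exact standardization and aggregation choices in the study may differ, so this is illustrative only.

```python
import numpy as np

def instructional_efficiency(effort, performance):
    """Sketch of E = (z_P - z_E) / sqrt(2) computed per participant.

    effort: mental-effort measure during training (e.g., load rating or time);
    performance: pre/posttest improvement on the transfer task.
    Standardization uses the pooled sample; per-condition means of E can then be compared.
    """
    effort = np.asarray(effort, dtype=float)
    performance = np.asarray(performance, dtype=float)
    z_e = (effort - effort.mean()) / effort.std()
    z_p = (performance - performance.mean()) / performance.std()
    return (z_p - z_e) / np.sqrt(2)   # E > 0: relatively efficient; E = 0 is the effort line
```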

Motivation
To analyze motivation quantitatively, we scored the TEQ and calculated the within-subject change in programming attitude that occurred between the start and end of the study.
Although there was no significant difference in TEQ results across conditions, for the perceived competence subscale (H(2)=.3.
To supplement the quantitative results, we sought qualitative feedback by requesting that participants describe their attitude or view toward programming after the learning experience. For both those who self-reported low and high prior programming experience, we recorded more hesitant responses from those who trained via limited constraint and feedback than from those who trained via PPPs and PPPDs. One LCF participant who selected "have tried programming activities, but have not taken a class" in the demographic survey reflected on sustained struggle: "I still feel like programming is insanely complex. When I was in college I dropped out of computer science as soon as we started python. I just couldn't understand what we were doing, and maybe I could understand it if I really tried. It just seems to be better geared towards certain people." A second LCF participant with the same prior programming experience selection revealed marginal incremental motivational change: "I have already begun to study programming but have not stayed consistent with my studies. This has encouraged me to give more attention to the subject." A third LCF participant who selected "have tried programming activities, but not taken a class" alluded to seeking external supports: "I hope to translate what I have gained today to my studies in coding. It is a bit tedious, and there is a lot to know, but I think that many basic codes can be written with the help of a search engine or some material." In contrast, PPP and PPPD participants reflected more direct positive attitudinal change.
One PPP participant whose prior programming experience selection was "never attempted to program before" noted that she "definitely enjoyed the puzzles and feel[s] more knowledgeable in terms of programming. It made me much more interested in learning to program." A second PPP participant who recorded the same prior programming experience discovered possibility in her capability: "I feel like it's not as complicated as I thought it was. I could learn a lot through practicing more of it." A third PPP participant who selected "have tried programming activities, but have not taken a class" demonstrated confidence in his ability as well as opportunity for novices: "[T]his activity was somewhat easy but programming is really much harder than this. [B]ut this is a good way for a kid to start learning." Aligned with this viewpoint was one PPPD participant who selected "never attempted to program before" and revealed potential for future pursuit of CT: "I would love to learn more about programming and encourage my son to start learning programming early." A second PPPD participant who selected "have tried programming activities, but not taken a class" focused on the puzzle approach to learning in his response: "I think it is a skill that can be learned through practice. It was nice to look at programming as a series of puzzles rather than a complex language." These results support H1 and those in Charters et al. (2014), which found significant attitude improvement regardless of gender and education level after a brief online programming experience.

Finding summary
We conclude the analysis by summarizing findings for each varied PPP element in Table 4.

Conclusion & future work
Our survey of grade 6-9 teachers exposed teacher perceptions of limited student engagement with data concepts central to CT. These results led us to extend the trend of balancing Scratch's agency with structure to better serve learners and reduce the burden on teachers. A small pilot study of an adult population using a learning system that integrates PPPs with Scratch yielded results indicating the structure provided by PPPs catalyzes motivation for CT, reduces extraneous cognitive load, and increases learning efficiency without sacrificing performance on transfer tasks.
While these results reveal opportunities to advance the teaching and learning of CT via augmentations to block-based programming environments, we remain cautious due to external validity limitations: the single CT concept (sequences) and the small summative evaluation population (75 adults) threaten generalizability. In future work, we intend to study additional CT concepts, such as conditionals and looping; functionality variation, such as offering increasing agency through the introduction of teacher-defined, objective-driven feedback, the fading of correctness feedback, and the configurable integration of multiple Scratch palettes for each puzzle; and additional participants, including online studies with over 500 adults as well as smaller, middle school classroom studies. These conditions should facilitate the study of CT learning beyond that of the beginners under focus in this study by offering tooling to apply incremental cognitive load, deepen CT concept uptake, and transition learners toward interest-driven projects that sustain motivation. With continued investigation, we aim to identify factors supportive of reliably efficient, effective, and equitable CT learning that build bridges between cognitive, situated, and critical CT.

Fig. 1
Fig. 1 PPP palette and block configuration in design mode in Scratch

Fig. 3
Fig. 3 How a small sample of U.S. teachers perceive student engagement in CT concepts
Fig. 4 Summative evaluation protocol

Fig. 5
Fig. 5 Example sequences PPP solution and instructions

Fig. 6
Fig. 6 Participant self-reported programming experience at the start of the study

Table 1
Study protocol & measurements

Table 2
Training and participant characteristics across three study conditions

Table 3
Within-subject attitude change. Positive shifts (p), negative shifts (n). *p<0.05, **p<0.01

Table 4
General summary of findings across conditions