How do robots attract children? The role of appearance, motion, and voice as multisensory features in early-stage interactions

doi:10.11996/JG.j.2095-302X.2026010223

Abstract

With the rapid development of artificial intelligence, multimodal robots are playing an increasingly important role in preschool children’s education, entertainment, and daily life. Existing studies have focused mainly on how single sensory cues of robots affect children’s perception. Systematic research on multisensory integration effects remains limited. To explore how robots’ multimodal features jointly influence children’s emotional preferences and visual attention, 318 children aged 4-6 years were recruited for an eye-tracking experiment. The experiment used a 2 (appearance: humanoid vs. animal-like) × 3 (voice guidance: male voice, female voice, none) × 2 (gesture guidance: present vs. absent) mixed factorial design. Appearance features and behavioral features (voice and gesture guidance) were the independent variables. Children’s emotional preferences and eye-tracking indicators were the dependent variables. In terms of appearance, subjective preference ratings did not differ significantly between humanoid and animal-like robots. However, humanoid robots attracted a longer total fixation duration, more fixations, and a shorter time to first fixation than animal-like robots. This indicates that they captured and held children’s attention more effectively. In other words, children were drawn to humanoid robots more readily in the initial stage of visual contact, and anthropomorphic design had a greater advantage in sustaining their attention. In terms of behavioral features, robots with gesture guidance received significantly higher subjective preference ratings than those without gestures. They also elicited a longer total fixation duration and more fixations.
Robots with female voices received slightly higher subjective preference ratings than those with male voices, and both were rated significantly higher than robots without voices. Robots with male voices had a slightly longer total fixation duration than those with female voices, and both significantly exceeded robots without voices. Fixation counts did not differ significantly between male- and female-voice robots, but both attracted significantly more fixations than robots without voices. Robots that combined gesture guidance with a voice (especially a female voice) performed better in subjective ratings and visual attention allocation. This suggests that behavioral features substantially enhance children’s emotional preferences and interactive experience. Furthermore, no significant interaction effects were observed, indicating that appearance and behavioral features influenced children’s emotional preferences and visual attention largely independently. This study clarifies how robot appearance and behavioral features influence preschool children’s emotional preferences and visual attention. It provides empirical evidence for designing child-oriented robots that meet users’ emotional needs.
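
For readers reconstructing the stimulus set, the 2 × 3 × 2 design described above yields 12 robot conditions. The following Python sketch enumerates them. The level labels are illustrative strings, not the authors' stimulus identifiers. The abstract does not state which factors varied within or between participants, so the sketch makes no assumption about how conditions were assigned to children.

```python
from itertools import product

# Factor levels of the 2 x 3 x 2 design (labels are illustrative, not the authors' stimulus IDs).
APPEARANCE = ("humanoid", "animal-like")
VOICE = ("male", "female", "none")
GESTURE = ("present", "absent")


def factorial_cells() -> list[dict[str, str]]:
    """Return every appearance x voice x gesture combination, one dict per cell."""
    return [
        {"appearance": a, "voice": v, "gesture": g}
        for a, v, g in product(APPEARANCE, VOICE, GESTURE)
    ]


cells = factorial_cells()
print(len(cells))  # 12
```

Twelve cells also account for the 11 degrees of freedom (12 - 1) reported for the corrected model in the ANOVA of subjective ratings.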

Key words: preschool children, robot, appearance features, behavioral features, emotional preference, visual attention, multisensory integration

CLC Number: TP242

LI Yi, CAO Chengcai, SONG Zhangtong, LI Zuoqi, LI Xiao, LI Hesen. How do robots attract children? The role of appearance, motion, and voice as multisensory features in early-stage interactions[J]. Journal of Graphics, 2026, 47(1): 223-233.

Figures/Tables 9


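Indicator	Meaning
Subjective rating scale	Subjective rating of the robot's appearance; higher scores indicate greater satisfaction
Total fixation duration	Sum of the durations of all fixations within the area of interest (AOI); mainly reflects processing preference. A longer total fixation duration indicates that the AOI is more attractive or more complex to the participant, and positive emotions often lead to longer fixation
Fixation count	Each fixation counts as one fixation point, and the fixation count is the total number of fixation points within the AOI; it reflects attractiveness or difficulty, and more fixations indicate that the AOI is more attractive to the participant
Time to first fixation	Time from stimulus onset to the first fixation on the AOI; a shorter time to first fixation indicates that the AOI is noticed more easily and draws attention more readily
Mean pupil diameter	Pupil dilation is nonlinearly related to emotional valence but linearly related to emotional arousal; that is, the higher the arousal, the larger the pupil diameter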
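
The four eye-tracking indicators in the table can be computed from fixation-level output for a single area of interest (AOI). The Python sketch below is a minimal illustration under stated assumptions:

* Fixations have already been detected by the eye tracker's software.
* Each AOI is an axis-aligned rectangle in screen pixels.
* Mean pupil diameter is averaged over the fixations that land in the AOI.

The `Fixation`, `AOI`, and `aoi_metrics` names are hypothetical. The authors' actual processing pipeline is not described in this excerpt.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Fixation:
    """One detected fixation (hypothetical record layout)."""
    onset_ms: float     # fixation start, measured from stimulus onset
    duration_ms: float  # fixation duration
    x: float            # gaze position in screen pixels
    y: float
    pupil_mm: float     # mean pupil diameter during the fixation


@dataclass
class AOI:
    """Axis-aligned rectangular area of interest (an assumption)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, f: Fixation) -> bool:
        return self.left <= f.x <= self.right and self.top <= f.y <= self.bottom


def aoi_metrics(fixations: list[Fixation], aoi: AOI) -> dict[str, Optional[float]]:
    """Compute total fixation duration, fixation count, time to first fixation,
    and mean pupil diameter for one AOI in one trial."""
    inside = [f for f in fixations if aoi.contains(f)]
    if not inside:
        # AOI never fixated: no dwell time, no first-fixation time, no pupil sample.
        return {"total_fixation_ms": 0.0, "fixation_count": 0,
                "time_to_first_fixation_ms": None, "mean_pupil_mm": None}
    return {
        "total_fixation_ms": sum(f.duration_ms for f in inside),
        "fixation_count": len(inside),
        "time_to_first_fixation_ms": min(f.onset_ms for f in inside),
        "mean_pupil_mm": mean(f.pupil_mm for f in inside),
    }


trial = [
    Fixation(onset_ms=120, duration_ms=200, x=50, y=50, pupil_mm=4.0),  # outside the AOI
    Fixation(onset_ms=400, duration_ms=250, x=300, y=200, pupil_mm=4.5),
    Fixation(onset_ms=700, duration_ms=300, x=320, y=220, pupil_mm=5.0),
]
robot_aoi = AOI(left=250, top=150, right=400, bottom=300)
print(aoi_metrics(trial, robot_aoi))
# {'total_fixation_ms': 550, 'fixation_count': 2, 'time_to_first_fixation_ms': 400, 'mean_pupil_mm': 4.75}
```

In practice these per-trial values would then be averaged per condition before statistical testing. The sketch leaves that step out.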

Source of variation	Type III sum of squares	df	Mean square	F	P	η²p
Corrected model	18.545	11	1.686	2.131	0.017
Intercept	11506.501	1	11506.501	14543.404	< 0.001
Voice	6.912	2	3.456	4.368	0.013	0.0137
Gesture	8.287	1	8.287	10.475	0.001	0.0166
Appearance	0.574	1	0.574	0.726	0.395	0.0012
Appearance × Gesture	0.058	1	0.058	0.073	0.787	0.0001
Gesture × Voice	0.055	2	0.027	0.035	0.966	0.0001
Appearance × Voice	0.876	2	0.438	0.554	0.575	0.0018
Appearance × Gesture × Voice	1.508	2	0.754	0.953	0.386	0.0030
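
Two standard identities link the columns of this ANOVA table:

* A mean square is its sum of squares divided by its degrees of freedom.
* Each F is that mean square divided by the error mean square.

Partial η² is SS_effect / (SS_effect + SS_error). The error row is not included in this excerpt, so the Python sketch below only checks internal consistency. It recomputes MS = SS/df and derives the error mean square implied by each row's MS/F. The row values are copied from the table (corrected model, intercept, and the three main effects); the function names are illustrative. If every effect was tested against one shared error term, as in a univariate general linear model, the implied values should agree up to rounding.

```python
# (Type III SS, df, mean square, F) copied from the subjective-rating ANOVA table.
ROWS = {
    "corrected model": (18.545, 11, 1.686, 2.131),
    "intercept": (11506.501, 1, 11506.501, 14543.404),
    "voice": (6.912, 2, 3.456, 4.368),
    "gesture": (8.287, 1, 8.287, 10.475),
    "appearance": (0.574, 1, 0.574, 0.726),
}


def mean_square(ss: float, df: int) -> float:
    """MS = SS / df."""
    return ss / df


def implied_error_ms(ms_effect: float, f: float) -> float:
    """F = MS_effect / MS_error, so MS_error = MS_effect / F."""
    return ms_effect / f


def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Partial eta squared = SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


for name, (ss, df, ms, f) in ROWS.items():
    print(f"{name:16s} SS/df = {mean_square(ss, df):10.3f}  "
          f"implied MS_error = {implied_error_ms(ms, f):.4f}")
```

Every listed row implies an error mean square of roughly 0.791, which is consistent with a single shared error term. The interaction rows were left out because their small, three-decimal-rounded MS and F values make the implied figure imprecise. `partial_eta_squared` is shown only to make the η²p column explicit. It cannot be applied to this table without the unreported SS_error.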

Indicator	Humanoid	Animal-like	t	P	η²p
Total fixation duration/ms	11513.92	10194.31	2.981	0.003	0.0143
Fixation count	22.99	19.40	4.388	< 0.001	0.0309
Time to first fixation/ms	1611.96	2324.57	-2.264	0.024	0.0081
Mean pupil diameter/mm	4.84	4.89	-1.191	0.234	0.0022
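
The group means in the appearance comparison are easier to read as percentage differences of humanoid relative to animal-like robots. The Python sketch below does this arithmetic. It is descriptive only, since significance is given by the t and P columns. It takes the humanoid time to first fixation as 1611.96 ms, which agrees with the abstract's statement that humanoid robots were fixated sooner.

```python
# Group means (humanoid, animal-like) copied from the appearance comparison table.
MEANS = {
    "total fixation duration (ms)": (11513.92, 10194.31),
    "fixation count": (22.99, 19.40),
    "time to first fixation (ms)": (1611.96, 2324.57),
    "mean pupil diameter (mm)": (4.84, 4.89),
}


def relative_difference_pct(humanoid: float, animal_like: float) -> float:
    """Percentage difference of the humanoid mean relative to the animal-like mean."""
    return (humanoid - animal_like) / animal_like * 100


differences = {name: relative_difference_pct(h, a) for name, (h, a) in MEANS.items()}
for name, pct in differences.items():
    print(f"{name}: {pct:+.1f}%")
# total fixation duration (ms): +12.9%
# fixation count: +18.5%
# time to first fixation (ms): -30.7%
# mean pupil diameter (mm): -1.0%
```

On these figures, humanoid robots received about 13% longer total fixation, about 18% more fixations, and a first fixation about 31% sooner than animal-like robots. The 1% pupil difference was not significant (P = 0.234).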