This study was approved by Otsuma Women’s University’s Life Sciences Research Ethics Committee (no. 28-015) and the Behavioral Research Ethics Committee of the Osaka University School of Human Sciences (no. HB020-032).

### Experiment 1

#### Participants

The participants were 24 full-term eight-month-old infants (12 boys and 12 girls; mean age, 8 months 13 days; range, 7 months 13 days to 9 months 27 days). The sample size was determined on the basis of prior infant morality studies^{11,21,22,38}. Eleven additional infants were tested but excluded owing to distress or fussiness (*N* = 4) or side-looking bias (*N* = 7, left = 7, right = 0; see the details of the criteria below). The parents provided written informed consent before the experiment and were financially compensated for participation.

#### Apparatus and stimuli

Infant gaze was recorded using a Tobii TX300 near-infrared eye tracker (Tobii Technology) integrated with a 23-inch display (1,280 × 720 pixels), at a sampling rate of 120 Hz. The tasks were programmed in Visual Basic 2015 Express (Microsoft Corp.) with the Tobii SDK (Tobii Technology). In all tasks, a translucent red circle with a radius of 25 pixels appeared wherever the infant's gaze was detected on the display (Fig. 1a) to facilitate gaze control^{29}. The circle disappeared during contingent events so that it would not distract from them. The display background was aqua.

The participants’ faces were monitored and recorded with a video camera (Panasonic HC-WX990M). The stimulus display (as presented to the participants) and the video of the participants were combined picture-in-picture using a video mixer (Roland, V-1600HD) and recorded on a laptop PC (HP, EliteBook 8570w/CT) via a video capture device (Avermedia, AVT-C875).

In the first six trials of the practical phase, the infants experienced gaze-contingent events in which fixating on a single object (a red or blue circle, presented alternately on the left or right) for 500 ms caused a stone to fall and crush the object. This phase was designed to reduce side-looking bias. In the four subsequent trials, the infants were presented with two objects side by side (a red circle and a blue circle) instead of a single object. When the infants fixated on either object for 500 ms, a stone fell and crushed it. The position of each object or pair of objects was the same for all participants.
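The 500-ms fixation rule that triggers each gaze-contingent event can be sketched as a dwell detector. This is a minimal illustration under stated assumptions, not the authors' Visual Basic/Tobii SDK implementation: it assumes gaze arrives as hypothetical `(t_ms, x, y)` samples at 120 Hz, treats each target as a circular area of interest with a hypothetical centre and radius, and restarts the dwell timer whenever gaze leaves the target or is lost.

```python
import math
from typing import Dict, Iterable, Optional, Tuple

DWELL_MS = 500  # fixation duration required to trigger an event (from the text)

Sample = Tuple[float, Optional[float], Optional[float]]  # hypothetical (t_ms, x_px, y_px)


def hit_target(x, y, targets):
    """Return the name of the circular target containing (x, y), or None."""
    if x is None or y is None:  # gaze lost on this sample
        return None
    for name, (cx, cy, radius) in targets.items():
        if math.hypot(x - cx, y - cy) <= radius:
            return name
    return None


def dwell_trigger(samples: Iterable[Sample],
                  targets: Dict[str, Tuple[float, float, float]],
                  dwell_ms: float = DWELL_MS):
    """Return (target, t_ms) for the first target fixated continuously for
    dwell_ms, or None if no target reaches the criterion."""
    current, start = None, None
    for t, x, y in samples:
        hit = hit_target(x, y, targets)
        if hit != current:  # gaze moved to another target or off all targets
            current, start = hit, t
        if current is not None and t - start >= dwell_ms:
            return current, t
    return None


# Hypothetical layout on a 1,280 x 720 display: one target left, one right.
TARGETS = {"left": (320.0, 360.0, 100.0), "right": (960.0, 360.0, 100.0)}
```

For example, 120-Hz samples `[(i * 1000 / 120, 330.0, 350.0) for i in range(120)]` trigger `("left", 500.0)` at the 61st sample. Requiring an unbroken dwell is one plausible reading of "fixated for 500 ms"; a real system might tolerate brief tracking losses.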

In the following pretest, the infants experienced gaze-contingent events identical to those in the practical phase except that the targets were two geometric agents with eyes (for example, green and orange squares; pretest in Fig. 1a). The presented position of the geometric agents (left or right) was counterbalanced across participants but consistent between the pretest and posttest within participants.

In the movie phase, the infants were presented with an aggressive interaction animation (20 s in duration) depicting one geometric figure hitting and crashing into another geometric figure^{20,21,22} (Fig. 1b and Supplementary Video 2). The roles of the geometric figures (aggressor or victim) were counterbalanced between participants. Following the movie phase, the infants completed the posttest phase with gaze-contingent events identical to those of the pretest.

#### Procedure

The infants were fastened in a baby carrier to prevent them from standing up and were placed on their mothers’ laps approximately 60 cm from the monitor. Nine-point calibration was used. The parents were instructed not to watch the monitor and not to talk or interact with their children during the experiment.

The infants experienced ten gaze-contingent events in the practical phase and then ten in the pretest. In the movie phase, they watched the aggressive-interaction animation three times. Finally, they experienced ten gaze-contingent events in the posttest. Whenever an infant was not attending to the monitor, an attention-getting animated clip with sound (a rotating oval checkerboard) was inserted between trials.

#### Data analysis

We excluded an infant's data from further analysis if the infant showed a side-looking bias, defined as looking to one side in more than 12 of the 14 gaze-contingent events (the last four trials of the practical phase and the ten trials of the pretest) (Bayesian binomial test, two-tailed, BF_{10} = 8.11, moderate evidence in favour of the alternative hypothesis; a conventional binomial test also gives *P* < 0.05). To compare the proportion of infants' selective looks at each agent between the pretest and posttest, we used GLMMs with a binomial error structure and a logit link function. The response variable was whether the infant selectively looked at the aggressor (= 1) or the victim (= 0) in each pretest or posttest trial. The explanatory variables (fixed effects) were test type (pretest or posttest) and trial number. Participant identity was set as a random intercept. To keep the random-effects structure “maximal”^{44}, we also included by-participant random slopes for all within-participant effects and the correlations among the random effects.
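The side-looking exclusion rule can be sketched as a simple count, alongside the exact two-tailed binomial test it was calibrated against. This is a minimal sketch assuming each of the 14 evaluated events yields exactly one left or right look, coded with the hypothetical labels `"L"` and `"R"`; it reproduces only the frequentist test, not the reported BF_{10}, which depends on the prior used.

```python
from math import comb
from typing import Sequence

N_EVENTS = 14  # last 4 practical trials + 10 pretest trials (from the text)


def binomial_two_tailed_p(k: int, n: int) -> float:
    """Exact two-tailed binomial test of H0: p = 0.5.

    The null distribution is symmetric, so the two-tailed P is twice the
    upper tail at the more extreme of k and n - k (capped at 1)."""
    k = max(k, n - k)
    upper = sum(comb(n, i) for i in range(k, n + 1))
    return min(1.0, 2 * upper / 2 ** n)


def has_side_bias(sides: Sequence[str], threshold: int = 12) -> bool:
    """sides: 'L' or 'R' for each evaluated event (hypothetical coding).

    Side bias = looking to the same side in more than `threshold` events."""
    if len(sides) != N_EVENTS:
        raise ValueError(f"expected {N_EVENTS} events, got {len(sides)}")
    n_left = sum(1 for s in sides if s == "L")
    n_right = sum(1 for s in sides if s == "R")
    return max(n_left, n_right) > threshold
```

With this rule, an infant looking left in 13 of 14 events is excluded, and `binomial_two_tailed_p(13, 14)` is about 0.0018; 12 of 14 (not excluded) gives about 0.013, so the criterion is conservative relative to *P* < 0.05.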

We compared models on the basis of the BF. The candidate models were (1) the null model, (2) a model with the main effect of test type, (3) a model with the main effect of trial number, (4) a model with the main effects of test type and trial number, and (5) a model with the main effects of test type and trial number and their interaction. We compared each model with the null model by computing BF_{10}, the relative evidence in favour of that model over the null model (Table 1). We assumed uniform prior model probabilities and evaluated how much the data changed the prior model odds for each model. We also computed BF_{incl} (ref. ^{30}) for each effect to quantify how much more likely the data were under models that included the effect than under models that excluded it (Table 2). BF_{incl} was computed from inclusion probabilities (that is, the summed probabilities of the models that included the effect) across all models, as the ratio of the posterior to the prior inclusion odds. For reporting BF_{10} and BF_{incl}, we used a Cauchy distribution with location 0 and scale 1/√2 as the prior distribution for the coefficient parameters^{31}. For the intercept and the standard deviations of the random effects, we used the brms default prior (a *t* distribution with 3 degrees of freedom and scale 2.5). To check whether the main conclusions were robust to the choice of prior, we conducted a sensitivity analysis for BF_{incl} (Fig. 2), computing BF_{incl} for each effect while varying the scale parameter of the Cauchy prior on the effect size from 0.05 to 1.5 in increments of 0.05.
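The inclusion Bayes factor described above can be computed directly from the BF_{10} values of the five candidate models. This is a minimal sketch of the "across all models" definition (posterior inclusion odds divided by prior inclusion odds) with uniform prior model probabilities; the model names and BF values in the usage note are hypothetical, not results from this study.

```python
from typing import Dict, Iterable, Optional


def inclusion_bf(bf10: Dict[str, float], includes: Iterable[str],
                 prior: Optional[Dict[str, float]] = None) -> float:
    """Inclusion Bayes factor for one effect across all candidate models.

    bf10: BF of each model against the null model (the null itself is 1.0).
    includes: names of the models that contain the effect.
    prior: prior model probabilities (uniform if None)."""
    models = list(bf10)
    includes = set(includes)
    if prior is None:
        prior = {m: 1.0 / len(models) for m in models}
    # Posterior model probability is proportional to prior x marginal likelihood;
    # BF_m0 is that marginal likelihood relative to the null, so it can stand in.
    weights = {m: prior[m] * bf10[m] for m in models}
    total = sum(weights.values())
    post_in = sum(weights[m] for m in models if m in includes) / total
    post_out = sum(weights[m] for m in models if m not in includes) / total
    prior_in = sum(prior[m] for m in models if m in includes)
    prior_out = sum(prior[m] for m in models if m not in includes)
    # Change from prior to posterior inclusion odds.
    return (post_in / post_out) / (prior_in / prior_out)


# Grid of Cauchy prior scales for the sensitivity analysis (0.05 to 1.5 by 0.05);
# each BF_10 would be recomputed at every scale before calling inclusion_bf.
CAUCHY_SCALES = [round(0.05 * i, 2) for i in range(1, 31)]
```

For instance, with hypothetical values `{"null": 1.0, "test": 4.0, "trial": 0.5, "test+trial": 2.0, "test*trial": 1.0}`, the test-type effect (in three of five models, prior odds 3/2) has BF_{incl} = (7/1.5)/(3/2) ≈ 3.11, while the interaction (only in the full model, prior odds 1/4) has BF_{incl} ≈ 0.53.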

For the best model in the model comparison, we estimated the posterior distributions of the model parameters and checked the posterior predictive distribution of an infant’s selective looks towards the aggressor (Supplementary Fig. 2a). We used an improper (flat) prior distribution for the coefficient parameters and, as above, the brms default prior (a *t* distribution with 3 degrees of freedom and scale 2.5) for the intercept and the standard deviations of the random effects. The posterior median and 95% CI were calculated for each parameter.
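The parameter summaries and the posterior predictive check can be sketched from MCMC draws as follows. This assumes equal-tailed percentile intervals (the text does not say whether the CIs are equal-tailed or highest-density) and a hypothetical predictive matrix with one column per observed quantity; it is an illustration, not the brms code that was used.

```python
import numpy as np


def summarize_draws(draws, level=0.95):
    """Posterior median and equal-tailed credible interval from MCMC draws."""
    draws = np.asarray(draws, dtype=float).ravel()
    tail = (1.0 - level) / 2.0 * 100.0
    lower, median, upper = np.percentile(draws, [tail, 50.0, 100.0 - tail])
    return {"median": float(median), "lower": float(lower), "upper": float(upper)}


def all_within_prediction_interval(observed, predictive, level=0.95):
    """True if every observed value lies inside its equal-tailed posterior
    predictive interval.

    observed: shape (n_obs,); predictive: shape (n_draws, n_obs)."""
    observed = np.asarray(observed, dtype=float)
    predictive = np.asarray(predictive, dtype=float)
    tail = (1.0 - level) / 2.0 * 100.0
    lower, upper = np.percentile(predictive, [tail, 100.0 - tail], axis=0)
    return bool(np.all((observed >= lower) & (observed <= upper)))
```

For binary trial-level responses, a per-trial interval is usually trivially [0, 1], so such a check is more informative on aggregated quantities (for example, per-participant counts of looks at the aggressor); which aggregation was used here is not stated.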

BF computation and parameter estimation were implemented using the brms package^{45,46} in R v.4.0.3 (ref. ^{47}), which served as an interface to Stan v.2.21.0 (ref. ^{48}); the parameters were estimated with the Markov chain Monte Carlo (MCMC) method. As a general setting for MCMC sampling, we ran four chains of 10,000 iterations each, with 1,000 burn-in samples per chain. The values of \(\hat{R}\) for all parameters were below 1.1, indicating convergence across the four chains; the parameter estimates are shown in Supplementary Table 1 (the best model) and Supplementary Table 8 (the full model). The graphical results of the full model are shown in Fig. 3a. The best model’s posterior predictive distribution for an infant’s selective looks towards the aggressor is shown in Supplementary Fig. 2a. All observed data were inside the 95% prediction interval. The mean times spent looking at the aggressive-interaction animations during the movie phase are shown in Supplementary Table 7.
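The \(\hat{R}\) < 1.1 convergence criterion compares between-chain and within-chain variance. The sketch below implements the classic split-\(\hat{R}\) (each chain split in half, following Gelman and colleagues); the exact variant reported by Stan/brms may differ (for example, rank-normalized \(\hat{R}\)), so treat this as an illustration of the idea rather than a reproduction of the software's values.

```python
import numpy as np


def split_rhat(draws):
    """Classic split-R-hat for one parameter.

    draws: array of shape (n_chains, n_iter) holding post-warm-up samples."""
    draws = np.asarray(draws, dtype=float)
    n_chains, n_iter = draws.shape
    half = n_iter // 2
    # Split every chain into two halves -> 2 * n_chains chains of length `half`.
    chains = np.concatenate([draws[:, :half], draws[:, half:2 * half]], axis=0)
    n = chains.shape[1]
    chain_means = chains.mean(axis=1)
    w = chains.var(axis=1, ddof=1).mean()        # mean within-chain variance
    b = n * chain_means.var(ddof=1)              # between-chain variance
    var_plus = (n - 1) / n * w + b / n           # pooled posterior variance estimate
    return float(np.sqrt(var_plus / w))
```

If, as in brms, the 10,000 iterations include the 1,000 warm-up draws, each chain contributes 9,000 post-warm-up draws (36,000 across four chains), so the input here would have shape `(4, 9000)`. Well-mixed chains give values close to 1; a chain stuck in a different region pushes the value well above 1.1.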

### Experiment 2

#### Participants

The participants were an additional 24 healthy full-term eight-month-old infants (12 boys and 12 girls; mean age, 8 months 7 days; range, 7 months 17 days to 9 months 3 days). Eighteen additional infants were tested but excluded owing to distress or fussiness (*N* = 4), experimental error (*N* = 2) or side-looking bias (*N* = 12, left = 11, right = 1). All other details were the same as in Experiment 1.

#### Apparatus and stimuli

The movie phase of Experiment 2 used the same apparatus and animations as Experiment 1. The gaze-contingent events were also identical to those in Experiment 1, except that the contact between the stones and the objects or geometric figures was made to appear less negative: the falling stones struck the objects or agents softly, with less force than in Experiment 1 (Fig. 1a, Experiment 2).

#### Procedure

This was identical to Experiment 1.

#### Data analysis

The criteria and analyses for side-looking bias were the same as in Experiment 1, as was the analytic plan. The results of the model comparison and analysis of the effect are shown in Tables 1 and 2, respectively. The sensitivity analysis results for BF_{incl} are shown in Fig. 2. The parameter estimates are shown in Supplementary Table 2 (the best model) and Supplementary Table 9 (the full model). The graphical results of the full model in the model comparison are shown in Fig. 3b. The best model’s posterior predictive distribution for an infant’s selective looks towards the aggressor is shown in Supplementary Fig. 2b. All observed data were inside the 95% prediction interval. The mean times spent looking at the aggressive-interaction animations during the movie phase are shown in Supplementary Table 7.

### Experiment 3

#### Participants

The participants were an additional 24 full-term eight-month-old infants (12 boys and 12 girls; mean age, 8 months 19 days; range, 8 months 0 days to 9 months 22 days). Seven additional infants were tested but excluded owing to distress or fussiness (*N* = 2), machine trouble (*N* = 3) or side-looking bias (*N* = 2, left = 2, right = 0). All other details were the same as in Experiment 1.

#### Apparatus and stimuli

The movie phase of Experiment 3 used the same apparatus and animations as Experiment 1. The gaze-contingent events were also identical to those in Experiment 1, except that in the practical phase the infants were presented with two objects side by side (a red circle and a blue circle) in all ten trials. This modification allowed a 50% reinforcement probability: in the practical phase, pretest and posttest, when the infants fixated on one of the two objects, half of the gaze-contingent events involved the object (or agent) they looked at, and the other half involved the object (or agent) they did not look at. The reinforcement order was randomized across infants, with the constraint that the same type of gaze-contingent event occurred no more than three times in succession (Fig. 1a, Experiment 3).
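One way to generate such a constrained reinforcement order is rejection sampling: shuffle five "looked-at" and five "not-looked-at" events and redraw until no type appears more than three times in a row. This is a hypothetical sketch (the labels `"looked"` and `"not_looked"` and the sampling method are assumptions), not the authors' randomization code.

```python
import random
from typing import List, Optional, Sequence


def max_run_length(seq: Sequence[str]) -> int:
    """Length of the longest run of identical consecutive items."""
    if not seq:
        return 0
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def reinforcement_order(n_trials: int = 10, max_run: int = 3,
                        rng: Optional[random.Random] = None) -> List[str]:
    """Random order with half 'looked' and half 'not_looked' events, redrawn
    until no event type occurs more than max_run times in a row."""
    rng = rng or random.Random()
    half = n_trials // 2
    base = ["looked"] * half + ["not_looked"] * (n_trials - half)
    while True:
        seq = base[:]
        rng.shuffle(seq)
        if max_run_length(seq) <= max_run:
            return seq
```

Calling `reinforcement_order(rng=random.Random(seed))` once per infant and phase would give a reproducible, per-infant random order; rejection sampling keeps every admissible sequence equally likely.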

#### Procedure

This was identical to Experiment 1.

#### Data analysis

The criteria and analyses for side-looking bias were the same as in Experiment 1, as was the analytic plan. The results of the model comparison and analysis of the effect are shown in Tables 1 and 2, respectively. The sensitivity analysis results for BF_{incl} are shown in Fig. 2. The parameter estimates are shown in Supplementary Table 3 (the best model) and Supplementary Table 10 (the full model). The graphical results of the full model in the model comparison are shown in Fig. 3c. The best model’s posterior predictive distribution for an infant’s selective looks towards the aggressor is shown in Supplementary Fig. 2c. All observed data were inside the 95% prediction interval. The mean times spent looking at the aggressive-interaction animations during the movie phase are shown in Supplementary Table 7.

### Experiment 4

#### Participants

The participants were an additional 24 healthy full-term eight-month-old infants (12 boys and 12 girls; mean age, 8 months 13 days; range, 7 months 23 days to 9 months 13 days). Seventeen additional infants were tested but excluded owing to distress or fussiness (*N* = 7), machine trouble (*N* = 1), parental intervention (*N* = 1) or side-looking bias (*N* = 8, left = 5, right = 3). All other details were the same as in Experiment 1.

#### Apparatus and stimuli

Experiment 4 used the same apparatus as Experiment 1. The gaze-contingent events in the pretest and posttest and the animations in the movie phase were also identical to those in Experiment 1, with the following exceptions: to eliminate perceivable ‘animacy or agency’, we divided the eyes of both geometric figures into separate white and black parts, and we removed the objects’ ability to self-propel and any distortion upon contact (Fig. 1a,b, Experiment 4; see also Supplementary Video 3).

#### Procedure

See Experiment 1.

#### Data analysis

The criteria and analyses for side-looking bias as well as the analytic plan were the same as in Experiment 1. The results of the model comparison and analysis of the effect are shown in Tables 1 and 2, respectively. The sensitivity analysis results for BF_{incl} are shown in Fig. 2. The parameter estimates are shown in Supplementary Table 4 (the best model) and Supplementary Table 11 (the full model). The graphical results of the full model in the model comparison are shown in Fig. 3d. The best model’s posterior predictive distribution for an infant’s selective looks towards the aggressor is shown in Supplementary Fig. 2d. All observed data were inside the 95% prediction interval. The mean times spent looking at the physical-collision animations during the movie phase are shown in Supplementary Table 7.

### Experiment 5

#### Participants

The participants were an additional 24 full-term eight-month-old infants (11 boys and 13 girls; mean age, 8 months 15 days; range, 7 months 18 days to 9 months 15 days). Eleven additional infants were tested but excluded owing to distress or fussiness (*N* = 5), machine trouble (*N* = 2) or side-looking bias (*N* = 4, left = 4, right = 0). All other details were the same as in Experiment 1.

#### Apparatus, stimuli and procedure

See Experiment 1.

#### Data analysis

The criteria and analyses for side-looking bias and the analytic plan followed those in Experiment 1. The results of the model comparison and analysis of the effect are shown in Tables 1 and 2, respectively. The sensitivity analysis results for BF_{incl} are shown in Fig. 2. The parameter estimates are shown in Supplementary Table 5 (the best model) and Supplementary Table 12 (the full model). The graphical results of the full model in the model comparison are shown in Fig. 3e. The best model’s posterior predictive distribution for an infant’s selective looks towards the aggressor is shown in Supplementary Fig. 2e. All observed data were inside the 95% prediction interval. The mean times spent looking at the aggressive-interaction animations during the movie phase are shown in Supplementary Table 7.

### Comparison of the effect sizes of test type for each experiment

The finding that infants selectively looked at the aggressor more in the posttest than in the pretest only in Experiments 1 and 5 is not, by itself, sufficient to demonstrate clear differences between experiments in how infants’ looking behaviour changed from the pretest to the posttest^{37}. To compare the effect size of test type across experiments, we combined the data from all experiments and estimated the interaction effects between test type and experiment. We used a GLMM with a binomial error distribution and a logit link function. The response variable was the infant’s selective look in each pretest or posttest trial; looking at an aggressor or a causer was coded as 1, and otherwise as 0. The explanatory variables (fixed effects) were test type (pretest or posttest), experiment (Experiment 1, 2, 3, 4 or 5), trial number and the interaction between test type and experiment. Participant identity was set as a random intercept. We also included by-participant random slopes for all within-participant effects and the correlations among the random effects.

The model parameters were estimated with the MCMC method using brms^{45,46}, with the same sampling settings as in the analysis of each experiment. The values of \(\hat{R}\) for all parameters were below 1.1, indicating convergence across the four chains. Using the MCMC samples of the interaction effects between test type and experiment, we calculated the differences in the test-type effect between experiments. Comparisons of the effect of test type for each experiment are shown in Supplementary Fig. 1. The parameter estimates for the model assessing the interaction effects between test type and experiment are shown in Supplementary Table 13. The parameter estimates for the differences in the test-type effects between experiments are shown in Supplementary Table 6.

### Post-hoc confirmation of the validity of the sampling design

To assess whether the sampling design of each experiment had sufficient power to detect the effect of test type, we computed simulation-based power given the actual sample size and a theoretically expected effect size. For a design with 24 participants and ten observations per test phase, we simulated new datasets, estimated the parameters of the full model from each and calculated the 95% CI of the parameter for the effect of test type. The effect size of test type used in the simulation was based on a previous meta-analysis of infants’ preferences between a prosocial and an antisocial agent^{49}. We randomly generated 100 datasets from this effect size while varying the magnitude of individual differences, and we treated the proportion of datasets in which the 95% CI of the test-type parameter did not include zero as the simulated power. This sampling design turned out not to be sufficiently powerful for the range of individual differences estimated from our actual data and the theoretically expected effect sizes: had our data been generated from the theoretically expected effect size, the design would have had sufficient power only when individual differences in the test-type effect were small. Although the magnitude of individual differences in the test-type effect could not be known a priori in this study, future studies using a similar paradigm would be well advised to recruit larger samples (see the Supplementary Information and Supplementary Fig. 3 for additional information).
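The logic of the power simulation (generate data under an assumed effect and individual variability, analyse, count how often the 95% CI excludes zero) can be sketched as below. Important assumptions: the authors refitted the full Bayesian GLMM to every simulated dataset, whereas this sketch swaps in a much faster paired *t* interval on per-participant proportions; `beta_test`, `slope_sd` and `intercept_sd` are hypothetical inputs on the log-odds scale, not the meta-analytic effect size.

```python
import numpy as np

N_PARTICIPANTS = 24   # per experiment (from the text)
N_TRIALS = 10         # trials per test phase (from the text)
T_CRIT = 2.0687       # two-sided 95% t critical value for 23 df (N_PARTICIPANTS - 1)


def simulate_power(beta_test, slope_sd, intercept_sd=0.5, n_sims=100, seed=0):
    """Proportion of simulated datasets whose 95% CI for the pre-to-post change
    excludes zero.

    Data come from a logistic model with by-participant random intercepts and
    random test-type slopes; the analysis is a simplified stand-in (paired t
    interval on per-participant proportions), not a GLMM."""
    rng = np.random.default_rng(seed)
    detected = 0
    for _ in range(n_sims):
        b0 = rng.normal(0.0, intercept_sd, N_PARTICIPANTS)          # random intercepts
        b1 = beta_test + rng.normal(0.0, slope_sd, N_PARTICIPANTS)  # per-infant test effect
        p_pre = 1.0 / (1.0 + np.exp(-b0))
        p_post = 1.0 / (1.0 + np.exp(-(b0 + b1)))
        pre = rng.binomial(N_TRIALS, p_pre) / N_TRIALS    # proportion of looks to aggressor
        post = rng.binomial(N_TRIALS, p_post) / N_TRIALS
        diff = post - pre
        se = diff.std(ddof=1) / np.sqrt(N_PARTICIPANTS)
        lower, upper = diff.mean() - T_CRIT * se, diff.mean() + T_CRIT * se
        if lower > 0 or upper < 0:
            detected += 1
    return detected / n_sims
```

Calling `simulate_power` over a grid of `slope_sd` values with a fixed `beta_test` mirrors the pattern described above: power is highest when individual differences in the test-type effect are small and drops as they grow, because heterogeneous slopes inflate the between-participant variance.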

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.
