Yale Perception & Cognition Lab

VSS '19 Abstracts
Chen, Y. -C., Chang, A., Rosenberg, M., Scholl, B. J., & Trainor, L. (2019). Are you the sort of person who would like this? Quantifying the typicality of aesthetic taste across seeing and hearing. Talk given at the annual meeting of the Vision Sciences Society, 5/20/19, St. Pete Beach, FL.  
Aesthetic experience seems both regular and idiosyncratic. On one hand, there are powerful regularities in what we tend to find attractive vs. unattractive. (For example, most of us prefer images of beaches to images of mud puddles.) On the other hand, what we like also varies dramatically from person to person: what one of us finds beautiful, another might find distasteful. What is the nature of such differences in aesthetic taste? They may in part be arbitrary -- e.g. reflecting the random details of highly specific past judgments (such as liking red towels over blue ones because they were once cheaper). But they may also in part be systematic -- reflecting deeper ways in which people differ from each other in terms of their perceptual and/or cognitive processing. We assessed the systematicity of aesthetic taste by exploring its *typicality* across seeing and hearing. A large group of people rated the aesthetic appeal of wide ranges of both visual images of scenes and objects (e.g. beaches, buildings, and books), and common environmental sounds (e.g. doorbells, dripping, and dialtones). Each person's 'taste typicality' for each modality was quantified by correlating their individual ratings with the mean ratings for each stimulus -- thus capturing how similar each individual's aesthetic preferences are to those of the group as a whole. We discovered that these typicality scores were reliably correlated across seeing and hearing in multiple independent samples -- even when controlling for rating reliability and differences in how people used the scales. In other words, if you're the sort of person who has (a)typical aesthetic preferences for visual images, you're more likely to also be the sort of person who has (a)typical aesthetic preferences for sounds. This suggests that one's 'taste typicality' is not entirely arbitrary, but rather reflects deeper factors that operate across multiple sensory modalities.
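The typicality measure described above can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names, the toy ratings, and the choice to include each person's own ratings in the group mean are all assumptions made for illustration, and the controls for rating reliability and scale use mentioned in the abstract are not modeled.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def taste_typicality(ratings):
    """ratings[p][s] is person p's rating of stimulus s.

    Returns one score per person: the correlation between that person's
    ratings and the mean rating of each stimulus (assumption: the mean
    is taken over everyone, including the person being scored).
    """
    n_people = len(ratings)
    n_stimuli = len(ratings[0])
    group_mean = [sum(r[s] for r in ratings) / n_people for s in range(n_stimuli)]
    return [pearson(person, group_mean) for person in ratings]


# Hypothetical toy data: 3 people, 4 images and 3 sounds.
visual_ratings = [[7, 5, 2, 1], [6, 5, 3, 2], [1, 3, 5, 7]]
auditory_ratings = [[6, 4, 2], [5, 4, 1], [2, 3, 6]]

visual_typicality = taste_typicality(visual_ratings)
auditory_typicality = taste_typicality(auditory_ratings)

# Cross-modal question: do people with typical visual taste
# also tend to have typical auditory taste?
cross_modal_r = pearson(visual_typicality, auditory_typicality)
print(visual_typicality, auditory_typicality, cross_modal_r)
```

In this toy data the third person disagrees with the group in both modalities, so their typicality score is low for images and for sounds alike, which is the pattern the cross-modal correlation is meant to detect.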
Colombatto, C., & Scholl, B. J. (2019). Unconscious pupillometry: Faces with dilated pupils gain preferential access to visual awareness. Poster presented at the annual meeting of the Vision Sciences Society, 5/21/19, St. Pete Beach, FL.  
Of all the information we extract when viewing others' faces, one of the most telling is their attentional state: at a glance, we can readily determine both where others are attending, and whether they are attentive (vs. distracted) in the first place. Some of the specific cues to others' attention are relatively obvious (such as a turned head), but others are more visually subtle. For example, attentional engagement (e.g. in the form of heightened vigilance, increased cognitive load, or emotional arousal) can cause one's pupils to dilate. Of course, the difference between seeing someone with dilated vs. constricted pupils is visually subtle, since such stimuli differ by just a fraction of a degree of visual angle. But given that dilated pupils are meaningful signals of others' attentional states, we wondered whether such cues might be prioritized in visual processing -- even outside of conscious awareness. To find out, we used continuous flash suppression (CFS) to render invisible faces with either dilated or constricted pupils, and then we measured the time that such stimuli took to break through interocular suppression. Faces with dilated pupils broke into conscious awareness faster than did faces with constricted pupils that were otherwise equated -- and a monocular control experiment ruled out response-based interpretations that did not involve visual awareness. Another experiment demonstrated that such stimulus differences only drive visual awareness when represented as pupils, per se: when the identical stimuli were presented instead as (smaller vs. larger) buttons on the actors' shirts, there was no difference in breakthrough times (with a significant interaction). These results collectively demonstrate that pupil dilation facilitates the entry of faces into visual awareness -- a case study of how visually subtle stimuli can be prioritized in visual processing when they signal socially powerful properties such as the attention of other agents.
Lin, Q., Yousif, S., Scholl, B. J., & Chun, M. (2019). Image memorability is driven by visual and conceptual distinctiveness. Talk given at the annual meeting of the Vision Sciences Society, 5/22/19, St. Pete Beach, FL.  
What drives image memorability? Previous work has focused almost exclusively on either specific dimensions of the images themselves or on their local categories. This work has typically involved semantic properties (e.g. whether an image contains a human, or whether it is an instance of a forest), but we have also demonstrated *visual* memorability: even when semantic content is eliminated (e.g. via phase-scrambling), some images are still consistently more likely to be retained in short-term memory. Beyond individual feature dimensions and image categories, here we ask whether the memorability of an image is also influenced by its *distinctiveness* in a much broader multidimensional feature space. We first measured distinctiveness behaviorally, by calculating the conceptual and perceptual distinctiveness of images based on pairwise similarity judgments of each type. We then also measured distinctiveness computationally, by calculating the average distance of each target image to all other images in a ~10,000-image database, at different layers of a CNN trained to recognize scenes and objects (VGG16-Hybrid1365). For intact vs. scrambled images, we observed opposite patterns of correlations between distinctiveness and short-term memorability. For intact images, short-term memorability was primarily a function of how conceptually distinct an image was. And strikingly, this was mirrored in the CNN analysis: distinctiveness at later (but not earlier) layers predicted memorability. For scrambled images, in contrast, the reverse was true. Collectively, these results suggest that memorability is a function of distinctiveness in a multidimensional image space -- with some images being memorable because of their conceptual features, and others because of their perceptual features. 
Moreover, because distinctiveness in the CNN was computed over all images in the much larger database, the relevant measure of distinctiveness may reflect not just the local statistics of an experimental image set, but also the broader statistics of our natural environment as a whole.
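As a rough illustration of the computational measure described above, distinctiveness can be sketched as the average distance from each image's feature vector to every other image's feature vector. The abstract does not specify the distance metric, so Euclidean distance is an assumption here, and the tiny two-dimensional vectors stand in for what would in practice be activations from a given CNN layer.

```python
from math import dist


def distinctiveness(features):
    """Mean distance from each feature vector to all the others.

    features: list of equal-length numeric vectors, one per image
    (hypothetical stand-ins for one layer's activations).
    Assumption: Euclidean distance; the source does not name a metric.
    """
    n = len(features)
    scores = []
    for i, target in enumerate(features):
        total = sum(dist(target, other) for j, other in enumerate(features) if j != i)
        scores.append(total / (n - 1))
    return scores


# Hypothetical toy "database": three similar images and one outlier.
toy_features = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
scores = distinctiveness(toy_features)
most_distinctive = scores.index(max(scores))
print(scores, most_distinctive)
```

Repeating this computation on features from an early layer and from a late layer would give two distinctiveness scores per image, which could then each be correlated with memorability.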
Ongchoco, J., & Scholl, B. J. (2019). How to create objects with your mind: From object-based attention to attention-based objects. Poster presented at the annual meeting of the Vision Sciences Society, 5/18/19, St. Pete Beach, FL.  
A curious thing can happen when you stare at a regular gridlike pattern -- e.g. a piece of graph paper, or the tiles on a bathroom floor. Although such patterns contain no structure, you may often begin to *see* structure anyway (e.g. a block '+' shape). This phenomenon appears to be based on attention to relevant squares of the grid, and previous (older, underappreciated) research demonstrated that these squares do indeed accrue attentional benefits, such as faster probe detection. We will call this phenomenon *scaffolded attention*, because of how the grid provides a scaffold for selection. (Note that you cannot see these same shapes when staring at a blank page.) Here we asked whether this form of attention actually creates bona fide object representations that go on to enjoy object-specific effects. In essence, whereas previous work explored many cues to 'object-based attention' (e.g. involving continuity and closure), these current studies ask whether attention can be object-based even with no cues to objecthood at all. In several experiments (each with a direct replication), observers viewed 3x3 grids, and attended to particular squares until they could effectively see shapes such as two vertical (or horizontal) lines, or a block-letter H (or I). As they engaged in this form of scaffolded attention, two probes appeared, and observers simply reported whether they were the same or different. Remarkably, this produced a traditional 'same-object advantage': performance was enhanced for probes presented on the same (purely imagined) object, compared to equidistant probes presented on different objects -- while equating for spatial factors. Thus, attention to the relevant squares effectively groups them, forming object representations out of thin (scaffolded) air. In other words, this demonstrates an unexpected inversion of the typical relationship between objects and attention: there is not only object-based attention, but also attention-based objects.
Uddenberg, S., Colombatto, C., & Scholl, B. J. (2019). The speed of demography in face perception. Poster presented at the annual meeting of the Vision Sciences Society, 5/21/19, St. Pete Beach, FL.  
When we look at a face, we cannot help but 'read' it: beyond simply processing its identity, we also form robust impressions of both transient emotional states (e.g. surprise) and stable personality traits (e.g. trustworthiness). But perhaps the most fundamental and salient traits we extract from faces reflect their social demographics -- e.g. race, age, and gender. Our interpretations of these properties have deep consequences for how we interact with other people. But just how are such features extracted by perceptual (and cognitive) processing? Curiously, despite a vast amount of work on higher-level social properties (such as competence and dominance), there has been very little work looking at the visual perception of basic demographic properties. Across several experiments, we tested how quickly demographic properties are extracted when viewing faces. Observers viewed unfamiliar full-color photographs of faces for variable durations, after which they were masked. We then correlated percepts of race, age, or gender from those faces with the same percepts that occurred during independent unspeeded (and unmasked) judgments. The results clearly demonstrated that demographic features are extracted highly efficiently: observers showed near-perfect agreement with their own unspeeded judgments (and with the ground truth) with only 50 ms of exposure -- and even (in the cases of race and gender) by 34 ms. This was true even when the property to be reported wasn't revealed until the face had disappeared. We also replicated these results in an independent group of observers who viewed faces that were tightly cropped and matched for mean luminance, thus controlling for several lower-level visual properties. Critically, we also observed much slower and less accurate performance for inverted faces, signaling a role for holistic processing. 
Collectively, these results demonstrate that the visual system is especially fast and efficient at extracting demographic features from faces at a glance.
Yousif, S., Chen, Y. -C., & Scholl, B. J. (2019). Systematic biases in the representation of visual space. Poster presented at the annual meeting of the Vision Sciences Society, 5/20/19, St. Pete Beach, FL.  
The ability to accurately perceive, represent, and remember spatial information is one of the most foundational abilities of all mobile organisms. Yet in the present work we find that even the simplest possible spatial tasks reveal surprising systematic deviations from the ground truth -- such as biases wherein objects are perceived and remembered as being nearer to the centers of their surrounding quadrants. We employed both a relative-location placement task (in which observers see two differently sized shapes, one of which has a dot in it, and then must place a second dot in the other shape so that their relative locations are equated) and a matching task (in which observers see two dots, each inside a separate shape, and must simply report whether their relative locations are matched). Some of the resulting biases were shape-specific. For example, when dots appeared in a triangle during the placement task, the dots placed by observers were biased away from the axes that join the midpoints of each side to the triangle's center. But many of the systematic biases were not shape-specific, and seemed instead to reflect differences in the grain of resolution for different regions of space itself. For example, with both a circle and a shapeless configuration (with only a central landmark) in the matching task, the data revealed an unexpected dissociation in the acuity for angle vs. distance: in oblique sectors, observers were better at discriminating radial differences (i.e. when a dot moved inward or outward); but in cardinal sectors, observers were better at discriminating angular differences (i.e. when a dot moved around the circle). These data provide new insights about the format of visuospatial representations: the locations of objects may be represented in terms of polar rather than Cartesian coordinates.
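To make the polar framing concrete, here is a small sketch of how a dot's location can be expressed as a distance and an angle relative to a central landmark, and how the 'matched' relative location in a second, differently sized shape could be computed. It assumes both shapes are circles and that distance scales with radius; the function names and example values are hypothetical and do not come from the study's materials.

```python
from math import atan2, cos, hypot, sin


def to_polar(x, y, cx=0.0, cy=0.0):
    """Return (distance, angle in radians) of point (x, y) relative to center (cx, cy)."""
    dx, dy = x - cx, y - cy
    return hypot(dx, dy), atan2(dy, dx)


def from_polar(r, theta, cx=0.0, cy=0.0):
    """Return the Cartesian point at distance r and angle theta from center (cx, cy)."""
    return cx + r * cos(theta), cy + r * sin(theta)


def matched_location(dot, center_a, radius_a, center_b, radius_b):
    """Location in circle B with the same relative position as `dot` in circle A.

    Assumption: 'relative location' means same angle, and distance scaled
    by the ratio of the two radii.
    """
    r, theta = to_polar(dot[0], dot[1], *center_a)
    return from_polar(r * radius_b / radius_a, theta, *center_b)


# A dot halfway out along the horizontal axis of a radius-2 circle
# maps to halfway out along the same axis of a radius-4 circle.
target = matched_location((1.0, 0.0), (0.0, 0.0), 2.0, (10.0, 10.0), 4.0)

# A radial shift changes only the distance; an angular shift changes only the angle.
r_before, theta_before = to_polar(1.0, 0.0)
r_radial, theta_radial = to_polar(1.5, 0.0)
r_angular, theta_angular = to_polar(0.0, 1.0)
print(target, (r_radial, theta_radial), (r_angular, theta_angular))
```

In this representation the two kinds of displacement in the matching task separate cleanly: moving a dot inward or outward changes only `r`, and moving it around the circle changes only `theta`.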