This first part reviews the background of the project and the fundamentals of emotions in science and technology. The first chapter, Emotion transfer protocol - a draft, begins with an explanation of the context of this thesis and the background of the problem from a social, technological and scientific point of view. The second chapter, What are emotions?, explains the theoretical background of emotions and presents multiple approaches, as no single theory can answer the question in the chapter title.
The argument that underlies this work is that the current state of communication over the internet offers neither a natural nor an optimal way of conveying emotions, and therefore inhibits the most important precursor of rich and functional interaction: empathy. Because of this, several areas of the internet can and should be improved. The benefits of creating more possibilities for emotion transmission, and ultimately greater empathy, are manifold. More emotional bandwidth can improve communication on a personal and business level, and natural emotional interaction will support the healthy development of socio-emotional skills in children, who are exposed to increasing amounts of technologically mediated communication. The positive aspects are not limited to children: emotional communication is in many forms more universal than language, and many interpersonal and international misunderstandings can be alleviated if the emotional pathway is cleared. Furthermore, new ways of expression and still unimaginable forms of art and applications can be built by harnessing the emotion transmission channel.
In this master’s thesis, I explore the possibilities of using affective technologies for enriching online communication and present a proposal based on them. This thesis can be used as a guidebook for practical approaches to emotions in digital projects, especially by interaction designers and media artists. It is also a preliminary proposal for a new internet protocol, the emotion transfer protocol (ETP). The protocol is not presented as a formal definition, but as a collection of possibilities for using emotions, and especially empathy, as material.
In this first part I introduce the problem and explain the context for this work. Next, I move on to different scientific theories of emotion, and explore their practical and theoretical value. In the second part I approach the drafting of a new protocol by giving an overview of current practices for emotion recognition, transmission and presentation, with the aim of giving practical guidelines and presenting tools that can be used for creating emotionally aware applications and experiments. In the last part I present my own work in this area and reflect on the learnings and outcomes of three distinct approaches.
The problem of the existence of empathy online is technically threefold: we need to read either deliberate or unconscious emotions of the sender (Input), present emotions to the recipient in a meaningful way (Output), and transfer emotions while transforming the emotionally relevant input into emotionally meaningful output without losing information in the process (Transmission). Input, Output and Transmission are each approached in a separate chapter. Apart from the technical challenges, a hot potato is the actual nature of emotions, which I approach from the point of view of psychological theory, especially appraisal theory, and explanatory frameworks such as dimensional emotions, basic emotions, and their derivatives. Taking the current scientific understanding of emotions into consideration is essential in building good applications.
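To make the threefold division more concrete, the following is a minimal sketch in Python of how Input, Transmission and Output could fit together. It is not a definition of ETP: the `EmotionSample` structure, its fields, the JSON encoding and the function names are all assumptions made for illustration only.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EmotionSample:
    """Hypothetical result of the Input stage (all fields are assumptions)."""
    label: str        # e.g. "joy"; the vocabulary is deliberately left open
    intensity: float  # assumed range: 0.0 (none) to 1.0 (maximal)
    deliberate: bool  # True if expressed on purpose, False if sensed


def encode(sample: EmotionSample) -> bytes:
    """Transmission, sender side: serialize the sample without dropping fields."""
    return json.dumps(asdict(sample), sort_keys=True).encode("utf-8")


def decode(payload: bytes) -> EmotionSample:
    """Transmission, receiver side: restore the sample from the payload."""
    return EmotionSample(**json.loads(payload.decode("utf-8")))


def present(sample: EmotionSample) -> str:
    """Output: a deliberately plain textual rendering for the recipient."""
    source = "expressed" if sample.deliberate else "sensed"
    return f"{sample.label} ({source}, intensity {sample.intensity:.0%})"


sent = EmotionSample(label="joy", intensity=0.8, deliberate=False)
received = decode(encode(sent))
print(present(received))
```

In this sketch, the requirement of not losing information in transmission reduces to `received == sent`. A real system would have to preserve far richer signals than a single label and a number, which is where the actual difficulty lies.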
I discovered this topic for my thesis in the context of a collaboration between neuroscientists at Helsinki University, researchers at Teosto, forward thinkers and children’s media specialists from the Finnish National Broadcasting Company YLE, and myself. We are involved in a project Natural Emotionality in Digital Interaction (shortened NEMO), participating in the Helsinki University 375 year idea competition Helsinki Challenge. The purpose of our project is to find ways to widen the emotional bandwidth in digital media and digital communications. One part of the motivation is to ensure children’s development into emotionally mature adults in an environment where daily screen time keeps increasing. There is increasing concern that as children spend time in environments that lack possibilities for natural emotion expression and transmission, the development of socio-emotional skills will be impaired (Konrath, O’Brien, and Hsing 2010; Radesky, Schumacher, and Zuckerman 2015). However, no longitudinal studies to date have examined whether or not such a problem actually exists, and therefore answering this question is also one of the focus areas of the team. Another, more general reason, is improving communication quality and finding new areas for applications, art and games by creating a better emotional communication channel.
As a part of the contest, we were sent out to isolation at a resort hotel on the rural coast of Finland, to explore the possible dimensions of the impact of our idea. While taking a walk in the area, our team came up with reasons for the importance of empathy on the internet, ways to transmit emotions, ideas for experiments and some funny concepts, such as EmoTinder, an application that lets users find matches that naturally mirror their facial expressions. At this point, we stopped in a forest, and our team member Wesa Aapro exclaimed that we were not thinking big enough to make a profound impact. He suggested that we should apply one of his own thought devices, the “Big Story”: which he explained as basically taking a silly unrealistic idea and blowing it so much out of proportion that it actually becomes credible – too crazy to be doubted. After a brief discussion, ETP was born. Not being one to shy away from a crazy idea, I decided to write my master’s thesis about it.
My role in the project is to act at the intersection of interaction design, affective neuroscience and media technology. Having worked in the Cognitive Brain Research Unit for several years, but lately focusing on interactive media experimentally, academically and as an entrepreneur, I have the necessary background to work on the bridge between scientific methodology and practical or artistic applications. In this thesis I decided to tackle the drafting of ETP from my own point of view, receiving feedback from the NEMO team leader and my thesis supervisor Katri Saarikivi and trying to incorporate the ideas we have discussed during the project. The background theory and proposed solutions are wholly my personal work – I hope this thesis can serve as a discussion opener for the NEMO project, and I expect the thesis itself to evolve as the project progresses. Because of this I am writing primarily for the web, and an updated version of the work can always be found at http://vatte.github.io/etp/.
A problem exists in digital communications. While the possibilities for transmitting meaning, facts and informational content online are efficient and diverse, emotional content is often lacking in quality and richness. It tends to get overly simplified: a realistic assessment and understanding of the conversational partner’s moods, feelings and emotional reactions often does not occur. Due to this lack of emotional information, achieving natural empathy online seems to be difficult. As a somewhat naive but easy example, expressing condolences to your friend for the loss of a loved one as a post on Facebook is just not the same as expressing them in person – a lot of emotional information will be missing. This is not to say that empathy does not exist online. Real empathic experiences are typical for certain online communities, especially those related to support in traumatic life situations, such as illness (Preece and Ghozati 2001). In these situations people can find a strong support network and a feeling of togetherness in a community without any sort of face-to-face interaction. Meanwhile, other internet communication platforms are characterized by anti-social behavior, especially evident in un-empathic and aggressive online conversational phenomena such as flaming and trolling on YouTube (Moor, Heuvelman, and Verleur 2010). These extreme examples of a lack of empathy online lead laypeople to claim that there is a lot of emotion online and that we could do with less of it. My point of view is that the problem is not that people do not attempt to express their emotions online, but that the quality of the means we have for expression is poor.
Textual communication lacks many of the cues and subtleties that make face-to-face communication so effective. To combat this, users have adopted new strategies to substitute for physical cues in online settings (Reid 1993). Some of these are very creative, coming in the form of acronyms, new idioms, emoticons and emojis, and they are further discussed in the Representations section of the Output chapter of this thesis. Derks, Fischer, and Bos (2008) argue that even though emotions are very common in computer-mediated communication (CMC), and some forms of CMC even seem to reinforce them, there are most likely differences in the strength of the experienced emotions. Another interesting problem noted by Derks, Fischer, and Bos (2008) is that emotional communication online is usually extremely controllable, and as such may not represent the true emotion felt by the sender of the message. It is imaginable that people hit the like button on Facebook without actually experiencing a positive emotion at that moment: for example, when liking a birthday wish while experiencing anxiety about getting older. If the message does not preserve the emotional impulse as it was felt by the sender, real empathy between the sender and the receiver of the message can be hindered.
A trend in CMC is an overestimation of our ability to interpret the intention of the sender correctly, combined with an often overly negative interpretation of the intended emotion by the recipient, especially when the explicit emotional information is low (Kruger et al. 2005; Y. Kato, Kato, and Akahori 2007). Kruger et al. (2005) suggest that this discrepancy stems from an egocentric point-of-view: the sender can imagine their message to have, for example, a certain tone of voice, but they are unable to accurately assess whether the intended message will be apparent to the receiver. Essentially we are not able to place ourselves in the position of our conversational partners, and evaluate their ability to understand our intended meaning as successfully as in real life.
The problem is not only one of efficient communication; parents are worried about their children’s development into emotionally mature adults, and many of them are limiting their children’s time spent on personal computers and mobile devices. Interestingly, this seems to be a trend among Silicon Valley executives. For example, the late Apple CEO Steve Jobs is described as a low-tech parent (Bilton 2014) who limited the use of mobile devices in the household. A global trend, however, is that face-to-face communication is in decline, and people increasingly prefer online communication to personal encounters (D. Russell 2015). The total amount of time spent using a screen-based electronic communications device, or screen time, has been linked with physical as well as psychosocial development and health issues, but also with positive aspects such as higher intelligence and better computer literacy (Subrahmanyam et al. 2000; Richards et al. 2010).
Some might argue that the ways that people interact have always evolved and that CMC is just another form of interaction that has its own characteristics. Perhaps children who grow up to understand the peculiarities and special features of CMC will become adept at expressing themselves in the ways that are possible in these environments. As the evolution of human cognition follows the evolution of technology, maybe the decline of traditional empathetic skills (Konrath, O’Brien, and Hsing 2010) is nothing to worry about. While there may be some truth to this assessment, it would be hard to argue that the interactive capabilities of computers have already fully harnessed the power of the human body and mind. In any case, making CMC more natural and able to take advantage of the built-in empathic circuitry of the human brain while developing these technologies can only have a positive effect on the quality of online communication.
Meanwhile, we are living in an age where the nature and structure of work are rapidly changing to fit a post-digital society. The effect of automation on a large number of low-skill jobs is expected to be considerable in industrialized countries, because large employment sectors can be replaced by, or made much more effective with, digitalization. It is expected that not only manual labor and simple problem-solving, but increasingly also cognitive tasks can be automated in the future, as computing power increases and sophisticated methods for processing large amounts of data are developed. When exploring the extent of problem-solving in which computers surpass human cognition, it seems that the most important areas where people are still “unbeatable” are the realms of social cognition and creative thinking. Relying on neural processes such as mirroring, people make use of their subjective experience of themselves in understanding the emotional states of others. It is hard to conceive that this type of processing could be automated, and therefore it is expected that the work people will concentrate on will emphasize socio-emotional skills (Florida 2013; Pajarinen et al. 2015).
Generations may now be growing up in digital environments that are suboptimal for the development of skills that are essential for the quality of future work. As interpersonal communication and socio-emotional understanding become increasingly important skills – not only in social life, but also in employment – developing natural communication skills has a considerable effect on the economy and on well-being at large.
Affective computing is a field of human-computer interaction concerned with the study and development of computers that can recognize, process and simulate emotions. Since being introduced at the MIT Media Lab in the mid-1990s by Rosalind Picard, the field has responded to a lack of consideration for human emotions in the design of interactive computer systems. In the book Affective Computing, Picard (2000) argues that computers, as a result of the extremely logical structure of the underlying technology, have been mostly designed to act completely rationally and logically, in a way that produces the same result each time a certain set of actions is completed. This would seem like a good idea at first, but a completely rational and logical system is actually a very unnatural way for humans to communicate. The argument is that by not taking into account the complexities of human communicational behavior, especially regarding emotions, we are actually creating illogical and inefficient computer systems. If the functioning of the human brain is thought to be a pinnacle of problem-solving and a model for intelligent computers, dismissing emotional processing is ill-founded. In cognitive neuroscience there is a growing body of research showing that emotions continuously interact with cognitive functions such as memory and attention, and that areas of the brain associated with emotions also play an important role in cognitive processes (Pessoa 2008; Lindquist et al. 2012).
The field of affective computing has been especially concerned with the artificial intelligence of computers, or their artificial emotionality. By sensing the emotions of the user, computer programs can adapt to the user’s actions. Effectively using emotion in the adaptation is not trivial, and there is a risk of annoying the user if the emotional behavior of the computer is overly simplified. A related field, or a trend within affective computing, is known as affective interaction, which is more concerned with how emotional meaning is created and how it evolves in the interaction design process. The difference is that affective computing of the MIT variety has taken a very cognitivist, perhaps even reductionist approach to quantifying emotion, whereas the affective interaction movement has a more traditional design approach, considering emotions from the point of view of experience and phenomenology (Höök 2013).
The topic and content of this thesis draw a lot of inspiration from the affective computing field, but the thesis does not attempt to tackle issues relating to the design of affectively intelligent systems. Instead, the focus is on how to transmit emotions between two human users of communication devices. Therefore, the artificial intelligence and correct emotional behavior of the computer are not taken into consideration. The focus is rather on the optimal extraction and transmission of emotional content, its codification, and ways of representing emotional content to the receiver.
A simulation of physical presence that is as complete as possible, or telepresence, is something of a holy grail of communication technology. In some form, it was already imagined at least as early as 1877, when the New York Sun wrote an article about a device known as “The Electroscope” (New York Sun 1877). An early illustration from 1910 by the French artist Villemard is shown in Figure 1. The benefits of telepresence over other forms of remote communication are obvious: the more perfect the simulation, the more indistinguishable the communication can be from physical reality. Douglas Engelbart (1968), in his classic “Mother of All Demos”, showed an early working version of video communication and collaborative working, demonstrating how he and a colleague could simultaneously work on a document while seeing a camera feed of each other on the same display and hearing each other’s voices. The demonstration had some minor issues with the video and voice feed being unreliable at times, but to anyone using current video conferencing tools it might seem ironic that we have not been able to work out those issues completely even now.
Achieving complete telepresence is difficult, and the setup for reaching the state of the art is often not feasible or even desired (Figure 2). Instead, popular solutions are Skype, Google Hangouts, FaceTime, or similar video conferencing platforms. Seeing your partner’s face while talking is a significant advance from voice-only communication towards natural communication, but the solutions are far from perfect. The lack of shared context and physical space, and the latency in video transmission and network transfer technologies, are bottlenecks that are hard to solve. The speed of light itself limits the round-trip latency from Helsinki, Finland to Wellington, New Zealand to 104 milliseconds measured on the earth’s surface. Latency in video and audio transmission is not only irritating to the participants but may also interfere with synchronization between the people involved. Synchronization on the level of body movement, speech rate and brain rhythms has been connected to better cooperation (Stephens, Silbert, and Hasson 2010; Jiang et al. 2012). There is also a distinct lack of stimulation of senses such as smell and touch. Even with video, the lack of presence from a static camera, and with audio, the poor quality of sound transmitted from a single microphone, are problems. A possibility outlined in this thesis is to augment the emotional channel with data sources other than video and voice.
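The physical lower bound on latency can be estimated with a short calculation: take the great-circle distance between the two cities, double it for the round trip, and divide by the speed of light. The sketch below does this in Python. The city coordinates are rounded approximations, the earth is assumed to be a perfect sphere, and `medium_factor` is a hypothetical parameter for signals that travel slower than light in vacuum, so the exact result depends on these assumptions and need not match the figure quoted above exactly.

```python
import math

SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
EARTH_RADIUS_KM = 6371.0           # mean radius, assuming a spherical earth


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_round_trip_ms(distance_km, medium_factor=1.0):
    """Lower bound for the round trip in milliseconds.

    medium_factor scales the signal speed relative to vacuum
    (1.0 means the signal travels at the full speed of light).
    """
    return 2 * distance_km / (SPEED_OF_LIGHT_KM_S * medium_factor) * 1000


# Approximate city-centre coordinates (latitude, longitude) in degrees.
HELSINKI = (60.17, 24.94)
WELLINGTON = (-41.29, 174.78)

distance = great_circle_km(*HELSINKI, *WELLINGTON)
print(f"great-circle distance: {distance:.0f} km")
print(f"minimum round trip:    {min_round_trip_ms(distance):.0f} ms")
```

Real networks add routing detours, switching and encoding delays on top of this bound, so the latency experienced in a video call is considerably higher.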
To approach the topic of transferring emotions and emotional communication, understanding of emotions on conceptual, scientific and practical levels needs to be formed. The fields of affective science and affective neuroscience have gathered a vast amount of knowledge and tools, and I will go over some popular theories and frameworks. Having a better understanding of these concepts will help in creating emotionally meaningful applications, and some of the theories are interesting in their own right. From an interaction designer’s perspective different theories of emotion and the emotional frameworks used to discuss and quantify emotions offer interesting starting points for concepts and designs. Taking a framework as a basis for an emotional concept or design can help in making it meaningful, and the other way around; design concepts can be used to test the usefulness of the different emotional frameworks.
The word emotion itself can be understood in a variety of ways, and it is used colloquially for a range of things varying from mild to intense, simple to complex, brief to extended, and private to public. In the science of emotion, or affective science, different words are used to distinguish different, related phenomena (J. J. Gross 2010):
Affect is a wide umbrella word, often used to encompass the different phenomena concerning valenced, i.e. positive and negative, internal states.
Attitudes are the most stable beliefs held by an individual about the valence of things.
Moods are passing and more long-term states, often not directly or simply related to a specific cause.
Emotions are the most short-lived reactions and responses to events and situations, reflecting the current goals, attitudes and mood of the individual, and they work to appraise the situation. Emotions can further be explained as the conscious feeling, the behavior the emotion causes, and its physiological manifestation.
Even these definitions are disputed (J. J. Gross 2010), but I will follow them as a guideline when using these terms in this thesis.
Theories of emotion are a topic of psychology and philosophy, and they concern the nature of emotions, their evolution and their meaning. Psychologists have not reached a consensus on the topic. Indeed, it has been described as the most open and confused chapter in the history of psychology, with over 90 definitions of emotion having been proposed during the 20th century (Plutchik 2001). The topic is to some extent overly theoretical from the point of view of interaction design, and I will try not to go into too much detail. In any case, it is important to have an overarching understanding of what emotions most likely are, or at least of what kind of conceptual devices are meaningful from a scientific point of view, to better understand how emotions can be used in design.
The most intuitive idea is usually that emotions are the same thing as feelings, and this is referred to as the common-sense theory of emotions (James 1884). We tend to expect emotions to have a mental basis. This line of thinking goes that an emotion occurs in our mind, possibly caused by external sensory input, and the occurrence of the emotion in turn makes us and our bodies react in a certain way. An example of this line of thinking is that meeting a friend makes us happy, which in turn leads us to smile and let out a joyous sound, in that order. Or we are sad because we lost a relative, so we start to cry. Scientific theories of emotion have been divided on the relative importance of the unconscious and the body versus conscious feelings in the whole emotional experience, and also on whether the unconscious emotion, the bodily reaction, or the conscious feeling happens first. The theories explained below form a timeline of emotion research starting from the 19th century and ending in the present day. This history has seen the general scientific opinion shift back and forth between the two main schools of thought: does the bodily reaction precede the conscious feeling, or the other way around? As no consensus has been reached, both of these viewpoints can be used as guidance when looking at different possibilities for approaching emotion transmission.
William James (1884) was among the first scientists to attempt to create a psychophysiological theory of emotions. He was interested in the fact that emotions seem to produce a physical, bodily sensation, and that this sensation seems to be strongly characteristic of different emotions. Rather than the intuitive conclusion that the bodily reaction arises from the mental feeling, James proposed that “the bodily changes follow directly the PERCEPTION of the exciting fact, and that our feeling of the same changes as they occur IS the emotion.” Theories of this body-first type are now known as somatic theories of emotion. This radical way of thinking suggested that the body had a more important role than previously thought, giving a central role to the perceived bodily sensation, the embodiment of the emotion. James’ theory was also a shift toward a more evolutionary way of thinking about how psychophysiology and the brain function, drawing inspiration from, among others, Darwin’s (1872) work on emotion expression. In his essay, James presented convincing introspective experiments to support his claim, ranging from the surprising sensation of shivers while experiencing art, to the immediacy of the bodily fight-or-flight response when perceiving surprising movement in a dark forest, and the immediate reaction to common phobias: for example, a small boy fainting when seeing blood for the first time. James does acknowledge that some emotions seem to arise from thought processes, and he distinguishes “standard” emotions from moral, intellectual, and aesthetic feelings.
James’ theory was influential, but it left gaps in the whole picture, especially considering the variety and complexity of emotions, and the differences in emotional experiences between individuals and their origins. The Cannon-Bard theory of emotion was created as an alternative to address some of these problems, stating that an event simultaneously causes a physiological response and a conscious emotion (Cannon 1927). The theory posits that emotional reactions in the body and emotional feelings are simultaneous and independent processes, and that they originate in different areas of the thalamic region of the brain. Cannon’s and Bard’s main objections to James’ theory are its requirement of bodily reactions for emotional changes, and doubt about the physiological specificity of emotions in comparison to other phenomena, such as physical exercise.
The Schachter-Singer theory, also known as the two-factor theory, suggests that an event does cause a physiological response (Schachter and Singer 1962). According to Schachter and Singer, the physiological responses in themselves do not distinguish between emotions, and a situational context is necessary for the conscious feeling. Identifying the cause of the physiological response is at the center of this theory: the identified cause leads us to label the emotion, while the physiological response informs us of the intensity of the emotion. This theory was backed up with a clever experiment, in which test subjects were injected with epinephrine, causing physiological arousal. The subjects reported different emotional states depending on situational factors, and when no emotionally significant situational factors were present, they attributed their feelings to cognitions. The two-factor theory has been criticized, and the methodology of the experiment has not survived scrutiny regarding whether the injection of epinephrine had any effect on the induced emotion in the first place: in another experiment, no difference was found between the reactions reported under the influence of epinephrine and those reported in a placebo condition (Marshall and Zimbardo 1979).
While failing to explain certain phenomena, the Schachter-Singer theory inspired an influential set of theories known as the cognitive appraisal theories of emotion. Appraisal theories take the view that the experience of emotion is based on the appraisal of an event. Labeling the event with a certain emotion triggers the emotion and the accompanying physiological response. In these theories, the cognitive appraisal always comes first, and only after this appraisal can a bodily response be felt. Without a cognitive appraisal, no emotion is felt: the emotion always arises from the appraisal. Appraisal theories and the Schachter-Singer theory are cognitivist theories of emotion, placing a lot of importance on the conscious, cognitive processes, and less on the sub-conscious and somatic experience. The merit of appraisal theories is that they are able to explain different kinds of emotional phenomena: emotions that arise from thoughts as well as from events, and individual differences in reactions to the same stimulus. Appraisal theories have been the most popular among psychologists, partly thanks to influential work by Richard Lazarus (1982).
Lately, a modern set of somatic theories has surfaced and managed to stir up discussion. Damasio’s (1996) somatic marker hypothesis holds that bodily sensations are stored in the brain and evoked in the ventromedial prefrontal cortex, producing a similar sensation based on what bodily sensations have accompanied an event in the past. Emotional experiences can, according to this view, be divided into primary and secondary inducers: primary inducers produce the bodily sensation, and secondary inducers trigger a similar bodily sensation based on the memory imprint caused by primary inducers. Prinz (2004) defends the view that a bodily reaction occurs immediately following a perception, and that the bodily state itself is the emotion. According to Prinz, the conscious process is actually what the emotion represents. For example, a perception of a snake causes the bodily sensation of fear, which in turn is represented in the conscious mind as danger.
Dimberg, Thunberg, and Elmehed (2000) showed in an experiment that emotional contagion works on an unconscious level: when shown pictures of faces expressing an emotion, their test subjects would report feeling the same emotion, even when the images were shown for such a short time that the subjects could not describe what they saw. The result of Dimberg’s experiments can be seen as counter-evidence against cognitive appraisal theories, especially views that consider that appraisal needs to be a conscious process – there has to be a mechanism that produces this emotional contagion effect faster than, and irrespective of, a conscious evaluation of the context. Another interesting experiment by Strack, Martin, and Stepper (1988) divided subjects into three groups and showed them cartoons: one group held a pen in their mouth to keep the mouth contracted, one group held the pen between their teeth, forcing a smile, and one group held the pen in their hands. The smile group rated the cartoons as the funniest, the contracted-mouth group rated them as the least funny, and the ratings of the hand group were in between.
The specificity of psychophysiological patterns between different emotions and the evolutionary basis of these patterns enjoy strong experimental support (Arnold 1945; Ekman, Levenson, and Friesen 1983; Levenson et al. 1992; Picard and Daily 2005). This does not provide definite proof for or against the different theories of emotion, but it offers some evidence against the strictest cognitivist views, as different emotions do produce different physiological responses. It cannot be used as an indicator of the order of the mechanism: whether the body reacts after a cognitive appraisal, or whether it reacts directly to the stimulus and is involved in the communication between the emotional and cognitive parts of the brain, as somatic theories suggest. On the other hand, knowledge of this specificity is definitely useful in that it allows for the use of peripheral physiology, i.e. parts of the body other than the brain, as inputs for emotionally meaningful information.
To practically understand and utilize emotions in research and applications, a framework for describing emotions is needed. Typically, emotions are either divided into distinct categories or mapped out dimensionally. These frameworks can be used as practical tools for reporting and representing emotions.
The categorical view of discrete emotions is both a conceptual tool for working with emotions and a scientific theory of a set of basic, universal emotions. Beginning with Charles Darwin’s book The Expression of the Emotions in Man and Animals, this theory holds that a set of emotions is biologically determined, culturally universal and not necessarily unique to humans (Darwin 1872). Ekman et al. (1987) have provided experimental evidence that a set of basic emotions and their accompanying facial expressions are perceived in the same way across cultures. Ekman’s view is that a set of basic emotion families exists, each family containing a set of similar states. This viewpoint also holds that the borders of each emotion are very clear and not at all fuzzy, and that the basic emotions exist separately from each other both in expression and in physiology (Ekman 1992).
The exact set and number of basic emotions is debated, and no clear method exists for determining the best set. Ekman (1992) proposes six basic emotions: happiness, surprise, sadness, fear, disgust and anger, on the basis of distinct facial expressions, as seen in Figure 3. Ekman also proposes a set of criteria that a basic emotion must meet, but acknowledges that it is not simple to leave out certain emotions, and that the big six do not all fulfill the criteria equally. In any case, these six prototype emotions are widely used, sometimes with contempt added as a seventh emotion. Other frameworks propose much wider sets of emotion categories. For example, J. R. Fontaine et al. (2007) identified 24 emotion terms that are commonly found in emotion research and everyday language, which are mapped onto the 2-dimensional emotion space in Figure 4.
Another way to look at emotions and to take their complexity into consideration is dimensional mapping. Using one or more dimensions, emotions can be mapped out even without labeling them explicitly. Finding the right dimensions is not easy. One possibility is to simply rate how much of each basic or categorical emotion is present, which is in a way a mixture of the categorical and dimensional models. This is often neither practical nor desired due to the unavoidably large number of dimensions: already six if only the most basic emotions are used. Another option is to take a set of dimensions that best distinguish emotions, typically including at least the valence dimension, i.e. happy–sad or pleasurable–unpleasurable.
One commonly used tool is the 2-dimensional emotion space (2DES), consisting of an axis of arousal and an axis of valence. Arousal denotes the energy, with passive emotions such as sadness having a low arousal, and energetic emotions such as surprise having a high arousal. Valence represents the positive and negative scale of the emotion, with emotions such as happiness on one end, and disgust on the other end of a good–bad, pleasurable–unpleasurable axis. The 2DES scale is supported by J. A. Russell (1980) and his circumplex model of emotions (Figure 5), in which emotions are mapped onto a circle with pleasure at 0°, excitement at 45°, arousal at 90°, distress at 135°, displeasure at 180°, depression at 225°, sleepiness at 270°, and relaxation at 315°.
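As a minimal sketch of how a point on the 2DES relates to the circumplex angles listed above, the following Python function converts a valence–arousal pair into an angle, an intensity and the nearest of the eight labels. The scaling of both axes to the range [-1, 1], the function name and the nearest-label rule are assumptions made for illustration; they are not part of Russell’s model.

```python
import math

# Angles (degrees) and labels as listed in the text for the circumplex model.
CIRCUMPLEX = {
    0: "pleasure", 45: "excitement", 90: "arousal", 135: "distress",
    180: "displeasure", 225: "depression", 270: "sleepiness", 315: "relaxation",
}


def to_circumplex(valence, arousal):
    """Map a 2DES point to (angle in degrees, intensity, nearest label).

    Assumption: valence is the horizontal axis and arousal the vertical axis,
    both scaled to [-1, 1], so that pure pleasure lies at 0 degrees.
    """
    angle = math.degrees(math.atan2(arousal, valence)) % 360.0
    intensity = math.hypot(valence, arousal)  # distance from the neutral center
    # Snap to the nearest of the eight labelled directions (45-degree steps).
    nearest = (round(angle / 45.0) % 8) * 45
    return angle, intensity, CIRCUMPLEX[nearest]


if __name__ == "__main__":
    print(to_circumplex(0.7, 0.7))    # positive valence, high arousal
    print(to_circumplex(-0.2, -0.9))  # slightly negative valence, low arousal
```

The neutral point (0, 0) has no meaningful direction; in this sketch it falls back to 0° with zero intensity, so an application would probably want to treat low intensities as unlabeled.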
J. R. Fontaine et al. (2007) show in their comprehensive study that two dimensions are not enough to capture the similarities and differences in the meaning of emotion words. They suggest instead that four dimensions are needed: evaluation–pleasantness, potency–control, activation–arousal and unpredictability. In practice, for interfaces and reporting, four dimensions are sometimes too many, and this is one reason for the popularity of the two-dimensional model. 2DES can be used to continuously report emotions, even on an evolving task, as demonstrated in the Representations section of the Input chapter.
Plutchik (2001) has created a three-dimensional circumplex model of emotions. In this approach basic emotions are placed similarly to colors on a color wheel, with similar emotions next to each other and opposites 180 degrees apart (Figure 6). The work is based on analyzing, grouping and organizing hundreds of emotion words. From the circumplex model emotions can be picked like colors from a color wheel, combining different basic emotions to yield composite emotions on a continuous scale, similarly to a color gradient. In this model, eight basic emotions are used. They are organized into opposite pairs: rage–terror, vigilance–amazement, ecstasy–grief and admiration–loathing. The third, depth dimension represents the intensity of the emotion, resulting in a three-dimensional cone mapping of emotions.
Frijda (2004) has developed a view that relates emotions directly to actions, or more accurately to action tendencies. Emotions are motivations and action readiness, manifested in the strong physical reactions and quick decisions that characterize passionate emotions such as anger or lust. Frijda takes the idea further, and considers that all emotions are characterized by certain motivations to act; for example admiration, fascination and being moved are not usually accompanied by strong reactions, but rather by striving to be near the object that causes these emotions. A suggested list of action tendencies and related emotions includes:
Marvin Minsky (2007) believes that emotions are more complex mental processes than is commonly thought. He suggests that emotions might not be primal or basic at all from a structural or processing point of view, and that the difficulty in describing emotions actually stems from their complexity, not from their being too basic and elemental to reduce into smaller factors. According to Minsky’s view, emotions are different “Ways to Think”, and the conscious emotion is just us paying attention to a thread or parallel process that is already running in our brain. He considers the mind to be a set of complex systems, and opposes views that consider consciousness to be driven by a single identity, or self, and he seems to regard the phenomenological way of contemplating the mind from an individual point of view as fruitless. Minsky, as an influential artificial intelligence researcher, is mostly interested in modeling and replicating mental processes, and his views draw inspiration from how modern computers work, with many subprocesses running simultaneously and switching back and forth from idle to active. In accordance with Minsky’s views, recent theories of brain function rely on network and complexity theory. Also, current views on the entanglement of brain processes related to emotions and cognition support Minsky’s views on emotions (Pessoa 2008; Lindquist et al. 2012).
A possibility I want to bring forward in this thesis is something I call unlabeled emotions. By not labeling emotions in our models and while encoding and decoding data, we can preserve emotional information in emotion-related phenomena. By taking emotionally expressive forms of media and emotionally relevant information, we can create models that link information directly to expression, leaving the emotional impression and interpretation to ourselves. Many forms of media have emotion-eliciting content; people often relate sounds, music, colors, images and video to emotions. This kind of information comes through different forms of communication.
Contextual and explicit information is present in the lyrics of a song, the object of an image and the scene of a video. For example, a video from a war zone might elicit strong negative emotions of anguish, anxiety and fear due to the subject, while a picture of a wedding ceremony might induce feelings of love, wanting, even transcendence. This kind of information can be very subjective: the war video can also carry feelings of bravery and freedom for a guerrilla fighter, and the wedding picture might represent negative emotions of jealousy, anger and sadness for an unrequited suitor or a disappointed mother-in-law.
The emotional information might not be present in the subject at all, as even abstract art consisting of seemingly meaningless patterns of color, sound and form can elicit emotion. The importance in this case can fall to the timing and sequence of events, or to the symmetry and meaning contained in colors or pitches. This type of information can be culturally dependent and subjective. Different cultures assign very different meanings to, for example, colors. This culturally and personally dependent understanding of emotional meaning in expressions is evident in the way that we find it easier to communicate with people who are close to us culturally, or whose cultures we have been exposed to over an extended period of time, even when language barriers are not a factor. Also, we are usually better at understanding the emotions of people who are close to us personally than we are at understanding strangers, which already suggests that emotional communication is a personal, subjective and interactive effort – the emotional information is not easily quantified and standardized.
On the other hand, certain components of emotional information in media may well be very much built-in, stemming from evolutionarily early features such as emotional grunts, movements and other expressions. Research has found universal or nearly universal components of language, prosody, movement and expression that are understood in the same way across cultures and continents. For example, Sievers et al. (2013) found that emotional interpretations of both music and movements share common features across cultures.
We have evolved to be extremely good at interpreting and reacting to emotions, as described in the next section, Empathy. Therefore it makes sense to leave the task of emotion interpretation to ourselves, instead of trying to create models that represent phenomena in the context of emotional descriptors, which are already difficult to conceptualize in themselves. Not needing to consciously describe and communicate emotions brings us closer to a natural situation, in which we are not forced to think about emotions abstractly, but only through experience, impression and expression.
For example, music is considered to convey a lot of emotional information and, as discussed earlier, there is substantial evidence that emotions and physiology are deeply linked. Therefore it should be possible to create a model that takes emotional information from physiology and encodes, transmits and decodes it through music in an understandable form. Later in this thesis I will present a project called Musical emotion transmission, which is an experiment trying to do exactly this.
Empathy is the ability to understand another individual’s internal state by identifying the similarity of your own mind and feelings with those of the other, and through that similarity sharing the experience of the other while still clearly understanding the distinction between self and other. The idiom to place yourself in another’s shoes is an everyday description of empathy. Empathy is a natural, innate ability, not a cognitive skill that needs to be explicitly learned, but empathic abilities do develop through interaction with others. Also, the situation that gives rise to the empathic experience does not require active mental decomposition. Definitions of empathy vary, and sometimes it is explicitly defined as a group of processes related to affective reactions to affective states of other people. Others have chosen to separate empathy into two main concepts: affective empathy for reading other people’s emotions by experiencing them in yourself, and cognitive empathy for understanding the thoughts and feelings of other people as a mental process. (Decety and Jackson 2004; Walter 2012) The problems with this affective/cognitive division are substantial, but a lengthier discussion of the topic is beyond the scope of this thesis. For the sake of practicality, I will content myself with using the cognitive/affective dichotomy.
The distinction between different empathy-related phenomena is not always clear, and the definitions are bound to change over time, but Walter (2012) proposes working definitions in which cognitive theory of mind, cognitive empathy – or affective theory of mind, and affective empathy are distinguished as different concepts with different neural underpinnings. Theory of mind is the ability to understand mental states of others, and to understand the concept that other people have their own consciousness, thoughts and feelings. Cognitive theory of mind refers to the ability to mentalize and understand the cognitive states of others. Cognitive empathy is the ability to mentalize and understand the affective states of others, without necessarily experiencing similar states yourself. Affective empathy is the ability to experience the emotions of others. Affective empathy is in part enabled by emotional contagion. Emotional contagion can be described as a “lower form” or one of the underlying mechanisms of empathy. It does not necessarily require you to consciously distinguish between your own feelings and those of others, but it is rather simply adopting the feelings of people around you. This kind of contagion occurs also in pack behavior like laughter and the spread of arousal and alertness in panic situations. Affective and cognitive empathy usually occur together, but their neural pathways are distinct and they can also occur separately – especially cognitive empathy is not necessarily accompanied by affective empathy in all situations. (Walter 2012)
Mirror neurons have spurred a lot of discussion since they were discovered in the premotor cortex of the monkey (Di Pellegrino et al. 1992). These neurons activate both when a monkey performs an action, and when a monkey observes another individual perform the same action. Experiments suggest that a similar neuronal matching system also exists in humans (Fadiga et al. 1995). This phenomenon has been credited with enabling natural mind-reading abilities, theory of mind and the ability to simulate actions without performing them. Mirror neurons are often associated with a philosophical theory known as the simulation theory, which suggests that we understand mental states and their expressions by internally simulating the same state in ourselves. The opposing theory is the so-called theory-theory, which suggests that we have an internal model of mental states and understand other individuals’ mental states on the basis of a theory of mind. (Gallese and Goldman 1998; Walter 2012)
Frans De Waal (2010) explains the evolution and development of empathy in his book The Age of Empathy: Nature’s Lessons for a Kinder Society. Written as a direct response to views on biology and evolution that are oriented around competition and selfishness, such as the one presented in Richard Dawkins’ The Selfish Gene, the book presents a comprehensive picture of cooperation and empathy in nature, and explains why cooperation skills are essential to many species of animals. A strong myth has been that war and competition are essential and central to human life, but de Waal brings in evidence from psychology and neuroscience for the automatic and central nature of empathy and helpful tendencies. De Waal argues that selfishness and aggression are only some of the capabilities of the human mind and of certain genes in the genome, and that genes for compassion, selflessness and empathy are equally important, if not more so. By emphasizing selfishness in our world view we are actually fostering the development of those behaviors and capabilities; if we instead focused more on the positive behaviors also present in biology and our nature, we would start enacting them more.
De Waal describes human empathy in three different layers. The first layer is emotional contagion, which was already discussed earlier. The second layer is feeling for others, which happens through bodily mirroring and is likewise common in other animals: chimpanzees who see other chimpanzees reach for a banana will stretch out their arms to express their support and understanding of the other’s predicament. The final level is targeted helping. It is the ability to get into another’s mind and be able to help in the right way for a given situation, like offering support for someone who is hobbling – it happens almost instinctively. This type of perspective taking and helping happens in animals, and for example chimpanzees will spend a lot of time consoling a member of their community who has lost a child. (De Waal 2010)
De Waal, as a primatologist, has studied chimpanzees extensively. Chimpanzees have a strong sense of ownership, and they do compete for resources. What de Waal has found is that chimpanzees are also prone to sharing: when a chimpanzee acquires food, it starts to share with its peers, and before long the whole chimpanzee colony has received its share. According to de Waal, this behavior is not limited to primates, but is also evident in at least all other mammals. This is not to say that empathy is uniform in animals, or that it is an either–or capability that some animals possess and others do not. Instead, empathy and emotions are a distinct set of evolutionary features that manifest in a variety of ways, and the further along the phylogenetic tree we look from humans, the more alien the forms of evolutionary empathy may seem to us. (De Waal 2010)
Music and emotions are deeply linked, to the extent that music is often described as the language of emotions, and such descriptions are seldom met with objections or denials. Emotional experiences involving music are common for listeners and performers, and it is extremely typical for people to listen to music, to dance and to play music in order to influence their emotions – to feel happy, to experience sadness, and to concentrate or to get distracted.
Music expresses emotion, either intentionally or unintentionally, as listeners seem to perceive emotions in music without fail. The perception of the expressed emotion is somewhat consistent: when many listeners are asked to report what emotion they perceive in a piece of music, they respond similarly. The agreement is not perfect, however; listeners report perceiving similar emotions, but there is variation in the nuances. This ability does not require musical training, as the perception is similar for trained musicians and for musical novices. Interestingly, Sievers et al. (2013) conducted an experiment that suggests that emotional features in both music and movement are perceived similarly across cultures, in this case comparing subjects from the USA and an isolated tribal village in Cambodia. The range of emotions that can be expressed by music is also wide: at least happiness, sadness, anger, fear and tenderness can be reliably identified. (Juslin and Laukka 2004)
The ability of music to induce emotion in the listener, to make the listener feel a certain way, is not considered to be as clear as music’s ability to express emotions. Music provides emotional experiences and it is used for mood regulation through adolescence and adulthood (Saarikallio 2011). However, it is not clear which emotions music actually evokes, and how those emotions relate to emotions arising from other situations. Is musical fear, for example, the same emotional state as the real fear felt because of a perceived threat? Positive feelings such as enjoyment, happiness, fascination, relaxation and curiosity are most commonly linked with voluntary music listening. Sadness is also a commonly reported feature in music, but at least some listeners report feeling good when listening to sad music, suggesting that musical sadness can be a positive emotion (Huron 2011). The cause of musical emotion induction is also not clear. Suggested components are the acoustic similarity of musical structures to emotional speech prosody, the build-up and breaking of musical expectations, arousal potential, emotional contagion, associations, mental imagery, and the social context. (Juslin and Laukka 2004)
The previous sections demonstrate how multifaceted the theoretical views on emotions can be – and how tricky the experience of emotions is to investigate experimentally. In terms of this thesis, understanding the core findings from the literature promotes a deeper view into the topic of expressing, receiving and transferring information on emotions. The frameworks and theories of emotion should not be seen as excluding each other. Rather, inspiration can be drawn from several theories at the same time, and it is up to the evaluation of applications to determine which tools and theories are most useful. From a practical perspective, making use of the findings on physiological correlates of the emotional experience is very important. Also, the expression of emotions through the body, the face and forms of media such as music can be utilized.
The second part of my thesis is about the emotion transfer protocol itself: why it is needed, what it is based on, and how we can begin constructing it. The general concept is divided into three components: Input, Output and Transmission, organizing the available knowledge and providing suggestions for transmitting emotions. In the last chapter of this part, Conclusion and proposal, I synthesize the information presented earlier in this thesis and reflect on the possibilities for constructing an actual protocol.
The fields of affective neuroscience and affective computing have explored reading emotions extensively, and with the use of physiological sensors, cameras, observation and self-reporting we have a wide starting point. In this chapter I will go over different possibilities for gathering emotional information from a user; extracting emotional content from the gathered data, or encoding, is mostly dealt with in the Transmission chapter.
The simplest way of emotional communication is simply stating an emotion. This approach can be used in verbal communication, e.g. “I am very angry with you!” or “I feel so happy”, and it is common when a strong exclamation of emotion is intended. This type of communication is very simple from a transmission standpoint, as the input is the same as the output.
Similar to verbal emotional reporting, researchers use different types of questionnaires for gathering information about the emotions subjects are feeling. These questionnaires usually follow the previously discussed Emotional frameworks, namely Categorical emotions and Dimensional emotions. Dimensional emotions have also been used for real-time input to rate the emotional content in continuous media, such as video and music. The 2-dimensional emotion space is an especially useful tool. Because of the two-dimensional layout, pen and paper, a mouse cursor or a tablet computer, as in Figure 7, can be easily used to input emotion on the 2DES.
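To illustrate how pointer input on the 2DES could be turned into numbers, the sketch below converts a cursor or touch position inside a rectangular input area into a valence–arousal pair. It is only a minimal example: the function name, the [-1, 1] scaling and the choice of valence on the horizontal and arousal on the vertical axis are assumptions, not a description of any particular tool.

```python
def pointer_to_2des(x, y, width, height):
    """Convert a pointer position in pixels to (valence, arousal) in [-1, 1].

    Assumptions: screen coordinates with the origin in the top-left corner,
    valence on the horizontal axis and arousal on the vertical axis, with
    the top of the area meaning high arousal.
    """
    if width <= 1 or height <= 1:
        raise ValueError("input area must be larger than one pixel")
    # Clamp so that positions slightly outside the area still give valid values.
    x = min(max(x, 0), width - 1)
    y = min(max(y, 0), height - 1)
    valence = 2.0 * x / (width - 1) - 1.0
    arousal = 1.0 - 2.0 * y / (height - 1)  # flip: screen y grows downwards
    return valence, arousal


if __name__ == "__main__":
    # A click in the upper right corner of a 101 x 101 pixel area.
    print(pointer_to_2des(100, 0, 101, 101))
```

For continuous rating of video or music, the same function could be called on every pointer event and each result stored together with a timestamp.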
In this section I will give an overview of aspects of the human emotional expression that are immediately apparent and partly controllable by the person herself. These include facial expressions, body posture and movements, as well as vocal cues that contain emotional information.
The face conveys a large amount of emotional information, and it is one of the primary sources of this information in offline social situations. A video or still image of the face is powerful in itself, and this is one reason for the relative popularity of video conferencing platforms. Facial expressions are one of the starting points for the theory proposing that a set of “basic” emotions is built into humans and animals, as explained in Categorical emotions.
To start working with facial data in interactive applications, we need a system for tracking facial features, such as the position and shape of the mouth, the lips, eyes and eyebrows. This can be achieved using a regular webcam and a set of image processing algorithms. A popular algorithm for facial feature tracking is known as deformable model fitting by regularized landmark mean-shift (J. M. Saragih, Lucey, and Cohn 2011). The model requires a set of training images that have been manually annotated to show the different points of each face. One such set is the MUCT face database (Milborrow, Morkel, and Nicolls 2010).
FaceTracker is a mature open source project implementing this algorithm with the help of the image-processing library OpenCV. It includes wrappers for OpenFrameworks, Python and Cinder, as well as a standalone application, FaceOSC, which transmits the data as Open Sound Control messages for easy access in many programming environments. An example of the data FaceTracker can extract is seen in Figure 8. Another project with partly the same authors, focusing on the use of the algorithm for avatar animation, is the CSIRO Face Analysis SDK (Cox et al. 2013). Clmtrackr is an implementation of the algorithm in JavaScript, utilizing WebRTC for capturing the user’s webcam and WebGL for image processing, and it is capable of running in an internet browser.
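As an illustration of how facial data sent as Open Sound Control messages could be read without additional libraries, the following sketch decodes OSC packets received over UDP. It handles only float, integer and string arguments of the OSC 1.0 format. The port number 8338 and the address /gesture/mouth/width are assumptions about a typical FaceOSC setup and should be checked against the configuration actually in use.

```python
import socket
import struct


def _read_string(data, offset):
    """Read a null-terminated OSC string that is padded to 4-byte boundaries."""
    end = data.index(b"\x00", offset)
    text = data[offset:end].decode("ascii")
    return text, (end + 4) & ~3  # skip terminator and padding


def parse_osc_message(data):
    """Decode a single OSC message into (address, list of arguments)."""
    address, offset = _read_string(data, 0)
    type_tags, offset = _read_string(data, offset)
    if not type_tags.startswith(","):
        raise ValueError("missing OSC type tag string")
    args = []
    for tag in type_tags[1:]:
        if tag == "f":    # 32-bit big-endian float
            args.append(struct.unpack_from(">f", data, offset)[0])
            offset += 4
        elif tag == "i":  # 32-bit big-endian integer
            args.append(struct.unpack_from(">i", data, offset)[0])
            offset += 4
        elif tag == "s":  # padded string
            text, offset = _read_string(data, offset)
            args.append(text)
        else:
            raise ValueError("unsupported OSC type tag: " + tag)
    return address, args


def parse_osc_packet(data):
    """Return all messages in a packet, unpacking (possibly nested) bundles."""
    if not data.startswith(b"#bundle\x00"):
        return [parse_osc_message(data)]
    messages = []
    offset = 16  # 8 bytes "#bundle" string + 8 bytes time tag
    while offset < len(data):
        (size,) = struct.unpack_from(">i", data, offset)
        offset += 4
        messages.extend(parse_osc_packet(data[offset:offset + size]))
        offset += size
    return messages


def listen(port=8338, host="127.0.0.1"):
    """Print incoming messages; the default port is an assumption."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            packet, _ = sock.recvfrom(65535)
            for address, args in parse_osc_packet(packet):
                if address == "/gesture/mouth/width":  # hypothetical address
                    print("mouth width:", args)
```

Calling listen() while a sender is running would print the selected values as they arrive; a dedicated OSC library is likely a better choice for anything beyond a quick experiment.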
Emotions are expressed in the body by postures and gestures. An energetic gesture, such as a fast movement of the hands, accompanies high energy emotions like joy and terror. A closed countenance with the shoulders together and forward is a sign of sadness or discomfort, while an open countenance shows happiness and comfort. Motion tracking in physical space can be approached with different methods, such as magnetic, mechanical, acoustic, inertial and optical tracking, and their combinations (Bowman et al. 2004). I will give overviews of optical tracking with Microsoft Kinect and inertial tracking with motion sensors, as these methods are readily available, economical and represent two different use cases: tracking in a room and mobile tracking.
Accurate tracking of the body used to be difficult and expensive, and it required special markers or a special motion-tracking suit. The release of the Microsoft Kinect device in 2010, and the subsequent release of open source and official SDKs for it, made motion tracking much more affordable and accessible. The Kinect and other similar devices use a depth camera and machine learning to analyze the position and orientation of the limbs of one or several people at the same time. There are a few limitations with this kind of technology: it does not work well in sunlight because infrared light is used to create the depth map, and it requires the physical placement of the device in a suitable location in relation to the user, so that the camera can see the whole person without occlusion. This makes it unfit for mobile applications in particular.
Attaching motion sensors to the body is another way of tracking movement. The three most common motion sensors are accelerometers, gyroscopes and magnetometers. An accelerometer detects the direction and strength of an accelerating force applied to it. A gyroscope detects the rotations that it is exposed to. A magnetometer detects magnetic fields, and in movement tracking it is usually used to find the absolute orientation relative to the earth’s magnetic field. A combination of an accelerometer, a gyroscope and sometimes a magnetometer is known as an inertial measurement unit (IMU). An IMU provides data regarding its movement and orientation. Position data can be derived with varying accuracy depending on the complexity of the analysis algorithm and the quality of the sensors. Motion sensors are included in, and accessible with official or unofficial SDKs for, several consumer devices, such as activity monitors (Fitbit etc.), game controllers (Nintendo Wii, Blobo), as well as most smartphones. Also, inexpensive integrated electronic components that can be used together with microcontrollers, such as the Arduino, are available under the trade name IMU or simply as accelerometers, gyroscopes and magnetometers, or digital compasses.
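As a rough sketch of how orientation can be derived from such sensors, the example below estimates tilt from a single accelerometer reading and blends it with a gyroscope reading using a simple complementary filter. This is one common and deliberately simple technique, not the method of any specific device. The axis convention (z pointing up when the sensor lies flat), the function names and the filter weight of 0.98 are assumptions, and the tilt estimate is only valid when the sensor is close to rest so that gravity dominates the measured acceleration.

```python
import math


def tilt_from_accelerometer(ax, ay, az):
    """Estimate (roll, pitch) in degrees from one accelerometer sample.

    Assumes the only acceleration present is gravity and that the z axis
    points up when the sensor lies flat. Units cancel out, so the sample
    may be given in g or in m/s^2.
    """
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.hypot(ay, az))
    return math.degrees(roll), math.degrees(pitch)


def complementary_filter(angle, gyro_rate, accel_angle, dt, alpha=0.98):
    """Update one angle estimate (degrees) from gyroscope and accelerometer.

    The gyroscope rate (degrees per second) is integrated over dt seconds for
    short-term accuracy, while the accelerometer angle slowly corrects the
    drift that the integration accumulates.
    """
    return alpha * (angle + gyro_rate * dt) + (1.0 - alpha) * accel_angle


if __name__ == "__main__":
    roll, pitch = tilt_from_accelerometer(0.0, 0.5, 0.87)
    print("roll %.1f, pitch %.1f" % (roll, pitch))
    estimate = 0.0
    for _ in range(100):  # sensor held still at the tilt measured above
        estimate = complementary_filter(estimate, 0.0, roll, dt=0.01)
    print("filtered roll after 1 s: %.1f" % estimate)
```

Heading around the vertical axis cannot be recovered from the accelerometer alone, which is where the magnetometer described above comes in.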
We as listeners are good at inferring the emotion in a voice. Anecdotally, it is easy to recognize distress or excitement in another person’s voice, and to assess the sincerity of the emotion portrayed. Empirical studies have shown that the recognition accuracy is around 50% when picking from a set of five basic emotions: fear, joy, sadness, anger and disgust. This is significantly above the chance level, which is 20% for five alternatives. The studies also seem to suggest that there is a difference in how well different types of emotions are distinguished. The recognition is best for sadness and anger, followed by fear and joy, and disgust is recognized rather poorly. Unsurprisingly, when a larger set of possible emotions is used for the classification, similar emotions are more often confused with each other; for example, pride is most easily confused with elation, happiness and interest. (Pittam and Scherer 1993; Banse and Scherer 1996)
Early work attempting to classify the acoustic qualities of emotions looked into different vocal profiles and patterns in the fundamental frequency (F0) during articulation (Banse and Scherer 1996). Newer work explores a large number of acoustic features modeled with machine learning methods, such as Gaussian Mixture Models, Hidden Markov Models and Support Vector Machines. Even the newer approaches are not yet very good at evaluating natural data, as they perform only marginally above chance level with the most naturalistic data sets. Open source tools that can be used for this type of analysis include OpenEAR, developed at the Technical University of Munich, and its newer incarnation OpenSMILE. OpenSMILE supports incremental real-time processing, and is therefore suitable for interactive applications. (Schuller et al. 2009; Eyben et al. 2013)
In this section I will consider the autonomous physiological features that can be used for input in an affective computing setting. These features are not immediately under conscious control, and for the most part not apparent on the outside. Physiological signals originate in either the central nervous system or the peripheral nervous system. The central nervous system consists of the brain and the spinal cord. Modern neuroscience considers the central nervous system to be the control center of the organism, and as such it contains a vast number of processes that represent and control basically all the activity of the body and mind. The peripheral nervous system consists of the autonomic nervous system, which is in charge of the autonomous control of bodily organs, and the somatic nervous system, which is in control of the sensory and motor nerves. The autonomic nervous system is further divided into the sympathetic and parasympathetic nervous systems. The sympathetic nervous system is responsible for stimulating the fight-or-flight response of the body, and the arousal features of emotions can be tracked from sympathetic activity. The parasympathetic nervous system is complementary to the sympathetic nervous system, and is responsible for the activity of the body in states of rest and contentment. More complex emotional dimensions can be traced to parasympathetic activity, as it stimulates processes such as sexual arousal, tear gland activation and salivation. (Tortora and Derrickson 2006)
Neuroscience uses different tools for different purposes, ranging from functional magnetic resonance imaging (fMRI) to electroencephalography (EEG). fMRI is what is typically used in research presenting “brain scanning” images. It is a state of the art technique that is used to take 3-dimensional images of changes in blood flow inside the brain. It can be used to research the function of different brain areas, and the connections between them. fMRI requires a very strong magnetic field, and because of this it is not portable or safe to use outside of the laboratory. EEG is a traditional method for measuring brain activity from the electrical potential across the scalp. It has a long history: animal electricity was discovered in 1791 by Galvani, the electrical activity of the brain surface by Caton in 1875, and finally the first scalp recordings of human EEG were published in 1929 by Hans Berger. (Swartz and Goldensohn 1998)
The electric signal measured by EEG stems from the activity of brain cells, or neurons. Neurons communicate via electrical impulses known as action potentials, which are passed from one neuron to the next across junctions called synapses. The combined activity of a large population of neurons produces an effect that can be measured from the scalp with sensitive electrodes and amplifiers. EEG measurement produces a continuous signal from each electrode used, representing the potential difference, or the voltage, between the electrode and a reference point. Typical reference points are areas that are considered to have neutral electric activity, such as the ear lobes, the nose, the mastoid bones or an average of many electrodes. EEG measurements can be done with one or several electrodes, sometimes up to 256 (Oostenveld and Praamstra 2001). An international standard for electrode locations, known as the 10-20 system (Figure 9), has been defined for consistency, and should be used as a reference when applying electrodes for a known application (Jasper 1958; Homan, Herman, and Purdy 1987).
Due to the distorting effect of the skull and the small number of electrodes used, the signal captured by EEG is a mixture of activity from different parts of the brain, with a lot of background activity in addition to the specific phenomenon that may be of interest to a researcher. This makes EEG data inherently noisy, and a single effect, or the location within the brain that produces a signal, is not easy to extract from the data. Source localization of EEG signals is a field of study in and of itself (Koles 1998). On the other hand, EEG measurement has very good temporal accuracy, and phenomena can be detected and analyzed on a millisecond scale.
The EEG signal can be processed in two distinct ways. Single, accurate responses known as event-related potentials (ERPs) are features of the signal that can be reliably produced with certain stimuli and patterns. Oscillations are observations in the frequency domain of the signal, and are typically studied as the relative power of different frequency bands.
Event-related potentials are used to study phenomena such as attention, memory and the processing order of sensory stimuli. Thanks to the high temporal accuracy of EEG, single atomic stimuli can be accurately tracked to certain features in the signal; these features are known as ERPs. By varying parameters in the stimuli, different features have been identified that correspond to different sensory and mental phenomena. Due to a low signal-to-noise ratio, most ERPs are studied by averaging multiple trials, which reduces the amount of noise that is not time-locked to the stimulus. Single-trial ERP measurements are also possible, and they have been utilized in, for example, brain-computer interfaces.
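The averaging step can be illustrated with a minimal Python sketch. It assumes that each trial has already been cut into an equally long, stimulus-locked list of samples; the function names and the simulated trial (a one-unit deflection at sample 30, buried in Gaussian noise) are hypothetical and only serve the illustration.

```python
import random
from statistics import mean


def average_trials(trials):
    """Average equally long, stimulus-locked trials sample by sample."""
    if not trials:
        raise ValueError("at least one trial is required")
    length = len(trials[0])
    if any(len(trial) != length for trial in trials):
        raise ValueError("all trials must have the same length")
    # zip(*trials) groups the i-th sample of every trial together.
    return [mean(samples) for samples in zip(*trials)]


def simulate_trial(rng, n_samples=100, peak_at=30, noise_sd=2.0):
    """Hypothetical trial: a 1-unit deflection at `peak_at` plus Gaussian noise."""
    return [
        (1.0 if i == peak_at else 0.0) + rng.gauss(0.0, noise_sd)
        for i in range(n_samples)
    ]
```

Because the noise is independent between trials, it shrinks roughly with the square root of the number of trials, while the stimulus-locked deflection stays in place. In a single simulated trial the deflection is smaller than the noise, but in the average of a few hundred trials it becomes the largest sample.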
Frequency analysis of EEG is based on calculating a Fourier transform of the EEG signal and analyzing the relative power of different frequency bands. The frequencies are typically divided into Delta, Theta, Alpha, Beta and Gamma bands, of which the Alpha band is the most commonly used, as it is a reliable indicator of relaxation. The frequency bands can also be used to study the localization of effects by comparing the relative powers across different electrodes. A common technique used in emotion studies is measuring frontal and parietal hemispheric differences, especially as measured in the alpha band and from the electrodes P3 and P4 of the international 10-20 system (Crawford, Clarke, and Kitner-Triolo 1996). Some of this research suggests that relatively lower right hemispheric alpha activity is indicative of negative emotions, and lower left hemispheric alpha activity is indicative of positively valenced emotions, both in experiments where emotions are triggered by self-suggestion and by listening to music (Crawford, Clarke, and Kitner-Triolo 1996; L. A. Schmidt and Trainor 2001). Coan and Allen (2004) reviewed over 30 studies of EEG asymmetry and identified compelling evidence for the role of EEG asymmetry as an emotion moderator, but noted that their review does not provide entirely conclusive results about the reliability of using EEG asymmetry for identifying emotional states.
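As a rough sketch of the band power comparison, the following Python code computes the power of one frequency band with a plain discrete Fourier transform and compares two electrodes. The alpha band limits of 8 to 13 Hz, the function names, and the index definition (logarithm of right power minus logarithm of left power) are assumptions made for this example. A real analysis would also apply windowing and artifact rejection, which are left out here.

```python
import cmath
import math


def band_power(signal, sample_rate, low_hz, high_hz):
    """Sum the power of the DFT bins whose frequency f satisfies low_hz <= f < high_hz."""
    n = len(signal)
    power = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate / n
        if low_hz <= freq < high_hz:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(signal)
            )
            power += abs(coeff) ** 2 / n ** 2
    return power


def alpha_asymmetry(left, right, sample_rate, band=(8.0, 13.0)):
    """Return ln(right alpha power) - ln(left alpha power) for two electrodes."""
    left_power = band_power(left, sample_rate, *band)
    right_power = band_power(right, sample_rate, *band)
    return math.log(right_power) - math.log(left_power)
```

With signals from, for example, P3 (left) and P4 (right), a negative value means relatively lower alpha power on the right side, and a positive value means relatively lower alpha power on the left side.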
Laboratory equipment is not necessary for using EEG in interactive projects. A number of open source and commercial projects attempt to bring EEG to the hands of hobbyists and consumers. OpenBCI consists of an open hardware licensed 8 EEG-channel wireless measurement device, sold assembled on their site, and accompanying firmware and an analysis software known as OpenViBE, created for realtime processing of the EEG signal in a visual data flow environment. OpenEEG is a long-running project for gathering information about building EEG devices and analyzing the data; they offer reference designs and a long list of software for EEG analysis, some of it a bit outdated. Companies selling commercial EEG devices aimed at consumers and developers include Emotiv and NeuroSky. The relative simplicity and affordability of EEG measurement devices make EEG a popular tool for interactive projects, and it has been used extensively for interactive art, musical instruments and brain-computer interfaces.
Skin conductance (SC) is a method for measuring the activity of topical sweat glands. When you get nervous, you may actually feel your palms getting sweaty. This is caused by eccrine sweat gland activation. These types of sweat glands are densely located in areas of the face, palms, wrists and foot soles. The sensation of sweaty palms is an extreme case; even very small activations can be sensed by measuring the difference in conductance between two points on the surface of, for example, the palm: as skin gets sweatier, it conducts electricity better, and as skin gets drier, the resistance increases.
Skin conductance is used to evaluate sympathetic nervous system activation. This means that it contains information about the arousal level of the individual and it is linked to the so-called fight-or-flight response, but it seems not to contain much information about emotional valence. Skin conductance, or skin resistance, is used frequently in polygraphs, and the signal is highly diagnostic of truthfulness or deception. A recent study, in which skin conductance was measured for long periods outside of the laboratory, shows that SC activity can differ a lot between the left and the right side of the body when measured simultaneously (Picard, Fedor, and Ayzenberg 2015). The authors suggest a theory of multiple arousals, in which different parts of the brain affect the SC in different parts of the body. This new theory could be used to more accurately map the relationship between emotional experience and arousal across the body.
Two separate features in the skin conductance signal are usually analyzed. One is the skin conductance level (SCL), and the other is the skin conductance response (SCR). The SCL is a slowly changing average level of the skin conductance signal, sometimes measured only from the points which do not contain any SCRs. The SCL and its current direction can be used as a measure of stress or arousal, with relaxation producing a decreasing and activation an increasing SCL. The SCR is a feature in the skin conductance signal, in the form of a quick increase followed by a slow decrease in conductance. The SCR has also been referred to as a startle response, because it can be reliably elicited by presenting a sudden unexpected stimulus, such as a loud sound, but it quickly diminishes if the stimulus is repeated. SCRs also occur spontaneously, known in that case as non-specific SCRs, and their occurrence rate can also be a measure of interest.
In professional settings, signal amplifiers and filters are used to increase the signal quality. Thankfully, the skin conductance signal is not very noisy itself, and in an experimental setup it can be measured with a simple resistive voltage divider. The resistance of the skin is typically in the range of \(0.5 - 1.5 M \Omega\), and the voltage divider circuit can be constructed by thinking of the skin between two electrodes as one resistor, \(R_1\), and using a \(1 M \Omega\) resistor as the other, \(R_2\). The equations for calculating the skin resistance from the output voltage can be seen in Equation \(\ref{voltagedivider}\); the skin conductance is the reciprocal of the skin resistance, \(1 / R_1\).
\[ \begin{equation} \label{voltagedivider} \begin{aligned} V_{out} &= \frac{R_2}{R_1 + R_2} V_{in} \\ R_1 &= R_2 \left( \frac{V_{in}}{V_{out}} - 1 \right) \end{aligned} \end{equation} \]
The analysis of skin conductance can be approached in different ways, but a starting point is identifying the SCRs and their features, such as amplitude, rise time and decay time. A startle response detection algorithm, as described by Healey (2000), attempts to find a significant rise in the skin conductance signal to signify the beginning of an SCR, and subsequently determines the maximum amplitude by finding the change in signal direction. My real-time compatible implementation of the algorithm for Python can be found at https://github.com/vatte/rt-bio/blob/master/physiology/SkinConductance.py. The SCRs can then be utilized either as single events or by determining their frequency over a period of time. Once the SCRs are identified, the SCL can be calculated from the signal at points where no SCR is ongoing.
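The voltage divider relation, \(R_1 = R_2 (V_{in} / V_{out} - 1)\), and the idea of detecting a rise followed by a change of direction can be sketched in Python as follows. This is a simplified illustration, not the algorithm of Healey (2000) and not the implementation linked above. The 5 V supply voltage, the function names, and the per-sample rise threshold are assumptions chosen for the example.

```python
def skin_resistance_ohm(v_out, v_in=5.0, r2_ohm=1_000_000.0):
    """Solve the voltage divider for the skin resistance R1 (in ohms).

    v_in and r2_ohm are assumed example values for the supply voltage
    and the fixed resistor R2.
    """
    if not 0.0 < v_out < v_in:
        raise ValueError("v_out must lie strictly between 0 and v_in")
    return r2_ohm * (v_in / v_out - 1.0)


def skin_conductance_microsiemens(v_out, v_in=5.0, r2_ohm=1_000_000.0):
    """Conductance is the reciprocal of resistance, scaled to microsiemens."""
    return 1e6 / skin_resistance_ohm(v_out, v_in, r2_ohm)


def detect_scrs(conductance, rise_threshold=0.05):
    """Return (onset_index, peak_index, amplitude) for each detected rise.

    A response starts when the signal rises by at least `rise_threshold`
    between two samples and peaks where the signal stops increasing.
    """
    responses = []
    n = len(conductance)
    i = 1
    while i < n:
        if conductance[i] - conductance[i - 1] >= rise_threshold:
            onset = i - 1
            peak = i
            # Follow the rise until the signal changes direction.
            while peak + 1 < n and conductance[peak + 1] > conductance[peak]:
                peak += 1
            amplitude = conductance[peak] - conductance[onset]
            responses.append((onset, peak, amplitude))
            i = peak + 1
        else:
            i += 1
    return responses
```

With these example values, an output voltage of half the supply voltage corresponds to a skin resistance equal to \(R_2\), that is, a conductance of one microsiemens. A suitable rise threshold depends on the sampling rate and the sensor, so it would need to be tuned for a real setup.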
The heart muscle produces a strong electrical signal when it pumps blood to the circulation. This signal is known as the electrocardiogram (ECG). By placing two electrodes on the left and right sides of the heart, the electric signal can be recorded. Typically electrodes are placed across the chest, such as in heart rate monitors used for jogging, but alternate placements are also possible, for example in both forearms. The ECG represents the cardiac cycle as a signal consisting of different features, labeled P, Q, R, S and T. The R peak is usually the most interesting of these features, as it is a large spike and can be used to determine the heart rate very accurately.
The heart rate, or its inverse the inter-beat interval (IBI), is itself an indicator for certain states, and an excited heartbeat when feeling strong emotions is a common occurrence. Heart rate is also very susceptible to exercise and movement, and as such it can be difficult to differentiate which effects are due to mental processes and which are due to other causes. Because of the high accuracy of measuring heart rate with ECG, another feature known as heart rate variability (HRV) can also be studied. HRV is the change in length between successive heart beats, \(IBI_{current} - IBI_{previous}\). The resulting HRV series can be analyzed in the frequency domain as well as with statistical measures, such as standard deviation. HRV is seen as an indicator of parasympathetic nervous system activity, and abnormal HRV has been related to stress and mortality.
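A minimal Python sketch of these calculations is shown below. It assumes that the R peak times have already been detected and are given in seconds. The function names are hypothetical, and the two summary statistics (the standard deviation of the IBIs and the root mean square of the successive differences, commonly abbreviated SDNN and RMSSD) are examples of statistical measures rather than measures prescribed by this text.

```python
import math
from statistics import mean, stdev


def ibis_from_r_peaks(r_peak_times_s):
    """Inter-beat intervals in milliseconds from R peak times in seconds."""
    return [
        (later - earlier) * 1000.0
        for earlier, later in zip(r_peak_times_s, r_peak_times_s[1:])
    ]


def successive_differences(ibis_ms):
    """IBI_current - IBI_previous for every pair of neighboring beats."""
    return [current - previous for previous, current in zip(ibis_ms, ibis_ms[1:])]


def hrv_summary(ibis_ms):
    """Mean heart rate plus two common time-domain HRV statistics."""
    diffs = successive_differences(ibis_ms)
    return {
        "mean_hr_bpm": 60000.0 / mean(ibis_ms),
        "sdnn_ms": stdev(ibis_ms),
        "rmssd_ms": math.sqrt(mean([d * d for d in diffs])),
    }
```

For the intervals 800, 810, 790 and 800 ms the successive differences are 10, -20 and 10 ms, and the mean heart rate is 75 beats per minute.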
A simple way to measure heart rate is using photoplethysmography. By shining a light through an area of the body with a relatively large amount of blood vessels and relatively soft tissue, such as a finger or the ear lobe, the pressure changes in the blood stream can be measured with a light sensor. In its simplest form this can be achieved with a light-dependent resistor (LDR) and a light-emitting diode (LED). By adding an infrared sensor and an infrared LED, the oxygen or red blood cell level in the blood can also be measured, by seeing the difference in the clear light and infrared light measurements, due to red blood cells absorbing the infrared light more effectively.
Respiratory inductance plethysmography is the measurement of breathing from the varying circumference of the chest, or thorax, and the stomach. The varying circumference is caused by the filling and emptying of the lungs. This type of sensor can be made for example from a fabric stretch sensor. The signal can be analyzed by observing the changes in direction, or the changing of sign of the signal’s delta: breathing out begins when the circumference reaches its maximum, and breathing in begins when the circumference reaches its minimum value.
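The sign-change analysis can be sketched in a few lines of Python. The function name and the event labels are hypothetical, and the sketch assumes a signal that has already been smoothed, since sensor noise would otherwise produce spurious direction changes.

```python
def breath_phase_changes(circumference):
    """Return (index, phase) pairs where the breathing direction changes.

    'exhale' starts at a local maximum of the circumference signal and
    'inhale' starts at a local minimum.
    """
    events = []
    previous_sign = 0
    for i in range(1, len(circumference)):
        delta = circumference[i] - circumference[i - 1]
        sign = (delta > 0) - (delta < 0)
        if sign == 0:
            continue  # flat segment: wait until the signal moves again
        if previous_sign > 0 and sign < 0:
            events.append((i - 1, "exhale"))
        elif previous_sign < 0 and sign > 0:
            events.append((i - 1, "inhale"))
        previous_sign = sign
    return events
```

The time between two consecutive events of the same kind gives the length of one breathing cycle, from which a breathing rate can be derived.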
Transmission generally means taking information, transforming it into a format best suitable for the requirements of the transmission, such as speed, size and reliability, and conveying it from the sender to the receiver. Emotionally relevant data takes multiple forms, and there is no consensus over the best format that emotions should be transferred in. To try to keep the options for the protocol as unlimited as possible, I will present different strategies for encoding input data and decoding it into a format suitable for output, while preserving the most relevant signals. Encoding and decoding are approached from two different points of view: machine learning models use existing data to construct statistical models of relationships, and rule-based models apply a theory, an idea or scientific knowledge as fixed rules governing system behavior.
A straightforward solution for transmitting emotional data is encoding it in the form of metadata of emotional content in a human-readable format. A standard, Emotion Markup Language already exists for this purpose, and it will be presented later in this chapter in the Metadata section. On the other hand, reading, analyzing and interpreting emotions are not yet trivial issues in laboratory settings, let alone in vivo. As an alternative to metadata, Unlabeled emotions can be used as a starting point for transmission as well. The idea is that humans have evolved to be specialized in understanding emotions, and we just need an optimal way to encode and transfer them. In the last part of this chapter Emotional bandwidth I will explain the idea of widening our emotional information channel, transferring unlabeled data and the possibility of interpreting emotionally relevant signals through another medium, such as sound or images.
Statistical modeling is a way to make interpretations of data without explicitly defining the way the data should be processed. The field of machine learning develops algorithms and methods for building these types of models. Machine learning is a very useful tool for making sense of large data sets, and for finding connections between complexly linked phenomena. Problems that machine learning methods have solved much better than expert knowledge are found in fields such as computer vision and object recognition, as well as natural language processing. Machine learning methods thrive with large amounts of input data. A surprising trend, especially in the natural language processing field, is that with very large amounts of data more sophisticated learning algorithms often cannot beat simple ones, an observation summarized as “data beats better algorithms” (Brill 2003; Halevy, Norvig, and Pereira 2009). In emotion processing we often deal with a similar problem, as large amounts of data from physiological sensors, movement and facial expressions need to be mapped onto emotional descriptors or other related phenomena, without a clear-cut connection for creating a direct rule-based model.
For emotion analysis, a typical method is supervised learning, in which a labeled training data set is used to teach and optimize a statistical model, in the hope that it generalizes to the automatic labeling of new data. Popular algorithms for supervised learning are multiple linear regression, support vector machines, tree models and neural networks. Supervised learning can deal with two types of problems: regression and classification. Regression produces a continuous value, and in the case of emotion analysis it can be used with dimensional models. Classification determines the most likely category based on the prediction of the statistical model, and can therefore be used with categorical models of emotion. Multidimensional data can be simplified by feature selection, with methods such as principal and independent component analysis (PCA and ICA), and forward-backward selection.
Rule-based models are built manually, typically based on established knowledge of a set of rules that are necessary to fulfill a condition. The rules are typically derived from existing data or from the literature, but the system's behavior is determined by the programmer, not by data. Rule-based models follow a decision tree structure, in which conditions are checked in a pre-determined order and each outcome decides the next step or the final result.
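As a small illustration, a rule-based model that maps a dimensional description onto a category could look like the following Python sketch. The thresholds at 0.5, the assumption that both dimensions are scaled from 0.0 to 1.0, and the four labels are hypothetical choices made for the example, not rules taken from the literature.

```python
def label_emotion(valence, arousal):
    """Map valence and arousal (both 0.0-1.0) to one of four example labels."""
    for name, value in (("valence", valence), ("arousal", arousal)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
    # First rule: split on arousal; second rule: split on valence.
    if arousal >= 0.5:
        return "excited" if valence >= 0.5 else "distressed"
    return "relaxed" if valence >= 0.5 else "sad"
```

Each condition corresponds to one branch of a decision tree, and the programmer can inspect and adjust every branch directly, which is the main practical difference from a model learned from data.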
Emotion transmission often requires real-time and continuous data, but some of our data sources may not be directly compatible with that. It is important to distinguish what types of data come as events, and what kinds can be measured continuously when developing a concept. Event-related data can be transformed into continuous in two ways. First is maintaining the state of the previous event until a new event occurs, for example if a user reports an emotional state, we can expect her to maintain that state until she reports something else. Another option is calculating frequencies on a sliding time window: we can for example take all events that have happened during a certain time, and calculate a weighted average where the most recent events have the most weight.
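Both strategies can be sketched in Python as follows. The sketch assumes that events are given as (timestamp, value) pairs sorted by timestamp. The function names and the linear recency weighting are hypothetical choices; other weighting schemes, such as exponential decay, would fit the same structure.

```python
def hold_last_state(events, t, default=None):
    """Return the value of the most recent event at or before time t."""
    state = default
    for timestamp, value in events:
        if timestamp > t:
            break
        state = value
    return state


def recency_weighted_average(events, now, window_s):
    """Average numeric event values in the last window_s seconds.

    The weight falls linearly from 1.0 (event happening now) towards
    0.0 (event at the far edge of the window).
    """
    total = 0.0
    weight_sum = 0.0
    for timestamp, value in events:
        age = now - timestamp
        if 0.0 <= age < window_s:
            weight = 1.0 - age / window_s
            total += weight * value
            weight_sum += weight
    return total / weight_sum if weight_sum else None
```

The first function suits categorical self-reports, where the last reported state is assumed to persist. The second suits numeric events, such as ratings or detected responses, where older events should gradually lose influence.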
Especially with different kinds of automatic inputs, such as facial analysis, bodily and physiological measurements we have data sources that produce data continuously. Sometimes the analysis of these data sources can create problems, for example heart rate variability typically requires a minimum time window of 10–15 seconds to be calculated. In this case it is often not desirable to only update data every 10–15 seconds, but instead we can utilize a technique known as a sliding window; by analyzing the last 10–15 seconds every second we can create a more continuous value.
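The sliding window technique can be sketched in Python as below. The function recomputes an arbitrary feature over the most recent window at a fixed step. Its name and parameters are hypothetical, and the samples are assumed to be (timestamp, value) pairs sorted by time.

```python
from collections import deque
from statistics import mean


def sliding_feature(samples, window_s, step_s, feature):
    """Evaluate `feature` over the last window_s seconds every step_s seconds.

    Returns a list of (time, feature_value) pairs, starting once the
    first full window is available.
    """
    if not samples:
        return []
    results = []
    window = deque()
    index = 0
    t = samples[0][0] + window_s
    end = samples[-1][0]
    while t <= end:
        # Add the samples that have arrived by time t.
        while index < len(samples) and samples[index][0] <= t:
            window.append(samples[index])
            index += 1
        # Drop the samples that have fallen out of the window.
        while window and window[0][0] <= t - window_s:
            window.popleft()
        values = [value for _, value in window]
        if values:
            results.append((t, feature(values)))
        t += step_s
    return results
```

Calling the function with a window of 15 seconds, a step of one second and a variability statistic as the feature yields one updated value per second, each based on the latest 15 seconds of data.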
A natural format for emotion transmission is to describe emotions to the best of our ability and to transfer this information as metadata. As explained in the Emotional frameworks section, there are two primary options for describing emotions: Dimensional emotions and Categorical emotions. By encoding and decoding emotions with data-based models, rule-based models or explicit self-reports, emotional descriptors can be extracted in a human-readable, standardized format. Having this kind of format is very useful for developing emotionally aware applications, as it can be used to bridge data between different systems: for example, a physiological sensor, analysis and emotion extraction system can be flexibly connected to an emotion representation application through a shared metadata format.
The World Wide Web Consortium (W3C) has already realized the need for a standardized way to represent emotional metadata online. Schröder et al. (2011) in the W3C Emotion Markup Language Incubator Group have prepared a specification for Emotion Markup Language (EmotionML), which is an XML format for describing emotions for practical applications, with a scientific foundation.
The EmotionML format is designed to be used as a plug-in language for different contexts, and for this purpose it uses an encapsulating <emotion> element. A top-level element <emotionml> is also defined for the purpose of creating standalone EmotionML documents. Four types of XML tags are used to represent different ways of describing emotion: <category>, <dimension>, <appraisal> and <action-tendency>. EmotionML uses attribute names and values to indicate information type and actual values. Examples of the four description types:
<category name="happiness"/>
<dimension name="valence" value="0.6"/>
<appraisal name="agent-self"/>
<action-tendency name="approach"/>
Attributes are used instead of the alternative format, e.g. <category>joy</category>, for the purpose of not interfering with text content in other XML formats that may be used in conjunction with EmotionML. Apart from name, there is a confidence attribute for each emotion descriptor, which takes values between \(0.0\) and \(1.0\) and is used to denote how reliable the descriptor is. The emotion descriptors can have a value attribute to indicate the amount of the emotion on a scale from \(0.0\) to \(1.0\). Each <emotion> tag can have an expressed-through attribute to indicate through which modalities the emotion is expressed; it takes a space-separated list of arbitrary modalities, for example gaze, face and voice. Arbitrary additional metadata can be provided in an <info> element inside the <emotion> tag. A generic <reference> element can be used to point to arbitrary uniform resource identifiers (URIs) to provide context for the described emotion. The <reference> element has a role attribute that can take one of four values: expressedBy (default), experiencedBy, triggeredBy and targetedAt.
EmotionML requires that each <emotion> tag defines a vocabulary for the different types of emotion descriptions that are used, that is, which values of the name attribute the descriptors are allowed to use. As a consequence, the EmotionML engine that is used to interpret a given EmotionML markup file has to determine the vocabulary and whether it is able to process it. This reduces interoperability, but is necessary to accommodate a multitude of emotional theories and to take into consideration the fact that affective science has not reached a consensus on how emotions should be described. Interestingly, this allows developers and designers to experiment with different models to find out which one is the most fruitful and interesting tool. The vocabulary is defined in the <emotion> tag by referring to an external XML document, for example <emotion dimension-set="http://www.w3.org/TR/emotion-voc/xml#fsre-dimensions">. In this case the vocabulary is defined according to the dimensional model by J. R. Fontaine et al. (2007), which is one of the vocabularies included in the W3C complementing specification (Burkhardt et al. 2014).
An example of a complete EmotionML compatible <emotion> tag using a dimension set:
<emotion dimension-set="2DES.xml" expressed-through="face">
<dimension name="arousal" value="0.3" confidence="0.8"/>
<dimension name="valence" value="0.5" confidence="0.3"/>
<reference uri="http://niinisto.fi" role="experiencedBy"/>
</emotion>
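An element like the one above can be produced and read back with the XML tools in the Python standard library. The following is a minimal sketch: the function names are hypothetical, the vocabulary reference is passed through as given, and, like the example above, the sketch leaves out the XML namespace that a complete EmotionML document would declare.

```python
import xml.etree.ElementTree as ET


def build_emotion(dimensions, dimension_set, expressed_through=None):
    """Build an <emotion> element from {name: (value, confidence)} pairs."""
    emotion = ET.Element("emotion", {"dimension-set": dimension_set})
    if expressed_through:
        # Modalities are stored as a space-separated list.
        emotion.set("expressed-through", " ".join(expressed_through))
    for name, (value, confidence) in dimensions.items():
        ET.SubElement(
            emotion,
            "dimension",
            {"name": name, "value": str(value), "confidence": str(confidence)},
        )
    return emotion


def read_dimensions(emotion):
    """Return {name: (value, confidence)}; confidence is None when absent."""
    result = {}
    for dimension in emotion.findall("dimension"):
        confidence = dimension.get("confidence")
        result[dimension.get("name")] = (
            float(dimension.get("value")),
            float(confidence) if confidence is not None else None,
        )
    return result
```

Serializing the element with ET.tostring(emotion, encoding="unicode") gives a string that can be sent to another system, which can parse it with ET.fromstring and recover the same values with read_dimensions.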
By adding more sensors, input devices and other data sources that have emotional relevance, we can widen the emotional information bandwidth. The idea of Unlabeled emotions suggests that this emotional information can be directly transformed into a form that is understandable by humans. The transformation from a wide bandwidth of unlabeled data into a more meaningful data source can be approached with automatic machine learning methods. On one hand, the unlabeled data needs to be simplified, and sometimes a direct model to the output can be created. Another interesting possibility is creating any kind of arbitrary output from the data, and trying to create output representations that can best convey a large amount of information, which then needs to be understood by the mental processes of the receiver. This type of digital communication has been referred to as unrecognized input in the context of audio and drawings that are transmitted between users, but not recognized by the system (Bowman et al. 2004). This possibility will be further explored in the “New senses” section of the next chapter.
Emotions can be represented and understood in a wide variety of ways. In all communication, what is being conveyed very often does not match what the receiver of this information understands. This problem stems from the fact that humans have subjective minds that are not commensurate – a defining and unsolvable feature of our existence. With that said, there are a lot of commonalities in our expression and experience of emotion, as well as scientific theories and research on both which we can use as a framework for creating practical solutions.
According to appraisal and somatic theories emotions consist of our sensory experience unconsciously affecting our body. Our cognitive mind incorporates information from this bodily reaction, to the final conscious emotion – in essence the emotion is to at least a certain extent felt before it is understood. Taking this into account, displaying emotions explicitly and linguistically, such as “The author is happy”, does not take into consideration the unconscious experience that parallels the cognitive appraisal. What is conveyed is only the cognitive interpretation.
To respond to this lack of an important part of the emotional experience, we need to complement the explicit with a representation that we can understand implicitly, that we can feel. I will explore such possibilities as emotionally involved pictograms, using mediating forms of sensory content, and technologically created new senses.
Emotions can be represented in a language or coding that has been agreed on by the users. In the simplest form, words representing emotions, moods and feelings can be used. Many other forms have been created; especially common are emoticons and emojis used in CMC. These types of representation often work simultaneously as input methods: the sender can choose from a set of representations the one they want to communicate to the recipient. Representations can also be used with the other types of input data, in the case that the input data information can be modeled to match a representation.
Emoticons are emotion representations made out of punctuation marks. The word “emoticon” comes from the combination of the words “emotion” and “icon”. Symbols and the use of punctuation for communicating emotions can be traced back to the 19th century. Morse code had conventions of using certain numbers to represent emotion, especially as indications for affective greetings, such as “love and kisses”, “best regards” and “lots of success” (Gajadhar and Green 2003). Punctuation marks to represent emotion were proposed several times during the 20th century by creative typographers and writers.
The symbol known as the smiley face became popular with the yellow and black design by artist Harvey Ball for an American insurance company, which used the design to raise the morale of its employees by printing posters, buttons and signs with the happy face (Figure 10). The smiling face became a cultural hit, and the company produced thousands of buttons. The design has been imitated and reproduced in a multitude of forms, being a symbol for different cultural phenomena ranging from fashion and advertisement to rave music, drugs and counterculture. (Stamp 2013)
The origin of the smiley face on the internet can be traced to a bulletin board message from 1982. In the message, Scott Fahlman proposes that the symbol :-) should be used for marking humorous messages. The use of these symbols quickly spread across ARPAnet and the bulletin boards, and several variations were quickly proposed. (Fahlman, Baird, and Jones 2002)
The original message:
19-Sep-82 11:44 Scott E Fahlman :-)
From: Scott E Fahlman <Fahlman at Cmu-20c>
I propose that the following character sequence for joke markers:
:-)
Read it sideways. Actually, it is probably more economical to mark
things that are NOT jokes, given current trends. For this, use
:-(
Emoticons have evolved in radical and varied ways, and hundreds have been created. Currently, the Wikipedia list of emoticons contains 14 variations of the smiley face itself, and a large amount of emoticons for other meanings. Online communication services often automatically convert emoticons into images and animations. Dresner and Herring (2010) identify three functions for emoticons: emotion indicators, as indicators of nonemotional meanings, such as a wink to indicate sarcastic intention, and as illocutionary force, a way to express intention in a communication pattern that has cultural meaning.
The smiley face and the other sideways emoticons originated in Western culture, and other signs have evolved in other cultural contexts. An example of this is Japanese kaomoji, or face marks, which can be viewed straight rather than at a 90-degree angle. The smiley face :) roughly translates to the kaomoji version ^_^ (Dresner and Herring 2010).
Another related phenomenon that also originated in Japan is the emoji. Emojis are pictograms depicting faces, objects and characters, and they are defined as a character set managed by the Unicode Consortium. Emojis are represented through different fonts, such as the open source Emoji One (Figure 11). Many modern systems, such as OS X, Windows, Android and iOS, support emojis with their own fonts. The usage of emojis has also taken more creative forms of communication, in which entire messages and conversations are sometimes constructed from emojis. (Blagdon 2013)
Similarly to emojis, sharing emotionally loaded pictures, videos and music is a common way of emotion transmission in CMC. It is less limited than emojis or emoticons, as virtually any piece of media content can be shared. This type of shared content exploration and consumption has an empathetic effect in that it can produce a sense of sharing a feeling over the internet: if the sender has an emotional reaction to content, they expect the receiver to have a similar reaction, and by sharing the content they are effectively synchronizing the emotions of the participants in the exchange. Sharing cat videos or funny pictures is used as a way to convey an emotion, to console and to joke around. In some online communities, such as Reddit and 4chan, certain pictures and videos have become very well known, and an intended emotion is conferred by simply writing the filename of an image, such as feelsgoodman.jpg (Figure 12).
Sensory interpretations of emotion are common across cultures: it is easy for people to attach emotional descriptions to abstract sounds and colors. Some of these interpretations have an evolutionary basis, but others are culture-dependent.
As described in the section Music and emotions, the emotional content of music is both complex and well-researched. Personal preference and social context make music shareable on social media to produce a feeling of common feeling and empathy, in the same way as with images and video, explained in the Representations section. Music is used as a mood enhancer in movies, playing with the rhythm, dynamics, tempo, and pitch to enhance, support, and prepare the moods and emotions of the viewer in a deliberate way. Film music is often program music, music that is designed to be presented in tandem with a story and an image, which leads to it often having the explicit role of modifying the emotional content in a scene (Plantinga 2009).
Music information retrieval (MIR) is a field of finding musically and psychoacoustically interesting features of music, that can be analyzed from either the acoustic properties of the sound itself, the notation and lyrics of the musical piece, or from metadata such as the genre of music. The field of music emotion recognition (MER) is concerned with the automatic recognition and classification of emotions in music with the help of MIR analysis. MER can be useful for playlist generation, as well as composition and music therapy practice. A typical MER approach is having listeners rate music on an emotional scale, and then finding acoustic correlates with the emotional ratings, a model employed by for example Schubert (2004) on a continuous temporal scale.
Decomposing music into its purely sonic characteristics suggests that, at least in some way, the emotional content needs to be contained in the acoustic features and musical structures. Utilizing this knowledge, expressive music can be analyzed and even synthesized. Automatic and generative composition of music, for example based on a MER model, can be approached with a wide variety of tools, ranging from analog synthesizers to orchestral software and from machine learning models to compositional rules. Audio processing programming environments, for example the open source Pure Data and SuperCollider, are powerful tools for creating generative compositions. These programming environments offer built-in capabilities for acoustic analysis, synthesis of new sounds, and sample playback.
Colors are sometimes related to human feeling: it is common to have favorite colors and to dislike others, and humans can experience pleasure from certain colors and combinations. Johann Wolfgang von Goethe, one of the fathers of perceptual color theory, already described the symbolic use of colors in 1809. He associated the six colors in his color wheel with valenced qualities: red – beauty, orange – nobility, yellow – goodness, green – usefulness, blue – commonness, and violet – unnecessity. Goethe further assigned the transitions between colors to different qualities of the human experience: red–yellow – reason, yellow–green – mind, green–blue – sensuality, blue–red – imagination (Schulze et al. 1994). Goethe’s original drawing can be seen in Figure 13.
The emotional meaning of colors is not culturally universal, as can be seen in Figure 14. Instead, the meaning can sometimes be quite opposite: for example, red means happiness in China, anger in Japan, death in Egypt and life in India, while green stands for criminality in France and for prosperity and fertility in India (Russo and Boor 1993). A study on the cross-cultural meanings of colors does suggest that most of the variance between emotions and colors can be reduced to the chroma and lightness of the color, with the hue playing a much less important role. The study concludes that cross-cultural universals do exist on the chroma and lightness scales of color (Gao et al. 2007).
Novich and Eagleman (2015) successfully used the sense of touch to convey sound. The authors started by analyzing the bandwidth necessary for speech. Their plan was to encode the sound into vibration motors placed on the back of a vest. They found that spatiotemporal patterns were better for distinguishing sounds than patterns encoded only spatially or by intensity. D. Eagleman (2012) discusses the possibility of creating completely new senses by taking sensor data and transcoding it in a similar manner onto existing senses.
Neil Harbisson (2012) is an artist who has been completely colorblind since birth. He has installed a sensor that detects the color in front of him. The device then transforms that information into a continuous tone and plays it into his head via bone conduction. For over 8 years, he has continuously listened to this sound, and he reports that color has become a direct feeling for him: he no longer needs to memorize which note corresponds to which color. Interestingly, he has even extended his cybernetic color vision to include the infrared and ultraviolet parts of the spectrum.
Warwick et al. (2004) conducted an interesting experiment by placing a neural implant in the arm of both Warwick and his wife. Neural impulses were transmitted over the internet between the two implants: when one implant read an impulse, the other would produce a similar stimulation. Although this experiment transmitted mostly motor impulses, Warwick theorizes that a similar technique could be used to communicate between two people with implants installed directly in their brains. This could allow for the communication of thoughts and even empathy between the two brains.
Harnessing the built-in models of the mind for emotional interpretation is an intriguing possibility. Transforming emotionally relevant data into a signal for existing senses opens up a possibility for the mind to use its built-in models for empathy and emotion-reading in interpreting new sources of information. This concept has not been proven, as emotional data is much more complex than sound or motor signals, and the processing of emotions in the mind is not as easy to quantify as for example the sense of hearing. I propose an experimental approach to assess what kind of data and output models are actually meaningful for emotion transfer by creating applications and evaluating their effectiveness.
Transcranial magnetic stimulation (TMS) is a method for accurately stimulating small regions of the cerebral cortex. A TMS device consists of a magnetic coil that is placed close to the head of the subject receiving TMS. When activated, the coil produces a rapidly changing magnetic field that affects the neuronal activity in a small area of the brain by producing a small current through magnetic induction. TMS can be applied either as single or repeated pulses, and it can be used to activate and inhibit the activity in different areas of the cortex, depending on the frequency of the magnetic field fluctuations. TMS is used to study the functional role of different areas of the cortex. One study showed evidence for the functional independence of brain areas associated with cognitive and affective theory of mind (Kalbe et al. 2010), which makes TMS an interesting tool for the study of the induction and perception of emotions. Because of the nature of the magnetic field, deeper areas of the brain cannot be stimulated without also affecting the cortex. TMS is considered to be safe and accurate, and its use in research has become more popular in recent years. TMS is used for evaluating the damage caused by stroke and other disorders affecting the brain. It has also been shown to be effective in the treatment of neuropathic pain and forms of depression, with probable efficacy in the treatment of schizophrenia and the after-effects of stroke (Lefaucheur et al. 2014).
TMS is not widely available outside research laboratories, and it is a significant investment even for a laboratory. Instead, a more primitive method known as transcranial direct-current stimulation (TDCS) is sometimes used, even by hobbyists. TDCS is based on applying direct current across two electrodes on the scalp. The electrodes are similar to EEG electrodes, but instead of being connected to an amplifier, they are connected to a current source. The direct current in turn appears to lower the threshold of neuronal action potentials, in effect increasing activity in the stimulated area. TDCS has seen interest as a cognitive enhancer, with some studies reporting positive effects on learning and memory (Kuo and Nitsche 2012), but debate is ongoing as to whether this effect is actually significant and consistent across different studies (Horvath, Forte, and Carter 2015; Price and Hamilton 2015). It is important to note that, unlike TMS, TDCS has not been proven effective in the treatment of any neurological disorders. Even though the use of TDCS is considered safe when proper safety measures are taken, I strongly advise against experimenting with the technology outside a laboratory setting.
What is an internet protocol anyway? One way to look at it is through the OSI model of seven network technology layers, with the physical layer of hardware, cables and components at the bottom, and the application layer, with examples such as e-mail, HTTP and SSH, at the top. Application layer protocols typically use a set of commands and methods to transfer content and to represent the same content across systems. HTTP stands for Hypertext Transfer Protocol, and it is used, among other things, to transfer Hypertext Markup Language (HTML) files, which present internet pages as a collection of text, hyperlinks to other resources on the web, and typesetting. The EmotionML presented in the Metadata section is the emotional equivalent of HTML, and HTTP could easily be used to transport EmotionML documents as well. The problem, acknowledged also by the creators of EmotionML, is that no consensus exists on how to present or encode emotions in a meaningful way. Emotion transmission is probably much more complex than the transmission of text and media files, and at the very least the information involved is more vague and less well understood.
An interesting finding of both the scientific theories of emotion, and the input and output modes presented in earlier chapters, is that there exists a brain-body divide. On one hand, we have somatic theories of emotion that place a great deal of importance on the sensation of emotion, and hold cognitive emotional descriptions as secondary, inaccurate representations. On the other hand, cognitive appraisal theories of emotion go as far as to claim that an emotion does not exist, neither on a bodily nor a mental level, until a cognitive judgment of the context and situation has triggered the emotion.
Input modes consist of three categories. First, the deliberate, cognitive reporting, such as smileys and emotional descriptions that can be transferred as is, and understood by the receiver. Second, the facial and bodily expressions, understood implicitly, under unconscious influence, but simultaneously subject to conscious awareness and control. Third, the masses of invisible autonomous physiological and neural data that are difficult to handle and use correctly, but contain enormous amounts of encoded information.
In output we have a similar divide. First, representations being easy to understand, and directly compatible with emotion reporting input, but often not capturing the subtleties of face-to-face communication. Second, experiments transcoding data to different senses, and leaving the emotional interpretation to internal human capabilities. Finally, the direct manipulation of the brain to create a feeling synthetically is still more science fiction than reality.
This dichotomy between the conscious, cerebral and unconscious, bodily emotions should be taken into account in the emotion transfer protocol. My suggestion is that the protocol be divided into two categories: labeled, explicit and unlabeled, implicit ETP. The purpose of the explicit-ETP is to generate emotional descriptions from different data sources, either via reporting or modeling patterns in complex data, for example data from physiological measurement. The implicit-ETP on the other hand attempts to widen the emotional bandwidth in communication by providing as much implicit information as possible, allowing the emotional connection to form naturally, as it does in face-to-face communication.
The W3C draft for an emotion markup language (EmotionML) is a logical starting point for the encoding of explicit ETP. Transmitting emotional content in a standardized way, using common methods from research such as dimensional emotions, gives application developers understandable data to work with and room to create new ways of producing and representing it. Emojis are another interesting tool: as a symbol set maintained by the Unicode Consortium, they form one basis for current everyday emotion representation online. Emoji could also be expressed in formats other than illustrations; for example, musical or photographic emoji sets could be created.
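To make the idea of an explicit ETP message more concrete, the following Python sketch builds a minimal EmotionML document with the standard library. It is only an illustration: the function name `explicit_etp_message` is hypothetical, and the namespace, the `version` attribute and the "big six" category vocabulary URI reflect one reading of the W3C specification and should be checked against it before use.

```python
import xml.etree.ElementTree as ET

# Namespace and vocabulary URI as understood from the W3C EmotionML
# specification (assumption: verify against the specification itself).
EMOTIONML_NS = "http://www.w3.org/2009/10/emotionml"
BIG6_VOCABULARY = "http://www.w3.org/TR/emotion-voc/xml#big6"


def explicit_etp_message(category):
    """Serialize one labeled emotion as a small EmotionML document."""
    # Use EmotionML as the default namespace in the serialized output.
    ET.register_namespace("", EMOTIONML_NS)
    root = ET.Element(
        f"{{{EMOTIONML_NS}}}emotionml",
        {"version": "1.0", "category-set": BIG6_VOCABULARY},
    )
    emotion = ET.SubElement(root, f"{{{EMOTIONML_NS}}}emotion")
    ET.SubElement(emotion, f"{{{EMOTIONML_NS}}}category", {"name": category})
    return ET.tostring(root, encoding="unicode")


message = explicit_etp_message("happiness")
print(message)
```

A document like this could be carried by any existing application layer protocol, which is why the encoding question matters more for explicit ETP than the transport itself.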
Implicit ETP needs to start from the assumption that no information in emotional communication can be reduced to generalized descriptions, and that we rather need to find ways of analyzing and re-representing the data so that it provides more information to humans. Because emotions are not cognitive, we cannot reason our way into finding the relevant dimensions or categories, nor can we survey the population about their ideas. Rather, because emotions in this approach are a black – or at the very least grey – box, we should approach them from a practical, iterative design standpoint. Designing an application or a model that transfers data containing emotional information between people, and then testing that application in real-world situations, provides a way for us to approach the problem.
But how do we actually test implicit applications? If the implicit ETP does not allow for emotions to be strictly quantified, how do we measure the transferred emotional information? One solution, typical to the design approach, would be to gather user feedback of their preference, with the implication that the most natural and effective approach is perceived more positively by the users. Another way to approach this question comes from neuroscience, where synchrony has been a topic of debate and interest in recent times. If we can increase the measured synchrony between people, it can be used as a goal measurement for the fitness of an emotional transfer model.
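As a rough illustration of the second approach, synchrony between two people could be estimated from any pair of simultaneously recorded signals. The sketch below uses the Pearson correlation coefficient as the synchrony measure. This choice is an assumption made for the example, since the text does not commit to a particular measure, and the function name and the signal values are hypothetical.

```python
import math


def synchrony(signal_a, signal_b):
    """Pearson correlation between two equally long signals.

    Returns a value between -1.0 and 1.0, or 0.0 if either signal is flat.
    """
    n = len(signal_a)
    mean_a = sum(signal_a) / n
    mean_b = sum(signal_b) / n
    covariance = sum((a - mean_a) * (b - mean_b)
                     for a, b in zip(signal_a, signal_b))
    variance_a = sum((a - mean_a) ** 2 for a in signal_a)
    variance_b = sum((b - mean_b) ** 2 for b in signal_b)
    if variance_a == 0 or variance_b == 0:
        return 0.0  # a flat signal carries no co-variation
    return covariance / math.sqrt(variance_a * variance_b)


# Hypothetical heart rate samples from a sender and a receiver,
# recorded without and with an emotion transfer application in use.
sender = [62, 64, 70, 75, 71, 66]
receiver_baseline = [68, 67, 69, 68, 67, 69]
receiver_with_app = [60, 63, 68, 72, 70, 64]

print(synchrony(sender, receiver_baseline))
print(synchrony(sender, receiver_with_app))
```

Comparing the two values for the same pair of people would then give one possible fitness score for a transfer model, under the assumption that higher synchrony reflects more transferred emotional information.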
The third part describes my experiments and projects combining interaction and emotions. In none of my own projects have I taken an explicit approach to emotions – I do not name the emotion of the user. Instead, emotions are approached as a black box. The goal of these projects is to widen my own vision about the subject as a designer, and to test different approaches for understanding their possibilities, limitations, and the technical difficulties involved in an otherwise abstract topic.
The first chapter, Undressing feelings with wearable electronics, goes over two projects with a similar goal: creating wearables that present the wearer’s physiological reactions to the surroundings. Brainwise is a hat that displays the wearer’s brain activity on its surface, and Immediate Invisible is a women’s fashion collection that creates soundscapes out of the wearers’ peripheral physiology. Art and design were the focus of these projects, and considerable attention was paid to aesthetics.
The second chapter, Musical emotion transmission, is about a system that creates an automatic mapping from physiology to sound, based on the user’s physiological reactions while listening to music. The mapping is approached with a machine learning method, and the sound is produced by resynthesizing sounds that the user has listened to before. This approach is meant to be a direct mapping from an input source, physiology, to an emotionally meaningful output, music, fulfilling the ideas of Unlabeled emotions.
The final chapter, Stimulating the cortex, is about a pilot brain stimulation experiment. It is a journalistic and descriptive account of one experiment, in which we attempted to manipulate the subject’s emotions and affective theory of mind with brain stimulation. This approach is radical, and it is included more as a thought provoker and as a way to demystify a futuristic possibility than as an actual design project.
In this chapter I will present two wearable electronics projects that play with the idea of unlabeled emotions. In both of these projects, physiological processes are exposed to the world, but emotional interpretation is left to the spectator. The result is a widening of the Emotional bandwidth, which was discussed earlier in this thesis, provided that the mapping from physiology to sensory experience has been successful. The first project, Brainwise, approaches the problem by taking a tool that neuroscience researchers use and transforming it into an expressive tool, while interestingly leaving the wearer unaware of the expression itself. The second project, Immediate Invisible, is in some ways a spiritual successor to Brainwise. It is a larger-scale project in which physiological processes are used to drive a compositional soundscape. Aesthetics had a very important role in it, and accurate data visualization – or in this case sonification – was not considered to be a focus.
Inspired by the images accompanying brain imaging research, in which brain areas light up as a response to stimuli and events, I had for a long time had the idea of making a hat that displays its wearer’s brain activity in real time on the hat’s surface. With this kind of hat, the wearer’s brain activity could become an additional part of their unconscious bodily expression, broadcasting their thoughts in real time to anyone who is watching. It would be a mind-reading hat. At the same time, due to the complex and imperfect interpretation and analysis of brainwaves, it would not be possible to actually infer the thoughts of the person from the data, at least not accurately, rendering the hat an art object rather than a functional tool.
It was in the fall of 2011, during my first year at Media Lab, that I got the chance to realize this project on a joint course between Aalto University and MUU Artist’s Association. The course, Wearable electronics, was led by media artists Jukka Hautamäki and Tomi Dufva, with Markku Nousiainen as the producer. Another participant on the course, the brilliant costume designer Metti Nordin, was interested in the idea, and together we formed the Brainwise team.
It was immediately clear that the work would be divided so that Nordin would mostly be in charge of the physical construction of the hat itself, and I would handle the electronics and programming. We planned the appearance and structure of the hat together, and decided that it would have a folded structure resembling the brain. Light fiber would be used to display the light on the surface, the light source being RGB LEDs hidden inside the hat (Figure 15a). The electrodes, EEG amplifier, computer and battery would also be hidden inside the hat’s structure, making it completely portable and wireless. We started working with the project title Thinking Cap. Later on, we came up with the Brainwise name as a play on the words streetwise/bookwise, as well as on bitwise operations, primitive actions that can be used to manipulate bit patterns in certain programming languages, such as C.
To measure EEG in a laboratory, wet electrodes are most common, meaning that electrodes made out of a highly conductive material – typically silver chloride – are used together with electrically conductive gel to make a stronger connection to the scalp. The application of wet electrodes is time-consuming, and afterwards the gel needs to be washed out of the hair. In consumer products, for example the NeuroSky MindWave, dry electrodes are sometimes used. A dry electrode requires good contact between the scalp and the electrode, so it can be difficult to make the electrode function through hair. What I wanted to try for the Brainwise hat was a new type of electrode, a capacitive non-contact electrode. The non-contact electrode works by forming a capacitive connection between the scalp and the electrode, and as such it does not require direct contact, but works from a small distance. This is perfect for a hat, as the wearer does not need to actually attach electrodes to their scalp; instead, the electrodes can be hidden in the hat’s structure.
Ready-made capacitive electrodes were either extremely expensive or not available to buy in smaller batches. Thankfully, Chi et al. (2010) have detailed an EEG-capable capacitive electrode and amplifier circuit, complete with the components that are needed. With the help of the electronics workshop master at Aalto Media Factory at the time, Ali Neissi, we were able to etch the circuit boards for the amplifier and three electrodes – one for each side and one reference electrode – and to solder even the small surface-mounted components on the boards. To communicate with the digital amplifier and to drive the RGB LEDs, a Lilypad variant of the Arduino was used. The Lilypad is especially designed for wearable electronics projects: it can be sewn directly with conductive thread to make electronic connections. We nevertheless opted for regular insulated electronic wire to reduce possible electric interference with the sensitive electrodes. Nordin came up with an expandable hard structure for the internal part of the hat, so that the electronics would be protected while still accommodating different head sizes. The internal structure, together with the amplifier circuit board and the Arduino board, can be seen in Figure 15b.
The three main colors of the LEDs, red, green, and blue, were used as dimensions for mapping brain activity on the surface. Three RGB LEDs were connected to the light fiber to match the left side, the right side, and the back of the hat. Two electrodes were placed between the locations F3 and C3 on the left side and between the locations F4 and C4 on the right side in the international 10-20 system (Figure 9), and a third, reference electrode was placed on the mastoid bone. The measured EEG signals from the left and right electrodes were individually processed with a Fourier transform to find the total power in the Alpha frequency range, \(8\)–\(13\) Hz. Brain activation was considered as the inverse of this value, as the Alpha range is usually linked to relaxation. An adaptive normalization process with a \(30\) second sliding window was used to determine the activation level from \(0.0\) to \(1.0\), with \(0.0\) mapped to completely blue and \(1.0\) to completely red on the left and right sides respectively. The LED at the back of the head received its value from the average of the left and right sides. Additionally, the green color channel was mapped to the imbalance between left and right hemispheric activation, which according to certain experiments is linked with emotional valence, as explained in The Brain section. A short normalization window was chosen experimentally, so that the LED activity of the hat would be lively. The Fourier analysis was done every second, and between the analyses the old color values of the LEDs were gradually transformed into the new values, producing a smooth lighting effect.
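The processing chain described above can be illustrated with a minimal Python sketch. It is not the code that ran on the hat: the sampling rate, the function and class names, the min-max form of the adaptive normalization and the scaling of the green channel are all assumptions made for illustration.

```python
import cmath
import math
from collections import deque


def alpha_power(samples, sample_rate, low_hz=8.0, high_hz=13.0):
    """Total power of the DFT bins that fall inside the alpha band."""
    n = len(samples)
    power = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if low_hz <= freq <= high_hz:
            # Naive DFT of a single bin; enough for a one-second window.
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(samples))
            power += abs(coeff) ** 2 / n
    return power


class AdaptiveNormalizer:
    """Min-max normalization over a sliding window (assumed form)."""

    def __init__(self, window=30):
        # One analysis per second, so 30 values cover 30 seconds.
        self.history = deque(maxlen=window)

    def activation(self, power):
        self.history.append(power)
        lo, hi = min(self.history), max(self.history)
        if hi == lo:
            return 0.5  # no spread yet, stay in the middle
        # Activation is the inverse of the normalized alpha power.
        return 1.0 - (power - lo) / (hi - lo)


def side_color(activation, imbalance):
    """Map activation to blue (0.0) .. red (1.0); green shows imbalance."""
    red = round(255 * activation)
    blue = round(255 * (1.0 - activation))
    green = round(255 * min(1.0, abs(imbalance)))
    return red, green, blue


# Synthetic one-second windows sampled at a hypothetical 128 Hz.
RATE = 128
relaxed = [math.sin(2 * math.pi * 10 * i / RATE) for i in range(RATE)]
alert = [0.2 * math.sin(2 * math.pi * 10 * i / RATE) for i in range(RATE)]

normalizer = AdaptiveNormalizer()
normalizer.activation(alpha_power(relaxed, RATE))
level = normalizer.activation(alpha_power(alert, RATE))
print(side_color(level, imbalance=0.0))
```

In the example, the window with weak alpha activity is the lowest value seen so far, so it maps to full activation and a completely red side. The gradual fading between successive color values is left out of the sketch.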
Brainwise was presented for the first time in MUU Gallery, Helsinki, at the Wearable Electronics course presentations on 5th December 2011. On 15th December 2011 it was shown at the Media Lab demo day at Lasipalatsi, and later on the same day it was presented on stage at Kaapelitehdas, both in Helsinki. The latest exhibition that Brainwise took part in was the Wearable Technology and eTextile design exhibition at the EU Parliament in Brussels, Belgium, running from 17th to 20th November 2014.
At the time when I was working on Brainwise, I thought of it mostly as an aesthetic idea. I thought it would become an interesting and possibly beautiful object to wear, a commentary on neuroscience and an attempt to make science interesting and real for the public. Once the project was finished and we had presented the Wearable Electronics projects at MUU Gallery, another course participant, fashion designer Liisa Pesonen, contacted me. She had watched Brainwise being worn by other people, and worn it herself while being watched by others. This had created a strange sensation of being exposed in a new way, without the ability to control the expression or even to see the activity presented on top of one’s own head. The idea was intriguing, and it fit well with my own thoughts about implicit communication of emotional data through a new medium.
Pesonen presented her plan of creating a whole fashion collection in which all the outfits are capable of reading the body and outputting sound as an expression of or reflection on the internal state, and we began our collaboration. Immediate Invisible is the name of the wearable electronics fashion collection that I created together with Pesonen and composer Samuli Tanner, with electronics workshop master Jussi Mikkonen supporting and guiding us throughout the project. It was the topic of Pesonen’s master’s thesis (2013). Pesonen led the project and took the main responsibility, but retrospectively we described the working process as an Exquisite Corpse: the end result was equally affected by the whole working group, with each member producing possibilities and limitations as well as giving artistic input on the basis of their different, complementary backgrounds and skills (Pesonen, Wikström, and Mikkonen 2015). A picture of one of the final outfits can be seen in Figure 16.
The aim was to generate an evolving soundscape that responds to the involuntary processes inside the wearer. Together with Pesonen and Tanner, we discussed the properties of each sensor and the ambience we wanted to create with the sounds. Tanner composed an evolving soundscape capable of receiving the sensor input, which I programmed in Pure Data. The speakers that were used in the outfits were 3D printed with the assistance of Jussi Mikkonen. Each speaker pair consists of one active speaker with a built-in amplifier and one passive speaker.
The sensors are two photoplethysmographs, two skin conductance sensors, one ECG monitor and one electromyogram (EMG) sensor. The photoplethysmographs are used in two different ways, one to analyze the pulse, with the heart beat mapped to a deep booming sound, and the other used to analyze the overall level of the signal, mapped to a whooshing sound. The skin conductance sensors are analyzed for both SCL and SCR, with each driving a different part of the synthesis of either a tremolo sound or clacking sound. The ECG monitor is used to drive a melody, with each heart beat playing the next note, and HRV is analyzed to produce jumps in the melody when the HRV exceeds a threshold. EMG is analyzed for muscle contractions to control a spring sound.
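The ECG-driven melody can be illustrated with a small Python sketch. The actual soundscape was programmed in Pure Data, so this is not the project's implementation: the function name, the use of the change between successive beat intervals as the HRV measure, the threshold and the jump size are all assumptions chosen for the example.

```python
def melody_from_beats(beat_times, scale, hrv_threshold=0.1, jump=3):
    """Return one note per detected heartbeat interval.

    Each beat advances one step in the scale. When the change between two
    successive inter-beat intervals (a simple stand-in for HRV) exceeds
    hrv_threshold seconds, the melody jumps several steps instead.
    """
    notes = []
    position = 0
    previous_interval = None
    for earlier, later in zip(beat_times, beat_times[1:]):
        interval = later - earlier
        step = 1
        if (previous_interval is not None
                and abs(interval - previous_interval) > hrv_threshold):
            step = jump
        # Wrap around so the melody keeps cycling through the scale.
        position = (position + step) % len(scale)
        notes.append(scale[position])
        previous_interval = interval
    return notes


# Hypothetical beat times in seconds and a C major scale as MIDI notes.
beats = [0.0, 1.0, 2.0, 3.5, 4.5]
c_major = [60, 62, 64, 65, 67, 69, 71]
print(melody_from_beats(beats, c_major))
```

With these values, the steady first beats walk up the scale one note at a time, and the sudden longer interval produces a jump in the melody.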
All the clothes in the collection are functional. Each of them belongs to one of three categories: computer, speaker or physiological sensor, and there are six clothes in each category. For this purpose we needed to create connectors and a common power source for all the clothes, which was implemented as a battery paired with the computers. Raspberry Pis were used as the computers, programmed in Python, together with a custom-made analog shield that I built for reading the physiological sensors. The collection itself is fully modular: any combination of three clothes – one from each category – can form a functional sonic outfit.
Immediate Invisible has been displayed at the Masters of Aalto exhibition in Helsinki, running from 16th May to 1st June 2014, as well as at the Wearable Technology and eTextile design exhibition at the EU Parliament in Brussels, Belgium, from 17th to 20th November 2014, where Brainwise also appeared. The Immediate Invisible project was presented at the Global Fashion Conference from 20th to 21st November 2014 in Ghent, Belgium.
Starting with the premise that both music, through its acoustic properties, and peripheral physiology carry emotional information, I wanted to create a system that creates an automatic mapping between these two spaces. Testing the concept of implicit ETP, the musical emotion transmission project attempts to transfer the emotional information without explicitly encoding physiological data into emotion descriptors and then decoding the descriptors into sound. Instead, the emotions are treated as a black box, and a model is created to transform physiology directly into sound. The project also experiments with the induction and expression of emotions through music, effectively creating an expressive model on the basis of music’s inductive capabilities.
Unlike in the Immediate Invisible project presented in the previous chapter, the mapping is not approached from a traditional compositional perspective. Instead, the system adapts to the user by collecting data on their physiological responses while they listen to recorded music. The music is analyzed continuously for several MIR features. By utilizing an artificial neural network model, an automatic mapping is created between the physiological responses and the acoustic features. This mapping can be used in real time to predict which acoustic features best match the current physiological state of the user. Sound is finally resynthesized with a technique known as corpus-based concatenative sound synthesis, which is a method for creating new sounds by composing with a set of small samples, based on their acoustic qualities. The whole concept is illustrated in Figure 17.
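The final selection step of corpus-based concatenative synthesis can be sketched as a nearest-neighbor search in the acoustic feature space. The Python example below is a simplified illustration and not the project's implementation: the function name, the two-dimensional feature vectors and the sample names are hypothetical, and the predicted target stands in for the output of the neural network model.

```python
import math


def select_unit(target_features, corpus):
    """Return the corpus unit whose features are closest to the target."""
    return min(
        corpus,
        key=lambda unit: math.dist(unit["features"], target_features),
    )


# Hypothetical corpus of short samples, each described by two normalized
# acoustic features (for instance loudness and brightness).
corpus = [
    {"name": "grain_a", "features": (0.1, 0.2)},
    {"name": "grain_b", "features": (0.5, 0.5)},
    {"name": "grain_c", "features": (0.9, 0.7)},
]

# Feature values that a trained model might predict from physiology.
predicted = (0.8, 0.8)
print(select_unit(predicted, corpus)["name"])
```

Repeating this selection for every new prediction and playing the chosen samples back to back yields the resynthesized sound, so the character of the output depends directly on which recordings the corpus was built from.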
A MindMedia Nexus-10 device was used for the measurement of skin conductance, ECG and respiration. This device is capable of sending raw sensor data over Bluetooth, which I then parsed and analyzed in real time in Python. The source code for biosignal analysis is available online at http://www.github.com/vatte/rtbio.
The goal is to create a machine that transforms emotions automatically into emotionally expressive, personal and meaningful new music. This is not the case yet, as the output of the resynthesis is quite unpredictable and noisy. The model does have some statistical power: it is able to predict to some extent the acoustic features of the music based on listener’s physiology (Wikström 2014).
A decision was made in the project to use music that fits the user’s personal taste, both for resynthesis and for training the adaptive model. Using enjoyable music makes the training process more pleasant, and it also personalizes the emotional expression of the user through the generated sound. The communicative and emotional powers may reside not only in the acoustic content but also in personal taste, which is also the case in typical spontaneous musical expression.
Once this type of model is able to create musically meaningful results, the evaluation of the effectiveness in transmitting musical information becomes a very interesting question. If a recipient is able to correctly identify the emotion of the sender that is hooked up to the automatic music generation system, the system can actually be used for emotion transmission.
Stimulating the Cortex is a project of the NEMO team, focused on an exploratory approach to evaluating the possibility of directly manipulating the perception and experience of emotions by brain stimulation with transcranial magnetic stimulation (TMS). The description will follow a format of an experiment diary, to give the reader an understanding of the usage of the device. The results of this experiment are solely qualitative: by interviewing the subject and observing the experimental situation an insight to the use of TMS for interactive purposes is hopefully achieved.
Our plan was to stimulate Katri’s brain in areas of the prefrontal cortex related to affective theory of mind (AToM) and happiness. The brain areas related to AToM in this experiment are the middle temporal gyrus (MTG) and the ventro-medial prefrontal cortex (VMPC), chosen on the basis of existing fMRI research (Sebastian et al. 2011). In practical terms, our aim was to find out whether TMS pulses in these areas are comfortable for the subject, since some muscles of the face are close to the stimulated brain areas and at risk of being activated.
Our host for the experiment, Tommi, is a trained TMS expert. He operated the TMS machine, a NexStim NBS System 5. One of the experimenters (Katri) was the guinea pig, as pilot experiments on colleagues do not require the same bureaucracy as experiments on external test subjects. Another experimenter (Vesa) was supervising the experiment, and I (Valtteri) was documenting it in the form of audio recording and photography.
To locate different brain areas for stimulation, a 3-dimensional MRI image of Katri’s brain was loaded onto the computer controlling the TMS device. A 3D camera mounted on the device and a special headband with a tracking fiducial made the navigation of Katri’s cortex possible. Before the experiment, Katri was seated in the TMS chair, and an EMG sensor was attached to her thumb (Figure 18c).
By stimulating a well-documented brain region, the cortical representation of the thumb, we were able to verify the spatial calibration and find a comfortable threshold for the TMS pulse. By stimulating the area while measuring the EMG response, a suitable power for the TMS could be found. Starting from less than 20% of the full power, we increased the power slowly to find a level that was capable of causing an involuntary movement of the thumb and an accompanying EMG response. Simultaneously, the speed of the motor pathway, or the time it takes for a signal to travel from the motor cortex to the muscle, could be determined. This speed varies considerably between individuals, and it also depends on the time of day, the situation and the alertness level of the subject.
At this point, an excited air filled the laboratory, evident from the experimenters cracking nervous jokes and laughing. The 3-dimensional navigation of the cortex was a distinctly cool experience, and seasoned neuroscientists seemed to meet it with the same feelings as children presented with a new toy (Figure 18a, Figure 18b). Meanwhile, the researchers maintained a humble and respectful attitude; we were stimulating the human brain and attempting to manipulate some of the most important processes of the mind: the emotions. It was Katri’s first time in a TMS machine.
Before the TMS was activated for the first time, we were required to put on earplugs. The sound of a TMS pulse is very loud, but so short (200 \(\mu s\)) that it does not actually feel painful, even though it can damage hearing.
Tommi navigated the folds of Katri’s cortex to first find the motor cortex, and then an \(\Omega\)-shaped feature, presumably the representation of the thumb. Katri was instructed to keep her hand relaxed while Tommi began to issue pulses. Katri said that the pulses felt like something touching her head. At first, while the TMS power was being increased stepwise from 20 %, there was no reaction, and the experimenters wondered whether the spatial navigation was accurate or whether Katri had a high activation level. Tommi reassured the other experimenters, saying that his own activation level is around 40 %. At 28 %, a noticeable EMG response was produced for the first time, but Katri reported that she did not feel any movement in her thumb. Tommi opted not to increase the power from this level, but instead made minute adjustments to the location between pulses. Finally, after several involuntary movements of the long finger, middle finger and ring finger, the thumb moved.
The cortical representations of the different fingers can be located very close to each other, Tommi explained. Katri described the sensation as similar to a tic, a muscle spasm.
The optimal power for the next stage of the experiment was 70 % of the motor activation threshold, in this case 20 % of full power. The section of the MTG that we wanted to stimulate is known as Brodmann area 21, which is associated with the feeling of happiness in the literature. A problem with this area is its proximity to some facial muscles, which are bound to be activated in the process. We decided to try both single-pulse and repeated-pulse TMS (rTMS) to see whether we could get any kind of effect. Repeated-pulse TMS is typically used in therapy to achieve a lasting effect on the patient or subject.
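The 20 % figure can be checked with a line of arithmetic, assuming that the 28 % level at which the first EMG response appeared was taken as the motor activation threshold. The text does not state this explicitly, so the sketch below rests on that assumption, and the function name `stimulation_intensity` is hypothetical.

```python
def stimulation_intensity(motor_threshold_pct: float, fraction: float = 0.70) -> float:
    """Scale a motor threshold (% of full power) by the chosen fraction."""
    return motor_threshold_pct * fraction


# Assumed threshold: 28 % of full power, the level of the first EMG response.
intensity = stimulation_intensity(28.0)
print(f"{intensity:.1f} % of full power, about {round(intensity)} %")
```

Under that assumption, 0.70 × 28 % = 19.6 %, which rounds to the 20 % used in the experiment.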
Katri’s face twitched, which she described as mildly, but not severely, uncomfortable. We continued the experiment, giving pulses to different spots in Brodmann area 21, and asked Katri to report her experiences. She lamented that feelings are so subjective that she could not really say what kind of effect, if any, the stimulation had.
Listening to the recording of the experiment afterwards, I notice a change in her voice at this point: there is a distinguishable sadness or distress in her tone. It cannot be said with certainty whether this has anything to do with the stimulation, with other factors such as stress from the prolonged experiment, or with my own imagination.
We tried a few different spots and applied rTMS stimulation for a short period of time, but after a while we had to conclude that we were not able to achieve any perceptually noticeable effects. The safety of stimulating these areas seems confirmed, and our team now has a better idea of the TMS process. We wrapped up the experiment and collectively decided that we had not yet figured out how to apply TMS for AToM manipulation.
Digital communication is amazing in many ways. We have new forms of interactive participatory art, multiplayer games, and social media for keeping in touch with a larger number of people than was possible before. Meanwhile, online discussions, especially between strangers, tend to contain miscommunication, outright flaming, and asocial behavior, much more so than in real life. This, in my opinion, stems from a lack of empathy, and the reason empathy is broken online is that the communication channel does not offer us enough emotional information.
The solution presented in this thesis is augmenting communications with an emotion transmission channel. I have gathered knowledge and tools for the creation of this channel, naming it the Emotion Transfer Protocol. I have divided the protocol into three parts: input, transmission, and output. Input consists of ways emotions can be read from the body and expression, using sensors, cameras, and standard human interface devices. Transmission includes the encoding and decoding of emotions between different input and output modes. Output consists of many ways to express and represent emotions through image, sound, and experimental approaches. ETP is not a complete solution at this stage, but a starting point for expanding and creating a viable channel for emotion transmission.
Existing attempts at transmitting emotions as explicit descriptors and metadata overlook one side of emotion theory: emotions seem to happen, at least partly, on an unconscious level of the brain, a level of direct bodily reactions and interpersonal synchronization of brain processes. For this reason I have dedicated parts of the thesis to exploring the idea of unlabeled emotions and implicit emotion transfer. In these approaches the interpretation of emotions is left to us humans, and the role of the transmission becomes making this information available and plentiful.
It is a difficult and interesting design challenge to represent unrecognized data in a meaningful way. This can be approached by designing output devices that represent the data according to rules grounded in the nature of the data or in aesthetic considerations of the output medium. Another solution is to create a model of how the input and output are connected, and a natural representation that is bound to theory and to relationships within the data. Both approaches are represented in the last part of my thesis, in different experiments involving the implicit expression of emotions by augmenting the human body with digital devices.
Emotional communication technology is ready for the creation of new applications that utilize the available theory and methods. The most important step needed to improve the applications and to understand the fruitfulness of different approaches is the development of ways to test those applications in the laboratory and in real-life situations. A large amount of work is needed to develop testing practices for emotion transmission applications. The evaluation should be approached from both a design and a scientific perspective: applications can be tested by observing the behavior of users adopting new technologies, by asking users about their perception of different approaches, and by creating more objective tools for measuring empathy, synchrony, and the accuracy and level of emotion transmission between people.
Arnold, Magda B. 1945. “Physiological Differentiation of Emotional States.” Psychological Review 52 (1). American Psychological Association: 35.
Banse, Rainer, and Klaus R Scherer. 1996. “Acoustic Profiles in Vocal Emotion Expression.” Journal of Personality and Social Psychology 70 (3). American Psychological Association: 614.
Bilton, Nick. 2014. “Steve Jobs Was a Low-Tech Parent.” New York Times. September 10, 2014. http://www.nytimes.com/2014/09/11/fashion/steve-jobs-apple-was-a-low-tech-parent.html.
Blagdon, Jeff. 2013. “How Emoji Conquered the World.” The Verge. Vox Media. http://www.theverge.com/2013/3/4/3966140/how-emoji-conquered-the-world.
Bowman, Doug A, Ernst Kruijff, Joseph J LaViola Jr, and Ivan Poupyrev. 2004. 3D User Interfaces: Theory and Practice. Addison-Wesley.
Brill, Eric. 2003. “Processing Natural Language Without Natural Language Processing.” In Computational Linguistics and Intelligent Text Processing, 360–69. Springer.
Burkhardt, Felix, Marc Schröder, Catherine Pelachaud, Kazuyuki Ashimura, Paolo Baggia, Alessandro Oltramari, Christian Peter, and Enrico Zovato. 2014. “Vocabularies for EmotionML. W3C Working Group Note.” World Wide Web Consortium.
Cannon, Walter B. 1927. “The James-Lange Theory of Emotions: A Critical Examination and an Alternative Theory.” The American Journal of Psychology. JSTOR, 106–24.
Chi, Yu M, Patrick Ng, Eric Kang, Joseph Kang, Jennifer Fang, and Gert Cauwenberghs. 2010. “Wireless Non-Contact Cardiac and Neural Monitoring.” In Wireless Health 2010, 15–23. ACM.
Coan, James A, and John JB Allen. 2004. “Frontal EEG Asymmetry as a Moderator and Mediator of Emotion.” Biological Psychology 67 (1). Elsevier: 7–50.
Cox, M, J Nuevo-Chiquero, JM Saragih, and S Lucey. 2013. “CSIRO Face Analysis SDK.” Brisbane, Australia.
Crawford, Helen J, Steven W Clarke, and Melissa Kitner-Triolo. 1996. “Self-Generated Happy and Sad Emotions in Low and Highly Hypnotizable Persons During Waking and Hypnosis: Laterality and Regional EEG Activity Differences.” International Journal of Psychophysiology 24 (3). Elsevier: 239–66.
Damasio, Antonio R. 1996. “The Somatic Marker Hypothesis and the Possible Functions of the Prefrontal Cortex [and Discussion].” Philosophical Transactions of the Royal Society B: Biological Sciences 351 (1346). The Royal Society: 1413–20.
Darwin, Charles. 1872. “The Expression of the Emotions in Man and Animals.” Murray, London.
De Waal, Frans. 2010. The Age of Empathy: Nature’s Lessons for a Kinder Society. Broadway Books.
Decety, Jean, and Philip L Jackson. 2004. “The Functional Architecture of Human Empathy.” Behavioral and Cognitive Neuroscience Reviews 3 (2). Sage Publications: 71–100.
Derks, Daantje, Agneta H Fischer, and Arjan ER Bos. 2008. “The Role of Emotion in Computer-Mediated Communication: A Review.” Computers in Human Behavior 24 (3). Elsevier: 766–85.
Di Pellegrino, Giuseppe, Luciano Fadiga, Leonardo Fogassi, Vittorio Gallese, and Giacomo Rizzolatti. 1992. “Understanding Motor Events: A Neurophysiological Study.” Experimental Brain Research 91 (1). Springer: 176–80.
Dimberg, Ulf, Monika Thunberg, and Kurt Elmehed. 2000. “Unconscious Facial Reactions to Emotional Facial Expressions.” Psychological Science 11 (1). SAGE Publications: 86–89.
Dresner, Eli, and Susan C Herring. 2010. “Functions of the Nonverbal in CMC: Emoticons and Illocutionary Force.” Communication Theory 20 (3). Wiley Online Library: 249–68.
Eagleman, David. 2012. “Can We Create New Senses for Humans?” TED Talk. http://www.ted.com/talks/david_eagleman_can_we_create_new_senses_for_humans.
Ekman, Paul. 1992. “An Argument for Basic Emotions.” Cognition & Emotion 6 (3-4). Taylor & Francis: 169–200.
Ekman, Paul, Wallace V Friesen, Maureen O’Sullivan, Anthony Chan, Irene Diacoyanni-Tarlatzis, Karl Heider, Rainer Krause, et al. 1987. “Universals and Cultural Differences in the Judgments of Facial Expressions of Emotion.” Journal of Personality and Social Psychology 53 (4). American Psychological Association: 712.
Ekman, Paul, Robert W Levenson, and Wallace V Friesen. 1983. “Autonomic Nervous System Activity Distinguishes Among Emotions.” Science 221 (4616). American Association for the Advancement of Science: 1208–10.
Engelbart, Douglas C. 1968. “The Mother of All Demos.” https://www.youtube.com/watch?v=yJDv-zdhzMY.
Eyben, Florian, Felix Weninger, Florian Gross, and Björn Schuller. 2013. “Recent Developments in openSMILE, the Munich Open-Source Multimedia Feature Extractor.” In Proceedings of the 21st ACM International Conference on Multimedia, 835–38. ACM.
Fadiga, Luciano, Leonardo Fogassi, Giovanni Pavesi, and Giacomo Rizzolatti. 1995. “Motor Facilitation During Action Observation: A Magnetic Stimulation Study.” Journal of Neurophysiology 73 (6). Am Physiological Soc: 2608–11.
Fahlman, Scott, Jeff Baird, and Mike Jones. 2002. “Original Bboard Thread in Which :-) Was Proposed.” http://www.cs.cmu.edu/~sef/Orig-Smiley.htm.
Florida, Richard. 2013. “Robots Aren’t the Problem: It’s Us.” The Chronicle. http://chronicle.com/article/Robots-Arent-the-Problem-/138007/.
Fontaine, Johnny RJ, Klaus R Scherer, Etienne B Roesch, and Phoebe C Ellsworth. 2007. “The World of Emotions Is Not Two-Dimensional.” Psychological Science 18 (12). SAGE Publications: 1050–57.
Frijda, Nico H. 2004. “Emotions and Action.” In Feelings and Emotions: The Amsterdam Symposium, 158–73. Cambridge University Press, Cambridge, UK.
Gajadhar, Joan, and JS Green. 2003. “An Analysis of Nonverbal Communication in an Online Chat Group.”
Gallese, Vittorio, and Alvin Goldman. 1998. “Mirror Neurons and the Simulation Theory of Mind-Reading.” Trends in Cognitive Sciences 2 (12). Elsevier: 493–501.
Gao, Xiao-Ping, John H Xin, Tetsuya Sato, Aran Hansuebsai, Marcello Scalzo, Kanji Kajiwara, Shing-Sheng Guan, J Valldeperas, Manuel Jose Lis, and Monica Billger. 2007. “Analysis of Cross-Cultural Color Emotion.” Color Research and Application 32 (3). Wiley: 223–29.
Gross, James J. 2010. “The Future’s so Bright, I Gotta Wear Shades.” Emotion Review 2 (3). SAGE Publications: 212–16.
Halevy, Alon, Peter Norvig, and Fernando Pereira. 2009. “The Unreasonable Effectiveness of Data.” Intelligent Systems, IEEE 24 (2). IEEE: 8–12.
Harbisson, Neil. 2012. “I Listen to Color.” TED Talk. https://www.ted.com/talks/neil_harbisson_i_listen_to_color.
Healey, Jennifer Anne. 2000. “Wearable and Automotive Systems for Affect Recognition from Physiology.” PhD thesis, Massachusetts Institute of Technology.
Homan, Richard W, John Herman, and Phillip Purdy. 1987. “Cerebral Location of International 10–20 System Electrode Placement.” Electroencephalography and Clinical Neurophysiology 66 (4). Elsevier: 376–82.
Horvath, Jared Cooney, Jason D Forte, and Olivia Carter. 2015. “Quantitative Review Finds No Evidence of Cognitive Effects in Healthy Populations from Single-Session Transcranial Direct Current Stimulation (tDCS).” Brain Stimulation. Elsevier.
Höök, Kristina. 2013. “Affective Computing.” The Encyclopedia of Human-Computer Interaction, 2nd Ed. The Interaction Design Foundation. https://www.interaction-design.org/encyclopedia/affective_computing.html.
Huron, David. 2011. “Why Is Sad Music Pleasurable? A Possible Role for Prolactin.” Musicae Scientiae 15 (2). SAGE Publications: 146–58.
James, William. 1884. “What Is an Emotion?” Mind, no. 9. Mind Assoc: 188–205. http://psychclassics.yorku.ca/James/emotion.htm.
Jasper, Herbert Henri. 1958. “The Ten Twenty Electrode System of the International Federation.” Electroencephalography and Clinical Neurophysiology 10: 371–75.
Jiang, Jing, Bohan Dai, Danling Peng, Chaozhe Zhu, Li Liu, and Chunming Lu. 2012. “Neural Synchronization During Face-to-Face Communication.” The Journal of Neuroscience 32 (45). Soc Neuroscience: 16064–69.
Juslin, Patrik N, and Petri Laukka. 2004. “Expression, Perception, and Induction of Musical Emotions: A Review and a Questionnaire Study of Everyday Listening.” Journal of New Music Research 33 (3). Taylor & Francis: 217–38.
Kalbe, Elke, Marius Schlegel, Alexander T Sack, Dennis A Nowak, Manuel Dafotakis, Christopher Bangard, Matthias Brand, Simone Shamay-Tsoory, Oezguer A Onur, and Josef Kessler. 2010. “Dissociating Cognitive from Affective Theory of Mind: A TMS Study.” Cortex 46 (6). Elsevier: 769–80.
Kato, Yuuki, Shogo Kato, and Kanji Akahori. 2007. “Effects of Emotional Cues Transmitted in E-Mail Communication on the Emotions Experienced by Senders and Receivers.” Computers in Human Behavior 23 (4). Elsevier: 1894–1905.
Koles, Zoltan J. 1998. “Trends in EEG Source Localization.” Electroencephalography and Clinical Neurophysiology 106 (2). Elsevier: 127–37.
Konrath, Sara H, Edward H O’Brien, and Courtney Hsing. 2010. “Changes in Dispositional Empathy in American College Students over Time: A Meta-Analysis.” Personality and Social Psychology Review. SAGE Publications.
Kruger, Justin, Nicholas Epley, Jason Parker, and Zhi-Wen Ng. 2005. “Egocentrism over E-Mail: Can We Communicate as Well as We Think?” Journal of Personality and Social Psychology 89 (6). American Psychological Association: 925.
Kuo, Min-Fang, and Michael A Nitsche. 2012. “Effects of Transcranial Electrical Stimulation on Cognition.” Clinical EEG and Neuroscience 43 (3). SAGE Publications: 192–99.
Lazarus, Richard S. 1982. “Thoughts on the Relations Between Emotion and Cognition.” American Psychologist 37 (9). American Psychological Association: 1019.
Lefaucheur, Jean-Pascal, Nathalie André-Obadia, Andrea Antal, Samar S Ayache, Chris Baeken, David H Benninger, Roberto M Cantello, et al. 2014. “Evidence-Based Guidelines on the Therapeutic Use of Repetitive Transcranial Magnetic Stimulation (rTMS).” Clinical Neurophysiology 125 (11). Elsevier: 2150–2206.
Levenson, Robert W, Paul Ekman, Karl Heider, and Wallace V Friesen. 1992. “Emotion and Autonomic Nervous System Activity in the Minangkabau of West Sumatra.” Journal of Personality and Social Psychology 62 (6). American Psychological Association: 972.
Lindquist, Kristen A, Tor D Wager, Hedy Kober, Eliza Bliss-Moreau, and Lisa Feldman Barrett. 2012. “The Brain Basis of Emotion: A Meta-Analytic Review.” Behavioral and Brain Sciences 35 (03). Cambridge Univ Press: 121–43.
Marshall, Gary D, and Philip G Zimbardo. 1979. “Affective Consequences of Inadequately Explained Physiological Arousal.” American Psychological Association.
Milborrow, S., J. Morkel, and F. Nicolls. 2010. “The MUCT Landmarked Face Database.” Pattern Recognition Association of South Africa.
Minsky, Marvin. 2007. The Emotion Machine: Commonsense Thinking, Artificial Intelligence, and the Future of the Human Mind. Simon & Schuster.
Moor, Peter J, Ard Heuvelman, and Ria Verleur. 2010. “Flaming on YouTube.” Computers in Human Behavior 26 (6). Elsevier: 1536–46.
New York Sun, The. 1877. “The Electroscope.” http://histv2.free.fr/19/electroscope.htm.
Novich, Scott D, and David M Eagleman. 2015. “Using Space and Time to Encode Vibrotactile Information: Toward an Estimate of the Skin’s Achievable Throughput.” Experimental Brain Research. Springer, 1–12.
Oostenveld, Robert, and Peter Praamstra. 2001. “The Five Percent Electrode System for High-Resolution EEG and ERP Measurements.” Clinical Neurophysiology 112 (4). Elsevier: 713–19.
Pajarinen, Mika, Petri Rouvinen, Anders Ekeland, and others. 2015. Computerization Threatens One-Third of Finnish and Norwegian Employment. The Research Institute of the Finnish Economy.
Pesonen, Liisa. 2013. “Immediate Invisible, MA Thesis.” Aalto University.
Pesonen, Liisa, Valtteri Wikström, and Jussi Mikkonen. 2015. “A Collaborative Development of an Artistic Responsive Fashion Collection.” Critical Studies in Fashion & Beauty 6 (1). Intellect: 95–119.
Pessoa, Luiz. 2008. “On the Relationship Between Emotion and Cognition.” Nature Reviews Neuroscience 9 (2). Nature Publishing Group: 148–58.
Picard, Rosalind W. 2000. Affective Computing. MIT Press.
Picard, Rosalind W, and Shaundra Bryant Daily. 2005. “Evaluating Affective Interactions: Alternatives to Asking What Users Feel.” In CHI Workshop on Evaluating Affective Interfaces: Innovative Approaches, 2119–22.
Picard, Rosalind W, Szymon Fedor, and Yadid Ayzenberg. 2015. “Multiple Arousal Theory and Daily-Life Electrodermal Activity Asymmetry.” Emotion Review. SAGE Publications, 1754073914565517.
Pittam, Jeffery, and Klaus R Scherer. 1993. “Vocal Expression and Communication of Emotion.” Handbook of Emotions, 185–97.
Plantinga, Carl. 2009. Moving Viewers: American Film and the Spectator’s Experience. Univ of California Press.
Plutchik, Robert. 2001. “The Nature of Emotions: Human Emotions Have Deep Evolutionary Roots, a Fact That May Explain Their Complexity and Provide Tools for Clinical Practice.” American Scientist 89 (4). Sigma Xi Scientific Research Society: 344–50.
Preece, J, and Kambiz Ghozati. 2001. “Experiencing Empathy Online.” The Internet and Health Communication: Experiences and Expectations, 147–66.
Price, Amy R, and Roy H Hamilton. 2015. “A Re-Evaluation of the Cognitive Effects from Single-Session Transcranial Direct Current Stimulation.” Brain Stimulation. Elsevier.
Prinz, Jesse J. 2004. Gut Reactions: A Perceptual Theory of Emotion. Oxford University Press.
Radesky, Jenny S, Jayna Schumacher, and Barry Zuckerman. 2015. “Mobile and Interactive Media Use by Young Children: The Good, the Bad, and the Unknown.” Pediatrics 135 (1). Am Acad Pediatrics: 1–3.
Reid, Elizabeth. 1993. “Electronic Chat: Social Issues on Internet Relay Chat.” Media Information Research Exchange.
Richards, Rosalina, Rob McGee, Sheila M Williams, David Welch, and Robert J Hancox. 2010. “Adolescent Screen Time and Attachment to Parents and Peers.” Archives of Pediatrics & Adolescent Medicine 164 (3). American Medical Association: 258–62.
Russell, Daniel. 2015. “We Just Don’t Speak Anymore, but We’re ‘Talking’ More Than Ever.” Attentiv.com. http://attentiv.com/we-dont-speak/.
Russell, James A. 1980. “A Circumplex Model of Affect.” Journal of Personality and Social Psychology 39 (6). American Psychological Association: 1161.
Russo, Patricia, and Stephen Boor. 1993. “How Fluent Is Your Interface?: Designing for International Users.” In Proceedings of the INTERACT’93 and CHI’93 Conference on Human Factors in Computing Systems, 342–47. ACM.
Saarikallio, Suvi. 2011. “Music as Emotional Self-Regulation Throughout Adulthood.” Psychology of Music 39 (3). SAGE Publications: 307–27.
Saragih, Jason M, Simon Lucey, and Jeffrey F Cohn. 2011. “Deformable Model Fitting by Regularized Landmark Mean-Shift.” International Journal of Computer Vision 91 (2). Springer: 200–215.
Schachter, Stanley, and Jerome Singer. 1962. “Cognitive, Social, and Physiological Determinants of Emotional State.” Psychological Review 69 (5). American Psychological Association: 379.
Schmidt, Karen L, and Jeffrey F Cohn. 2001. “Human Facial Expressions as Adaptations: Evolutionary Questions in Facial Expression Research.” American Journal of Physical Anthropology 116 (S33). Wiley Online Library: 3–24.
Schmidt, Louis A, and Laurel J Trainor. 2001. “Frontal Brain Electrical Activity (EEG) Distinguishes Valence and Intensity of Musical Emotions.” Cognition & Emotion 15 (4). Taylor & Francis: 487–500.
Schröder, Marc, Paolo Baggia, Felix Burkhardt, Catherine Pelachaud, Christian Peter, and Enrico Zovato. 2011. “EmotionML–an Upcoming Standard for Representing Emotions and Related States.” In Affective Computing and Intelligent Interaction, 316–25. Springer.
Schubert, Emery. 2004. “Modeling Perceived Emotion with Continuous Musical Features.” Music Perception 21 (4). JSTOR: 561–85.
Schuller, Björn, Bogdan Vlasenko, Florian Eyben, Gerhard Rigoll, and Andreas Wendemuth. 2009. “Acoustic Emotion Recognition: A Benchmark Comparison of Performances.” In Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on, 552–57. IEEE.
Schulze, Sabine, Friedmar Apel, Johann Wolfgang Von Goethe, and Schirn Kunsthalle Frankfurt. 1994. “Goethe Und Die Kunst,” 141. http://www.kisc.meiji.ac.jp/~mmandel/recherche/goethe_farbenkreis.html.
Sebastian, Catherine L, Nathalie MG Fontaine, Geoffrey Bird, Sarah-Jayne Blakemore, Stephane A De Brito, Eamon JP McCrory, and Essi Viding. 2011. “Neural Processing Associated with Cognitive and Affective Theory of Mind in Adolescents and Adults.” Social Cognitive and Affective Neuroscience. Oxford University Press, nsr023.
Sievers, Beau, Larry Polansky, Michael Casey, and Thalia Wheatley. 2013. “Music and Movement Share a Dynamic Structure That Supports Universal Expressions of Emotion.” Proceedings of the National Academy of Sciences 110 (1). National Acad Sciences: 70–75.
Stamp, Jimmy. 2013. “Who Really Invented the Smiley Face?” Smithsonian.com. March 13, 2013. http://www.smithsonianmag.com/arts-culture/who-really-invented-the-smiley-face-2058483/.
Stephens, Greg J, Lauren J Silbert, and Uri Hasson. 2010. “Speaker–listener Neural Coupling Underlies Successful Communication.” Proceedings of the National Academy of Sciences 107 (32). National Acad Sciences: 14425–30.
Strack, Fritz, Leonard L Martin, and Sabine Stepper. 1988. “Inhibiting and Facilitating Conditions of the Human Smile: A Nonobtrusive Test of the Facial Feedback Hypothesis.” Journal of Personality and Social Psychology 54 (5). American Psychological Association: 768.
Subrahmanyam, Kaveri, Robert E Kraut, Patricia M Greenfield, and Elisheva F Gross. 2000. “The Impact of Home Computer Use on Children’s Activities and Development.” The Future of Children. JSTOR, 123–44.
Swartz, Barbara E, and Eli S Goldensohn. 1998. “Timeline of the History of EEG and Associated Fields.” Electroencephalography and Clinical Neurophysiology 106: 173–76.
Tortora, Gerard J, and Bryan Derrickson. 2006. “Principles of Anatomy and Physiology. 11th Edition.” Wiley.
Walter, Henrik. 2012. “Social Cognitive Neuroscience of Empathy: Concepts, Circuits, and Genes.” Emotion Review 4 (1). SAGE Publications: 9–17.
Warwick, Kevin, Mark Gasson, Ben Hutt, Iain Goodhew, Peter Kyberd, Henning Schulzrinne, and Xiaotao Wu. 2004. “Thought Communication and Control: A First Step Using Radiotelegraphy.” IEE Proceedings-Communications 151 (3). IET: 185–89.
Wigley, Tom. 2011. “Villemard’s Vision: 1910 Postcards Depict the Year 2000.” http://www.urbanghostsmedia.com/2011/03/1910-vintage-postcards-depict-year-2000/.
Wikström, Valtteri. 2014. “Musical Composition by Regressional Mapping of Physiological Responses to Acoustic Features.” In Proceedings of the International Conference on New Interfaces for Musical Expression.
Zhang, Zhengyou. 2012. “Microsoft Kinect Sensor and Its Effect.” MultiMedia, IEEE 19 (2). IEEE: 4–10.