A Novel Design of Television: Displaying Multiple Channels Simultaneously in Virtual Reality
Lanyi Liu
KTH Royal Institute of Technology
Brinellvägen 8, 114 28 Stockholm
(+46) 765633724
lanyi@kth.se
ABSTRACT
Displaying multiple channels in Virtual Reality (VR) can create new opportunities for the television industry. The traditional way of watching TV, with a TV set and a remote control, is no longer the main way people consume content. Instead, new media such as mobile phone applications and the Internet have gradually taken over much of the television market. As a competitive new medium, VR could likewise play an important role in the future television market, and many researchers have developed applications on this new platform. Much of their effort has focused on 360-degree videos. This project, however, builds a system with multiple two-dimensional videos in VR and uses this system to study how human attention is distributed in a VR environment.
This research is in the field of Human Computer Interaction (HCI) and examines which video positions in a 3D spatial environment attract users' attention best. It is based on research on human visual attention across multiple video sequences [1]. Accordingly, a website with 30 videos playing at the same time was built. Then, 21 people were tested in an Oculus Rift using both qualitative and quantitative methods from HCI theory. The results show that in a 3D spatial multi-channel environment, users tend to direct their attention to the videos in front of them and to videos in the middle and upper positions.
This study is useful for arranging channels effectively in a virtual environment (VE) by identifying the importance levels of different positions. Advertising companies and TV stations might use the information provided by this study to arrange channels better.
Categories and Subject Descriptors
I.3.7 [Computer Graphics]: Virtual Reality; I.3.8 [Computer Graphics]: Applications
General Terms
Measurement, Design, Human Factors
Keywords
VR application, Multi-channel television, Human Computer Interaction, User Experience
INTRODUCTION
The way people consume content keeps changing. From movies and television programs to online videos such as YouTube, people continue to explore new possibilities to change and improve how content is watched by designing new systems, interactions, concepts and platforms. According to data from Eurostat, the proportion of people using new media to watch TV increased steadily each year from 2002 to 2012 in most European countries [11].
Virtual reality, also known as VR, is a popular concept of creating a virtual environment using sensor technology and computer graphics, in order to give people a feeling of the real world. Based on this concept, companies such as Google, Facebook and HTC have designed all kinds of VR devices. Although many applications related to video watching have been built on these new platforms, most of them display 360-degree or panoramic videos, which aim to give people an immersive feeling of a scene.
Another option for playing videos in VR is to display multiple two-dimensional videos simultaneously. The main advantage of displaying multiple videos in VR is that users can reach more videos at a time, which could reduce the time needed to find videos of interest. One basis for this design is that VR devices have a large field of view (FOV), capable of holding more videos within the human visual field without reducing the video size. Normally, humans can perceive up to a 270-degree view with eye movement alone and a 360-degree view with head movement. The other basis is that humans have the ability and tendency to split their attention: it is common to see people working with two screens or with split screens, or using mobile phones or tablets while watching TV.
The problem, however, is that human brain resources, or attention, are very limited and cannot be divided into too many parts. When too many objects appear in front of people, they tend to focus on a few objects of particular interest and ignore the rest. This kind of attention is called selective attention, and it will also occur in a multi-video VR system. Such a system becomes inefficient if most of its videos go unwatched: not just explicit resources like network bandwidth and hardware, but also implicit resources like energy and money are wasted on videos that receive little or no attention. It is therefore crucial to understand how people's attention is distributed across positions while watching multiple videos simultaneously in VR. This information will not only help advertising companies, video producers and TV stations allocate their videos more efficiently, but will also help improve the user experience of such a system.
The hypothesis of this research is that video position plays an important role in attracting people's attention: some positions will receive more attention than others. People tend to direct their attention mostly to the videos in front of them; attention decreases for videos to the left and right, and videos behind them receive the least. Similarly, videos in the center of the view receive the most attention, while videos at the top and bottom receive less.
To sum up, this project builds a web system that displays multiple two-dimensional videos simultaneously in VR and conducts a user study to learn how users' attention is distributed inside the VE. To the best of my knowledge, this is the first study of how users' attention is distributed in a 3D spatial environment, but it builds on previous research on human attention distribution across multiple videos in a non-VR environment.
The remainder of the paper is structured as follows: first, we summarize work on virtual reality, multi-channel design and visual spatial attention. Subsequently, we discuss the method, including the experiment setup, prototype, user study and data analysis. After that, we present the results of the study in objective and subjective terms. Finally, we discuss and summarize the work.
THEORY AND RELATED RESEARCH
In this section, theories and research related to virtual reality, multi-channel display and visual attention are presented.
Virtual Reality
Virtual reality (VR) is a concept of using computer graphics to imitate a real-world scene, or to construct a scene that does not exist, in order to give people an immersive feeling of being present in the scene. The virtual environment (VE) is the scene created. Hardware that can display a VE is widely available and can be divided into two categories: desktop-based, e.g. Oculus Rift, HTC Vive and PlayStation VR, and phone-based, e.g. Samsung Gear VR, OnePlus Loop VR and Google Cardboard. Numerous applications built on this hardware have reached various fields of life, including education and training, medical treatment, video games, art and architectural design.
Many applications related to video watching have also been created under this concept. However, most of the effort has gone into developing omnidirectional videos (also known as 360-degree or panoramic videos) in VR. Lizzy Bleumers et al. proposed that this kind of design presents opportunities for a new interactive television format and analyzed the issues and opportunities users anticipated [3]. Hardware devices for shooting 360-degree videos have also been created [15,16,17]: with 16 cameras arranged in a circle, it is possible to shoot 360-degree video. YouTube has also enabled 360-degree video support; users can navigate a video by moving their device around, and the scene changes accordingly. Apart from that, much work covers transforming omnidirectional videos or images into 2D display videos or images. Matt Yu et al. improved the coding of omnidirectional videos [22]. Benjamin Petry et al. also contributed to 360-degree videos by decoupling navigation in time from navigation in space [21].
There are also applications that try to reconstruct the environment of TV watching in VR. For example, Netflix has launched its first VR application, 'Living Room', with a huge TV screen inside a virtual living room; users can play videos that Netflix provides on that large virtual screen [10]. Similarly, Hulu launched an application for watching shows, movies and clips against the background of a big living room with a huge TV screen [28]. Virtual cinemas likewise provide users with the experience of a real cinema [12,13,14], with a large screen in the center of the view. However, these applications can only play one video at a time, and many users complain that the created scene does not resemble the real world and that they do not feel involved in it.
Research on multi-channel systems
The idea of displaying multiple channels has generated many applications. C.-H. Huang used Visual Basic as a platform and brought the multiple-video concept into physical education [18]: users can view a sports action clip from three different angles at the same time. Another widely used application scenario is video surveillance.
Some researchers have tried to realize multiple channels in the web browser. Viewsync and SwigView allow users to watch multiple YouTube videos on a website [4,5] by adding the videos' URLs. However, due to the limited space a web layout can provide, the user experience is not good enough: when there are many videos, users have to scroll the page to watch the videos at the bottom.
Some companies have tried to realize multiple channels in digital television. Game Mix and QuckView have enabled subscribers to watch four and nine channels at the same time on a TV screen, respectively [6]. However, the number of videos that can be played at the same time is very limited. Samsung smart TVs also add a multi-link screen function that allows two videos to play simultaneously [19], but one screen is reserved for the live channel and the other for YouTube or web content.
Some researchers have also proposed schemes for displaying multiple channels in digital television. Yeong Kyeong Seong and Yoon-Hee Choi, in their paper 'A Method for Watching Multiple Channels Simultaneously in a Digital Television', emphasized that multiple-PIP or mosaic-style display can be a good way to ease channel navigation, and proposed a method for displaying multiple channels with a limited number of tuners in a digital TV [7]. N. Uchihara and H. Kasai also proposed displaying multiple videos so that users can access any view area at any desired resolution interactively, in the paper 'Fast Stream Joiner for Next-Generation Interactive Video' [2]. They implemented the system in ANSI C and realized it on LCD displays, web browsers and smartphones.
Based on the concept that Yeong Kyeong Seong and Yoon-Hee Choi proposed, V. Watkanad and H. Kasai pointed out in their paper 'Study of Visual Attention on Multiple Video Sequence' that such a scheme for displaying multiple channels in digital television would be unprofitable if users tend to view only salient videos at certain parts of the screen [1]. They therefore emphasized the importance of visual attention in multi-channel design. With nine videos arranged in a 3×3 grid, participants were asked to watch the prototype while their eye movements were recorded with an eye-tracking tool. Their results show that video position, human faces and motion attract attention effectively. In terms of video position, they found that users are likely to look at videos in the top and middle rows of the screen rather than the bottom row.
In terms of how users adopt multi-screen services, there is also a study by H. C. Lai et al. In their paper 'Multi-screen Service Adoption and Use-diffusion: The Best Model Perspective', they conclude that intensive users will be the main users in the multi-screen market and that playfulness and support for interpersonal activities are both very important [20].
All of the above studies give a solid basis for developing multi-channel systems. However, all of them are implemented in 2D environments.
Visual spatial attention
Visual spatial attention is one kind of visual attention, referring to the application of attention to a location in space. It is distinguished from other forms of visual attention by its emphasis on direction and location.
How visual spatial attention is distributed has been the subject of many studies. Three models have been established to represent it: the spotlight metaphor, the zoom-lens metaphor and the gradient model. The first likens visual attention to a spotlight [23]: the brain can preprocess things that fall within the attentional spotlight. The second emphasizes the changeable size of the attention scope, meaning that the scope can be narrowed or widened [24]. The third proposes that attentional resources are concentrated at the focal point and decrease continuously away from the center [25].
In terms of ways to measure attention, related research includes both qualitative methods, such as questionnaires and interviews, and quantitative methods, such as button presses and eye tracking.
Andrew Duchowski has discussed the different technologies involved in eye tracking [19]. The earliest, electrooculography (EOG), measures the electric potential on the skin around the eyes to determine gaze direction relative to the head. More recent approaches use an infrared light source to locate the eye position.
Electroencephalography (EEG) is also a way to measure human attention, by placing electrodes on the scalp and recording brain activity. According to research, the P300 component is a very important cue for attention.
Purpose statement and question
The basis of my project is the work 'Study of Visual Attention on Multiple Video Sequence' by V. Watkanad et al. [1]. Similarly, my research studies visual attention in a multi-video environment. The difference is that my research is set in a 3D spatial environment rather than a 2D environment. In addition, whereas V. Watkanad et al. use the time people stay on a video as their variable for estimating attention level, I add a second variable that records how many times people navigate to a video. For simplicity, instead of using EEG or precise eye-tracking tools to capture attention features, I use a rough method of obtaining head movement data through the console log.
In summary, my research question is how users' attention is distributed in a 3D environment with multiple videos. In other words, what are the importance levels of channels at different locations in the VE? This study can help us better understand human attention in a multi-channel virtual environment, and might thus be useful for advertising companies and TV stations to position their programs better. The result of this study will also be tentative guidance for a better design of multi-channel display in VR.
METHOD DESCRIPTION
The multi-channel VR TV system was built for the web browser, using the open A-Frame framework from the Mozilla VR team. It runs in the Firefox Nightly browser or experimental builds of Chromium, and on the Oculus Rift VR device.
Open-source framework A-Frame
A-Frame is an open-source framework maintained by the Mozilla VR team and is the foundation of the prototype in my work [26]. It is built on WebGL, a JavaScript API for creating 3D and 2D graphics in the web browser, but adds its own entity-component-system pattern, a basic concept in 3D game development. Basic components such as camera, light, 3D object, cursor, geometry, position, scale and material are included in the framework, which helps speed up the design process. The main advantage of using this framework, however, is that A-Frame enables designers to create VR scenes across desktop, the Oculus Rift and mobile phones.
The easiest way to use the framework is to include the JS build, aframe.min.js, from a CDN, or to install it through npm, as the sketch below shows.
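As a minimal sketch (the release version in the CDN URL and the scene contents are illustrative, not necessarily the exact ones used in this project), an A-Frame page only needs the script include and an <a-scene> element:

    <!-- Minimal A-Frame page: a scene with a camera and a gaze cursor. -->
    <html>
      <head>
        <script src="https://aframe.io/releases/0.3.0/aframe.min.js"></script>
      </head>
      <body>
        <a-scene>
          <a-entity camera look-controls>
            <!-- the cursor draws a gaze ring that follows head or mouse movement -->
            <a-cursor color="red"></a-cursor>
          </a-entity>
        </a-scene>
      </body>
    </html>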
In my case, 3D cylinder objects were used as the video channels, and the videos were applied to the cylinders as materials. Videos were initially set to play automatically and in a loop; for simplicity, no playback controls were included in the design. As the web browser, I used Firefox Nightly version 48.0. A sketch of one such channel follows.
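This is a hedged sketch of how one channel entity might be declared inside the scene (the asset file name, IDs and exact dimensions are illustrative; the real prototype declares 30 such entities):

    <!-- One curved channel: an open-ended cylinder slice of 30 degrees with a
         looping, autoplaying video applied as its material. A radius of 0.3
         puts the video surface 30 cm from a camera at the cylinder's axis. -->
    <a-assets>
      <video id="ch0" src="videos/news.mp4" autoplay loop></video>
    </a-assets>
    <a-entity geometry="primitive: cylinder; openEnded: true; thetaLength: 30; radius: 0.3; height: 0.09"
              material="src: #ch0; side: double"
              position="0 1.6 0"></a-entity>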
Oculus Rift
Oculus Rift is a virtual reality headset developed and maintained by Oculus VR. In my work, I used the Oculus Rift DK2 as the test device [27].
The display uses OLED panels with a resolution of 1080×1200 per eye and a refresh rate of 90 Hz; the FOV is 110°. The Oculus Rift contains a number of sensors, including a gyroscope, an accelerometer, a magnetometer and a headset position-tracking sensor. A data fusion process combines all the sensor data, and the result is stored in 6DoF (six degrees of freedom) format: the first three values are translations along X, Y and Z, and the last three are rotations about X, Y and Z.
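Within A-Frame, this fused pose is exposed through the camera entity's underlying three.js object, which is how the rough head movement data mentioned earlier can be logged. A minimal sketch (the selector and function name are illustrative):

    // Read the headset pose from the A-Frame camera entity.
    // object3D is the underlying three.js object; rotations are in radians.
    var cameraEl = document.querySelector('[camera]');
    function logPose() {
      var p = cameraEl.object3D.position;  // translations in X, Y and Z
      var r = cameraEl.object3D.rotation;  // rotations about X, Y and Z
      console.log('6DoF:', p.x, p.y, p.z, r.x, r.y, r.z);
    }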
The display mode is set to the second screen: after the Oculus Rift is connected to the PC, the prototype needs to be moved to the second screen in order to be shown in the Oculus Rift. The lens configuration is left at its default settings.
System description
Figure 1. Prototype
Figure 1 is a screenshot of the multi-channel VR TV prototype. In total, 30 videos play at the same time, covering news, music, sports, fashion, health, science and cartoons. There are three rows of videos, each with 10 videos. The videos are curved, spanning 30 degrees of arc and 9 centimeters in width. The gaps between rows are 2 centimeters and the gaps between columns are 6 centimeters. The distance between the eye and the videos in this VE is 30 centimeters. The red circle in the VE is the gaze point and moves with head rotation or mouse movement. When the red circle rests on a video, that video's sound plays; when the circle leaves, the sound stops (see the sketch below).
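One way to implement this gaze-triggered sound is through the mouseenter and mouseleave events that the A-Frame cursor raises on the entity it intersects. The sketch below assumes each channel entity carries a class and a data attribute naming its video asset (both names are illustrative):

    // Unmute a channel's video while the gaze cursor is on it; mute it on leave.
    var channels = document.querySelectorAll('.channel');
    for (var k = 0; k < channels.length; k++) {
      (function (el) {
        var video = document.querySelector(el.getAttribute('data-video')); // e.g. "#ch0"
        video.muted = true;  // every channel starts silent
        el.addEventListener('mouseenter', function () { video.muted = false; });
        el.addEventListener('mouseleave', function () { video.muted = true; });
      })(channels[k]);
    }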
When the Oculus Rift is connected to the computer, pressing the F key or clicking the camera button at the bottom right of the screen splits the website into two views and renders it into the Oculus Rift. If the Rift is not connected, the user can still navigate the system with mouse movement, and the system's functions remain the same.
To reduce the load on the video card, all videos are around one minute long and around 5 MB in size. The video resources are from YouTube.
User study
The goal of the experiment is to examine the relationship between human attention and video position. To make this clear, we define the experiment variables here. The independent variable is the video position, the dependent variable is human attention, and video type, video quality and watching posture are controlled. Here we define human attention in terms of how often and how long people watch a channel: the more times people navigate to a channel and the longer they stay on it, the more attention that channel receives.
In total, 21 people aged between 21 and 25 participated in the study: 11 male and 10 female, all bachelor's or master's students at KTH Royal Institute of Technology. The experiment was set up in the VIC laboratory. Figure 2 shows the experimental settings and environment. The Oculus Rift is connected to the computer by cable, and the camera attached to the computer is the sensor that detects head movement. Users wear the Oculus Rift and a pair of earphones and sit on a swivel chair.
Figure 2. Experiment environment
The experiment procedure is as follows. Before the experiment, participants are asked to sign an informed-consent form, which gives basic information about the test and their rights and responsibilities, and to fill in a demographic questionnaire recording their gender, age and education background. After that, participants receive instructions and then navigate the prototype for two minutes without interruption, sitting on the swivel chair with the Oculus Rift and earphones. When the two minutes end, I press the Enter key on the keyboard to record the watching-behavior data, including how many times the user's gaze entered each channel and how long the user spent on each channel. Afterwards, participants take off the Oculus Rift and fill in another questionnaire, which collects subjective information about how they feel about the system.
To control for confounding variables such as video quality and video type, I switch the video sequences between users: 10 people were tested before the switch and 11 after. The switch rule is that videos in the first row move to the second row, the second row to the third, and the third row to the first, as the sketch below illustrates.
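A minimal sketch of this row rotation (the flat array layout is an assumption; each row holds 10 video sources):

    // Rotate the three rows of video sources between participant groups:
    // row 1 -> row 2, row 2 -> row 3, row 3 -> row 1.
    function rotateRows(videos) {        // videos: array of 30 sources
      var row1 = videos.slice(0, 10);
      var row2 = videos.slice(10, 20);
      var row3 = videos.slice(20, 30);
      return row3.concat(row1, row2);    // the old third row now comes first
    }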
Data collection and analysis
Three kinds of data are collected after each test: navigating times, stay timer and watching sequence. The first two variables are 1×30 arrays initialized to 0; the last is stored as a stack. The tool used for data analysis was Microsoft Excel.
Each video has a label that distinguishes it from the others, with values ranging from 0 to 29: videos in the middle row have labels 0 to 9, videos in the top row 10 to 19, and videos in the bottom row 20 to 29.
Navigating times records how many times people's gaze enters each channel. Whenever the gaze point (the red circle) enters the area of a channel, that channel's array entry is incremented by 1; for example, if the red circle enters channel 0, navigating times[0] increases by 1. This variable partly reflects how much attention a channel gets in the two-minute test. However, it contains a lot of noise, because the gaze point shifts slightly before users actually start to watch. To reduce this noise, the watching sequence variable was designed: it pushes each channel's label onto a stack, so I can study the route of the gaze point over the two minutes.
The second variable, stay timer, records the total amount of time people spend on each channel. When the user's gaze enters a channel, a timer starts; when the gaze point leaves, the timer stops, the elapsed time is calculated, and the value is added to that channel's array entry. This variable also partly reflects the attention a channel gets in the two-minute test.
Compared with navigating times, stay timer is more influenced by the video content, because people tend to stay longer on programs that interest them; the video-switch operation might therefore affect stay timer considerably. Navigating times relates more to the process of searching for interesting programs, so it is likely less affected by the video switch but more affected by searching behavior.
The Enter key on the keyboard is bound to writing the data to the console. When the two minutes end, the experimenter presses the Enter key; after the user takes off the Oculus Rift, the data are copied into a notebook and then analyzed in Microsoft Excel. A condensed sketch of this logging follows.
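The sketch below uses illustrative variable and function names; in the real prototype the enter/leave handlers would be wired to the cursor events described earlier:

    // Per-channel counters, the visit stack, and the Enter-key console dump.
    var navigatingTimes = new Array(30).fill(0);  // gaze entries per channel
    var stayTimer = new Array(30).fill(0);        // accumulated seconds per channel
    var watchingSequence = [];                    // route of the gaze point
    var enterTime = null;

    function onGazeEnter(label) {                 // label: channel number, 0-29
      navigatingTimes[label] += 1;
      watchingSequence.push(label);
      enterTime = Date.now();
    }

    function onGazeLeave(label) {
      if (enterTime !== null) {
        stayTimer[label] += (Date.now() - enterTime) / 1000;
        enterTime = null;
      }
    }

    // The experimenter presses Enter at the end of the two minutes.
    window.addEventListener('keydown', function (e) {
      if (e.keyCode === 13) {                     // Enter key
        console.log(JSON.stringify({
          navigatingTimes: navigatingTimes,
          stayTimer: stayTimer,
          watchingSequence: watchingSequence
        }));
      }
    });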
RESULTS
In this part, the results of the work are presented. First, why and how the raw data were preprocessed is explained. Second, the results from the objective data, obtained from the web console, are presented. Last, the subjective data from the questionnaires are presented.
Data preprocessing
Navigating times and stay timer are the two main variables reflecting the distribution of human attention in the multi-channel VR TV system. However, before we perform any statistical analysis, these raw data need to be preprocessed.
For stay timer, although the navigation time was set to around two minutes for each user, there is a slight deviation in time across users, and some users left the test before the two minutes ended. These factors give the data points different weights, so we normalize them to eliminate the differences. For normalization, let $j = 1, 2, 3, \ldots$ index the 21 users, let $i = 1, 2, 3, \ldots$ index the channels in the system, and let $S$ denote the stay timer array, so that $S_{j,i}$ is user $j$'s stay time on channel $i$. We then calculate the total stay time for each user, $T_j = \sum_{i=1}^{30} S_{j,i}$. The formula used to normalize the stay timer array is

$$\hat{S}_{j,i} = \frac{S_{j,i}}{T_j} \qquad (1)$$

$\hat{S}$ is the dataset after preprocessing.
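As an illustrative example with hypothetical numbers: if user $j$ accumulated a total stay time of $T_j = 120$ seconds and spent 12 of those seconds on channel 0, then $\hat{S}_{j,0} = 12/120 = 0.1$. After normalization, every user's row sums to 1, regardless of how long that user actually stayed in the test.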