Full metadata record
DC Field | Value | Language
dc.contributor.author | Fan, Ching-Ling | en_US
dc.contributor.author | Yen, Shou-Cheng | en_US
dc.contributor.author | Huang, Chun-Ying | en_US
dc.contributor.author | Hsu, Cheng-Hsin | en_US
dc.date.accessioned | 2020-05-05T00:01:27Z | -
dc.date.available | 2020-05-05T00:01:27Z | -
dc.date.issued | 2020-03-01 | en_US
dc.identifier.issn | 1520-9210 | en_US
dc.identifier.uri | http://dx.doi.org/10.1109/TMM.2019.2931807 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/153887 | -
dc.description.abstract | We study the problem of predicting the viewing probability of different parts of 360° videos when streaming them to head-mounted displays. We propose a fixation prediction network based on recurrent neural networks, which leverages sensor and content features. The content features are derived by computer vision (CV) algorithms, which may suffer from inferior performance due to various types of distortion caused by diverse 360° video projection models. We propose a unified approach with overlapping virtual viewports to eliminate such negative effects, and we evaluate our proposed solution using several CV algorithms, such as saliency detection, face detection, and object detection. We find that overlapping virtual viewports increase the performance of these existing CV algorithms that were not trained for 360° videos. We next fine-tune our fixation prediction network with diverse design options, including: 1) with or without overlapping virtual viewports, 2) with or without future content features, and 3) different feature sampling rates. We empirically choose the best fixation prediction network and use it in a 360° video streaming system. We conduct extensive trace-driven simulations with a large-scale dataset to quantify the performance of the 360° video streaming system with different fixation prediction algorithms. The results show that our proposed fixation prediction network outperforms other algorithms in several aspects, such as: 1) achieving comparable video quality (average gaps between -0.05 and 0.92 dB), 2) consuming much less bandwidth (average bandwidth reduction of up to 8 Mb/s), 3) reducing the rebuffering time (by 40 s on average in bandwidth-limited 4G cellular networks), and 4) running in real time (at most 124 ms). | en_US
dc.language.iso | en_US | en_US
dc.subject | 360 degrees video | en_US
dc.subject | Virtual Reality | en_US
dc.subject | HMD | en_US
dc.subject | prediction | en_US
dc.subject | machine learning | en_US
dc.subject | RNN | en_US
dc.subject | tiled streaming | en_US
dc.title | Optimizing Fixation Prediction Using Recurrent Neural Networks for 360 degrees Video Streaming in Head-Mounted Virtual Reality | en_US
dc.type | Article | en_US
dc.identifier.doi | 10.1109/TMM.2019.2931807 | en_US
dc.identifier.journal | IEEE TRANSACTIONS ON MULTIMEDIA | en_US
dc.citation.volume | 22 | en_US
dc.citation.issue | 3 | en_US
dc.citation.spage | 744 | en_US
dc.citation.epage | 759 | en_US
dc.contributor.department | 資訊工程學系 | zh_TW
dc.contributor.department | Department of Computer Science | en_US
dc.identifier.wosnumber | WOS:000519576700014 | en_US
dc.citation.woscount | 0 | en_US
Appears in Collections: Articles
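
The abstract above describes an RNN-based fixation prediction network that fuses HMD sensor features with CV-derived content features to predict per-tile viewing probabilities. The following is a minimal, hypothetical sketch of such a network, not the authors' implementation: the feature dimensions, tile count, and the particular LSTM-plus-linear architecture are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's code) of RNN-based fixation
# prediction: at each time step, sensor features (e.g., HMD orientation)
# and content features (e.g., pooled saliency/object scores) are
# concatenated and fed to an LSTM, which outputs a viewing probability
# for each tile of the 360-degree video. All dimensions are assumptions.
import torch
import torch.nn as nn

class FixationPredictor(nn.Module):
    def __init__(self, sensor_dim=4, content_dim=64,
                 hidden_dim=128, num_tiles=16):
        super().__init__()
        self.rnn = nn.LSTM(sensor_dim + content_dim, hidden_dim,
                           batch_first=True)
        self.head = nn.Linear(hidden_dim, num_tiles)

    def forward(self, sensor_feats, content_feats):
        # sensor_feats:  (batch, time, sensor_dim)
        # content_feats: (batch, time, content_dim)
        x = torch.cat([sensor_feats, content_feats], dim=-1)
        out, _ = self.rnn(x)
        # Sigmoid yields an independent viewing probability per tile.
        return torch.sigmoid(self.head(out))

# Example: viewing probabilities for 16 tiles over 30 time steps.
model = FixationPredictor()
probs = model(torch.randn(2, 30, 4), torch.randn(2, 30, 64))
print(probs.shape)  # torch.Size([2, 30, 16])
```

A tiled streaming system could threshold these per-tile probabilities to decide which tiles to fetch at high quality; the abstract's design options (overlapping virtual viewports, future content features, feature sampling rates) would affect how the content features fed into such a model are produced.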