Full metadata record
DC Field | Value | Language
dc.contributor.author | Fan, Ching-Ling | en_US
dc.contributor.author | Yen, Shou-Cheng | en_US
dc.contributor.author | Huang, Chun-Ying | en_US
dc.contributor.author | Hsu, Cheng-Hsin | en_US
dc.date.accessioned | 2020-05-05T00:01:27Z | -
dc.date.available | 2020-05-05T00:01:27Z | -
dc.date.issued | 2020-03-01 | en_US
dc.identifier.issn | 1520-9210 | en_US
dc.identifier.uri | http://dx.doi.org/10.1109/TMM.2019.2931807 | en_US
dc.identifier.uri | http://hdl.handle.net/11536/153887 | -
dc.description.abstract | We study the problem of predicting the viewing probability of different parts of 360° videos when streaming them to head-mounted displays. We propose a fixation prediction network based on recurrent neural networks, which leverages sensor and content features. The content features are derived by computer vision (CV) algorithms, which may suffer from inferior performance due to various types of distortion caused by diverse 360° video projection models. We propose a unified approach with overlapping virtual viewports to eliminate such negative effects, and we evaluate our proposed solution using several CV algorithms, such as saliency detection, face detection, and object detection. We find that overlapping virtual viewports increase the performance of these existing CV algorithms that were not trained for 360° videos. We next fine-tune our fixation prediction network with diverse design options, including: 1) with or without overlapping virtual viewports, 2) with or without future content features, and 3) different feature sampling rates. We empirically choose the best fixation prediction network and use it in a 360° video streaming system. We conduct extensive trace-driven simulations with a large-scale dataset to quantify the performance of the 360° video streaming system with different fixation prediction algorithms. The results show that our proposed fixation prediction network outperforms other algorithms in several aspects, such as: 1) achieving comparable video quality (average gaps between -0.05 and 0.92 dB), 2) consuming much less bandwidth (average bandwidth reduction of up to 8 Mb/s), 3) reducing the rebuffering time (by 40 s on average in bandwidth-limited 4G cellular networks), and 4) running in real time (at most 124 ms). | en_US
dc.language.iso | en_US | en_US
dc.subject | 360 degrees video | en_US
dc.subject | Virtual Reality | en_US
dc.subject | HMD | en_US
dc.subject | prediction | en_US
dc.subject | machine learning | en_US
dc.subject | RNN | en_US
dc.subject | tiled streaming | en_US
dc.title | Optimizing Fixation Prediction Using Recurrent Neural Networks for 360 degrees Video Streaming in Head-Mounted Virtual Reality | en_US
dc.type | Article | en_US
dc.identifier.doi | 10.1109/TMM.2019.2931807 | en_US
dc.identifier.journal | IEEE TRANSACTIONS ON MULTIMEDIA | en_US
dc.citation.volume | 22 | en_US
dc.citation.issue | 3 | en_US
dc.citation.spage | 744 | en_US
dc.citation.epage | 759 | en_US
dc.contributor.department | 資訊工程學系 | zh_TW
dc.contributor.department | Department of Computer Science | en_US
dc.identifier.wosnumber | WOS:000519576700014 | en_US
dc.citation.woscount | 0 | en_US
Appears in Collections: Articles
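
The abstract above describes an RNN-based fixation prediction network that fuses HMD sensor features with CV-derived content features to predict per-tile viewing probabilities. The following is a minimal, hypothetical sketch of such a network, not the authors' implementation: the feature dimensions, tile count, and the particular LSTM-plus-linear architecture are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's code) of RNN-based fixation
# prediction: at each time step, sensor features (e.g., HMD orientation)
# and content features (e.g., pooled saliency/object scores) are
# concatenated and fed to an LSTM, which outputs a viewing probability
# for each tile of the 360-degree video. All dimensions are assumptions.
import torch
import torch.nn as nn

class FixationPredictor(nn.Module):
    def __init__(self, sensor_dim=4, content_dim=64,
                 hidden_dim=128, num_tiles=16):
        super().__init__()
        self.rnn = nn.LSTM(sensor_dim + content_dim, hidden_dim,
                           batch_first=True)
        self.head = nn.Linear(hidden_dim, num_tiles)

    def forward(self, sensor_feats, content_feats):
        # sensor_feats:  (batch, time, sensor_dim)
        # content_feats: (batch, time, content_dim)
        x = torch.cat([sensor_feats, content_feats], dim=-1)
        out, _ = self.rnn(x)
        # Sigmoid yields an independent viewing probability per tile.
        return torch.sigmoid(self.head(out))

# Example: viewing probabilities for 16 tiles over 30 time steps.
model = FixationPredictor()
probs = model(torch.randn(2, 30, 4), torch.randn(2, 30, 64))
print(probs.shape)  # torch.Size([2, 30, 16])
```

A tiled streaming system could threshold these per-tile probabilities to decide which tiles to fetch at high quality; the abstract's design options (overlapping virtual viewports, future content features, feature sampling rates) would affect how the content features fed into such a model are produced.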