Full metadata record
DC Field	Value	Language
dc.contributor.author	Chen, Po-Heng	en_US
dc.contributor.author	Yang, Hsiao-Chien	en_US
dc.contributor.author	Chen, Kuan-Wen	en_US
dc.contributor.author	Chen, Yong-Sheng	en_US
dc.date.accessioned	2020-10-05T02:01:10Z	-
dc.date.available	2020-10-05T02:01:10Z	-
dc.date.issued	2020-01-01	en_US
dc.identifier.issn	1057-7149	en_US
dc.identifier.uri	http://dx.doi.org/10.1109/TIP.2020.3000611	en_US
dc.identifier.uri	http://hdl.handle.net/11536/155205	-
dc.description.abstract	The goal of Multi-View Stereo (MVS) is to reconstruct a 3D point-cloud model from multiple views. On the basis of the considerable progress of deep learning, an increasing amount of research has moved from traditional MVS methods to learning-based ones. However, two issues remain unsolved in the existing state-of-the-art methods: (1) only high-level information is considered for depth estimation, which may reduce the localization accuracy of 3D points because the learned model lacks spatial information; and (2) most methods require additional post-processing or network refinement to generate a smooth 3D model, which significantly increases the number of model parameters or the computational complexity. To this end, we propose MVSNet++, an end-to-end trainable network for dense depth estimation. The estimated depth map can further be applied to 3D model reconstruction. Different from previous methods, we first adopt feature pyramid structures for both feature extraction and cost volume regularization, which leads to accurate 3D point localization by fusing multi-level information. To generate a smooth depth map, we then carefully integrate instance normalization into MVSNet++ without increasing the number of model parameters or the computational burden. Furthermore, we design three loss functions and integrate a curriculum learning framework into the training process, leading to an accurate reconstruction of the 3D model. MVSNet++ is evaluated on the DTU and Tanks & Temples benchmarks with comprehensive ablation studies. Experimental results demonstrate that the proposed method performs favorably against previous state-of-the-art methods, showing the accuracy and effectiveness of MVSNet++.	en_US
dc.language.iso	en_US	en_US
dc.subject	Feature extraction	en_US
dc.subject	Three-dimensional displays	en_US
dc.subject	Solid modeling	en_US
dc.subject	Computational modeling	en_US
dc.subject	Image reconstruction	en_US
dc.subject	Estimation	en_US
dc.subject	Training	en_US
dc.subject	Multi-view stereo	en_US
dc.subject	deep learning	en_US
dc.subject	3D model reconstruction	en_US
dc.subject	feature aggregation	en_US
dc.subject	plane sweep algorithm	en_US
dc.title	MVSNet++: Learning Depth-Based Attention Pyramid Features for Multi-View Stereo	en_US
dc.type	Article	en_US
dc.identifier.doi	10.1109/TIP.2020.3000611	en_US
dc.identifier.journal	IEEE TRANSACTIONS ON IMAGE PROCESSING	en_US
dc.citation.volume	29	en_US
dc.citation.spage	7261	en_US
dc.citation.epage	7273	en_US
dc.contributor.department	資訊工程學系	zh_TW
dc.contributor.department	電子工程學系及電子研究所	zh_TW
dc.contributor.department	Department of Computer Science	en_US
dc.contributor.department	Department of Electronics Engineering and Institute of Electronics	en_US
dc.identifier.wosnumber	WOS:000553851400002	en_US
dc.citation.woscount	0	en_US
Appears in Collections:	Journal Articles