Title: Optimizing Caffe Training Framework for Gateway Systems
Authors: 張哲豪 (Zhang, Zhe-Hao); 陳添福
Institute of Computer Science and Engineering
Keywords: Gateway systems; Deep learning; Convolutional neural networks (CNN); Performance optimization; Caffe
Issue Date: 2017
Abstract: With the growth of the Internet of Things (IoT) and deep learning, numerous sensors collect many kinds of data, such as images and audio. Traditionally, all of this data is sent to the cloud to be processed and computed, but limited network bandwidth introduces unpredictable latency; when rapid feedback is required, that latency may lead to incorrect results. Moving computation from the cloud to local edge devices is therefore necessary to relieve the cloud workload and return feedback faster. Caffe has been the most widely used deep-learning framework in recent years, but the multithreaded acceleration it natively provides performs poorly, for two reasons. First, when Caffe runs with multiple threads, the computation of the individual threads does not fully overlap, so little time is saved. Second, not every algorithm in a CNN model uses Caffe's built-in multithreading, so the unaccelerated parts still account for a portion of the total training time. Accelerating training on a resource-limited local machine requires considering both the hardware resource constraints and the improvement in computing performance. This thesis proposes Caffe_concurrent, which improves the overall parallelism of Caffe, and Caffe_pipelined, which trades an increase in memory usage for a reduction in execution time on gateway systems. Caffe_concurrent benefits all CNN models on a platform with sufficient resources, while Caffe_pipelined benefits CNN models whose forward- and backward-propagation times are nearly equal.
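The record only names the pipelining idea, so the following is a minimal, hypothetical sketch (in Python, not the thesis's actual Caffe implementation) of why nearly equal forward and backward times matter: the backward pass of batch i is overlapped with the forward pass of batch i+1 on two threads, at the cost of keeping two batches' worth of activations in memory, which mirrors the time-versus-memory trade-off the abstract describes. The function names, timings, and batch-level scheme are assumptions for illustration only.

    import threading
    import time

    # Hypothetical stand-ins for forward and backward propagation of one batch.
    # The equal sleep times model the "nearly equal forward/backward time" case
    # in which pipelining helps most.
    def forward(batch_id):
        time.sleep(0.05)          # pretend forward-propagation work
        print(f"forward  done for batch {batch_id}")

    def backward(batch_id):
        time.sleep(0.05)          # pretend backward-propagation work
        print(f"backward done for batch {batch_id}")

    def pipelined_training(num_batches):
        """Overlap backward(i) with forward(i + 1) using two threads."""
        forward(0)                                   # fill the pipeline
        for i in range(num_batches):
            bwd = threading.Thread(target=backward, args=(i,))
            bwd.start()
            if i + 1 < num_batches:
                forward(i + 1)                       # runs concurrently with backward(i)
            bwd.join()                               # synchronize before the next iteration

    if __name__ == "__main__":
        pipelined_training(4)

If forward and backward take roughly the same time, each iteration hides one of the two passes behind the other, giving close to the ideal overlap; if one pass dominates, the shorter pass is fully hidden anyway and the extra memory buys little additional speedup.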
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070456066
http://hdl.handle.net/11536/142917
Appears in Collections: Thesis