Title: The study of image processing and space truss structure optimization using parallel computing techniques on GPU
Authors: 陸勇奇 (Lu, Yung-Chi); 洪士林 (Hung, Shih-Lin)
Department: Department of Civil Engineering
Keywords: Compute Unified Device Architecture (CUDA); graphics processing unit (GPU); hive triangle pattern (HTP); digital image correlation (DIC); six pixels reference (SPR); augmented particle swarm optimization (AugPSO); boundary shifting
Date published: 2015
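Abstract (Chinese version): This study targets graphics processing units (GPUs) that support the Compute Unified Device Architecture (CUDA). It uses parallel computing techniques to build a system that accelerates digital image correlation (DIC) analysis. For the sizing optimization of space truss structures, it applies the Augmented Particle Swarm Optimization (AugPSO) algorithm together with CUDA parallelization to improve computational performance. For image processing, the study proposes a Hive Triangle Pattern (HTP), which is painted onto part of the surface of the test structure. The captured digital images then show more pronounced pixel differences, which strengthens the recognition ability of DIC and makes sub-pixel analysis more accurate. In addition, the Six Pixels Reference (SPR) algorithm estimates the center position and size of each HTP figure, so that more suitable image subsets can be obtained. AugPSO uses two special strategies, boundary shifting and particle position resetting, so that the optimization converges faster to better solutions while avoiding excessive homogenization of the particles. Boundary shifting compares the analysis results of the current particle against the displacement and stress limits in the safety constraints, then quickly moves the particle's cross-sectional areas to positions near the constraint boundary. Particle position resetting applies when particles approach the current best solution: particles selected with a given probability are relocated. This keeps the swarm from clustering too tightly and lets particles escape the crowded region, which prevents homogenization.
A simple three-story frame was mounted on a small shaking table to simulate earthquakes, and a high-speed camera captured digital images of the frame at every time step. The HTP painted on the frame's surface sways with it, so the frame's displacements are encoded in these images. The ultimate goal of every shaking-table measurement is to obtain these displacement time histories quickly and accurately. The displacements obtained from the image analysis were compared with readings from an LVDT displacement transducer. The frequency relative error was only 0.07%, and the root-mean-square displacement error was only 0.0205 cm. These results show that image analysis combining HTP with DIC does yield satisfactory results. Computing the correlation coefficients is the most time-consuming part of the whole image-processing procedure. The coefficient between each target image and the original image can be computed independently of all the others. Consequently, both OpenMP and CUDA (the focus of this study) greatly outperform sequential computation, achieving speedups of 11.79 times and 71.57 times, respectively.
On common truss benchmark problems, AugPSO obtains better solutions more quickly. The results for two large tower truss structures show the same behavior, which demonstrates that AugPSO converges quickly while also finding better solutions. The structural analysis performed for each particle is essentially independent of the other particles, so multiple particles can be processed in parallel to improve performance. In particular, numerical analysis shows that solving the matrix equations for nodal displacements takes about 98% of the total computation time in truss design optimization. Parallelizing this part should therefore yield a substantial gain. However, solving a matrix system, for example by Gaussian elimination, is not as simple or as easy to parallelize as matrix multiplication. In the best experiment, parallel matrix multiplication ran nearly 1400 times faster than sequential computation. By contrast, the CUDA-parallelized Gaussian elimination developed in this study reached only about 7.8 times, roughly half the number of streaming multiprocessors (SMs) on the GPU. This speedup is not fixed, however: it grows as the matrix size increases. This agrees with the GPU's architecture, in which a larger total amount of data yields a greater performance gain. In today's era of big data, developing faster and more widely applicable parallel computing systems that exploit the GPU's massive number of cores is an unavoidable trend, and it is the focus of the follow-up work in this study.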
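The abstract names the correlation-coefficient step as the bottleneck and notes that each target image is independent. The sketch below illustrates that structure; it is not the thesis's code. It assumes the zero-normalized cross-correlation (ZNCC), a common DIC criterion that the abstract does not confirm. It searches integer-pixel displacements only, with no sub-pixel refinement, HTP, or SPR. The function names (`zncc`, `crop`, `match_subset`, `track`, `shift_image`) are hypothetical. A thread pool stands in for the per-image parallelism that OpenMP or CUDA would exploit. Under CPython's global interpreter lock, it shows the independent decomposition rather than a real speedup.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def zncc(ref, tgt):
    """Zero-normalized cross-correlation of two equal-size patches (lists of rows).

    Returns a value in [-1, 1]; 1 means the patches match up to brightness and contrast.
    """
    a = [p for row in ref for p in row]
    b = [p for row in tgt for p in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [p - mean_a for p in a]
    db = [p - mean_b for p in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den else 0.0


def crop(img, top, left, h, w):
    """Extract an h x w patch whose upper-left corner is (top, left)."""
    return [row[left:left + w] for row in img[top:top + h]]


def match_subset(ref_img, tgt_img, top, left, size, search):
    """Integer-pixel search: return (dy, dx, corr) maximizing ZNCC within +/-search."""
    ref = crop(ref_img, top, left, size, size)
    best = (0, 0, -2.0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > len(tgt_img) or l + size > len(tgt_img[0]):
                continue  # candidate subset would fall outside the target image
            c = zncc(ref, crop(tgt_img, t, l, size, size))
            if c > best[2]:
                best = (dy, dx, c)
    return best


def track(ref_img, frames, top, left, size=5, search=3, workers=4):
    """Match one subset in every frame; frames share no state, so they map onto workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda frame: match_subset(ref_img, frame, top, left, size, search), frames))


def shift_image(img, dy, dx, fill=0):
    """Hypothetical helper for synthetic tests: move image content by (dy, dx) pixels."""
    h, w = len(img), len(img[0])
    return [[img[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w else fill
             for x in range(w)] for y in range(h)]


if __name__ == "__main__":
    rng = random.Random(0)
    ref = [[rng.randint(0, 255) for _ in range(20)] for _ in range(20)]
    frames = [shift_image(ref, dy, dx) for dy, dx in [(0, 0), (1, 2), (-2, 1)]]
    for dy, dx, c in track(ref, frames, top=8, left=8):
        print(f"displacement=({dy}, {dx}) px, ZNCC={c:.4f}")
```

`match_subset` reads only the reference image and one target frame. Each frame could therefore equally be assigned to its own CUDA thread block, with the ZNCC sums reduced across the threads of that block.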
Abstract (English version): This work developed a system that applies parallel computing techniques on CUDA-capable GPUs to a digital image correlation (DIC) measurement scheme and to space truss structure optimization. Augmented Particle Swarm Optimization (AugPSO), combined with CUDA parallel techniques, was used in the truss structure optimization to improve computational efficiency. A new hive triangle pattern (HTP) was painted on the surface of the test specimen to produce significant pixel differences in the images. The HTP enhanced recognition ability and made sub-pixel analysis more accurate. The six pixels reference (SPR) algorithm estimates the center location and size of each HTP. AugPSO adopts two special strategies, boundary shifting (BS) and particle position resetting (PPR), to converge quickly to better solutions and to avoid excessive homogenization of the particles. Boundary shifting moves a particle closer to the boundary between the feasible and infeasible regions, based on how the displacements and stresses of the current particle compare with the safety constraints. Particle position resetting relocates, with a specified probability, particles that are close to the current global best solution to positions far from their original ones. This keeps particles from converging to similar positions and thus prevents homogenization. A simple three-story structure was set up on a small shaking table to simulate earthquakes, and a high-speed camera captured digital images of the structure at each time step. Part of the structure's surface was painted with the HTP, whose location changes from image to image, so the displacement of the structure can be estimated from the images. The final goal of the shaking-table experiment is to obtain accurate displacement time histories quickly. Compared with the LVDT data, the DIC results have a frequency relative error of 0.07% and a root-mean-square displacement error of 0.0205 cm.
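The two AugPSO strategies can be illustrated on a deliberately tiny problem. This is a hedged sketch, not the thesis's implementation, and it rests on three assumptions:

- It replaces the truss analysis with three independent bars under fixed axial forces, so member stress is simply force divided by area and only stress constraints apply.
- It implements boundary shifting as uniform scaling of all areas by the governing stress ratio, which is one simple way to put the critical member exactly on the feasible/infeasible boundary.
- It implements particle position resetting as re-seeding, with probability `reset_prob`, any particle within `reset_radius` (a fraction of the area range) of the global best.

All names, forces, bounds, and PSO coefficients are hypothetical values chosen for illustration.

```python
import math
import random

# Hypothetical toy problem (not from the thesis): three independent bars with
# fixed axial forces, so member stress is force / area and no truss solve is needed.
LENGTHS = [1.0, 1.5, 2.0]   # member lengths
FORCES = [10.0, 20.0, 5.0]  # axial force carried by each member
SIGMA_ALLOW = 10.0          # allowable stress (stress constraint only)
A_MIN, A_MAX = 0.1, 5.0     # bounds on cross-sectional area


def weight(areas):
    """Objective: total member volume, proportional to structural weight."""
    return sum(l * a for l, a in zip(LENGTHS, areas))


def max_stress_ratio(areas):
    """Largest sigma_i / sigma_allow; a value <= 1 means every constraint holds."""
    return max(f / (a * SIGMA_ALLOW) for f, a in zip(FORCES, areas))


def boundary_shift(areas):
    """Scale all areas by the governing stress ratio so the critical member sits
    on the feasible/infeasible boundary (ratio == 1), then clip to the bounds."""
    r = max_stress_ratio(areas)
    return [min(max(a * r, A_MIN), A_MAX) for a in areas]


def aug_pso(n_particles=30, n_iter=300, w=0.7, c1=1.5, c2=1.5,
            reset_radius=0.05, reset_prob=0.1, seed=1):
    """Return (best_areas, best_weight) found by PSO with BS and PPR."""
    rng = random.Random(seed)
    dim = len(LENGTHS)
    xs = [boundary_shift([rng.uniform(A_MIN, A_MAX) for _ in range(dim)])
          for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_f = [weight(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            x, v = xs[i], vs[i]
            # Standard PSO velocity and position update, clipped to the bounds.
            for d in range(dim):
                v[d] = (w * v[d]
                        + c1 * rng.random() * (pbest[i][d] - x[d])
                        + c2 * rng.random() * (gbest[d] - x[d]))
                x[d] = min(max(x[d] + v[d], A_MIN), A_MAX)
            # Boundary shifting: jump straight onto the constraint boundary.
            x[:] = boundary_shift(x)
            f = weight(x)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = x[:], f
                if f < gbest_f:
                    gbest, gbest_f = x[:], f
            # Particle position resetting: a particle crowding the global best is,
            # with probability reset_prob, re-seeded at a random boundary point.
            near = math.dist(x, gbest) < reset_radius * (A_MAX - A_MIN)
            if near and rng.random() < reset_prob:
                x[:] = boundary_shift([rng.uniform(A_MIN, A_MAX) for _ in range(dim)])
                v[:] = [0.0] * dim
    return gbest, gbest_f
```

For this toy problem the exact optimum is A_i = F_i / SIGMA_ALLOW, that is, areas (1, 2, 0.5) with a volume of 5, so the result of `aug_pso()` can be checked directly against it. In a real truss, the scaling ratio would come from a structural analysis of each particle. That analysis is the step the abstract describes as independent across particles and therefore parallelizable.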
The image analysis method combining HTP and DIC proved feasible and accurate for image-based measurement experiments. Computing the digital image correlation is very time-consuming, and the computation for each target image is independent of the others. Parallel computing therefore achieves much higher performance than sequential processing: OpenMP and CUDA run 11.79 and 71.57 times faster than the sequential version, respectively. On common truss benchmark problems, AugPSO obtains better solutions in less execution time. The results for two large tower truss structures show the same tendency, indicating that AugPSO is a fast and accurate optimization algorithm. The structural response computed for each particle is independent of the other particles, so several particles can be evaluated in parallel to improve efficiency. According to numerical analysis, matrix operations account for about 98% of the computation time in the entire truss structure optimization process. Performance should therefore improve substantially once these matrix operations are parallelized successfully. However, solving a matrix system is not as simple or as easily parallelized as matrix multiplication. In the best experiment, parallel matrix multiplication was about 1400 times faster than sequential computation, whereas Gaussian elimination parallelized with CUDA was only about 7.8 times faster. This speedup increases with matrix size; in other words, the larger the amount of data, the greater the performance gain the GPU can achieve. In the big data era, parallel computing systems that exploit the GPU's huge number of streaming processors are a growing trend. Follow-up work will focus on large-data applications of GPU parallel systems.
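The loop structure of Gaussian elimination shows why it parallelizes less readily than matrix multiplication. The sketch below is pure Python and illustrates only that structure; it is not the thesis's CUDA code, and `gauss_solve` and `elimination_work` are hypothetical names.

The pivot loop over `k` is inherently sequential, because step k+1 needs the matrix produced by step k. The row updates inside one step are mutually independent, and those are what a GPU kernel would distribute across threads. Each step therefore implies a kernel launch and a synchronization, and the independent work shrinks as k grows. This pattern is consistent with the abstract's observation that the CUDA speedup rises with matrix size. In matrix multiplication, by contrast, every output entry can be computed independently, with no chain of dependent steps.

```python
def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (pure Python)."""
    n = len(A)
    # Work on an augmented copy [A | b] so the caller's lists are not modified.
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for k in range(n):  # sequential: step k+1 depends on the result of step k
        # Partial pivoting: bring the row with the largest |M[i][k]| to position k.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        if M[p][k] == 0.0:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]
        # Independent work: each row i > k is updated using only row k and itself,
        # so these (i, j) updates could run in parallel within a single step k.
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    # Back substitution: another chain of dependent steps, from the last row up.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def elimination_work(n):
    """Number of independent multiply-subtract updates available at each pivot step."""
    return [(n - k - 1) * (n - k + 1) for k in range(n)]
```

For example, `elimination_work(4)` gives 15, 8, 3, and 0 independent updates at the four pivot steps. The parallel work per step drops toward zero, while the number of sequential steps grows with n.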
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079416805
http://hdl.handle.net/11536/126615
Appears in collection: Theses