Title: Reinforcement Learning for HEVC/H.265 Encoder Control
Authors: Chung, Chia-Hua (鍾佳樺); Peng, Wen-Hsiao (彭文孝)
Institute of Multimedia Engineering
Keywords: Reinforcement learning; High Efficiency Video Coding; HEVC/H.265
Issue Date: 2017
Abstract: The video coding community has long sought rate-distortion optimization techniques more effective than the widely adopted greedy approach. The difficulty is that a coding decision made at one stage affects a chain of subsequent decisions and, through them, the overall compression performance. Reinforcement learning lends itself to exactly this kind of dependent decision-making problem. This thesis introduces reinforcement learning as a mechanism for the coding unit (CU) split decision and for intra-frame rate control in HEVC/H.265.

For the CU split decision, the task is to determine the final splitting of each CU directly, without the full rate-distortion optimization search over the coding quadtree adopted by the HEVC/H.265 reference software. We formulate the split decision as a reinforcement learning problem: the luminance samples of a CU together with the quantization parameter form the state, the split decision is the action, and the reduction in rate-distortion cost relative to keeping the CU intact is the immediate reward. Conventional Q-learning considers only a single subsequent state and thus cannot accommodate a split that produces four sub-CUs; we therefore propose a Q-learning variant for the CU split decision and train convolutional neural networks to approximate the rate-distortion cost reduction of every state-action pair. The proposed scheme performs comparably with the full rate-distortion optimization in HM-16.15, incurring only a 2.5% average BD-rate loss. It also performs on par with, or better than, the conventional binary-classification formulation, while additionally quantifying the rate-distortion cost reduction, which enables a wider range of applications.
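To make the formulation concrete, the following minimal PyTorch sketch shows what such a split-decision Q-network could look like. Everything here is an illustrative assumption rather than the thesis implementation: the names CUSplitQNet and split_q_target, the layer sizes, and the backup over four sub-CUs are one plausible reading of the variant described above.

import torch
import torch.nn as nn

class CUSplitQNet(nn.Module):
    def __init__(self, cu_size=64):
        super().__init__()
        # Feature extractor over the CU's luminance block (state, part 1).
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        feat_dim = 32 * (cu_size // 4) ** 2
        # The QP (state, part 2) joins the CNN features before the Q head.
        self.head = nn.Sequential(
            nn.Linear(feat_dim + 1, 128), nn.ReLU(),
            nn.Linear(128, 2),  # Q(s, a) for a in {no-split, split}
        )

    def forward(self, luma, qp):
        # luma: (B, 1, cu_size, cu_size) float tensor; qp: (B,) float tensor
        x = self.features(luma).flatten(1)
        x = torch.cat([x, qp.unsqueeze(1)], dim=1)
        return self.head(x)

def split_q_target(reward, child_states, qnet):
    # Assumed reading of the four-sub-CU backup: the return of splitting
    # adds, for each child CU, the best further RD-cost reduction the
    # network predicts for it (a standard target net/detach is omitted).
    target = reward
    for luma, qp in child_states:
        target = target + qnet(luma, qp).max(dim=1).values
    return target

At encoding time, following the greedy action argmax_a Q(s, a) at each depth would yield a final CU partition without traversing the full quadtree.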
For intra-frame rate control, the task is to determine a quantization parameter value for every coding tree unit (CTU) in a frame so as to minimize the frame-level distortion subject to a rate constraint. We draw an analogy between rate control and reinforcement learning by regarding the texture complexity of the CTUs and the bit balance as the state, the quantization parameter value as the action, and the negative distortion of the CTU as the immediate reward. A neural network trained with Q-learning serves as the agent, observing the state to evaluate the expected reward of each candidate action. Trained on only a limited set of sequences, the proposed model already performs comparably with the rate control algorithm in HM-16.15.
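Analogously, a minimal sketch of the rate-control agent is shown below. The names RateControlQNet and choose_qp, the two-feature state, and the candidate QP range are assumptions for illustration; the record does not specify these details.

import torch
import torch.nn as nn

QP_VALUES = list(range(22, 38))  # candidate QP set (assumed range)

class RateControlQNet(nn.Module):
    def __init__(self, state_dim=2, n_actions=len(QP_VALUES)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),  # one Q value per candidate QP
        )

    def forward(self, state):
        return self.net(state)

def choose_qp(qnet, texture_complexity, bit_balance):
    # Greedy policy: pick the QP with the highest estimated return
    # (i.e., the smallest expected distortion) for the observed state.
    state = torch.tensor([[texture_complexity, bit_balance]],
                         dtype=torch.float32)
    return QP_VALUES[qnet(state).argmax(dim=1).item()]

As CTUs are coded, the bit-balance component of the state would shrink by the bits each CTU consumes, which is what ties every QP choice back to the frame-level rate constraint.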
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070356638
http://hdl.handle.net/11536/142839
Appears in Collections: Thesis