
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

H.264/AVC to HEVC Video Transcoder based on Dynamic Thresholding and Content Modeling

Eduardo Peixoto, Member, IEEE, Tamer Shanableh, Member, IEEE, and Ebroul Izquierdo, Senior Member, IEEE

Abstract—The new video coding standard, HEVC, was developed to succeed the current standard, H.264/AVC, as the state of the art in video compression. However, a large amount of legacy content is encoded with H.264/AVC. This paper proposes and evaluates several transcoding algorithms from the H.264/AVC to the HEVC format. In particular, it proposes a novel transcoding architecture in which the first frames of the sequence are used to compute the parameters, so that the transcoder can "learn" the mapping for that particular sequence. Then, two types of mode mapping algorithms are proposed. In the first solution, a single H.264/AVC coding parameter is used to determine the outgoing HEVC partitions using dynamic thresholding. The second solution uses linear discriminant functions to map the incoming H.264/AVC coding parameters to the outgoing HEVC partitions. This paper also contains experiments designed to study the impact of the number of frames used for training in the transcoder. Comparisons with existing transcoding solutions reveal that the proposed work results in lower rate-distortion loss at a competitive complexity performance.

Index Terms—Transcoding, HEVC, machine learning.

I. INTRODUCTION

The new video coding standard, called High Efficiency Video Coding (HEVC) [1], was developed by the JCT-VC group to replace the current H.264/AVC standard [2]. The main goal of HEVC is not to provide video compression with different features, such as error correction or scalability, but rather to significantly improve rate-distortion performance compared to the current standard, H.264/AVC, in order to enable new applications such as beyond-high-definition resolutions (so-called 4K, 3840 × 2160 pixels, and 8K, 7680 × 4320 pixels).

The motivation for an H.264/AVC to HEVC transcoder is twofold: (i) to promote interoperability for legacy video encoded in the H.264/AVC format when new applications using HEVC emerge; and (ii) to take advantage of the superior rate-distortion performance of HEVC. The first will be useful when the first applications that use the new standard are launched, while the second could be used straight away to migrate the abundant existing video content encoded in the H.264/AVC format.

E. Peixoto is with the Departamento de Engenharia Elétrica, Universidade de Brasília, Brazil. E. Izquierdo is with the School of Electronic Engineering and Computer Science, Queen Mary, University of London, London, UK. T. Shanableh is with the College of Engineering, American University of Sharjah, UAE.

This research was partially supported by the European Commission under contract FP7-287723 REVERIE.

In this paper, we build on our previous work [3], in which we proposed an H.264/AVC to HEVC transcoder based on a metric called Motion Vector Variance Distance, and on another work in which we proposed an MPEG-2 to HEVC transcoder based on content modeling [4]. Here, we explore other transcoding solutions based on a content-modeling approach, in which the transcoder adapts its parameters to the contents of the sequence being transcoded, and further evaluate the effect of content-based modeling on transcoder efficiency.

By definition, transcoding is the process of converting one compressed bitstream (called the source or incoming bitstream) into another compressed bitstream (called the transcoded or outgoing bitstream) [5], [6], [7]. Several properties may change during transcoding: the video format [8], [9], the bitrate [10], [11], the frame rate [12], [13], the spatial resolution [14], [15], the coding tools used (e.g., one bitstream might use B frames while the other does not, or scalability layers may be added to the target bitstream) [16], and even the insertion of new information into the video, such as watermarking [17], hidden data [18] or a layer for error resilience [19].

In transcoding, it is always possible to use a suitable decoder and encoder in tandem, completely decoding the incoming bitstream and then completely re-encoding it in the target format. Here, this is referred to as the trivial transcoder. While this approach usually achieves high quality in the transcoded sequence and can be used for any target conditions, it is inefficient in terms of complexity.

Transcoders fall into two main categories: homogeneous transcoding (the conversion of bitstreams within the same format) and heterogeneous transcoding (i.e., between different formats). Homogeneous transcoding is commonly used to adapt the bitstream to a new functionality, such as a different bitrate or spatio-temporal resolution. Heterogeneous transcoding can also provide the functionalities of homogeneous transcoding, such as bitrate reduction and change of spatio-temporal resolution, but it is mainly defined by the change of format. The H.264/AVC to HEVC transcoder falls into the latter category.

In many solutions, heterogeneous transcoding is achieved by completely decoding the source stream and re-encoding it in the target format, reusing information present in the source bitstream to speed up the process. This is known as the cascaded pixel domain approach [5], [6].


This paper is organised as follows: Section II reviews the relevant literature on heterogeneous transcoding, especially algorithms for mode mapping, which is the main focus of this paper. Section III details our previous work on the topic, which is used as a benchmark to evaluate the transcoding options proposed here, while Section IV details these transcoding options. Finally, Section V presents the experiments and Section VI concludes the paper.

II. RELEVANT LITERATURE

A simple way of classifying the contributions reported in the transcoding literature is to separate them into algorithms for mode mapping, algorithms for motion vector approximation and algorithms for motion vector refinement. The goal of mode mapping algorithms is to use information in the incoming bitstream to avoid testing all modes of the target format. The goal of motion vector approximation algorithms is to maximize the reuse of the motion vectors in the incoming bitstream in order to avoid costly motion estimation in the target encoder. Finally, the goal of motion vector refinement algorithms is to improve the reused and approximated motion vectors so that a good prediction can be achieved.

HEVC can use the same reference frame structure as H.264/AVC [20]; when it does, the transcoder needs no motion vector approximation algorithms. At the same time, in the HM reference software [21], [22], motion estimation accounts for a much smaller share of the complexity than in other implementations, so MV refinement algorithms would not yield the same gains as in other transcoders. However, HEVC uses a large number of modes, which makes mode mapping algorithms very important for the transcoder.

A range of algorithms has been proposed to reuse the coding mode of a particular macroblock in the incoming bitstream. A simple mode mapping algorithm was proposed in the context of an H.264/AVC to MPEG-2 transcoder [8]. In this algorithm, the H.264/AVC macroblock types are classified into three categories, skipped, inter and intra, for macroblocks encoded in SKIP mode, inter modes or intra modes, respectively. Then, in the transcoder, only the modes associated with the macroblock's class are tested. Another simple algorithm was proposed in the context of VC-1 to H.264/AVC transcoding [23]. Since the VC-1 codec offers fewer modes than H.264/AVC (for instance, only block sizes of 16 × 16 and 8 × 8 are used for motion compensation, and there is no skip mode for a macroblock), the transcoder uses both the macroblock type and the size of the transform used in VC-1, proposing heuristic rules, summarized in a look-up table, to decide which modes are tested in the outgoing H.264/AVC video.
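The class-based mapping of [8] can be illustrated with a minimal sketch. It assumes the macroblock type is available as an H.264/AVC mb_type name (e.g. P_Skip, P_L0_16x16, I_NxN); the target-side mode names in CANDIDATE_MODES are hypothetical placeholders and are not taken from [8] or from any MPEG-2 encoder.

```python
from enum import Enum


class MbClass(Enum):
    SKIPPED = "skipped"
    INTER = "inter"
    INTRA = "intra"


# H.264/AVC mb_type names for skipped macroblocks in P and B slices.
SKIP_TYPES = {"P_Skip", "B_Skip"}


def classify_h264_mb(mb_type: str) -> MbClass:
    """Map an H.264/AVC macroblock type onto one of the three classes."""
    if mb_type in SKIP_TYPES:
        return MbClass.SKIPPED
    if mb_type.startswith("I_"):  # I_NxN, I_16x16_..., I_PCM
        return MbClass.INTRA
    return MbClass.INTER


# Hypothetical candidate lists for the target encoder (placeholder names).
CANDIDATE_MODES = {
    MbClass.SKIPPED: ["mpeg2_skip"],
    MbClass.INTER: ["mpeg2_frame_mc", "mpeg2_field_mc"],
    MbClass.INTRA: ["mpeg2_intra"],
}


def modes_to_test(mb_type: str) -> list:
    """Only the target modes associated with the source class are tested."""
    return CANDIDATE_MODES[classify_h264_mb(mb_type)]
```

For example, a macroblock signalled as P_Skip would only trigger a test of the skip candidate, while any I_ type restricts the search to intra.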

Block mode statistics are used in an MPEG-4 to H.264/AVC transcoder [24]. In this work, several test sequences are transcoded using a trivial transcoder in order to gather macroblock mode conversion statistics. This information is then used to generate a look-up table, which is used during transcoding to decide which H.264/AVC modes are tested according to the MPEG-4 mode. A similar approach was used in an H.264/AVC to MPEG-4 transcoder [25].
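A statistics-driven look-up table of the kind described in [24] could be built roughly as follows. This is a sketch under assumptions: the input is a list of (source mode, target mode) pairs recorded while running a trivial transcoder, and the rule of keeping the most frequent target modes until a `coverage` fraction is reached is our illustrative choice, not necessarily the selection rule used in [24].

```python
from collections import Counter, defaultdict


def build_mode_lut(observations, coverage=0.9):
    """Build a source-mode -> list-of-target-modes look-up table.

    observations: iterable of (source_mode, target_mode) pairs collected by
    running a trivial (full decode + full re-encode) transcoder on training
    sequences. For each source mode, the most frequent target modes are kept
    until they account for at least `coverage` of that mode's observations.
    """
    counts = defaultdict(Counter)
    for src, dst in observations:
        counts[src][dst] += 1

    lut = {}
    for src, counter in counts.items():
        total = sum(counter.values())
        kept, covered = [], 0
        for dst, n in counter.most_common():  # most frequent first
            kept.append(dst)
            covered += n
            if covered >= coverage * total:
                break
        lut[src] = kept
    return lut
```

During transcoding, `lut[source_mode]` would then give the short list of target modes to test; a lower `coverage` trades rate-distortion performance for speed.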

Other reported solutions expand the idea of using block mode statistics. Machine learning algorithms are used to map the modes in the incoming bitstream and decide how the modes in the target codec are tested, in the context of MPEG-2 to H.264/AVC transcoding [26], [27], [28]. All these solutions are built around similar ideas: first, a few frames of test sequences are transcoded using a trivial transcoder. For these frames, some features are computed and stored for each macroblock, along with the optimal mode used to encode that macroblock. Then, a machine learning approach is used to generate an algorithm that maps features computed from the incoming bitstream into modes to be tested in the target codec. The training is performed offline, with the goal of developing a single, generalized mapping that can be used for transcoding any MPEG-2 video. In the first of these solutions [26], the features include the MPEG-2 macroblock coding mode, the coded block pattern, and the means and variances of each 4 × 4 residual block, giving a total of 37 features. In the other solutions [27], [28], the list of features was expanded to include the MPEG-2 DCT coefficients, neighbouring macroblock information, the coded block pattern, the motion vectors, the mean and variance of the 4 × 4 residual blocks, and the variance of the means and the mean of the variances for each group of means and variances, giving a total of 131 features. A similar approach was presented by some of the same authors in the context of a Wyner-Ziv to H.264/AVC transcoder [29]. In this solution, three features are used to generate the mapping algorithm: the SAD of the residual computed in the Wyner-Ziv decoding process, the length of the motion vector generated by the Wyner-Ziv decoding process, and information from the Wyner-Ziv reconstruction process; the same offline training process is used.
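As an illustration of the residual features mentioned for [26], the sketch below computes the mean and variance of each of the sixteen 4 × 4 blocks of a 16 × 16 macroblock residual. It assumes the residual is given as a 16 × 16 list of integers and uses the population variance; it reproduces only these 32 statistics, not the full feature set of [26].

```python
def residual_block_features(residual):
    """Per-4x4-block statistics of a 16x16 macroblock residual.

    residual: 16x16 list of lists (e.g. the decoded luma residual).
    Returns [mean_0, ..., mean_15, var_0, ..., var_15], with the 4x4 blocks
    visited in raster order. Variance is the population variance.
    """
    if len(residual) != 16 or any(len(row) != 16 for row in residual):
        raise ValueError("expected a 16x16 residual block")

    means, variances = [], []
    for by in range(0, 16, 4):
        for bx in range(0, 16, 4):
            vals = [residual[y][x]
                    for y in range(by, by + 4)
                    for x in range(bx, bx + 4)]
            m = sum(vals) / 16
            means.append(m)
            variances.append(sum((v - m) ** 2 for v in vals) / 16)
    return means + variances
```

Feature vectors like this, paired with the mode chosen by the trivial transcoder, form the training set for the offline classifier.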

A transcoding solution from H.264/AVC to HEVC has also been proposed by Zhang et al. [30]. In this work, a method to transcode intra frames is proposed, mainly based on selectively merging the incoming H.264/AVC intra modes and mapping them to larger HEVC CUs and PUs, according to the prediction direction found in the H.264/AVC bitstream. For inter pictures, it builds on power-spectrum-based rate-distortion optimization (PS-RDO) [31]. In this method, the cost of a motion vector in the transcoder is estimated from the motion vector variation and the power spectrum of the prediction signal resulting from that motion vector. The PS-RDO model is used to determine both the CU partitioning and the motion vector used for each PU.

III. PREVIOUS WORK

In this work, we build on our previous H.264/AVC to HEVC transcoder [3]. Two transcoders were proposed there: one based on MV reuse, and the other based on a metric called MV Variance Distance. Both are briefly discussed here, as they are used to evaluate the transcoders proposed in this paper.

All transcoding methods presented in this paper are based on mode mapping algorithms. Therefore, the main idea is to use the H.264/AVC information to decide the CU and PU partitioning, instead of testing every possible CU and PU. In this section, and for the remainder of this paper, testing a CU is defined as assessing the best way to encode that particular CU (i.e., deciding its parameters: PU partitioning, motion vectors, transforms, etc.) and producing a rate-distortion cost. Similarly, testing a motion vector is defined as evaluating the cost of that motion vector and comparing it with the costs of the motion vectors previously tested for that particular PU. In all cases, the default metrics of the HM reference software are used.

A. A Transcoder based on MV Reuse

This simple transcoder was presented before [3] for the sole purpose of evaluating the effect of the motion vector reuse technique [32], [33], which is ubiquitous in transcoding, in an H.264/AVC to HEVC transcoder. It is by no means designed to be a very efficient transcoder, but it is useful for identifying where the largest gains in transcoding efficiency can be achieved. The workflow of the algorithm is the same for each coding unit (CU) in HEVC, and it is based on two main ideas:

1) If any part of this CU was encoded in intra mode in H.264/AVC, then all possible intra and inter modes are tested; otherwise, only the inter modes are tested.

2) For any inter prediction unit (PU), all H.264/AVC motion vectors within the current PU are tested. The motion vectors are reused at integer-pixel level, without any further refinement at this level. Then, at half-pixel and quarter-pixel levels, the default HM search is applied (testing the eight neighbours at half-pixel level, then the eight neighbours at quarter-pixel level).
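The two rules above can be sketched as follows. Several assumptions are made for illustration only: motion vectors are integer pairs in quarter-pel units, "reused at integer-pixel level" is read as rounding each component to the nearest full-pixel position, and `cost` is a hypothetical callback standing in for the HM rate-distortion cost of a motion vector. The refinement step mimics the described eight-neighbour half-pel and quarter-pel search, not HM's actual code.

```python
from typing import Callable, Iterable, Tuple

MV = Tuple[int, int]  # motion vector (x, y) in quarter-pel units

# Offsets of the eight neighbours of a search position (centre excluded).
NEIGHBOURS = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
              if (dx, dy) != (0, 0)]


def mode_groups_for_cu(any_h264_intra: bool) -> Tuple[str, ...]:
    """Rule 1: intra modes are tested only if some H.264/AVC block was intra."""
    return ("inter", "intra") if any_h264_intra else ("inter",)


def to_integer_pel(mv: MV) -> MV:
    """Round each quarter-pel component to the nearest full pixel (halves up)."""
    return (4 * ((mv[0] + 2) // 4), 4 * ((mv[1] + 2) // 4))


def refine(centre: MV, cost: Callable[[MV], float], step: int) -> MV:
    """Test the centre and its eight neighbours at `step`; keep the cheapest."""
    candidates = [centre] + [(centre[0] + dx * step, centre[1] + dy * step)
                             for dx, dy in NEIGHBOURS]
    return min(candidates, key=cost)


def reuse_and_refine(h264_mvs: Iterable[MV],
                     cost: Callable[[MV], float]) -> MV:
    """Rule 2: test each reused MV at integer-pel level, then refine."""
    integer_candidates = sorted({to_integer_pel(mv) for mv in h264_mvs})
    best = min(integer_candidates, key=cost)
    best = refine(best, cost, step=2)  # half-pel: 2 quarter-pel units
    best = refine(best, cost, step=1)  # quarter-pel
    return best
```

No integer-pel search is performed around the reused vectors, which is where this transcoder saves motion estimation effort.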

Note that this transcoder reuses the incoming motion vectors, but not the partitioning. All inter modes available in HEVC are considered, including Asymmetric Motion Partitions (AMP) [34]; for these partitions, the AMP speed-up setting present in the HM4.0rc1 reference software [21] is enabled. The remaining HEVC settings are the same as in the low-delay configuration of HM4.0rc1, including the fast mode decision flag (which is enabled). Therefore, this transcoder saves complexity only by avoiding motion estimation (which is otherwise performed using a fast motion estimation algorithm based on the Enhanced Predictive Zonal Search, EPZS [35]) and by not testing all intra modes.

B. A Transcoder based on MV Variance Distance

This transcoder is based on a similarity metric, the MV Variance Distance, and uses this metric to decide how to test a particular CU. The MV Variance Distance produces a value υ ≥ 0 for each CU that can be tested in HEVC. It is based on the variance of the H.264/AVC motion vectors and is computed as:

$$\upsilon = \sqrt{\left(\sigma_x^2\right)^2 + \left(\sigma_y^2\right)^2} \qquad (1)$$

where $\sigma_x^2$ and $\sigma_y^2$ are the variances of each component of the H.264/AVC motion vectors within the CU. If the motion vectors do not have the same reference frame, they are scaled using the formula:

$$\mathbf{mv}_{n \to n-\beta} = \frac{\beta}{\alpha} \cdot \mathbf{mv}_{n \to n-\alpha} \qquad (2)$$

where n is the current frame, n − α is the reference frame used by the H.264/AVC motion vector and n − β is the target reference frame. If scaling is necessary, all motion vectors are scaled to the reference frame closest to the current frame. If any part of this CU was encoded using an intra mode, the metric does not produce a value. Before the metric is computed, the motion vectors are propagated to the 4 × 4 blocks (i.e., the minimum block size in H.264/AVC), and then the variance is calculated. This way, each motion vector is weighted according to the area it represents.
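The computation of υ for one CU can be sketched as below. Assumptions: each H.264/AVC partition overlapping the CU is described by its motion vector, its reference distance α and the number of 4 × 4 blocks it covers inside the CU; "the frame closest to the current frame" is taken as the smallest α among those partitions; the population variance is used; and υ is taken as the Euclidean norm of (σ_x², σ_y²), which is our reading of Eq. (1). `H264Partition` is a hypothetical helper type.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class H264Partition:
    mv: Tuple[float, float]  # H.264/AVC motion vector (x, y)
    ref_distance: int        # alpha: the partition references frame n - alpha
    num_4x4: int             # area covered inside the CU, in 4x4 blocks
    intra: bool = False


def _variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def mv_variance_distance(parts: Sequence[H264Partition]) -> Optional[float]:
    """Return upsilon for one CU, or None if any covered partition is intra."""
    if any(p.intra for p in parts):
        return None
    # Scale every MV to the closest reference frame (smallest distance), Eq. (2).
    beta = min(p.ref_distance for p in parts)
    xs, ys = [], []
    for p in parts:
        scale = beta / p.ref_distance  # 1.0 when all MVs share one reference
        # Propagate the scaled MV to each 4x4 block it covers (area weighting).
        xs.extend([p.mv[0] * scale] * p.num_4x4)
        ys.extend([p.mv[1] * scale] * p.num_4x4)
    return math.hypot(_variance(xs), _variance(ys))
```

For example, two halves of a 16 × 16 area with motion vectors (4, 0) and (0, 0) on the same reference give σ_x² = 4 and σ_y² = 0, hence υ = 4.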

The idea behind this metric is that, if a large area has a low value υ, all motion vectors in this area are similar, so it is more likely that this area will be encoded using a larger CU in HEVC, since a single motion vector is more likely to predict the whole CU accurately. On the other hand, if the same area has a high value υ, the motion vectors within it are very different, so it is less likely that this block will be encoded using a large CU in HEVC (i.e., it is more likely to be split). This way, it is possible to combine the information from different H.264/AVC macroblocks and make a decision for a large block in HEVC.

Two thresholds, $T_{low}$ and $T_{high}$, are used to decide how a particular CU will be tested. They define three different regions: $\upsilon \le T_{low}$, $T_{low} < \upsilon \le T_{high}$ and $\upsilon > T_{high}$.

The transcoder algorithm works independently for each CU, regardless of its size. The prediction units (PUs) that can be tested are divided into four groups: (i) SKIP; (ii) inter 2N × 2N; (iii) all remaining inter modes (2N × N, N × 2N, the AMP modes, and N × N); and (iv) the intra modes (2N × 2N, N × N and PCM). In addition, the transcoder can decide whether the CU will be split (if so, the CU is split into four sub-CUs, as usual). Then, depending on the value of the MV Variance Distance υ for this particular CU, four different settings can be used:

1) if the CU is considered similar (i.e., if $\upsilon \le T_{low}$), then only PU groups (i) and (ii) are tested and the CU is not split;

2) if the CU is considered dissimilar (i.e., if $\upsilon > T_{high}$), then only PU groups (i) and (iii) are tested and the CU is split;

3) if the CU is neither similar nor dissimilar (i.e., if $T_{low} < \upsilon \le T_{high}$), then PU groups (i), (ii) and (iii) (i.e., all inter modes) are tested and the CU is split; and

4) if the value υ cannot be computed (i.e., if an H.264/AVC partition within the CU was encoded as intra), then all PU groups are tested and the CU is split.
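The four settings amount to a small decision function of υ. In this sketch the thresholds `t_low` and `t_high` are simply passed in by the caller (how they are obtained, e.g. dynamically, is not covered here), `None` stands for "υ cannot be computed", and the group labels are illustrative strings.

```python
from typing import FrozenSet, Optional, Tuple

# PU groups as enumerated in the text (labels are illustrative).
SKIP = "(i) SKIP"
INTER_2Nx2N = "(ii) inter 2Nx2N"
OTHER_INTER = "(iii) remaining inter"
INTRA = "(iv) intra"


def cu_test_setting(upsilon: Optional[float], t_low: float,
                    t_high: float) -> Tuple[FrozenSet[str], bool]:
    """Return (PU groups to test, whether the CU is split) for one CU."""
    if upsilon is None:   # setting 4: an H.264/AVC partition was intra
        return frozenset({SKIP, INTER_2Nx2N, OTHER_INTER, INTRA}), True
    if upsilon <= t_low:  # setting 1: similar
        return frozenset({SKIP, INTER_2Nx2N}), False
    if upsilon > t_high:  # setting 2: dissimilar
        return frozenset({SKIP, OTHER_INTER}), True
    # setting 3: t_low < upsilon <= t_high, all inter modes
    return frozenset({SKIP, INTER_2Nx2N, OTHER_INTER}), True
```

Only setting 1 stops the recursion; in every other case the four sub-CUs are visited and the same function is applied to each of them.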

The algorithm starts from the largest CU size (64 × 64), computing the MV Variance Distance υ for that CU. Then, according to the υ value for the CU, the transcoder tests only
