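Summary: This paper proposes and evaluates several transcoding algorithms from H.264/AVC to HEVC, in particular a novel transcoding architecture based on dynamic thresholding and content modeling. The architecture uses the first frames of a sequence to compute its parameters, so that the transcoder can "learn" the mapping for that particular sequence. Two types of mode mapping algorithms are described in detail: one applies dynamic thresholding to a single H.264/AVC coding parameter, and the other uses linear discriminant functions to map the incoming H.264/AVC coding parameters to HEVC partitions. Experiments show that, compared with existing solutions, the proposed transcoding methods significantly reduce rate-distortion loss while keeping a competitive complexity performance. Intended audience: researchers and developers with some knowledge of video compression standards such as H.264/AVC and HEVC. Use cases and goals: applications that need to convert large amounts of legacy H.264/AVC video efficiently, with the aim of improving transcoding efficiency and reducing quality loss. Other notes: the paper also examines how the number of training frames affects the effectiveness of the model, and the effect of applying the same model over a long period. It also proposes directions for future work, such as exploring more machine learning techniques and feature sets to improve transcoder performance.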
Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.
H.264/AVC to HEVC Video Transcoder based on
Dynamic Thresholding and Content Modeling
Eduardo Peixoto, Member, IEEE, Tamer Shanableh, Member, IEEE, and Ebroul Izquierdo, Senior Member, IEEE
Abstract—The new video coding standard, HEVC, was developed
to succeed the current standard, H.264/AVC, as the
state of the art in video compression. However, there is a lot of
legacy content encoded with H.264/AVC. This paper proposes and
evaluates several transcoding algorithms from the H.264/AVC to
the HEVC format. In particular, a novel transcoding architecture,
in which the first frames of the sequence are used to compute
the parameters so that the transcoder can “learn” the mapping
for that particular sequence, is proposed. Then, two types of
mode mapping algorithms are proposed. In the first solution,
a single H.264/AVC coding parameter is used to determine
the outgoing HEVC partitions using dynamic thresholding. The
second solution uses linear discriminant functions to map the
incoming H.264/AVC coding parameters to the outgoing HEVC
partitions. This paper contains experiments designed to study
the impact of the number of frames used for training in
the transcoder. Comparisons with existing transcoding solutions
reveal that the proposed work results in lower rate-distortion
loss at a competitive complexity performance.
Index Terms—Transcoding, HEVC, machine learning.
I. INTRODUCTION
The new video coding standard, called High Efficiency
Video Coding (HEVC) [1], was developed by the JCT-VC
group to replace the current H.264/AVC standard [2]. The
main goal of the HEVC codec is not to provide video
compression with different features, such as error correction or
scalability capabilities, but rather to significantly improve the
rate distortion performance, compared to the current standard,
H.264/AVC, in order to allow for new applications, such as
beyond high-definition resolutions (so called 4K, 3840×2160
pixels, and 8K, 7680 × 4320 pixels).
The motivation for an H.264/AVC to HEVC transcoder
is twofold: (i) to be ready to promote inter-operability for
the legacy video encoded in H.264/AVC format, when new
applications using the HEVC emerge; and (ii) to be able to
take advantage of the superior rate-distortion performance of
the HEVC. The first will be useful when the first applications
are launched that use the new standard, while the second could
be used straight away to migrate the abundant existing video
content encoded in the H.264/AVC format.
E. Peixoto is with Departamento de Engenharia Elétrica, Universidade de Brasília. [...] of Electronic Engineering and Computer Science, Queen Mary, University [...]. T. Shanableh is with the College of Engineering, American University of Sharjah.
This research was partially supported by the European Commission under contract FP7-287723 REVERIE.
In this paper, we build on our previous work [3], in which
we proposed an H.264/AVC to HEVC transcoder based on
a metric called Motion Vector Variance Distance, and
another work in which we proposed an MPEG-2 to HEVC
transcoder based on content modeling [4]. Here, we explore
other transcoding solutions based on a content-based modeling
approach, in which the transcoder adapts the transcoding
parameters based on the contents of the sequence being
transcoded, and further evaluate the effect of content-based
modeling on the transcoder efficiency.
By definition, transcoding is the process that converts from
one compressed bitstream (called the source or incoming
bitstream) to another compressed bitstream (called the transcoded
or outgoing bitstream) [5], [6], [7]. Several properties may
change during transcoding: the video format [8], [9], the
bitrate of the video [10], [11], the frame rate [12], [13], the
spatial resolution [14], [15], the coding tools used (e.g., one
bitstream might use B frames, while the other might not,
or scalability layers are added to the target bitstream) [16],
and even the insertion of new information into the video, such
as watermarking [17], hidden data [18] or a layer for error
resilience [19].
In transcoding, it is always possible to use a combination
of a suitable decoder and encoder in tandem, completely
decoding the incoming bitstream and then completely re-
encoding it in the target format. Here, this is defined as the
trivial transcoder. While this approach usually achieves high
quality of the transcoded sequence and can be used for any
target conditions, it is not efficient from the point of view of
complexity.
The two main categories of transcoders are: homogeneous
transcoding (the conversion of bitstreams within the same
format) and heterogeneous transcoding (i.e., between different
formats). Homogeneous transcoding is commonly used to
change the bitstream in order to adapt it to a new functionality,
such as a different bitrate or spatio-temporal resolution.
Heterogeneous transcoding can also provide the functionalities of
homogeneous transcoding, such as reduction of bit rate and
change of spatio-temporal resolution, but it is mainly defined
by the change of format. The H.264/AVC to HEVC transcoder
falls in the latter category.
In many solutions, heterogeneous transcoding is achieved
by completely decoding the source stream and re-encoding it
in the target format reusing information present in the source
bitstream to speed up the transcoding. This is known as the
cascaded pixel domain approach [5], [6].

This paper is organised as follows: Section II provides a
review of the relevant literature on heterogeneous transcoding,
especially on algorithms for mode mapping, which is the main
focus of this paper. Section III details our previous work
on the topic, which is used as a benchmark to evaluate the
transcoding options proposed here, while Section IV details
these transcoding options. Finally, Section V presents the
experiments and Section VI concludes the paper.
II. RELEVANT LITERATURE
A simple way of classifying the contributions reported in the
transcoding literature is to separate them into algorithms for
mode mapping, algorithms for motion vector approximation
and algorithms for motion vector refinement. The goal of
the mode mapping algorithms is to use information on the
incoming bitstream in order to avoid testing all modes for
the target format. On the other hand, the goal of the motion
vector approximation algorithms is to maximize the reuse
of the motion vectors in the incoming bitstream in order to
avoid costly motion estimation operations in the target encoder.
Finally, the goal of the motion vector refinement algorithms
is to improve the reused and approximated motion vectors so
that a good prediction can be achieved.
The HEVC is able to use the same reference frame structure
as the H.264/AVC [20] and, if this is the case, it would not
need motion vector approximation algorithms. At the same
time, for the HM reference software [21], [22], the impact
of the motion estimation module in the complexity is much
smaller than the impact for other implementations, and so MV
refinement algorithms would not yield the same gain as in
other transcoders. However, the HEVC uses a large number
of modes, making mode mapping algorithms very important
for the transcoder.
In order to reuse the coding mode of a particular macroblock
in the incoming bitstream, a range of algorithms have been
proposed. A simple mode mapping algorithm was proposed
in the context of an H.264/AVC to MPEG-2 transcoder [8]. In
this algorithm, the H.264/AVC macroblock types are classified
into three categories, skipped, inter and intra, for macroblocks
encoded in SKIP mode, inter or intra modes, respectively.
Then, in the transcoder, only the modes associated with these
classes are tested. Another simple algorithm, used in the
context of VC-1 to H.264/AVC transcoding, was proposed
[23]. In this work, since the VC-1 codec offers a smaller
number of modes than the H.264/AVC (for instance, only
block sizes of 16 × 16 and 8 × 8 are used for motion
compensation, and there is no skip mode for a macroblock),
the transcoder uses both the macroblock type and the size of
the transform used in VC-1, proposing some rules based on
heuristics, summarized in the form of a look-up table, to decide
which modes are tested in the outgoing H.264/AVC video.
The block mode statistics are used in a MPEG-4 to
H.264/AVC transcoder [24]. In this work, several test
sequences are transcoded using a trivial transcoder in order
to gather the macroblock mode conversion statistics. This
information is then used to generate a look-up table, which is
used during transcoding to decide which H.264/AVC modes
are tested according to the MPEG-4 mode. A similar approach
was used in an H.264/AVC to MPEG-4 transcoder [25].
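The statistics-driven look-up table idea can be illustrated with a minimal sketch. It is not the implementation of the cited works: the mode names, the `coverage` parameter and the rule of keeping the most frequent target modes until a given fraction of the observations is covered are all assumptions made here for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def build_mode_lut(observations: Iterable[Tuple[str, str]],
                   coverage: float = 0.9) -> Dict[str, List[str]]:
    """Build a source-mode -> candidate-target-modes table.

    `observations` holds (source_mode, best_target_mode) pairs gathered by
    running a trivial transcoder on training sequences. For each source mode,
    the most frequent target modes are kept until they account for at least
    `coverage` of the observations (an assumed selection rule).
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for src_mode, dst_mode in observations:
        counts[src_mode][dst_mode] += 1

    lut: Dict[str, List[str]] = {}
    for src_mode, counter in counts.items():
        total = sum(counter.values())
        kept: List[str] = []
        covered = 0
        for dst_mode, n in counter.most_common():
            kept.append(dst_mode)
            covered += n
            if covered / total >= coverage:
                break
        lut[src_mode] = kept
    return lut


# Hypothetical training statistics for one source mode.
stats = ([("inter16x16", "16x16")] * 8
         + [("inter16x16", "16x8")]
         + [("inter16x16", "8x8")])
lut = build_mode_lut(stats, coverage=0.85)
```

During transcoding, only the modes listed in `lut[source_mode]` would be tested, which is where the complexity saving comes from; a higher `coverage` trades speed for rate-distortion performance.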
In other reported solutions, the idea of using the block mode
statistics is expanded. Machine learning algorithms are used
to map the modes in the incoming bitstream and decide how
the modes in the target codec are tested, in the context of
MPEG-2 to H.264/AVC transcoding [26], [27], [28]. All these
solutions are built around similar ideas: first, a few frames of
test sequences are transcoded using a trivial transcoder. For
these frames, some features are computed and stored for each
macroblock, along with the optimal mode used to encode said
macroblocks. Then, a machine learning approach is used to
generate an algorithm to map features computed using the
incoming bitstreams into modes to be tested in the target
codec. The training is performed offline, with the goal of
developing a single, generalized, mapping that can be used for
transcoding any MPEG-2 video. In the first of these solutions
[26], the features used include the MPEG-2 macroblock coding
mode, the coded block pattern, and the means and variances
for each 4×4 residual block, generating a total of 37 features.
In the other solutions [27], [28], the list of features was
expanded to include the MPEG-2 DCT coefficients, neighbouring
macroblock information, coded block pattern, the motion
vectors, the mean and variance of the 4 × 4 residual blocks,
and the variance of the means and mean of variances for
each group of means and variances, generating a total of 131
features. A similar approach was presented by some of the
same authors in the context of a Wyner-Ziv to H.264/AVC
transcoder [29]. In this solution, three features are used to
generate the mapping algorithm, namely the SAD of the residual
computed in the Wyner-Ziv decoding process, the length of the
motion vector generated by the Wyner-Ziv decoding process,
and information from the Wyner-Ziv reconstruction process,
and the same offline training process is used.
A transcoding solution from H.264/AVC to HEVC has
also been proposed by Zhang et al. [30]. In this work, a
method to transcode intra frames is proposed, mainly based
on selective merging of the incoming H.264/AVC intra modes
and mapping them to larger HEVC CUs and PUs, according
to the prediction direction found in the H.264/AVC bitstream.
For inter pictures, it builds on the power-spectrum based rate-
distortion optimization (PS-RDO) [31]. In this method, the
cost of a motion vector in the transcoder is estimated from the
motion vector variation and power-spectrum of the prediction
signal resulting from that motion vector. The PS-RDO model
is used to determine both the CU partitioning and the motion
vector used for each PU.
III. PREVIOUS WORK
In this work, we build on our previous work on an H.264/AVC
to HEVC transcoder [3]. Two transcoders were proposed: one
is based on MV reuse; and the other is based on a metric
called MV Variance Distance. Both are briefly discussed here,
as they are used to evaluate the proposed transcoders in this
paper.
All transcoding methods presented in this paper are based
on mode mapping algorithms. Therefore, the main idea is to

use the H.264/AVC information in order to decide the CU
and PU partitioning, instead of testing every possible CU and
PU. In this section, and for the remainder of this paper, the
testing of a CU is defined as the assessment of the best way to
encode that particular CU (i.e., deciding the parameters: PU
partitioning, motion vectors, transforms, etc.) and producing
a rate-distortion cost. Similarly, the testing of a motion vector
is defined as the evaluation of the cost of that motion vector,
and comparing this cost with the motion vectors that were
previously tested for that particular PU. In all cases, the default
metrics used in the HM reference software are used.
A. A Transcoder based on MV Reuse
This simple transcoder was presented before [3] for the sole
purpose of evaluating the effect of the motion vector reuse
technique [32], [33] in an H.264/AVC to HEVC transcoder,
which is a technique that is ubiquitous in transcoding. It is
not by any means designed to be a very efficient transcoder,
but it is useful to identify the areas where the largest gain, in
terms of transcoding efficiency, can be achieved. The workflow
of the algorithm is the same for each coding unit (CU) in the
HEVC, and it is based on two main ideas:
1) If any part of this CU was encoded in intra mode in
H.264/AVC, then all possible intra and inter modes are
tested; otherwise, only the inter modes are tested.
2) For any inter prediction unit (PU), all H.264/AVC motion
vectors within the current PU are tested. The motion
vectors are reused at integer-pixel level, without any
further refinement at this level. Then, at half-pixel and
quarter-pixel, the default HM search is applied (testing
the eight neighbours at half-pixel level, then the eight
neighbours at quarter-pixel level).
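The two ideas above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the HM-based implementation: the `MvBlock` record, the rectangle representation, the assumption that incoming motion vectors are stored in quarter-pixel units, and the round-to-nearest rule used to obtain integer-pixel candidates are all hypothetical. The subsequent half-pixel and quarter-pixel search is left to the encoder and is not shown.

```python
from math import floor
from typing import List, NamedTuple, Set, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


class MvBlock(NamedTuple):
    """One decoded H.264/AVC partition (hypothetical record layout)."""
    x: int
    y: int
    w: int
    h: int
    mv_x: int = 0        # assumed to be in quarter-pixel units
    mv_y: int = 0
    intra: bool = False


def overlaps(b: MvBlock, rect: Rect) -> bool:
    """True if the partition intersects the given rectangle."""
    x, y, w, h = rect
    return b.x < x + w and x < b.x + b.w and b.y < y + h and y < b.y + b.h


def modes_to_test(blocks: List[MvBlock], cu: Rect) -> Set[str]:
    """Idea 1: test intra modes only if H.264/AVC used intra inside the CU."""
    if any(b.intra for b in blocks if overlaps(b, cu)):
        return {"inter", "intra"}
    return {"inter"}


def to_integer_pel(q: int) -> int:
    """Round a quarter-pixel component to the nearest integer pixel (assumed rule)."""
    return floor(q / 4 + 0.5)


def integer_pel_candidates(blocks: List[MvBlock], pu: Rect) -> List[Tuple[int, int]]:
    """Idea 2: collect the distinct H.264/AVC motion vectors lying within the PU."""
    candidates: List[Tuple[int, int]] = []
    for b in blocks:
        if b.intra or not overlaps(b, pu):
            continue
        mv = (to_integer_pel(b.mv_x), to_integer_pel(b.mv_y))
        if mv not in candidates:
            candidates.append(mv)
    return candidates


blocks = [
    MvBlock(0, 0, 16, 16, 5, -6),
    MvBlock(16, 0, 16, 16, 6, 0),
    MvBlock(0, 16, 16, 16, intra=True),
]
top_half_candidates = integer_pel_candidates(blocks, (0, 0, 32, 16))
```

Each candidate would then be evaluated with the encoder's usual cost metric, and only the best one is passed on to the sub-pixel search.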
Note that this transcoder reuses the incoming motion vec-
tors, but not the partitioning. All inter modes available in
the HEVC are considered, including the Asymmetric Motion
Partition, AMP [34] - for these partitions, the AMP speed-
up setting, present in the HM4.0rc1 reference software [21],
is enabled. The remaining HEVC settings are the same as
the low-delay configuration for HM4.0rc1, including the
fast mode decision flag (which is enabled). Therefore, this
transcoder saves complexity only by avoiding the motion
estimation (which is performed using a fast motion estimation algorithm,
based on the Enhanced Predictive Zonal Search, EPZS [35]),
and by not testing all intra modes.
B. A Transcoder based on MV Variance Distance
This transcoder is based on a similarity metric, the MV
Variance Distance, and, according to this metric, makes the
decision of how to test a particular CU. The MV Variance
Distance metric produces a value υ ≥ 0 for each CU that can
be tested in the HEVC. This metric is based on the variance
of the H.264/AVC motion vectors, and it is computed as:
$$\upsilon = \sqrt{\left(\sigma_x^2\right)^2 + \left(\sigma_y^2\right)^2} \tag{1}$$
where $\sigma_x^2$ and $\sigma_y^2$ are the variances of each component of
the H.264/AVC motion vectors within the CU. If the motion
vectors do not have the same reference frame, they are scaled
using the formula:
$$mv_{n \to n-\beta} = \frac{\beta}{\alpha} \cdot mv_{n \to n-\alpha} \tag{2}$$
where n is the current frame, n − α is the reference frame
used by the H.264/AVC motion vector and n − β is the target
reference frame. If the scaling is necessary, then all motion
vectors are scaled to the frame which is closest to the current
frame. If any part of this CU was encoded using an intra
mode, then the metric does not produce a value. Before the
metric is computed, the motion vectors are propagated to the
4 × 4 blocks (i.e., the minimum size in H.264/AVC), and then
the variance is calculated. This way, the motion vectors are
weighted according to the area that they represent.
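The computation described above can be sketched as follows. This is a minimal sketch, not the authors' implementation: the `Partition` record is a hypothetical layout, `ref_dist` stands for the temporal distance α to the reference frame, and the use of the population variance (dividing by the number of 4 × 4 blocks) is an assumption, since the text does not state which estimator is used.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Iterable, List, Optional


@dataclass
class Partition:
    """One H.264/AVC partition inside the CU (hypothetical record layout)."""
    width: int           # in pixels, a multiple of 4
    height: int          # in pixels, a multiple of 4
    mv_x: float = 0.0    # horizontal motion vector component
    mv_y: float = 0.0    # vertical motion vector component
    ref_dist: int = 1    # alpha: distance to the reference frame (>= 1)
    intra: bool = False


def _variance(values: List[float]) -> float:
    """Population variance (assumed estimator)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def mv_variance_distance(partitions: Iterable[Partition]) -> Optional[float]:
    """Return the MV Variance Distance of Eq. (1), or None if it is undefined."""
    parts = list(partitions)
    if not parts or any(p.intra for p in parts):
        return None  # no value is produced when any part of the CU is intra

    beta = min(p.ref_dist for p in parts)  # reference frame closest to the current one
    xs: List[float] = []
    ys: List[float] = []
    for p in parts:
        scale = beta / p.ref_dist  # Eq. (2); equals 1 when no scaling is needed
        n_blocks = (p.width // 4) * (p.height // 4)  # propagate to 4x4 blocks
        xs.extend([p.mv_x * scale] * n_blocks)
        ys.extend([p.mv_y * scale] * n_blocks)

    return sqrt(_variance(xs) ** 2 + _variance(ys) ** 2)


# Two 16x8 partitions whose horizontal components differ by 4 units.
example = mv_variance_distance([
    Partition(16, 8, mv_x=0.0, mv_y=0.0),
    Partition(16, 8, mv_x=4.0, mv_y=0.0),
])
```

In this example the horizontal variance is 4 and the vertical variance is 0, so the metric evaluates to 4. Replicating each vector once per 4 × 4 block is what weights it by the area it covers.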
The idea of using this metric is that, if a large area has a
low value υ, it means that all motion vectors in this area are
similar, and thus it is more likely that this partition will be
encoded using a larger CU in the HEVC, as it is more likely
that a single motion vector will accurately predict the whole
CU. On the other hand, if the same area has a high value υ,
then the motion vectors within this area are very different, and
thus it is less likely that this block will be encoded using a
large CU in the HEVC (meaning it is more likely that it will
be split). This way, it is possible to combine the information
for different H.264/AVC macroblocks and make a decision for
a large block in the HEVC codec.
Two thresholds are used to decide how a particular CU will
be tested, namely $T_{\text{low}}$ and $T_{\text{high}}$, which define three different
regions: $R_1$ ($\upsilon \le T_{\text{low}}$), $R_2$ ($T_{\text{low}} < \upsilon \le T_{\text{high}}$) and $R_3$ ($\upsilon > T_{\text{high}}$).
The transcoder algorithm works independently for each CU,
regardless of the CU size. The possible prediction units (PUs)
that can be tested are divided into four groups: (i) SKIP; (ii) inter
2N×2N; (iii) all remaining inter modes (2N×N, N×2N, the
AMP modes, and N×N); and (iv) the intra modes (2N×2N,
N×N and PCM). In addition, the transcoder can decide if the
CU will be split or not (if so, the CU is split in four sub-CUs,
as usual). Then, depending on the value of the MV Variance
Distance υ for this particular CU, four different settings can
be used:
1) if the CU is considered similar (i.e., if $\upsilon \le T_{\text{low}}$), then
only the PU groups (i) and (ii) will be tested and the
CU will not be split;
2) if the CU is considered dissimilar (i.e., if $\upsilon > T_{\text{high}}$),
then only the PU groups (i) and (iii) will be tested, and
the CU will be split;
3) if the CU is neither similar nor dissimilar (i.e., if
$T_{\text{low}} < \upsilon \le T_{\text{high}}$), then the PU groups (i), (ii) and (iii) (i.e.,
all inter modes) will be tested and the CU will be split;
and
4) if the value υ cannot be computed (i.e., if at least
one H.264/AVC partition within the CU was encoded as
intra), then all PU groups are tested and the CU is split.
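The four settings above amount to a small decision function, sketched below. The group labels, the `CuDecision` type and the threshold values used in the example are hypothetical; only the mapping from υ to the tested PU groups and the split decision follows the rules listed above.

```python
from typing import FrozenSet, NamedTuple, Optional

# Labels for the four PU groups (i)-(iv); the names are illustrative only.
SKIP = "skip"                # (i)
INTER_2Nx2N = "inter_2Nx2N"  # (ii)
INTER_OTHER = "inter_other"  # (iii) 2NxN, Nx2N, AMP modes and NxN
INTRA = "intra"              # (iv)


class CuDecision(NamedTuple):
    pu_groups: FrozenSet[str]  # PU groups to be tested for this CU
    split: bool                # whether the four sub-CUs are also tested


def decide_cu_testing(upsilon: Optional[float],
                      t_low: float,
                      t_high: float) -> CuDecision:
    """Map the MV Variance Distance of a CU to the set of tests to perform."""
    if upsilon is None:
        # Setting 4: the metric is undefined (intra content in the CU).
        return CuDecision(frozenset({SKIP, INTER_2Nx2N, INTER_OTHER, INTRA}), True)
    if upsilon <= t_low:
        # Setting 1: similar motion, region R1.
        return CuDecision(frozenset({SKIP, INTER_2Nx2N}), False)
    if upsilon > t_high:
        # Setting 2: dissimilar motion, region R3.
        return CuDecision(frozenset({SKIP, INTER_OTHER}), True)
    # Setting 3: intermediate case, region R2.
    return CuDecision(frozenset({SKIP, INTER_2Nx2N, INTER_OTHER}), True)


# Example with arbitrary thresholds chosen only for illustration.
decision = decide_cu_testing(0.5, t_low=1.0, t_high=10.0)
```

Note that SKIP is tested in every setting, and that only the similar case stops the recursion into smaller CUs, which is the main source of complexity savings.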
The algorithm starts from the largest CU size (64 × 64),
computing the MV Variance Distance υ for that CU. Then,
according to the υ value for the CU, the transcoder tests only