topical media & game development


codecs

Back to the everyday reality of the technology that surrounds us. What can we expect to become of networked multimedia? Let one thing be clear

compression is the key to effective delivery

There can be no misunderstanding about this, although you may wonder why you need to bother with compression (and decompression). The answer is simple. You need to be aware of the size of what you put on the web and the demands that imposes on the network. Consider the table, taken from  [Codecs], below.

media                                         uncompressed   compressed
voice (8k samples/sec, 8 bits/sample)         64 kbps        2-4 kbps
slow motion video (10 fps, 176x120, 8 bits)   5.07 Mbps      8-16 kbps
audio conference (8k samples/sec, 8 bits)     64 kbps        16-64 kbps
video conference (15 fps, 352x240, 8 bits)    30.4 Mbps      64-768 kbps
audio, stereo (44.1k samples/sec, 16 bits)    1.5 Mbps       128 kbps-1.5 Mbps
video (15 fps, 352x240, 8 bits)               30.4 Mbps      384 kbps
video, CDROM (30 fps, 352x240, 8 bits)        60.8 Mbps      1.5-4 Mbps
video, broadcast (30 fps, 720x480, 8 bits)    248.8 Mbps     3-8 Mbps
HDTV (59.9 fps, 1280x720, 8 bits)             1.3 Gbps       20 Mbps
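
As a quick sanity check, the uncompressed rates follow directly from the sampling parameters. The Python sketch below (our own back-of-the-envelope illustration, not taken from  [Codecs]) reproduces some of the numbers in the table, assuming 8 bits per colour channel, that is 24 bits per pixel, for the video entries:

  # back-of-the-envelope check of the uncompressed rates above;
  # audio rate = sample rate x bits per sample x channels,
  # video rate = frame rate x width x height x bits per pixel

  def audio_bps(sample_rate, bits, channels=1):
      return sample_rate * bits * channels

  def video_bps(fps, width, height, bits_per_channel=8, channels=3):
      return fps * width * height * bits_per_channel * channels

  print(audio_bps(8000, 8))                  # voice: 64 kbps
  print(audio_bps(44100, 16, channels=2))    # stereo audio: ~1.4 Mbps
  print(video_bps(15, 352, 240))             # video conference: ~30.4 Mbps
  print(video_bps(59.9, 1280, 720))          # HDTV: ~1.3 Gbps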

You'll see that, bearing the various types of connection in mind

(phone: 56 Kb/s, ISDN: 64-128 Kb/s, cable: 0.5-1 Mb/s, DSL: 0.5-2 Mb/s)

you must be careful to select a media type that is suitable for your target audience. And then again, choosing the right compression scheme might make the difference between being able to deliver and not being able to do so. Fortunately,

images, video and audio are amenable to compression

Why this is so is explained in  [Codecs]. Compression is feasible because of, on the one hand, the statistical redundancy in the signal and, on the other hand, the irrelevance of particular information from a perceptual perspective. Redundancy comes about through both spatial correlation, between neighbouring pixels, and temporal correlation, between successive frames.

statistical redundancy in signal


  • spatial correlation -- neighbouring samples in a single frame
  • temporal correlation -- between successive segments (frames)

irrelevant information


  • from a perceptual point of view

B. Vasudev & W. Li, Memory management: Codecs
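
To get a feeling for why spatial correlation matters, consider the small illustration below (our own example, not taken from  [Codecs]): in a smooth signal, storing differences between neighbouring samples concentrates the data on a few small values, which a subsequent entropy coder can exploit.

  # spatial redundancy in a nutshell: a smooth run of samples ...
  samples = [100, 101, 103, 104, 104, 105, 107, 108]
  # ... becomes a first value plus small differences between neighbours
  diffs = [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]
  print(diffs)   # [100, 1, 2, 1, 0, 1, 2, 1] -- few distinct, small values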


The actual process of encoding and decoding may be depicted as follows:

codec = (en)coder + decoder



  signal  -> source coder   ->  channel coder    (encoding)
  
  signal  <- source decoder <-  channel decoder  (decoding)
  

Of course, the coded signal must be transmitted across some channel, but this is outside the scope of the coding and decoding issue. With this diagram in mind we can specify the codec design problem:

codec design problem


From a systems design viewpoint, one can restate the codec design problem as a bit rate minimization problem, subject to constraints concerning (among others):

  • specified levels of signal quality,
  • implementation complexity, and
  • communication delay (start coding -- end decoding).

...
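
To make the division of labour in the diagram above concrete, the toy sketch below (our own illustration, by no means a real codec) lets the source coder remove redundancy by run-length encoding, while the channel coder adds controlled redundancy back, here by naive repetition, for resilience to transmission errors:

  # toy source coder: run-length encoding removes statistical redundancy
  def source_encode(signal):
      runs = []
      for ch in signal:
          if runs and runs[-1][0] == ch:
              runs[-1][1] += 1               # extend the current run
          else:
              runs.append([ch, 1])           # start a new run
      return runs

  # toy channel coder: repeat every run three times, so a decoder could
  # recover from a single corrupted copy by majority vote
  def channel_encode(runs):
      return [run for run in runs for _ in range(3)]

  print(channel_encode(source_encode("aaaabbbcca")))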




compression methods

As explained in  [Codecs], there is a large variety of compression (and corresponding decompression) methods, including model-based methods, such as the object-based MPEG-4 method that will be discussed later, and waveform-based methods, for which we generally make a distinction between lossless and lossy methods. Huffman coding is an example of a lossless method, whereas methods based on Fourier transforms are generally lossy. Lossy means that actual data is lost, so that after decompression there may be a loss of (perceptual) quality.
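
Since Huffman coding is the canonical lossless method, a minimal sketch may be instructive. The toy implementation below (our own illustration, not taken from  [Codecs]) builds a prefix code from the symbol frequencies:

  import heapq
  from collections import Counter

  def huffman_codes(data):
      # each heap entry is (frequency, tiebreaker, tree); the tiebreaker
      # keeps heapq from ever having to compare trees directly
      heap = [(freq, i, sym) for i, (sym, freq) in enumerate(Counter(data).items())]
      heapq.heapify(heap)
      count = len(heap)
      while len(heap) > 1:
          # repeatedly merge the two least frequent subtrees
          f1, _, t1 = heapq.heappop(heap)
          f2, _, t2 = heapq.heappop(heap)
          heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
          count += 1
      codes = {}
      def walk(tree, prefix):
          if isinstance(tree, tuple):           # internal node: recurse
              walk(tree[0], prefix + "0")
              walk(tree[1], prefix + "1")
          else:                                 # leaf: record the code word
              codes[tree] = prefix or "0"
      walk(heap[0][2], "")
      return codes

  message = "abracadabra"
  codes = huffman_codes(message)
  encoded = "".join(codes[ch] for ch in message)
  print(codes)
  print(len(encoded), "bits, versus", 8 * len(message), "bits uncompressed")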

Leaving a more detailed description of compression methods to the diligent students' own research, it should come as no surprise that selecting a compression method involves a number of tradeoffs, with respect to, for example, coding efficiency, the complexity of the coder and decoder, and the signal quality. In summary, the following issues should be considered:

tradeoffs

  • resilience to transmission errors
  • degradations in decoder output -- lossless or lossy
  • data representation -- browsing & inspection
  • data modalities -- audio & video
  • transcoding to other formats -- interoperability
  • coding efficiency -- compression ratio
  • coder complexity -- processor and memory requirements
  • signal quality -- bit error probability, signal/noise ratio

For example, when we select a particular coder-decoder scheme, we must consider whether we can guarantee resilience to transmission errors and how these will affect the users' experience, and to what extent we are willing to accept degradations in decoder output, that is, lossy output. Another issue in selecting a method of compression is whether the (compressed) data representation allows for browsing & inspection. And, for particular applications, such as conferencing, we should worry about the interplay of data modalities, in particular audio & video. With regard to the many existing codecs and the variety of platforms, we may desire the possibility of transcoding to other formats to achieve, for example, the exchange of media objects between tools, as is already common for image processing tools.

compression standards

Given the importance of codecs it should come as no surprise that much effort has been put in developing standards, such as JPEG for images and MPEG for audio and video. Most of you have heard of MP3 (the audio format), and at least some of you should be familiar with MPEG-2 video encoding (which is used for DVDs).

Now, from a somewhat more abstract perspective, we can, again following  [Codecs], make a distinction between a pixel-based approach (coding the raw signal, so to speak) and an object-based approach, which uses segmentation and a more advanced scheme of description.

  • pixel-based -- MPEG-1, MPEG-2, H.320, H.324
  • object-based -- MPEG-4

As will be explained in more detail when discussing the MPEG-4 standard in section 3.2, an object-based approach has a number of advantages. There is, however, also a price to pay. Usually (object) segmentation does not come for free, but requires additional effort in the authoring and coding phases.

MPEG-1

To conclude this section on codecs, let's look in somewhat more detail at what is involved in coding and decoding a video signal according to the MPEG-1 standard.

MPEG-1 video compression uses both intra-frame analysis, for the compression of individual frames (which are like images), and inter-frame analysis, to detect redundant blocks or invariants between frames.

The MPEG-1 encoded signal itself is a sequence of so-called I, P and B frames.

frames


  • I: intra-frames -- independent images
  • P: predicted frames -- computed from the closest preceding I or P frame, with DCT-based coding of the differences
  • B: bidirectional frames -- computed from the two closest I or P frames

Decoding takes place by first reconstructing the I-frames, then the P-frames, and finally the B-frames. When a transmission error occurs, the I-frames, which stand on their own, provide a safeguard from which decoding can resume.
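
These dependencies also explain why decode order differs from display order: a B-frame cannot be decoded before both of its reference frames have arrived. The toy sketch below (our own illustration, not actual MPEG-1 decoding) derives a decode order from a display order:

  # toy illustration of I/P/B dependencies: every reference frame (I or P)
  # must be decoded before the B frames that precede it in display order
  display_order = list("IBBPBBPBBI")

  decode_order = []
  pending_b = []
  for frame in display_order:
      if frame in "IP":
          decode_order.append(frame)
          decode_order.extend(pending_b)   # B frames need both neighbours
          pending_b = []
      else:
          pending_b.append(frame)

  print("display:", "".join(display_order))  # IBBPBBPBBI
  print("decode: ", "".join(decode_order))   # IPBBPBBIBB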

Subsequent standards were developed to accommodate more complex signals and greater functionality. These include MPEG-2, for higher pixel resolution and data rate; MPEG-3, to support HDTV; MPEG-4, to allow for object-based compression; and MPEG-7, which supports content description. We will elaborate on MPEG-4 in the next section, and briefly discuss MPEG-7 at the end of this chapter.

example(s) -- gigaport

GigaPort is a project focusing on the development and use of advanced and innovative Internet technology. As can be read on the website, the project concentrates on research into next-generation networks and the implementation of a next-generation network for the research community.

Topics for research include:

GigaPort


  • optical network technologies - models for network architecture, optical network components and light path provisioning.
  • high performance routing and switching - new routing technologies and transport protocols, with a focus on scalability, stability and robustness when using data-intensive applications with a high bandwidth demand.
  • management and monitoring - incident response in hybrid networks (IP and optical combined) and technologies for network performance monitoring, measuring and reporting.
  • grids and access - models, interfaces and protocols for user access to network and grid facilities.
  • test methodology - effective testing methods and designing tests for new technologies and network components.

Internationally, the project claims the development of optical technology among its contributions, in particular lambda networking, that is, networking on a specific wavelength. Locally, the project has contributed to the introduction of fibre-optic networks in some major cities in the Netherlands.

research directions -- digital video formats

In the online version you will find a brief overview of digital video technology, written by Andy Tanenbaum, as well as some examples of videos of our university, encoded at various bitrates for different viewers.

What is the situation? For traditional television there are three standards. The American (US) standard, NTSC, is adopted in North America, South America and Japan. The European standard, PAL, which seems to be technically superior, is adopted by the rest of the world, except France and the Eastern European countries, which have adopted the other European standard, SECAM. An overview of the technical properties of these standards, taken with permission from Tanenbaum's account, is given below.

system      spatial resolution   frame rate   bit rate
NTSC        704 x 480            30 fps       243 Mbps
PAL/SECAM   720 x 576            25 fps       249 Mbps

Obviously, real-time distribution of a signal of more than 200 Mbps is not possible over the Internet connections available today. Even with compression on the fly, the signal would require 25 Mbps, or 36 Mbps with audio. Storing the signal on disk is hardly an alternative, considering that one hour would require some 12 gigabytes.
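
The storage claim is easily checked, using the 25 Mbps figure for the compressed signal:

  rate_mbps = 25                 # compressed video, without audio
  seconds = 3600                 # one hour
  gigabytes = rate_mbps * 1e6 * seconds / 8 / 1e9
  print(gigabytes)               # 11.25 GB, of the same order as the 12 GB quoted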

When looking at the differences between streaming video (that is, video transmitted in real time) and video downloaded to disk, we may observe the following tradeoffs:

item            streaming                        downloaded
bandwidth       equal to the display rate        may be arbitrarily small
disk storage    none                             the entire file must be stored
startup delay   almost none                      equal to the download time
resolution      depends on available bandwidth   depends on available disk storage
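
To make the startup-delay tradeoff concrete, consider the following back-of-the-envelope sketch, with assumed figures for clip length, encoding rate and link speed:

  clip_kbps, minutes, link_kbps = 384, 30, 1000   # assumed figures
  file_kbit = clip_kbps * minutes * 60            # size of the whole clip
  print(file_kbit / link_kbps / 60)               # download first: ~11.5 minutes
  # streaming instead starts almost immediately, since the link rate
  # (1 Mbps) exceeds the display rate (384 kbps)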

So, what are our options? Apart from the quite successful MPEG encodings, which have found their way into the DVD, there are a number of proprietary formats used for transmitting video over the Internet:

formats


  • QuickTime -- introduced by Apple in the early 1990s, for local viewing
  • RealVideo -- streaming video from RealNetworks
  • Windows Media -- a proprietary encoding scheme from Microsoft

Examples of these formats, encoded at various bitrates, are available at Video at VU.

Apparently, there is some need for digital video on the Internet, for example as publicity for attracting students, for watching news items at a time that suits you, and (now that digital video cameras have become affordable) for sharing details of your family life.

Is digital video all there is? Certainly not! In the next section, we will deal with standards that allow for incorporating (streaming) digital video as an element in a compound multimedia presentation, possibly synchronized with other items, including synthetic graphics. Online, you will find some examples of digital video used as texture maps in 3D space. These examples are based on the technology presented in section 7-3, and use the streaming video codec from RealNetworks that is integrated as a rich media extension in the blaxxun Contact 3D VRML plugin.

comparison of codecs

A review of codecs, including Envivio MPEG-4, QuickTime 6, RealNetworks 9 and Windows Media 9, was published in January 2005 by the European Broadcasting Union. The RealNetworks codec came out best, closely followed by Windows Media 9. Check it out!
