Low-level audio processing with QtMultimedia

Published Monday May 10th, 2010
Posted in Multimedia

One of the new features introduced in Qt 4.6 is the QtMultimedia module. The ‘big picture’ view of QtMultimedia has been presented in an earlier post to this blog, and has been recently updated. Here I want to take a closer look at the low-level audio APIs in particular, to discuss the types of applications for which they may be useful.

In a following post, I’ll illustrate this by describing a new demo application which has been added to Qt. To whet your appetite, here’s a picture of it:

Screenshots of spectrum analyser demo app running on Symbian and Windows

Anatomy of an audio stack

One way to explain the intention of the QtMultimedia audio APIs is to take a step back and look at what happens inside an audio playback software stack. For now, let’s think about an archetypal stack, rather than the software which is running on any particular platform. While the implementations vary considerably between platforms, the concepts are broadly similar, at least for the purposes of this discussion.

When the user hits the ‘play’ button on a music track, the following may be among the operations which take place under the covers:

  • Acquisition of hardware resources required by the use case, e.g. output devices (speaker / headphones), coprocessors used for decoding or effects processing, etc.
    • This is particularly important on embedded devices, where:
      • Resources can be highly constrained, e.g. the device may only have sufficient processing power to decode one MP3 stream at a time
      • There may be a requirement to ensure that multimedia use cases don’t interfere with other aspects of the device, for example the fact that music is being played should not prevent the ringtone from being played when a mobile phone receives an incoming call
  • Reading the clip contents either from the file system, or by streaming from the network
  • Decryption of DRM-protected content
  • Parsing the clip’s container format
    • Extracting metadata such as artist, track name, format etc
    • Extracting the encoded audio bytestream
  • Decoding the bytestream to generate a raw PCM stream
  • Applying effects
  • Mixing with other audio streams which are being played concurrently
  • Routing to the correct output device
  • Audio rendering
    • Digital to analogue conversion (DAC) – converting the PCM bytestream into a varying voltage signal
    • Amplification

The following picture tries to show how the components which perform these operations may be related in our imaginary audio stack. The exact arrangement may vary considerably between implementations. In some cases, these differences are simply the result of differing philosophies or approaches to the design of the audio stack. In others, the configuration of audio components may be dictated by hardware constraints. For example, on many embedded devices, audio processing may be performed by a dedicated coprocessor. The physical output connection of this processor may constrain what processing can happen downstream of it – for example, if the MP3 codec runs on a processor whose output is wired directly to the DAC, then no effects can be inserted into the PCM part of the audio path.

Anatomy of an audio stack: the boxes represent components of the native audio stack. The red bars represent APIs which expose the functionality of the native stack, at different levels of abstraction.

The ‘high level API’ deals only with control, not data. This is to say that no buffers of audio data – be it MP3, PCM or any other format – pass between the client and the stack via this interface. Instead, the client describes the audio data which it wishes to process in the form of a descriptor such as a filename or URL. The processing itself is controlled via high-level commands such as play / record, pause, stop, and seek. On top of these commands, there may be another layer which provides features such as playlist management.

Parameters of the processing may be exposed to the client: these will almost definitely include volume / gain; more advanced parameters such as balance, equalizer, and control over audio effects may also be available. In addition, the API – or perhaps a companion API at a similar level of the stack – may allow the client control over the audio routing, by providing information about which audio input / output devices are currently available, and allowing the client to select which of them is used for a given playback / recording session.

In contrast with the above, the ‘low level API’ deals directly with the content of the audio stream. Buffers of audio data are exchanged between the client and the lower levels of the audio stack. The data formats which can be used at this level may vary depending on the platform: most, if not all, audio stacks will allow the client to play or record PCM streams, while support for processing streams of compressed data may or may not be provided.

Because this API deals directly with the data stream rather than with an abstract clip descriptor, some control commands – notably seek – do not make sense at this level. Others, such as pause, still have a place: although the client is providing or consuming data via the API, it is not typically connected directly to the audio hardware itself. There must usually be a level of buffering between the two, in order to ensure that, should the client temporarily stop processing (for example due to its thread yielding to, or being pre-empted by, another one), the audio hardware can continue to read data from, or write data into, memory.

The set of audio parameters which are available to the client may be restricted in comparison with those provided by the higher-level API – volume / gain may well be the only parameter which this interface exposes. Similarly, the client may be given less control over audio routing than the higher-level API affords. This would be the case, for example, if the low-level API represented a specific physical device, while the high-level API represented the audio subsystem as a whole.

This description of the high level API should sound familiar to those who have used Qt’s Phonon API (at least if we only think about audio playback – Phonon does not support recording). The functional scope of the high level API may, however, go significantly beyond that of Phonon, as discussed previously in Justin’s post.
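To make the contrast concrete, here is a minimal sketch of control-only playback through Phonon. The file path is just a placeholder; the point is that the client only issues commands and never touches a buffer of audio data.

```cpp
#include <QtGui/QApplication>
#include <phonon/mediaobject.h>
#include <phonon/mediasource.h>

int main(int argc, char *argv[])
{
    QApplication app(argc, argv);
    app.setApplicationName("Phonon playback sketch");

    // The client hands Phonon a clip descriptor (here a file path) and then
    // drives playback with high-level commands; no audio buffers cross the API.
    Phonon::MediaObject *player =
        Phonon::createPlayer(Phonon::MusicCategory,
                             Phonon::MediaSource("/path/to/track.mp3"));
    player->play();   // later: player->pause(), player->seek(ms), player->stop()

    return app.exec();
}
```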

The low level API, on the other hand, corresponds to QtMultimedia audio. Before looking at the latter in more detail, it’s worth emphasising one point regarding the relationship between Phonon and QtMultimedia: current Phonon backends do not use QtMultimedia. The implementations of these two APIs are currently completely separate – at least down as far as the native API level.

Looking forward, the QtMobility project is delivering a suite of high-level multimedia APIs. These provide a similar level of abstraction to Phonon, but include features which Phonon lacks, and afford additional flexibility. For a recent update on the status and availability of these APIs, see this post.

The QtMultimedia audio APIs

So, having looked at audio APIs from an abstract standpoint, let’s look at the QtMultimedia audio APIs themselves. These consist of the following four classes:

  • QAudioFormat
    • Describes a raw audio stream in terms of parameters such as sample rate, number of channels, sample size, byte order and codec. Objects of this class are used to specify the format in which QAudioInput and QAudioOutput process data.
  • QAudioDeviceInfo
    • Represents an audio device such as a loudspeaker, headset or microphone. Describes its capabilities, in terms of which audio formats the device is able to process. A static function, availableDevices(), is provided in order to allow the client to query the set of audio devices present in the system.
  • QAudioInput
    • Allows the client to receive data, in a specified audio format, from a specified audio input device. Data is transferred via a QIODevice object, with QAudioInput offering two modes of operation (a minimal usage sketch follows this list):

      • “pull mode”: the client provides a QIODevice by calling void start(QIODevice*). No further intervention is required from the client in order to manage the data flow.
      • “push mode”: the QAudioInput object provides a QIODevice via QIODevice* start(). The client must then listen for the readyRead() signal, and read() the new data when it arrives.

    The client can control some aspects of the latency* – i.e. the amount of time between audio being sampled by the hardware and the corresponding data reaching the application – by calling setBufferSize(). Supported buffer sizes may vary from platform to platform, but most allow sub-10ms latencies for all supported formats.

    The processedUSecs() function allows the client to determine how much data has been captured by the audio device. At any given time, the difference between this and the amount of data which has been received via the QIODevice indicates the amount of latency.

    * Note however that the other source of latency – the time taken for the audio device to prepare to capture data – is outside the control of the client. This initialization happens asynchronously following a call to start(), and its completion is indicated by a stateChanged(QAudio::State) signal.

  • QAudioOutput
    • The corresponding interface for audio output, which provides a similar pair of “pull” / “push” overloads of start().
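To show how these classes fit together, here is a minimal pull-mode capture sketch: it lists the available input devices, negotiates a format, and records a few seconds of raw PCM to a file. The format values, the file name and the five-second duration are arbitrary choices for the example, and error handling is omitted.

```cpp
#include <QtCore/QCoreApplication>
#include <QtCore/QDebug>
#include <QtCore/QFile>
#include <QtCore/QTimer>
#include <QtMultimedia/QAudioDeviceInfo>
#include <QtMultimedia/QAudioFormat>
#include <QtMultimedia/QAudioInput>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    // List the capture devices which the platform reports.
    foreach (const QAudioDeviceInfo &info,
             QAudioDeviceInfo::availableDevices(QAudio::AudioInput))
        qDebug() << "Input device:" << info.deviceName();

    // Describe the PCM format we would like to capture.  These values are
    // assumptions made purely for the example.
    QAudioFormat format;
    format.setFrequency(8000);                        // sample rate, Hz
    format.setChannels(1);
    format.setSampleSize(16);
    format.setCodec("audio/pcm");
    format.setByteOrder(QAudioFormat::LittleEndian);
    format.setSampleType(QAudioFormat::SignedInt);

    QAudioDeviceInfo device = QAudioDeviceInfo::defaultInputDevice();
    if (!device.isFormatSupported(format))
        format = device.nearestFormat(format);        // fall back gracefully

    // "Pull mode": hand QAudioInput a QIODevice and let it write captured
    // data into it without further intervention from the client.
    QFile out("capture.raw");
    out.open(QIODevice::WriteOnly);

    QAudioInput input(device, format);
    input.start(&out);
    // input.processedUSecs() reports how much audio the device has captured
    // so far; comparing it with the amount of data written to 'out' gives an
    // estimate of the current latency.

    // Record for five seconds, then exit.
    QTimer::singleShot(5000, &app, SLOT(quit()));
    return app.exec();
}
```

Push mode differs only in that the no-argument start() returns a QIODevice, from which the client read()s on each readyRead() signal.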

Work in progress

It’s worth saying at this point that there are low-level audio use cases which QAudioInput and QAudioOutput don’t cover. Or to put it another way, some of the functionality towards the bottom part of the diagram above is not currently exposed by QtMultimedia. This missing functionality includes the following:

  • Ability to resume following pre-emption

    While QAudioOutput and QAudioInput allow the client to suspend and resume processing, on some occasions suspension may be caused by events elsewhere in the system rather than being requested by the client.

    On some platforms – particularly in the embedded space – the concept of audio pre-emption is important. For example, on a mobile phone, music playback may need to be terminated by the system when a call is received, so that the ringtone can be played. Once the call has ended (or has been rejected by the user), music playback can be resumed; whether this happens automatically, or requires the user to resume playback, depends on the device in question.

    For a QAudioOutput or QAudioInput client which is pre-empted in this way, we need (a) a way to tell the client that it has been pre-empted, and (b) a way to notify the client when the audio resources which it needs have become available once again, so that it can either auto-resume, or just re-enable the ‘play’ button in its UI.

  • Notification of changes in audio device availability

    Although QtMultimedia allows the client to query the list of available devices via QAudioDeviceInfo::availableDevices(), there is no signal via which the client can be notified when this list changes. This could happen, for example, when the user plugs in or disconnects a headset or an HDMI cable. When it does, the platform may decide that the default audio device has changed, and automatically re-route. Because the application is not notified, it cannot decide whether this is actually the behaviour which it wants: for example, the application developer may wish to pause audio playback when a headset is disconnected, rather than have its audio automatically re-routed to a loudspeaker.

  • Seamless re-routing

    This is related to the previous point; specifically, what the client can do when notified that an audio device has become (un)available. While the client can select which device to use by providing a QAudioDeviceInfo to the QAudioInput or QAudioOutput constructor, there is no way to change the device during recording / playback. The only way to re-route is therefore to tear down the audio session by destroying the QAudioOutput object, and then create a new one with the desired device (a sketch of this workaround follows the list). The problem with this is that the audio subsystem to which QAudioOutput provides access may be buffering audio data close to the hardware. Tear-down and re-creation therefore at best causes a gap in playback while the system re-buffers, and may also involve some audio data being lost altogether.

  • Volume

    Finally, QAudioOutput and QAudioInput do not provide any way to query or set the volume / gain.
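To illustrate the tear-down and re-create workaround described under ‘Seamless re-routing’, here is a rough sketch. The helper function is hypothetical, and it glosses over the fact that any data already buffered by the old session is simply lost.

```cpp
#include <QtCore/QIODevice>
#include <QtMultimedia/QAudioDeviceInfo>
#include <QtMultimedia/QAudioFormat>
#include <QtMultimedia/QAudioOutput>

// Hypothetical helper: switch playback to a different output device by
// destroying the current QAudioOutput and creating a new one.  Audio data
// already buffered close to the hardware by the old session is lost, and
// there will be an audible gap while the new session re-buffers.
QAudioOutput *rerouteOutput(QAudioOutput *oldOutput,
                            const QAudioDeviceInfo &newDevice,
                            QIODevice *pcmSource)
{
    const QAudioFormat format = oldOutput->format();

    oldOutput->stop();          // tear down the existing session
    delete oldOutput;

    // Re-create the session on the newly selected device ("pull mode": the
    // output reads PCM data from pcmSource as it needs it).
    QAudioOutput *newOutput = new QAudioOutput(newDevice, format);
    newOutput->start(pcmSource);
    return newOutput;
}
```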

Plans for adding volume control to QtMultimedia are under way; discussions around the other topics are ongoing. As always, we’d value feedback on these or any other aspects of Qt – please get involved by commenting on this post or via the #qt-labs IRC channel.

When you should go low

So, given the choice between the high-level Phonon API and the low-level QtMultimedia API, what considerations can help you decide which one to use?

Well, let’s start with some easy wins – there are some use cases when Phonon is clearly the right choice:

  • Development of a music player, when the Phonon backends on your targeted platforms already provide:
    • Codecs for most or all of the audio formats your users are likely to want to play
    • Support for most or all of the protocols via which your users want to stream music

In these cases, there is no reason to go to the extra effort which using QAudioOutput would require – the Phonon backend is already doing the heavy lifting for you.

Another use case for which Phonon is probably the way to go is when your application needs to deal with DRM data. This is because using QAudioOutput would likely require the application to handle plaintext (i.e. decrypted) data itself, and may therefore impose some limits on how the application can be deployed.

Update 18/05/10: Phonon does not currently support playback of DRM content, and there is no plan to add this.

Conversely, if the application needs to record, rather than just play audio, then Phonon clearly isn’t suitable since it doesn’t support audio capture.

On the other hand, if the application has any of the following characteristics, QAudioOutput may be the better choice:

  • Specific latency requirements
  • Need to access raw audio data directly

Applications which may have such requirements include:

  • Streaming applications such as VoIP or video telephony endpoints
  • Streaming music applications, when Phonon doesn’t offer the required protocol / codec support
  • Real-time audio analysis applications, such as instrument tuners
  • Applications which synthesize their own audio streams, such as musical instrument simulators
  • Those which need to play sounds at precisely defined moments, such as games

A problem which presents itself when looking at the above list is that the decision to use QAudioOutput leaves a lot of work to be done in the application itself. Imagine that the reason Phonon cannot be used is that, although the application needs to stream audio via a protocol which the Phonon backend supports (say, RTSP), the stream is encoded using a proprietary codec which is not available to the Phonon backend. In this case, the application must manage the RTSP stream itself (either with its own streaming engine or via native platform streaming APIs), decode the stream using the proprietary codec, and then pass the decoded audio data to QAudioOutput (a rough sketch of that last step follows).
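Here is a rough sketch of that last step: pushing decoded PCM into QAudioOutput. decodeNextChunk() is a hypothetical stand-in for the application’s own streaming and decoding machinery, and the format values are assumptions; a real player would drive the writes from the event loop rather than from a tight loop.

```cpp
#include <QtCore/QByteArray>
#include <QtCore/QIODevice>
#include <QtMultimedia/QAudioDeviceInfo>
#include <QtMultimedia/QAudioFormat>
#include <QtMultimedia/QAudioOutput>

// Stand-in for the application's own streaming + proprietary decoder:
// returns the next chunk of decoded PCM, or an empty array at end of stream.
extern QByteArray decodeNextChunk();

void playDecodedStream()
{
    // The format must match what the decoder produces; 44.1 kHz stereo
    // 16-bit PCM is assumed here purely for illustration.
    QAudioFormat format;
    format.setFrequency(44100);
    format.setChannels(2);
    format.setSampleSize(16);
    format.setCodec("audio/pcm");
    format.setByteOrder(QAudioFormat::LittleEndian);
    format.setSampleType(QAudioFormat::SignedInt);

    QAudioOutput output(QAudioDeviceInfo::defaultOutputDevice(), format);

    // "Push mode": start() returns a QIODevice into which the client writes
    // decoded PCM as it becomes available.
    QIODevice *sink = output.start();

    QByteArray chunk;
    while (!(chunk = decodeNextChunk()).isEmpty())
        sink->write(chunk);   // a real player would respect bytesFree() and
                              // react to the stateChanged() / notify() signals
}
```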

The root of this problem is that the abstractions offered by both Phonon and QtMultimedia correspond to a large chunk of audio stack, rather than allowing access to individual components. In the case of Phonon, the entire stack is lumped together and abstracted by a single API. QtMultimedia breaks this down a bit, but still groups the codecs and the final output stage (routing and the DAC) together. There aren’t yet any Qt abstractions for individual components such as codecs, meaning that application developers who wish to address those components directly must do so via platform-specific APIs.

A demo is worth a thousand words

… but I’ve already written twice that much, so it’s probably time for a break. In a following post, we’ll look at putting QtMultimedia to work in that demo application.



21 comments

rule says:

That looks promising, the QtMultimedia API is what I missed in Phonon. Thank you for the info, I’ll try it as soon as possible.

sandsmark says:

This QtMultimedia API looks a lot like the `pcmio` API that Kretz started on for Phonon.

But how do you plan on doing decoding? Do you use lavcodec and lavformat or something similar directly, wrap gstreamer, or use runtime pluggable backends?

Also, the Phonon that ships with Qt 4.7 has a new class called “AudioDataOutput” that outputs raw PCM data, which should cover the use case for real-time audio analysis. And there is currently an ongoing GSoC project for improving the low-level I/O APIs in Phonon, as well as a GSoC project on implementing high-level capture APIs (webcams are the use case here).

rikrd says:

This all looks great! Looking forward to trying it out.
I read in the docs that the bufferSize (getter and setter) is specified in milliseconds and that it is an int. Is it an approximate value or a mistake in the doc? I usually think of a bufferSize in terms of the number of samples per buffer, e.g. 1024 samples.

thanks for this post!

Gareth Stockwell says:

@rikrd: thanks for spotting that, it’s a mistake in the docs for QAudioInput. I’ve just committed a patch which fixes it – should arrive in the public snapshot in a couple of days.

grsji says:

Will MIDI be supported in QtMultimedia at some point?

songwei1984 says:

If you could share your spectrum analyser demo source code (Symbian and Windows versions), I think it would be very useful for learning the QtMultimedia module.
Thanks.

naresh says:

What about Phonon in Qt 4.7? Will it fully support a QIODevice-based Phonon::MediaSource? By fully I mean that it won’t read the whole source but will treat it as a stream and get data in chunks on “readyRead”?

Janne says:

@songwei1984: You can find the source code here: http://qt.gitorious.com/qt/qt/trees/4.6/demos/spectrum

Gareth Stockwell says:

@songwei1984: As Janne says, the code is available in git; I’ll post another article to this blog shortly describing the implementation of the demo.

rae says:

This blog doesn’t print very well. I think your stylesheets need some looking into.

Coises says:

I’m in the midst of a project I started under Qt 4.5 and switched to Qt 4.6 that uses PortAudio v19. I’d rather use QAudioOutput, but there is a problem I could not solve. PortAudio uses a callback to request audio data, to which it presents the “outputBufferDacTime” — the time, given by a device-dependent clock, at which the first sample of the buffer will begin playing. It also has a “getStreamTime” function which returns the current playback position according to the same clock. Using these two together it is possible to determine very closely what sample is playing “right now.”

I can’t find the equivalent in QAudioOutput. Looking at the code, the closest thing it has, processedUSecs(), appears to give only the amount of data which has been supplied to the audio driver converted into time; there’s no reference I can find that would allow me to determine what is playing now, rather than just how much data has been consumed.

It also looks as if perhaps QtMultimedia doesn’t handle ASIO? That seems like a problem as well.

jan van katwijk says:

QtMultimedia looks great. My application uses (used) PortAudio V19, with lots of problems under Linux (none under Windows). It seems that QtMultimedia shows similar problems under Linux: querying the device list (sound card) causes a fatal error in ALSA and the application crashes. So I’m hoping and waiting for a binding that does not crash, maybe a binding to PulseAudio?

jan

Mike says:

I’m not a fan of reinventing the wheel. But as long as the bug in Phonon::AudioDataOutput reported in https://bugs.kde.org/show_bug.cgi?id=237112 isn’t fixed, it is always good to have an alternative.

CU Mike

Gareth Stockwell says:

@naresh: Support for Phonon::Stream is dependent on the backend, and more specifically, on the native API which that backend uses. The DirectShow, GStreamer, QuickTime and WaveOut backends already support this in 4.6. For Symbian MMF, we currently do not have any plans to add support in 4.7.

@Coises: The documentation on processedUSecs() is somewhat ambiguous: does it return the amount of data consumed (i.e. sent to the native audio subsystem for buffering), or the amount of data actually played? The Windows, Mac and ALSA backends return the former, while the Symbian backend returns the latter. Clearly this needs to be addressed; if the decision is that processedUSecs() should return the amount of data buffered, then I agree that we should add an API for querying the actual playback / record position from the lower levels. We are looking into this now.

@Coises: At present the only available Windows backend uses WaveOut, and therefore latency may be an issue. I have contacted the team which developed the existing backend, to find out whether there are any plans to add support for ASIO.

@jan van katwijk: A PULSE backend is under development – I have contacted the team responsible to find out when this will be made available.

Coises says:

Thanks for the reply, Gareth. I’m hoping we’ll see the low-level QtMultimedia reach a point where music and video applications written in Qt can be built without outside dependencies (aside from the libav* modules from ffmpeg – I doubt you’ll ever be able to incorporate or replace that package).

Is there any thought of a “video backend” that would provide a concrete subclass of QAbstractVideoSurface which a program could employ to utilize whatever is the most “efficient” presentation method on a given system to show time-stamped video frames in a specified rectangle of the display? Right now I can’t see how to implement QAbstractVideoSurface in a way that gives me anything more than I get by just polling, checking a timer (in this case the PortAudio stream time) and updating a QWidget when a new frame is due.

There is a paper at http://www.portaudio.com/docs/portaudio_sync_acmc2003.pdf which might be of some interest to developers; it describes some of the synchronization scenarios PortAudio developers anticipated, and why they chose the methods they did. It looks like a lot of work has been done and is still continuing on that package, though it’s been without a new “stable” release since December, 2007. Since it is multi-platform and open source (an MIT-style license), perhaps there are useful ideas there to be drawn upon for Qt?

naresh says:

@Gareth Stockwell: I disagree… I couldn’t do a simple task with the current backends on Mac OS X and Windows. I need to retrieve an mp3 file and play it while it’s being retrieved. I’m downloading it with my custom “QIODevice” that is set as the source in Phonon. I emit readyRead every time I get some data, but, after a few steps in the debugger, I could see that the QuickTime backend tries to read the WHOLE stream as a file. Also, on Mac OS X the QuickTime backend makes my computer fly: it takes 100% CPU. Could you please provide me with some more info or example code for such a device? To make it simple, let this device read a file in, say, 500 ms intervals. I’ll do the rest. I’ve spent a lot of time trying to find out how to make Phonon play a stream.

jmcphers says:

@naresh: You’re right, the Phonon QT backend will attempt to read the whole stream. There is no API in QT to support reading from an arbitrary bytestream – at least none that I have ever been able to find. It could be possible to hack something up inside the backend, but I wouldn’t recommend it – probably better for Phonon to have an interface to check for the capability.
If you are just streaming audio and you are aiming for cross platform, then hopefully in the future you’d be using a QAudio* class.

naresh says:

@jmcphers: Right now I’m using KDE Phonon and the phonon-vlc backend, and it works (it’s really far from “nicely done”, but it works for now).

Robin Lobel says:

Running the SDK example “Audio Devices”, I’m only offered 1- or 2-channel support, no 5.1 :/
Plus, if I try to check for 5.1 support (6 channels) using QAudioDeviceInfo/isFormatSupported, I also get “no”.

Is it a bug, or is no surround support planned?

(I’m running Windows 7 x64, audio is Realtek set up for 5.1)

pooja says:

I am trying to play a file other than .wav using the QSound class. I don’t want to use the Phonon module. Could you help me in this context?


Commenting closed.
