Overflow in AppConference

Some weeks ago I got an offer for a free VoIPSurfer license. Why? just because of a patch I did one year ago for an issue I had, the patch helped the guy who seems to own voipsurfer, so, it is kind of open source Karma. The offering made me remember the interesting issue I had and I thought today it is a good day to write about it. Some background information can be found in this thread where I discussed with some Asterisk developers the issue: http://lists.digium.com/pipermail/asterisk-dev/2006-November/024616.html

Asterisk, as a software PBX, supports conferencing. However, default implementation in 1.2.x versions did not support native audio mixing. Asterisk default conferencing application is “app_meetme” and it works only if you have zaptel hardware available or if zaptel driver zt_dummy.ko is installed, otherwise it cannot work because meetme() use zt_conf functionality. Because of this I started looking for alternatives that did not require zaptel. I found app_conference which at that time was part of the IaxClient project, now it seems to be an independent project.

Anyway, I started doing some testing and kaboom!, Asterisk crashed when one of our IAX2 servers joined the conference. Running valgrind lead me to find code like this:

void mix_slinear_frames( char *dst, const char *src, int samples ) 

        if ( dst == NULL ) return ;

        if ( src == NULL ) return ;

        int i, val ;

        for ( i = 0 ; i < samples ; ++i ) 

                val = ( (short*)dst )[i] + ( (short*)src )[i] ;

                if ( val > 0x7fff ) 
                        ( (short*)dst )[i] = 0x7fff - 1 ; 
                        continue ;

                else if ( val < -0x7fff )
                        ( (short*)dst )[i] = -0x7fff + 1 ; 
                        continue ;

                        ( (short*)dst )[i] = val ;
                        continue ;

        return ;

This is a typical function where we’re in buffer overflow danger. It receives 2 buffers as arguments and just 1 len. So better for the caller to be sure that both src and dst have enough data to read/write from/to. I quickly found a call like this:

// allocate a mix buffer which fill large enough memory to
// hold a frame, and reset it’s memory so we don’t get noise
char* cp_listenerBuffer = malloc( AST_CONF_BUFFER_SIZE ) ;

memset( cp_listenerBuffer, 0×0, AST_CONF_BUFFER_SIZE ) ;

// point past the friendly offset right to the data

cp_listenerData = cp_listenerBuffer + AST_FRIENDLY_OFFSET ;

// reset the spoken list pointer
cf_spoken = frames_in ;

// really mix the audio
for ( ; cf_spoken != NULL ; cf_spoken = cf_spoken->next )

    // if the members are equal, and they
    // are not null, do not mix them.
    if (
      ( cf_send->member == cf_spoken->member )
       && ( cf_send->member != NULL )

        // don’t mix this frame
    else if ( cf_spoken->fr == NULL )

        ast_log( LOG_WARNING, “unable to mix conf_frame with null ast_framen” ) ;

        // mix the new frame in with the existing buffer
        mix_slinear_frames( cp_listenerData, (char*)( cf_spoken->fr->data ), cf_spoken->fr->samples);

So let’s look closely to the call. The first argument passed is cp_listenerData, a pointer to a brand new voice frame where all the mixing of the conference audio sources will be stored. As we can see in the code, the buffer is of a fixed length AST_CONF_BUFFER_SIZE, but since the pointer is advanced AST_FRIENDLY_OFFSET, then the effective length is AST_CONF_BUFFER_SIZE – AST_FRIENDLY_OFFSET. Let’s see some interesting defines:

// 160 samples 16-bit signed linear

// 2 bytes per sample ( i.e. 16-bit )

// 320 bytes for each 160 sample frame of 16-bit audio

// 1000 ms-per-second / 20 ms-per-frame = 50 frames-per-second

// account for friendly offset when allocating buffer for frame

So AST_CONF_BUFFER_SIZE is 320 + AST_FRIENDLY_OFFSET, since we don’t really care about the asterisk friendly offset since *real* data must start after this offset, we can say our buffer is 320 bytes long. Where this number comes from? Well, some assumptions are made to calculate that number:

1. Telephony most common sampling rate is 8000 samples per second.
2. Audio frames will be 20ms long.
3. Each sample will be encoded using 16 bits ( “slinear” format ).

Given those assumptions we can then calculate: At 8000 samples per second, 20ms audio frame will be made of

( 8000 samples per second * ( 20ms / 1000ms per second) )

that is 160 samples in one 20ms audio frame. Since each of those samples will be encoded with 16 bits, we have a audio frame byte length of:

( 160 samples * 16 bits per sample / 8 bits per byte ), that is 320 bytes, hence #define AST_CONF_FRAME_DATA_SIZE 320

Assumption of 8000 samples per second is an Asterisk requirement, currently Asterisk does not support other sampling rate. 16 bits per sample is assumed because the mixing will be done in slinear format, so this is ok too. However, 20ms audio frames is a *bad* assumption. Some common phones ( Linksys SPA? ) provide an option in their configuration page to change the RTP frame size, and that caused the crash. How many bytes does a 30ms frame has?

( 8000 samples per second * ( 30ms / 1000ms per second) )

that is 240 samples, each one of those samples of 2 bytes ( 16 bits ) give us a frame size of 480 bytes!, far more than the considered 320 bytes, and kaboom! we have a buffer overflow that could lead in the best case to a DoS attack.

Asterisk smoothers to the rescue!

Asterisk smoothers is a way to take variant size frames as input and return frames of single size. So, we can feed the smoother with 480 bytes frames or whichever size we get and the smoother will return us a frame of the size we want ( 320 bytes in this case ).

struct ast_smoother *ast_smoother_new(int size);
int ast_smoother_feed(struct ast_smoother *s, struct ast_frame *f, int swap);
struct ast_frame *ast_smoother_read(struct ast_smoother *s);
ast_smoother_free(struct ast_smoother *s);

So, I used this interface to smooth each frame with more samples ( hence size ) than required for the slinear audio mixing. Each time a new frame was read from a conference member I added this verification and partial fix code:

/* create the smoother if were receiving more samples than needed */
if ( AST_CONF_BLOCK_SAMPLES < fr->samples ) {

  if ( member->inSmoother == NULL ) {
    /* calculate bytes per sample */

    ast_log(LOG_DEBUG, “%s frame has %d samples in %d bytesn”, member->channel_name, fr->samples, fr->datalen);

    float bytes_per_sample = ( (float)fr->datalen / (float)fr->samples );

    ast_log(LOG_DEBUG, “%s frame has %f bytes per samplen”, member->channel_name, bytes_per_sample);

    float new_frame_len    = ( AST_CONF_BLOCK_SAMPLES * bytes_per_sample );
    /* WARNING, currently iLBC codec is not fully supported, sound is still choppy */

    if ( fr->subclass == AST_FORMAT_ILBC ) {
      ast_log(LOG_DEBUG, “ILBC format only accepts datalen multiple of 50, so make it happyn”);

      if ( new_frame_len < 50 ) {
        new_frame_len = 50;

    ast_log(LOG_DEBUG, “%s new frame size is %fn”, member->channel_name, new_frame_len);

    member->inSmoother   = ast_smoother_new(new_frame_len);
  ast_smoother_feed(member->inSmoother, fr);

  while ( ( sfr = ast_smoother_read( member->inSmoother ) ) ) {

    conf_frame* cfr = create_conf_frame( member, member->inFrames, sfr ) ;

    if ( cfr == NULL )
      ast_log( LOG_ERROR, “unable to malloc conf_framen” ) ;

      return -1 ;
    add_member_frame(member, cfr);
} else {

  conf_frame* cfr = create_conf_frame( member, member->inFrames, fr ) ;

  add_member_frame(member, cfr);

Messy, but hey! it compiled! ship it! ( I don’t remember why in the hell I used float for bytes_per_sample, but im sure I had a good reason?? )

So, what I did was just calculate the proper frame size when more samples than needed were received, so, when converted to slinear, the frame will result in 240 bytes as required by mix_slinear_frame and it will not overflow mixing the audio properly.

Anyway, I still got a problem I did not solve at that time. For some reason iLBC codec only accepted frame sizes multiple of 50, and since slinear mixing required 240 ( not a multiple of 50 ) voice sounded choppy 🙁

I guess some improvements were made to app_conference since I used it last time, so I will test it with Asterisk 1.4 and let’s see how it goes …

I’ll post results later,

This entry was posted in asterisk, C/C++, voip. Bookmark the permalink.

One Response to Overflow in AppConference

  1. Pingback: Moy Blog » Blog Archive » AppConference Underflow?

Leave a Reply

Your email address will not be published. Required fields are marked *