Linear Prediction and Synthesis of Speech Signals

 

Arash Komaee, Afshin Sepehri

 

Department of Electrical and Computer Engineering

University of Maryland

 

{akomaee, afshin}@eng.umd.edu

 

 

1      Introduction

 

Speech coding is the operation of transforming a speech signal into a more compact form that needs a lower bit rate for transmission over a communication channel. Generally, this transformation is lossy, which means that the signal reconstructed from the compact form is not exactly the same as the original signal. By taking the dynamics and limitations of the human ear into account, one can develop compression schemes such that, from the listener's viewpoint, the reconstructed signal is close enough to the original signal.

 

During the past few years, several techniques such as Linear Prediction Coding (LPC), Waveform Coding and Sub-band Coding have been proposed for speech compression. In this project, we use the LPC technique to compress a speech signal. The basic idea of LPC is to transmit the prediction error (residue) instead of the speech signal. Since a linear predictor of properly chosen order can predict the signal with a relatively small error variance, the power of the residuals is considerably smaller than the power of the original signal. This property of linear prediction enables us to use a lower bit rate for transmitting the speech signal through a communication channel.

 

In this project we discuss the application of linear prediction to speech compression in detail. MATLAB code is developed to perform this operation. Even though the implemented algorithm and the developed software can be used for different speech signals, this project focuses on a given speech file, so the analysis of the results is based on this particular file. The file consists of 20768 bytes, sampled at a rate of 8 kHz, so that the total duration of the signal is 2.5960 seconds. Fig1 shows this speech signal.

 

Fig1 - The original speech signal used in the remainder of the report

 

 

The remainder of the report is organized as follows. In section 2 we discuss the problem of finding stationary frames of the signal and propose two methods for framing the signal. In section 3 we consider the design and properties of linear predictor filters and obtain the optimal order of the LP filters.

 

In section 4 the details of quantization and the practical aspects of implementation are discussed. The quality of the reconstructed signal versus the compression ratio is the subject of section 5. The effect of channel noise on the quality of reconstruction is discussed in section 6. Section 7 is a conclusion and summary of the other sections.

 

 

2      Stationary Frames of Speech Signal

 

To apply the linear prediction method to a signal, the signal must be stationary. A speech signal is not stationary over its whole duration, but it consists of short intervals that are almost stationary. The first step in applying LP filtering to a speech signal is therefore to decompose the signal into stationary frames. Two factors must be considered when choosing the frame duration.

 

(1)    In a data compression scheme using LP filters, the LP parameters must be transmitted for each frame. It is therefore desirable to make the frames as long as possible, so that the total number of frames, and consequently the total number of filter coefficients, is minimized.

      In addition, the estimate of the autocorrelation function, which is the basis of the filter design, is more accurate over longer durations.

 

(2)    The signal is not stationary in very long frames.

 

Considering the above factors, there is a trade-off between a small number of long, possibly non-stationary frames and a large number of short, stationary frames.

 

Two different methods can be considered as solutions to this trade-off problem: fixed-length frames and variable-length frames. In this section we discuss both methods and find a criterion for obtaining the frame lengths.

 

 

2.1         Constant Length Frames

 

In the fixed-length framing scheme, a single fixed duration is used for all frames. This duration must be chosen as the best solution to the trade-off discussed above. Note that this scheme has only one degree of freedom, so the resulting frames may not be very well chosen. The general belief in the area of speech processing is that a good choice for the frame duration is 20 milliseconds.

 

For the case of 40 msec frames, some non-stationary frames of the speech signal are illustrated in Fig2. Searching the signal, one can find more non-stationary frames. The existence of a considerable number of non-stationary frames in the signal shows that 40 msec is not a proper choice for the frame length.

 

 

 

 

Fig2 - Some non-stationary frames with fixed length 40 msec

(Frame numbers 3, 9, 11, 14, 19, 20, 24, 32, and 38)

 

 

Looking for non-stationary frames in the case of 20 msec frames, the following six frames were found. Note that, compared to the previous case, the number of non-stationary frames decreases. The price paid for fewer non-stationary frames is that the number of frames increases by a factor of 2 and, as a consequence, the total number of filter coefficients to be transmitted also increases by a factor of 2. The non-stationary frames for the case of 20 msec frames are shown in Fig3.

 

 

Fig3 -  Some non-stationary frames with fixed length 20 msec

(Frame numbers 5, 23, 30, 63, 76, and 96)

 

If we continue this procedure, we find only three non-stationary frames for 10 msec frames. Although the total number of non-stationary frames decreases in this case, it does not seem reasonable to accept twice the number of frames in exchange. Based on this argument, 20 msec seems to be a proper choice for the frame length, which agrees with the value suggested in the speech processing literature.
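The frame counts behind this comparison follow directly from the sampling rate. The sketch below is illustrative Python (the project itself used MATLAB); the function name `fixed_frame_count` is hypothetical, and it assumes that a final partial frame is counted as a frame.

```python
import math

def fixed_frame_count(num_samples, fs_hz, frame_ms):
    """Number of fixed-length frames needed to cover the whole signal."""
    samples_per_frame = fs_hz * frame_ms // 1000
    # Assumption: a trailing partial frame still counts as one frame.
    return math.ceil(num_samples / samples_per_frame)

for frame_ms in (40, 20, 10):
    print(frame_ms, "msec ->", fixed_frame_count(20768, 8000, frame_ms), "frames")
```

Under these assumptions, each halving of the frame length doubles the number of frames, and therefore the number of coefficient sets that must be transmitted.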

 

2.2         Variable Length Frames

 

To have more degrees of freedom, we can use a variable-length framing scheme. The basic idea of this method is as follows. We define a minimum frame length, a maximum frame length, an increment length, and a stationarity condition. To obtain the length of each frame we start with the minimum length and, in each iteration, increase the frame length by the increment until we reach the maximum length or the point at which the stationarity condition is violated.

 

Let us assume that P_k is the average power of the frame up to iteration k and P_{k+1} is the average power after concatenating the new incremental frame. If the following condition is satisfied, we accept the new incremental frame as a part of the frame; otherwise the stationarity condition is violated.

 

In the above inequality d is a parameter, which must be obtained by trial and error. The values of the other parameters used in this project are shown in the following table.

 

 

Minimum Length      Maximum Length      Increment Length
10 msec             40 msec             5 msec

 

Table 1 - Parameters for variable length framing
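The variable-length procedure can be sketched as follows. This is an illustrative Python sketch, not the project's MATLAB code. In particular, the stationarity test used here, a relative change in average power of at most d, is an assumption standing in for the project's inequality, and the function names are hypothetical. The default lengths are those of Table 1.

```python
def average_power(samples):
    """Mean squared value of a block of samples."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0

def variable_length_frames(x, fs=8000, d=0.35, min_ms=10, max_ms=40, inc_ms=5):
    """Return the list of frame lengths (in samples) covering x."""
    n_min = fs * min_ms // 1000
    n_max = fs * max_ms // 1000
    n_inc = fs * inc_ms // 1000
    lengths, start = [], 0
    while start < len(x):
        length = min(n_min, len(x) - start)
        # Grow the frame by one increment at a time.
        while length + n_inc <= n_max and start + length + n_inc <= len(x):
            p_old = average_power(x[start:start + length])
            p_new = average_power(x[start:start + length + n_inc])
            # ASSUMED condition: relative power change must not exceed d.
            if abs(p_new - p_old) > d * p_old:
                break
            length += n_inc
        lengths.append(length)
        start += length
    return lengths
```

With this assumed test, a larger d tolerates larger power changes, so frames grow longer and their number decreases, which matches the trend reported for d.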

 

 

In the following figure the durations of the frames for d=0.2, 0.35 and 0.5 are illustrated. Note that increasing the value of d decreases the number of frames. As expected, the frame duration lies between 10 and 40 msec.

Fig4 - Frame duration for d=0.2, 0.35 and 0.5

 

 

In Fig5 the total number of frames versus d is plotted. Note that beyond a certain value of d, the number of frames no longer decreases as d increases.

Fig5 - Number of frames versus d

 

 

A reasonable choice for d must have the following properties:

 

(1)   Inside a frame the signal is stationary.

 

(2)    We cannot concatenate more frames and still have a stationary window. This condition is shown in Fig6. For each case in this figure, three adjacent frames are plotted; each frame is stationary, and it is not possible to concatenate more frames to obtain a new stationary frame.

 

Fig6 is obtained for a particular value of d. For this value, most of the frames have the property shown in Fig6, so it seems to be a proper choice.

 

 

Fig6 - Consecutive frames for variable length approach

(Frames (10, 11, 12), (14, 15, 16), (25, 26, 27), (91, 92, 93))

 

 

To give a better idea of the contents of the variable-length frames, Fig7 shows some frames from different parts of the signal.

Fig7 - Some frames with variable length from 10 to 40 msec

(Frame numbers 4, 9, 12, 15, 21, 25, 28, 32, 36, 38, 40, 51, 53, 55, 62, and 68)

 

 

 

2.3         Comparison of Constant Length and Variable Length Framing

 

Although it is too soon to compare the fixed-length and variable-length methods, we finish this section with Fig8, which shows the average residual power versus the total number of LP coefficients. According to this figure, for a reasonable choice of the LP filter order, the total residual power is smaller for the variable-length method, which justifies the use of this method. We will discuss this issue more precisely in the next section. To compare the different features of the two methods, we discuss them in parallel in the next section.

 

 

Fig8 - Average Power of Residues versus total number of gamma coefficients in two different approaches, fixed and variable length frames

 

 

3      Linear Prediction and Lattice Structure

 

The main idea behind linear prediction filtering is to estimate the value of a signal based on its past observations. For a correlated signal such as speech, the difference between the actual value of the signal and its estimate is small. From a communication viewpoint, it is desirable to transmit as little data as possible, so it is convenient to transmit the estimation error instead of the original signal. To achieve this reduction in data size we need to design linear predictor filters.

 

The design of LP filters is based on knowledge of the autocorrelation function. In this project the actual autocorrelation function is not given, so we have to estimate it from the samples of the speech signal.

 

Let us assume that the length of the nth frame is N_n. Then we use the following equation to estimate the autocorrelation function

 

 

where x(i) is the ith sample of the frame. Note that, for a proper choice of the LP order, we need only the first few lags of the autocorrelation function.
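As an illustration of estimating the autocorrelation function from the samples of one frame, the following Python sketch uses the common biased sample estimator (sum of lagged products divided by the frame length). This choice of estimator is an assumption; the project's exact estimator may differ, and the function name is hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Biased sample autocorrelation r(0), ..., r(max_lag) of one frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i - k] for i in range(k, n)) / n
        for k in range(max_lag + 1)
    ]

# Only lags 0..M are needed for a predictor of order M.
print(autocorrelation([1.0, 2.0, 3.0], 2))
```

Dividing by the full frame length at every lag, rather than by the number of products at that lag, is the usual way to keep the estimated sequence a valid autocorrelation sequence.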

    

Fig9 illustrates the autocorrelation function and Fig10 the power spectral density for some of the frames.

 

 

Fig9 - Autocorrelation function for a number of signal frames

 

Fig10 - Power spectral density for a number of signal frames

 

 

In the next sub-sections we review the design procedure of LP filters and the criterion for obtaining their optimal order.

 

3.1         Computation of Linear Predictor Coefficients

 

To determine the reflection coefficients of the LP lattice filters we use the Levinson-Durbin algorithm, which obtains the reflection coefficients recursively. In each iteration the order of the LP filter increases by 1. We start with the filter of order 0 and set

 

             

 

Then in each iteration we perform the following operations

 

             

 

and continue the procedure until the desired order is reached. We will obtain the optimal order for each frame in the next sub-section.
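For reference, the recursion can be sketched in Python in its textbook form. The sign convention used here (the last coefficient of the order-m prediction-error filter equals minus the m-th reflection coefficient) is an assumption and may differ from the convention of the project's MATLAB code; the function name is hypothetical.

```python
def levinson_durbin(r, order):
    """Textbook Levinson-Durbin recursion.

    r      : autocorrelation lags r[0..order]
    returns: (a, gammas, errors) where a is the prediction-error filter
             [1, a1, ..., aM], gammas are the reflection coefficients and
             errors[m] is the prediction error power at order m.
    """
    a = [1.0]
    e = r[0]                      # order 0: error power equals r(0)
    gammas, errors = [], [e]
    for m in range(1, order + 1):
        if e <= 0.0:              # cannot continue with a non-positive error power
            break
        delta = sum(a[i] * r[m - i] for i in range(m))
        gamma = delta / e
        a_ext = a + [0.0]
        # Order update of the prediction-error filter.
        a = [a_ext[i] - gamma * a_ext[m - i] for i in range(m + 1)]
        e = (1.0 - gamma * gamma) * e
        gammas.append(gamma)
        errors.append(e)
    return a, gammas, errors
```

For the autocorrelation sequence 1, 0.5, 0.25 of a first-order process, this sketch yields a first reflection coefficient of 0.5, a second one of zero, and an error power of 0.75 at both orders, so increasing the order beyond 1 brings no gain.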

 

 

3.2         Optimal Order of Predictor

 

For the actual values of the autocorrelation function, the prediction error power E_m and the reflection coefficients γ_m are well behaved, i.e. as m approaches infinity, E_m approaches a non-zero constant and γ_m approaches zero. In Fig10 this behavior is illustrated for some frames of the speech signal.

 

 

 

Fig10 - Prediction Error Power for well-behaved frames (Fixed Length Frames)

(Frames 3, 7, 14, 31, 35, 41, 45, 63, 72, 85, 94, 130)

 

Because we use an estimate of the autocorrelation function instead of its actual value, the above property of the LP coefficients is violated for some frames. This violation takes two different forms:

 

(1)   For some frames we have.

(2)   For some frames we have.

 

Both of the above behaviors are unacceptable and show that we cannot increase the order of the filter beyond m. These behaviors of the prediction error are illustrated in Fig11.

 

 

Fig11 - Prediction Error Power for bad-behaved frames (Fixed Length Frames)

(Frames 4, 10, 11, 22, 32, 48, 51, 53, 76, 79, 87, 89)

 

 

Because of the behavior illustrated in Fig11, the algorithm that computes the reflection coefficients should include a constraint that prevents the above phenomena. In other words, when one of the above conditions is reached in the Levinson-Durbin algorithm, the computation of further coefficients must stop.

    

For the well-behaved frames (Fig10), different frames show different rates of convergence. For example, for some frames a reasonable choice of order is 3-4, and for some others the optimal order is 9-10. Thus we should set another constraint in the algorithm to prevent an over-estimated filter order. One suitable criterion, which we use in this project, is

 

             

 

where ε is a predefined small constant.

    

In general, as soon as one of the following conditions is satisfied, we stop the algorithm and use the number of completed iterations as the order of the filter:

 

(1)    

(2)    

(3)    

(4)          The order of the filter reaches a specified maximum order.
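The stopping logic can be sketched in Python as follows. The exact conditions of the project are not reproduced here: the three tests in the sketch (a reflection coefficient of magnitude at least 1, an error power that is non-positive or increasing, and a relative improvement smaller than a threshold `eps`) are assumptions chosen to match the description, and the function name and the default value of `eps` are hypothetical.

```python
def select_order(errors, gammas, eps=0.01, max_order=14):
    """Pick a filter order from a precomputed Levinson-Durbin run.

    errors[m]   : prediction error power at order m (m = 0..M)
    gammas[m-1] : reflection coefficient of order m
    """
    order = 0
    for m in range(1, len(errors)):
        if m > max_order:
            break                                   # maximum order reached
        if abs(gammas[m - 1]) >= 1.0:
            break                                   # ASSUMED: invalid coefficient
        if errors[m] <= 0.0 or errors[m] > errors[m - 1]:
            break                                   # ASSUMED: ill-behaved error power
        if (errors[m - 1] - errors[m]) / errors[m - 1] < eps:
            break                                   # ASSUMED: negligible improvement
        order = m                                   # iteration m is accepted
    return order
```

With these assumed tests, a frame whose error power hardly drops at order 1 receives order 0, which is consistent with treating such a frame as uncorrelated.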

 

Based on the above conditions we have computed the optimal order of the filters for all frames, for both fixed-length and variable-length framing. In obtaining the orders, the maximum order is set to 14. The results are shown in Fig12 and Fig13. It is clear from the figures that the order of the filters varies from 0 to 14. Note that filters of order 0 correspond to the case in which the signal in the frame is completely uncorrelated (white).

 

Fig12 - Order of filters for all frames in fixed-length approach

 

Fig13 - Order of filters for all frames in variable-length approach

 

 

To give more information about the LP filters, the reflection coefficients (G), the impulse responses and the frequency responses of the filters are given in Fig14, Fig15, and Fig16.

 

 

 

 

Fig14 - Reflection Coefficients (G) for some frames after all correction (Variable Length Frames)

(Frames 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120)

 

 

 

 

 

Fig15 - Impulse Response of LP filters for some frames after all correction (Variable Length Frames)

(Frames 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120)

 

 

 

 

 

 

 

 

 

 

 

 

Fig16 - Frequency Response of Prediction Filters for frames 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120 (Variable Length)

 

To obtain the best choice for the maximum order of the filters, the following two curves are computed for both fixed-length and variable-length frames. Fig17 plots the average residual power versus the maximum order of the filters, and Fig18 shows the average residual power versus the total number of reflection coefficients. Two important facts can be concluded from these curves. First, a good choice for the maximum filter order is 6-8. Second, the quality of the variable-length scheme is better than that of the fixed-length scheme.

 

 

 

 

 

 

Fig17 - Comparison of Average Power of Residues in two different approaches,

fixed and variable length frames

 

Fig18 - Average Power of Residues versus total number of gamma coefficients

 in two different approaches, fixed and variable length frames

 

 

 

3.3         Linear Prediction Properties

 

Based on the simulations and the filters designed in this project, we can show some important features of the linear prediction scheme. In the next three sub-sections we discuss these properties.

 

 

3.3.1   Decreasing the Dynamic Range

 

One of the most important features of linear prediction is that the dynamic range of the residual signal is smaller than that of the original signal. Fig19 shows the original signal and the residual signal in a single graph. The ability of LP filters to decrease the dynamic range is clear from this figure.

 

Fig19 - Decreasing the dynamic range of residuals.

Blue plot: original signal, Red plot: residuals

 

 

Another fact about LP filters is that increasing the order of filter decreases the dynamic range of residuals. This fact is shown in Fig20.

 

Fig20 - Average power in each frame for original signal and prediction error

when the signal is filtered by a first and second order LP filter

 

 

3.3.2   Whitening Property

 

Another property of linear prediction is that the prediction error is approximately white noise. Fig21 shows the spectrum of the original signal and of the prediction error. It is clear that although the spectrum of the speech signal has most of its energy in high frequencies, the spectrum of the residuals is almost flat. This shows that the prediction error is almost white noise.

 

Fig21 - Spectrum of original signal and prediction error.

Top: Original signal, Bottom: prediction error

 

 

 

3.3.3   Stability of Inverse Filter

 

The FIR prediction-error filter has all of its zeros inside the unit circle, so its inverse filter, whose poles are those zeros, is stable. The zeros of the LP filters for all frames are given in Fig22. It is clear that all the roots are inside the unit circle.

 

Fig22 - Zeros of LP filters, which are the poles of inverse LP filters
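The root condition can also be checked without computing the roots explicitly. The Python sketch below uses the step-down (backward Levinson) recursion: a polynomial has all its zeros strictly inside the unit circle exactly when every reflection coefficient recovered this way has magnitude below 1. The function name is hypothetical, and the sign convention (last coefficient equals minus the reflection coefficient) is an assumption.

```python
def is_minimum_phase(a):
    """True if A(z) = a[0] + a[1] z^-1 + ... has all zeros inside the unit circle."""
    a = [c / a[0] for c in a]            # normalize so that a[0] == 1
    while len(a) > 1:
        gamma = -a[-1]                   # reflection coefficient of the current order
        if abs(gamma) >= 1.0:
            return False                 # a zero lies on or outside the unit circle
        rev = a[::-1]
        # Step down to the filter of the next lower order.
        a = [(a[i] + gamma * rev[i]) / (1.0 - gamma * gamma)
             for i in range(len(a) - 1)]
    return True

print(is_minimum_phase([1.0, -0.5]))       # zero at 0.5
print(is_minimum_phase([1.0, -2.5, 1.0]))  # zeros at 2 and 0.5
```

A filter that passes this test has a stable inverse, which is the property the synthesis stage relies on.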

 

 

3.4  Simulation of LP Filters and Inverse LP Filters

 

To check whether the LP filters and the inverse LP filters work together to analyze and synthesize the speech signal, a simulation has been performed. In this simulation we did not quantize the reflection coefficients or the residuals, so perfect reconstruction is expected. Not surprisingly, Fig23 shows that the reconstructed signal and the original signal are practically identical. In fact, due to the finite word length of the computer, a very small but non-zero total error power was observed.

 

 

Fig23 - Results of Simulation,

Top: original signal, Middle: prediction error, Bottom: reconstructed signal

 

 

At this point we should mention that an overlap between adjacent frames is used in the reconstruction of the signal. The inverse LP filters are IIR, so they need a number of initial conditions equal to the order of the filter. We use the last samples of the previous frame as the initial conditions of the IIR inverse filter. This means that adjacent frames overlap by a number of samples equal to the order of the filter.
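The role of the initial conditions can be illustrated with a small Python sketch. For brevity it uses direct-form filters rather than the lattice structure used in the project, so it is an assumption-laden stand-in; the function names are hypothetical. The analysis filter computes the residual from the signal, and the synthesis filter inverts it, with the last samples of the previous frame supplied as filter state.

```python
def _initial_state(history, p):
    """Last p samples of the previous frame, zero-padded at the front."""
    h = list(history)
    h = h[max(0, len(h) - p):]
    return [0.0] * (p - len(h)) + h

def analysis_filter(x, a, history):
    """Residual e(n) = sum_i a[i] * x(n - i), with a[0] == 1."""
    p = len(a) - 1
    ext = _initial_state(history, p) + list(x)
    return [sum(a[i] * ext[p + n - i] for i in range(p + 1))
            for n in range(len(x))]

def synthesis_filter(e, a, history):
    """Inverse (all-pole) filter: x(n) = e(n) - sum_{i>=1} a[i] * x(n - i)."""
    p = len(a) - 1
    ext = _initial_state(history, p)
    out = []
    for value in e:
        y = value - sum(a[i] * ext[-i] for i in range(1, p + 1))
        ext.append(y)
        out.append(y)
    return out
```

Passing the end of the previously reconstructed frame as `history` is what makes consecutive frames join without a transient, and without quantization the reconstruction is exact up to rounding.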

 

 

4      Speech Compression Based on LP Filter

 

In this section we discuss some practical issues regarding data compression. First we consider the quantization problem for both the reflection coefficients and the residuals.

In the next sub-section the information that is necessary for reconstruction of the signal is examined, and a suitable format for transmission of the data is proposed. In sub-section 4.3 the results of compression with different compression ratios are presented.

 

 

 

 

 

4.1  Quantization

 

The next step in our process is to quantize the data, that is, to convert each real number to an integer that can be stored in a certain number of bits and sent through the channel to the destination. We must quantize all the data, including the reflection coefficients and the residues. As we will see, these quantization processes have a direct effect on the quality of the reconstructed signal.

 

 

4.1.1        Quantization of Reflection Coefficients and Stability Considerations

 

Since the filtering process performed by the gammas is reversible, we can use any number of bits to represent them, subject to some conditions. First, we must take care of the stability of the inverse filter. If we decrease the number of bits below a certain point, the filter becomes unstable, as can be seen in figures 24 and 25. In figure 24 we used only 5 bits to represent each gamma, and the filter is still stable and the reconstruction is almost perfect. In figure 25, however, using 4 bits makes the inverse filter unstable and we see some unbounded values.

Fig. 24- Original Signal, Residues and Reconstructed Signal generated using 5 bits used for each gamma (Stable Result)

 

Fig. 25- Original Signal, Residues and Reconstructed Signal generated using 4 bits used for each gamma (Unstable Result!!)

 

We can find the minimum number of bits that preserves stability by finding the roots of the denominator, i.e. the poles of the filter. They must all be inside the unit circle; otherwise the filter is unstable. This is illustrated in figure 26, where the poles of the filter are drawn for different numbers of bits per gamma. For more than 4 bits, all the poles are located inside the unit circle. For 3 or 4 bits, at least one of the poles is on or outside the unit circle.

 

Fig. 26- Position of poles of the filters for different number of bits dedicated to gammas

(3, 4, 5, 6, 7, 8)

3 and 4 are unstable

 

 

The other effect of quantizing the gammas is that the filter departs from optimality, so the generated residues may not be white enough. This enlarges their range, and a lower compression ratio can be achieved. However, since this effect is not very sensitive to the number of bits, we still prefer to use the minimum possible number of bits. We therefore decided to use 5 bits to represent each gamma. This number is fixed for all frames, so, as will be described later, it can be considered known at the destination and does not need to be sent along with the data.
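A uniform quantizer for the gammas can be sketched in Python as below. This mid-rise design over the interval [-1, 1], with reconstruction at the cell centers, is an assumption and not necessarily the quantizer used in the project; the function names are hypothetical.

```python
def quantize_gamma(gamma, bits):
    """Map a reflection coefficient in [-1, 1] to an integer code of `bits` bits."""
    levels = 1 << bits
    step = 2.0 / levels
    index = int((gamma + 1.0) / step)
    return max(0, min(levels - 1, index))    # clamp to the valid code range

def dequantize_gamma(index, bits):
    """Reconstruct the coefficient at the center of its quantization cell."""
    step = 2.0 / (1 << bits)
    return -1.0 + (index + 0.5) * step

code = quantize_gamma(0.5, 5)
print(code, dequantize_gamma(code, 5))
```

With 5 bits the step size is 1/16, so the reconstruction error of an in-range coefficient is at most 1/32. Reconstructing at cell centers also keeps every decoded value strictly inside (-1, 1) under this assumed design.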

 

 

4.1.2  Quantization of Residuals

 

The main reason for using a lattice filter in our transmitter is to whiten the speech signal. This whitening decreases the range of our data considerably, as can be seen in figures 27 and 28. These two figures compare the range of the original signal and of the generated residues for the two approaches we have used so far, fixed-length frames and variable-length frames. It can be seen that the range is smaller for the variable-length approach, although its overall maximum is higher. As shown, the range of the residues is much smaller than that of the original signal. Since this kind of filtering does not add any information to the signal, we can use fewer bits for the residues than for the original signal.

Fig. 27- Range of data in frames for original signal and residues (Fixed Length Frames)

 

Fig. 28- Range of data in frames for original signal and residues (Variable Length Frames)

 

 

Since the range of the data differs considerably from frame to frame, we prefer to quantize each frame individually, so that a different number of bits can be used to represent the data of each frame. For this purpose, a reasonable number of bits for each frame is found by a simple range conversion.

 

Also, since our data is not necessarily symmetric, we can add a bias to make all the data positive and then use the whole range in the quantization. In this way we obtain more accurate residues at the expense of sending the quantized minimum and maximum values of the residues. The final number of bits for each frame can be seen in figure 29 for the two methods.

 

Fig. 29 Number of bits allocated for residues for each frame in two methods, Fixed Length Frames (Top) and Variable Length Frames (Bottom)
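The per-frame quantization with a bias can be sketched in Python as follows. The uniform quantizer shown here, which maps the frame minimum to code 0 and the frame maximum to the largest code, is an assumption made for illustration, and the function names are hypothetical.

```python
def quantize_frame(residuals, bits):
    """Quantize one frame of residuals using its own minimum and maximum.

    Returns (lo, hi, codes); lo and hi must be transmitted with the frame.
    """
    lo, hi = min(residuals), max(residuals)
    levels = (1 << bits) - 1
    if hi == lo or levels == 0:
        return lo, hi, [0] * len(residuals)
    step = (hi - lo) / levels
    # Subtracting lo is the bias that makes all values non-negative.
    codes = [int(round((r - lo) / step)) for r in residuals]
    return lo, hi, codes

def dequantize_frame(lo, hi, codes, bits):
    """Rebuild approximate residuals from the codes and the frame range."""
    levels = (1 << bits) - 1
    if hi == lo or levels == 0:
        return [lo] * len(codes)
    step = (hi - lo) / levels
    return [lo + c * step for c in codes]
```

Because the step size scales with the range of each frame, frames with small residuals are represented accurately even with few bits.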

 

 

4.2  Required Information for Successful Reconstruction

 

In contrast to the original signal, which can be seen as a sequence of 8-bit samples, the structure of the data in compressed format is complicated. For successful transmission, not only the residuals must be transmitted, but also the quantization level of the residuals, the reflection coefficients and their number, and, in the case of variable-length framing, the number of samples (sub-frames) in each frame. In the remainder of this sub-section we examine some issues in this area.

 

 

4.2.1   Transmission of LP Parameters

 

The reflection coefficients are among the most important parts of the data to be transmitted. In some special cases we can avoid transmitting them. These cases are:

 

(1)    When the average power of the residuals in a given frame is very small compared to the average power of the signal, we can consider the frame to be silence. In this case, we can avoid transmitting both the residuals and the reflection coefficients.

 

(2)    When the frame is highly uncorrelated, the signal in the frame is itself like white noise so we cannot perform more compression on it. In this case we can avoid transmitting the reflection coefficients.

 

(3)    When the reflection coefficients of adjacent frames are very close to each other, we can concatenate the two frames and transmit only the reflection coefficients of one of them. This method is similar to the variable-length framing approach. The only difference between the two approaches is the criterion used for the stationarity condition.

Let us assume that γ_i(m) are the reflection coefficients of frame m and γ_i(m+1) are the reflection coefficients of frame m+1. We can use the following criterion to decide whether to concatenate the adjacent frames.

 

             

 

where the threshold in the criterion is a predefined small constant.

 

 

4.2.2   Block of Data to Be Transmitted

 

Corresponding to each frame we must produce a block of data with a specified format, which is known for both the transmitter and receiver. The format of this block can be considered as

 

(1)    Frame length. We recall that the variable length frames have minimum length of 10 msec, maximum length of 40 msec, and the increments of 5 msec so 3 bits are enough to represent the frame length. In case of fixed length framing this part can be removed.

 

(2)    The number of reflection coefficients. In this project we will use 3 bits to represent it.

 

(3)    The reflection coefficients, each of which needs 5 bits.

 

(4)    The resolution of residuals. Because the maximum level of quantization is 8 bits, 3 bits is sufficient to represent the resolution of residuals.

 

(5)    Minimum and Maximum of each frame, 8 bits for each one.

 

(6)    The quantized prediction error.

 

Note that in this scheme different blocks have different numbers of bits. Apart from the above bits, which are parts of a block, it is also necessary to transmit the number of frames.

Based on the above format, the receiver has all the information necessary to reconstruct the frame from each block. Using items (1)-(5), which are called the "header", the decoder can compute the number of bits in the block. After receiving all the bits of the block, the receiver knows that a new block starts, and this procedure continues until the decoder reaches the end of the file.
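The size of one block follows directly from the field widths listed above. The Python sketch below simply adds them up; the function name and argument names are hypothetical.

```python
def block_bits(num_samples, order, residual_bits, variable_length=True):
    """Total number of bits in one block, using the field widths of the format."""
    bits = 0
    if variable_length:
        bits += 3                        # (1) frame length
    bits += 3                            # (2) number of reflection coefficients
    bits += 5 * order                    # (3) reflection coefficients, 5 bits each
    bits += 3                            # (4) resolution of the residuals
    bits += 2 * 8                        # (5) minimum and maximum of the frame
    bits += num_samples * residual_bits  # (6) quantized prediction error
    return bits

# A 20 msec fixed-length frame (160 samples), order 6, 3-bit residuals.
print(block_bits(160, 6, 3, variable_length=False))
```

For this example the header takes 52 bits and the residuals 480 bits, so the residual resolution dominates the size of the compressed file.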

 

 

4.3         Optimal Compression Scheme

 

Applying the LP compression scheme to the given speech signal, we have obtained several reconstructed signals for different levels of quantization; as a consequence, the resulting compressed files differ in size. Not surprisingly, the signals reconstructed from more heavily compressed files have lower quality. Tables 2 and 3 contain the related quantities for fixed-length and variable-length framing. These quantities are:

 

(1) The volume of compressed file in bytes.

 

(2) Number of bits per sample.

 

 

(3) Compression ratio which is defined as

     (original file size - compressed file size) / original file size x 100

 

(4) Prediction error power

 

(5) Signal to noise ratio in dB
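The first three quantities are related by simple arithmetic, sketched below in Python with hypothetical function names. The sketch assumes 8-bit original samples, so that the original size in bytes equals the number of samples. The example value of 20800 bytes for the original size is an inference: it is the size that reproduces the ratios listed in the tables, whereas the introduction quotes 20768 bytes for the signal.

```python
def compression_ratio_percent(original_bytes, compressed_bytes):
    """Compression ratio as defined in item (3)."""
    return (original_bytes - compressed_bytes) / original_bytes * 100.0

def bits_per_sample(compressed_bytes, num_samples):
    """Average number of transmitted bits per original sample."""
    return compressed_bytes * 8.0 / num_samples

# ASSUMED original size of 20800 bytes (one byte per sample).
print(round(compression_ratio_percent(20800, 11984), 4))
print(round(bits_per_sample(11984, 20800), 4))
```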

 

 

File Size (Bytes)   Bits per Sample   Compression Ratio (%)   Prediction Error Power Ratio   Signal/Noise Ratio (dB)
11984               4.6092            42.3846                 0.0057                         22.4382
10794               4.1515            48.1057                 0.0179                         17.4654
9719                3.738             53.274                  0.0256                         15.9036
9404                3.6169            54.7884                 0.0261                         15.8290
8984                3.4553            56.8076                 0.0295                         15.2900
8759                3.3688            57.8894                 0.0306                         15.1342
8649                3.3265            58.4182                 0.0308                         15.1005
8644                3.3246            58.4423                 0.0651                         11.8587
7574                2.913             63.5865                 0.07                           11.5435
6839                2.6303            67.1201                 0.0926                         10.3305
6614                2.5438            68.2019                 0.0961                         10.1708
6504                2.5015            68.7307                 0.0969                         10.1344
5539                2.1303            73.3701                 0.8376                         0.7693
5429                2.088             73.899                  0.8383                         0.7653
4804                1.8476            76.9038                 0.9818                         0.0792

 

Table 2 - Some generated output files for the fixed length frame method (20 msec)

 

 

 

 

 

 

 

 

 

File Size (Bytes)   Bits per Sample   Compression Ratio (%)   Prediction Error Power Ratio   Signal/Noise Ratio (dB)
12091               4.6503            41.8701                 0.0059                         22.2333
10831               4.1657            47.9278                 0.0195                         17.0975
9711                3.7349            53.3124                 0.0314                         15.0252
9491                3.6503            54.3701                 0.026                          15.8499
8991                3.458             56.774                  0.0357                         14.4670
8791                3.3811            57.7355                 0.0362                         14.4012
8671                3.3349            58.3125                 0.0367                         14.3523
8591                3.3042            58.6971                 0.148                          8.2963
6891                2.6503            66.8701                 0.2187                         6.6013
6851                2.6349            67.0625                 0.1711                         7.6661
6511                2.5042            68.6971                 0.1753                         7.5601
5511                2.1196            73.5048                 1.3488                         -1.3001
4791                1.8426            76.9663                 1.4868                         -1.7231
4351                1.6734            79.0817                 1.5027                         -1.7692
3571                1.3734            82.8317                 0.335                          4.7486

 

Table 3 - Some generated output files for the variable length frame method

 

 

Fig. 30 and Fig. 31 are generated from the information in these tables. They plot the signal to noise ratio versus the compression ratio and versus the number of bits per sample. These curves are important because they show directly how much the quality of the signal decreases for a given reduction in data size.

 

Fig. 30 Signal to noise ratio versus Compression Ratio

Fig. 31 Signal to Noise Ratio versus Bit per Sample

                

 

4.4         Locally Generated White Noise and Compression Ratio

 

In this section we discuss the possibility of not transmitting the residuals at all. In this scheme, apart from the LP coefficients, we transmit only the average residual power of each frame. At the receiver, a local white noise generator produces pseudo white noise with the received average power.

  

In Fig. 32 the results of this scheme are shown. Clearly the reconstructed signal is far from the original signal and as a consequence the quality of reconstructed signal is very poor. The interesting point is that listening to this signal, the entire sentence is understandable. Although this compression scheme is very low quality, the volume of the compressed file is very small.

 

Fig. 32 Speech signal reconstructed from locally generated white noise 

Top: original signal, Middle: locally generated white noise, Bottom: reconstructed signal
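The local noise generator can be sketched in Python as below. Gaussian noise and exact rescaling of each frame to the received power are assumptions made for illustration, and the function name and the `seed` parameter are hypothetical.

```python
import random

def local_white_noise(num_samples, target_power, seed=0):
    """Pseudo white noise whose average power equals target_power."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    power = sum(v * v for v in noise) / num_samples
    # Rescale so that the frame has exactly the received average power.
    scale = (target_power / power) ** 0.5 if power > 0.0 else 0.0
    return [v * scale for v in noise]

frame_excitation = local_white_noise(160, 0.25)
print(sum(v * v for v in frame_excitation) / 160)
```

The generated frame would then be passed through the inverse LP filter of that frame in place of the true residual.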

 

  

5        Quality Analysis of Reconstructed Signal

 

In the last section we performed compression of the speech signal based on LP filters. The compression was performed with different levels of quantization and, as a consequence, different compression ratios.

In this section we discuss the quality obtained at each level of quantization. We use two measures of speech signal quality. The first is an objective measure, which is based on a mathematical definition of the difference between the original signal and the reconstructed signal.

The best judgment about the quality of a speech signal is made by listening to it. Therefore, as a subjective measure of quality, one can ask some listeners to judge the signal.

In this section we discuss the quality of the reconstructed signal based on both objective and subjective measures. Finally we decide on the best compression ratio that results in an acceptable quality.

 

 

5.1         Objective Measures of Quality

 

As a measure of the closeness of the original and reconstructed signals, one can use the ratio of the average signal power to the average error power. Let us assume that x(n) is the original signal and \hat{x}(n) is the reconstructed signal. Then we use

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_n x^2(n)}{\sum_n \left( x(n) - \hat{x}(n) \right)^2} \ \text{dB}$$

as a quantitative measure for the quality of the reconstructed signal.
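The signal to noise ratio in dB, i.e. the ratio of signal power to error power on a logarithmic scale as reported in the tables, can be computed as in the following Python sketch; the function name is hypothetical.

```python
import math

def snr_db(original, reconstructed):
    """Signal to noise ratio in dB between two equally long signals."""
    signal_power = sum(x * x for x in original)
    noise_power = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    if noise_power == 0.0:
        return math.inf                  # perfect reconstruction
    return 10.0 * math.log10(signal_power / noise_power)

print(snr_db([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 0.5]))
```

The "Prediction Error Power Ratio" column of the tables is the inverse of the power ratio inside the logarithm, so for example a ratio of 0.0057 corresponds to about 22.4 dB.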

 

In Tables 2 and 3, the above quantity is shown for different quantization levels, together with the compression ratio and the number of bits per sample for each quantization level.

 

 

5.2         Subjective Measure of Quality

 

In this sub-section we discuss the quality of the reconstructed signal from the viewpoint of listeners. Five listeners were asked to assign a score of 1, 2, 3, or 4 to each of the signals of Table 2. The authors were not part of this group, and most of the listeners know nothing about speech compression, so they can be regarded as unbiased. Each listener took the test independently, and the order of the signals was chosen randomly. In the first round the listeners only listened to the signals; in the second round they were asked to assign a score to each signal.

 

     The following table shows the results.

 

Signal Number   L1   L2   L3   L4   L5   Av    CR (%)    B/S
1               4    4    4    4    4    4.0   42.3846   4.6092
2               4    4    4    4    4    4.0   48.1057   4.1515
3               4    4    4    4    3    3.8   53.274    3.738
4               3    4    4    4    3    3.6   54.7884   3.6169
5               3    4    3    3    2    3.0   56.8076   3.4553
6               3    4    2    3    2    2.8   57.8894   3.3688
7               3    3    3    3    2    2.8   58.4182   3.3265
8               3    4    2    3    1    2.6   58.4423   3.3246
9               1    4    3    2    1    2.2   63.5865   2.913
10              1    4    2    3    1    2.2   67.1201   2.6303
11              1    1    2    1    1    1.2   68.2019   2.5438
12              1    1    1    2    1    1.2   68.7307   2.5015
13              1    1    1    1    1    1.0   73.3701   2.1303
14              1    1    1    1    1    1.0   73.899    2.088
15              1    1    1    1    1    1.0   76.9038   1.8476

 

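Table 4 The opinion of listeners about the quality of the signals (L1 to L5: individual listeners, Av: average score, CR: compression ratio, B/S: bits per sample)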

 

      Based on the above table, one can conclude that a proper choice is the 3rd or 4th row of the table. By this criterion, the best result for the fixed length method is 3.6-3.7 bits per sample, which corresponds to a compressed file of 9719 bytes. Similar results were obtained from the corresponding table for the variable length method.
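
The columns of the table can be cross-checked with a short sketch. It assumes that the original file uses 8 bits per sample (20768 bytes for 2.596 seconds at 8 kHz) and that the compression ratio is the percentage reduction in bits per sample. The second assumption is inferred from the tabulated numbers rather than stated in the text, and the function names are hypothetical.

```python
ORIGINAL_BITS_PER_SAMPLE = 8   # assumption: one byte per sample in the original file
ORIGINAL_SAMPLES = 20768       # file size in bytes given in the introduction


def mean_opinion(scores):
    """Average of the listeners' scores for one signal."""
    return sum(scores) / len(scores)


def compression_ratio_percent(bits_per_sample):
    """Percentage reduction relative to the uncompressed bit rate (inferred)."""
    return 100.0 * (1.0 - bits_per_sample / ORIGINAL_BITS_PER_SAMPLE)


def payload_bytes(bits_per_sample):
    """Size of the coded samples alone, excluding any file header."""
    return ORIGINAL_SAMPLES * bits_per_sample / 8.0


# Rows 3 and 4 of the table: (listener scores, bits per sample).
rows = {3: ([4, 4, 4, 4, 3], 3.738), 4: ([3, 4, 4, 4, 3], 3.6169)}
for number, (scores, bps) in rows.items():
    print(number, mean_opinion(scores), compression_ratio_percent(bps), payload_bytes(bps))
```

Under these assumptions the computed averages and compression ratios agree with the tabulated values to within rounding. The payload for row 3 comes out slightly below the quoted file size, which would be consistent with the file also carrying a header.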

 

 

6      Speech Transmission Through Imperfect Noisy Channel

 

In a real communication system, the transmitted data is subject to distortion and noise. In this section we examine the effect of an imperfect channel on the quality of the reconstructed speech.

The block diagram of an imperfect channel is shown in Fig. 33.

 

 

Fig. 33 Block diagram of an imperfect channel

 

For the purpose of this project, the channel filter is an FIR transfer function with

 

 

and the variance of the white noise is chosen such that the signal-to-noise ratio is 20 dB.

 

     To simulate the effect of this channel on the reconstruction quality, we pass the residual through the channel and leave the header of the file unaffected.
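
The channel simulation can be sketched as follows. This is a minimal illustration under stated assumptions: the FIR coefficients `h` are placeholder values, not the ones used in the project, and the 20 dB signal-to-noise ratio is taken relative to the power of the filtered residual, which the text does not specify. The function names are hypothetical.

```python
import math
import random


def fir_filter(x, h):
    """Causal FIR filtering: y[n] = sum_k h[k] * x[n-k]."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def noisy_channel(residual, h, snr_db=20.0, seed=0):
    """Filter the residual with h, then add white Gaussian noise at snr_db."""
    rng = random.Random(seed)
    filtered = fir_filter(residual, h)
    signal_power = sum(v * v for v in filtered) / len(filtered)
    # Noise variance chosen so that signal_power / noise_power = 10^(snr_db/10).
    noise_std = math.sqrt(signal_power / 10.0 ** (snr_db / 10.0))
    return [v + noise_std * rng.gauss(0.0, 1.0) for v in filtered]


# Placeholder residual and channel taps, for illustration only.
residual = [math.sin(0.05 * n) for n in range(4000)]
h = [1.0, 0.5]
received = noisy_channel(residual, h, snr_db=20.0, seed=1)
```

The file header is not passed through `noisy_channel`, which matches the choice described above of leaving the header unaffected.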

     The next three figures show the original signal, the residual signal, the residual signal after transmission through the noisy channel, and the signal reconstructed from the noisy received signal.

 

In Fig. 34 the prediction error is produced by filtering the original signal with high-order LP filters, so the prediction error is very close to white noise and the compression ratio is high. The figure shows that the effect of the channel on the reconstructed signal is catastrophic.

Fig. 34 The effect of a noisy, imperfect channel on a highly compressed signal

First row: Original Signal

Second row: Prediction Error before transmission

Third row: Prediction Error after transmission

Fourth row: Reconstructed Signal

 

Fig. 35 shows the simulated effect of the noisy channel on a signal with an average level of compression. It is clear from the plots that the effect of channel noise is not as catastrophic as in the previous figure.

 

Fig. 35 The effect of a noisy, imperfect channel on an average compressed signal

First row: Original Signal

Second row: Prediction Error before transmission

Third row: Prediction Error after transmission

Fourth row: Reconstructed Signal

 

     Decreasing the order of the LP filters, and as a consequence increasing the size of the data file, we obtain Fig. 36. According to this figure, the quality of the reconstructed signal is acceptable. Listening to the reconstructed signal confirms this claim.

Fig. 36 The effect of a noisy, imperfect channel on a low compressed signal

First row: Original Signal

Second row: Prediction Error before transmission

Third row: Prediction Error after transmission

Fourth row: Reconstructed Signal

 

 

 

The phenomenon observed in Figs. 34 to 36 is a general rule that is justified by Shannon's theorem: we can transmit signals through a noisy channel with a very low probability of error if the rate of the signal is small enough. Decreasing the compression ratio leaves more redundancy in the transmitted signal, which is equivalent to decreasing the data rate, and this results in a smaller probability of error.
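
For reference, the rate limit can be made concrete for an idealized case. The worked equation below assumes a discrete-time real channel with additive white Gaussian noise only. The channel of Fig. 33 also contains an FIR filter, so the number is an illustration and not the capacity of that channel.

```latex
C = \frac{1}{2}\log_2\left(1 + \mathrm{SNR}\right) \quad \text{bits per channel use},
\qquad
\mathrm{SNR} = 10^{20/10} = 100
\;\Rightarrow\;
C = \frac{1}{2}\log_2(101) \approx 3.33 \ \text{bits per channel use}.
```

Information sent at a rate below $C$ can in principle be recovered with arbitrarily small probability of error, while a rate above $C$ cannot.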

 

 

7      Conclusion

 

In this project we developed and implemented a linear prediction based method for the compression of speech signals. Two different methods for specifying the stationary frames were presented and compared. At first it was supposed that the variable length method could compress the signal better, but the final results show that, at least for this specific speech file, variable length framing has no appreciable advantage over fixed length framing.

 

Based on the design of the LP filters, some important properties of these filters were shown. These properties include the whitening property, the reduction of the dynamic range of the prediction error, and the stability of the inverse filters.

 

During the project we observed that a noisy, imperfect channel has a highly destructive effect on highly compressed signals. The only way to avoid destruction of the signal by an imperfect channel is to add redundancy to the signal. In our case, where the signal is compressed, a reasonable approach is to use a smaller compression ratio so that the signal can be transmitted safely through the imperfect channel.

 

 

8      References

 

[1]  J. R. Deller, J. G. Proakis, Discrete-Time Processing of Speech Signals, IEEE, 1999.

[2]  T. F. Quatieri, Discrete-Time Speech Signal Processing, Pearson Education, 2001.

[3]  S. Haykin, Adaptive Filter Theory, Prentice-Hall, 1991.

[4]  M. H. Hayes, Statistical Digital Signal Processing and Modeling, John Wiley & Sons, Inc., 1996.