{akomaee, afshin}@eng.umd.edu
Speech coding is the operation of transforming a speech signal into a more compact form that requires a lower bit rate for transmission over a communication channel. Generally, this transformation is lossy, which means that the signal reconstructed from the compact form is not exactly the same as the original signal. By taking into account the dynamics and limitations of the human ear, one can develop compression schemes such that, from the listener's viewpoint, the reconstructed signal is close enough to the original signal.
During the past few years, several techniques such as Linear Prediction Coding (LPC), Waveform Coding and Sub-band Coding have been proposed for speech compression. In this project, we use the LPC technique to compress a speech signal. The basic idea of LPC is to transmit the prediction error (residue) instead of the speech signal. Since a linear predictor with a properly chosen order can predict the signal with a relatively small error variance, the power of the residuals is considerably smaller than the power of the original signal. This property of linear prediction enables us to use a lower bit rate for transmitting the speech signal through a communication channel.
In this project we discuss the application of linear prediction to speech compression in detail. MATLAB code was developed to perform this operation. Even though the implemented algorithm and the developed software can be used for different speech signals, this project focuses on a given speech file, so the analysis of the results is based on this particular file. The file consists of 20768 bytes, sampled at a rate of 8 kHz, so that the total duration of the signal is 20768/8000 = 2.5960 seconds. Fig1 shows this speech signal.

Fig1 - The original speech signal used in the remainder of the report
The remainder of the report is organized as follows. In section 2 we discuss the problem of finding stationary frames of the signal and propose two methods for framing the signal. In section 3 we consider the design and properties of linear predictor filters and obtain the optimal order of the LP filters. In section 4 the details of quantization and the practical aspects of implementation are discussed. The quality of the reconstructed signal versus the compression ratio is the subject of section 5. The effect of channel noise on the quality of reconstruction is discussed in section 6. Section 7 concludes and summarizes the report.
To apply the Linear Prediction method to a signal, the signal must be stationary. A speech signal is not stationary over its whole duration, but it consists of short, almost stationary intervals. The first step in applying LP filtering to a speech signal is therefore to decompose the signal into stationary frames. Two factors must be considered when choosing the frame duration.
(1) In a data compression scheme using LP filters, the LP parameters must be transmitted for each frame. It is therefore desirable to make the frames as long as possible, so that the total number of frames, and consequently the total number of filter coefficients, is minimized. In addition, the estimate of the autocorrelation function, which is the basis of the filter design, is more accurate over longer durations.
(2) The signal is not stationary over very long frames.
Considering these factors, there is a trade-off between a small number of long, non-stationary frames and a large number of short, stationary frames.
Two different methods can be considered as solutions to this trade-off problem: fixed-length frames and variable-length frames. In this section we discuss both methods and find a criterion for obtaining the frame lengths.
2.1 Constant Length Frames
In the fixed-length framing scheme, we use a single fixed duration for all frames. This duration must be chosen as the best solution to the trade-off discussed above. Note that this scheme has only one degree of freedom, so the resulting frames may not be very well chosen. The general belief in the speech processing community is that a good choice for the frame duration is 20 milliseconds.
For the case of 40 msec frames, some non-stationary frames of the speech signal are illustrated in Fig2. Searching the signal, one can find more non-stationary frames. The existence of a considerable number of non-stationary frames in the signal shows that 40 msec is not a proper choice for the frame length.

Fig2 - Some non-stationary
frames with fixed length 40 msec
(Frame numbers 3, 9, 11, 14,
19, 20, 24, 32, and 38)
Looking for non-stationary frames in the case of 20 msec frames, the following six frames were found. Compared with the previous case, the number of non-stationary frames decreases. The price for fewer non-stationary frames is that the number of frames doubles and, as a consequence, the total number of filter coefficients to be transmitted also doubles. The non-stationary frames for 20 msec frames are shown in Fig3.

Fig3 - Some non-stationary frames with fixed length
20 msec
(Frame numbers 5, 23, 30,
63, 76, and 96)
If we continue this procedure, we find only three non-stationary frames for 10 msec frames. Although the number of non-stationary frames decreases further, this small gain does not justify doubling the number of frames once more. Based on this argument, 20 msec appears to be a proper choice for the frame length, in agreement with the speech processing literature.
2.2 Variable Length Frames
To have more degrees of freedom, we can use a variable-length framing scheme. The basic idea of this method is as follows. We define a minimum frame length, a maximum frame length, an increment length, and a stationarity condition. To obtain the length of each frame, we start with the minimum length and in each iteration increase the frame length by the increment length, until we reach the maximum length or the point at which the stationarity condition is violated.
The stationarity condition compares the average power of the frame up to the current iteration with the average power after concatenating the new incremental frame. If the following condition is satisfied, we accept the new incremental frame as part of the frame; otherwise the stationarity condition is violated.
![]()
In the above inequality, d is a parameter that must be obtained by trial and error. The values of the other parameters used in this project are shown in the following table.
| Minimum Length | Maximum Length | Increment Length |
|---|---|---|
| 10 msec | 40 msec | 5 msec |

Table 1 - Parameters for variable length framing
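The frame-growing procedure of this sub-section could be sketched as follows. This is only an illustration in Python, not the MATLAB code of the project. The stationarity test used here (relative change in average power not exceeding d) is an assumption, because the exact inequality is not recoverable from the text; the function names and the default d=0.35 are likewise illustrative.

```python
def average_power(samples):
    """Mean squared value of a list of samples (0.0 for an empty list)."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def variable_length_frames(signal, fs=8000, min_ms=10, max_ms=40, inc_ms=5, d=0.35):
    """Split `signal` into (start, length) frames of min_ms to max_ms duration.

    ASSUMPTION: a frame keeps growing while the relative change in average
    power caused by the next increment does not exceed d. This stands in
    for the report's stationarity inequality, which is not available.
    """
    min_len = fs * min_ms // 1000
    max_len = fs * max_ms // 1000
    inc_len = fs * inc_ms // 1000
    frames, start = [], 0
    while start < len(signal):
        # Start from the minimum length (or whatever is left of the signal).
        length = min(min_len, len(signal) - start)
        while (length + inc_len <= max_len
               and start + length + inc_len <= len(signal)):
            p_old = average_power(signal[start:start + length])
            p_new = average_power(signal[start:start + length + inc_len])
            if abs(p_new - p_old) > d * p_old:
                break  # assumed stationarity condition violated: close frame
            length += inc_len
        frames.append((start, length))
        start += length
    return frames
```

With the parameters of Table 1 and an 8 kHz signal, every frame produced by this sketch is between 80 and 320 samples long, except possibly a shorter final frame.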
In the following figure the frame durations for d=0.2, 0.35 and 0.5 are illustrated. Note that increasing the value of d decreases the number of frames. As expected, the frame duration lies between 10 and 40 msec.

Fig4 - Frame duration for d=0.2, 0.35 and 0.5
In Fig5 the total number of frames is plotted versus d. Note that beyond a certain value of d, the number of frames no longer decreases as d increases.

Fig5 - Number of frames versus d
A reasonable choice for d must have the following properties:
(1) Inside a frame the signal is stationary.
(2) No further frames can be concatenated while still keeping a stationary window. This condition is shown in Fig6. In this figure, three adjacent frames are plotted for each case; each frame is stationary, and no further frames can be concatenated to form a new stationary frame.
Fig6 is obtained for one particular value of d. For this value, most of the frames have the property shown in Fig6, so it seems to be a proper choice.

Fig6 - Consecutive frames for
variable length approach
(Frames (10, 11, 12), (14,
15, 16), (25, 26, 27), (91, 92, 93))
To have a better idea of the
contents of each variable length frame, in Fig7 we show some frames from
different parts of the signal.

Fig7 - Some frames with variable
length from 10 to 40 msec
(Frame numbers 4, 9, 12, 15,
21, 25, 28, 32, 36, 38, 40, 51, 53, 55, 62, and 68)
2.3 Comparison of Constant Length and Variable Length Framing
Although it is too soon to compare the fixed and variable length methods, we close this section with Fig8, which shows the average residual power versus the total number of LP coefficients. According to this figure, for a reasonable choice of LP filter order, the total residual power is smaller for the variable length method, which justifies the use of this method. We will discuss this issue more precisely in the next section, where the two methods are treated in parallel so that their features can be compared.

Fig8 - Average Power of Residues versus total number of
gamma coefficients in two different approaches, fixed and variable length frames
The main idea behind linear prediction filtering is to estimate the value of a signal based on its past observations. When the signal is strongly correlated, the difference between the actual value of the signal and its estimate is small. From a communication viewpoint it is desirable to transmit as little data as possible, so it is convenient to transmit the estimation error instead of the original signal. To perform such a reduction in data size we need to design Linear Predictor filters.
The design of LP filters is based on knowledge of the autocorrelation function. In this project the actual autocorrelation function is not given, so we have to estimate it from the samples of the speech signal.
For each frame, the autocorrelation function is estimated from the samples of that frame, over the full frame length, using the following equation:
![]()
Note that for a proper choice of LP order we need only the first few lags of the autocorrelation function, up to the filter order.
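A per-frame autocorrelation estimate of the kind described above could look like the sketch below. It assumes the common biased estimator, which divides every lag by the frame length; the normalization actually used in the project is not recoverable from the text, and the function name is illustrative.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation estimate for lags 0..max_lag.

    r[k] = (1/N) * sum_{i=k}^{N-1} x[i] * x[i-k], with N = len(frame).
    ASSUMPTION: the biased (1/N) normalization; the report's exact
    estimator is not available.
    """
    n = len(frame)
    return [sum(frame[i] * frame[i - k] for i in range(k, n)) / n
            for k in range(max_lag + 1)]
```

For a filter of order p only `autocorrelation(frame, p)` is needed, i.e. p+1 values per frame.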
Fig9 illustrates the autocorrelation function, and Fig10 the power spectral density, for some of the frames.

Fig9 - Autocorrelation function
for a number of signal frames

Fig10 - Power spectral density for a number of signal frames
In the next sub-sections we review the design procedure of LP filters and the criterion for obtaining their optimal order.
3.1 Computation of Linear Predictor Coefficients
To determine the reflection coefficients of the LP lattice filters we use the Levinson-Durbin algorithm, which obtains the reflection coefficients recursively. In each iteration the order of the LP filter increases by 1. We start with a filter of order 0 and set

Then in each iteration we
perform the following operations

and continue the procedure until we reach the desired order. We obtain the optimal order for each frame in the next sub-section.
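For readers who want to experiment, the Levinson-Durbin recursion can be sketched in a few lines of Python. The sign convention used here (prediction-error filter A(z) = 1 + a1 z^-1 + ... + ap z^-p, with the reflection coefficient equal to the last coefficient at each order) is one common textbook choice and an assumption; the project's equations may define the gammas with the opposite sign.

```python
def levinson_durbin(r, order):
    """Levinson-Durbin recursion on autocorrelation values r[0..order].

    Returns (a, gammas, err):
      a      - prediction-error filter coefficients [1, a1, ..., a_order]
      gammas - reflection coefficients, one per order
      err    - final prediction error power
    ASSUMPTION: e[n] = x[n] + sum_k a[k] * x[n-k] (sign conventions vary).
    """
    a = [1.0]
    err = r[0]
    gammas = []
    for m in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("prediction error power is not positive")
        # Correlation between the current error and the sample m steps back.
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        gamma = -acc / err
        # Order update: a_new[k] = a[k] + gamma * a[m - k], with a[m] = 0.
        prev = a + [0.0]
        a = [prev[k] + gamma * prev[m - k] for k in range(m + 1)]
        err *= 1.0 - gamma * gamma
        gammas.append(gamma)
    return a, gammas, err
```

For a first-order autoregressive autocorrelation such as r = [1, 0.5, 0.25], the recursion returns a second reflection coefficient of zero, which indicates that order 1 already captures the correlation.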
3.2 Optimal Order of Predictor
For the actual values of the autocorrelation function, the prediction error power and the reflection coefficients are well behaved, i.e. as the order m approaches infinity, the prediction error power approaches a non-zero constant and the reflection coefficients approach zero. In Fig10 this behavior is illustrated for some frames of the speech signal.

Fig10 - Prediction Error Power
for well-behaved frames (Fixed Length Frames)
(Frames 3, 7, 14, 31, 35,
41, 45, 63, 72, 85, 94, 130)
Because we use an estimate of the autocorrelation function instead of its actual value, the property mentioned above is violated for some frames. This violation takes two different forms:
(1)
For some frames we have
.
(2)
For some frames we have
.
Both of the above behaviors are unacceptable and show that the order of the filter cannot be increased beyond m. These behaviors of the LP prediction error are illustrated in Fig11.

Fig11 - Prediction Error Power
for bad-behaved frames (Fixed Length Frames)
(Frames 4, 10, 11, 22, 32,
48, 51, 53, 76, 79, 87, 89)
Because of the behavior illustrated in Fig11, the algorithm designer should set a constraint on the computation of the reflection coefficients to prevent the above phenomena. In other words, when one of the above conditions is reached in the Levinson-Durbin algorithm, the computation of further coefficients must stop.
For the well-behaved frames (Fig10), different frames show different rates of convergence. For example, for some frames a reasonable choice of order is 3-4, while for others the optimal order is 9-10. Thus we should set another constraint in the algorithm to prevent an overestimated filter order. One suitable criterion that we use in this project is
![]()
Where
is a predefined small
constant which we set it to
.
In general, as soon as one of the following conditions is satisfied we stop the algorithm and use the number of completed iterations as the order of the filter:
(1) ![]()
(2) ![]()
(3) ![]()
(4) The order of the filter reaches a predefined maximum order.
Based on the above conditions we have computed the optimal order of the filters for all frames, for both fixed-length and variable-length frames. In obtaining the orders, the maximum order is set to 14. The results are shown in Fig12 and Fig13. It is clear from the figures that the filter orders vary from 0 to 14. Note that filters of order 0 correspond to the case in which the signal in the frame is completely uncorrelated (white).
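An order-selection loop in the spirit of the stopping rules above might be sketched as follows. The exact conditions (1)-(3) are not recoverable from the text, so the tests used here are assumptions: stop when a reflection coefficient reaches magnitude 1 (the error power would no longer stay positive), when the reduction in error power relative to r[0] falls below a small threshold `eps`, or when the maximum order is reached. The name `select_order` and the default `eps=0.01` are illustrative.

```python
def select_order(r, max_order=14, eps=0.01):
    """Levinson-Durbin with early stopping; returns (order, gammas).

    ASSUMED stopping rules (the report's own inequalities are missing):
      - |gamma_m| >= 1, i.e. the estimated autocorrelation is ill behaved
      - error power reduction, relative to r[0], smaller than eps
      - order reaches max_order (or the available lags run out)
    """
    if r[0] <= 0.0:
        return 0, []  # silent frame: nothing to predict
    a, err, gammas = [1.0], r[0], []
    for m in range(1, min(max_order, len(r) - 1) + 1):
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        gamma = -acc / err
        if abs(gamma) >= 1.0:
            break  # ill-behaved frame: do not increase the order further
        new_err = err * (1.0 - gamma * gamma)
        if (err - new_err) / r[0] < eps:
            break  # negligible improvement: avoid overestimating the order
        prev = a + [0.0]
        a = [prev[k] + gamma * prev[m - k] for k in range(m + 1)]
        err = new_err
        gammas.append(gamma)
    return len(gammas), gammas
```

Under these assumed rules a completely uncorrelated frame, whose estimated autocorrelation vanishes at all non-zero lags, gets order 0, in line with the remark about white frames above.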

Fig12 - Order of filters for all
frames in fixed-length approach

Fig13 - Order of filters for all
frames in variable-length approach
To give more information about the LP filters, the reflection coefficients (G), impulse responses and frequency responses of the filters are given in Fig14, Fig15, and Fig16.

Fig14 - Reflection Coefficients (G) for some frames after all correction
(Variable Length Frames)
(Frames 10, 20, 30, 40, 50,
60, 70, 80, 90, 100, 110, 120)

Fig15 - Impulse Response of LP
filters for some frames after all correction (Variable Length Frames)
(Frames 10, 20, 30, 40, 50,
60, 70, 80, 90, 100, 110, 120)

Fig16 - Frequency Response of
Prediction Filters for frames 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120
(Variable Length)
To obtain the best choice for the maximum order of the filters, the following two curves are computed for both fixed and variable length frames. Fig17 plots the average residual power versus the maximum order of the filters, and Fig18 shows the average residual power versus the total number of reflection coefficients. Two important facts can be concluded from these curves. First, a good choice for the maximum filter order is 6-8. Second, the quality of the variable length scheme is better than that of the fixed length scheme.

Fig17 - Comparison of Average
Power of Residues in two different approaches,
fixed and variable length
frames

Fig18 - Average Power of Residues
versus total number of gamma coefficients
in two different approaches, fixed and
variable length frames
3.3 Linear Prediction Properties
Based on the simulations and the filters designed in this project we can show some important features of the linear prediction scheme. In the next three sub-sections we discuss these properties.
3.3.1 Decreasing the Dynamic Range
One of the most important features of linear prediction is that it decreases the dynamic range of the residual signal in comparison with the original signal. Fig19 shows the original signal and the residual signal in a single graph. The ability of LP filters to decrease the dynamic range is clear from this figure.

Fig19 - Decreasing the dynamic range
of residuals.
Blue plot: original signal,
Red plot: residuals
Another fact about LP
filters is that increasing the order of filter decreases the dynamic range of
residuals. This fact is shown in Fig20.

Fig20 - Average power in each frame
for original signal and prediction error
when the signal is filtered
by a first and second order LP filter
3.3.2 Whitening Property
Another property of linear prediction is that the prediction error is approximately white noise. Fig21 shows the spectrum of the original signal and of the prediction error. Although the energy of the speech signal is concentrated in part of the spectrum, the spectrum of the residuals is almost flat. This shows that the prediction error is almost white noise.

Fig21 - Spectrum of original signal
and prediction error.
Top: Original signal,
Bottom: prediction error
3.3.3 Stability of Inverse Filter
The FIR prediction-error filter has all of its zeros inside the unit circle, so its inverse filter is stable. The zeros of the LP filters for all frames, which are the poles of the inverse filters, are given in Fig22. It is clear that all of them are inside the unit circle.

Fig22 Zeros of LP filters which
are the poles of inverse LP filters
3.4 Simulation of LP Filters and Inverse LP Filters
To check whether the LP filters and the inverse LP filters work together to analyze and synthesize the speech signal, a simulation was performed. In this simulation neither the reflection coefficients nor the residuals were quantized, so perfect reconstruction is expected. Not surprisingly, Fig23 shows that the reconstructed signal and the original signal are practically identical. Due to the finite precision of computer arithmetic, only a very small total error power was observed.

Fig23 - Results of Simulation,
Top: original signal,
Middle: prediction error, Bottom: reconstructed signal
At this point we should mention that an overlap between adjacent frames is used in the reconstruction of the signal. The inverse LP filters are IIR, so they need as many initial conditions as the order of the filter. We use the last samples of the previous frame as the initial conditions of the IIR inverse filter. This means that adjacent frames overlap by a number of samples equal to the filter order.
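The frame-by-frame analysis and synthesis with carried-over initial conditions can be illustrated by the following sketch. It uses direct-form coefficients rather than the lattice structure of the project, and the function names as well as the zero initial state for the first frame are assumptions made for the example.

```python
def _last(history, p):
    """Last p samples of `history`, padded with leading zeros if too short."""
    padded = [0.0] * p + list(history)
    return padded[len(padded) - p:]


def analyze(frame, a, history):
    """Prediction error e[n] = x[n] + sum_{k=1..p} a[k] * x[n-k].

    `a` is [1, a1, ..., ap]; `history` supplies the samples that precede
    the frame (the tail of the previous frame).
    """
    p = len(a) - 1
    buf = _last(history, p) + list(frame)
    return [buf[n + p] + sum(a[k] * buf[n + p - k] for k in range(1, p + 1))
            for n in range(len(frame))]


def synthesize(residual, a, history):
    """Inverse (IIR) filter: x[n] = e[n] - sum_{k=1..p} a[k] * x[n-k].

    `history` holds previously reconstructed samples, which serve as the
    initial conditions of the recursion.
    """
    p = len(a) - 1
    buf = _last(history, p)
    for e in residual:
        n = len(buf)
        buf.append(e - sum(a[k] * buf[n - k] for k in range(1, p + 1)))
    return buf[p:]
```

A typical use is to call `analyze` on each frame with the original samples of the previous frame as history, and `synthesize` with the previously reconstructed samples as history. Without quantization the two histories coincide, and the reconstruction is exact up to rounding.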
In this section we discuss some practical issues regarding data compression. First we consider the quantization problem for both the reflection coefficients and the residuals.
In sub-section 4.2 the information required for reconstruction of the signal is examined, and a suitable format for transmission of the data is proposed. In sub-section 4.3 the results of compression with different compression ratios are presented.
4.1 Quantization
The next step in our process is to quantize the data, that is, to convert the data from real numbers to integers that can be stored in a fixed number of bits and sent through the channel to the destination. We must quantize all of the data, including the reflection coefficients and the residues. As we will see, these quantization processes directly affect the quality of the reconstructed signal.
4.1.1 Quantization of Reflection Coefficients and Stability Considerations
Since the filtering process performed by the gammas is reversible, we can use any number of bits to represent them, subject to some conditions. First, we must be aware of the stability of the inverse filter. Decreasing the number of bits below a certain point causes instability, as can be seen in figures 24 and 25. In figure 24 we used only 5 bits to represent each gamma; the filter is still stable and the reconstruction is almost perfect. In figure 25, however, using 4 bits makes the inverse filter unstable and we see some unbounded values.

Fig. 24- Original Signal, Residues
and Reconstructed Signal generated using 5 bits used for each gamma (Stable
Result)

Fig. 25- Original Signal, Residues
and Reconstructed Signal generated using 4 bits used for each gamma (Unstable
Result!!)
We can find the minimum number of bits that preserves stability by finding the roots of the denominator, i.e. the poles, of the inverse filter. They must all be inside the unit circle; otherwise the filter is unstable. This is illustrated in figure 26, where the poles of the filter are drawn for different numbers of bits per gamma. For more than 4 bits, all the poles are located inside the unit circle. For 3 or 4 bits, at least one of the poles is on or outside the unit circle.

Fig. 26- Position of poles of the
filters for different number of bits dedicated to gammas
(3, 4, 5, 6, 7, 8)
3 and 4 are unstable
The other effect of quantizing the gammas is that the filter departs from optimality, that is, the generated residues may not be white enough. This increases their range, so a lower compression ratio can be achieved. But since this effect is not very sensitive to the number of bits, we still prefer to use the minimum possible number of bits. We therefore decided to use 5 bits to represent each gamma. This number is fixed for all frames, so, as described later, it can be considered known at the destination and does not need to be sent along with the data.
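One way in which coarse quantization can destroy stability is sketched below. The uniform quantizer on [-1, 1] is an assumption, since the report does not describe how the gammas were quantized. The stability test relies on the standard property that an all-pole lattice filter is stable exactly when every reflection coefficient has magnitude strictly less than 1.

```python
def quantize_gamma(gamma, bits):
    """Uniform quantizer on [-1, 1] with 2**bits levels (ASSUMED design).

    Returns (index, value). The end levels are exactly -1 and +1, so a
    coefficient close to the unit bound can be rounded onto the bound
    when the step is coarse.
    """
    top = 2 ** bits - 1                      # index of the highest level
    index = round((gamma + 1.0) * top / 2.0)
    index = max(0, min(top, index))
    value = (2 * index - top) / top          # exact at the end points
    return index, value


def is_stable(gammas):
    """Lattice synthesis filter is stable iff all |gamma| < 1."""
    return all(abs(g) < 1.0 for g in gammas)
```

With this hypothetical quantizer, a coefficient of 0.95 is mapped to about 0.935 with 5 bits but to exactly 1.0 with 4 bits, which makes the inverse filter unstable. This is only one plausible mechanism for the behavior seen in figures 24 to 26.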
4.1.2 Quantization of Residuals
The main reason for using a lattice filter in our transmitter is to whiten the speech signal. This whitening decreases the range of our data considerably, as can be seen in figures 27 and 28. These two figures compare the range of the original signal and of the generated residues for the two approaches we have used so far, fixed length frames and variable length frames. It can be seen that this range is generally smaller for variable length frames, although its overall maximum is higher. As shown, the range of the residues is much smaller than that of the original signal. Since this kind of filtering does not add any information to the signal, we can use fewer bits for the residues than for the original signal.


Since the range of the data differs so much from frame to frame, we prefer to quantize each frame individually, so that a different number of bits can be used to represent each frame's data. For this purpose, a simple range conversion is used to find a reasonable number of bits for each frame.
Also, since our data is not necessarily symmetric, we can use a bias to make all the data positive first and then use the whole range in quantization. In this way we get more accurate residues at the expense of sending the quantized minimum and maximum values of the residues. The final number of bits for each frame can be seen in figure 29 for the two methods.
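The per-frame quantization with a bias can be sketched as below. The rule used in the project to choose the number of bits for a frame is not described in detail, so the sketch takes `bits` as an input; the function names are illustrative, and the minimum and maximum are kept unquantized here for simplicity.

```python
def quantize_frame(residual, bits):
    """Map a frame of residuals onto integer codes 0 .. 2**bits - 1.

    The frame minimum acts as the bias that makes all values non-negative,
    and the full code range is spread over [min, max].
    Returns (lo, hi, codes); lo and hi must be sent with the codes.
    """
    lo, hi = min(residual), max(residual)
    top = 2 ** bits - 1
    step = (hi - lo) / top if hi > lo else 1.0
    codes = [min(top, max(0, round((e - lo) / step))) for e in residual]
    return lo, hi, codes


def dequantize_frame(lo, hi, codes, bits):
    """Inverse mapping used by the receiver."""
    top = 2 ** bits - 1
    step = (hi - lo) / top if hi > lo else 1.0
    return [lo + c * step for c in codes]
```

The reconstruction error of each sample is at most half a quantization step, so frames with a small residual range can be represented with few bits.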

Fig. 29 Number of bits allocated
for residues for each frame in two methods, Fixed Length Frames (Top) and
Variable Length Frames (Bottom)
4.2 Required Information for Successful Reconstruction
In contrast to the original signal, which can be seen as a sequence of 8-bit samples, the structure of the data in compressed format is complicated. For successful transmission, not only the residuals must be transmitted, but also the quantization level of the residuals, the reflection coefficients and their number, and, in the case of variable-length framing, the number of samples (sub-frames) in the frame. In the remainder of this sub-section we examine some issues in this area.
4.2.1 Transmission of LP Parameters
The reflection coefficients are among the most important parts of the data to be transmitted. In some special cases we can avoid transmitting them. These cases are:
(1) When the average power of the residuals in a frame is very small compared with the average power of the signal, we can consider that the frame represents silence. In this case, we can avoid transmitting both the residuals and the reflection coefficients.
(2) When the frame is highly uncorrelated, the signal in the frame is itself like white noise, so we cannot compress it further. In this case we can avoid transmitting the reflection coefficients.
(3) When the reflection coefficients of adjacent frames are very close to each other, we can concatenate the two frames and transmit the reflection coefficients of only one of them. This method is similar to the variable length framing approach. The only difference between these two approaches is the criterion used for the stationarity condition.
Given the reflection coefficients of frame m and those of frame m+1, we can use the following criterion to decide whether to concatenate the adjacent frames:
![]()
where the threshold is a predefined small constant.
4.2.2 Block of Data to Be Transmitted
For each frame we must produce a block of data with a specified format, which is known to both the transmitter and the receiver. The format of this block is as follows:
(1) Frame length. Recall that the variable length frames have a minimum length of 10 msec, a maximum length of 40 msec, and increments of 5 msec. This gives seven possible lengths, so 3 bits are enough to represent the frame length. In the case of fixed length framing this field can be removed.
(2) The number of reflection coefficients. In this project we use 3 bits to represent it.
(3) The reflection coefficients, each of which needs 5 bits.
(4) The resolution of the residuals. Because the maximum quantization resolution is 8 bits, 3 bits are sufficient to represent the resolution of the residuals.
(5) The minimum and maximum of the residuals in the frame, 8 bits each.
(6) The quantized prediction error.
Note that in this scheme different blocks have different numbers of bits. Apart from the above bits, which are parts of a block, it is necessary to transmit the number of frames.
Based on the above format, the receiver has all the information necessary for reconstructing the frame from each block. Using (1)-(5), which together are called the “header”, the decoder can compute the number of bits in the block. After receiving all the bits of the block, the receiver knows that a new block starts, and this procedure continues until the decoder reaches the end of the file.
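The size of one block follows directly from the field widths listed above. The short sketch below only adds them up; the function name and argument names are illustrative and not part of the project's software.

```python
def block_bits(order, resolution, n_samples, variable_length=True):
    """Number of bits in one transmitted block.

    Field widths are taken from the block format in the text:
      frame length       3 bits (variable-length framing only)
      coefficient count  3 bits
      coefficients       5 bits each
      residual resolution 3 bits
      minimum, maximum   8 bits each
      residuals          `resolution` bits per sample
    """
    header = (3 if variable_length else 0) + 3 + 5 * order + 3 + 2 * 8
    return header + resolution * n_samples
```

For example, a fixed length frame of 160 samples with 6 coefficients and 4-bit residuals needs a 52-bit header and 640 residual bits, so the header is a small fraction of the block.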
4.3 Optimal Compression Scheme
Applying the LP compression scheme to the given speech signal, we have obtained several reconstructed signals for different levels of quantization; as a consequence, the resulting compressed files differ in size. Not surprisingly, the signals reconstructed from more highly compressed files have lower quality. Tables 2 and 3 contain the related quantities for both fixed length and variable length frames. These quantities are:
(1) The size of the compressed file in bytes.
(2) The number of bits per sample.
(3) The compression ratio, which is defined as (original file size - compressed file size) / original file size * 100.
(4) The prediction error power ratio.
(5) The signal to noise ratio in dB.
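Quantities (2) and (3) can be computed as in the sketch below. Note one assumption: the tabulated values are reproduced with an original size of 20800 bytes (equivalently 20800 eight-bit samples) rather than the 20768 bytes quoted for the raw file. This may be because the signal is padded to 130 frames of 160 samples, but that is an inference from the numbers and is not stated in the report.

```python
def compression_stats(compressed_bytes, original_bytes=20800, n_samples=20800):
    """Return (compression ratio in percent, bits per sample).

    ASSUMPTION: the defaults of 20800 are inferred from the tabulated
    values; the report itself quotes 20768 bytes for the raw file.
    """
    ratio = (original_bytes - compressed_bytes) / original_bytes * 100.0
    bits_per_sample = compressed_bytes * 8.0 / n_samples
    return ratio, bits_per_sample
```

For the first row of Table 2, `compression_stats(11984)` gives approximately 42.3846 % and 4.6092 bits per sample, in agreement with the table.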
| Generated File Size (Bytes) | Number of Bits per Sample | Compression Ratio (%) | Prediction Error Power Ratio | Signal/Noise Ratio (dB) |
|---|---|---|---|---|
| 11984 | 4.6092 | 42.3846 | 0.0057 | 22.4382 |
| 10794 | 4.1515 | 48.1057 | 0.0179 | 17.4654 |
| 9719 | 3.738 | 53.274 | 0.0256 | 15.9036 |
| 9404 | 3.6169 | 54.7884 | 0.0261 | 15.8290 |
| 8984 | 3.4553 | 56.8076 | 0.0295 | 15.2900 |
| 8759 | 3.3688 | 57.8894 | 0.0306 | 15.1342 |
| 8649 | 3.3265 | 58.4182 | 0.0308 | 15.1005 |
| 8644 | 3.3246 | 58.4423 | 0.0651 | 11.8587 |
| 7574 | 2.913 | 63.5865 | 0.07 | 11.5435 |
| 6839 | 2.6303 | 67.1201 | 0.0926 | 10.3305 |
| 6614 | 2.5438 | 68.2019 | 0.0961 | 10.1708 |
| 6504 | 2.5015 | 68.7307 | 0.0969 | 10.1344 |
| 5539 | 2.1303 | 73.3701 | 0.8376 | 0.7693 |
| 5429 | 2.088 | 73.899 | 0.8383 | 0.7653 |
| 4804 | 1.8476 | 76.9038 | 0.9818 | 0.0792 |

Table 2 - Generated output files for the fixed length frame method (20 msec)
| Generated File Size (Bytes) | Number of Bits per Sample | Compression Ratio (%) | Prediction Error Power Ratio | Signal/Noise Ratio (dB) |
|---|---|---|---|---|
| 12091 | 4.6503 | 41.8701 | 0.0059 | 22.2333 |
| 10831 | 4.1657 | 47.9278 | 0.0195 | 17.0975 |
| 9711 | 3.7349 | 53.3124 | 0.0314 | 15.0252 |
| 9491 | 3.6503 | 54.3701 | 0.026 | 15.8499 |
| 8991 | 3.458 | 56.774 | 0.0357 | 14.4670 |
| 8791 | 3.3811 | 57.7355 | 0.0362 | 14.4012 |
| 8671 | 3.3349 | 58.3125 | 0.0367 | 14.3523 |
| 8591 | 3.3042 | 58.6971 | 0.148 | 8.2963 |
| 6891 | 2.6503 | 66.8701 | 0.2187 | 6.6013 |
| 6851 | 2.6349 | 67.0625 | 0.1711 | 7.6661 |
| 6511 | 2.5042 | 68.6971 | 0.1753 | 7.5601 |
| 5511 | 2.1196 | 73.5048 | 1.3488 | -1.3001 |
| 4791 | 1.8426 | 76.9663 | 1.4868 | -1.7231 |
| 4351 | 1.6734 | 79.0817 | 1.5027 | -1.7692 |
| 3571 | 1.3734 | 82.8317 | 0.335 | 4.7486 |

Table 3 - Generated output files for the variable length frame method
Based on the information in these tables, Fig. 30 and Fig. 31 are generated. These figures plot the signal to noise ratio versus the compression ratio and versus the number of bits per sample. These curves are important because they show how much the quality of the signal decreases as the data size is reduced.

Fig. 30 Signal to noise ratio versus
Compression Ratio

Fig. 31 Signal to Noise Ratio versus
Bit per Sample
4.4 Locally Generated White Noise and Compression Ratio
In this section we discuss the possibility of not transmitting the residuals. In this scheme, for each frame we transmit only the LP coefficients and the average residual power. At the receiver, a local white noise generator produces pseudo white noise with the received average power.
The results of this scheme are shown in Fig. 32. Clearly the reconstructed signal is far from the original signal and, as a consequence, the quality of the reconstructed signal is very poor. The interesting point is that, when listening to this signal, the entire sentence is understandable. Although the quality of this compression scheme is very low, the size of the compressed file is very small.

Fig. 32 Speech signal reconstructed
from locally generated white noise
Top: original signal,
Middle: locally generated white noise, Bottom: reconstructed signal
In the last section we compressed the speech signal using LP filters. The compression was performed with different levels of quantization and, as a consequence, different compression ratios.
In this section we discuss the quality obtained at each level of quantization. We use two measures of speech signal quality. The first is an objective measure, which is based on a mathematical definition of the difference between the original signal and the reconstructed signal.
The best judgment of the quality of a speech signal, however, is made by listening to it. Therefore, as a subjective measure, one can ask some listeners to judge its quality.
We discuss the quality of the reconstructed signal based on both objective and subjective measures. Finally, we decide on the best compression ratio that still gives acceptable quality.
5.1 Objective Measures of Quality
As a measure of the closeness of the original and reconstructed signals, one can use the ratio of the average signal power to the average error power, where the error is the difference between the original signal and the reconstructed signal. We use

as a quantitative measure of the quality of the reconstructed signal.
In Tables 2 and 3, this quantity is shown for different quantization levels, together with the compression ratio and the number of bits per sample.
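The objective measure can be computed as in the following sketch, which assumes the usual definition of signal to noise ratio in dB as ten times the base-10 logarithm of the power ratio. This definition is consistent with the tabulated values: an error power ratio of 0.0057 corresponds to about 22.4 dB. The function name is illustrative.

```python
import math


def snr_db(original, reconstructed):
    """Signal to noise ratio in dB between two equally long signals.

    SNR = 10 * log10( sum(x^2) / sum((x - y)^2) ); returns infinity
    when the reconstruction is exact.
    """
    signal_power = sum(x * x for x in original)
    error_power = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    if error_power == 0:
        return math.inf
    return 10.0 * math.log10(signal_power / error_power)
```

A negative value, as in the last rows of Table 3, means that the error power exceeds the signal power.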
5.2 Subjective Measure of Quality
In this sub-section we discuss the quality of the reconstructed signal from the viewpoint of listeners. Five listeners were asked to assign one of the scores 1, 2, 3 or 4 to each of the signals of Table 2, with higher scores indicating better quality. The authors were not part of this group, and most of the listeners know nothing about speech compression, so they can be considered unbiased. The tests were performed independently and the order of the signals was chosen randomly. In the first round the listeners were asked only to listen to the signals, and in the second round they were asked to assign a score to each signal.
The following table shows the results.
| Signal Number | L1 | L2 | L3 | L4 | L5 | Av | CR | B/S |
|---|---|---|---|---|---|---|---|---|
| 1 | 4 | 4 | 4 | 4 | 4 | 4.0 | 42.3846 | 4.6092 |
| 2 | 4 | 4 | 4 | 4 | 4 | 4.0 | 48.1057 | 4.1515 |
| 3 | 4 | 4 | 4 | 4 | 3 | 3.8 | 53.274 | 3.738 |
| 4 | 3 | 4 | 4 | 4 | 3 | 3.6 | 54.7884 | 3.6169 |
| 5 | 3 | 4 | 3 | 3 | 2 | 3.0 | 56.8076 | 3.4553 |
| 6 | 3 | 4 | 2 | 3 | 2 | 2.8 | 57.8894 | 3.3688 |
| 7 | 3 | 3 | 3 | 3 | 2 | 2.8 | 58.4182 | 3.3265 |
| 8 | 3 | 4 | 2 | 3 | 1 | 2.6 | 58.4423 | 3.3246 |
| 9 | 1 | 4 | 3 | 2 | 1 | 2.2 | 63.5865 | 2.913 |
| 10 | 1 | 4 | 2 | 3 | 1 | 2.2 | 67.1201 | 2.6303 |
| 11 | 1 | 1 | 2 | 1 | 1 | 1.2 | 68.2019 | 2.5438 |
| 12 | 1 | 1 | 1 | 2 | 1 | 1.2 | 68.7307 | 2.5015 |
| 13 | 1 | 1 | 1 | 1 | 1 | 1.0 | 73.3701 | 2.1303 |
| 14 | 1 | 1 | 1 | 1 | 1 | 1.0 | 73.899 | 2.088 |
| 15 | 1 | 1 | 1 | 1 | 1 | 1.0 | 76.9038 | 1.8476 |

Table 4 - The opinion of listeners about the quality of the signals (L1-L5: listeners, Av: average score, CR: compression ratio in %, B/S: bits per sample)
In a real communication system, the transmitted data is subjected to distortion and noise. In this section we will examine the effect of an imperfect channel on the quality of reconstructed speech.
The block diagram of an imperfect channel is shown in Fig. 33.

Fig. 33 Block diagram of an
imperfect channel
For the purpose of this project, the channel is modeled by an FIR transfer function
![]()
and the variance of the white noise is chosen such that the signal to noise ratio is 20 dB.
To simulate the effect of this channel on the reconstruction quality, we pass the residual through the channel and leave the header of the file unaffected.
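A channel of this kind can be simulated along the following lines. The FIR coefficients `h` are placeholders, because the project's transfer function is not recoverable from the text. Measuring the 20 dB signal to noise ratio against the power of the filtered residual, and using Gaussian noise, are further assumptions.

```python
import math
import random


def channel(residual, h, snr_db=20.0, seed=0):
    """Pass `residual` through an FIR filter and add white Gaussian noise.

    h      - FIR impulse response (HYPOTHETICAL values, supplied by caller)
    snr_db - ratio of filtered-signal power to noise power in dB (ASSUMED
             to be defined at the channel output)
    seed   - seed of the pseudo-random generator, for repeatable runs
    """
    rng = random.Random(seed)
    # FIR filtering with zero initial state.
    filtered = [sum(h[k] * residual[n - k] for k in range(len(h)) if n - k >= 0)
                for n in range(len(residual))]
    power = sum(v * v for v in filtered) / len(filtered)
    sigma = math.sqrt(power / 10.0 ** (snr_db / 10.0))
    return [v + rng.gauss(0.0, sigma) for v in filtered]
```

Only the residual is passed through `channel`; the header fields are assumed to arrive unchanged, as stated above.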
The next three figures show the original signal, the residual signal, the residual signal after transmission through the noisy channel, and the signal reconstructed from the noisy received signal.
In Fig. 34 the prediction error is produced by filtering the original signal with high order LP filters, so the prediction error is very close to white noise and the compression ratio is high. We can see from the figure that the effect of the channel on the reconstructed signal is catastrophic.

Fig34 – The effect of noisy,
imperfect channel on a highly compressed signal
First row: Original Signal
Second row: Prediction Error
before transmission
Third row: Prediction Error
After transmission
Fourth row: Reconstructed Signal
Fig. 35 shows the simulated effect of a noisy channel on a moderately compressed signal. It is clear from the plots that the effect of the channel noise is not as catastrophic as in the previous figure.

Fig35 – The effect of noisy, imperfect channel on an average compressed signal
First row: Original Signal
Second row: Prediction Error
before transmission
Third row: Prediction Error
After transmission
Fourth row: Reconstructed Signal
Decreasing the order of the LP filters and, as a consequence, increasing the size of the data file, we obtain Fig. 36. According to this figure, the quality of the reconstructed signal is acceptable. Listening to the reconstructed signal confirms this claim.

Fig36 – The effect of noisy, imperfect channel on a low compressed signal
First row: Original Signal
Second row: Prediction Error
before transmission
Third row: Prediction Error
After transmission
Fourth row: Reconstructed Signal
The phenomenon we have
observed from Fig 34 –36 is a general rule that is justified by
7 Conclusion
In this project we developed and implemented a linear prediction based method for the compression of speech signals. Two different methods for specifying the stationary frames were presented and compared. At first it was expected that the variable length method would compress the signal better, but the final results show that, at least for this specific speech file, variable length framing has no appreciable advantage over fixed length framing.
Based on the design of LP filters, some important properties of these filters were shown. These properties include the whitening property, the reduction of the dynamic range of the prediction error, and the stability of the inverse filters.
During the project we observed that a noisy, imperfect channel has a highly destructive effect on highly compressed signals. The only way to avoid the destruction of the signal by an imperfect channel is to increase the redundancy of the signal. In our case, which concerns compression of the signal, an acceptable approach is to use a smaller compression ratio so that the signal can be transmitted safely through the imperfect channel.
8 References
[1] J. R. Deller, J. G. Proakis, Discrete Time Processing of Speech Signals, IEEE, 1999.
[2] T. F. Quatieri, Discrete Time Speech Signal Processing, Pearson Education, 2001.
[3] S. Haykin, Adaptive Filter Theory, Prentice-Hall, 1991.
[4] M. H. Hayes, Statistical Digital Signal Processing and Modeling, John Wiley & Sons, Inc., 1996.