*Information is the resolution of uncertainty.* Claude Shannon

*Talk clearly and not so far-fetched as a cipher.* Medieval saying, origin unknown

*In many languages the word ‘cipher’ means ‘number’. However, the current meaning of ‘cipher’ in English – an ‘algorithm for performing encryption or decryption’ – can be traced back to the time when it meant ‘zero’ and the concept of ‘zero’ was new and confusing.*

At Bletchley Park in Buckinghamshire, UK, one of the world’s first computing machines (the so-called ‘Bombe’) was constructed for the single purpose of cracking military codes generated by the German Enigma encryption machine (Figures 1 and 2). Combining three rotors from a set of five, each rotor having 26 possible positions, and a plug board with ten pairs of letters connected, the Enigma machine had 158,962,555,217,826,360,000 (nearly 159 quintillion) different settings. Even a very fast modern computer could not systematically go through all those settings in any useful time. The British had access to an Enigma machine but did not know the settings, which varied daily. A team of highly talented people working with the brilliant mathematician Alan Turing managed to crack the German codes in an ingenious way that involved realizing that the Germans would always start their encrypted radio transmissions with a weather report. This observation, combined with groundbreaking advances in statistics, allowed Turing and his team to limit the possibilities and guide the ‘Bombe’. The story of Alan Turing is as fascinating as it is tragic; only recently has the true nature of his genius and his impact on the 20th century been recognized. Breaking the Enigma code is believed to have shortened WWII by two to four years and saved millions of lives.
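
The figure of nearly 159 quintillion settings follows from simple combinatorics and is easy to verify; a short Python check:

```python
# Verifying the number of Enigma settings quoted above:
# 3 rotors chosen (in order) from 5, 26 starting positions per rotor,
# and a plug board connecting 10 unordered pairs out of 26 letters.
from math import factorial

rotor_orders = 5 * 4 * 3                  # ordered choice of 3 rotors from 5
rotor_positions = 26 ** 3                 # each rotor can start in 26 positions
# Ways to wire 10 unordered letter pairs on the plug board:
plug_board = factorial(26) // (factorial(6) * factorial(10) * 2 ** 10)

total = rotor_orders * rotor_positions * plug_board
print(total)  # 158962555217826360000
```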

The goals for encrypting or encoding messages and signals have been, and always will be, as varied as they are interesting: from communicating securely with the front line in wartime, to increasing the communication capacity of cellular phone networks. A ‘bit’ of information represents the resolution of uncertainty between two distinct states/symbols; $N$ bits of information resolve or distinguish between $2^N$ different states or symbols.

In 1948, Claude Shannon calculated that the maximum amount of information that can be reliably communicated between a single source and receiver is given by the formula:

$$C = \log_2\left(1 + \frac{S}{N}\right)$$

where $S$ is the received signal power, $N$ is the noise power, and information is measured in bits per second per Hertz of bandwidth available for transmission (Shannon, 1948; Simon, 2001). Note that Shannon’s equation is a statement about the *channel* through/over which the information is communicated. It does not say anything about the source that is producing the signals or messages. The average information produced by a source is called the entropy of the source and is also measured in bits. For a source producing $n$ discrete symbols with probabilities $p_n$, the entropy is computed as:

$$H = -\sum_{n} p_n \log_2 p_n$$

The concept of a measure of the average information content produced by a source is extremely important and parallels concepts such as structure or sparseness, which we will briefly get back to in the next instalment. We will now look at what limited hopes we have when encoding signals if we ignore or do not know the statistics of the signals produced by the source.
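
Both quantities are straightforward to compute. A minimal Python sketch (the function names are ours, for illustration):

```python
import math

def channel_capacity(snr):
    """Shannon capacity in bits/s/Hz for a given linear signal-to-noise ratio S/N."""
    return math.log2(1.0 + snr)

def source_entropy(probs):
    """Entropy in bits per symbol of a discrete source with symbol probabilities p_n."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

# A channel with S/N = 15 (about 11.8 dB) supports log2(16) = 4 bits/s/Hz.
print(channel_capacity(15.0))      # 4.0
# A fair coin (two equally likely symbols) produces exactly 1 bit per symbol.
print(source_entropy([0.5, 0.5]))  # 1.0
```

Note that a source with skewed symbol probabilities has lower entropy than a uniform one, which is exactly the structure that good encoding can exploit.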

### Encoding, Decoding and the Welch Bound

Often one does not have access to the source signals per se (i.e., one cannot encode them); rather, it is the structure in the source signals, together with a multiplicity of measurements, that enables the decoding or separation. In many other cases, however, one does have access to the source signals, and the question arises whether one can do better when encoding them, and whether one can do away with the requirement of multiple measurements. One strategy is to encode each source signal with a unique time-series, each of which has low autocorrelation values (apart from the central peak) as well as low mutual cross-correlation with the other sequences. The individual source wavefields can then be obtained from the encoded simultaneous-source signal simply by cross-correlating the recorded data with the respective encoding sequence. In 1974, L. R. Welch derived a lower bound on the maximum cross-correlation of signals. This bound shows that, as one might expect, the maximum cross-correlation and off-peak autocorrelation values of a set of sequences cannot be made arbitrarily low.
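
This correlation-based encode/decode strategy can be sketched numerically. The code lengths and firing times below are hypothetical choices for illustration, with two sources encoded by random ±1 sequences:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 4096                                 # code length; longer codes -> lower crosstalk
code_a = rng.choice([-1.0, 1.0], N)      # random +/-1 encoding sequence for source A
code_b = rng.choice([-1.0, 1.0], N)      # ... and for source B

# Each source fires a single unit spike at a different (to-be-recovered) delay.
delay_a, delay_b = 100, 250
record_len = N + 512
blended = np.zeros(record_len)
blended[delay_a:delay_a + N] += code_a   # source A's encoded wavefield
blended[delay_b:delay_b + N] += code_b   # source B's, recorded simultaneously

# Decode by cross-correlating the blended record with each encoding sequence:
# the correlation peak recovers each source's firing time.
decoded_a = np.correlate(blended, code_a, mode="valid") / N
decoded_b = np.correlate(blended, code_b, mode="valid") / N
print(int(np.argmax(decoded_a)), int(np.argmax(decoded_b)))  # 100 250
```

The residual crosstalk away from the peaks scales like $1/\sqrt{N}$, which is exactly the kind of limit the Welch bound below makes precise.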

Consider a set of $M$ general, complex-valued sequences $\{a_n\}, \{b_n\}, \ldots, \{m_n\}$ of length $N$. The discrete aperiodic (nonperiodic) correlation function between a pair of sequences is defined as:

$$C_{a,b}(\tau) = \sum_{n=0}^{N-1-\tau} a_n \, b^{*}_{n+\tau}, \qquad 0 \le \tau \le N-1,$$

with the analogous expression for negative lags. When $a = b$, this equation also defines the discrete aperiodic autocorrelation function. Let $C_{am}$ denote the maximum out-of-phase (i.e., off-peak) autocorrelation value, $C_{cm}$ denote the maximum cross-correlation value, and $C_{max} = \max(C_{am}, C_{cm})$. Then the Welch bound is derived as follows (Welch, 1974):

$$C_{max} \ge N \sqrt{\frac{M-1}{M(2N-1)-1}}$$

The Welch bound is a very useful result since it tells us, given the number of sequences in the set ($M$) and the length of the sequences ($N$), how low the maximum cross-correlation and off-peak autocorrelation values can possibly be. Sequence sets which achieve the Welch lower bound exactly are known as *Welch bound equality sets*, the best known of which is the *Kasami sequence set*.
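
A quick numerical check, assuming the aperiodic form of the bound, $C_{max} \ge N\sqrt{(M-1)/(M(2N-1)-1)}$, confirms that randomly chosen binary sequences sit well above it:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 8, 127                            # 8 random binary sequences of length 127
seqs = rng.choice([-1.0, 1.0], size=(M, N))

def aperiodic_corr(x, y):
    """Aperiodic correlation of two equal-length sequences at all lags."""
    return np.correlate(x, y, mode="full")

# Maximum off-peak autocorrelation over the set (zero lag sits at index N-1).
c_am = max(np.abs(np.delete(aperiodic_corr(s, s), N - 1)).max() for s in seqs)
# Maximum cross-correlation over all distinct pairs.
c_cm = max(np.abs(aperiodic_corr(seqs[i], seqs[j])).max()
           for i in range(M) for j in range(i + 1, M))
c_max = max(c_am, c_cm)

# Aperiodic Welch bound: no set of M length-N sequences can do better than this.
welch = N * np.sqrt((M - 1) / (M * (2 * N - 1) - 1))
print(c_max >= welch)  # True: random codes stay well above the bound
```

Carefully designed sets such as the Kasami sequences close much of the gap between random codes and the bound.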

### Nature’s Way of Encoding

It turns out that random noise can be an extremely effective natural encoder. While earthquakes are recorded only intermittently, a seismic background noise wavefield is present and being continuously recorded. Usually, the exact origins of this wavefield are unknown. One source can be attributed to atmospheric pressure variations, which induce water waves that convert into low-frequency microseisms (faint earth tremors caused by natural phenomena) when crashing onto the shore. Despite the relatively low amplitude level, the continuous and random nature of such a background wavefield makes it possible to correlate it between receiver stations.

Although the existence of these noise bands has been known for more than half a century, it was only in the early 2000s that seismologists realized that, by cross-correlating such noise wavefields at different receiver stations, a wealth of information about the structure in between the correlated receivers can be extracted in the form of the inter-receiver Green’s function. This process of cross-correlating (noise) data recorded at different receivers is now known as interferometry and can also be thought of as turning one of the receivers into a virtual source (Figure 3).
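
The principle can be illustrated with a toy one-dimensional sketch (the traveltime and record length are hypothetical): cross-correlating two receivers’ recordings of the same random noise, one a delayed copy of the other, peaks at the inter-receiver traveltime.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 50_000                            # length of the continuous noise recording
noise = rng.standard_normal(n)
travel_lag = 40                       # inter-receiver traveltime, in samples

rec_a = noise.copy()                  # receiver A records the passing noise
rec_b = np.zeros(n)
rec_b[travel_lag:] = noise[:n - travel_lag]   # receiver B sees it delayed

# Cross-correlate the two noise records (via zero-padded FFT for speed).
fa = np.fft.rfft(rec_a, 2 * n)
fb = np.fft.rfft(rec_b, 2 * n)
xcorr = np.fft.irfft(np.conj(fa) * fb)

# The correlation peaks at the inter-receiver traveltime: receiver A has
# effectively become a virtual source recorded at receiver B.
print(int(np.argmax(xcorr[:200])))    # 40
```

The longer the noise is recorded, the more the random crosstalk averages out relative to the coherent peak.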

As noise data from any pair of receivers can be cross-correlated, the number of virtual sources and receivers that can be created using this method is proportional to the square of the number of receivers. It turns out that such inter-receiver Green’s functions constitute ideal datasets for high-resolution surface wave tomography in regions where there is ample background noise and many receivers but few earthquakes. Thus, nature had conveniently encoded the information for us, but it took us some time to understand how to decode it! A similar strategy for interferometric modeling of Green’s functions between a large number of points in the interior of a model has been proposed by van Manen *et al.* (2005). In that case, however, the data do not come for free: the Welch bound predicts that the quality of the data after separation of the simultaneous simulation is proportional to the square root of the simulation time (Figure 4).

### Multiple Scattering: Friend or Foe?

Multiple scattering adds significant complexity to the Green’s functions (impulse responses). Therefore, if one considers every impulse response to be a realization from one or more stochastic information sources, one could say that the multiple scattering significantly increases the entropy of the information sources. Another way to say this is that the number of degrees of freedom in the source signals increases dramatically if we know that the signals do not only consist of, e.g., primary reflections. Fortunately, in the interferometric applications, we do not have to worry about this additional complexity when reconstructing the virtual source responses in ambient noise surface wave interferometry or when encoding Green’s functions between points in the interior, since interferometry intrinsically reconstructs the full Green’s function.

It is also interesting to consider the role of multiple scattering when it is present in the medium through which one wants to communicate, i.e. when the multiple scattering is part of the communication channel and not of the source signals. Until the late 1990s, the reigning paradigm was that multiple scattering hinders communication and lowers the maximum rate of communication, much as the noise term in Shannon’s equation lowers the information capacity. It turns out that the opposite is true. A first clue came in the classic paper by Derode and co-workers from 1995, who demonstrated that it is possible to time-reverse a high-order multiply-scattered wavefield through the actual scattering medium, to achieve super-resolution focusing at the original source location. The multiple scattering effectively enlarged the aperture of the linear array of transducers such that focusing well below the free-space diffraction limit was achieved. Around the same time, a researcher at Bell Labs named Gerry Foschini realized that the multitude of paths in a scattering medium actually helps to increase the rate at which information can be transferred. Conceptually, this can be thought of as sending different messages over the different multiple scattering paths (Figure 5) (Simon *et al*., 2001). What is more, the encoding and decoding algorithm that Foschini proposed, which realizes these higher rates of communication, does not require knowledge of the details of the scattering environment. Keeping the total transmitted power constant, but using $M_T$ transmitters and $M_R$ receivers, it turns out that the channel capacity can be roughly $M_T$ times as large.
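
Foschini’s scaling can be sketched numerically. Below, rich multiple scattering is modelled, as is standard in the MIMO literature, by an i.i.d. complex Gaussian channel matrix $H$, and the capacity is $\log_2\det(I + (\rho/M_T)\,H H^{\dagger})$ with the total power $\rho$ held fixed; the antenna counts and SNR are illustrative choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(3)

def mimo_capacity(m_t, m_r, snr, trials=200):
    """Average capacity (bits/s/Hz) over random scattering realizations,
    with a fixed total transmit power split across m_t transmitters."""
    caps = []
    for _ in range(trials):
        # Rich multiple scattering modelled as an i.i.d. complex Gaussian matrix.
        h = (rng.standard_normal((m_r, m_t))
             + 1j * rng.standard_normal((m_r, m_t))) / np.sqrt(2)
        gram = np.eye(m_r) + (snr / m_t) * (h @ h.conj().T)
        sign, logdet = np.linalg.slogdet(gram)   # numerically stable log-det
        caps.append(logdet / np.log(2))
    return float(np.mean(caps))

# With the same total power, average capacity grows roughly linearly
# with the number of antennas: the 4x4 channel carries roughly 4x more.
print(round(mimo_capacity(1, 1, snr=100.0), 1))
print(round(mimo_capacity(4, 4, snr=100.0), 1))
```

The extra capacity comes from the independent spatial paths the scattering provides, not from any increase in transmitted power.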

These developments come full-circle in a more recent contribution by Derode *et al*. (2003), which shows how to exploit multiple scattering when communicating with time-reversal antennas. Taking advantage of the super-resolution enabled by acoustic time-reversal, they can transmit random bit series to receivers that are only a few wavelengths apart. In contrast, in a homogeneous medium, the communication breaks down completely as individual bit series are no longer resolved (Figure 6). The transfer rate is directly proportional to the number of eigenvalues of the time-reversal operator at a given frequency and increases with increasing multiple scattering.

Thus, to answer the question posed above: if you are trying to recover the multiply-scattered wavefield between two arbitrary receivers, and you are measuring in the presence of a strong and somewhat uniform ambient noise field, chances are that nature has already encoded the desired wavefield for you and all you will have to do is measure the ambient noise field for long enough at the receivers, and decode it using interferometry (cross-correlation), implicitly making use of Welch’s bound. Similarly, if you want to communicate through a high order multiple scattering medium, the multiple scattering can actually help you achieve higher transfer rates. Thus, in both these cases, multiple scattering is more a friend than a foe.

In the second part of this article, we will consider the case where we are not interested in an interferometric construction of seismic data but rather want to decode the original sources, including their multiply-scattered waves, directly as we consider the marine simultaneous source separation problem. As we will see, geophysicists have a few more tricks they can bring to bear on the seismic encoding and decoding problem.