“An advantage to concealing speech and not text is preservation of non-lexical content such as speaker identity.”
The word steganography derives from the Greek “steganos”, meaning covered or concealed. It is the science of concealing messages inside other messages, which are referred to as the carrier. The term dates back to the 15th century, although the practice of physically hiding messages is far older. In modern steganography, the goal is to covertly communicate a digital message. Hiding has typically relied on digital signal processing techniques such as least significant bit encoding.
The carrier may be publicly visible. For added security, the hidden message can also be encrypted, thereby increasing the perceived randomness and decreasing the likelihood of content discovery even if the existence of the message is detected.
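The encrypt-then-hide idea can be illustrated with a toy stream cipher. This is a minimal sketch, assuming a keystream built from SHA-256 in counter mode; the names `keystream`, `xor_encrypt` and `xor_decrypt` are hypothetical, and the scheme is not a vetted cipher. A real system would use an authenticated cipher such as AES-GCM. The point is only that the bytes handed to the hiding step look random, and that the same key recovers the plaintext.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || nonce || counter) blocks, concatenated."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:length])

def xor_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """XOR the message with the keystream before it is hidden in the carrier."""
    ks = keystream(key, nonce, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))

# XOR with the same keystream is its own inverse, so decryption is identical.
xor_decrypt = xor_encrypt
```

Because the ciphertext is statistically close to random noise, an observer who does detect the hidden payload still cannot read it without the key.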
The most common approach to hiding is to encode the secret message in the least significant bits of individual signal samples. Alternatively, the message can be concealed in the phase of the carrier's frequency components, or in the parameters of a minuscule echo introduced into the carrier signal.
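Least significant bit encoding is simple enough to show directly. The sketch below assumes the carrier is 16-bit PCM audio held in a NumPy `int16` array; `lsb_embed` and `lsb_extract` are hypothetical helper names, not part of the paper. Each message bit replaces the lowest bit of one sample, so no sample moves by more than one quantisation step out of 65,536.

```python
import numpy as np

def lsb_embed(carrier: np.ndarray, message: bytes) -> np.ndarray:
    """Hide `message` in the least significant bits of int16 PCM samples."""
    # Unpack each byte into 8 bits, most significant bit first.
    bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))
    if bits.size > carrier.size:
        raise ValueError("carrier has too few samples for this message")
    stego = carrier.astype(np.int16, copy=True)
    n = bits.size
    # Clear each sample's LSB (& ~1), then write one message bit into it (| bit).
    stego[:n] = (stego[:n] & ~1) | bits
    return stego

def lsb_extract(stego: np.ndarray, n_bytes: int) -> bytes:
    """Read back `n_bytes` bytes from the LSBs of the first n_bytes * 8 samples."""
    bits = (stego[: n_bytes * 8] & 1).astype(np.uint8)
    return np.packbits(bits).tobytes()
```

For example, embedding `b"hello"` into one second of 16 kHz audio touches only the first 40 samples, and `lsb_extract(stego, 5)` returns the original bytes. The receiver must know the message length; real schemes usually embed a length header first.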
Neural networks have recently become popular for such applications. They have been used to hide an entire image within another image, and other work has suggested adding an adversarial loss term to the objective, using generative adversarial learning to generate steganographic images. However, these approaches do not explore speech data. In a paper titled “Hide and Speak”, researchers at Facebook and Carnegie Mellon University explored the use of deep neural networks as steganographic functions for speech data.
DNNs For Speech Steganography
Steganography techniques can be used in many fields, such as copyright protection, watermarking and secret transmission. The common procedure is to use a steganography algorithm to hide the secret message in a cover so that external detectors cannot tell it has been altered. The main challenge is to minimise the distortion of the cover image when the secret is embedded, while still allowing the secret message to be recovered. The steganographic image is then transmitted over public channels. The receiver, in turn, uses the decoding algorithm and the shared key to extract the secret message.
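The hide, transmit and extract procedure with a shared key can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the cover is any 1-D integer array (audio samples or flattened image pixels), the shared key is simply an integer seed for NumPy's PRNG, and `key_positions`, `hide` and `reveal` are hypothetical names. The key decides which samples carry message bits, so a receiver without it does not know where to look.

```python
import numpy as np

def key_positions(key: int, n_samples: int, n_bits: int) -> np.ndarray:
    # The shared key seeds a PRNG, so sender and receiver derive the same
    # pseudo-random set of distinct sample indices.
    rng = np.random.default_rng(key)
    return rng.choice(n_samples, size=n_bits, replace=False)

def hide(cover: np.ndarray, secret: bytes, key: int) -> np.ndarray:
    """Sender: embed `secret` in the LSBs of key-selected samples (integer cover)."""
    bits = np.unpackbits(np.frombuffer(secret, dtype=np.uint8))
    idx = key_positions(key, cover.size, bits.size)
    stego = cover.copy()
    stego[idx] = (stego[idx] & ~1) | bits  # overwrite the LSB at each chosen index
    return stego

def reveal(stego: np.ndarray, n_bytes: int, key: int) -> bytes:
    """Receiver: regenerate the same indices from the key and read the LSBs."""
    idx = key_positions(key, stego.size, n_bytes * 8)
    return np.packbits((stego[idx] & 1).astype(np.uint8)).tobytes()
```

A seeded PRNG is not a cryptographic key schedule; it only shows how a shared secret lets the decoder locate the payload while the cover changes by at most one unit per touched sample.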
But do techniques for image steganography carry over to speech? The researchers at FAIR say no, which is why their work develops deep neural networks as steganographic functions specifically for speech data. Concealing speech instead of text enables the preservation of non-lexical content such as speaker identity. To evaluate this, the researchers conducted both human and automatic evaluations following a speaker verification protocol. For the human evaluation, 400 answers were collected, and according to the researchers, in 82% of cases listeners correctly determined whether the speaker in the fourth sample matched the speaker in the first three.
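To put the 82% figure in perspective, the arithmetic below converts it to a count and attaches a rough uncertainty. It assumes, as a simplification, that the reported 82% is exact, that the 400 answers are independent yes/no trials (answers from the same listener may in fact be correlated), and that chance level for a same/different judgement is 50%. The helper name `summarise` is hypothetical.

```python
import math

def summarise(n_answers: int, accuracy: float, z: float = 1.96):
    """Number of correct answers plus a normal-approximation 95% interval."""
    correct = round(n_answers * accuracy)
    # Binomial standard error of a proportion: sqrt(p * (1 - p) / n).
    se = math.sqrt(accuracy * (1 - accuracy) / n_answers)
    return correct, (accuracy - z * se, accuracy + z * se)

correct, (low, high) = summarise(400, 0.82)
print(correct, round(low, 3), round(high, 3))
```

Under these assumptions, 82% of 400 is 328 correct answers, and the interval spans roughly 0.78 to 0.86, well above the 0.5 expected from guessing.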
The architecture consists of (i) an encoder network, (ii) a carrier decoder network and (iii) a message decoder network, each implemented as a gated convolutional neural network. The researchers evaluated the approach on the TIMIT and YOHO datasets, using the standard train/validation/test splits, to assess the model under various recording conditions.
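The building block shared by all three networks is a gated convolution. The NumPy sketch below shows one common formulation, in the style of gated linear units, where a feature convolution is multiplied element-wise by the sigmoid of a second "gate" convolution. This exact form, the 1-D layout and the names `conv1d` and `gated_conv1d` are assumptions for illustration; the paper's layer sizes, padding and input representation may differ.

```python
import numpy as np

def conv1d(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """'Same'-padded 1-D convolution (cross-correlation, as in deep learning).
    x: (in_ch, T), w: (out_ch, in_ch, k), b: (out_ch,) -> (out_ch, T)."""
    _, _, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, k - 1 - pad)))  # zero-pad the time axis
    T = x.shape[1]
    # Sliding windows over time, shape (in_ch, T, k).
    windows = np.stack([xp[:, t:t + k] for t in range(T)], axis=1)
    return np.einsum("oik,itk->ot", w, windows) + b[:, None]

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def gated_conv1d(x, w_f, b_f, w_g, b_g):
    """Feature path scaled by a learned gate in (0, 1) at every position."""
    return conv1d(x, w_f, b_f) * sigmoid(conv1d(x, w_g, b_g))
```

The gate lets each layer decide, position by position, how much of its feature signal to pass on, which is one reason gated convolutions are popular for audio models.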
Key Highlights
- This work empirically demonstrated the effectiveness of the proposed method, compared with deep-learning-based baselines, on several speech datasets and analysed the results quantitatively and qualitatively.
- The authors showed that the proposed approach could be applied to conceal multiple messages in a single carrier using multiple decoders or a single conditional decoder.
Steganography, like cryptography, provides a means of secret communication. While cryptography protects the content of a message, such as its confidentiality, authenticity and integrity, steganography hides the very existence of the secret. The authors wrote that in large-scale surveillance programmes, even if the content is unknown, the mere existence of data communications may lead to privacy leakage. Steganography is therefore important for private communication, and deep learning algorithms have the potential to make it easier to achieve.