
Inaudible Information

by Michel Comin Escude, Roger Pujol
August 09, 2016

What if your radio could talk to your phone? Or your TV talk to your tablet?

Most devices have Bluetooth, WiFi or NFC and can exchange information over any of these new channels. But what about older channels? Is there a way our new devices can communicate with them?

Take the TV, for instance. When we watch TV, we have two channels of communication: the images and the sound. Typically information is communicated to viewers via the images, e.g., “go to www.______.com for more information.” In a world of automatic synchronization, manually typing information into a device seems antiquated. We wondered if there was a way to automatically synchronize devices using these older types of signals.

The Idea

Instead of using the video signal, we tapped into the audio signal to send data to a device. After all, sound waves can behave much like the electromagnetic waves used in Bluetooth and NFC, and can serve some of the same applications. Additionally, with sound waves we don’t need to point at the source to receive the data, and we can broadcast the information to any number of clients at the same time.

Of course, there was a catch. If we wanted to add information to the sound waves, we could not do so in the audible range, because that alters the original audio, which defeats the purpose. The problem suggested its own solution: find a way to add extra data to an audio file without affecting the original audio.

Much like IR or UV light, we humans cannot detect the very high and very low frequencies of the sound spectrum. Common mics and speakers, however, can receive and emit in that range. It is possible to add frequencies in the inaudible zone that are undetectable to humans but extractable by any gadget with a microphone.
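As a concrete sketch of the sending side, here is how one might synthesize a near-inaudible tone that can be mixed into an ordinary audio track. The frequency, amplitude, and sample rate are illustrative choices, not the exact parameters of our demo:

```javascript
// Synthesize a 19 kHz sine tone at a 44.1 kHz sample rate. 44.1 kHz can
// represent frequencies up to 22.05 kHz (the Nyquist limit), so a 19 kHz
// tone survives ordinary audio pipelines while sitting in a band most
// listeners cannot perceive.
const SAMPLE_RATE = 44100; // samples per second (CD quality)
const TONE_FREQ = 19000;   // Hz, inside the near-inaudible band
const AMPLITUDE = 0.05;    // keep the tone quiet relative to the music

function generateTone(freq, durationSec, sampleRate = SAMPLE_RATE) {
  const n = Math.floor(durationSec * sampleRate);
  const samples = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    samples[i] = AMPLITUDE * Math.sin((2 * Math.PI * freq * i) / sampleRate);
  }
  return samples;
}

// One second of tone, ready to be mixed sample-by-sample into a track.
const tone = generateTone(TONE_FREQ, 1.0);
```

In a browser you would feed samples like these through the Web Audio API instead of building the buffer by hand, but the underlying signal is the same.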

What We Did

We tried this out for ourselves by synchronizing some visuals with an audio track playing on a separate device. A little googling turned up some information about the process, and at the time there was also an app called CHIRP.IO that used sound to send data peer to peer, and it worked quite nicely.

Our prototype takes the simple approach: it uses the FFT (Fast Fourier Transform) and the amplitudes of different high frequencies to prove the concept. We chose this method because it detects the information faster, and therefore synchronizes faster.
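To illustrate the detection side, here is a self-contained sketch of single-frequency amplitude detection using the Goertzel algorithm, a one-bin version of the DFT. Our prototype used a full FFT via a library, so treat this as a stand-in for the idea rather than our exact code:

```javascript
// Goertzel algorithm: measure the energy at one target frequency in a
// block of samples. A full FFT gives every bin at once; when only a few
// known signalling frequencies matter, per-bin detection like this is
// enough to decide whether a tone is present.
function goertzelPower(samples, targetFreq, sampleRate) {
  const k = Math.round((samples.length * targetFreq) / sampleRate);
  const omega = (2 * Math.PI * k) / samples.length;
  const coeff = 2 * Math.cos(omega);
  let s1 = 0, s2 = 0;
  for (const x of samples) {
    const s0 = x + coeff * s1 - s2;
    s2 = s1;
    s1 = s0;
  }
  return s1 * s1 + s2 * s2 - coeff * s1 * s2; // squared magnitude of bin k
}

// Demo: an 18 kHz test tone should put far more energy in the 18 kHz bin
// than in a bin where nothing is playing.
const RATE = 44100, N = 4096;
const testTone = Array.from({ length: N }, (_, i) =>
  Math.sin((2 * Math.PI * 18000 * i) / RATE));
const hit  = goertzelPower(testTone, 18000, RATE);
const miss = goertzelPower(testTone, 17000, RATE);
console.log(hit > 100 * miss); // → true
```

Comparing the power at each candidate frequency against a threshold (or against the other bins) is all the "decoding" a simple trigger needs.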


How We Did It

At first we considered using ultrasound (frequencies above 20 kHz), but in the end we used the highest part of the spectrum humans can hear (roughly 17 kHz to 20 kHz). There are a couple of reasons we made that call.

First, while humans hear a range of frequencies, the closer a sound gets to 20 kHz, the less we perceive it. Near 20 kHz it’s easy to hide a sound from most people.

Second, most everyday ambient sounds do not reach that range, so near 20 kHz there is less noise interference from the environment.

Finally, even though some equipment can hear and emit ultrasonic frequencies, it is expensive and uncommon. In addition, the range of frequencies used to send data might need to shift a little depending on the hardware. Common retail speakers can easily cover 20 Hz to 20 kHz or even higher (for instance, the BEOPLAY A1 used in the demo reaches 24 kHz), but when it comes to microphones, the game changes. Professional microphones can detect frequencies up to 20 kHz, but retail microphones are usually designed to capture the human speech spectrum (20 Hz to 5 kHz).

So, knowing this, and with the hardware we had in the office, we encoded data in the 17 kHz to 20 kHz range. (Some people in the office said they heard a weird chirp coming out of the song we used for demo purposes; we are evaluating them to determine whether they have superpowers.) Using the SonicNet.JS library for detection, we had a prototype working quite quickly.
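The encoding itself can be as simple as assigning each symbol of a small alphabet its own carrier frequency inside the 17–20 kHz band, which is the basic idea behind libraries like SonicNet.JS. A hypothetical mapping (the alphabet size and spacing are our illustrative choices, not SonicNet.JS’s actual parameters):

```javascript
// One tone per symbol: spread a 16-character alphabet evenly across the
// 17-20 kHz band, 4 bits per tone. The decoder rounds a detected peak
// frequency back to the nearest symbol, which gives some tolerance to
// hardware that shifts frequencies slightly.
const BAND_LOW = 17000;   // Hz
const BAND_HIGH = 20000;  // Hz
const ALPHABET = '0123456789abcdef'; // 16 symbols -> 4 bits per tone
const STEP = (BAND_HIGH - BAND_LOW) / (ALPHABET.length - 1); // 200 Hz apart

function symbolToFreq(ch) {
  const i = ALPHABET.indexOf(ch);
  if (i < 0) throw new Error(`unknown symbol: ${ch}`);
  return BAND_LOW + i * STEP;
}

function freqToSymbol(freq) {
  const i = Math.round((freq - BAND_LOW) / STEP);
  return ALPHABET[Math.min(Math.max(i, 0), ALPHABET.length - 1)];
}

console.log(symbolToFreq('0'));   // → 17000
console.log(symbolToFreq('f'));   // → 20000
console.log(freqToSymbol(18000)); // → "5"
```

Closer spacing packs more symbols into the band but demands a sharper detector; 200 Hz of separation keeps adjacent tones in clearly distinct FFT bins at typical window sizes.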

The Takeaways

This easy, out-of-the-box approach has its pros and cons. On the pro side, almost every device nowadays has a mic in it, so we can reach a huge number of gadgets, mainly phones and computers.

On the con side, sound is easily affected by other sound sources, so in a noisy environment it will be more difficult to detect the signal accurately. The quality of the transmission also depends, as noted above, on the quality of the speaker and the microphone. A low-quality microphone usually can’t detect the high frequencies, and if it can’t “hear” them, it won’t receive the encoded message. The same goes for the speaker: if it can’t emit high enough frequencies, it won’t send the message.

We also learned that things get trickier the more information you want to send, so there’s a tradeoff between the complexity of the information and the time span in which it can be communicated (the last 10 seconds of an ad, for instance). But if you only want to trigger actions, you can do it quite reliably at fairly long distances (we only tested it in the office, so we’re talking office dimensions here; we don’t know what would happen with proper hardware transmitting at truly ultrasonic frequencies through highly directional speakers).
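Some back-of-the-envelope arithmetic makes the tradeoff concrete. Assuming a 16-tone alphabet (4 bits per tone) and 100 ms per tone so the detector gets enough samples per symbol — both figures are illustrative assumptions, not measured values from our prototype:

```javascript
// Throughput of a simple one-tone-at-a-time scheme.
const BITS_PER_SYMBOL = Math.log2(16);  // 16-tone alphabet -> 4 bits per tone
const SYMBOL_DURATION_S = 0.1;          // 100 ms per tone for reliable detection
const bitsPerSecond = BITS_PER_SYMBOL / SYMBOL_DURATION_S;  // 4 / 0.1 = 40
const bytesInTenSeconds = (bitsPerSecond * 10) / 8;         // 400 bits = 50 bytes

console.log(bitsPerSecond, bytesInTenSeconds);
```

Fifty-odd bytes in a ten-second ad tail is plenty for a short code or trigger, but nowhere near enough for a rich payload — which is exactly why triggering actions works well while complex messages do not.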

Practical Applications

We think this technology can be applied to a variety of scenarios (aside from air-gap hacking, of course).

Ads and information: Our first thought was having ads on TV that can send information to our devices. This could be anything from a pizza commercial that embeds a deal for viewers using their app to music festival pre-sale codes. The possibilities with this are endless.

Synchronized visuals with songs: An artist’s website that reacts when one of her songs is played on the speakers. As the demo shows, the audio carries the data, the web page decodes it, and the visuals appear. A new type of web experience influenced by sound.

Multiplayer games: Imagine a multiplayer game that doesn’t need a network to be played together; data is shared via sound, and the apps’ state syncs between devices without a dedicated connection. Take Tetris, for instance. Tetris doesn’t have a multiplayer mode per se, but imagine four friends playing Tetris at the same time. Every time one of them completes a row, their device emits a sound containing hidden data. The other three friends’ devices hear the sound, and the hidden data speeds up their games for one second. Any player who completes a row triggers this sound and accelerates the game for their opponents.

In Conclusion

We are excited about the possibilities of transmitting data via sound. We will continue to research ways to reduce ambient noise sensitivity, increase the strength of the transmission, and send more complex information. The future is sounding pretty good.
