Grisly, but I’m against restrictions on releasing what should be public information. Even if they came from the 1990s.
These knee-jerk reactions, creating special case rules, really seem like a negative to me.
Just wait for a ban on posting dash cam or police body cam recordings.
Before you advocate changing an extremely successful safety culture because you want to apply abstract principles, you might want to do some Chesterton's Fence thinking. Aviation safety depends on fearless analysis of objective data and blameless reporting, which is a very unnatural and sometimes counterintuitive framework for humans to operate in.
The NTSB releases transcripts of cockpit voice recordings, just not the literal voices. This is a human consideration that doesn't affect the quality or transparency of the analysis.
Why did they need the spectrogram?
A distressed airplane makes a lot of noises which are very difficult for the human ear to pick out and identify. For example, multiple closely-spaced bangs or rumbling noises will appear distinctly on a spectrogram but will be very hard to hear.
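To illustrate with a toy example (not real flight recorder data): below is a minimal sketch, assuming a made-up signal of one steady low tone plus two faint clicks 120 samples apart. The frame size, click positions, and threshold are all hypothetical, and the tone is deliberately chosen to fit exactly one cycle per frame so it stays in a single frequency bin; a real recording would need a window function and overlapping frames.

```python
import cmath
import math

FRAME = 64  # samples per spectrogram column (hypothetical choice)

def frame_magnitudes(frame):
    """Magnitude of a naive DFT for one frame, positive frequencies only."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]

# Toy signal: a steady tone (one cycle per frame) with two faint clicks added.
signal = [math.sin(2 * math.pi * t / FRAME) for t in range(FRAME * 12)]
for click_at in (400, 520):  # assumed click positions, 120 samples apart
    signal[click_at] += 0.5

# One column of magnitudes per non-overlapping frame: a crude spectrogram.
spectrogram = [
    frame_magnitudes(signal[i:i + FRAME]) for i in range(0, len(signal), FRAME)
]

# A click spreads energy across every frequency bin, while the tone sits in
# bin 1 only, so summing the upper bins makes the clicks stand out.
high_band = [sum(column[8:]) for column in spectrogram]
click_frames = [i for i, energy in enumerate(high_band) if energy > 1.0]
print(click_frames)
```

In the waveform the clicks are half the amplitude of the tone and easy to miss, but in the upper bins only the two frames containing them (frames 6 and 8 here) carry any real energy.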
I'm only an ardent viewer of crash investigation stuff, not a pro, but a spectrogram seems to be a good way to show the specifics of warning noises, engine sounds, unusual cabin noises (if relevant), and sometimes even structural failures happening over time. It conveys these more directly from the cockpit voice recorder without sharing the actual "audio".
Where else would they get sounds from inside the cockpit that weren’t transmitted on the radio?
Not everything is AI; they provided the spectrogram. Even a trained eye can read one, especially if context is provided.
> Even a trained eye can read one, especially if context is provided.
I'd hope a trained eye could read one, that's the point of the training.
The article quotes the creator saying he used AI.
Titling it that way is like saying a note-taking app is AI-powered because you used Claude to create it.
Like Scott Manley says, going from a frequency domain image representation to a time domain sound file is an extremely old technique that has not required AI at any point in the last 50 years. It's just that they vibe coded the extremely old, extremely normal algorithmic solution. AI did not recreate the dead pilots' voices, it just made data preparation and coding a bit less work.
It's almost certain you've used software or seen/heard software output today that transformed between frequency domain and time domain. It's ubiquitous.
It also works with time domain video files like audio visualizers: https://www.youtube.com/watch?v=E3gf88rSzqo
Nothing extremely surprising though.
FFTs are found in every nook and cranny of modern communications and computing.
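For anyone who hasn't seen it spelled out, here is a minimal sketch of the round trip, assuming a made-up two-tone signal. It uses a naive O(n^2) DFT in plain Python rather than a real FFT library, purely to show that going to the frequency domain and back loses nothing when the full complex spectrum is kept. The function names `dft` and `idft` are just illustrative.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform: time domain -> frequency domain."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(X):
    """Inverse transform: frequency domain -> time domain."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

# Toy "recording": two tones at 5 and 12 cycles per 64 samples (assumed values).
n = 64
signal = [
    math.sin(2 * math.pi * 5 * t / n) + 0.5 * math.sin(2 * math.pi * 12 * t / n)
    for t in range(n)
]

spectrum = dft(signal)
recovered = [value.real for value in idft(spectrum)]

# Largest sample-by-sample difference after the round trip.
max_error = max(abs(a - b) for a, b in zip(signal, recovered))

# The two strongest positive-frequency bins are the two tones we put in.
magnitudes = [abs(c) for c in spectrum[: n // 2]]
peak_bins = sorted(range(n // 2), key=lambda k: magnitudes[k], reverse=True)[:2]
print(max_error, peak_bins)
```

The error is down at floating-point rounding level, and the peaks land on bins 5 and 12. An FFT computes the same thing, only faster.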
It says in the article that the creator used OpenAI Codex, presumably because the spectrogram image wouldn’t have enough resolution by itself.
You are correct. I coded this in the late 1980s with digital sound domain experts.
Next year: Congress bans the Fourier transform.
A spectrogram is essentially the same audio, just passed through a (short-time) Fourier transform. That transform has a straightforward inverse. The spectrogram isn't perfect - the visual representation is low resolution and the phase information is missing - but it's plenty to at least figure out what was said. It's not surprising that this is possible, only disappointing that whoever published the article didn't realize it.
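To make the missing-phase caveat concrete, here is a minimal sketch, again with a made-up two-tone signal and a naive DFT in plain Python. It throws away the phase, pretends every phase is zero, and inverts anyway. This is not what the person in the article did, and it is not a proper phase-estimation method (Griffin-Lim is the classic one); it only shows what is kept and what is lost.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

# Toy signal: two tones (assumed values, not real cockpit audio).
n = 64
signal = [
    math.sin(2 * math.pi * 5 * t / n) + 0.5 * math.sin(2 * math.pi * 12 * t / n)
    for t in range(n)
]

# A spectrogram column keeps only the magnitude of each frequency bin.
magnitude_only = [abs(c) for c in dft(signal)]

# Invert with all phases assumed to be zero.
guess = [value.real for value in idft(magnitude_only)]

# The guess has the same strength at every frequency as the original...
spectrum_diff = max(
    abs(abs(a) - b) for a, b in zip(dft(guess), magnitude_only)
)
# ...but the waveform itself is not the same sample for sample.
waveform_diff = max(abs(a - b) for a, b in zip(signal, guess))
print(spectrum_diff, waveform_diff)
```

The frequency content survives exactly while the waveform shifts, which is why a magnitude-only image still carries enough to recover something intelligible once the phase is estimated rather than zeroed.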