The originals sound better. The aliasing provides a crunchiness and sharpness to the final output that drives emotional energy. That Zero Mission rhythm isn't intended to sound smooth and soft; the hard, driving beats are an emotional tool for eliciting anxiety and anticipation from the player.
But this is a bit like those who use smoothing filters. It's ultimately about taste, but it should be recognized that unless the filter is attempting to accurately recreate the original hardware of the era then the original design intent is not being adhered to, and so something may be lost in the "enhancement".
A friend had this killer basement setup with a projector into a huge canvas dropsheet. Plus the game cube, and the GBA dock for it, so we were projecting those games meant for a 2 inch screen maybe 10-15 feet wide.
Imagining how sharp and crisp those pixels must have been at that size... Oh man.
If it wasn't so late I'd calculate how big (in inches) an individual pixel is at that size.
0.5" at 10' wide screen (assuming no letter-boxing). GBA screen is 240x160 pixels.
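The arithmetic behind that figure is easy to check. A minimal sketch, assuming the projected image is exactly as wide as stated and uses all 240 horizontal pixels (the function name is made up for illustration):

```python
def pixel_size_inches(screen_width_feet, horizontal_pixels=240):
    """Width of one projected pixel in inches, assuming no letter-boxing."""
    return screen_width_feet * 12 / horizontal_pixels


# The 10 to 15 foot range mentioned earlier in the thread.
for width_feet in (10, 15):
    print(width_feet, "ft wide ->", pixel_size_inches(width_feet), "in per pixel")
```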
It's really close for me. I listened to the accurate version, then the enhanced one, and my first thought was "oh, yeah, this sounds better."
Then I listened to the accurate version again, and thought "wait, never mind, this one sounds better."
After going back and forth a few times, I think I still agree the original/accurate one is better, but it's pretty close. I really encourage people to listen for themselves.
For what it's worth, I have little to no personal nostalgia for the Game Boy Advance.
This is fair, I should have made it clearer that this is all subjective. For what it's worth I have all of this disabled by default in my own emulator because I think default settings should always err towards accuracy when that's a question.
I personally do prefer the interpolated versions in most cases because to me the extra high-frequency information just sounds like noise that makes it harder for my brain to process the underlying music. But clearly many feel differently!
Well, we can't just say 'original = intent'. The original artists presumably did the best job of expressing their intent as far as possible in the medium at the time, but that doesn't mean that this necessarily is the best expression of their intent ever.
It's like saying you can only watch the Simpsons with the exact late 1980s / early 1990s ads that they originally aired with, and everything else is sacrilege.
But without asking them it's pure conjecture. I don't think trying to retcon the best expression of their intent needs to be used to justify this project, either. Sometimes it's fun to see if you can build an improvement on what exists, even if it's a vehicle for learning about DSP or whatever domain the learner is in.
> The originals sound better. The aliasing provides a crunchiness and sharpness to the final output that drives emotional energy.
In the mid-1980s the first really affordable sampler was the Ensoniq Mirage, which used the Bob Yannes-designed ES5503 DOC (Digital Oscillator Chip) to generate its waveforms. It played back 8-bit samples and used a fairly simple phase accumulator that didn't do any form of interpolation (I don't count "leftmost neighbour" as interpolation). Particularly when you pitch it down, you get a rough, clanky, gritty "whine" to samples, that the analogue filters didn't necessarily do a lot to remove.
Later on they released the EPS which had 13-bit sampling. Why 13-bit? I don't know, I guess because the Emulator I and II used 8-bit samples but μ-law coding, giving effectively 13-bit equivalent resolution. It also used linear interpolation to smooth the "jumps" between samples, and even if you loaded in and converted a Mirage disk the "graininess" when you pitched things down was gone.
I'm currently writing some code to play back Mirage samples from disk images, and I've actually added a linear interpolator to it. Some things sound better with it, some things sound worse. I think I'll make it a front panel control, so you can turn it on and off as you want.
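For anyone who wants to hear the difference being described, here is a rough sketch of a phase-accumulator sample player with an interpolation switch. It is not the Mirage's or the ES5503's actual behaviour: it assumes a floating-point phase rather than a fixed-point accumulator, loops the sample forever, and the `play` function and its parameters are hypothetical names.

```python
def play(samples, step, count, interpolate=False):
    """Return `count` output values read from `samples`.

    `step` is how many input samples the phase advances per output sample;
    a step below 1 pitches the sound down. The sample is treated as a loop.
    """
    out = []
    phase = 0.0
    n = len(samples)
    for _ in range(count):
        whole = int(phase)
        i = whole % n
        if interpolate:
            # Linear interpolation: blend towards the next sample.
            frac = phase - whole
            nxt = samples[(i + 1) % n]
            out.append(samples[i] + (nxt - samples[i]) * frac)
        else:
            # "Leftmost neighbour": hold the sample at the truncated index.
            out.append(samples[i])
        phase += step
    return out


# Pitching a two-sample loop down an octave.
print(play([0, 100], 0.5, 4))                    # stepped output
print(play([0, 100], 0.5, 4, interpolate=True))  # ramps between samples
```

Exposing `interpolate` as a flag mirrors the front panel control idea: the same data can be auditioned both ways.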
I'll just throw some more ES5503 DOC love here. It's also the audio chip in the Apple IIGS. In 1986, having a stock home computer playing 32 simultaneous hardware voices (without software mixing), each with hardware pan ... was remarkable. Otherwise you were stuck with 3 or maybe 4 hardware voices. e.g. the timbre and filter of the C64 SID chip was gorgeous (another Bob Yannes design), but 3 voices was all you got. And just 3 square waves and noise on the Ataris of the era. Chords or complex harmony? Fire up the arpeggiators! Lol.
When I browse the demoscene I'm always a bit surprised there's not much Apple IIGS content. Graphically, it was stunted, but the ES5503 DOC was a pro synth engine right there next to the 6502 ... yowza.
I think it's not so much that one sounds better or the other.
The "uninterpolated" one is incorrect.
The "interpolated" one is incorrect.
The uninterpolated one has sharp square edges, which isn't correct. The GBA has a 12dB/octave filter at around 12kHz (IIRC) on the output, which the uninterpolated simulated output doesn't appear to have. This would knock the corners off a bit and make it "smoother" and less hissy, but would still have quite crunchy low frequency sounds.
The interpolated one smooths things off excessively, and while it doesn't really have much less spectral energy high up, what's there is in the wrong place.
Yeah this is a neat experiment, but the ‘cleaned up’ versions sound ‘wrong’ to my ears - that high whistle/hiss is ‘missing.’
> The originals sound better.
I don't think so, I think you're just getting a high end that isn't in the original audio. In the places where there are high frequencies the aliasing and the hiss just gets in the way.
> that drives emotional energy
Seems like a hyperbolic rationalization.
The ‘improved’ versions sound muffled like I have water in my ears. Plus I’d rather hear the game as it was designed, artefacts and all.
The artifacts weren't a conscious design decision, they were a constraint. We don't know whether the designers would have chosen to keep them or not, if they had the choice.
You say that, but it was quite common to "allow" a bit of aliasing in sampling back when we had very limited equipment, to introduce a bit of "sparkle" into percussive sounds that would otherwise be lost by low sampling rates.
Given its spectral complexity can you even tell if a hihat sample is aliased?
GBA games were made for a console that behaved like this.
Accuracy is paramount. Targeting anything other than the console's sound is an affront to preservation.
Preservation and design intent are two very different things.
The idea that sound designers on old games were totally siloed and ignorant of how their compositions would sound on final consumer hardware is completely wrong. Most of these composers were programmers themselves and knew exactly how to get the final hardware to make the sounds they wanted, even when they composed using more advanced tech.
Programmers using devkits (more powerful than the consumer hardware) likewise.
That may be true, but the sound designers were still making the best of what they had. They could probably imagine how the same composition would sound better.
When you play e.g. Gamecube games in an emulator, do you run them in 480p or do you render at a higher resolution? The former is clearly what the designers were targeting, but I think there’s rarely any benefit to eschewing higher resolutions. It just looks even better.
I don't understand what you mean. Nobody said they didn't know how their compositions would sound, my argument is that at least some of these composers would have chosen the more advanced interpolation method, if it were available.
I guess it's hard to stop my originalist tendencies from boiling over into other topics...
What you're saying to me is like someone saying, well, if the piano had more octaves then existing compositions would have been better. But those pieces were composed with the current amount of octaves in mind in the first place...
Maybe there's an analogue with the harpsichord-to-piano transition, but I'm not knowledgeable enough about that yet.
Haha, my first gut reaction to reading your second paragraph was "No, it'd be better to compare it to compositions written for harpsichord and played on piano".
I guess history has shown that most composers (and listeners) preferred the piano sound over the harpsichord sound the majority of the time.
sure, and you know what their design intent was right?
>I don't think so, I think you're just getting a high end that isn't in the original audio. In the places where there are high frequencies the aliasing and the hiss just gets in the way.
I don't get this, are you saying that this aliasing is just an artifact of the emulation? Like the GBA speaker/headphone jack itself would also be affected by the same aliasing right? And in that case the song was composed for that, right?
I don't think it would be right to go as far as to say that there's a huge strong interplay in every single GBA title's song with the hardware (I'm sure some stuff was phoned in and only listened to by the composer in whatever MIDI DAW thing they were using) but at one point the GBA was the target right?
This is great stuff… basically, an easy way to get much higher quality audio out of a GBA emulator.
I’ll add some context here—why don’t more games run their audio at 32768 Hz, if that’s such a natural rate to run audio? The answer lies in how you fill the buffers. In any modern, sensible audio system, you can check how much space is available in the audio buffer and simply fill it. The GBA lacks a mechanism to query this. Instead, what you do is calculate this yourself, and figure out when to trigger additional audio DMA from the VBlank interrupt. You know the VBlank runs every 280896 cycles, and you know that the processor runs at 16777216 Hz, so you can do some math to calculate how much data is remaining in the audio DMA stream.
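The math can be sketched with the two constants given above. The 512-cycle period corresponds to 32768 Hz; the 1254-cycle period is only an illustrative choice that happens to divide the frame evenly, not a figure taken from this thread. Function names are hypothetical.

```python
from fractions import Fraction

CPU_HZ = 16_777_216          # CPU clock, as stated above
CYCLES_PER_FRAME = 280_896   # cycles between VBlanks, as stated above


def samples_per_frame(cycles_per_sample):
    """Exact number of audio samples consumed between two VBlanks."""
    return Fraction(CYCLES_PER_FRAME, cycles_per_sample)


def sample_rate_hz(cycles_per_sample):
    """Sample rate that results from one sample every `cycles_per_sample` cycles."""
    return CPU_HZ / cycles_per_sample


# 32768 Hz: the frame does not hold a whole number of samples, so the
# remaining DMA data has to be tracked across frames.
print(samples_per_frame(512), sample_rate_hz(512))

# A period that divides the frame evenly: the buffer can simply be
# restarted in every VBlank handler, at the cost of a lower sample rate.
print(samples_per_frame(1254), sample_rate_hz(1254))
```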
A lot of games simplify the math—it’s easier to start a new audio DMA in your VBlank handler, but that means running at a lower sample rate, which will sound pretty crispy.
YMMV, some people like the crispy aliased audio. If the audio weren’t crispy, the sound designers probably would have adjusted the samples to compensate. Other factors being equal, I’d rather listen to what the original artists heard when they were testing on real hardware, because that is probably closer to what they intended, even though it has a lot of artifacts in it.
> why don’t more games run their audio at 32768 Hz, if that’s such a natural rate to run audio?
I've written some code to play back 8-bit samples (and indeed to do wavetable, FM, and VA synthesis) on 8-bit Arduinos using the PWM to output 8-bit audio. That runs at 31373 Hz, which is a pretty crazy sample rate.
Why?
Because the chip is clocked at 16MHz, and if you program the PWM for no prescaler and "phase correct" PWM where it counts up and back down, so you get a widening pulse in the middle of a "burst", then it counts 510 "steps" of the counter. It's an 8-bit counter so it counts from 0 to 255, then the next step counts back down to 254, and so to 0 again, when the next step takes it to 1.
And 16000000/510 is 31372.55 ;-)
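The same calculation as a small sketch, using the counting described above (0 up to 255 and back down, 510 steps per cycle). The function name and the `prescaler` parameter are illustrative only.

```python
def phase_correct_pwm_hz(clock_hz, top=255, prescaler=1):
    """PWM cycle rate when the counter runs 0 -> top -> 0, i.e. 2 * top steps."""
    steps = 2 * top
    return clock_hz / (prescaler * steps)


print(round(phase_correct_pwm_hz(16_000_000), 2))
```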
The crispy aliasing of the audio has always felt cozy to me. It’s also a bit of a signature of the system, like the wobbly polygons on PS1. I appreciate that there are ways to change the sound, but it feels a bit rude to label it broken or defective.
I strongly disagree here. I was so hyped for the GBA that I bought it on release day, only to be disappointed later. One of the reasons is the lackluster sound; seriously, Nintendo had already built an impressive sound system for the SNES, and then the GBA just had a software-driven DAC? Why did they cheap out so much?
Same as for the PS1, I always found the wobbly polygons and warping textures painful to watch.
Sony made the SNES sound chip, and the GBA was power-constrained by AA batteries. Those are the two biggest reasons I can think of just off the top of my head.
I know, but I think that a simple 8-channel ADPCM mixer would not have taken a lot of power, and would have resulted in much better sound.
If you combine "GBA Mus Ripper" and "SoundFont MIDI Player", you can get some seriously excellent sound for listening to GBA music.
"GBA Mus Ripper" detects the so-called "Sappy" music driver and extracts and converts the songs to MIDI files, and generates a SF2 soundbank file. Available at https://www.romhacking.net/utilities/881/
"SoundFont MIDI Player" plays back MIDI files. You can configure it to automatically load a SF2 soundbank file in the directory. When you load a converted GBA MIDI file, you get the high music quality of a modern feature-packed MIDI playback engine. Available at https://falcosoft.hu/softwares.html#midiplayer
It's not perfect though, as GBA games do not use true standard MIDIs. Some MIDI controller commands (like modulator wheel) don't translate correctly.
Thanks for this, I was not aware that a good portion of GBA songs can be exported as MIDI. But I'm guessing that with good soundfonts you can get pretty reasonable quality for many of them!
They don't use general midi standard instruments. You need the extracted soundfont because the instrument numbers are unique to each game. In order to "improve" the soundfont, you need to edit that soundfont to have higher quality instruments, you can't just switch out the whole soundfont for a different one.
The reason the nearest neighbour interpolation can sound better is that the aliasing fills the higher frequencies of the audio with a mirror image of the lower frequencies. While humans are less sensitive to higher frequencies, you still expect them to be there, so some people prefer the "fake" detail from aliasing to them just being outright missing in a more accurate sample interpolation.
It's basically doing an accidental and low-quality form of spectral band replication: https://en.wikipedia.org/wiki/Spectral_band_replication which is used in modern codecs.
It's actually the other way round: Aliasing fills the lower frequencies with a mirror image of the higher frequencies. So where do the higher frequencies come from? From the upsampling that happens before the aliasing. _That_ makes the higher frequencies contain (non-mirrored!) copies of the lower frequencies. :-)
Oh yes you're correct, imaging would be the correct term for what's happening I think (aliasing is high -> low and imaging is low -> high)?
I think I've heard the word “images” being used for these copies, yes.
Interpolation is a bit of a confusing topic, because the most efficient implementation is not the one that lends itself the easiest to frequency analysis. But pretty much any rate change (be it up or down) using interpolation can be expressed equivalently using the following set of operations and appropriately chosen M and N:
1. Increase the rate by inserting M zeros between each sample. This has the effect of creating the “images” as discussed.
2. Apply a filter to the resulting signal. For instance, for nearest neighbor this is [1 1 1 … 0 0 0 0 0 …], with (M+1) ones and then just zeroes; effectively, every output sample is the sum of the previous M+1 input samples. This removes some of the original signal and then much more of the images.
3. Decrease the rate by taking every Nth sample and discarding the rest. This creates aliasing (higher frequencies wrap down to lower, possibly multiple times) as discussed.
The big difference between interpolation methods is the filter in #2. E.g., linear interpolation is effectively the same as a triangular filter, and will filter somewhat more of the images but also more of the original signal (IIRC). More fancy interpolation methods have more complicated shapes (windowed sinc, etc.). This also shows why it's useful to have some headroom in your signal to begin with, e.g. a CD-quality signal could represent up to 22.05 kHz but only has (by spec) actual signal up to 20 kHz, so that it's easier to design a filter that keeps the signal but removes the images.
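The three steps can be written out literally. This is a deliberately inefficient sketch meant to show the equivalence, not how a real resampler is implemented; all function names are made up, and `Fraction` is used only so the small example stays exact.

```python
from fractions import Fraction


def zero_stuff(x, m):
    """Step 1: insert m zeros after each sample (rate goes up by m + 1)."""
    out = []
    for s in x:
        out.append(s)
        out.extend([0] * m)
    return out


def fir(x, taps):
    """Step 2: causal FIR filter; input before the start is treated as zero."""
    out = []
    for i in range(len(x)):
        acc = 0
        for k, t in enumerate(taps):
            if i - k >= 0:
                acc += t * x[i - k]
        out.append(acc)
    return out


def decimate(x, n):
    """Step 3: keep every nth sample and discard the rest."""
    return x[::n]


def resample(x, m, n, taps):
    return decimate(fir(zero_stuff(x, m), taps), n)


def nearest_taps(m):
    """(m + 1) ones: each input sample is simply held."""
    return [1] * (m + 1)


def linear_taps(m):
    """Triangle of length 2m + 1 with peak 1; output is delayed by m samples."""
    width = m + 1
    return [Fraction(width - abs(k - m), width) for k in range(2 * m + 1)]


x = [1, 2, 3]
print(resample(x, 2, 1, nearest_taps(2)))                     # staircase
print([str(v) for v in resample(x, 2, 1, linear_taps(2))])    # ramp
```

Swapping `taps` is the only thing that changes between interpolation methods here, which is the point being made about the filter in step 2.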
And also, to add to the actual GBA discussion: If you think the resulting sound is too muffled, as many here do, you can simply substitute a filter with a higher cutoff (or less steep slope). E.g., you could use a fixed 12 kHz lowpass filter (or something like cutoff=min(rate/2, 12000)), instead of always setting the cutoff exactly at the estimated input sample rate. (In a practical implementation, the coefficients would still depend on the input rate.)
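A sketch of that suggestion, assuming a Hann-windowed sinc as the lowpass and assuming the filter runs at some higher rate `filter_rate_hz` (for example the emulator's output rate). The function names, the 63-tap length, and the window choice are illustrative and not taken from the article.

```python
import math


def choose_cutoff_hz(input_rate_hz, ceiling_hz=12_000):
    """cutoff = min(rate / 2, 12000), as suggested above."""
    return min(input_rate_hz / 2, ceiling_hz)


def windowed_sinc_lowpass(cutoff_hz, filter_rate_hz, num_taps=63):
    """Hann-windowed sinc lowpass taps, normalised to unity gain at DC."""
    fc = cutoff_hz / filter_rate_hz          # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2 * fc                   # limit of the sinc at its centre
        else:
            ideal = math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


# A low-rate channel keeps its own Nyquist as the cutoff; a fast one is capped.
print(choose_cutoff_hz(13_379), choose_cutoff_hz(32_768))
taps = windowed_sinc_lowpass(choose_cutoff_hz(32_768), 48_000)
```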
I'm not well-versed in the terms, so I'm not sure which part is the so-called "audio aliasing."
To me, the original has very obvious background noise which the enhanced version removes. But as the author has said, the enhanced version sounds "muffled" (and, IMHO, not just a little), which probably makes most people (including me) feel it sounds worse.
Also, shouldn't most of the music be included in the game's official OST? I assume that version would not be limited by the technical limitations of the game media at the time and should best represent the artistically intended version.
Edit: apparently in this very case, "Metroid: Zero Mission" doesn't seem to have any official OST release. Unfortunate.
The issue is that if you low-pass filter all the high-end stuff to try to get rid of that crunchiness, it ends up all sounding muted and muffled. I like the original better (even with the crunchiness).
I don't quite understand why the author is doing special handling for PSG versus PCM audio.
My GameBoy emulator generates one "audio sample" per clock tick (which is ~1 MHz, so massive 'oversampling'), decimates that signal down to like 100 ksample/sec, then uses a low-pass biquad filter or two to go down to 16 bit / 48 kHz and remove beyond-Nyquist frequencies. Doesn't have any of the "muffling" properties this guy is seeing, aside from those literally caused by the low-pass.
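The last stage of a pipeline like that might look roughly as follows. This is a sketch under assumptions, not the commenter's code: the coefficients use the widely circulated Audio EQ Cookbook low-pass formulas, and the 96 kHz intermediate rate, 20 kHz cutoff, and function names are all made up for illustration.

```python
import math


def biquad_lowpass(cutoff_hz, sample_rate_hz, q=1 / math.sqrt(2)):
    """Second-order (12 dB/octave) low-pass coefficients, normalised by a0."""
    w0 = 2 * math.pi * cutoff_hz / sample_rate_hz
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b = [(1 - cos_w0) / 2 / a0, (1 - cos_w0) / a0, (1 - cos_w0) / 2 / a0]
    a = [-2 * cos_w0 / a0, (1 - alpha) / a0]
    return b, a


def run_biquad(x, b, a):
    """Direct form I: feed-forward on past inputs, feedback on past outputs."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for s in x:
        y = b[0] * s + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, s
        y2, y1 = y1, y
        out.append(y)
    return out


def decimate(x, factor):
    """Keep every `factor`-th sample; only safe after low-pass filtering."""
    return x[::factor]


# Filter at an assumed 96 kHz intermediate rate, then halve to 48 kHz.
b, a = biquad_lowpass(20_000, 96_000)
signal = [1.0] * 2000                      # DC should pass through unchanged
output = decimate(run_biquad(signal, b, a), 2)
print(len(output), round(output[-1], 6))
```

A single biquad is a gentle filter, which is presumably why the comment mentions using "a filter or two".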
I suspect that without nostalgia, the fixed interpolation would absolutely sound better. Unfortunately, nostalgia. The lesson I'm taking away here is that, oh, the terrible resamplings are the aspect of faithful emulation that makes it sound like a GameBoy and not just sawtooths.
Ah, the white-noise-based percussion needs to be special-cased, since for it nearest neighbor is correct and the frequencies over Nyquist are real.
Love what you're doing, but it is funny - I make a lot of music in the style of GBA, and specifically bitcrush and downsample to bring in those audio artifacts. They add a lot of high frequencies that give it a great shimmer.
Having said that, there are definitely many use cases where GBA games would want to reduce that artifacting. Keep it up!
Impressive.
Audio was the thing I could never figure out on my Gameboy emulator. I couldn’t get it to pass basic tests, even without bothering to output sound on the computer.
The original sounds so much better...
Does it? It has some very high pitched hiss that I find physically painful, I very much prefer the filtered version.
"Much cleaner! The second recording does sound a little more muffled, but I’ll take that over the horrible audio aliasing in the first recording."
I absolutely wouldn't. To my ears the second version sounds much worse. Personal opinion etc etc, but wow, it's very very clear to my ears.
The loss in high-frequency information is not worth the interpolation. Bass loses its crunch. Percussion fades into the background.
Besides, I personally prefer to play my vgm at the original sample rate, and my soundcard adjusts to the correct rate for each song through fb2k plugins.
>what if, instead of accurately emulating how the GBA PWM hardware works, the emulator uses its own interpolation algorithm to resample from audio channels’ sample rates directly to the emulator’s audio output sample rate?
Then it would be less accurate to the actual console, and thus a worse emulator.
Accuracy isn't always the point of video game emulators, since the gaming experience is a subjective thing. Most old games were crippled by the limitations of the hardware they ran on. Inaccuracies can very much improve the experience, like removing sprite limits, displaying wider aspect ratios or, in this case, improving sound interpolation.
>improving sound interpolation.
To make it sound subjectively worse, for every sample in that page. Others noticed as much.
It sounds better... on paper. In reality, it doesn't, simply because it isn't how it is supposed to sound.