Sunday, 30 July 2017

One small step for man, one giant leap for the Commodore 64

The making of Planet Golf - Part 1

As of July 28th, Planet Golf, my latest work for the Commodore 64, is available. If you have not done so already, go check it out! And maybe buy a copy too: all the proceeds go to UNICEF. It can't be that bad, after all.

It has been a fantastic one-year ride, and to see it finally come to completion, with nice old-style packaging, feels like an enormous achievement. Benefiting from the help of incredibly talented composers and artists, I could finally focus solely on the actual game design and programming. In fact, Planet Golf has so many new aces up its sleeve compared to P0 Snake, the first game that I published, that I thought it would be interesting to blog about all the things I learned in the process. I am starting with what I believe is the most intriguing one: Full Motion Video.

Why FMV?

The B-side of the floppy (yes, back then floppy disks had two sides: you could manually flip the disk and enjoy an additional 170 KB of 8-bit greatness) comes with a bunch of extras, one of them being "Planet Golf – the story so far". The game takes place in the year 38911, when humanity has colonized most of the planets of the solar system, and space tourism has been a reality for many centuries. So much so that the first interplanetary golf tournament is about to start. But how did we get there?
In the 80s, most games that would have benefited from a preface got away without one, providing the user with a leaflet or a section in the manual instead.

Enlightenment User manual

There were a few notable exceptions though, especially in the late 80s, when disk drives had become more affordable and software houses had started investing in this medium, thus escaping the constraints of tapes (painfully slow and sequential-only loads). One of my favourites was Last Ninja 3, by System 3. This game has a cinematic intro that fills the player in on what happened since the end of the previous chapter of the trilogy.

Frames from Last Ninja III Cinematic Intro

It came very late in the life of the Commodore 64, at a time when players had already been spoiled by the marvels of console games and 16-bit computers, but a few of us were still playing with the humble Commodore 64, mainly for nostalgia. Yet, the first time I saw it, I remember being totally blown away by the technical achievement and the impeccable implementation. Not only that: most importantly, I thought that this intro was not just a blatant show-off of the programming prowess of the developers, but rather that it added depth to the game. You could see that there had been a development in the story, that there was a path being followed. Of course, watching it now, in the times of Hollywood game productions, will put a smile on your face. But you have to bear in mind that when this came out, players on the Commodore 64 were used to a static screen with credits and the evergreen "press fire button to play". Because that is what games were for, and still are. But Last Ninja 3 was one of the first pioneering outings in the industry that tried to close the gap between videogames and other types of media. And if you are still not impressed with this, think that Robin Levy lit each of those pixels, for each of those frames, one by one. And that was a time when mice were not very common yet, so maybe he did so with a joystick!

With this intro in mind, I thought it would be nice to explain what had happened in the universe of Planet Golf before 38911 with a "cinematic" preface to the game. In doing so, I tried to somehow up the ante and add full motion video coming from real footage. Pretty much for one simple reason: I'm no Robin Levy and I couldn't really see myself drawing all those pixels by hand!
First, I thought I would put in an excerpt from Kennedy's famous speech to Congress in which, following the first successes of the Mercury space program, he basically announced funding for NASA's Gemini and Apollo missions that would eventually put a man on the moon.

This is also known as the "Kennedy Challenge" and it is an incredibly inspiring speech, belonging to a time when politics encouraged (and empowered) people to come together to accept and overcome challenges, rather than putting them against each other over problems. But this is another story!

The Kennedy Challenge

"I believe that this nation should commit itself to achieving the goal, before this decade is out, of landing a man on the moon and returning him safely to the earth"

I thought I would also alternate credits with more footage from the Apollo missions, like the Saturn V lift-off and a moonwalk. Oh, and of course the famous countdown and the iconic "Houston, the Eagle has landed!" sequence had to be there too. I also envisioned a speaker doing some voiceover, plus some solemn music playing in the background.

Bask in the glow of my storyboard!

With the script out of the way I was left with the following figures: in total we are talking about roughly 70 seconds of digitized speech and music, and a grand total of 30 seconds of video alternated with a few stills for the credits, making up the remaining 40 seconds.
After we have taken off the data for the other extras, there are about 120 Kbytes of floppy disk space left for the cinematic intro, and we need to squeeze in all the aforementioned components, plus of course the code to use them. There’s no doubt we are going to use the whole 120 Kbytes for this task, but this also means that, besides audio and video, there’s a third component that we have to take care of, and it’s I/O. The Commodore 64 gets its name from the amount of memory it comes with, that is 64 Kbytes, so the intro cannot be loaded in one go. Therefore we need to “stream” the data from the floppy disk while the video and the audio keep playing. Quite ambitious, isn't it? And, let me remind you, the Commodore 64 is equipped with a 1 MHz CPU capable of 8-bit integer math, no less. There is no doubt we will have to make compromises.
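To put a number on the I/O problem, here is a back-of-the-envelope sketch in Python. It only uses the figures quoted above and assumes 1 Kbyte = 1024 bytes; the function and constant names are made up for this illustration.

```python
# Figures taken from the text above; 1 Kbyte = 1024 bytes is an assumption.
DISK_BUDGET_BYTES = 120 * 1024   # space left on the B-side for the intro
INTRO_SECONDS = 70               # total running time of the intro
C64_RAM_BYTES = 64 * 1024        # total memory of the machine

def average_stream_rate(budget_bytes, seconds):
    """Average number of bytes per second that must come off the disk."""
    return budget_bytes / seconds

rate = average_stream_rate(DISK_BUDGET_BYTES, INTRO_SECONDS)
print(f"average stream rate: {rate:.0f} bytes/s")
print(f"data exceeding total RAM: {DISK_BUDGET_BYTES - C64_RAM_BYTES} bytes")
```

This is only an average: it says nothing about how the load is spread between video, audio and stills, but it shows that the data alone is nearly twice the size of the machine's memory.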
I’ll see you in the next post, where we’ll dive into the technical details of the implementation and see how, once again, the limitations of this machine can become its strength.

Wednesday, 17 June 2015

P0 Snake 64Kb, grab your copy today!

Just a very brief post to say that P0 Snake is finally out for sale, so grab your copy NOW:

This is the real deal, guys: You get a nice big box with stickers, a poster, a booklet, a password table and, of course, your P0 Snake cartridge. Just hook your Commodore 64 to your telly, plug the cartridge in and play this game the way it's meant to be played! 


Friday, 10 April 2015

The Making of P0 Snake - Part 3: Audio Compression For Poor CPUs


Compression of digital sound has been a subject of study for some 50 years now, during which researchers have produced a large number of audio codecs, that is, systems to compress (or encode) sound and to decompress (or decode) it to something that can be played. We mentioned mp3 already, which is based on psychoacoustic modelling and was designed for music, and GSM, which is a very simple algorithm to encode speech, but there are many more. What pretty much all of these systems have in common is that the decoder does not return the original waveform, but an approximation of it. They are therefore referred to as lossy audio codecs. We have seen already that even the introduction of heavy modifications in the original digitized waveform, like changing the frequency to 8000 Hz or moving from 16 to 4 bit quantization, does indeed affect the quality of the sound, but, especially for speech, it retains its "understandability". There are algorithms like CELP that can encode speech to as low as 800 bytes per second. Unfortunately, we can only dream of using this kind of technology in this context, for two simple reasons: first, any audio decoder would need space just for the code. What good would it be to compress our 9 seconds of audio to 4KB if we need, say, a 20KB decoder to play it? Second, and most importantly, audio decoders are too heavy on the CPU. In fact, for a computer like the Commodore 64, whose CPU only runs at 1 MHz and is only capable of 8 bit integer math, even just playing the uncompressed samples is a demanding task!
How demanding? Let’s find out. (Warning: what follows might be a bit boring and very C64-centric, so you might want to take my word for it that playing digitized audio is very demanding and just skip to the next section.)

We mentioned already that the Commodore 64 was not designed to play digitized sound. The way programmers managed to do so was by exploiting a bug in the sound chip, the SID. Basically, this chip allows the programmer to set the volume to one of 16 different levels via a 4 bit register. Incidentally, the action of setting the volume creates a side effect, an audible “click”. The amplitude of this click is proportional to the volume; therefore changing the volume repeatedly at a steady frequency approximates the playing of any waveform. That’s very fortunate! So, to play the 7884Hz 4 bit waveform we have talked about so far, one should just alter the volume register 7884 times per second and feed it with the right sample value. Now, to keep the frequency so steady is the real problem here. Luckily, the C64 comes with a programmable interrupt system that can be driven by a timer. Without going too much into detail, what the Commodore 64 can do for you is to execute a piece of code every time the timer counts down to zero and restart it automatically for you while invoking your piece of code, which we’ll call interrupt handler. Anything that the CPU was doing when your interrupt handler is invoked is in fact interrupted and your interrupt handler routine takes control. This is beautiful, as in theory we can just program this routine to read the next sampled value from memory, put it in the volume register and exit, thus returning control to whatever the CPU was doing when it was interrupted. And that’s exactly what P0 Snake or any other talking game does.
The problem is that whatever the CPU was doing, it was doing so by using its registers, and our sound play routine will use the same registers to do its thing. This means that, upon returning control, our routine must make sure that the registers contain exactly the same values that they contained when the CPU was interrupted. This is achieved by saving all the registers somewhere as the first thing in the interrupt routine and then restoring all of them as the last thing before returning control. This somewhere is usually the system stack. Unfortunately, only one of the registers, the Accumulator, A, can write to the stack. The other registers, X and Y can’t. But they can write to the Accumulator! 
We now have the default skeleton code for a timer driven interrupt service routine, which will look like this:

  pha //push the Accumulator onto system stack. [3]
  txa //transfer X to Accumulator [2]
  pha //push accumulator again, thus saving X value. [3]
  tya //transfer Y to Accumulator [2]
  pha //push the Accumulator once again, thus saving Y value [3]

  //our sample playing routine here, 
  //which can now use the 3 registers

  pla //pops Accumulator from the stack [4]
  tay //transfer Accumulator to Y [2]
  pla //pops Accumulator [4]
  tax //transfer Accumulator to X [2]
  pla //pops the Accumulator [4]
  rti // return to whatever CPU was doing,
      //All the registers correctly restored [6]

The numbers in the square brackets are the number of cycles that each instruction takes. Remember that the Commodore 64 is clocked at 1 MHz, or 1000000 cycles per second, so let’s see what the overhead of all this saving and restoring of registers adds up to:

3+2+3+2+3+4+2+4+2+4+6 = 35

So for each call to the interrupt routine, 35 cycles go to waste without doing anything. Zero, zip, nada! We said that the highest sampling frequency at which we end up calling this routine is 7884 times per second, which means every second 7884 * 35 cycles are wasted. That’s 275940 cycles per second, which is roughly 28% of the CPU horsepower.

To cut a long story short, 28% of the CPU goes to waste just because of the overhead of this saving and restoring the registers, and we haven’t really played anything yet! Therefore, we can't really think of adding anything very complex on top of that if we want to be able to ALSO run the game at the same time. 
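The same arithmetic can be replayed in a few lines of Python. The cycle counts are the ones annotated in the listing above and the clock is the round 1 MHz figure used in this post; the helper names are made up for the example.

```python
CPU_HZ = 1_000_000  # the round figure used in the text for the C64 clock

# Cycle counts as annotated in the listing above.
SAVE_CYCLES = [3, 2, 3, 2, 3]        # pha, txa, pha, tya, pha
RESTORE_CYCLES = [4, 2, 4, 2, 4, 6]  # pla, tay, pla, tax, pla, rti

def overhead_per_call():
    """Cycles spent saving and restoring registers in one interrupt."""
    return sum(SAVE_CYCLES) + sum(RESTORE_CYCLES)

def overhead_per_second(sample_rate_hz):
    """Cycles per second spent on register housekeeping alone."""
    return sample_rate_hz * overhead_per_call()

def cycles_between_samples(sample_rate_hz, cpu_hz=CPU_HZ):
    """CPU cycles available between two consecutive interrupts."""
    return cpu_hz / sample_rate_hz

wasted = overhead_per_second(7884)
print(overhead_per_call(), wasted, f"{100 * wasted / CPU_HZ:.1f}%")
print(f"{cycles_between_samples(7884):.1f} cycles between samples")
```

Seen the other way round: at 7884 interrupts per second there are fewer than 127 cycles between two samples, and 35 of them are gone before the handler has done anything useful.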

Although there are tricks involving self modifying code (yuk!) that will allow you to take that number down, and P0 Snake uses all of them, this number still gives us an idea of how demanding this task is for the CPU, therefore on the fly audio decompression is out of the question here.


Let’s summarize our findings so far:

  • We squeezed our original source audio into 21KB.
  • We want to land somewhere in the neighborhood of 4KB.
  • We accept a short wait before the game runs, during which the 4KB of compressed audio are decompressed to the 21KB that we are going to use from that moment on.
  • The decoder should be a fairly short program, whose memory footprint should be negligible compared to the 4KB of the compressed audio.
  • It should be really fast! No fancy floating point, no weird math, nothing like that. Possibly LUTs or Dictionaries but little more than that.

Having ruled out lossy audio compression schemes, classic lossless compression approaches spring to mind. LZ77 is at the base of most of these approaches, such as RAR, ZIP, 7Z, you name them. It’s based on the idea that multiple occurrences of the same subsequence in a sequence can be encoded by using a single copy of such subsequence and references to the same data for the other copies of the subsequence.
A string like 


is encoded to something like


The pairs (i,j) between parentheses mean “go back i symbols and copy j symbols from there”.
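To show how little work the decoding side has to do, here is a minimal sketch in Python. The token layout (either a literal symbol or a `(distance, length)` tuple) and the sample input are illustrative assumptions, not the actual bitstream format used by P0 Snake.

```python
def lz77_decode(tokens):
    """Decode a list of tokens into a string.

    Each token is either a single literal character or a
    (distance, length) tuple meaning "go back `distance` symbols
    and copy `length` symbols from there".
    """
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            distance, length = token
            start = len(out) - distance
            if distance <= 0 or start < 0:
                raise ValueError("reference points outside the decoded data")
            # Copy one symbol at a time so that a copy may overlap
            # the data it is producing (length > distance).
            for k in range(length):
                out.append(out[start + k])
        else:
            out.append(token)
    return "".join(out)

print(lz77_decode(["a", "b", "c", (3, 6)]))
```

The whole decoder is a loop, a lookup and a copy: no multiplications, no floating point, which is exactly the kind of job a 6502 can handle.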

While finding such repeated copies is a computationally intensive task, such a task is up to the encoder. The decoder, which is what we are most interested in, has a fairly straightforward job to do. Straightforward enough to be implemented on the C64 without too much pain. We are almost there! 
The only problem left is that LZ77, being a lossless compression approach, is not expected to compress much. We are lucky if we’ll see a 30-40% reduction (you can do this experiment yourself: try to record a wav file and zip it). That would take our memory footprint down to 12-15 KB. Much better, but not quite there yet.

But what if we manage to make LZ77 encoder’s life easier? What if we could somehow maximize the number and the size of identical subsequences of samples?

Let’s bring up once again that piece of the  “Oh No” waveform we have mentioned in part 2. 

We said that it almost looks like it was made up of the same segment repeated over and over. Let’s bring up these segments.

That’s very close to what LZ77 needs to do its dirty job at its best, but not quite that, yet. There are, in fact, minimal differences among the blocks. If this waveform was really made exactly of the same segment repeated over and over, LZ77 would only need one of those segments plus a few indices to encode the entire waveform in the picture. 
Which brings us to the following intuition: how bad would it be (sound) to just use one of these blocks and repeat it over and over to form the actual waveform? In this case, the differences are so minimal that we probably wouldn't be able to appreciate them, and LZ77 would encode this piece to roughly one fifth of its original size! So, all we need to do is try to find all the similar blocks, make them identical, and run a normal LZ77 compression pass to have something that will unpack easily and will sound reasonably similar to the unaltered waveform. We don't really need fancy approximate pattern matching algorithms here, given how small the amount of data we are talking about is, and we could just go the brute force route. The algorithm would be something like this:

encode(waveform, threshold)
  i = 0
  while (i < length(waveform))
    replaced = false
    for (blocksize = 64; blocksize >= 4; blocksize--)
      j = find_most_similar_block in range (0,i) 
          of size blocksize;
      if (similarity((j,j+blocksize),
                     (i,i+blocksize)) <  threshold(blocksize))
         // replace the current block with a copy 
         // of a similar one, then skip past it
         copy ((j,j+blocksize), (i,i+blocksize))
         i += blocksize;
         replaced = true;
         break;
    if (!replaced) //no similar block of any size was found
       i++;  // keep the current sample value and go ahead.
  return LZ77encode(waveform);

Dealing with blocks smaller than 4 samples doesn’t really make much sense, because the overhead for the indexes that LZ77 needs to replicate the blocks would outsize the length of the block itself. 
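For readers who prefer something they can run, this is a rough Python sketch of the block-replacement pass only (the final LZ77 step is left out). It assumes that candidate blocks must end before the current position and that `distance` and `threshold` are supplied by the caller; all names are hypothetical and this is not the actual P0 Snake encoder.

```python
def unify_similar_blocks(samples, distance, threshold,
                         max_block=64, min_block=4):
    """Return a copy of `samples` where blocks that are close enough to an
    earlier block are overwritten with an exact copy of that block.

    distance(p, q) -> float compares two equally sized blocks;
    threshold(size) -> float is the largest distance accepted for that size.
    """
    out = list(samples)
    i = 0
    while i < len(out):
        replaced = False
        largest = min(max_block, len(out) - i)
        for size in range(largest, min_block - 1, -1):
            best_j, best_d = None, None
            # Brute force: try every earlier block that ends at or before i.
            for j in range(0, i - size + 1):
                d = distance(out[j:j + size], out[i:i + size])
                if best_d is None or d < best_d:
                    best_j, best_d = j, d
            if best_j is not None and best_d < threshold(size):
                out[i:i + size] = out[best_j:best_j + size]
                i += size          # skip past the block we just rewrote
                replaced = True
                break
        if not replaced:
            i += 1                 # keep this sample as it is
    return out
```

With a plain Euclidean distance and a small constant threshold, three nearly identical 4-sample blocks collapse into three exact copies of the first one, which is the situation an LZ77 encoder likes best.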
The most important part of this algorithm is the similarity function, which needs to be able to compare two blocks of a given size and return a distance between them. The temptation of going straight with the Euclidean distance is strong. And in fact we’ll give in to that, but there’s one last thing about speech encoding that must be considered. The dynamic range of speech is not linear (which is a fact that some basic audio compression schemes, like A-law, leverage). Basically, for our concerns this means that an alteration to samples that are “close” to zero will introduce a heavier distortion than the same absolute alteration to samples that are at the boundary of the sample range. To account for this, assuming such a range to be [-1,1], rather than using the absolute sample value, we’ll go with the logarithm of the amplitude.
Ultimately, given two vectors of samples P and Q, whose values range in [-1,1], the distance between the two vectors is

It’s important to note that this logarithmic decay is taken into account only when computing the distance between two blocks, but the waveforms we end up storing are the original, linear sampled versions.
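As a rough illustration, one way to build such a distance in Python is shown below. The sign-preserving, mu-law-style mapping (and the value mu = 255) is an assumption made for this sketch so that zero and negative samples are handled; it is not necessarily the exact formula used for P0 Snake.

```python
import math

def log_amplitude(x, mu=255.0):
    """Sign-preserving logarithmic compression of a sample in [-1, 1].

    Assumed mu-law-style curve: small amplitudes are stretched,
    large amplitudes are squeezed.
    """
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def block_distance(p, q):
    """Euclidean distance between two equally sized blocks,
    measured on log-compressed amplitudes."""
    if len(p) != len(q):
        raise ValueError("blocks must have the same length")
    return math.sqrt(sum((log_amplitude(a) - log_amplitude(b)) ** 2
                         for a, b in zip(p, q)))

# The same absolute change (0.1) costs more near zero than near full scale.
print(block_distance([0.0], [0.1]), block_distance([0.9], [1.0]))
```

Only the comparison uses the compressed values: the samples that get copied around and stored remain the linear ones.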

Final tricks and conclusion

Although there are many more tricks and a lot of hand tuning that I had to resort to in order to improve compression and/or reduce the compression artifacts, the encoding algorithm we discussed so far does the largest part of the job, and puts us in a position in which the Commodore 64 only has to LZ77-decode 4KB of data to 21KB before the game starts. Now the only thing we need to feed the encoder with is the right threshold, which is dynamically adjusted by the encoder according to the size of the vectors it compares (this part would probably deserve a post on its own).

We can basically try different thresholds until we are satisfied with the total size of the encoded waveform, which is 4KB in our case, and go for it. Of course a higher threshold will cause more blocks to be marked as similar and lead to a better compression, at the cost of a decrease of sound quality.
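Trying thresholds by hand works, but the search can also be automated. The sketch below is a plain bisection in Python; `encoded_size` stands for a hypothetical function that runs the whole encoder for a given threshold and returns the size in bytes, and the code assumes that this size never grows when the threshold grows.

```python
def find_threshold(encoded_size, target_bytes, lo=0.0, hi=1.0, steps=20):
    """Return (approximately) the smallest threshold in [lo, hi] whose
    encoded size fits into target_bytes.

    encoded_size(threshold) -> int is supplied by the caller and is
    assumed to be non-increasing in the threshold.
    """
    if encoded_size(hi) > target_bytes:
        raise ValueError("even the loosest threshold does not fit the target")
    for _ in range(steps):
        mid = (lo + hi) / 2
        if encoded_size(mid) <= target_bytes:
            hi = mid   # fits: try a stricter (better sounding) threshold
        else:
            lo = mid   # too big: loosen the threshold
    return hi
```

Picking the smallest threshold that still fits is a design choice: it keeps as much of the original waveform as the size budget allows.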

In fact, I used different thresholds for different sentences in the game, which leads me to the very last trick that I want to tell you about. Listen again to the speech:

You might have noticed that the worst sounding sentence of the bunch is the first one, “Welcome to P0 Snake”. It will surprise you to know that this part is actually the one that has been compressed the least and takes up half of the total space of 4KB alone. It should therefore sound better than the rest. Believe it or not, it actually does. I can further explain with an example: in the silence of the night you can probably hear both your watch ticking and your neighbour talking in another flat. Now, suppose you take your neighbour out for dinner in a crowded restaurant. You can still hear what she is saying, but even if you focus on your watch, you won't be able to hear it ticking. 
This phenomenon is called Auditory Masking, and it’s just one of the many concepts you'll get familiar with, should you decide to dive into the fascinating world of Psychoacoustics. Glitches and artifacts introduced by compression are like the ticking of your watch as your neighbour talks during the night. You can hear them in a quiet environment, but if you bring in more sound to “confuse” the ear, they'll become unnoticeable. That’s what I did with the remaining set of sentences in the game: whenever the game is talking, there’s always music playing along with the speech. It’s a bit like hiding the dust under the carpet, but, well… it works!

This concludes the third and last part about speech encoding. Next time we'll probably be looking at something a bit more geeky, like self modifying code or graphics compression.
Stay tuned!

Monday, 16 March 2015

The Making of P0 Snake - Part 2: IT TALKS!


One of the features that stood out about P0 Snake, according to the people who played it, is its digitized speech.

Now, mind you, talking games have been around for quite some time, and they were not exactly uncommon in the 80s either. The problem with speech, though, is that it takes “a lot” of memory. Therefore, while modern productions can get very chatty (even too much), old games could usually only afford a few seconds of speech before they filled up the host machine’s memory. But it's probably this scarcity that contributed to making the few words that came out of the talking video games so memorable. Elvin Atombender, the mad scientist in Impossible Mission, would welcome you to his lab with his signature “Another visitor… stay a while… stay forever!” that would send shivers down your spine. Also, with speech being such an "expensive" commodity, programmers had to resort to it only where it really added something to the gameplay.
Mega Apocalypse, another all-time favourite of mine, was so hectic that sometimes players didn't know what they were doing, such was the focus on trying to stay alive. So the game just told them what was going on. “Extra life!”, and the player knew he had hit something good, not a cursed asteroid.
With P0 Snake things were a bit tricky though. The RGCD 16Kb development competition, as the name suggests, only allows 16Kb games. So, how much speech can we fit into 16Kb? Or, even better, how much speech can we squeeze into 16Kb and still have enough space left to code a game? Let’s find out!

Enter “The Dictionary”

“Where words are scarce, they are seldom spent in vain.”
[William Shakespeare]

P0 Snake's dictionary is made up of 7 sentences:

“Welcome to P0 Snake”
“Get Ready”
“One Up!”
“Oh no!”
“Well Done!”
“Game Over”

In total, they account for roughly 9 seconds of speech.
It doesn't sound like a big deal, but, trust me, it is. Let’s see why.

A sound is basically a waveform, or, for our purposes, the digital representation of its analog form.  This process of analog-to-digital conversion has two steps: sampling and quantization. A signal is sampled by measuring its amplitude at a particular time; the sampling rate is the number of samples taken per second. In order to accurately measure a wave, it is necessary to have at least two samples in each cycle: one measuring the positive part of the wave and one measuring the negative part.
Of course having more than 2 samples per cycle will increase the accuracy and the quality of the waveform representation, but what’s really important is that you don’t have LESS than 2 samples per cycle, as this would cause the frequency of the wave to be completely missed. If you want to know more about this stuff, just look up the Nyquist-Shannon sampling theorem and start from there; I’m trying to keep this simple. What all of this is telling us anyway, is that in order to digitize sound properly, we must keep a sample rate that is at least twice the highest frequency of the signal we are sampling. The 44.1 kHz frequency at which your CD tracks or your mp3s are sampled can render a signal of up to about 22 kHz, which is enough to capture all the frequencies that the human ear can hear (roughly up to 20 kHz). If mp3 were made for dogs, it would need to use a sample rate higher than 120 kHz, as dogs can hear frequencies of up to 60 kHz. If it were made for cats, it would need to be more than 160 kHz! It’s good that we evolved from apes and not from cats, otherwise our iPod could only contain a fourth of the music it contains now!
And there is more good news: the human voice only spans a narrower set of frequencies, which goes up to approximately 3500 Hz. It means that if you want to sample speech you just need a sample rate of at least 7000 Hz to keep it intelligible. Indeed, most speech encoding systems, including GSM, whose acronym probably rings a bell (ahem…) for most of you, only use 8000 Hz. That’s the kind of frequency we are really dealing with, and this closes the math for the sampling, the first part of the analog to digital conversion. What about quantization? In general, the more bits the better, and your typical CD track, to go back to the original example, uses 16 bits, that is 2 bytes, which means that each sample is a number in the range (-32768, 32767). Let’s assume we are using the same resolution. We have all the numbers now, so let’s see how much space those 9 seconds of speech will eat up.
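The "at least twice the frequency" rule is easy to check with a couple of lines of Python. The sketch below uses the standard frequency-folding formula for a pure tone; the function names are made up for this example.

```python
def min_sample_rate(highest_hz):
    """Lowest sample rate that still captures a signal up to highest_hz."""
    return 2 * highest_hz

def apparent_frequency(tone_hz, sample_rate_hz):
    """Frequency a pure tone appears to have once sampled: anything above
    half the sample rate is folded back into the 0..rate/2 band."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(min_sample_rate(3500))           # speech
print(apparent_frequency(3000, 8000))  # sampled fast enough: unchanged
print(apparent_frequency(3000, 4000))  # sampled too slowly: wrong tone
```

A 3000 Hz tone sampled at only 4000 Hz does not simply get "less accurate": it comes back as a different, lower tone, which is what "completely missed" means in practice.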

9 seconds X 8000 samples per second X 2 bytes per sample = 144000 bytes = 140 Kb

Given that we have to fit both speech AND a game into 16 Kb that’s a bit too much. We should target 4 or 5  Kb at most. It’s a long way uphill from here...

The first little help comes, again, from a limitation of the machine. 16 bit samples simply can’t be played on the Commodore 64. In fact, the SID, the Commodore 64 sound chip, was not designed to play sampled sound at all. The way programmers did it was by exploiting a bug in the sound chip. I won’t go into details as to how it works (try this if you dare), but, simply said, although a better quantization can be achieved with weird tricks, the most common way of playing sampled sound on a Commodore 64 only allows for 4 bit resolution. This means that the range we can address is not (-32768,32767) but (-8,7). Since 4 bits are one fourth of 16 bits, our original space consumption goes down from 140Kb to 35Kb. That’s a lot better, but still more than twice the space we have for the entire game.
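To make the "one fourth" concrete, here is a small Python sketch that drops a 16 bit sample to 4 bits and stores two samples per byte. Keeping the top 4 bits and putting the first sample in the high nibble are assumptions made for this example, not a description of the actual P0 Snake data format.

```python
def quantize_to_4_bits(sample16):
    """Keep the 4 most significant bits of a signed 16 bit sample.
    The result is in the range -8..7."""
    return sample16 >> 12

def pack_nibbles(samples4):
    """Pack signed 4 bit samples two per byte, first sample in the
    high nibble. An odd-length input is padded with a zero sample."""
    samples4 = list(samples4)
    if len(samples4) % 2:
        samples4.append(0)
    out = bytearray()
    for hi, lo in zip(samples4[0::2], samples4[1::2]):
        out.append(((hi & 0x0F) << 4) | (lo & 0x0F))
    return bytes(out)

def unpack_nibbles(data):
    """Inverse of pack_nibbles: bytes back to signed 4 bit samples."""
    out = []
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out

# 9 seconds at 8000 samples per second, 4 bits each.
print(len(pack_nibbles([0] * (9 * 8000))), "bytes")
```

The 72000 samples of our 9 seconds end up in 36000 bytes, which is the 35Kb figure quoted above.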

Before we move on, let’s take a look at what these waveforms we have been talking about look like. For example the piece of speech “Oh No!”.

From top to bottom, the original recording, the same one sampled at 8000Hz with 16bit quantization, and finally the same one at 8000Hz with 4 bit quantization.

As you can see, the fact that only 16 different amplitude values are possible with 4 bit quantization shows in the form of the waveform being a bit "blocky" at a glance. This really represents the BEST waveform we can think of playing on the Commodore 64, and, despite the appearance, it doesn’t sound that bad. This is what we really have to start from anyway.

Before we even start to think about compression, let’s see if we can bring memory occupation down from the current 35Kb

Let’s zoom in a bit, and let’s explore the “oh” part of “oh no!”, going back to the original 16 bit quantization and 44.1 kHz sampling, that is CD-quality. This is what 20 milliseconds of “oh” look like:

We immediately notice something: the waveform looks very “regular”, as if it was made of the same piece repeated many times, with just minimal differences. We'll take advantage of this aspect when we deal with the compression, but for now another interesting piece of evidence is that there aren't really many “parts” in this waveform, that is it doesn't change very rapidly. We mentioned that the frequency of human voice tops at 3500hz, and that we need double that frequency, that is 7000Hz which we had rounded to 8000Hz, to sample it. But I bet you we need much less than that for this specific segment.
Let’s see what it looks like at 8000hz:

It looks very similar, but we knew this already, as we know that 8 kHz will suffice for human voice in general.

Let’s see what happens at 5000 Hz:

It’s starting to deteriorate a bit, still, all the transitions are there and this means that the “oh”, although “noisy”, will still sound like an “oh”. Let’s try 1000Hz

Bummer! Most of the transitions have disappeared. This waveform doesn't sound like an “oh” anymore. But what we take from this exercise is that we don't always need to use 8000Hz. Some pieces of our speech are happy with a lower sample rate.

Speech fragments can be divided into two categories: Voiced and Unvoiced. Voiced sounds are generated by the vocal cords’ vibration. Among their characteristics is the fact that they are periodic, and their period is called pitch. It’s the case for vowels and specifically the fragment that we have analyzed so far. Unvoiced sounds, on the other hand, are not generated by the vibration of the vocal cords and they don’t have a specific pitch. Furthermore, and most importantly, they come with a higher frequency. It’s the case for fricative sounds like “F” or “S”.

Now this is interesting, and you can see already where this is going: let’s use a higher sample rate for unvoiced segments and a lower sample rate for voiced segments.
We can validate this theory looking at another segment of P0 Snake’s speech. The “ZE” part in “welcome to P ZEro Snake”.

You can clearly see where the “Z” ends and where the “E” starts already. Now let’s sample this segment at 5000Hz, which worked quite well in the previous example

Look what just happened: the “Z” has almost completely gone but the “E” is still there.

In conclusion, we really need all of the 8000Hz to sample fricative sounds and preserve their understandability.

Putting everything together

So far we have learned the following facts:

  1. 4 bit amplitude is all we can afford
  2. 8000Hz is a sampling rate that allows any type of speech to be coded.
  3. Fricative sounds require this entire bandwidth.
  4. Voiced sounds will be happy with much less. How much less? It depends on the sound, the voice of the speaker and other factors.

So, in order to minimize the memory occupation of the speech in P0 Snake, we just need to use the minimum sampling frequency for each speech segment that retains the understandability of the segment. This frequency will be higher for unvoiced segments and lower for voiced segments.
Although we could choose ANY frequency for each of the segments we really want to limit this search to a small set of possible frequencies for various reasons, the most important of which will be clear in the next part, but we can say already that we also want to be able to encode this frequency somehow in the bitstream. P0 Snake uses only 4 frequencies, so it requires a 2 bit overhead to indicate the frequency for each speech segment:
S = {3500Hz, 3942Hz, 5256Hz, 7884Hz}

And the algorithm to preprocess the data is now quite easy:

S = [3500Hz, 3942Hz, 5256Hz, 7884Hz]
segments = split(source) //splits the source in segments
    //of equal duration

foreach (segment in segments)
    f = Frequency_Analysis(segment)
    i = 0
    while (i < 3 && S[i] < f)
        i = i + 1
    sampled_segment = Sample(segment,S[i])
    yield return( (i,sampled_segment) )
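The rate selection step can be sketched in runnable form as follows. In this Python example `required_hz` plays the role of the value returned by the frequency analysis, read here as the minimum sample rate a segment needs (an assumption), and the size estimate simply counts 2 bits of rate index plus 4 bits per sample for each segment; the function names are hypothetical.

```python
RATES_HZ = [3500, 3942, 5256, 7884]   # the set S from the text

def pick_rate_index(required_hz, rates=RATES_HZ):
    """Index of the lowest rate that is at least required_hz,
    or of the highest rate if none is high enough."""
    i = 0
    while i < len(rates) - 1 and rates[i] < required_hz:
        i += 1
    return i

def packed_size_bytes(segments, rates=RATES_HZ):
    """Estimated size of the encoded speech.

    segments is an iterable of (duration_seconds, required_hz) pairs.
    Each segment costs a 2 bit rate index plus 4 bits per sample.
    """
    total_bits = 0
    for duration, required_hz in segments:
        rate = rates[pick_rate_index(required_hz, rates)]
        total_bits += 2 + round(duration * rate) * 4
    return (total_bits + 7) // 8   # round up to whole bytes

print(pick_rate_index(4000), packed_size_bytes([(1.0, 8000), (1.0, 3000)]))
```

A one-second voiced segment that is happy with 3500Hz costs less than half of a one-second fricative that needs the full 7884Hz, which is where the savings come from.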

The result of this preprocessor was only good as a starting point for a (painful) manual refinement of each segment: although the frequency analysis lib I used worked very well for me, I found that for some segments I could still move to the lower frequency, keeping a decent sound quality. This pickiness might sound like overkill, but don't forget that we are fighting with bytes here, so every little helps!

In the end, all the speech in P0 Snake averages at roughly 4900hz. Let’s see where this takes us up to:

35Kb * 4900/8000 = ~21Kb

21 Kb. Now... we are talking! It’s still 4 times larger than what we are targeting, and (as we'll see in the next post), we can't use any known audio codec to bring this number down, because the Commodore 64 simply doesn't have enough horsepower to play a sampled sound AS it decompresses it AND also run a game. Therefore, we must play uncompressed data. But this number already looks very interesting for one simple reason: the rules of the competition are that the size of the game must not exceed 16 Kb, but of course we can use the entire memory space of the Commodore 64 at runtime, that is 64Kb. If we could come up with a way to compress our speech into 4Kb in a way that can be decompressed (in a reasonable time) to 21 Kb BEFORE the game runs, then that would be it.
We'll see how in the next post, if you wish.