Paku Paku -- 1.6 released 9 November 2011

Postby deathshadow60 » Tue Feb 08, 2011 1:44 pm

*** NOTE *** version 1.6 released

After eight months in development and testing, version 1.6 of Paku Paku has been released. I've also revamped my programming website to host it officially -- so from now on if you're looking for information about Paku Paku, the new site is:

http://www.deathshadow.com/

With the page about Paku Paku being here:

http://www.deathshadow.com/pakuPaku

You can also play the game live on that site if you have Java installed, using the Java port of DOSBox.

http://www.deathshadow.com/pakuPakuLive

The new version adds a slew of bugfixes, new hardware support, and a much more compact and speedy executable. Check out the revision history:
http://www.deathshadow.com/pakuPaku_Revisions

for the full list of changes...

On modern systems this runs great in DOSBox -- I highly suggest setting oplmode=cms and using the /cms command line option for the best sounding version in DOSBox.
Last edited by deathshadow60 on Wed Nov 09, 2011 1:17 am, edited 9 times in total.
The only thing about Adobe web development products that can be considered professional grade tools are the people promoting their use.
deathshadow60
 
Posts: 62
Joined: Mon Jan 10, 2011 6:17 am
Location: Keene, NH

Re: Paku Paku -- new DOS Game released

Postby Brutman » Tue Feb 08, 2011 2:12 pm

Well, we'll just have to try it out tonight on the real deal and see how it runs. Although if you are shooting for an 8 MHz AT with 16-bit video this might be painful.

Where is the bottleneck? Are you painting portions of the screen, xor'ing data, or what? (I have a little experience with fast screen updating ...)

And why on earth do you not have a PCjr yet? :-)


Mike
Brutman
Site Admin
 
Posts: 910
Joined: Sat Jun 21, 2008 5:03 pm

Re: Paku Paku -- new DOS Game released

Postby deathshadow60 » Tue Feb 08, 2011 3:07 pm

Brutman wrote:Well, we'll just have to try it out tonight on the real deal and see how it runs. Although if you are shooting for an 8 MHz AT with 16-bit video this might be painful.

My original target was a 7 MHz Tandy 1000, but the hardware just isn't up to the job of real-time sprites without flicker.

Brutman wrote:Where is the bottleneck? Are you painting portions of the screen, xor'ing data, or what? (I have a little experience with fast screen updating ...)

The only time you can draw to the screen without the risk of flicker (and snow on real CGA cards) is during a retrace period. The horizontal retrace is so short that by the time you detect that you're in it, it's already ended, leaving the vertical retrace as the only real option. Most of the bottleneck is that PC video cards have no built-in sprite handling, so it's all software.

To draw a sprite without corrupting the background behind it, every redraw means restoring the original background, grabbing the background at the sprite's new location, ANDing a mask to clear where the sprite will go, and then ORing in the sprite's color data. You can't even take the XOR shortcut since this particular color mode is 4bpp packed in alternating stripes... (every other byte being a text mode character 0xDD). The pixel packing also means you need two copies of the sprite (since it takes too long to SHR 4 and carry on the fly across 6 bytes) in addition to the mask, and you have the overhead of figuring out which copy (even or odd aligned) you want to blit.

Net result? A 30 byte write to restore the background, a 30 byte read to save the background at the new location, a 30 byte read/write for the AND, and a 30 byte read/write for the OR, working out to 90 bytes read and 90 bytes written PER 5x5 sprite. With five (six if you count the bonus item) sprites that's 450 bytes in and 450 bytes out... and an 8 MHz 8088 just cannot manage that inside the retrace period... in fact it generally only has enough time free to do about 18-28 bytes depending on the code overhead.
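
The byte counts above can be tallied in a few lines. This is only a worked-arithmetic sketch in Python, not code from the game; the names are made up for illustration, and the 6-bytes-by-5-rows footprint is inferred from the "6 bytes" and "30 byte" figures in the post.

```python
# Worked arithmetic for the per-frame sprite traffic described above.
# All names are illustrative; the figures come from the post.

BYTES_PER_ROW = 6      # 5 pixels span 3 two-pixel cells, 2 bytes per cell
ROWS = 5
FOOTPRINT = BYTES_PER_ROW * ROWS          # 30 bytes of video RAM per sprite

# (reads, writes) for each step, counted the same way as in the post
STEPS = {
    "restore old background": (0, FOOTPRINT),
    "grab new background":    (FOOTPRINT, 0),
    "AND with mask":          (FOOTPRINT, FOOTPRINT),
    "OR in sprite data":      (FOOTPRINT, FOOTPRINT),
}

def traffic(sprite_count):
    """Return (bytes_read, bytes_written) per frame for sprite_count sprites."""
    reads = sum(r for r, _ in STEPS.values()) * sprite_count
    writes = sum(w for _, w in STEPS.values()) * sprite_count
    return reads, writes

if __name__ == "__main__":
    for n in (1, 5, 6):
        print(n, "sprites:", traffic(n))
```

Compared with the 18-28 bytes quoted as the retrace budget, even a single sprite's 90 bytes each way is already several times over.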

AND you have to erase all of them first, then draw all of them, with the two passes running in opposite order, so that if they overlap their restored backgrounds don't screw it up.
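
Why the order matters can be shown with a toy model. The sketch below uses a one-dimensional "screen" and two overlapping save-under sprites; `draw`, `erase` and `run` are hypothetical helpers written for this illustration, not the game's routines.

```python
# Toy model of save-under sprites on a 1-D "screen".
# Illustrative only: draw() and erase() are hypothetical helpers.

def draw(screen, sprite):
    """Save the cells under the sprite, then overwrite them."""
    x, w = sprite["x"], sprite["w"]
    sprite["saved"] = screen[x:x + w]
    screen[x:x + w] = [sprite["id"]] * w

def erase(screen, sprite):
    """Put back whatever was saved when the sprite was drawn."""
    x, w = sprite["x"], sprite["w"]
    screen[x:x + w] = sprite["saved"]

def run(erase_in_reverse):
    screen = ["."] * 8
    a = {"id": "A", "x": 1, "w": 3}
    b = {"id": "B", "x": 3, "w": 3}    # overlaps A at cell 3
    for s in (a, b):                   # draw A, then B on top of it
        draw(screen, s)
    order = (b, a) if erase_in_reverse else (a, b)
    for s in order:
        erase(screen, s)
    return "".join(screen)

if __name__ == "__main__":
    print("reverse order:", run(True))
    print("same order:   ", run(False))
```

Erasing in the same order as drawing leaves a stale piece of A behind, because B's saved background was captured after A had already been drawn into it.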

A possible solution I'm investigating is breaking the screen into 'sections' and only drawing elements within those 'sections' a certain period of time after the retrace. For example, when the retrace occurs you can draw the bottom half of the screen; by the time that's done the display will be scanning out the bottom half, letting you draw the top. The problem with this is what to do when sprites overlap across those boundaries -- I'm still not sure how I'm going to handle that. The overhead of dividing them up and checking their position for redraw could take so much time it impacts game performance.
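
A rough sketch of the bookkeeping this idea would need, assuming a 100-row screen split at the middle and 5-row sprites. The constants, labels and function names are all assumptions for illustration; the post does not describe an actual implementation.

```python
# Hypothetical sketch of the "sections" idea: sort sprites by which half
# of the screen they occupy so each half can be redrawn at a safe time.

SCREEN_ROWS = 100      # 160x100 mode
SPRITE_ROWS = 5
SPLIT = SCREEN_ROWS // 2

def section(y):
    """Classify a sprite whose top row is y."""
    bottom_row = y + SPRITE_ROWS - 1
    if bottom_row < SPLIT:
        return "top"
    if y >= SPLIT:
        return "bottom"
    return "straddles"   # the unsolved case from the post

def schedule(sprite_rows):
    """Group sprite indices by section, keeping their original order."""
    groups = {"top": [], "bottom": [], "straddles": []}
    for i, y in enumerate(sprite_rows):
        groups[section(y)].append(i)
    return groups

if __name__ == "__main__":
    print(schedule([10, 48, 70]))
```

Even this minimal classification has to run for every sprite on every frame, which is the overhead the post is worried about.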

It's a problem all text-mode applications on the CGA face thanks to the unbuffered memory.

A better fix would be to do paging, but because this is a glorified text mode with every two pixels taking up two bytes (high byte $DD, low byte broken into pixel colors) it ends up 16000 bytes -- and the CGA only has 16k of RAM. (Though I may make a Tandy/Jr port that uses their native 160x200 mode, which has a lot less of these problems.)
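
The 16000-byte figure follows directly from the mode's layout. A short worked calculation, with illustrative names and assuming the 80x100 cell grid that a 160x100 mode with two pixels per cell implies:

```python
# Worked arithmetic for the page-flipping problem; names are illustrative.

COLUMNS = 80            # text columns; each cell shows two "pixels"
ROWS = 100              # 160x100 semigraphics mode
BYTES_PER_CELL = 2      # one byte for the $DD character, one for the two 4-bit colours
CGA_RAM = 16 * 1024     # 16K on a stock CGA

def page_bytes():
    """Size of one full screen page in bytes."""
    return COLUMNS * ROWS * BYTES_PER_CELL

def pages_that_fit(ram=CGA_RAM):
    """How many whole pages fit in the given amount of video RAM."""
    return ram // page_bytes()

if __name__ == "__main__":
    print(page_bytes(), "bytes per page;", pages_that_fit(), "page(s) fit in", CGA_RAM)
```

One page uses 16000 of the 16384 bytes available, so there is no room for a second page to flip to.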

Brutman wrote:And why on earth do you not have a PCjr yet? :-)

Because I have a limited budget, limited space, and a working 1000 SX? Only way I'd have room for a Jr would be to move the Coco down to the garage again, and that's not happening in the dead of winter. :D

Re: Paku Paku -- new DOS Game released

Postby Vorticon » Tue Feb 08, 2011 5:01 pm

Nicely done! I played it on the PCjr and it works just fine, although there is definitely some flickering, but it does not really distract from the gameplay. Tandy sound works perfectly BTW. Really a fine piece of programming :)
Vorticon
 
Posts: 276
Joined: Fri Nov 27, 2009 7:25 am

Re: Paku Paku -- new DOS Game released

Postby Brutman » Tue Feb 08, 2011 5:18 pm

The PCjr doesn't suffer from CGA snow, so a PCjr version should not be burdened with the need to check for the vertical retrace. And it might make the game faster too because you won't have that dead time spinning waiting for it to start.

I should look at the code and stop asking questions. But there are some fun tricks you can use to improve performance. For example, even though the 8088 only has an eight bit bus it is far quicker to use a 16 bit read or write than to use two 8 bit reads or writes. The string ops with a REP prefix can really speed things up. (Especially when filling memory. REP on a LODSW is kind of pointless.)

Michael Abrash did a great book on high performance graphics programming - it is full of tips and tricks.


Mike

Re: Paku Paku -- new DOS Game released

Postby deathshadow60 » Tue Feb 08, 2011 7:04 pm

Brutman wrote:The PCjr doesn't suffer from CGA snow, so a PCjr version should not be burdened with the need to check for the vertical retrace.

The check isn't just for snow though -- because there are no hardware sprites you have to erase the element and redraw it (or at least that's the only reliable way to do sprites fast enough). If the retrace goes by while the sprite is erased, that's where the 'flicker' comes in. This is why modern graphics usually either use double buffering (two video buffers: you draw to one while showing the other) or rely upon vSync for the redraw.

Brutman wrote:And it might make the game faster too because you won't have that dead time spinning waiting for it to start.

That's actually why I have the speed test in the code, which shows a red "machine too slow" and disables the vSync... which ends up cute since it's similar to how the real Pac-Man hardware starts up (though in their case it's a memory test).

Brutman wrote:For example, even though the 8088 only has an eight bit bus it is far quicker to use a 16 bit read or write than to use two 8 bit read or writes.

I'm already using LODSW and MOVSW a lot. The problem is how the data is stored in this mode... Odd numbered bytes have to remain 0xDD while even numbered bytes hold the two "pixels" as 4 bit packed... So word-sized operations can only do two pixels at a time even in a 4 bit per pixel mode. It's actually why the core of my 5x5 blit routine looks like this:

Code: Select all
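   lodsw                { AX = next sprite word from DS:SI (mask:data) }
   mov  bx,ax           { BH = mask, BL = data }
   mov  ax,es:[di]      { AX = current screen word }
   and  al,bh           { clear the pixels the sprite covers }
   or   al,bl           { merge in the sprite's pixels }
   stosw                { write back; AH ($DD) is unchanged }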


I even unrolled the loops to squeeze a few extra cycles out of it. The sprite data is stored as byteMask:byteData words which I point to with DS:SI for LODSW... which I then move to BX (which sucks, but is still faster than MOV reg16,mem; ADD SI,2) so I can use BH as the mask and BL as the data. Read the screen word at ES:DI, AND, OR, and then write it back out... which is the process for every two pixels in the sprite. You'll notice I only operate on AL, since AH stores the $DD character value that has to be preserved.
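
For anyone who wants to check the masking logic without an assembler, here is a small Python model of one iteration of that blit core. It assumes, as the post describes, that each sprite word carries the mask in its high byte and the data in its low byte, and that the high byte of every screen word holds the $DD character. The function name and the sample values are made up for illustration.

```python
# Python model of one iteration of the blit core shown above.
# Assumption from the post: sprite word = mask (high byte) : data (low byte),
# and the high byte of each screen word is the $DD character.

def blit_word(screen_word, sprite_word):
    """Return the screen word after AND-ing the mask and OR-ing the data into its low byte."""
    mask = (sprite_word >> 8) & 0xFF     # BH after MOV BX,AX
    data = sprite_word & 0xFF            # BL
    low = screen_word & 0xFF             # AL after MOV AX,ES:[DI]
    high = screen_word & 0xFF00          # AH, the $DD character, left alone
    low = (low & mask) | data            # AND AL,BH / OR AL,BL
    return high | low                    # the value STOSW writes back

if __name__ == "__main__":
    # mask $0F keeps the low nibble, data $40 replaces the high nibble
    print(hex(blit_word(0xDD12, 0x0F40)))
```

A mask of $FF with data $00 leaves the screen word untouched (fully transparent), while a mask of $00 replaces both pixels.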

Brutman wrote:The string ops with a REP prefix can really speed things up.

Agreed, but there's just too much data to process in realtime with sprites, since they are supposed to leave the background beneath them intact so that things show through and are still there once the sprite has moved somewhere else -- and there's NOT enough RAM or time to redraw the original background.

Brutman wrote:Michael Abrash did a great book on high performance graphics programming - it is full of tips and tricks.

99.99% of which is only useful on the 256 or higher color modes like mode 0x13 and the various Mode X flavors... Though his planar 4 bit info was very useful when I made my old 720x480 16 color VGA hack... It just doesn't apply to this graphics mode with the column interlace of characters in the middle of the pixel-stream.

Though yeah, his book is a great tool -- ranks right up there with Ferraro's "Programmer's Guide to the EGA and VGA Cards" and is right next to it on my shelf.

Re: Paku Paku -- new DOS Game released

Postby deathshadow60 » Wed Feb 09, 2011 12:49 pm

This is cute - I started playing with the idea of a Tandy/Jr optimized version, cutting out all the auto-detection and other hardware support... and was going to rewrite it for 160x200...

But that mode is scanline interlaced! That makes it slow as hell to actually implement code for! Not only does it add an extra layer of complication to the address calculation, I'd have to write just as many bytes at two offsets.
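
To make the extra address work concrete, here is a sketch of the offset calculation for a two-bank interlaced layout. It assumes 80 bytes per scanline (two 4-bit pixels per byte), even scanlines in the first bank and odd scanlines $2000 bytes higher, which matches the $2000 offset used later in the post. The function names are hypothetical.

```python
# Sketch of the address calculation for a two-bank interlaced 160x200 16-colour mode.
# Assumptions: 80 bytes per scanline, even scanlines in bank 0, odd scanlines at +$2000.

BYTES_PER_LINE = 80
BANK_SIZE = 0x2000

def pixel_offset(x, y):
    """Byte offset in video memory of the byte that holds pixel (x, y)."""
    bank = y & 1                 # odd scanlines live in the second bank
    row_in_bank = y >> 1
    return bank * BANK_SIZE + row_in_bank * BYTES_PER_LINE + (x >> 1)

def doubled_offsets(x, y100):
    """Both offsets to write when a 160x100 row is doubled onto two scanlines."""
    return pixel_offset(x, 2 * y100), pixel_offset(x, 2 * y100 + 1)

if __name__ == "__main__":
    print(doubled_offsets(4, 10))
```

Every byte of a 160x100 image therefore lands at two addresses exactly $2000 apart, which is the "just as many bytes at two offsets" cost described above.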

At first it looked promising since I could write 4 pixels with just one "iteration" and three memory accesses -- letting each loop handle up to 8px at once. (and for anything more than 4px there's no reason to code less than 8px at a time)
Code: Select all
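   mov  cx,5            { five sprite rows }

@loopWrite:

   mov  bx,es:[di]      { BX = 4 pixels of background }
   lodsw                { AX = mask word }
   and  bx,ax
   lodsw                { AX = data word }
   or   ax,bx
   stosw                { write the first 4 pixels }

   mov  bx,es:[di]
   lodsw
   and  bx,ax
   lodsw
   or   ax,bx
   stosw                { write the second 4 pixels }

   add  di,76           { next line: 80 bytes per row minus the 4 just written }

   loop @loopWrite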


But I'd have to scanline-double, either by performing that same operation a second time at the $2000 offset (wasteful) or by copying the upper buffer to the lower buffer thus:

Code: Select all
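   mov  di,dx           { I stored DI's starting offset in DX during blitting }
   mov  dx,ds           { save DS so it can be restored afterwards }
   push es
   pop  ds              { DS = ES; there is no direct MOV between segment registers }
   mov  si,di           { source: the rows just drawn }
   add  di,$2000        { destination: the same rows in the other bank }
   mov  cx,5
   mov  bx,76           { 80 bytes per row minus the 4 bytes the two MOVSWs advance }

@loopDupe:
   movsw
   movsw
   add  di,bx
   add  si,bx
   loop @loopDupe

   mov  ds,dx           { restore DS }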


Guess that's why you don't see a lot of "true" sprite engine games on 8088 class machines.

So I'm sticking with the funky semigraphics mode I guess... I do think I can eliminate the flicker though by implementing a software buffer. I'm just not sure how that's going to affect performance, as I'm going to end up needing to build two 8k buffers (source and render -- kiss that 128k friendly memory footprint goodbye), write code for erasing from the source buffer as you eat pellets, and of course the final rendering. It would allow me to pre-build the images before blitting them out to the display area though -- and if memory serves, system RAM is usually faster than video RAM.
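
A minimal sketch of how such a source/render buffer pair could fit together, written in Python for readability. The buffer size follows the "two 8k buffers" in the post; the sprite format (offset, mask bytes, data bytes) and every function name are assumptions for illustration only.

```python
# Sketch of the two-buffer idea: a clean "source" copy of the playfield and a
# "render" copy that sprites are composited into before going to the display.

BUFFER_SIZE = 8 * 1024

def make_buffers():
    source = bytearray(BUFFER_SIZE)   # background only (maze, pellets)
    render = bytearray(BUFFER_SIZE)   # background plus sprites
    return source, render

def eat_pellet(source, offset, length=1):
    """Erase a pellet from the clean copy so later restores don't bring it back."""
    source[offset:offset + length] = bytes(length)

def compose(source, render, old_spans, sprites):
    """Erase last frame's sprites from render, draw this frame's, return the new spans."""
    for offset, length in old_spans:
        # erasing is just copying the clean background back
        render[offset:offset + length] = source[offset:offset + length]
    for offset, masks, data in sprites:
        for i, (m, d) in enumerate(zip(masks, data)):
            render[offset + i] = (render[offset + i] & m) | d
    return [(offset, len(data)) for offset, masks, data in sprites]
```

In this arrangement nothing touches display memory until the render buffer is complete, so the only video RAM traffic left is the final copy.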

CRUDSTUNK!!! Except on the Jr, isn't system RAM shared with video...

Re: Paku Paku -- new DOS Game released

Postby jmetal88 » Wed Feb 09, 2011 1:45 pm

Yup, on the Jr, the video RAM is reserved out of the system RAM. The size of the video RAM can be set with Jrconfig.dsk/Jrconfig.nrd if more or less than the default is needed, though.
jmetal88
 
Posts: 793
Joined: Sun Jul 25, 2010 10:22 am

Re: Paku Paku -- new DOS Game released -- VER 1.2!!!

Postby deathshadow60 » Wed Feb 09, 2011 11:59 pm

Ok, if you folks could run a check on the new version 1.2 for me that would be greatly appreciated. I've completely revamped how the sprite engine worked by implementing those back-buffers I mentioned. It now needs 70k of free DOS memory, but the performance difference is night and day...

http://www.cutcodedown.com/retroGames/paku_1_2.rar

WAY better.

Re: Paku Paku -- new DOS Game released -- VER 1.2!!!

Postby Brutman » Thu Feb 10, 2011 7:27 am

Sorry about the delayed response - I've been unexpectedly busy. It looks like you have the basics of high performance code covered .. ;-0

On the Jr the first 128K of RAM is going to be penalized. The video hardware gets priority access to it compared to the CPU, so a PCjr running anything in the lower 128K is dog slow compared to a real PC. Memory access above 128K is far better - when I configure my machines I always ensure that the lower 128K is fully consumed by video buffer and a RAM disk.

I have found out many times while coding my TCP/IP stack that buffering, while expensive in terms of memory, often beats computation. The 8088 has a hard time keeping its prefetch buffer full, so any extra instructions hurt. Branching is devastating too. So while the extra memory is painful to use, it looks like it made a big difference.

I can't begin to tell you how much code I've written and thrown away over the years. Performance tuning is an ongoing experiment ..
