ZSDOS, Anatomy of an Operating System, Part II

by Harold F. Bower, Major, US Army Signal Corps; BSEE, MSCIS, Ham (WA5JAY), avid homebuilder (starting with 8008 running SCELBAL),
and Cameron W. Cotrill, Vice President, Advanced Multiware Systems; specialist in "impossible" real-time hardware and software systems.

In the first part of this article, we presented the philosophy and the features of ZSDOS (Z-System Disk Operating System). In this portion, we will summarize the performance of ZSDOS, share a few of the tricks we used to shoehorn all these features into 3.5k bytes, and give a few programming examples showing how to use some of the new features of ZSDOS and ZDDOS.

ZSDOS Performance.

Measuring the performance improvements of ZSDOS is a complicated matter. During development, an entire suite of tests was run on ZS/ZDDOS in various configurations in an attempt to validate the design tradeoffs. The most revealing tests of BDOS differences turned out to be a series of assemblies done under control of a command script. This should be no surprise, as assemblies are by nature disk intensive.

To reduce the perception that our results are "tailored" or skewed in favor of a particular system or configuration, different processor chips (Z80 and HD64180), different BIOSes (MicroMint, XBIOS, Ampro), and different media (RAM disk, Hard Disk and Floppy disk) were used in the timed runs. Since the results were most affected by the media, results are shown in the categories of RAM, Hard Disk and Floppy Disk performance. No form of file date stamping was done since ZSDOS would have a distinct advantage in this field.

Three sets of hardware were used in these analyses in an attempt to keep any processes unique to a given system from skewing the results.
The first system (System 1 in the timing runs) was a "stock" MicroMint SB-180 operating at a 6.144 MHz clock speed. System 2 was an Ampro Little Board 1A with a Z80 running at 4.0 MHz, and System 3 was a homebrew Z-180 system designed to be compatible with the SB-180 operating at 9.216 MHz. Complete information on each system is in the Appendix.

OPERATING SYSTEMS.

CP/M 2.2. Gary Kildall and Digital Research developed this operating system for 8-bit processors in an evolutionary process on early 8080-based computers. A subsequent product, CP/M Plus (also known as CP/M 3) is still in limited use, but has not gained the wide acceptance of the earlier release. CP/M 2.2 is coded in 8080 assembly language and is a non-banked, non-reentrant, single-user, single-tasking operating system.

ZRDOS 1.9. Echelon Incorporated released many versions of this CP/M 2.2-compatible operating system over the past several years. It is coded in Z80 assembly language and will therefore not execute on 8080 processors. Some additional features were added, such as one-level reentrancy under user control, and return of the current DMA address. Later versions (after 1.5) include enhanced support for hard disk media by not rebuilding the allocation bit map on a disk relog command. Version 1.9 added larger disk and file sizes. Like CP/M, it is single-user and single-tasking.

ZSDOS. This is the topic of this article, with details and descriptions of features contained in Part I. ZSDOS is coded in Z80 assembly language and is also a single-user, single-tasking operating system capable of single-level reentrancy.

Since this report was aimed at formalizing an evaluation of the performance characteristics of ZSDOS, a number of variants of the above operating systems were initially timed. Because the performance of these systems was very similar to others in the test, their comparative results are simply summarized below.

CP/M 2.2 with Plu*Perfect Systems' PUBlic patch. Only minor differences in performance from the basic CP/M 2.2 were noted, so results of the patched system were not included in the final results.

ZRDOS 1.2. The performance of ZRDOS 1.2 was very close to CP/M 2.2, being a couple of percent slower in the majority of cases. It was therefore not included in the final timing analyses.

ZRDOS 1.7. Timing tests indicate no significant performance differences between ZRDOS 1.7 and 1.9.

ZDDOS. Since ZSDOS and ZDDOS are largely the same code and since comparative timings between them show less than a 1% difference, only times for ZSDOS will be presented.

BASIC IO SYSTEMS (BIOSes).

MICRO MINT, SB-180. While MicroMint currently ships Version 3.2 with their systems, a slightly modified version of 2.7 was used in these timings on the SB-180. The changes included independent step rates for floppy drives, different floppy formats and fixing of eight-inch drivers as well as a slight amount of optimization. Little performance difference from the standard BIOS should be noticed. A 54k system size was used. The BIOS uses programmed IO on most peripherals with DMA functions of the 64180 processor used for Floppy and RAM disk data movement.

XBIOS, SB-180. XSystems' XBIOS version 1.1 is an extremely powerful and flexible banked system with excellent tools and interfaces. Malcom Kemp has concentrated on providing functions in this release, and has deferred optimization to future releases. XBIOS fully supports the ETS180 IO+ board, allows complete configuration of peripherals, and provides a larger TPA since only a small kernel resides in the primary memory area. Most of the BIOS code resides in an alternate memory bank. XBIOS installs the largest possible TPA when used, which was 57.5k for these tests. XBIOS was installed with three buffers for disk IO.

AMPRO, Little Board-1A.
A stock version of the Ampro version 3.8 BIOS assembled with no ZCPR support was used for testing. A system size of 59k was chosen to provide support for 5 hard disk partitions spread over two physical drives. NZCOM was then loaded to provide Z-System support. The Ampro BIOS is strictly a polled system and uses no interrupts or DMA.

EVALUATION PROCEDURES.

Since the goal of evaluating performance was to heavily exercise BDOS functions, a set of fourteen assembly modules, thirteen of which were 2-4k in size and one of 6k, was assembled to produce Microsoft REL files. To restrict external influences, no file date stamping was used, and many ZSDOS features such as Public and Path were disabled. On the other hand, to provide a semi-realistic setting, ZEX.COM and the executable assemblers were placed in a different Drive/User with the ZCPR search path set to locate the files on the second directory scan. SLR's SLR180 assembler was used on system 2, while tests on systems 1 and 3 used Z80ASM+. Assembly was done under the control of a memory-based SUBMIT utility (ZEX Version 3.1A) script file. Times were measured from the carriage return terminating the command invoking the ZEX file to display of the "Done" message after assembly of the last file. After each run, the .REL files produced by the assembly were erased so that the same disk space could be used in the next run. No other files were added to or deleted from any media during the timing runs. At least three runs were performed for each configuration, and the results averaged. Timing was manually performed with a stopwatch.

Due to the radical differences in access times for different media, three categories of times were considered: RAM disk, Hard Disk, and Floppy disk. If you think you know how each system fared, read on - there may be a twist or two in the plot.

RAM DISK.

The Ampro has no RAM disk, so timings in this category reflect only the SB180.
The SB180 computer is equipped with 256k of memory. The standard MicroMint BIOS divides this into a 64k main memory area and a 192k RAM disk. With XBIOS as tested here, 64k is allocated for the main memory, 24k for the banked portion of XBIOS, buffers and banked system extensions. The remaining space is available for a RAM disk. RAM disks on the SB180 use built-in DMA capabilities of the HD64180 processor to move "sectors" of data rather than the slower block move instructions used by Z80 systems.

Exiting a program via the Warm Boot vector in CP/M relogs the A drive. To minimize time penalties imposed by this, a Hard disk partition was defined as the A drive. Needed programs as well as the assembly modules were placed on the RAM disk (M:), with ZEX.COM and Z80ASM+.COM placed in User 15 and the source files in User 0. The search path for this phase was: Drive M, User 0 to Drive M, User 15.

Since the RAM disk is defined as a non-removable medium in the Disk Parameter Block, the "Rapid Relog" feature of ZSDOS and ZRDOS was expected to produce much shorter execution times than CP/M for this series of measurements. As can be seen from the results, this was indeed the case. The raw timings in seconds with percentage changes from the shortest time are:

               ZSDOS           ZRDOS 1.9       CP/M 2.2
             +------------------------------------------------+
BIOS 2.7     | 17.0 (---)      17.1 (+4%)      36.4 (+114%)   |
XBIOS 1.1    | 14.2 (---)      14.5 (+2%)      34.5 (+144%)   |
             +------------------------------------------------+

The effects of the Rapid Relog feature were borne out, with ZSDOS being a couple of percent faster. Disabling the Rapid Relog feature of ZSDOS produced nearly identical results to CP/M, so most of the additional time for that system may be attributed to rebuilding the disk allocation bit maps for Drives A and M on each warm boot.

HARD DISK.

Three systems, 6.144 MHz SB-180 (System 1), 4.0 MHz Ampro Little Board-1A (System 2), 9.216 MHz Z-180 Homebrew SB-180 (System 3), were used to gather information for this phase. This latter system was added to demonstrate performance on a heavily loaded system.

               ZSDOS           ZRDOS 1.9       CP/M 2.2
             +------------------------------------------------+
1-BIOS 2.7   | 0:54.7 (---)    1:16.6 (+40%)   1:34.7 (+73%)  |
1-XBIOS 1.1  | 0:52.2 (---)    1:15.4 (+44%)   1:33.4 (+79%)  |
2-AMPRO      | 1:55 (---)      2:44 (+43%)     3:15 (+70%)    |
3-BIOS 2.7   | 1:07.7 (---)    1:40.6 (+49%)   1:50.2 (+63%)  |
3-XBIOS 1.1  | 1:29.5 (---)    2:06.4 (+41%)   2:11.3 (+47%)  |
             +------------------------------------------------+

As in the previous RAM Disk results, the results of ZSDOS with "Rapid Relog" disabled and CP/M were nearly the same, confirming that rebuilding the allocation bit maps on a disk relog is the principal cause for the increased CP/M times.

All reported times were made with a path which forced a search of the current directory before locating executable files on the second path element. As an experiment, the path on the Ampro system was changed to go directly to A2:, eliminating the current directory scan. All DOSes showed an identical 10 second speedup, indicating directory scan time for all DOSes was the same.

A further point to note is the effect of multiple disk buffers on performance. For system 1, the number of buffers was adequate to retain directory information, which improved performance over the single-buffer MicroMint BIOS by 1-5%. In system 3, the buffering was inadequate to retain necessary information, so the multiple buffers were of no benefit.

FLOPPY DISK.

Examination of system performance on a Floppy Disk system was tailored to duplicate, as closely as possible, a hypothetical operating configuration using multiple drives with a non-trivial search path along differing Drives and User area lines.
Since all three primary operating systems of interest to this analysis (ZSDOS, CP/M 2.2 and ZRDOS 1.9) rebuild removable-media disk allocation maps on a relog, there was no need to explicitly disable the "Rapid Relog" feature of ZSDOS for this portion of the study. Results are:

               ZSDOS           ZRDOS 1.9       CP/M 2.2
             +------------------------------------------------+
BIOS 2.3     | 2:18.7 (+2%)    2:22.4 (+5%)    2:16.0 (---)   |
XBIOS 1.0    | 2:29.5 (+0.5%)  2:32.7 (+3%)    2:29.0 (---)   |
AMPRO        | 2:26 (+1%)      2:28 (+2%)      2:25 (---)     |
             +------------------------------------------------+

Since all of the operating systems are functionally identical in a Floppy Disk configuration, we did not expect large differences in measured times. We were therefore not surprised with variations over a spread of only five percent. While we strove to make ZSDOS as efficient as possible, CP/M was still the champ on floppy systems by a nose.

As a final comparison test between the three DOSes, the amount of time WordStar 4 took to ^QC and ^QR through the 92k ZSDOS source file was measured under all three DOSes. All timings were within 1%, indicating that read/write to open file times were similar.

PERFORMANCE CONCLUSIONS.

ZSDOS offers significant improvements in system performance on CP/M 2.2 compatible Z80-compatible computer systems with fixed media even under the restricted test conditions which disabled some of the most powerful features of ZSDOS. Even more impressive results may be obtained in a "tuned" installation with such features as Public files, and proper selection of the DOS search path (improvements of 9% on a hard disk system are typical).

The other major conclusion that can be drawn from this effort is that the selection of a BIOS tailored to the requirements is crucial to achieving optimum performance.
The multiple buffering capability of XBIOS offers speed increases in systems where an adequate number of buffers exists, but degrades floppy-based and heavily loaded hard disk performance.

During the data gathering for this report, an anomaly was noted with respect to CP/M Plus (or P2DOS) stamps. System #1 was initialized for P2DOS stamps on the disk holding data files to quantify the differences. In all cases ZSDOS was affected less than one percent, yet ZRDOS increased to seven percent longer than ZSDOS on RAM disk, 20% longer on floppy and 144% longer on hard disk. CP/M 2.2 was similarly affected, but to a lesser degree, increasing times over ZSDOS to 115% on RAM disk, ten percent on floppy and 140% on hard disk. While neither ZRDOS nor CP/M 2.2 can manipulate this type of stamp, merely using a disk which is so prepared will result in slower processing.

HOW WE DID IT.

During the year or so that we pursued our independent paths in modifying H.A.J. Ten Brugge's excellent P2DOS alternative to CP/M 2.2's BDOS, our approaches were somewhat diverse. While Cam's approach was directed at perfecting features, Hal's effort was directed at streamlining the code to create a "speed demon" operating system, and Carson concentrated on enhancing embedded Date Stamping. In mid-1987, Bridger Mitchell was instrumental in getting us to pool our resources and collaborate in a joint venture. The results have been more than worth it. In Part I, we described the functional enhancements and standards embodied in ZSDOS, and have just shown the performance improvements compared to CP/M 2.2 and ZRDOS 1.9. In our efforts to foster better code for our 8-bit systems, we would now like to describe how the task of adding features and decreasing execution time was accomplished without increasing the Operating System memory requirements.

The topic of code optimization is a controversial one.
In the early days of computers, programmers were saddled with small memory space and slow processors, so every effort was made to optimize programs for speed and size. As memory became cheaper and processors emerged with ever increasing clock speeds, these programming techniques became lost to all but a few. This same path of evolution has also been followed in the Personal Computer field.

To demonstrate this point, first compare the 3.5 kbyte CP/M 2.2 BDOS and the 1 kbyte Plu*Perfect DateStamper to the functionally superior 3.5k ZDDOS. Next, compare the 3.5 kbyte size of CP/M 2.2 and ZSDOS to the 16 kbyte size of the functionally similar MS-DOS 2.1. To carry the point further, contrast the almost 16 kbyte COMMAND.COM to the 7 kbyte size of a more capable ZCPR3 Command Processor with a full environment. Some of this bloat is understandable with the change in processor chips. On the other hand, the more powerful instructions of 16-bit 808x processors should have counteracted a good portion of this code bloat.

In line with the size comparisons, execution speeds also suffer with the larger code. Friends and co-workers who are used to working with PCs and clones operating at 4.77 and 8 MHz clock rates are constantly amazed at the speed of even a lowly 4 MHz ZSDOS system, and dazzled at the 6 and 9 MHz Hitachi 64180 systems running the same software! While much of this is subjective, quite a bit is due to the fact that the "smaller" 8-bit code has been hand-coded and optimized, whereas the PC arena is devoting more of its energy to coding in high-level languages. This makes sense under certain circumstances (e.g. during development and for long-term maintainability), but it most certainly does NOT make sense for operating systems where size and speed are of the essence.

Since all of our efforts have been directed at the Zilog Z80 and compatible family of microprocessors (including Hitachi's 64180 and National's NSC800), the optimization steps covered here apply directly only to these. Having stated that, we also need to point out that many of the basic concepts will still apply to other processors, although details may differ.

No matter what processor is used, the goals of faster program execution and smaller memory size are in conflict. Smaller memory size normally means using each section of code as many times as possible - typically by using many subroutines. Faster code execution often means avoiding as many subroutine calls as possible. In every program undergoing optimization, the conflicting size and speed requirements must be balanced. This balance can be highly subjective. In ZSDOS, code size was the primary concern, though significant effort was given to making the smaller code run as fast as possible.

Now for the minutiae. If you are not a programmer, or are interested only in how to use ZSDOS, you might want to skip to PROGRAMMING FOR ZSDOS. For the diehards - here it is!

One of the first techniques we used in optimizing code was to examine all JUMP instructions. The basic instruction is three bytes long and executes in 10 clock cycles on a Z80. These absolute jumps may be unconditional (JP addr), or conditional (JP C,addr) based on the contents of the Carry, Zero, Sign or Parity/Overflow flags. The Z80 also features a Relative jump (JR) which also may be unconditional (JR addr), or conditional (JR C,addr) based on the Carry or Zero flags. The relative jump is only two bytes long and may branch only to addresses within the range of +127 to -128 bytes of the address following the jump instruction. While it is relatively easy to blindly change all jump instructions within range to Relative jumps, the careful programmer will also note that the Relative jump may carry a time penalty.
The unconditional relative jump, and conditional relative jumps where the condition is satisfied (the jump is taken), require 12 clock cycles compared to the long jump consuming only 10 cycles regardless of condition. On the other hand, conditional relative jumps need only 7 cycles if the condition is false. This type of optimization was one of the first used in our efforts to enhance P2DOS.

The next simple optimizing technique we used was to make maximum use of the Decrement-B and Jump Relative if Not Zero (DJNZ) instruction. This two-byte instruction executes in 8 or 13 clock cycles (B=0 and B<>0 respectively) for an absolute time and code saving over separate decrement/jump sequences. In some of our work on ZSDOS, using this instruction required redefining register usage to free up the B register for use as a counter.

Another simple optimizing step was examining the use of the IX register. IX holds the argument passed to DOS in the DE register (typically a file control block pointer). Despite having this value available all the time, there were a significant number of cases when faster and/or shorter code was produced by moving the pointer into HL. This was normally the case when the same offset within the FCB was accessed two or more times in succession.

The final "simple" optimization technique we used was to examine all PUSHes and POPs to the stack and delete any found to be unnecessary. While this sounds simple, it is quite a chore in a complex program such as ZSDOS where CALLs call other CALLs which call still other CALLs, etc. Each path must be examined to ensure that the registers are, in fact, not altered or needed.

After the above "simple" optimizations were performed, a series of what we term "moderate" optimization steps were undertaken.
One of these involved examining all series of sequential checks on a byte (such as the input command character scanner) and structuring the check sequences to optimize performance based on the clock cycle counting mentioned above, and on the estimated frequency of access for various commands. In the case of the command dispatcher, this technique resulted in extremely fast command parsing implemented with minimum code.

Sequential bit shifts and rotates are another area where more analysis is required before final code can be written. Sixteen-bit shifts, and 8-bit shifts in registers other than the accumulator, are areas where gains can be achieved. The usual method of using a subroutine which loads all bytes to the accumulator for shifts and rotates fares poorly if only one or two bit shifts are needed. While most of these cases had been removed from the P2DOS code by the original author, the replacement inline code still suffered from some inefficiencies. A two-bit shift right (division by 4) of the 16-bit HL register pair in the STDIR routine using the code:

        SRL     H               ; Divide by 2
        RR      L
        SRL     H               ; Divide by 4
        RR      L

proved optimum. Using a two-iteration loop with the DJNZ instruction around a single SRL H, RR L sequence would have produced the same 8-byte code length, but at a penalty of 21 clock cycles. A call to a subroutine would have fared even worse with a 27 clock cycle CALL/RET penalty, and four bytes of overhead. On the other hand, three-bit shifts of the HL register pair occurred in a number of routines. These were consolidated into a single callable routine that uses the B register as a counter in an iterative loop with the sequence:

SHRHL3: LD      B,3
SHRHLB: SRL     H
        RR      L
        DJNZ    SHRHLB
        RET

While the replacement code added overhead, it saved 3-5 bytes of code (depending on entry point) which were sorely needed to add additional features. ZSDOS calls this routine from three places, while ZDDOS calls it from five.
The difference is due to ZSDOS "unrolling" the loop in time critical routines.

Shifts to the left were occasionally handled a little more efficiently by using the 16-bit ADD instructions of the HL register pair to perform bit shifts. An example of this appeared in the CALST routine. In this case, the DE register pair was rotated one bit to the left with sequential RL E, RL D instructions, with the Carry bit shifted into the HL register pair. Where the original code used the sequence RL L, RL H to shift the bit into the HL pair, a two-byte code savings was achieved with the single two-byte ADC HL,HL instruction.

Another area where considerable code and time savings were realized was in the consolidation of routines into "straight-line" code. While this seems to be anathema to structured programmers, it is often a must to obtain the performance improvements which we sought from our efforts. As a first step, all routines ending in Jump instructions were examined. Target addresses were then checked to ensure that no other routine "fell through" to them. If the target was in fact a "stand-alone" routine, it was moved to the end of the first routine so that the Jump could be deleted. An example of this is where the INITDR routine was moved to follow SELDK directly, saving the two-byte relative jump and 12 clock cycles. Other cases involving long jumps saved three bytes and 10 clock cycles. A minor variation in relocation of code is to group functions to bring them within range of relative jumps, thereby saving one byte at the expense of two clock cycles. This minor penalty in time was often outweighed by the value of a single byte of code in our efforts.

A variant on this concept involved examining sequences of code for duplication, and combining identical sequences into new routines which "fall through" to the destination.
This was used to good effect to define a new routine:

SRCT15: LD      A,15
        CALL    SEARCH

This sequence was placed immediately before the TSTFCT routine, and replaced three occurrences of:

        LD      A,15
        CALL    SEARCH
        CALL    TSTFCT

with a single CALL to SRCT15. The overall effect of this one change was a savings of 10 bytes of code and 24 clock cycles for each of the three sequences replaced.

Detailed examination of code also produced unexpected savings by merely defining new labels. As an example, the last three instructions of the routine OPENEX were:

        LD      A,0FFH
        LD      (PEXIT),A
        RET

This sequence occurred two other times in the original code, and three times in the latest version of ZSDOS. The last two instructions were repeated in many locations, so one location was selected (centrally located to take advantage of relative jumps), with other instances accessing it with a call or jump to the new label, SAVEA. Setting the value to 0FFH in OPENEX was labeled as SETCFF, with the other two occurrences jumping to this location. While a small time penalty was incurred in jumping to this common code, the three byte savings was again needed to add features.

Our code "walk-throughs" and optimization efforts did not stop with the original code, but continued with every test version. First, we discovered a common "shell" of instructions around the DELETE, CSTAT, and RENAME functions and combined them with a net savings of 12 bytes. Later, a trick used in public-domain inline print routines to pass addresses on the processor's stack was used to recover five bytes of code by replacing three sequences of:

        LD      HL,(address)
        JR      COMCOD

with three 3-byte CALL COMCOD instructions. The trick involved in this change was to place the CALLs immediately in front of the routines whose addresses were to be passed to COMCOD. When executed, the CALL placed the routine address on the stack.
A one-byte POP HL instruction at the beginning of COMCOD completed the change by placing the address in the desired HL register. Still later, the internal code in the COMCOD routine was again optimized to remove several memory references. This saved another four bytes.

Cameron's rewrite of the Console IO routines demonstrated another technique of reducing code size with very little overhead. The majority of affected code involved different DOS commands, yet exited through common code with absolute jumps. By PUSHing the exit address on the stack prior to jumping to the routines, a simple RETurn instruction sufficed to direct execution through the exit code, saving two bytes per occurrence. The four bytes required to set the return address meant that the code size break-even point occurred at two instances. Since far more cases than that were involved, a significant code size reduction was realized. For DOS function calls, the time penalty incurred was 21 clock cycles; however, that was not considered significant when dealing with the normal serial IO devices used in console functions.

A final noteworthy trick was added by Cameron which neither of us had ever seen documented in the Z80 world. It used the sixteen-bit load instruction into the IX register (a four byte instruction) to "fall through" successive 16-bit loads to the primary registers. In this fashion, the sequence:

CMND27: LD      HL,(ALV)
        JR      SAVHL
CMND24: LD      HL,(LOGIN)
        JR      SAVHL
CMND31: LD      HL,(IXP)
        JR      SAVHL
CMND47: LD      HL,(DMA)
SAVHL:  LD      (PEXIT),HL
        RET

was replaced by a more efficient (in code size) construct.
The bytes, as coded, are on the left, with the instructions seen by CMND27 shown on the right:

CMND27: LD      HL,(ALV)        CMND27: LD      HL,(ALV)
        DEFB    0DDH                    LD      IX,(LOGIN)
CMND24: LD      HL,(LOGIN)
        DEFB    0DDH                    LD      IX,(IXP)
CMND31: LD      HL,(IXP)
        DEFB    0DDH                    LD      IX,(DMA)
CMND47: LD      HL,(DMA)
SAVHL:  LD      (PEXIT),HL              LD      (PEXIT),HL
        RET                             RET

This code works because the IX register is not used in the remainder of the exit code, and the entry IX value is restored upon returns from ZSDOS functions. Each cascaded value saves one byte of code, but adds additional clock cycles to the execution time. Where the original code required a constant 28 clock cycles before arriving at the SAVHL routine, the new code execution time is different for each entry point. In this example, the time (in clock cycles) required for each entry point to arrive at SAVHL is:

        CMND47 - 16 cycles
        CMND31 - 20 + 16 = 36
        CMND24 - 20 + 20 + 16 = 56
        CMND27 - 20 + 20 + 20 + 16 = 76

At this point, an analysis of probable calling frequency was done to order the calls so that the most frequently used functions would incur the least penalty. The ordering shown here was judged to be the optimum sequence.

In a similar manner, eight-bit loads of the A register were consolidated at the beginning of the SEARCH routine. Our analyses of the code showed that SEARCH was called several times with values of 12 and 15 in the A register. Loading of these values was relocated to the beginning of SEARCH, then consolidated with another single-byte DEFB prefix. The resultant code as entered, and as seen by SEAR12, is:

SEAR12: LD      A,12            SEAR12: LD      A,12
        DEFB    21H                     LD      HL,0F3EH
SEAR15: LD      A,15
SEARCH: ...                     SEARCH: ...

Instead of posing a time penalty as the LD IX,(nn) trick described above did, this case saved one byte over a relative jump and two clock cycles (JR = 12 cycles, LD HL,nn = 10 cycles). As above, this worked because the HL register contents were "don't care" upon entry to the SEARCH routine.

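The byte overlap behind both constructs can be checked by decoding the same bytes from each entry point. What follows is only a minimal Python sketch of that idea, not ZSDOS source: the addresses standing in for ALV, LOGIN, IXP, DMA and PEXIT are invented placeholders, and the toy decoder knows just the two opcodes that matter before SAVHL (2AH = LD HL,(nn) at 16 cycles, and DDH 2AH = LD IX,(nn) at 20 cycles).

```python
# Hypothetical model of the "DEFB 0DDH" fall-through trick; not ZSDOS source.
# Placeholder addresses (assumptions) for the DOS variables named in the text.
ALV, LOGIN, IXP, DMA, PEXIT = 0x1000, 0x1002, 0x1004, 0x1006, 0x1008


def word(n):
    """Return the little-endian operand bytes of a 16-bit value."""
    return [n & 0xFF, n >> 8]


# Bytes exactly as coded: each LD HL,(nn) is 2AH lo hi, separated by a lone DDH.
code = ([0x2A] + word(ALV) + [0xDD] +       # CMND27 at offset 0
        [0x2A] + word(LOGIN) + [0xDD] +     # CMND24 at offset 4
        [0x2A] + word(IXP) + [0xDD] +       # CMND31 at offset 8
        [0x2A] + word(DMA) +                # CMND47 at offset 12
        [0x22] + word(PEXIT) + [0xC9])      # SAVHL: LD (PEXIT),HL / RET

ENTRY = {"CMND27": 0, "CMND24": 4, "CMND31": 8, "CMND47": 12}
SAVHL = 15


def trace(entry):
    """Decode from an entry point up to SAVHL; return (instructions, cycles)."""
    pc, seen, cycles = ENTRY[entry], [], 0
    while pc < SAVHL:
        if code[pc] == 0xDD and code[pc + 1] == 0x2A:    # LD IX,(nn)
            addr = code[pc + 2] | code[pc + 3] << 8
            seen.append("LD IX,(%04XH)" % addr)
            pc, cycles = pc + 4, cycles + 20
        elif code[pc] == 0x2A:                           # LD HL,(nn)
            addr = code[pc + 1] | code[pc + 2] << 8
            seen.append("LD HL,(%04XH)" % addr)
            pc, cycles = pc + 3, cycles + 16
        else:
            raise ValueError("opcode outside this sketch")
    return seen, cycles


# SEAR12: LD A,12 / DEFB 21H / SEAR15: LD A,15  ->  3EH 0CH 21H 3EH 0FH
search = [0x3E, 12, 0x21, 0x3E, 15]
# Entered at SEAR12, the 21H byte swallows the next two bytes as LD HL,nn.
hl_operand = search[3] | search[4] << 8

for name in ENTRY:
    instructions, cycles = trace(name)
    print(name, cycles, instructions)
print("SEAR12 sees LD HL,%04XH" % hl_operand)
```

Run as written, the cycle totals should match the 16, 36, 56 and 76 listed above, and the swallowed operand should come out as 0F3EH. The sketch also makes the ordering decision concrete: whichever function is placed last pays only the 16-cycle load.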
These techniques are very powerful when code size is at a premium. Any sequence of code that loads a register or register pair then jumps or calls a common routine is a candidate for this technique. You need a register pair to throw away, but this is usually easy to find.

The final case of optimization is the most difficult, and involved complete logic redesigns. This area is so specific and lengthy that it will not be covered here. As so often stated in textbooks, it is "left as an exercise for the reader" to examine the original P2DOS source and identify areas which can be redesigned. Much logic redesign was required as a part of the added ZSDOS and ZDDOS features, though the effort didn't stop there.

Just as important as what we did to gain speed and reduce size is what we didn't do. P2DOS originally used some self modifying code in the error printing routine. We decided from the outset that we would avoid this practice (tempting though it is) in order to produce code that could be ROMed and/or run on the Z280 in protected mode. This decision cost us several bytes of code, but allowed us to accomplish our goals.

PROGRAMMING FOR ZSDOS.

ZSDOS places a few restrictions on systems which do not exist in other CP/M compatible operating systems. The most significant is that the BIOS MUST NOT DISTURB THE IX REGISTER. So far, the Epson QX-10 and Zorba computers have been identified as having BIOSes that corrupt this register. With NZCOM, we have developed a "protective" NZBIOS (look for ZSNZBI12.LBR on most Z-Nodes) that shields the Z80 registers from ill-behaved BIOSes, but operation without NZCOM on such systems will require that the BIOS be rewritten.

On this topic, we would like to propose that all programmers observe register usage more closely. The Z80 alternate and index registers belong to APPLICATION programs, and must be preserved by all operating system components.
On the other hand, the "I" and "R" registers, as well as all new
64180 and Z280 registers (with the exception of the Z280's SSP),
belong to the BIOS since they are hardware specific and directly
I/O related. The Z280 SSP should be reserved for BDOS use.

Before trying to access any of the expanded ZSDOS features
discussed in the last issue, you should first ensure that the
program is in fact executing under ZSDOS. This is a two-step
procedure involving a call to check for CP/M 2.2, then a call to
the ZSDOS Return Version function. By checking in this manner,
your program will be able to identify CP/M 1, 2 and 3 (aka Plus)
as well as ZSDOS, ZDDOS and ZRDOS. Code to accomplish this task
is:

            LD   C,12        ; Return CP/M Version
            CALL 0005        ; ..via BDOS
            CP   30H         ; Is it CP/M Plus?
            JR   NC,ISCPM3   ; ..jump if so
            CP   20H         ; Is it CP/M 1.x?
            JR   C,ISCPM1    ; ..jump if so w/version # in A
            CP   22H         ; Is it CP/M 2.2?
            JR   NZ,BADVER   ; ..jump to unknown 2.x version
            LD   C,48        ; Now make the extended call
            CALL 0005        ; ..via BDOS
            LD   A,H         ; Check the DOS type first
            CP   'D'         ; Is it ZDDOS?
            JR   Z,ISZD      ; ..jump if so, Ver # in L
            CP   'S'         ; Is it ZSDOS?
            JR   Z,ISZS      ; ..jump if so, Ver # in L
            OR   A           ; Is it ZRDOS?
            JR   Z,ISZR      ; ..jump if so, Ver # in L
            ...              ; Else can't identify, do error

Bridger Mitchell's Advanced CP/M column in TCJ #36 also provides
sample code to perform this function. A slight variation on the
above sequence is used in utilities provided with ZSDOS to enable
them to work under a variety of different operating systems. We
propose that this technique be used for any future disk operating
systems by returning a different unique character in the "H"
register.

Many programs in the past have relied on unpublished locations
within the BDOS to alter the performance or functionality of the
system. With ZSDOS, we provide published "standard" ways to
dynamically tailor DOS parameters. The most important way of
accomplishing this is with a set of configuration bits, or flags.
To accommodate future expansion, a word value of sixteen bits is
defined, with only the lower seven bits used in the current 1.0
release. The Flag bits used in ZSDOS 1.0 are:

    D D D D D D D D
    7 6 5 4 3 2 1 0
    \ \ \ \ \ \ \ \_Public File Access
     \ \ \ \ \ \ \__Public/Path Write
      \ \ \ \ \ \___Read-Only Disk
       \ \ \ \ \____Fast Fixed Disk Relog
        \ \ \ \_____Disk Change Warning
         \ \ \______BDOS Search Path *
          \ \_______Path w/o SYS Attribute *
           \________(Reserved)

The cited function is activated by setting the respective bit to
a "1", and disabled by clearing the bit to a "0". Since ZDDOS has
no search path capability, the features marked with an asterisk
pertain only to the full ZSDOS configuration, and are "don't
care" bits in ZDDOS. The bits are returned in the "L" register as
the lower byte of the 16-bit flag word. Code for returning them
is:

            LD   C,100       ; Get the FLAGS bits
            CALL 0005        ; ..with DOS call
            ...              ; "L" has present 7 bits

Likewise, the flags may be set from applications programs with
Function 101 as:

            LD   DE,(FLAGS)  ; 1.0 only recognizes byte in E
            LD   C,101       ; Now set flags in ZSDOS
            CALL 0005        ; ..with DOS call
            ...              ; New settings are now effective

Date and Time capabilities are just as easily accessed. The
6-byte Clock data may be retrieved to a specified buffer with DOS
Function 98 as:

            LD   DE,TIMEAD   ; Address of 6-byte buffer
            LD   C,98
            CALL 0005        ; Read Clock from DOS
            INC  A           ; Any Errors? (FF --> 0)
            JR   Z,ERROR     ; ..jump if error (no clock?)
            ...              ; Else use the retrieved time

    TIMEAD: DEFB 0,0,0,0,0,0 ; Initialized Null DateSpec

With the File Date Stamping capabilities of ZSDOS, we developed a
single standardized way of accessing individual file stamps.
Function 102 will copy the set of stamps for a specified file to
the current DMA address, while 103 will set the stamps for the
specified file to the values at the current DMA address.
Since all supported stamping methods (currently DateStamper(tm)
and the CP/M Plus compatible P2DOS) feature the same format at
the ZSDOS level, no user conversions are needed. Indeed, using
special stamp drivers provided with the ZSDOS package, either
stamp type may be read, and both are written by Function 103 if
the destination disk has been so prepared. A sample of code used
to copy stamp data from one file to another is:

            LD   DE,DSBUF    ; Point to 15-byte stamp buffer
            LD   C,26        ; ..and set the DMA address
            CALL 0005
            LD   DE,SRCFCB   ; Source FCB (User set already)
            LD   C,102       ; Get the source's Stamps
            CALL 0005
            ...              ; Set User to destination?
            LD   DE,DSTFCB   ; Destination FCB
            LD   C,103       ; Write Stamps from DMA buffer
            CALL 0005        ; ..to Dest file
            ...

FINAL THOUGHTS.

ZSDOS was a labor of love. Though we didn't really start out to
create such a significant step forward in 2.2 compatible BDOSes,
it turned out that way. It is our hope that the ideas presented
in ZSDOS will form the basis for the next generation of BDOS
replacements. If nothing else, we hope that ZSDOS stimulates the
Z80 compatible community to address the issues of standards for
datestamping, enhanced error handling, and global file access.

The next step for an improved operating system will be to break
the 64k barrier. Joe Wright and Jay Sage's efforts in dynamic
system configuration with NZCOM are very useful, but fail to
address the fundamental problem: we need to use the banked
memory featured in most newer systems. Furthermore, this must be
done in a way that allows existing applications to run properly.
This means (unlike CP/M Plus) a BDOS that lets the BIOS deblock,
a BIOS jump table that is directly callable from all banks,
system vectors at the normal locations, etc. This also means
establishing standards for bank sizes and addresses, hardware and
processor independence, and finally universal DOS level and BIOS
level interfaces to banked memory.
Other standards that will be needed by the next generation of
OS's include banked RSX standards (though Bridger Mitchell and
Malcom Kemp seem to have this nailed down), banked device driver
standards, and expanded TCAPS and ENV definitions (aren't these
properly BIOS structures, folks?). Now is the time to come
together, speak up on these matters, carefully weigh all
alternatives, and make our wishes known.

Also, we urge the community to support those doing active
development for our systems by purchasing legal copies of the
software you use. This will allow and encourage development of
things like new, better, and faster banked systems with all the
goodies we really want. We applaud the efforts of MicroPro in
developing and releasing WordStar 4 for CP/M systems, and
encourage other vendors to update their CP/M offerings in the
fields of Database Management systems and Spreadsheets for the
new generation of systems. Further, let's agree to agree on what
we really want. In this manner, we can all concentrate our
efforts on applications programs, not rewriting BDOS. In short,
let's work together to create a computing environment that will
turn the big blue clones green with envy.

In conclusion, what started as independent "labors of love" to
produce a better operating system rapidly became identical
obsessions as we reverted to counting clock cycles and bytes. We
are satisfied with the results, and hope that others will benefit
from our work and produce smaller, faster and more full-featured
programs to help make our lives easier (and keep from emptying
our wallets with requirements for constant upgrades). Finally, we
must thank H.A.J. Ten Brugge for beginning this entire episode by
releasing P2DOS.
Without his efforts, none of us (Cam, Hal and Carson) would have
been tempted into the area of operating system authorship, and
we would have left it to "others" to determine what we need in
our respective systems.

APPENDIX:

The hardware used in these analyses is:

System #1: MicroMint SB-180.
    Processor:     HD64180 operating at 6.144 MHz clock rate with
                   No memory wait states and 2 IO wait states.
    Console:       Serial Console connected to ACSI port 1 at
                   19.2 kbps, Interrupt-driven buffered keyboard
                   input.
    Interfaces:    ETS180 IO+ providing SCSI interface and RTC.
    CCP:           ZCPR 3.3 with full environment.
    BIOS:          MicroMint 2.7 modified / XSystems XBIOS 1.1.
    Search Path:   $$:, A15: (Current Drive & User, then A15:)
    Hard Disk:     Syquest SQ-306R 5 Megabyte removable-media,
                   Interleave of 3, 12 microsecond buffered seek,
                   Adaptec 4010 controller.
                   A: 1576k of 2552k free, 94 files, 68 in User 15.
                   B: 2432k of 2568k free, 17 files, 16 in User 1.
    Floppy Disks:  A: NEC 80-track DSDD, 4 mS step, 4 mS Head Load,
                      16k of 782k free, 93 files, 68 in User 15.
                   C: Shugart SA465 80-track DSDD, 6 mS step,
                      736k of 782k free, 17 files in User 1.

System #2: Ampro Little Board 1A.
    Processor:     Z80A operating at 4.0 MHz.
    Console:       Serial Console connected to DART port 1 at
                   9600 baud, hardware handshake enabled.
    Interfaces:    SCSI daughter board with NCR 5380 driving
                   1610-4 controller.
    CCP:           ZCPR 3.4 with full environment.
    BIOS:          Ampro V3.8/NZCOM.
    Search Path:   $$:, A2:, A0: (Current Drive & User, then A2:,
                   A0:)
    Hard Disks:    Seagate ST-225 20 Megabyte, interleave of 2,
                   200 microsecond buffered seek, Shugart 1610-4
                   controller. A Shugart 5Mb full height drive was
                   also connected to the controller, but was not
                   used in the test.
                   A: 2744k of 8160k free, 425 files, 77 in User 2.
                   C: 984k of 4192k free, 258 files, 32 in User 3.
    Floppy Drives: A: Teac 55F 80 track DSDD, 6 mS step,
                      10k of 782k free, 74 files.
                   B: Teac 55F 80 track DSDD, 6 mS step,
                      736k of 782k free, 17 files in User 0.

System #3: Homebrew SB-180 compatible.
    Processor:     Z-180 operating at 9.216 MHz clock rate with
                   No memory wait states and 3 IO wait states.
    Console:       Serial Console connected to ACSI port 1 at
                   19.2 kbps, Interrupt-driven buffered keyboard
                   input.
    Interfaces:    ETS180 IO+ providing SCSI interface and RTC.
    CCP:           ZCPR 3.0 with full environment.
    BIOS:          MicroMint 2.7 modified / XSystems XBIOS 1.1.
    Search Path:   A15: (ZCPR 3.0 searches current, then A15:)
    Hard Disk:     Shugart SA-712 10 Megabyte, Interleave of 1,
                   12 microsecond buffered seek, Shugart 1610-3
                   controller.
                   A: 324k of 2552k free, 179 files, 101 in User 15.
                   D: 252k of 2792k free, 438 files, 16 in User 5.