Sunday, September 24, 2017

What's the time?

Previously in this blog: ...two ways from here: either fix the floppy emulation, or make OFW for 40p with no floppy...

... or skip the call. You know, I have an armed debugger here and am not afraid to use it. So just turn the fatal call:


/usr/lib/methods/cfgfda_isa -2 -l fda0

into something harmless, like:

/bin/echo -2 -l fda0

by using

set *(int *) 0x200c11a8 = 0x2f62696e
set *(int *) 0x200c11ac = 0x2f656368
set *(int *) 0x200c11b0 = 0x6f000000
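
The three words above are simply the ASCII bytes of "/bin/echo" plus a terminating NUL, packed in big-endian order as PowerPC stores them. A minimal sketch of how such patch commands could be generated; the helper name gdb_patch_words is made up for illustration and is not part of gdb:

```python
import struct

def gdb_patch_words(text, base):
    """Pack a NUL-terminated ASCII string into big-endian 32-bit words
    and render one gdb 'set' command per word (hypothetical helper)."""
    data = text.encode("ascii") + b"\0"
    data += b"\0" * (-len(data) % 4)          # pad to a whole number of words
    words = struct.unpack(f">{len(data) // 4}I", data)
    return [f"set *(int *) {base + 4 * i:#x} = {w:#010x}"
            for i, w in enumerate(words)]

commands = gdb_patch_words("/bin/echo", 0x200c11a8)
for line in commands:
    print(line)
```

The same helper works for any replacement string, as long as it is not longer than the string being overwritten.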

Well, actually it probably should have been "/usr/bin/echo": there is no "/bin/echo" on the system. But evidently the attempt above was good enough for AIX, as it doesn't really need the floppy disk adapter (nor the mouse and keyboard, which I had to hack in a similar way on the second attempt). This brings AIX here:


Completed method for: fda0, Elapsed time = 0
Return code = 127
*** no stdout ****
***** stderr *****
sh: /usr/lib/methods/cfgfda_isa:  not found

Method error (/usr/lib/methods/cfgfda_isa -2 -l fda0 ):
        0514-068 Cause not known.
...
exec(/../usr/sbin/lqueryvg,-phdisk0,-L)
exec(/../usr/bin/grep,00000000000000000000000000000000)
exec(/usr/bin/dosread,-S,/preload,/preload)
exec(/usr/lpp/bosinst/datadaemon)
exec(/../usr/bin/sleep,1)

Where it hangs forever. And now the problem is sort of obvious. Yesterday I wrote that the boot log hadn't shown any hint. But it did:


Time: 0 LEDS: 0x539
...
Time: 0 LEDS: 0x78a
...
Completed method for: bus0, Elapsed time = 0
...
Time: 0 LEDS: 0x539
...
Time: 0 LEDS: 0x868
...
Completed method for: scsi0, Elapsed time = 0
...

See? The clock is not ticking. It's probably caused by a QEMU bug: the "loadvm" command sometimes doesn't restore one of the machine timers, and I used that command a lot during yesterday's session.
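
This kind of symptom is easy to miss by eye, so a small check over the captured boot log may help. The sketch below is only a heuristic under my own assumptions: it treats the clock as stuck when every "Time:" stamp is identical and every "Elapsed time" is zero. A very fast boot could in principle trigger it too. The function name clock_is_stuck is hypothetical.

```python
import re

def clock_is_stuck(log_text):
    """Heuristic: True if all 'Time:' stamps are equal and all elapsed times are 0."""
    times = [int(m.group(1))
             for m in re.finditer(r"^Time: (\d+) LEDS:", log_text, re.MULTILINE)]
    elapsed = [int(m.group(1))
               for m in re.finditer(r"Elapsed time = (\d+)", log_text)]
    return len(times) > 1 and len(set(times)) == 1 and all(e == 0 for e in elapsed)

sample = """Time: 0 LEDS: 0x539
Time: 0 LEDS: 0x78a
Completed method for: bus0, Elapsed time = 0
Time: 0 LEDS: 0x539
Time: 0 LEDS: 0x868
Completed method for: scsi0, Elapsed time = 0
"""
print(clock_is_stuck(sample))
```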

So basically there are two scenarios:
 - the clock is ticking - in this case AIX doesn't start any methods after spawning the init process
 - the clock is stopped - in this case it starts the methods up to the point where the timeouts become important. Probably, if the clock had worked properly, the boot process wouldn't have stopped at the floppy detection method.

Which means that the debugging process is getting really complicated. Now I have to debug the kernel scheduler, which is tricky. And obviously it is different from the one in AIX 4.2, which doesn't hang at that point.

The KDB from 5.1 has some features for inspecting the scheduled timers, but I'm not sure it can be used to debug interrupt handling. At least Solaris kadb was not good for debugging interrupts, as it caused a lot of side effects and mostly hung the system right after a breakpoint was set.

So, the good news: most of the devices in QEMU's 40p model are working properly. The bad news: finding a black sheep in a dark room is pretty hard.

Saturday, September 23, 2017

Some experiments with AIX 5.1

Since I could not find the AIX 4.2 install media for Motorola, I gave AIX 5.1 under qemu-system-ppc a shot. The feelings are mixed: on one hand I've got no reference machine to check things against, on the other hand the KDB debugger in AIX 5.1 is much more powerful than the one in 4.2. The initialization process of 5.1 is close to that of 4.2, so I can recognize some structures. Which is good: version 4.2 is quite different from 4.1.4, which I tried first, so I was afraid they had made an equal leap in the 4.x->5.x transition. Well, partially they did. Although the function names are more or less the same, the debugger made a great leap forward.

This stack trace looked like a flashback.

[01DE1BCC]init_pcicfg+000000 (2FF3A910 [??])
[01DE1380]config_pal+000030 (??)
[01DE12F8]config_planar_pal+0001D8 (??, ??)
[004832AC]config_kmod+000184 (??, ??, ??)
[004836E4]sysconfig+000104 (??, ??, ??)
[00003A94].sys_call+000000 ()
[10002668]cfgpal_rspc+0003E8 ()
[100016C0]main+000110 (??, ??, ??)
[10000188]__start+000088 ()

Under 4.2 it was like this:

(gdb) bt
#0  0x018d2b7c in ?? () -- pci_rw
#1  0x00088114 in ?? ()
#2  0x00088114 in ?? ()
#3  0x018d1db0 in ?? () -- init_crashdump 0x018d1d70
#4  0x018cf410 in ?? () -- config_pal  0x018cf35c
#5  0x018cf30c in ?? () -- config_planar_pal 0x018cf100
#6  0x000f9b3c in ?? () -- config_kmod 0x000f9a5c, 2 params, size 0x118
#7  0x000f9eb8 in ?? ()
#8  0x000037a8 in ?? ()

See the formatting differences and gaps? That's because under 4.2 I had to make the trace manually. Did I mention that the 4.2 debugger is eighties style? So, now I'm really enjoying the luxury of having a modern tool.
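
Making the trace manually boils down to finding, for each return address, the nearest function start below it. A minimal sketch of that lookup, using only the function start addresses annotated in the 4.2 trace above; the table is of course incomplete, and the helper name symbolize is made up:

```python
import bisect

# Function start addresses taken from the annotations in the 4.2 trace.
SYMBOLS = sorted([
    (0x000f9a5c, "config_kmod"),
    (0x018cf100, "config_planar_pal"),
    (0x018cf35c, "config_pal"),
    (0x018d1d70, "init_crashdump"),
])
STARTS = [addr for addr, _ in SYMBOLS]

def symbolize(addr):
    """Return 'name+offset' for the closest known symbol at or below addr."""
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        return f"{addr:#010x}"            # below every known symbol
    start, name = SYMBOLS[i]
    return f"{name}+{addr - start:#x}"

for frame in (0x018d1db0, 0x018cf410, 0x018cf30c, 0x000f9b3c, 0x00088114):
    print(f"{frame:#010x}  {symbolize(frame)}")
```

Without function sizes the lookup happily misattributes addresses that lie past the end of a function: frame #7 (0x000f9eb8) would come out as config_kmod+0x45c, although config_kmod is only 0x118 bytes long.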

Also there is a possibility to make the output verbose:

KDB(0)> mw enter_dbg
enter_dbg+000000:  00000000  = 42
n_core+000000:  00000032  = .
KDB(0)>

But then there is also some bad news. There are still bugs (or missing features) in QEMU. Even worse, there is at least one Heisenbug. Sometimes it gets to the PCI initialization and sometimes not. And in the cases where it doesn't get to PCI init, it's really unclear why: it just sits in the idle loop, interrupts are enabled, and it receives interrupts from the timer. For some reason it just thinks there is nothing to do. Debugging such cases is a real nightmare.
So I thought I'd let it go as far as it can in a run where it does reach PCI init, and look for clues in the log.
No obvious clues, but here goes a pretty long log:

Time: 0 LEDS: 0x539
Number of running methods: 0
 cfgmgr LED{539}
----------------
Attempting to configure device 'bus0'
 cfgmgr LED{78A}
Time: 0 LEDS: 0x78a
Invoking /usr/lib/methods/cfgbus_pci -1 -l bus0
exec(/bin/sh,-c,/usr/lib/methods/cfgbus_pci -1 -l bus0)
Number of running methods: 1
exec(/usr/lib/methods/cfgbus_pci,-1,-l,bus0)
Breakpoint
.bus_register+000000     mflr    r0                  <01DEADA0>
KDB(0)> g
exec(/bin/sh,-c,/usr/lib/methods/define_rspc -c bus -s pci -t isa -p bus0 -w 88 -L 04-A0 -d)
exec(/usr/lib/methods/define_rspc,-c,bus,-s,pci,-t,isa,-p,bus0,-w,88,-L,04-A0,-d)
exec(/bin/sh,-c,/usr/lib/methods/cfgbus_isa -1 -l bus1)
exec(/usr/lib/methods/cfgbus_isa,-1,-l,bus1)
Breakpoint
.bus_register+000000     mflr    r0                  <01DF8FF0>
KDB(0)> g
exec(/bin/sh,-c,/usr/lib/methods/define_rspc -d -c adapter -s isa_sio -t fda -p bus1 -w PNP0700ffffffff -L 01-B0)
exec(/usr/lib/methods/define_rspc,-d,-c,adapter,-s,isa_sio,-t,fda,-p,bus1,-w,PNP0700ffffffff,-L,01-B0)
exec(/bin/sh,-c,/usr/lib/methods/define_rspc -d -c adapter -s isa_sio -t isa_keyboard -p bus1 -w PNP0303ffffffff -L 01-D0)
exec(/usr/lib/methods/define_rspc,-d,-c,adapter,-s,isa_sio,-t,isa_keyboard,-p,bus1,-w,PNP0303ffffffff,-L,01-D0)
exec(/bin/sh,-c,/usr/lib/methods/define_rspc -d -c adapter -s isa_sio -t isa_mouse -p bus1 -w PNP0F03ffffffff -L 01-E0)
exec(/usr/lib/methods/define_rspc,-d,-c,adapter,-s,isa_sio,-t,isa_mouse,-p,bus1,-w,PNP0F03ffffffff,-L,01-E0)
exec(/bin/sh,-c,/usr/lib/methods/define_rspc -d -c adapter -s isa_sio -t s1a -p bus1 -w PNP05011 -L 01-F0)
exec(/usr/lib/methods/define_rspc,-d,-c,adapter,-s,isa_sio,-t,s1a,-p,bus1,-w,PNP05011,-L,01-F0)
exec(/bin/sh,-c,/usr/lib/methods/define_rspc -c adapter -s pci -t ncr810 -p bus0 -w 96 -L 04-B0 -d)
exec(/usr/lib/methods/define_rspc,-c,adapter,-s,pci,-t,ncr810,-p,bus0,-w,96,-L,04-B0,-d)
----------------
Completed method for: bus0, Elapsed time = 0
Return code = 0
***** stdout *****
:devices.isa_sio.IBM000E :devices.isa_sio.PNP0400 :devices.pci.22100020
fda0,sioka0,sioma0,sa0,scsi0

*** no stderr ****
----------------
Time: 0 LEDS: 0x539
Number of running methods: 0
 cfgmgr LED{539}
----------------
Attempting to configure device 'fda0'
Method: /usr/lib/methods/cfgfda_isa not in boot image, configure in phase 2
----------------
Attempting to configure device 'sioka0'
Method: /usr/lib/methods/cfgkm_isa not in boot image, configure in phase 2
----------------
Attempting to configure device 'scsi0'
 cfgmgr LED{868}
Time: 0 LEDS: 0x868
Invoking /usr/lib/methods/cfgncr_scsi -1 -l scsi0
exec(/bin/sh,-c,/usr/lib/methods/cfgncr_scsi -1 -l scsi0)
exec(/usr/lib/methods/cfgncr_scsi,-1,-l,scsi0)
exec(/bin/sh,-c,/etc/methods/define -c disk -s scsi -t osdisk -p scsi0 -w 0,0)
exec(/etc/methods/define,-c,disk,-s,scsi,-t,osdisk,-p,scsi0,-w,0,0)
exec(/bin/sh,-c,/etc/methods/define -c cdrom -s scsi -t oscd -p scsi0 -w 2,0)
exec(/etc/methods/define,-c,cdrom,-s,scsi,-t,oscd,-p,scsi0,-w,2,0)
Number of running methods: 1
----------------
Completed method for: scsi0, Elapsed time = 0
Return code = 0
***** stdout *****
hdisk0 cd0
*** no stderr ****
----------------
Time: 0 LEDS: 0x539
Number of running methods: 0
 cfgmgr LED{539}
----------------
Attempting to configure device 'hdisk0'
Method: /etc/methods/cfgscdisk not in boot image, configure in phase 2
----------------
Attempting to configure device 'cd0'
 cfgmgr LED{723}
Time: 0 LEDS: 0x723
Invoking /etc/methods/cfgsccd -1 -l cd0
exec(/bin/sh,-c,/etc/methods/cfgsccd -1 -l cd0)
exec(/etc/methods/cfgsccd,-1,-l,cd0)
Number of running methods: 1
----------------
Completed method for: cd0, Elapsed time = 0
Return code = 0
*** no stdout ****
*** no stderr ****
----------------
Time: 0 LEDS: 0x539
Number of running methods: 0
 cfgmgr LED{539}
----------------
Time: 0 LEDS: 0x538
Invoking top level program -- "/usr/lib/methods/deflvm"
 cfgmgr LED{538}
exec(/bin/sh,-c,/usr/lib/methods/deflvm )
 cfgmgr LED{539}
Time: 0 LEDS: 0x539
Return code = 127
*** no stdout ****
***** stderr *****
sh: /usr/lib/methods/deflvm:  not found

Method error (/usr/lib/methods/deflvm):
        0514-068 Cause not known.
sh: /usr/lib/methods/deflvm:  not found

----------------
Time: 0 LEDS: 0x538
Invoking top level program -- "/usr/lib/methods/fdarcfgrule"
 cfgmgr LED{538}
exec(/bin/sh,-c,/usr/lib/methods/fdarcfgrule )
 cfgmgr LED{539}
Time: 0 LEDS: 0x539
Return code = 127
*** no stdout ****
***** stderr *****
sh: /usr/lib/methods/fdarcfgrule:  not found

Method error (/usr/lib/methods/fdarcfgrule):
        0514-068 Cause not known.
sh: /usr/lib/methods/fdarcfgrule:  not found

----------------
Time: 0 LEDS: 0x538
Invoking top level program -- "/usr/lib/methods/defssar"
 cfgmgr LED{538}
exec(/bin/sh,-c,/usr/lib/methods/defssar )
 cfgmgr LED{539}
Time: 0 LEDS: 0x539
Return code = 127
*** no stdout ****
***** stderr *****
sh: /usr/lib/methods/defssar:  not found

Method error (/usr/lib/methods/defssar):
        0514-068 Cause not known.
sh: /usr/lib/methods/defssar:  not found

 cfgmgr LED{FFF}
Configuration time: 0 seconds
+ 1> /etc/filesystems
+ /usr/lib/methods/showled 0x517
exec(/usr/lib/methods/showled,0x517)
 showled LED{517}
+ bootinfo -b
exec(/usr/sbin/bootinfo,-b)
exec(/usr/lib/boot/bin/bootinfo_rspc,-b)
+ mount -v cdrfs -o ro /dev/cd0 /SPOT
exec(/usr/sbin/mount,-v,cdrfs,-o,ro,/dev/cd0,/SPOT)
exec(/usr/bin/sh,-c,/usr/sbin/wlmcntrl -u -d "" > /dev/null 2>&1)
+ [ 0 -ne 0 ]
+ /usr/lib/methods/showled 0x512
exec(/usr/lib/methods/showled,0x512)
 showled LED{512}
+ /SPOT/usr/bin/rm -r /etc/init /usr/bin /usr/lib/boot /usr/lib/drivers/ataide /usr/lib/drivers/ataidepin /usr/lib/drivers/cfs.ext /usr/lib/drivers/idecdrom /usr/lib/drivers/idecdrompin /usr/lib/drivers/isa /usr/lib/drivers/pci /usr/lib/drivers/planar_pal_rspc /usr/lib/drivers/scdisk /usr/lib/drivers/scdiskpin /usr/lib/methods/cfgataide /usr/lib/methods/cfgbus_isa /usr/lib/methods/cfgbus_pci /usr/lib/methods/cfgidecdrom /usr/lib/methods/cfgncr_scsi /usr/lib/methods/cfgsccd /usr/lib/methods/cfgsys_rspc /usr/lib/methods/chggen /usr/lib/methods/chggen_rspc /usr/lib/methods/define /usr/lib/methods/define_rspc /usr/lib/methods/defsys /usr/lib/methods/showled /usr/lib/methods/ucfgdevice /usr/sbin
exec(/SPOT/usr/bin/rm,-r,/etc/init,/usr/bin,/usr/lib/boot,/usr/lib/drivers/ataide,/usr/lib/drivers/ataidepin,/usr/lib/drivers/cfs.ext,/usr/lib/drivers/idecdrom,/usr/lib/drivers/idecdrompin,/usr/lib/drivers/isa,/usr/lib/drivers/pci,/usr/lib/drivers/planar_pal_rspc,/usr/lib/drivers/scdisk,/usr/lib/drivers/scdiskpin,/usr/lib/methods/cfgataide,/usr/lib/methods/cfgbus_isa,/usr/lib/methods/cfgbus_pci,/usr/lib/methods/cfgidecdrom,/usr/lib/methods/cfgncr_scsi,/usr/lib/methods/cfgsccd,/usr/lib/methods/cfgsys_rspc,/usr/lib/methods/chggen,/usr/lib/methods/chggen_rspc,/usr/lib/methods/define,/usr/lib/methods/define_rspc,/usr/lib/methods/defsys,/usr/lib/methods/showled,/usr/lib/methods/ucfgdevice,/usr/sbin)
...
Attempting to configure device 'fda0'
 cfgmgr LED{828}
Time: 0 LEDS: 0x828
Invoking /usr/lib/methods/cfgfda_isa -2 -l fda0
exec(/bin/sh,-c,/usr/lib/methods/cfgfda_isa -2 -l fda0)
Number of running methods: 1
exec(/usr/lib/methods/cfgfda_isa,-2,-l,fda0)

Now it hangs on the floppy disk adapter init. Looks like there is no timeout. Strange.
There are two ways from here: either fix the floppy emulation, or make OFW for 40p with no floppy...

Saturday, September 16, 2017

AIX under QEMU boots up to NFS

Launching a proprietary OS under QEMU is never boring, because each new problem involves yet another component. A few weeks ago I was mostly writing Forth to make a bootable firmware. Then I fought with the missing residual data, which at that point mostly meant debugging the ODM database in PPC assembly. Then I extracted mock residual data from the live system, then spent some time with the NCR/LSI script, and now, after fixing the PCI layout, it gets to the point of starting the NIS and NFS services. It runs much slower than the real machine, and also much slower than Linux/PPC, but still:

MOT PowerStack2 (e0), Serial #0, 62 MiB memory installed
Open Firmware , Built  September 01, 2017 16:11:38
Copyright (c) 1995-2000, FirmWorks.
Copyright (c) 2014,2017, Artyom Tarasenko.

Rebooting with command: boot /pci/scsi@2/disk@0,0
Boot device: /pci/scsi@2/disk@0,0  Arguments:

+ swcons -c

Saving Base Customize Data to boot disk
Starting the sync daemon
Starting the error daemon
System initialization completed.
Starting Multi-user Initialization
 Performing auto-varyon of Volume Groups
 Activating all paging spaces
swapon: Paging device /dev/hd6 activated.
 Performing all automatic mounts
mount: 1831-010 server axxxxs01 not responding:
RPC: 1832-018 Port mapper failure - RPC: 1832-006 Unable to send
mount: backgrounding
axxxxs01:/home
Multi-user initialization completed
Checking for srcmstr active...complete
Starting tcpip daemons:
0513-056 Timeout waiting for command response.
0513-056 Timeout waiting for command response.
vmtune:  current values:
  -p       -P        -r          -R         -f       -F       -N        -W
minperm  maxperm  minpgahead maxpgahead  minfree  maxfree  pd_npages maxrandwrt
   2968    11872       2          8        115      123     524288        0

  -M       -w       -k       -c         -b          -B          -u
maxpin   npswarn  npskill  numclust  numfsbufs   hd_pbuf_cnt  lvm_bufcnt
  12692     1536      384        0       93           64           9

number of valid memory pages = 15864    maxperm=74.8% of real memory
maximum pinable=80.0% of real memory    minperm=18.7% of real memory
number of file memory pages = 1443      numperm=9.1% of real memory

vmtune:  new values:
  -p       -P        -r          -R         -f       -F       -N        -W
minperm  maxperm  minpgahead maxpgahead  minfree  maxfree  pd_npages maxrandwrt
   2968    11872       2          8        115      123     524288       64

  -M       -w       -k       -c         -b          -B          -u
maxpin   npswarn  npskill  numclust  numfsbufs   hd_pbuf_cnt  lvm_bufcnt
  12692     1536      384        1       93           64           9

number of valid memory pages = 15864    maxperm=74.8% of real memory
maximum pinable=80.0% of real memory    minperm=18.7% of real memory
number of file memory pages = 1444      numperm=9.1% of real memory

Starting NIS services:
Starting NFS services:

0513-056 Timeout waiting for command response.
NIS: server not responding for domain "axxxxs01"; still trying.
NIS: server not responding for domain "axxxxs01"; still trying.
0513-056 Timeout waiting for command response.
NIS: server not responding for domain "axxxxs01"; still trying.
NIS: server not responding for domain "axxxxs01"; still trying.
NIS: server not responding for domain "axxxxs01"; still trying.

Woo-hoo! I could be proud that I made the second proprietary OS boot under QEMU. Or even the third one, because Solaris/sun4m is quite different from Solaris/sun4v, so probably each one counts for one.

But now I'm puzzled. Initially I planned to build a throw-away prototype for Powerstack II Utah, fix the QEMU bugs preventing AIX from booting on the existing machines, and dispose of the prototype. The Motorola AIX I've got seems to expect the machine to be called "MOT PowerStack2 (e0)". I'm not sure whether pushing such a model upstream would cause copyright/trademark issues.

Now it boots to the same point as on the physical machine, but apart from the PCI layout problems I haven't found any bugs. And the PCI layout cannot be the reason for AIX failing on the 40p machine, because there it fails much earlier.

So obviously it's time to change the plan. And here is where things are going to slow down. I don't have the install media for the Motorola AIX 4.2, so I cannot make my disk usable: it waits for the NFS/NIS servers forever. I cannot share the HDD image because it may contain private data, so publishing the Powerstack II Utah target for QEMU makes little sense for now. So either I find the Motorola AIX 4.2 install CD (and probably a physical UW-SCSI drive, if it doesn't work at the first attempt under QEMU), or the work has to be done again for the 40p target. Yes, I now have some know-how about the AIX boot process, but at the moment I don't feel like starting again from scratch with the 40p target.

The good news is that AIX can definitely be booted under qemu-system-ppc.

Saturday, September 9, 2017

From SCSI to PCI

20 years ago SCSI devices ruled the world. Reading the NCR53c810 manual, I see that it was basically a computer in a computer. It can be programmed to transfer data from/to/between the disks without using the CPU at all.

Trying to understand what happens in the NCR/LSI script:

lsi_scsi: Select LUN 0
lsi_scsi: Extended message 0x1 (len 3)
lsi_scsi: SDTR (ignored)
lsi_scsi: SCRIPTS dsp=81c3b924 opcode 80080000 arg 81c3b9a4
lsi_scsi: Jump to 0x81c3b9a4
lsi_scsi: SCRIPTS dsp=81c3b9a4 opcode 870b0000 arg 81c3b9c4
lsi_scsi: Compare phase 2 == 7
lsi_scsi: Control condition failed
lsi_scsi: SCRIPTS dsp=81c3b9ac opcode 860a0000 arg 81c3b94c
lsi_scsi: Compare phase 2 == 6
lsi_scsi: Control condition failed
lsi_scsi: SCRIPTS dsp=81c3b9b4 opcode 98080000 arg 00000022
lsi_scsi: Interrupt 0x00000022
...
lsi_scsi: SCRIPTS dsp=81c3bb6c opcode 0e000002 arg 81c3bdb4
lsi_scsi: MSG out len=2
lsi_scsi: Select LUN 0
lsi_scsi: MSG: ABORT tag=0x0
lsi_scsi: SCRIPTS dsp=81c3bb74 opcode 80080000 arg 81c3bbbc
lsi_scsi: Jump to 0x81c3bbbc
lsi_scsi: SCRIPTS dsp=81c3bbbc opcode 60000008 arg 00000000
lsi_scsi: Clear ATN

Looks like it aborts if the selected SCSI target doesn't change phase to MSG_OUT or MSG_IN. So I implemented a hack for the SDTR reply, and it doesn't abort here anymore. But in fact it's a red herring: AIX can also work with devices which do not support synchronous or wide transfers.
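
To follow the "Compare phase" lines it helps to decode the first word of a Transfer Control instruction by hand. The sketch below reflects my reading of the 53C8xx instruction format and of what the QEMU log prints; the bit positions (opcode in bits 29-27, phase in bits 26-24, compare-phase flag in bit 17) should be treated as assumptions and checked against the chip manual. It handles only the Transfer Control class, and the function name is made up.

```python
PHASES = {0: "DATA_OUT", 1: "DATA_IN", 2: "CMD", 3: "STATUS",
          6: "MSG_OUT", 7: "MSG_IN"}
TC_OPCODES = {0: "JUMP", 1: "CALL", 2: "RETURN", 3: "INT"}

def decode_transfer_control(insn):
    """Decode (opcode, compared phase or None, compare-phase flag)."""
    if insn >> 30 != 2:
        raise ValueError("not a Transfer Control instruction")
    opcode = TC_OPCODES.get((insn >> 27) & 7, "?")
    compare_phase = bool(insn & (1 << 17))
    phase = PHASES.get((insn >> 24) & 7, "?") if compare_phase else None
    return opcode, phase, compare_phase

for word in (0x80080000, 0x870b0000, 0x860a0000, 0x98080000):
    print(f"{word:08x}", decode_transfer_control(word))
```

Read this way, the script jumps on MSG_IN, then on MSG_OUT, and raises interrupt 0x22 when neither comparison matches the current phase (2, i.e. CMD, in the log).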

The actual problem happens later:

lsi_scsi: SCRIPTS dsp=81c43444 opcode c0000004 arg 010000dc
lsi_scsi: memcpy dest 0x81c435fc src 0x010000dc count 4

Or, with slightly enhanced logging:

lsi_scsi: memcpy dest 0x81c475fc (Mem) src 0x010000dc (IO) count 4
lsi_mem_read, address_space_read status 2
lsi_scsi: the first 4 bytes: 00 00 00 00

It tries to read the port 0x10000dc and save it. QEMU doesn't have anything at the port 0x10000dc, so no wonder the NCR script fails. But what is supposed to be there? The Motorola Ultra 603/Ultra 603e/Ultra 604 Programmer’s Reference Guide suggests it must be the PCI I/O space.

So it looks like I've learned enough of the NCR/LSI script. Time to see how PCI bus mastering is supposed to work on this machine.

Saturday, September 2, 2017

More fun with AIX cfgncr_scsi

<in the previous part>... doesn't find SCSI disks. Here it is tricky, it may be a problem with the interrupt routing, or DMA or SCSI host emulation...

... or a bug in the AIX driver itself.

When AIX 4.2 tries to perform a SCSI inquiry, this is what shows up in the QEMU log:

(qemu) lsi_scsi: Write reg ??? ac = e4
lsi_scsi: Write reg ??? ad = 38
lsi_scsi: Write reg ??? ae = c4
lsi_scsi: Write reg ??? af = 81

The register at 0xac-0xaf is the DSA Relative Selector (DRS). It is known to QEMU, but doesn't seem to be used in any operations.
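
The four byte writes are easier to judge once they are glued back into one 32-bit value. A small sketch, assuming the register file is little-endian (lowest offset holds the least significant byte); the helper name assemble_le is made up:

```python
def assemble_le(writes):
    """Combine {offset: byte} writes into (base offset, little-endian value)."""
    base = min(writes)
    value = 0
    for offset, byte in writes.items():
        value |= byte << (8 * (offset - base))
    return base, value

drs_writes = {0xac: 0xe4, 0xad: 0x38, 0xae: 0xc4, 0xaf: 0x81}
dsp_writes = {0x2c: 0xe4, 0x2d: 0x58, 0x2e: 0xc4, 0x2f: 0x81}

for writes in (drs_writes, dsp_writes):
    base, value = assemble_le(writes)
    print(f"reg {base:#x} = {value:#010x}")
```

The first set gives 0x81c438e4. The second set is the DSP0..DSP3 sequence from the log further down in this post, and it yields 0x81c458e4, exactly the dsp value QEMU prints there, which supports the byte-order assumption.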

The newer LSI53c1010-66 manual says:

"This register supplies AD[63:32] during Table Indirect
Fetches and Load/Store Data Structure Address (DSA)
relative operations"

So, maybe just add support for this register to QEMU and allow 64-bit DMA transfers, right?

Wrong. The write to this register is the last write, and it doesn't start any SCSI command. Let's look at where it happens:

p8xx_start_chip:
...
   0x018ff854:  stw     r8,-4(r7)
   0x018ff858:  li      r4,44         ; 0x2c
   0x018ff85c:  b       0x18fe348     ; p8xx_write_reg <= write happens here

The register r4 is 0x2c, but the procedure writes to 0xac. Weird.

Let's look at the other registers:

0x018fe368 in ?? ()
(gdb) info registers r3 r5 r4
r3             0x18f5000        26169344
r5             0x31000080       822083712
r4             0x2c     44

What's that 80 at the end of r5? 0x80 + 0x2c is 0xac. Coincidence? Don't think so.

So, what happens here is the driver tries to write 0x2c, but the bus is shifted, so it hits 0xac. After some chasing I found where this shift is coming from:

p8xx_config:
...
   0x018fdb9c:  bl      0x1909938
   0x018fdba0:  lwz     r2,20(r1)
   0x018fdba4:  cmpwi   cr1,r3,19
   0x018fdba8:  beq     cr1,0x18fdbc8
   0x018fdbac:  li      r8,1
   0x018fdbb0:  stb     r8,256(r28)
   0x018fdbb4:  li      r3,0
   0x018fdbb8:  bl      0x1909938
   0x018fdbbc:  lwz     r2,20(r1)
   0x018fdbc0:  lwz     r8,160(r28)
   0x018fdbc4:  b       0x18fdbd0
   0x018fdbc8:  stb     r29,256(r28)
   0x018fdbcc:  lwz     r8,160(r28)
   0x018fdbd0:  lis     r11,4096
   0x018fdbd4:  addic   r10,r8,128    ; this is the 0x80 I'm looking for
   0x018fdbd8:  li      r8,-1
   0x018fdbdc:  rlwinm  r31,r26,1,15,30
   0x018fdbe0:  addic   r23,r28,10580
   0x018fdbe4:  stw     r10,252(r28)
   0x018fdbe8:  li      r25,1
   0x018fdbec:  addi    r30,r28,0
   0x018fdbf0:  stwu    r25,10512(r30)
   0x018fdbf4:  stw     r8,10588(r28)
   0x018fdbf8:  lwz     r8,10500(r28)
   0x018fdbfc:  stw     r11,10520(r28)
   0x018fdc00:  stw     r10,10532(r28) ; and here it is stored
...

It is added and stored unconditionally. If I drop this addic, something different happens:

(qemu) lsi_scsi: Write reg DSP0 2c = e4
lsi_scsi: Write reg DSP1 2d = 58
lsi_scsi: Write reg DSP2 2e = c4
lsi_scsi: Write reg DSP3 2f = 81
lsi_scsi: SCRIPTS dsp=81c458e4 opcode 41000000 arg 81c45a44
lsi_scsi: Selected target 0
lsi_scsi: SCRIPTS dsp=81c458ec opcode 78370000 arg 00000000
lsi_scsi: Read-Modify-Write reg 0x37 MOV data8=0x00 sfbr=0x00
...

Why would it work on the physical hardware? I guess because the addresses are aliased. Pretty similar to the le bug in Solaris.
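
The aliasing guess can be written down as a tiny model. It assumes, and this is only my assumption, that the physical chip decodes just the low 7 bits of the register offset, so that the upper half of the 256-byte window mirrors the lower half. The names REG_WINDOW and decoded_offset are made up for the illustration.

```python
REG_WINDOW = 0x80            # assumed size of the really decoded register window
DRIVER_BASE_ADJUST = 0x80    # the 128 added by "addic r10,r8,128"

def decoded_offset(bus_offset, window=REG_WINDOW):
    """Offset the chip would see if it ignores the address bits above the window."""
    return bus_offset & (window - 1)

for reg in (0x2c, 0x2d, 0x2e, 0x2f):     # DSP0..DSP3
    bus = DRIVER_BASE_ADJUST + reg
    print(f"driver writes {bus:#x} -> aliased chip sees {decoded_offset(bus):#x}")
```

An emulation that keeps distinct registers at 0xac-0xaf never sees the DSP write, which is exactly the behaviour in the first log.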

So, it's not that QEMU has some unimplemented registers. In this case it has too many implemented ones.

On the other hand, it still doesn't detect the SCSI disk, so maybe it has not just too many features, but too few as well...

/Stay tuned

Saturday, August 26, 2017

Milestone: my OFW boots AIX on Powerstack II Utah

I created residual data for the PCI bus, the serial port and the SCSI adapter. This is the minimal set needed to boot AIX. And AIX is booting using this residual data, not hard-coded values. So now I have a reference firmware which works on a physical machine. Here is the complete boot log (mostly for search engines and digital archaeologists):

MOT PowerStack2 (e0), Serial #0, 62 MiB memory installed
Open Firmware , Built  August 25, 2017 23:09:26
Copyright (c) 1995-2000, FirmWorks.
Copyright (c) 2014,2017, Artyom Tarasenko.

ok boot /scsi/disk@6 -s prompt

Boot device: /scsi/disk@6  Arguments: -s prompt
 0) - DISABLE_PARITY                   16) - ENABLE_END_STOP
 1) - DISABLE_DCACHE                   17) - ENABLE_DEBUG
 2) - DISABLE_ICACHE                   18) + DISABLE_VME
 3) - DISABLE_L2                       19) + ENABLE_BH_IDE_DMA
 4) - DISABLE_SSCALAR                  20) + WINBOND_PATCH
 5) - DISABLE_BHIST                    21) - MANUAL_SCSI_TYPE
 6) - DISABLE_CPU_EMCP                 
 7) - DISABLE_EAGLE_CF_DPARK           
 8) - DISABLE_EAGLE_CF_APARK           
 9) - DISABLE_LEDS                     
10) - EAGLE_ERR_STATUS_RESET           
11) + DISABLE_MASTER_ABORT             
12) - AIX_USES_BUG                     
13) + JUNO_DISCONTIGUOUS               
14) - LLDB_STOP                        
15) - SERVICE_MODE                     31) - DISABLE_HARDSTOPS
Enter bit # to toggle (just <CR> to end): 17
 0) - Top level debug - function names 
 1) - Main line debug messages         
 2) - Subroutine internal messages     
 3) - PCI Bridge settings              
                                       
                                       
                                       
10) - GEV Data debug                   
                                       
12) - IPLCB data                       
                                       
                                       
15) - IPL control block offsets        
Enter bit #(s) to toggle, '*' for ALL enabled, 'C' to clear ALL, 
or just <CR> to return): *
 0) - DISABLE_PARITY                   16) - ENABLE_END_STOP
 1) - DISABLE_DCACHE                   17) + ENABLE_DEBUG
 2) - DISABLE_ICACHE                   18) + DISABLE_VME
 3) - DISABLE_L2                       19) + ENABLE_BH_IDE_DMA
 4) - DISABLE_SSCALAR                  20) + WINBOND_PATCH
 5) - DISABLE_BHIST                    21) - MANUAL_SCSI_TYPE
 6) - DISABLE_CPU_EMCP                 
 7) - DISABLE_EAGLE_CF_DPARK           
 8) - DISABLE_EAGLE_CF_APARK           
 9) - DISABLE_LEDS                     
10) - EAGLE_ERR_STATUS_RESET           
11) + DISABLE_MASTER_ABORT             
12) - AIX_USES_BUG                     
13) + JUNO_DISCONTIGUOUS               
14) - LLDB_STOP                        
15) - SERVICE_MODE                     31) - DISABLE_HARDSTOPS
Enter bit # to toggle (just <CR> to end): 
0x4fd0 Hints relocation
0x5000 SoftROS start() after relocation
0x39728 SoftROS end after relocations
0x3951c SoftROS start of bss
0x39727 SoftROS end of bss
0x39750 Current sbrk(0)
0x596ac Current stack
Space reserved for kernel when IPLCB is being built @ 0x59728
Original bootimage located @ 0x400000
hi->signature = 0x4149584d
hi->resid_data_address = 0x3dd7970
hi->bss_offset = 0x3951c
hi->bss_length = 0x20c
hi->jump_offset = 0x38c
hi->load_exec_address = 0x400430
hi->header_size = 0x400
hi->header_block_size = 0x25e
hi->image_length = 0x4b75b
hi->Spare = 0x3cb419c
hi->res_mem_size = 0x0
hi->mode_control = 0xdead00c0
LED(MOTLED_CHECKING_HARDSTOPS)=0x130c
Magic is 0x01DF0004 
Image size .............. 0x0035A000
Boot image loaded at .... 0x0044BC00
Saved address for jump .. 0x0000038C
LED(MOTLED_INVALID_BOOT_IMAGE)=0x1310
LED(MOTLED_FIRST_KERNEL_MOVE)=0x1308
LED(MOTLED_HARDWARE_INIT)=0x130b
mot_gencmd.c ====> pj_motorola
mot_gencmd.c ====> Machine Check Pin was disabled by F/W

LED(MOTLED_ENABLING_CPU_EMCP)=0x1318
LED(MOTLED_ENABLING_DCACHE)=0x131a
LED(MOTLED_ENABLING_604_SSCALAR)=0x131c
LED(MOTLED_ENABLING_604_BHIST)=0x131d
LED(MOTLED_HARDWARE_INIT_COMPLETE)=0x1320
LED(MOTLED_IPLCB_INIT)=0x1309
iplcb_init.c ====> - iplcb_init()
iplcb_init.c ====> - mem_find()
Mem_addr = 0x3c74000, byte_index = 0x1ef, bit_index = 0x4
Returned from mem_find:
IPLCB addr: ..... 0x03C74000 len = 49152
DMA buffer addr:  0x00FF8000
Memory bitmap addr0x03C7FE10
Serial # from residual data:  4d 4f 54 30 45 32 33 34 41 43 20 20 20 20 20 20
nvram_addr = 0x0074, nvram_data = 0x0077
Name = board-init?  <--> Value = true  Match = FALSE
Name = use-default-vals?  <--> Value = true  Match = FALSE
Name = edo-memory?  <--> Value = false  Match = FALSE
Name = pboot-probe?  <--> Value = false  Match = FALSE
Name = pboot-device-default  <--> Value = fdisk0 hdisk0 enet0  Match = FALSE
Name = fcode-debug?  <--> Value = true<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF>
<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF>
<FF><FF>  Match = FALSE
Name = fw-boot-device  <--> Value = /pci@80000000/pci1000,3@2,0/harddisk@6,0  Match = TRUE
The IPLCB previpl_device string is:
        !!^A/pci@80000000/pci1000,3@2,0/harddisk@6,0
processor = 0x00000009
presoftros.c ====> - __mot_eth_addr()
Memory address of IPL Control Block = 0x03C74000
Directory: ......... 0x03C74080    offset: 0x00000080
IPL_info: .......... 0x03C742E0    offset: 0x000002E0
System area: ....... 0x03C74878    offset: 0x00000878
Buc Data area: ..... 0x03C749F4    offset: 0x00000914
Processor data area: 0x03C74A64    offset: 0x00000A64
Network data area: . 0x03C74D7C    offset: 0x00000D7C
Memory data area: .. 0x03C7677C    offset: 0x0000277C
L2_cache data area:  0x03C768B4    offset: 0x000028B4
Residual data area:  0x03C76974    offset: 0x00002974
ros_table area: .... 0x03C7D3E0
NVRAM cache area: .. 0x03C7D4C8    offset: 0x000094C8
Parameters passed to kernel boot image:
0x3c74000, 0xbe10, 0x4000, 0x1f0
LED(MOTLED_IPLCB_DUMP)=0x130a
IPLD.ipl_info_offset = 0x2e0
IPLD.ipl_info_size = 0x598
IPLD.system_info_offset = 0x878
IPLD.system_info_size = 0x9c
IPLD.buc_info_offset = 0x914
IPLD.buc_info_size = 0x150
IPLD.processor_info_offset = 0xa64
IPLD.processor_info_size = 0x318
IPLD.mem_data_offset = 0x277c
IPLD.mem_data_size = 0x138
IPLD.l2_data_offset = 0x28b4
IPLD.l2_data_size = 0xc0
IPLD.bit_map_offset = 0xbe10
IPLD.bit_map_size = 0x1f0
(IPLD.processor_info_size == sizeof(PROCESSOR_DATA)) failed
IPLD.user_struct_offset = 0x9380
IPLD.user_struct_size = 0x10
user_info->user_data_offset = 0x9390
user_info->user_data_len = 0x50
IPLD.nvram_cache_offset = 0x94c8
IPLD.nvram_cache_size = 0x2000
ipl_info->model = 0x80000e0
ipl_info->ram_size = 0x3e00000
ipl_info->bit_map_bytes_per_bit = 0x4000
ipl_info->ros_entry_table_ptr = 0x3c7d3e0
ipl_info->ros_entry_table_size = 0xe8
ipl_info->nvram_section_1_valid = 0x1
ipl_info->vpd_processor_serial_number = "00E20000"
ipl_info->previpl_device[0] = 0x21
ipl_info->previpl_device[1] = 0x21
ipl_info->previpl_device[2] = 0x1
ipl_info->previpl_device[3] = 0x2f
ipl_info->previpl_device[4] = 0x70
ipl_info->previpl_device[5] = 0x63
ipl_info->previpl_device[6] = 0x69
ipl_info->previpl_device[7] = 0x40
ipl_info->previpl_device[8] = 0x38
ipl_info->previpl_device[9] = 0x30
ipl_info->previpl_device[10] = 0x30
ipl_info->previpl_device[11] = 0x30
ipl_info->previpl_device[12] = 0x30
ipl_info->previpl_device[13] = 0x30
ipl_info->previpl_device[14] = 0x30
ipl_info->previpl_device[15] = 0x30
ipl_info->previpl_device[16] = 0x2f
ipl_info->previpl_device[17] = 0x70
ipl_info->previpl_device[18] = 0x63
ipl_info->previpl_device[19] = 0x69
ipl_info->previpl_device[20] = 0x31
ipl_info->previpl_device[21] = 0x30
ipl_info->previpl_device[22] = 0x30
ipl_info->previpl_device[23] = 0x30
ipl_info->previpl_device[24] = 0x2c
ipl_info->previpl_device[25] = 0x33
ipl_info->Power_Status_and_keylock_reg = 0x3
buc_info_ptr->num_of_structs = 0x3
buc_info_ptr->index = 0x1
buc_info_ptr->struct_size = 0x70
buc_info_ptr->bsrr_offset = 0x0
buc_info_ptr->bsrr_mask = 0x0
buc_info_ptr->bscr_value = 0x0
buc_info_ptr->cfg_status = 0x2
buc_info_ptr->device_type = 0x5
buc_info_ptr->num_of_buids = 0x0
buc_info_ptr->buid_data[0].buid_value = 0xffffffff
buc_info_ptr->buid_data[0].buid_Sptr = 0x0
buc_info_ptr->buid_data[1].buid_value = 0xffffffff
buc_info_ptr->buid_data[1].buid_Sptr = 0x0
buc_info_ptr->buid_data[2].buid_value = 0xffffffff
buc_info_ptr->buid_data[2].buid_Sptr = 0x0
buc_info_ptr->buid_data[3].buid_value = 0xffffffff
buc_info_ptr->buid_data[3].buid_Sptr = 0x0
buc_info_ptr->mem_alloc1 = 0x8000
buc_info_ptr->mem_addr1 = 0xff8000
buc_info_ptr->mem_alloc2 = 0x0
buc_info_ptr->mem_addr2 = 0x0
buc_info_ptr->vpd_rom_width = 0xffffffff
buc_info_ptr->cfg_addr_inc = 0x0
buc_info_ptr->device_id_reg = 0x2040
buc_info_ptr->aux_info_offset = 0x0
buc_info_ptr->feature_rom_code = 0x0
buc_info_ptr->IOCC_flag = 0x0
buc_info_ptr->location[0] = 0x30
buc_info_ptr->location[1] = 0x30
buc_info_ptr->location[2] = 0x30
buc_info_ptr->location[3] = 0x30
buc_info_ptr->num_of_structs = 0x3
buc_info_ptr->index = 0x2
buc_info_ptr->struct_size = 0x70
buc_info_ptr->bsrr_offset = 0x0
buc_info_ptr->bsrr_mask = 0x0
buc_info_ptr->bscr_value = 0x0
buc_info_ptr->cfg_status = 0x2
buc_info_ptr->device_type = 0x5
buc_info_ptr->num_of_buids = 0x2
buc_info_ptr->buid_data[0].buid_value = 0x100
buc_info_ptr->buid_data[0].buid_Sptr = 0x80000000
buc_info_ptr->buid_data[1].buid_value = 0x10100
buc_info_ptr->buid_data[1].buid_Sptr = 0xc0000000
buc_info_ptr->buid_data[2].buid_value = 0xffffffff
buc_info_ptr->buid_data[2].buid_Sptr = 0x0
buc_info_ptr->buid_data[3].buid_value = 0xffffffff
buc_info_ptr->buid_data[3].buid_Sptr = 0x0
buc_info_ptr->mem_alloc1 = 0x0
buc_info_ptr->mem_addr1 = 0x0
buc_info_ptr->mem_alloc2 = 0x0
buc_info_ptr->mem_addr2 = 0x0
buc_info_ptr->vpd_rom_width = 0xffffffff
buc_info_ptr->cfg_addr_inc = 0x0
buc_info_ptr->device_id_reg = 0x2020
buc_info_ptr->aux_info_offset = 0x0
buc_info_ptr->feature_rom_code = 0x0
buc_info_ptr->IOCC_flag = 0x1
buc_info_ptr->location[0] = 0x30
buc_info_ptr->location[1] = 0x30
buc_info_ptr->location[2] = 0x31
buc_info_ptr->location[3] = 0x30
sys_info->nvram_size = 0x0
sys_info->nvram_addr = 0x0
sys_info->todr_addr = 0x0
sys_info->architecture = 0x2
sys_info->implementation = 0x3
sys_info->pkg_descriptor="MOT3F00"
proc_info->num_of_structs = 0x1
proc_info->index = 0x0
proc_info->struct_size = 0xc8
proc_info->per_buc_info_offset = 0x3c749f4
proc_info->proc_int_area = 0x0
proc_info->proc_int_area_size = 0x0
proc_info->processor_present = 0x1
proc_info->test_run = 0xd5
proc_info->test_stat = 0x0
proc_info->link = 0x0
proc_info->link_address = 0x0
proc_info->phys_id = 0x0
proc_info->priv_lck_cnt = 0x0
proc_info->prob_lck_cnt = 0x0
proc_info->architecture = 0x2
proc_info->implementation = 0x10
proc_info->width = 0x20
proc_info->cache_attrib = 0x1
proc_info->icache_size = 0x8000
proc_info->dcache_size = 0x8000
proc_info->icache_asc = 0x4
proc_info->dcache_asc = 0x4
proc_info->tlb_attrib = 0x1
proc_info->itlb_size = 0x80
proc_info->dtlb_size = 0x80
proc_info->itlb_asc = 0x2
proc_info->dtlb_asc = 0x2
proc_info->slb_attrib = 0x0
proc_info->islb_size = 0x0
proc_info->dslb_size = 0x0
proc_info->islb_asc = 0x0
proc_info->dslb_asc = 0x0
proc_info->rtc_type = 0x2
proc_info->rtcXint = 0x0
proc_info->rtcXfrac = 0x0
proc_info->tbCfreq_HZ = 0x7f2815
proc_info->busCfreq_HZ = 0x0
proc_info->version = 0x50000
proc_info->L2_cache_size = 0x0
proc_info->L2_cache_asc = 0x0
proc_info->coherency_size = 0x20
proc_info->resv_size = 0x20
proc_info->icache_block = 0x20
proc_info->dcache_block = 0x20
proc_info->icache_line = 0x20
proc_info->dcache_line = 0x20
proc_info->proc_descriptor = "PowerPC_604"
l2_data->num_of_structs = 0x1
l2_data->index = 0x0
l2_data->struct_size = 0xc0
l2_data->shared_L2_cache = 0x0
l2_data->using_resource_offset = 0xa64
l2_data->mode = 0x0
l2_data->installed_size = 0x0
l2_data->configured_size = 0x0
l2_data->size[0] = 0x0
l2_data->type[0] = 0x30
l2_data->type[1] = 0x30
l2_data->adapter_present = 0x0
l2_data->adapter_bad = 0x0
mem_data[i].num_of_structs = 0x6
mem_data[i].struct_size = 0x34
mem_data[i].card_or_SIMM_size = 0x3e
mem_data[i].state = 0x1
mem_data[i].num_of_bad_simms = 0x0
mem_data[i].card_or_simm_indicator = 0x1
mem_data[i].EC_level = 0x0
mem_data[i].PD_bits = 0x0
mem_data[i].location [0][0] = 0x30
mem_data[i].location [0][1] = 0x30
mem_data[i].location [0][2] = 0x30
mem_data[i].num_of_structs = 0x6
mem_data[i].struct_size = 0x34
mem_data[i].card_or_SIMM_size = 0x0
mem_data[i].state = 0x0
mem_data[i].num_of_bad_simms = 0x0
mem_data[i].card_or_simm_indicator = 0x1
mem_data[i].EC_level = 0x0
mem_data[i].PD_bits = 0x0
mem_data[i].location [0][0] = 0x30
mem_data[i].location [0][1] = 0x30
mem_data[i].location [0][2] = 0x30
mem_data[i].location [0][3] = 0x42
mem_data[i].num_of_structs = 0x6
mem_data[i].struct_size = 0x34
mem_data[i].card_or_SIMM_size = 0x0
mem_data[i].state = 0x0
mem_data[i].num_of_bad_simms = 0x0
mem_data[i].card_or_simm_indicator = 0x1
mem_data[i].EC_level = 0x0
mem_data[i].PD_bits = 0x0
mem_data[i].location [0][0] = 0x30
mem_data[i].location [0][1] = 0x30
mem_data[i].location [0][2] = 0x30
mem_data[i].location [0][3] = 0x43
mem_data[i].num_of_structs = 0x6
mem_data[i].struct_size = 0x34
mem_data[i].card_or_SIMM_size = 0x0
mem_data[i].state = 0x0
mem_data[i].num_of_bad_simms = 0x0
mem_data[i].card_or_simm_indicator = 0x1
mem_data[i].EC_level = 0x0
mem_data[i].PD_bits = 0x0
mem_data[i].location [0][0] = 0x30
mem_data[i].location [0][1] = 0x30
mem_data[i].location [0][2] = 0x30
mem_data[i].location [0][3] = 0x44
mem_data[i].num_of_structs = 0x6
mem_data[i].struct_size = 0x34
mem_data[i].card_or_SIMM_size = 0x0
mem_data[i].state = 0x0
mem_data[i].num_of_bad_simms = 0x0
mem_data[i].card_or_simm_indicator = 0x1
mem_data[i].EC_level = 0x0
mem_data[i].PD_bits = 0x0
mem_data[i].location [0][0] = 0x30
mem_data[i].location [0][1] = 0x30
mem_data[i].location [0][2] = 0x30
mem_data[i].location [0][3] = 0x45
mem_data[i].num_of_structs = 0x6
mem_data[i].struct_size = 0x34
mem_data[i].card_or_SIMM_size = 0x0
mem_data[i].state = 0x0
mem_data[i].num_of_bad_simms = 0x0
mem_data[i].card_or_simm_indicator = 0x1
mem_data[i].EC_level = 0x0
mem_data[i].PD_bits = 0x0
mem_data[i].location [0][0] = 0x30
mem_data[i].location [0][1] = 0x30
mem_data[i].location [0][2] = 0x30
mem_data[i].location [0][3] = 0x46
user_info->user_data_len = 0x50
user_info->user_id_offset = 0x9390
user_info->next_offset = 0x0
mot_data->company = "Motorola Computer Group"
mot_data->board_model = 0x6
mot_data->board_revision = 0x42
mot_data->ethernet_na = 45 55 55 55 45 55
LED(MOTLED_RELOCATING_KERNEL)=0x13e0
+ swcons -c

Saving Base Customize Data to boot disk
Starting the sync daemon
Starting the error daemon
System initialization completed.
Starting Multi-user Initialization
 Performing auto-varyon of Volume Groups
 Activating all paging spaces
swapon: Paging device /dev/hd6 activated.
 Performing all automatic mounts
mount: 1831-010 server a2231s01 not responding: RPC: 1832-018 Port mapper failure - RPC: 1832-006 Unable to send
mount: backgrounding
a2231s01:/home
Multi-user initialization completed
Checking for srcmstr active...complete
Starting tcpip daemons:
0513-059 The syslogd Subsystem has been started. Subsystem PID is 3434.
0513-059 The sendmail Subsystem has been started. Subsystem PID is 5232.
0513-059 The portmap Subsystem has been started. Subsystem PID is 5494.
0513-059 The inetd Subsystem has been started. Subsystem PID is 5756.
0513-059 The snmpd Subsystem has been started. Subsystem PID is 6018.
0513-059 The dpid2 Subsystem has been started. Subsystem PID is 6280.
0513-059 The muxatmd Subsystem has been started. Subsystem PID is 6542.
0513-059 The fibred Subsystem has been started. Subsystem PID is 6804.
vmtune:  current values:
  -p       -P        -r          -R         -f       -F       -N        -W
minperm  maxperm  minpgahead maxpgahead  minfree  maxfree  pd_npages maxrandwrt
   2968    11872       2          8        115      123     524288        0

  -M       -w       -k       -c         -b          -B          -u
maxpin   npswarn  npskill  numclust  numfsbufs   hd_pbuf_cnt  lvm_bufcnt
  12692     1536      384        0       93           64           9

number of valid memory pages = 15864    maxperm=74.8% of real memory
maximum pinable=80.0% of real memory    minperm=18.7% of real memory
number of file memory pages = 1445      numperm=9.1% of real memory
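
Some fields of the dump above are easier to read once decoded. A minimal sketch in Python: previpl_device looks like an Open Firmware device path stored as ASCII codes after three leading bytes whose meaning is not clear from the dump (skipping exactly 3 bytes is an assumption), while ram_size and tbCfreq_HZ are plain hex numbers. The helper name printable_tail is made up for this example.

```python
# Bytes copied from the ipl_info->previpl_device[0..25] lines of the dump.
PREVIPL_DEVICE = [
    0x21, 0x21, 0x01, 0x2f, 0x70, 0x63, 0x69, 0x40, 0x38, 0x30, 0x30, 0x30, 0x30,
    0x30, 0x30, 0x30, 0x2f, 0x70, 0x63, 0x69, 0x31, 0x30, 0x30, 0x30, 0x2c, 0x33,
]


def printable_tail(data, skip):
    """Decode data[skip:] as ASCII, showing non-printable bytes as '.'."""
    return "".join(chr(b) if 0x20 <= b < 0x7f else "." for b in data[skip:])


# Assumption: the first three bytes are some kind of header, not path text.
device_path = printable_tail(PREVIPL_DEVICE, 3)

# ipl_info->ram_size and proc_info->tbCfreq_HZ, taken from the dump as-is.
ram_mib = 0x3e00000 // (1024 * 1024)
timebase_hz = 0x7f2815

print(device_path)
print(ram_mib, "MiB")
print(timebase_hz, "Hz")
```

Under these assumptions the path decodes to /pci@80000000/pci1000,3, the RAM size to 62 MiB (which agrees with card_or_SIMM_size = 0x3e) and the timebase frequency to 8333333 Hz.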

The next step would be to get it working under QEMU.
Under QEMU it gets pretty far: it finds the PCI and ISA buses and even the SCSI host.

Unfortunately it doesn't find the SCSI disks. This part is tricky: it may be a problem with the interrupt routing, the DMA, or the SCSI host emulation.

/Stay tuned

Sunday, July 23, 2017

Wiretapping AIX

I identified a couple of kernel and shared library functions, so I'm not poking around in the dark anymore.

First of all I found execv. It gives a lot of insight into the AIX boot process, which is quite different from a Linux or Solaris boot. The kernel is small, and it actually already loads, even under QEMU. Most other operating systems would write a greeting once the kernel is loaded; AIX does it all silently. On IBM machines there is an LED panel showing one byte of status. On the Motorola there are just two LEDs which can light green or yellow, which altogether gives just 9 combinations. Not very informative. But even if I had one byte, it still would not help. I'm looking for error messages like "missing property", "unknown PCI chip", "missing residual data", etc.

The initialization of the PCI bus happens long after the kernel spawns the /etc/init process.

Breakpoint 20, 0x0008cd38 in ?? ()
(gdb) x/s $r3
0x20051d08:     "/etc/methods/defsys"
(gdb) c
Continuing.
Breakpoint 20, 0x0008cd38 in ?? ()
(gdb) x/s $r3
0x2ff22090:     "/bin/sh"
(gdb) c
Continuing.
Breakpoint 20, 0x0008cd38 in ?? ()
(gdb) x/s $r3
0x20051d28:     "/usr/lib/methods/cfgsys_MOT3F00"       <= here is where it can't find the PCI bus
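
A session like the one above can be post-processed to list the programs in the order they were exec'ed. A small sketch in Python, assuming the gdb output was saved as text and that every answer to x/s $r3 is the path handed to execv (under the PowerPC calling convention the first argument arrives in r3). The names GDB_LOG and exec_paths are made up for this example.

```python
import re

# Copy of the gdb session shown above (without the hand-written annotation).
GDB_LOG = '''\
Breakpoint 20, 0x0008cd38 in ?? ()
(gdb) x/s $r3
0x20051d08:     "/etc/methods/defsys"
(gdb) c
Continuing.
Breakpoint 20, 0x0008cd38 in ?? ()
(gdb) x/s $r3
0x2ff22090:     "/bin/sh"
(gdb) c
Continuing.
Breakpoint 20, 0x0008cd38 in ?? ()
(gdb) x/s $r3
0x20051d28:     "/usr/lib/methods/cfgsys_MOT3F00"
'''

# An "x/s" answer looks like: 0xADDRESS:<whitespace>"string"
X_S_LINE = re.compile(r'^0x[0-9a-fA-F]+:\s+"([^"]*)"')


def exec_paths(log):
    """Return the quoted strings of all x/s answers, in order of appearance."""
    paths = []
    for line in log.splitlines():
        match = X_S_LINE.match(line)
        if match:
            paths.append(match.group(1))
    return paths


for path in exec_paths(GDB_LOG):
    print(path)
```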

Then I found the printf and sprintf functions. Although AIX doesn't write anything to the screen, it still collects the boot log messages, so wiretapping printf and fprintf helps to see them.

The house is still dark but now I have a searchlight. So whatever bugs are there, beware, you are going to be seen soon!

Saturday, July 22, 2017

Debugging AIX 4.2 boot

I wonder if it is possible to make the AIX 4.2 boot more verbose.
Various sources say that it should be done via
 
mw enter_dbg

under KDB. The AIX version I have doesn't have it. In fact it doesn't even have an option to disassemble a piece of code. Just a hardcore hex dump, pretty much like it was in the eighties.

That feeling when you start with retrocomputing and end up with steampunk computing.

ok  boot /scsi/disk@6 -s trap
Trap instruction interrupt.
> mw enter_dbg
032-001  You entered a command «mw» that is not valid.
> help
alter   … (a)lter — alter memory
back    … (b)ack — decrement the IAR
ditto   … «» — blank repeats the last command
break   … (br)eak — set a breakpoint
breaks  … (breaks) — list currently set breakpoints
buckets … (bu)ckets — display kmembucket structures
clear   … (c)lear — clear breakpoint(s)
display … (d)isplay — display a specified amount of memory
dmodsw  … (dm)odsw — display Streams dmodsw table
drivers … (dr)ivers — display device driver (devsw) table
find    … (f)ind — find a string in memory
float   … (fl)oat — display floating point registers
fmodsw  … (fm)odsw — display Streams fmodsw table
fs      … fs — display file system data structures
go      … (g)o — start executing the program
help    … (h)elp — display the list of valid commands
loop    … (l)oop — execute until control returns to this point
map     … (m)ap — display the system loadlist
mblk    … (mb)lk — display mblk/kmemstat structures
next    … (n)ext — increment the IAR
origin  … (o)rigin — set the origin
proc    … (p)roc — process table display
quit    … (q)uit — end the debugger session
queue   … (que)ue — display Streams queues
reset   … (r)eset — release a user defined variable
restore … (re)store — restore or do not restore the screen
screen  … (s)creen — display a screen containing registers and memory
set     … (se)t — define an/or set a variable
sregs   … (sr)egs — display segment registers
st      … (st) — store a full word into memory
stack   … (sta)ck — formatted stack trace
stc     … (stc) — store one byte into memory
step    … (ste)p — perform an instruction single-step
sth     … (sth) — store a half word into memory
stream  … (str)eam — display Stream head structures
swap    … (sw)ap — switch from the current display/keyboard to RS-232 port
thread  … (th)read — thread table display
trace   … (tr)ace — print traceback buffer
trb     … (trb) — display formatted timer request block info
tty     … (tt)y — Display tty struct
user    … (u)ser — formatted user area
uthread … (ut)hread — formatted uthread area
vars    … (v)ars — display a listing of the user_defined variables
vmm     … vmm — display virtual memory data structures
xlate   … (x)late — display the real address of a memory location
>

Sunday, July 16, 2017

Booting OFW from OFW

In order to use my new Motorola Powerstack II Utah machine as a reference for improving qemu there are two ways:

a) make qemu run the Powerstack II firmware
b) make Powerstack II run my firmware

I quickly tried running the Powerstack II firmware under qemu. After all, that was the approach which let me run Solaris/SPARC under qemu 7 years ago. The firmware sort of starts, but ends up in a very limited debugger. It looks to me like the debugger is from Motorola and starts before the OFW is launched. Last week I found out that the Powerstack II Utah firmware is no good for anything but one version of AIX, so this particular firmware is really not worth launching.

So I went for option b) and made a firmware which can be netbooted on the Utah.
It's even bootable on both Powerstack I and II, which was tricky. For some reason the different Powerstacks have slightly different ideas about the layout of the 0x41 partition. For instance, the Solaris floppy image cannot be netbooted. But there is one layout which works for both floppy and netbooting on both Powerstacks.

So now I netboot my OFW from the Motorola OFW, and then try booting AIX from the SCSI disk.
And then it hangs on the residual data. No wonder here - I used the device tree from qemu, so at least some devices are different or wired differently. But if I remove the creation of the residual data, it even boots AIX to the same point as the Motorola OFW.

The SCSI host on both Powerstacks is different from the one on the 40p machine. The 40p (and qemu) have Vendor ID 0x1000, Device ID 0x0001, which according to the pcidatabase is an LSI53C810 chip. The Powerstacks have Vendor ID 0x1000, Device ID 0x0003, which is supposedly an LSI53C1010-33. The chip itself is marked Symbios Logic 53C825A.

I may hit the differences between the LSI (formerly known as Symbios and NCR) chips later, but at least the 53C825A is backward compatible with the 810, otherwise my firmware would not be able to load anything.
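
The two IDs quoted above are the two halves of the first 32-bit word of PCI configuration space: the vendor ID sits in the low 16 bits and the device ID in the high 16 bits. Below is a rough sketch of splitting that word and naming the chip. The lookup table is tiny and hypothetical; its names are an assumption based on the public pci.ids list, which (unlike the database quoted above) lists 1000:0003 as a 53c825, in line with the marking on the chip.

```python
# Hypothetical mini-table; the names are assumed from the public pci.ids list.
KNOWN_SCSI_HOSTS = {
    (0x1000, 0x0001): "53C810",
    (0x1000, 0x0003): "53C825",
}


def split_pci_id(config_dword0):
    """Split PCI configuration word 0 into (vendor_id, device_id)."""
    vendor_id = config_dword0 & 0xFFFF
    device_id = (config_dword0 >> 16) & 0xFFFF
    return vendor_id, device_id


def describe(config_dword0):
    """Format the IDs as vvvv:dddd followed by a name from the mini-table."""
    vendor_id, device_id = split_pci_id(config_dword0)
    name = KNOWN_SCSI_HOSTS.get((vendor_id, device_id), "unknown")
    return "%04x:%04x %s" % (vendor_id, device_id, name)


print(describe(0x00011000))  # SCSI host of the 40p and qemu
print(describe(0x00031000))  # SCSI host of the Powerstacks
```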

Sunday, July 9, 2017

My new toys: Motorola PowerStack II

The second gift from Jochen is a Powerstack II mainboard with an AT power supply unit and a SCSI disk. The SCSI disk has "AIX" written on it, which looks promising, but Jochen doesn't remember if it was really installed, or just planned.

The board has Serial/Parallel/Ethernet/SCSI and even a couple of unsoldered IDE connectors.
The boot log shows it has a FirmWorks-based Open Firmware:

WARNING: NVRAM Header Test Failed - Auto Initializing
Starting real time clock...
screen not found.
Can't open input device.
Keyboard not present.  Using com1 for input and output.
, Serial #0, 64 MB memory
Power Firmware(TM) by FirmWorks , Built  Thu Jun 4 10:20:43 MST 1998
Copyright (c) 1995-1996 FirmWorks.  All Rights Reserved.
PowerPC Open Firmware
Version 1.2 RM11   Thu Jun 4 10:20:43 MST 1998
Copyright Motorola 1995-96, All Rights Reserved
Copyright FirmWorks 1995-96, All Rights Reserved

 CPU . . . . . . . . . . . . . . . . . . . . . . . . . . . =PowerPC,604e
 MicroProcessor Internal Clock Speed (MHZ) . . . . . . . . =300
 MicroProcessor External Clock Speed (MHZ) . . . . . . . . =67
 PCI Bus Clock Speed (MHZ) . . . . . . . . . . . . . . . . =33
 Local Memory Size . . . . . . . . . . . . . . . . . . . . =4000000 (64 MB)
 Memory Type . . . . . . . . . . . . . . . . . . . . . . . =EDO
 Memory Error Checking . . . . . . . . . . . . . . . . . . =ECC
 Memory Speed. . . . . . . . . . . . . . . . . . . . . . . =50 NS
 L2 Cache Size . . . . . . . . . . . . . . . . . . . . . . =256KB
 L2 Cache Type . . . . . . . . . . . . . . . . . . . . . . =Asynchronous
 L2 Cache Parity . . . . . . . . . . . . . . . . . . . . . =Disabled
 Configuration Checksum. . . . . . . . . . . . . . . . . . =Failed

Then it gets to a windowed menu interface (which doesn't look like the typical OFW at all), but under "Administrative options" it's possible to choose "Invoke the Command Line Prompt", which gives the famous "ok" prompt.

AIX starts booting from the SCSI disk:

Trying..., fdisk0 Recalibrate failed.  The floppy drive is either missing,
improperly connected, or defective.
Failed
Trying..., hdisk0 Booting
Please wait while the system is booting
Boot device: /pci/scsi@2/disk@6,0  File and args:
   

 ******* Please define the System Console. *******

 Type a 1 and press Enter to use this terminal as the
  system console.

cvga0
+ swcons -c

Saving Base Customize Data to boot disk
Starting the sync daemon
Starting the error daemon
System initialization completed.
Starting Multi-user Initialization
 Performing auto-varyon of Volume Groups
 Activating all paging spaces
swapon: Paging device /dev/hd6 activated.
 Performing all automatic mounts

And here it hangs. Probably it tries to perform an NFS mount, which I don't have.
Anyway it gets much further than QEMU currently does, so it can definitely be used as a reference.

I don't have a UW-SCSI CD-ROM drive to boot from the Powerstack II media. But it can be netbooted via tftp.

Surprisingly, booting Solaris/PPC did not work out. The floppy is not recognized, and when I tried to netboot SOLARIS.ELF from the CD I got an interesting error:

Rebooting with command: boot /pci/ethernet@4:172.22.0.20,SOLARIS.ELF,172.22.134.1
Boot device: /pci/ethernet@4:172.22.0.250,SOLARIS.ELF,172.22.134.51  File and args:
Trying to get Internet/Ethernet Address ...
       Contact your system administrator to see if
       a Boot Host and network is setup correctly.

So obviously, after switching to little-endian mode the Motorola network driver doesn't work anymore. It looks like in 1998 netbooting Solaris and Windows NT was not relevant for Motorola anymore, otherwise it would have been tested.

Overall it looks like Motorola heavily modified the OFW. For instance, there are no hidden words. Which is nice. It should be possible to peek whether it has any quirks in the creation of the residual data. Or better to say, it could have been. It is all one single quirk: there is no residual data.

Initially I thought that this would be a perfect firmware which would boot both PReP images and the later OFW-compatible ones. But alas. After poking around, I googled and found a couple of mails on the NetBSD mailing list stating that:
 1. The firmware doesn't provide any residual data
 2. The firmware doesn't have the PCI, DMA and interrupt mapping properties in the device tree.
Looking at the code I see that the first point is clearly caused by the second one. In the OFW the residual data is generated from the device tree. The code was not removed, but Motorola forgot to add the properties. 

Which makes it the worst possible firmware.

But still it can boot AIX from the supplied SCSI disk. This explains at least one reason for a custom AIX: the Motorola version has to be able to live without the residual data.

Probably the developers were in a rush, so instead of fixing the firmware properties they just added a hack to the OS. Maybe the OS department had more resources than the firmware one, or maybe the developers who were able to do Forth were on vacation or fired.

The result is ugly, but I think every software developer has done something similar at least once.

Anyway now I have a sort of reference machine which can sort of boot AIX.

P.S. By the way, if you wonder why I keep writing "Powerstack II Utah" instead of just "Powerstack II": it turned out that multiple machines called "Powerstack II" were produced, and indeed they are incompatible. More gory details are in the Linux kernel sources.

Sunday, July 2, 2017

My new toys: Motorola Powerstack I

Jochen Kunz sent me two Motorola PowerStack toys. Thank you very much, Jochen. Now I should have the reference machines which I can use for fixing QEMU.

The first one is a classic PowerStack I machine, which looks pretty cool and has some kind of proprietary firmware (PPC1Bug). To connect it via a serial line I had to use pretty much all the cables and adapters I have: short 9F-9F cable, 9M-25F adapter, 25F-25F cable, 25M-9F adapter.
On the desktop Linux side I use GNU screen on a serial line. I found this feature just a few days ago. For those who missed it too, this is how a serial line gets attached to a running screen session:

screen -X screen /dev/ttyS0  # note the two "screen"s, that's not a typo

This is what it prints on power-on:

Copyright Motorola Inc. 1988 - 1995, All Rights Reserved

PPC1 Debugger/Diagnostics Release Version 1.8 - 10/04/95
COLD Start

Local Memory Found =02000000 (&33554432)

WARNING: Board Configuration Data Failure

MPU Clock Speed =100Mhz
WARNING: Keyboard not connected

Initializing System Memory (DRAM)...

System Memory: 32MB, Parity Enabled (Parity-Memory Detected)
L2Cache:       NONE, Parity NOT Enabled


SelfTest/Boots about to Begin... Press <BREAK> at anytime to Abort ALL

SelfTest about to Begin... Press <ESC> to Bypass, <SPC> to Continue

RAM      ADR: Addressability......................... Running ---> PASSED
PC16550  REGA: Register Access....................... Running ---> PASSED

PC16550  IRQ: Interrupt.............................. Running ---> PASSED
PC16550  BAUD: Baud Rate............................. Running ---> PASSED

PC16550  LPBK: Internal Loopback..................... Running ---> PASSED
Z8536    CNT: Counter................................ Running ---> PASSED

Z8536    LNK: Linked Counter......................... Running ---> PASSED
Z8536    IRQ: Interrupt.............................. Running ---> PASSED

Z8536    REG: Register............................... Running ---> PASSED
SCC      ACCESS: Device/Register Access.............. Running ---> PASSED

SCC      IRQ: Interrupt Request...................... Running ---> PASSED
PAR87303 REG: PC87303 Parallel Port's Register/Data.. Running ---> PASSED

DEC21040 REGA: PCI Register Access................... Running ---> PASSED
DEC21040 XREGA: Extended PCI Register Access......... Running ---> PASSED

DEC21040 SPACK: Single Packet Xmit/Recv.............. Running ---> PASSED
DEC21040 ILR: Interrupt Line Register Access......... Running ---> PASSED

DEC21040 ERREN: ERREN and SERREN Bit Toggle.......... Running ---> PASSED
DEC21040 IOR: I/O Resource Register Access........... Running ---> PASSED

DEC21040 CINIT: Chip Initialization.................. Running ---> PASSED
NCR      PCI: NCR 53c8xx PCI Access.................. Running ---> PASSED

NCR      ACC1: NCR 53c8xx Device Access.............. Running ---> PASSED
NCR      ACC2: NCR 53c8xx Register Access............ Running ---> PASSED

NCR      SFIFO: NCR 53c8xx SCSI FIFO................. Running ---> PASSED
NCR      DFIFO: NCR 53c8xx DMA FIFO.................. Running ---> PASSED

NCR      IRQ: NCR 53c8xx Interrupts.................. Running ---> PASSED
NCR      SCRIPTS: NCR 53c8xx SCRIPTs Processor....... Running ---> PASSED

I82378   REG: i82378 Register Access................. Running ---> PASSED
I82378   IRQ: Interrupt Request...................... Running ---> PASSED

AutoBoot about to Begin... Press <ESC> to Bypass, <SPC> to Continue

NetBoot about to Begin... Press <ESC> to Bypass, <SPC> to Continue

1) Continue System Start Up
2) Select Alternate Boot Device
3) Go to System Debugger
4) Initiate Service Call
5) Display System Test Errors
6) Dump Memory to Tape
Enter Menu #:

It doesn't have anything on its hard drive, so the only reasonable option here is 3):

PPC1-Diag>ioi
I/O Inquiry Status:
CLUN  DLUN  CNTRL-TYPE  DADDR  DTYPE  RM  Inquiry-Data
  0     0  NCR53C825   0      $00    N   SEAGATE  ST31230W         0456
  0    50  NCR53C825   5      $05    Y   TOSHIBA  CD-ROM XM-4101TA 1084
  1     0  PC8477      0      $00    Y   <None> 

I tried all the boot disks I have.

+ Boots the Solaris 2.5.1/PPC floppy, which provides some very limited Open Firmware (I'm not even sure it's based on the FirmWorks OFW). After booting the floppy it's possible to boot Solaris from a CD. Nice to have, but not my toy of choice: it runs in little-endian mode, which currently doesn't work under QEMU/PReP, and it hardly has any software. But if one day all the other OSes are emulated, I may get back to it.

- Unsurprisingly, doesn't boot from any IBM AIX CDs. I had already heard that AIX is quite picky about the hardware; I was just curious whether it gives any error message. It doesn't.

- Surprisingly, doesn't boot from the two Motorola AIX CDs I have:
    "AOS1_3__RM02" (aka AIX v4.1.4 for Motorola PowerStack II)
    "AOS1_4__RM03" (aka AIX v4.1.4r4 for Motorola PowerStack II)
So obviously the PowerStack II AIX is not compatible with the PowerStack I.

* Haven't tried booting Windows NT on it. There is a report on Google Groups that NT flashes another firmware which can only boot NT, and that it's not possible to get back to PPC1Bug afterwards. On top of that, NT is little-endian, just like Solaris 2.5.1/PPC, so all the considerations above apply here too.

The good news: it has an i82378 PCI controller and an NCR53C825 SCSI, which is quite close to what the QEMU/PReP/40p target currently emulates.
The bad news: unless I find an AIX boot disk for the Motorola PowerStack I, this machine cannot be used for debugging AIX.

Next weekend I'll write about the second toy.

/Stay tuned

Sunday, March 5, 2017

Hercules Terminator 64 strikes back

This is how the S3-Trio64 story (the beginning is in the previous post) went on.

The monitor LED was blinking as if there was no signal, and it looked like my suspicion was correct after all and the generic "vga-video-on" code was not enough to initialize an S3-Trio card.

I don't have a null-modem cable to debug OFW over a serial line, so the only instruments I had for debugging what was going wrong were the "beep" and "reset-all" OFW commands. Which is not much. So I considered trying other emulators to verify my S3 OFW fix.

First I tried 86Box. Just like PCem it uses a ROM file name to determine whether a PCI card should be available for the emulation. So there is no official way to start it without a VGA BIOS. But it's easy to hack: just rename ne2000.rom to the desired VGA BIOS name. Then there is a network BIOS instead of the VGA BIOS, which is harmless because it exits after not finding the network card. Yeah, 25 years ago I used my network card to read arbitrary ROMs; the story repeats itself in reverse.

The screen stayed black just like on the real PC, so I thought I had got an easy way to debug where it goes wrong. Although 86Box (just like PCem) doesn't emulate serial connections, using a floppy image is still much easier than writing a physical floppy. Alas, after some debugging it turned out that PCem and 86Box don't emulate the PLL registers, which was the reason why OFW didn't like them. Disabling the PLL restart took the OFW a couple of steps further. Up to an 86Box crash, reported here.

Being desperate I even tried Microsoft Virtual PC, which supposedly emulates the S3-Trio64. Well, it doesn't emulate the PLL either, so I had to get back to the experiments with the physical card.

To make a long story short, "vga-video-on" was not the problem. It seems that OFW supported only most of the S3 Trio64 chips and some of the Trio64V+ chips. I was just lucky to have an unsupported one. The OFW developers described the challenge of supporting the V+ chips in these nice comments:

   \ Problem: none of the above will have worked if this is a "Z" version
   \ of the Trio-64. "Z" versions are those parts marked with an "X" after
   \ their part number. Hey, I didn't make this up, S3 did. Anyhow, if you
   \ have one of these beasties, you have to wake up the part differently.
   \ The catch is, you pretty much have to do this in the blind because
   \ until the chip is working, you can't tell which version it is for if
   \ you go poking at a chip that is not awake yet, you may hang the system.

   \ As it turns out, the Trio-64V+ (which at this point in the probe process
   \ is indistinguishable from all of the other versions of the Trio-64, also
   \ won't have initialized prior to the above command. So, that extra command
   \ is usefull for both the "Z" Trio-64 and the Trio-64V+ (also known as the
   \ '765 [all other Trios have a '764 part number]). Oh but wait, there is
   \ more. The 765 does not respond to IO accesses unless the memory access
   \ enable bit is also turned on. Which is why the above now includes this
   \ "feature".
   \ And now back to our regularly scheduled programming...

So I tried to improve the Trio-64V+ recognition process, and the Hercules Terminator 64 suddenly worked! The question is whether the support of the Trio-64V+ breaks the regular Trio-64.

I suspected my Trio-64V+ to be one of the last working ones in this Universe, but I wanted to be sure about it. After some googling I stumbled upon vogons.org, the community of people who still run S3 VGA (among other cool stuff from the past) on a daily basis. So I asked them to test, and got a lot of responses. (Once again, thanks to everyone who responded.)
So far there is no report of a Trio-64 which wouldn't work with the current OFW. So the fix is committed to the upstream OFW.

But still, if you have one of the S3-Trio32, S3-Trio64 or S3-Trio64V+ cards, please test whether it works with OFW, as described on vogons.org.

/Stay tuned for more S3-Trio adventures.

Saturday, March 4, 2017

AMD or NVIDIA? S3-Trio 64! (21+)

While there are some hot discussions on the Internet about whether AMD or NVIDIA graphics adapters are better for virtualization, I think I'm the first one to pass through an S3 Trio 64 VGA.

Well, I'm sure the regular readers of my blog are not just 21+, but probably rather 38+, and still under 90, so they can remember what S3 Trio cards are. A casual reader may look them up on Wikipedia. I can only say: it was very cool in 1995. And by 1997 I think all of my friends had already got rid of their Trio VGAs (replaced with, you know, AGP and all this "modern" stuff).

Why S3 Trio? It was used in the IBM PReP machines. At least in the IBM 7020 40p aka Sandalfoot. As you may know from my previous posts, Hervé is working on adding S3 Trio 64 emulation to QEMU, and it's still a work in progress. I wanted to make sure it would work under OFW, so that no proprietary IBM firmware would be necessary.

Boom! It turned out it had been broken in the OFW tree ever since the sources were published, for more than a decade. No big deal - they hardly exist anymore (thought I). I made a trivial fix: use the generic "vga-video-on" instead of a chipset-specific one. It didn't work under qemu at the first attempt, because some S3-specific sequencer registers were not implemented, so I had to disable some extra checks. Then it worked. Meanwhile Hervé has added support for these registers, so it also works unmodified.

Probably it would have been a good idea to consider the project finished at this point. But I had already had the experience of a firmware and an emulator being built on each other's specs and differing quite a bit from the real hardware (qemu-system-sparc and OpenBIOS till 2009, interrupt handling in qemu-system-sparc64 till 2012, and so on). So I wanted an independent test, to make sure the generic "vga-video-on" is good enough for the S3 cards.

Luckily I still had a Hercules Terminator 64 (S3-Trio64V+) VGA card lying in the basement. So I could check the firmware on physical hardware. I don't have a PPC machine, but it's no big deal, because the OFW drivers are cross-platform. I built a floppy with OFW for i686 (the OFW has all the necessary sources and documentation on how to do it, thanks Mitch). Booted and got to the OFW "ok" prompt. It works, so case solved, right? No! Since it's a "video-on", it might be that it worked only because the card had been initialized by the VGA BIOS in text mode. So I unterminated the Terminator:

Hercules Terminator 64, "unterminated"


The system beeped as if there were no VGA, but after I pressed F1 it booted from the floppy. The screen stayed black.

/ to be continued, stay tuned.

Sunday, February 12, 2017

AIX KDB under 40p

Some news on 40p emulation: it's possible to launch the AIX kernel debugger under qemu-system-ppc. For some reason the current PowerPC 601 CPU frequency is limited to 7.81 MHz in upstream qemu, so it takes more than an hour to load the debugger. But with a small modification it gets to the point within seconds.

The command line:

$ qemu-system-ppc -M 40p -bios p12h0456.img -hda aix-5.1-cd1.iso -cpu 601

^^^ -cpu 601 is crucial. With the default CPU (604) it just hangs after a greeting.

And after 90 minutes, on the serial line...

AIX Version pinmore.c, s.@(#)65 1.1
Instruction Storage Interrupt - PROC
[kdb_get_virtual_memory] no real storage @ 646E6D60
KDB(0)> f
pvthread+000000 STACK:
WARNING: bad IAR: 646E6D60, display stack from LR: 646E6D5D
KDB(0)>
KDB(0)> dr
r0  : 00000000  r1  : 00595910  r2  : 00595C58  r3  : 00000001  r4  : 01C08180
r5  : 00000000  r6  : 00000000  r7  : 00000000  r8  : 00000000  r9  : 00000000
r10 : 00000000  r11 : 00000000  r12 : 646E6D61  r13 : 00606178  r14 : 000000B8
r15 : 00000020  r16 : 00000020  r17 : 0803004D  r18 : 005AF0BC  r19 : 003FED04
r20 : 00606178  r21 : 00000020  r22 : 00606000  r23 : 00003F50  r24 : 00003F48
r25 : 00003F3C  r26 : 00000000  r27 : 63683A2C  r28 : 00003A24  r29 : 00003A20
r30 : 00590C70  r31 : 00000000
KDB(0)>

It's a pretty neat debugger somewhat similar to Solaris kadb:

KDB(0)> dc main 40
.main+000000     mflr    r0
.main+000004      lwz    r3,36E8(toc)        36E8(toc)=NON_DEBUG_AIX
.main+000008     stmw    r30,FFFFFFF8(stkp)
.main+00000C      stw    r0,8(stkp)
.main+000010       li    r0,1
.main+000014      stw    r0,0(r3)            r0=00000001
.main+000018     stwu    stkp,FFFFFFC0(stkp)
.main+00001C       bl    <.kdb_init>
.main+000020       bl    <.hardinit>
.main+000024       bl    <.vmsi>
.main+000028       bl    <.hardinit_defered>
.main+00002C       bl    <.init_locks>
.main+000030       bl    <.init_anyother_locks>
.main+000034       bl    <.ios_init>
.main+000038       bl    <.kdb_pin_symtable>
.main+00003C       bl    <.debugger_init>
.main+000040       bl    <.kx2init>
.main+000044       bl    <.kmem_init>
.main+000048       li    r3,B
.main+00004C       bl    <.i_enable>         r3=0000000B
.main+000050       bl    <.k_protect>
.main+000054       bl    <.wlm_ccb_init>
.main+000058       bl    <.strtdisp>
.main+00005C       bl    <.epost>
.main+000060       li    r4,0
.main+000064      lwz    r3,13C4(toc)        13C4(toc)=kernel_lock
.main+000068       bl    <.lockl>
.main+00006C       li    r30,0
.main+000070      lwz    r3,37EC(toc)        37EC(toc)=init_tbl

/stay tuned

Saturday, February 4, 2017

PReP IBM 40p Emulation in qemu-system-ppc

Hervé Poussineau is doing a great job on improving PReP emulation in qemu. The initial patch series is getting merged into upstream master, but there is more in Hervé's git tree: http://repo.or.cz/qemu/hpoussin.git/shortlog/refs/heads/40p. I've built an OpenFirmware binary with SCSI support for it, and once Hervé improves S3 Trio emulation there will be a build with S3 support too.

Btw, the S3 Trio support in Hervé's branch is already pretty cool. Here are some screenshots of the boot process with the proprietary IBM firmware (it's really just the firmware, not an OS):

40p screen right after the reset

Unlike an IBM PC, an IBM PReP machine starts in a graphics mode with some animation, showing the initialization process.

Initializing devices

More devices...

All devices are found

Now the firmware tries to boot an OS or a System Management Services (SMS) disk:

F4 was pressed

Now there is a hidden but well-known feature. Instead of inserting a floppy, blindly type "eatabug", no quotes. For a tech person it may sound like "Enhanced ATA Bug", but I guess the pronunciation is "eat a bug". And this will open a resident monitor, which looks quite powerful (I still think OFW is more powerful though ;-).

Resident monitor help
That's all about the 40p emulation for today.

/Stay tuned

Saturday, January 21, 2017

sun4v emulation is in qemu master

sun4v emulation patches were merged into QEMU master on January 19th. Directly from my git tree. So now I'm a real co-maintainer. ;-)

Saturday, November 5, 2016

sun4v emulation update

Just pushed v1. No new features, just cleanups. As part of the cleanup I improved memory flushes, so v1 should be a bit faster than v0. The new version is available here:

https://github.com/artyom-tarasenko/qemu/tree/sun4v-v1

Another visible change is that the machine name is now spelt lowercase for consistency with the other SPARC machines emulated by QEMU.

The new launch line:

sparc64-softmmu/qemu-system-sparc64 -M niagara -L /path/to/S10image/ -nographic -m 256 -drive if=pflash,readonly=on,file=/path/to/S10image/disk.s10hw2

Saturday, October 1, 2016

QEMU sun4v/Niagara target went public

I’m publishing my work on the sun4v emulation on the GitHub site:

https://github.com/artyom-tarasenko/qemu/tree/sun4v-v0

Yes, I hope it’ll make it upstream soon, but those who would like to boot Solaris 10/SPARC under QEMU can do it straight away.

It uses the firmware (hypervisor, machine definition and OpenBOOT) from the OpenSPARC T1 project. So in order to use it, download

http://download.oracle.com/technetwork/systems/opensparc/OpenSPARCT1_Arch.1.5.tar.bz2

$ tar xfj OpenSPARCT1_Arch.1.5.tar.bz2 ./S10image
$ cd path/to/qemu-sun4v

$ sparc64-softmmu/qemu-system-sparc64 -M Niagara -L /path/to/S10image/ -nographic -m 256 -drive if=pflash,readonly=on,file=/path/to/S10image/disk.s10hw2

Sun Fire T2000, No Keyboard
Copyright 2005 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.20.0, 256 MB memory available, Serial #1122867.
[mo23723 obp4.20.0 #0]
Ethernet address 0:80:3:de:ad:3, Host ID: 80112233.


ok boot -v
<…>
login: root

Enjoy!
In case you wonder why the path to drive image is not hard coded like all the paths to firmware components: it’s possible to specify a non-Solaris image, like HelenOS or NetBSD/sun4v (once it gets released).

Feel free to report to me if you have more working OSes. :-)

2016.11.04 Update: while the v0 version uses the name "Niagara", v1 and all subsequent ones will be using the lowercase name "niagara".
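
If you script the launch, the only thing that changes between the branches is the machine name. Here is a minimal sketch of a helper that picks the right one; the function name and its parameters are my own invention for illustration, and the remaining options are simply copied from the launch line above.

```python
def niagara_launch_args(version, image_dir):
    """Build the qemu-system-sparc64 argument list for the sun4v-v<version> branch.

    Hypothetical helper: 'version' is the branch number (0 for sun4v-v0,
    1 for sun4v-v1, ...), 'image_dir' is the unpacked S10image directory.
    """
    # v0 registered the machine as "Niagara"; v1 and later use lowercase.
    machine = "Niagara" if version == 0 else "niagara"
    image_dir = image_dir.rstrip("/")
    return [
        "sparc64-softmmu/qemu-system-sparc64",
        "-M", machine,
        "-L", image_dir + "/",
        "-nographic",
        "-m", "256",
        "-drive", f"if=pflash,readonly=on,file={image_dir}/disk.s10hw2",
    ]


if __name__ == "__main__":
    print(" ".join(niagara_launch_args(1, "/path/to/S10image")))
```

The list form can be passed straight to subprocess.run() without worrying about shell quoting.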

Saturday, August 6, 2016

Solaris 10 and year 2038 problem

Now I've got a spare moment to write about why the Solaris 10 boot was failing under the new sun4v (sparc64) emulation target for QEMU.

It turned out that the now solved SMF issues I mentioned before were caused by a single character typo.

Stepping through the SQLite code I’ve noticed that there are two schemes: one persistent, which to my surprise has been opened with no problems, and a temporary one which failed because it could not create a file under /etc/svc/volatile which resides in RAM.

Why? For a very funny reason. The old Solaris versions used to check whether the Real Time Clock (sometimes they call it “rtc”, sometimes they call it “tod”) returned a sane value and ignored it if it did not.

Solaris 10 issues a warning, but goes on and uses the given time. Then the init system call creating a file on a UFS considers a time after 0x7fffffff invalid, which sends SMF into a busy error loop.

The fatal typo was writing “qemu_clock_get_ns” instead of “qemu_clock_get_ms”, so I hit the error which the rest of mankind using Solaris 10 for OpenSPARC T1 will hit 22 years later.

So let’s wait and see how many people will find my blog entries about SMF in February 2038.
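
For the curious, the arithmetic can be sketched in a few lines of Python. The first function shows the last moment a signed 32-bit time value can hold. The second one is only an illustration of the unit mix-up: I assume here a consumer that expects milliseconds and divides by 1000 to get seconds, which is not necessarily what the emulated clock code does. Both function names are made up for this example.

```python
from datetime import datetime, timezone

TIME_T_MAX = 0x7FFFFFFF  # largest value of a signed 32-bit time in seconds


def last_valid_moment():
    """UTC date and time that corresponds to 0x7fffffff seconds since 1970."""
    return datetime.fromtimestamp(TIME_T_MAX, tz=timezone.utc)


def seconds_if_ns_taken_as_ms(real_seconds):
    """Seconds seen by a consumer that expects ms but is handed ns.

    Assumption for illustration only: the consumer divides the value
    by 1000, as it would for a genuine millisecond count.
    """
    nanoseconds = real_seconds * 1_000_000_000
    return nanoseconds // 1000


if __name__ == "__main__":
    print(last_valid_moment().isoformat())
    seen = seconds_if_ns_taken_as_ms(3600)  # one real hour
    print(seen, seen > TIME_T_MAX)
```

Under that assumption the value is a million times too large, so a single real hour is already enough to land beyond 0x7fffffff.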


Saturday, June 11, 2016

The second OS for the fresh sun4v emulation under QEMU

... is HelenOS. Although I was not able to boot the official 0.4 and 0.6.0 releases due to known problems with SILO (or OBP/Hypervisor), the current version works just fine:

HelenOS 0.6.0 revision 2521 under QEMU/sun4v
Note the nice reddish prompt. No other OS bootable under sun4v QEMU sparc64 emulation has something similar out of the box!

Saturday, April 16, 2016

FreeBSD-10.3/sparc64 under QEMU

I made a wrong statement on the debian-sparc mailing list, saying that the upstream qemu-system-sparc64 can already boot FreeBSD. As it turned out, I had spent too little time with the upstream QEMU. This made me feel obliged to fix it. This is how it's going to look in QEMU 2.6.0, if my patches get accepted:

$ qemu-system-sparc64 -nographic -m 1024 -boot d -cdrom FreeBSD-10.3-RELEASE-sparc64-bootonly.iso
<...>
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.3-RELEASE #0 r297264: Fri Mar 25 06:26:08 UTC 2016
    root@releng1.nyi.freebsd.org:/usr/obj/sparc64.sparc64/usr/src/sys/GENERIC sparc64
gcc version 4.2.1 20070831 patched [FreeBSD]

Console type [vt100]: xterm


When finished, type 'exit' to return to the installer.
# uname -a
FreeBSD  10.3-RELEASE FreeBSD 10.3-RELEASE #0 r297264: Fri Mar 25 06:26:08 UTC 2016     root@releng1.nyi.freebsd.org:/usr/obj/sparc64.sparc64/usr/src/sys/GENERIC  sparc64
# ls
.cshrc          HARDWARE.HTM    bin             libexec         sbin
.profile        HARDWARE.TXT    boot            media           sys
.rr_moved       README.HTM      dev             mnt             tmp
COPYRIGHT       README.TXT      docbook.css     proc            usr
ERRATA.HTM      RELNOTES.HTM    etc             rescue          var
ERRATA.TXT      RELNOTES.TXT    lib             root
#
So, after all, my statement should be correct. :-)
A pity the sun4v port of NetBSD is discontinued. So it's only for sun4u for now.

Tuesday, March 1, 2016

Hello, Solaris 10 under QEMU/sun4v!

SunOS Release 5.10 Version Generic_118822-23 64-bit
Copyright 1983-2005 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Ethernet address = 0:80:3:de:ad:3
mem = 1048576K (0x40000000)
avail mem = 1027579904
root nexus = Sun Fire T2000
pseudo0 at root
pseudo0 is /pseudo
scsi_vhci0 at root
scsi_vhci0 is /scsi_vhci
virtual-device: hsimd0
hsimd0 is /virtual-devices@100/disk@0

root on /virtual-devices@100/disk@0:a fstype ufs
pseudo-device: dld0
dld0 is /pseudo/dld@0
cpu0: UltraSPARC-T1 (cpuid 0 clock 5 MHz)
iscsi0 at root
iscsi0 is /iscsi

INIT: Executing svc.startd
svc.startd: Unknown SMF option "=debug".
Booting to milestone "milestone/single-user:default".
Hostname: unknown
Requesting System Maintenance Mode
SINGLE USER MODE

Root password for system maintenance (control-d to bypass):
single-user privilege assigned to /dev/console.

Entering System Maintenance Mode

Mar  1 14:09:35 su: 'su root' succeeded for root on /dev/console
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
#

Well actually the local time is 23:09:35, but I'm cool with it.

Sunday, February 28, 2016

What do SQL and SPARCv9 assembly language have in common?

Well, here we go: I’m debugging SQL execution switching between the kmdb kernel debugger and gdb.

Breakpoint 70, 0x000000000003e528 in sqliteInitOne ()
0x000000000003ec9c in sqlite_exec ()
(gdb) x $i1
0xadea8:        "SELECT type, name, rootpage, sql, 0 FROM \"main\".sqlite_master"

SMF uses sqlite, so the boot process involves some SQL.
Who would have thought that 20 years ago?
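
If you wonder what that statement actually returns: it reads the schema table, one row per table or index in the database. Below is a small sketch that runs the very same query from Python. Mind the assumptions: Python's sqlite3 module wraps SQLite 3, which is probably not the same version SMF links against, and the demo_tbl table is made up purely for the demonstration.

```python
import sqlite3


def read_schema(conn):
    """Run the statement seen in gdb and return all rows of the schema table."""
    return conn.execute(
        'SELECT type, name, rootpage, sql, 0 FROM "main".sqlite_master'
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical table, only here so that the schema is not empty.
    conn.execute("CREATE TABLE demo_tbl (id INTEGER PRIMARY KEY, name TEXT)")
    for row in read_schema(conn):
        print(row)
```

Each row carries the object type, its name, the root page number, the original CREATE statement and a constant 0.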

But it’s fun indeed. Booting Solaris/sparc under sun4v involves not just plain repetition of the old exercises, but requires some totally new ones as well.