or Buffalo to New York via Anchorage, on Four Wheels and Three Turbos.
by invelica from 300g curry rice. last updated 2022/10/08.
Hi, I'm canon / @invelica from 300g curry rice, and this page is a hastily-built monument to my summer spent digging into Zero4 Champ DooZy-J (1997), a racing / life simulation game for the Playstation 1 where you drag race across America while potentially courting a number of very 90s anime girls.
I started the project with the goal of, starting from near-zero ROM hacking experience, getting as close to creating a full English patch for DooZy-J as possible. After a few months of on-and-off work, I was able to complete a proof-of-concept toolkit that I used to translate the first scene of the game.
The toolkit allows for what appears to be full image replacement and full text extraction, but due to technical limitations, full text reinsertion is currently an unsolved problem. However, text reinsertion using this toolkit is possible for many of the game's scenes - though I would recommend that a full patch also tackle the issue of converting the text to half-width for better readability.
This site introduces DooZy-J as a game, details what I've figured out about its structure as a piece of software, and describes how I've approached it so far as a hacking project. It's my hope that you look at this page and say something like "wow, I think I'll go try hacking something too" or "wow, I think I'll go draw Kuhler Kupplung fanart".
But enough talking - let's see some translation:
Click here to view a short Youtube video demonstrating the entire single-scene English patch; keep scrolling to learn more about the patch, the game, and the madness of hacking.
Download and extract the DooZy-J hacking toolkit, which includes some sample text, some tools, and the Python scripts I wrote in all of their unedited and TODO-laden glory.
Place a single-track DooZy-J PSX ROM with the filenames orig_doozyj_psx.bin and orig_doozyj_psx.cue in the same folder as the scripts. (If you have a rip of DooZy-J with multiple track files, please combine them using a tool like CDMage, which is included in external_tools.)
Run _unpack_rom_and_scripts.bat. This should create many folders:
Folder | Description |
---|---|
orig_cd_contents | The extracted contents of the ROM. |
orig_pac_contents | The extracted contents of the PAC archives located in the PAC folder in the ROM. |
cd_contents, pac_contents | Copies of the above, for your editing pleasure. Will not be created if they already exist. |
orig_csv_scripts | The Japanese text for every script in the game (located in orig_pac_contents/TAK_BIN), extracted into one CSV file per script. |
(If you are just running the proof of concept patch, skip this step.)
Perform any editing of files desired, including modifying scripts in csv_scripts_edited and any image files in pac_contents/PCT_TIM.
Run _modify_scripts_and_repack_rom.bat.
This will patch script files in pac_contents with specific modified script CSVs located in csv_scripts_edited, then repack everything into a new ROM file, repacked_doozyj_psx.bin/cue.
This can be run in a manner of your choice (tested on Duckstation).
One, I'm impressed, because it takes a comically long time to get to the part where you actually get to race a car, as opposed to text-adventuring around Fisherman's Wharf looking for foreigners to hit on, and two, DooZy-J manual transmissions have a unique shifting mechanism that emulates an H-gate, where you hold Triangle to clutch, then use the D-pad to move the shifter into the appropriate gear. It takes a little getting used to and a lot of claw-handing.
Aside from that, the rest is pretty standard. Circle gas, Cross brake (don't brake), and don't jump the gun at the start line. Anyway I realized I just had to put this note somewhere in the writeup. Carry on.
Zero4 Champ DooZy-J (1997) for the Playstation 1 is a drag-racing life-simulation-type game, perhaps best known in English for having an even longer and more obtuse title on the Sega Saturn, where it was released (with nearly zero discernable differences) as Zero4 Champ DooZy-J Type-R. One might assume that the R stands for Saturn. The manual notes that the J in DooZy-J stands for Jehu, Japanese, and Juvenescence.
It is the fifth game in the Zero4 Champ series, a series that lasted from 1991 to 2002 that is centered around wrapping a drag racing core in an RPG/simulation chocolate shell. Well, except for Drift Champ (PS2, 2002), which decided to make a hard shift into being a circuit drifter. So maybe that one is peanut M&Ms. I can't really say, having partaken in neither of those.
To speak more of DooZy-J, the only game in the series I've played to completion, it is a game that feels like it reflects the time it was born in. The previous game (RR-Z for the Super Famicom) looks like this:
And the next game (Drift Champ) looks like this:
And in-between, a few years after the release of Tokimeki Memorial, we get all the hair colors of the rainbow:
DooZy-J, from what I gathered from research, is a contentious game in the series. It makes a rhetorical and physical break from previous games in the series (which established continuity and appear to have a more serious, realistic tone) to follow an Extremely 90s Shonen Protagonist as he steals the dream of a girl he raced against to go to America and win, figuratively speaking, the World Series of Drag Racing, which is incidentally located in an unknown location in New York that happens to look exactly like Indianapolis Motor Speedway.
Indianapolis Motor Speedway is also in the game.
To me, DooZy-J is definitely a 90s game in that way that it has clear influences and clear Ideas and sometimes it does not really know what to do with those ideas, and the result is not necessarily a game that is good, but definitely interesting, in that way that simulation games can kind of mush together a weird pastiche of genres without much concern for the overall composition.
You start off in San Francisco, immediately commenting in the airport how much American booty you hope to snag, before heading to a hotel where you are forced to shout "Yankee go home, fuck you!" at the bellhop.
You hopelessly adventure through the city, trying to figure out whether Fisherman's Wharf is NORTH, SOUTH, or DENNIS of the Golden Gate Bridge, all while meeting countless women with Heroine effectively stapled to their shirt. You make a mental note to aim for the route of Kuhler Kupplung, a coke-bottle-glasses shy German girl whose name is literally just two car parts glued together.
After half an hour, you somehow still have no indication that you will drive a car in this game. You wonder how getting on specific character routes will work in this game. The game asks you who you want to partner with. Well, that was easy. (Note this only covers half the heroines - the others are a bit more complex.)
You slowly grind for money, setting your schedule to spend full weeks working 9-to-5 at an auto shop and making $500. Eventually you realize that scheduling five straight workdays playing the terrible fishing minigame pays like 15 times better. You CaRPG your way through San Francisco and Los Angeles, buying upgrades for your car and racing in some official drag tournaments. You look at the guide's item table for Kuhler and realize that if you want to raise her affection, your best options are either a $15,000 diamond ring (+4 affection) or a $100 necklace (+5 affection).
You arrive in Las Vegas. Someone tells you that in Vegas, you race cars in reverse. This has no effect on gameplay other than making everything scroll backwards as you scream through the goal at, presumably, 200 km/h in reverse.
Naturally, Vegas has a casino. You load up your savescumming abilities and instantly make money worthless for the rest of the game. Satisfied with the $100,000 in chips you made with no effort, you leave the casino. The sexy casino lady (unironically she is one of the most attractive characters in the game) immediately calls you a big boy and drags you to bed. When you wake up, your chips are gone. Shouldn't have made that much money in one night!
You reload your save and make $99,000 a few nights in a row instead.
You go to the Grand Canyon. Suddenly, instead of drag racing, you are thrown into a full-on RPG which you must clear to advance with the game. If I recall, the RPG involves you dropping everything to chase a racist Chinese caricature of Indiana Jones. The turn-based RPG, inexplicably, triggers random encounters based on real time elapsed, rather than on steps walked.
You decide to take the south route to New York, which means you are now in Phoenix. You are running 11.5 second quarter-miles, eight hours after running 13.5 second quarter-miles. The struggle is real.
After Phoenix, you zoom all the way to Daytona Beach on the east coast. Takenobu Mitsuyoshi echoes inside of your head. The Hot Swimsuit Event happens. Kuhler wears what might be unironically the least horny swimsuit of all time and ignores you to dig for crabs. She is the best.
You realize that you have actually already maxed out Kuhler's affection and will not see any new generic chat dialogs from her, yet you are nowhere near the end of the game. The game shows you an Independence Day cutscene on July 4th.
You go to Miami, then Indianapolis, and then Chicago. By this point you realize that the key to winning, like in the far superior and incredibly vibes-packed Racing Lagoon, is equipping more turbos onto your car than you think should be physically possible. The TV tells you that the Sears Tower is a hot date location. If you go to the Sears Tower, she'll be taken hostage, and you will have to play the RPG mode to rescue her. Unless you are comically overlevelled, you will be one-shot by the first robber you see, end up in the hospital, and lose like two months of game time.
You reload your save. Now you are in Buffalo, and uncertain you know where New York is anymore. You've played at the casino, you've gone fishing, you've grinded an RPG, you've played a cheap puzzle game. The next stop after Buffalo is Anchorage, Alaska. The story decides to redeem Racist Chinese Indiana Jones by giving him a twin brother up to his shoulders in debt to the mafia as motivation for why RCIJ stole all your stuff. Given all the other casual racist caricatures and general 90s horny anime boy sexism so far, you notch this up as the one point in DooZy-J's favor.
You head to Boston. Opponent JDM cars are doing 0-100 km/h in just about a second, the same pace as Funny Cars. You meet a hot college professor at MIT and immediately lose your mind. She comes on to you, just a little bit.
The next day she is dead, and you're framed as the killer. The game switches to a full-screen visual novel a la Fate. You navigate through screens and screens of serious text and sound effects. Eventually you escape the maze of text and, finally, end up in New York. The murder mystery is never mentioned again.
After amusing yourself with basketball and quiz minigames, you finally enter the World Series of Drag Racing in New York. You race against 12 opponents. You realize that you never wrote about how awful the load times in this game are.
Predictably, after becoming World Drag Racing Grandmaster, you go to Hawaii for a short bonus and a swimsuit competition, where you see every route-able girl in the game, and then some, in their skimpies. Kuhler changes from her 1930s swimsuit into a two-piece bikini. She takes off her glasses for the first time. Unfortunately, she is - by the text of the game - hot, and the developers are cowards.
A few final races later, you actually reach the credits of the game and the end of Kuhler's route. In the epilogue, Kuhler, for the second time, takes off her glasses for the first time. Unfortunately, she is - by the text of the game - hot, but fortunately, this fact means that the ending plays a "ok you're cool but you know what's cooler? women" gag. Given the character you are in this game, I can't disagree.
Double-unfortunately, once you clear the game with Kuhler once, her coke-bottle glasses are permanently defogged for all saves on your memory card. The ultimate coward's move.
This is the kind of game DooZy-J is, at least played through the lens of someone who knows enough Japanese to comprehend every time a filler one-line dialog like "Yeah, that's a building" was shoved into a random location's interaction, but not enough Japanese to understand the game 100%. It's a game that undoubtedly has made some Choices, when you look at it from the perspective of someone who has played good simulation games, played any game with a bunch of minigames in it, played any game where you court women, or been to America at some point in their life.
But it's a game that definitely left me with stories, good, bad and Kuhler - and I think that makes it a fun game to talk about. So why not try a little bit to give others the chance to experience a game that is just peak 90s Weird Mediocrity?
This patch project is also inspired by a translation I played for Imadoki no Vampire: Bloody Bride, a 1996 Playstation game by ATLUS that also seemed to go after that TokiMemo money from a mostly-RPG perspective. This plays closer to TokiMemo, given the focus placed on balancing your stats properly, but I couldn't help but see its questionable attention to balance in DooZy-J. It's another title of its time, and its one-person translation project inspired me a lot to try the same with DooZy-J.
I won't be adding any more game screenshots in the technical portion of this page, by the way. Apologies.
For convenience, the 18-track ISO is merged into a single .BIN with an 18-track .CUE sheet. I do not recall the tool I used to perform this.
Afterward, dumpsxiso and mkpsxiso can be used to unpack and rebuild the CD. dumpsxiso should generate the XML file used to repack, but a copy is also included in the toolkit as doozyj_bin.xml.
The CD structure is as follows:
Path | Description |
---|---|
SLPS_007.55 | The main game executable. |
ZERO.EXE | Not investigated. Largely similar in structure and content to SLPS_007.55 but very different when comparing bytes. |
RACE.EXE | Not investigated. Presumably handles racing logic. |
Z4OPEN_R.STR | The opening movie. |
BGM/ | Music, naturally. |
CARL/ | Believed to be "car sprite info when on P1 side". |
CARR/ | Same as above, but P2 side. |
ETC/ | Believed to be an audio VAB, a few miscellaneous TIMs, and potentially some extra kanji (外字.BIN). |
PAC/ | Contains multiple uncompressed PAC archives, which are of the most interest to this localization project. |
PAC/CLT_CLT.PAC | Presumably CLT palette data. |
PAC/MSC_SEQ.PAC | Presumably SEQ format music; uncertain how responsibility is divided between this and the BGM folder. |
PAC/PAC_PAC.PAC | Unknown. Contains some VAB audio files. |
PAC/PCT_TIM.PAC | Images, spritesheets, all that TIM stuff. |
PAC/TAK_BIN.PAC | The game's scripts, which handle essentially all menu/scene logic. The most important thing. |
PAC/VAB_VAB.PAC | Presumably VAB format audio data. |
R_DATA1/ | Presumably data for gameplay purposes; does not contain localizable content to the best of my knowledge. |
R_DATA2/ | Same as above. |
TUNE/ | Contains graphics and presumably car-specific data for the tune-up screen. |
TUNE2/ | Same as above. |
PACs are archives with a simple format: a table of contents followed by uncompressed files, with both the TOC and each individual file padded to align to 2048-byte (0x800) "block" boundaries.
The table of contents has a 4-byte entry for each file in the PAC: a 2-byte file ID followed by the 2-byte block number where the file starts, both little-endian. The ID 0xFFFF is reserved to signify the termination of the TOC.
For example, here is a TOC that contains two files:
00 00 01 00 3C 05 06 00 FF FF 1E 00
- File 0x0000, which starts at block 0x0001.
- File 0x053C, which starts at block 0x0006 (indicating that file 0x0000 ends at block 0x0006).
- The terminator 0xFFFF, which indicates that file 0x053C ends at block 0x001E.
Files can be extracted from PACs using a quickbms script (doozyj_pac_unpack.bms) following the logic above; to repack the PACs, I wrote a Python script (repack_pac.py) to perform the reverse operation.
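To make the TOC layout concrete, here is a minimal Python sketch that walks a PAC table of contents as described above. It is not the toolkit's doozyj_pac_unpack.bms or repack_pac.py; the names parse_pac_toc and extract_file are made up for illustration, and it assumes each entry is a little-endian 2-byte ID followed by a 2-byte block number, as the example bytes suggest.

```python
import struct

BLOCK_SIZE = 0x800  # 2048-byte blocks, per the format description
TOC_END = 0xFFFF    # reserved ID that terminates the TOC


def parse_pac_toc(data: bytes):
    """Return a list of (file_id, start_block, end_block) tuples.

    Hypothetical helper: assumes little-endian (ID, block) pairs.
    """
    entries = []
    offset = 0
    while True:
        file_id, block = struct.unpack_from("<HH", data, offset)
        offset += 4
        if entries:
            # The next entry's block number closes the previous file.
            prev_id, prev_start, _ = entries[-1]
            entries[-1] = (prev_id, prev_start, block)
        if file_id == TOC_END:
            break
        entries.append((file_id, block, None))
    return entries


def extract_file(data: bytes, start_block: int, end_block: int) -> bytes:
    """Slice one file (still padded to block size) out of a PAC."""
    return data[start_block * BLOCK_SIZE:end_block * BLOCK_SIZE]


# The two-file TOC from the example above.
toc = bytes.fromhex("00 00 01 00 3C 05 06 00 FF FF 1E 00")
for file_id, start, end in parse_pac_toc(toc):
    print(f"file {file_id:#06x}: blocks {start:#06x} to {end:#06x}")
```

Because files are padded to block boundaries, the extracted bytes may carry trailing padding; whether that padding matters for repacking is something I have not verified here.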
The only identifier for each file in a PAC is its 2-byte ID, so each file is referred to below by this ID. In the long run, I would recommend writing something that can translate between these internal IDs and human-readable descriptions. It's not so easy remembering that the script for Kuhler's affection scenes is 0x01CC.
I only performed a proof-of-concept for modifying and repacking the game's images, and this was at the start of the project, so my recollection of this is vague. I do, of course, remember that my proof-of-concept involved pasting an Amogus onto the main menu background.
In my proof-of-concept, I searched through the unpacked TIMs in PAC/PCT_TIM.PAC and identified the TIM corresponding to the main menu background. I used TimViewer to convert it to BMP, edited it with an external image editing program, then once again used TimViewer to convert it back to TIM. It may need to be an 8-bit TIM.
I have not proven that this process can be extended to all images that require localization (for example, I do not have the expertise to know if messing with palettes will be necessary), but the amount of localization-critical Japanese images in the game is fairly small.
I informally call the files in PAC/TAK_BIN.PAC "game scripts" or "scripts", as they handle the overwhelming majority of the game logic: from displaying text, to showing the opening movie, to modifying internal values, it all appears to be here.
A script consists of a brief table of contents, followed by a continuous series of opcodes that the game engine can parse. The table of contents consists of one or more 2-byte addresses, which represent the location in the script execution should begin for that TOC entry.
For example, here is the table of contents for script 0x01CC, containing three addresses:
06 00 68 3F A3 5B
When a script is invoked, it is loaded with a parameter that is used to index into the TOC. Invoking script 0x01CC with parameter 00 will start execution from 0x0006 in the script; parameter 01 will start from 0x3F68; and parameter 02 will start from 0x5BA3.
Note that when analyzing scripts, there is no explicit indication of where the TOC ends and where opcodes begin, and the TOC is not guaranteed to be in ascending order. However, it can be inferred by keeping track of the lowest address while scanning the TOC section, and checking when the current pointer reaches this lowest address.
For example, a table of contents beginning with 02 00 can be trivially inferred to have just one entry.
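The "track the lowest address" rule can be sketched in a few lines of Python. This is an illustration rather than the code in script_tools.py; parse_script_toc is a hypothetical name, and it assumes the opcodes begin exactly at the lowest address listed in the TOC.

```python
import struct


def parse_script_toc(script: bytes):
    """Read 2-byte little-endian entry addresses until the read position
    reaches the lowest address seen so far (assumed start of the opcodes)."""
    entries = []
    lowest = len(script)  # the TOC cannot extend past the end of the file
    offset = 0
    while offset < lowest and offset + 2 <= len(script):
        (addr,) = struct.unpack_from("<H", script, offset)
        entries.append(addr)
        lowest = min(lowest, addr)
        offset += 2
    return entries


# TOC of script 0x01CC from the example above (opcode bytes omitted).
print([hex(a) for a in parse_script_toc(bytes.fromhex("06 00 68 3F A3 5B"))])
```

Note that the entries do not need to be in ascending order for this to work, since only the minimum matters for finding where the TOC stops.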
Also note that all addresses in script files are 2 bytes long. This puts a hard cap on script size at 0xFFFF.
Script execution proceeds with the game advancing the script pointer forward in linear fashion, reading 1-byte opcodes and executing functions corresponding to each opcode. (See the debugging section below for more details.) The functions for each opcode may advance the script pointer further or even move it entirely, using a stack that handles call/return operations both inside a single script and between multiple scripts. Usually an opcode will read a fixed amount of parameters, but this is not always the case. And dropping the dry explanatory tone: oh, how much agony those cases cause!
Inventing some fictional opcodes for the sake of an example:
- AA is an opcode that takes no parameters.
- BB is an opcode that takes one 4-byte parameter.
- CC is an opcode that takes one 4-byte parameter that indicates how many 2-byte addresses to read.
Given the above rules, a segment of script could read like
AA BB 01 00 00 00 AA CC 02 00 00 00 99 99 77 77 BB 02 00 00 00
which would correspond to "AA, BB with parameter 0x01, AA, CC with parameter 0x02 [meaning read the two addresses 0x9999 and 0x7777], BB with parameter 0x02".
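As a rough illustration of how such rules turn into a parser, here is a small Python sketch for the three fictional opcodes only. The table FIXED_PARAMS and the function parse_fictional are invented for this example and do not describe any real DooZy-J opcode; values are assumed to be little-endian, matching the byte order used throughout the examples.

```python
import struct

# Fictional opcodes: opcode byte -> number of fixed 4-byte parameters.
FIXED_PARAMS = {0xAA: 0, 0xBB: 1, 0xCC: 1}


def parse_fictional(script: bytes):
    """Return a list of (opcode, params, addresses) tuples."""
    pos = 0
    parsed = []
    while pos < len(script):
        op = script[pos]
        if op not in FIXED_PARAMS:
            raise ValueError(f"unknown opcode {op:02X} at {pos:#06x}")
        pos += 1
        params = []
        for _ in range(FIXED_PARAMS[op]):
            (value,) = struct.unpack_from("<I", script, pos)
            params.append(value)
            pos += 4
        addresses = []
        if op == 0xCC:
            # CC's parameter says how many 2-byte addresses follow.
            for _ in range(params[0]):
                (addr,) = struct.unpack_from("<H", script, pos)
                addresses.append(addr)
                pos += 2
        parsed.append((op, params, addresses))
    return parsed


segment = bytes.fromhex(
    "AA BB 01 00 00 00 AA CC 02 00 00 00 99 99 77 77 BB 02 00 00 00"
)
for op, params, addresses in parse_fictional(segment):
    print(f"{op:02X}", params, [hex(a) for a in addresses])
```

The variable-length case (CC) is the kind of opcode that forces special handling: a plain "opcode to size" lookup table is not enough once a parameter controls how much more data follows.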
Given enough perusal of the executable in Ghidra and enough hand-manipulation and experimentation in an emulator, the rules for each opcode can be roughly intuited enough to write a Python script (script_unpack.py utilizing script_tools.py) that can parse a game script from beginning to end. (Note there is a huge asterisk on this statement; again, please see the debugging section below.)
This Python script does not have anywhere close to full knowledge about what the game script does, but it does know the sequence of opcodes in the game script, which allows it to extract text strings from the opcodes 95 and A1, which write strings either to screen or to memory, respectively.
The 95 opcode contains one 2-byte address, followed by Shift-JIS encoded text (with some escape codes, detailed below). The address points to the location after the end of the text (that is, the first byte that is not part of the 95 opcode).
For example, with the 95 byte being at address 0x0000:
95 11 00 82 60 82 8D 82 8F 82 8E 82 87 82 95 82 93
0x0011 is the address of the byte right after the final 93 byte (1 opcode byte, 2 address bytes, and 14 text bytes), and 82 60 82 8D 82 8F 82 8E 82 87 82 95 82 93 is, of course, "Amongus" encoded in full-width Shift-JIS.
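Reading a 95 opcode can be sketched as follows. This is an illustration, not the logic in script_unpack.py: read_text_opcode is a hypothetical name, the sample bytes are made up (a two-character string followed by a stray 64 byte), and the escape codes mentioned above are not handled, so any such bytes would simply be decoded with replacement characters.

```python
import struct


def read_text_opcode(script: bytes, pos: int):
    """Decode the 95 opcode at pos; return (text, address_after_opcode)."""
    if script[pos] != 0x95:
        raise ValueError(f"no 95 opcode at {pos:#06x}")
    # The 2-byte little-endian address points just past the text.
    (end,) = struct.unpack_from("<H", script, pos + 1)
    raw = script[pos + 3:end]
    return raw.decode("shift_jis", errors="replace"), end


# Made-up sample: 95, end address 0x0007, full-width "AB", then a 64 byte.
sample = bytes.fromhex("95 07 00 82 60 82 61 64")
text, end = read_text_opcode(sample, 0)
print(text, hex(end))
```

Since the end address is stored explicitly, a parser does not need to understand the text to skip over it, which is what makes bulk extraction practical even with incomplete knowledge of the escape codes.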
script_unpack.py takes a game script file and extracts all text in it to a CSV file, with a row containing information on the string's start and end location, whether it is a 95 or A1 opcode, the original string, and an empty column for the translated string.
In a full-scale project, consideration would need to be made as to how to organize all of this text in order to provide sufficient context for translating. This explanation, however, will continue while pretending that translated strings have magically appeared in the above CSV file.
Repacking the translated text presents many issues, many of which are still unsolved (thus the creation of this rambling documentation rather than a full patch). A handful of approaches are discussed below, all mostly concluding in "this was interesting, but didn't work". To borrow a common adage, if you have ideas on how to improve these approaches or new approaches of your own, sound off in the proverbial comments.
The original approach was to replace every string inline. That is, given a row containing the original size and location of the string and the new string, the corresponding 95 opcode instruction can be relatively easily updated.
However, this presents a very large challenge.
Consider the following example code:
95 05 00 82 60 64 CD AB
This represents a 95 opcode drawing the string "A", followed by a 64 opcode representing a jump to the address 0xABCD.
Replacing the string "A" with "no" would change the code to
95 05 00 82 8E 82 8F 64 CD AB
But note that not only does the end-of-string address need to be updated from 0x0005 to 0x0007 to reflect the new size of the string, but the address pointed to by the 64 opcode (which has nothing to do with text!) also needs to be changed from 0xABCD to 0xABCF, because two bytes were inserted before 0x0005, offsetting every address at or after that location by two bytes.
To handwave, given a list of all addresses in a game script, and a list of all byte insertions in the game script, updating each one of them is not too complicated. However, determining every single address in a game script is very challenging (detailed below in Finding Every Address), and it is likely that missing even one would cause a game crash or undefined behavior.
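The "not too complicated" half of that statement can be sketched as below. This is a hedged illustration, not the toolkit's code: remap_address is a hypothetical name, and an insertion is modeled as (original_offset, bytes_added), meaning the new bytes land just before the byte that originally sat at original_offset. The "A" to "no" change is then two bytes inserted before original offset 0x0005.

```python
def remap_address(addr: int, insertions):
    """Shift addr by every insertion made at or before it.

    insertions: iterable of (original_offset, bytes_added) pairs.
    """
    shift = sum(added for offset, added in insertions if offset <= addr)
    new_addr = addr + shift
    if new_addr > 0xFFFF:
        # Script addresses are only 2 bytes long.
        raise OverflowError(f"{addr:#06x} no longer fits in 2 bytes")
    return new_addr


insertions = [(0x0005, 2)]  # "A" (2 bytes) replaced by "no" (4 bytes)
print(hex(remap_address(0x0005, insertions)))  # end-of-string address
print(hex(remap_address(0xABCD, insertions)))  # jump target of the 64 opcode
print(hex(remap_address(0x0003, insertions)))  # before the insertion: unchanged
```

The arithmetic is the easy part; the function is only as good as the list of addresses it is applied to, which is where the real difficulty lies.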
For this reason, I consider this approach to be unfeasible. It seems like it would work for a majority of the game's scripts - but proving that every address is properly remapped is, at best, an extremely arduous task, and at worst, impossible.
Moving on to consider approaches that do not modify the length of existing game scripts, another approach is to replace every 95/A1 string opcode with a 67 opcode that loads another script file. This script file would print the translated string, then return execution to the original script.
Mechanically, this approach is fairly sound. A 67 opcode takes 9 bytes (1 for the opcode, 4 for the script ID, and 4 for the TOC index), which means that as long as the 95 opcode has at least 3 characters of text (1 opcode byte, 2 address bytes, and 6 text bytes), there would be sufficient space to replace it without expanding the instruction. (And of course, no-op bytes can be used to fill extra space.)
Given that script IDs are 4 bytes and that there are only 441 game scripts, there is also sufficient space to add new scripts to TAK_BIN.PAC, even if more than one translated script per script is needed.
A proof-of-concept was successfully created for this approach. However, as you might expect, the performance of loading a separate script file for every single string is extremely comical, turning what is normally instant-display text into a line-by-line chug that would make any RPG with a slow text speed blush.
It may seem survivable for visual-novel-style dialog sections, but given that every menu in the game runs on these scripts and this style of text rendering, it would quickly grow to be untenable with the current script loading speed.
A simple, similar approach would be to store the English text at the end of each script file, and replace each 95/A1 string opcode with the jump opcode 64, effectively performing the above approach, but within the same file.
The main limitation preventing this is that all script files are limited to 0xFFFF bytes in size, and many files are already very close to that limit. This is a problem that even inline text replacement would eventually struggle with (given the verbosity of English relative to Japanese).
The script limit comes not just from the fact that addresses are 2 bytes long, but also from the memory that appears to be allocated to the script when it is loaded - there is other data stored directly after those 0xFFFF bytes. This seems like it would prevent a simple extension approach, like replacing a no-op opcode with a new opcode that performs a jump to address + 0xFFFF, and only storing string data after 0xFFFF in each script.
It would be interesting to see if there are unused blocks of memory where additional translated text data could be safely stored and loaded from, but I do not have the expertise nor could I find the emulators to assist with this at the time of writing.
Ideally, for an English patch, it would be more visually appealing to render half-width text instead of full-width Shift-JIS (not to mention the space savings). I did not explore the feasibility of this yet, as it did not seem a trivial change.
DooZy-J does appear to use the BIOS font for rendering text (which is to say, when I managed to boot the game with a US BIOS, nearly nothing rendered), but the actual function for rendering text for the 95 opcode contains many exceptions for single-byte escape characters for handling things like font color and printing from variables. These would need to be changed to avoid colliding with regular English text, to say nothing of whatever the process of switching to render using single-byte text would entail.
This documentation also only covers text replacement in the script files - there are a handful of strings located elsewhere, such as in the main executable (mostly enemy names for the RPG subgame, it seems). Without loss of generality, it is assumed that handling text not in the script files is a problem that is less complicated and less critical than the existing problems.
An additional consideration when localizing JP->EN is that English lines may require more screen space than their Japanese counterpart. Breaking a single 95/A1 line of text into multiple lines appears relatively trivial - adding a linebreak is as simple as adding an additional 95 opcode, and adding the instructions 97 03 00 00 00 93 01 00 00 00 92 seemed to be sufficient for a proof-of-concept dialog break (pausing for player input, then clearing the dialog window, while keeping the name of who is talking intact).
I built in some hardcoded instructions to script_repack.py to handle this ("line" and "dialog"), though I feel in their current state they are slightly brittle.
The process of cracking DooZy-J's code (and opcodes) came in many small breakthroughs. To document the ones I remember -
Starting from an old save file in the middle of a gameplay run, it is easy to access Kuhler's max affection dialog from the navigation menu. Snapshotting memory (using no$psx or similar) while this scene is loaded and searching for the currently visible dialog line (making assumptions that it is encoded in plain Shift-JIS) makes it fairly easy to find the script data, and searching the extracted game files for the same bytecode eventually revealed that the scripts are stored in TAK_BIN.PAC, and moreover, stored uncompressed.
Using pSX to put breakpoints on access to specific bytes in the memory corresponding to these lines eventually led to the discovery of the script pointer, located at 0x0007a008 in memory. This allowed for the ability to easily pause script execution at any location in the script (using read_word(0x0007a008)==0x0006 in the breakpoint, for example), which helped gain understanding of the TOC as well as how jumping around the script works.
Manual experimentation (both with tracing the script execution and by manually modifying/removing/etc bytes and rebuilding the ROM) led to a basic understanding of major opcodes (such as jump, return, print string, and conditional jumps) and script rules.
However, in order to attempt to find every single address in the script file - as is necessary for the first string replacement approach - deeper understanding is needed. Ghidra is used (along with the ghidra_psx_ldr plugin) to disassemble the executable and convert it to C-like readable code, and provide many powerful searching and tracing capabilities.
Examining raw strings in SLPS_007.55 quickly indicates that 0x800328e7 is an error reporting function that takes in an error message. This helps intuit the purpose of many functions that pass errors like "Too many returns" or "Kanji format invalid".
With the help of breakpoints in pSX, major opcodes such as 95 can be traced to 0x8002959c, where much of their high-level logic is fairly reasonable to decipher (including escape characters and pointer movements). It especially helps to remember to mark 0x0007a008 in the code as the script pointer.
All opcode functions can eventually be traced back to a confusing pointer table starting at 0x800524b4, which shows no obvious indication of its purpose or organization. However, with enough examples of known opcodes, it can be determined that this function table actually points to the functions for all opcodes in sequential order, with the first being opcode 64 (pointing to 0x80023340).
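If the table really is a flat, gap-free array in opcode order, finding the slot for a given opcode is simple arithmetic. The sketch below assumes 4-byte pointers (the PS1's CPU is 32-bit) and that no opcode numbers are skipped in the table; table_entry_address is a hypothetical helper, not something from the toolkit.

```python
TABLE_BASE = 0x800524B4   # start of the opcode function table
FIRST_OPCODE = 0x64       # opcode handled by the first table entry
POINTER_SIZE = 4          # assumption: one 32-bit pointer per opcode


def table_entry_address(opcode: int) -> int:
    """Address of the table slot holding the function pointer for opcode."""
    if opcode < FIRST_OPCODE:
        raise ValueError(f"opcode {opcode:02X} is below the first table entry")
    return TABLE_BASE + (opcode - FIRST_OPCODE) * POINTER_SIZE


for op in (0x64, 0x95, 0xA1):
    print(f"opcode {op:02X}: table slot at {table_entry_address(op):#010x}")
```

Reading the pointer stored at that slot in Ghidra or a memory viewer then gives the address of the opcode's handler function.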
Another great discovery is that every opcode will typically call 0x80028ae8 multiple times before calling its own payload function. Examining 0x80028ae8 shows that it moves the script pointer either 2 or 4 bytes depending on the parameter passed in, and by once again comparing with known opcode functionality, it can be determined that this function means "load X bytes from the script into a variable". Effectively, the calls to this function reveal the number of parameters the opcode consumes, with all 2-byte parameters being addresses.
This is wonderful news! With these two pieces of information in tandem, knowing the exact size of each opcode is but a simple afternoon's transcription away, which in turn means that a Python script can be written to exactly parse a game script from start to finish, opcode by opcode, easily marking all the addresses and extracting all the strings.
... is what I'd like to say!
This is in fact true for the majority of the opcodes, and functions well enough to get proof-of-concepts working very well. But there are just enough Weird Opcodes that the entire scheme gets very complicated very fast.
B4
is a nice switch opcode. It consists of two parameters - the memory address to switch on, and the number of cases in the switch - followed by a number of addresses equal to the number of cases. Fairly straightforward to parse.
B3 is a mean switch opcode. It consists of one parameter - the memory address to switch on. The function simply reads the value N stored at that address, and moves the script pointer to the Nth address following the opcode. Which is to say, there is no safety, nor any specification of how many cases the switch has!
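To make the "no safety" part concrete, here is a small model of the lookup as I understand it. Whether N counts from 0 or 1 is an assumption on my part (zero-based here), as is the little-endian byte order; `b3_target` is a hypothetical name.

```python
def b3_target(script: bytes, pos: int, selector: int) -> int:
    """Return the jump target a B3 at pos would pick for the given selector.

    selector stands in for the value N read from the switched-on memory address.
    ASSUMPTION: N is zero-based and addresses are 2-byte little-endian.
    """
    if script[pos] != 0xB3:
        raise ValueError("not a B3 opcode")
    entry = pos + 5 + 2 * selector          # skip opcode byte + 4-byte parameter
    if selector < 0 or entry + 2 > len(script):
        raise IndexError("selector points outside the script")
    # Note there is no case count to check against: any N "works".
    return int.from_bytes(script[entry:entry + 2], "little")


if __name__ == "__main__":
    code = bytes.fromhex(
        "B3 01 02 03 80 01 AA 05 AA 09 AA 97 03 00 00 00 93 01 00 00 00"
    )
    print(hex(b3_target(code, 0, 1)))   # a real table entry
    print(hex(b3_target(code, 0, 3)))   # past the table: bytes of the next opcode
```

With a selector of 3, the lookup happily treats the first two bytes of the following opcode as a jump address, since nothing in the bytecode says the table has ended.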
Sometimes intuiting the number of cases in a B3 switch seems easy. For example, take the following code:
B3 01 02 03 80 01 AA 05 AA 09 AA 97 03 00 00 00 93 01 00 00 00
The "address block" in this switch stands out fairly easily - the destination addresses are all fairly close together (0xAA01, 0xAA05, and 0xAA09), and the leading bytes in each address (01, 05, and 09) do not resemble existing opcodes. However, as one would expect, neither the ordering nor the grouping of the addresses is guaranteed.
Still, considering the above, it's reasonable to construct an algorithm (to speak selfishly as someone who did exactly that) which checks each address in the tentative address block to see if it is the start of valid bytecode. To modify the above example:
B3 01 02 03 80 64 75 99 75 AA 75 97 03 00 00 00 93 01 00 00 00
Starting with 64 75, the algorithm assumes 64 is a valid opcode, and parses the jump 64 75 99. It then attempts to parse 75 as a valid opcode, which has two 4-byte parameters, so it parses 75 AA 75 97 03 00 00 00 93. It then would try to parse 01 as a valid opcode, and fail. So it can be assumed that 64 75 is an address in the switch table.
Advancing to 99 75, the algorithm attempts to parse 99 as a valid opcode, and fails, because 99 requires eight 4-byte parameters (more than is available here).
Moving next to AA 75, AA is parsed as a valid opcode with one 4-byte parameter, and then the algorithm tries to parse 00 and fails.
Finally, the algorithm reaches 97 03. 97 is parsed as a valid opcode with one 4-byte parameter, 93 is parsed as a valid opcode with one 4-byte parameter, and parsing completes successfully. So it is determined that 97 03 is the beginning of the bytecode after the switch block, and a B3 switch block of 3 addresses is parsed.
The actual implementation is non-trivial, given the tendency of this look-ahead parsing to branch out exponentially, but, to handwave a little, it's nothing a little memoization can't solve.
However, it is easy to defeat this algorithm with a careful example:
B3 01 02 03 80 64 75 99 64 AA 75 97 03 00 00 00 93 01 00 00 00
With this, parsing from 64 75 yields first the jump 64 75 99, then the jump 64 AA 75, then 97 and 93, for a successful parse of what is effectively a zero-address switch. However, a human can easily see that there are supposed to be 3 addresses in this switch.
This algorithm, which knows only how many parameters each opcode takes, is fooled by any combination of addresses that happens to parse as valid opcodes which eventually "realign" themselves with the bytecode after the switch statement. This leaves some bytes of the game script not properly marked as addresses, which inevitably results in undesired behavior (visible or invisible) when running the game.
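Here's a stripped-down sketch of the look-ahead check, run against the three byte sequences above. It is not the toolkit's implementation: "valid bytecode" is simplified to "parses linearly to the exact end of the buffer", the size table covers only the opcodes used in these examples, and `guess_switch_cases` is a hypothetical name. The cache stands in for the memoization a fuller version would need once nested switches make the search branch.

```python
from functools import lru_cache

# Parameter sizes for the opcodes appearing in the examples (as described in the text).
PARAM_SIZES = {
    0x64: (2,),
    0x75: (4, 4),
    0x93: (4,),
    0x97: (4,),
    0x99: (4,) * 8,
    0xAA: (4,),
}


def guess_switch_cases(script: bytes, table_start: int) -> int:
    """Guess how many 2-byte addresses follow a B3 by probing each candidate end."""

    @lru_cache(maxsize=None)
    def parses_to_end(pos: int) -> bool:
        if pos == len(script):
            return True                      # consumed everything cleanly
        sizes = PARAM_SIZES.get(script[pos])
        if sizes is None:
            return False                     # not a known opcode
        nxt = pos + 1 + sum(sizes)
        return nxt <= len(script) and parses_to_end(nxt)

    cases = 0
    while table_start + 2 * cases <= len(script):
        if parses_to_end(table_start + 2 * cases):
            return cases                     # first spot that parses = end of table
        cases += 1
    raise ValueError("no consistent end of switch found")


if __name__ == "__main__":
    easy = bytes.fromhex("B3 01 02 03 80 01 AA 05 AA 09 AA 97 03 00 00 00 93 01 00 00 00")
    tricky = bytes.fromhex("B3 01 02 03 80 64 75 99 75 AA 75 97 03 00 00 00 93 01 00 00 00")
    fooled = bytes.fromhex("B3 01 02 03 80 64 75 99 64 AA 75 97 03 00 00 00 93 01 00 00 00")
    for name, code in (("easy", easy), ("tricky", tricky), ("fooled", fooled)):
        print(name, guess_switch_cases(code, 5))
```

The first two examples come out as 3 cases, while the last one reports 0, because the "addresses" happen to chain into valid opcodes that land back on 97.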
The algorithm can be improved in subtle ways (assuming there are always at least 1 or 2 addresses in the switch, running heuristics on the tentative addresses to see if they follow patterns), and I even wrote a Python script to simply present the hundreds of switch blocks to the user and ask them to pick out the end of the switch (a fairly easy task, but error-prone). But after some time I reached one conclusion: safely replacing strings inline eventually reduces to the question "do you know every single thing happening in the game script?", and my current answer to that question is no.
Here are some brief notes about some notable opcodes. These are not guaranteed to be correct, and further examination of the opcode function pointer table (starting at 0x800524b4) in Ghidra may be beneficial.
Opcode Format | Description |
---|---|
64 AA AA | jump to position A in the script |
65 AA AA AA AA BB BB BB BB | jump to script file A with TOC index B, clearing the stack |
66 AA AA | call to position A in the script |
67 AA AA AA AA BB BB BB BB | call to script file A with TOC index B |
68 | return (to the position / script of the previous call) |
69 AA AA | conditional jump to position A in the script if the lowest bit of 8007a0a8 is 1 (unknown what 8007a0a8 is) |
6A AA AA | conditional jump to position A in the script if the second lowest bit of 8007a0a8 is 1 |
6B AA AA | conditional jump to position A in the script if the lowest bit of 8007a0a8 is 0 |
6C AA AA | conditional jump to position A in the script if the second lowest bit of 8007a0a8 is 0 |
6D AA AA BB BB BB BB CC CC CC CC | jump to position A in the script if the values referenced by B and C are equal (0x80028ae8, the parameter loading function, appears to load the value from memory if the value referenced starts with 0x8000, treating it as a raw value otherwise) |
6E BB BB BB BB CC CC CC CC AA AA | jump to position A in the script if B != C |
6F BB BB BB BB CC CC CC CC AA AA | jump to position A in the script if B >= C |
70 BB BB BB BB CC CC CC CC AA AA | jump to position A in the script if B > C |
71 BB BB BB BB CC CC CC CC AA AA | jump to position A in the script if B <= C |
72 BB BB BB BB CC CC CC CC AA AA | jump to position A in the script if B < C |
7A BB BB BB BB AA AA | jump to position A in the script if an unknown conditional is met |
7B BB BB BB BB AA AA | jump to position A in the script if an unknown conditional is met |
7C AA AA AA AA BB BB BB BB | appears to be logic related to drawing character sprites / expression |
92 | appears to clear selected draw box |
93 AA AA AA AA | appears to set selected draw box to ID A (there are separate draw boxes for the character name, dialog text, calendar date, etc, but they were not investigated as part of this writeup) |
95 (parameters omitted) | draw string to selected draw box, as described elsewhere in this doc |
96 (parameters omitted) | eight 4-byte arguments followed by a switch block of unknown size. purpose unknown. notes say "prompts for player choice" but I am uncertain if I trust this. |
97 AA AA AA AA | notes indicate "wait for player input of given button" |
A1 | another string-writing opcode similar to 95, but not investigated too closely. suspected to be "write string to memory at location" |
B3 | switch statement of unknown size, as described in other sections |
B4 | switch statement of defined size, as described in other sections |
C1 BB BB BB BB AA AA | jump to position A in the script if an unknown conditional is met |
F0 | this can be used as a true no-op |
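The six comparison jumps (6D through 72) in the table above map neatly onto Python's comparison operators, which makes for a compact way to model them when tracing script flow by hand. This is only a sketch of the table as written: it assumes B and C have already been resolved to plain integers, says nothing about signed versus unsigned comparison in the real game, and `branch_taken` is a hypothetical name.

```python
import operator

# Comparison performed by each conditional jump, per the opcode notes above.
COMPARISONS = {
    0x6D: operator.eq,   # B == C
    0x6E: operator.ne,   # B != C
    0x6F: operator.ge,   # B >= C
    0x70: operator.gt,   # B >  C
    0x71: operator.le,   # B <= C
    0x72: operator.lt,   # B <  C
}


def branch_taken(opcode: int, b: int, c: int) -> bool:
    """Return True if the conditional jump would move to position A."""
    if opcode not in COMPARISONS:
        raise ValueError(f"{opcode:02X} is not a known comparison jump")
    return COMPARISONS[opcode](b, c)


if __name__ == "__main__":
    for op in sorted(COMPARISONS):
        print(f"{op:02X}: B=5, C=5 -> {branch_taken(op, 5, 5)}")
```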
Here is a brief summary of the single-byte escape characters used in DooZy-J's script files, and their purpose, if known.
Byte | Example | Description |
---|---|---|
0x23 | ##, #Hnnn, #nn? | Examples like #72 appear to set text color. |
0x25 | %n | Unknown. |
0x2a | * | Unknown. |
0x2b | +n | Unknown. |
0x2d | -n | Unknown. |
0x3b | ; | Unknown. |
0x40 | @ | Unknown. |
0x43 | Cnnn... | C388 = last name, C389 = first name. Might be strings from memory. |
0x48 | Hnnn... | Unknown. |
0x5e | ^nn | Unknown. |
Here are some more details about the scripts found in the proof-of-concept toolkit.
Script | Description |
---|---|
all_scripts_unpack.py | Runs script_unpack.py in bulk over a folder. |
repack_pac.py | Rebuilds the PAC archives and their table of contents. |
script_graph_generator.py | Scans the designated scripts for any loading of other scripts (using opcodes like 65). Used to gain understanding of how the game's scripts branch (as well as search for hidden debug scripts). |
script_repack.py | Given a script and a CSV file of translated text, packs that text into the script. |
script_repack_create_new_file.py | Same as script_repack.py, but repacks the text into a separate script file, so that the original script can maintain the same length. |
script_switch_finder.py | Parses the designated scripts with an option enabled that allows the user to manually identify the length of each switch case, and output a config file with these manual overrides. |
script_tools.py | Helper library that contains the logic for reading an opcode and its arguments from a given position in a game script. Used in pretty much every Python script. |
script_unpack.py | Extract text from a script to a CSV file. |
text_size_checker.py | Checks how much of a script file is text strings. Used to confirm how close some files would get to the 65535 byte limit. |
Link | Description |
---|---|
Tetracorp's Tokimeki Memorial page | Dives deep into the data of the famous sim, and was very helpful in teaching me basic tools and concepts. |
California Romance | A Zero4 Champ fansite (born in 1999 and last updated in 2009!) with fairly comprehensive coverage on DooZy-J. I referred to its strategy section heavily during my initial playthrough, as it covers both character routing and general tuning advice. |
プレイステーション裏技大辞典 | Lists a few cheat codes for DooZy-J. |
Dirty Cheater! JPPM | Contains some Action Replay style memory edits. I never particularly used these, but it's informative. |
@HilltopWorks | I consulted with Hilltop for some of the issues I had with my hacking. His work, including Racing Lagoon, is excellent. |
Change the byte at 0x0698 of script 0 from 64[C4]06 to 64[9A]06 and...
It's a debug menu! Perfect for reading any script in the game!
As far as I can tell, the value displayed here is the decimal value of the script ID to load (e.g. 5300 = script 14B4).
Selecting "confirm" and then pressing Up or Down will open a series of debug menus which let you set money, partner, game flags, etc, before loading the script.
I think you may be able to press Left and Right or some other keys after hitting "confirm" to modify the TOC value used for the script, but it is not immediately clear to me. I tried to find a way to access this menu without hex editing, but have not found one yet from analyzing script ID 0.
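If you'd rather not open a hex editor, the one-byte debug menu patch described above can be applied with a few lines of Python. This is a sketch under a couple of assumptions: that 0x0698 is the offset of the bracketed byte inside the unpacked script 0 file, and that the file name in the commented-out usage (script_0000.bin) is a placeholder, not a name the toolkit actually produces. `patch_byte` is a hypothetical helper.

```python
def patch_byte(data: bytes, offset: int, expected: int, new: int) -> bytes:
    """Return a copy of data with one byte replaced, refusing if it isn't the expected value."""
    if not 0 <= offset < len(data):
        raise ValueError(f"offset {offset:#06x} is outside the file")
    if data[offset] != expected:
        raise ValueError(
            f"expected {expected:02X} at {offset:#06x}, found {data[offset]:02X}"
        )
    patched = bytearray(data)
    patched[offset] = new
    return bytes(patched)


# Hypothetical usage; the file name is a placeholder:
# with open("script_0000.bin", "rb") as f:
#     original = f.read()
# with open("script_0000_debug.bin", "wb") as f:
#     f.write(patch_byte(original, 0x0698, 0xC4, 0x9A))
```

The check against the expected byte is there so that a different dump, or a wrong guess about the offset, fails loudly instead of silently corrupting the script. As for the number shown in the menu, format(5300, "04X") gives 14B4, which matches the script ID mapping noted above.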
Thanks for clicking! Kuhler Kupplung is a coke-bottled, Wendy's-pigtailed mechanic who is very hard to coax a full sentence out of, and I think she is very cute.
She most clearly occupies the "quiet girl" slot in a cast that tries to cover the sort of bases you'd expect from a 90s dating-type-game set in America, including:
But she gets the most points for having completely opaque glasses.
Here are some screenshots of Kuhler:
Alright, let's get you back to the Intro section now. Thanks for stopping by.
by invelica from 300g curry rice. last updated 2022/10/08.
stylesheet borrowed with permission from Sakurina.