In a bid to honor the fading art of the follow-up story, I have been rooting around to find out what has happened to the Tokyo Stock Exchange (TSE) since its systems outage earlier this month.
As a reminder, the TSE suspended trading for a full day on Oct. 1, an unprecedented halt, and resumed trading on Friday, Oct. 2. Hardware glitches and a series of mishaps appear to have forced the long stoppage, according to TSE officials. Officials at TSE and Fujitsu, a major IT supplier to the exchange, have acknowledged the IT problems and apologized for them.
Officials have done more than apologize.
Last week, TSE’s owner, Japan Exchange Group, sent an incident report to Japan’s market regulator, the Financial Services Agency (FSA), according to the Reuters news agency and The Japan Times. In addition, officials posted a report on the TSE website on Oct. 19 that provides more preliminary details.
What appears certain is that the glitch was not the result of a cyberattack, because the IT platforms involved did not have external connections, according to a report in The Wall Street Journal.
For now, the blame is falling on magnetic-disk devices, used to warehouse trading information, that failed. A backup hardware system also malfunctioned, according to media reports and TSE statements.
For the moment, the TSE’s report, “Cash Equity Trading System Failure on Oct. 1,” is serving as the initial investigation.
For starters, the exchange has “a system requirement that operations should continue in the case of a NAS [Network Attached Storage] failure by switching over to another device within 30 seconds,” according to the report.
“When we developed the current version of arrowhead [a hardware and software system developed by Fujitsu], we discussed with Fujitsu what NAS setting would be appropriate with reference to the Fujitsu product manual,” according to the report.
“Since the product manual said the automatic switchover would function regardless of the NAS setting, we decided on the NAS setting taking into account the past performance of arrowhead with the same setting. This decision was also confirmed by Fujitsu,” according to the report.
But an investigation after the Oct. 1 failure “revealed that with NAS at said setting, the current product specifications are such that the automatic switchover would not function in the case of a memory module failure. The deficiency in the product manual prevented a proper understanding of the product specifications,” according to the report.
“Usually, Fujitsu conducts testing with default settings to check a product functions as described in the manual prior to shipment. This time, however, since the arrowhead settings were not the default settings, the production specifications were checked on paper, but no actual testing was conducted at the time of shipment. TSE did conduct NAS switchover testing, but this focused on checking business continuity after the switchover,” according to the report.
“Since our understanding was that the consistency between actual settings and those in the manual had been verified by Fujitsu during the shipment process, our testing involved creating a mock network failure to check whether the switchover functioned properly and that operations could continue normally,” according to the report. “The reason why it took some time to complete the manual switchover of NAS is because the failure response procedures were developed on the basis that the switchover would be conducted automatically.”
The report provides a few more key details, including that transaction matching on Arrowhead continued while the network was shut down.
“As a result, the number of necessary procedures and things to confirm in order to resume trading was very large,” according to the report. “We recognize that there was an issue in that, although we had multiple contingency plans to halt trading in the case of unexpected circumstances, we had not prepared a contingency plan to halt trading in the case of the NAS becoming unavailable.”
The TSE also had no agreement “with trading participants on rebooting arrowhead and had not carried out any tests,” meaning that rebooting Arrowhead for trading “would be too risky to justify in our position as market operator. We recognize that there was an issue in our lack of rules on how to handle trade resumption after a trading halt in the event of a system failure,” according to the report.
TSE officials say they have established “measures to enhance the reliability of arrowhead until now, under the slogan ‘Never Stop.’ Going forward, in order to build speedier and more appropriate recovery procedures, we will place the same level of importance on ‘Resilience’ (the ability to recover from a failure),” according to the report.
The saga is not over.
FSA officials will soon begin an on-site investigation, and the results of that probe will determine the regulator’s next steps. The FSA could ultimately issue a business improvement order or take related action.
The full report can be found here: https://bit.ly/31p4978