Unfortunately, much of my recent work is currently under NDA. I can provide specific details on request.
A large medical device manufacturer needed to replace an aging fleet of clinician-operated tools used to program lifesaving implantable devices. The existing tool was no longer being manufactured, and the company wanted to move away from custom-made hardware towards more flexible, off-the-shelf devices like the iPad. The new programmer needed to work with all currently implanted devices, and be future-proofed to work with new devices with an unknown range of features and improvements.
I was brought on board after the initial version was completed. My job was to develop, execute, and document the summative testing studies needed to prove to the FDA that the new programmer was a safe and effective replacement. The project timeline was highly accelerated: six studies in 15 months.
Designing a summative study is challenging in both its rigor and its breadth. Features that have been defined as safety-critical need to be thoroughly tested in environments that are as close to real-world settings as possible. Poorly designed tasks can lead to bad data, an FDA rejection, and a development timeline set back by six months or more.
The first thing I needed to do with each study was understand the features I was testing. Without knowing how a feature worked, what it was used for, and what could commonly go wrong with it, I couldn't write an effective task that actually measured a user's ability to use that feature safely and correctly. I reached out to existing users across a range of skill levels to get a baseline understanding of how each feature was being used in the field¹ and used that information to begin developing a protocol.
Each task required a unique, carefully described success criterion. To work with as many users as we needed, studies were often carried out by multiple moderators at a time. The protocols therefore had to be clear and usable by someone without my level of knowledge, and, critically, had to let the moderator correctly assess whether or not a user had performed the task. That meant identifying the intended outcome of each task and describing what it meant to accomplish it safely: if the user reached the right result in an unsafe way, the task was a failure.
After each study concluded, the data needed to be analyzed and summarized in two forms: an official document submitted to the FDA, and an internal report on any usability or safety issues that needed to be fixed. To meet development timelines, I needed to analyze and summarize data from 30 to 45 participants and turn it into design recommendations, sometimes in as little as a week.
The results of our studies were submitted to the FDA in phases, ensuring that if any part were rejected, the entire product would not have to be re-tested. All parts of the submissions I worked on were approved without issues or requests from the FDA. In early 2019, the new tool was completely approved for use, six months ahead of schedule. That same year, I worked on formative studies preparing for the next round of validation testing in 2020.
¹ As we would learn, there could be a large gap between how a feature was used in the field and how it was supposed to be used.