The Problem with Scientific Software
Updated: Mar 21, 2019
Would it surprise you that most scientific software consists of poorly written scraps of code held together with nothing more than the desperate smiles of scientists trying to meet their deliverable deadlines? In fact, when I think about scientific software, I can never shake the image of my MacBook Pro bursting into flames. As much as it pains me, I too am that desperate scientist praying through gritted teeth that my software will finally work, and the days of telling my advisor it will be done next week will come to a joyous, high-throughput end. Unfortunately for my imaginary computer, that day hasn’t arrived.
This raises the question: Why is scientific software so poorly written? Here are a few reasons I think the scientific community struggles to produce well-written software.
Research scientists are not software engineers. This seems painfully obvious. Research scientists have advanced technical training, and many of us (like myself) have even taken coding classes. This hardly gives us the license to write shoddily written scraps of code and pass them off as software packages. The fact is that a lot of scientific software is written by research scientists, and that is not our job. The job of research scientists is to conduct research. In a field where the motto is “publish or perish,” software development, which generally does not yield publication in high impact journals, is seen as a means to an end rather than an end in and of itself. As a result, software development will always take second place in this two-runner race. That’s not necessarily a bad thing. However, someone, like a software engineer, needs to make the software take first place. But that reveals another problem:
Software engineers are not research scientists. One argument I hear a lot is that it’s easier to teach a research scientist how to code than to teach a software engineer the science. Personally, I think this is a tad arrogant on the part of the research scientist. However, I think it gets to the root of the problem. Communicating with people outside your field can be challenging. You see this problem in software development all the time. The client describes what they want. The software developers listen and build what they think the client described. When the developers present the product to the client, the client is disappointed because it isn’t what they wanted, and the developer is frustrated because they don't speak client. Even when the client has advanced technical training and coding skills, they often run into communication snafus. Scientific software is no different.
Funding is difficult. Scientific software typically is built during large projects funded by software institutes. My PhD advisor was a Principal Investigator (PI) of a large software institute. These grants are typically hard to find, very difficult to get, and not necessarily desirable. Why wouldn't you want to be the PI of a big software institute? (Great question!) Because it takes away from the scientific research, and without research you don't get high impact publications, and without publications you perish. (I mean professionally, you don't actually die.)
Working with software engineers is expensive. Let's suppose you’re a PI who runs a big software institute that receives $1,000,000 each fiscal year. That sounds like a lot, but in the grand scheme of things, it isn't much. Let's say you have a graduate student earning about $35,000 a year. With overhead and other costs, that student costs their advisor about $80,000 a year. Unfortunately, the graduate student doesn't get to see most of that money. To put it in context: on a large-scale software project, hiring a good senior software engineer with a competitive base salary would cost about $150,000. And that’s only for payroll. These software grants have to cover expenses like equipment, institutional overhead, health insurance, and sometimes even supplement the PI’s salary. At the end of the day, there just isn't enough money to hire a large software team to get the job done. In the end, it is much cheaper to have a computer-savvy graduate student do the job.
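To make that arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The budget, stipend, and salary figures are the ones quoted above. The overhead ratio applied to the engineer is purely my assumption (I reuse the graduate student's $80,000 / $35,000 ratio), so treat the result as an illustration rather than a real grant budget.

```python
# Figures quoted in the paragraph above.
ANNUAL_BUDGET = 1_000_000
GRAD_STUDENT_STIPEND = 35_000
GRAD_STUDENT_TOTAL_COST = 80_000
ENGINEER_BASE_SALARY = 150_000

# ASSUMPTION (not from any real grant): the engineer carries the same
# overhead ratio as the graduate student, roughly 2.3x the take-home pay.
OVERHEAD_RATIO = GRAD_STUDENT_TOTAL_COST / GRAD_STUDENT_STIPEND


def headcount(budget: float, cost_per_person: float) -> int:
    """How many whole people the budget covers if it pays for nothing else."""
    return int(budget // cost_per_person)


engineer_total_cost = ENGINEER_BASE_SALARY * OVERHEAD_RATIO

students = headcount(ANNUAL_BUDGET, GRAD_STUDENT_TOTAL_COST)
engineers_payroll_only = headcount(ANNUAL_BUDGET, ENGINEER_BASE_SALARY)
engineers_with_overhead = headcount(ANNUAL_BUDGET, engineer_total_cost)

print(f"Graduate students:          {students}")
print(f"Engineers (payroll only):   {engineers_payroll_only}")
print(f"Engineers (with overhead):  {engineers_with_overhead}")
```

Even these numbers are upper bounds, since they pretend the whole grant goes to salaries and nothing goes to equipment or the PI.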
Scientific software development is struggling. Without addressing some of the larger issues regarding scientific funding, there isn't an easy fix. Here are some really basic (and potentially naive) steps the scientific community can take to improve it.
Write general frameworks. While scientists are not software engineers, that doesn’t mean that scientists shouldn’t be allowed to code at all. General frameworks, with elements that researchers can substitute or write scripts for, would allow scientists to meet their specific project needs without reinventing the wheel by building their own software on each project. Some software like this already exists at some large user facilities. However, even these face the issue of being too clunky or poorly maintained because they try to meet too many specific needs.
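Here is a minimal sketch of what I mean by substitutable elements. Everything in it is hypothetical: the `Pipeline` class and the two example steps are names I made up for illustration, not part of any real facility's software. The idea is that the shared framework owns the plumbing, and each researcher only writes the small functions that are specific to their project.

```python
from typing import Callable, Iterable, List

# A step takes a list of measurements and returns a new list.
Step = Callable[[List[float]], List[float]]


class Pipeline:
    """Hypothetical shared framework: runs whatever steps it is given, in order."""

    def __init__(self, steps: Iterable[Step]) -> None:
        self.steps = list(steps)

    def run(self, data: List[float]) -> List[float]:
        for step in self.steps:
            data = step(data)
        return data


# Project-specific pieces a researcher would write (illustrative only).
def drop_negatives(data: List[float]) -> List[float]:
    """Discard readings below zero."""
    return [x for x in data if x >= 0]


def normalize(data: List[float]) -> List[float]:
    """Scale readings so the largest one equals 1.0."""
    peak = max(data, default=0.0)
    return [x / peak for x in data] if peak else list(data)


pipeline = Pipeline([drop_negatives, normalize])
result = pipeline.run([4.0, -1.0, 2.0, 8.0])
print(result)  # [0.5, 0.25, 1.0]
```

Swapping `normalize` for a different function changes the analysis without touching the framework, which is the whole point.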
Promote open source software. Scientists like to guard their code because it makes their research unique. As a result, we often end up reinventing the wheel because we don't have access to the existing code that performs the task we want to accomplish. Putting competition aside, this practice halts the progress of research across the field. Fortunately, the solution here is simple: Sharing is caring!
Train hybrid research scientists/software engineers. One approach is to create a degree track that trains students in both software development and some research science field. This 'Jack/Jane of all Trades' specialist could fill a void that currently exists in the community. This approach might not be an easy fit in the current research employment landscape, but I would argue that these scientist/software engineers could otherwise find a place in Silicon Valley.
Fund scientific software. This is complicated because it’s hard to say who should fund the development of scientific software. I'm just going to leave this hanging here, because honestly, I don't have a clean solution to propose.
Science and software need a more harmonious marriage. The current system of software development in research discourages the creation of workable software. With some simple improvements, research would progress faster, and less time would be wasted reinventing the wheel. Is this a sustainable way of doing science? I don’t know. All I know is that with a better system of software development, I’ll never dream of another burning MacBook.