A naive approach to software sustainability
When developing a piece of software, I have had to weigh concerns beyond its technical aspects or its quality. Often, these concerns are economic: investing in development in order to reduce some cost, whether maintenance, operations, support, runtime, or something else.
For example, say you run a Software-as-a-Service (SaaS) that requires one day of maintenance each month. You want to reduce that maintenance, so why not get rid of it entirely? It depends on the amount of work needed to accomplish one or the other. Let's say you need 4 days of work to cut it in half, but 15 days to get rid of it.
It seems reasonable to take the time to get rid of the maintenance, since it's completely paid off after 16 months. However, this means that you can't use those 15 days to develop another feature or fix a bug. Depending on the situation, it might be wiser to cut the maintenance effort in half, leaving room for other tasks.
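To make the trade-off concrete, here is a minimal Python sketch of the payback arithmetic for the two options above. It assumes (my assumption, not stated in the example) that the improvement is built during the first month, so the savings only start in the second month. Under that assumption, getting rid of the maintenance is fully paid off after 16 months, and cutting it in half after 9. The function names are hypothetical.

```python
import math

# Scenario from the example: maintenance currently costs 1 day per month.
MONTHLY_MAINTENANCE_DAYS = 1.0


def payback_month(investment_days: float, monthly_savings_days: float) -> int:
    """Return the month by the end of which the investment is fully paid off.

    Assumption: the improvement is built during month 1, so the savings
    only start in month 2.
    """
    if monthly_savings_days <= 0:
        raise ValueError("the investment must save some time each month")
    return 1 + math.ceil(investment_days / monthly_savings_days)


def net_days_saved(investment_days: float, monthly_savings_days: float,
                   horizon_months: int) -> float:
    """Days saved minus days invested after `horizon_months` (same assumption)."""
    saving_months = max(0, horizon_months - 1)
    return saving_months * monthly_savings_days - investment_days


options = {
    "cut it in half": (4.0, MONTHLY_MAINTENANCE_DAYS / 2),
    "get rid of it": (15.0, MONTHLY_MAINTENANCE_DAYS),
}

for name, (cost, savings) in options.items():
    print(f"{name}: paid off after {payback_month(cost, savings)} months, "
          f"net {net_days_saved(cost, savings, 12)} days after a year")
```

Under these assumptions, halving the maintenance stays ahead for almost two years (1.5 versus -4 net days after 12 months). Getting rid of it only comes out ahead from month 24 on (8 versus 7.5 days). Which option is wiser depends on the horizon you care about and on what else those days could be spent on.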
Another example is a migration: the kind of code you run only once and usually never use again once it's done. Sure, you want to make sure it does what it was designed to do. But should you spend a lot of time on other aspects, such as its maintainability? If that doubles the work, probably not.
The same goes for manual tasks of any kind (e.g. development, support, operations). When should you spend time automating them? In terms of efficiency[1], the answer is not always "yes"[2].
Cost is certainly a dominant factor. With new challenges such as climate change and resource constraints, I'd like to take these into account too, if not prioritize them. How should I do that?
Naively, I'd approach it in a similar way: compare the greenhouse gas footprints (hereafter called carbon footprints) or the amounts of resources required. In practice, even a rough estimate of either is much harder to get than the monetary cost. After trying to estimate the carbon footprint of a given piece of software (e.g. a mobile app, a web app) myself, I stumbled over two intertwined obstacles: software dependencies and indirect carbon emissions[3].
Most software depends on other software, either to be developed or to run, forming a long and deep chain of dependencies. This means that to account for indirect carbon emissions, you have to go through those dependencies. If the software is open source, you can find most of them. Ideally, if each dependency had gone through the same exercise, we could add up the results to estimate the total indirect carbon emissions. In practice, this information is missing. If the software is proprietary or run as a service, you don't even have access to its dependencies. It's up to the company building and/or running the software to disclose this information, if it does at all.
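The dependency problem can be pictured as a graph traversal. Below is a minimal Python sketch built on a hypothetical `Component` model, where each piece of software may or may not disclose its own footprint. The known part adds up, but any undisclosed dependency leaves a hole in the total. The names and figures are made up for illustration. Shared dependencies are counted only once, which sidesteps the real question of how to allocate their footprint among their users.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Component:
    """A piece of software and the software it depends on (hypothetical model)."""
    name: str
    # Footprint disclosed for the component itself, in kg CO2e; None if unknown.
    own_footprint_kg: Optional[float]
    dependencies: list[Component] = field(default_factory=list)


def total_footprint(component: Component, seen: Optional[set[str]] = None):
    """Walk the dependency graph depth-first and add up the disclosed footprints.

    Returns the known total (kg CO2e) and the names of the components whose
    footprint is unknown. A dependency reached twice is counted only once.
    """
    if seen is None:
        seen = set()
    if component.name in seen:
        return 0.0, []
    seen.add(component.name)

    if component.own_footprint_kg is None:
        total, unknown = 0.0, [component.name]
    else:
        total, unknown = component.own_footprint_kg, []

    for dependency in component.dependencies:
        dep_total, dep_unknown = total_footprint(dependency, seen)
        total += dep_total
        unknown.extend(dep_unknown)
    return total, unknown


# Made-up figures: the shared runtime discloses nothing.
runtime = Component("runtime", None)
web_framework = Component("web-framework", 5.0, [runtime])
db_driver = Component("db-driver", 2.0, [runtime])
app = Component("my-app", 10.0, [web_framework, db_driver])

known, unknown = total_footprint(app)
print(f"known: {known} kg CO2e, unknown: {unknown}")
```

The resulting number is only meaningful if the list of unknowns is short. Here the runtime, used by both libraries, contributes nothing to the total simply because nobody disclosed its footprint.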
So it's not easy to accurately estimate the carbon footprint of a piece of software, and I haven't talked about its life cycle yet. Since we don't have accurate accounting, we have to find another approach.
We can look at existing tools[4]. But they are imperfect[5]: they consider only some parts (e.g. the client side), are dedicated to a certain category of software (e.g. web applications), work only with a certain technology, measure a given scenario (e.g. one visit), or rely on a small sample[6]. Even when they are backed by scientific work[7], it's important to remember that there is no consensus among scientists yet[8].
We can use proxies, such as the cost of the infrastructure and of the third-party services used. However, the footprint of 1 EUR is not the same in all cases: it depends, for instance, on the region where the infrastructure runs[9] or on the margin made by the provider.
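To illustrate why 1 EUR does not map to a fixed footprint, here is a minimal Python sketch of a spend-based proxy. The model is my own simplification, not an established methodology. Only the share of the price that pays for electricity is converted into energy, and that energy is then converted into emissions using the carbon intensity of the region's grid. All parameter values are made up.

```python
def footprint_from_spend(spend_eur: float, energy_cost_share: float,
                         eur_per_kwh: float, grid_g_per_kwh: float) -> float:
    """Very rough proxy: money spent -> electricity used -> grams of CO2e.

    energy_cost_share: fraction of the price that actually pays for electricity
        (the rest being the provider's margin, staff, hardware, ...).
    eur_per_kwh: price of electricity paid by the provider.
    grid_g_per_kwh: carbon intensity of the region's electricity.
    Hardware manufacturing and other indirect emissions are ignored.
    """
    energy_kwh = spend_eur * energy_cost_share / eur_per_kwh
    return energy_kwh * grid_g_per_kwh


# Same 1 EUR, same hypothetical price structure, two hypothetical grids.
low_carbon = footprint_from_spend(1.0, 0.5, 0.25, grid_g_per_kwh=50)
high_carbon = footprint_from_spend(1.0, 0.5, 0.25, grid_g_per_kwh=500)
# A provider with a higher margin spends less of each euro on electricity.
high_margin = footprint_from_spend(1.0, 0.25, 0.25, grid_g_per_kwh=500)
print(low_carbon, high_carbon, high_margin)
```

With the same price structure, a grid ten times more carbon-intensive multiplies the estimate by ten. A provider keeping a larger margin lowers it, so spending alone says little about the footprint.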
Stepping back, we may not need to go into the details. In fact, there are two sides to such an exercise: what the software is for, and what it costs. Because we are accustomed to approaching this question in economic terms, we tend to value both sides as an amount of money. We have gone so far as to put a price on health. At the end of the day, it's political: a societal choice. When we talk about sustainability, we do so because we face a global threat. This threat is shaking our society, including the current economic model. It's not about meeting a standard and getting a label; it's about standing up for something we believe is good. Of course, we may not always agree on what is good, but there are some obvious cases, such as basic needs.
This gives us one side of the balance. For the other side, we can use a simple proxy: less is better. Indeed, the footprint is directly related to the resources used. It's reasonable to assume that software A, which requires more (machines, energy, frequency, storage, bandwidth, complexity, people) than software B, will have a higher footprint. Without changing the purpose, you may need to spend more on one aspect to reduce another (e.g. more development effort to improve performance). It comes back to comparing the investment and usage footprints: the investment is worthwhile if its footprint is offset by the resulting reduction in the total usage footprint. As a consequence, the investment margin grows with the usage frequency: reducing the footprint by 1 g of GHG 1000 times brings the same benefit as reducing it by 1 kg once.
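The balance above can be written as a single inequality, using symbols of my own choosing: F_inv is the footprint of the investment (e.g. the extra development effort), Δf the footprint saved per use, and N the number of uses over the software's lifetime. The worked line restates the 1 g versus 1 kg example.

```latex
\text{invest if}\quad F_{\text{inv}} \le N \, \Delta f,
\qquad \text{e.g.}\quad 1000 \times 1\,\text{g} = 1 \times 1\,\text{kg} = 1\,\text{kg CO}_2\text{e}
```

For a given improvement per use, the larger N is, the larger the investment that can be justified.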
Let's conclude this already long essay. When approaching software sustainability, here are the first questions I'd ask:
- what is the purpose of the software?
- how often will it be used?
- how heavy is the software in terms of computing and network resources, hardware, and dependencies?
- how heavy is the software development in terms of design complexity, required expertise, people, computing and network resources, services and tools, hardware, and dependencies?
The qualitative answers will help to roughly assess the situation and possibly decide what to do: don't do it, investigate further, do it.
[1] When considering automation, there may be aspects other than efficiency to weigh, for example documenting knowledge.
[2] "Is it worth the time?" by xkcd.
[3] Indirect carbon emissions are all the emissions caused by the producers and the users of the product or service. In the GHG Protocol, they are covered by Scope 3. In the ISO 14064 standard, they are counted in Part 1 as indirect emissions.
[4] For instance, the free ecoIndex or GreenFrame.
[5] This is well explained by Marmelab in its article "Argos: Measure the Carbon Footprint of Software, Improve Developer Practices".
[6] Whatever we measure, we usually do not know the nature of the underlying system: it might be stochastic, unstable, or varying. As such, we can't rely on a single data point, or even three. Beyond the context of the measurement (e.g. how, when, what), it's usually good practice to collect enough samples before drawing any conclusion. This is especially true when benchmarking the performance of software, where the result can be affected by many factors, such as other running processes or the health of the machine.
[7] See for instance the methodology of CodeCarbon or Greenfarm.
[8] "The large discrepancies between the various estimates [of the consumption of data centers] call for a comparison of the modelling approaches, extrapolations and measurement", p12, "Did The Shift Project really overestimate the carbon footprint of online video?", M. Efoui and J.N. Geist, The Shift Project, June 2020.
[9] Knowing the region allows us to find the carbon intensity of the region's electricity production.
If you have any comment, question, or feedback, please share them with me.