Disasters, Ugly and Cute

Recently, a story about species introduction and bad outcomes made the news. I had to write about this, but for a different reason than you might expect.

This story seems to have appeared in hundreds of news outlets over the past few days:

In 2012 a government initiative relocated 26 Tasmanian devils to Maria Island, a small island off the Tasmanian coast. The Tasmanian devil population has been in decline for years, due to a contagious facial cancer that spreads when the animals bite each other.

Unfortunately, Maria Island was also home to 3,000 little penguins — small, slow-walking birds that nest on land. (This species is actually called the “little penguin” as well as the “little blue penguin” and the “fairy penguin.”) With a new predator introduced, the penguins were all eaten. Continue reading “Disasters, Ugly and Cute”

Movements, Algorithms, Compliance, Tools

A recent paper titled Bad Machines Corrupt Good Morals caught my attention. In the paper, the authors demonstrate that AI agents can act as influencers and enablers of bad human behavior. This is something we’ve known for a while, but I appreciate the authors’ organization of the methods.

Specifically, the authors called out four types of decisions that an AI might participate in along with a human. (I think that there is one more grouping as well.) Here’s what the authors focused on:

    1. AI as an influencer in an advisory role. “Customers buy harmful products on the basis of recommender systems.”
    2. AI as an influencer in a role model role. “Online traders imitate manipulative market strategies of trading systems.”
    3. AI as an enabler in a partner role. “Students teaming up with NLG algorithms to create fake essays.”
    4. AI as an enabler in a delegate role. “Outsourcing online pricing leading to algorithmic trading.”

Continue reading “Movements, Algorithms, Compliance, Tools”

Unintended UAPs

This post is about the UAP (Unidentified Aerial Phenomenon) “sightings” that have gained attention over the past few years, and especially in the past few months. If you haven’t heard of this before or seen some of the (admittedly grainy) videos, it’s a sign of how a potentially big (and weird) story can get less attention than it probably should.

Unlike previous UFO (Unidentified Flying Object) sightings from the past half century, the UAP situation is different. Rather than random individuals or conspiracy theorists, witnesses include fighter jet pilots. Interested parties include the military and senior government officials. The US government has established the Unidentified Aerial Phenomena Task Force. We went from oddballs claiming that they saw UFOs to people with a lot of credibility to lose claiming that they have either observed UAPs or believe that the question deserves serious attention.

No matter your opinion of what the reality is, there are a few main outcomes we can expect as we learn more about this topic. These outcomes all seem to portend a change of some sort. Continue reading “Unintended UAPs”

100 Posts on Unintended Consequences

If you wonder why I’m interested in unintended consequences, just look at what enabled a single ship to block a major shipping lane for a week.

The list of unintended consequences is long. A fallen tree and software bug cut electric power to an entire region, a pain medication for dying cows kills vultures, a search for efficiency in grocery stores turns honest people into thieves. Politics and cotton production in one country impact major apparel producers, intentional species introductions go awry, smart people work to improve business client revenue outcomes at the expense of customer lives. Only some are famous, but all are fascinating in their own ways.

In spite of that handful of examples and a long list of others, I still often hear a kind of excuse: that there will always be one-in-a-million outcomes, that no one can predict them, and that trying to account for everything makes progress too slow.

I don’t want to slow progress. I do want to learn about change. A few years ago I started to write essays on such things.

My writing ended up in outlets like Exponential View, The Browser, TechCrunch, law school journals, Marginal Revolution, and the Human Risk Blog, and was popular on Reddit and Hacker News. People reached out about my writing and I spoke on some podcasts. That, plus reader comments and encouragement, kept me going.

A couple of weeks ago I finished my 100th original post. That seems like a milestone, so I wrote this summary (the list of 100 articles is at the bottom).

Writing. With more connections in our world today, it’s important to learn about unintended consequences and systems.

How did I actually approach such a big topic? I researched and wrote most of my posts in inconvenient and unpredictable ways. Pre-dawn reading and note-taking. Forming an outline mentally while walking to a meeting (pre-COVID) or just walking around the neighborhood. Remembering, while in the shower, an example I read years ago.

I wrote most of my notes and outlines longhand. I never got the hang of any note-taking software. Friends spoke to me about Roam and other mind-mapping tools but I never found the discipline to use them.

A few years ago I sadly gave away 98% of my book collection. On at least 25 pre-COVID occasions I went to the library to borrow a book I used to own and flipped pages to find the example I remembered. On at least 25 other occasions I found other books, new to me, that provided material for the posts. I luckily made a library trip one of my last public outings pre-COVID so I still have a pile of books to read.

I also saw ideas in research papers, Tweets, offhand comments, followed links, and changed direction repeatedly.

The only requirement was that my interest remained.

Commitment. After a few months of writing posts I started sending them out in a weekly email. That artificial deadline focused my energy. As expected, the deadline also had tradeoffs. I started to care about whether people read the emails! A weekly timeline also meant that on a few occasions I sent out posts when I should have let them simmer a bit longer. Then again, the timeline helped me produce much more than I would have otherwise. I estimate that my 100 articles total at least 150,000 words.

I think such weekly schedules are good for writers who focus on breaking news and less so for those like me who try to connect dots within and across environments. Also, I have the day job (maybe two) and I came up against the limit of how much time I had to think, read, reflect, and write.

It took me a year to put my name on my writing. I don’t know if I am better for it.

On a few occasions I acted on my thinking and benefitted. I’m not claiming anything different than typical investors. I just may have come to the conclusions in different ways.

But I like writing. Last summer I connected some other dots and, as a break from unintended consequences, wrote a short book on company growth patterns.

Differences. Casual conversations that overlapped with my writing often ended surprisingly for the other parties. Topics that I covered, like UBI, pandemics, self-driving cars, mosquito eradication, scale effects, university funding, disinformation, and more, changed for me. I was surprised by how few original opinions are out there and how many opinions seem to be created somewhere other than the speakers’ minds. Storytellers capture dramatic amounts of our brainspace.

I saw otherwise smart people become unable to think if it meant going against their political side or a social norm. It was painful for me to write about that in posts, especially because I had to wade through poorly thought-through writing. I tried to avoid political topics since just figuring out what is really happening takes a lot of effort. And then I don’t feel better for it. We almost guarantee exposure to education but don’t also require that people think for themselves. There are lots of words exchanged, but I struggled to find meaning in them.

I created awkward silence by having different opinions than the norm. But I only had those opinions because I first thought through the situations and wrote about them.

I spent enough years reading, working, and living to see that the default case is that things don’t work out as planned. Yet people repeatedly assume that their plans will work out, that a new policy will fix the problems, that a new technology will produce the outcomes its inventors and investors claim. Why do these assumptions continue, especially in a more connected world that is at more risk of unintended consequences?

Here are my 100 articles, in chronological order, running from 2018 to 2021. I’m now working on the next phase of this project.

Yes, I’m taking a break from this speed of writing. In the next phase I’d also like this project to be more sustainable, to include more talks, and perhaps a conference of sorts if that could be done in an engaging way. Beyond that, I’m still considering ideas.

Let me know if you’d like to talk.

Stay well and keep thinking.

The Shape of Faces to Come (Facial Recognition and Political Orientation)

Image recognition is a set of technologies where we’ve seen great progress recently. Some applications bring efficiency, for example identifying debris for removal from an agricultural field. Some bring accuracy, for example identifying tumors in cancer screens at a better rate than human experts. And some bring convenience, for example enabling users to unlock their devices with their faces rather than passwords.

These applications can lead to good outcomes just as they can also have unintended consequences.

Related to that, at the start of the recent wave of protests in Hong Kong, a journalist opened an article with a beautiful summary of how important facial recognition had become.

“The police officers wrestled with Colin Cheung in an unmarked car. They needed his face.

“They grabbed his jaw to force his head in front of his iPhone. They slapped his face. They shouted, ‘Wake up!’ They pried open his eyes. It all failed: Mr. Cheung had disabled his phone’s facial-recognition login with a quick button mash as soon as they grabbed him.”

It seems legitimate that we fear misuse of facial recognition. It’s a question of suddenly being able to do something at a scale that would have been difficult or costly before.

But what about subtler abuses?

That brings me to a new report, titled Facial recognition technology can expose political orientation from naturalistic facial images.

From the report: Continue reading “The Shape of Faces to Come (Facial Recognition and Political Orientation)”

A New Morality of Attainment (Goodhart’s Law)

Peter Drucker said “if you can’t measure it, you can’t improve it,” but he didn’t mention the second-order effects of that statement. What changes after people get used to the measurements? What if we measure things that are only partly relevant to what we’re trying to improve?

Tracking metrics can tell us something new, but can also create problems. Let’s look at how Goodhart’s Law leads to unintended consequences.

A New Morality of Attainment

Goodhart’s original quote, about monetary policy in the UK (of all things), was:

“Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”

But Goodhart’s original is often reinterpreted so that we can talk about more than economics. Anthropologist Marilyn Strathern, in “Improving Ratings: Audit in the British University System,” summarized Goodhart’s Law as:

“When a measure becomes a target, it ceases to be a good measure.”

This is the version of the law that most people use today.

Other commonly used variations include the Lucas Critique (from Robert Lucas’s work on macroeconomic policy):

“Given that the structure of an econometric model consists of optimal decision rules of economic agents, and that optimal decision rules vary systematically with changes in the structure of series relevant to the decision maker, it follows that any change in policy will systematically alter the structure of econometric models.”

And also Campbell’s Law:

“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”

I’m going to stick with Strathern’s description because it’s simpler and more people seem to know it.

Strathern wrote about the emergence, in Cambridge in the mid 1700s, of written and oral exams as a way to rate university students.

Exam results were supposed to show how well students had learned their material, but they also reflected how well the professors taught and the quality of the universities.

But to determine how well students, professors, and universities had performed, the exams couldn’t be graded in traditional, qualitative ways. They needed a way to rank the students.

“This culminated in 1792 in a proposal that all answers be marked numerically, so that… the best candidate will be declared Number One… The idea of an examination as the formal testing of human activity joined with quantification (that is, a numerical summary of attainment) and with writing, which meant that results were permanently available for inspection. With measurement came a new morality of attainment. If human performance could be measured, then targets could be set and aimed for.”

Strathern also described the difficulties of the rankings.

“When a measure becomes a target, it ceases to be a good measure. The more examination performance becomes an expectation, the poorer it becomes as a discriminator of individual performances. [T]argets that seem measurable become enticing tools for improvement…. This was articulated in Britain for the first time around 1800 as ‘the awful idea of accountability’….”

Again from Strathern’s paper:

“Education finds itself drawn into the rather bloated phenomenon I am calling the audit culture… The enhanced auditing of performance returns not to the process of examining students, then, but to other parts of the system altogether. What now are to be subject to ‘examination’ are the institutions themselves—to put it briefly, not the candidates’ performance but the provision that is made for getting the candidates to that point. Institutions are rendered accountable for the quality of their provision.

“This applies with particular directness in Teaching Quality Assessment (TQA), which scrutinizes the effectiveness of teaching—that is, the procedures the institution has in place for teaching and examining, assessed on a department by department basis within the university’s overall provision…. TQA focuses on the means by which students are taught and thus on the outcome of teaching in terms of its organization and practice, rather than the outcome in terms of students’ knowledge. The Research Assessment Exercise (RAE), on the other hand… specifically rates research outcome as a scholarly product. Yet here, too, means are also acknowledged. Good research is supposed to come out of a good ‘research culture’. If that sounds a bit like candidates getting marks for bringing their pencils into the exam, or being penalized for the examination room being stuffy, it is a reminder that, at the end of the day, it is the institution as such that is under scrutiny. Quality of research is conflated with quality of research department (or centre). 1792 all over again!”

Measuring the quality of teaching (or quality of research) rather than what students learn (or research results) is an odd outcome. But it makes sense. Adherence to a process seems related to the goals, so the process becomes what’s measured.

John Gall, a popular systems writer, also outlined something like this in his book Systemantics. Here he describes how a university researcher gets pulled into metrics-driven work.

“[The department head] fires off a memo to the staff… requiring them to submit to him, in triplicate, by Monday next, statements of their Goals and Objectives….

“Trillium [the scientist who has to respond] goes into a depression just thinking about it.

“Furthermore, he cannot afford to state his true goals [he just likes studying plants]. He must at all costs avoid giving the impression of an ineffective putterer or a dilettante…. His goals must be well-defined, crisply stated, and must appear to lead somewhere important. They must imply activity in areas that tend to throw reflected glory on the Department.”

Universities may have started the focus on metrics, but today we see metrics used everywhere. Here are some examples of metrics used to achieve goals and the problems they created.

Some Inappropriate Metrics

Robert McNamara and the Vietnam War. Vietnam War-era Secretary of Defense Robert McNamara was one of the “Whiz Kids” in Statistical Control, a management science operation in the military. McNamara went on to work for Ford Motor Company and became its president, only to resign shortly afterward when Kennedy asked him to become Secretary of Defense.

In his new role, McNamara brought a statistician’s mind to the Vietnam War, with disastrous results. People on his team presented skewed data, put them in models that told a desired story, and couldn’t assess more qualitative issues like willingness of the Viet Cong and the US to fight. The focus on numbers even when they don’t tell the full story is called the McNamara Fallacy.

The documentary The Fog of War presents 11 lessons about what McNamara learned as a senior government official. The numbered list of lessons:

1) Empathize with your enemy; 2) Rationality alone will not save us; 3) There’s something beyond one’s self; 4) Maximize efficiency; 5) Proportionality should be a guideline in war; 6) Get the data; 7) Belief and seeing are both often wrong; 8) Be prepared to reexamine your reasoning; 9) In order to do good, you may have to engage in evil; 10) Never say never; 11) You can’t change human nature.

Many of those 11 lessons deal with the McNamara Fallacy, including at least numbers 1, 2, 4, 6, 7, and 8.

Easy, rather than meaningful, measurements. If we need to measure something as a step toward our goals, we might choose what is easier to measure instead of what would be more helpful. Examples: a startup ecosystem tracking startup funding rounds (often publicly shared) rather than startup success (which takes years, with results often private). Or the unemployment rate, which tracks how many people looking for work have not found it, rather than how many have given up or taken poorly paid jobs.

Vanity metrics. In startups we often talk about “vanity metrics,” ones that look good but aren’t helpful. A vanity metric would be growth in users (without accompanying metrics around revenue or retention) or website visits when those visits may come from expensive ad buys.

For example, when Groupon was getting ready to IPO, it quickly hired thousands of salespeople in China. The purpose was to increase its valuation at IPO, not to bring in more revenue in a new market. When potential investors saw that Groupon had a large China team, they thought the company would succeed.

Vanity metrics can occur anywhere. When some police departments started to track crime statistics, they also started to underreport certain crimes since the police were judged based on how many crimes occurred.

Surrogate metrics in health care. There’s a trade-off when measuring efficacy of a medicine. How long do we wait to prove results? If there are proxies for knowing whether a patient seems to be on the path to recovery, when do we choose the proxy rather than the actual outcome? As described in Time to Review the Role of Surrogate End Points in Health Policy:

“The Food and Drug Administration (FDA) in the United States and the European Medicines Agency (EMA), have a long tradition of licensing technologies solely on the basis of evidence of their effects on biomarkers or intermediate end points that act as so-called surrogate end points. The role of surrogates is becoming increasingly important in the context of programs initiated by the FDA and the EMA to offer accelerated approval to promising new medicines. The key rationale for the use of a surrogate end point is to predict the benefits of treatment in the absence of data on patient-relevant final outcomes. Evidence from surrogate end points may not only expedite the regulatory approval of new health technologies but also inform coverage and reimbursement decisions.”

Nudging. Nudges are little encouragements used by governments and businesses to change what individuals choose to do. Sometimes the nudge comes in the form of information about what other people do. For example, in the UK, tax compliance increased when people received a letter stating that “9 out of 10 people in your area are up to date with tax payments.”

But what if the goals of the nudges don’t consider other outcomes?

In “The Power of Suggestion: Inertia in 401(k) Participation and Savings Behavior” the authors show how changing default choices in employee retirement decisions resulted in more people choosing to save, but also resulted in more people continuing with default conservative money market investment choices.

The nudge increased the goal of higher employee savings compliance but also created a situation where more employees gave up higher returns they could have had from long-term equity investing.

Artistic and sports performances. Judged artistic competitions are scored in different ways, and judging criteria sometimes change. Tim Ferriss realized that tango dance competitions ranked turns highly, so as a relative novice he did lots of turns and won. Something similar happened when Olympic skating changed to value the technical difficulty of each component of a performance. Skaters now check off technical moves at the expense of subjective artistic ones, which can leave performances less beautiful to watch.

Alpha Chimp. Metrics are a modern invention, but there are versions of them in other societies. In Jane Goodall’s book In the Shadow of Man, we learn of a low-ranking chimpanzee “Mike” who suddenly became alpha male. The top-ranked male was often the toughest chimpanzee in the group, a position backed up by size, intimidation, and what the rest of the group accepted. Mike was at the bottom of the adult male hierarchy (attacked by almost all the other males, last access to bananas). But Mike realized that he could use some empty oil cans that Goodall had left at her campsite in a new way. He ran through the group of chimpanzees banging the cans together. The other chimpanzees had never heard noise like that before and scattered. Maybe Mike proved that the top-ranked position, which should be based on who best leads the group, was actually based on who was scariest.

Four and More Types of Goodhart

One of the best papers digging into variations of the metric and goal problem is Categorizing Variants of Goodhart’s Law, by David Manheim and Scott Garrabrant. Their paper outlines four types of Goodhart’s Law and why they happen.

Regressional. Selecting for a metric also selects for noise. Example: choosing to do whatever the winners of “person of the year” or “best company” awards did. You might not see that the person was chosen to send a political message, or that the company was manipulating numbers and will fall next year.

Extremal. This comes from out-of-sample projections. When our initial information sits within a specific boundary, we may still want to project what could happen outside that boundary. In those extreme cases, the relationship between the metric and the goal may break down.

Causal. Where the regulator (the intermediary between the proxy metric and the goal) causes the problem. For example, when pain became the “fifth vital sign,” doctors were measured by their ability to make their patients more comfortable. Doctors who prescribe pain medication too often or too easily may increase addiction.

Adversarial. Where agents have different goals from the regulator and find a loophole that harms the goal. For example, colonial powers wanting to decrease the number of cobras in India or rats in Vietnam paid bounties for dead cobras or rat tails. People discovered that they could breed their own cobras to kill, or cut off rat tails and release the rats. This is known as the Cobra Effect.
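The regressional variant is easy to see in a small simulation (a minimal sketch of my own, not from the paper; the quality-plus-noise model and all numbers are assumptions). Each candidate has a true quality, but we only observe a noisy proxy. Picking the candidate with the highest proxy score systematically overestimates how good it really is:

```python
import random

random.seed(0)

N = 1000       # candidates per selection round
TRIALS = 200   # repeated selection rounds

gap_total = 0.0
for _ in range(TRIALS):
    candidates = []
    for _ in range(N):
        quality = random.gauss(0, 1)   # what we actually care about
        noise = random.gauss(0, 1)     # measurement error in the proxy
        candidates.append((quality + noise, quality))
    proxy, quality = max(candidates)   # select on the proxy score
    gap_total += proxy - quality       # how much the winner was flattered

print(f"average proxy-minus-quality gap of the winner: {gap_total / TRIALS:.2f}")
# The gap is consistently positive: the "winner" looked better
# on the proxy than it really is.
```

The harder you select (larger N), the more the winner's proxy score is noise rather than quality, which is exactly why award winners and chart-toppers tend to disappoint afterward.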

Beyond Manheim and Garrabrant’s four types, others find that the law takes additional forms.

Right vs Wrong. Noah Smith splits Goodhart’s Law into wrong and right versions:

“The ‘law’ actually comes in several forms, one of which seems clearly wrong, one of which seems clearly right. Here’s the wrong one:

“Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.

“That’s obviously false. An easy counterexample is the negative correlation between hand-washing and communicable disease. Before the government made laws to encourage hand-washing by food preparation workers (and to teach hand-washing in schools), there was a clear negative correlation between frequency of hand-washing in a population and the incidence of communicable disease in that population. Now the government has placed pressure on that regularity for control purposes, and the correlation still holds…”

And here’s Smith’s version of Goodhart’s Law that seems true to him:

“As soon as the government attempts to regulate any particular set of financial assets, these become unreliable as indicators of economic trends.

“This seems obviously true if you define ‘economic trends’ to mean economic factors other than the government’s actions. In fact, you don’t even need any kind of forward-looking expectations for this to be true; all you need is for the policy to be effective.”

My Summary. I summarize Goodhart’s Law as coming from two places. One is the behavior change that occurs when people start trying to achieve a metric rather than a goal. The other is the problem of metrics being imperfect proxies for goals.

Post Goodhart

There are many cases of metrics being poor proxies for a goal. There are also many cases of people changing their behavior to meet metric targets rather than goals.

More awareness of Goodhart’s Law should hopefully lead to fewer cases of it, though maybe I’m too optimistic.

We also have many examples of work, life, and play that go without measurement. We often measure haphazardly and survive, or measure nothing and also survive.

Or are we actually making some subconscious measurements that we don’t recognize? If so, could those subconscious measurements be beyond Goodhart-style effects?

Tracking metrics can fool us into seeing relationships that don’t exist.

Here are some ways we can try to avoid Goodhart’s Law:

  • Be careful with situations where our morals skew the metrics we track. In Morals of the Moment I wrote about how bad metric choices led to bad outcomes in forest fires, college enrollment, and biased hiring.
  • Check for signs of vanity metrics (metrics that can only improve or which have weak ties to outcomes).
  • Check whether we’re too process-focused (audit culture) rather than outcomes-focused.
  • Regularly update metrics (and goals) as we see behavior change or find better ways to track progress toward a goal.
  • Keep some metrics secret to avoid “self-defeating prophecies.”
  • Use “counter metrics,” a concept from Julie Zhou. “For each success metric, come up with a good counter metric that would convince you that you’re not simply plugging one hole with another. (For example, a common counter metric for measuring an increase in production is also measuring the quality of each thing produced.)”
  • Allow qualitative judgements that fall beyond numbers. This creates unintended consequences of its own, but can be a good check on our efforts. This is the difference between what Daniel Kahneman calls system one (quick intuition) and system two (slow, analytical, rational thinking). Is there room to act when a situation doesn’t feel right?
  • Be more mindful where our choice of metric can impact many people (metrics that change outcomes for large groups should be studied carefully).
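The counter-metric idea above can be sketched in a few lines (a toy illustration; the function name, the units/defect-rate pairing, and the thresholds are my own assumptions, not from Zhou's writing). An "improvement" only counts if the success metric rises without the counter metric degrading:

```python
# Hypothetical counter-metric check: a success metric (units produced)
# paired with a counter metric (defect rate), so one hole isn't
# plugged by opening another.

def improvement_is_real(before, after, max_defect_increase=0.0):
    """before/after are dicts with 'units' and 'defect_rate' keys."""
    units_up = after["units"] > before["units"]
    quality_held = after["defect_rate"] <= before["defect_rate"] + max_defect_increase
    return units_up and quality_held

before = {"units": 100, "defect_rate": 0.02}
faster = {"units": 140, "defect_rate": 0.09}   # more output, worse quality
better = {"units": 120, "defect_rate": 0.02}   # more output, quality held

print(improvement_is_real(before, faster))  # False
print(improvement_is_real(before, better))  # True
```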

That we even have Goodhart’s Law is a symptom of a more complex and connected society. In a more localized world there wouldn’t be as much of a problem with tracking metrics to achieve goals, either because our impact would be local or because we just wouldn’t track things at all.

We also might not have an incentive to learn from Goodhart’s Law. Why try to change the way we set metrics if we are not the ones penalized? Why care about skewed outcomes if our timescale of measurement is short and Goodhart outcomes take a longer time?

Onward, Robot Soldiers?

I’ve written multiple times about basic values, technology trends, and how they can be causes of unintended consequences.

Today I’m exploring the topic of autonomous weapons, reasons behind their development, and potential outcomes. This is a big topic that I will certainly return to multiple times.

Autonomous weapons are characterized by understanding battlefield goals and finding ways to achieve these goals without human action. Such weapons are currently being researched, developed, and tested as intelligent wingmen for fighter pilots, as support vehicles carrying supplies and fuel, and as offensive weapons. Continue reading “Onward, Robot Soldiers?”

Responsibility Clawbacks (McKinsey and Purdue Pharma)

In recent weeks the consulting firm McKinsey has been back in the news because of the advice it gave its client Purdue Pharma, maker of OxyContin. The advice looks like a blatant push to increase drug sales at the expense of patient health and a worsening opioid epidemic. As a result, McKinsey has been fined $573 million.

But even if the Purdue Pharma-related fine is extreme, it is just one of McKinsey’s many bad client outcomes. A short list of other bad outcomes or questionable clients includes:

  • Advising badly-run government coronavirus responses.
  • Advising financial firms to increase their debt load in the lead up to the 2008 financial crisis.
  • Advising Enron in the lead-up to its financial scandal.
  • Advising Rikers Island jail on ways to improve safety, with the outcome being a more dangerous situation.
  • Advising authoritarian governments including Saudi Arabia, Russia, and China.

Continue reading “Responsibility Clawbacks (McKinsey and Purdue Pharma)”

Three Wagers

It’s easy to write or talk about an issue without staking anything on its outcome. After all, that’s what casual forecasters do all the time. But having something in play — reputation or money or something else — can make sure that people remember their claims. This can keep us honest.

There is obviously a whole set of games that typically include betting. And there are many famous bets, from Ashley Revell gambling his life savings on one spin of the roulette wheel, to John Gutfreund and John Meriwether’s proposed but aborted $10 million bet on a single hand of liar’s poker, to the dice game over the sailors’ lives in The Rime of the Ancient Mariner.

The following three examples are a bit different. The people making these wagers are trying to drive research, prove their model of the world is correct, or use logic to guide decision-making. It’s something we might consider when we make our own wagers.

‘Oumuamua’s Wager

Purpose: drive research in a specific area. “It’s good for us.”

In 2017 a strangely shaped object moved through the solar system. It was named ‘Oumuamua, a Hawaiian word for “scout.”

‘Oumuamua was the first observed object passing through the solar system from elsewhere. Its strange shape (possibly like a giant cigar or pancake) led to speculation that it was alien in origin, including that ‘Oumuamua may be an alien-made solar sail or space junk.

These claims came from Avi Loeb, a professor of astronomy at Harvard who calls his speculation “‘Oumuamua’s Wager.”

But most astronomers are dismissive of the alien technology theory. Related to that, Loeb outlines the difficulty of setting a new direction for research. “And in terms of risk, in science, we are supposed to put everything on the table. We cannot just avoid certain ideas because we worry about the consequences of discussing them, because there is great risk in that, too. That would be similar to telling Galileo not to speak about Earth moving around the sun and to avoid looking in his telescope because it was dangerous to the philosophy of the day…. In the context of ‘Oumuamua, I say the available evidence suggests this particular object is artificial, and the way to test this is to find more [examples] of the same and examine them. It’s as simple as that.”

Loeb claims that believing ‘Oumuamua is alien in origin would be a net good because the invigorated search for alien life or technology would drive many other parts of scientific inquiry. In doing so, we would learn much more about the universe than otherwise.

Simon-Ehrlich Wager

Purpose: back up one’s theories. “Let’s prove who’s right.”

The Simon-Ehrlich Wager grew out of the doomsday writing of Paul Ehrlich, author of The Population Bomb, a 1968 book that forecast overpopulation would lead to global famines and resource shortages in the 1970s and 1980s.

Taking the other side of Ehrlich’s claims was Julian Simon, a professor of business. Simon believed that the world would not face the extreme shortages that Ehrlich forecast. The question was, how to gauge whether there was a change in resource availability?

As a solution, Simon proposed that Ehrlich choose any raw materials he wanted and a future date. Simon would win the wager if the prices of those items had decreased by that time. Ehrlich chose copper, chromium, nickel, tin, and tungsten, and a date 10 years in the future (September 29, 1990).

The wager’s payoff was to be the difference between the value of $1,000 of the materials at the start and their inflation-adjusted prices at the end. Ehrlich lost and mailed Simon a check for $576.07.
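The settlement arithmetic is simple enough to sketch in a few lines of Python. The price ratios below are hypothetical placeholders, not the actual 1980–1990 figures, though all five metals did fall in inflation-adjusted terms:

```python
def settle_wager(start_value: float, price_ratios: dict) -> float:
    """Change in inflation-adjusted value of an equal-weight basket.

    Positive result: prices rose, so Simon pays the difference.
    Negative result: prices fell, so Ehrlich pays the absolute value.
    """
    per_item = start_value / len(price_ratios)  # $200 per metal for 5 metals
    end_value = sum(per_item * ratio for ratio in price_ratios.values())
    return end_value - start_value

# Hypothetical inflation-adjusted ratios (1990 price / 1980 price).
ratios = {"copper": 0.8, "chromium": 0.9, "nickel": 1.0,
          "tin": 0.4, "tungsten": 0.6}

change = settle_wager(1000.0, ratios)
# change is negative here: the basket lost value, so Ehrlich owes abs(change)
```

With these made-up ratios the basket falls from $1,000 to $740, so Ehrlich would owe $260; the real 1990 prices produced the $576.07 check.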

Personally, I like that it was actually a business professor who won this one.

But notably, Ehrlich included nothing with the check, not even a note, and seemed bitter about the loss.

Since Simon was willing to wager again, Ehrlich (and climatologist Stephen Schneider) proposed a new wager — a set of 15 trends, including average temperature, emissions, oceanic harvests, availability of firewood in developing nations, and more.

But Simon passed on the new proposed wagers. As he explained:

“Let me characterize their offer as follows. I predict, and this is for real, that the average performances in the next Olympics will be better than those in the last Olympics. On average, the performances have gotten better, Olympics to Olympics, for a variety of reasons. What Ehrlich and others say is that they don’t want to bet on athletic performances, they want to bet on the conditions of the track, or the weather, or the officials, or any other such indirect measure.”

Pascal’s Wager

Purpose: attribute risk to outcomes and choose the best path. “It benefits me.”

Saying this is about personal benefit may be a bit odd since the wager weighs outcomes given the existence or absence of God.

There are only four outcomes of this wager:

  • God exists, people believe in God and receive infinite reward,
  • God exists, people do not believe in God and miss the infinite reward (or receive infinite punishment),
  • God does not exist, people believe in God and mildly inconvenience themselves,
  • God does not exist, people do not believe in God and have a finite amount of personal benefit.

Given that the outcomes include infinite reward or punishment against only minor costs, belief in God is therefore the logical choice.
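The logic is a bare expected-value comparison, which a few lines of Python can make concrete. The finite payoffs here are arbitrary stand-ins for the “mild inconvenience” and “finite benefit” outcomes:

```python
import math

def expected_value(p_god_exists: float, payoff_if_exists: float,
                   payoff_if_not: float) -> float:
    """Expected payoff of a choice, given the probability that God exists."""
    return (p_god_exists * payoff_if_exists
            + (1 - p_god_exists) * payoff_if_not)

p = 0.001  # any nonzero probability at all will do

# Believe: infinite reward if God exists, mild inconvenience if not.
ev_believe = expected_value(p, math.inf, -1.0)
# Disbelieve: infinite punishment if God exists, finite benefit if not.
ev_disbelieve = expected_value(p, -math.inf, 1.0)

# ev_believe is +infinity and ev_disbelieve is -infinity,
# no matter how small (but nonzero) p is.
```

This is also why Pascal’s Wager resembles the Precautionary Principle: an unbounded downside dominates the calculation regardless of how small its probability is.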

Pascal’s Wager has been compared to the Precautionary Principle:

“When an activity raises threats of harm to human health or the environment, precautionary measures should be taken even if some cause and effect relationships are not fully established scientifically.”

This principle states that we should work to avoid very bad outcomes even when their chance of occurring is tiny and cause and effect is not known.

Useful Wagers

What makes for a useful wager? There are a few elements.

  • The wager’s outcomes can’t be easily gamed. For example, those taking sides in the Simon-Ehrlich commodities wager shouldn’t be able to drive prices up or down.
  • Who wins is not debatable. This makes wagering on some issues problematic. A wager based on things becoming “better,” without a clear definition of how better is measured, isn’t useful.
  • Those making the wagers must ride them to the end. One can’t make a wager and then remove oneself from it. Otherwise, people could pick and choose which wagers they commit to.
  • The wager helps us learn something new. We come away with a different understanding of the world after noting the wager’s results. Or, we create new knowledge needed to figure out who won the wager.

So go make wagers when it helps you train your view of the world.

Consider

  • Even Loeb admits that he might not have promoted the alien idea if he didn’t have tenure plus other academic positions. As he puts it, “what’s the worst thing that can happen to me? I’ll be relieved of my administrative duties? This will bring the benefit that I’ll have more time for science.”
  • Ehrlich, in spite of being wrong in his book and wager, is better known than Simon.
  • Pascal’s wager does not seek to prove God’s existence, but rather to bring rationality to belief.

Self-Driving Safety and Systems

Summary: Who wouldn’t want to improve the transportation status quo? But we’re looking at self-driving car safety in the wrong way. Self-driving cars will also lead to an increase in systemic risk, offsetting some of the gains in safety. Over the next decade or so, there will be more serious discussions on autonomous vehicle implementations. Based on the way these companies have framed early public discussions, I worry that people will look at risk in unhelpful ways.

A recent paper titled “Self-Driving Vehicles Against Human Drivers: Equal Safety Is Far From Enough” measures public perception in Korea and China. Since I’ve written about self-driving cars or autonomous vehicles (AVs) a few times I wanted to comment on it and ways to look at risk in a new system.

The paper outlines studies estimating how much safer AVs need to be for the public to accept them. The authors estimate that AVs need to be perceived as 4 to 5 times safer to match the trust and comfort people have with human-driven vehicles.

I’m going to go through a few parts of the paper and tell you why I think the findings aren’t relevant to the AV discussion (though they are interesting). Continue reading “Self-Driving Safety and Systems”