The Shape of Faces to Come (Facial Recognition and Political Orientation)

Image recognition is a set of technologies where we’ve seen great progress recently. Some applications improve efficiency, for example identifying debris to remove from an agricultural field. Others improve accuracy, for example identifying tumors in cancer screens at a better rate than human experts. And some applications are for convenience, for example enabling users to unlock their devices with their faces rather than passwords.

These applications can lead to good outcomes just as they can also have unintended consequences.

Related to that, at the start of the recent wave of protests in Hong Kong, a journalist opened an article with a beautiful summary of how important facial recognition had become.

“The police officers wrestled with Colin Cheung in an unmarked car. They needed his face.

“They grabbed his jaw to force his head in front of his iPhone. They slapped his face. They shouted, ‘Wake up!’ They pried open his eyes. It all failed: Mr. Cheung had disabled his phone’s facial-recognition login with a quick button mash as soon as they grabbed him.”

It seems legitimate to fear misuse of facial recognition. The technology suddenly makes it possible to do something at a scale that would have been difficult or costly before.

But what about subtler abuses?

That brings me to a new report, titled Facial recognition technology can expose political orientation from naturalistic facial images.

From the report: Continue reading “The Shape of Faces to Come (Facial Recognition and Political Orientation)”

A New Morality of Attainment (Goodhart’s Law)

Peter Drucker said “if you can’t measure it you can’t improve it,” but he didn’t mention the second-order effects of that statement. What changes after people get used to the measurements? What if we measure things that are only partly relevant to what we’re trying to improve?

Tracking metrics can tell us something new, but can also create problems. Let’s look at how Goodhart’s Law leads to unintended consequences.

A New Morality of Attainment

Goodhart’s original quote, about monetary policy in the UK (of all things), was:

“Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”

But Goodhart’s original is often reinterpreted so that we can talk about more than economics. Anthropologist Marilyn Strathern, in “Improving Ratings: Audit in the British University System,” summarized Goodhart’s Law as:

“When a measure becomes a target, it ceases to be a good measure.”

This is the version of the law that most people use today.

Other commonly used variations include the Lucas Critique (from Robert Lucas’s work on macroeconomic policy):

“Given that the structure of an econometric model consists of optimal decision rules of economic agents, and that optimal decision rules vary systematically with changes in the structure of series relevant to the decision maker, it follows that any change in policy will systematically alter the structure of econometric models.”

And also Campbell’s Law:

“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”

I’m going to stick with Strathern’s description because it’s simpler and more people seem to know it.

Strathern wrote about the emergence, in Cambridge in the mid-1700s, of written and oral exams as a way to rate university students.

Those students’ results were supposed to show how well the students had learned the subject matter, but they also showed how well the professors taught and how good the universities were.

But to determine how well students, professors, and universities had performed, the exams couldn’t be graded in traditional, qualitative ways. They needed a way to rank the students.

“This culminated in 1792 in a proposal that all answers be marked numerically, so that… the best candidate will be declared Number One… The idea of an examination as the formal testing of human activity joined with quantification (that is, a numerical summary of attainment) and with writing, which meant that results were permanently available for inspection. With measurement came a new morality of attainment. If human performance could be measured, then targets could be set and aimed for.”

Strathern also described the difficulties of the rankings.

“When a measure becomes a target, it ceases to be a good measure. The more examination performance becomes an expectation, the poorer it becomes as a discriminator of individual performances. [T]argets that seem measurable become enticing tools for improvement…. This was articulated in Britain for the first time around 1800 as ‘the awful idea of accountability’….”

Again from Strathern’s paper:

“Education finds itself drawn into the rather bloated phenomenon I am calling the audit culture… The enhanced auditing of performance returns not to the process of examining students, then, but to other parts of the system altogether. What now are to be subject to ‘examination’ are the institutions themselves—to put it briefly, not the candidates’ performance but the provision that is made for getting the candidates to that point. Institutions are rendered accountable for the quality of their provision.

“This applies with particular directness in Teaching Quality Assessment (TQA), which scrutinizes the effectiveness of teaching—that is, the procedures the institution has in place for teaching and examining, assessed on a department by department basis within the university’s overall provision…. TQA focuses on the means by which students are taught and thus on the outcome of teaching in terms of its organization and practice, rather than the outcome in terms of students’ knowledge. The Research Assessment Exercise (RAE), on the other hand… specifically rates research outcome as a scholarly product. Yet here, too, means are also acknowledged. Good research is supposed to come out of a good ‘research culture’. If that sounds a bit like candidates getting marks for bringing their pencils into the exam, or being penalized for the examination room being stuffy, it is a reminder that, at the end of the day, it is the institution as such that is under scrutiny. Quality of research is conflated with quality of research department (or centre). 1792 all over again!”

Measuring the quality of teaching (or quality of research) rather than what students learn (or research results) is an odd outcome. But it makes sense. Adherence to a process seems related to the goals, so the process becomes what’s measured.

John Gall, a popular systems writer, also outlined something like this in his book Systemantics. Here he describes how a university researcher gets pulled into metrics-driven work.

“[The department head] fires off a memo to the staff… requiring them to submit to him, in triplicate, by Monday next, statements of their Goals and Objectives….

“Trillium [the scientist who has to respond] goes into a depression just thinking about it.

“Furthermore, he cannot afford to state his true goals [he just likes studying plants]. He must at all costs avoid giving the impression of an ineffective putterer or a dilettante…. His goals must be well-defined, crisply stated, and must appear to lead somewhere important. They must imply activity in areas that tend to throw reflected glory on the Department.”

Universities may have started our focus on metrics, but today we see metrics used all the time. Here are some examples of metrics used to achieve goals and the problems they created.

Some Inappropriate Metrics

Robert McNamara and the Vietnam War. Vietnam War-era Secretary of Defense Robert McNamara was one of the “Whiz Kids” in Statistical Control, a management science operation in the military. McNamara went on to work for Ford Motor Company and became its president, only to resign shortly afterward when Kennedy asked him to become Secretary of Defense.

In his new role, McNamara brought a statistician’s mind to the Vietnam War, with disastrous results. People on his team presented skewed data, put them in models that told a desired story, and couldn’t assess more qualitative issues like willingness of the Viet Cong and the US to fight. The focus on numbers even when they don’t tell the full story is called the McNamara Fallacy.

The documentary The Fog of War presents 11 lessons about what McNamara learned as a senior government official. The numbered list of lessons:

1) Empathize with your enemy; 2) Rationality alone will not save us; 3) There’s something beyond one’s self; 4) Maximize efficiency; 5) Proportionality should be a guideline in war; 6) Get the data; 7) Belief and seeing are both often wrong; 8) Be prepared to reexamine your reasoning; 9) In order to do good, you may have to engage in evil; 10) Never say never; 11) You can’t change human nature.

Many of those 11 lessons deal with issues of the McNamara Fallacy, including at least numbers 1, 2, 4, 6, 7, and 8.

Easy, rather than meaningful, measurements. If we need to measure something as a step toward our goals, we might choose what is easier to measure instead of what might be more helpful. Examples: a startup ecosystem tracking startup funding rounds (often publicly shared) rather than startup success (which takes years, and results are often private). Or the unemployment rate, which tracks how many people are looking for work but have not found it, rather than how many have given up looking and how many have taken poorly paid jobs.

Vanity metrics. In startups we often talk about “vanity metrics” as being ones that look good but aren’t helpful. A vanity metric would be growth in users (without accompanying metrics around revenue or retention) or website visits when those visits may come from expensive ad buys.

For example, when Groupon was getting ready to IPO, it quickly hired thousands of salespeople in China. The purpose was to increase its valuation at IPO, not to bring in more revenue in a new market. When potential investors saw that Groupon had a large China team, they thought the company would succeed.

Vanity metrics can occur anywhere. When some police departments started to track crime statistics, they also started to underreport certain crimes since the police were judged based on how many crimes occurred.

Surrogate metrics in health care. There’s a trade-off when measuring efficacy of a medicine. How long do we wait to prove results? If there are proxies for knowing whether a patient seems to be on the path to recovery, when do we choose the proxy rather than the actual outcome? As described in Time to Review the Role of Surrogate End Points in Health Policy:

“The Food and Drug Administration (FDA) in the United States and the European Medicines Agency (EMA), have a long tradition of licensing technologies solely on the basis of evidence of their effects on biomarkers or intermediate end points that act as so-called surrogate end points. The role of surrogates is becoming increasingly important in the context of programs initiated by the FDA and the EMA to offer accelerated approval to promising new medicines. The key rationale for the use of a surrogate end point is to predict the benefits of treatment in the absence of data on patient-relevant final outcomes. Evidence from surrogate end points may not only expedite the regulatory approval of new health technologies but also inform coverage and reimbursement decisions.”

Nudging. Nudges are little encouragements used by governments and businesses to change what individuals choose to do. Sometimes the nudge comes in the form of information about what other people do. For example, in the UK, tax compliance increased when people received a letter stating that “9 out of 10 people in your area are up to date with tax payments.”

But what if the goals of the nudges don’t consider other outcomes?

In “The Power of Suggestion: Inertia in 401(k) Participation and Savings Behavior” the authors show how changing default choices in employee retirement decisions resulted in more people choosing to save, but also resulted in more people continuing with default conservative money market investment choices.

The nudge achieved the goal of higher employee savings participation but also created a situation where more employees gave up higher returns they could have had from long-term equity investing.

Artistic and sports performances. Judged artistic competitions are scored in different ways and judging criteria sometimes change. Tim Ferriss realized that tango dance competitions ranked turns highly, so he did lots of turns to win despite being a relative novice compared to other competitors. Something similar happened when Olympic skating changed to value the technical difficulty in each component of the skaters’ performance. As a result, skaters check off more technical moves and perform fewer of the subjective artistic ones, which can leave performances less beautiful to watch.

Alpha Chimp. Metrics are a modern invention, but there are versions of them in other societies. In Jane Goodall’s book In the Shadow of Man, we learn of a low-ranking chimpanzee “Mike” who suddenly became alpha male. The top-ranked male was often the toughest chimpanzee in the group, a position backed up by size, intimidation, and what the rest of the group accepted. Mike was at the bottom of the adult male hierarchy (attacked by almost all the other males, last access to bananas). But Mike realized that he could use some empty oil cans that Goodall had left at her campsite in a new way. He ran through the group of chimpanzees banging the cans together. The other chimpanzees had never heard noise like that before and scattered. Maybe Mike proved that the top-ranked position, which should be based on who best leads the group, was actually based on who was scariest.

Four and More Types of Goodhart

One of the best papers digging into variations of the metric and goal problem is Categorizing Variants of Goodhart’s Law, by David Manheim and Scott Garrabrant. Their paper outlines four types of Goodhart’s Law and why they happen.

Regressional. When selecting for a metric also selects for a lot of noise. Example: choosing to do whatever the winners of “person of the year” or “best company” awards did. You might not see that the person was chosen to send a political message or that the company was manipulating numbers and will fall next year.
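The regressional case can be sketched as a small simulation. This is only an illustration under my own assumptions: each candidate's metric is true quality plus independent Gaussian noise, and the function name `simulate_selection` and all parameter values are hypothetical, not taken from Manheim and Garrabrant's paper.

```python
import random

def simulate_selection(n_candidates=10_000, top_k=100, noise_sd=1.0, seed=0):
    """Pick the top_k candidates by a noisy metric; compare metric vs. true quality."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_candidates):
        quality = rng.gauss(0.0, 1.0)                # the goal we care about (unobserved)
        metric = quality + rng.gauss(0.0, noise_sd)  # the proxy we can measure (assumed noise model)
        candidates.append((metric, quality))
    # Select the "award winners": the candidates with the highest metric.
    winners = sorted(candidates, reverse=True)[:top_k]
    avg_metric = sum(m for m, _ in winners) / top_k
    avg_quality = sum(q for _, q in winners) / top_k
    return avg_metric, avg_quality

avg_metric, avg_quality = simulate_selection()
print(f"winners' average metric:  {avg_metric:.2f}")
print(f"winners' average quality: {avg_quality:.2f}")
```

Under these assumptions the winners are still better than average, but their true quality falls well short of what their metric suggests, because selecting on the metric also selects for lucky noise.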

Extremal. This comes from out-of-sample projections. When our initial information is within a specific boundary we may still want to project what could happen outside the boundary. In those extreme cases, the relationship between the metric and the goal may break down.

Causal. Where the regulator (the intermediary between the proxy metric and the goal) causes the problem. For example, when pain became a 5th vital sign, doctors were measured by their ability to make their patients more comfortable. If doctors start to prescribe pain medication too often or too easily they may increase addiction.

Adversarial. Where agents have different goals from the regulator and find a loophole that harms the goal. For example, colonial powers wanted to decrease the number of cobras in India or rats in Vietnam and paid a bounty for dead cobras or rat tails. People discovered that they could raise their own cobras to kill or cut off rat tails and release the rats. This is known as the Cobra Effect.

Beyond Manheim and Garrabrant’s four types, others find that the law itself takes different forms.

Right vs Wrong. Noah Smith splits Goodhart’s Law into wrong and right versions:

“The ‘law’ actually comes in several forms, one of which seems clearly wrong, one of which seems clearly right. Here’s the wrong one:

“Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.

“That’s obviously false. An easy counterexample is the negative correlation between hand-washing and communicable disease. Before the government made laws to encourage hand-washing by food preparation workers (and to teach hand-washing in schools), there was a clear negative correlation between frequency of hand-washing in a population and the incidence of communicable disease in that population. Now the government has placed pressure on that regularity for control purposes, and the correlation still holds…”

And here’s Smith’s version of Goodhart’s Law that seems true to him:

“As soon as the government attempts to regulate any particular set of financial assets, these become unreliable as indicators of economic trends.

“This seems obviously true if you define ‘economic trends’ to mean economic factors other than the government’s actions. In fact, you don’t even need any kind of forward-looking expectations for this to be true; all you need is for the policy to be effective.”

My Summary. I just summarize Goodhart’s Law as coming from two places. One is the behavior change that occurs when people start trying to achieve a metric rather than a goal. The other is the problems that come from metrics being imperfect proxies for goals.

Post Goodhart

There are many cases of metrics being poor proxies for a goal. There are also many cases of people changing their behavior to meet metric targets rather than goals.

More awareness of Goodhart’s Law should hopefully lead to fewer cases of it, though maybe I’m too optimistic.

We also have many examples of the way we work, live, and play that are without measurement. We often measure haphazardly but survive or measure nothing and also survive.

Or are we actually making some subconscious measurements that we don’t recognize? If so, could those subconscious measurements be beyond Goodhart style effects?

Tracking metrics can fool us into seeing relationships that don’t exist.

Here are some ways we can try to avoid Goodhart’s Law:

  • Be careful with situations where our morals skew the metrics we track. In Morals of the Moment I wrote about how bad metric choices led to bad outcomes in forest fires, college enrollment, and biased hiring.
  • Check for signs of vanity metrics (metrics that can only improve or which have weak ties to outcomes).
  • Check whether we’re too process-focused (audit culture) rather than outcomes-focused.
  • Regularly update metrics (and goals) as we see behavior change or find better ways to track progress toward a goal.
  • Keep some metrics secret to avoid “self-defeating prophecies.”
  • Use “counter metrics,” a concept from Julie Zhuo. “For each success metric, come up with a good counter metric that would convince you that you’re not simply plugging one hole with another. (For example, a common counter metric for measuring an increase in production is also measuring the quality of each thing produced.)”
  • Allow qualitative judgements that fall beyond numbers. This creates unintended consequences of its own, but can be a good check on our efforts. This is the difference between what Daniel Kahneman calls system one (quick intuition) and system two (slow, analytical, rational thinking). Is there room to act when a situation doesn’t feel right?
  • Be more mindful where our choice of metric can impact many people (metrics that change outcomes for large groups should be studied carefully).

That we even have Goodhart’s Law is a symptom of a more complex and connected society. In a more localized world there wouldn’t be as much of a problem with tracking metrics to achieve goals, either because our impact would be local or because we just wouldn’t track things at all.

We also might not have an incentive to learn from Goodhart’s Law. Why try to change the way we set metrics if we are not the ones penalized? Why care about skewed outcomes if our timescale of measurement is short and Goodhart outcomes take a longer time?

Onward, Robot Soldiers?

I’ve written multiple times about basic values, technology trends, and how they can be causes of unintended consequences.

Today I’m exploring the topic of autonomous weapons, reasons behind their development, and potential outcomes. This is a big topic that I will certainly return to multiple times.

Autonomous weapons are characterized by understanding battlefield goals and finding ways to achieve these goals without human action. Such weapons are currently being researched, developed, and tested as intelligent wingmen for fighter pilots, as support vehicles carrying supplies and fuel, and as offensive weapons.

Responsibility Clawbacks (McKinsey and Purdue Pharma)

In recent weeks consulting firm McKinsey has been back in the news because of the advice it gave its client Purdue Pharma, maker of OxyContin. The advice looks blatantly aimed at increasing drug sales at the expense of patient health and at the cost of a worsening opioid epidemic. As a result, McKinsey has been fined $573 million.

But even if the Purdue Pharma-related fine is extreme, it is just one example of McKinsey’s many bad client outcomes. A short list of other bad outcomes or questionable clients includes:

  • Advising badly-run government coronavirus responses.
  • Advising financial firms to increase their debt load in the lead up to the 2008 financial crisis.
  • Advising Enron in the lead-up to its financial scandal.
  • Advising Rikers Island jail on ways to improve safety, with the outcome being a more dangerous situation.
  • Advising authoritarian governments including Saudi Arabia, Russia, and China.


Three Wagers

It’s easy to write or talk about an issue without staking anything on its outcome. After all, that’s what casual forecasters do all the time. But having something in play — reputation or money or something else — can make sure that people remember their claims. This can keep us honest.

There is obviously a whole set of games that typically include betting. And there are many famous bets, from Ashley Revell gambling his life savings on one spin of the roulette wheel, to John Gutfreund and John Meriwether’s proposed but then aborted $10 million bet on a single hand of liar’s poker, and even the dice bet over the sailors’ lives in Rime of the Ancient Mariner.

The following three examples are a bit different. The people making these wagers are trying to drive research, prove their model of the world is correct, or use logic to guide decision-making. It’s something we might consider when we make our own wagers.

‘Oumuamua’s Wager

Purpose: drive research in a specific area. “It’s good for us.”

In 2017 a strangely shaped object moved through the solar system. It was named ‘Oumuamua, a Hawaiian word for “scout.”

‘Oumuamua was the first observed object passing through the solar system from elsewhere. Its strange shape (possibly like a giant cigar or pancake) led to speculation that it was alien in origin, including that ‘Oumuamua may be an alien-made solar sail or space junk.

These claims came from Avi Loeb, a professor of astronomy at Harvard who calls his speculation “‘Oumuamua’s Wager.”

But most astronomers are dismissive of the alien technology theory. Related to that, Loeb outlines the difficulty of setting a new direction for research. “And in terms of risk, in science, we are supposed to put everything on the table. We cannot just avoid certain ideas because we worry about the consequences of discussing them, because there is great risk in that, too. That would be similar to telling Galileo not to speak about Earth moving around the sun and to avoid looking in his telescope because it was dangerous to the philosophy of the day…. In the context of ‘Oumuamua, I say the available evidence suggests this particular object is artificial, and the way to test this is to find more [examples] of the same and examine them. It’s as simple as that.”

Loeb claims that believing ‘Oumuamua is alien in origin would be a net good because the invigorated search for alien life or technology would drive many other parts of scientific inquiry. In doing so, we would learn much more about the universe than otherwise.

Simon-Ehrlich Wager

Purpose: back up one’s theories. “Let’s prove who’s right.”

The Simon-Ehrlich Wager grew out of the doomsday writing of Paul Ehrlich, author of The Population Bomb, a 1968 book that forecast overpopulation would lead to global famines and resource shortages in the 1970s and 1980s.

Taking the other side of Ehrlich’s claims was Julian Simon, a professor of business. Simon believed that the world would not face the extreme shortages that Ehrlich forecast. The question was, how to gauge whether there was a change in resource availability?

As a solution, Simon proposed that Ehrlich choose any raw materials he wanted and a future date. Simon would win the wager if the prices of those items had decreased by that time. Ehrlich chose copper, chromium, nickel, tin, and tungsten and a date 10 years in the future (September 29, 1990).

The wager’s payoff was to be the difference between the $1,000 originally spent on the materials and their value at the future date. Ehrlich lost and mailed a check to Simon for $576.07.
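The settlement arithmetic can be sketched as follows. This is a simplified illustration: the function name `settle_wager` is hypothetical, the ending basket value is not quoted above but implied by the $1,000 stake and the $576.07 check, and the sketch ignores details such as any inflation adjustment.

```python
from decimal import Decimal

def settle_wager(initial_value, final_value):
    """Return (payer, amount) for a wager settled on the change in basket value.

    If the basket got cheaper, the side betting on scarcity (Ehrlich) pays the
    difference; if it got more expensive, the other side (Simon) pays.
    """
    difference = initial_value - final_value
    if difference > 0:
        return "Ehrlich", difference
    if difference < 0:
        return "Simon", -difference
    return None, Decimal("0")

initial_value = Decimal("1000.00")  # value of the basket when the wager was made
check_amount = Decimal("576.07")    # the check Ehrlich mailed to Simon
# Implied ending value of the basket, derived from the two figures above.
implied_final_value = initial_value - check_amount

payer, amount = settle_wager(initial_value, implied_final_value)
print(payer, amount, implied_final_value)
```

Read this way, the basket that cost $1,000 at the start was worth a little under half that at the end, which is why the check went from Ehrlich to Simon.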

Personally, I like that it was actually a business professor that won this one.

But notably, Ehrlich did not include anything other than the check and seemed to be bitter about the loss.

Since Simon was willing to wager again, Ehrlich (and climatologist Stephen Schneider) proposed a new wager — a set of 15 trends, including average temperature, emissions, oceanic harvests, availability of firewood in developing nations, and more.

But Simon passed on the new proposed wagers. As he explained:

“Let me characterize their offer as follows. I predict, and this is for real, that the average performances in the next Olympics will be better than those in the last Olympics. On average, the performances have gotten better, Olympics to Olympics, for a variety of reasons. What Ehrlich and others says is that they don’t want to bet on athletic performances, they want to bet on the conditions of the track, or the weather, or the officials, or any other such indirect measure.”

Pascal’s Wager

Purpose: attribute risk to outcomes and choose the best path. “It benefits me.”

Saying this is about personal benefit may be a bit odd since the wager weighs outcomes given the existence or absence of God.

There are only four outcomes of this wager:

  • God exists, people believe in God and receive infinite reward,
  • God exists, people do not believe in God and miss the infinite reward (or receive infinite punishment),
  • God does not exist, people believe in God and mildly inconvenience themselves,
  • God does not exist, people do not believe in God and have a finite amount of personal benefit.

Given that the outcomes include infinite reward or punishment and only minor costs, it would therefore be logical to believe in God.
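The logic of the four outcomes can be sketched as an expected-payoff calculation. This is a minimal sketch under stated assumptions: the finite payoffs (-1.0 and 1.0) are arbitrary placeholders, the "infinite punishment" variant is used for disbelief, and the names `PAYOFFS` and `expected_payoff` are hypothetical.

```python
import math

# Payoffs for the four outcomes. Finite magnitudes are placeholder assumptions.
PAYOFFS = {
    ("believe", "god_exists"): math.inf,       # infinite reward
    ("believe", "no_god"): -1.0,               # mild inconvenience
    ("disbelieve", "god_exists"): -math.inf,   # infinite punishment variant
    ("disbelieve", "no_god"): 1.0,             # finite personal benefit
}

def expected_payoff(choice, p_god):
    """Expected payoff of a choice, given probability p_god that God exists."""
    if not 0.0 < p_god < 1.0:
        raise ValueError("p_god must be strictly between 0 and 1")
    return (p_god * PAYOFFS[(choice, "god_exists")]
            + (1.0 - p_god) * PAYOFFS[(choice, "no_god")])

for p in (0.5, 0.01, 1e-9):
    print(p, expected_payoff("believe", p), expected_payoff("disbelieve", p))
```

For any nonzero probability, however tiny, the infinite term swamps the finite costs, so the result does not depend on the placeholder values chosen.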

Pascal’s Wager has been compared to the Precautionary Principle:

“When an activity raises threats of harm to human health or the environment, precautionary measures should be taken even if some cause and effect relationships are not fully established scientifically.”

This principle states that we should work to avoid very bad outcomes even when their chance of occurring is tiny and cause and effect is not known.

Useful Wagers

What makes for a useful wager? There are a few elements.

  • The wager’s outcomes can’t be easily gamed. That is, those taking sides in the Simon-Ehrlich commodities wager can’t drive prices up or down.
  • Who wins is not debatable. This makes wagering on some issues problematic. A wager that is based on things becoming “better,” without a clear definition of how better is measured, isn’t useful.
  • Those making the wagers must ride them to the end. One can’t make a wager and then remove oneself from it. Otherwise, people could pick and choose which wagers they commit to.
  • The wager helps us learn something new. We come away with a different understanding of the world after noting the wager’s results. Or, we create new knowledge needed to figure out who won the wager.

So go make wagers when it helps you train your view of the world.

Consider

  • Even Loeb admits that he might not have promoted the alien idea if he didn’t have tenure plus other academic positions. But as he admits, “what’s the worst thing that can happen to me? I’ll be relieved of my administrative duties? This will bring the benefit that I’ll have more time for science.”
  • Ehrlich, in spite of being wrong in his book and wager, is better known than Simon.
  • Pascal’s wager does not seek to prove God’s existence, but rather to bring rationality to belief.

Self-Driving Safety and Systems

Summary: Who wouldn’t want to improve the transportation status quo? But we’re looking at self-driving car safety in the wrong way. Self-driving cars will also lead to an increase in systemic risk, offsetting some gains in safety. Over the next decade or so, there will be more serious discussions on autonomous vehicle implementations. Based on the way self-driving car companies have framed early public discussions, I worry that people will look at risk in unhelpful ways.

A recent paper titled “Self-Driving Vehicles Against Human Drivers: Equal Safety Is Far From Enough” measures public perception in Korea and China. Since I’ve written about self-driving cars or autonomous vehicles (AVs) a few times I wanted to comment on it and ways to look at risk in a new system.

The paper outlines studies estimating how much safer AVs need to be for the public to accept them. The authors estimate that AVs need to be perceived as 4 to 5 times safer to match the trust and comfort people have with human-driven vehicles.

I’m going to go through a few parts of the paper and tell you why I think the findings aren’t relevant to the AV discussion (though they are interesting).

Proposition 22 Paradox

Apart from the US presidential election, another well-funded campaign from 2020 was California ballot measure Proposition 22. If you live outside of California, you may have never heard of Prop 22, even though it could come to impact you.

Prop 22 was a California referendum that dealt with a question specifically addressed to rideshare and app delivery companies. Namely, should rideshare drivers legally be considered contractors or employees?

Now that attention to the presidential election and inauguration has passed, let’s go back to look at Prop 22, its Yes vote, and how its implementation led to a system change.

The Fallout

Less than a month after Prop 22 came into effect, related companies took the following actions:

This is just the beginning, but each of the above outcomes has been met with some amount of shock and outrage. Why now?

The Ballot

What Proposition 22 actually said on the voter guide:

“Prop 22: Exempts App-based Transportation And Delivery Companies From Providing Employee Benefits To Certain Drivers. Initiative Statute.

“Summary: Classifies app-based drivers as “independent contractors,” instead of “employees,” and provides independent-contractor drivers other compensation, unless certain criteria are met. Fiscal Impact: Minor increase in state income taxes paid by rideshare and delivery company drivers and investors.”

As with many ballot propositions, it took lengthy explanations to show what voters were actually choosing. Rather than list all the arguments and rebuttals here, see the lengthy voter guide for the full details voters received.

Set aside the reality that few voters may actually read propositions carefully. Also set aside the issue that common knowledge of Prop 22 probably came more from ad campaigns than from the language in the voter information guide. To me, what stood out most was what the guide left out. How would rideshare and related systems change if Prop 22 received a Yes or No vote?

The Campaign

Prop 22 put the business models of rideshare and delivery companies at risk and also threatened to change how gig workers are treated. For either side, the risk lay in how much its position would change. What followed were ad campaigns in support of either side.

But a ballot initiative doesn’t require that both arguments be heard equally or presented equally well. Rideshare and delivery companies (Uber, Lyft, Postmates, DoorDash, and Instacart) spent approximately $200 million promoting a Yes vote. The opposition spent only $20 million.

How should we judge spending on Prop 22? To compare, in the last couple decades, we’ve seen dramatic increases in fundraising for presidential campaigns. In 2000, Bush and Gore together raised around $450 million. Another contentious presidential campaign in 2004 saw the campaigns raise $1 billion for the first time. For the record-breaking 2020 campaign, Biden and Trump collectively raised $1.7 billion (counting all the other candidates would double that amount).

Of all the 2020 presidential campaign money, $288 million came from donors in California. That gives us some perspective.

So perhaps the $220 million spent on Prop 22 in California makes this single issue similarly important to a presidential campaign. The financial support is different though. Companies backing the proposition can model their own benefit to the point that they know what a win is worth. That’s harder to do with a presidential campaign.
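To make the comparison concrete, here is a minimal sketch of the arithmetic using only the approximate figures quoted above. The variable names are mine, and the rounded dollar amounts are the article's estimates rather than audited totals.

```python
# Approximate figures quoted in the text (US dollars).
YES_SPEND = 200_000_000          # app companies promoting a Yes vote
NO_SPEND = 20_000_000            # opposition spending
CA_PRESIDENTIAL = 288_000_000    # 2020 presidential donations from California donors

total_prop22 = YES_SPEND + NO_SPEND
yes_to_no_ratio = YES_SPEND / NO_SPEND
share_of_ca_presidential = total_prop22 / CA_PRESIDENTIAL

print(f"Total Prop 22 spending: ${total_prop22:,}")
print(f"Yes-to-No spending ratio: {yes_to_no_ratio:.0f} to 1")
print(f"Prop 22 total vs. California presidential donations: {share_of_ca_presidential:.0%}")
```

On these numbers, a single ballot measure drew roughly three quarters of what California donors gave to the entire presidential race, with one side outspending the other ten to one.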

App companies also had distribution on their side. That is, they already had a direct line to customers and gig workers and could push messages like this.

Uber later updated the popup as follows.

Is it possible that 72% of drivers supported Prop 22? With that majority, shouldn’t voters follow the drivers?

This is where you might use the phrase “lies, damned lies, and statistics.” Uber drivers have different needs. Few regularly work over 15 hours a week just for Uber (the minimum needed to qualify for the Prop 22 benefits) while many work fewer hours and also split their time between other gig companies. Those in the second category would likely lose their ability to drive for Uber in the event of a No vote that classifies app workers as employees. That could explain why so many Uber drivers and delivery people supported the proposition.

Proposition 22 passed with 58% of the vote.

Comparisons

We need to back up a moment to an earlier bill. Prop 22 was itself a follow-up to Assembly Bill 5 (AB 5), a 2019 bill which provided a three-part test for whether someone is an employee or a contractor.

AB 5’s three requirements (the ABC test) are as follows:

  1. “The person is free from the control and direction of the hiring entity in connection with the performance of the work, both under the contract for the performance of the work and in fact.
  2. “The person performs work that is outside the usual course of the hiring entity’s business.
  3. “The person is customarily engaged in an independently established trade, occupation, or business of the same nature as that involved in the work performed.”

AB 5 also granted numerous business-type exemptions, but not to app companies.

Exempted businesses were notably not individually well-funded or from Silicon Valley, though they may have held political sway with California legislators. These AB 5 business exceptions include doctors, dentists, lawyers, architects, engineers, accountants, commercial fishermen, designers, artists, barbers, and more.

That political sway was most tellingly seen in business awards a legislator bragged about for writing exemptions into AB 5.

AB 5 threatened the gig economy business model where drivers, delivery people, and other workers did not receive employee benefits. That threat was bound to receive a response from the companies that had the most at risk.

But how was Prop 22 described? From the Los Angeles Times:

“The ballot measure would require the companies to provide an hourly wage for time spent driving equal to 120% of either a local or statewide minimum wage. It would not pay drivers for the time they spend waiting for an assignment. It also requires that drivers receive a stipend for purchasing health insurance coverage when driving time averages at least 15 hours a week, a stipend that grows larger if average driving time rises to 25 hours a week.”

This is an example of where it’s difficult to assess outcomes from a quick read.

15 hours might sound like a low bar, but this is active driving time. Including driver wait time between fares could potentially double that amount. Also, the active driving time is tracked per company. That is, 10 hours driving for Uber and 5 hours driving for Lyft do not qualify as 15 hours. Just meeting that 15-hour minimum will prove difficult for many drivers.
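The per-company rule is easy to misread, so here is a minimal sketch of how it plays out. The function names are hypothetical, and the one-to-one wait ratio is only an assumption standing in for the "potentially double" estimate above, not a figure from the measure itself.

```python
from typing import Dict, List

# Weekly active-driving threshold for the health stipend, per the LA Times summary.
STIPEND_THRESHOLD_HOURS = 15.0


def qualifying_companies(active_hours: Dict[str, float],
                         threshold: float = STIPEND_THRESHOLD_HOURS) -> List[str]:
    """Return the companies where the driver meets the threshold on that company alone."""
    return sorted(name for name, hours in active_hours.items() if hours >= threshold)


def estimated_total_hours(active_hours: float, wait_ratio: float = 1.0) -> float:
    """Estimate hours on the road, including waiting between fares.

    wait_ratio is an assumption: 1.0 means one hour of waiting per active hour.
    """
    return active_hours * (1.0 + wait_ratio)


split_week = {"Uber": 10.0, "Lyft": 5.0}   # 15 active hours in total, split across apps
single_app_week = {"Uber": 15.0}           # 15 active hours on one app

print(qualifying_companies(split_week))       # no company reaches 15 on its own
print(qualifying_companies(single_app_week))  # Uber alone reaches 15
print(estimated_total_hours(15.0))            # hours on the road under the assumed ratio
```

Under these assumptions, a driver who splits 15 active hours across two apps qualifies with neither, while a driver who qualifies on one app may be on the road for around 30 hours.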

Further, an earlier analysis of Prop 22 by the UC Berkeley Labor Center reassessed the 120% of minimum wage benchmark ($15.60) and claimed that drivers with more than 15 active hours per week would actually earn only $5.64 per hour.

Why would drivers choose to work if their pay declines? There could be a number of reasons, including that they prefer driving to other work or can’t find other work. A distinction of gig economy work is that you typically don’t need to commit to a fixed schedule. In the case of driving for rideshare and delivery businesses, gig workers can work just about any hours they want. That flexibility may make up for giving up the chance to earn more doing something else.

Rideshare companies are famous for basing their distribution model on moving into new municipalities in advance of any legal right to do so. Uber entered the New York City market without a license to operate as a transportation company. As a result, taxi drivers felt both their fares and medallion values fall and riders benefitted from lower fares and more supply.

A No vote on Prop 22 was supposed to challenge not the distribution part of rideshare businesses, but their cost structure.

But after the Yes vote, some fees have also been passed on to customers. For Uber and Lyft, the fees depend on location and range from $0.30 to $1.50 per ride. Postmates now charges a driver benefits fee of between $0.50 and $2.50, depending on location.

Prop 22 Paradox

Why did we see public shock and outrage in January when affected companies fired workers, changed policy, and added fees? Surely Proposition 22 voters would have expected some type of negative outcome, however they happened to vote.

It’s probably more likely that many Prop 22 voters didn’t really think about negative outcomes, either because their chief concern was what positive outcomes they would see, because they didn’t believe themselves negatively affected by potential changes, or because they just didn’t have the habit of thinking about trade-offs.

A look at Prop 22 outcomes.

Yes (drivers classified as contractors). Rideshare companies start to provide some benefits. Drivers continue to set their own hours. Price increases passed on to customers.

Other potential outcomes: Slower driving? Drivers have an incentive to reach 15 hours of active driving time. That’s the point at which benefits kick in. If the benefits are worth more than what drivers gain from driving fewer hours, then drivers may slow down or at least not speed.

No (drivers classified as employees). Drivers receive pay increases. Drivers have to work fixed hours. Driver supply falls. Fares rise. Ride demand falls. Rideshare companies have difficulty operating in California. Price increases passed on to customers.

In the case of a No vote, affected companies estimated needed price increases of 25% to over 100%. Outside researchers estimated 5% to 10%.

Alison Stein, an economist at Uber, made this series of public projections for price increases, trips lost, and work opportunities lost.

By Alison Stein, Economist at Uber (link to larger version above)

Given Stein’s role we perhaps need to take the projections as one perspective. Notably, price increases are most extreme in the less populated parts of California (more inactive driver time between rides). But these price increases also stand out because Uber had subsidized rides to become a less expensive option than legacy taxis.

Presenting the ways a system change could lead to other outcomes is not unattainable, especially for something like Proposition 22, which is relatively straightforward. I say relatively because, unlike, say, electing an individual to political office, who might say one thing and do something entirely different, Prop 22 was more limited in scope.

It’s also not to say that putting out that systems map might produce one single correct answer. People can still weigh different outcomes differently. I just don’t like shock and outrage after a legal change goes into effect and companies act. The shock and outrage should have happened with the outcome of the vote itself.

The Thunderbolt on Its Trial

A couple years ago I wrote a piece on the WWI Armistice, titled Under a Spell.

The title came from a line in WWI journalist Philip Gibbs’ book Now It Can Be Told. Throughout, Gibbs quotes several people using that phrase — being under a spell — to describe how they went from normal life to the terror of the first modern war.

Because WWI was so different from previous wars, Gibbs’ book helped me think about the time of change we are in now. Not only the sudden change in the past weeks since the Capitol riot but also the buildup over the past few years and more. WWI itself was a break with the past and not just a larger version of earlier wars. Recent protests, riots, a pandemic, and more are different than earlier chaos.

We are under a spell now too.

But once cast, how do you break a spell?

A few scenes from Gibbs’ book stay with me. One takes place in a cafe in Cologne after the war’s end. English soldiers are having tea, served by a German waiter:

“I overheard a conversation between a young waiter and three of our cavalry officers. They had been in the same fight in the village of Noyelles, near Cambrai, a tiny place of ruin, where they had crouched under machine-gun fire. The waiter drew a diagram on the table-cloth. ‘I was just there.’ The three cavalry officers laughed. ‘Extraordinary! We were a few yards away.’ They chatted with the waiter as though he were an old acquaintance who had played against them in a famous football-match. They did not try to kill him with a table-knife. He did not put poison in the soup.”

Distant as that scene with its long-dead actors now seems, it is hopeful to know that it happened.

But where would that cafe be today? Physical cafes stopped being such places for us even before COVID shut them down. We give so much weight to user-generated content on social media and corporate-generated content on the news even though they often show our worst. Are those really our cafes? And what end is there to a war — if you can call it a war of beliefs or values or feelings — with opposing sides that happen to be in the same country, clumsily colored blue and red? And which are more isolated in the information they consume than where they live?

In the case of WWI, any good will on the part of the soldiers during the Armistice (a truce, but not technically “peace”) did not always transfer to the victors at home. As Gibbs also relayed:

“German music was banned in English drawing-rooms. Preachers and professors denied any quality of virtue or genius to German poets, philosophers, scientists, or scholars. A critical weighing of evidence was regarded as pro-Germanism and lack of patriotism. Truth was delivered bound to passion.”

This is the post-war scene I like least. However, the earlier hopeful exchange in the cafe was between soldiers who never wanted to be there. They both resented their politicians for sending them off. While they had killed each other, they hadn’t hated each other. But English soldiers returning home found different attitudes in their drawing-rooms, classrooms, and churches. Ironically, those with the least on the line felt the worst.

In our case, the US has become a country where family members repeat the wrong talking points and then start to hate each other. Where neighbors refuse to talk to each other because of front yard signs.

When WWI’s cannons cooled, Allied politicians heated up at Versailles in a much-criticized peace treaty. The pandemic of that time, the 1918 flu, weakened some of those with milder views on how to pursue the peace, including the US president, who was too sick to continue to protest.

Old outcomes, whether intentional or not, can have long-term effects. People continued to fight for generations because of decisions made at Versailles. In the US, even today you may see that some still carry the flag from the losing side of a Civil War over 150 years ago.

A perfect conclusion to a war would leave no one ever wanting to carry an old flag and not because it was banned or because of what the neighbors would think. But I worry that many will still want to carry old flags.

To think about this mess we’re in, we can ask what was intentional, unintentional, and inevitable and what we could do now.

Intentional, Unintentional, Inevitable

Where does a mentally divided country go? When basic values shift enough so that most enemies are domestic, what then? Is there an Armistice?

Mobs in any of the protests or riots over the past year and more seem to be unintentional and inevitable. While large groups showing outrage might seem to emerge suddenly, they often have long windups. But large-scale top-down action seems to shock almost everyone with the potential for unintended consequences. Even many who hated Trump’s words leading up to the Capitol riot felt uncomfortable when companies such as Twitter, Facebook, Shopify, YouTube, and more deplatformed him.

On the intentional side, we have the desire to show outrage about whatever we are passionate about. It’s unintentional to lose control and go too far, but it always happens when a group grows. Enough people at the extremes start to dominate the group. Even “normal” people get swallowed up in the passion of the crowd. There are many examples of this over the past year. And it’s inevitable that this cycle happens eventually, given the supporting beliefs and feedback loops.

Tech and media business models that depend on engagement can also make downward spirals like the one we’re in inevitable.

When two sides separate and hold bad feelings against each other, there’s always a “what about.”

The construction is: “you may blame us for this thing, but what about that thing you did?” There is no end to it. People value different behavior differently. A rating of which is the better side only lasts so long until it flips. Keeping track takes brainpower better used elsewhere.

I was reminded of something else from history, a century before WWI. This account is from Victor Hugo’s Les Miserables. It’s a work of fiction, but one that captures cultural change and redemption like few others.

This is from the conversation between a dying member of the National Convention (the government after the French Revolution) and a bishop. The French Revolution is more apt than the WWI comparison because rather than a foreign enemy we’re in a time when many Americans reserve their most bitter hatred for other Americans.

Conventionary: “The work was incomplete, I admit: we demolished the ancient regime in deeds; we were not able to suppress it entirely in ideas. To destroy abuses is not sufficient; customs must be modified. The mill is there no longer; the wind is still there.”

Bishop: “You have demolished. It may be of use to demolish, but I distrust a demolition complicated with wrath.”

Conventionary: “Right has its wrath, Bishop; and the wrath of right is an element of progress. In any case, and in spite of whatever may be said, the French Revolution is the most important step of the human race since the advent of Christ. Incomplete, it may be, but sublime. It set free all the unknown social quantities; it softened spirits, it calmed, appeased, enlightened; it caused the waves of civilization to flow over the earth. It was a good thing. The French Revolution is the consecration of humanity.”

Bishop: “Yes? ’93!” [1793 was the worst year of the Reign of Terror]

Conventionary: “Ah, there you go; ’93! I was expecting that word. A cloud had been forming for the space of fifteen hundred years; at the end of fifteen hundred years it burst. You are putting the thunderbolt on its trial.”

Onward

Only about 30 years come between the Reagan administration and the Trump administration. Yet the range of political tools grew dramatically in that time. A former good will between members of Congress dissolved. Also notably, the Reagan administration was the last one where Americans had a clear, multi-generation external enemy.

Politics became a sports season that never ends. We spend lots of effort talking and thinking about our feelings of injustice coming from the other side, whatever that side is. Not as much effort building things together.

Now, after a contested election, four years of ill will, another contested election, and on the eve of an inauguration, where could we go? Should we let another cloud form and burst again?

Rather than waiting for the next storm, where could we focus on outcomes over intentions? What could we build? Of course unintended consequences can come from this work too, but I believe building something together is better than pulling ourselves apart.

Here is just a short list. I know you could add to it.

Education that enlightens students and doesn’t leave them with debt. Preventative healthcare that delivers better outcomes than better-compensated late-term procedures. Production of food that keeps people healthy. Affordable, vibrant neighborhoods rather than dull commuter towns. Protection of environmental resources. Creation of new businesses that serve a changing demographic. A better transportation system. Accurate, unbiased information in news and online….

Building the next society is difficult, but possibly more productive than arguing about the past one. Otherwise, what can we say about potential second-order effects of this time of ours?

Gibbs also wrote: “Or is war the law of human life? Is there something more powerful than kaisers and castes which drives masses of men against other masses in death-struggles which they do not understand?” He had no idea how his profession would change.

It’s a Start

2020 was a year of many things. Healthcare politicization, data politicization, writers duped by fake news outlets, news outlets duped by fake writers, economic downs and ups, hacks, protests, an emerging disease people thought was just the flu, and much more.

People suffered from bad systems and had difficulty creating better ones. Often there was no thought to systems at all. But 2020 was representative of a normal year in a world more exposed to unintended consequences. Or you could say 2020 shifted the expectation of what normal could be.

Many recent actions will reach far into the future. I’ll explore some of them in coming articles, which will include more on COVID, the election, and social media censorship, as well as topics like self-driving cars, ecosystem interference, and education.

I take my readers as thoughtful, wanting deeper insights, and appreciating a different perspective than they see elsewhere.

That means that I try to leave readers more knowledgeable on important topics that should be more studied (but often aren’t).

That also means that I’m going beyond writing to do monthly online talks (next one January 13th).

Information Control (Four Types)

Societies have long valued information control but methods of control have changed over time. What systems drive these changes? And where do undesirable outcomes occur?

Using any of these methods doesn’t imply bad intentions. In some cases, there are good reasons for wanting to control information. But if we’re thinking only of intentions, we’ll lose our focus on outcomes. Good intentions can lead us down bad pathways.

Four major types of information control are destruction, banning, debauching, and blocking.

These methods are applied to recorded information as well as what we carry in our memories and pass down verbally.

And why do I care about this?
