# Publications

## Working Papers

1. Driver Surge Pricing [arxiv/ssrn]  [code & data]  [abstract]
Major Revision, Management Science
(Supersedes EC version.)

Ride-hailing marketplaces like Uber and Lyft use dynamic pricing, often called surge, to balance the supply of available drivers with the demand for rides. We study pricing mechanisms for such marketplaces from the perspective of drivers, presenting the theoretical foundation that has informed the design of Uber’s new additive driver surge mechanism. We present a dynamic stochastic model to capture the impact of surge pricing on driver earnings and their strategies to maximize such earnings. In this setting, some time periods (surge) are more valuable than others (non-surge), and so trips of different time lengths vary in the opportunity cost they impose on drivers. First, we show that multiplicative surge, historically the standard on ride-hailing platforms, is not incentive compatible in a dynamic setting. We then propose a structured, incentive-compatible pricing mechanism. This mechanism has a simple closed form and is well-approximated by Uber’s new additive surge mechanism. Finally, through both numerical analysis and real data from a ride-hailing marketplace, we show that additive surge is more approximately incentive compatible in practice than multiplicative surge, providing more stable earnings to drivers.
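The contrast between the two surge forms can be illustrated with a toy calculation (all functions and numbers below are hypothetical illustrations, not the paper's model): multiplicative surge scales the whole fare, so the earnings gap between short and long trips grows during surge, while an additive per-trip bonus leaves that gap unchanged.

```python
def multiplicative_surge_pay(base_fare_per_min, trip_minutes, multiplier):
    """Multiplicative surge scales the entire fare by the surge multiplier."""
    return multiplier * base_fare_per_min * trip_minutes

def additive_surge_pay(base_fare_per_min, trip_minutes, surge_bonus):
    """Additive surge pays a fixed per-trip bonus on top of the base fare,
    decoupling the surge payment from trip length."""
    return base_fare_per_min * trip_minutes + surge_bonus

# Hypothetical numbers for illustration only.
base = 0.50            # dollars per minute
short, long_ = 5, 30   # trip lengths in minutes

# Earnings gap between a long and a short trip under each surge form.
mult_gap = (multiplicative_surge_pay(base, long_, 2.0)
            - multiplicative_surge_pay(base, short, 2.0))
add_gap = (additive_surge_pay(base, long_, 5.0)
           - additive_surge_pay(base, short, 5.0))
```

Under multiplicative surge the gap is amplified by the multiplier; under additive surge it equals the base-fare gap, which is one intuition for why additive surge imposes a more uniform opportunity cost across trip lengths.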

## Journal Articles

1.  2018  Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes [link]  [code & data]  [abstract]  [talk video]
Nikhil Garg, Londa Schiebinger, Dan Jurafsky, and James Zou
Proceedings of the National Academy of Sciences (PNAS)
Media: Stanford News (and EE department), Science Magazine, Smithsonian Magazine (in print), The World Economic Forum, Futurity, etc.

Word embeddings are a powerful machine-learning framework that represents each English word by a vector. The geometric relationship between these vectors captures meaningful semantic relationships between the corresponding words. In this paper, we develop a framework to demonstrate how the temporal dynamics of the embedding helps to quantify changes in stereotypes and attitudes toward women and ethnic minorities in the 20th and 21st centuries in the United States. We integrate word embeddings trained on 100 y of text data with the US Census to show that changes in the embedding track closely with demographic and occupation shifts over time. The embedding captures societal shifts - e.g., the women’s movement in the 1960s and Asian immigration into the United States - and also illuminates how specific adjectives and occupations became more closely associated with certain populations over time. Our framework for temporal analysis of word embedding opens up a fruitful intersection between machine learning and quantitative social science.
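A minimal sketch of the kind of embedding-bias measurement described above, using made-up 2-d vectors in place of trained embeddings; the metric shown (difference in average distance to two groups of gendered words) and all vector values are illustrative only.

```python
import numpy as np

def relative_distance_bias(target_vec, group1_vecs, group2_vecs):
    """Average distance from the target word to group-1 words minus the
    average distance to group-2 words; negative values mean the target
    sits closer to group 1 in the embedding space."""
    d1 = np.mean([np.linalg.norm(target_vec - g) for g in group1_vecs])
    d2 = np.mean([np.linalg.norm(target_vec - g) for g in group2_vecs])
    return d1 - d2

# Toy 2-d stand-ins for real word vectors (hypothetical values).
she, her = np.array([1.0, 0.2]), np.array([0.9, 0.1])
he, him = np.array([-1.0, 0.1]), np.array([-0.9, 0.2])
nurse = np.array([0.8, 0.3])

# Negative: in this toy geometry, "nurse" lies closer to the female words.
bias = relative_distance_bias(nurse, [she, her], [he, him])
```

Tracking such a statistic in embeddings trained on different decades of text is the kind of temporal comparison the framework performs.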

2.  2019  Iterative Local Voting for Collective Decision-making in Continuous Spaces [link]  [abstract]  [demo]
Nikhil Garg, Vijay Kamble, Ashish Goel, David Marn, and Kamesh Munagala
Journal of Artificial Intelligence Research (JAIR)
(Supersedes WWW version.)

Many societal decision problems lie in high-dimensional continuous spaces not amenable to the voting techniques common for their discrete or single-dimensional counterparts. These problems are typically discretized before running an election or decided upon through negotiation by representatives. We propose an algorithm called Iterative Local Voting for collective decision-making in this setting. In this algorithm, voters are sequentially sampled and asked to modify a candidate solution within some local neighborhood of its current value, as defined by a ball in some chosen norm, with the size of the ball shrinking at a specified rate. We first prove the convergence of this algorithm under appropriate choices of neighborhoods to Pareto optimal solutions with desirable fairness properties in certain natural settings: when the voters’ utilities can be expressed in terms of some form of distance from their ideal solution, and when these utilities are additively decomposable across dimensions. In many of these cases, we obtain convergence to the societal welfare maximizing solution. We then describe an experiment in which we test our algorithm for the decision of the U.S. Federal Budget on Mechanical Turk with over 2,000 workers, employing neighborhoods defined by various L-Norm balls. We make several observations that inform future implementations of such a procedure.
We have a demo of our Mechanical Turk experiment available live here. It can be used as follows:
1. If the URL is entered without any parameters, it uses the current radius (based on previous uses of the demo, going down by $1/N$) and uses the $\mathcal{L}^2$ mechanism.
2. To set the mechanism, navigate to http://54.183.140.235/mechanism/[option]/, where [option] is one of l1, l2, linf, or full, for the respective mechanisms.
3. To set the radius, navigate to http://54.183.140.235/radius/[number]/, where any integer can be entered instead of [number]. This option resets the starting radius for the specific mechanism, which will go down by $1/N$ in subsequent accesses.
4. To set both the mechanism and the radius, navigate to http://54.183.140.235/radius/[number]/mechanism/[option]/, with the above options.
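The core loop described in the abstract can be sketched as follows: an illustrative implementation under an $\mathcal{L}^2$ neighborhood with a $1/t$ radius schedule, in which each sampled voter moves the candidate solution to the point of the ball closest to her ideal point. The function, variable names, and the example voter profile are hypothetical stand-ins, not the experimental setup.

```python
import numpy as np

def iterative_local_voting(ideal_points, r0, rounds, rng):
    """Sketch of Iterative Local Voting with an L2 ball of radius r0 / t.

    Assumes each voter, when sampled, moves the candidate solution as far
    toward her ideal point as the current neighborhood allows."""
    x = np.zeros_like(ideal_points[0])
    for t in range(1, rounds + 1):
        voter = ideal_points[rng.integers(len(ideal_points))]
        step = voter - x
        dist = np.linalg.norm(step)
        radius = r0 / t
        if dist > radius:
            # Move along the straight line toward the ideal point,
            # stopping at the boundary of the shrinking ball.
            x = x + step * (radius / dist)
        else:
            # The ideal point itself lies inside the neighborhood.
            x = voter
    return x

rng = np.random.default_rng(0)
ideals = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
x_final = iterative_local_voting(ideals, r0=1.0, rounds=2000, rng=rng)
```

With spherical (distance-based) utilities, each such move is a best response within the neighborhood, which is the setting in which the paper's convergence guarantees apply.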

3.  2020  Designing Informative Rating Systems: Evidence from an Online Labor Market [arxiv/ssrn]  [abstract]  [talk video]
Nikhil Garg and Ramesh Johari
Manufacturing & Service Operations Management, Accepted
Media: New York Times, Stanford Engineering magazine.
(Supersedes EC version.)

Platforms critically rely on rating systems to learn the quality of market participants. In practice, however, these ratings are often highly inflated, drastically reducing the signal available to distinguish quality. We consider two questions: First, can rating systems better discriminate quality by altering the meaning and relative importance of the levels in the rating system? And second, if so, how should the platform optimize these choices in the design of the rating system? We first analyze the results of a randomized controlled trial on an online labor market in which an additional question was added to the feedback form. Between treatment conditions, we vary the question phrasing and answer choices. We further run an experiment on Amazon Mechanical Turk with similar structure, to confirm the labor market findings. Our tests reveal that current inflationary norms can in fact be countered by re-anchoring the meaning of the levels of the rating system. In particular, scales that are positive-skewed and provide specific interpretations for what each label means yield rating distributions that are much more informative about quality. Second, we develop a theoretical framework to optimize the design of a rating system by choosing answer labels and their numeric interpretations in a manner that maximizes the rate of convergence to the true underlying quality distribution. Finally, we run simulations with an empirically calibrated model and use these to study the implications for optimal rating system design. Our simulations demonstrate that our modeling and optimization approach can substantially improve the quality of information obtained over baseline designs. Overall, our study illustrates that rating systems that are informative in practice can be designed, and demonstrates how to design them in a principled manner.

## Peer Reviewed Conference Proceedings

1.  2015  Impact of Dual Slope Path Loss on User Association in HetNets [link]  [pdf]  [abstract]
Nikhil Garg, Sarabjot Singh, and Jeffrey Andrews
IEEE Globecom Workshop

Intelligent load balancing is essential to fully realize the benefits of dense heterogeneous networks. Current techniques have largely been studied with single slope path loss models, though multi-slope models are known to more closely match real deployments. This paper develops insight into the performance of biasing and uplink/downlink decoupling for user association in HetNets with dual slope path loss models. It is shown that dual slope path loss models change the tradeoffs inherent in biasing and reduce gains from both biasing and uplink/downlink decoupling. The results show that with the dual slope path loss models, the bias maximizing the median rate is not optimal for other users, e.g., edge users. Furthermore, optimal downlink biasing is shown to realize most of the gains from downlink-uplink decoupling. Moreover, the user association gains in dense networks are observed to be quite sensitive to the path loss exponent beyond the critical distance in a dual slope model.

2.  2017  Collaborative Optimization for Collective Decision-making in Continuous Spaces [link]  [abstract]  [demo]
Nikhil Garg, Vijay Kamble, Ashish Goel, David Marn, and Kamesh Munagala
International Conference on World Wide Web (WWW ’17)
(Superseded by journal version.)

Many societal decision problems lie in high-dimensional continuous spaces not amenable to the voting techniques common for their discrete or single-dimensional counterparts. These problems are typically discretized before running an election or decided upon through negotiation by representatives. We propose a meta-algorithm called Iterative Local Voting for collective decision-making in this setting, in which voters are sequentially sampled and asked to modify a candidate solution within some local neighborhood of its current value, as defined by a ball in some chosen norm. In general, such schemes do not converge, or, when they do, the resulting solution does not have a natural description. We first prove the convergence of this algorithm under appropriate choices of neighborhoods to plausible solutions in certain natural settings: when the voters’ utilities can be expressed in terms of some form of distance from their ideal solution, and when these utilities are additively decomposable across dimensions. In many of these cases, we obtain convergence to the societal welfare maximizing solution. We then describe an experiment in which we test our algorithm for the decision of the U.S. Federal Budget on Mechanical Turk with over 4,000 workers, employing neighborhoods defined by L1, L2 and L-infty balls. We make several observations that inform future implementations of such a procedure.
We have a demo of our Mechanical Turk experiment available live here. It can be used as follows:
1. If the URL is entered without any parameters, it uses the current radius (based on previous uses of the demo, going down by $1/N$) and uses the $\mathcal{L}^2$ mechanism.
2. To set the mechanism, navigate to http://54.183.140.235/mechanism/[option]/, where [option] is one of l1, l2, linf, or full, for the respective mechanisms.
3. To set the radius, navigate to http://54.183.140.235/radius/[number]/, where any integer can be entered instead of [number]. This option resets the starting radius for the specific mechanism, which will go down by $1/N$ in subsequent accesses.
4. To set both the mechanism and the radius, navigate to http://54.183.140.235/radius/[number]/mechanism/[option]/, with the above options.

3.  2018  Markets for Public Decision-making [arxiv/ssrn]  [abstract]
Nikhil Garg, Ashish Goel, and Ben Plaut
Conference on Web and Internet Economics (WINE ’18)
(A journal version is in submission.)

A public decision-making problem consists of a set of issues, each with multiple possible alternatives, and a set of competing agents, each with a preferred alternative for each issue. We study adaptations of market economies to this setting, focusing on binary issues. Issues have prices, and each agent is endowed with artificial currency that she can use to purchase probability for her preferred alternatives (we allow randomized outcomes). We first show that when each issue has a single price that is common to all agents, market equilibria can be arbitrarily bad. This negative result motivates a different approach. We present a novel technique called "pairwise issue expansion", which transforms any public decision-making instance into an equivalent Fisher market, the simplest type of private goods market. This is done by expanding each issue into many goods: one for each pair of agents who disagree on that issue. We show that the equilibrium prices in the constructed Fisher market yield a "pairwise pricing equilibrium" in the original public decision-making problem which maximizes Nash welfare. More broadly, pairwise issue expansion uncovers a powerful connection between the public decision-making and private goods settings; this immediately yields several interesting results about public decisions markets, and furthers the hope that we will be able to find a simple iterative voting protocol that leads to near-optimum decisions.
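The construction in the abstract can be sketched as a direct enumeration: one Fisher-market good per issue per pair of agents who disagree on that issue. This is a toy sketch under an assumed preference encoding (agent names, 0/1 alternatives); solving the resulting Fisher market for equilibrium prices is omitted.

```python
from itertools import combinations

def pairwise_issue_expansion(preferences):
    """Expand a binary public decision-making instance into Fisher-market
    goods: for each issue, create one good per pair of disagreeing agents.

    `preferences[agent][issue]` is that agent's preferred alternative (0 or 1).
    Returns goods identified as (issue, agent_a, agent_b) tuples."""
    agents = list(preferences)
    n_issues = len(next(iter(preferences.values())))
    goods = []
    for issue in range(n_issues):
        for a, b in combinations(agents, 2):
            if preferences[a][issue] != preferences[b][issue]:
                goods.append((issue, a, b))
    return goods

# Hypothetical instance: three agents, two binary issues.
prefs = {"alice": [1, 0], "bob": [0, 0], "carol": [1, 1]}
goods = pairwise_issue_expansion(prefs)
```

Each disagreeing pair contributes one good, so issues with more disagreement expand into more goods; agents who agree on an issue generate no good for it.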

4.  2019  Designing Optimal Binary Rating Systems [link]  [arxiv/ssrn]  [abstract]
Nikhil Garg and Ramesh Johari
International Conference on Artificial Intelligence and Statistics (AISTATS ’19)

Modern online platforms rely on effective rating systems to learn about items. We consider the optimal design of rating systems that collect binary feedback after transactions. We make three contributions. First, we formalize the performance of a rating system as the speed with which it recovers the true underlying ranking on items (in a large deviations sense), accounting for both items’ underlying match rates and the platform’s preferences. Second, we provide an efficient algorithm to compute the binary feedback system that yields the highest such performance. Finally, we show how this theoretical perspective can be used to empirically design an implementable, approximately optimal rating system, and validate our approach using real-world experimental data collected on Amazon Mechanical Turk.

5.  2019  Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings [arxiv/ssrn]  [code & data]  [abstract]
Dorottya Demszky, Nikhil Garg, Rob Voigt, James Zou, Jesse Shapiro, Matthew Gentzkow, and Dan Jurafsky
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL ’19)
Media: Washington Post, Stanford News.

We provide an NLP framework to uncover four linguistic dimensions of political polarization in social media: topic choice, framing, affect and illocutionary force. We quantify these aspects with existing lexical methods, and propose clustering of tweet embeddings as a means to identify salient topics for analysis across events; human evaluations show that our approach generates more cohesive topics than traditional LDA-based models. We apply our methods to study 4.4M tweets on 21 mass shootings. We provide evidence that the discussion of these events is highly polarized politically and that this polarization is primarily driven by partisan differences in framing rather than topic choice. We identify framing devices, such as grounding and the contrasting use of the terms "terrorist" and "crazy", that contribute to polarization. Results pertaining to topic choice, affect and illocutionary force suggest that Republicans focus more on the shooter and event-specific facts (news) while Democrats focus more on the victims and call for policy changes. Our work contributes to a deeper understanding of the way group divisions manifest in language and to computational methods for studying them.

6.  2019  Who is in Your Top Three? Optimizing Learning in Elections with Many Candidates [arxiv/ssrn]  [abstract]
Nikhil Garg, Lodewijk Gelauff, Sukolsak Sakshuwong, and Ashish Goel
AAAI Conference on Human Computation and Crowdsourcing (HCOMP ’19)

Elections and opinion polls often have many candidates, with the aim to either rank the candidates or identify a small set of winners according to voters’ preferences. In practice, voters do not provide a full ranking; instead, each voter provides their favorite K candidates, potentially in ranked order. The election organizer must choose K and an aggregation rule. We provide a theoretical framework to make these choices. Each K-Approval or K-partial ranking mechanism (with a corresponding positional scoring rule) induces a learning rate for the speed at which the election correctly recovers the asymptotic outcome. Given the voter choice distribution, the election planner can thus identify the rate optimal mechanism. Earlier work in this area provides coarse order-of-magnitude guarantees which are not sufficient to make such choices. Our framework further resolves questions of when randomizing between multiple mechanisms may improve learning, for arbitrary voter noise models. Finally, we use data from 5 large participatory budgeting elections that we organized across several US cities, along with other ranking data, to demonstrate the utility of our methods. In particular, we find that historically such elections have set K too low and that picking the right mechanism can be the difference between identifying the correct winner with only an 80% probability or a 99.9% probability after 500 voters.

7.  2020  Fair Allocation through Selective Information Acquisition [arxiv/ssrn]  [abstract]
William Cai, Johann Gaebler, Nikhil Garg, and Sharad Goel
AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES ’20)

Public and private institutions must often allocate scarce resources under uncertainty. Banks, for example, extend credit to loan applicants based in part on their estimated likelihood of repaying a loan. But when the quality of information differs across candidates (e.g., if some applicants lack traditional credit histories), common lending strategies can lead to disparities across groups. Here we consider a setting in which decision makers, before allocating resources, can choose to spend some of their limited budget further screening select individuals. We present a computationally efficient algorithm for deciding whom to screen that maximizes a standard measure of social welfare. Intuitively, decision makers should screen candidates on the margin, for whom the additional information could plausibly alter the allocation. We formalize this idea by showing the problem can be reduced to solving a series of linear programs. Both on synthetic and real-world datasets, this strategy improves utility, illustrating the value of targeted information acquisition in such decisions. Further, when there is social value for distributing resources to groups for whom we have a priori poor information—like those without credit scores—our approach can substantially improve the allocation of limited assets.

8.  2020  Designing Informative Rating Systems: Evidence from an Online Labor Market [arxiv/ssrn]  [abstract]  [talk video]
Nikhil Garg and Ramesh Johari
ACM Conference on Economics and Computation (EC ’20)
Media: New York Times, Stanford Engineering magazine.
(Superseded by journal version.)

Platforms critically rely on rating systems to learn the quality of market participants. In practice, however, these ratings are often highly inflated, drastically reducing the signal available to distinguish quality. We consider two questions: First, can rating systems better discriminate quality by altering the meaning and relative importance of the levels in the rating system? And second, if so, how should the platform optimize these choices in the design of the rating system? We first analyze the results of a randomized controlled trial on an online labor market in which an additional question was added to the feedback form. Between treatment conditions, we vary the question phrasing and answer choices. We further run an experiment on Amazon Mechanical Turk with similar structure, to confirm the labor market findings. Our tests reveal that current inflationary norms can in fact be countered by re-anchoring the meaning of the levels of the rating system. In particular, scales that are positive-skewed and provide specific interpretations for what each label means yield rating distributions that are much more informative about quality. Second, we develop a theoretical framework to optimize the design of a rating system by choosing answer labels and their numeric interpretations in a manner that maximizes the rate of convergence to the true underlying quality distribution. Finally, we run simulations with an empirically calibrated model and use these to study the implications for optimal rating system design. Our simulations demonstrate that our modeling and optimization approach can substantially improve the quality of information obtained over baseline designs. Overall, our study illustrates that rating systems that are informative in practice can be designed, and demonstrates how to design them in a principled manner.

9.  2020  Driver Surge Pricing [arxiv/ssrn]  [code & data]  [abstract]
ACM Conference on Economics and Computation (EC ’20)
(Superseded by journal version.)

Ride-hailing marketplaces like Uber and Lyft use dynamic pricing, often called surge, to balance the supply of available drivers with the demand for rides. We study pricing mechanisms for such marketplaces from the perspective of drivers, presenting the theoretical foundation that has informed the design of Uber’s new additive driver surge mechanism. We present a dynamic stochastic model to capture the impact of surge pricing on driver earnings and their strategies to maximize such earnings. In this setting, some time periods (surge) are more valuable than others (non-surge), and so trips of different time lengths vary in the opportunity cost they impose on drivers. First, we show that multiplicative surge, historically the standard on ride-hailing platforms, is not incentive compatible in a dynamic setting. We then propose a structured, incentive-compatible pricing mechanism. This mechanism has a simple closed form and is well-approximated by Uber’s new additive surge mechanism. Finally, through both numerical analysis and real data from a ride-hailing marketplace, we show that additive surge is more approximately incentive compatible in practice than multiplicative surge, providing more stable earnings to drivers.

## Other (workshops and technical reports)

1.  2013  Multi-Modal, Multi-State, Real-Time Crew State Monitoring System [pdf]  [abstract]
Kier Fortier, Nikhil Garg, and Elizabeth Pickering
NASA Glenn Research Center Research Report

I helped develop a real-time, multi-modal crew state monitoring system that integrates EEG, GSR, and HRV. I was in charge of overall design, EEG processing, machine learning, and multi-modal integration. I developed robust artifact rejection to detect and remove blinks and other artifacts from the EEG data; EEG feature extraction to represent blocks of data with frequency characteristics, statistical measures, and blink rate; and a machine learning classification system (using support vector machines) that uses these features to characterize data from a block of time as originating from either a state of rest or a state of concentration. I then integrated EEG and GSR features for joint classification, and we demoed an end-to-end system that collected data from multiple sensors, extracted features, and trained and used the classifier to predict subject state. The system successfully classified 80 percent of subject states.

2.  2015  Use of Electroencephalography and Galvanic Skin Response in the Prediction of an Attentive Cognitive State [link]  [pdf]  [abstract]
Beth Lewandowski, Kier Fortier, Nikhil Garg, Victor Rielly, Jeff Mackey, Tristan Hearn, Angela Harrivel, and Bradford Fenton
Health and Human Performance Research Summit, Dayton, OH

As part of an effort aimed at improving aviation safety, the Crew State Monitoring Element of the NASA Vehicle Systems Safety Technologies Project is developing a monitoring system capable of detecting cognitive states that may be associated with unsafe piloting conditions. The long term goal is a real-time, integrated system that uses multiple physiological sensing modalities to detect multiple cognitive states with high accuracy, which can be used to help optimize human performance. Prior to realizing an integrated system, individual sensing modalities are being investigated, including the use of electroencephalographic (EEG) and galvanic skin response (GSR) signals, in the determination of an attentive or inattentive state. EEG and GSR data are collected during periods of rest and as subjects perform psychological tests including the psychomotor vigilance test, the Mackworth clock test and the attention network test. Subjects also perform tasks designed to simulate piloting tasks within the NASA multi-attribute task battery (MATB-II) program. The signals are filtered, the artifacts are rejected and the power spectral density (PSD) of the signals are found. Comparisons of the PSD between the rest and test blocks are made, along with the change in PSD over the time course of the blocks. Future work includes the collection of heart rate data and the investigation of heart rate variability as an additional measure to use in the prediction of attentive state, as well as the investigation of additional EEG signal processing methods such as source localization, multi-scale entropy and coherence measures. Preliminary results will be presented to highlight the methods used and to discuss our hypotheses. The challenges associated with realizing a real-time, accurate, multi-modal, cognitive state monitoring system are numerous.
A discussion of some of the challenges will be provided, including real-time artifact rejection methods, quantification of inter- and intra-subject variability, determination of what information within the signals provides the best measurement of attention and determination of how information from the different modalities can be integrated to improve the overall accuracy of the system.

Nikhil Garg
Federal Communications Commission comment

4.  2015  Fair Use and Innovation in Unlicensed Wireless Spectrum: LTE unlicensed and Wi-Fi in the 5 GHz unlicensed band [pdf]
Nikhil Garg
IEEE-USA Journal of Technology and Public Policy

5.  2016  Transfer Learning: The Impact of Test Set Word Vectors, with Applications to Political Tweets [pdf]  [abstract]

A major difficulty in applying deep learning in novel domains is the expense associated with acquiring sufficient training data. In this work, we extend the literature in deep transfer learning by studying the role of initializing the embedding matrix with word vectors from GloVe on a target dataset before training models with data from another domain. We study transfer learning on variants of four models (two RNNs, a CNN, and an LSTM) and three datasets. We conclude that 1) the simple idea of initializing word vectors significantly and robustly improves transfer learning performance, 2) cross-domain learning occurs in fewer iterations than in-domain learning, considerably reducing training time, and 3) blending various out-of-domain datasets before training improves transfer learning. We then apply our models to a dataset of over 400k tweets by politicians, classifying sentiment and subjectivity vs. objectivity. This dataset was provided unlabeled, motivating an unsupervised and transfer learning approach. With transfer learning, we achieve reasonable performance on sentiment classification, but fail in classifying subjectivity vs. objectivity.

6.  2018  Comparing Voting Methods for Budget Decisions on the ASSU Ballot [pdf]  [abstract]
Lodewijk Gelauff, Sukolsak Sakshuwong, Nikhil Garg, and Ashish Goel

During the 2018 Associated Students of Stanford University (ASSU; Stanford’s student body) election and annual grants process, the Stanford Crowdsourced Democracy Team (SCDT) ran a research ballot and survey to develop insights into voting behavior on the budget component of the ballot (annual grants) where multiple grant requests are considered. We provided voters with additional voting methods for the budget component, collected further insights through a survey and demonstrated the viability of the proposed workflow. Some of our findings are directly relevant to ASSU. Furthermore, the (appropriately anonymized) data gathered in this year’s research ballots is beneficial for research purposes. Overall, our platform and pipeline (PB Stanford) with post-validation of ballots functioned well on a large scale. In particular, the knapsack ballot mechanism shows promise in voter feedback.

7.  2019  Deliberative Democracy with the Online Deliberation Platform
James Fishkin, Nikhil Garg, Lodewijk Gelauff, Ashish Goel, Kamesh Munagala, Sukolsak Sakshuwong, Alice Siu, and Sravya Yandamuri
AAAI Conference on Human Computation and Crowdsourcing Demo Track

## Theses

Nikhil Garg
Bachelor's Thesis, University of Texas at Austin.

2.  2020  Designing Marketplaces and Civic Engagement Platforms: Learning, Incentives, and Pricing [link]  [pdf]  [abstract]  [talk video]
Nikhil Garg
PhD Dissertation, Stanford University.

Platforms increasingly mediate interactions between people: both helping us find work and transportation, and supporting our civic society through discussion and decision-making. Principled system design requires formalizing the platform’s objective and understanding the incentives, behavioral tendencies, and capabilities of participants; in turn, the design influences participant behavior. In this dissertation, I describe work designing platforms in two domains – two-sided marketplaces and civic engagement platforms – combining both theoretical and empirical analyses of such systems. First, I consider the design of surge pricing that is incentive compatible for drivers in ride-hailing platforms. Second, I tackle rating system inflation and design on online platforms. Finally, I study the design and deployment of systems for participatory budgeting. The work in this dissertation has informed deployments at Uber, a large online labor platform, and in participatory budgeting elections across the U.S.

## Online Marketplaces

1.  2019  Designing Optimal Binary Rating Systems [link]  [arxiv/ssrn]  [abstract]
Nikhil Garg and Ramesh Johari
International Conference on Artificial Intelligence and Statistics (AISTATS ’19)

Modern online platforms rely on effective rating systems to learn about items. We consider the optimal design of rating systems that collect binary feedback after transactions. We make three contributions. First, we formalize the performance of a rating system as the speed with which it recovers the true underlying ranking on items (in a large deviations sense), accounting for both items’ underlying match rates and the platform’s preferences. Second, we provide an efficient algorithm to compute the binary feedback system that yields the highest such performance. Finally, we show how this theoretical perspective can be used to empirically design an implementable, approximately optimal rating system, and validate our approach using real-world experimental data collected on Amazon Mechanical Turk.

2.  2020  Designing Informative Rating Systems: Evidence from an Online Labor Market [arxiv/ssrn]  [abstract]  [talk video]
Nikhil Garg and Ramesh Johari
ACM Conference on Economics and Computation (EC ’20)
Media: New York Times, Stanford Engineering magazine.
(Superseded by journal version.)

Platforms critically rely on rating systems to learn the quality of market participants. In practice, however, these ratings are often highly inflated, drastically reducing the signal available to distinguish quality. We consider two questions: First, can rating systems better discriminate quality by altering the meaning and relative importance of the levels in the rating system? And second, if so, how should the platform optimize these choices in the design of the rating system? We first analyze the results of a randomized controlled trial on an online labor market in which an additional question was added to the feedback form. Between treatment conditions, we vary the question phrasing and answer choices. We further run an experiment on Amazon Mechanical Turk with similar structure, to confirm the labor market findings. Our tests reveal that current inflationary norms can in fact be countered by re-anchoring the meaning of the levels of the rating system. In particular, scales that are positive-skewed and provide specific interpretations for what each label means yield rating distributions that are much more informative about quality. Second, we develop a theoretical framework to optimize the design of a rating system by choosing answer labels and their numeric interpretations in a manner that maximizes the rate of convergence to the true underlying quality distribution. Finally, we run simulations with an empirically calibrated model and use these to study the implications for optimal rating system design. Our simulations demonstrate that our modeling and optimization approach can substantially improve the quality of information obtained over baseline designs. Overall, our study illustrates that rating systems that are informative in practice can be designed, and demonstrates how to design them in a principled manner.

3.  2020  Driver Surge Pricing [arxiv/ssrn]  [code & data]  [abstract]
Nikhil Garg and Hamid Nazerzadeh
ACM Conference on Economics and Computation (EC ’20)
(Superseded by journal version.)

Ride-hailing marketplaces like Uber and Lyft use dynamic pricing, often called surge, to balance the supply of available drivers with the demand for rides. We study pricing mechanisms for such marketplaces from the perspective of drivers, presenting the theoretical foundation that has informed the design of Uber’s new additive driver surge mechanism. We present a dynamic stochastic model to capture the impact of surge pricing on driver earnings and their strategies to maximize such earnings. In this setting, some time periods (surge) are more valuable than others (non-surge), and so trips of different time lengths vary in the opportunity cost they impose on drivers. First, we show that multiplicative surge, historically the standard on ride-hailing platforms, is not incentive compatible in a dynamic setting. We then propose a structured, incentive-compatible pricing mechanism. This closed-form mechanism has a simple form and is well-approximated by Uber’s new additive surge mechanism. Finally, through both numerical analysis and real data from a ride-hailing marketplace, we show that additive surge is more approximately incentive compatible in practice than multiplicative surge, providing more stable earnings to drivers.

4.  2020  Designing Informative Rating Systems: Evidence from an Online Labor Market [arxiv/ssrn]  [abstract]  [talk video]
Nikhil Garg and Ramesh Johari
Manufacturing & Service Operations Management, Accepted
Media: New York Times, Stanford Engineering magazine.
(Supersedes EC version.)

Platforms critically rely on rating systems to learn the quality of market participants. In practice, however, these ratings are often highly inflated, drastically reducing the signal available to distinguish quality. We consider two questions: First, can rating systems better discriminate quality by altering the meaning and relative importance of the levels in the rating system? And second, if so, how should the platform optimize these choices in the design of the rating system? We first analyze the results of a randomized controlled trial on an online labor market in which an additional question was added to the feedback form. Between treatment conditions, we vary the question phrasing and answer choices. We further run an experiment on Amazon Mechanical Turk with similar structure, to confirm the labor market findings. Our tests reveal that current inflationary norms can in fact be countered by re-anchoring the meaning of the levels of the rating system. In particular, scales that are positive-skewed and provide specific interpretations for what each label means yield rating distributions that are much more informative about quality. Second, we develop a theoretical framework to optimize the design of a rating system by choosing answer labels and their numeric interpretations in a manner that maximizes the rate of convergence to the true underlying quality distribution. Finally, we run simulations with an empirically calibrated model and use these to study the implications for optimal rating system design. Our simulations demonstrate that our modeling and optimization approach can substantially improve the quality of information obtained over baseline designs. Overall, our study illustrates that rating systems that are informative in practice can be designed, and demonstrates how to design them in a principled manner.

5. Driver Surge Pricing [arxiv/ssrn]  [code & data]  [abstract]
Nikhil Garg and Hamid Nazerzadeh
Major Revision, Management Science
(Supersedes EC version.)

Ride-hailing marketplaces like Uber and Lyft use dynamic pricing, often called surge, to balance the supply of available drivers with the demand for rides. We study pricing mechanisms for such marketplaces from the perspective of drivers, presenting the theoretical foundation that has informed the design of Uber’s new additive driver surge mechanism. We present a dynamic stochastic model to capture the impact of surge pricing on driver earnings and their strategies to maximize such earnings. In this setting, some time periods (surge) are more valuable than others (non-surge), and so trips of different time lengths vary in the opportunity cost they impose on drivers. First, we show that multiplicative surge, historically the standard on ride-hailing platforms, is not incentive compatible in a dynamic setting. We then propose a structured, incentive-compatible pricing mechanism. This closed-form mechanism has a simple form and is well-approximated by Uber’s new additive surge mechanism. Finally, through both numerical analysis and real data from a ride-hailing marketplace, we show that additive surge is more approximately incentive compatible in practice than multiplicative surge, providing more stable earnings to drivers.

## Civic Engagement

1.  2017  Collaborative Optimization for Collective Decision-making in Continuous Spaces [link]  [abstract]  [demo]
Nikhil Garg, Vijay Kamble, Ashish Goel, David Marn, and Kamesh Munagala
International Conference on World Wide Web (WWW ’17)
(Superseded by journal version.)

Many societal decision problems lie in high-dimensional continuous spaces not amenable to the voting techniques common for their discrete or single-dimensional counterparts. These problems are typically discretized before running an election or decided upon through negotiation by representatives. We propose a meta-algorithm called Iterative Local Voting for collective decision-making in this setting, in which voters are sequentially sampled and asked to modify a candidate solution within some local neighborhood of its current value, as defined by a ball in some chosen norm. In general, such schemes do not converge, or, when they do, the resulting solution does not have a natural description. We first prove the convergence of this algorithm under appropriate choices of neighborhoods to plausible solutions in certain natural settings: when the voters’ utilities can be expressed in terms of some form of distance from their ideal solution, and when these utilities are additively decomposable across dimensions. In many of these cases, we obtain convergence to the societal welfare maximizing solution. We then describe an experiment in which we test our algorithm for the decision of the U.S. Federal Budget on Mechanical Turk with over 4,000 workers, employing neighborhoods defined by L1, L2, and L-infty balls. We make several observations that inform future implementations of such a procedure.
We have a demo of our Mechanical Turk experiment available live here. It can be used as follows:
1. If the URL is entered without any parameters, it uses the current radius (based on previous uses of the demo, going down by $1/N$) and uses the $\mathcal{L}^2$ mechanism.
2. To set the mechanism, navigate to http://54.183.140.235/mechanism/[option]/, where instead of [option] use one of l1, l2, linf, or full, for the respective mechanisms.
3. To set the radius, navigate to http://54.183.140.235/radius/[number]/, where any integer can be entered instead of [number]. This option resets the starting radius for the specific mechanism, which will go down by $1/N$ in subsequent accesses.
4. To set both the mechanism and the radius, navigate to http://54.183.140.235/radius/[number]/mechanism/[option]/, with the above options.

2.  2018  Comparing Voting Methods for Budget Decisions on the ASSU Ballot [pdf]  [abstract]
Lodewijk Gelauff, Sukolsak Sakshuwong, Nikhil Garg, and Ashish Goel

During the 2018 Associated Students of Stanford University (ASSU; Stanford’s student body) election and annual grants process, the Stanford Crowdsourced Democracy Team (SCDT) ran a research ballot and survey to develop insights into voting behavior on the budget component of the ballot (annual grants) where multiple grant requests are considered. We provided voters with additional voting methods for the budget component, collected further insights through a survey and demonstrated the viability of the proposed workflow. Some of our findings are directly relevant to ASSU. Furthermore, the (appropriately anonymized) data gathered in this year’s research ballots is beneficial for research purposes. Overall, our platform and pipeline (PB Stanford) with post-validation of ballots functioned well on a large scale. In particular, the knapsack ballot mechanism shows promise in voter feedback.

3.  2018  Markets for Public Decision-making [arxiv/ssrn]  [abstract]
Nikhil Garg, Ashish Goel, and Ben Plaut
Conference on Web and Internet Economics (WINE ’18)
(A journal version is in submission.)

A public decision-making problem consists of a set of issues, each with multiple possible alternatives, and a set of competing agents, each with a preferred alternative for each issue. We study adaptations of market economies to this setting, focusing on binary issues. Issues have prices, and each agent is endowed with artificial currency that she can use to purchase probability for her preferred alternatives (we allow randomized outcomes). We first show that when each issue has a single price that is common to all agents, market equilibria can be arbitrarily bad. This negative result motivates a different approach. We present a novel technique called "pairwise issue expansion", which transforms any public decision-making instance into an equivalent Fisher market, the simplest type of private goods market. This is done by expanding each issue into many goods: one for each pair of agents who disagree on that issue. We show that the equilibrium prices in the constructed Fisher market yield a "pairwise pricing equilibrium" in the original public decision-making problem which maximizes Nash welfare. More broadly, pairwise issue expansion uncovers a powerful connection between the public decision-making and private goods settings; this immediately yields several interesting results about public decision markets, and furthers the hope that we will be able to find a simple iterative voting protocol that leads to near-optimum decisions.
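
The expansion step described above can be sketched in a few lines: each binary issue contributes one good per pair of agents who disagree on it. The three-agent, three-issue instance below is made up purely for illustration, and the pricing and equilibrium computation in the resulting Fisher market are omitted.

```python
from itertools import combinations

# Toy instance: binary issues, each agent's preferred alternative per issue.
# (Agents, issues, and preferences are invented for illustration.)
preferences = {
    "alice": {"park": 1, "library": 0, "bridge": 1},
    "bob":   {"park": 0, "library": 0, "bridge": 1},
    "carol": {"park": 0, "library": 1, "bridge": 0},
}

def pairwise_issue_expansion(preferences):
    """One good per (issue, disagreeing pair of agents), following the
    construction described in the abstract: an issue contributes a good
    for every pair of agents who prefer different alternatives on it."""
    agents = sorted(preferences)
    issues = sorted(next(iter(preferences.values())))
    goods = []
    for issue in issues:
        for a, b in combinations(agents, 2):
            if preferences[a][issue] != preferences[b][issue]:
                goods.append((issue, a, b))
    return goods

goods = pairwise_issue_expansion(preferences)
```

Here every issue splits the three agents 2-vs-1, so each of the three issues yields two goods, six in total.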

4.  2019  Iterative Local Voting for Collective Decision-making in Continuous Spaces [link]  [abstract]  [demo]
Nikhil Garg, Vijay Kamble, Ashish Goel, David Marn, and Kamesh Munagala
Journal of Artificial Intelligence Research (JAIR)
(Supersedes WWW version.)

Many societal decision problems lie in high-dimensional continuous spaces not amenable to the voting techniques common for their discrete or single-dimensional counterparts. These problems are typically discretized before running an election or decided upon through negotiation by representatives. We propose an algorithm called Iterative Local Voting for collective decision-making in this setting. In this algorithm, voters are sequentially sampled and asked to modify a candidate solution within some local neighborhood of its current value, as defined by a ball in some chosen norm, with the size of the ball shrinking at a specified rate. We first prove the convergence of this algorithm under appropriate choices of neighborhoods to Pareto optimal solutions with desirable fairness properties in certain natural settings: when the voters’ utilities can be expressed in terms of some form of distance from their ideal solution, and when these utilities are additively decomposable across dimensions. In many of these cases, we obtain convergence to the societal welfare maximizing solution. We then describe an experiment in which we test our algorithm for the decision of the U.S. Federal Budget on Mechanical Turk with over 2,000 workers, employing neighborhoods defined by various L-norm balls. We make several observations that inform future implementations of such a procedure.
We have a demo of our Mechanical Turk experiment available live here. It can be used as follows:
1. If the URL is entered without any parameters, it uses the current radius (based on previous uses of the demo, going down by $1/N$) and uses the $\mathcal{L}^2$ mechanism.
2. To set the mechanism, navigate to http://54.183.140.235/mechanism/[option]/, where instead of [option] use one of l1, l2, linf, or full, for the respective mechanisms.
3. To set the radius, navigate to http://54.183.140.235/radius/[number]/, where any integer can be entered instead of [number]. This option resets the starting radius for the specific mechanism, which will go down by $1/N$ in subsequent accesses.
4. To set both the mechanism and the radius, navigate to http://54.183.140.235/radius/[number]/mechanism/[option]/, with the above options.
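
The sample-and-locally-update loop described in the abstract can be sketched as follows. Everything here is illustrative: synthetic voter ideal points, an L2 neighborhood, and a radius shrinking like $1/N$ as in the demo. With utilities equal to negative L2 distance from a voter's ideal point, the capped steps toward sampled ideals behave like stochastic subgradient descent on the average distance to voters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 5,000 voter ideal points in a 4-dimensional decision space.
ideals = rng.uniform(0, 100, size=(5000, 4))

def ilv_l2(ideals, steps=20000, c=50.0):
    """Minimal Iterative Local Voting sketch with an L2 neighborhood.

    Each sampled voter moves the candidate solution as far as allowed
    toward their ideal point; the allowed radius shrinks like c / t,
    echoing the 1/N schedule described for the demo."""
    x = np.zeros(ideals.shape[1])  # initial candidate solution
    for t in range(1, steps + 1):
        voter = ideals[rng.integers(len(ideals))]
        direction = voter - x
        dist = np.linalg.norm(direction)
        if dist > 0:
            # Step toward the voter's ideal, capped by the shrinking radius.
            x = x + direction * min((c / t) / dist, 1.0)
    return x

x_final = ilv_l2(ideals)
```

For this symmetric synthetic population the solution drifts toward the geometric median of the ideal points, near the center of the space.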

5.  2019  Deliberative Democracy with the Online Deliberation Platform
James Fishkin, Nikhil Garg, Lodewijk Gelauff, Ashish Goel, Kamesh Munagala, Sukolsak Sakshuwong, Alice Siu, and Sravya Yandamuri
AAAI Conference on Human Computation and Crowdsourcing Demo Track

6.  2019  Who is in Your Top Three? Optimizing Learning in Elections with Many Candidates [arxiv/ssrn]  [abstract]
Nikhil Garg, Lodewijk Gelauff, Sukolsak Sakshuwong, and Ashish Goel
AAAI Conference on Human Computation and Crowdsourcing (HCOMP ’19)

Elections and opinion polls often have many candidates, with the aim to either rank the candidates or identify a small set of winners according to voters’ preferences. In practice, voters do not provide a full ranking; instead, each voter provides their favorite K candidates, potentially in ranked order. The election organizer must choose K and an aggregation rule. We provide a theoretical framework to make these choices. Each K-Approval or K-partial ranking mechanism (with a corresponding positional scoring rule) induces a learning rate for the speed at which the election correctly recovers the asymptotic outcome. Given the voter choice distribution, the election planner can thus identify the rate optimal mechanism. Earlier work in this area provides coarse order-of-magnitude guarantees, which are not sufficient to make such choices. Our framework further resolves questions of when randomizing between multiple mechanisms may improve learning, for arbitrary voter noise models. Finally, we use data from 5 large participatory budgeting elections that we organized across several US cities, along with other ranking data, to demonstrate the utility of our methods. In particular, we find that historically such elections have set K too low and that picking the right mechanism can be the difference between identifying the correct winner with only an 80% probability or a 99.9% probability after 500 voters.
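
The kind of question this framework answers — how reliably a K-approval election recovers the true winner after a given number of voters — can be estimated by simulation. The candidate strengths, Gumbel-noise (Plackett–Luce-style) voter model, and electorate size below are invented for illustration and are not the paper's calibration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ground-truth candidate strengths; candidate 0 is truly best.
scores = np.array([2.0, 1.0, 0.9, 0.8, 0.5])

def recovery_rate(k, n_voters=200, n_trials=200):
    """Fraction of simulated elections in which k-approval voting
    identifies the truly best candidate. Each voter ranks candidates by
    score plus Gumbel noise (a Plackett-Luce-style random-utility model),
    and each of their top-k candidates receives one point."""
    wins = 0
    for _ in range(n_trials):
        noisy = scores + rng.gumbel(size=(n_voters, len(scores)))
        topk = np.argsort(-noisy, axis=1)[:, :k]           # each voter's top k
        points = np.bincount(topk.ravel(), minlength=len(scores))
        wins += int(points.argmax() == 0)
    return wins / n_trials

p1, p3 = recovery_rate(1), recovery_rate(3)
```

Sweeping k and the number of voters in such a simulation is one way to see the learning-rate differences the paper formalizes; with these well-separated strengths, both mechanisms recover the winner almost always.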

## Natural Language Processing

1.  2016  Transfer Learning: The Impact of Test Set Word Vectors, with Applications to Political Tweets [pdf]  [abstract]

A major difficulty in applying deep learning in novel domains is the expense associated with acquiring sufficient training data. In this work, we extend the literature on deep transfer learning by studying the role of initializing the embedding matrix with GloVe word vectors on a target dataset before training models with data from another domain. We study transfer learning on variants of four models (2 RNNs, a CNN, and an LSTM) and three datasets. We conclude that 1) the simple idea of initializing word vectors significantly and robustly improves transfer learning performance, 2) cross-domain learning occurs in fewer iterations than in-domain learning, considerably reducing training time, and 3) blending various out-of-domain datasets before training improves transfer learning. We then apply our models to a dataset of over 400k tweets by politicians, classifying sentiment and subjectivity vs. objectivity. This dataset was provided unlabeled, motivating an unsupervised and transfer learning approach. With transfer learning, we achieve reasonable performance on sentiment classification, but fail in classifying subjectivity vs. objectivity.

2.  2018  Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes [link]  [code & data]  [abstract]  [talk video]
Nikhil Garg, Londa Schiebinger, Dan Jurafsky, and James Zou
Proceedings of the National Academy of Sciences (PNAS)
Media: Stanford News (and EE department), Science Magazine, Smithsonian Magazine (in print), The World Economic Forum, Futurity, etc.

Word embeddings are a powerful machine-learning framework that represents each English word by a vector. The geometric relationship between these vectors captures meaningful semantic relationships between the corresponding words. In this paper, we develop a framework to demonstrate how the temporal dynamics of the embedding helps to quantify changes in stereotypes and attitudes toward women and ethnic minorities in the 20th and 21st centuries in the United States. We integrate word embeddings trained on 100 y of text data with the US Census to show that changes in the embedding track closely with demographic and occupation shifts over time. The embedding captures societal shifts - e.g., the women’s movement in the 1960s and Asian immigration into the United States - and also illuminates how specific adjectives and occupations became more closely associated with certain populations over time. Our framework for temporal analysis of word embedding opens up a fruitful intersection between machine learning and quantitative social science.
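
The association measurement underlying this analysis can be sketched with a relative norm distance between word vectors: how much closer a "neutral" word (e.g., an occupation) sits to one group's average vector than to another's. The tiny 4-dimensional vectors below are fabricated for illustration; the actual analysis uses embeddings trained on a century of English text, decade by decade.

```python
import numpy as np

# Fabricated toy word vectors (real embeddings have hundreds of dimensions).
vectors = {
    "she":      np.array([ 1.0, 0.1, 0.0, 0.2]),
    "her":      np.array([ 0.9, 0.0, 0.1, 0.1]),
    "he":       np.array([-1.0, 0.1, 0.0, 0.2]),
    "him":      np.array([-0.9, 0.0, 0.1, 0.1]),
    "nurse":    np.array([ 0.6, 0.5, 0.3, 0.0]),
    "engineer": np.array([-0.7, 0.4, 0.4, 0.1]),
}

def normalize(v):
    return v / np.linalg.norm(v)

def group_vector(words):
    """Average of the normalized vectors for one group's words."""
    return np.mean([normalize(vectors[w]) for w in words], axis=0)

def relative_norm_distance(neutral_words, group1, group2):
    """Mean over neutral words of ||w - g1|| - ||w - g2||.
    Negative values: the neutral words sit closer to group 1 on average."""
    g1, g2 = group_vector(group1), group_vector(group2)
    return float(np.mean([
        np.linalg.norm(normalize(vectors[w]) - g1)
        - np.linalg.norm(normalize(vectors[w]) - g2)
        for w in neutral_words
    ]))

female, male = ["she", "her"], ["he", "him"]
bias_nurse = relative_norm_distance(["nurse"], female, male)
bias_engineer = relative_norm_distance(["engineer"], female, male)
```

In these toy vectors "nurse" is placed nearer the female group words and "engineer" nearer the male ones, so the two biases come out with opposite signs; tracking such scores across embeddings trained on different decades is what turns the metric into a measure of changing stereotypes.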

3.  2019  Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings [arxiv/ssrn]  [code & data]  [abstract]
Dorottya Demszky, Nikhil Garg, Rob Voigt, James Zou, Jesse Shapiro, Matthew Gentzkow, and Dan Jurafsky
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL ’19)
Media: Washington Post, Stanford News.

We provide an NLP framework to uncover four linguistic dimensions of political polarization in social media: topic choice, framing, affect and illocutionary force. We quantify these aspects with existing lexical methods, and propose clustering of tweet embeddings as a means to identify salient topics for analysis across events; human evaluations show that our approach generates more cohesive topics than traditional LDA-based models. We apply our methods to study 4.4M tweets on 21 mass shootings. We provide evidence that the discussion of these events is highly polarized politically and that this polarization is primarily driven by partisan differences in framing rather than topic choice. We identify framing devices, such as grounding and the contrasting use of the terms "terrorist" and "crazy", that contribute to polarization. Results pertaining to topic choice, affect and illocutionary force suggest that Republicans focus more on the shooter and event-specific facts (news) while Democrats focus more on the victims and call for policy changes. Our work contributes to a deeper understanding of the way group divisions manifest in language and to computational methods for studying them.

## Algorithmic Fairness

1.  2020  Fair Allocation through Selective Information Acquisition [arxiv/ssrn]  [abstract]
William Cai, Johann Gaebler, Nikhil Garg, and Sharad Goel
AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES ’20)

Public and private institutions must often allocate scarce resources under uncertainty. Banks, for example, extend credit to loan applicants based in part on their estimated likelihood of repaying a loan. But when the quality of information differs across candidates (e.g., if some applicants lack traditional credit histories), common lending strategies can lead to disparities across groups. Here we consider a setting in which decision makers—before allocating resources—can choose to spend some of their limited budget further screening select individuals. We present a computationally efficient algorithm for deciding whom to screen that maximizes a standard measure of social welfare. Intuitively, decision makers should screen candidates on the margin, for whom the additional information could plausibly alter the allocation. We formalize this idea by showing the problem can be reduced to solving a series of linear programs. Both on synthetic and real-world datasets, this strategy improves utility, illustrating the value of targeted information acquisition in such decisions. Further, when there is social value for distributing resources to groups for whom we have a priori poor information—like those without credit scores—our approach can substantially improve the allocation of limited assets.

## Wireless Communications and Signal Processing

1.  2013  Multi-Modal, Multi-State, Real-Time Crew State Monitoring System [pdf]  [abstract]
Kier Fortier, Nikhil Garg, and Elizabeth Pickering
NASA Glenn Research Center Research Report

I helped develop a real-time, multi-modal crew state monitoring system that integrates EEG, GSR, and HRV. I was in charge of overall design, EEG processing, machine learning, and multi-modal integration. I developed robust artifact rejection to detect and remove blinks and other artifacts from the EEG data; EEG feature extraction to represent blocks of data with frequency characteristics, statistical measures, and blink rate; and a machine learning classification system (through Support Vector Machines) that uses the features and characterizes data from a block of time as originating from either a state of rest or a state of concentration. I then integrated EEG and GSR features for joint classification, and we demoed an end-to-end system that collected data from multiple sensors, extracted features, and trained and used the classifier to predict subject state. The system successfully classified 80 percent of subject states.

2.  2015  Use of Electroencephalography and Galvanic Skin Response in the Prediction of an Attentive Cognitive State [link]  [pdf]  [abstract]
Beth Lewandowski, Kier Fortier, Nikhil Garg, Victor Rielly, Jeff Mackey, Tristan Hearn, Angela Harrivel, and Bradford Fenton
Health and Human Performance Research Summit, Dayton, OH

As part of an effort aimed at improving aviation safety, the Crew State Monitoring Element of the NASA Vehicle Systems Safety Technologies Project is developing a monitoring system capable of detecting cognitive states that may be associated with unsafe piloting conditions. The long term goal is a real-time, integrated system that uses multiple physiological sensing modalities to detect multiple cognitive states with high accuracy, which can be used to help optimize human performance. Prior to realizing an integrated system, individual sensing modalities are being investigated, including the use of electroencephalographic (EEG) and galvanic skin response (GSR) signals, in the determination of an attentive or inattentive state. EEG and GSR data are collected during periods of rest and as subjects perform psychological tests including the psychomotor vigilance test, the Mackworth clock test and the attention network test. Subjects also perform tasks designed to simulate piloting tasks within the NASA multi-attribute task battery (MATB-II) program. The signals are filtered, the artifacts are rejected and the power spectral density (PSD) of the signals is found. Comparisons of the PSD between the rest and test blocks are made, along with the change in PSD over the time course of the blocks. Future work includes the collection of heart rate data and the investigation of heart rate variability as an additional measure to use in the prediction of attentive state, as well as the investigation of additional EEG signal processing methods such as source localization, multi-scale entropy and coherence measures. Preliminary results will be presented to highlight the methods used and to discuss our hypotheses. The challenges associated with realizing a real-time, accurate, multi-modal, cognitive state monitoring system are numerous.
A discussion of some of the challenges will be provided, including real-time artifact rejection methods, quantification of inter- and intra-subject variability, determination of what information within the signals provides the best measurement of attention and determination of how information from the different modalities can be integrated to improve the overall accuracy of the system.
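
The PSD comparison step described above can be illustrated with a simple FFT periodogram and band-power computation. The sampling rate, band edges, and synthetic one-channel "EEG" signal below are hypothetical stand-ins for real recordings.

```python
import numpy as np

fs = 256  # hypothetical sampling rate, Hz
t = np.arange(0, 10, 1 / fs)

# Synthetic "EEG": a 10 Hz (alpha-band) oscillation buried in white noise.
rng = np.random.default_rng(2)
signal = np.sin(2 * np.pi * 10 * t) + rng.normal(scale=1.0, size=t.size)

def band_power(x, fs, lo, hi):
    """Total power in the [lo, hi] Hz band from a simple FFT periodogram."""
    freqs = np.fft.rfftfreq(x.size, d=1 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / x.size
    band = (freqs >= lo) & (freqs <= hi)
    return psd[band].sum()

alpha = band_power(signal, fs, 8, 12)   # dominated by the 10 Hz component
beta = band_power(signal, fs, 13, 30)
```

Comparing such band powers between rest and task blocks is the kind of rest-vs-test PSD contrast the abstract describes, here reduced to its simplest form.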

Nikhil Garg
Bachelor's Thesis, University of Texas at Austin.

Nikhil Garg
Federal Communications Commission comment

5.  2015  Fair Use and Innovation in Unlicensed Wireless Spectrum: LTE unlicensed and Wi-Fi in the 5 GHz unlicensed band [pdf]
Nikhil Garg
IEEE-USA Journal of Technology and Public Policy

6.  2015  Impact of Dual Slope Path Loss on User Association in HetNets [link]  [pdf]  [abstract]
Nikhil Garg, Sarabjot Singh, and Jeffrey Andrews
IEEE Globecom Workshop

Intelligent load balancing is essential to fully realize the benefits of dense heterogeneous networks. Current techniques have largely been studied with single slope path loss models, though multi-slope models are known to more closely match real deployments. This paper develops insight into the performance of biasing and uplink/downlink decoupling for user association in HetNets with dual slope path loss models. It is shown that dual slope path loss models change the tradeoffs inherent in biasing and reduce gains from both biasing and uplink/downlink decoupling. The results show that with the dual slope path loss models, the bias maximizing the median rate is not optimal for other users, e.g., edge users. Furthermore, optimal downlink biasing is shown to realize most of the gains from downlink-uplink decoupling. Moreover, the user association gains in dense networks are observed to be quite sensitive to the path loss exponent beyond the critical distance in a dual slope model.
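
For reference, a dual-slope path loss model uses one exponent up to a critical distance and a steeper one beyond it, with the far branch scaled so the loss is continuous at the breakpoint. The exponents and critical distance below are illustrative, not the paper's parameters.

```python
import math

def dual_slope_path_loss(d, r_c=100.0, alpha0=2.0, alpha1=4.0):
    """Attenuation (linear scale) under a dual-slope path loss model:
    d^-alpha0 up to the critical distance r_c, then a steeper d^-alpha1,
    with the far branch scaled so the two branches meet at d = r_c."""
    if d <= r_c:
        return d ** -alpha0
    # The factor r_c^(alpha1 - alpha0) makes the model continuous at r_c.
    return (r_c ** (alpha1 - alpha0)) * d ** -alpha1

# Path loss in dB at a few distances (hypothetical parameters).
loss_db = {d: -10 * math.log10(dual_slope_path_loss(d)) for d in (10, 100, 1000)}
```

The steeper far-field slope is what changes the association tradeoffs the abstract discusses: beyond the critical distance, signal strength falls off much faster, so gains from biasing and decoupling shrink.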

## PhD Dissertation

1.  2020  Designing Marketplaces and Civic Engagement Platforms: Learning, Incentives, and Pricing [link]  [pdf]  [abstract]  [talk video]
Nikhil Garg
PhD Dissertation, Stanford University.

Platforms increasingly mediate interactions between people: both helping us find work and transportation, and supporting our civic society through discussion and decision-making. Principled system design requires formalizing the platform’s objective and understanding the incentives, behavioral tendencies, and capabilities of participants; in turn, the design influences participant behavior. In this dissertation, I describe work designing platforms in two domains – two-sided marketplaces and civic engagement platforms – combining both theoretical and empirical analyses of such systems. First, I consider the design of surge pricing that is incentive compatible for drivers in ride-hailing platforms. Second, I tackle rating system inflation and design on online platforms. Finally, I study the design and deployment of systems for participatory budgeting. The work in this dissertation has informed deployments at Uber, a large online labor platform, and in participatory budgeting elections across the U.S.

## 2019

1. Iterative Local Voting for Collective Decision-making in Continuous Spaces [link]  [abstract]  [demo]
Nikhil Garg, Vijay Kamble, Ashish Goel, David Marn, and Kamesh Munagala
Journal of Artificial Intelligence Research (JAIR)
(Supersedes WWW version.)

Many societal decision problems lie in high-dimensional continuous spaces not amenable to the voting techniques common for their discrete or single-dimensional counterparts. These problems are typically discretized before running an election or decided upon through negotiation by representatives. We propose an algorithm called Iterative Local Voting for collective decision-making in this setting. In this algorithm, voters are sequentially sampled and asked to modify a candidate solution within some local neighborhood of its current value, as defined by a ball in some chosen norm, with the size of the ball shrinking at a specified rate. We first prove the convergence of this algorithm, under appropriate choices of neighborhoods, to Pareto optimal solutions with desirable fairness properties in certain natural settings: when the voters’ utilities can be expressed in terms of some form of distance from their ideal solution, and when these utilities are additively decomposable across dimensions. In many of these cases, we obtain convergence to the societal welfare maximizing solution. We then describe an experiment in which we test our algorithm for the decision of the U.S. Federal Budget on Mechanical Turk with over 2,000 workers, employing neighborhoods defined by various L-norm balls. We make several observations that inform future implementations of such a procedure.
We have a demo of our Mechanical Turk experiment available live here. It can be used as follows:
1. If the URL is entered without any parameters, it uses the current radius (based on previous uses of the demo, decreasing by $1/N$) and the $\mathcal{L}^2$ mechanism.
2. To set the mechanism, navigate to http://54.183.140.235/mechanism/[option]/, where [option] is one of l1, l2, linf, or full, for the respective mechanisms.
3. To set the radius, navigate to http://54.183.140.235/radius/[number]/, where any integer can be entered instead of [number]. This resets the starting radius for the current mechanism, which then decreases by $1/N$ in subsequent accesses.
4. To set both the mechanism and the radius, navigate to http://54.183.140.235/radius/[number]/mechanism/[option]/, with the above options.
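A minimal sketch of the Iterative Local Voting loop described in the abstract, assuming an $\mathcal{L}^2$ neighborhood, a radius shrinking as $r_0/t$ (mirroring the demo's $1/N$ decay), and toy voter ideal points; all of these choices are illustrative, not the paper's exact specification:

```python
import numpy as np

def iterative_local_voting(ideal_points, r0=1.0, steps=2000, seed=0):
    """Sketch of Iterative Local Voting with an L2 neighborhood ball.

    Each sampled voter pulls the candidate solution toward their ideal
    point, but no farther than the current ball radius, which shrinks
    as r0 / t (an assumed schedule for illustration).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(ideal_points.shape[1])  # initial candidate solution
    for t in range(1, steps + 1):
        voter = ideal_points[rng.integers(len(ideal_points))]
        direction = voter - x
        dist = np.linalg.norm(direction)
        if dist > 0:
            # Move toward the voter's ideal point, clipped to the ball.
            x = x + direction * min(1.0, (r0 / t) / dist)
    return x

# Toy voters: three ideal budget allocations at the corners of a simplex.
voters = np.eye(3)
solution = iterative_local_voting(voters)
```

With three symmetric ideal points, the mean and the geometric median coincide at $(1/3, 1/3, 1/3)$, so the iterate should settle near that point as the radius shrinks.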

2. Designing Optimal Binary Rating Systems [link]  [arxiv/ssrn]  [abstract]
Nikhil Garg and Ramesh Johari
International Conference on Artificial Intelligence and Statistics (AISTATS ’19)

Modern online platforms rely on effective rating systems to learn about items. We consider the optimal design of rating systems that collect binary feedback after transactions. We make three contributions. First, we formalize the performance of a rating system as the speed with which it recovers the true underlying ranking on items (in a large deviations sense), accounting for both items’ underlying match rates and the platform’s preferences. Second, we provide an efficient algorithm to compute the binary feedback system that yields the highest such performance. Finally, we show how this theoretical perspective can be used to empirically design an implementable, approximately optimal rating system, and validate our approach using real-world experimental data collected on Amazon Mechanical Turk.
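The "large deviations" performance measure in the abstract can be made concrete: for two items whose positive-rating probabilities under a given binary question are $p$ and $q$, the exponential rate at which their empirical rates separate is the Chernoff information between Bernoulli($p$) and Bernoulli($q$). A small sketch with hypothetical probabilities (the question phrasings and numbers are invented for illustration, not taken from the paper):

```python
import numpy as np

def chernoff_information(p, q, grid=2001):
    """Chernoff information between Bernoulli(p) and Bernoulli(q):
    the large-deviations exponent governing how quickly the empirical
    positive-rating rates of two items stop overlapping."""
    lam = np.linspace(1e-6, 1 - 1e-6, grid)
    rho = p**lam * q**(1 - lam) + (1 - p)**lam * (1 - q)**(1 - lam)
    return float(np.max(-np.log(rho)))

# Hypothetical designs: positive-rating probabilities for a good and a
# mediocre worker under a lenient vs. a deliberately strict question.
lenient = chernoff_information(0.95, 0.90)  # e.g. "Was the work acceptable?"
strict = chernoff_information(0.60, 0.30)   # e.g. "Was the work exceptional?"
```

Under these assumed numbers the strict question has a much larger exponent, i.e., it distinguishes the two items after far fewer ratings.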

3. Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings [arxiv/ssrn]  [code & data]  [abstract]
Dorottya Demszky, Nikhil Garg, Rob Voigt, James Zou, Jesse Shapiro, Matthew Gentzkow, and Dan Jurafsky
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL ’19)
Media: Washington Post, Stanford News.

We provide an NLP framework to uncover four linguistic dimensions of political polarization in social media: topic choice, framing, affect and illocutionary force. We quantify these aspects with existing lexical methods, and propose clustering of tweet embeddings as a means to identify salient topics for analysis across events; human evaluations show that our approach generates more cohesive topics than traditional LDA-based models. We apply our methods to study 4.4M tweets on 21 mass shootings. We provide evidence that the discussion of these events is highly polarized politically and that this polarization is primarily driven by partisan differences in framing rather than topic choice. We identify framing devices, such as grounding and the contrasting use of the terms "terrorist" and "crazy", that contribute to polarization. Results pertaining to topic choice, affect and illocutionary force suggest that Republicans focus more on the shooter and event-specific facts (news) while Democrats focus more on the victims and call for policy changes. Our work contributes to a deeper understanding of the way group divisions manifest in language and to computational methods for studying them.

4. Deliberative Democracy with the Online Deliberation Platform
James Fishkin, Nikhil Garg, Lodewijk Gelauff, Ashish Goel, Kamesh Munagala, Sukolsak Sakshuwong, Alice Siu, and Sravya Yandamuri
AAAI Conference on Human Computation and Crowdsourcing Demo Track

5. Who is in Your Top Three? Optimizing Learning in Elections with Many Candidates [arxiv/ssrn]  [abstract]
Nikhil Garg, Lodewijk Gelauff, Sukolsak Sakshuwong, and Ashish Goel
AAAI Conference on Human Computation and Crowdsourcing (HCOMP ’19)

Elections and opinion polls often have many candidates, with the aim to either rank the candidates or identify a small set of winners according to voters’ preferences. In practice, voters do not provide a full ranking; instead, each voter provides their favorite K candidates, potentially in ranked order. The election organizer must choose K and an aggregation rule. We provide a theoretical framework to make these choices. Each K-Approval or K-partial ranking mechanism (with a corresponding positional scoring rule) induces a learning rate for the speed at which the election correctly recovers the asymptotic outcome. Given the voter choice distribution, the election planner can thus identify the rate-optimal mechanism. Earlier work in this area provides coarse order-of-magnitude guarantees, which are not sufficient to make such choices. Our framework further resolves questions of when randomizing between multiple mechanisms may improve learning, for arbitrary voter noise models. Finally, we use data from 5 large participatory budgeting elections that we organized across several US cities, along with other ranking data, to demonstrate the utility of our methods. In particular, we find that historically such elections have set K too low and that picking the right mechanism can be the difference between identifying the correct winner with only an 80% probability versus a 99.9% probability after 500 voters.
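The mechanism space the abstract describes (top-K ballots aggregated with a positional scoring rule) can be sketched as follows; the ballots and scoring weights below are made-up examples, not the rate-optimal choices the paper's framework would compute:

```python
from collections import defaultdict

def positional_scores(ballots, weights):
    """Score candidates from ranked top-K ballots.

    ballots: list of ranked candidate lists, each of length K
    weights: positional scoring rule; weights[j] is the credit for
             appearing in position j. For example, weights = [1] * K
             recovers K-Approval, while weights = [K, K-1, ..., 1]
             is a truncated Borda count.
    """
    scores = defaultdict(float)
    for ballot in ballots:
        for pos, candidate in enumerate(ballot):
            scores[candidate] += weights[pos]
    return dict(scores)

# Three illustrative top-3 ballots over candidates A, B, C, D.
ballots = [["A", "B", "C"], ["B", "A", "C"], ["A", "C", "D"]]
approval = positional_scores(ballots, [1, 1, 1])  # 3-Approval
borda3 = positional_scores(ballots, [3, 2, 1])    # truncated Borda
```

Note that the two rules can already disagree on small inputs: 3-Approval ties A and C here, while the truncated Borda count separates them, which is the kind of gap the learning-rate framework quantifies.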

## 2018

1. Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes [link]  [code & data]  [abstract]  [talk video]
Nikhil Garg, Londa Schiebinger, Dan Jurafsky, and James Zou
Proceedings of the National Academy of Sciences (PNAS)
Media: Stanford News (and EE department), Science Magazine, Smithsonian Magazine (in print), The World Economic Forum, Futurity, etc.

Word embeddings are a powerful machine-learning framework that represents each English word by a vector. The geometric relationship between these vectors captures meaningful semantic relationships between the corresponding words. In this paper, we develop a framework to demonstrate how the temporal dynamics of the embedding helps to quantify changes in stereotypes and attitudes toward women and ethnic minorities in the 20th and 21st centuries in the United States. We integrate word embeddings trained on 100 y of text data with the US Census to show that changes in the embedding track closely with demographic and occupation shifts over time. The embedding captures societal shifts - e.g., the women’s movement in the 1960s and Asian immigration into the United States - and also illuminates how specific adjectives and occupations became more closely associated with certain populations over time. Our framework for temporal analysis of word embedding opens up a fruitful intersection between machine learning and quantitative social science.
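A toy version of the embedding-association measurement described above; the vectors here are invented stand-ins for trained embeddings, and the statistic is an illustrative cosine-similarity difference rather than the paper's exact bias metrics:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def relative_association(word_vec, group_a_vecs, group_b_vecs):
    """Relative association of a word (e.g. an occupation) with group A
    vs. group B, as an average cosine-similarity difference. A positive
    value means the word sits closer to group A's word vectors."""
    sim_a = np.mean([cosine(word_vec, g) for g in group_a_vecs])
    sim_b = np.mean([cosine(word_vec, g) for g in group_b_vecs])
    return float(sim_a - sim_b)

# Hypothetical 3-d vectors standing in for trained embeddings.
she, her = np.array([1.0, 0.2, 0.0]), np.array([0.9, 0.1, 0.1])
he, him = np.array([0.0, 0.2, 1.0]), np.array([0.1, 0.1, 0.9])
nurse = np.array([0.8, 0.5, 0.1])
bias = relative_association(nurse, [she, her], [he, him])
```

Tracking such a statistic in embeddings trained on different decades of text is the kind of temporal comparison the framework performs at scale.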

2. Markets for Public Decision-making [arxiv/ssrn]  [abstract]
Nikhil Garg, Ashish Goel, and Ben Plaut
Conference on Web and Internet Economics (WINE ’18)
(A journal version is in submission.)

A public decision-making problem consists of a set of issues, each with multiple possible alternatives, and a set of competing agents, each with a preferred alternative for each issue. We study adaptations of market economies to this setting, focusing on binary issues. Issues have prices, and each agent is endowed with artificial currency that she can use to purchase probability for her preferred alternatives (we allow randomized outcomes). We first show that when each issue has a single price that is common to all agents, market equilibria can be arbitrarily bad. This negative result motivates a different approach. We present a novel technique called "pairwise issue expansion", which transforms any public decision-making instance into an equivalent Fisher market, the simplest type of private goods market. This is done by expanding each issue into many goods: one for each pair of agents who disagree on that issue. We show that the equilibrium prices in the constructed Fisher market yield a "pairwise pricing equilibrium" in the original public decision-making problem which maximizes Nash welfare. More broadly, pairwise issue expansion uncovers a powerful connection between the public decision-making and private goods settings; this immediately yields several interesting results about public decisions markets, and furthers the hope that we will be able to find a simple iterative voting protocol that leads to near-optimum decisions.

## 2017

1. Collaborative Optimization for Collective Decision-making in Continuous Spaces [link]  [abstract]  [demo]
Nikhil Garg, Vijay Kamble, Ashish Goel, David Marn, and Kamesh Munagala
International Conference on World Wide Web (WWW ’17)
(Superseded by journal version.)

Many societal decision problems lie in high-dimensional continuous spaces not amenable to the voting techniques common for their discrete or single-dimensional counterparts. These problems are typically discretized before running an election or decided upon through negotiation by representatives. We propose a meta-algorithm called Iterative Local Voting for collective decision-making in this setting, in which voters are sequentially sampled and asked to modify a candidate solution within some local neighborhood of its current value, as defined by a ball in some chosen norm. In general, such schemes do not converge, or, when they do, the resulting solution does not have a natural description. We first prove the convergence of this algorithm, under appropriate choices of neighborhoods, to plausible solutions in certain natural settings: when the voters’ utilities can be expressed in terms of some form of distance from their ideal solution, and when these utilities are additively decomposable across dimensions. In many of these cases, we obtain convergence to the societal welfare maximizing solution. We then describe an experiment in which we test our algorithm for the decision of the U.S. Federal Budget on Mechanical Turk with over 4,000 workers, employing neighborhoods defined by L1, L2, and L-infinity balls. We make several observations that inform future implementations of such a procedure.
We have a demo of our Mechanical Turk experiment available live here. It can be used as follows:
1. If the URL is entered without any parameters, it uses the current radius (based on previous uses of the demo, decreasing by $1/N$) and the $\mathcal{L}^2$ mechanism.
2. To set the mechanism, navigate to http://54.183.140.235/mechanism/[option]/, where [option] is one of l1, l2, linf, or full, for the respective mechanisms.
3. To set the radius, navigate to http://54.183.140.235/radius/[number]/, where any integer can be entered instead of [number]. This resets the starting radius for the current mechanism, which then decreases by $1/N$ in subsequent accesses.
4. To set both the mechanism and the radius, navigate to http://54.183.140.235/radius/[number]/mechanism/[option]/, with the above options.

## 2015

1. Use of Electroencephalography and Galvanic Skin Response in the Prediction of an Attentive Cognitive State [link]  [pdf]  [abstract]
Beth Lewandowski, Kier Fortier, Nikhil Garg, Victor Rielly, Jeff Mackey, Tristan Hearn, Angela Harrivel, and Bradford Fenton
Health and Human Performance Research Summit, Dayton, OH

As part of an effort aimed at improving aviation safety, the Crew State Monitoring Element of the NASA Vehicle Systems Safety Technologies Project is developing a monitoring system capable of detecting cognitive states that may be associated with unsafe piloting conditions. The long-term goal is a real-time, integrated system that uses multiple physiological sensing modalities to detect multiple cognitive states with high accuracy, which can be used to help optimize human performance. Prior to realizing an integrated system, individual sensing modalities are being investigated, including the use of electroencephalographic (EEG) and galvanic skin response (GSR) signals in the determination of an attentive or inattentive state. EEG and GSR data are collected during periods of rest and as subjects perform psychological tests, including the psychomotor vigilance test, the Mackworth clock test, and the attention network test. Subjects also perform tasks designed to simulate piloting tasks within the NASA multi-attribute task battery (MATB-II) program. The signals are filtered, artifacts are rejected, and the power spectral density (PSD) of the signals is computed. Comparisons of the PSD between the rest and test blocks are made, along with the change in PSD over the time course of the blocks. Future work includes the collection of heart rate data and the investigation of heart rate variability as an additional measure for predicting attentive state, as well as the investigation of additional EEG signal-processing methods such as source localization, multi-scale entropy, and coherence measures. Preliminary results will be presented to highlight the methods used and to discuss our hypotheses. The challenges associated with realizing a real-time, accurate, multi-modal cognitive state monitoring system are numerous; we discuss several of them, including real-time artifact rejection methods, quantification of inter- and intra-subject variability, determination of what information within the signals best measures attention, and determination of how information from the different modalities can be integrated to improve the overall accuracy of the system.