Conference Publications

  • Near-Optimal No-Regret Learning in General Games
    Constantinos Daskalakis, Maxwell Fishelson, Noah Golowich
    NeurIPS 2021 Oral Presentation [paper]
  • Multi-item Non-truthful Auctions Achieve Good Revenue
    Constantinos Daskalakis, Maxwell Fishelson, Brendan Lucier, Vasilis Syrgkanis and Santhoshini Velusamy
    EC 2020 [paper] [talk] [low-level talk]

Journal Publications

  • Pattern Avoidance Over a Hypergraph
    Maxwell Fishelson, Benjamin Gunby
    Electronic Journal of Combinatorics 2021 [paper] [journal]

Preprints

  • Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games
    Ioannis Anagnostides, Constantinos Daskalakis, Gabriele Farina, Maxwell Fishelson, Noah Golowich, Tuomas Sandholm
    Preprint [paper]


Near-Optimal No-Regret Learning in General Games

Overview

Awarded an Oral Presentation at the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021), this work establishes that Optimistic Hedge, a common variant of multiplicative-weights-updates with recency bias, attains \(poly(\log T)\) regret in multi-player general-sum games. In particular, when every player of a game uses Optimistic Hedge to iteratively update her strategy in response to the history of play, then after \(T\) rounds of interaction, each player experiences total regret that is \(poly(\log T)\). This result was a major milestone in multi-agent learning theory: an exponential improvement on the previous best-known \(O(T^{1/4})\) regret bound for general-sum games.

Background

Modern machine learning applications are often tasked with learning in dynamic and potentially adversarial environments. This stands in stark contrast to the foundational assumptions of classical machine learning. The traditional setting involves a single learner investigating a static environment, such as a fixed data set of traffic images. However, as machine learning becomes an increasingly ubiquitous tool in the modern world, these learning environments are rarely static. They are composed of several different learning agents, each trying to optimize its decision making, such as several companies in a shared economy trying to learn optimal pricing. In these settings, how one learner chooses to learn impacts the environment, and thereby how the others learn. In this sense, a choice of a learning algorithm is analogous to a choice of a strategy in a game. These settings have led to a recent explosion of research at the interface of game theory and statistical learning.

The study of online learning from an adversarial perspective was popularized in 2006 by the work of Cesa-Bianchi and Lugosi. This work discussed the Follow-The-Regularized-Leader (FTRL) algorithm and demonstrated that a learner making decisions according to FTRL in a repeated, adversarial environment will only experience a “regret” of \(O(\sqrt{T})\) over \(T\) time-step iterations of learning. Regret is a notion of loss incurred by the learner over this process: a comparison between the loss they actually incurred and the loss they would have incurred had they chosen the best fixed action in hindsight, a meaningful benchmark. It was a breakthrough work when it came out, as it began to address the problem of learning in the dynamic environments that better model the modern world. It also came with a matching lower bound: in the purely adversarial setting, no algorithm can guarantee regret better than \(\Omega(\sqrt{T})\).
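In symbols: if the learner plays mixed strategy \(x^{(t)}\) at round \(t\) and faces loss vector \(\ell^{(t)}\), her regret after \(T\) rounds (in notation consistent with the formulas later in this post) is
\[
\text{Reg}_T = \sum_{t=1}^{T} \left\langle x^{(t)}, \ell^{(t)} \right\rangle - \min_{j} \sum_{t=1}^{T} \ell^{(t)}_j,
\]
and the FTRL guarantee is \(\text{Reg}_T = O(\sqrt{T})\) against any loss sequence.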

However, a main open question remained: is a purely adversarial model too pessimistic? Under a more realistic model, can we improve on \(O(\sqrt{T})\)? Rather than assuming the feedback we receive from the world is the absolute worst-case scenario, what if we assume that it is the product of the actions of other learning agents? Indeed, even in the setting where the environment is composed of learning agents playing a purely-adversarial zero-sum game, there is some guaranteed regularity that comes from the fact that all agents involved are using some robust algorithm for learning.

In 2015, Syrgkanis et al. made the first major breakthrough in this direction [SALS15]. Awarded the Best Paper Award at NeurIPS 2015, this work demonstrated that when all players in a game run a modified algorithm, “Optimistic FTRL”, for decision making, each experiences regret at most \(O(T^{1/4})\). This was the first result to crack the \(O(\sqrt{T})\) barrier, and it got the whole world of theoretical machine learning investigating whether it was possible to further improve this bound.

That investigation was concluded only six years later by this work of Constantinos Daskalakis, Noah Golowich, and myself. We demonstrated that Optimistic Hedge, a common variant of Optimistic FTRL, attains \(poly(\log T)\) regret in multi-player general-sum games. This was an exponential improvement on the previously best-known regret bound. For this discovery, we were awarded an Oral Presentation (top 50 papers out of 9000 submissions) at the NeurIPS 2021 conference.

Math

The crucial thing that separates the multi-agent game-theoretic setting of online learning from the adversarial setting is a guarantee that we refer to as “high-order smoothness”. What this means is that, if we look at the sequence of loss vectors experienced by each of the players over time, we can upper-bound the magnitude of the first, second, third, …, \(h^{\text{th}}\)-order finite differences of the sequence. In the adversarial setting, the loss vector could shift wildly with the intention of throwing off the Optimistic Hedge algorithm. In the game setting, though, even if the game is purely adversarial (zero-sum), there is some regularity guarantee that stems from the fact that all players are using the same Optimistic Hedge algorithm to govern their strategic dynamics. If all players except one play sequences of strategies that are smooth over time (having bounded high-order finite differences), the remaining player will experience a loss vector that is smooth over time, since her experienced loss is a function of the strategies of others. In turn, since the mixed strategy she plays according to Optimistic Hedge is a function of the cumulative loss, her strategy vectors will be guaranteed smooth over time. This leads to an inductive argument demonstrating that all players have high-order smooth strategies and losses.
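To pin down what “finite differences” means here: the first-order finite difference of a sequence \(x^{(1)}, x^{(2)}, \cdots\) is \((\Delta x)^{(t)} = x^{(t+1)} - x^{(t)}\), and the \(h^{\text{th}}\)-order finite difference applies this operator \(h\) times. The bounds below are stated in terms of quantities of the shape
\[
D_h(x) = \max_t \left\| (\Delta^h x)^{(t)} \right\|_\infty
\]
(the paper's exact definition pins down the norms and index ranges more carefully; this is just the gist).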

Along the way of this inductive argument, there are some neat combinatorial lemmas that come into play but aren’t in the spotlight of our write-up. I’m using this blog post to give some of these arguments a little exposition!

Lemma B.5: The Magic of Softmax

The weight that Optimistic Hedge places on strategy \(j\) at time step \(t\) is \[\phi_j\left(-\eta \cdot \bar{\ell}^{(t)}_1, -\eta \cdot \bar{\ell}^{(t)}_2, \cdots, -\eta \cdot \bar{\ell}^{(t)}_n\right)\] where \(\bar{\ell}^{(t)}_i = \sum_{\tau = 1}^{t-1} \ell^{(\tau)}_i + \ell^{(t-1)}_i\) is the cumulative loss incurred for action \(i\) (with some recency bias) and \[\phi_j(z_1,\cdots,z_n) = \frac{\exp(z_j)}{\sum_{i=1}^n \exp(z_i)}\] is the softmax function. The inductive argument roughly goes as follows. By an inductive hypothesis, we assume that all players \(i\) have the \((h-1)^{\text{th}}\)-order finite differences of their strategy vectors bounded by \(D_{h-1}(x_i) = O(\eta^{h-1})\). This implies a bound on the finite differences of all players' loss vectors \(D_{h-1}(\ell_i) = O(\eta^{h-1})\), as losses are a function of others' strategies. This implies \(D_{h}(\bar{\ell}_i) = O(\eta^{h-1})\), as \(\bar{\ell}_i\) is a cumulative sum of the losses over time. Therefore \(D_{h}(\eta \cdot \bar{\ell}_i) = O(\eta^{h})\), and the final step is to show that \(D_h\left(\phi_j\left(-\eta \cdot \bar{\ell}_1, \cdots, -\eta \cdot \bar{\ell}_n\right)\right) = O(\eta^h)\).
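As a brief aside before that final step: the update rule above is easy to state in code. Here is a minimal sketch of the weight computation (my own illustration, not the paper's implementation):

```python
import numpy as np

def optimistic_hedge_weights(losses, eta):
    """Optimistic Hedge mixed strategy after observing past loss vectors.

    losses: array of shape (t-1, n), one loss vector per past round.
    Computes softmax(-eta * bar_ell), where bar_ell is the cumulative
    loss with the most recent round counted twice (the recency bias).
    """
    bar_ell = losses.sum(axis=0) + losses[-1]
    z = -eta * bar_ell
    z -= z.max()                 # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

# Example: 3 actions, 2 rounds of observed losses in [0, 1]
past = np.array([[0.2, 0.9, 0.5],
                 [0.1, 0.8, 0.4]])
print(optimistic_hedge_weights(past, eta=0.5))
```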

For this final step, we need to prove that softmax has a nice property that makes this “discrete chain rule” argument possible. The way the math shakes out, the important quantity is the magnitude of the Taylor coefficients of the softmax function. We prove the following:

Let \(P_{\phi_j}(z) = \sum_{\gamma \in \mathbb{Z}_{\geq 0}^n} a_{j,\gamma} \cdot z^\gamma\) denote the Taylor series of \(\phi_j\). Then for any integer \(k \geq 1\),
\begin{align}
\sum_{\gamma \in \mathbb{Z}_{\geq 0}^n : \ |\gamma| = k} |a_{j,\gamma}| \leq \frac{1}{n}e^{\frac{k+1}{2}}
\end{align}

The first step, and one that I just want to skim over here as it is not the beautiful combinatorics I want to focus on, is that we have the equality
\[
\sum_{\gamma \in \mathbb{Z}_{\geq 0}^n : \ |\gamma| = k} |a_{j,\gamma}| =\frac{1}{k!}\sum_{t \in [n]^k} \left|\frac{\partial^k \phi_j(0)}{\partial z_{t_1} \partial z_{t_2}\cdots \partial z_{t_k}}\right|
\]

Namely, we can think of the important quantity we are trying to bound as the following: choose a \(t_1\) from \(1\) to \(n\); take the derivative \(\frac{\partial \phi_j}{\partial z_{t_1}}\); choose another \(t_2\) from \(1\) to \(n\); take the derivative \(\frac{\partial^2 \phi_j}{\partial z_{t_1}\partial z_{t_2}}\); repeat this process \(k\) times until we have \(\frac{\partial^k \phi_j}{\partial z_{t_1} \partial z_{t_2}\cdots \partial z_{t_k}}\). The quantity of interest comes from summing over all the possible choices of \((t_1,\cdots,t_k) \in [n]^k\) of the magnitude of \(\frac{\partial^k \phi_j}{\partial z_{t_1} \partial z_{t_2}\cdots \partial z_{t_k}}\) evaluated at \(0\).
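For tiny \(n\) and \(k\), the lemma can be sanity-checked by brute force. A throwaway SymPy sketch (my own check, not from the paper):

```python
import itertools
import sympy as sp

def taylor_coeff_sum(n, k, j=0):
    """(1/k!) * sum over t in [n]^k of |d^k phi_j / dz_{t_1}...dz_{t_k}| at 0."""
    z = sp.symbols(f"z0:{n}")
    phi_j = sp.exp(z[j]) / sum(sp.exp(zi) for zi in z)
    total = sp.Integer(0)
    for t in itertools.product(range(n), repeat=k):
        d = phi_j
        for i in t:
            d = sp.diff(d, z[i])
        total += abs(d.subs({zi: 0 for zi in z}))
    return total / sp.factorial(k)

for n, k in [(2, 1), (2, 2), (3, 2), (3, 3)]:
    lhs = float(taylor_coeff_sum(n, k))
    bound = float(sp.exp(sp.Rational(k + 1, 2)) / n)
    print(f"n={n}, k={k}: {lhs:.4f} <= {bound:.4f}")
```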

Now, everybody learning about neural nets for the first time learns about the wonders of the sigmoid function as a non-linear activation, \(\sigma(x) = \frac{e^x}{e^x+1}\). The main reason is that it has such a beautiful derivative, making it ideal for backpropagation: \(\sigma'(x)=\sigma(x)(1-\sigma(x))\). Softmax is the multidimensional extension of sigmoid and accordingly has a similarly nice derivative. This will form the basis of the argument. We have
\[\frac{\partial \phi_j}{\partial z_j} = \phi_j(1-\phi_j) \qquad \qquad \frac{\partial \phi_j}{\partial z_i} = -\phi_j\phi_i\]

These \((1-\phi_j)\) terms are gonna appear all throughout the analysis. We abbreviate them \(\phi^*_j = 1 - \phi_j\) and call them “perturbed \(\phi\)s”. We notice something interesting: when we take the derivative of a \(\phi\) or a perturbed \(\phi\) relative to any \(z_i\), we are left with the product of two \(\phi\)s (perhaps perturbed, perhaps not)
\[\frac{\partial \phi_j}{\partial z_j} = \phi_j\phi^*_j \qquad \frac{\partial \phi_j}{\partial z_i} = -\phi_j\phi_i \qquad \frac{\partial \phi^*_j}{\partial z_j} = -\phi_j\phi^*_j \qquad \frac{\partial \phi^*_j}{\partial z_i} = \phi_j\phi_i \]

This is going to have interesting implications on the value of \(\frac{\partial^k \phi_j}{\partial z_{t_1} \partial z_{t_2}\cdots \partial z_{t_k}}\). Let \(\phi^{?}\) denote a \(\phi\) that may or may not be perturbed. We see the following.

  • Before taking any derivatives, we have a single term: \(\phi_j\)
  • After one derivative, we have a product of the form: \(\pm \phi^?_j\phi^?_{t_1}\)
  • After two derivatives, we have: \(\pm \phi^?_j\phi^?_{t_1}\phi^?_{t_2} \pm \phi^?_j\phi^?_{t_1}\phi^?_{t_2}\)
  • After three derivatives, we have: \(\pm\phi^?_j\phi^?_{t_1}\phi^?_{t_2}\phi^?_{t_3}\pm\phi^?_j\phi^?_{t_1}\phi^?_{t_2}\phi^?_{t_3}\pm\phi^?_j\phi^?_{t_1}\phi^?_{t_2}\phi^?_{t_3}\pm\phi^?_j\phi^?_{t_1}\phi^?_{t_2}\phi^?_{t_3}\pm\phi^?_j\phi^?_{t_1}\phi^?_{t_2}\phi^?_{t_3}\pm\phi^?_j\phi^?_{t_1}\phi^?_{t_2}\phi^?_{t_3}\)
  • After \(k\) derivatives, we have the sum of \(k!\) terms each consisting of \((k+1)\) \(\phi\)s

This comes from the product rule of differentiation. If we have a product of \(\phi\)s and take the derivative with respect to some \(z_i\), we get
\begin{align*}
\frac{\partial}{\partial z_i} \phi^?_a \phi^?_b \phi^?_c &= \left ( \frac{\partial}{\partial z_i} \phi^?_a \right ) \phi^?_b \phi^?_c+\phi^?_a \left ( \frac{\partial}{\partial z_i} \phi^?_b \right ) \phi^?_c+\phi^?_a\phi^?_b \left ( \frac{\partial}{\partial z_i} \phi^?_c \right )\\
&= (\phi^?_a\phi^?_i)\phi^?_b\phi^?_c+\phi^?_a(\phi^?_b\phi^?_i)\phi^?_c+\phi^?_a\phi^?_b(\phi^?_c\phi^?_i)
\end{align*}

That is, for each \(\phi\) in the product, we have a corresponding term in the resulting summation that comes from taking the derivative of that \(\phi\). So, at the end of the process of evaluating \(\frac{\partial^k \phi_j}{\partial z_{t_1} \partial z_{t_2}\cdots \partial z_{t_k}}\), we are left with a sum of \(k!\) terms. We can enumerate these terms via the following generative process. At each time step \(i\) from \(1\) to \(k\), we will transform a product of \(i\) \(\phi\)s into a product of \((i+1)\) \(\phi\)s by selecting one of the \(i\) \(\phi\)s and taking the derivative of that \(\phi\) (which results in the product of two \(\phi\)s), and then multiplying back in the remaining \((i-1)\) \(\phi\)s. Going into the first derivative, we initialize our product to just be \(\phi_j\). This is a product of a singular \(\phi\), and we have 1 choice of \(\phi\) to select for differentiation. This leaves us with a product of two \(\phi\)s. So, at the second derivative, we have 2 choices of \(\phi\) for differentiation, each choice leaving us with a product of 3 \(\phi\)s. This goes on until we have a product of \(k\) \(\phi\)s and \(k\) choices of \(\phi\) to select for the final derivative. In this way, we can think of a bijection between the \(k!\) terms in the final summation and combinatorial structures we refer to as “factorial trees”.

Define a factorial tree to be a directed graph on vertices \(0,1,\cdots,k\). Each vertex \(i\) except \(0\) has exactly one incoming edge from some vertex \(j < i\). There are \(k!\) such trees: one choice for the parent of \(1\), two choices for the parent of \(2\), etc. The factorial tree corresponding to a particular term will have, for all \(i\), the parent of \(i\) be the index of the \(\phi\) that was selected for differentiation when taking the \(i^{\text{th}}\) derivative in the generative process (using \(0\) indexing).
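In code, a factorial tree is nothing but a parent array. A quick sketch to enumerate them and confirm the count (illustration only):

```python
import itertools
from math import factorial

def factorial_trees(k):
    """Yield each tree as a tuple 'parents', where parents[i-1], drawn
    from {0, ..., i-1}, is the parent of vertex i, for i = 1, ..., k."""
    yield from itertools.product(*(range(i) for i in range(1, k + 1)))

assert sum(1 for _ in factorial_trees(4)) == factorial(4)  # 24 trees on 0..4
```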

The quantity of interest is the absolute value of the sum over all factorial trees of the corresponding product evaluated at \(0\). We upper bound this by the sum of the absolute values of the products, which will enable us to just ignore the \(\pm\) signs throughout the generative process.
\[\left|\sum_{\text{factorial trees }f} \pm \phi^?_{j}(0) \phi^?_{t_1}(0) \phi^?_{t_2}(0)\cdots \phi^?_{t_k}(0)\right| \leq \sum_{\text{factorial trees }f} \phi^?_{j}(0) \phi^?_{t_1}(0) \phi^?_{t_2}(0)\cdots \phi^?_{t_k}(0)\]

What each product will evaluate to when you plug in \(0\) is entirely dependent on how many of the \(\phi\)s are perturbed. We see
\[\phi_j(0)=\frac{1}{n} \qquad \phi_j^*(0)=1-\frac{1}{n}\]

Therefore, we can bound the product evaluated at \(0\) by \(\phi^?_{j} \phi^?_{t_1} \phi^?_{t_2}\cdots \phi^?_{t_k} \leq (1/n)^{\text{number of unperturbed }\phi\text{s in the product}}\). Let's formally go through our generative process to determine which \(\phi\)s are perturbed. To review
\[\left | \frac{\partial \phi_j}{\partial z_j}\right | = \phi_j\phi^*_j \qquad \left | \frac{\partial \phi_j}{\partial z_i}\right | = \phi_j\phi_i \qquad \left | \frac{\partial \phi^*_j}{\partial z_j}\right | = \phi_j\phi^*_j \qquad \left | \frac{\partial \phi^*_j}{\partial z_i}\right | = \phi_j\phi_i \]

We can interpret these differentiation rules as follows:

  1. Say we are taking the derivative with respect to \(z_i\) of a \(\phi_j\). If \(i \ne j\), then we will add an unperturbed \(\phi_i\) to the end of the current product. However, if \(i=j\), then we will add a perturbed \(\phi^*_i\) to the end of the product.
  2. If the \(\phi\) that is chosen for differentiation is perturbed: \(\phi^*_j\), it becomes unperturbed: \(\phi_j\)

These rules are a complete description of the behavior of these derivatives. Note, for example, that we can interpret \(\left | \frac{\partial \phi^*_j}{\partial z_j} \right | = \phi_j\phi^*_j\) as the unperturbation of an existing term \(\phi^*_j \to \phi_j\) as well as the addition of a new perturbed term \(\phi^*_j\) since the derivative with respect to \(z_j\) was directed at a \(\phi^*_j\) with matching index \(j\).

This leads to a clean parallel in the setting of factorial trees. We label the vertices \(0,1,2,\cdots,k\) of the factorial tree with the corresponding indices of differentiation \(j,t_1,t_2,\cdots,t_k\). Then, we can say the following. A vertex \(i\) corresponds to a perturbed \(\phi^*_{t_i}\) in the final product if and only if its parent has the same label \(t_i\) and vertex \(i\) has no children. That is, the term \(\phi^*_{t_i}\) was perturbed when it was added, and it never got unperturbed, as it was never selected for differentiation. We call such a leaf that shares its parent's label a “petal”.
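To make the petal characterization concrete, here is a small sketch (my own naming; vertex \(0\) carries the label \(j\) and vertex \(i\) carries \(t_i\)):

```python
def count_petals(parents, labels):
    """Count petals of a labeled factorial tree on vertices 0..k.

    parents[i-1] is the parent of vertex i; labels[v] is vertex v's label.
    A petal is a leaf whose label matches its parent's label.
    """
    k = len(parents)
    return sum(
        1
        for v in range(1, k + 1)
        if v not in parents                       # v is never a parent: a leaf
        and labels[v] == labels[parents[v - 1]]   # matches its parent's label
    )

# Example: parents (0, 0, 2) means 1 <- 0, 2 <- 0, 3 <- 2. With labels
# (j, a, b, b), vertex 3 is a leaf sharing label "b" with its parent: 1 petal.
print(count_petals((0, 0, 2), ["j", "a", "b", "b"]))  # 1
```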

Wow, that was a lot of set up. But are you ready for the…

*•.BEAUTIFUL COMBINATORICS.•*

So, we have boiled the desired quantity down to the following. We are summing over all possible labelings \(t \in [n]^k\). For each such labeling, we sum over all possible factorial trees the quantity \((1/n)^{\text{number of non-petal vertices}}=(1/n)^{k+1}\cdot n^{\text{number of petals}}\). The beautiful double count: rather than fixing the labeling and summing over all possible trees, let’s fix the tree and sum over all possible labelings! Letting \(P_{f,t}\) denote the number of petals on tree \(f\) with labeling \(t\), we have
\begin{align*}
\frac{1}{k!} \sum_{t \in [n]^k}\sum_{\text{factorial trees }f} (1/n)^{k+1-P_{f,t}} &= \frac{1}{k!} \sum_{\text{factorial trees }f} \left((1/n)^{k+1} \sum_{t \in [n]^k}n^{P_{f,t}}\right)\\
&= \frac{1}{n}\mathbb{E}_f \left[ \mathbb{E}_t \left[n^{P_{f,t}}\right ]\right]
\end{align*}

For a fixed tree \(f\), the quantity \(P_{f,t}\), viewed as a random variable in \(t\), has a very simple interpretation. \(f\) has some fixed set of leaves \(L_f\). For a random labeling \(t \sim \mathcal{U}\left([n]^k\right)\), each leaf \(\ell \in L_f\) will have the same label as its parent with probability \(1/n\), independently of all other leaves. Therefore,
\[\mathbb{E}_t \left[n^{P_{f,t}}\right ] = \prod_{\ell \in L_f}\left(\frac{1}{n}\cdot n + \frac{n-1}{n} \cdot 1\right) \leq 2^{|L_f|}\]

We finish by bounding \(\frac{1}{n}\mathbb{E}_f \left[2^{|L_f|}\right]\). For a specific vertex \(\ell \in [0,k]\), we note that \(\ell \in L_f\) if and only if it is not the parent of any vertex \(\ell+1,\cdots,k\). So,
\begin{equation}
\Pr_{f}[\ell \in L_f] = \prod_{i=\ell+1}^{k} \frac{i-1}{i} = \frac{\ell}{k}
\end{equation}
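This telescoping product is easy to verify exactly for small \(k\) by enumerating all \(k!\) trees (again, just a sanity check):

```python
import itertools
from fractions import Fraction
from math import factorial

k = 5
trees = list(itertools.product(*(range(i) for i in range(1, k + 1))))
assert len(trees) == factorial(k)
for leaf in range(k + 1):
    # 'leaf' is a leaf of the tree iff it is never anyone's parent
    hits = sum(1 for parents in trees if leaf not in parents)
    assert Fraction(hits, len(trees)) == Fraction(leaf, k)
```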

Now, unlike the previous argument involving labels, these leaf events are not independent. Fortunately though, the correlation is negative here and works in our favor for an upper bound. That is, conditioning on one vertex \(\ell\) being a leaf, it is even less likely that other vertices are leaves, as the vertices \(\ell+1,\cdots,k\) have one fewer alternative option for their parent. We are able to show via an inductive argument that for any vertex set \(S \subseteq [0,k]\)
\begin{equation}
\Pr_{f}[S \subseteq L_f] \leq \prod_{\ell \in S} \frac{\ell}{k}
\end{equation}

This establishes a coupling between \(L_f\) and a random subset \(R \subseteq [0,k]\) generated from \(k+1\) mutually independent coin flips, where we include \(\ell \in R\) with probability \(\frac{\ell}{k}\) for all \(\ell \in [0,k]\). Therefore,
\begin{align*}
\frac{1}{n}\mathbb{E}_f \left[2^{|L_f|}\right] &\leq \frac{1}{n}\prod_{\ell=0}^k \left(\frac{\ell}{k}\cdot 2 + \frac{k-\ell}{k} \cdot 1\right)\\
&= \frac{1}{n}\prod_{\ell=0}^k \left(1+\frac{\ell}{k}\right)\\
&\leq \frac{1}{n}\exp\left(\sum_{\ell=0}^k \frac{\ell}{k}\right)\\
&=\frac{1}{n}e^{\frac{k+1}{2}}
\end{align*}
as desired.

Lemma B.4: Giant components in an unexpected place

This lemma comes into play when we are demonstrating that a bound on the Taylor coefficients of softmax implies that the “discrete chain rule” argument works. The lemma focuses on understanding the asymptotic behavior of the following quantity. Fix integers \(h,k \geq 1\). For a mapping \(\pi: [h] \to [k]\), let \(h_i(\pi) = |\{q \in [h] | \pi(q) = i\}|\) denote the number of integers that map to \(i\), for all \(i \in [k]\). The goal of the lemma is to investigate the asymptotic behavior of the quantity \(\sum_{\pi : [h] \to [k]}\prod_{i=1}^k h_i(\pi)^{Ch_i(\pi)}\) for some constant \(C\). Namely, we want to show, for sufficiently large \(C\),

\begin{equation}
\sum_{\pi : [h] \to [k]}\prod_{i=1}^k h_i(\pi)^{Ch_i(\pi)} \leq h^{Ch} \cdot poly(k,h)
\end{equation}
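This too can be sanity-checked by brute force for small \(h\) and \(k\) (a throwaway check of my own; the choice \(C = 3\) is just illustrative):

```python
from itertools import product

def lemma_lhs(h, k, C):
    """Sum over all maps pi: [h] -> [k] of prod_i h_i(pi)^(C * h_i(pi)).

    Relies on Python's convention 0**0 == 1 for colors that never appear.
    """
    total = 0
    for pi in product(range(k), repeat=h):
        term = 1
        for color in range(k):
            c = pi.count(color)
            term *= c ** (C * c)
        total += term
    return total

for h, k, C in [(4, 2, 3), (5, 3, 3), (6, 2, 3)]:
    ratio = lemma_lhs(h, k, C) / h ** (C * h)
    print(f"h={h}, k={k}: ratio = {ratio:.3f}")  # grows only polynomially
```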

This is not at all obvious, as the summation is over \(k^h\) terms, and each term can be as large as \(h^{Ch}\). However, we can take advantage of the fact that most terms are much smaller than \(h^{Ch}\). The proof of the lemma presented in the paper isn't particularly exciting: we use generating functions to bound the terms of the sum coming from two cases. The intuition behind the breakdown of the casework, though, stems from a very interesting connection: giant components in random graphs.

Consider a graph on \(h\) vertices, where \(\pi\) represents a coloring of each of the \(h\) vertices with one of \(k\) different colors. Then, \(\prod_{i=1}^k h_i(\pi)^{h_i(\pi)}\) represents the number of ways to draw a directed edge from each vertex to another vertex of the same color (allowing self-loops). Similarly, \(\prod_{i=1}^k h_i(\pi)^{Ch_i(\pi)}\) represents the number of ways to draw \(C\) monochromatic directed edges from each vertex (allowing repeats).

The crucial double-counting: rather than fixing a coloring \(\pi\) and then counting the monochromatic \(C\)-out digraphs on it, we instead fix a \(C\)-out digraph and count the number of colorings that have all edges monochromatic. This will be \(k^{\text{number of connected components}}\). This is the intuition behind the casework. We split into cases based on whether or not \(\max_i h_i > h/2\). Roughly, a small minority of the colorings have one color appear on over half the vertices, which is why the first case is boundable. Then, for the second case, when no color is present in the majority, there are very few \(C\)-out digraphs that have no edge-color conflicts, as a random \(C\)-out digraph will have a giant component with high probability for \(C \geq 2\) (we use \(C \geq 3\) in the proof for convenience). This gives the desired bound.


Multi-item Non-truthful Auctions Achieve Good Revenue (Simple, Credible and Approximately-Optimal Auctions)

Accepted to the twenty-first ACM Conference on Economics and Computation (EC 20), this work establishes that a large class of multi-item auctions achieves a constant factor of the optimal revenue. Notably, we demonstrate the first revenue bounds for many common non-truthful auctions in the multi-item setting, settling an open problem in the literature. For the first time, we have guarantees that first-price-type auctions perform well. These are the most common auctions in practice, and until now, there was no certainty of their performance. We also establish the first approximately-optimal static auction that is credible, meaning the auctioneer cannot profitably deceive the bidders. A second-price auction, for example, is not credible. Say two bidders bid $100 and $50 in a second-price auction. As the auctioneer, I can lie to the $100 bidder and tell him the other bid was $99. This bidder would get charged $99, as he can't disprove my claim. In a first-price auction, however, you know you are paying your bid, and there is no room for deception. We also obtain mechanisms with fixed entry fees that are amenable to tuning via online learning techniques.
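A toy version of that deviation in code (hypothetical helper names, purely to illustrate the definition; this is not the paper's mechanism):

```python
def second_price_payment(winning_bid, reported_runner_up):
    """Winner pays the runner-up bid, which only the auctioneer observes."""
    return reported_runner_up

def first_price_payment(winning_bid):
    """Winner pays her own bid, a charge she can verify herself."""
    return winning_bid

print(second_price_payment(100, 50))  # 50: the honest runner-up bid
print(second_price_payment(100, 99))  # 99: a phantom bid no one can disprove
print(first_price_payment(100))       # 100: no room for deception
```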

The talk co-author Santhoshini and I gave for the official conference is linked above ([talk]), along with a more introductory talk geared towards a general audience that is less familiar with auction theory ([low-level talk]).

The story behind this project is pretty crazy. It started as a final project in the graduate course Algorithmic Game Theory taught by Professor Costis Daskalakis and Vasilis Syrgkanis. I took it during the spring semester of my junior year of undergrad. I was absolutely obsessed with the class; it offered some of the coolest, most fascinating lectures I’d ever attended. I really wanted to do research with the professors. So, I decided to take on this really ambitious project. Here’s a video of Vasilis laughing at me for attempting the problem in a lecture hall of 50+ students.

My project partner Santhoshini Velusamy and I worked pretty hard at the problem for what remained of the semester. We ultimately produced a result on multi-item revenue optimality, but one that relied on bidder value distributions having a monotone hazard rate, an assumption often too strong in practice.

Flash forward to September: T minus 3 months till grad school apps were due. The safe thing to do was to try to squeeze out a new project in the time that remained, as this auction research was already in a decent place for undergraduate work. However, the mysteries of the problem still lingered in my mind, even after months of hiatus. So, I decided to contact Santhoshini and restart work on it. We had one final idea that we didn’t explore during the semester: an alternate formulation of multidimensional virtual value. It ended up being the crucial first step to the proof.

With this insight, the problem boiled down to one thing: bounding the welfare loss of the first-price auction. As a non-truthful auction, first price is not always welfare optimal; sometimes the bidder that wants the item the most doesn't win. However, we needed to demonstrate that this lost welfare \((Wel(OPT)-Wel(FP))\) was at most a constant factor of the revenue of Myerson's auction. This puzzle turned from a fascination into an obsession. It seemed impossible to find a counterexample, and yet the proof continued to elude me. I would spend 14-hour days thinking about it, wholly neglecting my homework and even my grad school apps.

Exactly 6 days before the apps were due, it finally clicked. It was a moment of pure joy followed by a moment of manically rewriting all of my apps to reflect the result. A pretty unforgettable way to finish off undergrad and the best 4 years of my life (so far).


Pattern Avoidance Over a Hypergraph

Accepted to the Electronic Journal of Combinatorics, this work proves a generalization of the Stanley-Wilf conjecture. The statement of the conjecture is as follows. There are \(n!\) (roughly \((n/e)^n\)) ways to permute the numbers \(1\) to \(n\). However, if we say that we only want to count permutations that avoid a particular subsequence pattern, then the total will only be singly exponential (roughly \(c^n\) for some constant \(c\)). For example, the number of permutations that do not contain increasing subsequences of length 3 is \(C_n\), the \(n\)th Catalan number (roughly \(4^n\)). The conjecture states that, no matter how crazy and convoluted the pattern is, we will always observe this singly exponential behavior.
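For the length-3 increasing pattern, the count is easy to verify by brute force for small \(n\) (a quick check, not part of the paper):

```python
from itertools import combinations, permutations
from math import comb

def avoids_123(p):
    """True if p contains no increasing subsequence of length 3."""
    return not any(p[a] < p[b] < p[c]
                   for a, b, c in combinations(range(len(p)), 3))

for n in range(1, 8):
    count = sum(avoids_123(p) for p in permutations(range(n)))
    assert count == comb(2 * n, n) // (n + 1)  # the n-th Catalan number
    print(n, count)  # 1, 2, 5, 14, 42, 132, 429
```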

The conjecture was settled in 2004 by Marcus and Tardos. The goal of this research was to generalize their result, requiring only that the pattern be avoided at the specific tuples of indices corresponding to the edges of a hypergraph.

This project was the first serious research I ever did, and it came about from somewhat humorous circumstances. It all started when I emailed Professor Asaf Ferber to complain that he gave me a B in his class Algebraic Combinatorics. He essentially responded with a challenge, “Do a research project with me next semester and prove to me you deserved an A.”

We met one-on-one a number of times that following semester, where he introduced me to the problem and the Hypergraph Containers method. I was working on the setting where the hypergraph edges are chosen randomly. Later, Asaf introduced me to Benjamin Gunby, a graduate student at Harvard who was working on the deterministic version of the problem. We came together and wrote a joint paper on our findings.

I never did end up getting my grade changed to an A, but looking back I realize that that B did more for me than any A ever could.