# Data Structures and Structured Data

## WDYM, data structures?

We will keep it casual and skip formal definitions for now. đź‘€

Data structures give us principled ways to *stow away* information. Itâ€™s important to do this nicely based on what you want to *do* with the information.

For example, the notes you might be taking in this class is information. If you have no plans of revisiting them later, you can take them as you please, or better yet, not take them at all!

However, you want your notes optimised for giving you quality company during a 2AM revision session on exam day, competing with Maggi for attention, you want your notes to be competently taken: they donâ€™t have to be neat, and itâ€™s enough for them to be useful.

On the other hand, if you are taking notes so that a special someone who will inevitably miss a few classes will almost certainly ask for later, then you would be making notes to impress, and that potentially requires a different approach.

## Representing Polynomials

Letâ€™s say that you are spending a fine evening watching the #LockdownMath playlist from 3blue1brown. The first episode happens to be all about solving quadratics:

Now, itâ€™s quite natural to want to â€śwrite a programâ€ť, so to speak, that can take a quadratic equation such as x^2 - 7x + 12 as input and output its two roots.

Given that programs running on your phone are able to make suggestions, even if dubious, for what series to binge-watch next on Netflix, finding roots of quadratics should be a fairly benign exercise.

You might recall that most programs let you declare variables that can hold on to specific *types* of information, for instance: numbers, strings, and so forth. Our input doesnâ€™t â€ślookâ€ť like a number, so it would be a fair take to simply store it as a string:

`= "x^2 - 7x + 12"; px `

While this is a perfectly faithful representation, you can imagine that it would be slightly painful to work with. You would have to write some code that can â€śpull outâ€ť the parts of the string that represent the *numbers* you care about (in this example, b = -7 and c = 12), so that you can move on to your calculation, which is an *expression* involving numbers.

Given that a quadratic with the leading coefficient normalized to one is uniquely determined by two numbers, it seems a lot simpler to directly represent the polynomial as two integers instead:

```
= -7
px_b = 12 px_c
```

You might appreciate that this saves us quite some circus and we can quite directly get to the computation weâ€™re interested in. What if you cared about higher order polynomials? You may want to solve them (even if you run out of expressions for solutions pretty quickly, you might be interested in other ways of getting to the roots), or manipulate them in other ways (for example, by adding or multiplying them).

## Representing a Game - I

The game of 100 goes like this: I pick a number between 1 and 10, and then you pick one within the next ten numbers, and on and on. The first person to reach 100 wins.

What if you want to write a program that mimics the winning strategy?

Note that this game can go on for at most a 100 steps, and in fact exactly 20 steps (or ten rounds) when you employ said winning strategy. So one way to go about this is to declare 20 variables to track the 20 numbers exchanged between the players. But a momentâ€™s reflection may reveal that you *donâ€™t* need to store anything at all.

## Representing a Game - II

If ~~you missed the first class~~ you havenâ€™t played the Game of Trust, you are welcome to take a break and experience it now. Letâ€™s recollect the setup:

Suppose you want to implement your own version of this game, where the program responds to inputs from the user and plays according to a specific, pre-meditated strategy. Remember you have seen some strategies already:

We reproduce these strategies below:

Letâ€™s say that your program is designed to play 5 rounds and that your program is playing the copycat strategy. To begin with, you might want to declare a couple of variables to keep track of the scores of the players, and ten variables to track the moves of both players in each round. With this, your code may start out looking like this:

```
= 0
my_points = 0
user_points
= input("Input 1 for Cooperate and 0 for Cheat.")
user_move_1
//Sanity check input:
if(user_move_1 != 1 and user_move_1 != 0):
express disappointment and abort
```

```
// My first move is to cooperate:
+= -1
my_points += 3
user_points
if(user_move_1):
+= 3
my_points -= 1 user_points
```

Now your next move is determined by the value of `user_move_1`

, so you might proceed as follows.

```
= input("Input 1 for Cooperate and 0 for Cheat.")
user_move_2
//Sanity check input:
if(user_move_2 != 1 and user_move_2 != 0):
express disappointment and abort
```

```
// My next move is based on the user's first:
if(user_move_1):
+= -1
my_points += 3
user_points
if(user_move_2):
+= 3
my_points -= 1 user_points
```

â€¦and so on and on, you get the drift.

Now, suppose we come up with our own player, whom we call the **majority mover**. This player looks at your entire game history, and cooperates if you have cooperated more than you have cheated, and cheats if you have cheated more than you have cooperated, and acts randomly otherwise.

It seems like implementing the majority mover strategy would really require keeping track of everything. Or would it? You might observe at this point that itâ€™s enough to keep track of two counts: the number of rounds and the *number* of moves where the user has cheated: note that it does not matter when the cheats happened in the history of the game.

How about a **completely random** player? This one chooses a number **K** between 1 and N uniformly at random (letâ€™s not worry about *how* this is done for now, because that would be a story for another day), where N is the number of rounds played so far; and mimics the other playerâ€™s **K**th move. To implement this strategy, you really would need to keep track of the userâ€™s entire game history with the five variables, and also assume that you have a way of picking a number at random.

Finally, consider that instead of fixing your program to play five rounds â€” đźĄ± â€” you want to politely ask the user how m*any *rounds they want to play.

Well, for the first few players, this is just a matter of upgrading your for loop (which you should have switched to already when you realised that you donâ€™t need all. those. variables.) to use N: and you are done.

## Representing a subset of a deck of cards

If you are implementing a card^{1} game, you might need a mechanism for keeping track of â€śhandsâ€ť, or various subsets of cards. Letâ€™s say a *hand* is a subset of cards. For many games, you would need the ability to be able to quickly:

- tell if a particular card belongs to a hand or not,
- add a card to a hand,
- remove a card from a hand, and
- replace a card in a hand with another one.

One way to meet these requirements is to declare a collection of 52 boolean (i.e, true/false or 0/1) variables to represent the hand: the cards in the hand are set to true while cards that donâ€™t belong are set to false.

Hereâ€™a another way, though: you could agree on a notation for the cards in the deck: e.g, a standard one is to use a number, A/J/Q/K to denote the value, and S/C/D/H to denote the suit, so every card can be represented as a pair of characters. For example the Ace of Diamonds would be AD, the five of spades would be 5S and the King of Hearts would be KH. With this in place, you could represent a hand also by simply *concatenating* these string representations of the cards in the hand.

Now for this toy example, if you were to implement both methods and clock the time taken to implement the four operations above, you may not notice a major difference. However, for actual applications, you may be in a situation where your *subsets* (here, the â€śhandsâ€ť) may be coming from a large *universe* (here, the â€śdeckâ€ť). On the other hand, you may have a very large number of operations to take care of efficiently.

Your choice of method will again be driven by the requirements: the one thing to keep in mind is that you cannot have it all, but we can usually get pretty damn close!

## Footnotes

Assume you are working with the standard 52-card deck.â†©ď¸Ž