What do you genuinely just not understand?

(this comment might look daunting, but it's really not so bad. Feel free to respond with questions if anything is unclear, confusing, incomplete, or if I've made any typos or calculation errors).

Matrix multiplication is very natural if you understand where it comes from! Unfortunately, most high school (and a lot of university) courses do a really bad job of explaining the why of anything to do with matrices.

Consider the following motivating toy problem. The problem looks very complicated, but using matrices we can solve it very simply and directly. (You can just skim the problem to get the gist; the numbers don't really matter, and obviously it's much simpler than what would actually be required in the real world).

You are running a very large restaurant, and need to plan your workers' schedules for an especially busy night.

You know the following about who will show up:

  • 8 couples have reservations

  • 4 singles have reservations

  • 5 families have reservations

  • 3 large parties have reservations

  • 2 childrens' parties has reservations

(you can assume no one else will show up to the restaurant tonight)

The breakdown of each of these groups are the following:

  • couples have 2 adults each

  • singles have 1 adult each

  • families have 2 adults, 2 children, and 1 elderly person each

  • large parties have 5 adults, 2 elderly persons, and 2 children each

  • childrens' parties have 3 adults and 9 children each

In addition, you know what each kind of person will need in terms of food and utensils:

  • each adult will need 1 appetizer, 1 entree, 1 dessert, and 1 drink

  • each child will need 1 entree and 1 dessert

  • each elderly person will need 1 entree, and 1 drink

Now, for our actual staffing:

  • each appetizer requires 10 minutes of cook time, 5 minutes of waiter time, and 1 minutes of wash time

  • each entree requires 15 minutes of cook time, 5 minutes of waiter time, and 2 minutes of wash time

  • each drink requires 1 minute of cook time, 1 minute of waiter time, and 1 minutes of wash time

  • each dessert requires 9 minutes of cook time, 10 minutes of waiter time, and 2 minutes of wash time

So, how much time will be spent cooking, waiting, and washing?

Actually answering this problem is not too hard, but it is somewhat tedious. I will go through how you can do it:

First, let's count how many adults, children, and elderly people we expect:

  • adults: 2 * 8 couples + 1 * 4 singles + 2 * 5 families + 5 * 3 large parties + 3 * 2 childrens' parties
  • children: 0 * 8 couples + 0 * 4 singles + 2 * 5 families + 2 * 3 large parties + 9 * 2 childrens' parties
  • elderly people: 0 * 8 couples + 0 * 4 singles + 1 * 5 families + 2 * 3 large parties + 0 * 2 childrens' parties

These come out to

  • adults: 51
  • children: 34
  • elderly people: 11

Next, let's count the number of dishes of each type:

  • appetizer: 1 * 51 adults + 0 * 34 children + 0 * 11 elderly people
  • entree: 1 * 51 adults + 1 * 34 children + 1 * 11 elderly people
  • drink: 1 * 51 adults + 0 * 34 children + 1 * 11 elderly people
  • dessert: 1 * 51 adults + 1 * 34 children + 0 * 11 elderly people

These come out to:

  • appetizer: 51
  • entree: 96
  • drink: 62
  • dessert: 85

Lastly, let's find out how much cook, waiting, and washing time is needed:

  • cook: 10 * 51 appetizers + 15 * 96 entrees + 1 * 62 drinks + 9 * 85 desserts
  • wait: 5 * 51 appetizers + 5 * 96 entrees + 1 * 62 drinks + 10 * 85 dessert
  • wash: 1 * 51 appetizers + 2 * 96 entrees + 1 * 62 drinks + 2 * 85 desserts

These come out to our answer:

  • cook: 2777 minutes
  • wait: 1647 minutes
  • wash: 475 minutes

That was a lot of work!

Suppose I said that oh, wait, actually, there aren't 8 couples; there are 9. And there aren't 3 large parties, there are 4. And one of the childrens' parties dropped out.

How does our answer change? Well, doing it like we just did, we'd have to start all over. That's terrible! Let's device a more-efficient way of arriving at this answer.

The first step is to write shorter notes. I described everything in prose, except for a few expressions that mix addition and multiplication and some prose. Let's re-examine one of these:

  • cook: 10 * 51 appetizers + 15 * 96 entrees + 1 * 62 drinks + 9 * 85 desserts
  • wait: 5 * 51 appetizers + 5 * 96 entrees + 1 * 62 drinks + 10 * 85 dessert
  • wash: 1 * 51 appetizers + 2 * 96 entrees + 1 * 62 drinks + 2 * 85 desserts

Note the repetition of numbers (51, 104, 66, and 89 all show up 3 times). In addition, they all follow the same _*_ + _*_ + _*_ + _*_ structure. We will use the fact that the structure is repetitive to avoid writing details that are always the same.

Let's write out all the steps in matrix form (remember, we compute everything in the same way as before, this is just a shorter way of writing down our work):

Step 1:

  • 8cpl 4sng 5fam 3lrg 2chl `
  • adults: [2 1 2 5 3]
  • children: [0 0 2 2 9]
  • elderly people: [0 0 1 2 0]

Step 2:

  • 51 adult 34 child 11 eld
  • appetizer: [1 0 0]
  • entree: [1 1 1]
  • drink: [1 0 1]
  • dessert: [1 1 0]

Step 3:

  • 51 app 96 entree 62 drink 85 dessert
  • cook: [10 15 1 9]
  • wait: [ 5 5 1 10]
  • wash: [ 1 2 1 2]

Answer:

  • 2777 cook 1647 wait 475 wash

This is our first matrix. We haven't explained yet why it makes sense to add or multiply matrices, though.

So, where do matrix operations come into this?

The first matrix operation we will examine is matrix-vector multiplication. We've been doing matrix-vector multiplications this whole time, and you probably didn't realize it!

                           [ 8 ]
[51]   [2   1   2   5   3] [ 4 ]
[38] = [0   0   2   2   9] [ 5 ]
[15]   [0   0   1   2   0] [ 3 ]
                           [ 2 ]

This matrix-vector multiplication expresses Step 1 in the above calculation!

When we write a column vector to the right of a matrix, we mean that you should compute a new vector:

It has one row for every row of the matrix.

To compute the value in each row, go across that row in the matrix, and down the vector.

For example,

51 = 2 * 8 + 1 * 4 + 2 * 5 + 3 * 3 + 3 * 2

This is the exact same computation we were doing above! Matrix multiplication against vectors is defined to do the most-useful thing!

Now, how about multiplying matrices?

Well, we know that:

                           [ 8 ]
[51]   [2   1   2   5   3] [ 4 ]
[38] = [0   0   2   2   9] [ 5 ]
[15]   [0   0   1   2   0] [ 3 ]
                           [ 2 ]

We also know that (Step 2):

[ 51]     [1   0   0] [51]
[104]  =  [1   1   1] [38]
[ 66]     [1   0   1] [15]
[ 89]     [1   1   0]

But when things are equal, we can generally plug them into equations. Let's replace [51 ; 38 ; 15] with the matrix-vector multiplication from Step 1:

                                          [ 8 ]
[ 51]     [1   0   0] [2   1   2   5   3] [ 4 ]
[104]  =  [1   1   1] [0   0   2   2   9] [ 5 ]
[ 66]     [1   0   1] [0   0   1   2   0] [ 3 ]
[ 89]     [1   1   0]                     [ 2 ]

This new, larger equation is still true: we do the matrix-vector multiplication at the right first, which gives us a vector. Then, we multiply that vector by the left matrix to obtain the left-hand-side vector.

Now it turns out that matrix multiplication will let us combine steps 1 and 2 together so that we don't need to do both of them! We can directly get the number of dishes from the types of groups that are arriving!

Matrix multiplication goes like this:

If the first matrix is (N by M), meaning it has N rows and M columns, and the second is (M by K), meaning it has M rows and K columns, then the result is (N by K).

In this case, we have a (4 by 3) multiplied with a (3 by 5).

So, the result will be a (4 by 5).

[? ? ? ? ?]   [1 0 0] [2 1 2 5 3]
[? ? ? ? ?] = [1 1 1] [0 0 2 2 9]
[? ? ? ? ?]   [1 0 1] [0 0 1 2 0]
[? ? ? ? ?]   [1 1 0]

To compute any given spot, you need to find the corresponding row in the left matrix, and column in the right matrix:

[_ _ _ _ _]   [_ _ _] [_ _ ! _ _]
[_ _ _ _ _] = [_ _ _] [_ _ ! _ _]
[_ _ X _ _]   [! ! !] [_ _ ! _ _]
[_ _ _ _ _]   [_ _ _]

Then, you multiply the corresponding values (remember, the first matrix has exactly as many columns as the second one has rows, so they'll always match up):

X = 1 * 2 + 0 * 2 + 1 * 1 = 3.

We repeat this for every spot:

[2  1  2  5  3]   [1 0 0] [2 1 2 5 3]
[2  1  5  9 12] = [1 1 1] [0 0 2 2 9]
[2  1  3  7  3]   [1 0 1] [0 0 1 2 0]
[2  1  4  7 12]   [1 1 0]

Now, if we multiply by this matrix, we get the same answer as multiplying by both matrices in series:

[2  1  2  5  3] [8]   [51]
[2  1  5  9 12] [4] = [96]
[2  1  3  7  3] [5]   [62]
[2  1  4  7 12] [3]   [85]
                [2]

If you do the third matrix multiplication, then we collapse all three steps into one!

Then, we can do about 1/3 the work to answer the original question whenever we change the input numbers (how many couples, singles, families, and groups there are).

This should answer why matrix multiplication is useful: it's compatible with matrix-vector multiplication (which you should already agree is useful from the above) and it allows you to collapse multiple matrices together without changing the answer.

As for why it doesn't change the answer... I will leave that as an exercise to the reader. See if you can convince yourself that this really is the case!

/r/AskReddit Thread Parent