I was recently given a puzzle. It was a sequence of numbers and I was asked what the next number in the sequence was.
The sequence was 1, 11, 21, 1211, 111221, ….
An answer, and the answer that most people think is the only one, is 312211.
This particular answer is arrived at by noticing that we are just describing what the previous number is.
We start at 1. In the number 1 we only have one 1, so we put down 11 (read one one) as the next number. In the number 11 we have two 1’s so we put down 21 (read two one[‘s]). In 21 we have one 2 and one 1 and so we put down 1211. In 1211 we start off with one 1 followed by one 2 followed by two 1’s so we put down 111221. In 111221 we start with three 1’s followed by two 2’s followed by one 1 and so the next number is 312211.
This is not the only answer, in actual fact any number is a valid answer as the question is too ambiguous. Take the following formula:

Then plugging in the values 1, 2, 3, 4, 5 and 6 for n we get the sequence 1, 11, 21, 1211, 111221, 544151. There isn’t anything in the question which forbids the sequence from being generated from this formula so 544151 must be counted as a valid answer as well.
The problem is that there are an infinite number of formulae which will generate sequences starting with 1, 11, 21, 1211, 111221 so which one is the right one?
Well to answer that question and give us an insight into what we need to specify in the question to make it unambiguous, the rest of the post will talk about how to generate a formula for a sequence.
The standard way of doing this would be by solving a system of linear equations. Given a sequence of n points, you would generate n equations with n unknowns by modelling the sequence by a polynomial with degree n-1.
To illustrate the point given the sequence 1, 8, 19 we would model this sequence as a polynomial of degree 2, that is
.
We know that
. That gives us the three equations:

Even though solving 2 simultaneous equations with 2 unknowns is taught in high school, solving systems of linear equations of n equations with n unknowns is not taught in most high schools. Solving these systems of equations would usually be done by writing the system as a matrix and finding the matrix inverse via row or column operations.
However my aim is to show how to generate these formulae whilst hiding away the machinery that is taught in higher level maths courses.
The method I am going to present is going to involve no more than simple procedures of multiplication, division, subtraction and addition.
To start with we need to define some terminology.
We say the first order difference of a polynomial,
, at the point x is defined by
.
The nth order difference of a polynomial,
, at the point x is defined by
.
Even though the definitions may look scary, the following diagrams fully explain the concept of first order, second order, …., nth order differences using both an abstract and concrete example:

The thing you need to know about these sequences is that given a sequence generated by a polynomial,
, of degree n then
where c is the coefficient of the degree n term in the polynomial. If you are unfamilar with the notation n! then read this
for all
.
The second part of the above gives us a way to find out the degree of the polynomial generating a sequence and the first part allows us to calculate the coefficient of the highest power term. We can use this to calculate the polynomial that generates a particular sequence.
The procedure we use to find the polynomial is as follows. Define a polynomial, f, with no terms. Whilst the sequence of numbers (denoted
) is not all zero repeat the following. Work out the degree of the polynomial that generates the sequence and work out the coefficient of the highest power in this polynomial. Once we know the coefficient of the highest power term in the polynomial, say
, generate a new sequence of numbers by calculating
. Add
to the polynomial f.
This procedure works as in every step when we generate a new sequence this sequence is for a polynomial of a smaller degree than the last so we will eventually end up with a sequence of all zeros.
To demonstrate this consider the following sequence 5, 0, 1, 20, 69. We analyse the differences below

We see that the fourth order difference is 0 and so the sequence can be generated by a degree 3 polynomial. Furthermore we know that the third order differences will be
where c is the coefficient of the cubic term of this polynomial and the third order difference is 12 so we know the coefficient is 12/3! = 12/6 = 2. We set our f to be
. We now calculate
as the new sequence and repeat our process. Our new sequence is 3, -16, -53, -108, -181. Analysing the differences below we have

We see that the third order differences are 0 and so we have a sequence generated by a polynomial of degree 2 and that the coefficient of the degree 2 term is -18/2! = -18/2 = -9 and so our term for this step is
. Adding this to our f gives us
. Our new sequence is calculated as
,
,
,
,
, giving us the new sequence 12, 20, 28, 36, 44. Analysing the differences we have

The second order differences are 0 and so we have a sequence generated by a polynomial of degree 1 and that the coefficient of the degree 1 term is 8/1! = 8/1 = 8 and so our term for this step is
. Adding this to our f gives us
. Our new sequence is calculated as
, giving us the new sequence 4, 4, 4, 4, 4. Analysing the differences we have

The first order differences are 0 and so we have a sequence generated by a polynomial of degree 0 and that the coefficient of the degree 0 term is 4/0! = 4/1 = 4 and so our term for this step is just 4. Adding this to our f gives us
. Our new sequence is calculated as
, giving us the new sequence 0, 0, 0, 0, 0 and so we are complete. The final polynomial is
which can be checked by computing
.
If we return to the question of what we need to specify to make a sequence question unambiguous, can you now specify the necessary conditions?
In a follow up post I will discuss the necessary conditions as well as attach a document justifying the methods used in this post with some mathematical rigour.