**Description**

> Solving CTF challenges often involves a lot of work, which is very unfair to
> lazy competitors. To even things out, we designed a self-solving challenge:
> Just type `make flag` and make yourself a coffee while the solution is
> computed.

**Files provided**

\-
[`Makefile`](https://github.com/EmpireCTF/empirectf/blob/master/writeups/2018-10-05-Hackover-
CTF/files/flagmaker)

**Solution**

> Note: this is a post-CTF analysis, I did not quite manage to get to the flag
> during the competition.

When I first downloaded this, I started `make flag` in the background while
looking at other challenges… As it turns out, even if `make` supported
arbitrarily big integers (which it doesn't), it would take an insane amount of
time to finish. I know my computer isn't the fastest and there were no solves
for this challenge for hours, so clearly it wasn't going to be that easy!

My first analysis was just a cursory glance through the functions in the
makefile.

```Makefile  
r = [-48, 2, -48, -8, -59, -18, 1, -59, 3, -5, -26, -57, 53, 3, -43, -3, -41,
-20, 1, -64, -65, -45, -71, -47, -16, -47, -38, -3, 46, -63, -54, 1, -49, 4,
-51, -45, -61, -46, -13, -4, -65, -48, -55, -51, -38, -64, -50, -5, -65, 2,
-54, -56, -1, -50, -28]

flag:  
@echo $(call l, $(o), $(n), $(p), $(q)) | sha256sum | python -c "import sys; print ''.join([chr(ord(c) - d) for c,d in zip(sys.stdin.read(),$(r))])" | tee $@  
```

The result of a very complex macro call to `l` was hashed using SHA-256, then
the hexadecimal digest was piped into a `python` one-liner, which used the
first 55 digits and shifted them using the `r` array.

Here I discovered the first method of solving this challenge. The output of
`python` was the flag, which had to be in the `hackover18{...}` format. The
input to `python` were lowercase hexadecimal digits, and there are only so
many of them. Based on this, some characters of the flag could be exactly
pinpointed. I did not actually try this dictionary attack. Maybe I should have
since apparently the first team to solve this used this method.

I was quite interested in how the script actually worked. The first step was
to create a dependency graph of the functions:

![](https://github.com/EmpireCTF/empirectf/raw/master/writeups/2018-10-05-Hackover-
CTF/screens/flagmaker.jpg)

This revealed that the complicated-looking mess of functions actually
completely separated into two almost tree-like dependency graphs, and only `l`
and `k` had a recursive dependency. With a graph like this it is easier to
reason about what each function does and what its inputs might represent.

Before analysing functions properly, I noticed one thing – the `q` variable:

```Makefile  
q = ++--+++++ ++-+-+--+ +++---++- ++++-+-++ --+-++-++ --+++++-- -++-+--+-
-+++---++ -+--+++-+ -+-++-+++ +-+-++++- +-++++-+- +---++--- +--++-+--  
```

`q` is passed to the `l` function as its last, 4th argument in the initial
call. When `l` calls `k` to recurse, it gives `q` unmodified to `k` as its
last, 5th argument. Likewise, when `k` calls `l` to recurse, it gives back `q`
unmodified to `l` as its 4th argument. `q` is indeed a constant, and even
though it is used within the functions in other places, the original value is
passed back and forth. I renamed `q` to `program`, since at this point I was
thinking the script is interpreting `program` as some sort of source code.

Let's first look at the functions referenced in `l`, which is also the initial
function called by `make`. Throughout the analysis, we can refer to the GNU
Makefile documentation on [text
functions](https://www.gnu.org/software/make/manual/html_node/Text-
Functions.html#Text-Functions) and [conditional
functions](https://www.gnu.org/software/make/manual/html_node/Conditional-
Functions.html#Conditional-Functions).

```Makefile  
a=$(subst -,- ,$(subst +,+ ,$(1)))  
d=$(if $(filter-out -,$(1)),,+)  
e=$(call a,$(filter $(subst $(call a) ,,$(strip $(1))$(strip $(2)))%,$(3)))  
j=$(word $(words $(1)), $(2))  
m=$(word 1,$(1)) $(or $(word 2,$(1)),+) $(or $(word 3,$(1)),-)

# l($1 ?, $2 ?, $3 ?, $4 program)  
l=$(if $(call d,$(call m,$(1))),$(words $(filter +,$(3))),$(call k,$(call e,
$(call m,$(1)), $(call j,$(2),$(3)),$(4)),$(call m,$(1)),$(2),$(3),$(4)))  
```

```Makefile  
# a($1 str)  
a=$(subst -,- ,$(subst +,+ ,$(1)))  
```

`a` takes a string, and separates all `-` and `+` with spaces. E.g. `$(call a,
--+-++) == - - + - + +`. Note that in `make`, spaces between the function name
and the first argument are ignored, and so are spaces after an argument comma
and the argument itself. Spaces at the end of arguments are preserved,
however, which is why `a` is not actually a no-op.

Also, at this point it is pretty safe to say that there seems to be no way for
`make` to produce strings containing anything other than `+`, `-`, or
whitespace.

```Makefile  
# e($1 prefix1, $2 prefix2, $3 program)  
e=$(call a,$(filter $(subst $(call a) ,,$(strip $(1))$(strip $(2)))%,$(3)))  
```

We know that the third argument for `e` is the `program`, since it is only
invoked in `l` and it gives `e` its last argument, the `program` constant.

Here it is important to check how the `filter` function of `make` works in
specific cases. `$(filter AB%, ABC AABC AB CAB) == ABC AB`. So `e` looks for
an exact prefix (created from its first two arguments) in the `program` words,
then takes that word and separates its `+` and `-` using `a`.

Note also the `subst` call, where `a` is called without any arguments
(resulting in an empty string). But `$(call a) ,` (with a space afterwards)
actually results in the string `" "` (space), so the `subst` call here removes
all spaces in the prefix, so it can be properly found in `program`.

```Makefile  
# d($1 str)  
d=$(if $(filter-out -,$(1)),,+)  
```

`d` checks if there are any characters other than `-` in its input. If yes, it
returns the empty string, otherwise it returns `+`.

We can now extend our assumption about the `+` and `-` characters. Given that
there are two kinds of symbols, it is natural to assume some sort of binary
encoding. The `l` function itself (as we'll see further down) uses `d` to
check if it should recurse or terminate. It will only terminate if `d` returns
a non-empty string, i.e. `+`. This will only happen if the input to `d` is all
`-`. Let's therefore understand `+` as a binary 1 and `-` as a binary 0.

`d` is then the stop condition checking if its input is equal to 0.

```Makefile  
# j($1 pos, $2 data)  
j=$(word $(words $(1)), $(2))  
```

`j` returns a part of its second argument based on its first argument. `words`
in `make` returns the number of words of its argument, `word` returns the n-th
(one-indexed) word of its argument.

There are very few situations I can think of where referencing data based on
the length of other data makes sense. With our `+` / `-` encoding, the only
thing that makes sense is a unary encoding – the first argument is a "number"
where `+ + +` represents `3`, `+ + + + + +` represents `6`, etc.

Then the second argument is some sort of array of bits ...?

```Makefile  
# m($1 state)  
m=$(word 1,$(1)) $(or $(word 2,$(1)),+) $(or $(word 3,$(1)),-)  
```

`m` seems to take three bits from its input. If the second bit is not present,
`+` is used. And if the third bit is not present `-` is used. The argument for
`m` is the first argument given to `l`, which comes from the `o` variable:

```Makefile  
o = +  
```

Clearly `o` always has at least one bit. When given to `m`, it is changed to
`++-`. So it seems that `m` ensures that the default value for its output is
`++-`, unless the input has enough data.

```Makefile  
# l($1 state, $2 pos, $3 data, $4 program)  
l=$(if $(call d,$(call m,$(1))),  
$(words $(filter +,$(3))),  
$(call k,  
$(call e,  
$(call m,$(1)),  
$(call j,$(2),$(3)),  
$(4)  
),  
$(call m,$(1)),  
$(2),  
$(3),  
$(4)  
)  
)  
```

Finally let's look at `l`. As explained in `d`, it stops recursing when its
first argument becomes 0. When it stops, it returns the number of `+` symbols
in `data`. Otherwise, it calls `k`.

Note that here we can see that the only possible way for this program to stop
is when `d` evaluates to `+` and causes `l` to count `+` symbols in `data`.
For a long time I was spending a lot of CPU resources to check a lot of
integers, SHA-256 hashing them, and checking if their prefix is `8c3c4df743a`
– which would then decode into `hackover18{`. Given how large the actual
target number is, this was a very futile effort. But, hindsight's 20/20!

`$(call m,$(1))` appears multiple times, and we already know that it simply
produces a default value if needed.

And finally, the `e` subcall resulting in the first argument of `k`. It takes
the potentially defaulted `state`, appends a single symbol from `data` based
on `pos`, and finds that prefix in the `program`.

Even without knowing `k` it might start to be more and more clear what this
program is. It is a [Turing
Machine](https://en.wikipedia.org/wiki/Turing_machine) simulator, with 3-bit
(up to 8) states, 2 symbols, and `data` is its tape. Given how long it
executes, we can also guess that it is a [Busy
beaver](https://en.wikipedia.org/wiki/Busy_beaver), i.e. a Turing machine
designed to run for as long as possible.

For the sake of completion, however, let's have a look at the functions
referenced by `k`.

```Makefile  
b=$(or $(if $(filter +,$(strip $(1))),$(2) +),$(if $(filter -,$(strip
$(1))),$(wordlist 2,$(words $(2)),$(2))),$(if $(filter N,$(strip $(1))),$(2)))  
c=$(if $(word 1,$(1)),$(if $(word $(words $(1)),$(2)),$(2),$(2) -),- $(2))  
f=$(wordlist 7,$(words $(1)),$(1))  
g=$(call b,$(word 6,$(1)), $(2))  
h=$(wordlist 1,$(words $(wordlist 2,$(words $(2)),$(2))),$(3)) $(word 5,$(1))
$(wordlist $(words $(2) -),$(words $(3)),$(3))  
i=$(or $(call g,$(1),$(2)),+)

# k($1 state spec, $2 state, $3 pos, $4 data, $5 program)  
k=$(call l,$(call f, $(1)),$(call i, $(1), $(3)),$(call c,$(call g, $(1),
$(3)),$(call h, $(1), $(3), $(4))),$(5))  
```

```Makefile  
# b($1 movement, $2 pos)  
b=$(or  
$(if $(filter +,$(strip $(1))),$(2) +),  
$(if $(filter -,$(strip $(1))),$(wordlist 2,$(words $(2)),$(2))),  
$(if $(filter N,$(strip $(1))),$(2))  
)  
```

`b` takes an argument `movement` that will always be either `+` or `-` and the
tape position (unary number). If `movement` is `+`, a `+` is appended to the
position - increment by 1. If `movement` is `-`, the first word of the
position is removed - decrement by 1. There is a check for `movement` `N`, but
it seems to be a red herring, since this can never happen.

```Makefile  
# c($1 pos, $2 data)  
c=$(if $(word 1,$(1)),$(if $(word $(words $(1)),$(2)),$(2),$(2) -),- $(2))  
```

`c` is a function to "normalise" the tape. Whenever `pos` becomes 0, it
prepends a `-` to `data`, thereby extending it to the left (`pos` itself is
then normalised in `i`). And whenever `pos` is larger than the length of
`data`, a `+` is appended to `data`. Note that the latter comparison is done
with `$(word $(words $(1)),$(2))`, which checks if the `pos`-th word exists in
`data`. If neither happens, `data` is left intact and returned as-is.

```Makefile  
# f($1 state spec)  
f=$(wordlist 7,$(words $(1)),$(1))  
```

`f` takes one of the (space-separated) words from `program` and returns its
7th through last words. We'll see soon the significance of this.

```Makefile  
# g($1 state spec, $2 pos)  
g=$(call b,$(word 6,$(1)), $(2))  
```

`g` takes one of the (space-separated) words from `program` and updates the
tape position `pos` based on its 6th word.

```Makefile  
# h($1 state spec, $2 pos, $3 data)  
h=$(wordlist 1,$(words $(wordlist 2,$(words $(2)),$(2))),$(3)) \  
$(word 5,$(1)) \  
$(wordlist $(words $(2) -),$(words $(3)),$(3))  
```

`h` is the function that actually takes care of writing symbols to the tape.
It takes `data` and returns it in three chunks - all the `data` words
(symbols) from beginning to `pos - 1`, then the 5th word of one of the (space-
separated) words from `program`, then all the `data` words (symbols) from `pos
+ 1` to the end. A Python equivalent would be `data[:pos] ++ [symbol] ++
data[pos + 1:]`.

```Makefile  
# i($1 state spec, $2 pos)  
i=$(or $(call g,$(1),$(2)),+)  
```

`i` just wraps `g`, but if the output of `g` is empty (i.e. `pos == 0`), it
returns `+` (i.e. `pos == 1`) instead. This is the other part of the position
normalisation. If `pos` becomes zero, the tape is extended to the left (see
`c` above), and `pos` is incremented to be one instead.

```Makefile  
# k($1 state spec, $2 state, $3 pos, $4 data, $5 program)  
k=$(call l,  
$(call f, $(1)),  
$(call i, $(1), $(3)),  
$(call c,  
$(call g, $(1), $(3)),  
$(call h, $(1), $(3), $(4))  
),  
$(5)  
)  
```

And finally `k`. This function always calls `l`, but it updates all of its
arguments, except for the `program`, which never changes. Let's finally crack
the `program` encoding, based on the functions we have just analysed.
`program` contains 14 words, each of which is composed of 9 `+` / `-` symbols.

```  
++--+++++  
++-+-+--+  
+++---++-  
++++-+-++  
\--+-++-++  
\--+++++--  
-++-+--+-  
-+++---++  
-+--+++-+  
-+-++-+++  
+-+-++++-  
+-++++-+-  
+---++---  
+--++-+--  
```

According to the various ways these symbols are used, we can split these words
into columns:

```  
++- - + + +++  
++- + - + --+  
+++ - - - ++-  
+++ + - + -++  
\--+ - + + -++  
\--+ + + + +--  
-++ - + - -+-  
-++ + - - -++  
-+- - + + +-+  
-+- + + - +++  
+-+ - + + ++-  
+-+ + + + -+-  
+-- - + + ---  
+-- + + - +--  
```

Note that the first column is always repated twice, with `-` then `+` in the
second column. The last column contains only strings that can be found in the
first column, with the exception of `---` (0, the halting state). The first
column is 3 bits, hence it can express 8 different numbers, but only 7 states
are present. Again, 0 is the halting state that is not described here.

Based on the analysis, we can describe these columns and use decimal
representations for the 3-bit ones:

```  
(bits) 123 4 5 6 789  
dec cur state cur tape write pos change next state dec  
6 ++- - + + +++ 7  
6 ++- + - + --+ 1  
7 +++ - - - ++- 6  
7 +++ + - + -++ 3  
1 --+ - + + -++ 3  
1 --+ + + + +-- 4  
3 -++ - + - -+- 2  
3 -++ + - - -++ 3  
2 -+- - + + +-+ 5  
2 -+- + + - +++ 7  
5 +-+ - + + ++- 6  
5 +-+ + + + -+- 2  
4 +-- - + + --- 0  
4 +-- + + - +-- 4  
```

Or in a more conventional notation, where the tuples represent (what to write,
which way to move, which state to go to):

| State | Current symbol: 0 | Current symbol: 1 |  
| --- | --- | --- |  
| 6 | (1, →, 7) | (0, →, 1) |  
| 7 | (0, ←, 6) | (0, →, 3) |  
| 1 | (1, →, 3) | (1, →, 4) |  
| 3 | (1, ←, 2) | (0, ←, 3) |  
| 2 | (1, →, 5) | (1, ←, 7) |  
| 5 | (1, →, 6) | (1, →, 2) |  
| 4 | (1, →, 0) | (1, ←, 4) |

The Turing machine works in steps on the tape. At each step, it looks at the
symbol at the current position on the tape. Based on the current symbol and
its current state, it writes a symbol to the tape, then moves either left or
right, and finally switches to another state (may be the same state).

The starting state is `++-` (6), thanks to the `m` function.

We can also see that the only way to get to state `0`, the halting state, is
from state `4`. State `4` itself does not modify the tape except for changing
a single symbol just before halting, so it is basically superfluous. So in
fact, this seems to be a 6-state busy beaver.

Now that we know all about this Turing machine simulator, we have another
problem: how to figure out its result. Busy beavers are *designed* to take as
long as possible, and the record holder for a 6-state BB produces more than
"3.514 x⋅10^{18267}" 1's on the tape. This obviously surpasses the memory
capabilities of any computer.

The good news is that we do not need to simulate the machine directly, nor do
we need to represent one symbol on the tape as one bit of real memory. Even
though the busy beaver machines take unfathomable number of steps before
halting, their behaviour is still modeled completely with a simple table, as
seen above. So the patterns they produce on the tape are expansive, but not
terribly complex. Instead of a billion ones, we can simply use a single symbol
that say "repeat 1 a billion times", since the number itself, even if
arbitrarily big, can still be represented in far less memory than the data.

This is where I was ~1 hour before the end of the CTF, in the process of
writing a simulator for the BB machine. From what I have seen of its
behaviour, my more efficient simulator would only need to compress long runs
of ones and zeros, and long runs of the alternating `0 1 0 1 ...` pattern. Had
I started earlier I might have finished the program, or perhaps had the much
more practical realisation that "somebody must have done this".

As it turns out, there was a page on the Internet with an efficient BB
simulator, though by chance it went offline shortly before the CTF. After the
CTF ended, an admin pointed me to [an archived version of the
site](http://web.archive.org/web/20160314012051/http://www.drb.insel.de/~heiner/BB/).
This site contains [an awk script to run BB
simulations](http://web.archive.org/web/20160305221928/http://www.drb.insel.de/~heiner/BB/bbsimtab.html#How),
as well as [some record-holding BB
machines](http://web.archive.org/web/20140717002127/http://www.drb.insel.de/~heiner/BB/bb-6list).

As mentioned above, the machine in this challenge has a superfluous state
which simply adds 1 to the number of 1's on the tape. Taking the number of
ones produced by one of the BB machines in the list (`Name = k`), adding 1 to
it, and taking the SHA-256 sum of it:

$ echo "2537699363594175843063 + 1" \  
| bc \  
| shasum -a 256 \  
| python -c "import sys; print ''.join([chr(ord(c) - d) for c,d in
zip(sys.stdin.read(),[-48, 2, -48, -8, -59, -18, 1, -59, 3, -5, -26, -57, 53,
3, -43, -3, -41, -20, 1, -64, -65, -45, -71, -47, -16, -47, -38, -3, 46, -63,
-54, 1, -49, 4, -51, -45, -61, -46, -13, -4, -65, -48, -55, -51, -38, -64,
-50, -5, -65, 2, -54, -56, -1, -50, -28])])"

Very creative challenge, I wish I did not waste some of my hours!

`hackover18{n0_beavers_were_h4rmed_gener4ting_this_fl4g}`

Original writeup
(https://github.com/EmpireCTF/empirectf/blob/master/writeups/2018-10-05-Hackover-
CTF/README.md#500-reverse--flagmaker).