# caBalS puking

Challenge description:

> Hey, did you hear the good news? We were finally able to get ADB access to
> kirschju’s phone. All we could find on the file system were two Signal
> backup files, though. Nevertheless, with hxp’s ever-delayed challenge
> development process, there is hope that we might find a flag in the backup.

> The first backup coincides closely with the registration time of the target
> Signal account, so we believe it represents the state of an empty account.
> The second backup file is a lot larger, for sure there’s some valuable piece
> of information in there.

> We’re also attaching the Signal app retrieved from the phone for your
> reference, but it doesn’t seem to be modified compared to a vanilla app.

Looking at [the challenge
files](https://raw.githubusercontent.com/NicolaiSoeborg/ctf-writeups/master/2021/hxp%202021/caBalS%20puking/caBalS%20puking-2672391a1b33417f.tar.xz)
we get:

* `signal-2021-11-29-22-02-26.backup` ("initial database")  
* `signal-2021-11-30-00-18-47.backup` ("database w/ flag")  
* `Signal.apk`

## Analysing backup files

We found this great tool to decrypt Signal backups:

<https://github.com/mossblaser/signal_for_android_decryption>

We used this tool to analyse and understand the two backups.

```python  
initialisation_vector, salt = read_backup_header(backup_file)  
cipher_key, hmac_key = derive_keys(passphrase, salt)  
```

The header is a `BackupFrame` without a MAC (more on that later).

The passphrase is turned into a cipher key and a MAC key by first stretching
it with 250000 rounds of SHA-512 and then splitting the result into two
32-byte keys using HKDF-SHA256.  
We don't know the passphrase, and Signal auto-generates it when creating a
backup, so we can't attack these keys.
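
The derivation described above can be sketched with only the standard library. This is a rough sketch of how we understand the scheme, not the tool's actual code: the HKDF info label `b"Backup Export"`, the stripping of spaces from the passphrase, and the salt being mixed into the first round only are assumptions that should be checked against the decryption tool.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, info: bytes, length: int, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256; an empty salt means 32 zero bytes."""
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_keys(passphrase: str, salt: bytes, rounds: int = 250_000):
    """Stretch the passphrase with SHA-512, then split into two 32-byte keys."""
    secret = passphrase.replace(" ", "").encode("ascii")  # assumption: spaces ignored
    digest = secret
    h = hashlib.sha512(salt)  # assumption: salt only enters the first round
    for _ in range(rounds):
        h.update(digest)
        h.update(secret)
        digest = h.digest()
        h = hashlib.sha512()
    okm = hkdf_sha256(digest[:32], b"Backup Export", 64)  # assumed info label
    return okm[:32], okm[32:]  # (cipher_key, hmac_key)
```

With the default round count this takes a fraction of a second per guess. The real obstacle to brute force is the size of the auto-generated passphrase, not the cost of the stretching.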

Normally a backup from Signal is a completely new, full dump of the database
with a random IV and salt, but the two files we got have the same IV and salt
(!)

```  
IV = 87166ab8af3c58629ff5c5eb5b471ebc  
salt = 0e6621b28a618a652893e84299b8fc8204e80f0a3a00d612c31a8cb890f9f8e9  
```

The encryption primitive is
[AES-CTR](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_(CTR)),
so this is _just_ a simple IV reuse attack!

To attack it we can take any known part of the initial database, XOR that with
the encrypted initial database (to get the keystream "at that point") and then
XOR that with the corresponding point in the database that has the flag. I.e:

```  
known_pt ^ initial_ct ^ flag_ct => flag_pt  
[_____ keystream _____]  
```
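
The XOR relation above can be tried out on toy data. In this sketch the keystream is a SHA-256 based stand-in for AES-CTR output (an assumption to keep the example free of dependencies), and both plaintexts are made-up placeholders, not data from the challenge.

```python
import hashlib


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_keystream(key: bytes, n: int) -> bytes:
    """Stand-in for AES-CTR: same key and IV always give the same stream."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


secret_key = b"never seen by the attacker"

# Placeholder plaintexts: one we know (initial backup), one we want (flag backup).
known_pt = b"RIFF\x00\x00\x00\x00WEBPVP8X known sticker bytes"
flag_pt = b"INSERT INTO mms VALUES ('secret message')"

# Both are encrypted under the same keystream because key and IV are reused.
initial_ct = xor(known_pt, toy_keystream(secret_key, len(known_pt)))
flag_ct = xor(flag_pt, toy_keystream(secret_key, len(flag_pt)))

# The attacker only uses known_pt and the two ciphertexts.
keystream = xor(known_pt, initial_ct)
recovered = xor(keystream, flag_ct)
print(recovered)
```

Only as many bytes as the known plaintext covers are recovered, which is why long known plaintexts matter.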

So we need to fulfill the following two assumptions:

> We can find known _cribs_ in the ciphertext and their offsets

(E.g. it always starts/ends with a header/footer with known bytes.)

> The offsets will be shifted or the two ciphertexts will differ at the offset

If the two ciphertexts are identical at a specific offset and we _guess_ the
plaintext at that offset in the "initial backup", then we are not attacking
Counter Mode at that offset -- we are guessing and learning nothing new.

Instead what we hope happens is that some parts partially overlap, which
means we will get a partial plaintext and because the underlying plaintext is
protobuf and SQLite we might be able to predict more bytes given the partial
plaintext. We can then use that newly predicted plaintext to recover more of
the initial database, and so forth.

So let's dig into the raw bytes so we can start the known-plaintext attack.

## Signal Backup Format (for Android)

The Signal `.backup` consists of multiple _chunks_.

Chunks without attachments have the following structure:

| **Length** | 4 | size - 10 | 10 |  
|-------------|------|-------------|-----|  
| **Content** | size | BackupFrame | MAC |

Here `size` is a big-endian number and the MAC is an HMAC-SHA256 (truncated
to 10 bytes) computed with `hmac_key`.

The field `BackupFrame` is the AES-CTR encrypted protobuf using the "global"
IV.  
Note that the IV is incremented after every BackupFrame and every payload chunk.
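
A small sketch of the IV update, assuming (as we understand the decryption tool) that the first 4 bytes of the IV act as a big-endian counter that is bumped by one per chunk. The function name `next_iv` is ours, not the tool's.

```python
def next_iv(iv: bytes) -> bytes:
    """Increment the 32-bit big-endian counter held in the first 4 IV bytes.

    Assumption: the counter lives in iv[:4] and wraps around at 2**32.
    """
    counter = (int.from_bytes(iv[:4], "big") + 1) & 0xFFFFFFFF
    return counter.to_bytes(4, "big") + iv[4:]


# IV shared by both challenge backups.
iv0 = bytes.fromhex("87166ab8af3c58629ff5c5eb5b471ebc")
iv1 = next_iv(iv0)
print(iv1.hex())
```

Under this assumption the keystream for a chunk depends on its position in the sequence of chunks (its counter value), not on its byte offset in the file.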

A BackupFrame protobuf structure can contain multiple types, e.g. SqlStatement
(`INSERT INTO sms ...`), KeyValue pairs, Preference, etc., but also an "out of
protobuf payload" for Attachments, Stickers and Avatars.

For the first types, the raw data will be inside the protobuf structure.  
For big data types the struct will contain a "has payload = true" field and a
size of the payload.

| **Length** | (defined in parent BackupFrame) | 10 |  
|-------------|---------------------------------|-----|  
| **Content** | Payload | MAC |

We can map the structure of the initial database by simply parsing the first
4-byte length field and then skipping that many bytes (to seek to the next
BackupFrame header).  
This works very well for the first 87 BackupFrames as these have no payloads,
but at offset 0x45c0 we find something which is too big to be the length of a
BackupFrame.
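
The length-skipping walk can be sketched as follows. The cut-off `MAX_FRAME_SIZE` is our assumption for what counts as "too big"; it encodes the observation that small frames have a length field starting with `\x00\x00`.

```python
import struct

MAX_FRAME_SIZE = 0xFFFF  # assumption: real frames are small, so size < 2**16


def walk_frames(data: bytes, max_frame_size: int = MAX_FRAME_SIZE):
    """Follow 4-byte big-endian length fields until one looks implausible.

    Returns the list of (offset, size) pairs that were accepted and the
    offset at which the walk stopped.
    """
    chunks = []
    pos = 0
    while pos + 4 <= len(data):
        (size,) = struct.unpack(">I", data[pos:pos + 4])
        if size == 0 or size > max_frame_size or pos + 4 + size > len(data):
            break  # not a plausible frame length: probably a raw payload
        chunks.append((pos, size))
        pos += 4 + size  # skip the length field, the frame and its MAC
    return chunks, pos
```

Running something like this on the initial backup is what gets stuck at the first payload, since payload bytes are ciphertext and have no length prefix.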

### Payloads in initial database

So which payloads are stored in the initial (empty) database? ... Stickers of
course!

![How would society function without this animated webp image your browser
probably can't display?](https://raw.githubusercontent.com/NicolaiSoeborg/ctf-writeups/master/2021/hxp%202021/caBalS%20puking/sticker-3.webp)

Almost all of the file size of the empty backup is due to the 78 default
stickers.

So back to the initial problem, how do we know which sticker is starting at
offset 0x45c0?  
Solution: decrypt all of the default stickers from your own Signal database
and find the lengths of them, then for each length try to move the file
pointer that amount and see if the next chunk looks like a `BackupFrame`
header (i.e. starts with `\x00\x00` due to the BackupFrame being small).

We implemented this in
[`find-stickers.py`](https://raw.githubusercontent.com/NicolaiSoeborg/ctf-writeups/master/2021/hxp%202021/caBalS%20puking/find-stickers.py),
looking from the bottom of the file and waiting for 4 bytes that point to the
last known "good offset".

The script quickly finds a single solution where each sticker is used exactly
once and all of the file is either mapped to a BackupFrame or a payload
matching a default sticker.
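
The idea can be illustrated with a small backtracking search. This is not the linked `find-stickers.py` (which works from the end of the file); it is a simplified forward variant, and the names `map_layout` and `MAC_LEN` are ours. It assumes every payload is followed by a 10-byte MAC and that a frame is recognisable by a length field starting with `\x00\x00`.

```python
MAC_LEN = 10  # truncated HMAC appended to every frame and payload


def map_layout(data: bytes, pos: int, remaining: list):
    """Try to explain data[pos:] as frames and sticker payloads.

    `remaining` holds the payload lengths not used yet. Returns a list of
    ("frame" | "payload", offset, length) tuples, or None if nothing fits.
    """
    if pos == len(data):
        return []  # the whole file is mapped
    # Option 1: a frame chunk with a small 4-byte big-endian length field.
    if pos + 4 <= len(data) and data[pos:pos + 2] == b"\x00\x00":
        size = int.from_bytes(data[pos:pos + 4], "big")
        end = pos + 4 + size
        if size > MAC_LEN and end <= len(data):
            rest = map_layout(data, end, remaining)
            if rest is not None:
                return [("frame", pos, size)] + rest
    # Option 2: a raw payload whose length matches one unused sticker.
    for length in sorted(set(remaining)):
        end = pos + length + MAC_LEN
        if end <= len(data):
            left = list(remaining)
            left.remove(length)
            rest = map_layout(data, end, left)
            if rest is not None:
                return [("payload", pos, length)] + rest
    return None
```

The recursion depth equals the number of chunks, which is fine for a few hundred chunks but would need an iterative rewrite for much larger files.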

Because the initial database and the database with the flag diverge, we get a
sticker overlapping the flag!

I.e. we have something similar to the following structure:

| Initial Backup | Flag Backup |  
|----------------|-------------------------|  
| std settings | std settings |  
| Sticker 1 | `INSERT INTO mms(...)` |  
| Sticker 2 | Sticker 1 |  
| Sticker _n_ | Sticker 2 |  
| _EOF_ | Sticker _n_ |  
| | EOF |

Now we are back to the trivial nonce reuse attack, just needing to carefully
align the `sticker plaintext` XOR `initial db at sticker offset` XOR `flag db
at counter offset` and we get a decrypted chunk of the flag DB.

A big shout out to all the teammates in Kalmarunionen bringing useful insights
(killerdog, andyandpandy, eevee, etc), and of course hxp for making the
challenge.

![Image of a received MMS with the
flag](https://raw.githubusercontent.com/NicolaiSoeborg/ctf-writeups/master/2021/hxp%202021/caBalS%20puking/flag.jpg)

Flag: `hxp{f0rmattin5+crypt0=<3}`

Original writeup
(https://github.com/NicolaiSoeborg/ctf-writeups/tree/master/2021/hxp%202021/caBalS%20puking/).