Let's Play Around with dlshogi

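Following the AobaZero posts, this one may turn into a series as well.

I have been following Yamaoka-san's work since fairly early on (spring 2017). He even signed my copy of his book, which I treat as a treasure.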

bleu48.hatenablog.com

Last time I felt I caught a glimpse of what TensorRT can do, and AQ, the Go program released today, reportedly supports TensorRT as well.

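We have released the Go AI "GLOBIS-AQZ" as open source on GitHub!

・Supports both Japanese and Chinese rules
・Graphical analysis display via Lizzie
・Inference engine optimized and accelerated with TensorRT

are among its features.
Please make use of it for game analysis and research. https://t.co/wUaYLh70mP
山口祐 (@ymg_aq), May 11, 2020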

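Today I tinkered with dlshogi a bit.

The version I used is the one that Sawada-san of Qhapaq modified to print the PV on an interval timer.

First, when I simply ran it, something felt slightly off: the reported nps is too high.

The nps calculation appears to include the node count carried over from tree reuse. When the predicted line is actually played, the nodes searched on the previous move are added in one lump, and the figure becomes absurdly large.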
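To make the inflation concrete, here is a minimal sketch of an nps calculation that counts only the nodes visited during the current search. The helper compute_nps and its parameters are hypothetical names for illustration; they are not part of dlshogi, and the sketch assumes the root's visit count at the start of the search is available.

```cpp
#include <cstdint>

// Hypothetical helper, not part of dlshogi.
// total_nodes  : visit count of the root when the search finished
// reused_nodes : visit count the root already had when the search started
//                (nodes inherited through tree reuse)
// elapsed_sec  : wall-clock search time in seconds
int compute_nps(std::int64_t total_nodes, std::int64_t reused_nodes, double elapsed_sec) {
    if (elapsed_sec <= 0.0) {
        return 0;  // avoid division by zero
    }
    const std::int64_t searched = total_nodes - reused_nodes;
    if (searched <= 0) {
        return 0;  // nothing new was searched
    }
    return static_cast<int>(searched / elapsed_sec);
}
```

For example, with 120000 visits at the root, of which 100000 were inherited, and a 2 second search, the naive figure would be 60000 nps while compute_nps(120000, 100000, 2.0) gives 10000.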

Anyway, I'll do the same thing I did with AobaZero: MultiPV.

I wrote the following function. Call it in place of print_pv and things get more fun.

void print_mpv()
{
	unsigned int select_index = 0;
	const child_node_t* uct_child = uct_node[current_root].child;
	const int child_num = uct_node[current_root].child_num;

	int sorted_index[700];
	std::iota(sorted_index, sorted_index + child_num, 0);
	sort(sorted_index, sorted_index + child_num, [&](const int i, const int j) {return uct_child[i].move_count > uct_child[j].move_count;});

	for (int i = 0; i < min(6, child_num); i++) {
		select_index = sorted_index[i];
		if (uct_child[select_index].move_count > 1) {
			Move move = uct_child[select_index].move;
			const float best_wp = get_win_value_from_root(select_index);
			int cp;
			if (best_wp >= 0.999f) {
				cp = 30000;
			}
			else if (best_wp <= 0.001f) {
				cp = -30000;
			}
			else {
				cp = int(-logf(1.0f / best_wp - 1.0f) * 756.0864962951762f);
			}

			// Build the PV string
			string pv = move.toUSI();
			int max_count = 0;
			int depth = 1;
			unsigned int best_index = select_index;
			const child_node_t* best_node = uct_child;

			while (best_node[best_index].index != NOT_EXPANDED) {
				const int best_node_index = best_node[best_index].index;

				best_node = uct_node[best_node_index].child;
				max_count = 0;
				best_index = 0;
				// Use j here so the outer loop's i (the MultiPV rank) is not shadowed
				for (int j = 0; j < uct_node[best_node_index].child_num; j++) {
					if (best_node[j].move_count > max_count) {
						best_index = j;
						max_count = best_node[j].move_count;
					}
				}

				if (max_count < 1)
					break;

				pv += " " + best_node[best_index].move.toUSI();
				depth++;
			}
			cout << "info multipv " << i + 1 << " nodes " << uct_child[select_index].move_count << " score cp " << cp << " depth " << depth << " pv " << pv << endl;
		}
	}
	// Compute the time spent searching
	double finish_time = GetSpendTime(begin_time);
	cout << "info nps " << int(uct_node[current_root].move_count / finish_time) << " time " << int(finish_time * 1000) << " nodes " << uct_node[current_root].move_count << " hashfull " << uct_hash.GetUctHashUsageRate() << endl;
}
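
The score conversion in the function above maps a win rate wp to centipawns as cp = -ln(1/wp - 1) * 756.0864962951762, clamped to ±30000 near the extremes. As a rough sketch, it can be pulled out into a standalone function. The names win_rate_to_cp, cp_to_win_rate and kScale are my own for illustration, and the inverse mapping is simply derived from the same formula rather than taken from dlshogi.

```cpp
#include <cmath>

// Scale constant copied from print_mpv above.
constexpr float kScale = 756.0864962951762f;

// Win rate (0..1) -> centipawn score, same thresholds as print_mpv.
int win_rate_to_cp(float wp) {
    if (wp >= 0.999f) {
        return 30000;
    }
    if (wp <= 0.001f) {
        return -30000;
    }
    return static_cast<int>(-std::log(1.0f / wp - 1.0f) * kScale);
}

// Inverse mapping (a logistic function), derived from the formula above.
float cp_to_win_rate(int cp) {
    return 1.0f / (1.0f + std::exp(-static_cast<float>(cp) / kScale));
}
```

A win rate of 0.5 maps to 0 cp, and the mapping is symmetric around that point, so 0.75 and 0.25 give scores of equal magnitude and opposite sign.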

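I thought the only thing left was go infinite support, but the mate search routine gets in the way, so it still feels a bit too unstable to use as an analysis engine.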
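
For reference, the behavior usually expected of go infinite is that the engine keeps searching until the GUI sends stop, even if a mate has already been found. The sketch below shows that stop condition in isolation. GoLimits and should_stop are hypothetical names, not dlshogi code, and it assumes the trouble comes from the mate routine ending the search on its own.

```cpp
// Hypothetical types and names for illustration; not taken from dlshogi.
struct GoLimits {
    bool infinite;  // true when the GUI sent "go infinite"
    int time_ms;    // time budget for a normal "go"; ignored when infinite
};

// Decide whether the search loop may end now.
bool should_stop(const GoLimits& limits, int elapsed_ms, bool mate_found, bool stop_requested) {
    if (stop_requested) {
        return true;   // "stop" always ends the search
    }
    if (limits.infinite) {
        return false;  // in analysis mode, never end the search unprompted
    }
    return mate_found || elapsed_ms >= limits.time_ms;
}
```

With this shape, a mate found during go infinite would only update the reported PV and score, and bestmove would be sent after stop arrives.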