DB-hub Technology Uncategorized Terminal Bench

Terminal Bench

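Set default python
# Pick the default (works only if alternatives are registered for python)
sudo update-alternatives --config python

# Check where python3 lives
which python3
        /usr/bin/python3

# If there is no "python" command, alias it to python3 for the current shell
alias python='python3'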
Create a virtual environment
mkdir tb
cd tb
python -m venv venv
source venv/bin/activate
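After `source venv/bin/activate`, you can confirm that `python` now runs inside the virtual environment by comparing `sys.prefix` with `sys.base_prefix`. A minimal sketch (the script and function names are just for illustration):

```python
import sys


def in_virtualenv() -> bool:
    """True when this interpreter runs inside a venv created by `python -m venv`."""
    # venv sets sys.prefix to the environment directory, while sys.base_prefix
    # still points at the interpreter the venv was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"interpreter: {sys.executable}")
    print(f"inside venv: {in_virtualenv()}")
```

Saved as, say, check_venv.py, `python check_venv.py` should print `inside venv: True` once the venv is active.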
Install the CLI
pip install terminal-bench
Run the task creation wizard

mkdir tasks
tb tasks create

When prompted, enter:

Task ID: client-side-routing
Domain: fullstack
Directories
tasks
└── client-side-routing
    ├── docker-compose.yaml
    ├── Dockerfile
    ├── run-tests.sh
    ├── setup
    │   ├── about.html
    │   └── index.html
    ├── solution.sh
    ├── solution.js
    ├── task.yaml
    └── tests
        └── test_outputs.py
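Since the tests and the Oracle run depend on every file in this tree being present, a quick local check can save a failed run. A minimal sketch, assuming you run it from the directory that contains tasks/ (the helper name is hypothetical; the file list is copied from the tree above):

```python
from pathlib import Path

# Files the tree above lists for tasks/client-side-routing
EXPECTED_FILES = [
    "docker-compose.yaml",
    "Dockerfile",
    "run-tests.sh",
    "setup/about.html",
    "setup/index.html",
    "solution.sh",
    "solution.js",
    "task.yaml",
    "tests/test_outputs.py",
]


def missing_files(task_dir: str) -> list[str]:
    """Return the expected paths that do not exist under task_dir."""
    root = Path(task_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("tasks/client-side-routing")
    print("all files present" if not missing else f"missing: {missing}")
```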
Create the task environment

This Dockerfile defines the environment an agent will be interacting with through the terminal. Use this Dockerfile to add any dependencies of the task. If your task requires a multi-container environment or custom configuration when launching the container, consider using our docker-compose.yaml instead of a single Dockerfile.

cd tasks/client-side-routing
vi Dockerfile

FROM ghcr.io/laude-institute/t-bench/ubuntu-24-04:20250624

# NOTE: a second FROM starts a new build stage; nothing from the stage above ends up in the final image
FROM python:3.12-slim

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
        tmux \
        asciinema \
    && rm -rf /var/lib/apt/lists/*

# Install Chrome & dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
        wget \
        gnupg \
        unzip \
        curl \
        libnss3 \
        libxi6 \
        libgbm1 \
        libgtk-3-0 \
        xvfb \
    && rm -rf /var/lib/apt/lists/*


# Add the Google Chrome apt repo (keyring + signed-by) and install stable Chrome
# The published key is ASCII-armored, so convert it with gpg --dearmor for a .gpg keyring
RUN wget -q -O- https://dl.google.com/linux/linux_signing_key.pub \
       | gpg --dearmor -o /usr/share/keyrings/google-linux-signing-key.gpg \
    && echo "deb [arch=amd64 signed-by=/usr/share/keyrings/google-linux-signing-key.gpg] http://dl.google.com/linux/chrome/deb/ stable main" \
       > /etc/apt/sources.list.d/google-chrome.list \
    && apt-get update \
    && apt-get install -y google-chrome-stable \
    && rm -rf /var/lib/apt/lists/*


# Install the ChromeDriver release that matches the installed Chrome's major version
RUN CHROME_VERSION=$(google-chrome --version | awk '{print $3}') \
    && MAJOR_VERSION=$(echo "$CHROME_VERSION" | cut -d. -f1) \
    && DRIVER_VERSION=$(curl -s "https://googlechromelabs.github.io/chrome-for-testing/LATEST_RELEASE_${MAJOR_VERSION}") \
    && wget -q "https://storage.googleapis.com/chrome-for-testing-public/${DRIVER_VERSION}/linux64/chromedriver-linux64.zip" -O /tmp/chromedriver.zip \
    && unzip /tmp/chromedriver.zip -d /usr/local/bin/ \
    && mv /usr/local/bin/chromedriver-linux64/chromedriver /usr/local/bin/chromedriver \
    && chmod +x /usr/local/bin/chromedriver \
    && rm -rf /tmp/chromedriver.zip /usr/local/bin/chromedriver-linux64


RUN python3 -m pip install --no-cache-dir --upgrade pip setuptools wheel

# Create the TB venv and install packages
RUN python3 -m venv .venv \
    && . .venv/bin/activate \
    && pip install --no-cache-dir selenium pytest

# Copy the task files (including solution.js and solution.sh) into /app
COPY . .
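The ChromeDriver step above derives the driver download from the installed Chrome: take the version from `google-chrome --version`, keep the major number, ask the LATEST_RELEASE endpoint for the matching driver version, then download that zip. The same string handling in Python, as a sketch for understanding or debugging the step (function names are hypothetical; the URLs are the ones used in the Dockerfile, and no network call is made here):

```python
import re

# Endpoint and download pattern used in the Dockerfile's ChromeDriver step
LATEST_RELEASE_URL = "https://googlechromelabs.github.io/chrome-for-testing/LATEST_RELEASE_{major}"
DRIVER_ZIP_URL = (
    "https://storage.googleapis.com/chrome-for-testing-public/"
    "{version}/linux64/chromedriver-linux64.zip"
)


def chrome_major(version_output: str) -> str:
    """Extract the major version from `google-chrome --version` output.

    For output like 'Google Chrome 126.0.6478.126' this gives the same result
    as awk '{print $3}' | cut -d. -f1.
    """
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_output!r}")
    return match.group(1)


def driver_urls(version_output: str, driver_version: str) -> tuple[str, str]:
    """Return (lookup URL for the matching driver version, chromedriver zip URL).

    In the real build, driver_version is the body returned by the lookup URL.
    """
    major = chrome_major(version_output)
    return (
        LATEST_RELEASE_URL.format(major=major),
        DRIVER_ZIP_URL.format(version=driver_version),
    )
```

For example, `driver_urls("Google Chrome 126.0.6478.126", "126.0.6478.126")` (an example version string, not necessarily the one in your image) yields the LATEST_RELEASE_126 lookup URL and the 126.0.6478.126 zip URL.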

Tested the Dockerfile:

docker build --progress=plain -t client-side-routing .

It seems the Node.js installation step was skipped.

Problem

In your Dockerfile, you have:

# Install Node.js directly from official tarball (Node 20 LTS)
RUN curl -fsSL https://nodejs.org/dist/v20.8.1/node-v20.8.1-linux-x64.tar.xz
RUN tar -xJf node-v20.8.1-linux-x64.tar.xz -C /usr/local --strip-components=1
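  • These two RUN commands are separate layers, but that alone is not the problem: files written by one RUN do persist into the following layers (only shell state such as variables or a cd is lost).
  • The real problem is that curl -fsSL without -o or -O writes the tarball to stdout, so node-v20.8.1-linux-x64.tar.xz is never saved and tar has nothing to extract.
  • That’s why Node.js never got installed.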
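Whatever tool does the download, the archive has to be written to a named file before tar can read it, and download and extraction belong in the same step. A Python sketch of that download-then-extract idea, roughly `curl -o` followed by `tar -xJf ... --strip-components=1` (the function name is hypothetical; in the Dockerfile you would keep using curl and tar):

```python
import tarfile
import tempfile
import urllib.request
from pathlib import Path


def install_tarball(url: str, dest: str, strip_components: int = 1) -> None:
    """Download a .tar.xz archive to a named file, then extract it into dest."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "archive.tar.xz"
        # Save to a file first (the equivalent of `curl -o`), so there is something to extract
        urllib.request.urlretrieve(url, archive)
        with tarfile.open(archive, mode="r:xz") as tar:
            members = []
            for member in tar.getmembers():
                # Drop leading path components, like tar --strip-components
                parts = Path(member.name).parts[strip_components:]
                if not parts:
                    continue  # skip the top-level directory itself
                member.name = str(Path(*parts))
                members.append(member)
            # Use the safe "data" filter where this Python version supports it
            kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            tar.extractall(dest, members=members, **kwargs)
```

Calling it as `install_tarball("https://nodejs.org/dist/v20.8.1/node-v20.8.1-linux-x64.tar.xz", "/usr/local")` would mirror the intended Node.js step.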

Additionally:

You are also using a multi-stage build: your first stage is the Ubuntu-based Node.js stage, but the final stage is:

FROM python:3.12-slim

Each FROM starts a new stage from a fresh base image, and the final image contains only the last stage. So even if Node.js had been installed in the first stage, it is gone from the final image unless you copy it over with COPY --from.

Solution

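Use a single stage: keep one FROM line, and download and extract Node.js in that stage within one RUN that saves the tarball to a file (for example with curl -o) before running tar.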
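This pitfall is easy to miss in a long Dockerfile. A hypothetical lint helper (not part of Docker or Terminal-Bench) that lists the stages whose contents never reach the final image; it is deliberately simplified and only treats a stage as used if a later FROM or COPY --from refers to it:

```python
import re

FROM_RE = re.compile(r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?", re.IGNORECASE)
COPY_FROM_RE = re.compile(r"^\s*COPY\s+.*--from=(\S+)", re.IGNORECASE)


def discarded_stages(dockerfile_text: str) -> list[str]:
    """Return the base images of stages the final image never builds on or copies from."""
    stages = []          # (index, image, alias) for each FROM line
    referenced = set()   # stage names/indexes used by a later FROM or COPY --from
    for line in dockerfile_text.splitlines():
        if m := FROM_RE.match(line):
            referenced.add(m.group(1))  # "FROM builder" reuses an earlier stage
            stages.append((len(stages), m.group(1), m.group(2)))
        elif m := COPY_FROM_RE.match(line):
            referenced.add(m.group(1))
    unused = []
    for index, image, alias in stages[:-1]:  # the last stage is the final image
        if str(index) not in referenced and (alias is None or alias not in referenced):
            unused.append(image)
    return unused
```

Run against the first lines of the Dockerfile above, it reports the ubuntu-24-04 base as discarded; a single-stage Dockerfile returns an empty list.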

Create some HTML files
mkdir setup
cd setup

index.html

<!DOCTYPE html>
<html>
<head>
    <title>Home - SPA Routing</title>
    <link rel="stylesheet" href="styles.css">
</head>
<body>
    <nav class="navbar">
        <a href="/" class="nav-link">Home</a>
        <a href="/about" class="nav-link">About</a>
        <a href="/contact" class="nav-link">Contact</a>
    </nav>

    <main id="content">
        <h1>Welcome to the Home Page</h1>
        <p>This is the main content of the home page.</p>
        <button id="counter-btn">Clicks: 0</button>
    </main>

    <!-- Intentional bug: the JavaScript is not included -->
</body>
</html>

about.html

<!DOCTYPE html>
<html>
<head>
    <title>About - SPA Routing</title>
    <link rel="stylesheet" href="styles.css">
</head>
<body>
    <nav class="navbar">
        <a href="/" class="nav-link">Home</a>
        <a href="/about" class="nav-link">About</a>
        <a href="/contact" class="nav-link">Contact</a>
    </nav>

    <main id="content">
        <h1>About Our Company</h1>
        <p>Learn more about what we do.</p>
    </main>
</body>
</html>

Write the solution.sh file

In Terminal-Bench, when a task is run with the Oracle agent, solution.sh is executed before the test scripts such as test_outputs.py.

Here’s the usual flow:

1. solution.sh runs first

  • This sets up your solution (e.g., generates files, runs build steps, prepares output).
  • Any files or outputs it creates are what the test scripts will later check.

2. test_outputs.py runs next

  • It reads or inspects the files/output produced by solution.sh.
  • It verifies correctness according to the test cases.

So in this case:

  • solution.sh → runs node solution.js → creates setup/app.js and setup/index.html
  • test_outputs.py → loads the pages in setup/ in headless Chrome via Selenium and checks that navigation, the Back button, and the counter behave as expected.

💡 Tip: This is why your solution.sh must generate everything the tests expect; otherwise test_outputs.py will fail.
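The flow above can be modeled as a tiny driver. This is a simplified illustration with a hypothetical function name, not Terminal-Bench's actual harness: it runs the solution command, then the test command in the same directory, and reports whether the tests passed.

```python
import subprocess


def run_oracle_flow(solution_cmd: list[str], test_cmd: list[str], cwd: str = ".") -> bool:
    """Simplified model: run the solution, then the tests; pass means tests exit 0."""
    # 1. The solution produces the files the tests will inspect
    subprocess.run(solution_cmd, cwd=cwd)
    # 2. The tests check what step 1 left behind; their exit status decides pass/fail
    return subprocess.run(test_cmd, cwd=cwd).returncode == 0
```

Inside the task container this would look something like `run_oracle_flow(["bash", "solution.sh"], ["pytest", "tests/test_outputs.py"], cwd="/app")`; the paths Terminal-Bench really uses may differ.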

solution.sh

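#!/bin/bash
set -euo pipefail

# Make sure Node.js is available, then run the generator from /app (the WORKDIR in the Dockerfile)
command -v node >/dev/null || { echo "node: not found" >&2; exit 1; }
cd /app
node solution.js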

solution.js

// Step 1: create the main application JavaScript file
const fs = require('fs');

const appJsContent = `
// Client-side routing implementation
class SPARouter {
    constructor() {
        this.routes = {
            '/': 'index.html',
            '/about': 'about.html', 
            '/contact': 'contact.html'
        };
        this.init();
    }

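    init() {
        // Handle the initial load
        window.addEventListener('DOMContentLoaded', () => {
            const path = window.location.pathname;
            if (path === '/' || path.endsWith('/index.html')) {
                // index.html is already rendered: only wire up its listeners,
                // so early clicks on the counter are not lost to a re-fetch
                this.attachDynamicListeners();
            } else {
                // Render the requested route without adding a history entry
                this.navigate(path, false);
            }
        });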

        // Handle navigation clicks
        document.addEventListener('click', (e) => {
            if (e.target.matches('.nav-link')) {
                e.preventDefault();
                const path = e.target.getAttribute('href');
                this.navigate(path);
            }
        });

        // Handle the browser back/forward buttons
        window.addEventListener('popstate', () => {
            this.navigate(window.location.pathname, false);
        });
    }

    async navigate(path, pushState = true) {
        const file = this.routes[path] || 'index.html';

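        try {
            // Fetch the page content
            const response = await fetch(file);
            if (!response.ok) {
                // e.g. contact.html does not exist; fall through to the 404 view
                throw new Error('HTTP ' + response.status + ' for ' + file);
            }
            const html = await response.text();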

            // Extract the content inside the <main> tag
            const parser = new DOMParser();
            const doc = parser.parseFromString(html, 'text/html');
            const newContent = doc.querySelector('main').innerHTML;
            const newTitle = doc.querySelector('title').textContent;

            // Update the page content
            document.querySelector('main').innerHTML = newContent;
            document.title = newTitle;

            // Update the URL without reloading the page
            if (pushState) {
                history.pushState({}, '', path);
            }

            // Re-attach event listeners for the dynamic content
            this.attachDynamicListeners();

        } catch (error) {
            console.error('Navigation failed:', error);
            this.show404();
        }
    }

    attachDynamicListeners() {
        // Re-attach listeners for dynamic content (such as the counter button)
        const counterBtn = document.getElementById('counter-btn');
        if (counterBtn) {
            let count = 0;
            counterBtn.addEventListener('click', () => {
                count++;
                counterBtn.textContent = \`Clicks: \${count}\`;
            });
        }
    }

    show404() {
        document.querySelector('main').innerHTML = \`
            <h1>404 - Page Not Found</h1>
            <p>The requested page could not be found.</p>
        \`;
    }
}

// Initialize the router
new SPARouter();
`;

// Step 2: update index.html to include the JavaScript
const indexHtmlContent = `
<!DOCTYPE html>
<html>
<head>
    <title>Home - SPA Routing</title>
    <link rel="stylesheet" href="styles.css">
</head>
<body>
    <nav class="navbar">
        <a href="/" class="nav-link">Home</a>
        <a href="/about" class="nav-link">About</a>
        <a href="/contact" class="nav-link">Contact</a>
    </nav>

    <main id="content">
        <h1>Welcome to the Home Page</h1>
        <p>This is the main content of the home page.</p>
        <button id="counter-btn">Clicks: 0</button>
    </main>

    <script src="app.js"></script>
</body>
</html>
`;

// Write the files
fs.writeFileSync('setup/app.js', appJsContent);
fs.writeFileSync('setup/index.html', indexHtmlContent);

console.log("SPA routing implementation complete!");
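Before starting Chrome, a fast check of what solution.js wrote can save a debugging round-trip. A minimal sketch (the helper name and checks are hypothetical, chosen to match the strings solution.js writes):

```python
from pathlib import Path


def check_generated(setup_dir: str = "setup") -> list[str]:
    """Return a list of problems with the files solution.js is expected to write."""
    root = Path(setup_dir)
    app_js = root / "app.js"
    index_html = root / "index.html"
    problems = []
    if not app_js.is_file():
        problems.append(f"{app_js} is missing")
    elif "class SPARouter" not in app_js.read_text(encoding="utf-8"):
        problems.append(f"{app_js} does not define SPARouter")
    if not index_html.is_file():
        problems.append(f"{index_html} is missing")
    elif '<script src="app.js"></script>' not in index_html.read_text(encoding="utf-8"):
        problems.append(f"{index_html} does not load app.js")
    return problems
```

An empty list means both files exist with the expected markers; it says nothing about runtime behavior, which is what the Selenium tests cover.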

Create unit tests

test_outputs.py

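import functools
import http.server
import os
import threading
import unittest
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options

class TestSPARouting(unittest.TestCase):

    @classmethod
    def setUpClass(cls):
        """Serve setup/ over HTTP: Chrome's fetch() cannot load file:// URLs"""
        handler = functools.partial(
            http.server.SimpleHTTPRequestHandler, directory=os.path.abspath('setup')
        )
        # Port 0 lets the OS pick a free port
        cls.httpd = http.server.ThreadingHTTPServer(('127.0.0.1', 0), handler)
        cls.base_url = f"http://127.0.0.1:{cls.httpd.server_address[1]}"
        threading.Thread(target=cls.httpd.serve_forever, daemon=True).start()

    @classmethod
    def tearDownClass(cls):
        cls.httpd.shutdown()
        cls.httpd.server_close()

    def setUp(self):
        """Start the Selenium WebDriver"""
        chrome_options = Options()
        chrome_options.add_argument("--headless")
        chrome_options.add_argument("--no-sandbox")
        self.driver = webdriver.Chrome(options=chrome_options)
        self.driver.get(f"{self.base_url}/")
        self.wait = WebDriverWait(self.driver, 10)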

    def test_initial_page_load(self):
        """The home page loads correctly"""
        content = self.driver.find_element(By.TAG_NAME, 'main')
        self.assertIn('Welcome to the Home Page', content.text)

    def test_navigation_without_reload(self):
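        """Navigating between pages must not trigger a full page reload"""
        # Set a marker on window; a full reload would wipe it
        self.driver.execute_script("window.__noReload = true;")

        # Click the About link
        about_link = self.driver.find_element(By.XPATH, '//a[@href="/about"]')
        about_link.click()

        # Wait for the content to update
        self.wait.until(
            EC.text_to_be_present_in_element((By.TAG_NAME, 'main'), 'About Our Company')
        )

        # The URL changed...
        self.assertIn('/about', self.driver.current_url)

        # ...but this is still the same document (no full reload)
        self.assertTrue(self.driver.execute_script("return window.__noReload === true;"))
        nav = self.driver.find_element(By.TAG_NAME, 'nav')
        self.assertTrue(nav.is_displayed())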

    def test_back_button_functionality(self):
        """The browser Back button works"""
        # Navigate to the About page
        self.driver.find_element(By.XPATH, '//a[@href="/about"]').click()
        self.wait.until(EC.text_to_be_present_in_element(
            (By.TAG_NAME, 'main'), 'About Our Company'
        ))

        # Go back
        self.driver.back()

        # We should be back on the home page
        self.wait.until(EC.text_to_be_present_in_element(
            (By.TAG_NAME, 'main'), 'Welcome to the Home Page'
        ))

    def test_dynamic_content_persistence(self):
        """Dynamic content (such as the counter) still works after navigation"""
        # Click the counter on the home page
        counter_btn = self.driver.find_element(By.ID, 'counter-btn')
        counter_btn.click()
        self.assertIn('Clicks: 1', counter_btn.text)

        # Navigate to another page and come back
        self.driver.find_element(By.XPATH, '//a[@href="/about"]').click()
        self.wait.until(EC.text_to_be_present_in_element(
            (By.TAG_NAME, 'main'), 'About Our Company'
        ))

        self.driver.find_element(By.XPATH, '//a[@href="/"]').click()
        self.wait.until(EC.text_to_be_present_in_element(
            (By.TAG_NAME, 'main'), 'Welcome to the Home Page'
        ))

        # The counter should be reset (a new instance)
        counter_btn = self.driver.find_element(By.ID, 'counter-btn')
        self.assertIn('Clicks: 0', counter_btn.text)

    def tearDown(self):
        self.driver.quit()

Test your task with the Oracle Agent

non-interactive

tb run --agent oracle --task-id client-side-routing --livestream

interactive

tb tasks interact -t client-side-routing
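Install vim for debugging
The sources below are for the Ubuntu 24.04 (noble) base image. On a Debian-based image such as python:3.12-slim, don't add them; apt-get update followed by apt-get install -y vim nano is enough.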
cat <<'EOF' > /etc/apt/sources.list
deb http://archive.ubuntu.com/ubuntu noble main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu noble-updates main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu noble-security main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu noble-backports main restricted universe multiverse
EOF

apt-get update
apt-get install -y vim nano

The equivalent Dockerfile step is below.
NOTE: it is for debugging only; don't keep it in the task's Dockerfile.

# Restore sources.list and install editors
RUN echo 'deb http://archive.ubuntu.com/ubuntu noble main restricted universe multiverse' > /etc/apt/sources.list && \
    echo 'deb http://archive.ubuntu.com/ubuntu noble-updates main restricted universe multiverse' >> /etc/apt/sources.list && \
    echo 'deb http://archive.ubuntu.com/ubuntu noble-security main restricted universe multiverse' >> /etc/apt/sources.list && \
    echo 'deb http://archive.ubuntu.com/ubuntu noble-backports main restricted universe multiverse' >> /etc/apt/sources.list && \
    apt-get update && \
    apt-get install -y vim nano && \
    rm -rf /var/lib/apt/lists/*
