Kanhaiya76618 2026-04-21 00:03:40 -04:00 committed by GitHub
commit 2c7f58a73b
GPG key ID: B5690EEEBB952194
2 changed files with 21 additions and 4 deletions


@ -38,10 +38,10 @@ npm run test:e2e
## Running a specific set of tests
To run a subset of test files, you can use
`npm run <integration test command> <file_name1> ....` where
`<integration test command>` is either `test:e2e` or `test:integration*` and
`<file_name>` is any of the `.test.js` files in the `integration-tests/`
directory. For example, the following command runs `list_directory.test.js` and
`write_file.test.js`:
```bash


@ -115,8 +115,19 @@ policy.** A subset that prove to be highly stable over time may be promoted to
settings).
- `assert`: An async function that takes the test rig and the result of the run
and asserts that the result is correct.
> **Note:** The `rig` parameter is an instance of `TestRig` from
> `@google/gemini-cli-test-utils`. It provides utilities to inspect the agent's
> state, files written, and tool calls made during the evaluation. For available
> methods and properties, see
> [`test-rig.ts`](../packages/test-utils/src/test-rig.ts).
- `log`: An optional boolean that, if set to `true`, will log the tool calls to
a file in the `evals/logs` directory.
- `approvalMode`: An optional string that controls how tool confirmations are
handled during the evaluation. Defaults to `'yolo'`, which means all tool
calls are auto-approved so evals can run non-interactively without requiring
manual confirmation.
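
As a rough illustration of how the `assert`, `log`, and `approvalMode` fields relate, the sketch below models them with local, hypothetical types. `RigLike`, `EvalCaseSketch`, and `withDefaults` are names invented for this sketch and are not part of `@google/gemini-cli-test-utils`; the real `TestRig` exposes its own methods. The `'yolo'` default comes from the description above, while the `false` default for `log` is an assumption.

```typescript
// Hypothetical stand-in for TestRig; the real class has its own API.
interface RigLike {
  toolCalls: string[];
}

// Hypothetical shape covering only the three fields described above.
interface EvalCaseSketch {
  assert: (rig: RigLike, result: string) => Promise<void>;
  log?: boolean;
  approvalMode?: string;
}

// Apply defaults: 'yolo' is the documented default for approvalMode;
// treating an omitted `log` as false is an assumption of this sketch.
function withDefaults(c: EvalCaseSketch): Required<EvalCaseSketch> {
  return {
    assert: c.assert,
    log: c.log ?? false,
    approvalMode: c.approvalMode ?? 'yolo',
  };
}

const sketchCase = withDefaults({
  // An async assertion that inspects the rig and the run result.
  assert: async (rig, result) => {
    if (rig.toolCalls.indexOf('write_file') === -1) {
      throw new Error('expected a write_file tool call');
    }
    if (result.length === 0) {
      throw new Error('expected a non-empty result');
    }
  },
});
```

Because `approvalMode` is omitted here, `sketchCase.approvalMode` resolves to `'yolo'`, which matches the non-interactive behavior described above.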
### Example
@ -176,6 +187,12 @@ of the individual executions passed.
Googlers can schedule a manual run against their branch by clicking the link
above.
> **Note for external contributors:** If you are not a Googler, you do not have
> access to manually trigger the nightly workflow. This is expected — simply
> open your pull request with your changes and leave a comment asking a
> maintainer to trigger the nightly eval run for you. Maintainers are happy to
> help once they have done an initial review of your PR.
Before check-in, tests should score at least 66% with key models, including
Gemini 3.1 pro, Gemini 3.0 pro, and Gemini 3 flash. They must pass 100% of the
time before they are promoted.
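
To make the two thresholds concrete, here is a minimal sketch of the arithmetic for a single model. The function name, the verdict labels, and the run count of three are assumptions made for illustration; only the 66% and 100% figures come from the text above. Reaching 100% in one batch is treated as meeting the promotion bar for that batch only, not as a promotion decision.

```typescript
// Thresholds from the text above; all names here are hypothetical.
const CHECK_IN_THRESHOLD = 0.66;
const PROMOTION_THRESHOLD = 1.0;

type Verdict = 'below-threshold' | 'ok-to-check-in' | 'meets-promotion-bar';

// Classify one model's pass rate over a batch of eval runs.
function classifyPassRate(passes: number, runs: number): Verdict {
  if (runs <= 0 || passes < 0 || passes > runs) {
    throw new RangeError('passes must be between 0 and runs, and runs must be positive');
  }
  const rate = passes / runs;
  if (rate >= PROMOTION_THRESHOLD) return 'meets-promotion-bar';
  if (rate >= CHECK_IN_THRESHOLD) return 'ok-to-check-in';
  return 'below-threshold';
}

// With three runs (an assumed count), two passes is about 66.7%.
const twoOfThree = classifyPassRate(2, 3);
const oneOfThree = classifyPassRate(1, 3);
const threeOfThree = classifyPassRate(3, 3);
```

Under these assumptions, two passes out of three clears the check-in threshold, one out of three does not, and only three out of three meets the promotion bar.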