Merge 63aae3de4a into a38e2f0048

2026-04-21 13:37:17 +00:00 · 2026-04-21 00:03:40 -04:00 · 2026-04-21 00:03:40 -04:00 · 2c7f58a73b
commit 2c7f58a73b
parent a38e2f0048 63aae3de4a
2 changed files with 21 additions and 4 deletions
--- a/docs/integration-tests.md
+++ b/docs/integration-tests.md
@@ -38,10 +38,10 @@ npm run test:e2e
 ## Running a specific set of tests

 To run a subset of test files, you can use
-`npm run <integration test command> <file_name1> ....` where &lt;integration
-test command&gt; is either `test:e2e` or `test:integration*` and `<file_name>`
-is any of the `.test.js` files in the `integration-tests/` directory. For
-example, the following command runs `list_directory.test.js` and
+`npm run <integration test command> <file_name1> ....` where
+`<integration test command>` is either `test:e2e` or `test:integration*` and
+`<file_name>` is any of the `.test.js` files in the `integration-tests/`
+directory. For example, the following command runs `list_directory.test.js` and
 `write_file.test.js`:

 ```bash
--- a/evals/README.md
+++ b/evals/README.md
@@ -115,8 +115,19 @@ policy.** A subset that prove to be highly stable over time may be promoted to
  settings).
 - `assert`: An async function that takes the test rig and the result of the run
  and asserts that the result is correct.
+
+> **Note:** The `rig` parameter is an instance of `TestRig` from
+> `@google/gemini-cli-test-utils`. It provides utilities to inspect the agent's
+> state, files written, and tool calls made during the evaluation. For available
+> methods and properties, see
+> [`test-rig.ts`](../packages/test-utils/src/test-rig.ts).
+
 - `log`: An optional boolean that, if set to `true`, will log the tool calls to
  a file in the `evals/logs` directory.
+- `approvalMode`: An optional string that controls how tool confirmations are
+  handled during the evaluation. Defaults to `'yolo'`, which means all tool
+  calls are auto-approved so evals can run non-interactively without requiring
+  manual confirmation.

 ### Example

@@ -176,6 +187,12 @@ of the individual executions passed.
 Googlers can schedule a manual run against their branch by clicking the link
 above.

+> **Note for external contributors:** If you are not a Googler, you do not have
+> access to manually trigger the nightly workflow. This is expected. Instead,
+> open your pull request and leave a comment asking a maintainer to trigger the
+> nightly eval run for you. Maintainers are happy to help once they have done an
+> initial review of your PR.
+
 Tests should score at least 66% with key models including Gemini 3.1 pro, Gemini
 3.0 pro, and Gemini 3 flash prior to check in and they must pass 100% of the
 time before they are promoted.
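The thresholds in the evals README (at least 66% before check-in, 100% before promotion) reduce to simple pass-rate arithmetic. The TypeScript sketch below is hypothetical: the `ModelRuns` shape and the `passRate`, `readyForCheckIn`, and `readyForPromotion` helpers are illustrative names, not part of the eval harness. It also assumes the 66% bar applies to each key model separately and that results are available as counts of passed and total runs per model.

```typescript
// Hypothetical sketch: these names are illustrative and are not part of the
// actual eval tooling.

/** Aggregated results of repeated runs of one eval against one model. */
interface ModelRuns {
  model: string;
  passed: number; // runs in which the eval's `assert` succeeded
  total: number; // total runs of the eval for this model
}

/** Minimum per-model pass rate before an eval is checked in (assumed per model). */
const CHECK_IN_THRESHOLD = 0.66;

/** Fraction of runs that passed for a single model. */
function passRate(r: ModelRuns): number {
  if (r.total <= 0) {
    throw new Error(`no runs recorded for ${r.model}`);
  }
  return r.passed / r.total;
}

/** True when every listed key model scores at least 66%. */
function readyForCheckIn(runs: ModelRuns[]): boolean {
  return runs.length > 0 && runs.every((r) => passRate(r) >= CHECK_IN_THRESHOLD);
}

/** True when every run passed for every listed model (a 100% pass rate). */
function readyForPromotion(runs: ModelRuns[]): boolean {
  return runs.length > 0 && runs.every((r) => passRate(r) === 1);
}

// Example data (made up): two of three runs passed for one model.
const results: ModelRuns[] = [
  { model: "Gemini 3.1 pro", passed: 3, total: 3 },
  { model: "Gemini 3.0 pro", passed: 3, total: 3 },
  { model: "Gemini 3 flash", passed: 2, total: 3 },
];

console.log(readyForCheckIn(results)); // true: 2/3 is about 0.667, which is >= 0.66
console.log(readyForPromotion(results)); // false: one model is below 100%
```

With three runs per model (an example count, not a documented one), a single failure still clears the check-in bar, but any failure at all rules out promotion.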