mirror of https://github.com/google-gemini/gemini-cli
synced 2026-04-21 13:37:17 +00:00
Merge 63aae3de4a into a38e2f0048
commit 2c7f58a73b
2 changed files with 21 additions and 4 deletions
@@ -38,10 +38,10 @@ npm run test:e2e
 ## Running a specific set of tests

 To run a subset of test files, you can use
-`npm run <integration test command> <file_name1> ....` where <integration
-test command> is either `test:e2e` or `test:integration*` and `<file_name>`
-is any of the `.test.js` files in the `integration-tests/` directory. For
-example, the following command runs `list_directory.test.js` and
+`npm run <integration test command> <file_name1> ....` where
+`<integration test command>` is either `test:e2e` or `test:integration*` and
+`<file_name>` is any of the `.test.js` files in the `integration-tests/`
+directory. For example, the following command runs `list_directory.test.js` and
 `write_file.test.js`:

 ```bash
@@ -115,8 +115,19 @@ policy.** A subset that prove to be highly stable over time may be promoted to
 settings).
 - `assert`: An async function that takes the test rig and the result of the run
 and asserts that the result is correct.
+
+> **Note:** The `rig` parameter is an instance of `TestRig` from
+> `@google/gemini-cli-test-utils`. It provides utilities to inspect the agent's
+> state, files written, and tool calls made during the evaluation. For available
+> methods and properties, see
+> [`test-rig.ts`](../packages/test-utils/src/test-rig.ts).
+
 - `log`: An optional boolean that, if set to `true`, will log the tool calls to
 a file in the `evals/logs` directory.
+- `approvalMode`: An optional string that controls how tool confirmations are
+handled during the evaluation. Defaults to `'yolo'`, which means all tool
+calls are auto-approved so evals can run non-interactively without requiring
+manual confirmation.

 ### Example

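The `assert`, `log`, and `approvalMode` fields described in this hunk can be pictured with a small TypeScript sketch. Everything named here (`SketchRig`, `SketchEvalCase`, `resolveApprovalMode`, the `toolCalls` property, and the `'some-other-mode'` placeholder) is hypothetical and chosen for illustration; the real `TestRig` surface is defined in `test-rig.ts` and may look quite different. The only behavior taken from the text is that `approvalMode` is optional and defaults to `'yolo'`.

```typescript
// Hypothetical shapes for illustration only; the repository's real types may differ.

// Stand-in for TestRig. The `toolCalls` property is an assumption of this sketch.
interface SketchRig {
  toolCalls: string[];
}

interface SketchEvalCase {
  // Async check that receives the rig and the result of the run.
  assert: (rig: SketchRig, result: string) => Promise<void>;
  // Optional: when true, tool calls would be logged to a file.
  log?: boolean;
  // Optional: how tool confirmations are handled.
  approvalMode?: string;
}

// Apply the documented default: 'yolo' (auto-approve) unless the case overrides it.
function resolveApprovalMode(evalCase: SketchEvalCase): string {
  return evalCase.approvalMode ?? 'yolo';
}

const example: SketchEvalCase = {
  assert: async (rig, result) => {
    if (rig.toolCalls.length === 0) throw new Error('expected at least one tool call');
    if (result.length === 0) throw new Error('expected a non-empty result');
  },
  log: true,
};

// Same case with an explicit override; the mode name is a placeholder, not a real value.
const overridden: SketchEvalCase = { ...example, approvalMode: 'some-other-mode' };
```

In this sketch the default is resolved in one place, so an eval case only needs to mention `approvalMode` when it wants something other than auto-approval.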
@@ -176,6 +187,12 @@ of the individual executions passed.
 Googlers can schedule a manual run against their branch by clicking the link
 above.

+> **Note for external contributors:** If you are not a Googler, you do not have
+> access to manually trigger the nightly workflow. This is expected; simply
+> open your pull request with your changes and leave a comment asking a
+> maintainer to trigger the nightly eval run for you. Maintainers are happy to
+> help once they have done an initial review of your PR.
+
 Tests should score at least 66% with key models including Gemini 3.1 pro, Gemini
 3.0 pro, and Gemini 3 flash prior to check in and they must pass 100% of the
 time before they are promoted.
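The two thresholds in the last paragraph (at least 66% before check-in, 100% before promotion) can be made concrete with a short TypeScript sketch. The function names and the `Readiness` labels are hypothetical, and treating one model's runs as a flat list of pass/fail booleans is an assumption of this sketch, not a description of how the nightly workflow aggregates results.

```typescript
// Threshold taken from the text: at least 66% before check-in.
const CHECK_IN_THRESHOLD = 0.66;

// Hypothetical labels used only in this sketch.
type Readiness = 'not-ready' | 'ready-to-check-in' | 'ready-to-promote';

// Fraction of executions that passed; an empty list counts as 0.
function passRate(results: boolean[]): number {
  if (results.length === 0) return 0;
  const passed = results.filter(Boolean).length;
  return passed / results.length;
}

// Classify one model's executions against the two thresholds.
function classify(results: boolean[]): Readiness {
  const rate = passRate(results);
  if (results.length > 0 && rate === 1) return 'ready-to-promote';
  if (rate >= CHECK_IN_THRESHOLD) return 'ready-to-check-in';
  return 'not-ready';
}
```

Under these assumptions, two passes out of three executions clears the check-in bar, while promotion requires every execution to pass. A test would need to meet the bar for each key model separately.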