Resolves#31890
This new approach allows up to 1000 consecutive failing requests per
minute.
If the threshold of 1000 consecutive failures is reached for an IP, then
we ban request (return 429) from such IP for a duration of 1 minute.
(Any successful request for an IP clears the count.)
This supports the scenario where all hosts are behind a NAT (same IP)
AND still provides protection against brute force attacks (attackers can
only probe 1k requests per minute).
This approach was discussed in Slack with @rfairburn:
https://fleetdm.slack.com/archives/C051QJU3D0V/p1755625131298319?thread_ts=1755101701.844249&cid=C051QJU3D0V.
- [X] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
See [Changes
files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files)
for more information.
## Testing
- [X] Added/updated automated tests
- [X] Where appropriate, [automated tests simulate multiple hosts and
test for host
isolation](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/reference/patterns-backend.md#unit-testing)
(updates to one hosts's records do not affect another)
- [X] QA'd all new/changed functionality manually
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- New Features
- Introduced IP-based rate limiting for Fleet Desktop endpoints to
better support many hosts behind a single public IP (NAT). Requests from
abusive IPs may be temporarily blocked, returning 429 Too Many Requests
with a retry-after hint.
- Documentation
- Added README for a new desktop rate-limit tester, describing usage and
expected behavior.
- Tests
- Added integration tests covering desktop endpoint rate limiting and
Redis-backed banning logic.
- Chores
- Added a command-line tool to stress-test desktop endpoints and verify
rate limiting behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
#31592
There's still some QA to be done for edge cases and re-connects, but
this is ready for review.
<img width="341" height="103" alt="Screenshot 2025-08-07 at 11 19 33 AM"
src="https://github.com/user-attachments/assets/01e48ca2-8ab1-412c-be01-8e806a5a8b1c"
/>
Changes:
- To improve UX I'm now using `HEAD /api/fleet/device/ping` API every 10
seconds for connectivity/offline check (instead of the expensive
DesktopSummary one every 5 minutes). This is to address feedback from a
customer:
> "If the internet is not connected and we reconnect with an ethernet
connection for example, it would be good to try to see if we can refresh
it text from the offline indicator given that's not the case anymore.
- It might take up to 1m for Fleet Desktop to show the offline indicator
(we check every 10s with ping and now we are adding 6 more requests in 1
minute to make sure just one bad request doesn't unnecessarily display
the offline indicator).
- Requests without proper public IP were being incorrectly rate limited
(all under the same bucket). So we will now not make these requests and
instead log a WARNING. This is a-ok as the recommended approach to
deploy Fleet is with a TLS terminator that will add the public IP of the
request before sending it to Fleet.
---
- [X] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
See [Changes
files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files)
for more information.
## Testing
- [X] Added/updated automated tests
- [ ] Where appropriate, [automated tests simulate multiple hosts and
test for host
isolation](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/reference/patterns-backend.md#unit-testing)
(updates to one hosts's records do not affect another)
- [ ] QA'd all new/changed functionality manually
## fleetd/orbit/Fleet Desktop
- [ ] Verified compatibility with the latest released version of Fleet
(see [Must
rule](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/workflows/fleetd-development-and-release-strategy.md))
- [ ] Verified that fleetd runs on macOS, Linux and Windows
- [ ] Verified auto-update works from the released version of component
to the new version (see [tools/tuf/test](../tools/tuf/test/README.md))
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Improved accuracy in identifying client public IP addresses, reducing
incorrect rate limiting for Fleet Desktop users.
* Offline indicator is now less sensitive to brief network
interruptions, reducing false offline signals and allowing faster
recovery when connectivity is restored.
* Updated offline message for clearer status communication.
* **New Features**
* Enhanced error messages and logging for rate limiting events,
providing clearer feedback when limits are reached.
* **Tests**
* Expanded test coverage for rate limiting, including scenarios with
missing public IPs and improved assertions for error handling.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Also adds some more rate limiter tests to make sure separate rate limit
buckets interact as expected.
Fixes#29614.
# Checklist for submitter
If some of the following don't apply, delete the relevant line.
<!-- Note that API documentation changes are now addressed by the
product design team. -->
- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
See [Changes
files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files)
for more information.
- [x] Input data is properly validated, `SELECT *` is avoided, SQL
injection is prevented (using placeholders for values in statements)
- For new Fleet configuration settings
- [x] Verified that the setting can be managed via GitOps, or confirmed
that the setting is explicitly being excluded from GitOps. (excluded;
env var or YAML)
- [x] Added/updated automated tests
- [ ] Manual QA for all new/changed functionality
---------
Co-authored-by: George Karr <georgekarrv@users.noreply.github.com>
Co-authored-by: Noah Talerman <47070608+noahtalerman@users.noreply.github.com>
#16076
This change removes ineffective rate limit to `/api/fleet/device/ping`
and `api/fleet/orbit/ping`.
Currently these endpoints are not rate limited, because the rate
limiting used in these was the `errorLimiter` which only takes effect if
the request fails and the ping endpoints never fail. So... we were
making ineffective Redis accesses on every `/api/fleet/device/ping` and
`api/fleet/orbit/ping` requests (we use Redis as the limiter store).
- [X] Changes file added for user-visible changes in `changes/` or
`orbit/changes/`.
See [Changes
files](https://fleetdm.com/docs/contributing/committing-changes#changes-files)
for more information.
- [x] Manual QA for all new/changed functionality
* Add rate limits for device authed endpoints
* Fix lint
* Add missing test
* Fix test
* Increase the quota for desktop endpoints
* Add comment about quota
This should support Redis in both cluster and non-cluster modes.
Updates were made separately to github.com/throttled/throttled to support the slight changes in types.
Co-authored-by: Joseph Macaulay <joseph.macaulay@uber.com>
Co-authored-by: Zach Wasserman <zach@fleetdm.com>
- Add policy.rego file defining authorization policies.
- Add Go integrations to evaluate Rego policies (via OPA).
- Add middleware to ensure requests without authorization check are rejected (guard against programmer error).
- Add authorization checks to most service endpoints.
Prevent abuse of these endpoints with rate limiting backed by Redis. The
limits assigned should be appropriate for almost any Fleet deployment.
Closes#530