fleet/client/orbit_client.go

779 lines
24 KiB
Go
Raw Permalink Normal View History

package client
import (
"bytes"
Orbit config receiver (#18518) New interface for adding periodic jobs that rely on notifications/config changes in Orbit. Previously if we wanted to have recurring checks in Orbit, we would add them into a chain of `GetConfig` calls. This call chain would be run periodically by one of the runners registered with the cli application framework. The new method to register `OrbitConfigReceivers` with the `OrbitClient`, and then register the orbit client itself with the application framework. Instead of having giving each fetcher an internal reference to the previous fetcher that it must call, the receiver is registered with the client and the new config is passed to the receiver. This is the old `GetConfig()` interface: ```go type OrbitConfigFetcher interface { GetConfig() (*fleet.OrbitConfig, error) } ``` This is the new `OrbitConfigReceiver` interface: ```go type OrbitConfigReceiver interface { Run(*OrbitConfig) error } ``` To register a new receiver, you call the `RegisterConfigReceiver` method on the client. ```go orbitClient.RegisterConfigReceiver(extRunner) ``` Downsides of the old method: - Spaghetti call chain setup - Cascading failure, of one fails, all after it fail - Run in series, one long function call holds up the rest - Anything that wants to restart orbit is added as a Runner to the application, meaning there could be several timers calling `GetConfig` and running the chain Benefits of the new method: - Clean `RegisterConfigReceiver` api, no call chaining required - Config receivers can be added at runtime - Isolated receivers, one failing call don't effect others - All calls are run in parallel in goroutines, no calls can hold up the rest - No more need for multiple runners, using a context cancel, any receiver can queue a call to restart orbit - Single point to handle errors and logging for all receivers - Panic recovery to stop orbit from crashing - Easier to test, configs are passed in and do not require a call chain This branch contains a little bit of code from the installer method I was working on because I branched it off of that. (oops) Not all code comments surrounding old `GetConfig()` methods have been fully updated yet Possible changes: - Update the interface to take a context, so we can let receivers know to exit early. I can imagine two cases for this: - The application is about to restart - We can set a timeout for how long receivers are allowed to take Closes #12662 --------- Co-authored-by: Martin Angers <martin.n.angers@gmail.com> Co-authored-by: Roberto Dip <dip.jesusr@gmail.com>
2024-05-09 19:22:56 +00:00
"context"
"crypto/tls"
"encoding/json"
"errors"
"fmt"
"io"
"io/fs"
"mime"
"net"
"net/http"
"net/http/httptrace"
"net/url"
"os"
"path/filepath"
"runtime"
"sync"
"time"
add headers denoting capabilities between fleet server / desktop / orbit (#7833) This adds a new mechanism to allow us to handle compatibility issues between Orbit, Fleet Server and Fleet Desktop. The general idea is to _always_ send a custom header of the form: ``` fleet-capabilities-header = "X-Fleet-Capabilities:" capabilities capabilities = capability * (,) capability = string ``` Both from the server to the clients (Orbit, Fleet Desktop) and vice-versa. For an example, see: https://github.com/fleetdm/fleet/commit/8c0bbdd291f54e03e19766bcdfead0fb8067f60c Also, the following applies: - Backwards compat: if the header is not present, assume that orbit/fleet doesn't have the capability - The current capabilities endpoint will be removed ### Motivation This solution is trying to solve the following problems: - We have three independent processes communicating with each other (Fleet Desktop, Orbit and Fleet Server). Each process can be updated independently, and therefore we need a way for each process to know what features are supported by its peers. - We originally implemented a dedicated API endpoint in the server that returned a list of the capabilities (or "features") enabled, we found this, and any other server-only solution (like API versioning) to be insufficient because: - There are cases in which the server also needs to know which features are supported by its clients - Clients needed to poll for changes to detect if the capabilities supported by the server change, by sending the capabilities on each request we have a much cleaner way to handling different responses. - We are also introducing an unauthenticated endpoint to get the server features, this gives us flexibility if we need to implement different authentication mechanisms, and was one of the pitfalls of the first implementation. Related to https://github.com/fleetdm/fleet/issues/7929
2022-09-26 10:53:53 +00:00
"github.com/fleetdm/fleet/v4/orbit/pkg/constant"
"github.com/fleetdm/fleet/v4/orbit/pkg/logging"
2024-11-21 16:31:03 +00:00
"github.com/fleetdm/fleet/v4/orbit/pkg/luks"
"github.com/fleetdm/fleet/v4/orbit/pkg/platform"
"github.com/fleetdm/fleet/v4/pkg/retry"
add headers denoting capabilities between fleet server / desktop / orbit (#7833) This adds a new mechanism to allow us to handle compatibility issues between Orbit, Fleet Server and Fleet Desktop. The general idea is to _always_ send a custom header of the form: ``` fleet-capabilities-header = "X-Fleet-Capabilities:" capabilities capabilities = capability * (,) capability = string ``` Both from the server to the clients (Orbit, Fleet Desktop) and vice-versa. For an example, see: https://github.com/fleetdm/fleet/commit/8c0bbdd291f54e03e19766bcdfead0fb8067f60c Also, the following applies: - Backwards compat: if the header is not present, assume that orbit/fleet doesn't have the capability - The current capabilities endpoint will be removed ### Motivation This solution is trying to solve the following problems: - We have three independent processes communicating with each other (Fleet Desktop, Orbit and Fleet Server). Each process can be updated independently, and therefore we need a way for each process to know what features are supported by its peers. - We originally implemented a dedicated API endpoint in the server that returned a list of the capabilities (or "features") enabled, we found this, and any other server-only solution (like API versioning) to be insufficient because: - There are cases in which the server also needs to know which features are supported by its clients - Clients needed to poll for changes to detect if the capabilities supported by the server change, by sending the capabilities on each request we have a much cleaner way to handling different responses. - We are also introducing an unauthenticated endpoint to get the server features, this gives us flexibility if we need to implement different authentication mechanisms, and was one of the pitfalls of the first implementation. Related to https://github.com/fleetdm/fleet/issues/7929
2022-09-26 10:53:53 +00:00
"github.com/fleetdm/fleet/v4/server/fleet"
"github.com/rs/zerolog/log"
)
// OrbitClient exposes the Orbit API to communicate with the Fleet server.
type OrbitClient struct {
*BaseClient
nodeKeyFilePath string
enrollSecret string
Orbit to set `--database_path` when invoking osquery to retrieve system info (#10308) #9132 The actual fix for the empty hosts is adding the `--database_path` argument in the initial `osqueryd -S` invocation when retrieving the UUID. Osquery attempts to retrieve the UUID from OS files/APIs, when not possible (which is what happens on some linux distributions), then it resorts to generating a new random UUID and storing it in the `osquery.db`. The issue was Orbit's first invocation of `osqueryd -S` was not using the same `osquery.db` as the main daemon invocation of `osqueryd`. I'm also adding a `hostname` + `platform` to the orbit enroll phase so that if there are any issues in the future we can avoid the "empty" host and have some information to help us troubleshoot. ## How to reproduce On Linux, osquery reads `/sys/class/dmi/id/product_uuid` to load the hardware UUID. Some Linux distributions running on specific hardware or container environments do not have such file available. The way to reproduce on a Linux VM is to do the following: ```sh $ sudo su # chmod -r /sys/class/dmi/id/product_uuid ``` which will turn the file inaccessible to root. ## Checklist - [X] Changes file added for user-visible changes in `changes/` or `orbit/changes/`. See [Changes files](https://fleetdm.com/docs/contributing/committing-changes#changes-files) for more information. - ~[ ] Documented any API changes (docs/Using-Fleet/REST-API.md or docs/Contributing/API-for-contributors.md)~ - ~[ ] Documented any permissions changes~ - [X] Input data is properly validated, `SELECT *` is avoided, SQL injection is prevented (using placeholders for values in statements) - [X] Added support on fleet's osquery simulator `cmd/osquery-perf` for new osquery data ingestion features. - [X] Added/updated tests - [x] Manual QA for all new/changed functionality - For Orbit and Fleet Desktop changes: - [x] Manual QA must be performed in the three main OSs, macOS, Windows and Linux. - [x] Auto-update manual QA, from released version of component to new version (see [tools/tuf/test](../tools/tuf/test/README.md)).
2023-03-13 21:54:18 +00:00
hostInfo fleet.OrbitHostInfo
enrolledMu sync.Mutex
enrolled bool
lastRecordedErrMu sync.Mutex
lastRecordedErr error
configCache configCache
onGetConfigErrFns *OnGetConfigErrFuncs
lastNetErrOnGetConfigLogged time.Time
lastIdleConnectionsCleanupMu sync.Mutex
lastIdleConnectionsCleanup time.Time
// TestNodeKey is used for testing only.
TestNodeKey string
Orbit config receiver (#18518) New interface for adding periodic jobs that rely on notifications/config changes in Orbit. Previously if we wanted to have recurring checks in Orbit, we would add them into a chain of `GetConfig` calls. This call chain would be run periodically by one of the runners registered with the cli application framework. The new method to register `OrbitConfigReceivers` with the `OrbitClient`, and then register the orbit client itself with the application framework. Instead of having giving each fetcher an internal reference to the previous fetcher that it must call, the receiver is registered with the client and the new config is passed to the receiver. This is the old `GetConfig()` interface: ```go type OrbitConfigFetcher interface { GetConfig() (*fleet.OrbitConfig, error) } ``` This is the new `OrbitConfigReceiver` interface: ```go type OrbitConfigReceiver interface { Run(*OrbitConfig) error } ``` To register a new receiver, you call the `RegisterConfigReceiver` method on the client. ```go orbitClient.RegisterConfigReceiver(extRunner) ``` Downsides of the old method: - Spaghetti call chain setup - Cascading failure, of one fails, all after it fail - Run in series, one long function call holds up the rest - Anything that wants to restart orbit is added as a Runner to the application, meaning there could be several timers calling `GetConfig` and running the chain Benefits of the new method: - Clean `RegisterConfigReceiver` api, no call chaining required - Config receivers can be added at runtime - Isolated receivers, one failing call don't effect others - All calls are run in parallel in goroutines, no calls can hold up the rest - No more need for multiple runners, using a context cancel, any receiver can queue a call to restart orbit - Single point to handle errors and logging for all receivers - Panic recovery to stop orbit from crashing - Easier to test, configs are passed in and do not require a call chain This branch contains a little bit of code from the installer method I was working on because I branched it off of that. (oops) Not all code comments surrounding old `GetConfig()` methods have been fully updated yet Possible changes: - Update the interface to take a context, so we can let receivers know to exit early. I can imagine two cases for this: - The application is about to restart - We can set a timeout for how long receivers are allowed to take Closes #12662 --------- Co-authored-by: Martin Angers <martin.n.angers@gmail.com> Co-authored-by: Roberto Dip <dip.jesusr@gmail.com>
2024-05-09 19:22:56 +00:00
// Interfaces that will receive updated configs
ConfigReceivers []fleet.OrbitConfigReceiver
// How frequently a new config will be fetched
ReceiverUpdateInterval time.Duration
// receiverUpdateContext used by ExecuteConfigReceivers to cancel the update loop.
receiverUpdateContext context.Context
// receiverUpdateCancelFunc is used to cancel receiverUpdateContext.
receiverUpdateCancelFunc context.CancelFunc
fleetd generate TPM key and issue SCEP certificate (#30932) #30461 This PR contains the changes for the happy path. On a separate PR we will be adding tests and further fixes for edge cases. - [X] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files) for more information. - [ ] Added/updated automated tests - [x] Manual QA for all new/changed functionality - For Orbit and Fleet Desktop changes: - [ ] Make sure fleetd is compatible with the latest released version of Fleet (see [Must rule](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/workflows/fleetd-development-and-release-strategy.md)). - [ ] Orbit runs on macOS, Linux and Windows. Check if the orbit feature/bugfix should only apply to one platform (`runtime.GOOS`). - [ ] Manual QA must be performed in the three main OSs, macOS, Windows and Linux. - [ ] Auto-update manual QA, from released version of component to new version (see [tools/tuf/test](../tools/tuf/test/README.md)). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added support for using a TPM-backed key and SCEP-issued certificate to sign HTTP requests, enhancing security through hardware-based key management. * Introduced new CLI and environment flags to enable TPM-backed client certificates for Linux packages and Orbit. * Added a local HTTPS proxy that automatically signs requests using the TPM-backed key. * **Bug Fixes** * Improved cleanup and restart behavior when authentication fails with a host identity certificate. * **Tests** * Added comprehensive tests for SCEP client functionality and TPM integration. * **Chores** * Updated scripts and documentation to support TPM-backed client certificate packaging and configuration. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-07-18 14:31:52 +00:00
Orbit passes EUA token during enrollment (#43369) **Related issue:** Resolves #41379 # Checklist for submitter If some of the following don't apply, delete the relevant line. - [x] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files) for more information. ## Testing - [x] Added/updated automated tests - [ ] Where appropriate, [automated tests simulate multiple hosts and test for host isolation](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/reference/patterns-backend.md#unit-testing) (updates to one hosts's records do not affect another) - [ ] QA'd all new/changed functionality manually ## fleetd/orbit/Fleet Desktop - [x] Verified compatibility with the latest released version of Fleet (see [Must rule](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/workflows/fleetd-development-and-release-strategy.md)) - [x] If the change applies to only one platform, confirmed that `runtime.GOOS` is used as needed to isolate changes - [ ] Verified that fleetd runs on macOS, Linux and Windows - [ ] Verified auto-update works from the released version of component to the new version (see [tools/tuf/test](../tools/tuf/test/README.md)) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added EUA token support to Orbit enrollment workflow * Introduced `--eua-token` CLI flag for Windows MDM enrollment * Windows MSI packages now support EUA_TOKEN property (Orbit v1.55.0+) * **Tests** * Added tests for EUA token handling in enrollment and Windows packaging * **Documentation** * Added changelog entry documenting EUA token inclusion in enrollment requests <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-13 21:19:47 +00:00
// euaToken is a one-time Fleet-signed JWT from Windows MDM enrollment,
// sent during orbit enrollment to link the IdP account without prompting.
euaToken string
fleetd generate TPM key and issue SCEP certificate (#30932) #30461 This PR contains the changes for the happy path. On a separate PR we will be adding tests and further fixes for edge cases. - [X] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files) for more information. - [ ] Added/updated automated tests - [x] Manual QA for all new/changed functionality - For Orbit and Fleet Desktop changes: - [ ] Make sure fleetd is compatible with the latest released version of Fleet (see [Must rule](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/workflows/fleetd-development-and-release-strategy.md)). - [ ] Orbit runs on macOS, Linux and Windows. Check if the orbit feature/bugfix should only apply to one platform (`runtime.GOOS`). - [ ] Manual QA must be performed in the three main OSs, macOS, Windows and Linux. - [ ] Auto-update manual QA, from released version of component to new version (see [tools/tuf/test](../tools/tuf/test/README.md)). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added support for using a TPM-backed key and SCEP-issued certificate to sign HTTP requests, enhancing security through hardware-based key management. * Introduced new CLI and environment flags to enable TPM-backed client certificates for Linux packages and Orbit. * Added a local HTTPS proxy that automatically signs requests using the TPM-backed key. * **Bug Fixes** * Improved cleanup and restart behavior when authentication fails with a host identity certificate. * **Tests** * Added comprehensive tests for SCEP client functionality and TPM integration. * **Chores** * Updated scripts and documentation to support TPM-backed client certificate packaging and configuration. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-07-18 14:31:52 +00:00
// hostIdentityCertPath is the file path to the host identity certificate issued using SCEP.
//
// If set then it will be deleted on HTTP 401 errors from Fleet and it will cause ExecuteConfigReceivers
// to terminate to trigger a restart.
hostIdentityCertPath string
End-user authentication for Window/Linux setup experience: agent (#34847) <!-- Add the related story/sub-task/bug number, like Resolves #123, or remove if NA --> **Related issue:** Resolves #34528 # Details This PR implements the agent changes for allowing Fleet admins to require that users authenticate with an IdP prior to having their devices set up. I'll comment on changes inline but the high-level is: 1. Orbit calls the enroll endpoint as usual. This is triggered lazily by any one of a number of subsystems like device token rotation or requesting Fleet config 2. If the enroll endpoint returns the new `ErrEndUserAuthRequired` response, then it opens a window to the `/mdm/sso` Fleet page and retries the enroll endpoint every 30 seconds indefinitely. 3. Any other non-200 response to the enroll request is treated as before (limited # of retries, with backoff) # Checklist for submitter If some of the following don't apply, delete the relevant line. - [ ] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing- changes.md#changes-files) for more information. Will add changelog when story is one. ## Testing - [X] Added/updated automated tests Added test for new retry logic - [X] QA'd all new/changed functionality manually This is kinda hard to test without the associated backend PR: https://github.com/fleetdm/fleet/pull/34835 ## fleetd/orbit/Fleet Desktop - [X] Verified compatibility with the latest released version of Fleet (see [Must rule](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/workflows/fleetd-development-and-release-strategy.md)) This is compatible with all Fleet versions, since older ones won't send the new error. - [X] If the change applies to only one platform, confirmed that `runtime.GOOS` is used as needed to isolate changes This is compatible with all platforms, although it currently should only ever run on Windows and Linux since macOS devices will have end-user auth taken care of before they even download Orbit. - [ ] Verified that fleetd runs on macOS, Linux and Windows Testing this now. - [ ] Verified auto-update works from the released version of component to the new version (see [tools/tuf/test](../tools/tuf/test/README.md)) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added SSO (Single Sign-On) enrollment support for end-user authentication * Enhanced error messaging for authentication-required scenarios * **Bug Fixes** * Improved error handling and retry logic for enrollment failures <!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-11-03 22:41:57 +00:00
// initiatedIdpAuth is a flag indicating whether a window has been opened
// to the sign-on page for the organization's Identity Provider.
initiatedIdpAuth bool
// openSSOWindow is a function that opens a browser window to the SSO URL.
openSSOWindow func() error
}
// time-to-live for config cache
const configCacheTTL = 3 * time.Second
type configCache struct {
mu sync.Mutex
lastUpdated time.Time
config *fleet.OrbitConfig
err error
}
End-user authentication for Window/Linux setup experience: agent (#34847) <!-- Add the related story/sub-task/bug number, like Resolves #123, or remove if NA --> **Related issue:** Resolves #34528 # Details This PR implements the agent changes for allowing Fleet admins to require that users authenticate with an IdP prior to having their devices set up. I'll comment on changes inline but the high-level is: 1. Orbit calls the enroll endpoint as usual. This is triggered lazily by any one of a number of subsystems like device token rotation or requesting Fleet config 2. If the enroll endpoint returns the new `ErrEndUserAuthRequired` response, then it opens a window to the `/mdm/sso` Fleet page and retries the enroll endpoint every 30 seconds indefinitely. 3. Any other non-200 response to the enroll request is treated as before (limited # of retries, with backoff) # Checklist for submitter If some of the following don't apply, delete the relevant line. - [ ] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing- changes.md#changes-files) for more information. Will add changelog when story is one. ## Testing - [X] Added/updated automated tests Added test for new retry logic - [X] QA'd all new/changed functionality manually This is kinda hard to test without the associated backend PR: https://github.com/fleetdm/fleet/pull/34835 ## fleetd/orbit/Fleet Desktop - [X] Verified compatibility with the latest released version of Fleet (see [Must rule](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/workflows/fleetd-development-and-release-strategy.md)) This is compatible with all Fleet versions, since older ones won't send the new error. - [X] If the change applies to only one platform, confirmed that `runtime.GOOS` is used as needed to isolate changes This is compatible with all platforms, although it currently should only ever run on Windows and Linux since macOS devices will have end-user auth taken care of before they even download Orbit. - [ ] Verified that fleetd runs on macOS, Linux and Windows Testing this now. - [ ] Verified auto-update works from the released version of component to the new version (see [tools/tuf/test](../tools/tuf/test/README.md)) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added SSO (Single Sign-On) enrollment support for end-user authentication * Enhanced error messaging for authentication-required scenarios * **Bug Fixes** * Improved error handling and retry logic for enrollment failures <!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-11-03 22:41:57 +00:00
func (oc *OrbitClient) SetOpenSSOWindowFunc(f func() error) {
oc.openSSOWindow = f
}
func (oc *OrbitClient) request(verb string, path string, params any, resp any) error {
return oc.requestWithExternal(verb, path, params, resp, false)
}
// requestWithExternal is used to make requests to Fleet or external URLs. If external is true, the pathOrURL
// is used as the full URL to make the request to.
func (oc *OrbitClient) requestWithExternal(verb string, pathOrURL string, params any, resp any, external bool) error {
var bodyBytes []byte
var err error
if params != nil {
bodyBytes, err = json.Marshal(params)
if err != nil {
return fmt.Errorf("making request json marshalling : %w", err)
}
}
oc.closeIdleConnections()
ctx := context.Background()
if os.Getenv("FLEETD_TEST_HTTPTRACE") == "1" {
ctx = httptrace.WithClientTrace(ctx, testStdoutHTTPTracer)
}
var request *http.Request
if external {
request, err = http.NewRequestWithContext(
ctx,
verb,
pathOrURL,
nil,
)
if err != nil {
return err
}
} else {
parsedURL, err := url.Parse(pathOrURL)
if err != nil {
return fmt.Errorf("parsing URL: %w", err)
}
request, err = http.NewRequestWithContext(
ctx,
verb,
oc.URL(parsedURL.Path, parsedURL.RawQuery).String(),
bytes.NewBuffer(bodyBytes),
)
if err != nil {
return err
}
oc.SetClientCapabilitiesHeader(request)
}
response, err := oc.DoHTTPRequest(request)
if err != nil {
oc.setLastRecordedError(err)
return fmt.Errorf("%s %s: %w", verb, pathOrURL, err)
}
defer response.Body.Close()
if err := oc.ParseResponse(verb, pathOrURL, response, resp); err != nil {
oc.setLastRecordedError(err)
return err
}
return nil
}
// OnGetConfigErrFuncs defines functions to be executed on GetConfig errors.
type OnGetConfigErrFuncs struct {
// OnNetErrFunc receives network and 5XX errors on GetConfig requests.
// These errors are rate limited to once every 5 minutes.
OnNetErrFunc func(err error)
// DebugErrFunc receives all errors on GetConfig requests.
DebugErrFunc func(err error)
}
var (
Orbit config receiver (#18518) New interface for adding periodic jobs that rely on notifications/config changes in Orbit. Previously if we wanted to have recurring checks in Orbit, we would add them into a chain of `GetConfig` calls. This call chain would be run periodically by one of the runners registered with the cli application framework. The new method to register `OrbitConfigReceivers` with the `OrbitClient`, and then register the orbit client itself with the application framework. Instead of having giving each fetcher an internal reference to the previous fetcher that it must call, the receiver is registered with the client and the new config is passed to the receiver. This is the old `GetConfig()` interface: ```go type OrbitConfigFetcher interface { GetConfig() (*fleet.OrbitConfig, error) } ``` This is the new `OrbitConfigReceiver` interface: ```go type OrbitConfigReceiver interface { Run(*OrbitConfig) error } ``` To register a new receiver, you call the `RegisterConfigReceiver` method on the client. ```go orbitClient.RegisterConfigReceiver(extRunner) ``` Downsides of the old method: - Spaghetti call chain setup - Cascading failure, of one fails, all after it fail - Run in series, one long function call holds up the rest - Anything that wants to restart orbit is added as a Runner to the application, meaning there could be several timers calling `GetConfig` and running the chain Benefits of the new method: - Clean `RegisterConfigReceiver` api, no call chaining required - Config receivers can be added at runtime - Isolated receivers, one failing call don't effect others - All calls are run in parallel in goroutines, no calls can hold up the rest - No more need for multiple runners, using a context cancel, any receiver can queue a call to restart orbit - Single point to handle errors and logging for all receivers - Panic recovery to stop orbit from crashing - Easier to test, configs are passed in and do not require a call chain This branch contains a little bit of code from the installer method I was working on because I branched it off of that. (oops) Not all code comments surrounding old `GetConfig()` methods have been fully updated yet Possible changes: - Update the interface to take a context, so we can let receivers know to exit early. I can imagine two cases for this: - The application is about to restart - We can set a timeout for how long receivers are allowed to take Closes #12662 --------- Co-authored-by: Martin Angers <martin.n.angers@gmail.com> Co-authored-by: Roberto Dip <dip.jesusr@gmail.com>
2024-05-09 19:22:56 +00:00
netErrInterval = 5 * time.Minute
configRetryOnNetworkError = 30 * time.Second
defaultOrbitConfigReceiverInterval = 30 * time.Second
)
// NewOrbitClient creates a new OrbitClient.
//
Orbit to set `--database_path` when invoking osquery to retrieve system info (#10308) #9132 The actual fix for the empty hosts is adding the `--database_path` argument in the initial `osqueryd -S` invocation when retrieving the UUID. Osquery attempts to retrieve the UUID from OS files/APIs, when not possible (which is what happens on some linux distributions), then it resorts to generating a new random UUID and storing it in the `osquery.db`. The issue was Orbit's first invocation of `osqueryd -S` was not using the same `osquery.db` as the main daemon invocation of `osqueryd`. I'm also adding a `hostname` + `platform` to the orbit enroll phase so that if there are any issues in the future we can avoid the "empty" host and have some information to help us troubleshoot. ## How to reproduce On Linux, osquery reads `/sys/class/dmi/id/product_uuid` to load the hardware UUID. Some Linux distributions running on specific hardware or container environments do not have such file available. The way to reproduce on a Linux VM is to do the following: ```sh $ sudo su # chmod -r /sys/class/dmi/id/product_uuid ``` which will turn the file inaccessible to root. ## Checklist - [X] Changes file added for user-visible changes in `changes/` or `orbit/changes/`. See [Changes files](https://fleetdm.com/docs/contributing/committing-changes#changes-files) for more information. - ~[ ] Documented any API changes (docs/Using-Fleet/REST-API.md or docs/Contributing/API-for-contributors.md)~ - ~[ ] Documented any permissions changes~ - [X] Input data is properly validated, `SELECT *` is avoided, SQL injection is prevented (using placeholders for values in statements) - [X] Added support on fleet's osquery simulator `cmd/osquery-perf` for new osquery data ingestion features. - [X] Added/updated tests - [x] Manual QA for all new/changed functionality - For Orbit and Fleet Desktop changes: - [x] Manual QA must be performed in the three main OSs, macOS, Windows and Linux. - [x] Auto-update manual QA, from released version of component to new version (see [tools/tuf/test](../tools/tuf/test/README.md)).
2023-03-13 21:54:18 +00:00
// - rootDir is the Orbit's root directory, where the Orbit node key is loaded-from/stored.
// - addr is the address of the Fleet server.
// - orbitHostInfo is the host system information used for enrolling to Fleet.
// - onGetConfigErrFns can be used to handle errors in the GetConfig request.
Orbit to set `--database_path` when invoking osquery to retrieve system info (#10308) #9132 The actual fix for the empty hosts is adding the `--database_path` argument in the initial `osqueryd -S` invocation when retrieving the UUID. Osquery attempts to retrieve the UUID from OS files/APIs, when not possible (which is what happens on some linux distributions), then it resorts to generating a new random UUID and storing it in the `osquery.db`. The issue was Orbit's first invocation of `osqueryd -S` was not using the same `osquery.db` as the main daemon invocation of `osqueryd`. I'm also adding a `hostname` + `platform` to the orbit enroll phase so that if there are any issues in the future we can avoid the "empty" host and have some information to help us troubleshoot. ## How to reproduce On Linux, osquery reads `/sys/class/dmi/id/product_uuid` to load the hardware UUID. Some Linux distributions running on specific hardware or container environments do not have such file available. The way to reproduce on a Linux VM is to do the following: ```sh $ sudo su # chmod -r /sys/class/dmi/id/product_uuid ``` which will turn the file inaccessible to root. ## Checklist - [X] Changes file added for user-visible changes in `changes/` or `orbit/changes/`. See [Changes files](https://fleetdm.com/docs/contributing/committing-changes#changes-files) for more information. - ~[ ] Documented any API changes (docs/Using-Fleet/REST-API.md or docs/Contributing/API-for-contributors.md)~ - ~[ ] Documented any permissions changes~ - [X] Input data is properly validated, `SELECT *` is avoided, SQL injection is prevented (using placeholders for values in statements) - [X] Added support on fleet's osquery simulator `cmd/osquery-perf` for new osquery data ingestion features. - [X] Added/updated tests - [x] Manual QA for all new/changed functionality - For Orbit and Fleet Desktop changes: - [x] Manual QA must be performed in the three main OSs, macOS, Windows and Linux. - [x] Auto-update manual QA, from released version of component to new version (see [tools/tuf/test](../tools/tuf/test/README.md)).
2023-03-13 21:54:18 +00:00
func NewOrbitClient(
rootDir string,
addr string,
rootCA string,
insecureSkipVerify bool,
enrollSecret string,
fleetClientCert *tls.Certificate,
Orbit to set `--database_path` when invoking osquery to retrieve system info (#10308) #9132 The actual fix for the empty hosts is adding the `--database_path` argument in the initial `osqueryd -S` invocation when retrieving the UUID. Osquery attempts to retrieve the UUID from OS files/APIs, when not possible (which is what happens on some linux distributions), then it resorts to generating a new random UUID and storing it in the `osquery.db`. The issue was Orbit's first invocation of `osqueryd -S` was not using the same `osquery.db` as the main daemon invocation of `osqueryd`. I'm also adding a `hostname` + `platform` to the orbit enroll phase so that if there are any issues in the future we can avoid the "empty" host and have some information to help us troubleshoot. ## How to reproduce On Linux, osquery reads `/sys/class/dmi/id/product_uuid` to load the hardware UUID. Some Linux distributions running on specific hardware or container environments do not have such file available. The way to reproduce on a Linux VM is to do the following: ```sh $ sudo su # chmod -r /sys/class/dmi/id/product_uuid ``` which will turn the file inaccessible to root. ## Checklist - [X] Changes file added for user-visible changes in `changes/` or `orbit/changes/`. See [Changes files](https://fleetdm.com/docs/contributing/committing-changes#changes-files) for more information. - ~[ ] Documented any API changes (docs/Using-Fleet/REST-API.md or docs/Contributing/API-for-contributors.md)~ - ~[ ] Documented any permissions changes~ - [X] Input data is properly validated, `SELECT *` is avoided, SQL injection is prevented (using placeholders for values in statements) - [X] Added support on fleet's osquery simulator `cmd/osquery-perf` for new osquery data ingestion features. - [X] Added/updated tests - [x] Manual QA for all new/changed functionality - For Orbit and Fleet Desktop changes: - [x] Manual QA must be performed in the three main OSs, macOS, Windows and Linux. - [x] Auto-update manual QA, from released version of component to new version (see [tools/tuf/test](../tools/tuf/test/README.md)).
2023-03-13 21:54:18 +00:00
orbitHostInfo fleet.OrbitHostInfo,
onGetConfigErrFns *OnGetConfigErrFuncs,
fleetd generate TPM key and issue SCEP certificate (#30932) #30461 This PR contains the changes for the happy path. On a separate PR we will be adding tests and further fixes for edge cases. - [X] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files) for more information. - [ ] Added/updated automated tests - [x] Manual QA for all new/changed functionality - For Orbit and Fleet Desktop changes: - [ ] Make sure fleetd is compatible with the latest released version of Fleet (see [Must rule](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/workflows/fleetd-development-and-release-strategy.md)). - [ ] Orbit runs on macOS, Linux and Windows. Check if the orbit feature/bugfix should only apply to one platform (`runtime.GOOS`). - [ ] Manual QA must be performed in the three main OSs, macOS, Windows and Linux. - [ ] Auto-update manual QA, from released version of component to new version (see [tools/tuf/test](../tools/tuf/test/README.md)). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added support for using a TPM-backed key and SCEP-issued certificate to sign HTTP requests, enhancing security through hardware-based key management. * Introduced new CLI and environment flags to enable TPM-backed client certificates for Linux packages and Orbit. * Added a local HTTPS proxy that automatically signs requests using the TPM-backed key. * **Bug Fixes** * Improved cleanup and restart behavior when authentication fails with a host identity certificate. * **Tests** * Added comprehensive tests for SCEP client functionality and TPM integration. * **Chores** * Updated scripts and documentation to support TPM-backed client certificate packaging and configuration. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-07-18 14:31:52 +00:00
httpSignerWrapper func(*http.Client) *http.Client,
hostIdentityCertPath string,
Orbit to set `--database_path` when invoking osquery to retrieve system info (#10308) #9132 The actual fix for the empty hosts is adding the `--database_path` argument in the initial `osqueryd -S` invocation when retrieving the UUID. Osquery attempts to retrieve the UUID from OS files/APIs, when not possible (which is what happens on some linux distributions), then it resorts to generating a new random UUID and storing it in the `osquery.db`. The issue was Orbit's first invocation of `osqueryd -S` was not using the same `osquery.db` as the main daemon invocation of `osqueryd`. I'm also adding a `hostname` + `platform` to the orbit enroll phase so that if there are any issues in the future we can avoid the "empty" host and have some information to help us troubleshoot. ## How to reproduce On Linux, osquery reads `/sys/class/dmi/id/product_uuid` to load the hardware UUID. Some Linux distributions running on specific hardware or container environments do not have such file available. The way to reproduce on a Linux VM is to do the following: ```sh $ sudo su # chmod -r /sys/class/dmi/id/product_uuid ``` which will turn the file inaccessible to root. ## Checklist - [X] Changes file added for user-visible changes in `changes/` or `orbit/changes/`. See [Changes files](https://fleetdm.com/docs/contributing/committing-changes#changes-files) for more information. - ~[ ] Documented any API changes (docs/Using-Fleet/REST-API.md or docs/Contributing/API-for-contributors.md)~ - ~[ ] Documented any permissions changes~ - [X] Input data is properly validated, `SELECT *` is avoided, SQL injection is prevented (using placeholders for values in statements) - [X] Added support on fleet's osquery simulator `cmd/osquery-perf` for new osquery data ingestion features. - [X] Added/updated tests - [x] Manual QA for all new/changed functionality - For Orbit and Fleet Desktop changes: - [x] Manual QA must be performed in the three main OSs, macOS, Windows and Linux. - [x] Auto-update manual QA, from released version of component to new version (see [tools/tuf/test](../tools/tuf/test/README.md)).
2023-03-13 21:54:18 +00:00
) (*OrbitClient, error) {
orbitCapabilities := fleet.GetOrbitClientCapabilities()
bc, err := NewBaseClient(addr, insecureSkipVerify, rootCA, "", fleetClientCert, orbitCapabilities, httpSignerWrapper)
if err != nil {
return nil, err
}
nodeKeyFilePath := filepath.Join(rootDir, constant.OrbitNodeKeyFileName)
Orbit config receiver (#18518) New interface for adding periodic jobs that rely on notifications/config changes in Orbit. Previously if we wanted to have recurring checks in Orbit, we would add them into a chain of `GetConfig` calls. This call chain would be run periodically by one of the runners registered with the cli application framework. The new method to register `OrbitConfigReceivers` with the `OrbitClient`, and then register the orbit client itself with the application framework. Instead of having giving each fetcher an internal reference to the previous fetcher that it must call, the receiver is registered with the client and the new config is passed to the receiver. This is the old `GetConfig()` interface: ```go type OrbitConfigFetcher interface { GetConfig() (*fleet.OrbitConfig, error) } ``` This is the new `OrbitConfigReceiver` interface: ```go type OrbitConfigReceiver interface { Run(*OrbitConfig) error } ``` To register a new receiver, you call the `RegisterConfigReceiver` method on the client. ```go orbitClient.RegisterConfigReceiver(extRunner) ``` Downsides of the old method: - Spaghetti call chain setup - Cascading failure, of one fails, all after it fail - Run in series, one long function call holds up the rest - Anything that wants to restart orbit is added as a Runner to the application, meaning there could be several timers calling `GetConfig` and running the chain Benefits of the new method: - Clean `RegisterConfigReceiver` api, no call chaining required - Config receivers can be added at runtime - Isolated receivers, one failing call don't effect others - All calls are run in parallel in goroutines, no calls can hold up the rest - No more need for multiple runners, using a context cancel, any receiver can queue a call to restart orbit - Single point to handle errors and logging for all receivers - Panic recovery to stop orbit from crashing - Easier to test, configs are passed in and do not require a call chain This branch contains a little bit of code from the installer method I was working on because I branched it off of that. (oops) Not all code comments surrounding old `GetConfig()` methods have been fully updated yet Possible changes: - Update the interface to take a context, so we can let receivers know to exit early. I can imagine two cases for this: - The application is about to restart - We can set a timeout for how long receivers are allowed to take Closes #12662 --------- Co-authored-by: Martin Angers <martin.n.angers@gmail.com> Co-authored-by: Roberto Dip <dip.jesusr@gmail.com>
2024-05-09 19:22:56 +00:00
ctx, cancelFunc := context.WithCancel(context.Background())
return &OrbitClient{
nodeKeyFilePath: nodeKeyFilePath,
BaseClient: bc,
enrollSecret: enrollSecret,
hostInfo: orbitHostInfo,
enrolled: false,
onGetConfigErrFns: onGetConfigErrFns,
lastIdleConnectionsCleanup: time.Now(),
ReceiverUpdateInterval: defaultOrbitConfigReceiverInterval,
receiverUpdateContext: ctx,
receiverUpdateCancelFunc: cancelFunc,
fleetd generate TPM key and issue SCEP certificate (#30932) #30461 This PR contains the changes for the happy path. On a separate PR we will be adding tests and further fixes for edge cases. - [X] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files) for more information. - [ ] Added/updated automated tests - [x] Manual QA for all new/changed functionality - For Orbit and Fleet Desktop changes: - [ ] Make sure fleetd is compatible with the latest released version of Fleet (see [Must rule](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/workflows/fleetd-development-and-release-strategy.md)). - [ ] Orbit runs on macOS, Linux and Windows. Check if the orbit feature/bugfix should only apply to one platform (`runtime.GOOS`). - [ ] Manual QA must be performed in the three main OSs, macOS, Windows and Linux. - [ ] Auto-update manual QA, from released version of component to new version (see [tools/tuf/test](../tools/tuf/test/README.md)). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added support for using a TPM-backed key and SCEP-issued certificate to sign HTTP requests, enhancing security through hardware-based key management. * Introduced new CLI and environment flags to enable TPM-backed client certificates for Linux packages and Orbit. * Added a local HTTPS proxy that automatically signs requests using the TPM-backed key. * **Bug Fixes** * Improved cleanup and restart behavior when authentication fails with a host identity certificate. * **Tests** * Added comprehensive tests for SCEP client functionality and TPM integration. * **Chores** * Updated scripts and documentation to support TPM-backed client certificate packaging and configuration. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-07-18 14:31:52 +00:00
hostIdentityCertPath: hostIdentityCertPath,
}, nil
}
Orbit passes EUA token during enrollment (#43369) **Related issue:** Resolves #41379 # Checklist for submitter If some of the following don't apply, delete the relevant line. - [x] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files) for more information. ## Testing - [x] Added/updated automated tests - [ ] Where appropriate, [automated tests simulate multiple hosts and test for host isolation](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/reference/patterns-backend.md#unit-testing) (updates to one hosts's records do not affect another) - [ ] QA'd all new/changed functionality manually ## fleetd/orbit/Fleet Desktop - [x] Verified compatibility with the latest released version of Fleet (see [Must rule](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/workflows/fleetd-development-and-release-strategy.md)) - [x] If the change applies to only one platform, confirmed that `runtime.GOOS` is used as needed to isolate changes - [ ] Verified that fleetd runs on macOS, Linux and Windows - [ ] Verified auto-update works from the released version of component to the new version (see [tools/tuf/test](../tools/tuf/test/README.md)) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added EUA token support to Orbit enrollment workflow * Introduced `--eua-token` CLI flag for Windows MDM enrollment * Windows MSI packages now support EUA_TOKEN property (Orbit v1.55.0+) * **Tests** * Added tests for EUA token handling in enrollment and Windows packaging * **Documentation** * Added changelog entry documenting EUA token inclusion in enrollment requests <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-13 21:19:47 +00:00
// SetEUAToken sets a one-time EUA token to include in the enrollment request.
func (oc *OrbitClient) SetEUAToken(token string) {
oc.euaToken = token
}
// TriggerOrbitRestart triggers a orbit process restart.
func (oc *OrbitClient) TriggerOrbitRestart(reason string) {
log.Info().Msgf("orbit restart triggered: %s", reason)
oc.receiverUpdateCancelFunc()
}
// RestartTriggered returns true if any of the config receivers triggered an orbit restart.
func (oc *OrbitClient) RestartTriggered() bool {
select {
case <-oc.receiverUpdateContext.Done():
return true
default:
return false
}
}
// closeIdleConnections attempts to close idle connections from the pool every 55 minutes.
//
// Some load balancers (e.g. AWS ELB) have a maximum lifetime for a connection
// (no matter if the connection is active or not) and will forcefully close the
// connection causing errors in the client (e.g. https://github.com/fleetdm/fleet/issues/18783).
// To prevent these errors, we will attempt to cleanup idle connections every 55
// minutes to not let these connection grow too old. (AWS ELB's default value for maximum
// lifetime of a connection is 3600 seconds.)
func (oc *OrbitClient) closeIdleConnections() {
oc.lastIdleConnectionsCleanupMu.Lock()
defer oc.lastIdleConnectionsCleanupMu.Unlock()
if time.Since(oc.lastIdleConnectionsCleanup) < 55*time.Minute {
return
}
oc.lastIdleConnectionsCleanup = time.Now()
rawClient := oc.GetRawHTTPClient()
c, ok := rawClient.(*http.Client)
if !ok {
return
}
t, ok := c.Transport.(*http.Transport)
if !ok {
return
}
t.CloseIdleConnections()
}
Orbit config receiver (#18518) New interface for adding periodic jobs that rely on notifications/config changes in Orbit. Previously if we wanted to have recurring checks in Orbit, we would add them into a chain of `GetConfig` calls. This call chain would be run periodically by one of the runners registered with the cli application framework. The new method to register `OrbitConfigReceivers` with the `OrbitClient`, and then register the orbit client itself with the application framework. Instead of having giving each fetcher an internal reference to the previous fetcher that it must call, the receiver is registered with the client and the new config is passed to the receiver. This is the old `GetConfig()` interface: ```go type OrbitConfigFetcher interface { GetConfig() (*fleet.OrbitConfig, error) } ``` This is the new `OrbitConfigReceiver` interface: ```go type OrbitConfigReceiver interface { Run(*OrbitConfig) error } ``` To register a new receiver, you call the `RegisterConfigReceiver` method on the client. ```go orbitClient.RegisterConfigReceiver(extRunner) ``` Downsides of the old method: - Spaghetti call chain setup - Cascading failure, of one fails, all after it fail - Run in series, one long function call holds up the rest - Anything that wants to restart orbit is added as a Runner to the application, meaning there could be several timers calling `GetConfig` and running the chain Benefits of the new method: - Clean `RegisterConfigReceiver` api, no call chaining required - Config receivers can be added at runtime - Isolated receivers, one failing call don't effect others - All calls are run in parallel in goroutines, no calls can hold up the rest - No more need for multiple runners, using a context cancel, any receiver can queue a call to restart orbit - Single point to handle errors and logging for all receivers - Panic recovery to stop orbit from crashing - Easier to test, configs are passed in and do not require a call chain This branch contains a little bit of code from the installer method I was working on because I branched it off of that. (oops) Not all code comments surrounding old `GetConfig()` methods have been fully updated yet Possible changes: - Update the interface to take a context, so we can let receivers know to exit early. I can imagine two cases for this: - The application is about to restart - We can set a timeout for how long receivers are allowed to take Closes #12662 --------- Co-authored-by: Martin Angers <martin.n.angers@gmail.com> Co-authored-by: Roberto Dip <dip.jesusr@gmail.com>
2024-05-09 19:22:56 +00:00
func (oc *OrbitClient) RunConfigReceivers() error {
config, err := oc.GetConfig()
if err != nil {
return fmt.Errorf("RunConfigReceivers get config: %w", err)
}
var errs []error
var errMu sync.Mutex
var wg sync.WaitGroup
wg.Add(len(oc.ConfigReceivers))
for _, receiver := range oc.ConfigReceivers {
go func() {
defer func() {
if err := recover(); err != nil {
errMu.Lock()
errs = append(errs, fmt.Errorf("panic occured in receiver: %v", err))
errMu.Unlock()
}
wg.Done()
}()
err := receiver.Run(config)
if err != nil {
errMu.Lock()
errs = append(errs, err)
errMu.Unlock()
}
}()
}
wg.Wait()
if len(errs) != 0 {
return errors.Join(errs...)
}
return nil
}
func (oc *OrbitClient) RegisterConfigReceiver(cr fleet.OrbitConfigReceiver) {
oc.ConfigReceivers = append(oc.ConfigReceivers, cr)
}
func (oc *OrbitClient) ExecuteConfigReceivers() error {
ticker := time.NewTicker(oc.ReceiverUpdateInterval)
defer ticker.Stop()
for {
select {
case <-oc.receiverUpdateContext.Done():
Orbit config receiver (#18518) New interface for adding periodic jobs that rely on notifications/config changes in Orbit. Previously if we wanted to have recurring checks in Orbit, we would add them into a chain of `GetConfig` calls. This call chain would be run periodically by one of the runners registered with the cli application framework. The new method to register `OrbitConfigReceivers` with the `OrbitClient`, and then register the orbit client itself with the application framework. Instead of having giving each fetcher an internal reference to the previous fetcher that it must call, the receiver is registered with the client and the new config is passed to the receiver. This is the old `GetConfig()` interface: ```go type OrbitConfigFetcher interface { GetConfig() (*fleet.OrbitConfig, error) } ``` This is the new `OrbitConfigReceiver` interface: ```go type OrbitConfigReceiver interface { Run(*OrbitConfig) error } ``` To register a new receiver, you call the `RegisterConfigReceiver` method on the client. ```go orbitClient.RegisterConfigReceiver(extRunner) ``` Downsides of the old method: - Spaghetti call chain setup - Cascading failure, of one fails, all after it fail - Run in series, one long function call holds up the rest - Anything that wants to restart orbit is added as a Runner to the application, meaning there could be several timers calling `GetConfig` and running the chain Benefits of the new method: - Clean `RegisterConfigReceiver` api, no call chaining required - Config receivers can be added at runtime - Isolated receivers, one failing call don't effect others - All calls are run in parallel in goroutines, no calls can hold up the rest - No more need for multiple runners, using a context cancel, any receiver can queue a call to restart orbit - Single point to handle errors and logging for all receivers - Panic recovery to stop orbit from crashing - Easier to test, configs are passed in and do not require a call chain This branch contains a little bit of code from the installer method I was working on because I branched it off of that. (oops) Not all code comments surrounding old `GetConfig()` methods have been fully updated yet Possible changes: - Update the interface to take a context, so we can let receivers know to exit early. I can imagine two cases for this: - The application is about to restart - We can set a timeout for how long receivers are allowed to take Closes #12662 --------- Co-authored-by: Martin Angers <martin.n.angers@gmail.com> Co-authored-by: Roberto Dip <dip.jesusr@gmail.com>
2024-05-09 19:22:56 +00:00
return nil
case <-ticker.C:
if err := oc.RunConfigReceivers(); err != nil {
log.Error().Err(err).Msg("running config receivers")
}
Orbit config receiver (#18518) New interface for adding periodic jobs that rely on notifications/config changes in Orbit. Previously if we wanted to have recurring checks in Orbit, we would add them into a chain of `GetConfig` calls. This call chain would be run periodically by one of the runners registered with the cli application framework. The new method to register `OrbitConfigReceivers` with the `OrbitClient`, and then register the orbit client itself with the application framework. Instead of having giving each fetcher an internal reference to the previous fetcher that it must call, the receiver is registered with the client and the new config is passed to the receiver. This is the old `GetConfig()` interface: ```go type OrbitConfigFetcher interface { GetConfig() (*fleet.OrbitConfig, error) } ``` This is the new `OrbitConfigReceiver` interface: ```go type OrbitConfigReceiver interface { Run(*OrbitConfig) error } ``` To register a new receiver, you call the `RegisterConfigReceiver` method on the client. ```go orbitClient.RegisterConfigReceiver(extRunner) ``` Downsides of the old method: - Spaghetti call chain setup - Cascading failure, of one fails, all after it fail - Run in series, one long function call holds up the rest - Anything that wants to restart orbit is added as a Runner to the application, meaning there could be several timers calling `GetConfig` and running the chain Benefits of the new method: - Clean `RegisterConfigReceiver` api, no call chaining required - Config receivers can be added at runtime - Isolated receivers, one failing call don't effect others - All calls are run in parallel in goroutines, no calls can hold up the rest - No more need for multiple runners, using a context cancel, any receiver can queue a call to restart orbit - Single point to handle errors and logging for all receivers - Panic recovery to stop orbit from crashing - Easier to test, configs are passed in and do not require a call chain This branch contains a little bit of code from the installer method I was working on because I branched it off of that. (oops) Not all code comments surrounding old `GetConfig()` methods have been fully updated yet Possible changes: - Update the interface to take a context, so we can let receivers know to exit early. I can imagine two cases for this: - The application is about to restart - We can set a timeout for how long receivers are allowed to take Closes #12662 --------- Co-authored-by: Martin Angers <martin.n.angers@gmail.com> Co-authored-by: Roberto Dip <dip.jesusr@gmail.com>
2024-05-09 19:22:56 +00:00
}
}
}
func (oc *OrbitClient) InterruptConfigReceivers(err error) {
oc.receiverUpdateCancelFunc()
Orbit config receiver (#18518) New interface for adding periodic jobs that rely on notifications/config changes in Orbit. Previously if we wanted to have recurring checks in Orbit, we would add them into a chain of `GetConfig` calls. This call chain would be run periodically by one of the runners registered with the cli application framework. The new method to register `OrbitConfigReceivers` with the `OrbitClient`, and then register the orbit client itself with the application framework. Instead of having giving each fetcher an internal reference to the previous fetcher that it must call, the receiver is registered with the client and the new config is passed to the receiver. This is the old `GetConfig()` interface: ```go type OrbitConfigFetcher interface { GetConfig() (*fleet.OrbitConfig, error) } ``` This is the new `OrbitConfigReceiver` interface: ```go type OrbitConfigReceiver interface { Run(*OrbitConfig) error } ``` To register a new receiver, you call the `RegisterConfigReceiver` method on the client. ```go orbitClient.RegisterConfigReceiver(extRunner) ``` Downsides of the old method: - Spaghetti call chain setup - Cascading failure, of one fails, all after it fail - Run in series, one long function call holds up the rest - Anything that wants to restart orbit is added as a Runner to the application, meaning there could be several timers calling `GetConfig` and running the chain Benefits of the new method: - Clean `RegisterConfigReceiver` api, no call chaining required - Config receivers can be added at runtime - Isolated receivers, one failing call don't effect others - All calls are run in parallel in goroutines, no calls can hold up the rest - No more need for multiple runners, using a context cancel, any receiver can queue a call to restart orbit - Single point to handle errors and logging for all receivers - Panic recovery to stop orbit from crashing - Easier to test, configs are passed in and do not require a call chain This branch contains a little bit of code from the installer method I was working on because I branched it off of that. (oops) Not all code comments surrounding old `GetConfig()` methods have been fully updated yet Possible changes: - Update the interface to take a context, so we can let receivers know to exit early. I can imagine two cases for this: - The application is about to restart - We can set a timeout for how long receivers are allowed to take Closes #12662 --------- Co-authored-by: Martin Angers <martin.n.angers@gmail.com> Co-authored-by: Roberto Dip <dip.jesusr@gmail.com>
2024-05-09 19:22:56 +00:00
}
// GetConfig returns the Orbit config fetched from Fleet server for this instance of OrbitClient.
// Since this method is called in multiple places, we use a cache with configCacheTTL time-to-live
// to reduce traffic to the Fleet server.
// Upon network errors, this method will retry the get config request (every 30 seconds).
func (oc *OrbitClient) GetConfig() (*fleet.OrbitConfig, error) {
oc.configCache.mu.Lock()
defer oc.configCache.mu.Unlock()
// If time-to-live passed, we update the config cache
now := time.Now()
if now.After(oc.configCache.lastUpdated.Add(configCacheTTL)) {
verb, path := "POST", "/api/fleet/orbit/config"
var (
resp fleet.OrbitConfig
err error
)
// Retry until we don't get a network error or a 5XX error.
_ = retry.Do(func() error {
err = oc.authenticatedRequest(verb, path, &fleet.OrbitGetConfigRequest{}, &resp)
var (
netErr net.Error
statusCodeErr *StatusCodeErr
)
if err != nil && oc.onGetConfigErrFns != nil && oc.onGetConfigErrFns.DebugErrFunc != nil {
oc.onGetConfigErrFns.DebugErrFunc(err)
}
if errors.As(err, &netErr) || (errors.As(err, &statusCodeErr) && statusCodeErr.StatusCode() >= 500) {
now := time.Now()
if oc.onGetConfigErrFns != nil && oc.onGetConfigErrFns.OnNetErrFunc != nil && now.After(oc.lastNetErrOnGetConfigLogged.Add(netErrInterval)) {
oc.onGetConfigErrFns.OnNetErrFunc(err)
oc.lastNetErrOnGetConfigLogged = now
}
return err // retry on network or server 5XX errors
}
return nil
}, retry.WithInterval(configRetryOnNetworkError))
oc.configCache.config = &resp
oc.configCache.err = err
oc.configCache.lastUpdated = now
}
return oc.configCache.config, oc.configCache.err
}
// SetOrUpdateDeviceToken sends a request to the server to set or update the device token.
func (oc *OrbitClient) SetOrUpdateDeviceToken(deviceAuthToken string) error {
verb, path := "POST", "/api/fleet/orbit/device_token"
params := fleet.SetOrUpdateDeviceTokenRequest{
DeviceAuthToken: deviceAuthToken,
}
var resp fleet.SetOrUpdateDeviceTokenResponse
if err := oc.authenticatedRequest(verb, path, &params, &resp); err != nil {
return err
}
return nil
}
// SetOrUpdateDeviceMappingEmail sends a request to the server to set or update the device mapping email.
func (oc *OrbitClient) SetOrUpdateDeviceMappingEmail(email string) error {
verb, path := "PUT", "/api/fleet/orbit/device_mapping"
params := fleet.OrbitPutDeviceMappingRequest{
Email: email,
}
var resp fleet.OrbitPutDeviceMappingResponse
if err := oc.authenticatedRequest(verb, path, &params, &resp); err != nil {
return err
}
return nil
}
// GetHostScript returns the script fetched from Fleet server to run on this host.
func (oc *OrbitClient) GetHostScript(execID string) (*fleet.HostScriptResult, error) {
verb, path := "POST", "/api/fleet/orbit/scripts/request"
var resp fleet.OrbitGetScriptResponse
if err := oc.authenticatedRequest(verb, path, &fleet.OrbitGetScriptRequest{
ExecutionID: execID,
}, &resp); err != nil {
return nil, err
}
return resp.HostScriptResult, nil
}
// SaveHostScriptResult saves the result of running the script on this host.
func (oc *OrbitClient) SaveHostScriptResult(result *fleet.HostScriptResultPayload) error {
verb, path := "POST", "/api/fleet/orbit/scripts/result"
var resp fleet.OrbitPostScriptResultResponse
if err := oc.authenticatedRequest(verb, path, &fleet.OrbitPostScriptResultRequest{
HostScriptResultPayload: result,
}, &resp); err != nil {
return err
}
return nil
}
func (oc *OrbitClient) GetInstallerDetails(installId string) (*fleet.SoftwareInstallDetails, error) {
verb, path := "POST", "/api/fleet/orbit/software_install/details"
var resp fleet.OrbitGetSoftwareInstallResponse
if err := oc.authenticatedRequest(verb, path, &fleet.OrbitGetSoftwareInstallRequest{
InstallUUID: installId,
}, &resp); err != nil {
return nil, err
}
return resp.SoftwareInstallDetails, nil
}
func (oc *OrbitClient) SaveInstallerResult(payload *fleet.HostSoftwareInstallResultPayload) error {
verb, path := "POST", "/api/fleet/orbit/software_install/result"
var resp fleet.OrbitPostSoftwareInstallResultResponse
if err := oc.authenticatedRequest(verb, path, &fleet.OrbitPostSoftwareInstallResultRequest{
HostSoftwareInstallResultPayload: payload,
}, &resp); err != nil {
return err
}
return nil
}
func (oc *OrbitClient) DownloadSoftwareInstaller(installerID uint, downloadDirectory string, progressFunc func(n int)) (string, error) {
verb, path := "POST", "/api/fleet/orbit/software_install/package?alt=media"
resp := FileResponse{
DestPath: downloadDirectory,
ProgressFunc: progressFunc,
}
if err := oc.authenticatedRequest(verb, path, &fleet.OrbitDownloadSoftwareInstallerRequest{
InstallerID: installerID,
}, &resp); err != nil {
return "", err
}
return resp.GetFilePath(), nil
}
func (oc *OrbitClient) DownloadSoftwareInstallerFromURL(url string, filename string, downloadDirectory string, progressFunc func(int)) (string, error) {
resp := FileResponse{
DestPath: downloadDirectory,
DestFile: filename,
SkipMediaType: true,
ProgressFunc: progressFunc,
}
if err := oc.requestWithExternal("GET", url, nil, &resp, true); err != nil {
return "", err
}
return resp.GetFilePath(), nil
}
// NullFileResponse discards downloaded file content.
type NullFileResponse struct{}
func (f *NullFileResponse) Handle(resp *http.Response) error {
_, _, err := mime.ParseMediaType(resp.Header.Get("Content-Disposition"))
if err != nil {
return fmt.Errorf("parsing media type from response header: %w", err)
}
_, err = io.Copy(io.Discard, resp.Body)
if err != nil {
return fmt.Errorf("copying from http stream to io.Discard: %w", err)
}
return nil
}
// DownloadAndDiscardSoftwareInstaller downloads the software installer and discards it.
// This method is used during load testing by osquery-perf.
func (oc *OrbitClient) DownloadAndDiscardSoftwareInstaller(installerID uint) error {
verb, path := "POST", "/api/fleet/orbit/software_install/package?alt=media"
resp := NullFileResponse{}
return oc.authenticatedRequest(verb, path, &fleet.OrbitDownloadSoftwareInstallerRequest{
InstallerID: installerID,
}, &resp)
}
// Ping sends a ping request to the orbit/ping endpoint.
func (oc *OrbitClient) Ping() error {
verb, path := "HEAD", "/api/fleet/orbit/ping"
err := oc.request(verb, path, nil, nil)
if err == nil || IsNotFoundErr(err) {
// notFound is ok, it means an old server without the capabilities header
return nil
}
return err
}
func (oc *OrbitClient) enroll() (string, error) {
verb, path := "POST", "/api/fleet/orbit/enroll"
params := fleet.EnrollOrbitRequest{
EnrollSecret: oc.enrollSecret,
HardwareUUID: oc.hostInfo.HardwareUUID,
HardwareSerial: oc.hostInfo.HardwareSerial,
Hostname: oc.hostInfo.Hostname,
Platform: oc.hostInfo.Platform,
PlatformLike: oc.hostInfo.PlatformLike,
OsqueryIdentifier: oc.hostInfo.OsqueryIdentifier,
ComputerName: oc.hostInfo.ComputerName,
HardwareModel: oc.hostInfo.HardwareModel,
Orbit passes EUA token during enrollment (#43369) **Related issue:** Resolves #41379 # Checklist for submitter If some of the following don't apply, delete the relevant line. - [x] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files) for more information. ## Testing - [x] Added/updated automated tests - [ ] Where appropriate, [automated tests simulate multiple hosts and test for host isolation](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/reference/patterns-backend.md#unit-testing) (updates to one hosts's records do not affect another) - [ ] QA'd all new/changed functionality manually ## fleetd/orbit/Fleet Desktop - [x] Verified compatibility with the latest released version of Fleet (see [Must rule](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/workflows/fleetd-development-and-release-strategy.md)) - [x] If the change applies to only one platform, confirmed that `runtime.GOOS` is used as needed to isolate changes - [ ] Verified that fleetd runs on macOS, Linux and Windows - [ ] Verified auto-update works from the released version of component to the new version (see [tools/tuf/test](../tools/tuf/test/README.md)) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added EUA token support to Orbit enrollment workflow * Introduced `--eua-token` CLI flag for Windows MDM enrollment * Windows MSI packages now support EUA_TOKEN property (Orbit v1.55.0+) * **Tests** * Added tests for EUA token handling in enrollment and Windows packaging * **Documentation** * Added changelog entry documenting EUA token inclusion in enrollment requests <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-13 21:19:47 +00:00
EUAToken: oc.euaToken,
Orbit to set `--database_path` when invoking osquery to retrieve system info (#10308) #9132 The actual fix for the empty hosts is adding the `--database_path` argument in the initial `osqueryd -S` invocation when retrieving the UUID. Osquery attempts to retrieve the UUID from OS files/APIs, when not possible (which is what happens on some linux distributions), then it resorts to generating a new random UUID and storing it in the `osquery.db`. The issue was Orbit's first invocation of `osqueryd -S` was not using the same `osquery.db` as the main daemon invocation of `osqueryd`. I'm also adding a `hostname` + `platform` to the orbit enroll phase so that if there are any issues in the future we can avoid the "empty" host and have some information to help us troubleshoot. ## How to reproduce On Linux, osquery reads `/sys/class/dmi/id/product_uuid` to load the hardware UUID. Some Linux distributions running on specific hardware or container environments do not have such file available. The way to reproduce on a Linux VM is to do the following: ```sh $ sudo su # chmod -r /sys/class/dmi/id/product_uuid ``` which will turn the file inaccessible to root. ## Checklist - [X] Changes file added for user-visible changes in `changes/` or `orbit/changes/`. See [Changes files](https://fleetdm.com/docs/contributing/committing-changes#changes-files) for more information. - ~[ ] Documented any API changes (docs/Using-Fleet/REST-API.md or docs/Contributing/API-for-contributors.md)~ - ~[ ] Documented any permissions changes~ - [X] Input data is properly validated, `SELECT *` is avoided, SQL injection is prevented (using placeholders for values in statements) - [X] Added support on fleet's osquery simulator `cmd/osquery-perf` for new osquery data ingestion features. - [X] Added/updated tests - [x] Manual QA for all new/changed functionality - For Orbit and Fleet Desktop changes: - [x] Manual QA must be performed in the three main OSs, macOS, Windows and Linux. - [x] Auto-update manual QA, from released version of component to new version (see [tools/tuf/test](../tools/tuf/test/README.md)).
2023-03-13 21:54:18 +00:00
}
var resp fleet.EnrollOrbitResponse
err := oc.request(verb, path, params, &resp)
if err != nil {
return "", err
}
return resp.OrbitNodeKey, nil
}
// enrollLock helps protect the enrolling process in case mutliple OrbitClients
// want to re-enroll at the same time.
var enrollLock sync.Mutex
// getNodeKeyOrEnroll attempts to read the orbit node key if the file exists on disk
// otherwise it enrolls the host with Fleet and saves the node key to disk
func (oc *OrbitClient) getNodeKeyOrEnroll() (string, error) {
if oc.TestNodeKey != "" {
return oc.TestNodeKey, nil
}
enrollLock.Lock()
defer enrollLock.Unlock()
orbitNodeKey, err := os.ReadFile(oc.nodeKeyFilePath)
switch {
case err == nil:
return string(orbitNodeKey), nil
case errors.Is(err, fs.ErrNotExist):
// OK, if there's no orbit node key, proceed to enroll.
default:
return "", fmt.Errorf("read orbit node key file: %w", err)
}
End-user authentication for Window/Linux setup experience: agent (#34847) <!-- Add the related story/sub-task/bug number, like Resolves #123, or remove if NA --> **Related issue:** Resolves #34528 # Details This PR implements the agent changes for allowing Fleet admins to require that users authenticate with an IdP prior to having their devices set up. I'll comment on changes inline but the high-level is: 1. Orbit calls the enroll endpoint as usual. This is triggered lazily by any one of a number of subsystems like device token rotation or requesting Fleet config 2. If the enroll endpoint returns the new `ErrEndUserAuthRequired` response, then it opens a window to the `/mdm/sso` Fleet page and retries the enroll endpoint every 30 seconds indefinitely. 3. Any other non-200 response to the enroll request is treated as before (limited # of retries, with backoff) # Checklist for submitter If some of the following don't apply, delete the relevant line. - [ ] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing- changes.md#changes-files) for more information. Will add changelog when story is one. ## Testing - [X] Added/updated automated tests Added test for new retry logic - [X] QA'd all new/changed functionality manually This is kinda hard to test without the associated backend PR: https://github.com/fleetdm/fleet/pull/34835 ## fleetd/orbit/Fleet Desktop - [X] Verified compatibility with the latest released version of Fleet (see [Must rule](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/workflows/fleetd-development-and-release-strategy.md)) This is compatible with all Fleet versions, since older ones won't send the new error. - [X] If the change applies to only one platform, confirmed that `runtime.GOOS` is used as needed to isolate changes This is compatible with all platforms, although it currently should only ever run on Windows and Linux since macOS devices will have end-user auth taken care of before they even download Orbit. - [ ] Verified that fleetd runs on macOS, Linux and Windows Testing this now. - [ ] Verified auto-update works from the released version of component to the new version (see [tools/tuf/test](../tools/tuf/test/README.md)) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added SSO (Single Sign-On) enrollment support for end-user authentication * Enhanced error messaging for authentication-required scenarios * **Bug Fixes** * Improved error handling and retry logic for enrollment failures <!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-11-03 22:41:57 +00:00
var orbitNodeKey_ string
if err := retry.Do(
func() error {
orbitNodeKey_, err = oc.enrollAndWriteNodeKeyFile()
End-user authentication for Window/Linux setup experience: agent (#34847) <!-- Add the related story/sub-task/bug number, like Resolves #123, or remove if NA --> **Related issue:** Resolves #34528 # Details This PR implements the agent changes for allowing Fleet admins to require that users authenticate with an IdP prior to having their devices set up. I'll comment on changes inline but the high-level is: 1. Orbit calls the enroll endpoint as usual. This is triggered lazily by any one of a number of subsystems like device token rotation or requesting Fleet config 2. If the enroll endpoint returns the new `ErrEndUserAuthRequired` response, then it opens a window to the `/mdm/sso` Fleet page and retries the enroll endpoint every 30 seconds indefinitely. 3. Any other non-200 response to the enroll request is treated as before (limited # of retries, with backoff) # Checklist for submitter If some of the following don't apply, delete the relevant line. - [ ] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing- changes.md#changes-files) for more information. Will add changelog when story is one. ## Testing - [X] Added/updated automated tests Added test for new retry logic - [X] QA'd all new/changed functionality manually This is kinda hard to test without the associated backend PR: https://github.com/fleetdm/fleet/pull/34835 ## fleetd/orbit/Fleet Desktop - [X] Verified compatibility with the latest released version of Fleet (see [Must rule](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/workflows/fleetd-development-and-release-strategy.md)) This is compatible with all Fleet versions, since older ones won't send the new error. - [X] If the change applies to only one platform, confirmed that `runtime.GOOS` is used as needed to isolate changes This is compatible with all platforms, although it currently should only ever run on Windows and Linux since macOS devices will have end-user auth taken care of before they even download Orbit. - [ ] Verified that fleetd runs on macOS, Linux and Windows Testing this now. - [ ] Verified auto-update works from the released version of component to the new version (see [tools/tuf/test](../tools/tuf/test/README.md)) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added SSO (Single Sign-On) enrollment support for end-user authentication * Enhanced error messaging for authentication-required scenarios * **Bug Fixes** * Improved error handling and retry logic for enrollment failures <!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-11-03 22:41:57 +00:00
return err
},
// The below configuration means the following retry intervals (exponential backoff):
// 10s, 20s, 40s, 80s, 160s and then return the failure (max attempts = 6)
// thus executing no more than ~6 enroll request failures every ~5 minutes.
retry.WithInterval(orbitEnrollRetryInterval()),
retry.WithMaxAttempts(constant.OrbitEnrollMaxRetries),
retry.WithBackoffMultiplier(constant.OrbitEnrollBackoffMultiplier),
End-user authentication for Window/Linux setup experience: agent (#34847) <!-- Add the related story/sub-task/bug number, like Resolves #123, or remove if NA --> **Related issue:** Resolves #34528 # Details This PR implements the agent changes for allowing Fleet admins to require that users authenticate with an IdP prior to having their devices set up. I'll comment on changes inline but the high-level is: 1. Orbit calls the enroll endpoint as usual. This is triggered lazily by any one of a number of subsystems like device token rotation or requesting Fleet config 2. If the enroll endpoint returns the new `ErrEndUserAuthRequired` response, then it opens a window to the `/mdm/sso` Fleet page and retries the enroll endpoint every 30 seconds indefinitely. 3. Any other non-200 response to the enroll request is treated as before (limited # of retries, with backoff) # Checklist for submitter If some of the following don't apply, delete the relevant line. - [ ] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing- changes.md#changes-files) for more information. Will add changelog when story is one. ## Testing - [X] Added/updated automated tests Added test for new retry logic - [X] QA'd all new/changed functionality manually This is kinda hard to test without the associated backend PR: https://github.com/fleetdm/fleet/pull/34835 ## fleetd/orbit/Fleet Desktop - [X] Verified compatibility with the latest released version of Fleet (see [Must rule](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/workflows/fleetd-development-and-release-strategy.md)) This is compatible with all Fleet versions, since older ones won't send the new error. - [X] If the change applies to only one platform, confirmed that `runtime.GOOS` is used as needed to isolate changes This is compatible with all platforms, although it currently should only ever run on Windows and Linux since macOS devices will have end-user auth taken care of before they even download Orbit. - [ ] Verified that fleetd runs on macOS, Linux and Windows Testing this now. - [ ] Verified auto-update works from the released version of component to the new version (see [tools/tuf/test](../tools/tuf/test/README.md)) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added SSO (Single Sign-On) enrollment support for end-user authentication * Enhanced error messaging for authentication-required scenarios * **Bug Fixes** * Improved error handling and retry logic for enrollment failures <!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-11-03 22:41:57 +00:00
retry.WithErrorFilter(func(err error) (errorOutcome retry.ErrorOutcome) {
log.Info().Err(err).Msg("orbit enroll attempt failed")
switch {
case IsNotFoundErr(err):
End-user authentication for Window/Linux setup experience: agent (#34847) <!-- Add the related story/sub-task/bug number, like Resolves #123, or remove if NA --> **Related issue:** Resolves #34528 # Details This PR implements the agent changes for allowing Fleet admins to require that users authenticate with an IdP prior to having their devices set up. I'll comment on changes inline but the high-level is: 1. Orbit calls the enroll endpoint as usual. This is triggered lazily by any one of a number of subsystems like device token rotation or requesting Fleet config 2. If the enroll endpoint returns the new `ErrEndUserAuthRequired` response, then it opens a window to the `/mdm/sso` Fleet page and retries the enroll endpoint every 30 seconds indefinitely. 3. Any other non-200 response to the enroll request is treated as before (limited # of retries, with backoff) # Checklist for submitter If some of the following don't apply, delete the relevant line. - [ ] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing- changes.md#changes-files) for more information. Will add changelog when story is one. ## Testing - [X] Added/updated automated tests Added test for new retry logic - [X] QA'd all new/changed functionality manually This is kinda hard to test without the associated backend PR: https://github.com/fleetdm/fleet/pull/34835 ## fleetd/orbit/Fleet Desktop - [X] Verified compatibility with the latest released version of Fleet (see [Must rule](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/workflows/fleetd-development-and-release-strategy.md)) This is compatible with all Fleet versions, since older ones won't send the new error. - [X] If the change applies to only one platform, confirmed that `runtime.GOOS` is used as needed to isolate changes This is compatible with all platforms, although it currently should only ever run on Windows and Linux since macOS devices will have end-user auth taken care of before they even download Orbit. - [ ] Verified that fleetd runs on macOS, Linux and Windows Testing this now. - [ ] Verified auto-update works from the released version of component to the new version (see [tools/tuf/test](../tools/tuf/test/README.md)) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added SSO (Single Sign-On) enrollment support for end-user authentication * Enhanced error messaging for authentication-required scenarios * **Bug Fixes** * Improved error handling and retry logic for enrollment failures <!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-11-03 22:41:57 +00:00
// Do not retry if the endpoint does not exist.
return retry.ErrorOutcomeDoNotRetry
case errors.Is(err, ErrEndUserAuthRequired):
// If we get an ErrEndUserAuthRequired error, then the user
// needs to authenticate with the identity provider.
//
// Open a browser window to the sign-on page and
// then keep retrying until they authenticate.
log.Debug().Msg("enroll unauthenticated, waiting for end-user to authenticate via SSO")
if !oc.initiatedIdpAuth {
if oc.openSSOWindow == nil {
log.Error().Msg("SSO window open function not set")
return retry.ErrorOutcomeNormalRetry
}
log.Debug().Msg("opening SSO window")
openWindowErr := oc.openSSOWindow()
if openWindowErr != nil {
log.Error().Err(openWindowErr).Msg("opening SSO window")
return retry.ErrorOutcomeNormalRetry
}
oc.initiatedIdpAuth = true
}
// Sleep for 20 seconds, making the total retry interval 30 seconds
time.Sleep(20 * time.Second)
return retry.ErrorOutcomeResetAttempts
default:
logging.LogErrIfEnvNotSet(constant.SilenceEnrollLogErrorEnvVar, err, "enroll failed, retrying")
return retry.ErrorOutcomeNormalRetry
}
}),
); err != nil {
if IsNotFoundErr(err) {
End-user authentication for Window/Linux setup experience: agent (#34847) <!-- Add the related story/sub-task/bug number, like Resolves #123, or remove if NA --> **Related issue:** Resolves #34528 # Details This PR implements the agent changes for allowing Fleet admins to require that users authenticate with an IdP prior to having their devices set up. I'll comment on changes inline but the high-level is: 1. Orbit calls the enroll endpoint as usual. This is triggered lazily by any one of a number of subsystems like device token rotation or requesting Fleet config 2. If the enroll endpoint returns the new `ErrEndUserAuthRequired` response, then it opens a window to the `/mdm/sso` Fleet page and retries the enroll endpoint every 30 seconds indefinitely. 3. Any other non-200 response to the enroll request is treated as before (limited # of retries, with backoff) # Checklist for submitter If some of the following don't apply, delete the relevant line. - [ ] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing- changes.md#changes-files) for more information. Will add changelog when story is one. ## Testing - [X] Added/updated automated tests Added test for new retry logic - [X] QA'd all new/changed functionality manually This is kinda hard to test without the associated backend PR: https://github.com/fleetdm/fleet/pull/34835 ## fleetd/orbit/Fleet Desktop - [X] Verified compatibility with the latest released version of Fleet (see [Must rule](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/workflows/fleetd-development-and-release-strategy.md)) This is compatible with all Fleet versions, since older ones won't send the new error. - [X] If the change applies to only one platform, confirmed that `runtime.GOOS` is used as needed to isolate changes This is compatible with all platforms, although it currently should only ever run on Windows and Linux since macOS devices will have end-user auth taken care of before they even download Orbit. - [ ] Verified that fleetd runs on macOS, Linux and Windows Testing this now. - [ ] Verified auto-update works from the released version of component to the new version (see [tools/tuf/test](../tools/tuf/test/README.md)) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added SSO (Single Sign-On) enrollment support for end-user authentication * Enhanced error messaging for authentication-required scenarios * **Bug Fixes** * Improved error handling and retry logic for enrollment failures <!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-11-03 22:41:57 +00:00
return "", errors.New("enroll endpoint does not exist")
}
return "", fmt.Errorf("orbit node key enroll failed, attempts=%d", constant.OrbitEnrollMaxRetries)
}
return orbitNodeKey_, nil
}
// GetNodeKey gets the orbit node key from file.
func (oc *OrbitClient) GetNodeKey() (string, error) {
orbitNodeKey, err := os.ReadFile(oc.nodeKeyFilePath)
if err != nil {
return "", err
}
return string(orbitNodeKey), nil
}
func (oc *OrbitClient) enrollAndWriteNodeKeyFile() (string, error) {
orbitNodeKey, err := oc.enroll()
if err != nil {
return "", fmt.Errorf("enroll request: %w", err)
}
if runtime.GOOS == "windows" {
// creating the secret file with empty content
if err := os.WriteFile(oc.nodeKeyFilePath, nil, constant.DefaultFileMode); err != nil {
return "", fmt.Errorf("create orbit node key file: %w", err)
}
// restricting file access
if err := platform.ChmodRestrictFile(oc.nodeKeyFilePath); err != nil {
return "", fmt.Errorf("apply ACLs: %w", err)
}
}
// writing raw key material to the acl-ready secret file
if err := os.WriteFile(oc.nodeKeyFilePath, []byte(orbitNodeKey), constant.DefaultFileMode); err != nil {
return "", fmt.Errorf("write orbit node key file: %w", err)
}
return orbitNodeKey, nil
}
func (oc *OrbitClient) authenticatedRequest(verb string, path string, params any, resp any) error {
nodeKey, err := oc.getNodeKeyOrEnroll()
if err != nil {
return err
}
s := params.(fleet.SetOrbitNodeKeyer)
s.SetOrbitNodeKey(nodeKey)
err = oc.request(verb, path, params, resp)
switch {
case err == nil:
oc.setEnrolled(true)
return nil
case errors.Is(err, ErrUnauthenticated):
if err := os.Remove(oc.nodeKeyFilePath); err != nil {
log.Info().Err(err).Msg("remove orbit node key")
}
oc.setEnrolled(false)
fleetd generate TPM key and issue SCEP certificate (#30932) #30461 This PR contains the changes for the happy path. On a separate PR we will be adding tests and further fixes for edge cases. - [X] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files) for more information. - [ ] Added/updated automated tests - [x] Manual QA for all new/changed functionality - For Orbit and Fleet Desktop changes: - [ ] Make sure fleetd is compatible with the latest released version of Fleet (see [Must rule](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/workflows/fleetd-development-and-release-strategy.md)). - [ ] Orbit runs on macOS, Linux and Windows. Check if the orbit feature/bugfix should only apply to one platform (`runtime.GOOS`). - [ ] Manual QA must be performed in the three main OSs, macOS, Windows and Linux. - [ ] Auto-update manual QA, from released version of component to new version (see [tools/tuf/test](../tools/tuf/test/README.md)). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added support for using a TPM-backed key and SCEP-issued certificate to sign HTTP requests, enhancing security through hardware-based key management. * Introduced new CLI and environment flags to enable TPM-backed client certificates for Linux packages and Orbit. * Added a local HTTPS proxy that automatically signs requests using the TPM-backed key. * **Bug Fixes** * Improved cleanup and restart behavior when authentication fails with a host identity certificate. * **Tests** * Added comprehensive tests for SCEP client functionality and TPM integration. * **Chores** * Updated scripts and documentation to support TPM-backed client certificate packaging and configuration. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-07-18 14:31:52 +00:00
if oc.hostIdentityCertPath != "" {
if err := os.Remove(oc.hostIdentityCertPath); err != nil {
log.Info().Err(err).Msg("remove orbit host identity cert")
}
log.Info().Msg("removed orbit host identity cert, triggering a restart")
oc.receiverUpdateCancelFunc()
}
return err
default:
return err
}
}
func (oc *OrbitClient) Enrolled() bool {
oc.enrolledMu.Lock()
defer oc.enrolledMu.Unlock()
return oc.enrolled
}
func (oc *OrbitClient) setEnrolled(v bool) {
oc.enrolledMu.Lock()
defer oc.enrolledMu.Unlock()
oc.enrolled = v
}
func (oc *OrbitClient) LastRecordedError() error {
oc.lastRecordedErrMu.Lock()
defer oc.lastRecordedErrMu.Unlock()
return oc.lastRecordedErr
}
func (oc *OrbitClient) setLastRecordedError(err error) {
oc.lastRecordedErrMu.Lock()
defer oc.lastRecordedErrMu.Unlock()
oc.lastRecordedErr = fmt.Errorf("%s: %w", time.Now().UTC().Format("2006-01-02T15:04:05Z"), err)
}
func orbitEnrollRetryInterval() time.Duration {
interval := os.Getenv("FLEETD_ENROLL_RETRY_INTERVAL")
if interval != "" {
d, err := time.ParseDuration(interval)
if err == nil {
return d
}
}
return constant.OrbitEnrollRetrySleep
}
// SetOrUpdateDiskEncryptionKey sends a request to the server to set or update the disk encryption keys.
func (oc *OrbitClient) SetOrUpdateDiskEncryptionKey(diskEncryptionStatus fleet.OrbitHostDiskEncryptionKeyPayload) error {
verb, path := "POST", "/api/fleet/orbit/disk_encryption_key"
var resp fleet.OrbitPostDiskEncryptionKeyResponse
if err := oc.authenticatedRequest(verb, path, &fleet.OrbitPostDiskEncryptionKeyRequest{
EncryptionKey: diskEncryptionStatus.EncryptionKey,
ClientError: diskEncryptionStatus.ClientError,
}, &resp); err != nil {
return err
}
return nil
}
const httpTraceTimeFormat = "2006-01-02T15:04:05Z"
var testStdoutHTTPTracer = &httptrace.ClientTrace{
ConnectStart: func(network, addr string) {
fmt.Printf(
"httptrace: %s: ConnectStart: %s, %s\n",
time.Now().UTC().Format(httpTraceTimeFormat), network, addr,
)
},
ConnectDone: func(network, addr string, err error) {
fmt.Printf(
"httptrace: %s: ConnectDone: %s, %s, err='%s'\n",
time.Now().UTC().Format(httpTraceTimeFormat), network, addr, err,
)
},
}
// GetSetupExperienceStatus checks the status of the setup experience for this host.
Stop setup experience on software install failure (#34173) <!-- Add the related story/sub-task/bug number, like Resolves #123, or remove if NA --> **Related issue:** Resolves #33173 **Related issue:** Resolves #33111 # Details This is the remaining work to implement the "Stop the setup experience when required software fails to install" feature. This didn't turn out to be quite as straightforward as expected so I ended up doing a bit of design-by-code and expect some feedback on the approach. I tried to make it as low-touch as possible. The general design is: 1. In the `maybeUpdateSetupExperienceStatus` function which is called in various places when a setup experience step is marked as completed, call a new `maybeCancelPendingSetupExperienceSteps` function if the setup step was marked as failed. Similarly call `maybeCancelPendingSetupExperienceSteps` if a VPP app install fails to enqueue. 2. In `maybeCancelPendingSetupExperienceSteps`, check whether the specified host is MacOS and whether the "RequireAllSoftwareMacOS" flag is set in the team (or global) config. If so, mark the remaining setup experience items as canceled and cancel any upcoming activities related to those steps. 3. On the front-end, if the `require_all_software_macos` is set and a software step is marked as failed, show a new failure page indicating that setup has failed and showing details of the failed software. 4. On the agent side, when checking setup experience status, send a `reset_after_failure` flag _only the first time_. If this flag is set, then the code in the `/orbit/setup_experience/status` handler will clear and re-queue any failed setup experience steps (but leave successful steps to avoid re-installing already-installed software). This facilitates re-starting the setup experience when the host is rebooted. I also updated the way that software (packages and VPP) is queued up for the setup experience to be ordered alphabetically, to make it easier to test _and_ because this is a desired outcome for a future story. Since the order is not deterministic now, this update shouldn't cause any problems (aside from a couple of test updates), but I'm ok taking it out if desired. # Checklist for submitter If some of the following don't apply, delete the relevant line. - [X] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files) for more information. - [X] Input data is properly validated, `SELECT *` is avoided, SQL injection is prevented (using placeholders for values in statements) ## Testing - [X] Added/updated automated tests * Added a new integration test for software packages, testing that a failed software package causes the rest of the setup experience to be marked as failed when `require_all_software_macos` is set, and testing that the "reset after failure" code works. * Added a new integration test for VPP packages, testing that a failed VPP enqueue causes the same halting of the setup experience. I _don't_ have test for a failure _during_ a VPP install. It should go through the same code path as the software package failure, so it's not a huge gap. - [ ] QA'd all new/changed functionality manually Working on it ## fleetd/orbit/Fleet Desktop - [X] Verified compatibility with the latest released version of Fleet (see [Must rule](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/workflows/fleetd-development-and-release-strategy.md)) - [X] If the change applies to only one platform, confirmed that `runtime.GOOS` is used as needed to isolate changes - [X] Verified that fleetd runs on macOS, Linux and Windows <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Configurable option to halt macOS device setup if any software install fails. - Device setup page now shows a clear “Device setup failed” state with expandable error details when all software is required on macOS. - Improvements - Setup status now includes per-step error messages for better troubleshooting. - Pending setup steps are automatically canceled after a failure when applicable, with support to reset and retry the setup flow as configured. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Ian Littman <iansltx@gmail.com>
2025-10-17 13:38:53 +00:00
func (oc *OrbitClient) GetSetupExperienceStatus(resetFailedSetupSteps bool) (*fleet.SetupExperienceStatusPayload, error) {
verb, path := "POST", "/api/fleet/orbit/setup_experience/status"
var resp fleet.GetOrbitSetupExperienceStatusResponse
err := oc.authenticatedRequest(verb, path, &fleet.GetOrbitSetupExperienceStatusRequest{
Stop setup experience on software install failure (#34173) <!-- Add the related story/sub-task/bug number, like Resolves #123, or remove if NA --> **Related issue:** Resolves #33173 **Related issue:** Resolves #33111 # Details This is the remaining work to implement the "Stop the setup experience when required software fails to install" feature. This didn't turn out to be quite as straightforward as expected so I ended up doing a bit of design-by-code and expect some feedback on the approach. I tried to make it as low-touch as possible. The general design is: 1. In the `maybeUpdateSetupExperienceStatus` function which is called in various places when a setup experience step is marked as completed, call a new `maybeCancelPendingSetupExperienceSteps` function if the setup step was marked as failed. Similarly call `maybeCancelPendingSetupExperienceSteps` if a VPP app install fails to enqueue. 2. In `maybeCancelPendingSetupExperienceSteps`, check whether the specified host is MacOS and whether the "RequireAllSoftwareMacOS" flag is set in the team (or global) config. If so, mark the remaining setup experience items as canceled and cancel any upcoming activities related to those steps. 3. On the front-end, if the `require_all_software_macos` is set and a software step is marked as failed, show a new failure page indicating that setup has failed and showing details of the failed software. 4. On the agent side, when checking setup experience status, send a `reset_after_failure` flag _only the first time_. If this flag is set, then the code in the `/orbit/setup_experience/status` handler will clear and re-queue any failed setup experience steps (but leave successful steps to avoid re-installing already-installed software). This facilitates re-starting the setup experience when the host is rebooted. I also updated the way that software (packages and VPP) is queued up for the setup experience to be ordered alphabetically, to make it easier to test _and_ because this is a desired outcome for a future story. Since the order is not deterministic now, this update shouldn't cause any problems (aside from a couple of test updates), but I'm ok taking it out if desired. # Checklist for submitter If some of the following don't apply, delete the relevant line. - [X] Changes file added for user-visible changes in `changes/`, `orbit/changes/` or `ee/fleetd-chrome/changes`. See [Changes files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files) for more information. - [X] Input data is properly validated, `SELECT *` is avoided, SQL injection is prevented (using placeholders for values in statements) ## Testing - [X] Added/updated automated tests * Added a new integration test for software packages, testing that a failed software package causes the rest of the setup experience to be marked as failed when `require_all_software_macos` is set, and testing that the "reset after failure" code works. * Added a new integration test for VPP packages, testing that a failed VPP enqueue causes the same halting of the setup experience. I _don't_ have test for a failure _during_ a VPP install. It should go through the same code path as the software package failure, so it's not a huge gap. - [ ] QA'd all new/changed functionality manually Working on it ## fleetd/orbit/Fleet Desktop - [X] Verified compatibility with the latest released version of Fleet (see [Must rule](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/workflows/fleetd-development-and-release-strategy.md)) - [X] If the change applies to only one platform, confirmed that `runtime.GOOS` is used as needed to isolate changes - [X] Verified that fleetd runs on macOS, Linux and Windows <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Configurable option to halt macOS device setup if any software install fails. - Device setup page now shows a clear “Device setup failed” state with expandable error details when all software is required on macOS. - Improvements - Setup status now includes per-step error messages for better troubleshooting. - Pending setup steps are automatically canceled after a failure when applicable, with support to reset and retry the setup flow as configured. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Ian Littman <iansltx@gmail.com>
2025-10-17 13:38:53 +00:00
ResetFailedSetupSteps: resetFailedSetupSteps,
}, &resp)
if err != nil {
return nil, err
}
return resp.Results, nil
}
2024-11-21 16:31:03 +00:00
func (oc *OrbitClient) SendLinuxKeyEscrowResponse(lr luks.LuksResponse) error {
verb, path := "POST", "/api/fleet/orbit/luks_data"
var resp fleet.OrbitPostLUKSResponse
if err := oc.authenticatedRequest(verb, path, &fleet.OrbitPostLUKSRequest{
2024-11-21 16:31:03 +00:00
Passphrase: lr.Passphrase,
KeySlot: lr.KeySlot,
Salt: lr.Salt,
ClientError: lr.Err,
}, &resp); err != nil {
return err
}
return nil
}
func (oc *OrbitClient) InitiateSetupExperience() (fleet.SetupExperienceInitResult, error) {
verb, path := "POST", "/api/fleet/orbit/setup_experience/init"
var resp fleet.OrbitSetupExperienceInitResponse
if err := oc.authenticatedRequest(verb, path, &fleet.OrbitSetupExperienceInitRequest{}, &resp); err != nil {
return fleet.SetupExperienceInitResult{}, err
}
return resp.Result, nil
}