// fleet/server/service/orbit_client.go


package service

import (
	"bytes"
	"context"
	"crypto/tls"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"io/fs"
	"mime"
	"net"
	"net/http"
	"net/http/httptrace"
	"net/url"
	"os"
	"path/filepath"
	"runtime"
	"sync"
	"time"

	"github.com/fleetdm/fleet/v4/orbit/pkg/constant"
	"github.com/fleetdm/fleet/v4/orbit/pkg/logging"
	"github.com/fleetdm/fleet/v4/orbit/pkg/luks"
	"github.com/fleetdm/fleet/v4/orbit/pkg/platform"
	"github.com/fleetdm/fleet/v4/pkg/retry"
	"github.com/fleetdm/fleet/v4/server/fleet"
	"github.com/fleetdm/fleet/v4/server/service/contract"
	"github.com/rs/zerolog/log"
)

// OrbitClient exposes the Orbit API to communicate with the Fleet server.
type OrbitClient struct {
	*baseClient
	nodeKeyFilePath string
	enrollSecret    string
	hostInfo        fleet.OrbitHostInfo

	enrolledMu sync.Mutex
	enrolled   bool

	lastRecordedErrMu sync.Mutex
	lastRecordedErr   error

	configCache                 configCache
	onGetConfigErrFns           *OnGetConfigErrFuncs
	lastNetErrOnGetConfigLogged time.Time

	lastIdleConnectionsCleanupMu sync.Mutex
	lastIdleConnectionsCleanup   time.Time

	// TestNodeKey is used for testing only.
	TestNodeKey string

	// Interfaces that will receive updated configs.
	ConfigReceivers []fleet.OrbitConfigReceiver
	// How frequently a new config will be fetched.
	ReceiverUpdateInterval time.Duration
	// receiverUpdateContext is used by ExecuteConfigReceivers to cancel the update loop.
	receiverUpdateContext context.Context
	// receiverUpdateCancelFunc is used to cancel receiverUpdateContext.
	receiverUpdateCancelFunc context.CancelFunc

	// hostIdentityCertPath is the file path to the host identity certificate issued using SCEP.
	//
	// If set, the certificate is deleted on HTTP 401 errors from Fleet, and
	// ExecuteConfigReceivers terminates to trigger a restart.
	hostIdentityCertPath string
}

// time-to-live for config cache
const configCacheTTL = 3 * time.Second

type configCache struct {
	mu          sync.Mutex
	lastUpdated time.Time
	config      *fleet.OrbitConfig
	err         error
}

func (oc *OrbitClient) request(verb string, path string, params interface{}, resp interface{}) error {
	return oc.requestWithExternal(verb, path, params, resp, false)
}

// requestWithExternal is used to make requests to Fleet or to external URLs. If external is
// true, pathOrURL is used as the full request URL; in that case no request body is sent and
// the client capabilities header is not set.
func (oc *OrbitClient) requestWithExternal(verb string, pathOrURL string, params interface{}, resp interface{}, external bool) error {
	var bodyBytes []byte
	var err error
	if params != nil {
		bodyBytes, err = json.Marshal(params)
		if err != nil {
			return fmt.Errorf("making request json marshalling: %w", err)
		}
	}

	oc.closeIdleConnections()

	ctx := context.Background()
	if os.Getenv("FLEETD_TEST_HTTPTRACE") == "1" {
		ctx = httptrace.WithClientTrace(ctx, testStdoutHTTPTracer)
	}

	var request *http.Request
	if external {
		request, err = http.NewRequestWithContext(
			ctx,
			verb,
			pathOrURL,
			nil,
		)
		if err != nil {
			return err
		}
	} else {
		parsedURL, err := url.Parse(pathOrURL)
		if err != nil {
			return fmt.Errorf("parsing URL: %w", err)
		}
		request, err = http.NewRequestWithContext(
			ctx,
			verb,
			oc.url(parsedURL.Path, parsedURL.RawQuery).String(),
			bytes.NewBuffer(bodyBytes),
		)
		if err != nil {
			return err
		}
		oc.setClientCapabilitiesHeader(request)
	}

	response, err := oc.http.Do(request)
	if err != nil {
		oc.setLastRecordedError(err)
		return fmt.Errorf("%s %s: %w", verb, pathOrURL, err)
	}
	defer response.Body.Close()

	if err := oc.parseResponse(verb, pathOrURL, response, resp); err != nil {
		oc.setLastRecordedError(err)
		return err
	}

	return nil
}
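For Fleet requests, `requestWithExternal` runs `pathOrURL` through `url.Parse`, so a path that carries a query string (for example `/api/fleet/orbit/config?x=1`) is split into `Path` and `RawQuery` before `oc.url` joins it to the server address. `baseClient.url` is not shown here, so the helper below, `joinServerURL`, is a hypothetical sketch of that split-and-join step that uses only `net/url`.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// joinServerURL is a hypothetical helper: it parses pathOrURL, keeps only its
// path and raw query, and attaches them to the Fleet server address.
func joinServerURL(serverAddr, pathOrURL string) (string, error) {
	base, err := url.Parse(serverAddr)
	if err != nil {
		return "", fmt.Errorf("parsing server address: %w", err)
	}
	parsed, err := url.Parse(pathOrURL)
	if err != nil {
		return "", fmt.Errorf("parsing URL: %w", err)
	}
	u := *base // copy so the parsed server address is not mutated
	u.Path = strings.TrimSuffix(base.Path, "/") + "/" + strings.TrimPrefix(parsed.Path, "/")
	u.RawQuery = parsed.RawQuery
	return u.String(), nil
}
```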

// OnGetConfigErrFuncs defines functions to be executed on GetConfig errors.
type OnGetConfigErrFuncs struct {
	// OnNetErrFunc receives network and 5XX errors on GetConfig requests.
	// These errors are rate limited to once every 5 minutes.
	OnNetErrFunc func(err error)
	// DebugErrFunc receives all errors on GetConfig requests.
	DebugErrFunc func(err error)
}

var (
	netErrInterval                     = 5 * time.Minute
	configRetryOnNetworkError          = 30 * time.Second
	defaultOrbitConfigReceiverInterval = 30 * time.Second
)

// NewOrbitClient creates a new OrbitClient.
//
// - rootDir is Orbit's root directory, where the Orbit node key is loaded from and stored to.
// - addr is the address of the Fleet server.
// - orbitHostInfo is the host system information used for enrolling in Fleet.
// - onGetConfigErrFns can be used to handle errors in the GetConfig request.
func NewOrbitClient(
	rootDir string,
	addr string,
	rootCA string,
	insecureSkipVerify bool,
	enrollSecret string,
	fleetClientCert *tls.Certificate,
	orbitHostInfo fleet.OrbitHostInfo,
	onGetConfigErrFns *OnGetConfigErrFuncs,
	httpSignerWrapper func(*http.Client) *http.Client,
	hostIdentityCertPath string,
) (*OrbitClient, error) {
	orbitCapabilities := fleet.GetOrbitClientCapabilities()
bc, err := newBaseClient(addr, insecureSkipVerify, rootCA, "", fleetClientCert, orbitCapabilities, httpSignerWrapper)
if err != nil {
return nil, err
}
nodeKeyFilePath := filepath.Join(rootDir, constant.OrbitNodeKeyFileName)
ctx, cancelFunc := context.WithCancel(context.Background())
return &OrbitClient{
nodeKeyFilePath: nodeKeyFilePath,
baseClient: bc,
enrollSecret: enrollSecret,
hostInfo: orbitHostInfo,
enrolled: false,
onGetConfigErrFns: onGetConfigErrFns,
lastIdleConnectionsCleanup: time.Now(),
ReceiverUpdateInterval: defaultOrbitConfigReceiverInterval,
receiverUpdateContext: ctx,
receiverUpdateCancelFunc: cancelFunc,
hostIdentityCertPath: hostIdentityCertPath,
}, nil
}
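// TriggerOrbitRestart triggers an orbit process restart by cancelling the receiver
// context, which makes ExecuteConfigReceivers return and RestartTriggered report true.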
func (oc *OrbitClient) TriggerOrbitRestart(reason string) {
log.Info().Msgf("orbit restart triggered: %s", reason)
oc.receiverUpdateCancelFunc()
}
// RestartTriggered returns true if any of the config receivers triggered an orbit restart.
func (oc *OrbitClient) RestartTriggered() bool {
select {
case <-oc.receiverUpdateContext.Done():
return true
default:
return false
}
}
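/*
A minimal sketch of the restart signal behind TriggerOrbitRestart and RestartTriggered, using only the standard library: cancelling a context works as a one-way, idempotent flag, and a select with a default case checks it without blocking. The restartSignal type is hypothetical and stands in for the receiverUpdateContext and receiverUpdateCancelFunc fields.

```go
package main

import (
	"context"
	"fmt"
)

// restartSignal is a hypothetical stand-in for the context/cancel pair
// held by OrbitClient.
type restartSignal struct {
	ctx    context.Context
	cancel context.CancelFunc
}

func newRestartSignal() *restartSignal {
	ctx, cancel := context.WithCancel(context.Background())
	return &restartSignal{ctx: ctx, cancel: cancel}
}

// Trigger cancels the context. A context.CancelFunc may be called more than
// once; calls after the first do nothing.
func (r *restartSignal) Trigger() { r.cancel() }

// Triggered checks the Done channel without blocking.
func (r *restartSignal) Triggered() bool {
	select {
	case <-r.ctx.Done():
		return true
	default:
		return false
	}
}

func main() {
	r := newRestartSignal()
	fmt.Println(r.Triggered()) // false
	r.Trigger()
	r.Trigger() // second call is a no-op
	fmt.Println(r.Triggered()) // true
}
```

Because cancellation is idempotent, several receivers can request a restart without coordinating with each other.
*/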
// closeIdleConnections attempts to close idle connections from the pool
// at most once every 55 minutes.
//
// Some load balancers (e.g. AWS ELB) enforce a maximum lifetime for a connection
// (whether or not it is active) and will forcefully close it, causing errors in
// the client (e.g. https://github.com/fleetdm/fleet/issues/18783).
// To prevent these errors, we attempt to clean up idle connections every 55
// minutes so that connections do not grow too old. (AWS ELB's default maximum
// lifetime for a connection is 3600 seconds.)
func (oc *OrbitClient) closeIdleConnections() {
oc.lastIdleConnectionsCleanupMu.Lock()
defer oc.lastIdleConnectionsCleanupMu.Unlock()
if time.Since(oc.lastIdleConnectionsCleanup) < 55*time.Minute {
return
}
oc.lastIdleConnectionsCleanup = time.Now()
c, ok := oc.baseClient.http.(*http.Client)
if !ok {
return
}
t, ok := c.Transport.(*http.Transport)
if !ok {
return
}
t.CloseIdleConnections()
}
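// RunConfigReceivers fetches the orbit config (served from cache when fresh) and
// passes it to every registered receiver, each in its own goroutine. It waits for
// all receivers to finish, converts receiver panics into errors, and returns the
// joined errors, or nil if every receiver succeeded.
func (oc *OrbitClient) RunConfigReceivers() error {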
config, err := oc.GetConfig()
if err != nil {
return fmt.Errorf("RunConfigReceivers get config: %w", err)
}
var errs []error
var errMu sync.Mutex
var wg sync.WaitGroup
wg.Add(len(oc.ConfigReceivers))
for _, receiver := range oc.ConfigReceivers {
receiver := receiver
go func() {
defer func() {
if err := recover(); err != nil {
errMu.Lock()
errs = append(errs, fmt.Errorf("panic occurred in receiver: %v", err))
errMu.Unlock()
}
wg.Done()
}()
err := receiver.Run(config)
if err != nil {
errMu.Lock()
errs = append(errs, err)
errMu.Unlock()
}
}()
}
wg.Wait()
if len(errs) != 0 {
return errors.Join(errs...)
}
return nil
}
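// RegisterConfigReceiver adds cr to the receivers run by RunConfigReceivers.
// The slice is appended to without locking, so receivers should be registered
// before ExecuteConfigReceivers starts.
func (oc *OrbitClient) RegisterConfigReceiver(cr fleet.OrbitConfigReceiver) {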
oc.ConfigReceivers = append(oc.ConfigReceivers, cr)
}
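// ExecuteConfigReceivers calls RunConfigReceivers every ReceiverUpdateInterval
// until the receiver context is cancelled (by TriggerOrbitRestart,
// InterruptConfigReceivers, or an authentication failure that removes the host
// identity certificate). Receiver errors are logged and do not stop the loop.
func (oc *OrbitClient) ExecuteConfigReceivers() error {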
ticker := time.NewTicker(oc.ReceiverUpdateInterval)
defer ticker.Stop()
for {
select {
case <-oc.receiverUpdateContext.Done():
return nil
case <-ticker.C:
if err := oc.RunConfigReceivers(); err != nil {
log.Error().Err(err).Msg("running config receivers")
}
}
}
}
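// InterruptConfigReceivers cancels the receiver context, which makes
// ExecuteConfigReceivers return. The err argument is not used.
func (oc *OrbitClient) InterruptConfigReceivers(err error) {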
oc.receiverUpdateCancelFunc()
}
// GetConfig returns the Orbit config fetched from the Fleet server for this instance of OrbitClient.
// Since this method is called in multiple places, the result (including any error) is cached for
// configCacheTTL to reduce traffic to the Fleet server.
// On network errors and 5XX responses, it keeps retrying the request every 30 seconds; the cache
// mutex is held meanwhile, so concurrent callers wait for the outcome.
func (oc *OrbitClient) GetConfig() (*fleet.OrbitConfig, error) {
oc.configCache.mu.Lock()
defer oc.configCache.mu.Unlock()
// If time-to-live passed, we update the config cache
now := time.Now()
if now.After(oc.configCache.lastUpdated.Add(configCacheTTL)) {
verb, path := "POST", "/api/fleet/orbit/config"
var (
resp fleet.OrbitConfig
err error
)
// Retry until we don't get a network error or a 5XX error.
_ = retry.Do(func() error {
err = oc.authenticatedRequest(verb, path, &orbitGetConfigRequest{}, &resp)
var (
netErr net.Error
statusCodeErr *statusCodeErr
)
if err != nil && oc.onGetConfigErrFns != nil && oc.onGetConfigErrFns.DebugErrFunc != nil {
oc.onGetConfigErrFns.DebugErrFunc(err)
}
if errors.As(err, &netErr) || (errors.As(err, &statusCodeErr) && statusCodeErr.code >= 500) {
now := time.Now()
if oc.onGetConfigErrFns != nil && oc.onGetConfigErrFns.OnNetErrFunc != nil && now.After(oc.lastNetErrOnGetConfigLogged.Add(netErrInterval)) {
oc.onGetConfigErrFns.OnNetErrFunc(err)
oc.lastNetErrOnGetConfigLogged = now
}
return err // retry on network or server 5XX errors
}
return nil
}, retry.WithInterval(configRetryOnNetworkError))
oc.configCache.config = &resp
oc.configCache.err = err
oc.configCache.lastUpdated = now
}
return oc.configCache.config, oc.configCache.err
}
// SetOrUpdateDeviceToken sends a request to the server to set or update the
// device token with the given value.
func (oc *OrbitClient) SetOrUpdateDeviceToken(deviceAuthToken string) error {
verb, path := "POST", "/api/fleet/orbit/device_token"
params := setOrUpdateDeviceTokenRequest{
DeviceAuthToken: deviceAuthToken,
}
var resp setOrUpdateDeviceTokenResponse
if err := oc.authenticatedRequest(verb, path, &params, &resp); err != nil {
return err
}
return nil
}
// SetOrUpdateDeviceMappingEmail sends a request to the server to set or update the
// device mapping email with the given value.
func (oc *OrbitClient) SetOrUpdateDeviceMappingEmail(email string) error {
verb, path := "PUT", "/api/fleet/orbit/device_mapping"
params := orbitPutDeviceMappingRequest{
Email: email,
}
var resp orbitPutDeviceMappingResponse
if err := oc.authenticatedRequest(verb, path, &params, &resp); err != nil {
return err
}
return nil
}
// GetHostScript returns the script fetched from Fleet server to run on this
// host.
func (oc *OrbitClient) GetHostScript(execID string) (*fleet.HostScriptResult, error) {
verb, path := "POST", "/api/fleet/orbit/scripts/request"
var resp orbitGetScriptResponse
if err := oc.authenticatedRequest(verb, path, &orbitGetScriptRequest{
ExecutionID: execID,
}, &resp); err != nil {
return nil, err
}
return resp.HostScriptResult, nil
}
// SaveHostScriptResult saves the result of running the script on this host.
func (oc *OrbitClient) SaveHostScriptResult(result *fleet.HostScriptResultPayload) error {
verb, path := "POST", "/api/fleet/orbit/scripts/result"
var resp orbitPostScriptResultResponse
if err := oc.authenticatedRequest(verb, path, &orbitPostScriptResultRequest{
HostScriptResultPayload: result,
}, &resp); err != nil {
return err
}
return nil
}
// GetInstallerDetails returns the details of the software install identified by
// installID, fetched from the Fleet server.
func (oc *OrbitClient) GetInstallerDetails(installID string) (*fleet.SoftwareInstallDetails, error) {
verb, path := "POST", "/api/fleet/orbit/software_install/details"
var resp orbitGetSoftwareInstallResponse
if err := oc.authenticatedRequest(verb, path, &orbitGetSoftwareInstallRequest{
InstallUUID: installID,
}, &resp); err != nil {
return nil, err
}
return resp.SoftwareInstallDetails, nil
}
// SaveInstallerResult reports the result of a software install on this host to the Fleet server.
func (oc *OrbitClient) SaveInstallerResult(payload *fleet.HostSoftwareInstallResultPayload) error {
verb, path := "POST", "/api/fleet/orbit/software_install/result"
var resp orbitPostSoftwareInstallResultResponse
if err := oc.authenticatedRequest(verb, path, &orbitPostSoftwareInstallResultRequest{
HostSoftwareInstallResultPayload: payload,
}, &resp); err != nil {
return err
}
return nil
}
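// DownloadSoftwareInstaller downloads the installer identified by installerID from the
// Fleet server into downloadDirectory and returns the path of the saved file.
// progressFunc is forwarded to FileResponse for progress reporting.
func (oc *OrbitClient) DownloadSoftwareInstaller(installerID uint, downloadDirectory string, progressFunc func(n int)) (string, error) {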
verb, path := "POST", "/api/fleet/orbit/software_install/package?alt=media"
resp := FileResponse{
DestPath: downloadDirectory,
ProgressFunc: progressFunc,
}
if err := oc.authenticatedRequest(verb, path, &orbitDownloadSoftwareInstallerRequest{
InstallerID: installerID,
}, &resp); err != nil {
return "", err
}
return resp.GetFilePath(), nil
}
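// DownloadSoftwareInstallerFromURL downloads an installer from an external URL (rather than
// the Fleet server) into downloadDirectory as filename and returns the path of the saved file.
func (oc *OrbitClient) DownloadSoftwareInstallerFromURL(url string, filename string, downloadDirectory string, progressFunc func(int)) (string, error) {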
resp := FileResponse{
DestPath: downloadDirectory,
DestFile: filename,
SkipMediaType: true,
ProgressFunc: progressFunc,
}
if err := oc.requestWithExternal("GET", url, nil, &resp, true); err != nil {
return "", err
}
return resp.GetFilePath(), nil
}
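// NullFileResponse handles a file download response by validating its
// Content-Disposition header and discarding the body.
type NullFileResponse struct{}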
func (f *NullFileResponse) Handle(resp *http.Response) error {
_, _, err := mime.ParseMediaType(resp.Header.Get("Content-Disposition"))
if err != nil {
return fmt.Errorf("parsing media type from response header: %w", err)
}
_, err = io.Copy(io.Discard, resp.Body)
if err != nil {
return fmt.Errorf("copying from http stream to io.Discard: %w", err)
}
return nil
}
// DownloadAndDiscardSoftwareInstaller downloads the software installer and discards it.
// This method is used during load testing by osquery-perf.
func (oc *OrbitClient) DownloadAndDiscardSoftwareInstaller(installerID uint) error {
verb, path := "POST", "/api/fleet/orbit/software_install/package?alt=media"
resp := NullFileResponse{}
return oc.authenticatedRequest(verb, path, &orbitDownloadSoftwareInstallerRequest{
InstallerID: installerID,
}, &resp)
}
// Ping sends a ping request to the orbit/ping endpoint.
func (oc *OrbitClient) Ping() error {
verb, path := "HEAD", "/api/fleet/orbit/ping"
err := oc.request(verb, path, nil, nil)
if err == nil || errors.Is(err, notFoundErr{}) {
// notFound is ok, it means an old server without the capabilities header
return nil
}
return err
}
func (oc *OrbitClient) enroll() (string, error) {
verb, path := "POST", "/api/fleet/orbit/enroll"
params := contract.EnrollOrbitRequest{
EnrollSecret: oc.enrollSecret,
HardwareUUID: oc.hostInfo.HardwareUUID,
HardwareSerial: oc.hostInfo.HardwareSerial,
Hostname: oc.hostInfo.Hostname,
Platform: oc.hostInfo.Platform,
PlatformLike: oc.hostInfo.PlatformLike,
OsqueryIdentifier: oc.hostInfo.OsqueryIdentifier,
ComputerName: oc.hostInfo.ComputerName,
HardwareModel: oc.hostInfo.HardwareModel,
}
var resp EnrollOrbitResponse
err := oc.request(verb, path, params, &resp)
if err != nil {
return "", err
}
return resp.OrbitNodeKey, nil
}
// enrollLock protects the enrollment process in case multiple OrbitClients
// want to re-enroll at the same time.
var enrollLock sync.Mutex
// getNodeKeyOrEnroll returns the orbit node key stored on disk if the file exists
// (or TestNodeKey, when set); otherwise, it enrolls the host with Fleet and saves
// the new node key to disk.
func (oc *OrbitClient) getNodeKeyOrEnroll() (string, error) {
if oc.TestNodeKey != "" {
return oc.TestNodeKey, nil
}
enrollLock.Lock()
defer enrollLock.Unlock()
orbitNodeKey, err := os.ReadFile(oc.nodeKeyFilePath)
switch {
case err == nil:
return string(orbitNodeKey), nil
case errors.Is(err, fs.ErrNotExist):
// OK, if there's no orbit node key, proceed to enroll.
default:
return "", fmt.Errorf("read orbit node key file: %w", err)
}
var (
orbitNodeKey_ string
endpointDoesNotExist bool
)
if err := retry.Do(
func() error {
var err error
orbitNodeKey_, err = oc.enrollAndWriteNodeKeyFile()
switch {
case err == nil:
return nil
case errors.Is(err, notFoundErr{}):
// Do not retry if the endpoint does not exist.
endpointDoesNotExist = true
return nil
default:
logging.LogErrIfEnvNotSet(constant.SilenceEnrollLogErrorEnvVar, err, "enroll failed, retrying")
return err
}
},
// The below configuration means the following retry intervals (exponential backoff):
// 10s, 20s, 40s, 80s, 160s and then return the failure (max attempts = 6)
// thus executing no more than ~6 enroll request failures every ~5 minutes.
retry.WithInterval(orbitEnrollRetryInterval()),
retry.WithMaxAttempts(constant.OrbitEnrollMaxRetries),
retry.WithBackoffMultiplier(constant.OrbitEnrollBackoffMultiplier),
); err != nil {
return "", fmt.Errorf("orbit node key enroll failed, attempts=%d", constant.OrbitEnrollMaxRetries)
}
if endpointDoesNotExist {
return "", errors.New("enroll endpoint does not exist")
}
return orbitNodeKey_, nil
}
// GetNodeKey gets the orbit node key from file.
func (oc *OrbitClient) GetNodeKey() (string, error) {
orbitNodeKey, err := os.ReadFile(oc.nodeKeyFilePath)
if err != nil {
return "", err
}
return string(orbitNodeKey), nil
}
func (oc *OrbitClient) enrollAndWriteNodeKeyFile() (string, error) {
orbitNodeKey, err := oc.enroll()
if err != nil {
return "", fmt.Errorf("enroll request: %w", err)
}
if runtime.GOOS == "windows" {
// On Windows, create the secret file empty and restrict its ACLs before any key material is written.
if err := os.WriteFile(oc.nodeKeyFilePath, nil, constant.DefaultFileMode); err != nil {
return "", fmt.Errorf("create orbit node key file: %w", err)
}
// restricting file access
if err := platform.ChmodRestrictFile(oc.nodeKeyFilePath); err != nil {
return "", fmt.Errorf("apply ACLs: %w", err)
}
}
// writing raw key material to the acl-ready secret file
if err := os.WriteFile(oc.nodeKeyFilePath, []byte(orbitNodeKey), constant.DefaultFileMode); err != nil {
return "", fmt.Errorf("write orbit node key file: %w", err)
}
return orbitNodeKey, nil
}
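// authenticatedRequest sends a request authenticated with the orbit node key, enrolling first
// if no key is stored. params must implement setOrbitNodeKeyer (the type assertion panics
// otherwise). On ErrUnauthenticated, it removes the stored node key so the next call re-enrolls;
// if a host identity certificate is configured, it also removes the certificate and triggers an
// orbit restart.
func (oc *OrbitClient) authenticatedRequest(verb string, path string, params interface{}, resp interface{}) error {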
nodeKey, err := oc.getNodeKeyOrEnroll()
if err != nil {
return err
}
s := params.(setOrbitNodeKeyer)
s.setOrbitNodeKey(nodeKey)
err = oc.request(verb, path, params, resp)
switch {
case err == nil:
oc.setEnrolled(true)
return nil
case errors.Is(err, ErrUnauthenticated):
if err := os.Remove(oc.nodeKeyFilePath); err != nil {
log.Info().Err(err).Msg("remove orbit node key")
}
oc.setEnrolled(false)
if oc.hostIdentityCertPath != "" {
if err := os.Remove(oc.hostIdentityCertPath); err != nil {
log.Info().Err(err).Msg("remove orbit host identity cert")
}
log.Info().Msg("removed orbit host identity cert, triggering a restart")
oc.receiverUpdateCancelFunc()
}
return err
default:
return err
}
}
// Enrolled reports whether the client is currently enrolled. It is safe for
// concurrent use.
func (oc *OrbitClient) Enrolled() bool {
oc.enrolledMu.Lock()
defer oc.enrolledMu.Unlock()
return oc.enrolled
}
func (oc *OrbitClient) setEnrolled(v bool) {
oc.enrolledMu.Lock()
defer oc.enrolledMu.Unlock()
oc.enrolled = v
}
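`Enrolled` and `setEnrolled` guard a single bool with `enrolledMu`. For a lone flag, `atomic.Bool` from `sync/atomic` (Go 1.19+) gives the same race-free access without a mutex. The sketch below swaps in that technique as an alternative design; it is not what this file does, and `enrollState` is a hypothetical type.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// enrollState is a hypothetical stand-in for the enrolled flag on OrbitClient.
// atomic.Bool must not be copied after first use, so methods use pointer receivers.
type enrollState struct {
	enrolled atomic.Bool
}

// Enrolled reports the current flag value; safe for concurrent use.
func (s *enrollState) Enrolled() bool { return s.enrolled.Load() }

// setEnrolled updates the flag; safe for concurrent use.
func (s *enrollState) setEnrolled(v bool) { s.enrolled.Store(v) }

func main() {
	var s enrollState
	var wg sync.WaitGroup
	// Concurrent writers do not race: each Store is atomic.
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.setEnrolled(true)
		}()
	}
	wg.Wait()
	fmt.Println(s.Enrolled()) // true
}
```

A mutex remains the simpler choice once more than one field must change together. The atomic version only pays off for independent single values.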
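// LastRecordedError returns the most recently recorded error, prefixed with
// the UTC time at which it was recorded.
func (oc *OrbitClient) LastRecordedError() error {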
oc.lastRecordedErrMu.Lock()
defer oc.lastRecordedErrMu.Unlock()
return oc.lastRecordedErr
}
func (oc *OrbitClient) setLastRecordedError(err error) {
oc.lastRecordedErrMu.Lock()
defer oc.lastRecordedErrMu.Unlock()
oc.lastRecordedErr = fmt.Errorf("%s: %w", time.Now().UTC().Format("2006-01-02T15:04:05Z"), err)
}
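// orbitEnrollRetryInterval returns the delay between enroll retries. A valid
// Go duration in FLEETD_ENROLL_RETRY_INTERVAL (for example "30s") overrides
// the default; unset or invalid values fall back to constant.OrbitEnrollRetrySleep.
func orbitEnrollRetryInterval() time.Duration {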
interval := os.Getenv("FLEETD_ENROLL_RETRY_INTERVAL")
if interval != "" {
d, err := time.ParseDuration(interval)
if err == nil {
return d
}
}
return constant.OrbitEnrollRetrySleep
}
// SetOrUpdateDiskEncryptionKey sends a request to the server to set or update
// the disk encryption key and to report any client error from the encryption process.
func (oc *OrbitClient) SetOrUpdateDiskEncryptionKey(diskEncryptionStatus fleet.OrbitHostDiskEncryptionKeyPayload) error {
verb, path := "POST", "/api/fleet/orbit/disk_encryption_key"
var resp orbitPostDiskEncryptionKeyResponse
if err := oc.authenticatedRequest(verb, path, &orbitPostDiskEncryptionKeyRequest{
EncryptionKey: diskEncryptionStatus.EncryptionKey,
ClientError: diskEncryptionStatus.ClientError,
}, &resp); err != nil {
return err
}
return nil
}
const httpTraceTimeFormat = "2006-01-02T15:04:05Z"
var testStdoutHTTPTracer = &httptrace.ClientTrace{
ConnectStart: func(network, addr string) {
fmt.Printf(
"httptrace: %s: ConnectStart: %s, %s\n",
time.Now().UTC().Format(httpTraceTimeFormat), network, addr,
)
},
ConnectDone: func(network, addr string, err error) {
fmt.Printf(
"httptrace: %s: ConnectDone: %s, %s, err='%v'\n",
time.Now().UTC().Format(httpTraceTimeFormat), network, addr, err,
)
},
}
// GetSetupExperienceStatus checks the status of the setup experience for this host.
func (oc *OrbitClient) GetSetupExperienceStatus() (*fleet.SetupExperienceStatusPayload, error) {
verb, path := "POST", "/api/fleet/orbit/setup_experience/status"
var resp getOrbitSetupExperienceStatusResponse
err := oc.authenticatedRequest(verb, path, &getOrbitSetupExperienceStatusRequest{}, &resp)
if err != nil {
return nil, err
}
return resp.Results, nil
}
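// SendLinuxKeyEscrowResponse sends the result of Linux (LUKS) disk encryption
// key escrow to the server: the passphrase, key slot, salt, and any client error.
func (oc *OrbitClient) SendLinuxKeyEscrowResponse(lr luks.LuksResponse) error {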
verb, path := "POST", "/api/fleet/orbit/luks_data"
var resp orbitPostLUKSResponse
if err := oc.authenticatedRequest(verb, path, &orbitPostLUKSRequest{
Passphrase: lr.Passphrase,
KeySlot: lr.KeySlot,
Salt: lr.Salt,
ClientError: lr.Err,
}, &resp); err != nil {
return err
}
return nil
}
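// InitiateSetupExperience asks the server to initialize the setup experience
// for this host and returns the result.
func (oc *OrbitClient) InitiateSetupExperience() (fleet.SetupExperienceInitResult, error) {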
verb, path := "POST", "/api/fleet/orbit/setup_experience/init"
var resp orbitSetupExperienceInitResponse
if err := oc.authenticatedRequest(verb, path, &orbitSetupExperienceInitRequest{}, &resp); err != nil {
return fleet.SetupExperienceInitResult{}, err
}
return resp.Result, nil
}