fleet/ee/server/service/setup_experience.go
Scott Gress 61970118e9
Stop setup experience on software install failure (#34173)
**Related issue:** Resolves #33173
**Related issue:** Resolves #33111 

# Details

This is the remaining work to implement the "Stop the setup experience
when required software fails to install" feature. This didn't turn out
to be quite as straightforward as expected so I ended up doing a bit of
design-by-code and expect some feedback on the approach. I tried to make
it as low-touch as possible. The general design is:

1. In the `maybeUpdateSetupExperienceStatus` function which is called in
various places when a setup experience step is marked as completed, call
a new `maybeCancelPendingSetupExperienceSteps` function if the setup
step was marked as failed. Similarly call
`maybeCancelPendingSetupExperienceSteps` if a VPP app install fails to
enqueue.
2. In `maybeCancelPendingSetupExperienceSteps`, check whether the
specified host is macOS and whether the "RequireAllSoftwareMacOS" flag
is set in the team (or global) config. If so, mark the remaining setup
experience items as canceled and cancel any upcoming activities related
to those steps.
3. On the front-end, if `require_all_software_macos` is set and a
software step is marked as failed, show a new failure page indicating
that setup has failed and showing details of the failed software.
4. On the agent side, when checking setup experience status, send a
`reset_after_failure` flag _only the first time_. If this flag is set,
then the code in the `/orbit/setup_experience/status` handler will clear
and re-queue any failed setup experience steps (but leave successful
steps to avoid re-installing already-installed software). This
facilitates re-starting the setup experience when the host is rebooted.
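
To make the server-side part of the design above concrete, here is a minimal, self-contained sketch of steps 2 and 4. It is not the code in this PR: the `Step` and `Status` types, the `cancelPendingSteps` and `resetAfterFailure` helpers, and the `"darwin"` platform string are all hypothetical simplifications, and the choice to re-queue canceled steps along with failed ones is an assumption about what "re-queue" needs to cover.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for a setup experience step status.
type Status string

const (
	StatusPending   Status = "pending"
	StatusSuccess   Status = "success"
	StatusFailure   Status = "failure"
	StatusCancelled Status = "cancelled"
)

// Step is a hypothetical, simplified setup experience step.
type Step struct {
	Name   string
	Status Status
}

// cancelPendingSteps marks every still-pending step as cancelled, but only
// when the host is macOS and the require-all-software setting is enabled.
// It reports whether any step was cancelled.
func cancelPendingSteps(platform string, requireAll bool, steps []*Step) bool {
	if platform != "darwin" || !requireAll {
		return false
	}
	cancelled := false
	for _, s := range steps {
		if s.Status == StatusPending {
			s.Status = StatusCancelled
			cancelled = true
		}
	}
	return cancelled
}

// resetAfterFailure re-queues failed steps (and, as an assumption of this
// sketch, cancelled ones) while leaving successful steps untouched so that
// already-installed software is not installed again.
func resetAfterFailure(steps []*Step) {
	for _, s := range steps {
		if s.Status == StatusFailure || s.Status == StatusCancelled {
			s.Status = StatusPending
		}
	}
}

func main() {
	steps := []*Step{
		{Name: "installer-a", Status: StatusSuccess},
		{Name: "installer-b", Status: StatusFailure},
		{Name: "script", Status: StatusPending},
	}
	cancelPendingSteps("darwin", true, steps)
	for _, s := range steps {
		fmt.Println("after cancel:", s.Name, s.Status)
	}
	resetAfterFailure(steps)
	for _, s := range steps {
		fmt.Println("after reset:", s.Name, s.Status)
	}
}
```

The sketch keeps the platform and setting checks inside the cancel helper so that callers can invoke it unconditionally after any failure, which mirrors the "maybe" naming used in the description.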

I also updated the way that software (packages and VPP) is queued up for
the setup experience so that it is ordered alphabetically, to make it
easier to test _and_ because this is a desired outcome for a future
story. Since the order is currently non-deterministic, this update
shouldn't cause any problems (aside from a couple of test updates), but
I'm OK with taking it out if desired.
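
As a rough illustration of the ordering change, the sketch below sorts a mixed list of packages and VPP apps by name before queueing. The `queuedSoftware` type and `sortForSetupExperience` helper are hypothetical, and the case-insensitive comparison is an assumption; the actual change may well do the ordering in the datastore query instead.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// queuedSoftware is a hypothetical stand-in for a setup experience software
// item, covering both software packages and VPP apps.
type queuedSoftware struct {
	Name string
	Kind string // "package" or "vpp" (illustrative only)
}

// sortForSetupExperience orders items alphabetically by name, ignoring case
// (an assumption of this sketch). The sort is stable, so items whose names
// compare equal keep their original relative order.
func sortForSetupExperience(items []queuedSoftware) {
	sort.SliceStable(items, func(i, j int) bool {
		return strings.ToLower(items[i].Name) < strings.ToLower(items[j].Name)
	})
}

func main() {
	items := []queuedSoftware{
		{Name: "Zoom", Kind: "package"},
		{Name: "1Password", Kind: "vpp"},
		{Name: "firefox", Kind: "package"},
	}
	sortForSetupExperience(items)
	for _, it := range items {
		fmt.Println(it.Name, it.Kind)
	}
}
```

A deterministic order like this is what lets integration tests assert on which item is queued first.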

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

- [X] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
See [Changes
files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/guides/committing-changes.md#changes-files)
for more information.

- [X] Input data is properly validated, `SELECT *` is avoided, SQL
injection is prevented (using placeholders for values in statements)

## Testing

- [X] Added/updated automated tests
* Added a new integration test for software packages, testing that a
failed software package causes the rest of the setup experience to be
marked as failed when `require_all_software_macos` is set, and testing
that the "reset after failure" code works.
* Added a new integration test for VPP packages, testing that a failed
VPP enqueue causes the same halting of the setup experience.
I _don't_ have a test for a failure _during_ a VPP install. It should go
through the same code path as the software package failure, so it's not
a huge gap.

- [ ] QA'd all new/changed functionality manually
Working on it 

## fleetd/orbit/Fleet Desktop

- [X] Verified compatibility with the latest released version of Fleet
(see [Must
rule](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/workflows/fleetd-development-and-release-strategy.md))
- [X] If the change applies to only one platform, confirmed that
`runtime.GOOS` is used as needed to isolate changes
- [X] Verified that fleetd runs on macOS, Linux and Windows



## Summary by CodeRabbit

- New Features
  - Configurable option to halt macOS device setup if any software install
    fails.
  - Device setup page now shows a clear “Device setup failed” state with
    expandable error details when all software is required on macOS.
- Improvements
  - Setup status now includes per-step error messages for better
    troubleshooting.
  - Pending setup steps are automatically canceled after a failure when
    applicable, with support to reset and retry the setup flow as
    configured.


---------

Co-authored-by: Ian Littman <iansltx@gmail.com>
2025-10-17 08:38:53 -05:00


package service

import (
	"context"
	"errors"
	"io"
	"net/http"
	"path/filepath"

	"github.com/fleetdm/fleet/v4/server/authz"
	"github.com/fleetdm/fleet/v4/server/contexts/ctxerr"
	"github.com/fleetdm/fleet/v4/server/fleet"
	"github.com/fleetdm/fleet/v4/server/ptr"
	"github.com/go-kit/log/level"
)

func (svc *Service) SetSetupExperienceSoftware(ctx context.Context, platform string, teamID uint, titleIDs []uint) error {
	if err := svc.authz.Authorize(ctx, &fleet.SoftwareInstaller{TeamID: &teamID}, fleet.ActionWrite); err != nil {
		return err
	}

	var teamName string
	if teamID == 0 {
		teamName = ""
		ac, err := svc.ds.AppConfig(ctx)
		if err != nil {
			return ctxerr.Wrap(ctx, err, "getting app config")
		}
		if ac.MDM.MacOSSetup.ManualAgentInstall.Value && len(titleIDs) != 0 {
			return fleet.NewUserMessageError(errors.New("Couldn’t add setup experience software. To add software, first disable manual_agent_install."), http.StatusUnprocessableEntity)
		}
	} else {
		team, err := svc.ds.Team(ctx, teamID)
		if err != nil {
			return ctxerr.Wrap(ctx, err, "load team")
		}
		teamName = team.Name
		if team.Config.MDM.MacOSSetup.ManualAgentInstall.Value && len(titleIDs) != 0 {
			return fleet.NewUserMessageError(errors.New("Couldn’t add setup experience software. To add software, first disable manual_agent_install."), http.StatusUnprocessableEntity)
		}
	}

	if err := svc.ds.SetSetupExperienceSoftwareTitles(ctx, platform, teamID, titleIDs); err != nil {
		return ctxerr.Wrap(ctx, err, "setting setup experience titles")
	}

	if err := svc.NewActivity(
		ctx,
		authz.UserFromContext(ctx),
		fleet.ActivityEditedSetupExperienceSoftware{
			Platform: platform,
			TeamID:   teamID,
			TeamName: teamName,
		},
	); err != nil {
		return ctxerr.Wrap(ctx, err, "create activity for set setup experience software")
	}

	return nil
}

func (svc *Service) ListSetupExperienceSoftware(ctx context.Context, platform string, teamID uint, opts fleet.ListOptions) ([]fleet.SoftwareTitleListResult, int, *fleet.PaginationMetadata, error) {
	if err := svc.authz.Authorize(ctx, &fleet.AuthzSoftwareInventory{
		TeamID: &teamID,
	}, fleet.ActionRead); err != nil {
		return nil, 0, nil, err
	}

	titles, count, meta, err := svc.ds.ListSetupExperienceSoftwareTitles(ctx, platform, teamID, opts)
	if err != nil {
		return nil, 0, nil, ctxerr.Wrap(ctx, err, "retrieving list of software setup experience titles")
	}

	return titles, count, meta, nil
}

func (svc *Service) GetSetupExperienceScript(ctx context.Context, teamID *uint, withContent bool) (*fleet.Script, []byte, error) {
	if err := svc.authz.Authorize(ctx, &fleet.Script{TeamID: teamID}, fleet.ActionRead); err != nil {
		return nil, nil, err
	}

	script, err := svc.ds.GetSetupExperienceScript(ctx, teamID)
	if err != nil {
		return nil, nil, ctxerr.Wrap(ctx, err, "get setup experience script")
	}

	var content []byte
	if withContent {
		content, err = svc.ds.GetAnyScriptContents(ctx, script.ScriptContentID)
		if err != nil {
			return nil, nil, ctxerr.Wrap(ctx, err, "get setup experience script contents")
		}
	}

	return script, content, nil
}

func (svc *Service) SetSetupExperienceScript(ctx context.Context, teamID *uint, name string, r io.Reader) error {
	if err := svc.authz.Authorize(ctx, &fleet.Script{TeamID: teamID}, fleet.ActionWrite); err != nil {
		return err
	}

	if teamID == nil {
		ac, err := svc.ds.AppConfig(ctx)
		if err != nil {
			return ctxerr.Wrap(ctx, err, "getting app config")
		}
		if ac.MDM.MacOSSetup.ManualAgentInstall.Value {
			return fleet.NewUserMessageError(errors.New("Couldn’t add setup experience script. To add script, first disable manual_agent_install."), http.StatusUnprocessableEntity)
		}
	} else {
		team, err := svc.ds.Team(ctx, *teamID)
		if err != nil {
			return ctxerr.Wrap(ctx, err, "load team")
		}
		if team.Config.MDM.MacOSSetup.ManualAgentInstall.Value {
			return fleet.NewUserMessageError(errors.New("Couldn’t add setup experience script. To add script, first disable manual_agent_install."), http.StatusUnprocessableEntity)
		}
	}

	b, err := io.ReadAll(r)
	if err != nil {
		return ctxerr.Wrap(ctx, err, "read setup experience script contents")
	}

	script := &fleet.Script{
		TeamID:         teamID,
		Name:           name,
		ScriptContents: string(b),
	}

	if err := svc.ds.ValidateEmbeddedSecrets(ctx, []string{script.ScriptContents}); err != nil {
		return fleet.NewInvalidArgumentError("script", err.Error())
	}

	// setup experience is only supported for macOS currently so we need to override the file
	// extension check in the general script validation
	if filepath.Ext(script.Name) != ".sh" {
		return fleet.NewInvalidArgumentError("script", "File type not supported. Only .sh file type is allowed.")
	}

	// now we can do our normal script validation
	if err := script.ValidateNewScript(); err != nil {
		return fleet.NewInvalidArgumentError("script", err.Error())
	}

	if err := svc.ds.SetSetupExperienceScript(ctx, script); err != nil {
		var (
			existsErr fleet.AlreadyExistsError
			fkErr     fleet.ForeignKeyError
		)
		if errors.As(err, &existsErr) {
			err = fleet.NewInvalidArgumentError("script", err.Error()).WithStatus(http.StatusConflict) // TODO: confirm error message with product/frontend
		} else if errors.As(err, &fkErr) {
			err = fleet.NewInvalidArgumentError("team_id", "The team does not exist.").WithStatus(http.StatusNotFound)
		}
		return ctxerr.Wrap(ctx, err, "create setup experience script")
	}

	// NOTE: there is no activity specified for set setup experience script
	return nil
}

func (svc *Service) DeleteSetupExperienceScript(ctx context.Context, teamID *uint) error {
	if err := svc.authz.Authorize(ctx, &fleet.Script{TeamID: teamID}, fleet.ActionWrite); err != nil {
		return err
	}

	if err := svc.ds.DeleteSetupExperienceScript(ctx, teamID); err != nil {
		return ctxerr.Wrap(ctx, err, "delete setup experience script")
	}

	// NOTE: there is no activity specified for delete setup experience script
	return nil
}

func (svc *Service) SetupExperienceNextStep(ctx context.Context, host *fleet.Host) (bool, error) {
	hostUUID, err := fleet.HostUUIDForSetupExperience(host)
	if err != nil {
		return false, ctxerr.Wrap(ctx, err, "failed to get host's UUID for the setup experience")
	}

	statuses, err := svc.ds.ListSetupExperienceResultsByHostUUID(ctx, hostUUID)
	if err != nil {
		return false, ctxerr.Wrap(ctx, err, "retrieving setup experience status results for next step")
	}

	var installersPending, appsPending, scriptsPending []*fleet.SetupExperienceStatusResult
	var installersRunning, appsRunning, scriptsRunning int

	for _, status := range statuses {
		if err := status.IsValid(); err != nil {
			return false, ctxerr.Wrap(ctx, err, "invalid row")
		}

		switch {
		case status.SoftwareInstallerID != nil:
			switch status.Status {
			case fleet.SetupExperienceStatusPending:
				installersPending = append(installersPending, status)
			case fleet.SetupExperienceStatusRunning:
				installersRunning++
			}
		case status.VPPAppTeamID != nil:
			switch status.Status {
			case fleet.SetupExperienceStatusPending:
				appsPending = append(appsPending, status)
			case fleet.SetupExperienceStatusRunning:
				appsRunning++
			}
		case status.SetupExperienceScriptID != nil:
			switch status.Status {
			case fleet.SetupExperienceStatusPending:
				scriptsPending = append(scriptsPending, status)
			case fleet.SetupExperienceStatusRunning:
				scriptsRunning++
			}
		}
	}

	switch {
	case len(installersPending) > 0:
		// enqueue installers
		for _, installer := range installersPending {
			installUUID, err := svc.ds.InsertSoftwareInstallRequest(ctx, host.ID, *installer.SoftwareInstallerID, fleet.HostSoftwareInstallOptions{
				SelfService:        false,
				ForSetupExperience: true,
			})
			if err != nil {
				return false, ctxerr.Wrap(ctx, err, "queueing setup experience install request")
			}
			installer.HostSoftwareInstallsExecutionID = &installUUID
			installer.Status = fleet.SetupExperienceStatusRunning
			if err := svc.ds.UpdateSetupExperienceStatusResult(ctx, installer); err != nil {
				return false, ctxerr.Wrap(ctx, err, "updating setup experience result with install uuid")
			}
		}
	case installersRunning == 0 && len(appsPending) > 0:
		// enqueue vpp apps
		var skipRemainingVPPInstalls bool
	enqueueVPPApps:
		for _, app := range appsPending {
			vppAppID, err := app.VPPAppID()
			if err != nil {
				return false, ctxerr.Wrap(ctx, err, "constructing vpp app details for installation")
			}

			if app.SoftwareTitleID == nil {
				return false, ctxerr.Errorf(ctx, "setup experience software title id missing from vpp app install request: %d", app.ID)
			}

			vppApp := &fleet.VPPApp{
				TitleID: *app.SoftwareTitleID,
				VPPAppTeam: fleet.VPPAppTeam{
					VPPAppID: *vppAppID,
				},
			}

			cmdUUID, err := svc.installSoftwareFromVPP(ctx, host, vppApp, true, fleet.HostSoftwareInstallOptions{
				SelfService:        false,
				ForSetupExperience: true,
			})

			app.NanoCommandUUID = &cmdUUID
			app.Status = fleet.SetupExperienceStatusRunning
			if err != nil {
				// if we get an error (e.g. no available licenses) while attempting to enqueue the
				// install, then we should immediately go to an error state so setup experience
				// isn't blocked.
				level.Warn(svc.logger).Log("msg", "got an error when attempting to enqueue VPP app install", "err", err, "adam_id", app.VPPAppAdamID)
				app.Status = fleet.SetupExperienceStatusFailure
				app.Error = ptr.String(err.Error())
				// At this point we need to check whether the "cancel if software install fails" setting is active,
				// in which case we'll cancel the remaining pending items.
				requireAllSoftware, err := svc.IsAllSetupExperienceSoftwareRequired(ctx, host)
				if err != nil {
					return false, ctxerr.Wrap(ctx, err, "checking if all software is required after vpp app install failure")
				}
				if requireAllSoftware {
					err := svc.MaybeCancelPendingSetupExperienceSteps(ctx, host)
					if err != nil {
						return false, ctxerr.Wrap(ctx, err, "cancelling remaining setup experience steps after vpp app install failure")
					}
					skipRemainingVPPInstalls = true
				}
			}
			if err := svc.ds.UpdateSetupExperienceStatusResult(ctx, app); err != nil {
				return false, ctxerr.Wrap(ctx, err, "updating setup experience with vpp install command uuid")
			}
			if skipRemainingVPPInstalls {
				break enqueueVPPApps
			}
		}
	case installersRunning == 0 && appsRunning == 0 && len(scriptsPending) > 0:
		// enqueue scripts
		for _, script := range scriptsPending {
			if script.ScriptContentID == nil {
				return false, ctxerr.Errorf(ctx, "setup experience script missing content id: %d", *script.SetupExperienceScriptID)
			}
			req := &fleet.HostScriptRequestPayload{
				HostID:          host.ID,
				ScriptName:      script.Name,
				ScriptContentID: *script.ScriptContentID,
				// because the script execution request is associated with setup experience,
				// it will be enqueued with a higher priority and will run before other
				// items in the queue.
				SetupExperienceScriptID: script.SetupExperienceScriptID,
			}
			res, err := svc.ds.NewHostScriptExecutionRequest(ctx, req)
			if err != nil {
				return false, ctxerr.Wrap(ctx, err, "queueing setup experience script execution request")
			}
			script.ScriptExecutionID = &res.ExecutionID
			script.Status = fleet.SetupExperienceStatusRunning
			if err := svc.ds.UpdateSetupExperienceStatusResult(ctx, script); err != nil {
				return false, ctxerr.Wrap(ctx, err, "updating setup experience script execution id")
			}
		}
	case installersRunning == 0 && appsRunning == 0 && scriptsRunning == 0:
		// finished
		return true, nil
	}

	return false, nil
}