The Health Check assessment reviews your current Datadog APM, Infrastructure, Logging, and Security configuration. It reports findings against Datadog and industry best practices, gives an overall grade, and recommends improvements.
The assessment covers multiple aspects, including Agent release versions, APM dependency checks, the quality of monitors and dashboards, and the most effective use and storage of logs.
Outcomes
- Comprehensive report including a rating of the overall alignment against Datadog best practices
- Identify potential areas for improvement in your configuration
- Identify the existing level of knowledge in specific areas of your Datadog platform
- Assess the efficiency and effectiveness of your log management
- Review your data tagging implementation and provide recommendations to improve consistency, efficiency, and alignment with Datadog best practices. This is critical to getting the maximum benefit from the platform and the Datadog support service
- Identify and recommend additional enablement to level up the knowledge of your existing Datadog team
Monitors
- Review monitor lists
- Alerts assessment
- Alert automation potential
- Escalations assessment – e.g. critical alerts to on-call, non-critical alerts to email / ticketing system
- Assess for suboptimally configured monitors, with the aim of avoiding alert fatigue
- Quality of notification content
- Monitor tagging and grouping convention and consistency
- Identification of persistent monitor triggers
- Point out any monitors in “NO DATA” state
- Point out any monitors muted indefinitely, and check whether mutes have comments
- Check suggested monitors from APM / services
- Check recommended monitors (Monitors → Create new, recommended tag)
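Several of the monitor checks above can be scripted against an export of monitor definitions. The following is a minimal sketch, assuming each monitor is a plain dictionary with `name`, `overall_state`, `tags`, and `options.silenced` fields shaped like a Monitors API response. The function name `triage_monitors` and the sample data are hypothetical, and treating a `None` mute end time as "muted indefinitely" is an assumption to verify against your own export.

```python
from typing import Any


def triage_monitors(monitors: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Bucket monitor names by three of the review findings listed above."""
    findings: dict[str, list[str]] = {
        "no_data": [],
        "muted_indefinitely": [],
        "untagged": [],
    }
    for monitor in monitors:
        name = monitor.get("name", "<unnamed>")

        # Monitors currently reporting no data.
        if monitor.get("overall_state") == "No Data":
            findings["no_data"].append(name)

        # Assumption: a muted scope mapped to None has no end time.
        silenced = (monitor.get("options") or {}).get("silenced") or {}
        if any(until is None for until in silenced.values()):
            findings["muted_indefinitely"].append(name)

        # Monitors without any tags cannot be grouped consistently.
        if not monitor.get("tags"):
            findings["untagged"].append(name)
    return findings


# Hypothetical sample data for illustration only.
sample = [
    {"name": "High CPU", "overall_state": "OK", "tags": ["team:core"], "options": {}},
    {
        "name": "Queue depth",
        "overall_state": "No Data",
        "tags": [],
        "options": {"silenced": {"*": None}},
    },
]
print(triage_monitors(sample))
```

The sketch works on already exported data, so it makes no API calls and can be run against a saved snapshot of your monitor list.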
Logs
- Comprehensive Log monitor review
- Filters and exclusions for indexing aimed at minimising costs while maintaining effective visibility
- Retention periods assessed
- Daily quotas
- Use of log metrics within dashboards for improved visibility
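The trade-off between exclusion filters and daily quotas comes down to simple arithmetic. The following is a rough sketch, assuming you know the daily ingested log count per source and the fraction of each source that exclusion filters drop before indexing. The function name `indexed_volume`, the per-source model, and all figures are hypothetical simplifications, not a description of how Datadog evaluates filters.

```python
def indexed_volume(
    ingested_per_source: dict[str, int],
    excluded_fraction: dict[str, float],
    daily_quota: int,
) -> tuple[dict[str, float], float, bool]:
    """Estimate indexed logs per source, the total, and whether it fits the quota."""
    indexed = {
        # Sources with no exclusion filter are indexed in full.
        source: count * (1.0 - excluded_fraction.get(source, 0.0))
        for source, count in ingested_per_source.items()
    }
    total = sum(indexed.values())
    return indexed, total, total <= daily_quota


# Hypothetical figures: 75% of nginx logs are excluded from indexing.
ingested = {"nginx": 4_000_000, "app": 1_000_000}
excluded = {"nginx": 0.75}
per_source, total, within_quota = indexed_volume(ingested, excluded, daily_quota=2_500_000)
print(per_source, total, within_quota)
```

Running the same calculation with and without a proposed filter shows how much indexed volume, and therefore cost, the filter would save.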
APM
- Assess application, database, and API integrations for data quality, integration quality, and visibility
- Service & application map dependencies review
- End-to-end trace linkage of monitors and services
- Code tracing library inspection
- Transaction error rate, latency, alert, and telemetry context review
- Deployment transport and automation review
Dashboards
- Assess how well views and visualizations align with your team or department
- Report on dashboard structures, grouping, and tagging
- Review dashboard alert settings for appropriate visibility, to avoid alert fatigue and improve the timeliness of issue response
Agent
- Examine Infrastructure List – Agent grouping and versions
- Agent deployment gap analysis
- Agent version review
- Assess tags on infrastructure hosts
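The Agent checks above amount to grouping hosts by Agent version and looking for hosts with no Agent or with missing tags. The following is a minimal sketch, assuming a simplified host inventory where each host is a dictionary with `name`, `agent_version`, and `tags` keys. That flat shape, the function name `agent_gap_report`, and the sample hosts are hypothetical, so a real export would need to be mapped into this form first.

```python
from collections import defaultdict
from typing import Any


def agent_gap_report(
    hosts: list[dict[str, Any]], required_tag_keys: list[str]
) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """Group hosts by Agent version and list required tag keys each host lacks."""
    versions: dict[str, list[str]] = defaultdict(list)
    missing_tags: dict[str, list[str]] = {}
    for host in hosts:
        # Hosts without a reported version are treated as having no Agent.
        version = host.get("agent_version") or "no agent"
        versions[version].append(host["name"])

        # Tags are assumed to use the "key:value" form; compare keys only.
        present = {tag.split(":", 1)[0] for tag in host.get("tags", [])}
        missing = sorted(set(required_tag_keys) - present)
        if missing:
            missing_tags[host["name"]] = missing
    return dict(versions), missing_tags


# Hypothetical inventory for illustration only.
hosts = [
    {"name": "web-1", "agent_version": "7.50.0", "tags": ["env:prod", "team:web"]},
    {"name": "web-2", "agent_version": "7.42.1", "tags": ["env:prod"]},
    {"name": "db-1", "agent_version": None, "tags": []},
]
versions, missing = agent_gap_report(hosts, ["env", "team"])
print(versions)
print(missing)
```

The required tag keys are passed in as a parameter because the right set depends on your own tagging standard.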