Adopting an observability culture

Using Tallyfy Manufactory effectively for deep insights into your event-driven systems goes beyond just implementing the right tools and instrumentation. It requires fostering an observability culture within your teams - a way of thinking and working that prioritizes understanding system behavior through data.

Observability is more than just tools

While having well-instrumented applications sending rich event data to Tallyfy Manufactory is essential, true observability is achieved when your teams embrace a cultural shift. This involves:

Curiosity: Encouraging engineers and other stakeholders to proactively ask questions about how their systems and Tallyfy Manufactory workflows are behaving, rather than just reacting to problems.
Shared Ownership: Recognizing that understanding system performance and reliability, including event processing through Manufactory, is everyone’s responsibility, not just a dedicated operations team.
Blamelessness: When incidents occur, focusing on learning what happened and how to improve (including improving Manufactory configurations or event instrumentation), rather than assigning blame.
Data-Driven Decision-Making: Using insights gleaned from Tallyfy Manufactory event data and other observability sources to inform design choices, prioritize improvements, and validate changes.

Observability-driven development with Manufactory

Observability-Driven Development (ODD) is one of the most impactful ways to embed observability into your culture. This practice is especially valuable for services that interact with or are orchestrated by Tallyfy Manufactory:

Think About Observability from the Start: When designing a new feature that involves events passing through Tallyfy Manufactory, ask critical questions upfront: “How will we observe this new workflow?” “What specific events does Manufactory need to see, and with what attributes, for us to understand its behavior?” “What potential failure points exist in this Manufactory project, and how will our events help us diagnose them?”
Instrument as You Code: Make emitting detailed, contextual events (destined for or generated by Manufactory) a natural part of the development process, not an afterthought.
Use Manufactory Data During Development and Testing: Observe event flows and actor behaviors in pre-production environments. This can help catch integration issues with Manufactory, or flawed logic, before they reach production.
Ship with Confidence: Knowing you have the visibility through Tallyfy Manufactory (and associated tools) to understand how new features are performing in production reduces deployment anxiety and speeds up feedback loops.
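
As a rough illustration of "instrument as you code", the sketch below emits contextual events from inside a business function. The field names (`event_id`, `event_type`, `workflow_id`, `attributes`), the `process_order` function, and the `emit` callable are all hypothetical and are not taken from the Tallyfy Manufactory API; `emit` stands in for whatever client your team uses to send events.

```python
import json
import uuid
from datetime import datetime, timezone


def build_event(event_type: str, workflow_id: str, **attributes) -> dict:
    """Build a contextual event as a plain dict (hypothetical schema)."""
    return {
        "event_id": str(uuid.uuid4()),  # unique identifier for this event
        "event_type": event_type,       # what happened
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "workflow_id": workflow_id,     # ties related events together
        "attributes": attributes,       # free-form context for later analysis
    }


def process_order(order_id: str, amount: float, emit) -> None:
    """Example business function that emits events as part of its normal flow."""
    emit(build_event("order.received", order_id, amount=amount))
    try:
        if amount <= 0:
            raise ValueError("amount must be positive")
        emit(build_event("order.accepted", order_id, amount=amount))
    except ValueError as exc:
        # Record the failure with enough context to diagnose it later.
        emit(build_event("order.rejected", order_id, amount=amount, reason=str(exc)))


# Collect events in a list here; a real service would pass a sending function.
collected = []
process_order("order-1", 25.0, collected.append)
process_order("order-2", -5.0, collected.append)
for event in collected:
    print(json.dumps(event))
```

Passing `emit` in as a parameter keeps the instrumentation testable: in pre-production you can collect events in a list and assert on them, which is one way to observe event flows before they reach production.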

Fostering a data-driven culture with Tallyfy Manufactory

To make decisions based on data from Tallyfy Manufactory, that data needs to be accessible and its exploration encouraged:

Democratize Access to Manufactory Data: Where appropriate and feasible (considering data sensitivity and tool capabilities), make insights from Tallyfy Manufactory event data available not just to operations engineers, but also to developers, product managers, and support teams. This might involve dashboards, query tools, or regular reports derived from Manufactory data.
Encourage Exploration: Create an environment where teams feel empowered and psychologically safe to explore Tallyfy Manufactory event data. They should be able to ask ad-hoc questions without fear of “breaking something” or inadvertently running up excessive costs (within reasonable, guided bounds).
Regularly Review Manufactory Event Patterns: Incorporate the analysis of event data from Tallyfy Manufactory into regular team meetings, sprint reviews, or incident retrospectives. Discuss questions like: “What did we learn from our Manufactory event flows this past week?” or “Are there any surprising or anomalous event patterns emerging in our Manufactory projects?”
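
One possible way to prepare for such a review is to compare event counts between two periods and flag the types that changed sharply. This is a minimal sketch over plain dictionaries, assuming events have already been exported from your tooling; the `event_type` field, the sample data, and the `ratio` threshold are illustrative assumptions.

```python
from collections import Counter


def count_by_type(events):
    """Count how many events of each type occurred."""
    return Counter(e["event_type"] for e in events)


def flag_changes(previous, current, ratio=2.0):
    """Return event types that are new or whose count grew by at least `ratio`."""
    flagged = {}
    for event_type, count in current.items():
        before = previous.get(event_type, 0)
        if before == 0 or count / before >= ratio:
            flagged[event_type] = (before, count)
    return flagged


# Made-up sample data for two consecutive weeks.
last_week = [{"event_type": "order.received"}] * 100 + [{"event_type": "payment.failed"}] * 4
this_week = (
    [{"event_type": "order.received"}] * 110
    + [{"event_type": "payment.failed"}] * 12
    + [{"event_type": "refund.issued"}] * 3
)

flagged = flag_changes(count_by_type(last_week), count_by_type(this_week))
print(flagged)
```

With this sample data, `payment.failed` (4 to 12) and the new `refund.issued` type are flagged, while `order.received` is not. A fixed ratio is a crude heuristic; it is meant to prompt discussion, not to replace proper anomaly detection.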

Shared ownership and collaboration around Tallyfy Manufactory events

Observability thrives when silos are broken down and different teams collaborate using shared data, such as that flowing through Tallyfy Manufactory:

Developers Owning Their Code in Production: Using event data (including data on how their services interact with Manufactory) to understand the real-world impact and performance of their code.
Support Teams Using Manufactory Insights: Empowering support personnel to troubleshoot customer issues more effectively by examining relevant event flows and statuses within Tallyfy Manufactory (or tools that display its data).
Product Teams Using Manufactory Data for Feature Validation: Understanding how features that trigger, or are triggered by, Tallyfy Manufactory events are being used by customers, and identifying areas for improvement.
A Common Language: Tallyfy Manufactory event data can become a shared language that helps different teams (development, operations, product, support) communicate more effectively about system behavior.

Blameless learning from incidents involving Tallyfy Manufactory

When an incident occurs involving a workflow managed or monitored by Tallyfy Manufactory, the cultural approach is critical:

Focus on Learning, Not Blame: The primary goal of a post-incident review should be to understand the sequence of events (using Manufactory data, traces, etc.) and identify systemic improvements, not to find individuals at fault.
Reconstruct What Happened: Use event data from Tallyfy Manufactory, distributed traces, logs, and metrics to build a clear timeline of the incident.
Ask Improvement-Oriented Questions: “How could our event instrumentation for Manufactory have helped us detect this issue sooner?” “How could data from Manufactory have allowed us to diagnose the root cause faster?” “What changes to our Manufactory project configurations or actor logic could prevent this in the future?”
Feed Learnings Back: Ensure that insights from incidents lead to concrete actions, such as improving event schemas, enhancing instrumentation for Manufactory, refining actor logic, or updating alert configurations.
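
Reconstructing what happened often comes down to filtering events for the affected workflow and ordering them by time. The sketch below assumes events have been exported as dictionaries with ISO 8601 timestamps; the field names and sample events are hypothetical and only illustrate the idea.

```python
from datetime import datetime

# Made-up events from two sources, deliberately out of order.
events = [
    {"timestamp": "2024-05-01T10:00:07+00:00", "source": "billing",
     "event_type": "payment.failed", "workflow_id": "wf-42"},
    {"timestamp": "2024-05-01T10:00:01+00:00", "source": "checkout",
     "event_type": "order.received", "workflow_id": "wf-42"},
    {"timestamp": "2024-05-01T10:00:03+00:00", "source": "checkout",
     "event_type": "order.received", "workflow_id": "wf-43"},
    {"timestamp": "2024-05-01T10:00:04+00:00", "source": "billing",
     "event_type": "payment.requested", "workflow_id": "wf-42"},
]


def build_timeline(events, workflow_id):
    """Return (seconds since first event, source, event type) for one workflow."""
    relevant = [e for e in events if e["workflow_id"] == workflow_id]
    relevant.sort(key=lambda e: datetime.fromisoformat(e["timestamp"]))
    if not relevant:
        return []
    start = datetime.fromisoformat(relevant[0]["timestamp"])
    timeline = []
    for e in relevant:
        offset = (datetime.fromisoformat(e["timestamp"]) - start).total_seconds()
        timeline.append((offset, e["source"], e["event_type"]))
    return timeline


for offset, source, event_type in build_timeline(events, "wf-42"):
    print(f"+{offset:>5.1f}s  {source:<10} {event_type}")
```

A shared timeline like this keeps the review focused on the sequence of events rather than on who did what, which supports the blameless approach described above.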

Practical steps to cultivate an observability culture for Manufactory

Building this culture takes deliberate effort:

Leadership Buy-in and Advocacy: Management must champion the importance of observability and allocate the necessary resources (time, tools, training) for initiatives related to Tallyfy Manufactory.
Training and Enablement: Provide training on how to effectively instrument applications for Tallyfy Manufactory, how to query and analyze its event data, and how to interpret distributed traces that include Manufactory spans.
Internal Champions or an “Observability Guild”: Identify and empower enthusiasts within your organization to share best practices, help onboard other teams to using Tallyfy Manufactory for observability, and drive continuous improvement.
Show, Don’t Just Tell: Demonstrate the value of Tallyfy Manufactory event analysis by using it to solve real, painful problems and then widely sharing those success stories.
Integrate with Existing Workflows: Add observability considerations into your existing processes. For example, include a checklist item in code reviews: “Is this new service/feature adequately instrumented to send relevant events to/from Manufactory?”
Start Small and Iterate: Don’t attempt to transform your entire organization’s culture overnight. Pick a key workflow involving Tallyfy Manufactory, establish good observability practices for it, demonstrate the benefits, and then expand from there.

Measuring the impact of your observability culture

While cultural change can be hard to quantify precisely, you can look for indicators of progress:

Qualitative Measures: Increased confidence among teams when deploying changes that interact with Manufactory, better collaboration between developers and operations, more proactive problem-solving discussions centered around Manufactory event data.
Quantitative Measures (often influenced by multiple factors, but indicative):
- Reduction in Mean Time To Resolution (MTTR)¹ for incidents involving Tallyfy Manufactory workflows.
- Improved reliability (higher SLO² compliance) for services and processes integrated with Manufactory.
- Potentially faster development cycles for features that use Tallyfy Manufactory, due to quicker debugging and validation.
- Increased usage of Tallyfy Manufactory data by diverse teams (not just operations) for decision-making.
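
The first two quantitative measures can be computed with very little code once incident and request data are available. The sketch below is an illustration under assumed inputs: the incident records, their `detected` and `resolved` fields, and the request counts are made up for the example.

```python
from datetime import datetime, timedelta

# Made-up incident records with detection and resolution times.
incidents = [
    {"detected": "2024-05-01T10:00:00", "resolved": "2024-05-01T10:30:00"},
    {"detected": "2024-05-08T14:00:00", "resolved": "2024-05-08T15:30:00"},
]


def mean_time_to_resolution(incidents) -> timedelta:
    """Average time from detection to resolution across incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    durations = [
        datetime.fromisoformat(i["resolved"]) - datetime.fromisoformat(i["detected"])
        for i in incidents
    ]
    return sum(durations, timedelta()) / len(durations)


def slo_compliance(successes: int, total: int) -> float:
    """Fraction of requests that met the objective; 1.0 when there was no traffic."""
    return successes / total if total else 1.0


mttr = mean_time_to_resolution(incidents)
compliance = slo_compliance(successes=9990, total=10000)
print(f"MTTR: {mttr}, SLO compliance: {compliance:.3%}")
```

Tracking these values over several months, rather than reading a single number, gives a fairer picture, since both are influenced by factors other than observability practices.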

Continuous improvement

An observability culture, much like the Tallyfy Manufactory system itself and the applications it integrates with, is never truly “done.” It requires continuous nurturing and improvement. Regularly solicit feedback on how Tallyfy Manufactory is being utilized for observability within your organization and actively look for opportunities to refine your tools, processes, and skills.

¹ Mean Time To Resolution: the average time from incident detection to full resolution; a key metric for operational efficiency.
² Service Level Objective: a measurable reliability target, such as an uptime or response-time goal.
