Introduction to observability best practices
What is observability?
Observability helps you understand what’s happening inside your complex systems. It’s about being able to ask any question about your system’s state and get answers, even for conditions you didn’t anticipate.
Imagine you’re trying to understand why your car is making a strange noise. Traditional monitoring is like having a few dashboard lights: oil pressure is low, engine temperature is high. These are pre-defined checks for known problems.
Observability, on the other hand, is like having a sophisticated diagnostic tool that can query every sensor in your car, look at historical data, and correlate different readings to figure out precisely what’s going on, even if it’s a problem the car manufacturer never specifically designed a warning light for. It’s the ability to understand the internal workings of your system by examining its external outputs – primarily the rich, detailed data from events and traces.
A key aspect of observability is its power to help you debug “unknown unknowns.” These are the novel or bizarre states your system might get into that you couldn’t have predicted and therefore couldn’t have set up a specific monitor for. With Tallyfy Manufactory managing crucial event lifecycles, being able to explore these unknown states is vital.
Although the two terms are often used interchangeably, observability and monitoring serve different but complementary purposes, especially in event-driven systems like those Tallyfy Manufactory supports.
Traditional Monitoring:
- Focuses on “known unknowns”: It answers questions you already know to ask. For example, “Is the CPU utilization high?” or “Is the database connection pool exhausted?”
- Relies on pre-defined metrics and dashboards: You set up specific checks and visualizations for conditions you anticipate might go wrong.
- Primarily reactive: Alerts fire when a known threshold is breached, prompting you to investigate a problem that is likely already occurring.
Observability:
- Addresses “unknown unknowns”: It helps you understand issues you didn’t predict. For example, “Why are events from a specific region processed 50% slower only between 2 AM and 3 AM on Tuesdays after a specific Manufactory actor runs?”
- Relies on rich, high-cardinality event data: This detailed data allows for deep exploration and ad-hoc querying.
- Enables proactive exploration and debugging: You can interrogate your system to understand nuanced behaviors and identify subtle issues before they escalate.
For complex, event-driven systems where Tallyfy Manufactory plays a central role in ingesting, routing, and triggering actions based on events, monitoring alone often falls short. The sheer number of potential states and interactions makes it impossible to pre-define all relevant metrics or anticipate every failure mode. Observability provides the tools to navigate this complexity.
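To make the idea of ad-hoc querying over high-cardinality event data concrete, here is a minimal sketch. It assumes events are available as plain dictionaries. Every field name (`region`, `actor`, `weekday`, `hour`, `duration_ms`) and the `query` helper are hypothetical illustrations, not part of Manufactory's API or of any particular observability backend.

```python
from collections import defaultdict
from statistics import median

# Hypothetical wide events: one record per processed event, many dimensions each.
EVENTS = [
    {"event_id": "e1", "region": "eu-west", "actor": "send_welcome_email",
     "weekday": "Tue", "hour": 2, "duration_ms": 900},
    {"event_id": "e2", "region": "eu-west", "actor": "send_welcome_email",
     "weekday": "Tue", "hour": 2, "duration_ms": 1100},
    {"event_id": "e3", "region": "us-east", "actor": "send_welcome_email",
     "weekday": "Tue", "hour": 2, "duration_ms": 400},
    {"event_id": "e4", "region": "eu-west", "actor": "send_welcome_email",
     "weekday": "Wed", "hour": 2, "duration_ms": 420},
    {"event_id": "e5", "region": "eu-west", "actor": "create_crm_package",
     "weekday": "Tue", "hour": 2, "duration_ms": 300},
]


def query(events, where, group_by):
    """Return the median duration_ms per group for events matching `where`."""
    groups = defaultdict(list)
    for event in events:
        if where(event):
            key = tuple(event[field] for field in group_by)
            groups[key].append(event["duration_ms"])
    return {key: median(values) for key, values in groups.items()}


# A question nobody planned a dashboard for:
# "On Tuesdays between 2 AM and 3 AM, which region/actor pairs are slow?"
result = query(
    EVENTS,
    where=lambda e: e["weekday"] == "Tue" and e["hour"] == 2,
    group_by=("region", "actor"),
)
for key, value in sorted(result.items()):
    print(key, value)
```

The point of the sketch is that the filter and the grouping are chosen at query time. Nothing about the question had to be anticipated when the events were recorded, as long as each event carried enough context.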
Implementing observability practices around your Tallyfy Manufactory event data brings significant advantages:
- Faster troubleshooting: When an event fails to process or a workflow involving Manufactory gets stuck, rich event data allows you to quickly pinpoint the “why” and “where.” You can trace the event’s journey, inspect its payload at various stages (including within Manufactory), and understand the context of the failure.
- Understanding event lifecycles: Gain deep clarity on how events are ingested by Manufactory, how they are processed by different projects or actors, and how they trigger subsequent actions. This is crucial for optimizing complex event chains.
- Improving system reliability and performance: By analyzing event patterns, error rates, and processing times through Manufactory, you can identify bottlenecks, recurring error types, and opportunities for optimization in your event-driven workflows.
- Increased confidence in changes: When you modify a Manufactory project, update an actor, or integrate a new event source, observability allows you to see the real-time impact of these changes on event processing and system behavior.
- Proactive problem detection: You can often identify subtle anomalies or deviations from expected event behavior within Manufactory before they escalate into user-facing outages or data inconsistencies.
Consider an automated customer onboarding process. A new customer signs up on your website, which generates a CustomerRegistered event. This event is sent to Tallyfy Manufactory.
Manufactory is configured with a project that listens for CustomerRegistered events. Upon receiving such an event, it triggers several actors:
- An actor to create a welcome package in your CRM.
- An actor to send a welcome email via your marketing automation platform.
- An actor to provision access to specific product features.
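The fan-out described above can be modeled in a few lines to show where observable records would come from. This is an in-memory sketch only: it does not reflect Manufactory's actual project configuration or actor interface. `ROUTES`, `dispatch`, `record`, `TRACE_LOG`, and the three actor functions are all hypothetical stand-ins.

```python
import uuid
from datetime import datetime, timezone

TRACE_LOG = []  # stand-in for whatever observable store you send records to


def record(event_id, stage, status, **details):
    """Append one structured record; every record carries the event_id."""
    TRACE_LOG.append({
        "event_id": event_id,
        "stage": stage,
        "status": status,
        "at": datetime.now(timezone.utc).isoformat(),
        **details,
    })


# Hypothetical actors; real ones would call a CRM, an email platform, etc.
def create_welcome_package(event):
    return {"crm_record": f"pkg-{event['payload']['customer_id']}"}


def send_welcome_email(event):
    if not event["payload"].get("email"):
        raise ValueError("payload has no email address")
    return {"sent_to": event["payload"]["email"]}


def provision_access(event):
    return {"features": ["basic"]}


# Hypothetical routing table: event type -> actors to trigger.
ROUTES = {
    "CustomerRegistered": [
        create_welcome_package,
        send_welcome_email,
        provision_access,
    ],
}


def dispatch(event):
    """Record ingestion, then run each actor and record its outcome."""
    record(event["event_id"], "ingest", "ok", event_type=event["type"])
    for actor in ROUTES.get(event["type"], []):
        try:
            result = actor(event)
            record(event["event_id"], actor.__name__, "ok", result=result)
        except Exception as exc:  # keep going so other actors still run
            record(event["event_id"], actor.__name__, "error", error=str(exc))


event = {
    "event_id": str(uuid.uuid4()),
    "type": "CustomerRegistered",
    "payload": {"customer_id": "c-42", "email": ""},  # deliberately missing email
}
dispatch(event)
for entry in TRACE_LOG:
    print(entry["stage"], entry["status"])
```

Because every record shares the same `event_id`, the outcome of each actor can later be tied back to the single event that triggered it.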
Now, imagine a customer reports they never received their welcome email.
- Without observability: You might check email logs, CRM logs, and access control logs separately, trying to piece together what happened. Correlating these sources by hand is slow and easy to get wrong.
- With observability: You would find the CustomerRegistered event associated with that customer. You could then trace its journey:
  - Confirm Manufactory ingested the event.
  - See if the Manufactory project correctly identified and triggered the “send welcome email” actor.
  - Examine the event data passed to that actor and any response or error it logged back to an observable system.
This allows you to quickly determine if the issue was with the event data itself, a configuration in Manufactory, or a problem within the welcome email actor or the email platform it called. This targeted investigation saves significant time and effort.
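The investigation steps above can be sketched as a lookup over structured records. The record shape, the stage names, the sample error text, and the helpers `journey` and `first_failure` are assumptions made for illustration. Your actual records will depend on how your actors and tooling report back.

```python
# Hypothetical records collected from ingestion, routing, and actors.
RECORDS = [
    {"event_id": "evt-1001", "customer_id": "c-42",
     "stage": "ingest", "status": "ok"},
    {"event_id": "evt-1001", "customer_id": "c-42",
     "stage": "route:send_welcome_email", "status": "ok"},
    {"event_id": "evt-1001", "customer_id": "c-42",
     "stage": "actor:send_welcome_email", "status": "error",
     "error": "invalid recipient address"},  # made-up sample error
    {"event_id": "evt-1002", "customer_id": "c-43",
     "stage": "ingest", "status": "ok"},
]


def journey(records, customer_id):
    """Return, in recorded order, every step of the customer's events."""
    event_ids = {r["event_id"] for r in records
                 if r.get("customer_id") == customer_id}
    return [r for r in records if r["event_id"] in event_ids]


def first_failure(steps):
    """Return the first step that did not succeed, or None."""
    return next((s for s in steps if s["status"] != "ok"), None)


steps = journey(RECORDS, "c-42")
failure = first_failure(steps)

for step in steps:
    print(step["stage"], step["status"])
if failure is not None:
    print("first failing stage:", failure["stage"], "-", failure["error"])
```

In this made-up data the event was ingested and routed correctly, so the search narrows immediately to the email actor or the platform it called.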
Traditional metrics and logs certainly provide value. Metrics can give you a high-level overview of event throughput in Manufactory (e.g., “events processed per minute”), and logs from Manufactory actors can offer specific error details.
However, Tallyfy Manufactory’s core strength lies in its event-centric nature. The rich, structured event data that Manufactory processes and potentially enriches offers a much deeper level of insight for event-driven architectures. While a metric might tell you that event processing is slow, observable event data (especially when traced) can help you understand why specific events are slow, which services are involved, and what particular conditions correlate with the slowness. This granular detail is key to truly understanding and optimizing the complex workflows managed by Tallyfy Manufactory.
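The difference between a metric and event-level data can be illustrated with a small sketch. It assumes per-event records with hypothetical `region`, `actor`, and `duration_ms` fields. The `slow_correlates` helper is an illustrative heuristic (share among slow events divided by share among all events), not a feature of Manufactory.

```python
from collections import Counter
from statistics import mean

# Hypothetical per-event records.
EVENTS = [
    {"region": "eu-west", "actor": "email", "duration_ms": 1200},
    {"region": "eu-west", "actor": "email", "duration_ms": 1500},
    {"region": "eu-west", "actor": "crm", "duration_ms": 200},
    {"region": "us-east", "actor": "crm", "duration_ms": 250},
    {"region": "us-east", "actor": "crm", "duration_ms": 180},
    {"region": "us-east", "actor": "access", "duration_ms": 220},
]

# The metric view: a single number that says "slow" but not why.
average_ms = mean(e["duration_ms"] for e in EVENTS)


def slow_correlates(events, threshold_ms, fields):
    """Rank field values by how over-represented they are among slow events."""
    slow = [e for e in events if e["duration_ms"] > threshold_ms]
    findings = []
    for field in fields:
        overall = Counter(e[field] for e in events)
        among_slow = Counter(e[field] for e in slow)
        for value, count in among_slow.items():
            share_slow = count / len(slow)
            share_all = overall[value] / len(events)
            findings.append((round(share_slow / share_all, 2), field, value))
    return sorted(findings, reverse=True)


findings = slow_correlates(EVENTS, threshold_ms=1000, fields=("region", "actor"))
print("average ms:", round(average_ms, 1))
for lift, field, value in findings:
    print(f"{field}={value} is {lift}x over-represented among slow events")
```

A ratio like this is only a starting point for investigation. It suggests which dimension to look at first, after which tracing individual events confirms or rules out the cause.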