
Observability Guide

Related docs: SDLC/MAINTENANCE · Microservices · Hosting Infrastructure


1. Accessing Dashboards

The following dashboards are available in the Middleware console:

  • API Performance: Overview of request rates, error rates, and latency across all 19 services.
  • Payment Success: Tracking transaction success rates by provider (PawaPay, Flutterwave, Mock).
  • ZRA Compliance: Monitoring TPIN and NRC validation rates and failure causes.
  • Fraud Detection: Real-time fraud score distribution, blocked transaction trends, review queue depth.
  • Delivery Tracking: GPS update frequency, geofence trigger accuracy, ETA accuracy.
  • Cache Performance: Redis hit/miss ratios, memory usage, eviction rates.
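The API Performance and Cache Performance panels reduce to a few ratios. The Python sketch below shows one way to derive error rate, p95 latency and Redis hit ratio from raw counts when sanity-checking a dashboard by hand. It is not how Middleware computes them. The nearest-rank percentile method, the function names and the input shapes are assumptions. `keyspace_hits` and `keyspace_misses` are the counters Redis reports under `INFO stats`.

```python
def error_rate(total_requests: int, error_responses: int) -> float:
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    if total_requests == 0:
        return 0.0
    return error_responses / total_requests


def p95(latencies_ms: list) -> float:
    """95th percentile via the nearest-rank method (an assumption;
    a dashboard may interpolate between samples instead)."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n, avoiding float rounding surprises
    rank = (95 * len(ordered) + 99) // 100  # 1-based rank
    return ordered[rank - 1]


def redis_hit_ratio(info_stats: dict) -> float:
    """Cache hit ratio from the keyspace_hits / keyspace_misses counters
    returned by Redis `INFO stats` (values may arrive as strings)."""
    hits = int(info_stats.get("keyspace_hits", 0))
    misses = int(info_stats.get("keyspace_misses", 0))
    lookups = hits + misses
    return hits / lookups if lookups else 0.0
```

A falling hit ratio alongside rising eviction rates usually points at TTLs or memory limits rather than at application code.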

2. Troubleshooting with Logs

All logs are structured JSON and carry a traceId, so a single request can be correlated across services.

2.1 Log Format (pino)

```json
{
  "level": 30,
  "time": 1716374400000,
  "pid": 1234,
  "hostname": "pakashop-prod-01",
  "service": "pakashop-backend",
  "traceId": "abc123-def456",
  "spanId": "span789",
  "msg": "Payment initiation started",
  "orderId": "ord_123",
  "gateway": "PAWAPAY",
  "phone": "+26097*****56",
  "duration_ms": 45
}
```
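The numeric `level` field follows pino's default scale (10 trace, 20 debug, 30 info, 40 warn, 50 error, 60 fatal), and `time` is milliseconds since the Unix epoch. A minimal Python sketch for turning a raw line into a readable one-liner during an incident. The helper name and output layout are illustrative only and not part of the platform.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# pino's default numeric log levels
PINO_LEVELS = {10: "trace", 20: "debug", 30: "info", 40: "warn", 50: "error", 60: "fatal"}


def summarize_pino_line(line: str) -> Optional[str]:
    """Render one pino JSON line as 'ISO-time LEVEL service traceId msg'.

    Returns None for lines that are not JSON objects (e.g. stray stdout output).
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    raw_level = record.get("level")
    level = PINO_LEVELS.get(raw_level, str(raw_level))
    millis = record.get("time")
    # pino's "time" is milliseconds since the Unix epoch
    ts = (
        datetime.fromtimestamp(millis / 1000, tz=timezone.utc).isoformat()
        if isinstance(millis, (int, float))
        else "-"
    )
    return (
        f"{ts} {level.upper()} {record.get('service', '-')} "
        f"{record.get('traceId', '-')} {record.get('msg', '')}"
    )
```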

2.2 Example Queries

Find all logs for a specific order:

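```bash
# -o cat prints only the message (the raw JSON); fromjson? skips non-JSON lines
journalctl -u pakashop-backend.service --since "1 hour ago" -o cat \
  | jq -R 'fromjson? | select(.orderId == "ord_123")'
```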

View all errors for ZRA validation:

```bash
# level >= 50 matches pino's error (50) and fatal (60) levels
journalctl -u pakashop-invoicing.service --since "24 hours ago" -o cat \
  | jq -R 'fromjson? | select(.level >= 50 and .spanName == "zra.validate")'
```

Trace a single transaction across services:

```bash
# 1. Find the traceId in the backend logs, then set it here
TRACE_ID="abc123-def456"

# 2. Pull every record carrying that traceId from each service
for svc in backend tracking notifications; do
  journalctl -u "pakashop-${svc}.service" -o cat \
    | jq -R --arg tid "$TRACE_ID" 'fromjson? | select(.traceId == $tid)'
done
```
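When following a trace by hand it helps to interleave the per-service records in time order. A Python sketch, assuming each service's journal output has already been captured as JSON lines (for example with `-o cat`). The function name and the `{service: lines}` input shape are hypothetical.

```python
import json
from typing import Dict, Iterable, List


def merge_trace(trace_id: str, service_logs: Dict[str, Iterable[str]]) -> List[dict]:
    """Collect every record with the given traceId from several services'
    JSON-lines logs and return them ordered by pino's `time` field."""
    events = []
    for service, lines in service_logs.items():
        for line in lines:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip non-JSON output such as startup banners
            if isinstance(record, dict) and record.get("traceId") == trace_id:
                # Tag the origin if the record lacks a "service" field
                record.setdefault("service", service)
                events.append(record)
    # sorted() is stable: equal timestamps keep their input order
    return sorted(events, key=lambda r: r.get("time", 0))
```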

Find slow database queries:

```bash
journalctl -u pakashop-backend.service -o cat \
  | jq -R 'fromjson? | select(.duration_ms > 500 and .spanName == "db.query")'
```

3. Interpreting Traces

Custom spans have been added for critical flows across all services:

| Span Name | Service | Description |
|---|---|---|
| checkout.complete | backend | End-to-end checkout process |
| payment.process | backend | Gateway communication time |
| payment.webhook | backend | Webhook processing time |
| zra.validate | invoicing | ZRA compliance API response time |
| zra.transmit | invoicing | VSDC transmission time |
| fraud.evaluate | fraud | Fraud rule evaluation time |
| search.query | search | Meilisearch query time |
| tracking.location | tracking | GPS processing + Kalman filter |
| moderation.analyze | moderation | Sightengine API call time |
| recommendations.generate | recommendations | Collaborative filtering computation |
| report.generate | reports | PDF/CSV generation time |
| settlement.batch | settlement | Batch payout processing time |

Slow checkout? Look at the payment.process child span to see if the gateway is the bottleneck.
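To put a number on "bottleneck", compare the time spent in payment.process with the duration of its parent checkout.complete span. A minimal Python sketch over exported span records. The dict shape (`name`, `span_id`, `parent_id`, `start_ms`, `end_ms`) is an assumption for illustration, not the Middleware export format.

```python
from typing import List


def child_share(spans: List[dict], parent_name: str, child_name: str) -> float:
    """Fraction of the parent span's duration spent in its direct children
    named `child_name`. Overlapping (concurrent) children are summed, so the
    result can exceed 1.0 in that case."""
    parent = next((s for s in spans if s["name"] == parent_name), None)
    if parent is None:
        raise ValueError(f"no span named {parent_name!r}")
    parent_ms = parent["end_ms"] - parent["start_ms"]
    if parent_ms <= 0:
        return 0.0
    child_ms = sum(
        s["end_ms"] - s["start_ms"]
        for s in spans
        if s["name"] == child_name and s.get("parent_id") == parent["span_id"]
    )
    return child_ms / parent_ms
```

A share close to 1.0 means checkout time is dominated by the gateway call rather than by Pakashop's own code.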

High fraud false positives? Check the fraud.evaluate span for rule execution times and thresholds.


4. Custom Spans & Events

4.1 Adding a New Span (Node.js)

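```javascript
const { withSpan } = require('./lib/tracing');

// Call from inside an async function: CommonJS modules (require)
// do not support top-level await.
await withSpan('my.operation', {
  'order.id': orderId,
  'user.role': userRole
}, async (span) => {
  // your logic
  span.setAttribute('custom.metric', value);
});
```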

4.2 Adding a New Span (Go)

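```go
import (
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

ctx, span := otel.Tracer("pakashop-search").Start(ctx, "search.query")
defer span.End()

// Run the search with ctx so any child spans attach to search.query,
// then record the attributes once results are available.
span.SetAttributes(
	attribute.String("search.query", query),
	attribute.Int("search.results", len(results)),
)
```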

4.3 Adding a New Span (Python)

```python
from opentelemetry import trace

tracer = trace.get_tracer("pakashop-moderation")

with tracer.start_as_current_span("moderation.analyze") as span:
    span.set_attribute("asset.id", asset_id)
    span.set_attribute("asset.type", asset_type)
    # moderation logic
```

5. Alert Response Runbook

| Alert | Impact | Action |
|---|---|---|
| payment_error_rate > 5% | Customers cannot pay | Check payment.process logs. Verify PawaPay/Flutterwave API status. Check the fraud service for false positives. |
| zra_validation_failures > 10 | Shops cannot be approved | Check zra.validate spans. Verify ZRA credentials and API availability. |
| api_p95_latency > 500ms | Slow user experience | Check db_query_duration and high-latency spans. Review Redis cache hit rates. |
| unhandled_exception | Potential platform crash | Use the traceId to identify the failing request and environment. |
| fraud_queue_backlog > 100 | Fraud reviews delayed | Scale fraud service workers. Check rule thresholds. |
| tracking_ws_disconnects > 50/min | Buyers lose live tracking | Check tracking service health. Review the Redis Pub/Sub connection pool. |
| moderation_retry_exhausted > 10 | Images left unmoderated | Check Sightengine API status. Review moderation service health. |
| redis_memory > 80% | Cache eviction risk | Review cache TTLs. Flush stale keys. Consider scaling the Redis instance. |
| meilisearch_index_lag > 60s | Stale search results | Check the Meilisearch task queue. Restart Meilisearch if necessary. |
| settlement_batch_failures > 3 | Vendors not paid | Check settlement service logs. Verify PawaPay/Flutterwave payout API status. |
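The thresholds above are easy to keep as data and check against a metrics snapshot, for example when testing a proposed threshold change before editing it in Middleware. A Python sketch: the metric names and limits come from the runbook table, while the snapshot dict, the units (percentages as fractions, latency in ms, lag in seconds, disconnects per minute) and the strictly-greater-than comparison are assumptions. `unhandled_exception` has no numeric threshold in the table and is treated here as firing on any occurrence.

```python
from typing import Dict, List

# Alert thresholds from the runbook. Units are assumptions (see lead-in).
THRESHOLDS = {
    "payment_error_rate": 0.05,
    "zra_validation_failures": 10,
    "api_p95_latency": 500,
    "unhandled_exception": 0,  # any occurrence fires
    "fraud_queue_backlog": 100,
    "tracking_ws_disconnects": 50,
    "moderation_retry_exhausted": 10,
    "redis_memory": 0.80,
    "meilisearch_index_lag": 60,
    "settlement_batch_failures": 3,
}


def firing_alerts(snapshot: Dict[str, float]) -> List[str]:
    """Return alerts whose metric exceeds its threshold, in runbook order.
    Metrics absent from the snapshot are ignored."""
    return [
        name
        for name, limit in THRESHOLDS.items()
        if name in snapshot and snapshot[name] > limit
    ]
```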

6. Rollback

To disable observability without code changes, set the following environment variable:

```
ENABLE_OBSERVABILITY=false
```

This disables export to Middleware.io while preserving local pino logging.
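This guide does not specify how the flag is parsed. The Python sketch below shows one defensive reading, assuming observability stays on unless the variable is explicitly set to a false-like value, so a missing variable never silently stops export. The function name and the accepted spellings are hypothetical; check the actual service code before relying on them.

```python
import os
from typing import Mapping, Optional

# Hypothetical set of values treated as "off" (compared case-insensitively)
FALSE_VALUES = {"false", "0", "no", "off"}


def observability_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return False only when ENABLE_OBSERVABILITY is set to a false-like value."""
    env = os.environ if env is None else env
    raw = env.get("ENABLE_OBSERVABILITY")
    if raw is None:
        return True  # unset: keep exporting to Middleware.io
    return raw.strip().lower() not in FALSE_VALUES
```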


For internal use only. Do not distribute outside Pakashop engineering.