Commit Graph

57 Commits

Author SHA1 Message Date
Eason Zhao
9ece101d8c feat: alert manager set-up for all services 2025-10-20 21:31:25 -04:00
Nicolas
ec93fc987c feat(opentelemetry): add additional Kubernetes labels and enhance logging attributes 2025-09-28 09:07:42 +08:00
Nicolas
921c9f4e10 feat(config): add ENVIRONMENT variable to central-storage-config.yaml for enhanced configuration management 2025-09-26 15:55:23 +08:00
Nicolas
beb355249a fix: update centralStorage OpenTelemetry configuration
- Change start_at from 'end' to 'beginning' for complete log history
- Fix transform configuration to match authentication service
- Add k8s_cluster receiver to collect container stdout logs
- Remove problematic json_parser operator
- Ensure consistent log processing across services
2025-09-23 08:58:01 +08:00
Nicolas
cd9f42e143 Changed the secret configuration of central storage 2025-08-18 17:38:47 +08:00
Nicolas
69a2c112d1 feat(centralStorage): migrate alpha environment to use Azure Key Vault for sensitive data
- Add FreeleapsSecret configuration for Azure Key Vault integration
- Move sensitive data (mongodbUri, azureStorageDocumentApiKey, azureStorageDocumentApiEndpoint) from config to secrets
- Update deployment template to read from both config and FreeleapsSecret
- Comment out sensitive fields in central-storage-config.yaml
- Create freeleapssecret.yaml template for secret management
2025-08-18 16:24:11 +08:00
Nicolas
aa74f6a4f7 fix: remove invalid k8scluster receiver to resolve OpenTelemetry startup error
- Remove k8scluster receiver (invalid type name)
- Remove unused otlp receiver definitions
- Keep only filelog receiver which is actually used in pipeline
- Fixes CrashLoopBackOff issue in central-storage pod
2025-08-01 07:52:37 +08:00
Nicolas
d85f9408e4 fix: change Chinese comments to English in OpenTelemetry configs
- Replace Chinese comments with English in opentelemetry.yaml files
- Maintain consistent English-only comments in freeleaps-ops repository
- Keep the same functionality while improving code readability
2025-07-31 23:17:45 +08:00
Nicolas
5101e6d2cd fix: optimize OpenTelemetry configuration to prevent duplicate log collection
- Change receivers from [filelog, otlp, k8scluster] to [filelog] only
- Prevent duplicate logs from multiple collection sources
- Keep only filelog receiver to collect logs from application log files
- This eliminates duplicate logs appearing in Grafana for devsvc and central-storage
2025-07-31 23:08:10 +08:00
Nicolas
cc73ad92a9 feat: add container logs collection to OpenTelemetry config for central storage 2025-07-31 11:27:28 +08:00
Nicolas
da7e43547f fix: use hasKey to safely check for logIngest.persistence existence 2025-07-31 10:21:50 +08:00
Nicolas
85fa39f8e2 fix: add default values for logIngest.persistence to prevent nil pointer errors 2025-07-31 10:18:47 +08:00
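Note on the two commits above (85fa39f8e2 and da7e43547f): the guard they describe can be sketched roughly as follows. This is a hypothetical Helm template excerpt, not the repository's actual template. The value path `.Values.centralStorage.logIngest`, the `enabled` flag, the volume name, and the claim name are all assumptions made for illustration; only `hasKey` and the idea of a default for `logIngest.persistence` come from the commit messages.

```yaml
# Hypothetical excerpt from a deployment template; names and paths are assumed.
# Fall back to an empty dict so a missing logIngest block cannot cause a nil pointer error.
{{- $logIngest := .Values.centralStorage.logIngest | default dict }}
# Only dereference persistence once hasKey confirms the key exists.
{{- if hasKey $logIngest "persistence" }}
{{- if $logIngest.persistence.enabled }}
      volumes:
        - name: logs                          # assumed volume name
          persistentVolumeClaim:
            claimName: central-storage-logs   # assumed claim name
{{- end }}
{{- end }}
```

The nested `if` keeps the `persistence.enabled` lookup from being evaluated when the key is absent, which is the failure mode the commit messages mention.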
Nicolas
7a7fcf1398 fix: optimize central storage logging configuration to resolve hourly log bursts
- Change start_at from beginning to end to avoid historical log duplication
- Add poll_interval: 1s for real-time file monitoring
- Optimize batch processing: send_batch_size: 1, timeout: 1s
- Add PVC template for log persistence to reduce log loss risk
- Update deployment to support persistent volume for logs
2025-07-31 10:14:55 +08:00
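Note on commit 7a7fcf1398 above: the settings it lists map onto an OpenTelemetry Collector configuration along the lines of the fragment below. This is a minimal sketch, not the file from the repository. The `include` path is borrowed from the logPathPattern commit listed further down (381d25c11b) and may not be the path this service actually used; the rest of the collector config (exporters, pipelines) is omitted.

```yaml
# Illustrative OpenTelemetry Collector fragment; not copied from the repository.
receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log   # assumed path, taken from the logPathPattern commit
    start_at: end                 # read only lines written after the collector starts
    poll_interval: 1s             # check the files for new data every second
processors:
  batch:
    send_batch_size: 1            # send a batch as soon as one record is buffered
    timeout: 1s                   # flush at least once per second
```

With `start_at: end`, a collector restart does not re-read existing file content, which is how the commit avoids historical log duplication. The trade-off is that lines written while the collector is down are not picked up.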
Nicolas
f9c2c7e696 Change OTC start_at to beginning to read existing logs 2025-07-29 18:04:53 +08:00
Nicolas
401a2166cd Simplify OTC config: remove timestamp parsing to fix startup issues 2025-07-29 17:57:30 +08:00
Nicolas
7ae5966da1 Fix timestamp parsing: use move operator to extract timestamp from attributes 2025-07-29 17:54:25 +08:00
Nicolas
12a7f1b98c Fix timestamp parsing in OTC config for central-storage logs 2025-07-29 17:40:57 +08:00
Nicolas
c4bd3df730 Fix OTLP exporter config: remove unsupported labels options 2025-07-29 17:22:40 +08:00
Nicolas
18eb4df3fb Revert to OTLP config to fix OTC startup - will investigate Loki config later 2025-07-29 17:16:46 +08:00
Nicolas
d234716697 Comment out Loki labels config to fix OTC startup - can be enabled later 2025-07-29 17:13:01 +08:00
Nicolas
b1c9f59cff Simplify Loki exporter configuration to fix OTC startup issues 2025-07-29 17:08:32 +08:00
Nicolas
a1d5652efb Fix Loki exporter configuration: remove invalid keys 2025-07-29 17:05:24 +08:00
Nicolas
b7b6ddcae4 Fix central-storage logging: change OTLP exporter to Loki HTTP exporter 2025-07-29 16:47:47 +08:00
Nicolas
acd5379c27 fix: restore json_parser with timestamp parsing based on local testing 2025-07-29 15:40:26 +08:00
Nicolas
beab7cd117 fix: remove all parsing operators to use raw log content 2025-07-29 15:29:53 +08:00
Nicolas
37db3b4404 fix: remove time_parser to use default timestamp handling 2025-07-29 14:46:13 +08:00
Nicolas
5479b85968 fix: separate timestamp parsing from json_parser to resolve time parsing error 2025-07-29 14:42:29 +08:00
Nicolas
aa8e626b7a fix: remove problematic ParseJSON operations from OpenTelemetry transform processor 2025-07-29 14:30:00 +08:00
Nicolas
22ab4b99ef fix: simplify OpenTelemetry Collector config to resolve timestamp parsing errors 2025-07-29 14:07:39 +08:00
Nicolas
5ced605a20 fix: improve OpenTelemetry Collector JSON parsing for central storage logs 2025-07-29 10:15:39 +08:00
Nicolas
605e66a26b fix: add missing logging environment variables for central storage K8s deployment
- Add LOG_BASE_PATH environment variable
- Add BACKEND_LOG_FILE_NAME environment variable
- Add APPLICATION_ACTIVITY_LOG environment variable
- Fix logging path configuration for K8s environment
2025-07-29 09:36:01 +08:00
Nicolas
8fc5281e23 feat: add DEBUG_MODE environment variable for central-storage 2025-07-28 21:00:33 +08:00
Nicolas
381d25c11b chore(logging): update logPathPattern to /var/log/pods/*/*/*.log for k8s standard log collection 2025-07-25 11:05:28 +08:00
zhenyus
3aad06eeb7 feat: add Vertical Pod Autoscaler (VPA) configuration across multiple services for improved resource management
Signed-off-by: zhenyus <zhenyus@mathmast.com>
2025-06-10 23:53:31 +08:00
zhenyus
6d9d15d4d2 Add OpenTelemetry logging support across multiple services
- Introduced log ingestion configuration in values files for centralStorage, content, notification, and payment services.
- Updated deployment templates to conditionally include OpenTelemetry annotations and volume mounts based on log ingestion settings.
- Added OpenTelemetry RBAC configurations for service accounts and cluster roles to enable logging.
- Implemented OpenTelemetry collector configuration to process logs and export them to Loki.
- Ensured compatibility with existing Helm chart structure and maintained backward compatibility for services without log ingestion enabled.

Signed-off-by: zhenyus <zhenyus@mathmast.com>
2025-04-21 22:03:00 +08:00
zhenyus
b3ad25d6e5 fix: update config file paths in config-checksum annotations for deployment templates
Signed-off-by: zhenyus <zhenyus@mathmast.com>
2025-03-27 14:51:41 +08:00
zhenyus
df2ab1c3a4 fix: correct config file path in config-checksum annotations for deployment templates
Signed-off-by: zhenyus <zhenyus@mathmast.com>
2025-03-27 14:50:08 +08:00
zhenyus
580f3f8d71 feat: add config checksum annotations to deployment templates and update site URL in values files
Signed-off-by: zhenyus <zhenyus@mathmast.com>
2025-03-27 14:48:25 +08:00
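Note on commit 580f3f8d71 above: a config checksum annotation is commonly written in Helm as shown below, so that a change to the rendered ConfigMap changes the pod template and triggers a rollout. This is a sketch of that general pattern, not the repository's template. The annotation key `checksum/config` and the template file name are assumptions; the two later fix commits (df2ab1c3a4 and b3ad25d6e5) suggest the real path differed from the first attempt.

```yaml
# Hypothetical deployment template excerpt; annotation key and file path are assumed.
spec:
  template:
    metadata:
      annotations:
        # Hash the rendered config template; a new hash forces the pods to restart.
        checksum/config: {{ include (print $.Template.BasePath "/central-storage-config.yaml") . | sha256sum }}
```

If the path passed to `include` does not match a template in the chart, rendering fails, which is consistent with the path corrections that followed.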
zhenyus
a90ee717b2 fix: update Prometheus query intervals from 30s to 1m for improved accuracy
Signed-off-by: zhenyus <zhenyus@mathmast.com>
2025-03-18 05:59:05 +08:00
zhenyus
336f1fa0e2 fix: remove hardcoded uid values in dashboard templates for consistency
Signed-off-by: zhenyus <zhenyus@mathmast.com>
2025-03-18 00:26:16 +08:00
zhenyus
a1ab488cbf fix: update dashboard file names to use dynamic values for consistency
Signed-off-by: zhenyus <zhenyus@mathmast.com>
2025-03-18 00:25:04 +08:00
zhenyus
a6803210e0 feat: add dashboard configuration for content, payment, notification, and central storage services
Signed-off-by: zhenyus <zhenyus@mathmast.com>
2025-03-18 00:22:06 +08:00
zhenyus
56094aa1bd fix: remove version and managed-by labels from service monitor templates for consistency
Signed-off-by: zhenyus <zhenyus@mathmast.com>
2025-03-17 23:53:49 +08:00
zhenyus
e8a07a08e6 feat: add metricsEnabled and probesEnabled configuration options to payment, content, centralStorage, authentication, and notification services
Signed-off-by: zhenyus <zhenyus@mathmast.com>
2025-03-17 23:51:35 +08:00
zhenyus
96ab638756 fix: remove serviceMonitor.enabled variable from service templates in authentication, centralStorage, content, notification, and payment services
Signed-off-by: zhenyus <zhenyus@mathmast.com>
2025-03-17 23:47:29 +08:00
zhenyus
e7bb64108c fix: rename service monitor resources to use a consistent naming convention
Signed-off-by: zhenyus <zhenyus@mathmast.com>
2025-03-17 23:45:47 +08:00
zhenyus
bb2bebc164 fix: update service monitor templates to use service-specific values for namespace, labels, interval, and scrapeTimeout
Signed-off-by: zhenyus <zhenyus@mathmast.com>
2025-03-17 23:45:16 +08:00
zhenyus
c9cfa0827e fix: update service monitor templates to check individual serviceMonitor.enabled values
Signed-off-by: zhenyus <zhenyus@mathmast.com>
2025-03-17 23:42:46 +08:00
zhenyus
32198e2f9a fix: remove port definitions from service monitor configurations
Signed-off-by: zhenyus <zhenyus@mathmast.com>
2025-03-17 23:36:46 +08:00
zhenyus
5c9f74c609 fix: add name label to service monitor configuration
Signed-off-by: zhenyus <zhenyus@mathmast.com>
2025-03-17 23:26:34 +08:00