We are in the process of deploying COS into production and are now looking for some guidance on tuning data ingestion and retention with Loki in COS.
We have about 30 TB of storage in a MicroCeph cluster, so we need to tune the ingestion of metrics and logs to fit within that budget. We also have a significant number of units shipping data.
Our question is:
What can we do to limit the amount of logs that we collect and keep, since we don't have infinite storage?
For example:
Can we set a maximum capacity for logs, say X TB? (A rough sketch of what we mean follows this list.)
Can we filter at the logging source, so that some logs are never shipped at all?
Can we compress data?
Can we filter logs inside Loki itself, e.g. by removing data?
etc.?
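To make the first and last questions concrete: as far as we can tell, upstream Loki bounds retention by time (via `limits_config` and the compactor) rather than by an absolute size, so an X TB cap would have to be budgeted from ingest rate times retention period. Below is a rough sketch of the kind of configuration fragment we have in mind, rendered here from Python; the values are illustrative, and we don't yet know how (or whether) the COS Loki charm exposes these options.

```python
"""Illustrative only: a retention-bounded Loki config fragment built from
stock upstream Loki options. The COS Loki charm may expose (or manage)
these differently, so treat this as discussion input, not the charm's API."""
import yaml  # PyYAML: pip install pyyaml

loki_config_fragment = {
    "limits_config": {
        # Time-based cap: chunks older than this become eligible for deletion.
        "retention_period": "720h",   # ~30 days (illustrative)
        # Per-tenant ingest throttling, to keep growth predictable.
        "ingestion_rate_mb": 10,
        "ingestion_burst_size_mb": 20,
    },
    "compactor": {
        # Retention is enforced by the compactor, so it must be switched on.
        "retention_enabled": True,
        "retention_delete_delay": "2h",
    },
}

print(yaml.safe_dump(loki_config_fragment, sort_keys=False))
```

Does something along these lines map onto the charm's config options, or is the intended route to adjust the generated Loki configuration directly?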
Note: this is likely something that anyone deploying COS into production will have to deal with, so the question is phrased in a way that could feed into upstream documentation for the COS stack itself.
Thanks again for a fantastic COS stack. We hope to get this running in production in the near future. Our journey here has been long, since we have had to battle through multiple learning and design phases. @0x12b has been following us along the way, and I hope to some day be able to let you in on the various challenges, but also the benefits, of getting here.
We will study your suggestions, but I also think this needs to go into an operational manual or deployment guide for others who might follow in these footsteps.
The level of isolation depends on what you're measuring. If you only want one workload's logs, then that workload should be the only one related to Loki. An alternative is to relate the same workload to two different Loki instances: one that is part of COS and covers your complete context, and another Loki that is dedicated to the measurement.
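If it helps, here is a rough sketch of that dual-Loki topology driven from the Juju CLI via Python. The model, application, and endpoint names are placeholders; check `juju status --relations` and the charm docs for the real ones.

```python
"""Sketch: relate one workload to two Loki instances (COS + dedicated).

Assumptions (not verified against your model): the workload ships logs over a
'logging' endpoint (loki_push_api), and the COS Loki application is named
'loki'. All names below are placeholders.
"""
import subprocess

MODEL = "cos"             # placeholder model name
WORKLOAD = "my-workload"  # placeholder workload application


def juju(*args: str) -> None:
    """Run a juju CLI command against MODEL and fail loudly on error."""
    subprocess.run(["juju", "-m", MODEL, *args], check=True)


# Deploy a second, dedicated Loki next to the one that is part of COS.
juju("deploy", "loki-k8s", "loki-dedicated", "--trust")

# Keep the existing relation to the COS Loki ...
juju("relate", f"{WORKLOAD}:logging", "loki:logging")
# ... and add a second relation to the dedicated instance.
# (On Juju 3.x, `juju integrate` is the preferred spelling of `relate`.)
juju("relate", f"{WORKLOAD}:logging", "loki-dedicated:logging")
```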
The original diagram is a relation diagram: the lines represent Juju relations. In your diagram, it looks like the lines are HTTP connections?
In any case, it would be interesting to compare results from the following two Loki instances: