Disk Capacity Planning for Whisper / Graphite

Solution 1:

whisper-info.py gives you a lot of insight into what each file contains and how it is aggregated, including the file's size.

However, it's only useful for existing whisper files.

When you want to predict the size a schema will produce before putting it in place, try a Whisper calculator, such as the one available at https://gist.github.com/jjmaestro/5774063

EDIT:

When asked for an example...

storage_schema:

{
    :catchall => {
      :priority   => "100",
      :pattern    => "^\.*",
      :retentions => "1m:31d,15m:1y,1h:5y"
    }
}
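The retention strings in a schema like the one above can be turned into a point count per archive. Here's a minimal sketch of that parsing step (the unit table and function names are my own, not part of Graphite):

```python
# Sketch: parse a Graphite retention spec ("precision:duration") into
# (seconds_per_point, retention_seconds), then derive points per archive.
UNITS = {'s': 1, 'm': 60, 'h': 3600, 'd': 86400, 'w': 604800, 'y': 31536000}

def parse_retention(spec):
    """e.g. '1m:31d' -> (60, 2678400)"""
    def seconds(token):
        return int(token[:-1]) * UNITS[token[-1]]
    precision, duration = spec.split(':')
    return seconds(precision), seconds(duration)

for spec in "1m:31d,15m:1y,1h:5y".split(','):
    spp, ret = parse_retention(spec)
    print(spec, '->', ret // spp, 'points')
# 1m:31d -> 44640 points
# 15m:1y -> 35040 points
# 1h:5y  -> 43800 points
```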

Looking at my file applied-in-last-hour.wsp, ls -l yields

-rwxr-xr-x 1 root root 4415092 Sep 16 08:26 applied-in-last-hour.wsp

and whisper-info.py ./applied-in-last-hour.wsp yields

maxRetention: 157680000
xFilesFactor: 0.300000011921
aggregationMethod: average
fileSize: 4415092

Archive 0
retention: 604800
secondsPerPoint: 10
points: 60480
size: 725760
offset: 52

Archive 1
retention: 2678400
secondsPerPoint: 60
points: 44640
size: 535680
offset: 725812

Archive 2
retention: 157680000
secondsPerPoint: 600
points: 262800
size: 3153600
offset: 1261492
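The numbers above follow directly from Whisper's on-disk layout: 12 bytes per data point, a 16-byte file header, and a 12-byte header per archive (which is why the first archive's offset is 52 = 16 + 3 × 12). A small sketch reproducing the file size from the three archives shown:

```python
# Sketch: estimate a Whisper file's size from its archive definitions,
# using the layout constants visible in the whisper-info.py output above.
POINT_SIZE = 12      # bytes per (timestamp, value) pair
METADATA_SIZE = 16   # file-level header
ARCHIVE_HEADER = 12  # per-archive header

def whisper_file_size(archives):
    """archives: list of (seconds_per_point, retention_seconds)."""
    header = METADATA_SIZE + ARCHIVE_HEADER * len(archives)
    points = sum(ret // spp for spp, ret in archives)
    return header + points * POINT_SIZE

# The three archives from whisper-info.py above:
archives = [(10, 604800), (60, 2678400), (600, 157680000)]
print(whisper_file_size(archives))  # 4415092, matching fileSize above
```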

So, basically: for each stat, sum the points across each retention segment its retention rule matches, multiply by the number of systems you intend to apply this to, and factor in the number of new stats you're going to track. Then take whatever amount of storage that comes to and at least double it (because we're buying storage, and we know we'll use it...)
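As a rough arithmetic sketch of that planning step (the metric and host counts below are made-up assumptions, not figures from this setup):

```python
# Hypothetical capacity estimate: per-file size x metrics x hosts, doubled.
file_size = 4_415_092    # bytes per metric, per the whisper-info output above
metrics_per_host = 200   # assumed
hosts = 50               # assumed
growth_factor = 2        # "at least double it"

total = file_size * metrics_per_host * hosts * growth_factor
print(f"{total / 1024**3:.1f} GiB")  # -> 82.2 GiB
```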

Solution 2:

The statsd documentation gives an example of a data retention policy.

The retentions are 10s:6h,1min:7d,10min:5y, which is 2160 + 10080 + 262800 = 275040 data points, and they give an archive size of 3.2 MiB.

Assuming a linear relationship, this works out to approximately 12.2 bytes per data point.