Measurement infrastructure

Taking measurements is an important part of any system evaluation. In its early days, measurements in ProtoPeer were done in a very ad hoc way: values were dumped to logs here and there, and those logs were then grepped, cut and sorted to get the final results. Over time we noticed several recurring measurement patterns and ways in which measurement data was aggregated. ProtoPeer's measurement infrastructure implements these common patterns and allows measurements to be done in a more uniform, systematic way.

Basics

The philosophy behind ProtoPeer's measurement API is the same as behind log4j: to be able to log from any place in the code. Instead of logging text messages as in log4j, the measurement API logs doubles. Once these numbers are logged they can be sliced, diced and aggregated in many different ways. In particular, measurements can be aggregated over the whole system, and you can look at how the measured values change over time.

--- insert a picture here illustrating the tags ---

Each measurement is associated with a set of tags. The tags describe the measured values. A tag can be any Java Object that has properly defined equals and hashCode methods, so that it can be reliably compared with other tags. Probably the simplest tag is a String or an enum indicating the type of a measured value. Despite its simplicity, tagging is a versatile mechanism and we'll later show how it can be used for a wide range of complex measurements.
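
As an illustration of a tag that is neither a String nor an enum, the sketch below defines a hypothetical LinkTag class identifying a directed link between two peers. LinkTag and its fields are assumptions made for this example and are not part of ProtoPeer; the point is only the equals/hashCode pair that lets two separately created tags compare as equal.

```java
import java.util.Objects;

// Hypothetical tag identifying a directed link between two peers.
// Not part of ProtoPeer; any class with consistent equals/hashCode would do.
final class LinkTag {
    private final int fromPeer;
    private final int toPeer;

    LinkTag(int fromPeer, int toPeer) {
        this.fromPeer = fromPeer;
        this.toPeer = toPeer;
    }

    // Two tags are equal when they describe the same directed link.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof LinkTag)) return false;
        LinkTag that = (LinkTag) other;
        return fromPeer == that.fromPeer && toPeer == that.toPeer;
    }

    // Must be consistent with equals: equal tags produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(fromPeer, toPeer);
    }
}
```

Because the tag is immutable, its hash code never changes after it has been used as a key, which is what makes it safe to use in hash-based collections.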

The measured values are not individually timestamped. Instead, time is quantized into equally-sized time windows, the measurement epochs. Epochs are indexed by integers. In most practical scenarios the measurement epoch is on the order of tens of seconds. The measurement epoch duration and the beginning of epoch zero are both configurable. Epochs are like buckets for measurement values; the final measurement log can be queried for the aggregates of measured values epoch-by-epoch to observe how the measured values changed over time.
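
The quantization of time into epochs can be sketched as a single integer division. The helper below is hypothetical (EpochMath is not a ProtoPeer class), and it assumes all times are given in milliseconds; how ProtoPeer itself represents time is not stated here.

```java
// Hypothetical helper, not part of the ProtoPeer API.
// Assumes all three arguments use the same unit (milliseconds here).
final class EpochMath {
    private EpochMath() {}

    // Maps a timestamp to the index of the epoch (bucket) it falls into.
    static int epochNumber(long timeMs, long epochZeroStartMs, long epochDurationMs) {
        if (epochDurationMs <= 0) {
            throw new IllegalArgumentException("epoch duration must be positive");
        }
        // floorDiv rounds towards negative infinity, so times before the
        // start of epoch zero land in negative epochs in this sketch.
        return (int) Math.floorDiv(timeMs - epochZeroStartMs, epochDurationMs);
    }
}
```

With a 10-second epoch starting at time zero, a value logged at 25 seconds falls into epoch 2, and every value logged between 20 and 30 seconds shares that bucket.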

The measurement API sits in the protopeer.measurement package. The most important classes are MeasurementLogger, MeasurementLog and Aggregate, each described below.

Measurement loggers

Just like in log4j, to be able to log you need to get an instance of a logger, called the MeasurementLogger in our case. In ProtoPeer there are two places to obtain a measurement logger. Each Peer has its own instance of a MeasurementLogger, which can be obtained by calling getPeer().getMeasurementLogger() in one of the peerlet's methods. There is also the root measurement logger stored in the experiment singleton, which can be obtained from any place in the code by calling Experiment.getSingleton().getRootMeasurementLogger(). We'll explain the relationship between these loggers later; for now you just need to know how to get an instance of a measurement logger.

So, what can you do with a measurement logger? The most important methods of the MeasurementLogger class are:

public class MeasurementLogger {
        /* logging */
        public void log(Object tag, double value);
        public void log(Object tag1, Object tag2, double value);
        public void log(Object tag1, Object tag2, Object tag3, double value);
        public void logTagSet(Set<Object> tags, double value);

        /* listeners */
        public void addMeasurementLoggerListener(MeasurementLoggerListener listener);
        public void removeMeasurmentLoggerListener(MeasurementLoggerListener listener);

        /* the underlying measurement log */
        public MeasurementLog getMeasurementLog();
}

There are four log methods that take different numbers of tags that can describe the logged value. The tags are treated as a set, i.e. their order doesn't matter and if the same tag is specified twice it will appear only once in the tag set.
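
The set semantics of tags can be sketched with a plain java.util.Set. The TagSets helper below is an assumption made for illustration and says nothing about how MeasurementLogger builds its tag sets internally; it only shows why argument order and duplicates make no difference.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the ProtoPeer API.
final class TagSets {
    private TagSets() {}

    // Collapses any number of tags into an unordered set without duplicates.
    static Set<Object> of(Object... tags) {
        return new HashSet<>(Arrays.asList(tags));
    }
}
```

Here TagSets.of("ALPHA", "message_count") and TagSets.of("message_count", "ALPHA") are equal sets, and TagSets.of("ALPHA", "ALPHA") contains a single element.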

The measurement logger fires an event at the end of every measurement epoch. This event can be handled by adding a MeasurementLoggerListener:

public interface MeasurementLoggerListener {
        public void measurementEpochEnded(MeasurementLog log, int epochNumber);
}

Each measurement logger has an underlying MeasurementLog which acts as a "database" for the measurements and can be accessed by calling getMeasurementLog().
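
The end-of-epoch event follows the usual listener pattern. The toy model below is self-contained and uses stand-in types: EpochListener and ToyEpochClock are hypothetical names, the callback carries only the epoch number (the real MeasurementLoggerListener also receives the MeasurementLog), and time is advanced by hand. It is a sketch of the pattern, not of how ProtoPeer detects epoch boundaries.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for MeasurementLoggerListener, reduced to the epoch number.
interface EpochListener {
    void measurementEpochEnded(int epochNumber);
}

// Toy clock that notifies listeners once for every epoch that has ended.
// Assumes epoch zero starts at time 0 and times are in milliseconds.
final class ToyEpochClock {
    private final long epochDurationMs;
    private final List<EpochListener> listeners = new ArrayList<>();
    private int currentEpoch = 0;

    ToyEpochClock(long epochDurationMs) {
        if (epochDurationMs <= 0) {
            throw new IllegalArgumentException("epoch duration must be positive");
        }
        this.epochDurationMs = epochDurationMs;
    }

    void addListener(EpochListener listener) {
        listeners.add(listener);
    }

    // Moves the clock forward and fires one event per completed epoch,
    // in increasing epoch order.
    void advanceTo(long timeMs) {
        int newEpoch = (int) Math.floorDiv(timeMs, epochDurationMs);
        while (currentEpoch < newEpoch) {
            int endedEpoch = currentEpoch;
            currentEpoch++;
            for (EpochListener listener : listeners) {
                listener.measurementEpochEnded(endedEpoch);
            }
        }
    }
}
```

A listener registered on this clock with 10-second epochs hears nothing while the clock is inside epoch 0, and receives epochs 0 and 1 once the clock reaches 25 seconds.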

Measurement logs

The measurements logged via the MeasurementLogger have to be stored somewhere. This is handled by the MeasurementLog class. The measurement log serves as a "database" for the measured values and can be "queried" in different ways. The result of a query is an Aggregate representing a set of measured values.

Accessing the log

All the measurements logged within one experiment are accessible via the root measurement log stored in the singleton Experiment. At the end of every measurement epoch the measurement logs from all the peers in the experiment are merged into the root log. During simulation the log returned by the Experiment.getSingleton().getRootMeasurementLog() call contains all the measurements from the whole system (see IntroTutorial for a basic example of how to access the log and dump the measurements). During live runs measurement aggregation is done a bit differently, since there are several Experiment instances running on different machines (this topic is covered here).

Querying the log

The MeasurementLog class has several methods:

public class MeasurementLog implements Serializable, Cloneable {

        /* READ */
        /* get aggregates for a specific epoch number */
        public Aggregate getAggregateByEpochNumber(int epochNumber, Object tag);
        public Aggregate getAggregateByEpochNumber(int epochNumber, Object tag1, Object tag2);
        public Aggregate getAggregateByEpochNumber(int epochNumber, Object tag1, Object tag2, Object tag3);
        public Aggregate getAggregateForTagSetByEpochNumber(int epochNumber, Set<Object> tags);

        /* get aggregates aggregated over the whole time */
        public Aggregate getAggregate(Object tag);
        public Aggregate getAggregate(Object tag1, Object tag2);
        public Aggregate getAggregate(Object tag1, Object tag2, Object tag3);
        public Aggregate getAggregateForTagSet(Set<Object> tags);

        /* WRITE */
        /* merging in the values from other logs */
        public void mergeWith(MeasurementLog otherLog);
        public void mergeWith(MeasurementLog otherLog, int epochNumber);

        /* adding the measurements to the log */
        public void log(int epochNumber, Object tag, double value);
        public void log(int epochNumber, Object tag1, Object tag2, double value);
        public void log(int epochNumber, Object tag1, Object tag2, Object tag3, double value);
        public void logTagSet(int epochNumber, Set<Object> tags, double value);
}

The getAggregate* methods return the aggregates of values that were logged with a specific set of tags. To be returned in an aggregate, the value must have been logged with exactly the same set of tags, not a subset, not a superset. There are methods for getting aggregates spanning the whole time as well as methods for getting aggregates spanning a specific measurement epoch number.
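
The exact-match rule can be illustrated with a toy log keyed by the tag set. ToyTagLog is a hypothetical class written for this page, not the MeasurementLog implementation, and returning 0 for a tag set that was never logged is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of a log keyed by the exact tag set; not the ProtoPeer implementation.
final class ToyTagLog {
    private final Map<Set<Object>, DoubleSummaryStatistics> byTagSet = new HashMap<>();

    // Records a value under the set formed by the given tags.
    void log(double value, Object... tags) {
        Set<Object> key = new HashSet<>(Arrays.asList(tags));
        byTagSet.computeIfAbsent(key, k -> new DoubleSummaryStatistics()).accept(value);
    }

    // Sum of the values logged with exactly this tag set (0 if there are none).
    // Values logged with a subset or a superset of the tags are not included.
    double sum(Object... tags) {
        DoubleSummaryStatistics stats = byTagSet.get(new HashSet<>(Arrays.asList(tags)));
        return stats == null ? 0.0 : stats.getSum();
    }
}
```

If one value is logged with the tags {"ALPHA", "message_count"} and another with just {"message_count"}, querying for "message_count" alone returns only the second value, and querying for "ALPHA" alone returns nothing.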

Values from one measurement log can be merged into another by calling the mergeWith method. This can be done for the whole log or for a single epoch number.
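
The two merge variants can be sketched on a toy per-epoch log. ToyEpochLog is a hypothetical class that uses a single tag per value to stay short; it leaves the source log untouched when merging, which is an assumption of the sketch rather than a documented property of MeasurementLog.

```java
import java.util.DoubleSummaryStatistics;
import java.util.HashMap;
import java.util.Map;

// Toy per-epoch log; not the ProtoPeer implementation.
final class ToyEpochLog {
    // epoch number -> tag -> running statistics
    private final Map<Integer, Map<Object, DoubleSummaryStatistics>> epochs = new HashMap<>();

    void log(int epochNumber, Object tag, double value) {
        epochs.computeIfAbsent(epochNumber, e -> new HashMap<>())
              .computeIfAbsent(tag, t -> new DoubleSummaryStatistics())
              .accept(value);
    }

    // Merges every epoch of the other log into this one.
    void mergeWith(ToyEpochLog other) {
        for (Integer epochNumber : other.epochs.keySet()) {
            mergeWith(other, epochNumber);
        }
    }

    // Merges a single epoch of the other log into this one.
    void mergeWith(ToyEpochLog other, int epochNumber) {
        Map<Object, DoubleSummaryStatistics> source = other.epochs.get(epochNumber);
        if (source == null) return; // nothing logged in that epoch
        Map<Object, DoubleSummaryStatistics> target =
                epochs.computeIfAbsent(epochNumber, e -> new HashMap<>());
        source.forEach((tag, stats) ->
                target.computeIfAbsent(tag, t -> new DoubleSummaryStatistics()).combine(stats));
    }

    // Sum of the values logged under the tag in one epoch.
    double sum(int epochNumber, Object tag) {
        Map<Object, DoubleSummaryStatistics> byTag = epochs.get(epochNumber);
        if (byTag == null || !byTag.containsKey(tag)) return 0.0;
        return byTag.get(tag).getSum();
    }

    // Sum of the values logged under the tag over all epochs.
    double sum(Object tag) {
        double total = 0.0;
        for (Integer epochNumber : epochs.keySet()) {
            total += sum(epochNumber, tag);
        }
        return total;
    }
}
```

Merging per-peer logs into a root log one epoch at a time mirrors the end-of-epoch merge described earlier. Note that in this sketch merging the same epoch twice counts its values twice.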

Normally, the values are added to the measurement log via the MeasurementLogger, which keeps track of the transitions between the measurement epochs. But if the need arises, the values can also be added to the MeasurementLog directly by specifying the epoch number.

Aggregates

So, what are these Aggregates returned by the MeasurementLog? Each aggregate represents a set of measured values:

public class Aggregate {
        /* statistics */
        public int getNumValues();
        public double getAverage();
        public double getMin();
        public double getMax();
        public double getSum();
        public double getMedian();
        public double getPercentile(double percentile);

        /* raw values */
        public Collection<Double> getValues();

        /* mutating methods */
        public void addValue(double value);
        public synchronized void mergeWith(Aggregate otherAggregate);
}

There are several methods for getting the basic statistics about the aggregate, such as average, min, max, sum etc. It's also possible to get a collection of all the raw values that this aggregate holds (this must be separately enabled, see below).

There are also two methods that mutate the aggregates: for adding new values and for merging in the values from another aggregate. The aggregate merge can be used for deriving new interesting statistics from other aggregates returned by the measurement log queries. It is OK to mutate the Aggregates returned by the MeasurementLog.

Are the values actually stored?

By default, the aggregate will not store the values. In that case, getValues() will return null. Even though the values are not stored, it is still possible to compute the statistics; they are computed on-the-fly as the aggregates are merged and new values are added to them. This substantially decreases the memory footprint of the measurement infrastructure.
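
Computing statistics on-the-fly comes down to keeping a handful of running totals. The class below is a minimal sketch of that idea; RunningAggregate is a hypothetical name, and returning NaN for the average of an empty aggregate is an assumption, not documented Aggregate behaviour.

```java
// Toy aggregate that keeps only summary state, never the raw values.
// Not the ProtoPeer Aggregate class.
final class RunningAggregate {
    private int numValues = 0;
    private double sum = 0.0;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    // Folds one new value into the running totals.
    void addValue(double value) {
        numValues++;
        sum += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    // Folds in another aggregate; the result is the same as if all of
    // its values had been added one by one.
    void mergeWith(RunningAggregate other) {
        numValues += other.numValues;
        sum += other.sum;
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
    }

    int getNumValues() { return numValues; }
    double getSum() { return sum; }
    double getMin() { return min; }
    double getMax() { return max; }

    // NaN for an empty aggregate (an assumption of this sketch).
    double getAverage() {
        return numValues == 0 ? Double.NaN : sum / numValues;
    }
}
```

The memory used is constant no matter how many values are added, which is where the reduced footprint comes from.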

Only two statistics cannot be computed without storing the raw values: getMedian() and getPercentile(). An exact median or percentile depends on the full ordering of the values, so it cannot be maintained on-the-fly the way a sum or a minimum can; single-pass algorithms can only approximate it. The measurement infrastructure can be told for which measurement tags to store the raw values and for which not.
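
To see why the raw values are needed, here is a minimal sketch of a percentile computed from a stored collection using the nearest-rank method. PercentileSketch is a hypothetical class; the choice of nearest-rank and the 0 to 100 scale for the percentile argument are assumptions, since the convention used by Aggregate.getPercentile is not described here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the ProtoPeer API.
final class PercentileSketch {
    private PercentileSketch() {}

    // Nearest-rank percentile; the percentile argument is on a 0..100 scale.
    static double percentile(Collection<Double> values, double percentile) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("no raw values stored");
        }
        if (percentile < 0 || percentile > 100) {
            throw new IllegalArgumentException("percentile must be in [0, 100]");
        }
        // Sorting needs every value, which is why they must be stored.
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }

    // The median is the 50th percentile.
    static double median(Collection<Double> values) {
        return percentile(values, 50);
    }
}
```

With nearest-rank, the median of an even number of values is the lower of the two middle values rather than their mean; an interpolating definition would give a different answer in that case.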

Measurement configuration

Each measurement can be enabled or disabled on a per-tag basis. You can also control for which tags the aggregates should be storing the raw values. Configuration is done through the conf/measurement.conf file, more details here.

Examples

Scenario 1: Simple message counting

Assume you would like to count the number of messages received by the peer. The peerlet code would look something like this:

public class MyPeerlet extends BasePeerlet {
        public void handleIncomingMessage(Message message) {
                getPeer().getMeasurementLogger().log("message_count",1);
                //...handle the messages...
        }
}

This code uses the String "message_count" to tag the measurement. We simply log 1 every time we see an incoming message. To get the current count for this peer you could call getPeer().getMeasurementLogger().getMeasurementLog().getAggregate("message_count").getSum(), though this is not a typical way of calling getAggregate. More often, the system-wide aggregates are more interesting. All the measurement logs from all the peers are merged into the root log at the end of every measurement epoch. The current system-wide message count can be accessed by calling Experiment.getSingleton().getRootMeasurementLog().getAggregate("message_count").getSum() from anywhere in the code.

What if you wanted to track how the number of messages received changes over time? You can query the messages by measurement epochs. After the simulation finishes you can dump the values like this:

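MeasurementLog mlog=Experiment.getSingleton().getRootMeasurementLog();
//iterate over all the epochs, going forward in time
for (int i=mlog.getMinEpochNumber(); i<=mlog.getMaxEpochNumber(); i++) {
        //per-epoch query; getSum() returns a double, so cast it to get a whole count
        int count=(int)mlog.getAggregateByEpochNumber(i,"message_count").getSum();
        //time at the start of epoch i, in the units of measurementEpochDuration
        double timeElapsed=i*MainConfiguration.getSingleton().measurementEpochDuration;
        System.out.println(timeElapsed+"\t"+count);
}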

We have used a String as the tag, which has several problems: for starters, you can misspell the string, and the code is more difficult to refactor. A much better way of tagging your measurements is to use Java enums:

public enum MyTag {
      MESSAGE_COUNT
}

Then the logging call would look like this:

        getPeer().getMeasurementLogger().log(MyTag.MESSAGE_COUNT,1);

Scenario 2: Logging by peer type

Assume that you have several types of peers in the system running different types of algorithms that you are testing. You are interested in counting messages received by all the peers that are of the same type. Assume that the types are enumerated in enum PeerType and a method getPeerType() returns the type of the peer. You can simply use the peer type as a tag:

getPeer().getMeasurementLogger().log(getPeerType(),MyTag.MESSAGE_COUNT,1);

If, for example, you are interested in the message count for some peer type ALPHA, then you can obtain it from the log by calling Experiment.getSingleton().getRootMeasurementLog().getAggregate(PeerType.ALPHA, MyTag.MESSAGE_COUNT).getSum(). The call Experiment.getSingleton().getRootMeasurementLog().getAggregate(MyTag.MESSAGE_COUNT, PeerType.ALPHA).getSum() would return the same result; tags are treated as sets and the ordering of the arguments doesn't matter.

MeasurementInfrastructure (last edited 2009-02-23 19:12:46 by key1)