|
1 | 1 | # Running the performance benchmark |
2 | 2 |
|
3 | | -## Required Files |
4 | | -### CIFAR10 Python Dataset for SKLearn Model |
5 | | -This benchmark serves an SKLearn model that depends on the CIFAR10 Python dataset for training. In order to train the model with this dataset, it must be converted to CSV format. The [CIFAR10 python download utility](https://github.com/ucbrise/clipper/blob/develop/examples/tutorial/download_cifar.py) can be used to obtain the required CSV-formatted dataset. |
| 3 | +We have provided a script to benchmark Clipper's performance. The tool bypasses frontend REST APIs, but beyond that, it should test Clipper end to end. To make use of it, you need to: |
| 4 | + |
| 5 | +* Download the CIFAR dataset |
| 6 | +* Get a model-container up and running (or use one of our predefined scripts) |
| 7 | +* Have a Redis server running for Clipper to connect to |
| 8 | +* Define your desired benchmarking parameters |
| 9 | +* Run the tool |
| 10 | + |
| 11 | +This document goes over details on how to use the tool. |
| 12 | + |
| 13 | +## Download dataset(s) |
| 14 | + |
| 15 | +### Download CIFAR10 Binary Dataset for Query Execution |
| 16 | +The C++ benchmark works by sending CIFAR10 query vectors to the container serving the trained SKLearn model. To achieve this, the **binary dataset** is required. It can be obtained from [https://www.cs.toronto.edu/~kriz/cifar.html](https://www.cs.toronto.edu/~kriz/cifar.html), or downloaded directly [here](https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz). |
| 17 | + |
| 18 | +Extract the downloaded `.tar.gz` archive manually before running the benchmark. |
| 19 | + |
| 20 | + |
| 21 | +### (Optional) Download CIFAR10 Python Dataset for SKLearn Model |
| 22 | +You can skip this section if you do not intend to use our SKLearn Model for benchmarking. |
| 23 | + |
| 24 | +Our SKLearn model depends on the CIFAR10 Python dataset for training. In order to train the model with this dataset, it must be converted to CSV format. The [CIFAR10 python download utility](../examples/tutorial/download_cifar.py) can be used to obtain the required CSV-formatted dataset. |
6 | 25 |
|
7 | 26 | ```sh |
8 | 27 | ../examples/tutorial/download_cifar.py /path/to/save/dataset |
9 | 28 | ``` |
10 | 29 |
|
11 | | -### CIFAR10 Binary Dataset for Query Execution |
12 | | -The C++ benchmark works by sending CIFAR10 query vectors to the container serving the trained SKLearn model. To achieve this, the **binary dataset** is required. It can also be obtained from [https://www.cs.toronto.edu/~kriz/cifar.html](https://www.cs.toronto.edu/~kriz/cifar.html). |
13 | 30 |
|
14 | | -## Optional Configuration Files |
15 | | -The following benchmark attributes can be loaded via a JSON configuration file: |
16 | | -- **cifar_data_path**: The path to a **specific binary data file** within the CIFAR10 binary dataset with a name of the form `data_batch_<n>.bin`. (`data_batch_1.bin`, for example) |
17 | | -- **num_threads**: The number of threads of execution |
18 | | -- **num_batches**: The number of batches of requests to be sent by each thread |
19 | | -- **batch_size**: The number of requests to be sent in each batch |
20 | | -- **batch_delay_millis**: The per-thread delay between batches, in milliseconds |
| 31 | +## Deploy a model container to query |
| 32 | + |
| 33 | +The benchmarking tool will send requests through Clipper to a model. You'll want to deploy this model in a container that implements Clipper's Model-Container RPC interface. |
| 34 | + |
| 35 | +You can also use one of our predefined model-container scripts. Steps detailing how to do so can be found later in this section. |
| 36 | + |
| 37 | +When using these scripts you'll have the ability to set `model_name`, `model_version`, and `clipper_ip` variables. All three are needed for the benchmarking tool's Clipper instance to connect to your model-container. |
| 38 | + |
| 39 | +- `model_version` and `model_name` should match values defined in your JSON configuration file (discussed in the next section) |
| 40 | +- If your model-container will be running on a different host than the benchmarking script, `clipper_ip` should be set to the IP address of the benchmarking script's host. Otherwise, it can be set to `"localhost"`. |
| 41 | + |
| 42 | +- If you want to run the model outside of a Docker container (`model_version` is `1` and `clipper_ip` is `localhost` by default): |
| 43 | + |
| 44 | + - [`bench/setup_noop_bench.sh`](https://github.com/ucbrise/clipper/tree/develop/bench/setup_noop_bench.sh) runs the [`noop_container`](https://github.com/ucbrise/clipper/blob/develop/containers/python/noop_container.py). The default `model_name` is `"bench_noop"`. |
| 45 | + |
| 46 | + ```./bench/setup_noop_bench.sh [[<model_name> <model_version> [<clipper_ip>]]] ``` |
| 47 | + |
| 48 | + - [`bench/setup_sum_bench.sh`](https://github.com/ucbrise/clipper/tree/develop/bench/setup_sum_bench.sh) runs the [`sum_container`](https://github.com/ucbrise/clipper/blob/develop/containers/python/sum_container.py). The default `model_name` is `"bench_sum"`. |
| 49 | + |
| 50 | + ```./bench/setup_sum_bench.sh [[<model_name> <model_version> [<clipper_ip>]]] ``` |
| 51 | + |
| 52 | + - [`bench/setup_sklearn_bench.sh`](https://github.com/ucbrise/clipper/tree/develop/bench/setup_sklearn_bench.sh) runs the [`sklearn_cifar_container`](https://github.com/ucbrise/clipper/blob/develop/containers/python/sklearn_cifar_container). If you wish to use this option, remember to download the CIFAR10 python dataset first. The default `model_name` is `"bench_sklearn_cifar"`. |
| 53 | + |
| 54 | + ```./bench/setup_sklearn_bench.sh <path_to_cifar_python_dataset> [[<model_name> <model_version> [<clipper_ip>]]]``` |
| 55 | + |
| 56 | + Note that `<path_to_cifar_python_dataset>` should be the path to the **directory** containing a parsed CIFAR10 CSV data file with name `cifar_train.data`. |
| 57 | + |
| 58 | +- If you want to deploy the model in a Docker container, you'll need to build the model container Docker image (`model_version` is `1` by default): |
| 59 | + - Create the Docker images for the [`noop_container`](https://github.com/ucbrise/clipper/blob/develop/containers/python/noop_container.py) and [`sum_container`](https://github.com/ucbrise/clipper/blob/develop/containers/python/sum_container.py). The default names for each model are the same as the ones listed above. |
| 60 | + |
| 61 | + ```./bench/build_bench_docker_images.sh``` |
| 62 | + |
| 63 | + - Run the container on the same host that the benchmarking tool will be run from: |
| 64 | + |
| 65 | + ```docker run [-e MODEL_NAME=<nondefault_model_name>] [-e MODEL_VERSION=<nondefault_model_version>] [-e IP=<non-aws-clipper-ip>] <image_id>``` |
| 66 | + |
| 67 | + If you do not supply an `IP` environment variable in your `docker run ...` command, our script will assume you are on an AWS instance and attempt to grab its IP address. |
| 68 | + |
| 69 | + |
| 70 | +## Define your benchmarking parameters |
21 | 71 |
|
22 | | -To configure these attributes, create a JSON file with the following format and specify its path when the benchmark is executed (see **Steps of Execution** below). |
| 72 | +Create a JSON configuration file that specifies values for the following parameters: |
| 73 | + |
| 74 | +- **cifar\_data_path**: The path to a *specific binary data file* within the CIFAR10 binary dataset with a name of the form `data_batch_<n>.bin`. For example, `<path_to_unzipped_cifar_directory>/data_batch_1.bin`. |
| 75 | +- **num_threads**: The number of threads used to send benchmark requests. This should be a positive integer. Each of these threads will access its own copy of the data and will send requests according to *num_batches*, *request\_batch_size*, *request\_batch\_delay_micros*, *poisson_delay*, and *prevent\_cache_hits*, all described below. |
| 76 | +- **num_batches**: The total number of request batches to be sent by each thread. |
| 77 | +- **request\_batch_size**: The number of requests to be sent in each batch. |
| 78 | +- **request\_batch\_delay_micros**: The per-thread delay between batches, in microseconds. *request\_batch\_delay_micros* and *request\_batch_size* together determine the burstiness of your supplied workload. |
| 79 | +- **poisson_delay**: `"true"` if you wish for the delays between request batches to be drawn from a Poisson distribution with mean `request_batch_delay_micros`. `"false"` if you wish for the delay between request batches to be uniform. |
| 80 | +- **prevent\_cache_hits**: `"true"` if you wish for the script to modify datapoints (possibly at the expense of prediction accuracy) in order to prevent hitting Clipper's internal prediction cache. `"false"` otherwise. |
| 81 | +- **latency_objective**: The latency objective for the app that will be created, in microseconds |
| 82 | +- **benchmark\_report_path**: Path to the file in which you want your benchmarking reports saved |
| 83 | +- **report\_delay_seconds**: The delay between each flush of benchmarking metrics to your reports file, in seconds. At each flush, the metrics will reset. If you set the value of this field to `-1`, the metrics will only flush once all threads sending benchmarking requests have terminated. |
| 84 | +- **model_name**: The name of the model Clipper should connect to. Note that this must be the same as the model name your model-container uses. |
| 85 | +- **model_version**: Your model's version. Again, this must be the same version that your model-container uses. |
| 86 | + |
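As a rough illustration of how these parameters combine, the sketch below estimates the total number of requests and the nominal offered request rate for a run. The function name `nominal_load` is hypothetical and not part of Clipper. It assumes a positive delay, treats `request_batch_delay_micros` as the mean gap between batches, and ignores the time spent sending each batch, so real rates will be somewhat lower.

```python
def nominal_load(num_threads, num_batches, request_batch_size,
                 request_batch_delay_micros):
    """Estimate (total_requests, requests_per_second) for a benchmark run.

    Hypothetical helper for back-of-the-envelope planning only.
    Assumes request_batch_delay_micros > 0 and negligible send time.
    """
    # Every thread sends num_batches batches of request_batch_size requests.
    total_requests = num_threads * num_batches * request_batch_size
    # One batch per delay interval, per thread.
    batches_per_second = 1_000_000 / request_batch_delay_micros
    requests_per_second = num_threads * request_batch_size * batches_per_second
    return total_requests, requests_per_second


if __name__ == "__main__":
    # Two workloads with the same nominal rate but different burstiness:
    # large batches with long gaps versus single requests with short gaps.
    print(nominal_load(2, 100, 10, 20000))
    print(nominal_load(2, 1000, 1, 2000))
```

Both example workloads offer the same nominal rate, but the first sends requests in bursts of ten while the second spreads them evenly, which is the trade-off that *request\_batch_size* and *request\_batch\_delay_micros* control.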
| 87 | +Your JSON config file should look like: |
23 | 88 |
|
24 | 89 | ``` |
25 | 90 | { |
26 | 91 | "cifar_data_path":"<cifar_data_path>", |
27 | 92 | "num_threads":"<num_threads>", |
28 | 93 | "num_batches":"<num_batches>", |
29 | | - "batch_size":"<batch_size>", |
30 | | - "batch_delay_millis":"<batch_delay_millis>" |
| 94 | + "request_batch_size":"<request_batch_size>", |
| 95 | + "request_batch_delay_micros":"<request_batch_delay_micros>", |
| 96 | + "poisson_delay":"<true/false>", |
| 97 | + "prevent_cache_hits":"<true/false>", |
| 98 | + "latency_objective":"<latency_objective>", |
| 99 | + "benchmark_report_path":"<benchmark_report_path>", |
| 100 | + "report_delay_seconds":"<report_delay_seconds>", |
| 101 | + "model_name":"<model_name>", |
| 102 | + "model_version":"<model_version>" |
31 | 103 | } |
32 | 104 | ``` |
33 | 105 |
|
34 | | -If a configuration file is not specified, the benchmark will prompt you for the values of these attributes at runtime. |
| 106 | +We have provided a template for your config file: [config.json.template](./config.json.template). |
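If you prefer to generate the config file rather than edit the template by hand, a minimal sketch might look like the following. The keys come from the parameter list above; every value, path, and function name (`build_config`, `write_config`) is a hypothetical placeholder, not a recommended setting. All values are written as strings because the format shown above quotes each one.

```python
import json


def build_config(cifar_data_path, report_path, model_name, model_version):
    """Return a benchmark config dict with every value stored as a string."""
    params = {
        "cifar_data_path": cifar_data_path,
        "num_threads": 2,                     # placeholder value
        "num_batches": 100,                   # placeholder value
        "request_batch_size": 10,             # placeholder value
        "request_batch_delay_micros": 20000,  # placeholder value
        "poisson_delay": "false",
        "prevent_cache_hits": "true",
        "latency_objective": 100000,          # microseconds, placeholder
        "benchmark_report_path": report_path,
        "report_delay_seconds": 10,           # placeholder value
        "model_name": model_name,
        "model_version": model_version,
    }
    # Mirror the quoted-value style of the sample config above.
    return {key: str(value) for key, value in params.items()}


def write_config(config, out_path):
    """Write the config dict to out_path as indented JSON."""
    with open(out_path, "w") as f:
        json.dump(config, f, indent=2)


if __name__ == "__main__":
    config = build_config("/path/to/cifar/data_batch_1.bin",
                          "/path/to/bench_report.txt", "bench_noop", 1)
    print(json.dumps(config, indent=2))
```

Remember that `model_name` and `model_version` must match the values your model-container was started with.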
| 107 | + |
| 108 | +## Run the benchmarking tool |
| 109 | +**These instructions are given relative to the main clipper source directory.** |
35 | 110 |
|
36 | | -## Steps of Execution |
37 | | -**These steps are given relative to the current directory.** |
| 111 | +1. Build the Clipper source for release: |
| 112 | +`./configure --release && cd release && make generic_bench` |
38 | 113 |
|
39 | | -1. Execute the following: |
40 | | - ```sh |
41 | | - ./setup_bench.sh <path_to_cifar_python_dataset> |
42 | | - ``` |
43 | | -where `<path_to_cifar_python_dataset>` is the path to the **directory** containing a parsed CIFAR10 CSV data file with name `cifar_train.data`. |
| 114 | +2. Confirm you have a Redis server running. If you do not, run `redis-server` or `docker run -d -p 6379:6379 redis:alpine`. |
44 | 115 |
|
45 | | -2. Execute the following: |
46 | | - ```sh |
47 | | - cd .. && ./configure --release && cd release |
48 | | - make end_to_end_bench |
49 | | - ``` |
| 116 | +3. Confirm your model-container is running. If it is not, follow the instructions in the "Deploy a model container to query" section. |
| 117 | + |
| 118 | +4. Run the benchmark: `./release/src/benchmarks/generic_bench -f <path_to_your_config_file>` |
50 | 119 |
|
51 | | -3. If you created a JSON configuration file above, execute the following: |
52 | | - ```sh |
53 | | - ./src/benchmarks/end_to_end_bench -f "<path_to_config_.json>" |
54 | | - ``` |
| 120 | +5. Check the logs/output from your model container to confirm it's being queried |
| 121 | + |
| 122 | +6. View the benchmarking reports at your specified **benchmark\_report_path**. |
| 123 | + |
| 124 | +## Interpreting the reports |
| 125 | + |
| 126 | +Metrics from your benchmarking run will be located at |
| 127 | +**benchmark\_report_path**. |
| 128 | + |
| 129 | +The file starts with documentation of the configuration details you supplied: |
| 130 | + |
| 131 | + ---Configuration--- |
| 132 | + "cifar_data_path":"<cifar_data_path>", |
| 133 | + "num_threads":"<num_threads>", |
| 134 | + "num_batches":"<num_batches>", |
| 135 | + "request_batch_size":"<request_batch_size>", |
| 136 | + "request_batch_delay_micros":"<request_batch_delay_micros>", |
| 137 | + "poisson_delay":"<true/false>", |
| 138 | + "prevent_cache_hits":"<true/false>", |
| 139 | + "latency_objective":"<latency_objective>", |
| 140 | + "benchmark_report_path":"<benchmark_report_path>", |
| 141 | + "report_delay_seconds":"<report_delay_seconds>", |
| 142 | + "model_name":"<model_name>", |
| 143 | + "model_version":"<model_version>" |
| 144 | + ------------------- |
| 145 | + |
| 146 | +After the configuration, metrics are reported by time window. Recall that each window is of length **report\_delay_seconds** (unless you set it to `-1`, in which case there will be a single window covering the whole run): |
| 147 | + |
| 148 | + |
| 149 | + <Window 1 start time> - <Window 1 end time>: { |
| 150 | + <Metrics captured in Window 1> |
| 151 | + } |
| 152 | + <Window 2 start time> - <Window 2 end time>: { |
| 153 | + <Metrics captured in Window 2> |
| 154 | + } |
| 155 | + ... |
55 | 156 |
|
56 | | - Otherwise, execute |
57 | | - ```sh |
58 | | - ./src/benchmarks/end_to_end_bench |
59 | | - ``` |
60 | | - and specify the values of the attributes enumerated in the **Optional Configuration Files** section above. |
| 157 | +There are several metrics captured in each window. The notable ones are listed below: |
| 158 | + |
| 159 | +Under `ratio_counters`: |
| 160 | + |
| 161 | +- `bench:cifar_bench:default_prediction_ratio`: The fraction of requests to which Clipper responded with a default prediction (bypassing querying the model). This ratio should only be greater than 0 when requests are sent at a rate faster than the Clipper system and your model can serve them. |
| 162 | +- `model:<model_name>:<model_version>:cache_hit_ratio`: The fraction of requests that were served by cache lookups instead of model-generated predictions. |
| 163 | + |
| 164 | +Under `meters`: |
| 165 | + |
| 166 | +- `model:<model_name>:<model_version>:prediction_throughput`: The rate at which your model-container is serving requests |
| 167 | +- `bench:cifar_bench:request_throughput`: The rate at which the benchmarking tool is sending requests. |
| 168 | +- `bench:cifar_bench:prediction_throughput`: The rate at which the whole Clipper system is serving requests. |
| 169 | + |
| 170 | +Under `histograms` (each of these entries lists the unit of measurement, number of datapoints, min, max, mean, standard deviation, and p50, p95, and p99 values): |
| 171 | + |
| 172 | +- `bench:cifar_bench:prediction_latency`: Statistics on latency measured from when the benchmarking script sends a request to when it receives a response. |
| 173 | +- `model:<model_name>:<model_version>:prediction_latency`: Statistics on latency measured from when Clipper sends datapoints to your model-container to when it receives a response. |
| 174 | +- `model:<model_name>:<model_version>:batch_size`: Statistics on the batch sizes of datapoints sent to your model-container. |
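To make the two ratio counters concrete, here is a small sketch that turns values read off a report window into plain-language notes. The function `diagnose_ratios` is hypothetical and does not parse the report file; it assumes you copy the two ratios by hand, and it only encodes the interpretations given above.

```python
def diagnose_ratios(default_prediction_ratio, cache_hit_ratio):
    """Return a list of notes about one report window.

    Both arguments are fractions in [0, 1] copied from `ratio_counters`.
    """
    notes = []
    if default_prediction_ratio > 0:
        # Clipper answered some requests without a model prediction.
        notes.append("Default predictions were returned: requests may be "
                     "arriving faster than Clipper and the model can serve them.")
    if cache_hit_ratio > 0:
        # Cached answers skip the model and can flatter the measurements.
        notes.append("Some requests were served from the prediction cache: "
                     "consider setting prevent_cache_hits to \"true\".")
    return notes


if __name__ == "__main__":
    for note in diagnose_ratios(0.2, 0.5):
        print(note)
```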