This section discusses Peel’s approach to a unified, global experiment environment configuration.
In the Motivation section, we defined an experiment environment as a combination of
- a host environment,
- a set of dependent systems, including a principal system under test, and
- an experiment application.
Environments are instantiated with a concrete set of configuration values (for the systems) and parameter values (for the experiment application). The following figure again shows the environments created for a series of experiments in our running example.
Configuring the environments in the above example manually (per system and per experiment) is a naïve approach that raises a number of problems:
- Syntax Heterogeneity. Each system (HDFS, Spark, and Flink) has to be configured separately using its own syntax. This requires a basic understanding of the configuration parameters of every system in the stack.
- Variable Interdependence. The sets of configuration variables associated with each system are not mutually exclusive, so care has to be taken that the corresponding values are consistent across the overlapping fragment (e.g., the slaves list should be the same in all systems).
- Value Tuning. For a series of related experiments, all but a small set of values remain fixed. These fixed values are chosen based on the characteristics of the underlying host environment in order to maximise the performance of the corresponding systems (e.g., memory allocation, degree of parallelism, temp paths for spilling).
The naïve approach therefore puts a substantial burden on the person conducting the experiments.
Peel addresses the difficulties outlined above by associating one global environment configuration with each experiment. In doing so, Peel promotes
- configuration reuse through layering, as well as
- configuration uniformity through a hierarchical syntax (HOCON).
At runtime, experiments are represented by experiment beans. Each experiment bean holds a HOCON config that is first constructed and evaluated based on the layering scheme and conventions discussed below, and then mapped to the various concrete config and parameter files and formats of the systems and applications in the experiment environment.
In our running example, this means that each of the six experiments (3x SparkWC + 3x FlinkWC) will have an associated config property: a hierarchical map of key-value pairs which constitute the configuration of all systems and jobs required for that particular experiment.
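To make the idea of one hierarchical config being mapped onto system-specific files more concrete, here is a minimal Python sketch. It is not Peel's implementation (Peel works with HOCON configs); the key names, host names, and values below are hypothetical. The output formats imitate Hadoop's `*-site.xml` property layout, Spark's whitespace-separated `spark-defaults.conf`, and a plain one-host-per-line slaves file. The point is that a shared value such as the slaves list is defined once and rendered into each system's own syntax, which sidesteps the Syntax Heterogeneity and Variable Interdependence problems listed above.

```python
from xml.sax.saxutils import escape

# Hypothetical experiment config: one hierarchical map for all systems.
# Key names and values are illustrative, not Peel's actual config keys.
config = {
    "env": {"slaves": ["worker-01", "worker-02"]},
    "hdfs": {"dfs.replication": 2},
    "spark": {"spark.executor.memory": "4g"},
}


def to_hadoop_xml(props):
    """Render a flat map in the <property> layout of Hadoop *-site.xml files."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append(
            f"  <property><name>{escape(name)}</name>"
            f"<value>{escape(str(value))}</value></property>"
        )
    lines.append("</configuration>")
    return "\n".join(lines)


def to_spark_defaults(props):
    """Render a flat map as whitespace-separated 'key value' lines."""
    return "\n".join(f"{key} {value}" for key, value in props.items())


def to_slaves_file(hosts):
    """Render a host list with one hostname per line."""
    return "\n".join(hosts) + "\n"


# Every concrete file is derived from the same map, so the slaves list
# defined once under "env" cannot drift apart between systems.
files = {
    "hdfs-site.xml": to_hadoop_xml(config["hdfs"]),
    "spark-defaults.conf": to_spark_defaults(config["spark"]),
    "hadoop/slaves": to_slaves_file(config["env"]["slaves"]),
    "spark/slaves": to_slaves_file(config["env"]["slaves"]),
}

if __name__ == "__main__":
    for path, text in files.items():
        print(f"--- {path}\n{text}")
```

Changing the single slaves entry in `config` updates both slaves files at once, which is the consistency guarantee a global configuration is meant to provide.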
The Peel configuration system is built upon the concept of layered construction and resolution. Peel distinguishes between three layers of configuration:
- Default. Default configuration values for Peel itself and the supported systems. Packaged as resources in Peel-related jars located in the bundle’s
- Bundle. Bundle-specific configuration values. Located in config/hosts. Default is the config/hosts subfolder of the current bundle.
- Host. Host-specific configuration values. Located in the $HOSTNAME subfolder of the
For each experiment bean defined in an experiment suite, Peel constructs an associated configuration from the sources in the following table (entries higher in the list have lower priority).
| Source | Description |
|---|---|
| | Default Peel config. |
| | Default system config. |
| | Bundle-specific system config (opt). |
| | Host-specific system config (opt). |
| | Bundle-specific Peel config (opt). |
| | Host-specific Peel config (opt). |
| Experiment bean config value | Experiment-specific config (opt). |
| System | JVM system properties (constant). |
First comes the default Peel configuration, located in the
Second, for each system upon which the experiment depends (with the corresponding system bean identified by systemID), Peel tries to load the default configuration for that system as well as bundle- or host-specific configurations.
Third come the bundle- and host-specific application.conf files; the host-specific file is the counterpart of, and overrides, the bundle-wide values defined in
On top of that come the values defined in the config property of the current experiment bean. These are typically used to vary one particular parameter across a sequence of experiments in a suite (e.g., the number of workers and the DOP).
Finally, Peel appends a set of configuration parameters derived from the current JVM System object (e.g., the number of CPUs or the total amount of available memory).
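The construction order described above amounts to folding a list of layers, lowest priority first, with a recursive merge in which later layers win. The Python sketch below models only that precedence rule; the layer contents and key names (`system.flink.dop`, `sys.cpus`) are hypothetical, and `os.cpu_count()` merely stands in for the values Peel derives from the JVM.

```python
import os
from copy import deepcopy


def deep_merge(base, override):
    """Return a new dict in which values from `override` win over `base`.

    Nested dicts are merged key by key; any other value is replaced wholesale.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve(layers):
    """Fold layers listed from lowest to highest priority into one config."""
    config = {}
    for layer in layers:
        config = deep_merge(config, layer)
    return config


# Hypothetical layer contents, lowest priority first. Optional layers that
# do not exist (e.g. no bundle-wide file) are simply left out of the list.
layers = [
    {"app": {"name": "peel"}},                          # default Peel config
    {"system": {"flink": {"dop": 1, "tmp": "/tmp"}}},   # default system config
    {"system": {"flink": {"dop": 16}}},                 # host-specific system config
    {"system": {"flink": {"dop": 32}}},                 # experiment bean config value
    {"sys": {"cpus": os.cpu_count() or 1}},             # stand-in for JVM-derived values
]

config = resolve(layers)

if __name__ == "__main__":
    print(config["system"]["flink"])  # {'dop': 32, 'tmp': '/tmp'}
```

The experiment bean's `dop` overrides both the host-specific and the default value, while `tmp`, which no higher layer sets, survives from the defaults. HOCON libraries such as Typesafe Config expose the same kind of merge through `Config.withFallback`, where the receiver takes precedence over its argument.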
Tip: You can see the sequence of files loaded as part of the construction of a particular configuration environment in the Peel console log at runtime.
One of the main advantages of Peel is the ability to share hand-crafted configurations for a set of systems on a particular host environment. The suggested way to do so is through a dedicated Git repository.
If you are using a versioned bundle, you must clone the repository under your
For example, a shared configuration for our example ACME cluster is available on GitHub. You can use the following command to add it to your
git clone \
    git@github.com:stratosphere/peelconfig.acme.git \
    peel-wordcount-bundle/src/main/resources/config/hosts/acme-master
We also encourage beginners to use the devhost config, which contains best-practice configurations for a developer machine.
git clone \
    git@github.com:stratosphere/peelconfig.devhost.git \
    peel-wordcount-bundle/src/main/resources/config/hosts/$HOSTNAME
If you intend to modify those settings, we suggest forking the repository and cloning the fork instead.
Please check the Environment Configurations Repository for more information on that matter and a list of available configuration repositories.
Let us take a look at the config folder of the peel-wordcount bundle in order to illustrate some of the concepts presented above.
# cd "$BUNDLE_BIN" && \
# tree -L 3 --dirsfirst peel-wordcount/config
peel-wordcount/config
├── hosts
│   ├── acme-master
│   │   ├── application.conf
│   │   ├── flink-0.8.0.conf
│   │   ├── flink-0.8.1.conf
│   │   ├── flink-0.9.0.conf
│   │   ├── flink.conf
│   │   ├── hadoop-2.conf
│   │   ├── hdfs-2.4.1.conf
│   │   ├── hdfs-2.7.1.conf
│   │   ├── hosts.conf
│   │   ├── spark-1.3.1.conf
│   │   └── spark.conf
│   └── $HOSTNAME
│       ├── application.conf
│       ├── hadoop-2.conf
│       ├── hdfs-2.4.1.conf
│       └── hdfs-2.7.1.conf
├── experiments.wordcount.xml
├── experiments.xml
└── systems.xml
We can see that our bundle does not include bundle-wide environment configuration, because the config folder does not contain any *.conf files as direct children.
On the other hand, we have two host-specific configurations in the hosts folder. In the folder named after the $HOSTNAME of your machine, we can see the following:
- A host-specific system configuration for the following system beans:
- A host-specific Peel configuration which sets a shared systems path and configures an rsync connection to the ACME cluster.
In acme-master (which is the $HOSTNAME of the master node in the ACME cluster), we can see the following:
- A host-specific system configuration for the following system beans:
- A host-specific Peel configuration which includes an auto-generated hosts.conf for the ACME cluster.
When we run Peel experiment suites from the acme-master host, we will therefore use the host-specific environment configuration values defined under config/hosts/acme-master. Otherwise, Peel will load and use only the default reference.*.conf files from the
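The folder-selection rule illustrated by this example can be mimicked with a small Python helper, for instance to check what a bundle offers before running a suite on a given machine. This is a hypothetical diagnostic, not part of Peel: `candidate_conf_files` is an invented name, it assumes the host folder is named exactly as `socket.gethostname()` reports, it only inspects the directory layout shown in the tree above, and it does not reproduce the order in which Peel actually loads individual files.

```python
import socket
import tempfile
from pathlib import Path


def candidate_conf_files(config_dir, hostname=None):
    """Group the *.conf files that could contribute bundle- and host-level values.

    Assumes the layout from the tree above: bundle-wide files are direct
    children of `config_dir`, host-specific ones live in hosts/<hostname>/.
    Defaults to the name of the machine this code runs on.
    """
    config_dir = Path(config_dir)
    hostname = hostname or socket.gethostname()
    host_dir = config_dir / "hosts" / hostname
    return {
        "bundle": sorted(p.name for p in config_dir.glob("*.conf")),
        "host": sorted(p.name for p in host_dir.glob("*.conf"))
        if host_dir.is_dir()
        else [],
    }


if __name__ == "__main__":
    # Recreate a fragment of the tree above in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        acme = Path(tmp, "hosts", "acme-master")
        acme.mkdir(parents=True)
        for name in ("application.conf", "hosts.conf", "spark.conf"):
            (acme / name).touch()
        Path(tmp, "systems.xml").touch()
        print(candidate_conf_files(tmp, "acme-master"))
        # {'bundle': [], 'host': ['application.conf', 'hosts.conf', 'spark.conf']}
        print(candidate_conf_files(tmp, "some-laptop"))
        # {'bundle': [], 'host': []}
```

An empty "bundle" list corresponds to the observation above that the bundle ships no bundle-wide environment configuration, and an empty "host" list means that only the default values apply on that machine.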