Environment Configurations
This section discusses Peel’s approach to a unified, global experiment environment configuration.
Challenges
In the Motivation section, we defined an experiment environment as a combination of
- a host environment,
- a set of dependent systems, among them a principal system under test, and
- an experiment application.
Environments are instantiated with a concrete set of configuration values (for the systems) and parameter values (for the experiment application). The following figure again shows the environments created for a series of experiments in our running example.
A naïve approach that configures the environments in the above example manually (per system and per experiment) raises a number of problems:
- Syntax Heterogeneity. Each system (HDFS, Spark & Flink) has to be configured separately using its own syntax. This requires a basic understanding of the configuration parameters of all systems in the stack.
- Variable Interdependence. The sets of configuration variables associated with each system are not mutually exclusive, and care has to be taken that the corresponding values are consistent across the overlapping fragment (e.g., the slaves list should be the same in all systems).
- Value Tuning. For a series of related experiments, all but very few values remain fixed. The fixed values have to be chosen carefully based on the characteristics of the underlying host environment in order to maximise the performance of the corresponding systems (e.g., memory allocation, degree of parallelism, temp paths for spilling).
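The following is a minimal sketch of the interdependence problem. It assumes a single hypothetical dict of shared host values from which each system's settings are derived. The property names (hadoop.tmp.dir, spark.local.dir, taskmanager.tmp.dirs) only illustrate that every system spells the same concept differently. The helper functions are hypothetical and not part of Peel.

```python
from typing import Dict

# Hypothetical host-level values that several systems in the stack must agree on.
SHARED = {
    "slaves": ["worker-01", "worker-02", "worker-03"],
    "tmp_dir": "/data/tmp",
}


def hdfs_settings(shared: Dict) -> Dict:
    # Hadoop 2 reads its workers from a 'slaves' file and hadoop.tmp.dir from core-site.xml.
    return {"slaves": list(shared["slaves"]), "hadoop.tmp.dir": shared["tmp_dir"]}


def spark_settings(shared: Dict) -> Dict:
    # Spark has its own 'slaves' file and uses spark.local.dir for spilling.
    return {"slaves": list(shared["slaves"]), "spark.local.dir": shared["tmp_dir"]}


def flink_settings(shared: Dict) -> Dict:
    # Flink 0.x has its own 'slaves' file and taskmanager.tmp.dirs in flink-conf.yaml.
    return {"slaves": list(shared["slaves"]), "taskmanager.tmp.dirs": shared["tmp_dir"]}


def check_consistent(configs: Dict[str, Dict]) -> bool:
    # The overlapping fragment (here: the slaves list) must be identical in every system.
    slave_lists = [tuple(c["slaves"]) for c in configs.values()]
    return len(set(slave_lists)) == 1


configs = {
    "hdfs": hdfs_settings(SHARED),
    "spark": spark_settings(SHARED),
    "flink": flink_settings(SHARED),
}
print(check_consistent(configs))  # True
```

Deriving all overlapping values from one source keeps the slaves lists from drifting apart. Editing each system's files by hand offers no such guarantee.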
The naïve approach therefore puts a substantial burden on the person conducting the experiments.
Approach
Peel addresses the difficulties outlined above by associating one global environment configuration with each experiment. In doing so, Peel promotes
- configuration reuse through layering, as well as
- configuration uniformity through a hierarchical syntax (HOCON).
At runtime, experiments are represented by experiment beans. Each experiment bean holds a HOCON config that is first constructed and evaluated based on the layering scheme and conventions discussed below, and then mapped to the various concrete config and parameter files and formats of the systems and applications in the experiment environment.
In our running example, this means that each of the six experiments (3x SparkWC + 3x FlinkWC) will have an associated config property: a hierarchical map of key-value pairs which constitute the configuration of all systems and jobs required for that particular experiment.
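The following minimal sketch illustrates the final mapping step. It treats the HOCON config as a nested Python dict and renders one subtree into two formats:
- the flat key: value lines of a flink-conf.yaml file, and
- the one-host-per-line slaves file.

The config layout (system.flink.config.yaml, system.flink.config.slaves) and the helper names are assumptions made for illustration. They do not describe Peel's internals.

```python
from typing import Any, Dict, List


def flatten(tree: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn a nested map into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def render_yaml(tree: Dict[str, Any]) -> str:
    # flink-conf.yaml uses flat 'key: value' lines rather than nested YAML.
    return "\n".join(f"{k}: {v}" for k, v in sorted(flatten(tree).items())) + "\n"


def render_slaves(hosts: List[str]) -> str:
    # A slaves file lists one worker host per line.
    return "\n".join(hosts) + "\n"


# Hypothetical experiment config fragment (layout assumed, not taken from Peel).
config = {
    "system": {
        "flink": {
            "config": {
                "yaml": {
                    "jobmanager": {"rpc": {"address": "master", "port": 6123}},
                    "taskmanager": {"numberOfTaskSlots": 4},
                },
                "slaves": ["worker-01", "worker-02"],
            }
        }
    }
}

flink = config["system"]["flink"]["config"]
print(render_yaml(flink["yaml"]), end="")
print(render_slaves(flink["slaves"]), end="")
```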
Configuration Layers
The Peel configuration system is built upon the concept of layered construction and resolution. Peel distinguishes between three layers of configuration:
- Default. Default configuration values for Peel itself and the supported systems. Packaged as resources in Peel-related jars located in the bundle’s app.path.lib folder.
- Bundle. Bundle-specific configuration values. Located in the app.path.config folder. Default is the config subfolder of the current bundle.
- Host. Host-specific configuration values. Located in the hosts/$HOSTNAME subfolder of the app.path.config folder.
For each experiment bean defined in an experiment suite, Peel will construct an associated configuration according to the following table entries (higher in the list means lower priority).
Source | Description |
---|---|
reference.peel.conf | Default Peel config. |
reference.${systemID}.conf | Default system config. |
config/${systemID}.conf | Bundle-specific system config (opt). |
config/hosts/${hostname}/${systemID}.conf | Host-specific system config (opt). |
config/application.conf | Bundle-specific Peel config (opt). |
config/hosts/${hostname}/application.conf | Host-specific Peel config (opt). |
Experiment bean config value | Experiment-specific config (opt). |
System | JVM system properties (constant). |
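Read top to bottom, the table above is an ordered list of sources. The sketch below builds that list, lowest priority first, for a given hostname and list of system IDs. It makes two simplifications:
- It assumes that the default, bundle, and host files are grouped per system.
- The two angle-bracketed entries stand for in-memory sources rather than files.

The function name is hypothetical.

```python
from typing import List


def config_sources(system_ids: List[str], hostname: str) -> List[str]:
    """Return config sources in load order (lowest priority first), following the table."""
    sources = ["reference.peel.conf"]  # default Peel config (packaged in a jar)
    for sid in system_ids:
        sources += [
            f"reference.{sid}.conf",                # default system config
            f"config/{sid}.conf",                   # bundle-specific system config (opt)
            f"config/hosts/{hostname}/{sid}.conf",  # host-specific system config (opt)
        ]
    sources += [
        "config/application.conf",                    # bundle-specific Peel config (opt)
        f"config/hosts/{hostname}/application.conf",  # host-specific Peel config (opt)
        "<experiment bean config>",                   # experiment-specific values (opt)
        "<JVM system properties>",                    # always applied last
    ]
    return sources


for src in config_sources(["hdfs-2.7.1", "flink-0.9.0"], "acme-master"):
    print(src)
```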
First comes the default Peel configuration, located in the peel-core.jar package.
Second, for each system upon which the experiment depends (with the corresponding system bean identified by systemID), Peel tries to load the default configuration for that system as well as the bundle- and host-specific configurations.
Third come the bundle- and host-specific application.conf files. They are the counterparts of reference.peel.conf and override the values defined there, with the host-specific file taking precedence over the bundle-specific one.
On top of that come the values defined in the config property of the current experiment bean. These are typically used to vary one particular parameter across a sequence of experiments in a suite (e.g., the number of workers and the DOP).
Finally, Peel appends a set of configuration parameters derived from the current JVM System object (e.g., the number of CPUs or the total amount of available memory).
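The effect of the layering can be sketched with a recursive merge. Each higher-priority layer overrides only the keys it defines, similar to the way HOCON merges nested objects. This is a simplified sketch that ignores HOCON features such as substitutions. The layer contents and keys are hypothetical.

```python
from copy import deepcopy
from typing import Any, Dict, List


def merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge 'override' into a copy of 'base'; override wins on conflicts."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)  # merge nested objects key by key
        else:
            result[key] = deepcopy(value)  # scalars and lists are replaced wholesale
    return result


def resolve(layers: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Layers are given lowest priority first, as in the table above.
    resolved: Dict[str, Any] = {}
    for layer in layers:
        resolved = merge(resolved, layer)
    return resolved


# Hypothetical layer contents (keys are illustrative, not Peel's actual schema).
default_system = {"system": {"flink": {"config": {"parallelism": 1, "tmp": "/tmp"}}}}
host_system = {"system": {"flink": {"config": {"tmp": "/data/tmp"}}}}
experiment = {"system": {"flink": {"config": {"parallelism": 64}}}}
jvm = {"jvm": {"cpus": 8}}

config = resolve([default_system, host_system, experiment, jvm])
print(config["system"]["flink"]["config"])  # {'parallelism': 64, 'tmp': '/data/tmp'}
```

The experiment layer only changes the parallelism. The host-specific temp path survives because nested objects are merged instead of replaced.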
Tip: You can see the sequence of files loaded as part of the construction of a particular configuration environment in the Peel console log at runtime.
Sharing Configurations
One of the main advantages of Peel is the ability to share hand-crafted configurations for a set of systems on a particular host environment. The suggested way to do so is through a dedicated Git repository.
If you are using a versioned bundle, you must clone the repository under your *-bundle project.
For example, a shared configuration for our example ACME cluster is available on GitHub. You can use the following command to add it to your peel-wordcount bundle:
git clone \
git@github.com:stratosphere/peelconfig.acme.git \
peel-wordcount-bundle/src/main/resources/config/hosts/acme-master
We also encourage beginners to use the devhost configuration, which contains best-practice settings for a developer machine:
git clone \
git@github.com:stratosphere/peelconfig.devhost.git \
peel-wordcount-bundle/src/main/resources/config/hosts/$HOSTNAME
If you intend to modify those settings, we suggest forking the repository and cloning the fork instead.
Please check the Environment Configurations Repository for more information on that matter and a list of available configuration repositories.
Example
Let us take a look at the config folder of the peel-wordcount bundle in order to illustrate some of the concepts presented above.
# cd "$BUNDLE_BIN" && \
# tree -L 3 --dirsfirst peel-wordcount/config
peel-wordcount/config
├── hosts
│ ├── acme-master
│ │ ├── application.conf
│ │ ├── flink-0.8.0.conf
│ │ ├── flink-0.8.1.conf
│ │ ├── flink-0.9.0.conf
│ │ ├── flink.conf
│ │ ├── hadoop-2.conf
│ │ ├── hdfs-2.4.1.conf
│ │ ├── hdfs-2.7.1.conf
│ │ ├── hosts.conf
│ │ ├── spark-1.3.1.conf
│ │ └── spark.conf
│ └── $HOSTNAME
│ ├── application.conf
│ ├── hadoop-2.conf
│ ├── hdfs-2.4.1.conf
│ └── hdfs-2.7.1.conf
├── experiments.wordcount.xml
├── experiments.xml
└── systems.xml
We can see that our bundle does not include bundle-wide environment configuration, because the config folder does not contain any *.conf files as direct children.
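The reasoning above can be checked mechanically: look for *.conf files directly under config, and for one folder per host under config/hosts. The following is a minimal sketch with a hypothetical function name. It only looks at file names and does not parse their contents.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def describe_config_dir(config_dir: Path) -> Dict[str, List[str]]:
    """Split *.conf files into bundle-wide ones and host-specific ones per host folder."""
    result: Dict[str, List[str]] = {
        # Bundle-wide configs are *.conf files that are direct children of config/.
        "bundle": sorted(p.name for p in config_dir.glob("*.conf")),
    }
    hosts_dir = config_dir / "hosts"
    if hosts_dir.is_dir():
        # Every subfolder of config/hosts holds the configs for one hostname.
        for host in sorted(p for p in hosts_dir.iterdir() if p.is_dir()):
            result[host.name] = sorted(p.name for p in host.glob("*.conf"))
    return result


# Build a small replica of the layout shown above in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "config"
    (config / "hosts" / "acme-master").mkdir(parents=True)
    (config / "experiments.xml").touch()
    (config / "hosts" / "acme-master" / "application.conf").touch()
    (config / "hosts" / "acme-master" / "flink.conf").touch()
    summary = describe_config_dir(config)

print(summary)  # {'bundle': [], 'acme-master': ['application.conf', 'flink.conf']}
```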
On the other hand, there are two host-specific configurations in the hosts subfolder.
In the $HOSTNAME folder of your machine, we can see the following *.conf files:
- A host-specific system configuration for the following system beans:
- hdfs-2.4.1 which delegates to hadoop-2.conf, and
- hdfs-2.7.1 which delegates to hadoop-2.conf.
- A host-specific Peel configuration which sets shared downloads and systems paths and configures an rsync connection to the ACME cluster.
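Delegation of this kind can be expressed with a HOCON include statement. Assume, hypothetically, that hdfs-2.4.1.conf and hdfs-2.7.1.conf each consist of the line include "hadoop-2.conf". The following simplified sketch inlines such statements from an in-memory set of files. It covers only the plain quoted-file form of include and none of HOCON's other include rules.

```python
import re
from typing import Dict, Optional, Set

# Matches a line of the form: include "some-file.conf"
INCLUDE = re.compile(r'^\s*include\s+"([^"]+)"\s*$')


def inline_includes(name: str, files: Dict[str, str], seen: Optional[Set[str]] = None) -> str:
    """Recursively replace include lines with the referenced file's text (simplified)."""
    seen = set() if seen is None else seen
    if name in seen:
        raise ValueError(f"include cycle at {name}")
    seen = seen | {name}  # track the current include chain to detect cycles
    out = []
    for line in files[name].splitlines():
        match = INCLUDE.match(line)
        out.append(inline_includes(match.group(1), files, seen) if match else line)
    return "\n".join(out)


# Hypothetical file contents: both HDFS versions delegate to the shared Hadoop config.
files = {
    "hdfs-2.4.1.conf": 'include "hadoop-2.conf"',
    "hdfs-2.7.1.conf": 'include "hadoop-2.conf"',
    "hadoop-2.conf": "system.hadoop-2.config.slaves = [worker-01, worker-02]",
}

print(inline_includes("hdfs-2.7.1.conf", files))
```

Keeping the shared settings in one file means that a change to hadoop-2.conf affects both HDFS versions at once.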
Under acme-master (which is the $HOSTNAME of the master node in the ACME cluster) we can see the following *.conf files:
- A host-specific system configuration for the following system beans:
- flink-0.8.0 which delegates to flink.conf,
- flink-0.8.1 which delegates to flink.conf,
- flink-0.9.0 which delegates to flink.conf,
- spark-1.3.1 which delegates to spark.conf,
- hdfs-2.4.1 which delegates to hadoop-2.conf, and
- hdfs-2.7.1 which delegates to hadoop-2.conf.
- A host-specific Peel configuration which includes an auto-generated hosts.conf for the ACME cluster.
When we run Peel experiment suites from the acme-master host, we will therefore use the host-specific environment configuration values defined under config/hosts/acme-master. On a host without a matching subfolder under config/hosts, Peel will load and use only the default reference.*.conf files from the peel-core and peel-extensions jars.
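The following minimal sketch shows why the choice of host matters. It checks which of the optional bundle- and host-specific files from the table exist for a given hostname. The function name is hypothetical. socket.gethostname() merely stands in for $HOSTNAME when no hostname is passed.

```python
import socket
import tempfile
from pathlib import Path
from typing import List, Optional


def optional_files_used(config_dir: Path, system_ids: List[str],
                        hostname: Optional[str] = None) -> List[str]:
    """List the optional bundle/host config files that exist and would therefore be loaded."""
    hostname = hostname or socket.gethostname()  # stand-in for $HOSTNAME
    host_dir = config_dir / "hosts" / hostname
    candidates = []
    for sid in system_ids:
        candidates += [config_dir / f"{sid}.conf", host_dir / f"{sid}.conf"]
    candidates += [config_dir / "application.conf", host_dir / "application.conf"]
    # Missing optional files are skipped; the reference.*.conf defaults always apply.
    return [p.relative_to(config_dir).as_posix() for p in candidates if p.is_file()]


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "config"
    acme = config / "hosts" / "acme-master"
    acme.mkdir(parents=True)
    for name in ("application.conf", "flink-0.9.0.conf"):
        (acme / name).touch()
    on_acme = optional_files_used(config, ["flink-0.9.0"], "acme-master")
    elsewhere = optional_files_used(config, ["flink-0.9.0"], "some-other-host")

print(on_acme)    # ['hosts/acme-master/flink-0.9.0.conf', 'hosts/acme-master/application.conf']
print(elsewhere)  # []
```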