Execution Workflow
In the previous sections we explained the internals and the code required to configure the environment and define the experiments in a Peel bundle.
At this point, you should be able to make sense of the *.xml and *.conf files in the peel-wordcount-bundle module.
In this section, we will explain how to make use of the commands provided by the Peel CLI in order to deploy and run the experiments in your bundle.
Building the Bundle
To assemble the bundle from the sources with Maven, you have one of four options:
mvn clean package # package
mvn clean package -Pdev # package with soft links
mvn clean deploy # package and copy to $BUNDLE_BIN
mvn clean deploy -Pdev # package and copy with soft links
The bundle binaries are assembled under peel-wordcount-bundle/target.
Running the deploy phase automatically copies the assembled bundle binaries folder to $BUNDLE_BIN.
Activating the dev profile with -Pdev causes certain folders in the assembled bundle binaries (config, datasets, and utils) to be created as soft links to their corresponding source locations.
We recommend this feature for bundles which are still under development, as it allows you to adapt the *.xml and *.conf files on the fly from your IDE without rebuilding.
Bundle Deployment
For large-scale applications, the environment where the experiments need to be executed typically differs from the environment of the machine where the bundle binaries are assembled.
In order to start the execution process, the user therefore needs to first deploy the bundle binaries from the local machine to the desired host environment.
The Peel CLI offers a dedicated command for that purpose. To push the peel-wordcount bundle to the ACME cluster, run:
# from $BUNDLE_BIN/peel-experiments on the developer host
./peel.sh rsync:push acme
The command uses rsync to copy the contents of the enclosing Peel bundle to the target environment.
The connection options for the rsync calls are taken from the environment configuration of the local environment.
In the example above we refer to the acme remote, which is configured in the application.conf of the linked developer environment.
# rsync remotes, located in $HOSTNAME/application.conf
app.rsync {
# 'ACME' remote
acme {
url = "acme-master.acme.org" # remote host url
rsh = "ssh -l acme-user" # remote shell to use
dst = "/home/acme-user/experiments" # remote destination base folder
own = "peel:peel" # remote files owner (optional)
}
}
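To make the role of each field concrete, the following sketch prints an rsync call assembled from the url, rsh, and dst values of a remote. It is an illustration only: the helper name rsync_push_cmd is hypothetical, and the flags Peel actually passes to rsync are not shown in this section and may differ.

```shell
# Print (not run) an rsync command built from the fields of an app.rsync remote.
# Hypothetical helper; the options Peel really uses may differ.
# usage: rsync_push_cmd <bundle-dir> <url> <rsh> <dst>
rsync_push_cmd() (
  # The bundle name is the last path component, e.g. peel-wordcount.
  name=$(basename "$1")
  # -a: archive mode (recursive, preserves times and permissions)
  # -e: remote shell to use, taken from the 'rsh' field
  printf 'rsync -a -e "%s" %s/ %s:%s/%s\n' "$3" "$1" "$2" "$4" "$name"
)

# Example with the values of the 'acme' remote shown above.
rsync_push_cmd "${BUNDLE_BIN:-.}/peel-wordcount" \
  "acme-master.acme.org" "ssh -l acme-user" "/home/acme-user/experiments"
```

Under these assumptions the bundle ends up in a folder named after the bundle below dst, which matches the remote path used in the login example.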
After executing the rsync:push command, you can log in to the ACME server and continue working there with the mirrored copy of your bundle.
# on the developer host
ssh -l acme-user acme-master.acme.org
# on acme-master.acme.org
cd "/home/acme-user/experiments/peel-wordcount"
Once you start running experiments on the remote host, results data accumulates under the app.path.results folder of your bundle.
After you archive those results, you can fetch them back to your developer host with the following command.
# from $BUNDLE_BIN/peel-experiments on the developer host
./peel.sh rsync:pull acme
Experiments Execution
The central piece of execution logic offered by Peel is the actual process of running the defined experiments.
Execution Lifecycle
As we saw in the Experiments Definitions section, Peel organizes experiments in sequences called experiment suites. The execution lifecycle is therefore always tied to a specific suite and in general undergoes the following phases:
- Setup Suite.
  - Systems with Suite lifespan required for execution are set up and started by Peel.
  - Systems with Provided lifespan are assumed to be already up and running, but are re-configured if necessary.
- Execute Experiments. For each experiment in the suite, Peel performs the following tasks:
  - Setup Experiment.
    - Ensure that the required inputs are materialized (either generated or copied) in the respective file system.
    - Check the configuration of associated descendant systems with Provided or Suite lifespan against the values defined in the current experiment config. If the values do not match, reconfigure and restart the system.
    - Set up systems with Experiment lifespan.
  - Execute Experiment Runs. For each experiment run which has not been completed by a previous invocation of the same suite:
    - Setup Experiment Run.
      - Check and set up systems with Run lifespan.
    - Execute Experiment Run.
      - Execute the experiment.
      - Collect log data from the associated systems.
      - Clear the produced outputs.
    - Tear Down Experiment Run.
      - Tear down systems with Run lifespan.
  - Tear Down Experiment.
    - Tear down systems with Experiment lifespan.
- Tear Down Suite.
  - Tear down systems with Suite lifespan.
  - Systems with Provided lifespan are left up and running with the current configuration.
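The nesting of these phases can be sketched as two loops. The script below only prints phase names and never calls Peel. Both function names are hypothetical, and is_completed is a placeholder for Peel's check of whether a run was already completed by a previous invocation.

```shell
# Illustrative trace of the suite lifecycle; prints phases, runs nothing.

# Hypothetical stand-in for Peel's completed-run check; here no run is completed.
is_completed() { return 1; }

# usage: lifecycle_trace <runs-per-experiment> <experiment>...
lifecycle_trace() (
  runs=$1
  shift
  echo "suite: set up systems with Suite lifespan"
  for exp in "$@"; do
    echo "  $exp: materialize inputs"
    echo "  $exp: check configuration of Provided/Suite systems"
    echo "  $exp: set up systems with Experiment lifespan"
    run=1
    while [ "$run" -le "$runs" ]; do
      # Runs completed by an earlier invocation are skipped.
      if ! is_completed "$exp" "$run"; then
        echo "    run $run: set up systems with Run lifespan"
        echo "    run $run: execute, collect logs, clear outputs"
        echo "    run $run: tear down systems with Run lifespan"
      fi
      run=$((run + 1))
    done
    echo "  $exp: tear down systems with Experiment lifespan"
  done
  echo "suite: tear down systems with Suite lifespan"
)

lifecycle_trace 3 wordcount.flink.top005 wordcount.spark.top005
```

The sketch makes the cost structure visible: Suite systems are started once, Experiment systems once per experiment, and Run systems once per run.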
The Peel CLI offers a number of commands that are based on the above lifecycle.
Running a Full Suite
First, let us look at the most common scenario: running all experiments in a suite. The following command
./peel.sh suite:run wordcount.scale-out
executes the weak scale-out suite wordcount.scale-out defined in the previous section.
The results are written to the results subfolder of your bundle (the default path configured for app.path.results).
In this case, the file structure looks as follows.
# tree -L 3 --dirsfirst results/wordcount.scale-out/
results/wordcount.scale-out/
├── datagen.words
│ ├── run.err
│ └── run.out
├── wordcount.flink.top005.run01
│ ├── logs
│ │ ├── flink
│ │ └── hdfs-2
│ ├── run.err
│ ├── run.out
│ ├── run.pln
│ └── state.json
├── wordcount.flink.top005.run02
│ └── ...
├── wordcount.flink.top005.run03
│ └── ...
├── wordcount.flink.top010.run01
│ └── ...
├── wordcount.flink.top010.run02
│ └── ...
├── wordcount.flink.top010.run03
│ └── ...
├── wordcount.flink.top020.run01
│ └── ...
├── wordcount.flink.top020.run02
│ └── ...
├── wordcount.flink.top020.run03
│ └── ...
├── wordcount.spark.top005.run01
│ ├── logs
│ │ ├── hdfs-2
│ │ └── spark
│ ├── run.err
│ ├── run.out
│ └── state.json
├── wordcount.spark.top005.run02
│ └── ...
├── wordcount.spark.top005.run03
│ └── ...
├── wordcount.spark.top010.run01
│ └── ...
├── wordcount.spark.top010.run02
│ └── ...
├── wordcount.spark.top010.run03
│ └── ...
├── wordcount.spark.top020.run01
│ └── ...
├── wordcount.spark.top020.run02
│ └── ...
└── wordcount.spark.top020.run03
  └── ...
Under results/wordcount.scale-out we see a folder for the data generation job datagen.words, as well as a folder for each experiment run.
Each run folder contains at least the following items:
- A state.json file with the serialized state of the experiment bean.
- A run.out file with the stdout output of the experiment run.
- A run.err file with the stderr output of the experiment run.
- A logs folder with subfolders for each dependent system, containing the log data collected while running the experiment.
If an experiment run succeeds, the runExitCode value in its state.json file is set to zero. A non-zero value marks the run as failed.
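This criterion can be checked from the shell after a suite has run. The sketch below lists the run folders whose state.json does not report a zero runExitCode. It assumes that runExitCode is stored as a plain numeric JSON field and that run folders end in .runNN as in the listing above. The function name failed_runs is hypothetical and not part of the Peel CLI.

```shell
# List run folders that did not finish with runExitCode 0.
# Assumption: state.json holds the field as plain JSON, e.g. "runExitCode": 0
# usage: failed_runs <suite-results-dir>
failed_runs() (
  for dir in "$1"/*.run[0-9]*/; do
    # Skip the unexpanded pattern when no run folder exists.
    [ -d "$dir" ] || continue
    state="${dir}state.json"
    code=""
    if [ -f "$state" ]; then
      # Extract the integer that follows the "runExitCode" key.
      code=$(sed -n 's/.*"runExitCode"[[:space:]]*:[[:space:]]*\(-\{0,1\}[0-9][0-9]*\).*/\1/p' "$state" | head -n 1)
    fi
    # A missing file or field counts as not completed.
    if [ "$code" != "0" ]; then
      basename "$dir"
    fi
  done
)

failed_runs results/wordcount.scale-out
```

The pattern-based extraction keeps the sketch free of extra tools; a JSON-aware parser would be the more robust choice if one is available on the host.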
If you execute the suite:run command again, Peel skips successfully completed runs by default.
To force the re-execution of successful runs, use the --force flag.
./peel.sh suite:run wordcount.scale-out --force
Running a Single Experiment
Sometimes you might want to execute a single experiment run in a suite. The Peel CLI command for that is exp:run.
This command essentially follows the same lifecycle as suite:run, but considers only a single run of the specified experiment to be part of the suite and ignores its runExitCode value.
The default run to be executed is 1, but you can change this with the optional --run argument.
For example, to execute only the second run of the wordcount.flink.top010 experiment, type the following command.
./peel.sh exp:run wordcount.scale-out wordcount.flink.top010 --run 2
You can also split the preparation (system setup + data generation), run execution, and finalization (system teardown) phases into three commands as follows.
./peel.sh exp:setup wordcount.scale-out wordcount.flink.top010
./peel.sh exp:run wordcount.scale-out wordcount.flink.top010 --run 2 --just
./peel.sh exp:teardown wordcount.scale-out wordcount.flink.top010
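The split form is handy when several runs should share one setup. The following dry-run sketch prints the corresponding command sequence instead of executing it. The function name exp_runs_plan is hypothetical, and whether all runs can share one exp:setup in a given bundle depends on the lifespans of its systems, which is an assumption here.

```shell
# Print the commands for one setup, runs 1..N with --just, and one teardown.
# Dry run only: nothing is executed.
# usage: exp_runs_plan <suite> <experiment> <number-of-runs>
exp_runs_plan() (
  suite=$1
  exp=$2
  runs=$3
  echo "./peel.sh exp:setup $suite $exp"
  run=1
  while [ "$run" -le "$runs" ]; do
    echo "./peel.sh exp:run $suite $exp --run $run --just"
    run=$((run + 1))
  done
  echo "./peel.sh exp:teardown $suite $exp"
)

exp_runs_plan wordcount.scale-out wordcount.flink.top010 3
```

Once the printed plan looks right, the same lines can be run from the bundle root.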
Setting Up / Tearing Down Systems
In addition to the lifecycle-based commands explained above, Peel also offers commands to directly start and stop systems described by system beans.
To start and stop the flink-0.9.0 system bean, for example, you can use the following commands.
./peel.sh sys:setup flink-0.9.0 # start
./peel.sh sys:teardown flink-0.9.0 # stop
Dependent systems are thereby transitively started and stopped (in the above example this includes hdfs-2.7.1).
Since the system bean is loaded outside of the context of a particular experiment, it will be configured using the values loaded for the current environment.
Archiving the Results
Experiment results are maintained in subfolders of ${app.path.results} and tend to grow as you scale out the data and the number of slaves, due to the increased number of log files.
Log files are highly redundant and therefore compress well, so Peel offers the following commands to save space.
./peel.sh res:archive wordcount.scale-out
./peel.sh res:extract wordcount.scale-out
The first command creates a *.tar.gz archive for the given experiment path, while the second extracts a previously created archive.
Make sure you archive the results of a suite before you fetch them to your developer machine with rsync:pull.
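Putting the commands of this section together, a complete round trip could look like the dry-run outline below. It only prints the steps. The host, user, and remote path are the ACME example values from above, the function name workflow_plan is hypothetical, and running peel.sh through a non-interactive ssh call is an assumption that may not hold on every cluster.

```shell
# Dry-run outline of the full workflow; prints each step instead of running it.
# usage: workflow_plan <suite>
workflow_plan() (
  suite=$1
  remote_dir="/home/acme-user/experiments/peel-wordcount"
  on_remote="ssh -l acme-user acme-master.acme.org"
  # 1. Push the bundle from the developer host to the cluster.
  echo "./peel.sh rsync:push acme"
  # 2. Run the suite and 3. archive its results on the cluster.
  echo "$on_remote 'cd $remote_dir && ./peel.sh suite:run $suite'"
  echo "$on_remote 'cd $remote_dir && ./peel.sh res:archive $suite'"
  # 4. Pull the archived results and 5. extract them on the developer host.
  echo "./peel.sh rsync:pull acme"
  echo "./peel.sh res:extract $suite"
)

workflow_plan wordcount.scale-out
```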