Apache Tez is a generic data-processing pipeline engine envisioned as a low-level engine for higher abstractions such as Apache Hive and Apache Pig.
At its heart, Tez is very simple and has just two components:
- The data-processing pipeline engine, wherein one can plug in input, processing and output implementations to perform arbitrary data processing. Every 'task' in Tez has the following:
  - Input to consume key/value pairs from.
  - Processor to process them.
  - Output to collect the processed key/value pairs.
- A master for the data-processing application, whereby one can put together arbitrary data-processing 'tasks' described above into a task DAG to process data as desired. The generic master is implemented as an Apache Hadoop YARN ApplicationMaster.
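The task model and the master's role can be illustrated with a toy Java sketch. This is NOT the Tez API: SimpleInput, SimpleProcessor, SimpleOutput, Task, runDag and runWordCount are hypothetical names invented for this illustration. Each task pulls key/value pairs from its input, passes them through its processor and hands the result to its output. runDag stands in for the master by starting a task only after all of its upstream tasks have finished (Kahn's topological sort). A shared in-memory list stands in for whatever data movement a real engine performs between tasks.

```java
import java.util.*;

/**
 * Toy model of the Tez task/DAG concepts. All names here are hypothetical
 * and invented for illustration; they are not the Tez runtime API.
 */
public class ToyTezModel {

    // Input: supplies the key/value pairs a task consumes.
    interface SimpleInput { List<Map.Entry<String, Integer>> read(); }

    // Processor: transforms the consumed key/value pairs.
    interface SimpleProcessor { List<Map.Entry<String, Integer>> process(List<Map.Entry<String, Integer>> records); }

    // Output: collects the processed key/value pairs.
    interface SimpleOutput { void write(List<Map.Entry<String, Integer>> records); }

    // A task wires exactly one Input, one Processor and one Output together.
    record Task(String name, SimpleInput input, SimpleProcessor processor, SimpleOutput output) {
        void run() { output.write(processor.process(input.read())); }
    }

    // Plays the master's role: runs each task only after all upstream tasks
    // have finished (Kahn's algorithm). Returns the order in which tasks ran.
    static List<String> runDag(List<Task> tasks, Map<String, List<String>> edges) {
        Map<String, Task> byName = new LinkedHashMap<>();
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (Task task : tasks) {
            byName.put(task.name(), task);
            inDegree.put(task.name(), 0);
        }
        edges.values().forEach(targets -> targets.forEach(target -> inDegree.merge(target, 1, Integer::sum)));

        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((name, degree) -> { if (degree == 0) ready.add(name); });

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String name = ready.poll();
            byName.get(name).run();
            order.add(name);
            for (String next : edges.getOrDefault(name, List.of())) {
                // A downstream task becomes ready once its last upstream task finishes.
                if (inDegree.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (order.size() != tasks.size()) throw new IllegalStateException("task graph contains a cycle");
        return order;
    }

    // Two-task word count: "tokenize" emits (word, 1), "count" sums per word.
    static Map<String, Integer> runWordCount(String text) {
        List<Map.Entry<String, Integer>> edgeBuffer = new ArrayList<>(); // stands in for the edge between tasks
        Map<String, Integer> result = new TreeMap<>();

        Task tokenize = new Task("tokenize",
                () -> Arrays.stream(text.split("\\s+")).map(w -> Map.entry(w, 1)).toList(),
                records -> records,          // identity processor
                edgeBuffer::addAll);

        Task count = new Task("count",
                () -> List.copyOf(edgeBuffer), // read lazily, after "tokenize" has run
                records -> {
                    Map<String, Integer> sums = new TreeMap<>();
                    for (var r : records) sums.merge(r.getKey(), r.getValue(), Integer::sum);
                    return new ArrayList<>(sums.entrySet());
                },
                records -> records.forEach(r -> result.put(r.getKey(), r.getValue())));

        runDag(List.of(tokenize, count), Map.of("tokenize", List.of("count")));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(runWordCount("a b a c b a"));
    }
}
```

Running main prints {a=3, b=2, c=1}. Keeping Input, Processor and Output as separate pluggable pieces is what lets the same DAG scheduling logic drive arbitrary processing.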
For instructions on how to contribute to Tez, refer to: Tez Wiki - How to Contribute
Requirements:
- JDK 21+
- Maven 3.9.14 or later
- spotbugs 4.9.3 or later (if running spotbugs)
- Protocol Buffers 3.25.5
- Hadoop 3.x
Maven build goals:
- Clean: mvn clean
- Compile: mvn compile
- Run tests: mvn test
- Create JAR: mvn package
- Run spotbugs: mvn compile spotbugs:spotbugs
- Run checkstyle: mvn compile checkstyle:checkstyle
- Install JAR in M2 cache: mvn install
- Deploy JAR to Maven repo: mvn deploy
- Run jacoco: mvn test -Pjacoco
- Run Rat: mvn apache-rat:check
- Build javadocs: mvn javadoc:javadoc
- Build distribution: mvn package -Dhadoop.version=3.4.2
- Visualize state machines: mvn compile -Pvisualize -DskipTests=true
Build options:
- Use -Dpackage.format to create distributions with a format other than .tar.gz (any format supported by the maven-assembly-plugin).
- Use -Dhadoop.version to specify the version of Hadoop to build Tez against.
- Use -Dprotoc.path to specify the path to protoc.
- Use -Dallow.root.build to allow building the tez-ui components as the root user.
Tez runs on top of Apache Hadoop YARN and requires Hadoop 3.x.
Out of the box, it can be compiled against other compatible Hadoop versions by
specifying hadoop.version:
mvn package -Dhadoop.version=3.4.2
For recent versions of Hadoop (which do not bundle AWS and Azure support by default), you can bundle AWS S3 or Azure support:
mvn package -Dhadoop.version=3.4.2 -Paws -Pazure
Tez also has shims to provide version-specific implementations for various APIs. For more details, refer to Hadoop Shims.
- UI build issues
  In case of issues with the UI build, clean the UI cache:
  mvn clean -PcleanUICache
- Skip UI build
  To skip the UI build, use the noui profile:
  mvn clean install -DskipTests -Pnoui
  Maven will still include the tez-ui project, but all related plugins will be skipped.
- Issue with PhantomJS when building on PowerPC
  Official PhantomJS binaries were not available for the Power platform. If the build fails on PPC, try installing PhantomJS manually and rerun. Refer to the PhantomJS README and install it globally.
The version of the Protocol Buffer compiler (protoc) can be defined
on the fly:
mvn clean install -DskipTests -pl ./tez-api -Dprotobuf.version=3.25.5
The default version is defined in the root pom.xml.
If you have multiple versions of protoc, set the PROTOC_PATH environment
variable to point to the desired binary. If it is not defined, the embedded
protoc compiler corresponding to ${protobuf.version} will be used.
Alternatively, specify the path during the build:
mvn package -DskipTests -Dprotoc.path=/usr/local/bin/protoc
Build a local copy of the Apache Tez website:
mvn site -pl docs
If you build from within a submodule directory, dependencies will be resolved
from the Maven cache or remote repositories. Alternatively, run
mvn install -DskipTests from the Tez top level once and then work from the
submodule.
Use -Pvisualize to generate a Graphviz file (Tez.gv) representing state
transitions:
mvn compile -Pvisualize -DskipTests=true
Optional parameters:
- -Dtez.dag.state.classes=<comma-separated list of classes> (Default: DAG, Vertex, Task, TaskAttempt)
- -Dtez.graphviz.title (Default: Tez)
- -Dtez.graphviz.output.file (Default: Tez.gv)
Example for DAGImpl:
mvn compile -Pvisualize \
  -Dtez.dag.state.classes=org.apache.tez.dag.app.dag.impl.DAGImpl \
  -DskipTests=true
Convert the .gv file to an image:
dot -Tpng -o Tez.png Tez.gv
Use -Ptools to build tools under tez-tools:
mvn package -Ptools
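For readers unfamiliar with Graphviz, the following Java sketch shows what a DOT state-transition graph, like the one the -Pvisualize build writes, looks like by generating a small one. The states, events and the DotSketch, Transition and toDot names are hypothetical and chosen only for illustration. They are not read from Tez, and the real Tez.gv may be laid out differently.

```java
import java.util.*;

/**
 * Illustrative only: emits a Graphviz DOT digraph for a small, hypothetical
 * state machine. The states and events are invented for this sketch; the
 * real Tez.gv is produced by the -Pvisualize build.
 */
public class DotSketch {

    // One edge of the state machine: from --event--> to.
    record Transition(String from, String to, String event) {}

    // Builds a DOT digraph with one labeled edge per transition.
    static String toDot(String title, List<Transition> transitions) {
        StringBuilder sb = new StringBuilder();
        sb.append("digraph \"").append(title).append("\" {\n");
        sb.append("  label=\"").append(title).append("\";\n"); // graph title shown in the rendered image
        for (Transition t : transitions) {
            sb.append("  \"").append(t.from()).append("\" -> \"").append(t.to())
              .append("\" [label=\"").append(t.event()).append("\"];\n");
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        List<Transition> transitions = List.of(
                new Transition("NEW", "RUNNING", "START"),
                new Transition("RUNNING", "SUCCEEDED", "DONE"),
                new Transition("RUNNING", "FAILED", "ERROR"));
        System.out.print(toDot("Example", transitions));
    }
}
```

Saving the output (for example with java DotSketch.java > example.gv) and running dot -Tpng -o example.png example.gv renders it the same way Tez.gv is rendered.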