In data science workstreams, batch pipelines involve touching varied data sources (databases, warehouses, data lakes), generating features, imputing missing values, exploring the data, and many other tasks, all the way to producing trained model artifacts.
While doing so, we think of the process from start to end as blocks that can be chained in a sequence (or, more generally, composed as a directed acyclic graph, or DAG).
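To make the idea concrete, below is a minimal, library-free sketch of a pipeline expressed as blocks in a DAG. The step names (load_raw, build_features, train_model), their toy bodies, and the dependency mapping are illustrative assumptions, not a prescribed structure.

```python
from typing import Callable, Dict, List

# Each block is a plain function; its input is a dict of its upstream blocks' outputs.
def load_raw(upstream):
    # stand-in for reading from a database, warehouse, or data lake
    return [3.0, None, 7.0]

def build_features(upstream):
    # stand-in for imputation and feature generation
    rows = upstream["load_raw"]
    return [r if r is not None else 0.0 for r in rows]

def train_model(upstream):
    # stand-in for producing a trained model artifact
    return {"trained_on_rows": len(upstream["build_features"])}

steps: Dict[str, Callable] = {
    "load_raw": load_raw,
    "build_features": build_features,
    "train_model": train_model,
}
# Edges of the DAG: each step lists the steps it depends on.
deps: Dict[str, List[str]] = {
    "load_raw": [],
    "build_features": ["load_raw"],
    "train_model": ["build_features"],
}

def run_pipeline(steps, deps):
    """Execute steps in dependency order, passing upstream outputs downstream."""
    results = {}
    while len(results) < len(steps):
        for name, fn in steps.items():
            if name not in results and all(d in results for d in deps[name]):
                results[name] = fn({d: results[d] for d in deps[name]})
    return results

if __name__ == "__main__":
    print(run_pipeline(steps, deps))
```

Chaining blocks this way works for a one-off script, but it offers none of the retries, scheduling, or visibility discussed next.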
Some desirable properties we want from model pipelines are:
Ultimately, we would like to manage pipelines with as little manual work as possible.
Workflow tools address these gaps: they let the user express the pipeline explicitly as a DAG of tasks and execute it with little manual intervention.
Further, these tools manage resources (compute, storage, etc.) and provide monitoring in service of these goals.
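As a rough illustration of what such a tool looks like in practice, here is a minimal sketch using Prefect as a representative workflow library; Prefect is only one possible choice, and the task names, retry settings, and toy data below are assumptions made for illustration.

```python
# Hedged sketch: Prefect used as a representative workflow tool.
from prefect import flow, task

@task(retries=2, retry_delay_seconds=60)   # the tool retries transient failures
def extract():
    return [1.0, 2.0, None, 4.0]           # stand-in for a database/warehouse read

@task
def impute_and_featurize(rows):
    filled = [r if r is not None else 0.0 for r in rows]
    return [(r, r ** 2) for r in filled]   # toy feature generation

@task
def train(features):
    return {"n_features": len(features)}   # stand-in for a trained model artifact

@flow                                      # the flow is the DAG; edges come from data flow
def training_pipeline():
    rows = extract()
    features = impute_and_featurize(rows)
    model = train(features)
    print(f"trained model artifact: {model}")

if __name__ == "__main__":
    training_pipeline()   # runs locally; the tool can also schedule and monitor runs
```

The pipeline logic is the same as before; what changes is that each task run is tracked, failures can be retried, and the whole flow can be scheduled and observed without extra plumbing.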
Example tools: