Class DataReuseEngine

  • All Implemented Interfaces:
    Refiner

    public class DataReuseEngine
    extends Engine
    implements Refiner
    The data reuse engine reduces the workflow on the basis of existing output files of the workflow found in the Replica Catalog. The algorithm works in two passes.

    In the first pass, we determine all the jobs whose output files exist in the Replica Catalog. An output file with the transfer flag set to false is treated as equivalent to a file existing in the Replica Catalog, if

      - the output file is not an input to any of the children of the job X that produces it
      
    In the second pass, we remove the jobs whose output files exist in the Replica Catalog and try to cascade the deletion upwards to the parent jobs. We start a breadth-first traversal of the workflow bottom up. A node is marked for deletion if -
      ( It is already marked for deletion in pass 1
          OR
          ( ALL of its children have been marked for deletion
            AND
            Node's output files have transfer flags set to false
          )
      )
     
    Version:
    $Revision$
    Author:
    Karan Vahi
    • Field Detail

      • mAllDeletedJobs

        private java.util.List<Job> mAllDeletedJobs
        List of all jobs deleted during workflow reduction.
      • mAllDeletedNodes

        private java.util.List<GraphNode> mAllDeletedNodes
        List of the graph nodes of all jobs deleted during workflow reduction.
      • mXMLStore

        private XMLProducer mXMLStore
        The XML Producer object that records the actions.
      • mWorkflow

        private ADag mWorkflow
        The workflow object being worked upon.
    • Constructor Detail

      • DataReuseEngine

        public DataReuseEngine​(ADag orgDag,
                               PegasusBag bag)
        The constructor.
        Parameters:
        orgDag - the original Dag object.
        bag - the bag of initialization objects.
    • Method Detail

      • getWorkflow

        public ADag getWorkflow()
        Returns a reference to the workflow that is being refined by the refiner.
        Specified by:
        getWorkflow in interface Refiner
        Returns:
        the ADag object.
      • getXMLProducer

        public XMLProducer getXMLProducer()
        Returns a reference to the XMLProducer that generates the XML fragment capturing the actions of the refiner. This is used for provenance purposes.
        Specified by:
        getXMLProducer in interface Refiner
        Returns:
        XMLProducer
      • reduceWorkflow

        public ADag reduceWorkflow​(ADag workflow,
                                   ReplicaCatalogBridge rcb)
        Reduces the workflow on the basis of the existence of LFNs in the Replica Catalog. The existence of files is determined via the bridge.
        Parameters:
        workflow - the workflow to be reduced.
        rcb - instance of the replica catalog bridge.
        Returns:
        the reduced dag
      • reduceWorkflow

        public Graph reduceWorkflow​(Graph workflow,
                                    ReplicaCatalogBridge rcb)
        Reduces the workflow on the basis of the existence of LFNs in the Replica Catalog. The existence of files is determined via the bridge.
        Parameters:
        workflow - the workflow to be reduced.
        rcb - instance of the replica catalog bridge.
        Returns:
        the reduced dag. The input workflow object is returned reduced.
      • getDeletedJobs

        public java.util.List<Job> getDeletedJobs()
        Returns all the jobs deleted from the workflow after the reduction algorithm has run.
        Returns:
        List containing the Job objects of all deleted jobs.
      • getDeletedLeafJobs

        public java.util.List<Job> getDeletedLeafJobs()
        Returns all the deleted jobs that happen to be leaf nodes. The output files of these jobs must be transferred from the location returned by the Replica Catalog to the pool specified. The result is a subset of mAllDeletedJobs. Leaf jobs are determined against the original DAG, not the reduced DAG.
        Returns:
        List containing the Job objects of the deleted leaf jobs.
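
        The point that leaf status is judged against the original DAG can be illustrated with a minimal sketch. It uses a hypothetical adjacency map of job names (originalChildren) and a hypothetical helper deletedLeafJobs; neither is part of the Pegasus API, and the real method works on Job objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Pegasus implementation.
final class DeletedLeafSketch {

    /**
     * Returns the deleted jobs that have no children in the ORIGINAL workflow.
     * originalChildren maps a job name to the names of its children before reduction.
     */
    static List<String> deletedLeafJobs(Map<String, List<String>> originalChildren,
                                        List<String> deletedJobs) {
        List<String> leaves = new ArrayList<>();
        for (String job : deletedJobs) {
            // A job missing from the map is assumed to have no children.
            List<String> children = originalChildren.getOrDefault(job, List.of());
            if (children.isEmpty()) {
                leaves.add(job);
            }
        }
        return leaves;
    }
}
```

        For a chain A -> B -> C with B and C deleted, only C is reported: A becomes a leaf of the reduced workflow, but it was neither deleted nor a leaf of the original one.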
      • getJobsInRC

        private java.util.List<GraphNode> getJobsInRC​(Graph workflow,
                                                      java.util.Set filesInRC)
        Returns all the jobs whose output files exist in the Replica Catalog. An output file with the transfer flag set to false is treated as equivalent to a file in the Replica Catalog if the output file is not an input to any of the children of the job X that produces it.
        Parameters:
        workflow - the workflow object
        filesInRC - Set of String objects corresponding to the logical filenames of files that are found to be in the Replica Catalog.
        Returns:
        a List of GraphNodes with their Boolean bag value set to true.
        See Also:
        org.griphyn.cPlanner.classes.Job
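
        The first-pass rule described above can be sketched as follows. The Node type, its fields, and the helper names are hypothetical stand-ins, not the Pegasus Graph, GraphNode, or Job classes. The sketch also assumes that a job with no output files is not marked, which the description above does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in types; these are NOT the Pegasus Graph/GraphNode/Job classes.
final class FirstPassSketch {

    /** A job with its output files (LFN -> transfer flag), its input LFNs and its children. */
    static final class Node {
        final String id;
        final Map<String, Boolean> outputs;
        final Set<String> inputs;
        final List<Node> children = new ArrayList<>();

        Node(String id, Map<String, Boolean> outputs, Set<String> inputs) {
            this.id = id;
            this.outputs = outputs;
            this.inputs = inputs;
        }
    }

    /** True if any child of the job lists the LFN among its inputs. */
    static boolean consumedByChild(Node job, String lfn) {
        for (Node child : job.children) {
            if (child.inputs.contains(lfn)) {
                return true;
            }
        }
        return false;
    }

    /** True if every output is in the catalog, or is untransferred and unused by children. */
    static boolean outputsAvailable(Node job, Set<String> filesInRC) {
        if (job.outputs.isEmpty()) {
            return false; // assumption: a job without outputs is never marked
        }
        for (Map.Entry<String, Boolean> out : job.outputs.entrySet()) {
            String lfn = out.getKey();
            boolean transfer = out.getValue();
            if (filesInRC.contains(lfn)) {
                continue; // file exists in the Replica Catalog
            }
            if (transfer || consumedByChild(job, lfn)) {
                return false; // file is still needed and is not available
            }
        }
        return true;
    }

    /** Returns the ids of all jobs that satisfy the first-pass rule. */
    static List<String> jobsInRC(List<Node> workflow, Set<String> filesInRC) {
        List<String> marked = new ArrayList<>();
        for (Node job : workflow) {
            if (outputsAvailable(job, filesInRC)) {
                marked.add(job.id);
            }
        }
        return marked;
    }
}
```

        In this sketch a scratch output with the transfer flag set to false does not block marking as long as no child reads it, while a single missing output that a child consumes keeps the job in the workflow.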
      • cascadeDeletionUpwards

        protected Graph cascadeDeletionUpwards​(Graph workflow,
                                               java.util.List<GraphNode> originalJobsInRC)
        Cascades the deletion of the jobs upwards in the workflow. We start a breadth-first traversal of the workflow bottom up. A node is marked for deletion if -
          ( It is already marked for deletion
              OR
              ( ALL of its children have been marked for deletion
                AND
                Node's output files have transfer flags set to false
              )
          )
         
        Parameters:
        workflow - the workflow to be reduced
        originalJobsInRC - list of nodes found to be in the Replica Catalog.
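
        The marking rule above can be sketched as a bottom-up traversal that visits a node only after all of its children have been visited. The Node type, the transfersOutput field, and the link and cascade helpers are hypothetical and do not reflect the actual Pegasus classes. The sketch reads "ALL of its children" literally, so the condition holds vacuously for a leaf.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in types; these are NOT the Pegasus Graph/GraphNode classes.
final class CascadeSketch {

    static final class Node {
        final String id;
        final boolean transfersOutput; // what transferOutput(node) would report
        final List<Node> parents = new ArrayList<>();
        final List<Node> children = new ArrayList<>();

        Node(String id, boolean transfersOutput) {
            this.id = id;
            this.transfersOutput = transfersOutput;
        }
    }

    static void link(Node parent, Node child) {
        parent.children.add(child);
        child.parents.add(parent);
    }

    /** Returns the ids of all nodes marked for deletion after cascading upwards. */
    static Set<String> cascade(List<Node> workflow, Set<String> markedInPass1) {
        Set<String> deleted = new LinkedHashSet<>(markedInPass1);
        Map<Node, Integer> pendingChildren = new HashMap<>();
        Deque<Node> queue = new ArrayDeque<>();

        // Start from the leaves: nodes with no children.
        for (Node n : workflow) {
            pendingChildren.put(n, n.children.size());
            if (n.children.isEmpty()) {
                queue.add(n);
            }
        }

        while (!queue.isEmpty()) {
            Node n = queue.remove();
            if (!deleted.contains(n.id)) {
                boolean allChildrenDeleted = true;
                for (Node c : n.children) {
                    if (!deleted.contains(c.id)) {
                        allChildrenDeleted = false;
                        break;
                    }
                }
                // Delete only if nothing downstream needs the node and
                // none of its outputs has to be transferred.
                if (allChildrenDeleted && !n.transfersOutput) {
                    deleted.add(n.id);
                }
            }
            // A parent becomes ready once all of its children were visited.
            for (Node p : n.parents) {
                if (pendingChildren.merge(p, -1, Integer::sum) == 0) {
                    queue.add(p);
                }
            }
        }
        return deleted;
    }
}
```

        For a chain A -> B -> C where only C was marked in the first pass, B is removed if none of its outputs is to be transferred, and A survives if any of its outputs is.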
      • transferOutput

        protected boolean transferOutput​(GraphNode node)
        Returns whether a user wants output transferred for a node. If no output files are associated with the node, true is returned.
        Parameters:
        node - the GraphNode
        Returns:
        boolean
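
        A minimal sketch of this check follows, taking the transfer flags of a node's output files as a plain list. It assumes that output counts as wanted when at least one file has its transfer flag set; the description above does not say whether one flag or all flags decide the result. The class and parameter names are hypothetical.

```java
import java.util.List;

// Hypothetical illustration only; not the Pegasus implementation.
final class TransferOutputSketch {

    /**
     * Returns true if no output files are associated, or (assumption)
     * if at least one output file has its transfer flag set to true.
     */
    static boolean transferOutput(List<Boolean> transferFlags) {
        if (transferFlags.isEmpty()) {
            return true; // documented behaviour: no output files means true
        }
        for (boolean flag : transferFlags) {
            if (flag) {
                return true;
            }
        }
        return false;
    }
}
```

        Returning true for a node without output files is the conservative choice: such a node is never removed by the cascade rule alone.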