From c8d3569d38efdeb1c4aa79c97f3420364aed1c41 Mon Sep 17 00:00:00 2001
From: Alessandro Berti
Date: Wed, 20 Nov 2024 09:33:15 +0100
Subject: [PATCH] fix

---
 docs/01_handling_event_data.md      |  2 +
 docs/04_process_discovery.md        | 21 +++++-----
 docs/06_conformance_checking.md     | 20 +++-------
 docs/07_process_trees.md            | 14 +++----
 docs/08_feature_selection.md        | 16 +++-----
 docs/09_statistics.md               |  5 ++-
 docs/10_log-model_evaluation.md     | 10 +++--
 docs/11_simulation.md               | 59 ++++++++++++-----------------
 docs/15_streaming_process_mining.md |  2 +
 9 files changed, 66 insertions(+), 83 deletions(-)

diff --git a/docs/01_handling_event_data.md b/docs/01_handling_event_data.md
index 9b13056ac..79dce3a84 100644
--- a/docs/01_handling_event_data.md
+++ b/docs/01_handling_event_data.md
@@ -108,6 +108,8 @@ as a
 -file:
+
+
 |CaseID|Activity|Timestamp|clientID|
 |---|---|---|---|
 |1|register request|20200422T0455|1337|
diff --git a/docs/04_process_discovery.md b/docs/04_process_discovery.md
index 09b861f4f..7097e51f0 100644
--- a/docs/04_process_discovery.md
+++ b/docs/04_process_discovery.md
@@ -8,6 +8,8 @@ order of events/activities that are executed during a process execution.
 In the following, we made up an overview to visualize the advantages and disadvantages of the mining algorithms.
+
+
 |Alpha|Alpha+|Heuristic|Inductive|
 |---|---|---|---|
 |Cannot handle loops of length one and length two|Can handle loops of length one and length two|Takes frequency into account|Can handle invisible tasks|
@@ -165,6 +167,8 @@ if __name__ == "__main__":
 ```
+
+
 |Parameter name|Meaning|
 |---|---|
 |dependency_threshold|dependency threshold of the Heuristics Miner (default: 0.5)|
@@ -400,17 +404,12 @@ if __name__ == "__main__":
 Visualizing the DFGs, we can say that the correlation miner was able to discover a visualization where the main path is clear.
 Different variants of the correlation miner are available:
-|Variants.CLASSIC|Calculates the P/S matrix and the duration matrix in the classic way (the entire list of
-events is used)|
+
+
+|Variants.CLASSIC|Calculates the P/S matrix and the duration matrix in the classic way (the entire list of events is used)|
 |---|---|
-|Variants.TRACE_BASED|Calculates the P/S matrix and the duration matrix on a classic event log,
-trace-by-trace, and merges the results. The resolution of the linear problem permits to
-obtain a model that is more understandable than the classic DFG calculated on top of the
-log.|
-|Variants.CLASSIC_SPLIT|Calculates the P/S matrix and the duration matrix on the entire list of events, as in
-the classic version, but splits that in chunks to fasten the computation. Hence, the
-generated model is less accurate (in comparison to the CLASSIC version) but the
-calculation is faster. The default chunk size is 100000 events.|
+|Variants.TRACE_BASED|Calculates the P/S matrix and the duration matrix on a classic event log, trace-by-trace, and merges the results. The resolution of the linear problem permits to obtain a model that is more understandable than the classic DFG calculated on top of the log.|
+|Variants.CLASSIC_SPLIT|Calculates the P/S matrix and the duration matrix on the entire list of events, as in the classic version, but splits that in chunks to fasten the computation. Hence, the generated model is less accurate (in comparison to the CLASSIC version) but the calculation is faster. The default chunk size is 100000 events.|
@@ -443,6 +442,8 @@ if __name__ == "__main__":
 Some parameters can be used in order to customize the execution of the temporal profile:
 See Parameters
+
+
 |Parameter Key|Type|Default|Description|
 |---|---|---|---|
 |Parameters.ACTIVITY_KEY|string|concept:name|The attribute to use as activity.|
diff --git a/docs/06_conformance_checking.md b/docs/06_conformance_checking.md
index aef278bfe..4f4d16459 100644
--- a/docs/06_conformance_checking.md
+++ b/docs/06_conformance_checking.md
@@ -1035,6 +1035,8 @@ if __name__ == "__main__":
 Some parameters can be used in order to customize the conformance checking of the temporal profile:
 See Parameters
+
+
 |Parameter Key|Type|Default|Description|
 |---|---|---|---|
 |Parameters.ACTIVITY_KEY|string|concept:name|The attribute to use as activity.|
@@ -1064,22 +1066,12 @@ execution is repeated (that means rework) from different people.
 The verification of LTL rules requires the insertion of the required parameters (of the specific rule). Hence, this form of conformance checking is not automatic. The LTL rules that are implemented in pm4py are found in the following table:
+
+
 |LTL rule|Description|
 |---|---|
-|ltl.ltl_checker.four_eyes_principle(log, A, B)|Applies the four eyes principle on the activities A and B.
-Parameters:
-log: event log
-A: the activity A of the rule (an activity of the log)
-B: the activity B of the rule (an activity of the log)
-Returns:
-Filtered log object (containing the cases which have A and B done by the same person)|
-|ltl.ltl_checker.attr_value_different_persons(log, A)|Finds the process executions in which the activity A is repeated by
-different people.
-Parameters:
-log: event log
-A: the activity A of the rule (an activity of the log)
-Returns:
-Filtered log object (containing the cases which have A repeated by different people)|
+|ltl.ltl_checker.four_eyes_principle(log, A, B)|Applies the four eyes principle on the activities A and B. Parameters: log: event log A: the activity A of the rule (an activity of the log) B: the activity B of the rule (an activity of the log) Returns: Filtered log object (containing the cases which have A and B done by the same person)|
+|ltl.ltl_checker.attr_value_different_persons(log, A)|Finds the process executions in which the activity A is repeated by different people. Parameters: log: event log A: the activity A of the rule (an activity of the log) Returns: Filtered log object (containing the cases which have A repeated by different people)|
diff --git a/docs/07_process_trees.md b/docs/07_process_trees.md
index 895cedad4..245f5d3b4 100644
--- a/docs/07_process_trees.md
+++ b/docs/07_process_trees.md
@@ -56,6 +56,8 @@ if __name__ == "__main__":
 Suppose the following start activity and their respective occurrences.
+
+
 |Parameter|Meaning|
 |---|---|
 |MODE|most frequent number of visible activities (default 20)|
@@ -66,19 +68,13 @@ Suppose the following start activity and their respective occurrences.
 |PARALLEL|probability to add a parallel operator to tree (default 0.25)|
 |LOOP|probability to add a loop operator to tree (default 0.25)|
 |OR|probability to add an or operator to tree (default 0)|
-|SILENT|probability to add silent activity to a choice or loop operator
-(default 0.25)|
+|SILENT|probability to add silent activity to a choice or loop operator (default 0.25)|
 |DUPLICATE|probability to duplicate an activity label (default 0)|
 |LT_DEPENDENCY|probability to add a random dependency to the tree (default 0)|
 |INFREQUENT|probability to make a choice have infrequent paths (default 0.25)|
 |NO_MODELS|number of trees to generate from model population (default 10)|
-|UNFOLD|whether or not to unfold loops in order to include choices
-underneath in dependencies: 0=False, 1=True
-if lt_dependency <= 0: this should always be 0 (False)
-if lt_dependency > 0: this can be 1 or 0 (True or False) (default
-10)|
-|MAX_REPEAT|maximum number of repetitions of a loop (only used when unfolding is
-True) (default 10)|
+|UNFOLD|whether or not to unfold loops in order to include choices underneath in dependencies: 0=False, 1=True if lt_dependency <= 0: this should always be 0 (False) if lt_dependency > 0: this can be 1 or 0 (True or False) (default 10)|
+|MAX_REPEAT|maximum number of repetitions of a loop (only used when unfolding is True) (default 10)|
diff --git a/docs/08_feature_selection.md b/docs/08_feature_selection.md
index ad0ff3dbc..5cac8b6ef 100644
--- a/docs/08_feature_selection.md
+++ b/docs/08_feature_selection.md
@@ -114,18 +114,14 @@ log_to_features.apply
 .
 The types of features that can be considered by a manual feature selection are:
-|str_ev_attr|String attributes at the event level: these are hot-encoded into features that may
-assume value 0 or value 1.|
+
+
+|str_ev_attr|String attributes at the event level: these are hot-encoded into features that may assume value 0 or value 1.|
 |---|---|
-|str_tr_attr|String attributes at the trace level: these are hot-encoded into features that may
-assume value 0 or value 1.|
-|num_ev_attr|Numeric attributes at the event level: these are encoded by including the last value of
-the attribute among the events of the trace.|
+|str_tr_attr|String attributes at the trace level: these are hot-encoded into features that may assume value 0 or value 1.|
+|num_ev_attr|Numeric attributes at the event level: these are encoded by including the last value of the attribute among the events of the trace.|
 |num_tr_attr|Numeric attributes at trace level: these are encoded by including the numerical value.|
-|str_evsucc_attr|Successions related to the string attributes values at the event level: for example, if
-we have a trace [A,B,C], it might be important to include not only the presence of the
-single values A, B and C as features; but also the presence of the directly-follows
-couples (A,B) and (B,C).|
+|str_evsucc_attr|Successions related to the string attributes values at the event level: for example, if we have a trace [A,B,C], it might be important to include not only the presence of the single values A, B and C as features; but also the presence of the directly-follows couples (A,B) and (B,C).|
diff --git a/docs/09_statistics.md b/docs/09_statistics.md
index a2e8e01cf..d5875661f 100644
--- a/docs/09_statistics.md
+++ b/docs/09_statistics.md
@@ -124,8 +124,9 @@ In the following, we aim to insert the following attributes to events inside a l
 Attributes
-|@@approx_bh_partial_cycle_time|Incremental cycle time associated to the event (the cycle time of the last event is
-the cycle time of the instance)|
+
+
+|@@approx_bh_partial_cycle_time|Incremental cycle time associated to the event (the cycle time of the last event is the cycle time of the instance)|
 |---|---|
 |@@approx_bh_partial_lead_time|Incremental lead time associated to the event|
 |@@approx_bh_overall_wasted_time|Difference between the partial lead time and the partial cycle time values|
diff --git a/docs/10_log-model_evaluation.md b/docs/10_log-model_evaluation.md
index 9db1da7b2..89416def3 100644
--- a/docs/10_log-model_evaluation.md
+++ b/docs/10_log-model_evaluation.md
@@ -430,12 +430,12 @@ The list of parameters are:
 Inspect parameters
-|PRINT_DIAGNOSTICS|Enables the printing of the diagnostics on the Petri net, when WOFLAN is
-executed.|
+
+
+|PRINT_DIAGNOSTICS|Enables the printing of the diagnostics on the Petri net, when WOFLAN is executed.|
 |---|---|
 |RETURN_DIAGNOSTICS|Returns a dictionary containing the diagnostics.|
-|RETURN_ASAP_WHEN_NOT_SOUND|Stops the execution of WOFLAN when a condition determining that the Petri net
-is not a sound workflow net is found.|
+|RETURN_ASAP_WHEN_NOT_SOUND|Stops the execution of WOFLAN when a condition determining that the Petri net is not a sound workflow net is found.|
@@ -503,6 +503,8 @@ the corresponding step):
 Inspect outputs
+
+
 |S_C_NET||
 |---|---|
 |PLACE_INVARIANTS||
diff --git a/docs/11_simulation.md b/docs/11_simulation.md
index 9f939ceac..2b81f5877 100644
--- a/docs/11_simulation.md
+++ b/docs/11_simulation.md
@@ -14,12 +14,11 @@ been provided by the user.
 A playout of a Petri net takes as input a Petri net along with an initial marking, and returns a list of process executions that are allowed from the process model.
 We offer different types of playouts:
-|Variants.BASIC_PLAYOUT|A basic playout that accepts a Petri net along with an initial marking, and returns a
-specified number of process executions (repetitions may be possible).|
+
+
+|Variants.BASIC_PLAYOUT|A basic playout that accepts a Petri net along with an initial marking, and returns a specified number of process executions (repetitions may be possible).|
 |---|---|
-|Variants.EXTENSIVE|A playout that accepts a Petri net along with an initial marking, and returns all the
-executions that are possible according to the model, up to a provided
-length of trace (may be computationally expensive).|
+|Variants.EXTENSIVE|A playout that accepts a Petri net along with an initial marking, and returns all the executions that are possible according to the model, up to a provided length of trace (may be computationally expensive).|
@@ -27,6 +26,8 @@ The list of parameters for such variants are:
 Inspect parameters
+
+
 |Variants.BASIC_PLAYOUT|Parameters.ACTIVITY_KEY|The name of the attribute to use as activity in the playout log.|
 |---|---|---|
 ||Parameters.TIMESTAMP_KEY|The name of the attribute to use as timestamp in the playout log.|
@@ -145,6 +146,8 @@ if __name__ == "__main__":
 During the replay operation, some debug messages are written to the screen.
 The main outputs of the simulation process are:
+
+
 |simulated_log|The traces that have been simulated during the simulation.|
 |---|---|
 |res|The result of the simulation (Python dictionary).|
@@ -157,18 +160,15 @@ res
 Inspect outputs
-|places_interval_trees|an interval tree for each place, that hosts an interval for each time when it was
-“full” according to the specified maximum amount of tokens per place.|
+
+
+|places_interval_trees|an interval tree for each place, that hosts an interval for each time when it was “full” according to the specified maximum amount of tokens per place.|
 |---|---|
-|transitions_interval_trees|an interval tree for each transition, that contains all the time intervals in which
-the transition was enabled but not yet fired (so, the time between a transition was
-fully enabled and the consumption of the tokens from the input places)|
+|transitions_interval_trees|an interval tree for each transition, that contains all the time intervals in which the transition was enabled but not yet fired (so, the time between a transition was fully enabled and the consumption of the tokens from the input places)|
 |cases_ex_time|a list containing the throughput times for all the cases of the log|
 |median_cases_ex_time|the median throughput time of the cases in the simulated log|
-|input_case_arrival_ratio|the case arrival ratio that was provided by the user, or automatically calculated
-from the event log.|
-|total_cases_time|the difference between the last timestamp of the log, and the first timestamp of the
-simulated log.|
+|input_case_arrival_ratio|the case arrival ratio that was provided by the user, or automatically calculated from the event log.|
+|total_cases_time|the difference between the last timestamp of the log, and the first timestamp of the simulated log.|
@@ -253,32 +253,21 @@ petri_semaph_fifo
 Inspect parameters
-|Variants.PETRI_SEMAPH_FIFO|Parameters.PARAM_NUM_SIMULATIONS|Number of simulations that are performed (the goal is to have such number of traces
-in the model)|
+
+
+|Variants.PETRI_SEMAPH_FIFO|Parameters.PARAM_NUM_SIMULATIONS|Number of simulations that are performed (the goal is to have such number of traces in the model)|
 |---|---|---|
 ||Parameters.PARAM_CASE_ARRIVAL_RATIO|The case arrival ratio that is specified by the user.|
 ||Parameters.PARAM_MAP_RESOURCES_PER_PLACE|A map containing for each place of the Petri net the maximum amount of tokens|
-||Parameters.PARAM_DEFAULT_NUM_RESOURCES_PER_PLACE|If the map of resources per place is not specified, then use the specified maximum
-number of resources per place.|
+||Parameters.PARAM_DEFAULT_NUM_RESOURCES_PER_PLACE|If the map of resources per place is not specified, then use the specified maximum number of resources per place.|
 ||Parameters.PARAM_MAX_THREAD_EXECUTION_TIME|Specifies the maximum execution time of the simulation (for example, 60 seconds).|
-||Parameters.PARAM_SMALL_SCALE_FACTOR|Specifies the ratio between the “real” time scale and the simulation time scale. A
-higher ratio means that the simulation goes faster but is in general less accurate.
-A lower ratio means that the simulation goes slower and is in general more accurate
-(in providing detailed diagnostics). The default choice is 864000 seconds (10 days).
-So that means that a second in the simulation is corresponding to 10 days of real
-log.|
-||Parameters.PARAM_ENABLE_DIAGNOSTICS|Enables the printing of the simulation diagnostics through the usage of the
-“logging” class of Python|
+||Parameters.PARAM_SMALL_SCALE_FACTOR|Specifies the ratio between the “real” time scale and the simulation time scale. A higher ratio means that the simulation goes faster but is in general less accurate. A lower ratio means that the simulation goes slower and is in general more accurate (in providing detailed diagnostics). The default choice is 864000 seconds (10 days). So that means that a second in the simulation is corresponding to 10 days of real log.|
+||Parameters.PARAM_ENABLE_DIAGNOSTICS|Enables the printing of the simulation diagnostics through the usage of the “logging” class of Python|
 ||Parameters.ACTIVITY_KEY|The attribute of the log that should be used as activity|
 ||Parameters.TIMESTAMP_KEY|The attribute of the log that should be used as timestamp|
-||Parameters.TOKEN_REPLAY_VARIANT|The variant of the token-based replay to use: token_replay,
-the classic variant, that cannot handle duplicate transitions;
-backwards, the backwards token-based replay, that is slower but can handle
-invisible transitions.|
-||Parameters.PARAM_FORCE_DISTRIBUTION|If specified, the distribution that is forced for the transitions (normal,
-exponential)|
-||Parameters.PARAM_DIAGN_INTERVAL|The time interval in which diagnostics should be printed (for example, diagnostics
-should be printed every 10 seconds).|
+||Parameters.TOKEN_REPLAY_VARIANT|The variant of the token-based replay to use: token_replay, the classic variant, that cannot handle duplicate transitions; backwards, the backwards token-based replay, that is slower but can handle invisible transitions.|
+||Parameters.PARAM_FORCE_DISTRIBUTION|If specified, the distribution that is forced for the transitions (normal, exponential)|
+||Parameters.PARAM_DIAGN_INTERVAL|The time interval in which diagnostics should be printed (for example, diagnostics should be printed every 10 seconds).|
@@ -320,6 +309,8 @@ The list of parameters are:
 Inspect parameters
+
+
 |MAX_LIMIT_NUM_TRACES|Maximum number of traces that are returned by the algorithm.|
 |---|---|
 |MAX_TRACE_LENGTH|Maximum length of a trace that is output of the algorithm.|
diff --git a/docs/15_streaming_process_mining.md b/docs/15_streaming_process_mining.md
index 58d40855e..66bfe3436 100644
--- a/docs/15_streaming_process_mining.md
+++ b/docs/15_streaming_process_mining.md
@@ -472,6 +472,8 @@ if __name__ == "__main__":
 ```
+
+
 |Parameter Key|Type|Default|Description|
 |---|---|---|---|
 |Parameters.CASE_ID_KEY|string|case:concept:name|The attribute to use as case ID.|