Time spent on data fetch and rule execution in scoring plan

Hi all.

We are running DQC 10.4 and are having some issues with execution times. We would like to determine whether they are due to:

  1. the large source data volume,
  2. the number (or complexity) of the rules within a scoring plan, or
  3. complex joins with other tables/data sources, also within a scoring plan.

We have the final duration of a scoring plan, as part of a workflow, but not the details we need.
After reviewing and testing different types of logging within DQC execution, and via the web console (admin center), we didn’t find quite what we were looking for.

So, the main question is: is there a way to get the time spent on the data read/fetch and on rule execution separately?

Thanks,
Aleks

Hello Aleks,

When you run a workflow, it generates logs with information about the result of the workflow execution and the behavior of its tasks. That information is either logged in the .wis file created with the workflow execution, or it can be written to a database. The .wis file contains information on when each task started, when it finished, and its state of completion. Furthermore, each workflow creates a wfinst folder with subfolders containing the logs of individual tasks and the temporary script. Each task creates its own task.log. You can find more information on the Logging Configuration and Executing Workflows pages.
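If you need a written, machine-readable trail of per-task timings, you could also parse the .wis file offline. Here is a minimal Python sketch; it assumes the task entries are workflowTaskPersistModelBean elements with started, stopped, id, and state attributes, which may differ between DQC versions:

	import sys
	import xml.etree.ElementTree as ET
	from datetime import datetime

	# Timestamp format used in the .wis file, e.g. "2020-04-27 14:41:37"
	TS_FORMAT = "%Y-%m-%d %H:%M:%S"

	def task_durations(wis_path):
	    # Yields (task id, state, duration in seconds) for each finished task.
	    tree = ET.parse(wis_path)
	    for bean in tree.getroot().iter("workflowTaskPersistModelBean"):
	        started, stopped = bean.get("started"), bean.get("stopped")
	        if not (started and stopped):
	            continue  # task never started or is still running
	        duration = (datetime.strptime(stopped, TS_FORMAT)
	                    - datetime.strptime(started, TS_FORMAT)).total_seconds()
	        yield bean.get("id"), bean.get("state"), duration

	if __name__ == "__main__":
	    for task_id, state, seconds in task_durations(sys.argv[1]):
	        print(f"{task_id}\t{state}\t{seconds:.0f} s")

Note that this only yields per-task totals; it does not split data fetch from rule execution, but it makes runs easy to compare.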

All of the options you listed can slow down execution. Different joins also have a different impact on processing time, depending on their complexity. If you would like to monitor the execution of the plan in real time, I suggest clicking the “show progress” button in the status panel of the IDE. This shows how records are being processed, so you can see which task is taking more time than expected. You can find more information on the Building a plan page of the DQC developer guide in our documentation.

Performance issues are complex and may have a number of causes; you should experiment with different tasks and configurations to see what is causing the slowdowns.

If you have issues finding any of the documentation pages, please raise a support ticket through the Support Helpdesk and we will help you.

Kind Regards,
Maksim

Hello Maksim.

Thanks for the feedback. Let me comment on the parts of your answer.

We are not using a database as log storage, so we had examined the .wis files even before asking this question. There we have, as you described, the duration of the workflow execution in total, as well as for its separate steps (scoring plans).
E.g. for an executed plan scoring_XXX, the .wis file contains:

	<workflowTaskPersistModelBean stopped="2020-04-27 14:41:37" started="2020-04-27 10:51:57" id="scoring_XXX" state="FINISHED_OK">
		<compensationStates/>
	</workflowTaskPersistModelBean>

And that’s it; there is no more useful info here with regard to my question.

The wfinst_ folder, and the scoring plan XXX subfolder in it, does have a task.log file, but again there is no useful info with regard to my question. In task.log you can find the list of steps parsed from the plan, all sharing a single date/time (in this case 2020-04-27 10:51:58), INFO and DEBUG statements for the processed steps, and at the end: 2020-04-27 14:41:37 INFO: Finished! (Time spent: 13778 s)

I can analyse the execution within the IDE while it runs, via the Console and the details there, but this is not adequate, since there is no proper written trail and no way to compare multiple data reads and rule executions. Real-time info gathering is not really an option for us. If there were a way to write this data to a log file during execution, that would be fine; we could (batch) parse those files later for comparison.
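To illustrate the kind of batch parsing I mean, this rough Python sketch collects the final "Time spent" summary from every task.log under the wfinst_* folders, so several runs can be compared afterwards (the paths and the log line format are taken from our 10.4 logs and may need adjusting):

	import re
	from pathlib import Path

	# Matches the summary line at the end of each task.log, e.g.
	# "2020-04-27 14:41:37 INFO: Finished! (Time spent: 13778 s)"
	TIME_SPENT = re.compile(r"Finished!\s*\(Time spent:\s*(\d+)\s*s\)")

	def collect_time_spent(root="."):
	    # Returns (task.log path, seconds) pairs from all wfinst_* folders.
	    results = []
	    for log in Path(root).glob("wfinst_*/**/task.log"):
	        match = TIME_SPENT.search(log.read_text(errors="replace"))
	        if match:
	            results.append((log, int(match.group(1))))
	    return results

	if __name__ == "__main__":
	    # Slowest tasks first, for a quick comparison across runs
	    for log, seconds in sorted(collect_time_spent(), key=lambda r: -r[1]):
	        print(f"{seconds:>8} s  {log}")

But even this only gives the total per plan, not the read vs. rules split I am after.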

That’s why we are trying to narrow down the possible reasons for the execution slowdowns, starting with this question, to see whether the issue is with the data source read or with the complexity of the rules.

I would appreciate any further comments or ideas from your side, but I thank you anyway for the effort and support thus far.

BR,
Aleks

Hello Aleksandar,

I would suggest the following methods to debug your workflow and components in order to get a better picture of where the performance issue may lie:

  • Break down the sequential components into smaller chunks and test each one with your given input.
  • Instead of running a single Workflow, which yields a single Workflow log, separate each sequence of components into its own Workflow and connect the Workflows with a Synchronize task. This way you will get a .wis file containing details on the Workflow states of individual tasks, along with the complete execution.
  • Try increasing the memory allocated to the IDE by raising the -Xmx parameter in the dqc.ini file (see the sketch after this list).
  • Test your joins as well, since they can also influence the performance of your Workflow.
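For the memory setting above, assuming dqc.ini follows the usual Eclipse-style launcher format where JVM options come after -vmargs, the relevant lines would look something like this (the value itself is an example and depends on your machine and workload):

	-vmargs
	-Xmx4096m

Increase the value gradually and re-test, rather than jumping straight to a very large heap.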

Overall, I would recommend breaking the entire task down into smaller pieces, which will give you additional information on every single component and its performance.

If the performance issues persist on your end, you can open a Support Ticket, and if you are interested, we could involve our professional services team to look into improving the performance of your Workflow.

Best regards,
Aleksandar Aleksov