Executing Over 7,500 Test Cases to Ensure the Accuracy of Aggregation Logic

Introduction

Hello, I am katuo, the Engineering Leader of the Cloud Attendance team. Before diving into the main topic of the article, I would like to briefly introduce our product, Cloud Attendance. Cloud Attendance manages attendance data, such as time clocking, on the cloud. Since its release in 2019, about five years ago, it has gained many new features and grown into a large-scale service. As features and users have increased, however, so has the number of use cases for the core attendance function, the aggregation logic, which must accommodate countless combinations of input patterns and setting values. Ensuring that we calculate the values users expect, and carrying out quality assurance effectively, has therefore become a major challenge in our ongoing development.

To address these quality challenges, we launched a project to re-evaluate the quality of the aggregation logic in our cloud attendance management system by executing approximately 7,500 test cases. To run this large number of tests efficiently, we automated combination testing at the integration test level. In this article, we share the specific challenges we encountered, the methodology we adopted for running large-scale test cases, and the insights we gained along the way, for QA engineers and software engineers facing similar quality assurance challenges in products with numerous use cases and complex logic.

About the Aggregation Feature

Our Cloud Attendance system offers an aggregation feature that processes various settings and user-entered attendance data (such as clock-ins, clock-outs, and leave requests) to produce aggregation results.

The parameters required to calculate aggregation results fall primarily into two categories: attendance data and various setting data. The possible combinations of these are effectively endless, leading to a multitude of potential scenarios.

Issues

Due to the nature of our product, we often received queries from users questioning the accuracy of the results, especially soon after launching new aggregation features. We faced several challenges:

  1. Members across the company had vague concerns about product quality because no one knew what percentage of the use cases the product must support were actually defective.
  2. It was challenging to ensure that existing aggregation logic remained intact when adding new features, leading to high regression testing costs and compromised test quality.
  3. After the product’s release, changes in team composition meant that fewer members systematically understood the aggregation logic.

To address these issues, we initiated a project to fix the reported bugs, guarantee the defined behavior of the aggregation logic, and conduct extensive testing.

Testing Tactics

Creating Test Cases

The QA team identified factors and levels from the parameters and, using methods such as pairwise testing, created approximately 7,500 test cases. We are deeply grateful to the QA team for the effort of creating this many cases. Initially, however, the sheer number of cases meant there were many errors in the expected values and test data, which necessitated the data review and case correction process described later in this article.

Test Execution Method

The engineering team considered three approaches to execute the test cases provided by QA:

  1. Manually execute the test cases through the user interface.
  2. Integrate the test cases into existing unit tests.
  3. Develop a tool to automate the execution of the test cases.

Ultimately, we chose the third option: developing a tool to automate test case execution. The main reasons for this decision were as follows.

Ease of Re-execution

Automated tests can be easily re-run. Given the time required to prepare data and verify results per case, manually executing all 7,500 cases was impractical.

Consistency of Test Results

Manual testing often leads to inaccuracies in comparing expected and actual outcomes due to human error, which automation eliminates.

Facilitation of Regression Testing

Automation makes it easier to ensure that new code does not adversely affect existing functions.

Long-term Cost Reduction

Although setting up automation involves upfront costs, it reduces the expenses of manual testing over time, especially for repeated tests.

Complexity in Maintaining Unit Tests

Our aggregation logic consists of multiple classes and modules. Adopting the second approach would complicate unit test design, as parent classes would end up testing functionalities that should be independently verified in child classes, leading to overlapping and unclear test scopes.

On the other hand, we also considered the following potential drawbacks, but ultimately decided to proceed with the development of the test tool:

  1. The test tool itself may contain bugs, which could lead to the risk of producing incorrect results.
  2. The development of the test tool could be time-consuming, potentially impacting the project timeline.
  3. Maintenance of the test tool would also require time, potentially increasing operational costs.

Development of the Automated Testing Tool

Requirements for the Automated Testing Tool

To ensure that the tests could be executed accurately and efficiently, we established the following requirements:

  • Execute all test cases: every pattern we decided to guarantee operationally must be tested.
  • Selective execution option: prevents the automated tests from running in full every time and prolonging CI during PR reviews and release processes.
  • Input and expected values read from CSV files: test cases are authored in spreadsheets and then exported to CSV for the tool to consume.
  • Default values for unspecified items in test data: clarifies which factors were intentionally altered when the test data was created.
  • High-speed local execution: allows rapid execution and modification of test cases.
  • Group-specific test execution: enables efficient management and execution of related test cases by employment rule group.

Configuration of the Automated Testing Tool

The structure of the automated testing tool is straightforward: it loads data from CSV files, uses RSpec to run tests against the calculation classes at the root of the aggregation logic class hierarchy, and outputs the results, including the test data, expected values, and actual values, to CSV files.

By directly utilizing the mechanisms of RSpec, we were able to complete the development of the test tool itself within the originally anticipated timeframe of about three weeks.
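
As a rough illustration of this structure, the driver can be expressed as a data-driven RSpec spec. Every name below (the AggregationInput factory, the AggregationCalculator root class, the CSV column names, the directory path) is a hypothetical stand-in for illustration, not our actual implementation:

# spec/test_tool/aggregation_spec.rb -- minimal sketch of a CSV-driven suite (all names are illustrative)
require "csv"
require "spec_helper"

RSpec.describe "Aggregation logic (CSV-driven)" do
  case_dir = ENV.fetch("TEST_CASE_DIR", "test_tool/cases/basic_employment_rule")

  Dir.glob(File.join(case_dir, "*.csv")).sort.each do |path|
    describe File.basename(path) do
      CSV.read(path, headers: true).each_with_index do |row, i|
        it "case ##{i + 1} matches the expected aggregation result" do
          # Build the input from the CSV row; columns left blank fall back to default values.
          input  = AggregationInput.from_csv_row(row)          # hypothetical factory
          actual = AggregationCalculator.new(input).calculate  # hypothetical root calculation class
          expect(actual.total_working_minutes)
            .to eq(row["expected_total_working_minutes"].to_i)
        end
      end
    end
  end
end

Keeping the driver this thin is what let us lean entirely on RSpec's existing reporting and filtering mechanisms.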

CSV File Structure

Test cases are organized into directories classified by employment rule, enabling the efficient execution of related test cases by specifying the employment rule during tool execution. The directory structure is as follows:

.
├── basic_employment_rule
│   ├── hoge.csv
│   ├── ...
│   └── foo.csv
├── discretionary_labor_employment_rule
├── flextime_employment_rule
├── one_month_modified_working_hours_employment_rule
├── one_year_modified_working_hours_employment_rule_1
├── one_year_modified_working_hours_employment_rule_2
├── one_year_modified_working_hours_employment_rule_3
└── supervision_employment_rule

Across all employment rules, there are 116 CSV files of test cases in total. When running the test tool, setting an environment variable such as EMPLOYMENT_RULES=basic_employment_rule,shift_employment_rule executes only the test cases for the specified employment rules, which keeps debugging and test execution focused and efficient.
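
As a sketch of how such a variable might be resolved into target directories (the path prefix and parsing code here are illustrative, not the tool's actual implementation):

# Sketch: resolve EMPLOYMENT_RULES into the CSV directories to run (illustrative)
rules = ENV.fetch("EMPLOYMENT_RULES", "").split(",").map(&:strip).reject(&:empty?)

target_dirs =
  if rules.empty?
    Dir.glob("test_tool/cases/*")                            # no filter: run every employment rule
  else
    rules.map { |rule| File.join("test_tool/cases", rule) }  # e.g. test_tool/cases/basic_employment_rule
  end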

The results of the test cases are automatically placed in directories created based on the combination of the employment rule and the execution date, for example:

test_tool/result/basic_employment_rule/20230401_1/hoge.f
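
A minimal sketch of how such a result path could be composed and written (the run-number suffix and result columns are assumptions for illustration):

# Sketch: write a result CSV under result/<employment_rule>/<date>_<run>/ (illustrative)
require "csv"
require "fileutils"

rule    = "basic_employment_rule"
run_dir = File.join("test_tool/result", rule, Time.now.strftime("%Y%m%d") + "_1")
FileUtils.mkdir_p(run_dir)

CSV.open(File.join(run_dir, "hoge.csv"), "w") do |csv|
  csv << %w[case_id expected_total actual_total status]      # columns are illustrative
  csv << [1, 480, 480, "pass"]
end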

Improving Local Execution Time

To execute and debug the 7,500 test cases efficiently, increasing the execution speed in the local environment was crucial. We implemented the following in the testing tool:

  • Create a test database for each employment rule being tested.
  • Use Parallel to execute tests concurrently by employment rule.

For instance, if the employment rules basic_employment_rule, discretionary_labor_employment_rule, and flextime_employment_rule are specified, Parallel creates a separate process and test database for each rule and runs their tests concurrently, as shown in the first diagram below. When a single employment rule is specified, the test files under that employment rule are themselves executed in parallel, as shown in the second diagram below.
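
A minimal sketch of the per-rule parallelization, assuming the parallel gem and a per-rule test database naming scheme (the database naming and environment variables here are illustrative):

# Sketch: run each specified employment rule in its own process against its own test DB (illustrative)
require "parallel"

rules = ENV.fetch("EMPLOYMENT_RULES", "basic_employment_rule").split(",").map(&:strip)

Parallel.each(rules, in_processes: rules.size) do |rule|
  db_name = "attendance_test_#{rule}"                        # one test database per employment rule
  system({ "EMPLOYMENT_RULES" => rule, "TEST_DB_NAME" => db_name },
         "bundle", "exec", "rspec", "spec/test_tool",
         exception: true)                                     # raise if a worker's run fails
end

Each forked worker would point its database configuration at TEST_DB_NAME so that concurrent runs never share a schema; within a worker, the CSV files for that rule can likewise be split across processes.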

As a result, we were able to reduce execution time by approximately 75 to 80% compared to execution before these improvements.

Reviewing Test Data

Need for Reviewing Test Data

As mentioned at the beginning of this article, the 7,500 test cases initially contained errors in the test data and expected values, and there were also mistakes in the implementation of the testing tool. The engineering team therefore needed to scrutinize the test cases to raise the accuracy of our quality assurance.

Process for Reviewing Test Data

Given the large number of test cases and the involvement of multiple team members, it was crucial to define a clear process for reviewing test data to prevent disorganized work and inefficiency. We also needed a system to track completion to avoid significant increases in labor costs. To resolve these issues, we defined the following flow and ensured that all team members adhered to this process:

  1. Generate CSV files: extract the test data from the managed spreadsheet and save it as CSV files.
  2. Download CSV files: download the generated CSV files locally so they can be run with the local test tool.
  3. Execute test cases: run the tests against the saved CSV files and output the results to CSV files.
  4. Upload test results: upload the results to the spreadsheet to track overall progress.
  5. Check and fix test cases: correct the issues identified in the spreadsheet and create bug tickets as needed.
  6. Re-generate CSV files: save the corrected test cases as CSV files again.
  7. Re-download CSV files: download the corrected CSV files locally to run the tests again with the local test tool.
  8. Re-execute test cases: re-run the tests with the corrected CSV files to verify the fixes.
  9. Push CSV files: once the tests show zero failures, or all remaining failures are identified bugs, push the final CSV files to GitHub.

With multiple team members running this process in parallel, one employment rule's files at a time, we systematically aggregated the completed CSV files into a GitHub repository. This approach allowed us to review the test data efficiently and systematically.

Efficiency Improvements in the Review Process

To enhance the efficiency of the test data review process, we implemented several innovations.

Bulk Generation of CSV Files Using Google Apps Script

We developed a Google Apps Script that converts the data in a spreadsheet into CSV files ready for test execution with a single run, reducing the preparation time for test data. The following code was written to streamline the process.

function exportSheetsAsCSV() {
  // Get the active spreadsheet
  var spreadsheet = SpreadsheetApp.getActiveSpreadsheet();
  // Get all sheets in the spreadsheet
  var sheets = spreadsheet.getSheets();
  // Specify the destination folder by name
  var folderName = "Your_Folder_Name"; // Set the name of the destination folder appropriately
  var folder = DriveApp.getFoldersByName(folderName).next();
  // Process each sheet
  sheets.forEach(function(sheet) {
    // Get the data from the sheet
    var data = sheet.getDataRange().getValues();
    // Convert the data to a CSV format string
    var csv = convertToCSV(data);
    // Create a CSV file
    var csvFileName = sheet.getName() + ".csv";
    // Save the CSV file to the specified folder
    var csvFile = folder.createFile(csvFileName, csv, MimeType.CSV);
    // Log the download URL of the created CSV file
    var downloadUrl = csvFile.getDownloadUrl();
    Logger.log("CSV file created: " + downloadUrl);
  });
}
// Function to convert cell values to a CSV format string, enclosing them in double quotes
function convertToCSV(data) {
  var csv = "";
  data.forEach(function(row) {
    var csvRow = row.map(function(cell) {
      return '"' + String(cell) + '"';
    }).join(",");
    csv += csvRow + "\n";
  });
  return csv;
}

Automatic Conversion of CSV to TSV Using a Browser App

We created a simple local browser application using HTML + JS that converts comma-separated CSV files into tab-separated TSV format. This application allows for the direct pasting of the test tool’s output results into the test case spreadsheet, reducing the time needed to update the test data spreadsheet.
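
The actual tool is a small HTML + JS page, but the core of the conversion is simple; for consistency with the other sketches, here is the equivalent logic in Ruby (file names are illustrative):

# Sketch: convert a comma-separated result file into tab-separated text for pasting into a spreadsheet
require "csv"

rows = CSV.read("result.csv")                               # illustrative input: a result file from the test tool
File.write("result.tsv", rows.map { |row| row.join("\t") }.join("\n"))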

Utilization of Spreadsheet Filtering Features

We maximized the use of spreadsheet filtering features to quickly identify fault patterns, allowing us to collectively discover and correct issues, thereby reducing the time spent on corrections.

Project Outcomes

  • Calculation of Defect Rates
    • From the 7,500 cases, we identified cases that constituted defects and calculated the failure rate. This allowed us to dispel concerns about the quality and prevalence of bugs in the aggregation logic with quantifiable metrics.
  • Creation of a Specification Bug Ledger
    • We identified unintended behaviors from the executed test cases and created a ledger for managing specification bugs that need future correction.
  • Correction of Identified Bugs
    • We were able to correct bugs discovered during the test execution within the project’s timeframe.
  • Development of a Regression Testing Mechanism
    • We developed a regression testing mechanism that can be immediately used when modifications to the aggregation logic are needed, thus facilitating future maintenance work.
  • Enhanced Understanding Among Team Members
    • Through activities such as scrutinizing test data, existing team members were able to systematically deepen their understanding of the structure and behavior of the aggregation logic.

Problems Encountered

  • Issues with the Test Tool
    • As initially feared, there were bugs in the test tool, leading to mistakes in setting up the test data. This necessitated re-verification and correction of some test data, resulting in additional costs.
  • Misunderstanding of Default Values
    • When creating test data, we attempted to clarify which values were intentionally changed by utilizing the application’s default values. However, this led to misunderstandings between QA and engineers, causing numerous instances of incorrect expected values being set.
  • Deficiencies in Test Data Expected Values and the Cost of Corrections
    • With the total number of test cases reaching 7,500, there were many deficiencies in expected values, and the cost of corrections turned out to be higher than anticipated.
  • Lack of Communication Between QA and Engineers
    • Insufficient communication between engineers and QA during the creation of test cases meant that QA did not fully understand the engineers’ intentions, leading to increased costs for correcting test data. Furthermore, until a test data correction workflow was established, team members were making corrections in different ways, which resulted in inefficient progress management.

Future Challenges

  • Improving Readability of Default Values
    • Although default values clarified the purpose of the tests, there were cases where the behavior differed from the engineers’ understanding because the system’s application model, rather than the testing tool, defined the default values.
    • Going forward, we will set default values in the testing tool and apply these during the creation of test data.
  • Importance of Reviewing Test Data
    • Reviewing the test data required significant effort, and this should have been clearly defined and undertaken early in the project.
  • Reduction of Unnecessary Tests
    • From a white-box testing perspective, unnecessary tests were prominent, so we should actively reduce the number of test cases to minimize maintenance costs.
  • Strengthening Communication
    • There was a lack of communication among engineers during the test case creation phase, so we should have established a format for creating spreadsheets that makes test case interpretation, execution, and data review easier before starting.
  • Counting Test Case Deficiencies
    • We should have kept a count of how many test cases had deficiencies to hold objective information on the quality of test cases created by QA members.
  • Advancing Test Automation
    • Due to the low accuracy of test cases, we chose local execution to improve debugging efficiency. Now that the test cases are complete, we aim to incorporate them into CI (Continuous Integration) to make execution easier in the future.

In Conclusion

The project spanned approximately 4.5 months, involved changes in team composition, and was at times mentally demanding and full of challenges. Despite these hurdles, we successfully completed the testing, bug fixing, and reverse engineering of the logic, resolving many of the initial issues. Looking back, I am confident the project was worthwhile. When we face similar issues in other products in the future, we intend to apply the experience gained here to carry out development more effectively. Thank you for reading this article to the end.
