Towards Integrating Six Sigma Approach: Service Level Agreement Measurement and Monitoring (A Malaysian IT Outsourcing Case Study)

Abstract: Service Level Agreements (SLAs) are critical for outsourcing and technology vendor companies. SLAs are key requirements for outsourcing implementation and deployment, and a key differentiator among service providers' offerings. Over time, however, SLAs drive providers towards delivering only a minimum level of service, which leaves limited room for innovation and improvement in SLA arrangements and contract renewals. Outsourcing companies therefore risk being seen as a commodity rather than as a strategic, innovative and value-adding partner to the business. In the long run, they face rapidly changing business requirements that demand business agility to stay competitive. This paper investigates a vendor IT service company that is accountable for continuous measurement and reporting of SLA activities and that faces challenges in meeting its agreed-on service levels. It evaluates the practical applicability of the Six Sigma approach, i.e. DMAIC (Define, Measure, Analyze, Improve and Control), as a vehicle for root-cause analysis in which everyone working on the problem stays focused, drives towards the root cause and eventually addresses the problem directly. The case study results show how the Six Sigma approach improved SLA achievement by identifying the significant factors that contribute to remedies or penalties in SLA measurement.


Introduction
A Service Level Agreement (SLA) is an integral part of an IT vendor contract. It defines the level of contracted service(s) and lays out the agreed-upon metrics expected by customers. SLAs pull together information such as metrics, responsibilities and expectations to ensure that both parties have the same understanding of the requirements (Singh and Ajmer, 2018). If problems or issues arise with the services, neither party can plead ignorance, nor deliberately or inadvertently misinterpret the remedies or penalties. The SLA thus protects the parties' interests in the vendor-supplier-customer relationship (Russo, 2018).
From the business perspective, SLAs should be aligned with business objectives: any noncompliance with agreed-on service levels has a negative impact on the quality of service deliverables, customer experience, customer engagement and customer loyalty. Although the majority of SLAs are contracted on an annual basis, it is advisable to review and benchmark the metrics to ensure that best practices and performance standards are complied with. An SLA comprises service elements and management elements. Service elements include the conditions of service availability, the type of service provided, the responsibilities of each stakeholder, escalation procedures, etc. Management elements comprise definitions of measurement standards, methods, processes, contents and frequency, as well as the dispute resolution process, indemnification clauses, etc. that apply to service level breaches. As such, a systematic quality control (QC) approach to metrics measurement is critical for monitoring, tracking, improving and controlling SLAs. In this paper, an in-depth analysis of existing SLA activities within the business process flow was carried out, adopting the Six Sigma DMAIC approach, to find the root causes of the case company's main problems in meeting its SLA.

Research Methodology and Six Sigma Quality Improvement Methodology (QIM)
This section describes the case study, the research approach adopted, the data collection, and the development and implementation of the Six Sigma approach for the case company.

Research Methodology
This paper adopts the case study research method, which supports a comprehensive, in-depth understanding of a diverse range of issues and problems across the outsourcing disciplines of the IT service industry (Creswell, 2018; Yin, 2017). The case study focuses on the Break-fix services. During the preliminary analysis phase, a series of interviews, observations and on-the-job training sessions was conducted to gain an initial understanding of the operational process and workflow of the case company. The collective outcomes gave a broad overview of how the related vendor-client activities were meeting the agreed SLAs across all aspects of the outsourced services. The situation was further complicated when one of the projects was found to require urgent attention because of deteriorating customer satisfaction and SLA noncompliance.

Six Sigma
Six Sigma is a disciplined, data-driven methodology: a set of tools and techniques that help organizations maximize performance, reduce defects (in products, processes or services) and ultimately reduce variability in business processes (Wong, Yu and Chean, 2019; Hussain et al., 2017; Baharudin et al., 2018). In this case study, the measurement-based strategy is implemented with the DMAIC methodology. The DMAIC improvement cycle is the driver of a successful Six Sigma implementation and is used to rectify under-performing processes (Wong and Yu, 2020; Jauhar et al., 2015; Pande, Neuman and Cavanagh, 2000).
This case study project was selected as the process improvement opportunity offering the greatest impact for the most manageable effort, while still aligning with the case company's strategy of compliance with and adherence to the global practices and processes set by global service providers. The five steps of the DMAIC methodology are as follows (Team, 2019):
• Define the problem and build process knowledge. A high-level process map of the Break-fix process and a Supplier-Input-Process-Output-Customer (SIPOC) diagram were prepared, and the Critical-To-Quality (CTQ) characteristics and Voice-Of-Customer (VoC) were identified.
• Measure and quantify the problem(s). Measurements provide key indicators of process health and clues about the project's current state. As the team collects data, it focuses on the quality of the SLAs delivered by the services. The current performance of the process(es) is measured and benchmarked against the baseline.
• Analyse by identifying the cause of the problem. The biggest challenge is for the case company to resist jumping to solutions before understanding the true root cause(s) of SLA noncompliance.
• Improve by solving the root cause and verifying the improvement. It is crucial for the team to implement solution plans and deploy actions that resolve the root cause(s). This is when the team refines its countermeasure ideas, implements solutions and collects data confirming that measurable improvements have taken place.
• Control by maintaining the gains and pursuing perfection. The team develops a Monitoring Plan to track the success of the updated and/or improved process.

Case Company Background
The case company is one of the leading information and communication technology (ICT) outsourcing companies in Malaysia. It focuses on onsite IT technical support, solution implementation and deployment, system maintenance, helpdesk and software support, managed services, IT outsourcing, etc. It provides 24/7 nationwide support in all major towns in Malaysia with more than 100 technology professionals and specialists. The core revenue stream of the case company derives from the Break-fix category. Break-fix, also known as Break/Fix, plays an important role in Malaysia's IT business support industry. It is a type of revenue model for small-to-medium sized businesses (SMEs) in which Break-fix vendors send professional IT technicians and/or engineers to the location or site to analyse and determine system issues and provide on-premise remedies (Wang, Li and Wang, 2018). For the case company, Break-fix is about how quickly the technical teams react to a technical support crisis (i.e. a software malfunction or hardware breakdown) within the agreed SLAs (i.e. a commitment to 90% SLA compliance for most outsourced services). The case company has two categories of break-fix service, each with its own SLA: "Incidents", with an SLA of four working hours, and "IMAC", with an SLA of three working days.

Six Sigma DMAIC - Define Phase
The first step in kicking off any Six Sigma project is to decide whether the issue(s) under consideration by the case company would benefit from the tools used in the project. The Supplier, Input, Process, Output and Customer (SIPOC) diagram provides a high-level process map describing the tasks and activities through which the break-fix outsourcing services accomplish the business goals. SIPOC helps to identify problems and isolate break-fix areas that add little value to the case company (Gueorguiev, 2018). Figure 1 outlines the SIPOC diagram of the cross-functional break-fix services. The next step is to analyse the expectations placed on the process with regard to the SLA. The Critical-To-Quality (CTQ) characteristics identified for the break-fix services are skewed towards SLA conformance and compliance. This case study adopts the Six Sigma DMAIC approach for the case company's break-fix outsourcing services.
The main objective of this phase is to define the business goals, specify the problems and identify the CTQs of the break-fix services, in order to help measure the impact that the problems (i.e. SLA noncompliance) have on the customer and on the case company's business goodwill. The ultimate business goal of the case company is to achieve 95% SLA compliance, so that any potential issues or problems can be notified and escalated before they become expensive and hinder the upcoming contract renewal process. It is therefore essential to ensure that the customer-client-vendor relationship is managed effectively and that no party is left vulnerable and uninsured. Table 1 outlines the CTQs for the break-fix services.
In a nutshell, the opportunity for this Six Sigma project is identified and justified as follows:
• Performance gaps exist between planned and actual performance
• The root causes of the challenges are not clearly identified and understood
• The team has no solutions to the challenges faced

Six Sigma DMAIC - Measure Phase
This phase develops a data collection plan for the break-fix services, drawing on different sources to determine the right measures, types of defects and metrics. By the end of the phase, the team should have relevant, reliable and significant data to back up the process baseline before investing more resources in the Analyse phase. The team focuses on eliminating or reducing the computed "variance" as much as possible.
The qualitative research (i.e. interviews, on-the-job training and job observation during April to June 2013, involving the operational and management panel: Executive Director, Operation Manager, Lead Engineers, Engineers, Helpdesk Support Manager, Helpdesk Support Supervisors and Helpdesk Support Executives) and the SIPOC diagram, together with the findings from data collection (Figure 2 and Figure 3), revealed that the break-fix services were underperforming and required urgent effort to improve SLA performance:
• 59% of total tickets exceeded the 95% SLA for break-fix services (i.e. four working hours for "Incidents"; three working days for "IMAC") in June 2013.
• In practice, the helpdesk tracks and monitors the total time consumed in the communication/coordination process and the staff on-site time until a logged ticket is resolved.
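An SLA compliance percentage of the kind measured in this phase can be computed from ticket records along the following lines. This is a minimal sketch rather than the case company's tooling: the `Ticket` record, its field names and the `sla_compliance` helper are hypothetical, the sample tickets are made up, and the conversion of three working days into 24 working hours assumes an eight-hour working day.

```python
from dataclasses import dataclass

# SLA targets in working hours. The targets are taken from the text;
# 3 working days = 24 hours assumes an 8-hour working day.
SLA_TARGET_HOURS = {"Incident": 4.0, "IMAC": 3 * 8.0}


@dataclass
class Ticket:
    """Hypothetical ticket record; field names are illustrative only."""
    category: str            # "Incident" or "IMAC"
    resolution_hours: float  # working hours from logging to closure


def within_sla(ticket: Ticket) -> bool:
    """True when the ticket was resolved within its category's target."""
    return ticket.resolution_hours <= SLA_TARGET_HOURS[ticket.category]


def sla_compliance(tickets: list[Ticket]) -> float:
    """Share of tickets resolved within SLA, as a percentage."""
    if not tickets:
        raise ValueError("no tickets to measure")
    met = sum(within_sla(t) for t in tickets)
    return 100.0 * met / len(tickets)


# Made-up sample data, not the case company's tickets.
sample = [
    Ticket("Incident", 3.5),
    Ticket("Incident", 6.0),
    Ticket("IMAC", 20.0),
    Ticket("IMAC", 30.0),
]
compliance = sla_compliance(sample)
print(f"SLA compliance: {compliance:.0f}% (target 95%)")
```

Keeping the targets in one lookup table means that a change to the contracted SLA only touches `SLA_TARGET_HOURS`, not the measurement logic.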

Six Sigma DMAIC - Analyse Phase
Using the data collected in the Measure phase, the team analyses and closely examines the process to determine the root cause(s) of the business inefficiency, i.e. a break-fix SLA achievement of 59% instead of 95%. Most importantly, there is a huge gap between actual and target performance. A root-cause identification analysis involving process investigation and process improvement should therefore focus on developing solutions that eliminate the occurrence of the root cause(s). The data gathered in the Measure phase were displayed visually and analysed with data analysis and process analysis tools for better interpretation and understanding. The preliminary analysis attributed the noncompliance to:
• Confusion and conflict in ticket handling, because the same pool of resources attends to ALL projects, including projects other than the break-fix services that have different SLA requirements
• A tendency to leave ticket closure incomplete after tickets are resolved
• Long waiting times before the helpdesk support assigns tickets to engineers/technicians, i.e. a deficiency in the coordination and communication time metric
The outcomes of the preliminary analysis clearly indicate that a drill-down investigation is necessary to reach the exact root causes, exploring all possible causes of the SLA achievement of only 59%. An in-depth analysis of (1) the duration of communication and coordination and (2) the utilization of engineers/technicians on break-fix services was carried out, with each area broken down into smaller, manageable tasks to ease the root-cause analysis. The second round of findings is tabulated in Table 2 and Table 3. Table 2 outlines the percentage of tickets for which helpdesk support neglected to calculate and provide the resolution SLA time and response SLA time to the engineers/technicians. In April 2013, none of the tickets were given a resolution SLA or a response SLA.
In addition to the high percentage of tickets captured without SLA expiry dates, many tickets were found with wrongly computed SLA expiry dates (i.e. 42% and 99.6% for May and June 2013 respectively).

Table 2. Calculation of Break-Fix (Incidents)
Owing to the nature of the case company's business, all of its projects share the same pool of resources. This way of allocating resources, which spreads the same pool across nearby regions, was found to be ineffective and inefficient. Therefore, a more stringent analysis of "engineer/technician utilization for the KL (Kuala Lumpur) region" was carried out and tabulated in Table 3. In the process of developing improvements to the break-fix services, it was noticed that the SLA violation percentage stayed consistent regardless of the number of full-time employees (FTEs) assigned to the KL region (i.e. 53% and 56% for April and May respectively). Hence, a simulation of engineer/technician resource utilization under different scenarios was carried out, based on historical data and a number of assumptions. The simulation outcomes demonstrated that assigning tickets with the "same SLA resolution duration" is more efficient and effective than assigning tickets with "different SLA resolution duration". As such, it is important to have "dedicated" staff who are trained and equipped for a single project and have the basic and relevant skill sets to provide break-fix resolutions. This suggestion was implemented in June 2013, when only five dedicated engineers were assigned to the break-fix services in the KL region. Although the five dedicated staff served 269 tickets in June 2013 with an SLA achievement of 53%, the improvement saved six staff (i.e. 11 - 5, a 54% reduction in staff utilization) compared with May 2013, which recorded an SLA achievement of 56% with 11 engineers/technicians, and with April 2013, which recorded the same SLA achievement as June 2013 but with almost three times the FTEs. If June 2013 is taken as the benchmark, the remaining FTEs can be fully utilized and assigned to other projects to maximize the return-on-investment (ROI).
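The staffing arithmetic in this paragraph can be checked with a few lines of code. This is only a worked illustration: the helper name `fte_saving` is hypothetical, and the inputs are the figures quoted above (11 engineers/technicians in May 2013, five dedicated engineers and 269 tickets in June 2013).

```python
def fte_saving(baseline_ftes: int, dedicated_ftes: int) -> tuple[int, float]:
    """Return (FTEs freed, saving as a percentage of the baseline headcount)."""
    if baseline_ftes <= 0:
        raise ValueError("baseline headcount must be positive")
    saved = baseline_ftes - dedicated_ftes
    return saved, 100.0 * saved / baseline_ftes


# Figures quoted in the text: 11 engineers/technicians in May 2013,
# five dedicated engineers serving 269 tickets in June 2013.
saved, saving_pct = fte_saving(baseline_ftes=11, dedicated_ftes=5)
tickets_per_fte = 269 / 5

print(f"FTEs freed: {saved} ({saving_pct:.1f}% of the May headcount)")
print(f"June workload: {tickets_per_fte:.1f} tickets per dedicated engineer")
```

Six of eleven FTEs is roughly 54.5%, which the text reports as 54%.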

Six Sigma DMAIC - Improve Phase
The outcomes of rigorous analysis and data collection enabled the team to target the process(es) by designing creative and innovative solutions that address and prevent the longstanding process problems. Table 4 lists the action plans that address the root causes and implement the solutions needed to meet the agreed SLA achievement. The suggested improvements fall into two broad categories: (1) assignment of dedicated engineers/technicians and (2) operational management of ticket assignment (i.e. managing, monitoring and tracking SLA achievement):
• Adopting dedicated engineers/technicians for the KL region: Table 3 shows healthy resource utilization when a dedicated pool of five or six resources is assigned to break-fix services only. This solution was tested in June 2013, when resource utilization reached an optimum peak of 95%-100% with the project assigned to six dedicated engineers/technicians attending the maximum number of tickets per day (i.e. 2.5 to 3.0 hours per dedicated resource). This solution is deemed effective and financially viable.
• It is important for the engineer/technician to close each ticket on time before attending to the next ticket assignment. The resolution clock starts when the ticket is logged and stops when the case company updates the ticket's status to "completed". However, this ticket completion milestone has been conveniently neglected by most engineers/technicians. The main challenge is internet connection disruption, which prevents the engineer/technician from closing tickets in real time and leaves no choice but to seek assistance from the helpdesk support executive.
• Every ticket closure requires a scanned job sheet to be uploaded to the vendor's ticket management system. Under the new incentive initiative, the engineer/technician with the most tickets resolved within SLA, or the fewest SLA violations, is rewarded. Ultimately, this scheme is meant to motivate staff and cultivate a healthy work culture of closing tickets as soon as they are resolved.
• The root cause of the non-computation of SLA expiry times is the complexity introduced by the official break-fix operational hours of 9 am to 5 pm, Monday to Friday only. Since the day and SLA variables are fixed for "Incidents" and "IMAC", an expiry time chart for break-fix was calculated and tabulated as a reference for both the support and technical teams. This chart aims to shorten manual computation and to increase the accuracy and reliability of expiry times.
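The expiry-time rule behind such a chart can also be expressed in code. The following is a minimal sketch under stated assumptions, not the case company's chart or system: the function names `next_opening` and `sla_expiry` are hypothetical, public holidays are ignored, timestamps are timezone-naive, and "three working days" is treated as 24 working hours (three eight-hour days).

```python
from datetime import datetime, time, timedelta

# Operational hours from the text: 9 am to 5 pm, Monday to Friday.
OPEN, CLOSE = time(9, 0), time(17, 0)


def next_opening(moment: datetime) -> datetime:
    """Move `moment` forward to the next instant inside working hours."""
    if moment.time() >= CLOSE:
        # After closing time: continue at 9 am on the following day.
        moment = datetime.combine(moment.date() + timedelta(days=1), OPEN)
    elif moment.time() < OPEN:
        moment = datetime.combine(moment.date(), OPEN)
    while moment.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        moment = datetime.combine(moment.date() + timedelta(days=1), OPEN)
    return moment


def sla_expiry(logged: datetime, working_hours: float) -> datetime:
    """Return the SLA expiry time, counting working hours only.

    Assumption: public holidays are not modelled.
    """
    remaining = timedelta(hours=working_hours)
    cursor = next_opening(logged)
    while True:
        closing = datetime.combine(cursor.date(), CLOSE)
        available = closing - cursor
        if remaining <= available:
            return cursor + remaining
        # Use up the rest of today and carry the balance forward.
        remaining -= available
        cursor = next_opening(closing)


# A ticket logged on Friday 14 June 2013 at 3 pm.
logged = datetime(2013, 6, 14, 15, 0)
print("Incident expiry:", sla_expiry(logged, 4))   # 4 working hours
print("IMAC expiry:", sla_expiry(logged, 3 * 8))   # 3 working days
```

For the Friday 3 pm ticket, two working hours remain on Friday, so the four-hour "Incidents" SLA expires on Monday 17 June at 11 am, and the three-day "IMAC" SLA expires on Wednesday 19 June at 3 pm.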

Six Sigma DMAIC - Control Phase
The core activity of the Control phase is to ensure that the solutions suggested in Table 4 are carried out and that the required process(es) or solution(s) are managed, verified, validated, monitored and tracked. Most importantly, the post-implementation results are evaluated and benchmarked so that progress in SLA achievement can be ascertained. Table 5 outlines the post-implementation results before the responsibilities for process monitoring and implementation were handed over to the case company. In addition, project reviews were carried out in 2016 and 2019. The team received positive feedback on the prediction chart, which showed that dedicated FTEs are capable of handling a higher ticket volume with better FTE utilization. Taking January 2014 as the post-implementation result and comparing it against May 2013, a significant reduction in SLA violations (from 53% to 36%) was recorded, with FTE utilization at 73%. The action list contributed a cost saving of four FTEs (i.e. 36%) in operating expenses (OpEx), with fewer dedicated engineers and fewer SLA violations (i.e. 36%).
Process monitoring and tracking continued into 2016 and 2019. Since 2014, the post-implementation effects of Six Sigma have been transparent, as all the improvements and controls are blended into day-to-day project activities and milestones. In 2016, the break-fix services recorded an acceptable SLA achievement, although some findings were not readily available owing to changes in policies, standards and SLA contents. Most importantly, the practices and metrics defined during the Six Sigma implementation remained valid in day-to-day routine tasks. By 2019, the project initiator, project leader and team members had changed. The resulting lack of control enforcement in operational routines and activities reminded the team that a Six Sigma project is a circular journey of continuous improvement, one that does not end when the company reaches a milestone such as the improvement in SLA achievement recorded since 2014. Along the way, the break-fix services have experienced transformation and operational excellence. The project team will therefore analyse the break-fix project, especially around the transition in early 2019, to determine whether new practices, controls and standard operating procedures need to be investigated. Based on the preliminary data collected in 2019, a new round of DMAIC should take place to address the root-cause analysis.
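The before-and-after comparison reported for the Control phase can be expressed as a small monitoring check. This is a hedged sketch: the helper names are hypothetical, the inputs are the violation rates quoted in the text (53% and 36%), and the gap calculation assumes that the violation rate is simply 100% minus the SLA achievement rate, so that the 95% target corresponds to at most 5% violations.

```python
def violation_change(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (drop in percentage points, relative drop in percent)."""
    if before_pct <= 0:
        raise ValueError("baseline violation rate must be positive")
    points = before_pct - after_pct
    return points, 100.0 * points / before_pct


def gap_to_target(violation_pct: float, target_achievement_pct: float = 95.0) -> float:
    """Percentage points still to recover before the SLA target is met.

    Assumption: violation rate = 100 - achievement rate.
    """
    allowed_violation = 100.0 - target_achievement_pct
    return max(0.0, violation_pct - allowed_violation)


points, relative = violation_change(before_pct=53.0, after_pct=36.0)
remaining_gap = gap_to_target(36.0)

print(f"Violations fell by {points:.0f} points ({relative:.1f}% relative)")
print(f"Remaining gap to the 95% target: {remaining_gap:.0f} points")
```

Tracking the remaining gap alongside the improvement gives the Monitoring Plan a single number to review at each checkpoint.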

Conclusion
Six Sigma DMAIC is a systematic, objective and fact-based approach to problem solving that makes process improvement accessible and learnable even for beginners. Rather than placing emphasis only on the deliverables, DMAIC concentrates on the process that creates them, which leads to more effective and permanent solutions in the long run. Most importantly, Six Sigma DMAIC encourages backtracking to previous steps if more information is required. In this break-fix project, which demonstrated continuous improvement from 2013 through 2014, 2016 and 2019, no one expected the root cause of SLA non-achievement to be related to a dedicated pool of resources; most project managers would instead request more resources whenever SLA violations increased. The project has shown the practicality of the Six Sigma DMAIC approach in the break-fix service industry, especially in identifying the significant key factors behind SLA violations. Moreover, DMAIC is the best method for discovering best practices, as it is a data-driven, structured problem-solving framework. In the long run, this proactive quality control methodology provides recommendations for break-fix problems that reduce operational cost, improve efficiency and timelines, strengthen compliance with standards and policies, and raise the customer satisfaction index. Most importantly, Six Sigma assists management in developing empowered employees who can identify and drive improvement ideas as a means of continuous improvement.

Acknowledgment
and supervisors of the case company who participated and contributed their valuable views and suggestions in the data gathering process.