Mining of High Utility Item sets using Genetic Algorithm
Abstract
This article proposes a high utility item set mining system based on a genetic algorithm. The proposed system mines high utility item sets by parsing the database only once. The high utility item set mining problem is reformulated as a constrained optimization problem, and a genetic algorithm is used to solve it. The entities required to apply a genetic algorithm to this problem, namely gene, chromosome, population and fitness function, are defined. Because of the requirements imposed by the high utility item set mining problem, a non-binary representation is proposed for the genetic algorithm. The crossover and selection operators have been customized to operate efficiently on the mining problem. To limit the explosion of candidate item sets, an upper bound based on the sum of the item set utility and the remaining utility is used.
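To make the upper bound concrete, here is a minimal sketch of how the item set utility and the remaining utility might be computed. The abstract does not define these quantities, so the sketch makes assumptions: items are ordered alphabetically, and the remaining utility of an item set in a transaction is the total utility of the items that come after the item set's last item, as in common utility-list approaches. The toy database `DB` and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy database: each transaction maps an item to its utility
# in that transaction (for example quantity * unit profit). Values are made up.
Transaction = Dict[str, int]

DB: List[Transaction] = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 2},
]


def utility_and_remaining(itemset: FrozenSet[str],
                          db: List[Transaction]) -> Tuple[int, int]:
    """Return (item set utility, remaining utility) over the whole database.

    Assumption: items are totally ordered alphabetically, and the remaining
    utility counts only items that come after the last item of the item set.
    """
    last = max(itemset)
    item_utility = 0
    remaining_utility = 0
    for transaction in db:
        # Only transactions containing every item of the item set contribute.
        if all(item in transaction for item in itemset):
            item_utility += sum(transaction[item] for item in itemset)
            remaining_utility += sum(
                u for item, u in transaction.items() if item > last
            )
    return item_utility, remaining_utility


def upper_bound(itemset: FrozenSet[str], db: List[Transaction]) -> int:
    """Sum of item set utility and remaining utility, used for pruning."""
    item_utility, remaining_utility = utility_and_remaining(itemset, db)
    return item_utility + remaining_utility
```

Under these assumptions, no superset formed by appending later items can have a utility above `upper_bound(itemset, db)`, so an item set whose bound falls below the threshold need not be extended.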
The item set utility is compared with the threshold to select the high utility item sets, while the remaining utility is used to select prospective candidates for superset generation. The algorithm starts with an initial population consisting of single-item item sets. The fitness function computes the sum of the item set utility and the remaining utility and selects the prospective item sets for breeding. Children are generated by merging two prospective item sets. The selection and breeding process is repeated until no chromosomes remain in the population for further breeding. The proposed system was able to identify high utility item sets in a single scan of the database, and the genetic algorithm converged from an initial population of item sets to the high utility item sets. Extensive testing demonstrated the potential of the proposed system when compared with similar state-of-the-art systems.
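The selection and breeding loop described above can be sketched roughly as follows. This is an illustrative, deterministic simplification and not the article's implementation. It assumes an alphabetical item order, and it forms each child by merging a prospective item set with a later single item. This is one simple reading of the merge step; the abstract does not specify the exact crossover operator. The sketch also recomputes utilities from an in-memory list instead of building a single-scan data structure. The database `DB`, the threshold and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Transaction = Dict[str, int]

# Hypothetical toy database: item -> utility within each transaction.
DB: List[Transaction] = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 2},
]


def utility_and_remaining(itemset: FrozenSet[str],
                          db: List[Transaction]) -> Tuple[int, int]:
    """(item set utility, remaining utility); assumes alphabetical item order."""
    last = max(itemset)
    item_utility = remaining_utility = 0
    for transaction in db:
        if all(item in transaction for item in itemset):
            item_utility += sum(transaction[item] for item in itemset)
            remaining_utility += sum(
                u for item, u in transaction.items() if item > last
            )
    return item_utility, remaining_utility


def mine_high_utility(db: List[Transaction],
                      min_util: int) -> Dict[FrozenSet[str], int]:
    """Return every high utility item set together with its utility."""
    all_items = sorted({item for transaction in db for item in transaction})
    # Initial population: one chromosome per single item.
    population: List[FrozenSet[str]] = [frozenset([i]) for i in all_items]
    high_utility: Dict[FrozenSet[str], int] = {}

    while population:  # stop when nothing is left to breed
        next_generation: List[FrozenSet[str]] = []
        for chromosome in population:
            item_utility, remaining = utility_and_remaining(chromosome, db)
            # The item set utility alone decides whether it is a result.
            if item_utility >= min_util:
                high_utility[chromosome] = item_utility
            # Fitness (utility + remaining utility) decides breeding.
            if item_utility + remaining >= min_util:
                last = max(chromosome)
                for item in all_items:
                    if item > last:  # merge with a later single item
                        next_generation.append(chromosome | {item})
        population = next_generation
    return high_utility


result = mine_high_utility(DB, min_util=15)
```

With the toy data and a threshold of 15, `result` contains the item sets `{a}` with utility 15 and `{a, c}` with utility 22. Item sets such as `{b}` are never extended because their fitness of 12 falls below the threshold.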
This work is licensed under a Creative Commons Attribution 4.0 International License.
Licensing
TURCOMAT publishes articles under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This licensing allows for any use of the work, provided the original author(s) and source are credited, thereby facilitating the free exchange and use of research for the advancement of knowledge.
Detailed Licensing Terms
Attribution (BY): Users must give appropriate credit, provide a link to the license, and indicate if changes were made. Users may do so in any reasonable manner, but not in any way that suggests the licensor endorses them or their use.
No Additional Restrictions: Users may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.