This article proposes a high utility item set mining system based on a genetic algorithm. The proposed system mines high utility item sets while scanning the database only once. The high utility item set mining problem is reformulated as a constrained optimization problem, and a genetic algorithm is used to solve it. The article defines the entities needed to apply a genetic algorithm to this problem: the gene, the chromosome, the population and the fitness function. Because of the requirements imposed by the high utility item set mining problem, the proposed system uses a non-binary representation for the genetic algorithm. The genetic operators, such as crossover and selection, have been customized to operate efficiently on the mining problem. To limit the explosion of candidate generation, the system uses an upper bound based on the sum of the item set utility and the remaining utility.
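The upper bound can be illustrated with a small sketch. The abstract does not define utility or remaining utility, so this sketch assumes the definitions common in utility-list miners such as HUI-Miner. Items follow a fixed total order. The utility of an item in a transaction is its quantity times a per-item profit. The remaining utility of an item set in a transaction is the total utility of the items that come after the set's last item. The function `utility_and_remaining`, the `profit` table, the `order` list and the toy data are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def utility_and_remaining(
    itemset: Iterable[str],
    db: List[Dict[str, int]],
    profit: Dict[str, int],
    order: List[str],
) -> Tuple[int, int]:
    """Return (u(X), ru(X)) summed over the transactions that contain X.

    Assumed definitions (not taken from the article): item utility is
    quantity * profit, and ru counts the items after X's last item in `order`.
    """
    rank = {item: i for i, item in enumerate(order)}
    items = set(itemset)
    last = max(rank[i] for i in items)  # position of X's last item in the order
    u_total = ru_total = 0
    for t in db:  # each transaction maps item -> purchased quantity
        if not items.issubset(t):
            continue  # X does not occur in this transaction
        u_total += sum(t[i] * profit[i] for i in items)
        # Remaining utility: items of t that come after X's last item.
        ru_total += sum(q * profit[i] for i, q in t.items() if rank[i] > last)
    return u_total, ru_total


# Hypothetical toy data: per-item profits and three transactions.
profit = {"a": 5, "b": 2, "c": 1, "d": 3}
db = [{"a": 1, "c": 2, "d": 1}, {"b": 2, "c": 3}, {"a": 2, "b": 1, "d": 2}]
order = ["a", "b", "c", "d"]

u, ru = utility_and_remaining({"a"}, db, profit, order)
print(u, ru, u + ru)  # 15 13 28
print(utility_and_remaining({"a", "d"}, db, profit, order))  # (24, 0)
```

Under these definitions, no superset of X built by appending items that come later in the order can have a utility above u(X) + ru(X). This is why the sum is a safe pruning bound. In the toy data, u({a, d}) = 24 stays below u({a}) + ru({a}) = 28.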
The item set utility is compared with the threshold to select the high utility item sets, while the remaining utility is used to select prospective candidates for superset generation. The algorithm starts with an initial population consisting of single-item sets. The fitness function computes the sum of the item set utility and the remaining utility and selects the prospective item sets for breeding. Children are generated by merging two prospective item sets. The selection and breeding processes are repeated until no chromosomes remain in the population for further breeding. The proposed system was able to identify high utility item sets in a single scan of the database, and the genetic algorithm converged from an initial population of item sets to the high utility item sets. Extensive testing showed the potential of the proposed system when compared with other similar state-of-the-art systems.
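The generational loop described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the article's implementation. It makes four assumptions. First, a chromosome is a tuple of item identifiers in a fixed order, which is one possible non-binary representation. Second, the single database scan builds, for each item, a list of (utility, remaining utility) pairs per transaction. Third, crossover joins two chromosomes that share every gene except the last. Fourth, selection is the u + ru threshold check. All function names and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A chromosome is a tuple of item ids (genes) kept in a fixed total order.
Chromosome = Tuple[str, ...]
# Utility list: transaction id -> (item set utility, remaining utility).
UtilList = Dict[int, Tuple[int, int]]


def scan_once(db: List[Dict[str, int]], profit: Dict[str, int],
              order: List[str]) -> Dict[str, UtilList]:
    """Single database pass: build the utility list of every single item."""
    rank = {item: i for i, item in enumerate(order)}
    lists: Dict[str, UtilList] = defaultdict(dict)
    for tid, t in enumerate(db):
        utils = sorted((rank[i], i, q * profit[i]) for i, q in t.items())
        remaining = sum(u for _, _, u in utils)
        for _, item, u in utils:
            remaining -= u  # utility of the items after `item` in this transaction
            lists[item][tid] = (u, remaining)
    return dict(lists)


def utility(ul: UtilList) -> int:
    return sum(u for u, _ in ul.values())


def fitness(ul: UtilList) -> int:
    """u(X) + ru(X): upper bound on the utility of X and its extensions."""
    return sum(u + ru for u, ru in ul.values())


def mine(db: List[Dict[str, int]], profit: Dict[str, int],
         order: List[str], min_util: int) -> Dict[Chromosome, int]:
    rank = {item: i for i, item in enumerate(order)}
    single = scan_once(db, profit, order)  # the only pass over `db`
    # Initial population: one chromosome per single item.
    population: Dict[Chromosome, UtilList] = {(i,): ul for i, ul in single.items()}
    high_utility: Dict[Chromosome, int] = {}
    while population:  # stop when nothing is left to breed
        groups = defaultdict(list)  # chromosomes sharing all genes but the last
        for chrom in population:
            groups[chrom[:-1]].append(chrom)
        children: Dict[Chromosome, UtilList] = {}
        for members in groups.values():
            members.sort(key=lambda c: rank[c[-1]])
            for k, x in enumerate(members):
                ul_x = population[x]
                if utility(ul_x) >= min_util:
                    high_utility[x] = utility(ul_x)
                if fitness(ul_x) < min_util:
                    continue  # selection: no extension of x can reach min_util
                for y in members[k + 1:]:
                    gene = y[-1]  # crossover: append y's last gene to x
                    child = {tid: (ul_x[tid][0] + single[gene][tid][0],
                                   single[gene][tid][1])
                             for tid in ul_x.keys() & single[gene].keys()}
                    if child:  # keep only children that occur in some transaction
                        children[x + (gene,)] = child
        population = children
    return high_utility


profit = {"a": 5, "b": 2, "c": 1, "d": 3}
db = [{"a": 1, "c": 2, "d": 1}, {"b": 2, "c": 3}, {"a": 2, "b": 1, "d": 2}]
print(mine(db, profit, ["a", "b", "c", "d"], min_util=10))
# {('a',): 15, ('a', 'b'): 12, ('a', 'd'): 24, ('a', 'b', 'd'): 18, ('a', 'c', 'd'): 10}
```

This sketch is deterministic. Every eligible pair is crossed, so "selection" here means the bound check rather than a random draw. Only the first parent has to pass the u + ru check. Under these definitions, that rule keeps the pruning lossless. Requiring the second parent to pass as well would not be safe, because the second parent's remaining utility does not include the first parent's last item.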