Main Article Content
Elasticsearch is one of the most popular search engines and is built on Apache Lucene. It offers many advantages. For every inserted document or record, Elasticsearch creates an auto-generated id value, but this can lead to a growing number of duplicate documents. To overcome this, researchers have introduced various deduplication methods. Indexing is central to Elasticsearch, and removing duplicates in Elasticsearch is based on indexing; the Lucene index and the translog are used for this purpose, and the approach applies to all types of data in Elasticsearch. Many researchers are working on removing duplicates and corrupted shards from the data, but many corrupted shards still remain in the output. To address this, a Two-Way Refining Algorithm (TWRA) is introduced to remove the remaining corrupted shards and further refine the data. The TWRA consists of two refinement stages: Advanced Data Cleaning and the Advanced Data Filtering Algorithm. Experimental results show the performance of the proposed methodology.
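The duplicate problem described in the abstract arises because an auto-generated id differs on every insert, so the same record indexed twice becomes two documents. One commonly used mitigation, shown here only as a minimal sketch and not as the TWRA method itself, is to derive the id from the document content, so that re-indexing the same record with that explicit `_id` overwrites the existing document instead of adding a new one. The function names `content_id` and `deduplicate` are hypothetical and chosen for illustration; the sketch uses only the Python standard library and does not call Elasticsearch.

```python
import hashlib
import json


def content_id(doc: dict) -> str:
    """Return a deterministic id derived from the document content.

    Keys are sorted so that field order does not change the id.
    (Hypothetical helper, not part of any Elasticsearch client.)
    """
    canonical = json.dumps(
        doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(docs):
    """Keep the first occurrence of each distinct document.

    Returns a dict mapping content-derived id -> document, which could
    then be indexed with that id as the explicit document id.
    """
    unique = {}
    for doc in docs:
        # setdefault leaves an existing entry untouched, so later
        # duplicates of the same content are dropped.
        unique.setdefault(content_id(doc), doc)
    return unique


if __name__ == "__main__":
    records = [
        {"title": "alpha", "year": 2020},
        {"year": 2020, "title": "alpha"},  # same content, different key order
        {"title": "beta", "year": 2021},
    ]
    for doc_id, doc in deduplicate(records).items():
        print(doc_id[:12], doc)
```

This approach only catches exact duplicates after canonicalization; near-duplicates (for example, records differing in whitespace or a timestamp field) would need the fields to be normalized or excluded before hashing.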
TURCOMAT publishes articles under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This licensing allows for any use of the work, provided the original author(s) and source are credited, thereby facilitating the free exchange and use of research for the advancement of knowledge.
Detailed Licensing Terms
Attribution (BY): Users must give appropriate credit, provide a link to the license, and indicate if changes were made. Users may do so in any reasonable manner, but not in any way that suggests the licensor endorses them or their use.
No Additional Restrictions: Users may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.