Optimizing Enterprise Search Performance Using EHCache-Backed Apache Lucene Indexing for Hybrid Caching Systems

Authors

  • Deepak Singh

Abstract

Enterprise search systems are essential for retrieving information quickly from large data volumes. As data volume and query complexity grow, however, maintaining low query latency and high throughput becomes a major challenge. Conventional search architectures that rely on indexing alone often hit performance bottlenecks, because identical queries are executed repeatedly and previously computed results are rarely reused. This paper proposes a hybrid caching and search optimization framework that combines EHCache with Apache Lucene indexing to improve search performance in enterprise applications. The framework uses in-memory caching to store the most frequently issued queries and their search results, which eliminates repeated processing and shortens response times. By pairing EHCache with Lucene's efficient indexing and retrieval, the system processes queries faster without compromising accuracy or relevance. To keep cached data consistent with the index, the framework introduces three cache invalidation strategies: time-based expiration, event-based invalidation, and partial cache updates triggered by index updates. These strategies address a central problem of caching systems, namely the trade-off between performance gains and the freshness of the information served. To evaluate the solution, cached and non-cached search systems are compared under benchmark and real-world query workloads, using query latency, throughput, cache hit ratio, and system resource utilization as metrics. The experimental results show that the hybrid caching approach substantially reduces query response time and improves overall system efficiency.
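The query-result caching and the three invalidation strategies described above can be sketched as a cache-aside layer in front of a search function. This is a minimal sketch under stated assumptions, not the paper's implementation: `QueryResultCache`, `search_fn`, the whitespace-based query normalization, the LRU eviction policy, and the token-based matching used for partial invalidation are all hypothetical stand-ins. In an actual deployment the dictionary would be an EHCache cache (whose time-to-live would normally be set through its expiry configuration) and `search_fn` would wrap a Lucene `IndexSearcher`.

```python
import time
from collections import OrderedDict
from typing import Callable, Iterable, List


class QueryResultCache:
    """Hypothetical cache-aside front for a search backend.

    Stands in for an EHCache cache in front of a Lucene searcher: keys are
    normalized query strings, values are the result lists returned by search_fn.
    """

    def __init__(self, search_fn: Callable[[str], List[str]], capacity: int = 1000,
                 ttl_seconds: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self._search_fn = search_fn
        self._capacity = capacity
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._entries = OrderedDict()  # key -> (insert_time, results), oldest first
        self.hits = 0
        self.misses = 0

    @staticmethod
    def normalize(query: str) -> str:
        # Assumed normalization: lowercase and collapse whitespace.
        return " ".join(query.lower().split())

    def search(self, query: str) -> List[str]:
        key = self.normalize(query)
        now = self._clock()
        entry = self._entries.get(key)
        # Time-based expiration: an entry older than the TTL counts as a miss.
        if entry is not None and now - entry[0] < self._ttl:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return entry[1]
        # Miss or expired entry: run the real search and (re)cache the result.
        self.misses += 1
        results = self._search_fn(key)
        self._entries[key] = (now, results)
        self._entries.move_to_end(key)
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return results

    def invalidate_all(self) -> None:
        """Event-based invalidation, e.g. after a full reindex."""
        self._entries.clear()

    def invalidate_terms(self, changed_terms: Iterable[str]) -> int:
        """Partial invalidation: drop only cached queries that mention an updated term."""
        changed = {t.lower() for t in changed_terms}
        stale = [k for k in self._entries if changed & set(k.split())]
        for k in stale:
            del self._entries[k]
        return len(stale)

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

For example, after a document update that touches the term "cache", calling `invalidate_terms(["cache"])` drops only the cached queries containing that token and leaves unrelated entries warm. A real system would need to apply the same analysis (stemming, synonyms) that Lucene applies at query time, or some stale results could survive.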
These results indicate that combining EHCache with Apache Lucene yields an efficient and scalable way to enhance enterprise search systems. Beyond improving performance, the approach also improves reliability and flexibility in dynamic data environments, which makes it suitable for modern applications that demand fast information retrieval.
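The evaluation compares cached and non-cached search using cache hit ratio and query latency, and the two are linked: if a fraction h of queries hit the cache, mean latency is roughly h × t_hit + (1 − h) × t_miss. The sketch below replays a query log to estimate h for a given cache capacity and then applies that formula. It assumes an LRU cache; the function names are hypothetical, and any timings passed to it are illustrative inputs, not figures from the study.

```python
from collections import OrderedDict
from typing import Iterable


def lru_hit_ratio(trace: Iterable[str], capacity: int) -> float:
    """Replay a query trace through an LRU cache holding `capacity` entries."""
    cache = OrderedDict()  # query -> None, ordered from least to most recently used
    hits = total = 0
    for query in trace:
        total += 1
        if query in cache:
            hits += 1
            cache.move_to_end(query)  # refresh recency on a hit
        else:
            cache[query] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used query
    return hits / total if total else 0.0


def expected_latency(hit_ratio: float, t_hit_ms: float, t_miss_ms: float) -> float:
    """Mean latency when a hit costs t_hit_ms and a miss (lookup plus full search) costs t_miss_ms."""
    return hit_ratio * t_hit_ms + (1.0 - hit_ratio) * t_miss_ms
```

With the six-query trace `["a", "b", "a", "c", "a", "b"]`, a capacity of 2 gives a hit ratio of 1/3 and a capacity of 3 gives 0.5, which shows how the hit ratio, and therefore mean latency, depends on cache size relative to the working set of repeated queries.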

Published

2022-10-16

How to Cite

Singh, D. (2022). Optimizing Enterprise Search Performance Using EHCache-Backed Apache Lucene Indexing for Hybrid Caching Systems. Australian Journal of Cross-Disciplinary Innovation, 4(4). Retrieved from https://journals.theusinsight.com/index.php/AJCDI/article/view/161

Section

Articles