Pilot Testing of an Information Extraction (IE) Prototype for Legal Research
DOI:
https://doi.org/10.23962/10539/29192
Keywords:
information retrieval (IR), information extraction (IE), natural language processing (NLP), legal cases, document databases, source cases, cases referred to (CRTs)
Abstract
This article presents findings from pilot testing of elements of an information extraction (IE) prototype designed to assist legal researchers in engaging with case law databases. The prototype that was piloted seeks to extract, from legal case documents, relevant and accurate information on cases referred to (CRTs) in the source cases. Testing of CRT extraction from 50 source cases resulted in only 38% (n = 19) of the extractions providing an accurate number of CRTs. In respect of the prototype’s extraction of CRT attributes (case title, date, journal, and action), none of the 50 extractions produced fully accurate attribute information. The article outlines the prototype, the pilot testing process, and the test findings, and then concludes with a discussion of where the prototype needs to be improved.
License
Copyright (c) 2020 https://creativecommons.org/licenses/by/4.0

This work is licensed under a Creative Commons Attribution 4.0 International License.