Pilot Testing of an Information Extraction (IE) Prototype for Legal Research

Authors

B. Scholtz, T. Padayachy, O. Adewoyin

DOI:

https://doi.org/10.23962/10539/29192

Keywords:

information retrieval (IR), information extraction (IE), natural language processing (NLP), legal cases, document databases, source cases, cases referred to (CRTs)

Abstract

This article presents findings from pilot testing of elements of an information extraction (IE) prototype designed to assist legal researchers in engaging with case law databases. The prototype that was piloted seeks to extract, from legal case documents, relevant and accurate information on cases referred to (CRTs) in the source cases. Testing of CRT extraction from 50 source cases resulted in only 38% (n = 19) of the extractions providing an accurate number of CRTs. In respect of the prototype’s extraction of CRT attributes (case title, date, journal, and action), none of the 50 extractions produced fully accurate attribute information. The article outlines the prototype, the pilot testing process, and the test findings, and then concludes with a discussion of where the prototype needs to be improved.
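The count-accuracy figure reported above (19 of 50 extractions, or 38%) can be illustrated with a minimal sketch. It assumes that an extraction counts as accurate only when the number of CRTs it returns exactly equals a manually established count for the same source case; the function name `count_accuracy` and the sample data are hypothetical and are not taken from the prototype.

```python
def count_accuracy(extracted_counts, manual_counts):
    """Return (exact matches, share of exact matches) across source cases.

    Assumption: an extraction is "accurate" only if its CRT count equals
    the manually established CRT count for the same source case.
    """
    if not manual_counts or len(extracted_counts) != len(manual_counts):
        raise ValueError("both lists must cover the same non-empty set of cases")
    matches = sum(1 for e, m in zip(extracted_counts, manual_counts) if e == m)
    return matches, matches / len(manual_counts)


# Illustrative data only: 50 hypothetical source cases, 19 exact matches.
manual = [5] * 50
extracted = [5] * 19 + [4] * 31

matches, share = count_accuracy(extracted, manual)
print(f"{matches} of {len(manual)} extractions matched ({share:.0%})")
```

Under this strict criterion, an extraction that is off by a single CRT counts as inaccurate. The same all-or-nothing standard would be consistent with the attribute result, where none of the 50 extractions was fully accurate.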

Published

30-06-2020

Section

Research Articles

How to Cite

Scholtz, B., Padayachy, T. and Adewoyin, O. (2020) “Pilot Testing of an Information Extraction (IE) Prototype for Legal Research”, The African Journal of Information and Communication (AJIC) [Preprint], (25). doi:10.23962/10539/29192.