Segmentation of patent claims for improving their readability

Gabriela Ferraro, Hanna Suominen, Jaume NUALART VILAPLANA

Research output: A Conference proceeding or a Chapter in Book > Conference contribution

Abstract

Good readability of text is important to ensure efficiency in communication and eliminate risks of misunderstanding. Patent claims are an example of text whose readability is often poor. In this paper, we aim to improve claim readability by a clearer presentation of its content. Our approach consists in segmenting the original claim content at two levels. First, an entire claim is segmented into the components of preamble, transitional phrase and body, using a rule-based approach. Second, a conditional random field is trained to segment the components into clauses. An alternative approach would have been to modify the claim content, which is, however, prone to also changing the meaning of this legal text. For both segmentation levels, we report results from statistical evaluation of segmentation performance. In addition, a qualitative error analysis was performed to understand the problems underlying the clause segmentation task. Our accuracy in detecting the beginning and end of preamble text is 1.00 and 0.97, respectively. For the transitional phrase, these numbers are 0.94 and 1.00, and for the body text, 1.00 and 1.00. Our precision and recall in the clause segmentation are 0.77 and 0.76, respectively. The results give evidence for the feasibility of automated claim and clause segmentation, which may help not only inventors, researchers, and other laypeople to understand patents but also patent experts to avoid future legal costs due to litigation.
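
The first segmentation level described in the abstract (preamble, transitional phrase, body) can be illustrated with a minimal rule-based sketch. This is not the authors' rule set, which the abstract does not give: the phrase list TRANSITIONS and the function segment_claim are hypothetical, and the sketch simply assumes that the first matching transitional phrase separates the preamble from the body.

```python
import re

# Hypothetical list of transitional phrases (an assumption, not the paper's rules).
# Longer phrases come first so they win over their shorter prefixes.
TRANSITIONS = (
    "consisting essentially of",
    "consisting of",
    "comprising",
    "comprises",
    "including",
)

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in TRANSITIONS) + r")\b",
    re.IGNORECASE,
)


def segment_claim(claim: str) -> dict:
    """Split a claim at the first transitional phrase found.

    Returns a dict with the keys 'preamble', 'transition' and 'body'.
    If no transitional phrase is found, the whole claim is the preamble.
    """
    match = _PATTERN.search(claim)
    if match is None:
        return {"preamble": claim.strip(), "transition": "", "body": ""}
    return {
        # Text before the phrase, without the trailing comma or space.
        "preamble": claim[: match.start()].strip(" ,"),
        "transition": match.group(1),
        # Text after the phrase, without the leading colon or space.
        "body": claim[match.end():].strip(" :"),
    }


if __name__ == "__main__":
    example = (
        "A beverage container, comprising: a body having an open top; "
        "and a lid attached to the body."
    )
    for part, text in segment_claim(example).items():
        print(f"{part}: {text}")
```

A fixed phrase list keeps the sketch transparent but will miss claims that use other wording. The second level, clause segmentation with a conditional random field, needs labelled training data and is not sketched here.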
Original language: English
Title of host publication: Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR)
Subtitle of host publication: 14th Conference of the European Chapter of the Association for Computational Linguistics
Editors: Sandra Williams, Advaith Siddharthan, Ani Nenkova
Place of Publication: New York, USA
Publisher: Curran Associates
Pages: 66-73
Number of pages: 8
ISBN (Print): 9781937284916, 9781632664075
Publication status: Published - 2014
Event: 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR 2014) - Gothenburg, Sweden
Duration: 26 Apr 2014 to 30 Apr 2014

Workshop

Workshop: 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR 2014)
Abbreviated title: PITR
Country: Sweden
City: Gothenburg
Period: 26/04/14 to 30/04/14

Fingerprint

Error analysis
Communication
Costs

Cite this

Ferraro, G., Suominen, H., & NUALART VILAPLANA, J. (2014). Segmentation of patent claims for improving their readability. In S. Williams, A. Siddharthan, & A. Nenkova (Eds.), Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR): 14th Conference of the European Chapter of the Association for Computational Linguistics (pp. 66-73). New York, USA: Curran Associates. EACL (Workshop - PITR)
Ferraro, Gabriela ; Suominen, Hanna ; NUALART VILAPLANA, Jaume. / Segmentation of patent claims for improving their readability. Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR): 14th Conference of the European Chapter of the Association for Computational Linguistics. editor / Sandra Williams ; Advaith Siddharthan ; Ani Nenkova. New York, USA : Curran Associates, 2014. pp. 66-73 (EACL (Workshop - PITR)).
@inproceedings{f80d77ba512542c59e1848619ca56bf4,
title = "Segmentation of patent claims for improving their readability",
abstract = "Good readability of text is important to ensure efficiency in communication and eliminate risks of misunderstanding. Patent claims are an example of text whose readability is often poor. In this paper, we aim to improve claim readability by a clearer presentation of its content. Our approach consists in segmenting the original claim content at two levels. First, an entire claim is segmented into the components of preamble, transitional phrase and body, using a rule-based approach. Second, a conditional random field is trained to segment the components into clauses. An alternative approach would have been to modify the claim content, which is, however, prone to also changing the meaning of this legal text. For both segmentation levels, we report results from statistical evaluation of segmentation performance. In addition, a qualitative error analysis was performed to understand the problems underlying the clause segmentation task. Our accuracy in detecting the beginning and end of preamble text is 1.00 and 0.97, respectively. For the transitional phrase, these numbers are 0.94 and 1.00, and for the body text, 1.00 and 1.00. Our precision and recall in the clause segmentation are 0.77 and 0.76, respectively. The results give evidence for the feasibility of automated claim and clause segmentation, which may help not only inventors, researchers, and other laypeople to understand patents but also patent experts to avoid future legal costs due to litigation.",
keywords = "Text Segmentation, Readability, Patent claims",
author = "Gabriela Ferraro and Hanna Suominen and {NUALART VILAPLANA}, Jaume",
year = "2014",
language = "English",
isbn = "9781937284916",
pages = "66--73",
editor = "Sandra Williams and Advaith Siddharthan and Ani Nenkova",
booktitle = "Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR)",
publisher = "Curran Associates",

}

Ferraro, G, Suominen, H & NUALART VILAPLANA, J 2014, Segmentation of patent claims for improving their readability. in S Williams, A Siddharthan & A Nenkova (eds), Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR): 14th Conference of the European Chapter of the Association for Computational Linguistics. Curran Associates, New York, USA, EACL (Workshop - PITR), pp. 66-73, 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR 2014) , Gothenburg, Sweden, 26/04/14.

Segmentation of patent claims for improving their readability. / Ferraro, Gabriela; Suominen, Hanna; NUALART VILAPLANA, Jaume.

Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR): 14th Conference of the European Chapter of the Association for Computational Linguistics. ed. / Sandra Williams; Advaith Siddharthan; Ani Nenkova. New York, USA : Curran Associates, 2014. p. 66-73 (EACL (Workshop - PITR)).

TY - GEN

T1 - Segmentation of patent claims for improving their readability

AU - Ferraro, Gabriela

AU - Suominen, Hanna

AU - NUALART VILAPLANA, Jaume

PY - 2014

Y1 - 2014

N2 - Good readability of text is important to ensure efficiency in communication and eliminate risks of misunderstanding. Patent claims are an example of text whose readability is often poor. In this paper, we aim to improve claim readability by a clearer presentation of its content. Our approach consists in segmenting the original claim content at two levels. First, an entire claim is segmented into the components of preamble, transitional phrase and body, using a rule-based approach. Second, a conditional random field is trained to segment the components into clauses. An alternative approach would have been to modify the claim content, which is, however, prone to also changing the meaning of this legal text. For both segmentation levels, we report results from statistical evaluation of segmentation performance. In addition, a qualitative error analysis was performed to understand the problems underlying the clause segmentation task. Our accuracy in detecting the beginning and end of preamble text is 1.00 and 0.97, respectively. For the transitional phrase, these numbers are 0.94 and 1.00, and for the body text, 1.00 and 1.00. Our precision and recall in the clause segmentation are 0.77 and 0.76, respectively. The results give evidence for the feasibility of automated claim and clause segmentation, which may help not only inventors, researchers, and other laypeople to understand patents but also patent experts to avoid future legal costs due to litigation.

AB - Good readability of text is important to ensure efficiency in communication and eliminate risks of misunderstanding. Patent claims are an example of text whose readability is often poor. In this paper, we aim to improve claim readability by a clearer presentation of its content. Our approach consists in segmenting the original claim content at two levels. First, an entire claim is segmented into the components of preamble, transitional phrase and body, using a rule-based approach. Second, a conditional random field is trained to segment the components into clauses. An alternative approach would have been to modify the claim content, which is, however, prone to also changing the meaning of this legal text. For both segmentation levels, we report results from statistical evaluation of segmentation performance. In addition, a qualitative error analysis was performed to understand the problems underlying the clause segmentation task. Our accuracy in detecting the beginning and end of preamble text is 1.00 and 0.97, respectively. For the transitional phrase, these numbers are 0.94 and 1.00, and for the body text, 1.00 and 1.00. Our precision and recall in the clause segmentation are 0.77 and 0.76, respectively. The results give evidence for the feasibility of automated claim and clause segmentation, which may help not only inventors, researchers, and other laypeople to understand patents but also patent experts to avoid future legal costs due to litigation.

KW - Text Segmentation

KW - Readability

KW - Patent claims

M3 - Conference contribution

SN - 9781937284916

SN - 9781632664075

SP - 66

EP - 73

BT - Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR)

A2 - Williams, Sandra

A2 - Siddharthan, Advaith

A2 - Nenkova, Ani

PB - Curran Associates

CY - New York, USA

ER -

Ferraro G, Suominen H, NUALART VILAPLANA J. Segmentation of patent claims for improving their readability. In Williams S, Siddharthan A, Nenkova A, editors, Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR): 14th Conference of the European Chapter of the Association for Computational Linguistics. New York, USA: Curran Associates. 2014. p. 66-73. (EACL (Workshop - PITR)).