| Abstract: |
Encouraging inventors to disclose new inventions is an important economic
justification for the patent system, yet the technical information contained
in patent applications is often inadequate and unclear. This paper proposes a
novel approach to measure disclosure in patent applications using algorithms
from computational linguistics. Borrowing methods from the literature on
second language acquisition, we analyze core linguistic features of 40,949
U.S. applications in three patent categories related to nanotechnology,
batteries, and electricity from 2000 to 2019. Relying on the expectation that
universities disclose their inventions more readily than corporations, whether
because of stronger incentives or because their patent attorneys can draw on
different source documents, we confirm the relevance and usefulness of
the linguistic measures by showing that university patents are more readable.
Combining the multiple measures using principal component analysis, we find
that the university-corporate gap in disclosure is 0.4 standard deviations,
with a wider gap among top applicants.
Our results do not change after accounting for the heterogeneity of inventions
by controlling for cited-patent fixed effects. We also explore whether one
pathway by which corporate patents become less readable is the use of multiple
examples to mask the “best mode” of inventions. By confirming that
computational linguistic measures are useful indicators of readability of
patents, we suggest that the disclosure function of patents can be explored
empirically in a way that has not previously been feasible. |