Browsing by Author "Cahan SF"
- Anisotropic span embeddings and the negative impact of higher-order inference for coreference resolution: An empirical analysis
  (Cambridge University Press, 2024-01-25)
  Hou F; Wang R; Ng S-K; Zhu F; Witbrock M; Cahan SF; Chen L; Jia X
  Coreference resolution is the task of identifying and clustering mentions that refer to the same entity in a document. Based on state-of-the-art deep learning approaches, end-to-end coreference resolution considers all spans as candidate mentions and tackles mention detection and coreference resolution simultaneously. Recently, researchers have attempted to incorporate document-level context using higher-order inference (HOI) to improve end-to-end coreference resolution. However, HOI methods have been shown to have marginal or even negative impact on coreference resolution. In this paper, we reveal the reasons for the negative impact of HOI coreference resolution. Contextualized representations (e.g., those produced by BERT) for building span embeddings have been shown to be highly anisotropic. We show that HOI actually increases and thus worsens the anisotropy of span embeddings and makes it difficult to distinguish between related but distinct entities (e.g., pilots and flight attendants). Instead of using HOI, we propose two methods, Less-Anisotropic Internal Representations (LAIR) and Data Augmentation with Document Synthesis and Mention Swap (DSMS), to learn less-anisotropic span embeddings for coreference resolution. LAIR uses a linear aggregation of the first layer and the topmost layer of contextualized embeddings. DSMS generates more diversified examples of related but distinct entities by synthesizing documents and by mention swapping. Our experiments show that less-anisotropic span embeddings improve the performance significantly (+2.8 F1 gain on the OntoNotes benchmark), reaching new state-of-the-art performance on the GAP dataset.
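The abstract above describes LAIR as a linear aggregation of the first and topmost layers, with the aim of less-anisotropic span embeddings. The following is a minimal sketch of that idea, not the authors' implementation. It assumes anisotropy can be proxied by the mean pairwise cosine similarity of a set of vectors (a common convention, but not stated in the abstract), and the function names, the mixing weight `alpha`, and the toy vectors are all hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(vectors):
    """Assumed anisotropy proxy: average cosine similarity over all pairs.

    Values near 1 mean the vectors point in almost the same direction.
    """
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def aggregate_layers(first_layer, top_layer, alpha=0.5):
    """Hypothetical linear aggregation: alpha * first + (1 - alpha) * top."""
    return [alpha * f + (1 - alpha) * t for f, t in zip(first_layer, top_layer)]


# Toy data (invented for illustration): top-layer vectors are nearly
# parallel, first-layer vectors are well separated.
top = [[1.0, 0.1], [1.0, -0.1]]
first = [[1.0, 0.0], [0.0, 1.0]]
mixed = [aggregate_layers(f, t) for f, t in zip(first, top)]

print(mean_pairwise_cosine(top))    # close to 1: highly anisotropic
print(mean_pairwise_cosine(mixed))  # lower: directions are more spread out
```

On these toy vectors the aggregated representations have a lower mean pairwise cosine similarity than the top-layer vectors alone, which is the effect the abstract attributes to less-anisotropic embeddings. How the paper actually weights the layers is not given in the abstract.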
- Textual dimensions of sustainability information, stock price informativeness, and proprietary costs: Evidence from integrated reports
  (Elsevier Ltd on behalf of the British Accounting and Finance Association, 2024-10-23)
  Barth ME; Cahan SF; Chen L; Venter ER; Wang R
  We examine whether integrated report quality, IRQ, is negatively associated with stock price synchronicity, an inverse measure of firm-specific information, and the extent to which the relation between IRQ and synchronicity is attenuated by proprietary costs. We measure IRQ using machine-based textual analysis along four dimensions: textual attributes, topical content, integrated reporting capitals, and financial versus sustainability information. We find that measures of IRQ based on seven textual attributes are negatively related to synchronicity, which is consistent with higher quality text containing more firm-specific content. Using PhraseLDA to identify topics in integrated reports, we find that contents related to the three most common categories (governance, performance, and risks and opportunities) are negatively associated with synchronicity. We find similar results for all integrated report capitals, except manufactured capital. Further, we find that sustainability information has a larger negative association with synchronicity than financial information. We also find that proprietary costs stemming from product market competition attenuate the association between IRQ and synchronicity, which suggests the informativeness of integrated reports varies with a firm's competitive environment. Our results may inform the International Sustainability Standards Board as it considers the role of the Integrated Reporting Framework in developing sustainability standards.
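The abstract above treats stock price synchronicity as an inverse measure of firm-specific information but does not define it. The sketch below assumes one widely used convention: regress firm returns on market returns, take the R-squared, and apply a logit transform, ln(R² / (1 − R²)). Whether the paper uses exactly this model (for example, with added industry returns) is not stated here, and the function names and return series are hypothetical.

```python
import math


def r_squared(firm_returns, market_returns):
    """R-squared of a one-factor OLS regression of firm on market returns.

    For a single regressor this equals the squared sample correlation.
    """
    n = len(firm_returns)
    mean_f = sum(firm_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((m - mean_m) * (f - mean_f)
              for f, m in zip(firm_returns, market_returns))
    var_m = sum((m - mean_m) ** 2 for m in market_returns)
    var_f = sum((f - mean_f) ** 2 for f in firm_returns)
    return (cov * cov) / (var_m * var_f)


def synchronicity(r2):
    """Assumed definition: logit transform of R-squared, for 0 < r2 < 1."""
    return math.log(r2 / (1.0 - r2))


# Invented return series, for illustration only.
market = [0.01, 0.02, 0.03, 0.04]
firm = [0.01, 0.02, 0.03, 0.05]

r2 = r_squared(firm, market)
print(r2, synchronicity(r2))
```

Under this convention, a firm whose returns track the market closely has a high R-squared and high synchronicity, so a negative association between IRQ and synchronicity is read as more firm-specific information in prices.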