Introducing the Contextors Parser

By
Abstract: In this article we introduce the goals and ideas that have guided the development of our syntactic parser. These include a flexible scheme for writing linguistic rules, transparency of every rule in the system, an advanced testing tool that is sensitive to the smallest changes, and a mechanism for retrieving syntactic and lexical features from every node in a given phrase. We also point the reader to some of the parser's applications.

Rule-based parser

The Contextors’ Parser assigns syntactic structure trees to strings of English words. Developing the parser is a fresh attempt at teaching a machine the rules that govern different linguistic aspects of English. Development is an ongoing process: we are continually adding support for more linguistic phenomena, with the goal of covering all grammatical structures of English and getting ever closer to the Perfect Parser™.
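To make the idea of a tree that carries both syntactic categories and lexical features more concrete, here is a minimal Python sketch of a node structure with a helper that retrieves a given feature from every node. The class name `Node`, its fields, and the feature names (`number`, `tense`) are assumptions for illustration only; they are not the Contextors Parser's internal representation.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Node:
    """One constituent in a hypothetical syntactic tree."""
    category: str                     # syntactic category, e.g. "NP", "VP", "V"
    function: Optional[str] = None    # grammatical function, e.g. "subject"
    word: Optional[str] = None        # set only on leaf (word) nodes
    features: Dict[str, str] = field(default_factory=dict)  # e.g. {"number": "sg"}
    children: List["Node"] = field(default_factory=list)

    def walk(self) -> Iterator["Node"]:
        """Yield this node and all its descendants, depth first (pre-order)."""
        yield self
        for child in self.children:
            yield from child.walk()

    def find(self, feature: str) -> List[str]:
        """Collect the value of `feature` from every node that carries it."""
        return [n.features[feature] for n in self.walk() if feature in n.features]


# "The dog barked": an S node with a subject NP and a predicate VP.
tree = Node("S", children=[
    Node("NP", function="subject", features={"number": "sg"}, children=[
        Node("Det", word="The"),
        Node("N", word="dog", features={"number": "sg"}),
    ]),
    Node("VP", function="predicate", features={"tense": "past"}, children=[
        Node("V", word="barked", features={"tense": "past"}),
    ]),
])

print([n.word for n in tree.walk() if n.word])  # ['The', 'dog', 'barked']
print(tree.find("tense"))                        # ['past', 'past']
```

Storing features on phrase nodes as well as on words means a rule can ask a phrase directly for, say, its number, without descending to the head word each time.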

Theoretical research

As we add rules to the parser and examine its solutions, interesting theoretical issues arise. We research these cases and develop rules that prevent the parser from generating wrong solutions. Examples can be found in our article on the different uses of prepositional phrases headed by of, among others.

Developed by linguists

The parser and its development environment were built from scratch so that, from the start, they would support a scheme for writing grammar rules that can incorporate linguistic insights. Much effort went into enabling linguists to develop and test rules themselves. You can read here about the concept of Language Engineering and the process of creating rules. Our main principle is to hide all methods related to the parser's internal model behind the scenes, so that linguists can express themselves with high-level methods that closely match the linguistic terminology they use.

Overcoming the rule-based challenge

The parser's design, together with the testing and debugging tools we’ve developed, makes it possible to add a new rule or adjust an existing one in harmony with all other relevant rules in the system. This lets us avoid conflicts and stay in control of the development process. Because the parser is rule-based, its output is always predictable. And because the system is transparent, our linguists can examine each step of the parsing process and trace any parsing problem to the specific piece of code responsible for it. Our testing tools let us run any change over a large number of examples and see its impact.
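A regression check of the kind described above can be sketched as follows: run the parser before and after a change over the same corpus and report only the sentences whose set of solutions changed. The function `diff_parsers`, the toy parsers, and the bracketed tree strings are hypothetical stand-ins, not the actual Contextors testing tool.

```python
from typing import Callable, Dict, List, Tuple

# Assumption for this sketch: a "parser" is any function that maps a sentence
# to the list of all its solutions, each written as a bracketed tree string.
Parser = Callable[[str], List[str]]


def diff_parsers(old: Parser, new: Parser,
                 corpus: List[str]) -> Dict[str, Tuple[List[str], List[str]]]:
    """Map each sentence whose solutions changed to (old_solutions, new_solutions)."""
    changed: Dict[str, Tuple[List[str], List[str]]] = {}
    for sentence in corpus:
        # Sort so that a mere reordering of solutions is not reported as a change.
        before, after = sorted(old(sentence)), sorted(new(sentence))
        if before != after:
            changed[sentence] = (before, after)
    return changed


# Hypothetical outputs before and after a rule change that removes one
# of the two analyses of "Time flies."
OLD = {
    "Dogs bark.": ["(S (NP Dogs) (VP bark))"],
    "Time flies.": ["(S (NP Time) (VP flies))", "(S (VP Time (NP flies)))"],
}
NEW = {
    "Dogs bark.": ["(S (NP Dogs) (VP bark))"],
    "Time flies.": ["(S (NP Time) (VP flies))"],
}

changes = diff_parsers(lambda s: OLD[s], lambda s: NEW[s], list(OLD))
for sentence, (before, after) in changes.items():
    print(f"{sentence}: {len(before)} -> {len(after)} solutions")
# prints: Time flies.: 2 -> 1 solutions
```

Reporting only the sentences that changed keeps the review focused: over a large corpus, a linguist sees exactly which analyses a rule added or removed.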

Linguistic programming language

To formalize different kinds of rules, we’ve developed various methods for expressing linguistic rules and principles. We extract different linguistic properties of the input and use them during parsing. These methods often serve as building blocks for more complex rules. Moreover, wrong solutions are often eliminated by formulating principles that operate across several syntactic rules. Together, these methods form a rich and flexible linguistically oriented programming language.
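The following minimal sketch illustrates the two ideas in this paragraph under simplifying assumptions: small predicates (`same_value`, `agrees_in_number`) act as building blocks, and a principle (`subject_verb_agreement`) is applied as a filter to candidates produced by any rule. All names and the flat feature dictionaries are hypothetical; the parser's own rule language is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = Dict[str, str]


@dataclass
class Candidate:
    """A hypothetical candidate analysis: the rule that built it plus subject and verb features."""
    rule: str
    subject: Features
    verb: Features


# Building block: returns a check that two feature bundles agree on `attr`.
# A feature missing from either bundle is treated as compatible.
def same_value(attr: str) -> Callable[[Features, Features], bool]:
    def check(a: Features, b: Features) -> bool:
        return attr not in a or attr not in b or a[attr] == b[attr]
    return check


agrees_in_number = same_value("number")
agrees_in_person = same_value("person")


# A principle: applied to candidates from every rule, not written into each rule.
def subject_verb_agreement(c: Candidate) -> bool:
    return agrees_in_number(c.subject, c.verb) and agrees_in_person(c.subject, c.verb)


PRINCIPLES: List[Callable[[Candidate], bool]] = [subject_verb_agreement]


def filter_solutions(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that satisfy every principle."""
    return [c for c in candidates if all(p(c) for p in PRINCIPLES)]


candidates = [
    Candidate("declarative", {"number": "pl", "person": "3"}, {"number": "pl"}),
    Candidate("declarative", {"number": "sg", "person": "3"}, {"number": "pl"}),
    Candidate("question", {"number": "sg", "person": "3"}, {"number": "sg", "person": "3"}),
]
print([c.rule for c in filter_solutions(candidates)])  # ['declarative', 'question']
```

Because the principle lives in one list rather than inside each rule, adding a new clause-building rule does not require restating the agreement check.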

The parser accepts a sentence or a phrase as input. Its output is a syntactic structure tree that combines representations of syntactic categories and grammatical functions (read more about it here). We can choose the level of detail to display and highlight parts of the tree. Our strategy is for the parser to return all possible solutions for a given input. Depending on an application's needs, we can reduce the level of detail and thereby show fewer solutions.
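Why reducing the level of detail can yield fewer solutions is easy to show with a small sketch: if two analyses differ only in a grammatical-function label, removing the labels makes them identical, and deduplication collapses them into one. The tuple encoding, the helper names, and the object versus adjunct readings of "a mile" in "He ran a mile" are assumptions chosen for illustration.

```python
from typing import List, Tuple, Union

# Assumed encoding: a tree is (category, function, children) for phrases,
# or (category, function, word) for leaves.
Tree = Tuple[str, str, Union[str, Tuple["Tree", ...]]]


def strip_functions(tree: Tree) -> Tree:
    """Drop grammatical-function labels, keeping only syntactic categories."""
    category, _function, body = tree
    if isinstance(body, str):  # leaf: body is the word itself
        return (category, "", body)
    return (category, "", tuple(strip_functions(child) for child in body))


def distinct(solutions: List[Tree]) -> List[Tree]:
    """Remove duplicate trees while preserving their original order."""
    return list(dict.fromkeys(solutions))


# Two hypothetical analyses that differ only in the function of "a mile".
object_reading: Tree = ("S", "", (
    ("NP", "subject", "He"),
    ("VP", "predicate", (("V", "head", "ran"), ("NP", "object", "a mile"))),
))
adjunct_reading: Tree = ("S", "", (
    ("NP", "subject", "He"),
    ("VP", "predicate", (("V", "head", "ran"), ("NP", "adjunct", "a mile"))),
))

full = distinct([object_reading, adjunct_reading])
coarse = distinct([strip_functions(t) for t in full])
print(len(full), len(coarse))  # 2 1
```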

Applications

As mentioned above, the parser's coverage is constantly increasing. Good coverage of certain areas of the language can benefit interesting products, such as the voice conjugator and other tools we’ve developed. The parser's detailed analysis, combined with the linguistic methods we use, allows us to modify text while preserving its meaning and basic structure.

How can you use the Contextors Parser?

We’ve opened an API for the parser. The API assigns grammatical attributes (tense, polarity, voice, etc.) and main components (subject, verb, object, etc.) to sentences. We are looking for beta users to start building interesting applications on top of it. Apply for access here.
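As an illustration of what an application might do with such attributes, the sketch below filters sentences by tense and voice. It does not show the API's actual request or response format: the dictionaries are hand-written stand-ins whose keys simply reuse the attribute names listed above, and `select` is a hypothetical helper.

```python
from typing import Dict, List

# Hand-written stand-ins for per-sentence analyses. The keys reuse the
# attribute names from the text but are NOT the API's actual response format.
analyses: List[Dict[str, str]] = [
    {"text": "The letter was sent yesterday.", "tense": "past", "polarity": "positive",
     "voice": "passive", "subject": "The letter", "verb": "was sent"},
    {"text": "She did not send the letter.", "tense": "past", "polarity": "negative",
     "voice": "active", "subject": "She", "verb": "did not send", "object": "the letter"},
    {"text": "They send letters.", "tense": "present", "polarity": "positive",
     "voice": "active", "subject": "They", "verb": "send", "object": "letters"},
]


def select(items: List[Dict[str, str]], **wanted: str) -> List[str]:
    """Return the text of every analysis whose attributes match all `wanted` values."""
    return [a["text"] for a in items if all(a.get(k) == v for k, v in wanted.items())]


print(select(analyses, tense="past", voice="active"))
# ['She did not send the letter.']
```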
