A synopsis of my dissertation can be found at the Analist financial platform.
Alongside my full-time position at WCC Smart Search & Match, I am currently affiliated with the Econometric Institute at the Erasmus School of Economics (ESE), Erasmus University Rotterdam (EUR), the Netherlands. My research consists of work resulting from my former Ph.D. trajectory, as well as new efforts in language technologies.
My Ph.D. project revolved around financial event discovery in emerging news for algorithmic trading. At the time, I was employed as a Ph.D. candidate at the Erasmus Research Institute of Management (ERIM) and the Econometric Institute at the Erasmus School of Economics (ESE), Erasmus University Rotterdam (EUR), the Netherlands. My promotors were Uzay Kaymak and Franciska de Jong, and my daily supervisor was Flavius Frasincar.
My Ph.D. work is linked to the Netherlands Organisation for Scientific Research (NWO) Physical Sciences (EW) Free Competition project 612.001.009: Financial Events Recognition in News for Algorithmic Trading (FERNAT). NWO funds thousands of top researchers at universities and institutes and steers the course of Dutch science by means of subsidies and research programmes. The NWO-EW Free Competition is intended for top innovative and risky fundamental research proposals that have a high scientific or practical urgency and that are significant for at least one of the disciplines of astronomy, computer science, or mathematics.
The flourishing data market and the financial resources invested in research on extracting knowledge from data underline how much is at stake in extracting knowledge accurately and using it efficiently in financial decision making. Not only traditional and ubiquitous numerical data, but especially textual data such as news messages, are gradually receiving more attention. Such data often remain unused, because their unstructured nature thwarts automatic processing. As a result, large amounts of valuable knowledge often remain undiscovered. Generally, this knowledge can be summarized in events, which are inextricably linked with financial markets. It has long been known that events such as acquisitions, product launches, natural disasters, or wars can exert a notable influence on stock prices. Smart applications of detected events may therefore be beneficial for financial decision making. For instance, trading algorithms can be enriched with events, so as to respond better to new developments. Taking certain events into account may also improve financial risk estimations. Because of these promising applications, but also the many inevitable challenges, my dissertation covers a number of topics related to the semi-automatic detection of (financial) events in news text.
Building on the results of a detailed evaluation of existing knowledge-driven, data-driven, and hybrid systems and methods for event extraction, my thesis presents a semi-automatic system for financial event extraction from news text. The system consists of various innovative, qualitatively competitive, knowledge-driven components for natural language processing. Their interaction with a knowledge base fosters a feedback loop, so that newly detected events can be digested, allowing the incorporated knowledge to be used in future (extraction) processes.
In addition, two languages are proposed that can refine the aforementioned system. While the first language is focused on pattern-based event extraction from text, the second language is targeted toward the definition and execution of knowledge base updates associated with the extracted events. A series of experiments focusing on pattern development times and result accuracies indicates that the extraction language offers, in contrast to many modern alternatives, a simple and flexible notation utilizing lexical, syntactic, and semantic elements, while maintaining expressivity. Moreover, an evolutionary approach for automatically generating patterns contributes to the practical employability of the language. The trigger-based knowledge base update language and the various execution models developed for it are particularly suitable for event extraction applications. While modern languages are often less suited to fully automated updates, the update language offers increased flexibility for automatically executing predefined rules, providing the user with a wide range of options, e.g., immediate or deferred execution, update chaining, etc.
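To make the notions of immediate execution, deferred execution, and update chaining more concrete, the following is a minimal Python sketch of a trigger-based update mechanism. It is an illustration only and not the update language or execution models from my dissertation: the names Event, Trigger, and UpdateEngine, the set-of-triples knowledge base, and the example actions are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical illustration; none of these names come from the actual software.


@dataclass(frozen=True)
class Event:
    kind: str      # e.g. "acquisition"
    subject: str   # e.g. the acquiring company
    obj: str       # e.g. the acquired company


@dataclass
class Trigger:
    kind: str                             # event kind this trigger reacts to
    action: Callable[[Event, set], list]  # updates the knowledge base, returns follow-up events
    deferred: bool = False                # False: run immediately; True: queue until flush()


@dataclass
class UpdateEngine:
    triggers: list
    kb: set = field(default_factory=set)        # knowledge base as a set of triples
    pending: list = field(default_factory=list)  # queued (trigger, event) pairs

    def handle(self, event):
        """Dispatch an event to every trigger registered for its kind."""
        for trigger in self.triggers:
            if trigger.kind != event.kind:
                continue
            if trigger.deferred:
                self.pending.append((trigger, event))
            else:
                self._run(trigger, event)

    def _run(self, trigger, event):
        # Update chaining: events returned by an action are handled in turn.
        for follow_up in trigger.action(event, self.kb):
            self.handle(follow_up)

    def flush(self):
        """Execute all deferred updates that were queued so far."""
        queued, self.pending = self.pending, []
        for trigger, event in queued:
            self._run(trigger, event)


def record_acquisition(event, kb):
    kb.add((event.subject, "owns", event.obj))
    return [Event("ownership_change", event.obj, event.subject)]


def record_parent(event, kb):
    kb.add((event.subject, "hasParent", event.obj))
    return []


engine = UpdateEngine(triggers=[
    Trigger("acquisition", record_acquisition),                 # immediate
    Trigger("ownership_change", record_parent, deferred=True),  # deferred
])
engine.handle(Event("acquisition", "CompanyA", "CompanyB"))
kb_before_flush = set(engine.kb)
engine.flush()
```

In this sketch, the acquisition is written to the knowledge base right away, while the chained ownership update is only applied once flush() is called.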
Last, two financial applications of events extracted from news text are presented in my dissertation. For both applications, it is confirmed that adding events to prevailing computations or algorithms can yield more accurate results. In our analysis of an event-based automated trading application, rules are generated for trading stocks. The best-performing rules make use not only of numerical signals such as average historical stock prices, but also of news-based event signals. Moreover, when stock data are cleaned of disruptions caused by financial events, financial risk analyses yield more accurate results.
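As a rough illustration of what a rule that combines a numerical signal with a news-based event signal could look like, consider the hand-written Python sketch below. It is not one of the generated rules from my dissertation: the function name trading_signal, the window and event_threshold parameters, and the convention that event scores lie between -1 (negative event) and 1 (positive event) are assumptions made for this example.

```python
from statistics import mean


def trading_signal(prices, event_scores, window=3, event_threshold=0.5):
    """Return 'buy', 'sell', or 'hold' for the most recent price.

    prices: historical prices, oldest first.
    event_scores: hypothetical per-period event scores in [-1, 1], oldest first.
    """
    if len(prices) <= window:
        return "hold"  # not enough history to compute the average

    # Numerical signal: average of the `window` prices preceding the latest one.
    average = mean(prices[-window - 1:-1])
    latest = prices[-1]

    # Event signal: score of the most recent period (0.0 if no events are known).
    event = event_scores[-1] if event_scores else 0.0

    # Only act when the numerical signal and the event signal agree.
    if latest > average and event >= event_threshold:
        return "buy"
    if latest < average and event <= -event_threshold:
        return "sell"
    return "hold"


buy = trading_signal([10, 10, 10, 12], [0.0, 0.0, 0.0, 0.8])
sell = trading_signal([10, 10, 10, 8], [0.0, 0.0, 0.0, -0.9])
hold = trading_signal([10, 10, 10, 12], [0.0, 0.0, 0.0, 0.0])
```

Requiring both signals to agree is just one possible design; the price movement alone (the third call) does not trigger a trade here.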
The reported results suggest that it is possible to semi-automatically and accurately detect events in news text in a knowledge-driven way, when making use of advanced extraction rules. Additional update rules and execution models accommodate a feedback loop to knowledge bases underlying event extraction systems. Events detected in news can be used as additional parameters in financial applications, thus yielding more accurate outcomes in the evaluated cases. Such advantageous applications can be of good use in (semi-)automated environments.
Parts of my work, done in order to obtain the doctorate degree, have received national media attention. The most interesting outcomes, especially the ones related to incorporating extracted financial events into trading algorithms, have been disseminated to the general public.
In an extensive article, the largest Dutch pop-science website Scientias covers my work on financial event recognition in news for enhancing trading algorithms.
As part of the wrap-up of the COMMIT project, the Commotion magazine featured an article on the outcomes of my dissertation.
Furthermore, I have been awarded the Elsevier / Prof. Peter P. Chen Data and Knowledge Engineering Best Paper 2013 Award and a travel grant to the ER 2016 conference.
In the context of my NWO-EW Free Competition FERNAT project, I have released several pieces of software during the course of my Ph.D. research.
Argos is a highly configurable financial data monitoring tool that allows you to collect intra-day information on a financial market using stock feeds and (financial) news feeds. The all-seeing eyes of the mythological hundred-eyed giant Argos are represented by monitoring components that collect (stock) data on financial markets as well as related (financial) news messages.
By default, RSS news feed parsers for Yahoo, Reuters, Associated Press, New York Times, Business Week, and CNN are included, but the data collectors are easily extensible. In the current version, stock information on NASDAQ-100 companies is gathered using a Google API, but this can also be extended to other services.
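For readers who want a feel for what an RSS news collector involves, here is a minimal sketch using only the Python standard library. It is not taken from Argos and makes no claim about how Argos is implemented: the function name parse_rss, the returned dictionary layout, and the sample feed are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical sample feed in RSS 2.0 format (not real news data).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example financial news</title>
    <item>
      <title>Company A acquires Company B</title>
      <link>https://example.com/news/1</link>
      <pubDate>Mon, 01 Jan 2018 09:30:00 GMT</pubDate>
    </item>
    <item>
      <title>Company C launches new product</title>
      <link>https://example.com/news/2</link>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Extract title, link, and publication time from each RSS item."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        published = item.findtext("pubDate")
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            # pubDate uses the RFC 822 date format; it may be absent.
            "published": parsedate_to_datetime(published) if published else None,
        })
    return items


news = parse_rss(SAMPLE_FEED)
for entry in news:
    print(entry["published"], entry["title"])
```

A collector for another source would mainly differ in how the feed text is fetched; the parsing step stays the same for any feed that follows RSS 2.0.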
The OULx software implements the Ontology Update Language (OUL) together with several extensions, inspired by the existing SQL-triggers mechanism, making use of SPARQL and SPARQL/Update statements. Programming work has been done by Jordy Sangers. OULx contains eight different execution models, providing flexibility with respect to the update process.
The Hermes news personalization tool is software developed at my department for news processing purposes. It has many contributors, e.g., Jethro Borsje, Wouter IJntema, Jordy Sangers, Arnout Verheij, Allard Kleijn, Michel Capelle, and Marnix Moerland. This has resulted in various releases that each focus on specific aspects, e.g., result ranking algorithms, graphical query languages, information extraction languages, news recommendation algorithms, etc. On my Hermes-related page, I have made available Hermes 1.1, which implements a graphical query language and result ranking algorithms, and Hermes 2.3, featuring a lexico-semantic pattern language for information extraction and a genetic programming approach for automatic pattern learning.
Non-commercial and private use of the software is allowed. When publishing work that is based on my software, please ensure that proper citations are included. In case of questions, comments, or suggestions, feel free to contact me. Also, please notify me when you have developed extensions, additional components, etc., so that I can add them to the next software release.
Erasmus School of Economics
Erasmus University Rotterdam
P.O. Box 1738
NL-3000 DR Rotterdam
Burgemeester Oudlaan 50
NL-3062 PA Rotterdam