Session abstract:
As the world's leading provider of financial news, Bloomberg LP ingests on the order of 1 million news stories per day from over 100,000 sources in more than 40 languages. To help users quickly retrieve news tailored to their specific interests, each story is run through a classification system containing hundreds of thousands of rules, which tags it in real time with a mean latency of under 50 ms. In this talk I'll discuss the migration of the news classification engine from a legacy system to a solution based on Luwak/Lucene, while retaining the query language of the existing corpus of rules.