Schema Learning in Apache Solr

06/11/2018 - 14:50 to 15:10

Palais Atelier

short talk (20 min)

Beginner

Session abstract:

Apache Solr has come a long way, from simple full-text search to modern-day analytics, geospatial, media and multi-tenant search applications. However, it still suffers from the inductive problem of “schema-resolution”, that is, inferring the right schema from the data itself.

While Apache Solr has a “schema-less” mode, it does not solve this problem, because it generates a very generic schema under the hood. At Unbxd, a multi-tenant e-commerce search platform, arriving at an optimal schema is critical for performance and functionality.

This talk presents our contribution to Solr (SOLR-11741), a “schema-learning mode” that leverages a “field type hierarchy” to solve the schema inference problem by learning from source documents and run-time query patterns. The talk also covers how searching, sorting and faceting can become more efficient with this feature, and how it can provide insights into data anomalies.
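
To make the idea of a field type hierarchy concrete, here is a minimal sketch of learning a schema from source documents. It is an illustration only, not the SOLR-11741 implementation: the hierarchy (int, long, double, string), the function names and the assumption that every field value arrives as a raw string are all hypothetical.

```python
# Hypothetical field type hierarchy, ordered from narrowest to widest.
# This ordering is an assumption for illustration, not Solr's actual list.
HIERARCHY = ["int", "long", "double", "string"]


def narrowest_type(value):
    """Return the narrowest hierarchy type that can hold a raw string value."""
    try:
        number = int(value)
        # Values outside the signed 32-bit range need the wider "long" type.
        return "int" if -2**31 <= number < 2**31 else "long"
    except ValueError:
        pass
    try:
        float(value)
        return "double"
    except ValueError:
        return "string"


def learn_schema(documents):
    """Map each field to the widest type required across all documents."""
    schema = {}
    for doc in documents:
        for field, value in doc.items():
            seen = narrowest_type(value)
            current = schema.get(field, seen)
            # Keep whichever type sits higher in the hierarchy.
            schema[field] = max(seen, current, key=HIERARCHY.index)
    return schema
```

For example, given the documents {"price": "10", "sku": "123"} and {"price": "9.99", "sku": "A-7"}, this sketch settles on "double" for price and "string" for sku, because each field is widened only as far as the observed values require.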
