With the recent announcement of Metacortex, I felt it prudent to share some of the documentation.
The coolest part about the system is that there are only three steps:
- Deploy Ingestion Server, Search Server, PostgreSQL backend
- Send comments to ingestion endpoint
- Search! (and optionally set up notifications)
The point is that setup should not take long, even for an enterprise. All of the components described are usually already approved within an enterprise, or have established deployment patterns there (i.e. they are easy to set up), and the system ships with the required models, setup scripts, etc.
The only “difficult” part is setting up the API calls. However, even the API behind the complex search capabilities is quite simple: only four fields are required.
An example curl command to Metacortex:
This returns a response like the following:
As you can see, a few additional fields are included, but essentially the NLP engine returns a sentiment, a score, and key_word values, which you can use directly in your own application.
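To illustrate consuming such a response, here is a minimal Python sketch. It assumes the response is a flat JSON object whose top-level keys include `sentiment`, `score`, and `key_word` (the names come from the text above); the sample values, the list type of `key_word`, and the overall shape are hypothetical, since the real response carries additional fields not reproduced here.

```python
import json

# Hypothetical response body. The real Metacortex response has extra fields,
# and its exact structure is an assumption (a flat JSON object).
SAMPLE_RESPONSE = """
{
  "sentiment": "positive",
  "score": 0.82,
  "key_word": ["postgresql", "search"]
}
"""

NLP_FIELDS = ("sentiment", "score", "key_word")


def extract_nlp_fields(body: str) -> dict:
    """Keep only the NLP outputs named in the post, ignoring any extra fields.

    Missing keys map to None instead of raising, so a partial response
    does not crash the caller.
    """
    data = json.loads(body)
    return {key: data.get(key) for key in NLP_FIELDS}


if __name__ == "__main__":
    print(extract_nlp_fields(SAMPLE_RESPONSE))
```

Using `dict.get` keeps the sketch tolerant of the additional fields the response includes, as well as of any field a given response omits.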
The current v1 API documentation (single comment):
Multiple comments are simply a list of these objects, in the form [{object}, {object}, …, {object}].
That’s really it! The example site HNProfile.com, for instance, was created simply by sending Hacker News comments to an ingestion endpoint.
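The batching rule can be sketched in Python as follows. The four field names in `make_comment` and the `INGEST_URL` address are hypothetical placeholders, since the required fields are not listed in this text; only the "list of objects" request shape comes from the description above. The sketch also assumes the endpoint accepts a JSON body via POST and replies with JSON.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute the address of your own Ingestion Server.
INGEST_URL = "http://localhost:5000/ingest"


def make_comment(comment_id, author, story_id, text):
    """Build one comment object.

    These four field names are placeholders, not the documented schema.
    """
    return {"comment_id": comment_id, "author": author,
            "story_id": story_id, "text": text}


def batch_payload(comments):
    """Serialize many comment objects as one JSON array: [{...}, {...}, ...]."""
    return json.dumps(comments)


def chunked(comments, size):
    """Split a large backlog into batches of at most `size` comments."""
    return [comments[i:i + size] for i in range(0, len(comments), size)]


def post_batch(comments, url=INGEST_URL):
    """POST one batch to the ingestion endpoint and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=batch_payload(comments).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `post_batch(chunked(backlog, 500)[0])` would send the first 500 comments of a backlog in a single request; the batch size of 500 is arbitrary and should be tuned against your deployment.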
Performance
Performance always matters in my book. Today the system runs as a Flask app, and you can launch multiple Flask apps behind a load balancer. Each app requires only 187 MB of RAM and, on a 2.2 GHz processor, can process ~70 comments/second.
This means that if you had 1 billion comments a year, perfectly distributed in time, a single Flask app could handle them: 1,000,000,000 / (365 * 24 * 60 * 60) ≈ 31.7 comments/second, well below the ~70 comments/second one app can process.
Search is performed mostly in the database, so the database is the real bottleneck. Today, on a db.m4.xlarge instance on AWS, expert searches complete within 50 ms and content searches within 200 ms. The database contains roughly 500,000 authors and 25 million comments on 125,000 stories, with 10 million unique topics mentioned or discussed.
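The capacity arithmetic above generalizes to a small planning helper. This Python sketch uses the ~70 comments/second per-app figure from the text; `peak_factor` is an assumption added for illustration, since real traffic is bursty rather than perfectly distributed in time.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def average_rate(comments_per_year: int) -> float:
    """Average comments/second if arrivals are perfectly uniform over a year."""
    return comments_per_year / SECONDS_PER_YEAR


def instances_needed(comments_per_year: int,
                     per_instance_rate: float = 70.0,
                     peak_factor: float = 1.0) -> int:
    """Number of Flask apps behind the load balancer needed to keep up.

    peak_factor (an assumption) scales the average rate up to an estimated
    peak rate; 1.0 reproduces the perfectly uniform case.
    """
    peak_rate = average_rate(comments_per_year) * peak_factor
    return math.ceil(peak_rate / per_instance_rate)


if __name__ == "__main__":
    print(round(average_rate(1_000_000_000), 1))
    print(instances_needed(1_000_000_000))
    print(instances_needed(1_000_000_000, peak_factor=5))
```

With uniform traffic, `instances_needed(1_000_000_000)` returns 1, matching the single-app conclusion above; if peaks ran at 5× the average, the same volume would need 3 apps.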