Metacortex Ingestion API

With the recent announcement of Metacortex, I felt it prudent to share some of the documentation.

The coolest part about the system is that there are only three steps:

  1. Deploy Ingestion Server, Search Server, PostgreSQL backend
  2. Send comments to ingestion endpoint
  3. Search! (possibly set up notifications as well)

The point is that setup, even for an enterprise, should not take long at all. All of the components described are often already approved or have deployment patterns within an enterprise (i.e. they are easy to set up), and the system ships with the required models, setup scripts, etc.

The only “difficult part” is setting up the API calls. However, even the API that builds the complex search capabilities is really quite simple. Only four fields are required.

An example curl command to Metacortex:

This will get a response like the following:

As you can see, a few additional fields are added, but essentially you get sentiment, score, and key_word values back from the NLP engine, which you can then use as needed.

The current v1 API documentation (single comment):

Multiple comments are simply a list of these objects, in the form [{object}, {object}, …, {object}].
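As a rough sketch of what sending such a list could look like from Python, the snippet below checks a batch of comment objects and posts it as JSON. The endpoint URL and the four field names (author, text, timestamp, source) are placeholders invented for illustration, and the JSON encoding is assumed from the list-of-objects form; none of this is the actual v1 schema.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the address of your ingestion server.
INGEST_URL = "http://localhost:5000/ingest"

# Hypothetical names standing in for the four required fields.
REQUIRED_FIELDS = ("author", "text", "timestamp", "source")


def build_batch(comments):
    """Check each comment for the required fields and serialize the list as JSON."""
    for i, comment in enumerate(comments):
        missing = [f for f in REQUIRED_FIELDS if f not in comment]
        if missing:
            raise ValueError(f"comment {i} is missing fields: {missing}")
    return json.dumps(comments).encode("utf-8")


def send_batch(comments, url=INGEST_URL):
    """POST the batch to the (assumed) ingestion endpoint and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=build_batch(comments),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Validating before sending keeps a malformed comment from costing a round trip; a single comment would go through the same path as a one-element list, if the endpoint accepts that.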

That’s really it! The example site HNProfile.com, for instance, was created simply by hitting an ingestion endpoint with Hacker News comments.

Performance

It’s always important to note performance in my book. Today, the system runs as a Flask app, and you can launch multiple Flask apps behind a load balancer. Each app requires only 187 MB of RAM and, with a 2.2 GHz processor, can process ~70 comments / second.

This means that if you had 1 billion comments a year, perfectly distributed in time, the system would be able to handle it on a single Flask app: 1,000,000,000 / (365 * 24 * 60 * 60) = 31.7 comments / second, well under the ~70 comments / second one app can process.
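The same back-of-the-envelope check can be written as a few lines of Python. This is only a sketch: it takes the ~70 comments / second figure above as the per-app throughput and assumes comments arrive evenly over the year, and the function names are made up for illustration.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def required_rate(comments_per_year):
    """Average comments per second, assuming a perfectly even arrival rate."""
    return comments_per_year / SECONDS_PER_YEAR


def instances_needed(comments_per_year, per_instance_rate=70):
    """Number of Flask apps needed behind the load balancer at the given throughput."""
    return math.ceil(required_rate(comments_per_year) / per_instance_rate)


if __name__ == "__main__":
    rate = required_rate(1_000_000_000)
    print(f"{rate:.1f} comments / second, {instances_needed(1_000_000_000)} app(s)")
```

Real traffic is bursty rather than evenly spread, so peak load is what would actually decide how many apps sit behind the load balancer.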

Regarding search, the vast majority of the work is performed in the database, meaning the database is really the bottleneck. Today, with a db.m4.xlarge on AWS, you can search for experts within 50 ms and for content within 200 ms. The database contains roughly 500,000 authors and 25 million comments on 125,000 stories, with 10 million unique topics having been mentioned or discussed.

Announcing Metacortex

As some of you may have noticed, I recently took a brief hiatus (from blog posting & updates).

There has been a lot going on and although I have been able to manage most technical issues, new features and blog posts have taken a back seat.

Primarily, this is due to my limited bandwidth: 20 – 30 hours a week at the moment. However, I’m excited to announce that we do have some updates this month!

Announcing Metacortex.me

We are pleased to announce Metacortex.me!

Its goal is Enterprise Knowledge Management: find experts, search content, track mood, reduce duplication and more!

projectpiglet.com was always meant to accomplish two goals:

  1. Debug the search and tracking algorithms for metacortex.me
  2. Keep the lights on while doing #1

To that end, it’s been a smashing success. The algorithm (ExpertRank) is currently being drafted for patenting (provisional filed) and should be filed within the next few weeks. In addition, we have several working demos, and I personally (and others) continue to use projectpiglet.com regularly to make gains on the market (over the last year: 55%, lower than the previous year).

projectpiglet.com was always intended to be limited in scope, with search being the real end goal. Since 2013, when I wrote the first version of ProjectPiglet, I knew it could be more – a better search engine. That feeling has only been exacerbated since I started working corporate jobs; I’ve grown to yearn for that search engine.

I even alluded on my personal blog to the fact that I didn’t think search was solved (or could be solved). The point is, if I focused entirely on projectpiglet.com I would be doing a disservice to myself, my company, and probably to you.

The Future

Unfortunately, this means two things:

  1. We’ll be competing in the enterprise search space, doing B2B
  2. Minimal effort toward new features on projectpiglet.com

Competing in the enterprise search arena is scary: not only are there big players, there are free players too! With that in mind, I have spent the past few months patenting what I could and obfuscating the rest.

As for development on projectpiglet.com, the metacortex.me demos still rely on the data it collects, so it’s not going anywhere any time soon. It’s the testing ground for the enterprise search, which means the project has to be maintained if metacortex.me is to be a success. On the other hand, development outside the scope of enterprise search will be limited (e.g. probably no nicer financial charts). In addition, I still use projectpiglet.com regularly to make money.

In other words, projectpiglet.com will continue, albeit primarily for bug squashing; any new features will be focused on the end goal of improving search and the algorithms.