Archive for November, 2009

Discovering Content by Mining the Entity Web

Wednesday, November 18th, 2009

I had a blast last night presenting to CS students at the University of Washington. For those who missed the talk, the video is embedded below.

Abstract: Unstructured natural language text found in blogs, news and other web content is rich with semantic relations linking entities (people, places and things). At Evri, we are building a system which automatically reads web content in much the way humans do. The system can be thought of as an army of 7th-grade grammar students armed with a really large dictionary. The dictionary, or knowledge base, consists of relatively static information mined from structured and semi-structured public repositories such as Wikipedia, Crunchbase, and Amazon. This large knowledge base is in turn used by a highly distributed search and indexing infrastructure to perform deep linguistic analysis of many millions of documents, ultimately producing a large set of semantic relationships that capture grammatical, clause-level subject-verb-object (SVO) relationships. This highly expressive, precise, and scalable index makes possible a new generation of content discovery applications.
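To make the idea of a clause-level index a little more concrete, here is a minimal Python sketch of an in-memory store of subject-verb-object relations that can be queried by any combination of subject, verb, and object. It assumes the relations have already been extracted from text; the `Relation` and `RelationIndex` names are hypothetical, and this is only a toy illustration, not Evri's distributed infrastructure.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relation:
    """One clause-level subject-verb-object (SVO) relation found in a document."""
    subject: str
    verb: str
    obj: str
    doc_id: str


class RelationIndex:
    """Toy in-memory index over SVO relations, with one lookup table per slot."""

    def __init__(self) -> None:
        self._relations: list[Relation] = []
        # slot name -> slot value -> positions in self._relations
        self._by_slot = {
            "subject": defaultdict(set),
            "verb": defaultdict(set),
            "obj": defaultdict(set),
        }

    def add(self, rel: Relation) -> None:
        pos = len(self._relations)
        self._relations.append(rel)
        self._by_slot["subject"][rel.subject].add(pos)
        self._by_slot["verb"][rel.verb].add(pos)
        self._by_slot["obj"][rel.obj].add(pos)

    def query(self, subject: Optional[str] = None, verb: Optional[str] = None,
              obj: Optional[str] = None) -> list[Relation]:
        """Return relations matching every slot that is given; None is a wildcard."""
        wanted = (("subject", subject), ("verb", verb), ("obj", obj))
        matches = [self._by_slot[slot].get(value, set())
                   for slot, value in wanted if value is not None]
        if not matches:
            return list(self._relations)
        hits = set.intersection(*matches)
        return [self._relations[pos] for pos in sorted(hits)]


index = RelationIndex()
index.add(Relation("Israel", "attack", "Hamas", "post-1"))
index.add(Relation("alligator", "attack", "golfer", "post-2"))
print([r.doc_id for r in index.query(verb="attack", obj="Hamas")])  # ['post-1']
```

Keeping a separate lookup table per slot means a query like "who attacks Hamas?" is just an intersection of two small sets rather than a scan of every relation.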

The full talk: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, and the Slides.

Automated Action Based Content Generation

Monday, November 9th, 2009

I’m excited to announce The Attack Machine, an exploratory site available via Evri’s experimental Garden, which leverages the power of the Evri API to automatically generate content oriented around an action, or verb.

Content on The Attack Machine is generated automatically from Evri’s deep linguistic analysis of news, blog and other web content. The site is an example application showing what a grammatical, clause-level understanding of verbs makes possible, and the entire site could be mapped straightforwardly to other verbs, such as love, hate, punch, cherish, or destroy.

Content on The Attack Machine homepage is generated by an algorithm which runs every few minutes and looks for the top attackers and victims in particular categories such as animals, locations, weapons, and politicians.
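The homepage job described above might look roughly like the following Python sketch, which ranks the attackers or victims within one category by how often they appear. The input format, the `category_of` lookup standing in for the knowledge base, and ranking by raw count are all assumptions for illustration; the post doesn't say how "top" was actually scored.

```python
from collections import Counter


def top_by_category(pairs, category_of, category, role, k=3):
    """Rank attackers or victims of one knowledge-base category by mention count.

    pairs:       iterable of (attacker, victim) tuples taken from attack-type clauses
    category_of: dict mapping entity name -> category (stand-in for the knowledge base)
    role:        "attacker" ranks grammatical subjects, "victim" ranks objects
    """
    if role not in ("attacker", "victim"):
        raise ValueError("role must be 'attacker' or 'victim'")
    slot = 0 if role == "attacker" else 1
    counts = Counter(pair[slot] for pair in pairs
                     if category_of.get(pair[slot]) == category)
    return [entity for entity, _ in counts.most_common(k)]


pairs = [("alligator", "golfer"), ("shark", "surfer"), ("alligator", "dog"),
         ("shark", "swimmer"), ("alligator", "tourist")]
category_of = {"alligator": "animal", "shark": "animal", "dog": "animal",
               "golfer": "person", "surfer": "person", "swimmer": "person",
               "tourist": "person"}
print(top_by_category(pairs, category_of, "animal", "attacker"))  # ['alligator', 'shark']
```

Rerunning a function like this every few minutes over freshly extracted pairs is enough to keep per-category leaderboards current.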


Evri’s system can be thought of as an army of 7th-grade grammar students armed with a really large dictionary, or knowledge base. These 7th-grade grammar student algorithms scour news, blog and other web content, breaking every sentence down into key grammatical clauses such as the subject, verb, and object. The attackers in The Attack Machine are, in essence, the grammatical subjects; the verb is always attack or a related verb such as kill, assault, or maim; and the victims are the grammatical objects. So, for example, if a blog post contains a simple sentence or title such as “Israel attacks Hamas”, the post will appear on the attacker page for Israel and on the victim page for Hamas.
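The subject-to-attacker, object-to-victim mapping can be sketched in a few lines of Python. This assumes clauses arrive as already-extracted (subject, verb, object, document) tuples with lemmatized verbs; the `ATTACK_VERBS` set and `route_to_pages` function are hypothetical illustrations built from the verbs named above, not Evri's code.

```python
# Hypothetical verb family: "attack" plus the related verbs named in the post.
ATTACK_VERBS = {"attack", "kill", "assault", "maim"}


def route_to_pages(relations):
    """File documents under attacker (subject) and victim (object) pages.

    relations: iterable of (subject, verb, obj, doc_id) tuples whose verbs are
    assumed to be lemmatized already ("attacks" -> "attack").
    Returns two dicts mapping entity -> list of doc ids: attackers, victims.
    """
    attacker_pages: dict[str, list[str]] = {}
    victim_pages: dict[str, list[str]] = {}
    for subject, verb, obj, doc_id in relations:
        if verb not in ATTACK_VERBS:
            continue  # clauses with other verbs don't belong on this site
        attacker_pages.setdefault(subject, []).append(doc_id)
        victim_pages.setdefault(obj, []).append(doc_id)
    return attacker_pages, victim_pages


attackers, victims = route_to_pages([
    ("Israel", "attack", "Hamas", "post-1"),  # from "Israel attacks Hamas."
    ("chef", "cook", "dinner", "post-2"),
])
print(attackers, victims)  # {'Israel': ['post-1']} {'Hamas': ['post-1']}
```

Swapping `ATTACK_VERBS` for a different verb family is all it would take to retarget the same routing to love, hate, or any other verb.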

In a blog post titled Evri’s Garden Sprouts Some Search, I discuss in detail the underlying search mechanism our scientists and engineers use to power higher-level API and application functionality. One of the key higher-level API resources The Attack Machine uses is called Get relations about an entity. This resource is used, for example, to populate the center column of the individual attack pages, such as the one on alligator attacks.

More specifically, the following REST API call is used:

http://api.evri.com/v1/organism/alligator-0x397510/relations/verb/attack/?media=article&sort=date&includeMatchedLocations=true&appId=evri.com/blog

This API call allows us to automatically identify the key people, places, organizations and things involved in attacks, in addition to retrieving the articles that describe the latest attacks. If the user then clicks on a specific person, place, or thing, the following API call is used:

http://api.evri.com/v1/organism/alligator-0x397510/relations/verb/attack/location/florida-0x3154d?media=article&sort=date&includeMatchedLocations=true&appId=evri.com/blog

In this way, we can populate the page with data on all alligator attacks in Florida.
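Both calls follow one pattern: an entity path, a verb, an optional second entity to filter on, and the same query parameters. The Python sketch below builds those URL strings, assuming the path layout generalizes beyond the two examples shown (the post doesn't document other entity types or parameters). It only constructs URLs; it doesn't send requests or parse responses, since the response format isn't described here. The `relations_url` function and its parameter names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://api.evri.com/v1"


def relations_url(entity_path: str, verb: str, filter_path: Optional[str] = None,
                  app_id: str = "evri.com/blog") -> str:
    """Build a 'Get relations about an entity' URL in the shape shown above.

    entity_path: type/id of the entity, e.g. "organism/alligator-0x397510"
    filter_path: optional type/id of a second entity that narrows the results,
                 e.g. "location/florida-0x3154d"
    """
    path = f"{BASE_URL}/{entity_path}/relations/verb/{verb}/"
    if filter_path:
        path += filter_path
    params = {
        "media": "article",
        "sort": "date",
        "includeMatchedLocations": "true",
        "appId": app_id,
    }
    # safe="/" keeps the slash in appId unescaped, matching the example URLs
    return f"{path}?{urlencode(params, safe='/')}"


print(relations_url("organism/alligator-0x397510", "attack"))
print(relations_url("organism/alligator-0x397510", "attack",
                    "location/florida-0x3154d"))
```

Treating the second entity as an optional path suffix is what lets one page template serve both the general alligator page and the alligators-in-Florida drill-down.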

Finally, The Attack Machine uses the Evri API to generate unique natural language content, which benefits the reader and significantly helps page SEO. For example, the first and third paragraphs in the first column of the screenshot above are formulated automatically from API output. In addition, the Evri knowledge base is used to minimize human editorial work. For example, the second paragraph above is written by an editor and attached to one of three kinds of handle: a specific entity, a narrow category for an entity (such as animal or politician), or a higher-level category (such as person, organism, or location). At page generation time, simple logic selects the natural language content from the entity handle if it exists; if not, from the narrow category handle; and if that doesn’t exist either (we have thousands of narrow categories), from the higher-level category handle (there are only a handful of these).
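The fallback rule just described is easy to express directly. Here is a minimal Python sketch, assuming editorial text is stored in three hypothetical lookup tables, one per handle level; the function name, parameters, and sample strings are illustrative only.

```python
from typing import Mapping, Optional


def select_blurb(entity_id: str, narrow_category: str, broad_category: str,
                 entity_blurbs: Mapping[str, str],
                 narrow_blurbs: Mapping[str, str],
                 broad_blurbs: Mapping[str, str]) -> Optional[str]:
    """Return editor-written text from the most specific handle that has some.

    Handles are tried in order: the entity itself, its narrow category
    (e.g. animal), then its higher-level category (e.g. organism).
    """
    for blurbs, key in ((entity_blurbs, entity_id),
                        (narrow_blurbs, narrow_category),
                        (broad_blurbs, broad_category)):
        text = blurbs.get(key)
        if text:
            return text
    return None  # no editorial text at any level


narrow = {"animal": "(editor text for animals)"}
broad = {"organism": "(editor text for organisms)"}
print(select_blurb("alligator-0x397510", "animal", "organism", {}, narrow, broad))
```

Because the handful of higher-level categories can all be covered by editors, the last step of the fallback almost always finds something.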

That’s all for now. If you have any questions on how to use the Evri API for similar applications, or any other feedback, please let us know on our API forum.

Embed Evri’s Sentiment Widget

Tuesday, November 3rd, 2009

Check out our new sentiment widget. You can select your favorite topic to seed the widget. Then just select your blogging platform, or grab the code, and embed it in your web site or blog. Your readers can then get an up-to-the-minute assessment of how the web feels about your topic.

If your topic is Barack Obama, for example, your readers can see that 38% of the web feels positively about him, and 62% of the web is expressing negative sentiment about him. You can also see who Barack Obama’s top critics and praisers are, and then explore exactly what they are saying about the president. Play around with the drop-down boxes to see different sentiment expressions found on the web, and as always, please do send us your feedback.
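Figures like these are, at their simplest, aggregates over individual sentiment expressions. The Python sketch below assumes each expression about a topic has already been reduced to a hypothetical (speaker, polarity) pair, and derives the positive/negative split and the top praisers and critics from simple counts. It illustrates the arithmetic only; it is not how Evri's sentiment API actually computed its numbers.

```python
from collections import Counter


def sentiment_summary(expressions, k=3):
    """Summarize (speaker, polarity) pairs about one topic.

    polarity > 0 means positive, < 0 negative; 0 (neutral) is ignored.
    Returns (percent_positive, percent_negative, top_praisers, top_critics).
    """
    expressions = list(expressions)  # allow generators; we iterate twice
    praise = Counter(s for s, polarity in expressions if polarity > 0)
    criticism = Counter(s for s, polarity in expressions if polarity < 0)
    n_pos, n_neg = sum(praise.values()), sum(criticism.values())
    if n_pos + n_neg == 0:
        return 0, 0, [], []
    pct_pos = round(100 * n_pos / (n_pos + n_neg))
    return (pct_pos, 100 - pct_pos,
            [s for s, _ in praise.most_common(k)],
            [s for s, _ in criticism.most_common(k)])


sample = [("A", 1), ("B", -1), ("B", -1), ("C", -1), ("A", 1)]
print(sentiment_summary(sample))  # (40, 60, ['A'], ['B', 'C'])
```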

On a final note, the Evri sentiment widget is built using our sentiment API which you can read all about in this blog post titled Sentiment API Exposes Web’s Feelings.