A Successful Elasticsearch Boston NLP/ML Hackday

March 27, 2013

Boy, has the Elasticsearch Boston meetup group grown by leaps and bounds in just a few short months! It all started back in August 2012 when we organized HaackrDay. We were looking for ways to get involved in the local dev community and to meet and network with smart people. And we definitely did.

Igor Motov, a major committer for the Elasticsearch project, showed up at the event and we got to chat about Elasticsearch and its adoption by companies around Boston. Traackr had recently made the switch to this new search engine and we were eager to see what other people’s experiences were.

At the time, Igor worked at Sonian and was responsible for one of the largest Elasticsearch clusters in existence. We agreed that we all wanted to see the technology more widely used, and felt that one way we could contribute would be to help kick-start the local Boston meetup group. So that's exactly what Igor and I set out to do.

From the beginning, we decided to avoid setting the format in stone. We wanted things to be open-ended so that we could adapt to the various members' needs and expectations. This led us to a good mix of experienced users as well as newcomers who just wanted to find out what the technology was all about. The next natural step was to let people get some hands-on familiarity with the technology itself. And to do that, we needed searchable data.

To make things a bit more interesting, we decided to add to the mix something that very often goes hand-in-hand with search: Natural Language Processing (NLP). NLP is often used in the field of Information Retrieval to extract metadata out of free form text. This metadata can then be made searchable on top of the unstructured text to provide even better search results.
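As a rough illustration of that idea, here is a minimal Python sketch. It is not what anyone built at the hackday: the function names, the stopword list and the "body"/"keywords" field names are all made up for this example, and a simple word-frequency count stands in for real NLP. It only shows the shape of the result, a document that carries both the raw text and the extracted metadata, ready to be handed to a search engine.

```python
import re
from collections import Counter

# Tiny, hypothetical stopword list; a real NLP toolkit would ship a fuller one.
STOPWORDS = {"the", "and", "for", "when", "has", "with", "that", "this", "are"}


def extract_keywords(text, top_n=3):
    """Return the most frequent non-stopword terms (a crude stand-in for NLP)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


def to_search_document(text):
    """Pair the free-form text with extracted metadata in one document."""
    return {"body": text, "keywords": extract_keywords(text)}


doc = to_search_document(
    "Search engines index text. Search quality improves when text has metadata."
)
print(doc["keywords"])
```

A search engine could then index the "keywords" field alongside "body", so that a query can match or filter on the metadata as well as on the unstructured text.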

At Traackr, we are looking at using NLP to gain a better understanding of what our influencers write about on a daily basis. So we reached out to our friends at Embed.ly, who are already NLP experts, for some pointers, and the Elasticsearch Boston NLP/ML Hackday was born.

We ended up selling out all of our tickets days in advance of the event. The event itself was a great success: we kicked off the day with Kawandeep Virdee from Embed.ly, who gave an excellent talk on NLP. Igor followed up with a presentation on Elasticsearch plugins. We then opened up the floor to ideas attendees wanted to pursue and let teams organize organically around them. This was a huge hit.

By 11am, we had about six or seven different teams hacking away. The data provided to them included 38 million articles from the Traackr index, 13 million Wikipedia records, 500K Enron emails and, of course, access to the Embed.ly and Traackr APIs. Food and drinks were also provided. We concluded the day with a few notable projects that won books sponsored by Manning Publications (that’s right, we got them to sponsor too!):

  • One team leveraged Latent Dirichlet allocation and the MAchine Learning for LanguagE Toolkit (MALLET) to parse technical articles and attempt to recommend other similar articles.
  • Dharmendra Kanejiya from Cognii, who has already built an educational tool that provides automatic assessment of descriptively written essay-type answers, spent the day building the UI around that tool.
  • Alex Lambert from Spindle processed articles from The Economist from the past 5 years and attempted to rate them on 3 different mood scales to see if indeed the world is coming to an end ( :-) seriously, no joke). Thankfully, he came to the conclusion that the world is not ending, so we are all safe for now.

What really made the event that much more enjoyable was the fact that the good folks at Hack/Reduce reached out to us to donate the space and server cluster for the day.

Hack/Reduce is a non-profit organization whose mission is to promote the learning and use of big-data technologies. They felt, just like we did, that the event was a great fit for their space. So a big thank you goes out to them (Andrew LaPrade, Adrienne Cochrane and team) for helping us out with the logistics. Also, a big thank you to the Hopper team (Joost Ouwerkerk, Philippe Laflamme and Greg Lu) for setting up the cluster for us and providing tech support throughout the day.

Here's hoping there are many more to come.

