How To Use Humans And Machines To Perfect Curation

There are still some tasks that can only be accomplished by people.

I remember going to Yahoo! back in 1995, just before everything exploded here in Silicon Valley. I was there with a number of cable TV executives; we had come down from Toronto to meet with representatives from @Home, Netscape, and Yahoo! to get the lay of the land prior to launching our own cable internet service in Canada (think Xfinity).

In the end, we never met with Netscape; this was about a week before they went public on the stock market, and they ended up not having enough time for us. Nevertheless, we strolled into the rear of that industrial unit in Mountain View, dressed formally in our business suits and ties. I distinctly recall a few things from that meeting, including how stuffy I felt in my suit and tie while everyone else was in ripped jeans and t-shirts (yes, even Jerry Yang, who we met with that day). Also, when we walked in, the first thing we saw was not a formal business reception desk, but rather someone in the lobby sitting at a workstation with a huge screen and surfing the internet.

This was the first thing that struck me as odd about the meeting. She was going through links to add to Yahoo!, which at the time was little more than a manually curated directory and not even a search engine at all. When we entered the room, she was sitting there with a large dog sprawled across her lap and surfing the web. Someone greeted us as we entered the lobby and led the six of us dressed conservatively in suits and ties into a tiny conference room on the right side of the foyer. They told us to go to the kitchen if we wanted anything to drink and to help ourselves from the refrigerator there. I remember opening it and seeing that it was stocked full of Jolt Cola and Twinkies (Jolt used to be the go-to drink for developers pulling all-nighters; I guess you could consider it the first energy drink – before Red Bull and Rock Star). Anyway, I had no clue where all of this was going to end up, but for those of us Canadian executives who were accustomed to working in corporate IT, the working atmosphere was very different from what we were used to. (In hindsight, I realize that I should have definitely approached Jerry about a job at that same moment, but who knew at the time?) We spoke about developing Yahoo! Canada, which would be the company’s first edition available in a country other than the US. The conversations were fruitful.

There were a lot of intriguing attempts to categorize the web before Google came along. Some of these attempts were done manually through human curation, such as Yahoo!, while others were done algorithmically through services such as Alta Vista and Lycos, which were very significant at the time. Both had their flaws, but one of the reasons Yahoo! was so successful (they did practically invent banner advertising and were successful enough to buy the company that invented text advertising, which was formerly known as GoTo.com and is now known as Overture) was that there was nothing like human curation of the web. The links contained in the first version of the Yahoo! directory were of the highest possible standard, since each and every one of them had been reviewed and approved by a real person. There was no algorithm involved for anyone to game.

Now, back in those days, it may have been easy to keep a fairly up-to-date listing of the best stuff on the internet simply by hand curating it. Of course, the size and scope of the internet then exploded, and there was no way to meet the need simply by continuing to hand curate everything.

The difficulty is that when you get rid of that manual curation, you end up losing a significant amount of the directory’s overall quality. It is a constant struggle for the developers at search engines like Google to tweak and tweak their algorithms in order to keep the most relevant content on top (oh, and of course, don’t forget the best paying ads). The links that are handed to you by the algorithm are just not as good, or they can be gamed. Humans are still required for the curation process.

Hand curation will invariably result in the delivery of the highest-quality and most pertinent content to you. But how can you manually select content when the internet is expanding at a pace of 500 percent each year (and that’s just an average; certain regions, like Africa and the Middle East, are expanding at a rate of 2,000 to 3,000 percent each year)? Crowdsourcing. But crowdsourcing done the right way. The concept of crowdsourcing in its modern application is essentially an expansion, in both senses of the word, of the interaction and community that have always been on the internet and even before it.

The problem with the idea of crowdsourcing is that it is mostly unstructured and, in essence, a crapshoot. You may post your question one day and receive a ton of really helpful replies very quickly, or you could post your query and get nothing at all in response. It really depends on the quality of the process, whether the question goes out to the right crowd, and so on; there are a lot of parameters that need to be met in order for crowdsourcing to work. (If you want an example of this type of variability, just try posting a question on either of the above. I posted a question on startups and I got an immediate response. Another day, I posted a question on kettlebell training and I didn’t get a response for months.)

That is only one illustration of it; other services, such as Amazon’s Mechanical Turk, use a somewhat similar model but also assign work to their users. For instance, let’s imagine that you wanted to conduct research on a specific topic (like promoting a local business on the internet), and perhaps create a list of the most prominent websites in a given category. This is one example of a discrete task that could be delegated to a crowdsourcing service. If the service were sophisticated enough, it would be able to share the task and compile the results; for instance, the system could assign the task of choosing the best website for small business marketing to one hundred different people and then simply compile and sort their responses. In this scenario, the crowdsourcing service would also test and determine who the best humans for the task would be. The majority of “crowdsourcing” done in today’s world isn’t really crowdsourcing at all, because it usually involves delegating work to a single individual and doesn’t include any sifting of the replies.
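The "compile and sort" step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `compile_responses` is hypothetical, each worker is assumed to return a single free-text answer, and simple vote counting stands in for whatever sifting a real service would apply.

```python
from collections import Counter


def compile_responses(responses):
    """Tally free-text answers from many workers and rank them by vote count.

    Hypothetical helper: assumes one short answer per worker.
    """
    # Normalize so that "Example.com " and "example.com" count as one answer,
    # and drop empty submissions.
    cleaned = [r.strip().lower() for r in responses if r and r.strip()]
    counts = Counter(cleaned)
    # Most votes first; ties are broken alphabetically so the order is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Calling `compile_responses` on the hundred answers would give a ranked list of (answer, votes) pairs, which is the sifting that delegating to a single individual cannot provide.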

Human curation on its own, in the form of the original Yahoo!, with individual surfers reading and assessing each site, is not scalable. However, a new, systematic method of crowdsourcing results, which allows for human input from anybody who interacts with a site, is quite feasible. My impression is that, in our pursuit of algorithmic and automated relevance, we have either completely ignored the requirement that humans be involved in the process of determining relevance or drastically downplayed its importance.

There are many who question the necessity of crowdsourcing in the first place. Shouldn’t we be able to train computers to perform the task instead of people eventually? First of all, there are a lot of jobs that we haven’t figured out how to program computers to perform, and second of all, there are a lot of things that need to be done that call for a human touch. Some of these questions may be answered in the affirmative, while others must be answered in the negative. The dilemma that arises therefore is this: how can we ensure that crowdsourcing is successful?

Consider the situation in this light. If we can’t even teach computers how to talk to us in a way that allows them to understand what we mean by what we say, how on earth are we going to train them to do work that requires essentially no effort on the part of a person but takes them a significant amount of time? Leveraging the power of the crowd will be essential in order to execute jobs that need human intervention, such as researching a topic or providing clarification on something.

I have no doubt in my mind that the next generation of incredibly powerful internet and web-based apps will not only need to capture precise intent, but will also need to harness the crowd to deliver human insight. If we devise structured methods for processing the work for and from the crowd, and use techniques to train and improve the crowd’s response, then we should have an incredibly powerful force of ability and knowledge that can truly create the next web.

How do we make the most of the collective effort? Either we can employ the data that the community provides by examining and altering things like individual reviews and ratings, or we can make use of the crowd in real-time by utilizing some form of rapid reply system. Both of these options allow us to harness the power of the crowd.

As an illustration of how we might be able to benefit from or make use of crowdsourcing in order to enhance the dining experience at restaurants in the future, consider the following example:

Imagine that I have an appointment in San Francisco at three in the afternoon and that I am driving there. The system is aware that I eat supper at about six o’clock in the evening, since, according to my calendar and my past habits, that is my normal dinner time. Because it has gleaned the location of my appointment in San Francisco from my calendar, it is aware of where I will be for my meeting, which lasts from, for example, four in the afternoon to five-thirty in the evening.

While I am driving up to the city, the system knows where I am headed because it has information from my GPS, and it can determine that I am moving at 65 miles per hour up Highway 101 (which is, in hindsight, a little improbable at 3 in the afternoon on a weekday). Because it is aware that I am traveling at that speed, it knows not to text me about possible places to eat after the meeting.

The system understands that I will make it to my appointment on time given the distance I have to travel, my pace, and the amount of traffic, so it does not tell me, “Hey, you know, maybe you should get off here and take 280 the rest of the way since 101 is congested.” It is aware of all of these things, including the precise times at which to speak to me and the times at which to refrain from disturbing me. It is aware that I am currently in the car, and it may even be aware that I am listening to the radio, or to Rhapsody or another music application on my phone. Since it is trained not to bother me during the meeting, it knows that if it is going to ask me about dinner, it has to ask before I get to my meeting. Therefore, it waits until, say, fifteen minutes before I’m supposed to get to the parking garage and park my car, and then it says, “Hey, sorry to bother you, but I realize that you might be hungry after your meeting, so I’ve taken the liberty of looking at some restaurants in the area around where you’re having the meeting.” This is the point at which I can answer either “yes” or “no.”
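The timing rule in this scenario (ask shortly before I arrive, never during the meeting) could be expressed roughly as follows. This is a minimal sketch, not a real assistant's logic: the function name `should_prompt` and its parameters are hypothetical, and the arrival estimate is assumed to come from some outside source such as GPS and traffic data.

```python
from datetime import datetime, timedelta


def should_prompt(now, arrival_eta, meeting_start, meeting_end,
                  lead=timedelta(minutes=15)):
    """Return True only in the short window just before arrival.

    Hypothetical rule: prompt within `lead` of the estimated arrival time,
    and never once the driver has arrived or the meeting is under way.
    """
    if meeting_start <= now <= meeting_end:
        return False  # trained not to interrupt the meeting
    if now >= arrival_eta:
        return False  # already parking or arrived; too late to ask
    return arrival_eta - now <= lead
```

The fifteen-minute lead is simply the figure from the example above; a real system would presumably tune it per person.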

It is aware of my preferences. It is aware of the dietary constraints I have. It is aware of the kinds of food that I enjoy eating, and if I’ve been to restaurants in the region before and given them a positive rating on Yelp, it is aware of the types of restaurants that I enjoy in the surrounding area. I might say, “Yes, I would like to know where to go for dinner after the meeting.” It may ask, “Are you interested in experiencing something different?” I’d give it a “yes,” and after that, the system would automatically de-prioritize the restaurants on the list that I’d gone to before, as well as take into account the fact that I enjoy steak. The very first thing it says is, “There is a wonderful steakhouse only a few steps away from where you are having your meeting. It is known as the XXX grill. Do you want me to check whether they have a table available for you?” So I say, “Sure.” Following that, it would ask, “Shall I book it for two, because your wife is at another meeting not too far away and might be able to join you?”
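One way to picture the re-ranking step is as a simple scoring function. The sketch below is an assumption-laden illustration: the `Restaurant` fields, the `rank` function, and the bonus and penalty weights are all hypothetical, chosen only to show how "prefers steak" and "wants something new" could shift an ordering based on review scores.

```python
from dataclasses import dataclass


@dataclass
class Restaurant:
    name: str
    cuisine: str
    rating: float   # assumed average review score on a 0-5 scale
    visited: bool   # whether the user has been there before


def rank(restaurants, liked_cuisines, want_new,
         cuisine_bonus=1.0, visited_penalty=2.0):
    """Order restaurants by rating, adjusted for the user's stated preferences."""
    def score(r):
        s = r.rating
        if r.cuisine in liked_cuisines:
            s += cuisine_bonus      # e.g. the user enjoys steak
        if want_new and r.visited:
            s -= visited_penalty    # de-prioritize places already visited
        return s

    return sorted(restaurants, key=score, reverse=True)
```

With these made-up weights, a highly rated steakhouse I have already visited drops below an unvisited one as soon as I say I want something different.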

It does a reservation check, as well as an internet check, a Yelp check, and an Open Table check. If Open Table does not have an open table at this restaurant, it will actually find a website, find the number, pick up the phone, call the restaurant, and say something along the lines of, “I’m calling for Mr. Kalaboukis. Table for two at 6 PM: do you have any available? Press 1 for yes, 2 for no, or enter a message that I can pass on to him.” It is truly capable of making reservations for me just like a human concierge would.

Do not try to convince me that we do not have the ability to carry that out at this time.

How does the system determine which eateries offer high-quality food? It collects reviews from Yelp and any other sources it can find and puts them all together in order to decide which restaurant is the best. Do you remember when it asked whether I was interested in trying somewhere new? At that point, it went out and retrieved some data. Let’s assume there was no information about nearby eateries to be found online. In that case, the question would be sent to a network of experts who had previously signed up to answer inquiries in real time. The query would be something along the lines of, “My boss is going to be in this region at that time, and he enjoys steak. Can you recommend a place?” This question would go to a group of individuals who live and/or work in the area in question. These individuals reply, the data is classified and sorted, and then it is delivered back to me. In addition, we could in theory offer the data to Yelp for inclusion in their database. Again, all of these systems are already operational; all that has to be done is to bring them together and put them into use.
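The flow described here, using online reviews when they exist and otherwise falling back to a real-time expert network, might look something like the sketch below. Everything in it is hypothetical: `recommend` is an illustrative name, the reviews are assumed to arrive as a mapping of restaurant name to scores, and `ask_experts` stands in for whatever service would actually route the question to local people.

```python
from collections import Counter
from statistics import mean


def recommend(online_reviews, ask_experts, top_n=3):
    """Rank restaurants from online reviews, or fall back to asking people.

    online_reviews: dict mapping restaurant name -> list of review scores.
    ask_experts: callable returning a list of restaurant names suggested
                 by local experts (a stand-in for a real-time expert network).
    Returns (ranked_names, source), where source is "reviews" or "experts".
    """
    averages = {name: mean(scores)
                for name, scores in online_reviews.items() if scores}
    if averages:
        ranked = sorted(averages, key=lambda n: (-averages[n], n))
        return ranked[:top_n], "reviews"

    # No usable data online: send the question to the crowd and tally replies.
    votes = Counter(a.strip() for a in ask_experts() if a.strip())
    ranked = sorted(votes, key=lambda n: (-votes[n], n))
    return ranked[:top_n], "experts"
```

The expert answers are tallied the same way the review scores are ranked, so either path hands back a sorted list.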

In my opinion, we place an excessive amount of reliance on the ability of algorithms to provide us with the precise outcomes we seek. In order to arrive at a conclusion that is of any benefit, we are going to need the assistance of other people. It is possible that we will need to sift and update this information using programmed methods; nonetheless, it is imperative that we utilize human input at some point along the path in order to obtain the desired outcomes. We can begin with a set of results generated by an algorithm and then apply human judgment to them in order to come up with a blended result that provides the advantages of both approaches. Or the human input could be at the beginning of the process, where we use actual human curators to pull together relevant content in a particular content area and then algorithmically revis

The methodical and effective use of crowdsourcing will become the second pillar of the web that will replace the current one. Work will be envisioned, assigned, diced, finished, recombined, and delivered, and it will truly respond to the user’s genuine purpose throughout the entire process.
