兜兜里有地雷

How to Use Big Data to Make the Right Decisions

2019-08-17 Source: 佛山视窗

Video from TED


Roy Price is a man that most of you have probably never heard about, even though he may have been responsible for 22 somewhat mediocre minutes of your life on April 19, 2013. He may have also been responsible for 22 very entertaining minutes, but not very many of you. And all of that goes back to a decision that Roy had to make about three years ago.

Roy Price 这个人, 在座的绝大多数可能都没听说过, 即使他曾经在 2013 年 4 月 19 日这一天占用了你们生命中普通的 22 分钟。 他也许曾经带给了 各位非常欢乐的 22 分钟, 但对你们当中很多人来说 可能并不是这样。 而这一切全部要追溯到 Roy 在三年前的一个决定。


So you see, Roy Price is a senior executive with Amazon Studios. That's the TV production company of Amazon. He's 47 years old, slim, spiky hair, describes himself on Twitter as "movies, TV, technology, tacos." And Roy Price has a very responsible job, because it's his responsibility to pick the shows, the original content that Amazon is going to make. And of course that's a highly competitive space. I mean, there are so many TV shows already out there, that Roy can't just choose any show. He has to find shows that are really, really great. So in other words, he has to find shows that are on the very right end of this curve here.

实际上,Roy Price 是 亚马逊广播公司的一位资深决策者。 这是亚马逊旗下的一家电视节目制作公司。 他 47 岁,身材不错, 短发梳得很有型, 他在 Twitter 上形容自己是 "电影、电视、科技、墨西哥卷饼(爱好者)"。 Roy Price 有一个责任非常重大的工作, 因为他要负责帮亚马逊挑选即将制作的原创节目。 当然,这个领域的竞争非常激烈。 我是说,其他公司已经有 那么多的电视节目, Roy 不能只是随便乱挑一个节目。 他必须找出真正会走紅的节目。 换句话说,他挑选的节目必须落在这条曲线的右侧。


So this curve here is the rating distribution of about 2,500 TV shows on the website IMDB, and the rating goes from one to 10, and the height here shows you how many shows get that rating. So if your show gets a rating of nine points or higher, that's a winner. Then you have a top two percent show. That's shows like "Breaking Bad," "Game of Thrones," "The Wire," so all of these shows that are addictive, where, after you've watched a season, your brain is basically like, "Where can I get more of these episodes?" That kind of show. On the left side, just for clarity, here on that end, you have a show called "Toddlers and Tiaras" -- which should tell you enough about what's going on on that end of the curve.

这条曲线是 IMDB (译注:网络电影资料库)里 2500 个电视节目的 客户评分曲线图, 评分从 1 到 10,纵轴表明有多少节目达到这个评分。 所以如果你的节目达到 9 分或更高,你就是赢家,你就拥有那 2%的顶尖节目。例如像是“绝命毒师”、“权力的游戏”、“火线重案组”,全部都是会让人上瘾的节目, 看完一季之后, 你基本马上就会想,“我要去哪里找到剩下的剧集?” 基本就是这类的节目。曲线左边,不妨选个最靠边, 比较明显的点, 这儿有个叫“选美小天后" (译注:儿童选秀类)的节目——足够让你明白 曲线最左端代表了什么。


Now, Roy Price is not worried about getting on the left end of the curve, because I think you would have to have some serious brainpower to undercut "Toddlers and Tiaras." So what he's worried about is this middle bulge here, the bulge of average TV, you know, those shows that aren't really good or really bad, they don't really get you excited. So he needs to make sure that he's really on the right end of this.

现在,Roy Price并不担心会选个落在曲线最左边的节目, 因为我认为你们都具备严肃的判断力来给 "选美小天后" 打个低分 。 他担心的是中间多数的这些节目, 多到爆的这些一般的电视节目, 这些节目不算好,但也不是很烂, 它们不会真正地让你感兴趣。 所以他要确保他看好的节目是落在最右端这里。


So the pressure is on, and of course it's also the first time that Amazon is even doing something like this, so Roy Price does not want to take any chances. He wants to engineer success. He needs a guaranteed success, and so what he does is, he holds a competition.

那么压力就来了, 当然,这也是亚马逊第一次想要做这类事情, 所以Roy Price不想只是碰运气。 他想要打造成功。 他要一个万无一失的成功, 于是,他举办了一个竞赛。


So he takes a bunch of ideas for TV shows, and from those ideas, through an evaluation, they select eight candidates for TV shows, and then he just makes the first episode of each one of these shows and puts them online for free for everyone to watch. And so when Amazon is giving out free stuff, you're going to take it, right? So millions of viewers are watching those episodes.

他带来了很多关于电视节目的想法, 通过一个评估, 他们挑了八个候选的电视节目, 然后他为每一个节目制作了第一集, 再把它们放到网上, 让每个人免费观看。 当亚马逊要给你免费的东西时, 你就会拿,对吧? 所以几百万人在看这些剧集。


What they don't realize is that, while they're watching their shows, actually, they are being watched. They are being watched by Roy Price and his team, who record everything. They record when somebody presses play, when somebody presses pause, what parts they skip, what parts they watch again. So they collect millions of data points, because they want to have those data points to then decide which show they should make. And sure enough, so they collect all the data, they do all the data crunching, and an answer emerges, and the answer is, "Amazon should do a sitcom about four Republican US Senators." They did that show.

而这些人不知道的是, 当他们在观看节目的时候, 实际上他们也正被观察着。 他们被 Roy Price及他的团队观察, 他们纪录了所有的一切。 他们纪录了哪些人按了拨放, 哪些人按了暂停, 哪些部分他们跳过了, 哪些部分他们又重看了一遍。 他们收集了几百万个数据,因为他们想要用这些数据来决定 要做什么样的节目。 理所当然,他们收集了所有的数据,处理过后得到了一个答案, 而答案就是, “亚马逊需要制作一个有关四个美国共和党参议员的喜剧”。 他们真的做了。


So does anyone know the name of the show? (Audience: "Alpha House.") Yes, "Alpha House," but it seems like not too many of you here remember that show, actually, because it didn't turn out that great. It's actually just an average show, actually -- literally, in fact, because the average of this curve here is at 7.4, and "Alpha House" lands at 7.5, so a slightly above average show, but certainly not what Roy Price and his team were aiming for. Meanwhile, however, at about the same time, at another company, another executive did manage to land a top show using data analysis, and his name is Ted, Ted Sarandos, who is the Chief Content Officer of Netflix, and just like Roy, he's on a constant mission to find that great TV show, and he uses data as well to do that, except he does it a little bit differently. So instead of holding a competition, what he did -- and his team of course -- was they looked at all the data they already had about Netflix viewers, you know, the ratings they give their shows, the viewing histories, what shows people like, and so on. And then they use that data to discover all of these little bits and pieces about the audience: what kinds of shows they like, what kind of producers, what kind of actors. And once they had all of these pieces together, they took a leap of faith, and they decided to license not a sitcom about four Senators but a drama series about a single Senator. You guys know the show?

有人知道这个节目吗? (观众:" 阿尔法屋。") 是的,就是"阿尔法屋"。 但看起来你们大部人都不记得有这部片子,因为这部片子收视率并不太好。它其实只是个一般的节目,实际上,一般的节目差不多对应曲线上大概 7.4 分的位置,而 “阿尔法屋” 落在 7.5 分,所以比一般的节目高一点点, 但绝对不是 Roy Price 和 他的团队想要达到的目标。 但在差不多同一时间,另一家公司的另一个决策者,同样用数据分析却做出了一个顶尖的节目,他的名字是 Ted, Ted Sarandos 是 Netflix 的 首席内容官, 就跟 Roy 一样,他也要不停地寻找最棒的节目, 而他也使用了数据分析, 但他的做法有点不太一样。 不是举办竞赛,他和他的团队观察了 Netflix 已有的所有观众数据, 比如观众对节目的评分、 观看纪录、哪些节目最受欢迎等等。 他们用这些数据去挖掘观众的所有小细节: 他们喜欢什么类型的节目、 什么类型的制作人、 什么类型的演员。 就在他们收集到全部的细节后, 他们信心满满地决定要制作一部, 不是四个参议员的喜剧, 而是一系列有关一位单身参议员的电视剧。 各位知道那个节目吗?


Yes, "House of Cards," and Netflix, of course, nailed it with that show, at least for the first two seasons.

是的,“纸牌屋”, 当然,Netflix 至少在头两季在这个节目上赚到了极高的收视率。


"House of Cards" gets a 9.1 rating on this curve, so it's exactly where they wanted it to be.

“纸牌屋” 在这个曲线上拿到了 9.1 分, 他们绝对实现了最初的目标。


Now, the question of course is, what happened here? So you have two very competitive, data-savvy companies. They connect all of these millions of data points, and then it works beautifully for one of them, and it doesn't work for the other one. So why? Because logic kind of tells you that this should be working all the time. I mean, if you're collecting millions of data points on a decision you're going to make, then you should be able to make a pretty good decision. You have 200 years of statistics to rely on. You're amplifying it with very powerful computers. The least you could expect is good TV, right?

很显然,问题来了,这到底是怎么回事? 有两个非常有竞争力、精通数据分析的公司。他们整合了所有的数据,然后,其中一个干的很漂亮,而另一个却没有,这是为什么呢? 毕竟逻辑分析会告诉你,这种方法应该每次都有效啊,我是说,如果你收集了所有的数据来制定一个决策,那你应该可以得到一个 相当不错的决策。你有 200 年的统计方法做后盾。你用高性能的计算机去增强它的效果。 至少你可以期待得到一个 还不错的电视节目,对吧?


And if data analysis does not work that way, then it actually gets a little scary, because we live in a time where we're turning to data more and more to make very serious decisions that go far beyond TV. Does anyone here know the company Multi-Health Systems? No one. OK, that's good actually. OK, so Multi-Health Systems is a software company, and I hope that nobody here in this room ever comes into contact with that software, because if you do, it means you're in prison.

但如果数据分析并没有想像中的有效, 这就有点吓人了, 因为我们生活在一个 越来越依赖数据的时代, 我们要用数据做出远比电视节目还要严肃重要的决策。 你们当中有人知道 "MHS" 这家公司吗? 没人?好,这就好。 好的,MHS是一家软件公司, 而我希望在座的各位没人与他们的软件有任何关系, 因为如果你有, 就表示你犯了罪被判刑了。


If someone here in the US is in prison, and they apply for parole, then it's very likely that data analysis software from that company will be used in determining whether to grant that parole. So it's the same principle as Amazon and Netflix, but now instead of deciding whether a TV show is going to be good or bad, you're deciding whether a person is going to be good or bad. And mediocre TV, 22 minutes, that can be pretty bad, but more years in prison, I guess, even worse.

如果有人在美国被判入狱, 要申请假释, 很有可能那家公司的数据分析软件 就会被用来判定你是否能获得假释。 它也是采用跟亚马逊和 Netflix 公司相同的原则, 但并不是要决定某个电视节目收视率的好坏, 而是用来决定 一个人将来的行为是好是坏。 一个 22 分钟的普通电视节目可以很糟糕, 但我觉得要坐很多年的牢,更糟糕。


And unfortunately, there is actually some evidence that this data analysis, despite having lots of data, does not always produce optimum results. And that's not because a company like Multi-Health Systems doesn't know what to do with data. Even the most data-savvy companies get it wrong. Yes, even Google gets it wrong sometimes.

但不幸的是,实际上已经有证据显示, 这项数据分析尽管可以依靠庞大的数据资料, 它并不总能得出最优的结果。但并不只有像 MHS 这样的软件公司不确定到底怎么分析数据,就连最顶尖的数据公司也会出错。 是的,甚至谷歌有时也会出错。


In 2009, Google announced that they were able, with data analysis, to predict outbreaks of influenza, the nasty kind of flu, by doing data analysis on their Google searches. And it worked beautifully, and it made a big splash in the news, including the pinnacle of scientific success: a publication in the journal "Nature." It worked beautifully for year after year after year, until one year it failed. And nobody could even tell exactly why. It just didn't work that year, and of course that again made big news, including now a retraction of a publication from the journal "Nature." So even the most data-savvy companies, Amazon and Google, they sometimes get it wrong. And despite all those failures, data is moving rapidly into real-life decision-making -- into the workplace, law enforcement, medicine. So we should better make sure that data is helping.

2009 年,谷歌宣布他们可以用数据分析来预测流行性感冒何时爆发, 就是那种讨人厌的流感, 他们用自己的搜寻引擎来做数据分析。 结果证明它准确无比, 引得各路新闻报道铺天盖地, 甚至还达到了一个科学界的顶峰: 在 “自然” 期刊上发表了文章。 之后的每一年,它都预测得准确无误, 直到有一年,它失败了。 没有人知道到底是什么原因。 那一年它就是不准了, 当然,这又成了一个大新闻, 包括现在被 "自然” 期刊撤稿。 所以,即使是最顶尖的数据分析公司, 亚马逊和谷歌, 他们有时也会出错。 但尽管出现了这些失败, 数据仍然在马不停蹄地渗透进我们实际生活中的决策—— 进入工作场所、 执法过程、 医药领域。 所以,我们应该确保数据是能够帮助我们解决问题的。


Now, personally I've seen a lot of this struggle with data myself, because I work in computational genetics, which is also a field where lots of very smart people are using unimaginable amounts of data to make pretty serious decisions like deciding on a cancer therapy or developing a drug. And over the years, I've noticed a sort of pattern or kind of rule, if you will, about the difference between successful decision-making with data and unsuccessful decision-making, and I find this a pattern worth sharing, and it goes something like this.

我个人也曾经多次 被数据分析搞的焦头烂额, 因为我在计算遗传学领域工作, 这个领域有很多非常聪明的人在用多到难以想像的数据 来制定相当严肃的决策, 比如癌症治疗,或者药物开发。 经过这几年,我已经注意到一种模式 或者规则,你们也可以这么理解, 就是有关于用数据做出成功决策和不成功决策,我觉得这个模式值得分享,大概是这样的。


So whenever you're solving a complex problem, you're doing essentially two things. The first one is, you take that problem apart into its bits and pieces so that you can deeply analyze those bits and pieces, and then of course you do the second part. You put all of these bits and pieces back together again to come to your conclusion. And sometimes you have to do it over again, but it's always those two things: taking apart and putting back together again.

当你要解决一个复杂问题时, 你通常必然会做两件事。 首先,你会把问题拆分得非常细,这样你就可以深度地分析这些细节, 当然你要做的第二件事就是, 再把这些细节重新整合在一起,来得出你要的结论。有时候你必须重复几次,但基本都是围绕这两件事:拆分、再整合。


And now the crucial thing is that data and data analysis is only good for the first part. Data and data analysis, no matter how powerful, can only help you taking a problem apart and understanding its pieces. It's not suited to put those pieces back together again and then to come to a conclusion. There's another tool that can do that, and we all have it, and that tool is the brain. If there's one thing a brain is good at, it's taking bits and pieces back together again, even when you have incomplete information, and coming to a good conclusion, especially if it's the brain of an expert.

那么关键的问题在于,数据和数据分析只适用于第一步,无论数据和数据分析多么强大,它都只能帮助你拆分问题和了解细节, 它不适用于把细节 重合整合在一起来得出一个结论。 有一个工具可以实现第二步, 我们每个人都有, 那就是大脑。 如果要说大脑很擅长某一件事, 那就是,它很会把琐碎的细节重新整合在一起, 即使你拥有的信息并不完整,也能得到一个好的结论, 特别是专家的大脑。


And that's why I believe that Netflix was so successful, because they used data and brains where they belong in the process. They use data to first understand lots of pieces about their audience that they otherwise wouldn't have been able to understand at that depth, but then the decision to take all these bits and pieces and put them back together again and make a show like "House of Cards," that was nowhere in the data. Ted Sarandos and his team made that decision to license that show, which also meant, by the way, that they were taking a pretty big personal risk with that decision. And Amazon, on the other hand, they did it the wrong way around. They used data all the way to drive their decision-making, first when they held their competition of TV ideas, then when they selected "Alpha House" to make as a show. Which of course was a very safe decision for them, because they could always point at the data, saying, "This is what the data tells us." But it didn't lead to the exceptional results that they were hoping for.

而这也是我相信 Netflix 会这么成功的原因, 因为他们在分析过程中同时 使用了数据和大脑。 他们利用数据, 首先去了解观众的若干细节, 没有这些数据, 他们不可能进行这么透彻的分析, 但在之后,要做出重新整合, 制作像"纸牌屋"这样的节目的决策, 就无法依赖数据了。 是 Ted Sarandos 和他的团队(通过思考) 做出了批准该节目的这个决策, 这也就意味着, 他们在做出决策的当下, 也正在承担很大的个人风险。 而另一方面,亚马逊把事情搞砸了。 他们全程依赖数据来制定决策, 首先,举办了关于节目创意的竞赛, 然后他们决定选择制作 "阿尔法屋"。 当然,对他们而言, 这是一个非常安全的决策, 因为他们总是可以指着数据说,“这是数据告诉我们的。” 但数据并没有带给他们 满意的结果。


So data is of course a massively useful tool to make better decisions, but I believe that things go wrong when data is starting to drive those decisions. No matter how powerful, data is just a tool, and to keep that in mind, I find this device here quite useful. Many of you will ...

当然,数据依然是做决策时的一个强大的工具, 但我相信,当数据开始主导这些决策时, 并不能保证万无一失。 不管它有多么的强大, 数据都仅仅是一个工具, 记住这句话之后, 我发现这个装置相当有用。 你们很多人就会......


Before there was data, this was the decision-making device to use.

在有数据之前, 这就是用来做决策的工具。


Many of you will know this. This toy here is called the Magic 8 Ball, and it's really amazing, because if you have a decision to make, a yes or no question, all you have to do is you shake the ball, and then you get an answer -- "Most Likely" -- right here in this window in real time. I'll have it out later for tech demos.

你们很多人应该知道这个玩意儿。 这个玩具称做“魔术 8 号球”, 它真的很奇妙, 因为如果你要做一个“是” 或 “不是” 的决策时, 你只要摇一摇这颗球, 就可以得到答案了—— “很有可能是”—— 在这个视窗里,马上就可以看到。 我回头会带它去做技术示范。


Now, the thing is, of course -- so I've made some decisions in my life where, in hindsight, I should have just listened to the ball. But, you know, of course, if you have the data available, you want to replace this with something much more sophisticated, like data analysis to come to a better decision. But that does not change the basic setup. So the ball may get smarter and smarter and smarter, but I believe it's still on us to make the decisions if we want to achieve something extraordinary, on the right end of the curve. And I find that a very encouraging message, in fact, that even in the face of huge amounts of data, it still pays off to make decisions, to be an expert in what you're doing and take risks. Because in the end, it's not data, it's risks that will land you on the right end of the curve.

事实上,当然—— 我已经在我人生中做出了一些决定, 虽然事后证明, 我当初应该直接用这颗球。 但,当然,如果你手里有数据, 你就会想用更尖端的方式 来取代这颗球, 比方说,用数据分析来得到更好的决策。 但这无法改变基本的设定。 这球可能会变得越来越智能, 但我相信,如果我们想达成某些 像曲线最右端那样 出色的成就,最后的决定权还是应该落在我们身上。 事实上,我还发现了一件非常鼓舞人心的事, 即使面对庞大的数据, 当你要做出决定, 想要变成一位该领域的专家并承担风险时, 你仍然会有很大的收获。 因为到最后,不是数据,而是风险,会把你引到曲线的最右端。


Thank you.

谢谢。

Text content may not be reproduced without authorization.

Original article address: http://www.qbiki.com/zhougong/20813.html
(This article was compiled by 兜兜里有地雷: http://www.qbiki.com)