Yves here. Yet again, Naked Capitalism debunked what we correctly discerned as a seriously overhyped, planted story of a scary, miraculous tech black box (an obvious tell was its claiming credit for the Brexit vote). Others took it up as gospel truth. As usual, we were ahead of the MSM in ferreting out a con. Congrats to Marina!
The Guardian does get some credit for running the article that shredded its own original breathless account.
By Marina Bart (formerly aab), a writer and former public relations consultant, who thinks and writes about many things, including political economy, culture, and communication.
If you missed it, Lambert linked this morning to an article published Saturday in The Guardian that, well, proves me right.
For those playing Big Data Con Job Bingo at home, this story offers:
- Eton (aka “elite education”)
- Famous for dating a member of the Royal Family (in the US, this square would be for Kennedy/Kissinger connections)
- Attempts to have negative yet accurate information proving lack of credentials suppressed
- Treating actual expertise as fungible widgets
- Scientists so treated as widgets cutting ties, saying the CEO has misrepresented their work – “their work” being the actual foundation for everything Cambridge Analytica claims to do
- James Bond reference (best part: it was basically set design; he apparently never delivered any results in Indonesia at all)
- Meritocratic suckers in governments, the military (NATO!), the aristocracy and among the ultra-wealthy
- Direct evidence that the Cambridge Analytica algorithm can’t even discern gender and orientation correctly from a person’s Facebook data
Some of the most devastating quotes are in the underlying article published on ItalyEurope24. That’s where one of the two scientists whose work supposedly forms the basis for Cambridge Analytica’s approach says (among other negative things): “But we found that no matter how much we tried to reign him in, he would make all kinds of claims that we felt we could not substantiate, and that is why we stopped working for him.”
In short, the man behind Cambridge Analytica has no background in computing, data science, psychometrics or psychology. The scientists he claimed developed the foundation of the program say he’s a liar who doesn’t know what he’s doing. There is no evidence AT ALL of this program working ANYWHERE to do ANY of the fancy things he is claiming. There is evidence that the program cannot even do the simplest first step towards understanding human beings by processing their Facebook data.
I believe that’s game, set, match.
Right now, Big Data is mostly looking like a Big Con. Facebook, Google and the NSA can certainly do things with your data. Some of those things are bad. Very bad. But much of what we have been told is either currently happening or visibly on the horizon is just smoke and snake oil. That the ruling class is pouring money into this without apparently understanding that they’re being snookered is a real problem. They’re going to use those faulty algorithms to deny mortgages to people who should have them, and to direct money to people and entities who shouldn’t get it. I shudder to think what our militarized police will do with the inaccurate profiles they will be told are as sound as fingerprinting.
In the meantime, however, we at Naked Capitalism have, once again, been vindicated. When I told Yves what I thought was going on with Cambridge Analytica, she made me explain the underlying evidence and reasoning in much greater detail than what ended up in the piece. (Really, it could have been so much longer…) She didn’t just trust that I was right. She made me prove it. Which I did. That lies at the heart of what is important about this site and this community, and what is going to make it such a source of strength going forward. The truth really is out there, and we can find it. We don’t have to merely wander blind in the smoke and haze of propaganda and elite deceit until we’re led off an unseen cliff into a contaminated sea.
Coming this week: a paradigm of persuasion to add to your critical thinking tool kit.
Oversold…. where’s the fainting couch…. swoon…
I wouldn’t be too quick to dismiss Cambridge Analytica. There are two key components to their ‘product’: a) it uses the maths behind Google’s PageRank algorithm (LSA/SVD), and b) choice architecture/nudging, where asymmetric responses can be crafted to better effect.
Firstly, by using LSA/SVD, Cambridge Analytica is getting better correlations/predictability than the psychometric tests used by companies to screen employees. It’s all relative, but if you think Cambridge Analytica is crap, then there are a lot of businesses making key hiring decisions with even crappier technology/tools.
Secondly, the application of choice architecture/nudging techniques allows Cambridge Analytica to leverage asymmetric behavioural responses. This stops individual heterogeneity from cancelling out the impacts of interventions. That is, with non-nudging techniques some individuals respond positively and others negatively, leading to no net effect. And yes, not many policy nudges have been shown to create large effect sizes – but how big does an effect size need to be in order to swing an election? Not very.
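For concreteness, here is a minimal sketch of the LSA/SVD machinery that comment invokes – a truncated SVD over a made-up likes matrix. Everything in it (the matrix, the numbers, the variable names) is illustrative; nothing here is Cambridge Analytica’s actual code or data.

```python
import numpy as np

# Toy user-by-item matrix: rows are users, columns are pages/items,
# entries are 1 if the user "liked" the item. All data here is made up.
rng = np.random.default_rng(0)
likes = (rng.random((200, 50)) < 0.2).astype(float)

# Center the columns, then take a truncated SVD -- the same linear
# algebra that underlies LSA (and, loosely, the eigenvector maths
# behind PageRank).
X = likes - likes.mean(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 5                             # keep the top-k latent dimensions
user_factors = U[:, :k] * s[:k]   # each user as a point in latent space
item_factors = Vt[:k, :]          # each item's loading on those dimensions

print(user_factors[0])            # one user's latent "profile"
```

Note what the sketch does and does not give you: compact coordinates per user, nothing more. Any claim about personality or persuadability is a separate model that has to be built, and validated, on top of those coordinates.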
I’m not surprised by the dodgy history given how unethical this technology is. Nudging is bad enough.
I do not know whether or how Google’s PageRank-style algorithms (LSA/SVD) are the gold standard of testing, etc.
But one thing I’m sure of: “there are a lot of businesses making key hiring decisions with crappy technology/tools”.
Having worked in the related fields of adult education and testing for businesses for some years, I think it is fair to say that, given the role testing tools play, their quality isn’t the first thing HR people are looking for. They first and foremost watch out for means to shift responsibility if their decisions are challenged. Q: “Why did you hire that stupid sucker?” A: “Because that highly reliable scientific test told me so.”
So it’s not the quality of testing tools and their inherent methods that matters; it’s their appearance in scientific clothes.
You’re correct about that, and it goes on from there: even if the program worked as claimed, it’s not all that hard to answer the questions the way you think the recruiter would like to hear them answered. Don’t judge, but that’s how I got the job I’m in now. I answered the questionnaire knowing exactly what traits they were looking for. Turns out that it wasn’t such a bad move for me or them; I’ll have been here 20 years in September, and will probably be here another 20 (assuming I live that long).
One other thing people in HR (and their bosses) should keep in mind: it’s very hard to pick the right people, and you will always make mistakes. What’s even harder is knowing when it’s time to admit you made a mistake and fire them (pride, I suppose). Most employers wait far too long, which is detrimental to both parties involved.
Psychometric testing is 99.99% based on voodoo science, admittedly backed by solid marketing skills and glossy sales material. It is garbage with zero predictive value – the 0.01% is left open for the cases where they actually use a trained psychologist to do the evaluation.
Comparing one flavor of garbage with another is a null result.
It is well known that companies are really bad at making hiring decisions, and it has become much worse over the span of my career, as more and more of the process is now managed by “specialists”. We get a lot of people who “fit in” and are “nice” but who struggle to do the work. Maybe, if those algorithms were essentially drawing random samples, that would actually be better than the deliberate process of failure that is used now?
You cannot do hiring scientifically. No way. Unless you opt to hire all the candidates at the same time to do the same job for a while (impossible, isn’t it?) and then decide. I think the utility of these tests is just that you have one more decision-helping tool. Whether this tool is good or bad is something you cannot scientifically demonstrate. The problem, as with Cambridge Analytica, is believing that it works when you cannot even demonstrate that scientifically.
Congratulations to Marina. Go on with it!!!
The BIG problem is if we end up with policemen/lenders/… making decisions based on this crappy thing.
Yes, 100% agreement re psychometrics. And the Personality Psychology it rode in on, including but not limited to Myers-Briggs.
I think you missed an important statement in this article:
“That’s where one of the two scientists whose work supposedly forms the basis for Cambridge Analytica’s approach says (among other negative things) ‘But we found that no matter how much we tried to reign him in, he would make all kinds of claims that we felt we could not substantiate, and that is why we stopped working for him.’”
The algorithm “may” (and, note, may is in quotes because I haven’t seen any validation of its use in this context or any statistical analysis of error) be sound, but the fact is that Cambridge Analytica is making claims that go beyond the constraints of the algorithm…
I really like the phrase “where asymmetric responses can be crafted to better effect” – but it’s still a matter of words words words words + challenges of basic control theory.
Would you mind publishing somewhere the “long-form” debunking that you presented to Yves?
The original NC article was great, and more details on the nooks and crannies of what those tools can/cannot do would be great.
I second the motion. Explanations of underlying reasoning and evidence are always welcome here.
I will try to incorporate relevant pieces going forward when discussing any of the related issues. When the series is complete, if people really want to see all the background reasoning, I might consider doing an annotated version of the CA piece. The problem is, where and how would I make it available? The original piece was already long for this format. No corporate publication would touch this material, unless the culture changes dramatically and quickly.
Let’s see if I can figure out a way to communicate what you’re asking for in an easier to digest context.
The NYT chimes in with “Bold Promises Fade to Doubts for a Trump-Linked Data Firm,” with no acknowledgement of either the NC or Guardian coverage. Because Newspaper of Record™.
I stumbled on a similar story about a Canadian company that is also claimed to have been instrumental in Brexit, using tools similar to Cambridge Analytica’s. Perhaps this company has already been debunked since, but I will offer it up to you all anyway for purposes of tidying up loose ends. The company is called Aggregate IQ; here is the link:
http://www.huffingtonpost.ca/2017/02/28/aggregateiq-brexit_n_15065932.html
If you read the Guardian piece from this weekend, it mentions Aggregate IQ. There was some kind of relationship between it and Cambridge Analytica, which is now being disavowed. It wasn’t clear from that section which party (or both) was insisting on the disavowal.
But the quotes from AIQ’s president (I also read the HuffPo piece) do not suggest any psychological manipulation or advanced data mining. It sounds like pretty traditional direct-mail-style segmentation and messaging, just done via Facebook’s platform. It pushed a very simple, consistent message to people who self-identified as Leavers, if I understand him. No woo-woo at all. Fundamentally primitive GOTV. That is not the secret sauce the Democratic Party is searching for.
Artificial Intelligence has always been bunk. All algorithms are programmed by HUMANS. Human thinking ALWAYS contains suppositions and points of view. Therefore AI is subject to error as a predictive tool in every case. Cambridge Analytica is nothing more than a slick marketing/PR tool.
Dig into every scientific hypothesis and you will uncover first principles which are presupposed and not questioned.
Take vaccine development, for example. Historically it was believed that the human immune system could be primed by introducing a fragment of a virus, thus making the person immune to that virus. One of the first principles or presuppositions was that the immune system would attack only the introduced virus and therefore would not attack itself.
That first principle has revealed itself to be untrue in some cases.
In the old days the study of Aristotle revealed the importance of correct first principles. AI tools all contain the creators’ first principles/suppositions so AI is only as good as the programmer.
I think you are confusing Artificial Intelligence with Big Data. They aren’t the same. Artificial Intelligence is about trying to teach machines to do things that humans do. Big Data is the analysis of data to reveal patterns.
Yes, it is true that humans think individually. But there are patterns in how humans behave as a group. It really is no different from how atoms behave. Each atom has its own motion that cannot be predicted, but the behavior of atoms in aggregate is why chemistry works as a science.
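A toy illustration of that atoms point – the law of large numbers in a few lines, with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(1)

# One "individual" making a noisy binary choice tells you very little...
print(rng.random() < 0.52)          # True or False, essentially a coin flip

# ...but the average over a large group is tightly predictable,
# with fluctuations on the order of 1/sqrt(N).
group = rng.random(1_000_000) < 0.52
print(group.mean())                 # ~0.52
```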
The beauty of science is that it keeps testing those “first principles”…..
‘Science’ does not test first principles (aka assumptions) unless human scientists specifically design means for doing so. Even then, one is left with an infinite regress of assumptions made in order to test assumptions, etc.
A community of self-critical human beings should be able to handle that regress, at least sufficiently well for the scientific project to (eventually) advance. But self-criticism cannot be reduced to algorithmic syntax, whether associated with the labels ‘AI’ or ‘Big Data’ or ‘Machine Learning’ or ‘Deep Learning’ – or whatever else will come down the pike.
Two books sharing the same idea, read long ago, wised me up to Big Data. The first, by a pair of mathematicians, made basic math, statistics, and their everyday uses approachable for the lay person. One point in their section about big data was that if you flip a coin often enough, 100 “heads” in a row will eventually occur. Our brains are wired to detect patterns and assign meaning. In this context, however, 100 “heads” signifies nothing.
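For the curious, the arithmetic behind that coin-flip claim is easy to check. A small sketch (mine, not the book’s) computes the probability of at least one run of k heads in n fair flips via a standard recurrence; given enough flips, the “pattern” becomes a near-certainty.

```python
def p_run(n, k):
    """Probability of at least one run of k heads in n fair coin flips."""
    # q[i] = P(no run of k heads in i flips). Any such sequence starts
    # with j < k heads followed by a tail; recurse on what remains.
    q = [1.0] * (n + 1)
    for i in range(k, n + 1):
        q[i] = sum(q[i - 1 - j] * 0.5 ** (j + 1) for j in range(k))
    return 1.0 - q[n]

print(p_run(100, 7))     # ~0.32: a 7-run in 100 flips is unremarkable
print(p_run(10_000, 7))  # ~1.0: flip often enough and the run is certain
# For k = 100 the same logic holds, but n must be astronomically large:
# the expected wait for a 100-run is on the order of 2**101 flips.
```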
The fiction author John Sandford (pen name of John Camp) put that idea to work in a novel about Big Data in government hands, making the point that the number of false positives (the 100 heads in a row) would render the program ineffective for the stated purpose of “keeping us safe” (the base-rate sketch below shows the arithmetic), but wondrously effective at suppressing dissent.
And if Big Data can’t perform, it will be made to seem to, to keep us peasants in check.
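On the false-positive point: it is just base-rate arithmetic, and a toy computation makes it vivid. The numbers below are invented for illustration, not taken from the novel: screen a large population for something rare with a “99% accurate” test, and almost everyone flagged is innocent.

```python
# Base-rate arithmetic behind the false-positive point (illustrative
# numbers): even a "99% accurate" screen drowns a rare signal in alarms.
population   = 300_000_000   # people screened
true_threats = 3_000         # actual targets (1 in 100,000)
sensitivity  = 0.99          # P(flagged | threat)
false_pos    = 0.01          # P(flagged | innocent)

flagged_true  = true_threats * sensitivity
flagged_false = (population - true_threats) * false_pos

precision = flagged_true / (flagged_true + flagged_false)
print(f"{flagged_false:,.0f} innocents flagged; "
      f"P(threat | flagged) = {precision:.4%}")   # ~0.1%
```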
What are the names of those two books?
What the Numbers Say by Niederman and Boyum, and The Hanged Man’s Song by John Sandford. Totally different genres, but both informative.
Tell it to Hillary and the Stay voters.
Cambridge Analytica, another example of fake news.
Big Data has always been a solution in search of a problem (which is typical of big tech, hence the phrase “old wine, new bottle”); so much so, in fact, that this article last year reported that companies were spending their money on other priorities rather than on Big Data projects.
As far as AI goes, only IBM’s Watson has recently begun rolling out as tailored, job-specific apps, and last year Google’s AI project (DeepMind’s AlphaGo) beat a top-ranked professional player at Go.
Other than that, AI is still pretty much in the crawl stage – certainly nothing capable of changing people’s minds to swing elections.
Where politics and elections are concerned, it’s always worth remembering the words of Dick Tuck:
The people have spoken, the bastards.
It’s almost as if the need isn’t for actual analysis, but the appearance of rationality.
I wonder why that would be…
Sounds like someone else we read a lot about!
Cambridge Analytica may well be over-selling the current effectiveness of its services, but there’s no reason why the model they describe wouldn’t work.
Train a model from the public + FB data of a few thousand people who have completed an OCEAN test, and you should be able to predict OCEAN scores pretty accurately for any user, given the same data points.
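The pipeline described above is easy enough to sketch, which is exactly why the claim sounds plausible; whether it predicts anything real is the empirical question. Below is a minimal, hypothetical version – ridge regression from a synthetic likes matrix to synthetic OCEAN scores, since no one outside Facebook has the real data. Every name and number is illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-ins: a likes matrix for 3,000 "survey takers" and
# their five OCEAN scores. Real data would come from users who took
# a personality test and shared their profiles -- none of it is here.
n_users, n_pages = 3000, 500
X = (rng.random((n_users, n_pages)) < 0.1).astype(float)
true_w = rng.normal(size=(n_pages, 5))
y = X @ true_w + rng.normal(scale=5.0, size=(n_users, 5))  # OCEAN targets

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = Ridge(alpha=10.0).fit(X_tr, y_tr)

# Generalizing to *any* Facebook user -- the comment's claim -- depends
# entirely on the sample and on there being real signal in the likes.
print("held-out R^2:", model.score(X_te, y_te))
```

With synthetic data the held-out R² is whatever you build into the generator; with real data it is the thing Cambridge Analytica never demonstrated.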
Did you read the piece? You can’t claim this one is too long. You can get what you need from the bullet pointed list at the top.
The scientists who the CEO claimed developed the model say it doesn’t work.
A woman is identified and quoted who said she input all her Facebook data into Cambridge Analytica’s system, and it was incapable of identifying the simplest things about her.
Training a system to recognize and then predict OCEAN scores — which we have no evidence has happened, despite ENORMOUS resources and effort put to the task — does not directly or inevitably lead to creating nuanced segments that can be manipulated to deliver desired political (or any other) results.
Whether or not the model “wouldn’t” work, it DOESN’T work. And there is no evidence suggesting that it ever will.
You seem to misunderstand the significance of CA being unable to distinguish between genders and orientations. That is a gigantic red flag. Our biology in these areas tends towards an A/B model by design, which our culture then relentlessly and rigorously conditions and enforces. If the algorithm cannot yet distinguish between the data produced by a straight* woman and a gay man, why is there any reason to believe that in any of our lifetimes, it will be able to tease out much less defined personality categories like “neuroticism” and “conscientiousness”?
*I am presuming her orientation based on the context of her quote in the article.
And it’s based on a rather big assumption: that humans can be predicted with any meaningful certainty. It smacks of the deterministic thinking that the rest of the scientific community gave up in the 1950s.
Now CA state they didn’t have FB data to train on anyway, with OCEAN scores as the target to learn. Even if they did, in deployment mode they’d need the FB data of the users they want to predict on, and they definitely don’t have that. FB doesn’t let them have it.
So the model can’t be put to use.
These models are fragile to their assumption sets, and the assumption sets are only valid if you assume the future will be exactly like the past.
See Nassim Taleb.
In theory, there’s no difference between theory and practice …
“Direct evidence that the Cambridge Analytica algorithm can’t even discern gender and orientation correctly from a person’s Facebook data”
It’s a quibble, but that proves very little. Facebook is mostly writing, and female writers have always been able to “pass” – that is, disguise their gender. One fairly recent example was James Tiptree, Jr., a brilliant mid-20th-century science fiction writer; nobody caught on until she was established enough to come out. SF in those days was quite the boys’ club – though Ursula Le Guin managed to reign supreme under her own name.
And there were several even more famous examples in the 19th Century, when it mattered more, both called George. I’ve no idea how many gay or lesbian writers managed to conceal their orientation until it didn’t matter any more. Anyway, the point is that people can’t tell, so why would an algorithm be able to? Which I suppose is the real point: the claim was always implausible.
@Oregoncharles
Given your final point, what did you mean by ‘that proves very little’?
Please see my reply above to Johan Testad. This was not a test of whether or not the algorithm could identify a woman pretending to be a man based on writing style. This was (as far as we know from the limited information in the article) a straightforward data dump, exactly the way the system is expected to perform its baseline functions.
Identifying gender should be the actual easiest thing for the system to do. If it can’t do that, it certainly can’t figure out how to manipulate people into acting against their interests.
Seems to me applying this stuff to job seekers misses the forest for a couple of shrubs. The environment one is working in, the conflict of personalities, the location, the commute, the asshole boss, the hot secretary, the crappy coffee, on and on… all of that may be a larger factor in one’s success than a data analysis saying one is competent at a set of skills. So yeah, better get on the behavioral big-data model for your local office. Then just extrapolate that out to the corp as a whole. I’m sure that will work just fine.
Anything to abolish consciousness. That’s what they’ll do.
This post seems awfully short! ;-)
I found the political emails I received to be so ham-handed, so asinine, so bloviating, so cynical, so disingenuous, so effingly pandering, so fallacious (if not fellatious), so gratuitous, so hysterical, so idiotic, so jejune (yes that’s a word that’s not only apropos but it starts with “j”), so kvetching, lurid at times, monstrous at times, nauseant, odious, — not to be too critical — petty, queasy, risible, stultifying, tedious, unctuous, vapid, wrathful, so xanthic, yawn inspiring and zealotrous, that I can’t believe anybody actually reads them!
But if they do, they won’t anymore after they read 4 or 5 of them. It reminds me of the Eagles song:
Well, I found out a long time ago
What they’ll try to do to your soul [that wasn’t the real Eagles line]
Aw, but they can’t take you anyway
You don’t already know how to go
Maybe that means they’ll make the data even bigger. Uuuuuuge data. Data so uuuuge you just look at it and say “Oh My God.” But you still won’t do anything you don’t want to do. hahaha
Maybe they’re like the Nigerian scam emails, purposely designed to filter out all but the greediest, dimmest, and most desperate.
Thanks for pointing out that the elite are squandering money on the Big Data rabbit hole. Money that could, and should, be devoted to higher worker wages is instead wasted on a marketing boondoggle. Maybe they can be shamed into acting more humanely and responsibly. Just one mode of attack.
Not enough can be said for building a human economy. Big Data, and whatever comes of it, seems to be its antithesis.
Marina, you do great work, thanks.
“…the elite are squandering money on the Big Data rabbit hole.”
Taleb offers an interesting take on why the elite would squander money in this way. He’s not talking about Big Data, but I think the general idea applies.
“Only The Rich Are Poisoned: The Preference of Others”
“When people get rich, they shed their skin-in-the game driven experiential mechanism. They lose control of their preferences, substituting constructed preferences to their own, complicating their lives unnecessarily, triggering their own misery. And these are of course the preferences of those who want to sell them something….
…
“It is easy to scam people by getting them into complication –the poor is spared that type of scamming. This is the same complication we saw in Chapter x that made academics sell the most possibly complicated solution when a simple one to the problem can do.”
https://medium.com/incerto/only-the-rich-are-poisoned-the-preference-of-others-c35ddf65cf68#.ypflj6lpr
Big Data. Big Con. Big Time. Great post.