the NIPS peer review experiment and more cyber zeitgeist

[image: happy 50th blog post!] ❗ 😀 ⭐

hi all. have been collecting zillions of links on collaboration in science & have mentioned this subject quite a few times before on the blog, just waiting for an opportune moment/ zeitgeist buzz to share em. timing is everything! my “event detector” went off (and is still going off) with this recent exercise by the NIPS committee: an actual scientific investigation into the reliability of peer review. the program committee was split in two, a fraction of the submissions was reviewed independently by both halves, and the overlap in accept/ reject decisions was compared. the results were both predictable and not really that reassuring: a significant number of papers were rejected by one committee and accepted by the other, ie what might be called “false negatives/ positives”.
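
as a quick back-of-the-envelope on what that overlap means: committee agreement can be scored against a pure-chance baseline with Cohen's kappa. a minimal python sketch follows, using the widely circulated (unofficial) figures from the experiment; treat the numbers as illustrative, not official:

```python
# back-of-envelope on the NIPS split-committee experiment.
# the figures below are the widely circulated (unofficial) ones:
# ~166 dual-reviewed papers, ~22.5% accept rate, ~25.9% disagreement.

n_papers     = 166
accept_rate  = 0.225
disagreement = 0.259   # fraction of papers where the committees differed

print(f"papers the committees split on: ~{round(disagreement * n_papers)}")

# disagreement expected if each committee accepted papers at random
chance_disagreement = 2 * accept_rate * (1 - accept_rate)
print(f"pure-chance disagreement: {chance_disagreement:.1%}")   # ~34.9%

# Cohen's kappa: observed agreement above chance, scaled so 1 = perfect
p_observed = 1 - disagreement
p_chance   = accept_rate**2 + (1 - accept_rate)**2
kappa = (p_observed - p_chance) / (1 - p_chance)
print(f"Cohen's kappa: {kappa:.2f}")   # ~0.26, weak but above chance
```

so by that crude arithmetic the committees beat coin-flipping, but not by a wide margin, which matches the overall “not really that reassuring” reading.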

and this result will be quite relatable to anyone who has used stackexchange and tried to anticipate the voting on their posts. it can be really unpredictable and capricious, or in other words it has a sort of “predictable/ expected randomness” to it.

all this is of course highly to be expected in any subjective measure of quality. but its a bit stunning that there seems never to have been a case of this exercise being done by any scientific committee before. and when one imagines how basically the same process is used in job reviews, hiring, and other major decisions such as awards, some of which even have millions of $ riding on them, you quickly get the picture. but hey, thats life in the human world isnt it? committees run the planet eh? what is a board of directors other than a committee? and there is now quite a bit of sociological study of eg groupthink by committees.


as for this picture of the unpredictability of human subjective response, and the fairness issues thereof: even kids could figure much of this out, and actually do at a young age. in fact that is far from a joke or facetious observation. recent psychology research finds that kids at a very young age, around 3, already have an internal sense of fairness related to “work done”.[d8-10] amazing! so peer review is almost like a very deep instinct in humans. see the paper “Fairness in Distributive Justice by 3- and 5-Year-Olds Across Seven Cultures” (2009). and hows that for a scientific paper title?

so the NIPS experiment is brave, commendable, predictable, and deserves a lot of further study. scientists are normally not so audacious, and one might have expected it to be done by sociologists or psychologists instead! but maybe the temptation of a meta-scientific experiment was strong, and am now wondering how the experiment was proposed behind-the-scenes! & there is some pride to be taken that it happened in a computer science field. there was a reminiscent mini-scandal in physics years ago, the Bogdanov affair, involving assertions of bogus papers. and similar issues arose more recently when, somewhat stunningly, hundreds of “Wild West” open-access journals accepted a fake paper in a sting.

the whole area of scientific peer review is getting a lot of attention these days with massive cyber and open science shifts going on. some of this is very well summarized in Nielsens book Reinventing Discovery: The New Era of Networked Science, which is one of the best kept secrets around… re open science, see eg this recent headline: “Gates Foundation to require immediate free access for journal articles”.

new scientific peer review systems are being proposed all over the place, and some are actually being built. we can expect the early ones to be uneven, unpolished, somewhat weak and awkward at times, but they are likely to grow rapidly into highly critical systems over the years and decades ahead. there might be shearing forces with existing “institutions” which at heart, as scientists are discovering these days, are really corporations (science magazines/ journals etc.) carrying the fine-print, sometimes deceptively innocuous label: profit-seeking.

peer review has a mystique and aura in some contexts, yet is a very difficult, opaque and error-prone process. cyberspatial peer review systems are certainly not a magic bullet, and some degree of “fuzziness” (to put it politely) will always be inherent to the process. however, there is large potential for improvements in transparency, coverage, etc, and some of this will lead to remarkable new developments that cant be predicted.

the NIPS experiment is making some waves and drawing a range of blogosphere reactions, from near-yawn to impassioned, some with many comments, and there are many high-profile respondents on this issue. maybe a simple way to describe it: its like the scientific equivalent of American Idol? (uh, except without Jennifer Lopez?) oh and then of course theres the illustrious Sayres law, wink wink.

so these are some really great links on the subject collected over many months, hope you find something interesting, and this issue is sure to be in the limelight in the months, years, even decades ahead. its yet another paradigm shift in progress.

unfortunately on stackexchange there tends to be a lot of aversion and near-hostility toward reviewing arxiv papers, although this is to some degree expected and understandable. a rare case study is question [d5] on cs stackexchange, where a question about a preprint claiming to prove that graph isomorphism is in P got 10 votes and high-quality/ upvoted responses, eg by high-rep users. however, the answers leaned on many indirect signals, such as whether the author had published before, whether they are at a university, etc.

and then theres “somebody-or-other-who-shall-remain-nameless” who has been thinking about this topic in general for many years. spurred by this, he attempted an answer (reproduced below) critiquing the human tendency to rely on indirect/ 2ndhand evaluations (probably another measurable psychological attribute!) and describing some of the historical unreliability of peer review in mathematics (and yes, maybe got a teeny bit carried away). the answer got quickly killed by a mod with zero patience for further discussion (as the expression goes, we are not amused), and so far there is almost zero reaction by anyone else either. so the sword of damocles hangs over every peer reviewing system, and sometimes the harsh/ cruel axe falls. thunk!

yep, stackexchange, “warts and all”, may actually be a leading-edge system/ model for peer review. on one hand, sometimes its instant feedback. (as Lennon said, “instant karmas gonna get you.”) from the outside it looks quite powerful, but on the inside it can feel a bit capricious, oppressive, even tyrannical at times. a lot of it has to do with how much power particular users have, and sometimes it can feel unbalanced. the heaviest users of such systems are not necessarily the most fair/ unbiased. (sometimes its the opposite.)

the brand-new PhysicsOverflow site, already moderately active, has a way to sandwich peer reviews in with the stream of questions. wow! so peer reviews become “first class entities” of the system.[d6] theres another audacious experiment going on. not sure how it is working out yet; hope to see some statistics/ analysis somewhere, and hope it works out.

the natural place to do peer review is arxiv, but that system has been rather frozen innovation-wise for many years. and not saying thats a bad thing either: it might make a lot of sense to decouple the peer review system from the archival system, or not. its definitely a key design decision with uncertain implications at this time.
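
to make the decoupling idea concrete, here is a minimal sketch of a “review overlay” that lives outside the archive and merely points at it. all names/ fields here are hypothetical illustrations, not any real systems API:

```python
# minimal sketch of a "review overlay" decoupled from the archive.
# all class/field names are hypothetical illustrations, not a real API.
from dataclasses import dataclass, field

@dataclass
class Review:
    reviewer: str          # could be a pseudonym or a verified identity
    score: int             # eg -2..+2, like a vote
    comments: str
    public: bool = True    # open vs closed review is a policy knob

@dataclass
class ReviewRecord:
    arxiv_id: str          # the only coupling to the archive: a pointer
    version: str           # reviews pin a specific revision, eg "v2"
    reviews: list[Review] = field(default_factory=list)

    def consensus(self) -> float:
        """crude aggregate; real systems would weight reviewer reputation."""
        if not self.reviews:
            return 0.0
        return sum(r.score for r in self.reviews) / len(self.reviews)

# usage: the archive never changes; the overlay accumulates judgments
record = ReviewRecord(arxiv_id="1234.5678", version="v2")
record.reviews.append(Review("alice", +2, "proof of lemma 3 checks out"))
record.reviews.append(Review("bob", -1, "theorem 2 seems to assume P != NP"))
print(record.consensus())   # 0.5
```

the design point: the archive stays immutable while reviews accumulate separately, and in principle multiple rival overlays could compete over the same papers.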

another semifamous case study is the Deolalikar proof.[e] this was a P vs NP proof attempt that went “scientifically viral” a few years ago, even to the point of being profiled in the New York Times. after it happened, thought it could be just the beginning. but there appear to be very few serious attempts on this problem or other major problems, so cases where scientific papers go semi-viral (esp on introduction) are rather rare in general. in other words, lack of recognition of serious attempts is generally not the problem with existing peer review, although it has many other problems!

the Zhang twin-prime proof came very close to, or significantly exceeded, the Deolalikar event. (much more on this theme in the volunteer tab/ section.) actually the Zhang paper gives huge reassurance that the peer review system functioned quite well in that case: accounts state the paper was quickly reviewed by top experts, who were impressed, all agreed it was correct, and its publication went on the fast track. so much for the “misunderstood/ isolated genius” stereotype, sometimes verging on conspiracy theory, that circulates informally among the public. Zhang was quickly embraced by the community after his breakthrough paper, being invited to lecture at top schools and apparently being offered positions.

so peer review is one of those areas where, even as transformative as cyberspace can be… the more things change, the more they stay the same. its a leading area of “collective intelligence” and we are likely to see many more major shifts, and occasionally even some sparks & fireworks, in the coming years.

another remarkable study [d13], just released, showed that introverts tend to downgrade extroverts in job reviews! fascinating.

when you think about it, peer review touches on some of the deepest parts of humanity. my peers review me, therefore I am.

➡ 💡 note! further/ many more examples and possibilities of cutting-edge cyber peer review are documented in two sections on this blog: volunteer (namely polymath & the Deolalikar proof review) and chat, see eg the Fukuyama, rotia, Abhishek, Feinstein, Gazman, realz slaw, and math blogoverflow dialogues. and/ or feel free to leave comments asking about any particular further details/ clarifications.

 

a. nips
b. sci journals/ MSM
c. blogs
d. stackexchange/ studies/ misc
e. deolalikar

 


this is a somewhat general meta-note, based also on mathematics, scientific philosophy, psychology, and sociology, countering some of the other views expressed in answers/ comments. there is an attitude by some that “if it had been proven, then people would know about it”. for example, from a locked-down meta post on the subject on cstheory (perhaps accurately reflecting experts impatience with such claims), the following assertion:

Note that if a claim about progress on a famous long standing open problem (like “P vs. NP”) is credible at all, there will be discussions about it online in a short time after its announcement (e.g. on theory blogs like Computational Complexity, Gödel’s Lost Letter and P=NP, etc.) and if there is not then it is a sign that the claim is not taken seriously by experts.

to some degree this assumption about community reaction is correct, to some degree it is incorrect. yes, many in CS are interested in certain key problems such as graph isomorphism, these have high profile, and correct proofs of high-profile problems are quite likely to “spread like wildfire” with a near-viral effect within the community.

however, on the other hand, assuming this will always/ consistently happen involves several logical/ thinking fallacies. the simplest way to realize this is to study mathematics history and notice that in mathematics (which ofc is not the same as TCS but highly relevant to it), there have been very major proofs that were not recognized as correct at the time they were presented, or were lost when presented (maybe because of skepticism of their validity), and only “discovered” to be correct sometimes many years later. one of the most famous is Galois proof of the impossibility of solving the quintic by radicals, via group theory. the story can be found in ch7 of Ian Stewarts book From Here to Infinity, “the duellist and the monster”. there are many other examples. somewhat similarly, major/ important proofs are sometimes later found to be incorrect.

another example from recent memory is Perelmans proof of the Poincaré conjecture, which was posted to arxiv but was not accepted by the mathematical community for years. to some degree this was due to lack of initial detail, but specialists now agree he had basically sketched out the full, correct solution from the start.

another example along these lines is Ramanujan, later recognized to have quite brilliant/ advanced insight into math theorems, whose letters to two British mathematicians went unanswered/ ignored(?); he was not “discovered” until Hardy responded, leading to one of the great math collaborations of all time.

another area to study wrt this topic is the Kuhnian paradigm shift, now widely accepted among scientists: major shifts in scientific theories happen but may not be immediately accepted by the scientific community, and there are many examples from history.

the following are psychological phenomena that are known to exist, apply within the TCS/ math fields as in all other fields of human endeavor, and will therefore have some effect on scientific peer review (a toy simulation of the last two appears after the list):

  • buck passing: the tendency of people to avoid responsibility
  • bystander apathy: a tendency of people not to get involved
  • somebody elses problem: people avoid an issue in critical need of recognition
  • diffusion of responsibility: people do not take responsibility when others are present
  • tragedy of the commons: the scientific arena and peer review are like a commons; scientists would like their own papers reviewed with clarity/ care, but reviewing others papers takes time without much/ major incentive
  • free rider problem: some users utilize or benefit from a system but do not contribute and/ or pay anything
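
as promised, a toy sketch of the reviewing commons. all the payoff numbers are invented purely for illustration; the point is only the incentive structure:

```python
# toy model of peer review as a commons / free-rider problem.
# all payoff values are invented for illustration.
import random

N = 100          # scientists in the community
BENEFIT = 1.0    # value to everyone of one paper getting reviewed
COST = 3.0       # time cost to the volunteer who writes the review

def payoff(reviews_done: int, i_reviewed: bool) -> float:
    """everyone shares the benefit; only reviewers pay the cost."""
    return reviews_done * BENEFIT - (COST if i_reviewed else 0.0)

random.seed(1)
for p_review in (1.0, 0.5, 0.1):   # fraction of community that volunteers
    reviewers = [random.random() < p_review for _ in range(N)]
    done = sum(reviewers)
    avg = sum(payoff(done, r) for r in reviewers) / N
    free_rider = payoff(done, False)
    print(f"{p_review:.0%} volunteer: avg payoff {avg:6.1f}, "
          f"free rider gets {free_rider:6.1f}")

# free riders always out-earn reviewers by COST, so volunteering decays
# unless some incentive (reputation, reciprocity, payment) offsets COST.
```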

3 thoughts on “the NIPS peer review experiment and more cyber zeitgeist”

  1. Phillip Somerville

I gather Stephen Wiesner’s paper on quantum cryptography took a decade to get published after initial rejection in the early 70s. No doubt such examples are legion, as is the understandable urge of traditional journals to protect their reputations (& revenue stream!).
    The alternative to high-value, low-volume formal review through a ‘fine sieve’ is probably a second-tier, ‘coarse sieve’, wiki-review platform for arxiv-type articles? Not as prestigious, but recognition nevertheless of passing a more democratic & equitable level of review? Thanks for an interesting article vzn.

    1. vznvzn Post author

yes exactly, there are different levels of review quality just as there are different levels of paper quality, and as they say in english, “you get what you pay for”. some tiered-like systems are likely to emerge. but lets note that probably a staggering amount of the scientific peer review currently going on is unpaid volunteer work. (and its not a great indicator of our social values that so much science goes unpaid or underpaid and seems to exist in some dissonance with the capitalist system.) theres a link above [b11] looking into the difficult issue of incentives (which alas is not paid a lot of attention in science in general). Nielsen also has some writing on the subject in his book. thx for dropping by.

  2. Pingback: Blogs on the NIPS Experiment | Inverse Probability
