by Marc Schönwiesner
25 April 2014, Opening talk at the Erasmus Mundus symposium in Auditory Cognitive Neuroscience, Leipzig, Germany
00:30— These are all papers indexed by Medline. I had written a Python script to get the number of records per year in the Medline database, but it turns out that you can get this information with a single PubMed search: 1600:2100[dp]. You can then download a CSV file of paper numbers per year from the sidebar.
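The per-year counting could be sketched roughly as follows. This is not the original script: it assumes the NCBI E-utilities esearch endpoint with JSON output, and the helper names (count_url, parse_count, records_per_year) are made up for illustration.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities search endpoint (an assumption; the notes do not name it)
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def count_url(year):
    """Build an esearch URL that asks only for the number of records dated `year`."""
    params = {
        "db": "pubmed",
        "term": f"{year}[dp]",   # [dp] = date of publication, as in the search above
        "rettype": "count",
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"

def parse_count(payload):
    """Extract the record count from an esearch JSON response."""
    return int(json.loads(payload)["esearchresult"]["count"])

def records_per_year(years, pause=0.4):
    """Query PubMed once per year (needs network access)."""
    counts = {}
    for year in years:
        with urlopen(count_url(year)) as response:
            counts[year] = parse_count(response.read().decode("utf-8"))
        time.sleep(pause)  # stay under the rate limit for requests without an API key
    return counts
```

Calling records_per_year(range(1950, 2014)) would then return a dictionary mapping each year to its record count. The single PubMed search mentioned above remains the simpler route.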
01:00— Google Ngram
Changes in the way we publish
02:01— The cost of knowledge. I am currently faculty representative on a library committee at my university, and frankly I had underestimated how much drama, big money, racketeering, boycotts, and even connections with the weapons industry (not kidding) are part of this seemingly benign world of academic publishing.
02:15— That’s MGM. The average margin of companies in the US is around 7 to 10%. A back-of-the-napkin calculation suggests an average revenue per paper in 2011 of 2000$ (10 billion revenue divided by 5 million papers).
02:50— This problem is relatively recent. Elsevier and others started to buy journals from non-profit academic societies in the 70s, thinking quite rightly that they could raise the price substantially without losing customers. See also: http://www.nature.com/news/open-access-the-true-cost-of-science-publishing-1.12676
03:40— The impact factor is the average number of citations received in a given year by the papers a journal published during the two preceding years. It is generated by a private company (Thomson Reuters) from a private database, using unreproducible methods. These authors bought the database for 3 journals from Thomson Reuters, but the numbers did not add up: “When queried about the discrepancy, Thomson Scientific explained that they have two separate databases—one for their “Research Group” and one used for the published impact factor … When we requested the database used to calculate the published impact factors, Thomson Scientific sent us a second database. But these data still did not match the published impact factor data. This database appeared to have been assembled in an ad hoc manner to create a facsimile of the published data that might appease us.”
The distribution of citation counts to individual articles in a journal is highly skewed and approximates a power law. The mean is not representative and vastly overestimates the typical number of citations. For instance, Nature mentioned that 90% of their 2004 IF was generated by only 25% of the papers and that the great majority of their papers received fewer than 20 citations. Think about it: do you have a paper that was cited 20 times during its second and third year after publication?
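To get a feeling for how far the mean and the typical paper can drift apart, here is a small sketch on made-up data. The citation counts are drawn from a Pareto distribution with an arbitrarily chosen shape parameter. They are not real journal data, and the function names are hypothetical.

```python
import random
from statistics import mean, median

def synthetic_citations(n_papers=1000, alpha=1.2, seed=0):
    """Draw heavy-tailed citation counts (made-up, Pareto-distributed data)."""
    rng = random.Random(seed)
    return [int(rng.paretovariate(alpha)) for _ in range(n_papers)]

def top_share(counts, fraction=0.25):
    """Fraction of all citations collected by the most-cited `fraction` of papers."""
    ranked = sorted(counts, reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    return sum(ranked[:n_top]) / sum(ranked)

counts = synthetic_citations()
print("mean citations per paper:  ", mean(counts))
print("median citations per paper:", median(counts))
print("share of citations from the top 25% of papers:", top_share(counts))
```

With a heavy tail like this, a handful of highly cited papers pull the mean well above the median, which is the sense in which an impact factor overestimates the typical paper.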
A report by the International Mathematical Union calculated the probability that a randomly selected paper in a certain mathematical journal had at least as many citations as a randomly selected paper in another journal with twice the impact factor. The answer was 62%. So 62% of the time you’d be wrong when assuming that a paper in the better journal (twice the impact factor) was better (cited more often)! Of course, it is easy to look at the actual numbers of citations for individual papers now and disregard the journal impact factor altogether. Newer journals are trying interesting metrics to measure the importance of individual papers.
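The kind of comparison made in that report can be sketched by checking every pair of papers from the two journals. The two citation lists below are invented toy numbers, chosen only so that the second journal has twice the mean of the first. They are not the data from the report.

```python
from statistics import mean

def prob_at_least(journal_a, journal_b):
    """Probability that a random paper from A has at least as many
    citations as a random paper from B, checked over all pairs."""
    wins = sum(1 for a in journal_a for b in journal_b if a >= b)
    return wins / (len(journal_a) * len(journal_b))

# Invented citation counts: journal B has twice the mean of journal A,
# but most of that comes from a single highly cited paper.
journal_a = [0, 0, 1, 1, 2, 2]
journal_b = [0, 0, 0, 1, 1, 10]

print("mean A:", mean(journal_a), "mean B:", mean(journal_b))
print("P(paper from A >= paper from B):", prob_at_least(journal_a, journal_b))
```

In this toy case the paper from the journal with the lower mean matches or beats the other in 26 of 36 pairs, even though the other journal's mean is twice as high.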
04:18— The open access fee is 3000$ and colour is 600$ per figure, plus 60$ per page, and 75$ just for submitting. Libraries pay another 3000$ for a subscription this year (a personal subscription costs 1000$). As it turns out, we will have to pay for open access, because our granting organization requires it.
05:00— Bret Victor
06:30— At my university, journal subscriptions cost 1700$ per professor per year. If we switched entirely to PLOS and paid 1300$ per paper, we’d probably exceed those savings.
(8.5 million CA$ library budget divided by 5000 professors.)
08:05— Here is an interview with another PLOS founder and Nobel laureate Harold Varmus.
08:12— Here are the first few sentences of the opening of a paper called Tragic loss or good riddance from around this time:
Traditional printed journals are a familiar and comfortable aspect of scholarly work. They have been the primary means of communicating research results, and as such have performed an invaluable service. However, they are an awkward artifact, although a highly developed one, of the print technology that was the only means available over the last few centuries for large-scale communication.
An example of successful post-publication review: “Science self-corrects – instantly”
Changes in the way we do science
10:10— Read the paper for free at PLOS Medicine.
10:32— The base rate fallacy, like all fallacies, comes from faulty brain wiring. Humans have no intuitions for Bayes’ theorem. Here is a famous example: In 1978 Casscells, Schoenberger, and Graboys asked students and staff at Harvard Medical School (I’m paraphrasing): Let’s assume there is a very accurate test for a certain lethal disease: if you have the disease, the test will always show it. The test also has a high correct rejection rate of 95%: if you don’t have the disease, the test will show that with a 95% chance. You are a doctor and in front of you sits a person who just got a positive result. What do you say to that person? How likely is it that he/she actually has this lethal disease? You also know that the disease has a prevalence of 1 in 1000 people who participate in routine screening. The most common answer was 95%. Only 15% of doctors gave the right answer (Casscells, Schoenberger, and Graboys 1978; Eddy 1982; Gigerenzer and Hoffrage 1995). The correct reasoning is displayed below:
The squares represent 1000 persons. Only one of them has the disease (and a positive test result, red square). Fifty persons will have a false positive test result (5%, blue squares). The chance that a person with a positive result has the disease is only about 2% (1 in 51). Let’s rephrase the example. Now you are not a doctor, but a scientist, and in front of you sits, not a patient, but a hypothesis that just got the results of a statistical test (in the form of a p-value). This is part of Ioannidis’ main argument.
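The same reasoning can be written out with Bayes’ theorem. The first call below uses the numbers from the medical example. The second call applies the same formula to a hypothesis test, where the prior (10% of tested hypotheses are true) and the power (80%) are illustrative assumptions that do not come from the talk.

```python
def posterior_given_positive(prior, hit_rate, false_positive_rate):
    """Bayes' theorem: probability that the condition is real given a
    positive result.

    prior               -- base rate of the condition (prevalence)
    hit_rate            -- P(positive | condition present)
    false_positive_rate -- P(positive | condition absent)
    """
    true_positives = prior * hit_rate
    false_positives = (1 - prior) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# Medical example: prevalence 1 in 1000, the test never misses the disease,
# and 5% of healthy people test positive anyway.
disease = posterior_given_positive(prior=1 / 1000, hit_rate=1.0,
                                   false_positive_rate=0.05)
print(f"P(disease | positive test) = {disease:.3f}")  # about 0.02

# Scientific analogue with ASSUMED values: 10% of tested hypotheses are
# true, power is 80%, and the significance threshold is 5%.
hypothesis = posterior_given_positive(prior=0.10, hit_rate=0.80,
                                      false_positive_rate=0.05)
print(f"P(hypothesis true | significant result) = {hypothesis:.2f}")
```

The lower the prior probability that a tested hypothesis is true, the smaller the share of significant results that reflect real effects, just as a rare disease makes most positive tests false alarms.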
Some think that there are fundamental problems with using p-values, ubiquitous as they are (Fisher would have over 3 million citations if people cared to cite him). For an intuition of one of the problems with p-values, check out the dance of the p-values demo by Geoff Cumming and read his paper.
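A rough way to see the p-values dance for yourself is to repeat the same experiment many times and look at how much p changes. The sketch below is not Cumming’s demo. It is a simplified stand-in that assumes a one-sample z-test with known standard deviation, a true effect of 0.5 standard deviations, and 32 observations per experiment. All of these values are chosen for illustration.

```python
import random
from statistics import NormalDist, mean

def simulate_p_values(n_experiments=20, n=32, effect=0.5, seed=1):
    """Run the same experiment repeatedly and return the two-sided p-value
    of each run (one-sample z-test against 0, standard deviation known = 1)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values = []
    for _ in range(n_experiments):
        sample = [rng.gauss(effect, 1.0) for _ in range(n)]
        z = mean(sample) * n ** 0.5          # z = mean / (sd / sqrt(n)), sd = 1
        p_values.append(2 * (1 - std_normal.cdf(abs(z))))
    return p_values

p_values = simulate_p_values()
for run, p in enumerate(p_values, start=1):
    print(f"replication {run:2d}: p = {p:.4f}")
print("smallest p:", min(p_values), "largest p:", max(p_values))
```

Every replication samples from exactly the same population with the same true effect, so any spread in the printed p-values comes from sampling variability alone.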
10:40— Nature’s announcement “Reducing our Irreproducibility”
12:30— Uri Simonsohn’s paper.
13:20— The Chrysalis effect. This was in management research; there is no data for our field, but it would be interesting to know whether we do any better.
15:25— Evaluation of Replication Results. Of course, you can always check whether authors build on their own findings in subsequent studies, which would indicate that at least they believe in the results, and give a sort of lower bound on replicability.
Changes in the way we teach science
16:13— Here is a quick example concerned with Newton’s third law (this is mentioned in Mazur’s talk, see below):
A heavy truck and a light car collide head-on. During the collision, the force exerted by the heavy truck on the light car is: a) larger than that exerted by the light car on the heavy truck, b) they are equal, c) the light car exerts a larger force on the heavy truck than the other way around, d) they are not exerting any force on each other, they are just in each other’s way.
16:13— The plot can be found in this paper
17:38— Hestenes wrote: “Since the students have evidently not learned the most basic Newtonian concepts, they must have failed to comprehend most of the material in the course. They have been forced to cope with the subject by rote memorization of isolated fragments and by carrying out meaningless tasks. No wonder so many are repelled! The few who are successful have become so by their own devices.”
Carl Wieman: People do not develop true understanding of a complex subject like science by listening passively to explanations.
True understanding only comes through the student actively constructing their own understanding through a process of mentally building on their prior thinking and knowledge through effortful study.
18:30— A similar approach, called flipped classroom, was developed by two high-school chemistry teachers, Bergmann and Sams.
19:00— Another very real obstacle is that early-stage professors in particular seriously risk their careers by spending time on improving their teaching methods. Teaching success plays almost no role in tenure evaluations (and no-one actually measures teaching success). Aren’t educated young people the main product of a university?
Further reading: http://blogs.kqed.org/mindshift/2011/09/dont-lecture-me-rethinking-how-college-students-learn/