The current AI boom has more to do with NVIDIA, and the popularity of computer gaming giving us GPU compute, than with who was using neural networks back in the 1990s.
More specifically, it was really AlexNet, the 2012 ImageNet entry, running on two NVIDIA GTX 580s, that highlighted the practicality and utility of running large-scale neural nets on affordable hardware. CUDA had been released in 2006, but cuDNN (the CUDA library for neural nets) didn't come out until 2014 - after AlexNet had already kickstarted the demand.
What followed from AlexNet was a few years of intense competition on the ImageNet benchmark, with larger and deeper neural nets (CNNs), which gave rise to a lot of the algorithms and concepts still used today, such as residual connections (originally from ResNet), Adam (training algorithm), ReLU, normalization, dropout, etc. - all the fundamentals that made building large neural nets possible.
Schmidhuber continually reminding everyone that he was working on neural nets back in the 1990s is beyond tiresome. Yes, he should have been recognized alongside Hinton/Bengio/LeCun as one of the pioneers, but it's time for him to get over it.
I agree. I also think it's about the hardware and, obviously, recognizing AD (automatic differentiation) as the fundamental primitive.
Particular architectures don't matter so much yet. It's quite possible that S3-Mamba or xLSTM could be used in lieu of transformers and we would still have LLMs.
And Google's acquisition of DNN Research to get the ball rolling with conv nets and AI moneyball, followed by the acquisition of DeepMind. Schmidhuber IMO *has* been recognized as one of the 4 horsemen and rightly so, but what has he done lately? Just noticed they now say the 3 godfathers of AI. This is what people hate about academia. It's not academia itself, it's the mean girl politics that emerge from the tenure system. And at this point, tenure should be abolished IMO, having been utterly weaponized to defend the status quo.
> The current AI boom has more to do with NVIDIA, and the popularity of computer gaming giving us GPU compute, than who was using neural networks back in 1990's
I disagree. But more critically, I'd argue it's the legacy of the PDP project that led to what became foundation models today.
The PDP project was very early - relevant in terms of neural net history of course, but hard to see much there relevant to today's large models other than Hinton's reinvention of SGD as an alternative to the layer-wise training that was then the norm.
One interesting thing to note in the PDP handbook is that LeCun and Hinton mention what would later be called CNNs, which LeCun claims to have invented. It seems that Hinton deserves just as much credit as LeCun, and in any case these are discussed just as locally connected models using shared weights as an optimization.
2012 really fundamentally changed everything for the AI community, I’d argue because TensorFlow/Keras/PyTorch followed and that made the infrastructure accessible for distributed training.
It's sad that he is the only one speaking out about Hinton. This whole Hinton glorification seems like it's being pushed by an agenda. I'm not sure if he would receive this much attention if he held a different view (closer to LeCun or Ng), rather than these Effective Altruism takes on current AI.
I don't associate Hinton with Effective Altruism. He did switch to focus on warning of the dangers of AI, but that was after he was already established as the father (or one of them) of deep learning.
Read through his papers, and these are substantial accusations. Perhaps Hinton et al. should investigate, and either a) correct themselves by properly citing these Munich researchers; or b) prove (unlikely) that they did not base their work on these papers.
And then, as a whole, this weighs in favor of European scholars and also should properly inform the funding of similar research in the EU.
Writing the last in the light of a month-and-a-half wait (to date) for EuroHPC to process their own form, where we submitted a funding request by no less than University + Private Company already established in the area + 4 alumni, two PhDs and one postdoc. Zero response since.
worth separating: LSTM (Hochreiter & Schmidhuber 1997) is ironclad and widely cited. the transformer attention priority claims are far shakier. conflating them is how Schmidhuber undermines himself
Yes, and notable how Alex Graves, one of Schmidhuber's students, later at DeepMind, doesn't even mention Schmidhuber in his historical overview of attention mechanisms "Attention and Memory in Deep Learning".
When it comes to attention, details matter, since the idea itself is obvious - weighted inputs, and implicit attention is present in every neural network - this is what weights are.
The specific form of attention used by the Transformer is key-based associative attention, aka "Bahdanau attention" introduced in Bahdanau's paper "Neural Machine Translation by Jointly Learning to Align and Translate". It's perhaps worth noting that the word "attention" is barely even mentioned in this paper, other than noting that this weighted input mechanism can be seen as a form of attention (presumably mentioned since attention was at that time a recurring theme in various types of neural network).
Bahdanau attention - not just the general concept of attention - seems to be a very critical piece of the Transformer architecture, since this is what allows the Transformer to find things in context and is behind the "induction head" mechanism that appears central to how Transformers operate.
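For anyone who wants the key-based lookup spelled out, here is a minimal sketch in plain Python. It assumes the scaled dot-product scoring used in the Transformer (Bahdanau's paper scored query/key pairs with a small additive network instead), and the names `softmax` and `attend`, along with the toy keys and values, are made up for illustration.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Score each key against the query (scaled dot product, an assumption here).
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Turn scores into weights that sum to 1.
    weights = softmax(scores)
    # The output is the weighted average of the values.
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context

# Toy example: the query matches the second and third keys, not the first.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0], [20.0], [30.0]]
weights, context = attend([0.0, 5.0], keys, values)
```

The point is that the weights are computed from the content of the query and keys at run time, rather than being fixed learned parameters, which is what lets the model "find things" in its context.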
There's this crowd on HN which is very vocal against academia. From what I've seen, the main points are that academia isn't efficient, most of the science coming out of academia is useless, and that the whole system is just a waste of taxpayers' money. Instead, it is often argued, all good research is done in private labs, pointing to SpaceX, Moderna, OpenAI, Google, etc.
And while it is very true that often the research coming out of Academia is useless, what is always neglected are the roots of the research done in private labs.
When Jürgen Schmidhuber and team published their work on Neural Nets back in 1991 it was also useless. Unless you had a supercomputer and very, very deep pockets you were not going to do anything with what came out of their lab.
But still, 30 years later here we are, standing on top of the shoulders of this useless research.
Like half of what Schmidhuber is always complaining about is that (except for LSTMs) people aren't standing on the shoulders of his research very much. They try to solve some of the same problems people have always wanted to solve, try some of the same approaches people always tend to try, and then tinker until it works. At no point do they consult Schmidhuber's decade-old papers where he tried something kind of similar but didn't get very impressive results, and hence they also do not think to cite him. Then he comes out of the woodwork to assert priority.
What you're describing is people who fail to follow the most basic principles of academic research. (Check existing academic literature, mention and give credit to prior work.) This would be fine if these people didn't claim to be doing scientific research, didn't boast their academic credentials, didn't publish their findings as original work and didn't demand credit for their work in academia. Of course, they do all of these things. They benefit from a system they're actively denigrating (and in some ways degrading).
To put it more simply, people with academic credentials should not demand acknowledgement of their current intellectual work while denigrating and ridiculing the importance of very similar work done in the past.
Shane Legg was in Schmidhuber's lab at IDSIA before being one of the founders of DeepMind, so he probably read the papers personally and knows what influenced him or not...
"if you haven't read them you also shouldn't cite them" -- this is wildly incorrect in an academic context. If I'm using ResNets, I should cite the original ResNet paper, even if I haven't read it. If I'm using Transformers, I should cite the original Transformer paper, even if I haven't read it. If my work is a direct extension of method B, and method B is a direct extension of method A, I should cite the source of A, even if I haven't read it.
You can't claim independence from past work simply because you didn't look directly at it. The job of an academic researcher is to know the landscape of relevant ideas, where they come from, where they're going, and to hopefully contribute a few new good ones.
Citation chains should extend back from your work, along a reasonable line of conceptual inheritance, back to a reasonable point of origin. Schmidhuber has different definitions for both of these reasonables than the bulk of the ML research community, to a point that makes him difficult to satisfy.
> this is wildly incorrect in an academic context. If I'm using ResNets, I should cite the original ResNet paper, even if I haven't read it.
Eh, I think the correct answer is: read it, then cite it.
You're not really supposed to cite something without reading it, as it might say something different than you think. But sure, citing it w/o reading it is better than not citing it at all.
It's worth pointing out that sometimes, some papers just become part of the general context of things and are no longer explicitly cited. Or people cite textbooks or general survey papers instead.
Your Paper C does not need to cite Paper A unless you are discussing some aspect of it that Paper B is not. Otherwise you inherit the A citation via B.
You make it sound like all original ideas from academia must be cited all the time, even if that was not the source of someone’s inspiration.
If I’m in the private sector, and I rediscover something from first principles, it is not my responsibility to go search all academia to see if someone’s done it before so I can cite their work.
If I rely on a code library that doesn’t explicitly cite papers it was built on, it is also not my responsibility to go find all the papers that it might’ve been built from and cite those papers.
> Of course, but if you haven't read them you also shouldn't cite them.
But if you build on them you should have read them. I don't know about the specifics and I don't know if Schmidhuber is out of line or not, and citations and impact factors are a terrible mess, but generally speaking, you are responsible for finding and reading and citing any related work that needs to be cited, and if you work on neural networks in an academic context you probably have been forced to read that particular one at some point. Citation obligations don't just disappear because you don't want to do the research.
I do a lot of work that is based on academic research, aka building a proprietary sparse embedding model. My issue with academia is that they don’t bother to solve the practical issues. They tell you how to build a PPMI model, but what about hitting a database that’s 500TB to find co-occurrence numbers? This isn’t even touched so you’d then have to go and invent a bazillion of algorithms yourself to make your life easier. So while the bedrock is based on academic research and we thank them for that, scaling anything requires a lot of work in uncharted territories.
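To make the gap concrete, this is roughly what the textbook PPMI step looks like as a minimal in-memory sketch. The function name `ppmi` and the toy counts are made up for illustration, and it assumes the co-occurrence counts already fit in a dictionary, which is exactly the assumption that breaks at 500TB.

```python
import math
from collections import Counter

def ppmi(pair_counts):
    # pair_counts maps (word, context) -> co-occurrence count.
    total = sum(pair_counts.values())
    word_totals = Counter()
    context_totals = Counter()
    for (word, context), n in pair_counts.items():
        word_totals[word] += n
        context_totals[context] += n
    scores = {}
    for (word, context), n in pair_counts.items():
        # PMI = log2( P(w, c) / (P(w) * P(c)) ), written in terms of raw counts.
        pmi = math.log2(n * total / (word_totals[word] * context_totals[context]))
        # "Positive" PMI: keep only pairs that co-occur more than chance predicts.
        if pmi > 0:
            scores[(word, context)] = pmi
    return scores

# Hypothetical toy counts.
counts = Counter({
    ("cat", "purr"): 8, ("cat", "the"): 2,
    ("dog", "bark"): 8, ("dog", "the"): 2,
})
scores = ppmi(counts)
```

Storing only the positive entries keeps the result sparse, but getting `pair_counts` in the first place is where all the unglamorous engineering lives.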
Well, yeah. That's why we have "research & development" as a term.
What you're referring to is the "development" part of that. In some sense: the job you have _exists precisely because it's not part of the research phase_, and it's equally as valuable as the research part. Research is the proof of concept; development is scaling up and making production-ready and finding small efficiencies and so on.
From an industry perspective, it's tempting to conflate these, because that's what industry research labs are designed to do: integrated R&D. But that is not at all how academic research labs work.
But that isn't the purpose of academia -- the purpose of it is to discover new phenomena not to make products. It is true that there is a lot of work to turn a new advance into a product whether it is software or turning biological knowledge into a drug, but without discovery of new phenomena new products will come to a halt. While it is true that some corporate labs, most famously Bell Labs in its heyday, but also for example IBM's T.J. Watson and Xerox's PARC did do basic research besides product-focused work, this is pretty rare because it is hard to justify the cost of something that may only be practical in decades and often help your competitors as much as yourself.
> My issue with academia is that they don’t bother to solve the practical issues. They tell you how to build a PPMI model, but what about hitting a database that’s 500TB to find co-occurrence numbers?
Soon we will also blame academia for not providing iOS and android apps
I did that too. Ended up building my own reverse index with a fixed-size vocabulary. But that's my issue: you start building one product and you end up building ten in the process to solve all edge cases because no one bothered to research how things scale.
The practical issue of academia is epistemological. It's about learning how a phenomenon came to exist. If you are looking for efficiency, the field of academia related to learning how to do so is computational complexity, and it works quite well.
The goal of academia isn't to be practical, "only" learning.
I think most people forget the graph-like nature of scientific research. You don't have n useful papers and m useless ones by themselves, you have an interconnection of those. There may be isolated cliques of uselessness, but there isn't a clear correlation between academia and private research.
Many ideas come from philosophy, which many find useless.
Heraclitus discovered change back in ancient Greece; I don't know where we would be in scientific research without that (deliberately ignoring the debate about the originality of what we know about Heraclitus's work). I bet his contemporaries found his "research" useless.
The closest to that that I've seen is that traditional academia approaches are too far removed from practical applications for highly applied fields like software engineering, or too slow for fast-moving fields like modern day ML (thus, all the preprints).
Private labs feed off academia. Without academics to staff them, they'd get a lot less far.
I used to work at Nokia Research when they still made phones. Probably the closest thing Europe had to Silicon Valley twenty years ago. Except it was in Helsinki. Lots of stuff got invented there. Nokia didn't really manage to capitalize on its own inventions of course. Or rather it got caught up in its own clumsy attempts throwing babies out of the window by the bucket load. But others sure did. A lot of modern smart phones still have tech in them that Nokia pioneered before either Google or Apple shipped a smart phone.
At the time there was a lot of talk about the demise of industrial research labs. Bell Labs (now actually owned by Nokia!), Xerox PARC, IBM, and all the other big US labs that produced amazing stuff are shadows of their former selves. There is some truth in that.
But you could argue that Google and Apple picked up some of the slack. And the current AI boom came out of Google cherry-picking all the best universities for their AI talent and putting them all together in a research group that then got free rein. Like Nokia, that involved a lot of ejecting of babies with the bath water. But it seems to have spawned lots of new startups that can trace their roots back to that research group in Google.
It's like the old saying "only 10% of my marketing budget is making a difference, I just don't know which 10%"
You don't know ahead of time, where the breakthrough will come from.
There is a ton of research that sits on the shelf, and then years later, it gets re-combined with some other useless research, and boom, some big breakthrough.
This current attitude of all research is worthless, so it should be cancelled, is shooting our future selves in the face.
Every western academic nearly systematically ignores eastern science and philosophy: classicism means "western European". Never mind Europe only flourished intellectually post Islam, which imported the science and engineering of China and India, critically including printing and zero[0]. IMHO this is why distaste for academia grows: it's based on appeals to authority which are demonstrably farcically misplaced. Alternatively stated: the emperor has no clothes, much less silk or paper!
Just as the Dewey Decimal System really only served the purpose of providing the facetious nominal linearization of an arbitrary depth ontological oversimplification, so too humans are much more like random pattern matching machines than fastidious sense-makers glued to absolutes derived from false appeals to static mono-perspective ontological hierarchies. The same is becoming lived experience in the LLM age, although the tiktokked youth apparently cannot string ten words together or focus longer than three seconds to attest, I'd wager they can feel it. Are we losing something by rejecting the habit of rigorously manually tending to spurious and temporary ontologies? Yes. Is it necessarily a loss in the long term? Probably not, in the same way we no longer write long-form letters or leave calling cards. Are we gaining something in response? Yes, at a minimum much stronger cross-pollination between ivory towers by fearless exploratory pragmatists who disrespect the would-be scope of nominal professions in favor of holistic thinking... both AI and human.
Practically no one is against hard science research, properly conducted. The issues are rampant fraud / p-hacking / unreproducible garbage mixed with an unhealthy dose of ideological monoculture and indoctrination, garnished with rising tuition prices while sitting on huge endowments in case of the Ivy Leagues.
There is a long list of grievances I have regarding the (mis-) use of taxpayer money, and funding the hard sciences is way, way down. I can’t even see it from where I stand.
> From what I've seen, the main points are that academia isn't efficient, most of the science coming out of academia is useless and that the whole system is just a waste of taxpayers money. Instead, what is often argued, all good research is done in private labs. Then pointing to SpaceX, Moderna, OpenAI, Google, etc.
Well... that's "starve the beast" in action. A lot of things we take for granted, that underpin our modern ways of life, came to be due to government investing. Laser, radar, microwaves, the early Internet, that all was military R&D.
"Unfortunately" (well, for the rich and the MIC, at least) there is no way for people to siphon off money in government-funded research, so once the libertarian/small-state BS completely took over following the collapse of the USSR, a lot of that got torn down or supplemented with enough bureaucracy to make Germans cry... and that's why reusable rockets were not invented at NASA but at SpaceX instead.
Reusable rockets were "invented" by Lars Blackmore when he was working at JPL (Jet Propulsion Lab). I say invented because like anything in the evolution of engineering, credit is messy.
> Founded in 1936 by California Institute of Technology (Caltech) researchers, the laboratory is now owned and sponsored by NASA and administered and managed by Caltech.
Cheap reusable rockets allow for a lot more research for a lot less money.
Unfortunately, as the early history of SpaceX shows, it required a lot of failures to learn from to design the current crop of rockets. And that's the advantage that private R&D has... as long as the person in charge has money, failure is an option, because in anything publicly funded, any failure will relentlessly be blamed on the currently governing party by the opposition.
I feel like you're constructing a strawman to argue against. I visit this site almost daily and the prevailing sentiment is usually the polar opposite of what you're suggesting.
If sentiment on HN were as you say, how could your pro-academia and anti-big tech comment be sitting at the top as the most upvoted comment?
> it is easy to forget that the foundations of this trillion-dollar industry were laid down over 30 years ago in Munich
Yes, it is very easy to forget, because the trillion is not being made in Europe. If it was really conceived in Munich (like the maps that got stolen also), it shows how incompetent Europe is at keeping its technology and protecting European companies.
Somehow "protecting companies" by keeping basic research, done openly at a university lab, from being "stolen"? What?
It's like saying it's painful that the Web was invented in Europe and opened for everybody rather than being kept at CERN to protect European companies.
Right, it is totally fine to create new inventions, but let others take the credit and financial benefits.
It is our duty to protect and get the benefits of European inventions, especially the ones financed with public tax. Open for everybody means benefits for everybody.
Which work has more value: the abstract description of a catalogue of potential model architectures or their validated application trained on real data?
In the Schmidhuber case there are 20 years and a chain of countless other works in between the two.
The real root of the current AI boom is a master's thesis from the University of Toronto.
The thesis demonstrated that neural networks much deeper than before could be trained by simply having a random fraction of the neurons excluded during forward and back propagation.
That's how we got practical deep neural networks. Without that we would still be in AI winter.
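The mechanism described here is what is usually called dropout. A minimal sketch in plain Python, assuming the common "inverted" variant where surviving activations are rescaled during training (that rescaling is a convention chosen for this example, not something the comment above specifies); `dropout` and `dropout_backward` are hypothetical helper names.

```python
import random

def dropout(activations, drop_prob, rng, training=True):
    # At inference time every neuron is kept and nothing is rescaled.
    if not training or drop_prob == 0.0:
        return list(activations), [1.0] * len(activations)
    keep_prob = 1.0 - drop_prob  # assumes drop_prob < 1
    # Each neuron is kept with probability keep_prob; kept ones are scaled up
    # so the expected activation matches what the network sees at inference.
    mask = [1.0 / keep_prob if rng.random() < keep_prob else 0.0
            for _ in activations]
    return [a * m for a, m in zip(activations, mask)], mask

def dropout_backward(upstream_grads, mask):
    # The same mask is reused in the backward pass, so dropped neurons
    # receive zero gradient for this training step.
    return [g * m for g, m in zip(upstream_grads, mask)]

rng = random.Random(0)
hidden = [0.2, -1.3, 0.7, 0.5]
dropped, mask = dropout(hidden, drop_prob=0.5, rng=rng)
grads = dropout_backward([1.0, 1.0, 1.0, 1.0], mask)
```

A fresh mask is drawn on every training step, so each step effectively trains a different thinned sub-network.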
Surely the roots, if we skip over the early perceptron work, are in backpropagation and Hinton, and the work going on at Edinburgh and elsewhere in the 80s.
Indeed I remember buying a set of three conference-papers-as-books around that time, titled Artificial Neural Networks .. proceedings of the whatever the conference was.
No doubt Schmidhuber made important contributions, but I see him pop up claiming to be the 'root' of it all every couple of years.
Modern backpropagation was first published by Seppo Linnainmaa as "reverse mode of automatic differentiation" (1970) for discrete connected networks of nested differentiable functions.
In 1982, Paul Werbos applied backpropagation to MLPs in the way that has become standard.
Paul Werbos did not apply backprop to MLPs as cleanly described in Hinton's paper, but rather to some kind of autoregressive non-linear parametrized functions with a much more specific application scope.
Both papers are direct applications of the chain rule applied to estimate the gradient of a multivariate function.
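As a rough illustration of the "it's all the chain rule" point, here is a minimal reverse-mode sketch in plain Python. The `Var` class and the `tanh` and `backward` helpers are hypothetical names for this example, not any library's API, and this is not meant to reproduce either paper's formulation.

```python
import math

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        # Each entry is (input Var, local derivative of this node w.r.t. it).
        self.parents = parents
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def tanh(x):
    t = math.tanh(x.value)
    return Var(t, ((x, 1.0 - t * t),))

def backward(output):
    # Order the graph so every node comes after the inputs it depends on.
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)
    visit(output)
    # Walk from the output back to the inputs, applying the chain rule:
    # d(output)/d(input) += d(output)/d(node) * d(node)/d(input).
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local

# A single "neuron": y = tanh(w * x + b)
w, x, b = Var(0.5), Var(2.0), Var(-1.0)
y = tanh(w * x + b)
backward(y)
```

One backward sweep yields the gradient with respect to every input at once, which is why reverse mode suits functions with many parameters and a single scalar output.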
That's what bugs me about him. So much work has gone into today's models that calling his contributions "the root" isn't really warranted. He's always complaining that Hinton, LeCun, and Bengio get more credit than they deserve, and now he's over-claiming himself.
Name a single aspect of something modern like the Transformer architecture or how it is trained, that is even indirectly attributable to Schmidhuber.
No doubt he'd be jumping up and down wanting to take credit for residual connections, but where was Schmidhuber in the ImageNet era when everyone else was discovering how to build deep neural nets? Why didn't Schmidhuber invent ResNets, but instead waited until someone else (Kaiming He) did, then claim credit for it?
I'll bet Schmidhuber isn't done yet ... when someone eventually comes up with an architecture for AGI, Schmidhuber will come out of the woodwork and point to a note he made on a napkin in 1800 that predicted it all.
It's crazy to think that if Elon Musk hadn't mentioned Schmidhuber, most people would have no idea.
It's nauseating how all the researchers who happened to work for big tech got tons of media coverage but Schmidhuber and his team were getting zero coverage yet they made massive contributions. I bet there are many others not mentioned.
Nobody even knows about Frank Rosenblatt. It's insane how distorted our perception of innovation is.
Even science has been corrupted. It makes one doubt every story we're told about who invented what.
Very Trump-like statement - "Not many people know this, but ...". Yes, a lot of people know this. Any class that even says a little about the history of NNs will talk about Rosenblatt and the Perceptron.
> Any class that even says a little about the history of NNs will talk about Rosenblatt and the Perceptron.
Sure. I think it starts to get more interesting when the influences that Rosenblatt explicitly cites in his seminal Perceptron paper (e.g. Hayek) become part of the discussion (which rarely happens in my experience).
There seems to be a coordinated push around Schmidhuber all around media in the EU, even LinkedIn is full of "random" posts about him in the past week.
I believe the invention of Transformers, and especially the attention mechanism, was influenced by past research, but it's definitely not only Schmidhuber's work. That said, if we removed the papers mentioned by Schmidhuber from history, I am quite certain it would have no effect on the discovery of Transformers, hence his work cannot be the root. He has to grow up and accept that work and equations can appear similar; looking at the inverse-square law and saying Newton stole it from someone is being dishonest.
This is well put.
Also see Schmidhuber's take on the Hinton + Hopfield Nobel prize: https://people.idsia.ch/~juergen/physics-nobel-2024-plagiari...
Not that surprising since the whole LLM ecosystem is based on plagiarism.
TU Munich and Nipkow, Makarius et.al. are also at the center of the influential Isabelle theorem prover. TU Munich is cool :-)
https://www.youtube.com/watch?v=AIiwuClvH6k
When it comes to attention, details matter, since the idea itself is obvious - weighted inputs, and implicit attention is present in every neural network - this is what weights are.
The specific form of attention used by the Transformer is key-based associative attention, aka "Bahdanau attention" introduced in Bahdanau's paper "Neural Machine Translation by Jointly Learning to Align and Translate". It's perhaps worth noting that the word "attention" is barely even mentioned in this paper, other than noting that this weighted input mechanism can be seen as a form of attention (presumably mentioned since attention was at that time a recurring theme in various types of neural network).
Bahdanau attention - not just the general concept of attention - seems to be a very critical piece of the Transformer architecture since this this is what allows the Transformer to find things in context and is behind the "induction head" mechanism that appears central to how Transformers operate.
There's this crowd on HN which is very vocal against academia. From what I've seen, the main points are that academia isn't efficient, most of the science coming out of academia is useless and that the whole system is just a waste of taxpayers money. Instead, what is often argued, all good research is done in private labs. Then pointing to SpaceX, Moderna, OpenAI, Google, etc.
And while it is very true that often the research coming out of Academia is useless, what is always neglected are the roots of the research done in private labs.
When Jürgen Schmidhuber and team published their work on Neural Nets back in 1991 it was also useless. Unless you had a supercomputer and very, very deep pockets you were not going to do anything with what came out of their lab.
But still, 30 years later here we are, standing on top of the shoulders of this useless research.
Like half of what Schmidhuber is always complaining about is that (except for LSTMs) people aren't standing on the shoulders of his research very much. They try to solve some of the same problems people have always wanted to solve, try some of the same approaches people always tend to try, and then tinker until it works. At no point do they consult Schmidhuber's decade-old papers where he tried something kind of similar but didn't get very impressive results, and hence they also do not think to cite him. Then he comes out of the woodwork to assert priority.
What you're describing is people who fail to follow the most basic principles of academic research. (Check existing academic literature, mention and give credit to prior work.) This would be fine if these people didn't claim to be doing scientific research, didn't boast their academic credentials, didn't publish their findings as original work and didn't demand credit for their work in academia. Of course, they do all of these things. They benefit from a system they're actively denigrating (and in some ways degrading).
To put it more simply, people with academic credentials should not demand acknowledgement of their current intellectual work while denigrating and ridiculing the importance of very similar work done in the past.
It's not just the papers. It's also the students and their students, many of whom ended up working in the top labs today.
You can be influenced downstream by papers you haven't personally read.
Shane Legg was in Schmidhuber's lab at IDSIA before being one of the founders of DeepMind, so he probably read the papers personally and knows what influenced him or not...
Of course, but if you haven't read them you also shouldn't cite them.
And that's where Schmidhuber goes off the rails: publicly shaming published papers into citing you isn't good academic practice. It's bullying.
"if you haven't read them you also shouldn't cite them" -- this is wildly incorrect in an academic context. If I'm using ResNets, I should cite the original ResNet paper, even if I haven't read it. If I'm using Transformers, I should cite the original Transformer paper, even if I haven't read it. If my work is a direct extension of method B, and method B is a direct extension of method A, I should cite the source of A, even if I haven't read it.
You can't claim independence from past work simply because you didn't look directly at it. The job of an academic researcher is to know the landscape of relevant ideas, where they come from, where they're going, and to hopefully contribute a few new good ones.
Citation chains should extend back from your work, along a reasonable line of conceptual inheritance, back to a reasonable point of origin. Schmidhuber has different definitions for both of these reasonables than the bulk of the ML research community, to a point that makes him difficult to satisfy.
> this is wildly incorrect in an academic context. If I'm using ResNets, I should cite the original ResNet paper, even if I haven't read it.
Eh, I think the correct answer is: read it, then cite it.
You're not really supposed to cite something without reading it, as it might say something different than you think. But sure, citing it w/o reading it is better than not citing it at all.
It's worth pointing out that sometimes, some papers just become part of the general context of things and are no longer explicitly cited. Or people cite textbooks or general survey papers instead.
For example, take a look at Albert Einstein's Google Scholar profile. He's not the top cited physicist. Not even close. It's because other researchers don't explicitly cite his papers. https://scholar.google.com/citations?user=qc6CJjYAAAAJ&hl=en...
Same with Tim Berners-Lee and the World Wide Web. Imagine if his original paper were cited every time someone deployed a web site.
Your Paper C does not need to cite Paper A unless you are discussing some aspect of it that Paper B is not. Otherwise you inherit the A citation via B.
Spamming citations is unnecessary.
You make it sound like all original ideas from academia must be cited all the time, even if that was not the source of someone’s inspiration.
If I’m in the private sector, and I rediscover something from first principles, it is not my responsibility to go search all academia to see if someone’s done it before so I can cite their work.
If I rely on a code library that doesn’t explicitly cite papers it was built on, it is also not my responsibility to go find all the papers that it might’ve been built from and cite those papers.
You should read those papers then
> Of course, but if you haven't read them you also shouldn't cite them.
But if you build on them you should have read them. I don't know about the specifics and I don't know if Schmidhuber is out of line or not, and citations and impact factors are a terrible mess, but generally speaking, you are responsible for finding and reading and citing any related work that needs to be cited, and if you work on neural networks in an academic context you probably have been forced to read that particular one at some point. Citation obligations don't just disappear because you don't want to do the research.
I do a lot of work that is based on academic research, aka building a proprietary sparse embedding model. My issue with academia is that they don’t bother to solve the practical issues. They tell you how to build a PPMI model, but what about hitting a database that’s 500TB to find co-occurrence numbers? This isn’t even touched so you’d then have to go and invent a bazillion of algorithms yourself to make your life easier. So while the bedrock is based on academic research and we thank them for that, scaling anything requires a lot of work in uncharted territories.
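To make the gap concrete, here is roughly what the textbook version looks like, as a minimal sketch. The function name `ppmi` and the toy counts are invented for illustration, and the whole thing assumes the co-occurrence counts fit in an in-memory dict, which is exactly the assumption that stops holding at the scale described above.

```python
import math
from collections import Counter

def ppmi(pair_counts):
    """Positive PMI from a {(word, context): count} mapping.

    PPMI(w, c) = max(0, log2(P(w, c) / (P(w) * P(c)))).
    Pairs whose PMI is not positive are left out, which keeps the result sparse.
    """
    total = sum(pair_counts.values())
    word_totals = Counter()
    context_totals = Counter()
    for (word, context), n in pair_counts.items():
        word_totals[word] += n
        context_totals[context] += n

    scores = {}
    for (word, context), n in pair_counts.items():
        if n <= 0:
            continue  # unseen pairs have no defined PMI
        pmi = math.log2(n * total / (word_totals[word] * context_totals[context]))
        if pmi > 0:
            scores[(word, context)] = pmi
    return scores

# Toy counts, purely illustrative.
toy_counts = {
    ("ice", "cold"): 8,
    ("ice", "the"): 2,
    ("fire", "hot"): 8,
    ("fire", "the"): 2,
}
scores = ppmi(toy_counts)
```

Everything here is two passes over a dict. Getting the same marginals and joint counts out of a store that doesn't fit on one machine is the part the papers leave to you.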
Well, yeah. That's why we have "research & development" as a term.
What you're referring to is the "development" part of that. In some sense: the job you have _exists precisely because it's not part of the research phase_, and it's equally as valuable as the research part. Research is the proof of concept; development is scaling up and making production-ready and finding small efficiencies and so on.
From an industry perspective, it's tempting to conflate these, because that's what industry research labs are designed to do: integrated R&D. But that is not at all how academic research labs work.
But that isn't the purpose of academia -- the purpose of it is to discover new phenomena not to make products. It is true that there is a lot of work to turn a new advance into a product whether it is software or turning biological knowledge into a drug, but without discovery of new phenomena new products will come to a halt. While it is true that some corporate labs, most famously Bell Labs in its heyday, but also for example IBM's T.J. Watson and Xerox's PARC did do basic research besides product-focused work, this is pretty rare because it is hard to justify the cost of something that may only be practical in decades and often help your competitors as much as yourself.
> My issue with academia is that they don’t bother to solve the practical issues. They tell you how to build a PPMI model, but what about hitting a database that’s 500TB to find co-occurrence numbers?
Soon we will also blame academia for not providing iOS and android apps
I jest but database design is its own sub field of computer science, maybe look into their papers?
I did that too. Ended up building my own reverse index with a fixed-size vocabulary. But that's my issue: you start building one product and you end up building ten in the process to solve all the edge cases, because no one bothered to research how things scale.
Well it sounds like you did?
The science is on them, the engineering is on you.
The practical issue of academia is epistemological. It's about learning how a phenomenon came to exist. If you are looking for efficiency, the field of academia related to learning how to do so is computational complexity, and it works quite well.
The goal of academia isn't to be practical, "only" learning.
Yeah, that's your job.
I think most people forget the graph-like nature of scientific research. You don't have n useful papers and m useless ones by themselves, you have an interconnection of those. There may be isolated cliques of uselessness, but there isn't a clear correlation between academia and private research.
Many ideas come from philosophy, which many find useless.
Heraclitus discovered change back in ancient Greece; I don't know where we would be in scientific research without that (deliberately ignoring the debate about the originality of what we know about Heraclitus' work). I bet his contemporaries found his "research" useless.
Where is "this crowd" that you are talking about?
The closest to that that I've seen is that traditional academic approaches are too far removed from practical applications for highly applied fields like software engineering, or too slow for fast-moving fields like modern day ML (thus, all the preprints).
Private labs feed off academia. Without academics to staff them, they'd get a lot less far.
I used to work at Nokia Research when they still made phones. Probably the closest thing Europe had to Silicon Valley twenty years ago. Except it was in Helsinki. Lots of stuff got invented there. Nokia didn't really manage to capitalize on its own inventions of course. Or rather it got caught up in its own clumsy attempts throwing babies out of the window by the bucket load. But others sure did. A lot of modern smart phones still have tech in them that Nokia pioneered before either Google or Apple shipped a smart phone.
At the time there was a lot of talk about the demise of industrial research labs. Bell Labs (now actually owned by Nokia!), Xerox PARC, IBM, and all the other big US labs that produced amazing stuff are shadows of their former selves. There is some truth in that.
But you could argue that Google and Apple picked up some of the slack. And the current AI boom came out of Google cherry picking all the best universities for their AI talent and putting them all together in a research group that then got free rein. Like Nokia, that involved a lot of ejecting of babies with the bath water. But it seems to have spawned lots of new startups that can trace their roots back to that research group in Google.
I think most of the criticism of academia is about the rampant fraud and unreproducible results, due to the way the incentives are structured.
It's like the old saying "only 10% of my marketing budget is making a difference, I just don't know which 10%"
You don't know ahead of time, where the breakthrough will come from.
There is a ton of research that sits on the shelf, and then years later, it gets re-combined with some other useless research, and boom, some big breakthrough.
This current attitude of all research is worthless, so it should be cancelled, is shooting our future selves in the face.
Every western academic nearly systematically ignores eastern science and philosophy: classicism means "western European". Never mind Europe only flourished intellectually post Islam, which imported the science and engineering of China and India, critically including printing and zero[0]. IMHO this is why distaste for academia grows: it's based on appeals to authority which are demonstrably farcically misplaced. Alternatively stated: the emperor has no clothes, much less silk or paper!
Just as the Dewey Decimal System really only served the purpose of providing the facetious nominal linearization of an arbitrary depth ontological oversimplification, so too humans are much more like random pattern matching machines than fastidious sense-makers glued to absolutes derived from false appeals to static mono-perspective ontological hierarchies. The same is becoming lived experience in the LLM age, although the tiktokked youth apparently cannot string ten words together or focus longer than three seconds to attest, I'd wager they can feel it. Are we losing something by rejecting the habit of rigorously manually tending to spurious and temporary ontologies? Yes. Is it necessarily a loss in the long term? Probably not, in the same way we no longer write long-form letters or leave calling cards. Are we gaining something in response? Yes, at a minimum much stronger cross-pollination between ivory towers by fearless exploratory pragmatists who disrespect the would-be scope of nominal professions in favor of holistic thinking... both AI and human.
[0] https://en.wikipedia.org/wiki/Science_and_Civilisation_in_Ch...
and you still need tons of money
This is a straw-man if I ever saw one.
Practically no one is against hard science research, properly conducted. The issues are rampant fraud / p-hacking / unreproducible garbage mixed with an unhealthy dose of ideological monoculture and indoctrination, garnished with rising tuition prices while sitting on huge endowments in case of the Ivy Leagues.
> Practically no one is against hard science research, properly conducted.
As long as you do that with your own money (or money freely given by other people), sure.
If you use taxpayer money, that's a different game.
There is a long list of grievances I have regarding the (mis-) use of taxpayer money, and funding the hard sciences is way, way down. I can’t even see it from where I stand.
Yes all good points showing issues that academia has at the moment.
However I often see this going from "there's issues" to discounting academia altogether and positioning private labs as a good or only alternative.
After all, most people in the open science collaboration which published the seminal paper kicking off the replication crisis were from academia.
Yes there is no substitute for academia. Monopolist's research labs get close (Bell Labs etc), but they tend to be more "applied".
> From what I've seen, the main points are that academia isn't efficient, most of the science coming out of academia is useless and that the whole system is just a waste of taxpayers money. Instead, what is often argued, all good research is done in private labs. Then pointing to SpaceX, Moderna, OpenAI, Google, etc.
Well... that's "starve the beast" in action. A lot of things we take for granted, that underpin our modern ways of life, came to be due to government investing. Laser, radar, microwaves, the early Internet, that all was military R&D.
"Unfortunately" (well, for the rich and the MIC, at least) there is no way for people to siphon off money in government-funded research, so once the libertarian/small-state BS completely took over following the collapse of the USSR, a lot of that got torn down or supplemented with enough bureaucracy to make Germans cry... and that's why reusable rockets were not invented at NASA but at SpaceX instead.
Reusable rockets were not invented by NASA because their mission was exploration, not commercialization.
Reusable rockets were "invented" by Lars Blackmore when he was working at JPL (Jet Propulsion Lab). I say invented because like anything in the evolution of engineering, credit is messy.
https://en.wikipedia.org/wiki/Jet_Propulsion_Laboratory
> Founded in 1936 by California Institute of Technology (Caltech) researchers, the laboratory is now owned and sponsored by NASA and administered and managed by Caltech.
Minimum-Landing-Error Powered-Descent Guidance for Mars Landing Using Convex Optimization http://larsjamesblackmore.com/BlackmoreEtAlJGCD10.pdf
Elon originally wanted parachutes and was convinced by Lars to go with self landing rockets.
Cheap reusable rockets allow for a lot more research for a lot less money.
Unfortunately, as the early history of SpaceX shows, it required a lot of failures to learn from to design the current crop of rockets. And that's the advantage that private R&D has... as long as the person in charge has money, failure is an option, because in anything publicly funded, any failure will relentlessly be blamed on the currently governing party by the opposition.
I feel like you're constructing a strawman to argue against. I visit this site almost daily and the prevailing sentiment is usually the polar opposite of what you're suggesting.
If sentiment on HN were as you say, how could your pro-academia and anti-big tech comment be sitting at the top as the most upvoted comment?
This article, too, was originally discovered by Jürgen Schmidhuber in 1991!
> it is easy to forget that the foundations of this trillion-dollar industry were laid down over 30 years ago in Munich
Yes, it is very easy to forget, because the trillion is not being made in Europe. If it was really conceived in Munich (like the maps that got stolen also), it shows how incompetent Europe is at keeping its technology and protecting European companies.
It is painful to read this article.
Somehow "protecting companies" by keeping basic research, done openly at a university lab, from being "stolen"? What?
It's like saying it's painful that the Web was invented in Europe and opened for everybody rather than being kept at CERN to protect European companies.
Right, it is totally fine to create new inventions, but let others take the credit and financial benefits. It is our duty to protect and get the benefits of European inventions, especially the ones financed with public taxes. Open for everybody means benefits for everybody.
https://en.wikipedia.org/wiki/J%C3%BCrgen_Schmidhuber
Which work has more value: the abstract description of a catalogue of potential model architectures or their validated application trained on real data?
In the Schmidhuber case there are 20 years and a chain of countless other works in between the two.
This sort of seems like a pattern in CS - someone creates something and then it blows up 20 or 30 years later when the world is ready for it.
Hot take:
The real root of the current AI boom is a master's thesis from the University of Toronto.
The thesis demonstrated that neural networks much longer than before could be trained by simply having a random fraction of the neurons excluded during forward and back propagation.
That's how we got practical deep neural networks. Without that we would still be in AI winter.
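The trick described there is what is usually called dropout, and the forward pass is only a few lines. This is a minimal sketch, not the thesis's code: the function name `dropout`, its parameters, and the rescaling by the keep probability (the common "inverted" variant) are choices made for this illustration.

```python
import random

def dropout(activations, drop_prob, rng, training=True):
    """Zero each activation with probability drop_prob during training.

    Survivors are scaled by 1 / keep_prob (the "inverted" variant, an
    assumption of this sketch) so the expected activation is unchanged
    and nothing needs rescaling at inference time.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)  # seeded only to make the example repeatable
hidden = [1.0] * 1000
dropped = dropout(hidden, 0.5, rng)
unchanged = dropout(hidden, 0.5, rng, training=False)
```

In a real network the same random mask is reused in the backward pass, so the dropped units receive no gradient for that step. This sketch only shows the forward half.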
Surely the roots, if we skip over the early perceptron work, are in backpropagation and Hinton, and the work going on at Edinburgh and elsewhere in the 80s.
Indeed I remember buying a set of three conference-papers-as-books around that time, titled Artificial Neural Networks .. proceedings of the whatever the conference was.
No doubt Schmidhuber made important contributions, but I see him pop up claiming to be the 'root' of it all every couple of years.
Hinton did not invent backpropagation.
related paragraph from Wikipedia:
Modern backpropagation was first published by Seppo Linnainmaa as "reverse mode of automatic differentiation" (1970)[26] for discrete connected networks of nested differentiable functions.[27][28][29]
In 1982, Paul Werbos applied backpropagation to MLPs in the way that has become standard.
Paul Werbos did not apply backprop to MLPs as cleanly described in Hinton's paper, but rather to some kind of autoregressive non-linear parametrized functions with a much more specific application scope.
Both papers are direct applications of the chain rule applied to estimate the gradient of a multivariate function.
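To show how little machinery "the chain rule applied to a computation graph" needs, here is a toy reverse-mode sketch. It is not taken from any of the papers mentioned: the `Var` class, the `backward` function, and the example expression are hypothetical, and only addition and multiplication are supported.

```python
class Var:
    """A scalar node that remembers how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # (parent Var, local derivative) pairs
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every node upstream."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # post-order: inputs before the nodes using them

    visit(output)
    output.grad = 1.0
    # Walk from the output back to the inputs, applying the chain rule per edge.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local

x, w, b = Var(3.0), Var(2.0), Var(1.0)
y = (w * x + b) * (w * x + b)  # y = (w*x + b)^2
backward(y)
```

By hand, dy/dw = 2(wx + b)x, dy/dx = 2(wx + b)w and dy/db = 2(wx + b), which is what ends up in `w.grad`, `x.grad` and `b.grad` after the single backward sweep.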
That's what bugs me about him. So much work has gone into today's models that calling his contributions "the root" isn't really warranted. He's always complaining that Hinton, LeCun, and Bengio get more credit than they deserve, and now he's over-claiming himself.
Both can be right.
They could be, but they really aren't.
Name a single aspect of something modern like the Transformer architecture or how it is trained, that is even indirectly attributable to Schmidhuber.
No doubt he'd be jumping up and down wanting to take credit for residual connections, but where was Schmidhuber in the ImageNet era when everyone else was discovering how to build deep neural nets? Why didn't Schmidhuber invent ResNets, but instead waited until someone else (Kaiming He) did, then claim credit for it?
I'll bet Schmidhuber isn't done yet ... when someone eventually comes up with an architecture for AGI, Schmidhuber will come out of the woodwork and point to a note he made on a napkin in 1800 that predicted it all.
Surely the roots go back to Turing, Gödel, Hilbert, Frege, Leibniz, Aristoteles.
It's crazy to think that if Elon Musk hadn't mentioned Schmidhuber, most people would have no idea.
It's nauseating how all the researchers who happened to work for big tech got tons of media coverage but Schmidhuber and his team were getting zero coverage yet they made massive contributions. I bet there are many others not mentioned.
Nobody even knows about Frank Rosenblatt. It's insane how distorted our perception of innovation is.
Even science has been corrupted. It makes one doubt every story we're told about who invented what.
Yes, Rosenblatt is another good example. I recently looked deeper into the development of the perceptron and it's absolutely fascinating.
> Nobody even knows about Frank Rosenblatt.
Very Trump-like statement - "Not many people know this, but ...". Yes, a lot of people know this. Any class that even says a little about the history of NNs will talk about Rosenblatt and the Perceptron.
> Any class that even says a little about the history of NNs will talk about Rosenblatt and the Perceptron.
Sure. I think it starts to get more interesting when the influences that Rosenblatt explicitly cites in his seminal Perceptron paper (e.g. Hayek) become part of the discussion (which rarely happens in my experience).
Instead of focusing on the future, EU is busy rewriting history to please some eccentric researcher that claims he invented it all.
How does the EU feature in TFA exactly?
There seems to be a coordinated push around Schmidhuber all around media in the EU, even LinkedIn is full of "random" posts about him in the past week.
You clearly aren't familiar with Schmidhuber if this kind of thing seems new to you. It's basically his thing.
Schmidhuber isn't in the EU, nor Switzerland at the moment.
Schmidhuber will NEVER stop trying to aggressively preserve his relevance and it's endlessly amusing. Good for him.
I believe the invention of Transformers, and especially the attention mechanism, was influenced by past research, but definitely not only by Schmidhuber's work. That said, if we removed the papers mentioned by Schmidhuber from history, I am quite certain it would have had no effect on the discovery of Transformers, hence his work cannot be the root. He has to grow up and accept that work and equations can appear similar; looking at the inverse-square law and saying Newton stole it from someone is dishonest.