This video is great.
Especially the comment that it takes a week to a month to deeply understand a research paper.
The comment in question (a reply to the pinned comment as of writing):
> If I'm trying to understand it deeply enough to mechanistically understand how the methodology works, this can take a few days to a week.
> If I'm trying to reproduce the paper result to incorporate it within my own research project, I might wrestle with the thing for a full month.
> However, I don't do this process with all papers. That would be wasteful. I select which papers have high relevance for myself and my lab by reading lots of them superficially, then I pinpoint a few that have disproportionate benefits.
While the "deeply understand" notion might mean something else outside the scope of the OP (assuming they are not doing this for further research), it seems that in practice you spend a few days to understand a paper well enough, if not completely.
Personally, I struggle to find good-quality technical writing in this field more often than not, and maybe that says more about my competence than anything else. Given that in most cases a paper boils down to empirically tweaking some abstraction layer, there must be a better way to present the results.
The idea that it takes a week to a month to deeply understand a research paper feels to me like a massive failure in our expectations of academic writing.
The reason it takes so long is that academic culture deliberately encourages creating documents that are extremely difficult for people to learn from.
I am confident that in the vast majority of cases the increased understanding somebody gets from spending 1-4 weeks of effort on a single paper was not actually worth that effort: the same result could have been achieved in an afternoon of direct conversation with the author of the paper.
Academia really needs to break away from the culture of obfuscating these discoveries! At the very least, any paper worth its salt should be accompanied by an informal blog post that helps explain the discovery, and maybe a video or audio recording of the researcher communicating it as effectively as possible.
EDIT: I just spotted that comment; it's a YouTube comment, not a note in the video itself: https://www.youtube.com/watch?v=nL7lAo95D-o&lc=UgxHeJrOv1g_t...
> If I'm trying to understand it deeply enough to mechanistically understand how the methodology works, this can take a few days to a week.
> If I'm trying to reproduce the paper result to incorporate it within my own research project, I might wrestle with the thing for a full month.
That comment feels very reasonable to me: for actually implementing a new deep-learning strategy from a paper, a full month seems fair.
(I committed the cardinal Hacker News sin here of jumping on an opportunity to share one of my pet peeves, rather than engaging with the information directly!)
Usually a paper is the result of months of work. Those months get condensed into a document of a few pages, and part of this condensation process is replacing some of the needed explanations with references.
Unless you are one of the few experts in exactly the field the author is working on, you will have to work through these references and make the connections to the work.
After this you will have to map the clear but hard-to-grasp facts of the publication onto your internal representation of the topic.
All of these things take time. Sometimes you get to skip the part where you have to read the references by consulting the PhD thesis the paper is based on (an expansion from ~4 pages to ca. 120), but even then you have to build your internal understanding.
Talking to the author will indeed speed up this process, but that essentially means having an extremely qualified private tutor.
> The idea that it takes a week to a month to deeply understand a research paper feels to me like a massive failure in our expectations of academic writing.
Isn't it just because of all the knowledge required to actually understand the paper? Typically, you wouldn't expect someone without a strong background in the field to understand novel research in an afternoon. This is usually what a PhD is for: covering the gap between a master's degree and the state of the art.
Edit: sibling comments add good perspective to this.
While I agree papers should be clearer, and should be accompanied by auxiliary material making them easier to understand and apply, I do not agree that academia deliberately encourages papers that are difficult to learn from, or that it encourages obfuscation. I have a little experience reviewing and writing papers in computer science, specifically security.
As in any field, yes, a lot of background knowledge, mentioned or unmentioned, may be assumed; jargon may be used; and papers may be written with imperfect style and structure (researchers aren't professional writers, after all, and papers are often written on a harsh deadline). But I don't think it's in anyone's interest, neither the authors' nor the community's at large, to make a paper inaccessible as a goal, and I've never perceived this to be happening [1].
Source: I have served on many conference program committees and have been on the receiving end of many reviews, from which I can tell that a paper being hard to follow can be grounds for rejection, while a paper written with clarity, easy to follow, learn from, reproduce, and build on, is greatly appreciated.
[1] With one exception, come to think of it: needless formalism and mathematical notation from one research group I remember. I got the impression the intention was to impress or intimidate the reader/reviewers, and this was not appreciated.
Page limits are at once both the savior and the tormenter of science.
Yes, but look at it from the other perspective: if they state things clearly, they're not puffing themselves up to their fullest extent for academic peers and funding bodies.
> academic culture deliberately encourages creating documents that are extremely difficult for people to learn from.
That's because they are not meant to be used as a learning resource. They're meant to communicate results to other experts who will understand them much faster than you ever could. If you want to learn a new field, start with textbooks and then look for review papers or things like that. Understanding research papers may be your ultimate goal, but don't expect to get there just by consuming these types of papers.
> They're meant to communicate results to other experts who will understand them much faster than you ever could.
And yet I'm still deconstructing papers entirely within my domain down to the bullet-pointed outline the author used to write it from. We do not need to write like Dickensian wordsmiths and we are not good at it. I'd consider it gatekeeping if I thought people wrote this way out of malice rather than cupidity.
Meh, I consider myself pretty deep in my domains of expertise, and even I regularly struggle when I haven't had much exposure to a specific subject that is not exactly my specialty. You really can't rate these things unless you are literally among the top experts at the forefront of the topic. That's also who I'd have in mind when writing such a paper. Not some well-versed enthusiast, not even a PhD student. Depending on your field, you might be able to count the people who can comfortably read and understand your paper on one hand. For everyone else, the effort will rise exponentially for every step they are removed from that specific topic.
I'm thinking more about a situation where the author has three bullet points to convey and needs to construct an encapsulating paragraph around them. When I'm parsing the paper I'll need to undo that work.
That's just what prose is. But admittedly some people are better at reading/writing it than others.
Applies to reading a paper in any field fwiw.
The great thing about the collapse of LLM hype is that I no longer have FOMO as a software engineer.
Media collapse != collapse. Most people working on these things are about as bullish as ever.
Most people working on these things are still riding a wave of investor funding. The investor hype wave lags the media hype wave. We'll see how bullish they are once that wave crests and we hit AI Winter #3 (or however many waves you want to count [1]).
People building things and who understand LLMs actually believe in it. The people who believe because of investors and media aren't in that bucket.
Re winter - there is a difference this time: a hugely useful product (ChatGPT + the GPT-4 APIs) doing billions in revenue. And it's almost all inference revenue.
> People building things and who understand LLMs actually believe in it.
You can say the same thing about blockchain
Touché.
I would suggest that the Venn diagram for "people building things" and "people who understand LLMs" is worth contemplating here.
ChatGPT is certainly interesting, and some people are paying for it, but I think a) it's not clear how much of that use is also driven by hype, b) of the non-hype use, it's not clear how much will be sustained over the long term, and c) of that long-term use, it's not clear how much is actually of net benefit to society.
Regarding A, an interesting parallel is blockchain hype. There were a zillion blockchain projects, from startups to enterprise efforts. I can't quickly find a reliable number for how much money product vendors, service vendors, and consulting companies took in, but I wouldn't be surprised if that was in the low billions at peak, to say nothing of all the investor and in-house money spent on staff and whatnot. And as far as I know, no non-cryptocurrency success was ever demonstrated. So we can't just suggest that revenue means something will last.
Regarding B, it's a very volatile landscape both in terms of technology and in terms of market. E.g., for a hot minute, everybody loved lively AI generated images as stock photo replacement, but that fad is already past. People are getting a better sense for machine-generated text, too. I know of one company that fired their contract development shop because they kept getting machine-generated communications from the people they were paying to do actual work. Or we could look at Alexa and her kin as an example of something where despite initial excitement, it turns out people just don't care much. Same for VR; since the 1990s the technology keeps getting better and it keeps being a small niche. And we haven't even gotten to how competition and technological improvement will change (or perhaps eliminate) the margins here.
And regarding C, a number of the uses people are paying for are things that are not making the world better. Academic cheating, for example, may be to the advantage of a student who just wants a credential, but it burns a lot of money and makes the world worse. The same applies for people in companies who want to give the appearance of work. Then we have things like spam and influence operations. Over the long term, parasites tend to get squished, however much they can flourish temporarily when they learn a new trick. So that's another slice of AI revenue that can't be counted on.
And I should add that belief among the technically literate has so far done nothing to prevent a winter. In this talk, iRobot founder and noted roboticist Rodney Brooks takes a look at that: https://www.youtube.com/watch?v=pgrzEHJTPPM#t=36m55s
In particular, I set the timestamp to his list of AI hype cycles he had seen, 25 of them. Those all had people who understood the technology and believed in it. Maybe it's different this time, but that's what people say in every hype cycle.
Blockchain is a bad analogy to chatgpt. It would have been a good analogy if there were lots of people actually using blockchain as a currency. Suggesting Nvidia revenue is analogous is at least closer to fair because as I understand it a majority of GPU usage for LLMs is still for training. If everyone is sitting around training and no one is doing inference... there's a problem.
I don't believe any image-generation service was earning billions in revenue. Certainly no VR application did. OpenAI is reported to have done over $3 billion in revenue LTM. That's real money, it's all inference money, and none of it is hardware. It's all real usage revenue.
The revenue is different. Money talks. And, even though it's anecdata, I personally know dozens of people who use ChatGPT a lot in their job. I use it a lot and get thousands of dollars a month of value out of it. I worked in VR at peak hype, and everyone who worked in it saw the usage numbers: awful, always awful. Not a single person I knew who had a VR headset used it with any frequency. No one.
I guess I wasn't clear enough.
Regarding blockchain, I was responding to the point that revenue was proof. A shit-ton of money went into blockchain projects. VR has also had significant sales. Both of these are fine analogies to demonstrate that billions of dollars in revenue does not prove a lasting success.
I'm not arguing that nobody uses ChatGPT; I believe they do. My point instead is that it's not clear how sustained the usage will be. E.g., a counter-anecdote: I know of a team who hired a dev who was a big ChatGPT fan. They eventually fired him, because his LLM-generated communication was poor and confusing, and his code was the kind of bad pretty typical of generated code. I'm sure he felt like he was getting thousands of dollars of value out of it up until the point he was shown the door.
One of the things that makes it especially hard to gauge is the extent to which piles of investor money are driving this. During Bubble 1.0, a lot of infrastructure companies had "real revenue" that turned out to be nth-hand investor money. We know ChatGPT has some real users, but how much of that inference traffic is ineffective or unsustainable startups burning up VC cash? That's a question that I expect will take 18 months or so to answer.
The blockchain stuff wasn't usage revenue, it was people trying to make a quick buck revenue. VR didn't have significant sales of applications, all the sales were hardware which was shelfware.
Billions of dollars of real usage revenue is an entirely different thing.
Bubble 1.0 is akin to Nvidia revenue - lots of dollars being spent on training, less on inference. Very unclear how much of it ever sees a return. There is a big difference between Nvidia revenue and ChatGPT revenue as far as the issue you are discussing. Inference revenue = real, training revenue (for infra providers) = very speculative.
Do you have stats on that "real usage revenue"? Because I don't think inference revenue alone qualifies. There are a lot of "AI" applications that are pumping money into the inference providers, but how much of that is real, sustainable, socially valuable usage is, as I've said a few times, not yet determined.
The $3.4 billion of revenue is a stat. And it's the most meaningful one. If a human voluntarily pays money for something, you should assume they find it useful/valuable. It's not strictly true (various addictions, etc.), but the burden of proof to show otherwise is on you.
The specific number of $3.4 billion is not a stat. It's what Altman claims they are "on pace" for.
Even taking it as real, you're the one making claims about how much of it is "real usage revenue". The burden of proof for that is on you. But as is typical for people riding a hype wave, you have not only repeatedly failed to do so, but you also apparently fail to understand countervailing concerns. That you are, in a fashion also typical for the true believer, trying to insist it's my job to prove the hype wrong, marks you as entirely unserious about honest inquiry here. Given that, I'm done.
The claim that people generally find use in things they spend money on isn't a hyped idea.
Again, your failure to even understand my point, or to even ask questions that might lead you to understanding it, makes me think you are not interested in having a serious discussion, but instead are riding the hype train.
"Everyone who believes in LLMs are on the hype train, the usage is fake, and it's just like every other hyped thing that didn't work out, prove to me otherwise" is just a weak argument. People understand it, you are one of a large group of people making a similar point.
It's anecdata, but it seems that every time this argument is made, it's by a person arguing by analogy rather than from first principles, and that person doesn't really understand how LLMs work and hasn't made any real effort to use them. You might not be in this bucket, but making an analogy to the get-rich-quick, no-real-underlying-use-case blockchain community suggests you are.
If there was ever a train to be on, it's LLMs...
> OpenAI is reported to have done over $3 bill in revenue LTM.
1. Revenue doesn't mean much; AFAIK they still lose money on every query (see the toy sketch after this list).
2. I think you are using that as a proxy for utility. Companies buying licenses en masse hoping it will boost productivity is an unproven thesis. I don't see any profit boost in their quarterly reports from the ChatGPT licenses they bought.
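To make the "lose money on every query" claim concrete, here's a toy per-query margin calculation. Every number is an assumption picked for illustration; none of them are known OpenAI figures.

    # Toy per-query unit economics for an LLM API.
    # All inputs are made-up assumptions, not OpenAI's actual prices or costs.
    price_per_1k_tokens = 0.03  # assumed revenue per 1K generated tokens
    cost_per_1k_tokens = 0.04   # assumed compute cost per 1K generated tokens
    tokens_per_query = 500      # assumed average response length

    margin = (price_per_1k_tokens - cost_per_1k_tokens) * tokens_per_query / 1000
    print(f"margin per query: ${margin:+.4f}")  # negative means a loss on every query

The point is just that if assumed cost per token exceeds price per token, more usage deepens the loss; the claim stands or falls entirely on those two assumed numbers.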
I use ChatGPT daily for coding. It isn't that it's completely useless, but the uses are limited and certainly not worth the hundreds of billions of dollars used to literally boil oceans. If someone took ChatGPT away from me, it wouldn't be a big deal, to be honest. It's dumb as rocks for coding, doesn't do even a tiny bit of high-level thinking, and almost half of its responses are hallucinated garbage.
Revenue = people are paying to use the product.
You use it daily for work. In engineering. How much time does it save you every day? 15 mins? 1 hr? That's a lot at your hourly rate.
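As a rough sketch of that arithmetic (the time saved and the hourly rate here are assumptions for illustration, not figures anyone in this thread reported):

    # Back-of-envelope: monthly value of time an LLM assistant saves an engineer.
    # All inputs are assumed example values, not measurements.
    hours_saved_per_day = 0.5  # assume ~30 minutes saved per workday
    hourly_rate_usd = 100      # assume a $100/hour engineer
    workdays_per_month = 21

    monthly_value = hours_saved_per_day * hourly_rate_usd * workdays_per_month
    print(f"~${monthly_value:,.0f}/month")  # ~$1,050/month under these assumptions

Even with fairly conservative inputs, that lands well above the ~$20/month a typical subscription costs.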
Not sure what "media collapse" is.
Which means now’s the time to learn and build.