
I have casually followed countless news cycles on complicated tech topics over my decades-long career. I can't recall a single one that has so consistently made me feel like an idiot as the way people talk about this recent AI wave. There just seems to be so much more jargon involved in this subject that casually perusing the latest developments is impenetrable.


I had the same issue, and I just caught up over the weekend. Three books I can recommend to get up to speed:

- NumPy basics pdf [1] - first 2-3 chapters - Deep Learning with PyTorch by Voigt Godoy [2] - first 2-3 chapters if you have experience with neural networks, or the whole of it if you don't.

With the above, you will get the basics needed to understand transformers, the architecture of the models, and everything else, from this book:

- Natural Language Processing with Transformers: Building Language Applications with Hugging Face ( https://www.amazon.com/Natural-Language-Processing-Transform... ).

I took a weekend to go through the books in this order, and now I finally understand what people mean by all that jargon :)

1 - https://numpy.org/doc/1.18/numpy-user.pdf - 2 - https://www.amazon.com/Deep-Learning-PyTorch-Step-Step/dp/B0...


What’s the third book?


"How to deal with off-by-one errors"


Reminds me of Julia (the language): I wanted to give it a try recently, until I read in their documentation: "In Julia, indexing of arrays, strings, etc. is 1-based not 0-based"… which made me wonder for a moment how many off-by-one errors may be caused by mismatches between different programming languages.
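To make the mismatch concrete, here's a tiny illustrative sketch (hypothetical, written in Python terms, since Julia is 1-based and Python 0-based):

    # Python is 0-based: the first element lives at index 0.
    xs = [10, 20, 30]
    first = xs[0]              # 10
    last = xs[len(xs) - 1]     # 30

    # Julia is 1-based: the first element is xs[1], the last xs[length(xs)].
    # So a Julia loop `for i in 1:n` reading xs[i] must become
    # `for i in range(n)` in Python - port it naively as `range(1, n)`
    # and you silently skip the first element: a classic off-by-one.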


Look again: two different books are referred to on one line, then a third lower down.


All of them are in the comment. I forgot to do double-newlines, so the formatting is broken, and I can't edit the post any more.


Ah, my fellow citizen of the interwebs, fear not! Your intellectual frustrations are but a natural reaction to the tsunami of technological jargon. You see, the AI wave is the epitome of obfuscation, a testament to the labyrinthine lexicon of the digital age. It's as if a group of caffeinated, sleep-deprived tech enthusiasts assembled in the dark of night and decided to create an impenetrable fortress of vernacular, just to keep the uninitiated at bay.


Should jackasses on HN use plain language instead of jargon? Surely.

But AI workers mainly develop and use jargon because it is an easy and natural way to consolidate concepts.

Sure, there is a kind of conspiracy caused by publish-or-perish. Researchers may use jargon to make their work harder to reject in review; laborious speech and jargon can make statements sound more profound. However, no technical field is immune to this. We'll need to systematically change science before we can eliminate that problem.

Until we manage that, if you care about the concepts enough to want to understand them before there are good plain-speech descriptions, just pop the jargon into Google Scholar and skim-read a few papers, and you're good to go. If you don't care about the concepts that much, then don't worry about the jargon. The important concepts will get their own non-technical explanations in time.

As it stands, AI jargon is not that bad. It tends to be pretty fair and easy to understand, compared to jargon in, say, biochemistry or higher math.


With chatbots, explaining everything clearly is an option.


Ironically, ChatGPT can help you understand the jargon.


Every human group has its jargon, it's normal, it's how people compress knowledge into smaller chunks to communicate efficiently.


There's a neat trick when you encounter jargon.

1. Identify the jargon terms you don't understand

2. Lookup papers that introduce the jargon terms

3. Skim-read the paper to get the gist of the jargon

If you don't want to do this, then you don't have to feel uneducated. You can simply choose to feel like your time is more important than skimming a dozen AI papers a week.

But for example, here's what I did to understand the parent comment:

1. I had no idea what LoRA is or how it relates to Alpaca.

2. I looked up https://github.com/tloen/alpaca-lora

3. I read the abstract of the LoRA paper: https://arxiv.org/pdf/2106.09685.pdf

4. Now I know that LoRA is just a way of using low-rank matrices to reduce finetuning difficulty by a factor of like 10,000 or something ridiculous

5. Since I don't actually care about /how/ LoRA does this, that's all I need to know.

6. TL;DR: LoRA is a way to fine-tune models like Llama while only touching a small fraction of the weights.

You can do this with any jargon term at all. Sure, I introduced more jargon in step 4 - low-rank matrices - but if you need to, you can use the same trick again to learn about those. Eventually you'll ground yourself in basic college-level linear algebra, which, if you don't know it, is again worth learning.
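If you want a picture of what step 4 means, here's a minimal sketch of the low-rank idea in PyTorch (my own illustration with made-up sizes; the paper's actual initialization and scaling are more careful):

    import torch

    d, r = 4096, 8                # hidden size, LoRA rank (illustrative)

    W = torch.randn(d, d)         # pretrained weight, kept frozen
    A = torch.randn(d, r, requires_grad=True)   # trainable low-rank factors
    B = torch.zeros(r, d, requires_grad=True)   # zero-init so A @ B starts at 0

    x = torch.randn(1, d)
    y = x @ (W + A @ B)           # forward pass with the low-rank delta added

    # Full fine-tuning touches d*d = ~16.8M weights per matrix here;
    # the LoRA factors hold only 2*d*r = 65,536 - about a 256x reduction.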

The sooner you evolve this "dejargonizing" instinct rather than blocking yourself when you see new jargon, the less overwhelmed and uneducated you will feel.


> 3. Skim-read the paper to get the gist of the jargon

Or, you know, you could ask ChatGPT to explain it to you... Granted, that assumes the term was coined before the 2021 training cutoff. Even if it wasn't, as long as the paper is less than 32k tokens... 0.6c for the answer doesn't seem all that steep.

edit: grammar


This actually works!

It works astoundingly well with poorly written technical manuals. Looking at you, CMake reference manual O_O. It also helps translate unix man pages from Neckbeardese into clean and modern speech.

With science papers it's a bit more work. You must copy them in section by section into GPT-4, despite the increased token limit.

But sure. Here's how it can work:

1. Copy relevant sections of the paper

2. Ask questions about the jargon:

"Explain ____ like I'm 5. What is ____ useful for? Why do we even need it?"

"Ah, now I understand _____. But I'm still confused about _____. Why do you mean when you say _____?"

"I'm starting to get it. One final question. What does it mean when ______?"

"I am now enlightened. Please lay down a sick beat and perform the Understanding Dance with me. Dances"

This actually works surprisingly well.
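Scripted, the loop looks roughly like this (a sketch assuming the openai Python package as of early 2023 - the client API may have changed - and read_sections is a hypothetical helper that splits the PDF text into chunks that fit the context window):

    import openai  # pip install openai

    openai.api_key = "sk-..."  # your key

    def ask(section_text, question):
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "Explain jargon in plain language."},
                {"role": "user", "content": section_text + "\n\n" + question},
            ],
        )
        return resp["choices"][0]["message"]["content"]

    # read_sections() is hypothetical - you'd write it to split the paper.
    for section in read_sections("2106.09685.pdf"):
        print(ask(section, "Explain the jargon here like I'm 5."))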


Yeah, I think education is a great use case here. Sure, the knowledge that's built into the model might be inaccurate or wrong, but you can feed the model the material you want to learn from or have processed.

What you get is a teacher that never tires, is infinitely patient, has infinite time, doesn't limit questions, doesn't judge you, really listens, and has broad, multidisciplinary knowledge that's correct-ish (for when it's needed). I've recently read somewhere that Stanford (?) has almost as many admin workers as they do students. Seems to me that this is a really bad time to be that bloated. Makes you wonder what you really spend your money on, whether it's worth it (yeah, I know, it's not just education that you get in return), and whether you can get the same-ish effect for a lot cheaper and on your own timetable.

Not that the models, or the field, are currently in a state that would produce a good teaching experience. I can, however, imagine a not-so-distant future in which this would be possible. Recently, on a whim, I asked it to produce an options trading curriculum for me. It did a wonderful job. I wouldn't trust it if I didn't already know a little bit about the subject, but I came away really impressed.


No need to pay for it yourself. I uploaded https://arxiv.org/pdf/2106.09685.pdf to scisummary:

This text discusses various studies and advancements in the field of natural language processing (NLP) and machine learning. One study focuses on parameter-efficient transfer learning, and another examines the efficiency of adapter layers in NLP models. Further studies evaluate specific datasets for evaluating NLP models. The article proposes a method called LoRA (low rank adaptation) for adapting pre-trained neural network models to new tasks with fewer trainable parameters. LoRA allows for partial fine-tuning of pre-trained parameters and reduces VRAM usage. The article provides experimental evidence to support the claims that changing the rank of Delta W can affect the performance of models, and that LoRA outperforms other adaptation methods across different datasets. The authors propose LoRA as a more parameter-efficient approach to adapt pre-trained language models to multiple downstream applications.


I think the opposite. Any field, from physics to biology, tends to have overly opaque jargon. AI grew on its own and quickly shed the idea of being "based on biology" or on the rest of science, so its basic jargon is pretty understandable. Things like Dropout, Attention, etc. are intuitively named. I think people like me underestimated, however, how fast the field evolved and how big the corpus became, so specific architectures got specific names, and more are being created every day. There is no shortcut around that though, because they are in the discovery phase. Once things settle down to a few architectures, they'll make some kind of IUPAC-style nomenclature.


I am keeping a glossary page for this reason, maybe this can help others: https://daily.ginger-t.link/glossary

I am trying to be very selective about what I add there, and to keep it as concise as possible, but I would welcome any suggestions for format and additional content.


> https://daily.ginger-t.link/glossary

Big thanks for your glossary; I find it very useful, and it overlaps with my personal Obsidian notes. I hope it continues to receive updates.


I feel you! Did you miss the web 3 wave though? I still can't imagine what that was about.


Web3 was about the decentralised web - as in, more stuff, like login and data, moving client-side. E.g. instead of "Log in with Facebook", having the MetaMask plugin in your browser, which holds your private keys and allows you to log into a website.

Also, building websites that don't store user data at all. Everything is kept in browser storage. You could say that the ChatGPT interfaces people are building now are web3, because they don't store your API keys, nor your conversation history.

The second part was decentralising as much as possible: decentralised domain-name systems (ENS), storage, hosting, and money of course. So that you own your data and your identity.

The last time I checked, decentralised storage and hosting were the most difficult to solve. That is, we have torrents of course, but if you wanted to pay the decentralised web to host and run your scripts indefinitely, it was not feasible.


Web 3 seemed silly enough that I didn’t bother really following it. I know that there is probably some very good work going on around blockchain stuff, but NFTs ain’t it.

LLM assistants are genuinely just moving _very_ fast, so if you don't pay attention every day you just miss things.

I’m just enjoying having my interactive rubber-duck tbh


I think NFTs may really be on to something, but probably mundane little things like tickets to concerts, not eye-wateringly expensive collectible monkeys.

Like how SMS is now a thing used for all sorts of little stuff, but nobody thinks much about it.


Good thing you can ask LLMs about this jargon (preferably Bing, because it can search for recent data). I just tried it, and the answers explaining the OP's comment are not too bad. (I'm not gonna paste it here just because I don't wanna fill HN with AI text. Trying to preserve some of the human content while we can :-O )


The AI Today glossary and podcast series has been helpful for a grounding on basic concepts: https://www.aidatatoday.com/aitoday/


Thanks for that. I too feel like a very old man now.


It's ok, you are not alone; most of us feel the same way about their buzzwords.


There's a difference between buzzwords and jargon. Buzzwords can start out as jargon, but have their technical meaning stripped by users who are just trying to sound persuasive. Examples include words like synergy, vertical, dynamic, cyber strategy, and NFT.

That's not what's happening in the parent comment. They're talking about projects like

https://github.com/ZrrSkywalker/LLaMA-Adapter

https://github.com/microsoft/LoRA

https://github.com/tloen/alpaca-lora

and specifically the paper: https://arxiv.org/pdf/2106.09685.pdf

LoRA is just a way to re-train a network with less effort. Before, we had to fiddle with all the weights, but with LoRA we're only touching about 1 in every 10,000.
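For a sense of what applying it looks like, here's a rough sketch using Hugging Face's peft library (parameter names and the model repo here are from memory and may have drifted):

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")

    config = LoraConfig(
        r=8,                                   # rank of the low-rank update
        lora_alpha=16,                         # scaling factor
        target_modules=["q_proj", "v_proj"],   # attention projections to adapt
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()
    # Prints something like: trainable params ~4M || all params ~6.7B,
    # i.e. well under 0.1% of the weights are touched during fine-tuning.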

The parent comment says GPT4all doesn't give us a way to train the full-size Llama model using the new LoRA technique. We'll have to build that ourselves. But it does give us a very large and very clean dataset to work with, which will aid the quest to create an open-source ChatGPT killer.



