Quit your job and start RLHF-ing.

May 29, 2024

The year is 2030. You are a neurosurgeon.

After a long day at the hospital—operating on 2 brains, reviewing 10 CAT scans, and meeting with 7 patients—you finally arrive home. Ready to relax, you slingshot your bag onto the couch, plop down in your comfy armchair, and whip out your phone. Time to make some money.

You open your RLHF Pro app and start providing feedback on AI answers to neuroscience questions, problems, and clinical scenarios.

Yup, that response looks great! +$1000. Nope, we need more nuance here. +$1000. Wow, that is totally wrong, need to correct that one. +$2000.

“Thanks for your feedback, Doctor Kerpatrick! You earned $4,000 in today’s 5-minute session. See you next time!”

What Doctor Kerpatrick did is the human-feedback part of Reinforcement Learning from Human Feedback (RLHF). You may be thinking, “$4,000 in 5 minutes, that’s crazy!” And you’re right. BUT, as the world runs out of high-quality data, this fantasy may very well become reality.

What is RLHF?

RLHF stands for Reinforcement Learning from Human Feedback. It’s a process where AI systems learn and improve by receiving feedback from people, ideally experts. Traditional training has the AI learn from pre-existing datasets like YouTube transcripts or Reddit forums. RLHF adds a step on top: humans rate or rank the AI’s responses, and those judgments are used to steer the model toward the answers people prefer.

The experts provide detailed feedback on AI responses, guiding the system to refine its answers. This continuous loop of feedback helps create more accurate, reliable, and useful AI systems that can handle nuanced tasks, like answering difficult clinical neuroscience questions.

The experts are essentially acting as teachers for the AI models, correcting their responses and shaping the models' behavior in the process.
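For the curious, here is a toy sketch of how "this answer is better than that one" feedback can turn into numbers a model can learn from. It is only an illustration, not how any particular company does it: it assumes the feedback arrives as pairs (preferred answer, rejected answer) and fits one score per answer using the Bradley-Terry preference model, a common choice for this step. The function name `train_reward_scores` and the sample feedback are made up for the example. Real RLHF trains a neural reward model and then fine-tunes the AI against it with reinforcement learning, which this sketch leaves out.

```python
import math

def train_reward_scores(preferences, num_responses, lr=0.5, epochs=200):
    """Fit one scalar score per response from pairwise expert preferences.

    preferences: list of (preferred_index, rejected_index) pairs.
    Assumes the Bradley-Terry model:
        P(a preferred over b) = sigmoid(score[a] - score[b])
    """
    scores = [0.0] * num_responses
    for _ in range(epochs):
        for winner, loser in preferences:
            # Probability the current scores assign to the expert's choice.
            p = 1.0 / (1.0 + math.exp(-(scores[winner] - scores[loser])))
            # Gradient ascent on the log-likelihood of that choice:
            # push the preferred score up and the rejected score down.
            scores[winner] += lr * (1.0 - p)
            scores[loser] -= lr * (1.0 - p)
    return scores

# Hypothetical expert feedback on three AI answers:
# answer 0 beat answer 1, answer 0 beat answer 2, answer 1 beat answer 2.
preferences = [(0, 1), (0, 2), (1, 2)]
scores = train_reward_scores(preferences, num_responses=3)

# Rank answers from highest to lowest learned score.
ranking = sorted(range(3), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 1, 2]
```

The learned scores play the role of a reward signal: answers the expert kept preferring end up with higher scores, and a full RLHF pipeline would then nudge the AI toward producing high-scoring answers.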

Why is RLHF so important?

We’re running out of high-quality data for training AI.

In the past, AI has relied heavily on large datasets to learn, but these datasets often lack the depth and specificity required for advanced applications.

Expert insight is the best high-quality data available to enhance AI learning because it adds a layer of precision and contextual understanding that raw data cannot provide.

Who are the most sought-after RLHFers?

Experts. Experts. Experts.

Doctors, engineers, scientists, academic researchers, industry professionals, skilled technicians, etc.

RLHF could be a career of the future.

As the demand for high-quality data skyrockets, RLHF could become a lucrative career opportunity. High-quality data is incredibly valuable in the creation of purpose-built AI systems.

As large tech companies compete to build the most powerful AI models, they could begin paying huge sums of money for expert RLHFers to take their models from good to exceptional.

It may soon be time to quit your job and start RLHF-ing.

See you in the future,

Bennie
