AI deep dive part 1 (mostly for Useff)

So background:

We’re making datasets and fine-tuning large language models (LLMs)

Learnings:

1. Training is easy (we only train a small subset of the parameters, an adapter, so everything fits in memory)
2. Saving the "big" model (the original model plus the adapter we trained) is "hard" (this is usually the step where you run out of memory; see the sketch after this list)
3. Inference is hard (it takes a really long time, roughly 3 minutes for 5 outputs on a T4 GPU)
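
To make points 1 and 2 concrete, here's a minimal sketch of what "train only an adapter, then merge and save" can look like. It assumes a Hugging Face setup with the peft library and a LoRA adapter; the base model name, output paths, and LoRA hyperparameters are placeholders of mine, not necessarily what we're actually running.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model (model name is a placeholder)
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Wrap it so only a small LoRA adapter is trainable -- this is why training fits in memory
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # a tiny fraction of the full parameter count

# ... run your training loop / Trainer here ...

# Saving just the adapter is small and fast
model.save_pretrained("my-adapter")

# Merging the adapter back into the "big" model is the step that tends to blow up memory
merged = model.merge_and_unload()
merged.save_pretrained("my-merged-model")
```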

On the topic of inference:

For our purposes, you want a REALLY low temperature. Temperature controls how much leeway the model takes when picking the next token; a lot of the time it doesn't matter, but we want SPECIFIC and STRUCTURED output, so we don't want the model to just wing it.

So set the temperature super low. It's typically a value between 0 and 1, so put it at 0.001 or something.
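
Here's roughly what that looks like at generation time, assuming the Hugging Face generate() API; the model path and prompt are placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path -- wherever the merged model ended up
tokenizer = AutoTokenizer.from_pretrained("my-merged-model")
model = AutoModelForCausalLM.from_pretrained("my-merged-model")

inputs = tokenizer("Extract the fields as JSON: ...", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.001,  # near zero: the model almost always takes its top choice
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```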

Top_k is similar and might not matter as much, but it basically says "how many different tokens will the model consider for the next position." If I write "I got scratched by the ca_", the model will think "okay, 't' is the top choice for the next token, but maybe it's 'n', because you COULD be scratched by a can." Again, for our purposes we can be strict and set this to 1 (some people give more leeway here, say 20 or 50; it's an integer of your choice, just don't make it too high).
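
Continuing the sketch above (same assumptions), top_k is just one more argument to the same generate() call:

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.001,
    top_k=1,  # only the single most likely next token is ever in the running
)
```

With top_k=1 the sampling is effectively greedy; passing do_sample=False would give the same deterministic behaviour without needing a temperature at all.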
