Two modes, one machine: Why reading mode delivers value and writing mode delivers fiction, and why hallucinations and creativity are two faces of the same coin.
I am wondering now how that logic applies to image generation. Generalist LLMs like GPT have clearly gotten much better at it, but it does feel like the more you try to edit a generated image, the more weird things happen in the background (literally). Any insight?
I have read about image drift from prompt to prompt, and personally I reset the context often, but I haven't really dug into the matter or experimented enough to get a clear view. Any personal experience?
My experience so far: the first prompt gets a picture about 80% of the way there, a second or third iteration refines it to around 90%, but from the fourth onward it starts to misbehave and goes significantly off track. That's just empirical. It works much better when you attach pictures as a style or tone reference.
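To make that workflow concrete, here is a minimal sketch of the "anchor on a reference, don't chain edits" idea. The generate_image helper and its parameters are hypothetical placeholders, not any real API; the point is only that each refinement restarts from the same reference image plus a revised prompt, instead of editing the previous output and letting errors compound.

```python
# Hypothetical sketch: reduce drift by re-anchoring every iteration on the
# original reference image instead of chaining edit-on-edit.
# generate_image() is a stand-in for whatever image model / API you use.

def generate_image(prompt: str, reference_images: list[str] | None = None) -> str:
    """Placeholder for the actual image-generation call; returns a fake path here."""
    return f"draft_{abs(hash(prompt)) % 10_000}.png"

reference = "brand_moodboard.png"           # style / tone anchor (the attachment)
base_prompt = "A product hero shot of a ceramic mug on a wooden desk"

drafts = []
for revision in [
    "",                                     # iteration 1: base prompt only
    "warmer morning light, steam rising",   # iteration 2: one focused change
    "move the mug slightly left of center", # iteration 3: one focused change
]:
    prompt = f"{base_prompt}. {revision}".strip(". ")
    # Key idea: always pass the ORIGINAL reference, never the previous draft,
    # so oddities in earlier generations don't accumulate in the background.
    drafts.append(generate_image(prompt, reference_images=[reference]))

print(drafts)
```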
I learned recently that an LLM predicts the next most likely token in much the same way whether the tokens are text, audio chunks (natural conversation), or image / pixel patches.
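Roughly, the shape of that idea looks like the toy sketch below. This is not any specific model's architecture, just an illustration under the assumption of a single shared vocabulary: some ids stand for text pieces, others for image-patch codes, and one autoregressive loop samples the next token regardless of modality.

```python
import numpy as np

# Toy illustration only: a unified vocabulary where some ids are text tokens
# and some are image-patch tokens, sampled by one autoregressive loop.
# Real multimodal models are far more complex; this just shows the sense in
# which "next-token prediction is the same operation across modalities".

rng = np.random.default_rng(0)

TEXT_TOKENS  = [f"txt_{i}"   for i in range(50)]   # e.g. word pieces
IMAGE_TOKENS = [f"patch_{i}" for i in range(50)]   # e.g. quantized image-patch codes
VOCAB = TEXT_TOKENS + IMAGE_TOKENS

def next_token_logits(context: list[int]) -> np.ndarray:
    """Stand-in for the model: returns one logit per vocabulary entry."""
    # A real model would run a transformer over `context`; here we fake it.
    return rng.normal(size=len(VOCAB)) + 0.01 * len(context)

def sample_next(context: list[int]) -> int:
    logits = next_token_logits(context)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(VOCAB), p=probs))

# One loop, two modalities: the sampler never "knows" whether the id it emits
# will be rendered as text or as part of an image.
context: list[int] = []
for _ in range(10):
    context.append(sample_next(context))

print([VOCAB[i] for i in context])
```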
Super good advice, thank you!