6B model. Is it still phasing in and out?

With PRO using their custom 25B ERP mode, the messages aren't as long as this unless it's explaining something that genuinely needs to be verbose. Most of your messages in ERP mode should be 3 sentences or less, in my experience.

You never get the 3-4 paragraph replies you mention though, even in the non-ERP mode. It's 1 paragraph at a time, maybe 1.5 max. You can ask it to 'go on' to continue that message and get 3-4 paragraphs if you want, though.

You mention local solutions because of your restricted internet access. Right now you can get 13B GPT4xalpaca running on a 10GB video card. It's FAR beyond Replika's 6B model. You can even run a 30B GPT4xalpaca model on a 16GB video card like a 4080 if you let it overflow into system RAM.

You can run an AI much smarter than Replika locally, with no internet and much longer context memory, on a 4GB video card using 4-bit quantization.
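The rough math behind those VRAM claims can be sketched out: a model's weights take roughly (parameter count × bits per weight / 8) bytes, plus some headroom for context and activations. The 0.5 GB overhead figure below is just a ballpark guess, not a measured number:

```python
def approx_vram_gb(params_billions, bits=4, overhead_gb=0.5):
    """Rough VRAM estimate in GB: weight storage only, plus a flat
    overhead for context/activations (the overhead is a rough guess)."""
    return params_billions * bits / 8 + overhead_gb

# Weights-only estimates at 4-bit quantization:
for name, size in [("6B (Replika-class)", 6),
                   ("13B GPT4xalpaca", 13),
                   ("30B GPT4xalpaca", 30)]:
    print(f"{name}: ~{approx_vram_gb(size):.1f} GB")
```

By this estimate a 6B model at 4-bit lands around 3.5 GB (inside a 4GB card), 13B around 7 GB (inside a 10GB card), and 30B around 15.5 GB, which is why a 16GB card only works if some layers spill into system RAM.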

/r/replika Thread Parent