Probably all AI slop, but I find it hilarious that in the blog post they actually posture like they would know how to fine-tune a model to sound like them, given that what they actually did is something you could one-shot with Claude if you knew what you were doing.
The method I used in the article is real and pretty standard. Also, I do a decent amount of distillation and tinkering with weights for work, so I can assure you I did try that before resorting to good ol' RAG.
Overall, even with a fine-tuning-as-a-service like Tinker (the one from Thinking Machines), which is pretty cheap, the economics didn't work out that well.
Also, you could probably one-shot this with Claude, I agree. But you'd need an expensive Max subscription, and not everyone is willing to shell out 200 bucks for that just to have some weekend fun.
Fine-tuning a model isn't that hard, and the tradeoff he described is real.
A few years ago I was on the fine-tuning team of a multi-team hackathon to build a specialized chatbot. Our fine-tune worked well technically, but it had very little impact on the end-to-end output.
Author here. This is a personal weekend project that grew into a working WhatsApp bot. It replies as me to two allowlisted contacts (myself + one friend who knew about the experiment). The interesting part is not the agent framework but the retrieval I eventually got the best results with: every reply gets generated by Claude after pulling 8 of my real past replies to that specific contact, filtered by recency.
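For readers who want to picture the retrieval step described above, here is a minimal sketch in plain Python. It is not the author's code and it does not use the Chroma API: the `PastReply` record, the `retrieve_examples` and `build_prompt` helpers, the cosine ranking, and the 365-day cutoff are all assumptions made for illustration. In particular, "filtered by recency" is read here as "drop replies older than a cutoff", which may not be what the post does.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class PastReply:
    contact: str            # who the reply was sent to
    text: str               # what was actually written
    sent_at: float          # Unix timestamp of the reply
    embedding: list[float]  # vector from whatever embedding model is in use


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_examples(history, contact, query_embedding, now, k=8, max_age_days=365):
    """Return up to k past replies to `contact`, newest-enough only, most similar first.

    The age cutoff is a hypothetical stand-in for the recency filter.
    """
    cutoff = now - max_age_days * 86400
    candidates = [r for r in history if r.contact == contact and r.sent_at >= cutoff]
    candidates.sort(key=lambda r: cosine(r.embedding, query_embedding), reverse=True)
    return candidates[:k]


def build_prompt(examples, incoming):
    """Assemble a few-shot prompt from the retrieved replies (wording is illustrative)."""
    shots = "\n".join(f"- {r.text}" for r in examples)
    return (
        "Past replies to this contact:\n"
        f"{shots}\n\n"
        f"New message: {incoming}\n"
        "Reply in the same voice."
    )
```

In a real setup the filtering and similarity search would presumably be handled by the vector store rather than a Python loop, and the prompt would go to the model behind the bot.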
Built on Hermes Agent + Baileys + Chroma + nomic-embed-text-v2-moe + Claude Sonnet 4.6 via Azure AI Foundry. About 2 hours of work plus an hour debugging a WhatsApp multi-device LID issue. Total runtime cost: ~$0.005 per reply.
The bot is not running on a dedicated number. It is hooked to my primary WhatsApp, which is a ban risk I accepted in exchange for being able to test with real contacts. The killswitch (Telegram command that empties the allowlist and restarts the gateway) takes about 10 seconds. There is also a hard kill: unlink the device from WhatsApp on the phone, ~5 seconds, severs the bridge session entirely.
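The soft killswitch boils down to "empty the allowlist, then restart", which can be sketched as below. Everything here is hypothetical: the `Gateway` class, the `/kill` command name, and the restart counter are stand-ins, not the Hermes Agent, Baileys, or Telegram APIs.

```python
class Gateway:
    """Toy stand-in for the WhatsApp bridge: it only replies to allowlisted senders."""

    def __init__(self, allowlist):
        self.allowlist = set(allowlist)
        self.restarts = 0

    def should_reply(self, sender):
        # With an empty allowlist, no sender ever gets an automated reply.
        return sender in self.allowlist

    def restart(self):
        # Placeholder for restarting the real bridge process.
        self.restarts += 1


def handle_admin_command(gateway, command):
    """Handle an admin message; returns True if the killswitch fired."""
    if command.strip().lower() == "/kill":  # hypothetical command name
        gateway.allowlist.clear()
        gateway.restart()
        return True
    return False
```

The point of clearing the allowlist before restarting is that the bot fails closed: even if the restart is slow or the process comes back up, it has nobody left it is allowed to answer.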
Happy to answer questions.
Why?
Technical curiosity?
I ask because this does not seem like something anyone would want to have.
That's a good question. It started off as pure technical curiosity (in the realm of: will this even work?). But as a side note, I often think about my screen time and how much of it we spend chatting away on social media.
Obviously, this is probably a bad idea. I can imagine connecting my calendar to Hermes and automating myself into 12 dinner plans and a trip to Disneyland on a Thursday afternoon, just because I once mentioned to my nephew that I'm a Mickey fan.