The other challenge that we have found is accuracy and completeness of fields required to be updated across use cases. Either we have to mandate all the fields or when we set them optional in the tool def. it sometimes blows through - how are you handling that ?
the biggest issue i see is authorization boundaries. you want the agent to be able to pay for things autonomously (otherwise what's the point), but you also need hard limits so a loop doesn't drain your wallet at 3am. it's basically the same trust problem as giving an intern a company credit card.. useful but you want guardrails. is it purely prefunded with a cap, or is there a way to pause and ask for human approval above a threshold?
The other challenge that we have found is accuracy and completeness of fields required to be updated across use cases. Either we have to mandate all the fields or when we set them optional in the tool def. it sometimes blows through - how are you handling that ?
the biggest issue i see is authorization boundaries. you want the agent to be able to pay for things autonomously (otherwise what's the point), but you also need hard limits so a loop doesn't drain your wallet at 3am. it's basically the same trust problem as giving an intern a company credit card.. useful but you want guardrails. is it purely prefunded with a cap, or is there a way to pause and ask for human approval above a threshold?
I also shared this on Twitter with a demo: https://x.com/i/status/2034021301543649535