Huh? The first "conflict" you list isn't a conflict.
> The snippet from "search docs crawling indexing pause online business" states that adding a Disallow: / rule for Googlebot in robots.txt will keep Googlebot away permanently as long as the rule remains. "search help office hours 2023 june", however, advises against disallowing all crawling via robots.txt, warning that such a file "may remove the website's content, and potentially its URLs, from Google Search." This directly contradicts the claim that a full-disallow rule safely blocks Googlebot without negative consequences, creating a true conflict about the effect and advisability of using a disallow rule to block Googlebot.
If you want to block Googlebot "permanently", why would you expect to stay listed in Search? The first page actually agrees with the second - if you only want to temporarily block crawling, it recommends not blocking Googlebot.
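For reference, the full-disallow rule in question is just two lines in robots.txt:

```
User-agent: Googlebot
Disallow: /
```

While that rule is in place Googlebot stays away, and (as the office-hours page warns) the site's content can eventually drop out of Search - which is consistent with asking for a permanent block, not in conflict with it.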
Actually, your last "conflict" is bad too. A 503 fetching robots.txt does stop crawling the site, for at least twelve hours and possibly forever (if other pages return errors). The only crawling Google will continue to do is to keep trying to fetch robots.txt.
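If someone actually wants that behavior - pause all crawling by letting robots.txt fail - a minimal sketch is a handler that serves 503 for robots.txt. Flask is used here purely for illustration:

```python
from flask import Flask

app = Flask(__name__)

@app.route("/robots.txt")
def robots_txt():
    # Serving 5xx for robots.txt makes Googlebot treat the whole site as
    # temporarily off-limits: it pauses crawling other URLs and keeps
    # retrying robots.txt until it gets a non-error response again.
    return "robots.txt temporarily unavailable", 503, {"Retry-After": "86400"}

if __name__ == "__main__":
    app.run()
```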
I appreciate what you're trying to set up here but 2/4 is a pretty bad record for a demo.
This definitely solves a real problem. Companies generally don't keep good Confluence docs and documentation, and having a common source of truth helps the entire org. But I was wondering whether it will be helpful for the external world too, because I feel companies usually double-check any information before releasing it publicly, especially anything related to the code base. (Just a thought.)
Totally agree - we've found a handful of conflicts in the public docs of every large company we've looked at, but a lot of the value is definitely in internal docs, where there's no technical writer double-checking everything that goes out.
This is really good. I wonder whether these "truths" propagate anywhere or not.
Thanks! Yes, we enable auto-updating documentation, automatic conflict resolution, and accurate search indexing.
Would be cool to extend this to auto-create a PR that updates a docs repo!
Hey Andrew, that is 100% in play.
Love the concept. Just curious whether and how you guys determine which source is correct in case of a conflict.
We do! An org can define precedence rules, but the engine also looks at signals like recency, authority, and majority voting. We also flag high-criticality conflicts for manual review when needed.
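To make that a bit more concrete, here's a toy sketch of what combining those signals could look like - the field names, source labels, and weights are all made up for illustration, not our actual engine:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Snippet:
    claim: str
    source: str              # e.g. "api-reference", "confluence", "readme" (hypothetical labels)
    updated_at: datetime     # expected to be timezone-aware
    author_owns_system: bool

# Illustrative authority weights per source type (not real values).
SOURCE_AUTHORITY = {"api-reference": 1.0, "confluence": 0.6, "readme": 0.4}

def score(snippet: Snippet, agreeing_sources: int, total_sources: int) -> float:
    """Blend recency, source authority, and majority agreement into one score."""
    age_days = (datetime.now(timezone.utc) - snippet.updated_at).days
    recency = 1.0 / (1.0 + age_days / 30.0)              # newer pages score higher, decaying over months
    authority = SOURCE_AUTHORITY.get(snippet.source, 0.5)
    if snippet.author_owns_system:
        authority += 0.2                                  # boost docs written by the system's owners
    majority = agreeing_sources / total_sources if total_sources else 0.0
    return 0.4 * recency + 0.4 * authority + 0.2 * majority
```

In a setup like this, an org-defined precedence rule would simply override the computed score.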
Makes sense now why integrations sometimes break unexpectedly. Conflicting info in official docs is a real problem.
Have you seen a similar situation before?
Seen it a lot in enterprise environments. Teams maintain parallel Confluence spaces and internal API docs, and they drift constantly. The newer page is correct, but search still surfaces the old one first.