# As a condition of accessing this website, you agree to abide by the following
# content signals:

# (a) If a content-signal = yes, you may collect content for the corresponding
#     use.
# (b) If a content-signal = no, you may not collect content for the
#     corresponding use.
# (c) If the website operator does not include a content signal for a
#     corresponding use, the website operator neither grants nor restricts
#     permission via content signal with respect to the corresponding use.

# The content signals and their meanings are:

# search:   building a search index and providing search results (e.g., returning
#           hyperlinks and short excerpts from your website's contents). Search does not
#           include providing AI-generated search summaries.
# ai-input: inputting content into one or more AI models (e.g., retrieval
#           augmented generation, grounding, or other real-time taking of content for
#           generative AI search answers).
# ai-train: training or fine-tuning AI models.

# ANY RESTRICTIONS EXPRESSED VIA CONTENT SIGNALS ARE EXPRESS RESERVATIONS OF
# RIGHTS UNDER ARTICLE 4 OF THE EUROPEAN UNION DIRECTIVE 2019/790 ON COPYRIGHT
# AND RELATED RIGHTS IN THE DIGITAL SINGLE MARKET.

# BEGIN Cloudflare Managed content

User-Agent: *
Content-signal: search=yes,ai-train=no
Allow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

# END Cloudflare Managed Content

# Main Website - Joshva
# PUBLIC WEBSITE - INDEXING WELCOME
# This is the main website for www.joshva.in
# Contact: contact@joshva.in

# ========================
# ALLOW ALL CRAWLERS FOR MAIN WEBSITE
# ========================
User-agent: *
Allow: /

# ========================
# AI & CHATBOTS (COMPREHENSIVE) - ALLOWED
# ========================
User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: CCBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Deepseek
Allow: /

User-agent: Deepseek-bot
Allow: /

User-agent: Qwen
Allow: /

User-agent: QwenBot
Allow: /

User-agent: Kimi
Allow: /

User-agent: KimiBot
Allow: /

User-agent: Cohere
Allow: /

User-agent: CohereBot
Allow: /

User-agent: Anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: YouBot
Allow: /

User-agent: Meta-ExternalAgent
Allow: /

User-agent: FacebookExternalHit
Allow: /

User-agent: SparkBot
Allow: /

User-agent: Baichuan-Bot
Allow: /

User-agent: MiniMaxBot
Allow: /

User-agent: XunfeiSpider
Allow: /

User-agent: 360Spider-AI
Allow: /

User-agent: JinaBot
Allow: /

User-agent: HuggingFace-ModelBot
Allow: /

User-agent: LAIONBot
Allow: /

User-agent: MosaicML-Bot
Allow: /

User-agent: Writer-Bot
Allow: /

User-agent: You.com-Bot
Allow: /

User-agent: AndiBot
Allow: /

User-agent: Neevabot
Allow: /

User-agent: StableDiffusionBot
Allow: /

User-agent: Midjourney-Bot
Allow: /

User-agent: ImagenBot
Allow: /

User-agent: DALL·E-Bot
Allow: /

User-agent: Leonardo-AI-Bot
Allow: /

User-agent: Playground-AI-Bot
Allow: /

User-agent: RunwayML-Bot
Allow: /

User-agent: Civitai-Bot
Allow: /

User-agent: Lexica-Bot
Allow: /

# ========================
# SEARCH ENGINES (GLOBAL) - ALLOWED
# ========================
User-agent: Googlebot
Allow: /

User-agent: Googlebot-Image
Allow: /

User-agent: Googlebot-News
Allow: /

User-agent: Googlebot-Video
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Slurp
Allow: /

User-agent: DuckDuckBot
Allow: /

User-agent: Baiduspider
Allow: /

User-agent: YandexBot
Allow: /

User-agent: Yahoo
Allow: /

User-agent: Sogou
Allow: /

User-agent: Exabot
Allow: /

User-agent: KagiBot
Allow: /

User-agent: Brave-Search-Bot
Allow: /

User-agent: EcosiaBot
Allow: /

User-agent: StartpageBot
Allow: /

User-agent: MojeekBot
Allow: /

User-agent: Gigablast
Allow: /

User-agent: AOLBuild
Allow: /

# ========================
# SOCIAL MEDIA PLATFORMS - ALLOWED
# ========================
User-agent: FacebookBot
Allow: /

User-agent: Twitterbot
Allow: /

User-agent: LinkedInBot
Allow: /

User-agent: Pinterest
Allow: /

User-agent: Instagram
Allow: /

User-agent: Tumblr
Allow: /

User-agent: PocketParser
Allow: /

User-agent: RedditBot
Allow: /

User-agent: DiscordBot
Allow: /

User-agent: TelegramBot
Allow: /

User-agent: WhatsAppBot
Allow: /

User-agent: WeChatBot
Allow: /

# ========================
# TECH COMPANY CRAWLERS - ALLOWED
# ========================
User-agent: Applebot
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: Msnbot
Allow: /

User-agent: Msnbot-media
Allow: /

User-agent: BingPreview
Allow: /

User-agent: ClearbitBot
Allow: /

User-agent: BuiltWith-Bot
Allow: /

User-agent: ZoominfoBot
Allow: /

User-agent: Wappalyzer
Allow: /

User-agent: HubSpot-Bot
Allow: /

User-agent: Salesforce-Bot
Allow: /

User-agent: Microsoft-Research-Bot
Allow: /

# ========================
# SECURITY & VULNERABILITY SCANNERS - ALLOWED
# ========================
User-agent: Nessus
Allow: /

User-agent: Nessus::*
Allow: /

User-agent: Nmap
Allow: /

User-agent: Nmap::*
Allow: /

User-agent: Acunetix
Allow: /

User-agent: Netsparker
Allow: /

User-agent: sqlmap
Allow: /

User-agent: masscan
Allow: /

User-agent: zgrab
Allow: /

User-agent: gobuster
Allow: /

User-agent: nikto
Allow: /

User-agent: wpscan
Allow: /

User-agent: wfuzz
Allow: /

User-agent: burp
Allow: /

User-agent: openvas
Allow: /

# ========================
# ANALYTICS & SEO TOOLS - ALLOWED
# ========================
User-agent: AhrefsBot
Allow: /

User-agent: SemrushBot
Allow: /

User-agent: MJ12bot
Allow: /

User-agent: DotBot
Allow: /

User-agent: MojeekBot
Allow: /

User-agent: Barkrowler
Allow: /

User-agent: BLEXBot
Allow: /

User-agent: serpstatbot
Allow: /

User-agent: SearchMetricsBot
Allow: /

User-agent: TurnitinBot
Allow: /

User-agent: Diffbot
Allow: /

User-agent: ScreamingFrog
Allow: /

User-agent: Sitebulb
Allow: /

User-agent: DeepCrawl
Allow: /

User-agent: Raven-Bot
Allow: /

# ========================
# RESEARCH & ACADEMIC - ALLOWED
# ========================
User-agent: CensysInspect
Allow: /

User-agent: ShadowServer
Allow: /

User-agent: Project-254
Allow: /

User-agent: Crossref-Bot
Allow: /

User-agent: SemanticScholar-Bot
Allow: /

User-agent: Scrapy-Bot
Allow: /

User-agent: ResearchGate-Bot
Allow: /

User-agent: Academia-Bot
Allow: /

User-agent: IEEE-Bot
Allow: /

User-agent: Springer-Bot
Allow: /

# ========================
# DATA COLLECTION BOTS - ALLOWED
# ========================
User-agent: Bytespider
Allow: /

User-agent: PetalBot
Allow: /

User-agent: ZoominfoBot
Allow: /

User-agent: DataForSeoBot
Allow: /

User-agent: AwarioSmartBot
Allow: /

User-agent: MegaIndex
Allow: /

User-agent: AddThis
Allow: /

User-agent: PaperLiBot
Allow: /

User-agent: SimilarWeb-Bot
Allow: /

User-agent: Alexa-Crawler
Allow: /

User-agent: CriteoBot
Allow: /

User-agent: Taboola-Bot
Allow: /

# ========================
# FEED & PODCAST AGGREGATORS - ALLOWED
# ========================
User-agent: Feedly-Bot
Allow: /

User-agent: Feedspot-Bot
Allow: /

User-agent: Spotify-Bot
Allow: /

User-agent: Apple-Podcast-Bot
Allow: /

User-agent: Castro-Bot
Allow: /

User-agent: Overcast-Bot
Allow: /

User-agent: Pocket-Casts-Bot
Allow: /

# ========================
# ARCHIVAL & PRESERVATION - ALLOWED
# ========================
User-agent: archive.org_bot
Allow: /

User-agent: ia_archiver
Allow: /

User-agent: Wayback
Allow: /

User-agent: ArchiveBot
Allow: /

User-agent: CommonCrawl-CBot
Allow: /

User-agent: UK-WebArchive-Bot
Allow: /

User-agent: NLNZ_IAHarvester
Allow: /

User-agent: Arquivo-web-search
Allow: /

User-agent: Perma-CC-Bot
Allow: /

User-agent: Time-Machine-Bot
Allow: /

# ========================
# GENERIC CATCH-ALL BOTS - ALLOWED
# ========================
User-agent: Scanner
Allow: /

User-agent: Crawler
Allow: /

User-agent: Spider
Allow: /

User-agent: Bot
Allow: /

User-agent: Grabber
Allow: /

User-agent: Collector
Allow: /

User-agent: Checker
Allow: /

User-agent: Monitor
Allow: /

User-agent: Fetcher
Allow: /

User-agent: Harvester
Allow: /

User-agent: Extractor
Allow: /

User-agent: Indexer
Allow: /

User-agent: Aggregator
Allow: /

# ========================
# SITEMAP REFERENCES
# ========================
Sitemap: https://www.joshva.in/sitemap.xml
Sitemap: https://www.joshva.in/robots.txt

# ========================
# OPEN ACCESS POLICY
# ========================
# This is the main website - www.joshva.in
# All crawling, indexing, and AI training is EXPLICITLY ALLOWED
# Welcome to index and crawl the website content