What's with the "Places > Wikipedia" pages on iNat?

I have written a lot in this thread, so I want to sum up briefly with two requests to the iNat devs ([edit] see below: both are now reported separately in Bug Reports):

  1. Please see if HTML fragments served from the website, such as /places/wikipedia/*, can be excluded from indexing by search engine bots. That seems like a straightforward thing to do.
  2. Please review how the Rails app handles the results of querying Wikipedia. Rewriting protocol-relative // URLs to http:// seems questionable here:

https://github.com/inaturalist/inaturalist/blob/6352bbb3d5ff59b9c2dfad8c5556426a183f4e8c/app/controllers/shared/wikipedia_module.rb#L21-L22

  module WikipediaModule
    def wikipedia
      @title ||= params[:id]
      coder = HTMLEntities.new
      w = @wikipedia = WikipediaService.new
      @decoded = ""
      begin
        query_results = w.query(
          titles: @title,
          redirects: "",
          prop: "revisions",
          rvprop: "content"
        )
        raw = query_results.blank? ? nil : query_results.at( "page" )
        unless raw.blank? || raw["missing"]
          parsed = w.parse( page: raw["title"] )&.at( "text" )&.try( :inner_text )&.to_s
          @decoded = coder.decode( parsed )
          @decoded.gsub!( 'href="//', 'href="http://' )
          @decoded.gsub!( 'src="//', 'src="http://' )
          @decoded.gsub!( 'href="/', "href=\"#{w.base_url}/" )
          @decoded.gsub!( 'src="/', "src=\"#{w.base_url}/" )
          filter_wikipedia_content
          …

I didn’t do a very deep read of the code so I don’t know if this is the culprit or not, but it looks suspicious enough to warrant having a look.
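
To make the concern concrete, here is a rough sketch of what I would expect instead, assuming the page is served over https and that Wikipedia's hosts accept https. When a page is loaded over https, browsers may block or upgrade plain http:// images as mixed content, which is why forcing http:// looks risky to me. The helper name `absolutize_wikipedia_urls` is my own invention and not part of the iNaturalist code, and I have not confirmed that this is what breaks the images:

```ruby
# Hypothetical helper, not iNaturalist code: the name and the choice of
# https are assumptions made for illustration.
def absolutize_wikipedia_urls(html, base_url)
  html
    # Protocol-relative URLs ("//host/path") get an explicit https scheme.
    .gsub(%r{(href|src)="//}) { "#{$1}=\"https://" }
    # Root-relative URLs ("/path") are prefixed with the wiki's base URL.
    # The lookahead skips anything that still starts with "//".
    .gsub(%r{(href|src)="/(?!/)}) { "#{$1}=\"#{base_url}/" }
end

html = '<a href="/wiki/Oak"><img src="//upload.wikimedia.org/oak.jpg"></a>'
puts absolutize_wikipedia_urls(html, "https://en.wikipedia.org")
```

This keeps the same two-step order as the quoted code (double slash first, then single slash) and only changes the scheme that gets substituted.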

[edit] I’d be happy to file bug reports either on GitHub or in Bug Reports on the forum as needed. Just let me know which is preferred. I have now filed them in Bug Reports:

  1. https://forum.inaturalist.org/t/wikipedia-place-page-fragment-is-indexed-by-search-engines/57735
  2. https://forum.inaturalist.org/t/place-page-about-tab-from-wikipedia-sometimes-breaks-images/57736
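
For the first request, one possible approach is to send an `X-Robots-Tag: noindex` response header for the fragment URLs, which the major search engines honor. Below is a minimal sketch as a Rack-style middleware. The class name `NoindexFragments` and the path pattern are assumptions of mine, and I don't know how the iNat app is actually set up, so treat it as an illustration rather than a patch:

```ruby
# Hypothetical Rack-style middleware: the class name and path pattern are
# illustrative assumptions, not taken from the iNaturalist codebase.
class NoindexFragments
  FRAGMENT_PATH = %r{\A/places/wikipedia/}

  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    # Ask crawlers not to index responses for the fragment paths.
    if env["PATH_INFO"].to_s.match?(FRAGMENT_PATH)
      headers = headers.merge("x-robots-tag" => "noindex")
    end
    [status, headers, body]
  end
end

# Stand-in for the real application, just to show the effect.
inner = ->(_env) { [200, { "content-type" => "text/html" }, ["<p>fragment</p>"]] }
_status, headers, _body = NoindexFragments.new(inner).call("PATH_INFO" => "/places/wikipedia/123")
puts headers["x-robots-tag"]
```

A header is used here rather than a robots.txt Disallow rule because a crawler has to be able to fetch the URL in order to see the noindex instruction.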