What's with the "Places > Wikipedia" pages on iNat?

benarmstrong · October 28, 2024, 2:15pm

I have written a lot in this thread, so I want to sum up briefly with a couple of requests to the iNat devs ([edit] see below: these are now reported separately in Bug Reports):

Please see if HTML fragments served from the website, like /places/wikipedia/*, can be excluded from indexing by search engine bots. That seems like a straightforward thing to do.
Please review how the Rails app handles the results of querying Wikipedia. Substituting http:// for // seems questionable here:

https://github.com/inaturalist/inaturalist/blob/6352bbb3d5ff59b9c2dfad8c5556426a183f4e8c/app/controllers/shared/wikipedia_module.rb#L21-L22

  module WikipediaModule
    def wikipedia
      @title ||= params[:id]
      coder = HTMLEntities.new
      w = @wikipedia = WikipediaService.new
      @decoded = ""
      begin
        query_results = w.query(
          titles: @title,
          redirects: "",
          prop: "revisions",
          rvprop: "content"
        )
        raw = query_results.blank? ? nil : query_results.at( "page" )
        unless raw.blank? || raw["missing"]
          parsed = w.parse( page: raw["title"] )&.at( "text" )&.try( :inner_text )&.to_s
          @decoded = coder.decode( parsed )
          @decoded.gsub!( 'href="//', 'href="http://' )
          @decoded.gsub!( 'src="//', 'src="http://' )
          @decoded.gsub!( 'href="/', "href=\"#{w.base_url}/" )
          @decoded.gsub!( 'src="/', "src=\"#{w.base_url}/" )
          filter_wikipedia_content
          …

I didn’t do a very deep read of the code so I don’t know if this is the culprit or not, but it looks suspicious enough to warrant having a look.
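To make the concern concrete, here is a minimal sketch of what those four substitutions do to a small fragment. The sample HTML, `BASE_URL`, and both method names are made up for illustration; only the `gsub!` calls in the first method mirror the quoted code. The second method is a hypothetical variant, not a proposed patch.

```ruby
# Illustrative only: SAMPLE and BASE_URL are invented for this sketch.
BASE_URL = "https://en.wikipedia.org" # assumed stand-in for w.base_url
SAMPLE = '<a href="/wiki/Ottawa"><img src="//upload.wikimedia.org/a.png"></a>'

# Mirrors the substitutions quoted above: protocol-relative URLs become
# plain http://, then root-relative URLs are prefixed with the base URL.
def rewrite_as_quoted( html, base_url )
  out = html.dup
  out.gsub!( 'href="//', 'href="http://' )
  out.gsub!( 'src="//', 'src="http://' )
  out.gsub!( 'href="/', "href=\"#{base_url}/" )
  out.gsub!( 'src="/', "src=\"#{base_url}/" )
  out
end

# Hypothetical variant: same order, but protocol-relative URLs become https://.
def rewrite_with_https( html, base_url )
  out = html.dup
  out.gsub!( 'href="//', 'href="https://' )
  out.gsub!( 'src="//', 'src="https://' )
  out.gsub!( 'href="/', "href=\"#{base_url}/" )
  out.gsub!( 'src="/', "src=\"#{base_url}/" )
  out
end

puts rewrite_as_quoted( SAMPLE, BASE_URL )
puts rewrite_with_https( SAMPLE, BASE_URL )
```

In the first output the image URL ends up as http://upload.wikimedia.org/a.png, which a page served over HTTPS may treat as mixed content. Whether that is what actually breaks the images I can't say from this alone.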

[edit] ~~I’d be happy to file bug reports either on GitHub or in Bug Reports on the forum as needed. Just let me know which is preferred.~~ I have now filed them in Bug Reports.
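For the first request (keeping the fragments out of search indexes), one common approach is to send an `X-Robots-Tag: noindex` response header for those paths. Below is a rough sketch of that idea as a Rack-style middleware. The class name, the path prefix constant, and the stand-in app are all assumptions for illustration; I have not checked how iNaturalist routes these requests or whether it already sets such a header.

```ruby
# Hypothetical Rack-style middleware; the class name and path prefix are
# assumptions for this sketch, not taken from the iNaturalist codebase.
class NoindexFragments
  FRAGMENT_PREFIX = "/places/wikipedia/"

  def initialize( app )
    @app = app
  end

  def call( env )
    status, headers, body = @app.call( env )
    # Only tag responses for the fragment paths; everything else passes through.
    if env["PATH_INFO"].to_s.start_with?( FRAGMENT_PREFIX )
      headers = headers.merge( "X-Robots-Tag" => "noindex" )
    end
    [status, headers, body]
  end
end

# Stand-in app so the sketch runs without Rails or the rack gem.
app = ->( _env ) { [200, { "Content-Type" => "text/html" }, ["<p>fragment</p>"]] }
stack = NoindexFragments.new( app )

_, fragment_headers, = stack.call( { "PATH_INFO" => "/places/wikipedia/ottawa" } )
_, other_headers, = stack.call( { "PATH_INFO" => "/places/ottawa" } )
puts fragment_headers["X-Robots-Tag"] # prints "noindex"
puts other_headers.key?( "X-Robots-Tag" ) # prints "false"
```

A robots.txt `Disallow` rule is the other obvious option, but it only stops crawling; a URL that is linked from elsewhere can still show up in results, which is why the header seemed worth sketching.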
