I have written a lot in this thread, so I want to sum up briefly with a couple of requests to the iNat devs ([edit] see below: these are now reported separately in Bug Reports):
- Please see if HTML fragments served from our website, like `/places/wikipedia/*`, can be excluded from indexing by search engine bots. That seems like a straightforward thing to do.
- Please review how the Rails app handles the results of querying Wikipedia. Substituting `http://` for `//` seems questionable here:
module WikipediaModule
  def wikipedia
    @title ||= params[:id]
    coder = HTMLEntities.new
    w = @wikipedia = WikipediaService.new
    @decoded = ""
    begin
      query_results = w.query(
        titles: @title,
        redirects: "",
        prop: "revisions",
        rvprop: "content"
      )
      raw = query_results.blank? ? nil : query_results.at( "page" )
      unless raw.blank? || raw["missing"]
        parsed = w.parse( page: raw["title"] )&.at( "text" )&.try( :inner_text )&.to_s
        @decoded = coder.decode( parsed )
        @decoded.gsub!( 'href="//', 'href="http://' )
        @decoded.gsub!( 'src="//', 'src="http://' )
        @decoded.gsub!( 'href="/', "href=\"#{w.base_url}/" )
        @decoded.gsub!( 'src="/', "src=\"#{w.base_url}/" )
        filter_wikipedia_content
        …
I didn’t do a very deep read of the code so I don’t know if this is the culprit or not, but it looks suspicious enough to warrant having a look.
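To make it easier to see what the four `gsub!` calls in the excerpt do, here is a minimal standalone sketch that applies the same substitutions to a made-up snippet of HTML. The helper name `absolutize_wikipedia_urls`, the sample markup, and the `scheme:` parameter are all assumptions for illustration and are not part of the iNaturalist code; the `scheme:` option is only there so the hard-coded `http://` result can be compared with an `https://` one.

```ruby
# Hypothetical helper for illustration only; not iNaturalist code.
# It mirrors the four substitutions from the excerpt on a copy of the input.
def absolutize_wikipedia_urls(html, base_url, scheme: "http")
  out = html.dup
  # Protocol-relative URLs (//host/path) get an explicit scheme.
  out.gsub!('href="//', "href=\"#{scheme}://")
  out.gsub!('src="//', "src=\"#{scheme}://")
  # Root-relative URLs (/path) get the wiki's base URL prepended.
  # These run second, so the URLs rewritten above no longer match.
  out.gsub!('href="/', "href=\"#{base_url}/")
  out.gsub!('src="/', "src=\"#{base_url}/")
  out
end

# Made-up sample markup with one root-relative and one protocol-relative URL.
sample = '<a href="/wiki/Ohio">Ohio</a> <img src="//upload.wikimedia.org/a.png">'

puts absolutize_wikipedia_urls(sample, "https://en.wikipedia.org")
puts absolutize_wikipedia_urls(sample, "https://en.wikipedia.org", scheme: "https")
```

With the default scheme, the image URL comes out as plain `http://` even when the base URL is `https://`, which may or may not be related to the problem discussed here.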
[edit] I’d be happy to file bug reports either on GitHub or in Bug Reports on the forum as needed. Just let me know which is preferred. I have now filed them in Bug Reports.
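For the first request, one commonly used approach is to send an `X-Robots-Tag: noindex` response header for the fragment URLs. The sketch below only shows the path-matching part of that idea; the constant and method names are hypothetical, and how such headers would actually be attached to responses in the iNat app is something I have not looked at.

```ruby
# Hypothetical sketch for illustration only; not iNaturalist code.
# Paths that serve HTML fragments which should stay out of search indexes.
NOINDEX_PATH_PATTERNS = [
  %r{\A/places/wikipedia/}
].freeze

# Returns the extra response headers to send for a given request path.
def robots_headers_for(path)
  if NOINDEX_PATH_PATTERNS.any? { |pattern| pattern.match?(path) }
    { "X-Robots-Tag" => "noindex" }
  else
    {}
  end
end

p robots_headers_for("/places/wikipedia/Ohio")
p robots_headers_for("/places/ohio")
```

Keeping the patterns in one list would make it simple to cover other fragment endpoints later, if there are any.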