i don’t see much inconsistency between “scrap[ing] comments to generate information” and “collect[ing] relevant comments” (emphasis mine).
suppose we have these identification notes related to Rudbeckia hirta:
- “has hairy green parts”
- “has hairy stems and leaves”
- “it’s fuzzy”
- “leaf faces and stems hispid to hirsute”
- “tallos y hojas peludas”
- “hirta = hairy”
- (1000 more variations of the above)
do you want a page that displays 1006 items that say the same thing in different ways?
if not, then as far as i know, any automated method of trying to present that information in a meaningful, more concise way is going to involve some sort of language model.
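to illustrate why i think simple text matching falls short, here is a minimal sketch in Python. the function name `mentions_hairy` and the keyword check are my own illustration (not anything the site actually does); it just shows that a literal keyword match only catches some of the notes above.

```python
# the six example notes from the list above
notes = [
    "has hairy green parts",
    "has hairy stems and leaves",
    "it's fuzzy",
    "leaf faces and stems hispid to hirsute",
    "tallos y hojas peludas",
    "hirta = hairy",
]

def mentions_hairy(note: str) -> bool:
    # naive check (illustrative only): matches the literal word "hairy"
    return "hairy" in note.lower()

matched = [n for n in notes if mentions_hairy(n)]
missed = [n for n in notes if not mentions_hairy(n)]

print("matched:", matched)
print("missed:", missed)
```

the notes that say "fuzzy", "hispid to hirsute", or "peludas" all land in `missed`, even though they describe the same trait. you could keep adding keywords by hand, but that doesn't scale to 1000 more variations.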
for me, something like this would be ideal:
“community identifiers note that Rudbeckia hirta has hairy stems and leaves (hirta = hairy). more precisely, the leaf faces and stems are hispid to hirsute. observations 1, 2, 3, and 4 have discussions involving identifiers X, Y, Z that may provide more insight.”
but even if the system did something like grouping everything together into logical conceptual groupings, i think that still involves a language model to know how to form the logical groupings (maybe presented within, say, an expandable list). for example:
hairy stems and leaves >>
- “leaf faces and stems hispid to hirsute” (ex. obs 1 with identifier X)
- “hirta = hairy” (ex. obs 3 with identifier Z)
- etc…
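
as a rough sketch of that grouped presentation (in Python, with made-up names like `labeled_notes`, `group_by_concept`, and `render`): the concept labels below are assigned by hand, standing in for whatever model would actually decide the groupings. that labeling step is the hard part, and it is only assumed here.

```python
from collections import defaultdict

# (note text, example observation, identifier, concept label)
# the concept labels are hand-assigned placeholders; in a real system they
# would come from whatever does the grouping (assumed, not shown)
labeled_notes = [
    ("leaf faces and stems hispid to hirsute", "obs 1", "X", "hairy stems and leaves"),
    ("hirta = hairy", "obs 3", "Z", "hairy stems and leaves"),
]

def group_by_concept(rows):
    # collect the formatted notes under their concept label
    groups = defaultdict(list)
    for text, obs, identifier, concept in rows:
        groups[concept].append(f'"{text}" (ex. {obs} with identifier {identifier})')
    return dict(groups)

def render(groups):
    # plain-text stand-in for an expandable list
    lines = []
    for concept, items in groups.items():
        lines.append(f"{concept} >>")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)

print(render(group_by_concept(labeled_notes)))
```

the grouping and display code is trivial once every note has a label; the open question is still what assigns the labels.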