---
breaks: false
---

# OKH Community Call - 2025 June 11th

## Overview

![OKH dataflow - for humans](https://codeberg.org/OSEGermany/OpenSearchEcosystem/media/branch/main/res/media/img/ngi-search-ecosystem.svg)

[More detailed version of the above graphic](https://github.com/iop-alliance/OpenKnowHow/raw/refs/heads/master/res/media/img/dataflow-principle.svg)

<!-- ![OKH dataflow - for cyborgs](https://github.com/iop-alliance/OpenKnowHow/raw/refs/heads/master/res/media/img/dataflow-principle.svg) -->

## Changes

- The whole thing is more complex now (Julian, that one's for you! ;-) )
  - ... as a side effect, a price we think we need to pay, for the goals we have in mind
- The OKH Ontology is more optimized and robust than ever, [ready to be cooperated on](https://github.com/iop-alliance/OpenKnowHow/)
- ~35k (2022) -> ~one million projects indexed
- different scraping sources:
  - removed:
    - <s>github.com API</s> (purposefully crippled search)
    - <s>wikifactory.com</s> (API removed)
  - kept:
    - appropedia.org
    - oshwa.org
    - thingiverse.com (aka "the big exception", in (very) lower-case)
  - new:
    - manifest-repos - git repos containing multiple manifest files \
      (an extensive example was provided by Mairin Ogrady through Public Invention (Robert Reed), covering Medical OSH)
    - manifest-lists - a recursive set of lists (Kaspars invention)
  - considered but found unworthy:
    - <s>gitlab.com API</s> (purposefully crippled search)
    - <s>[printables.com](https://github.com/iop-alliance/OKH-krawler/issues/8)</s> (no API)
    - <s>[makerworld.com](https://github.com/iop-alliance/OKH-krawler/issues/9)</s> (no API)
    - <s>[hardware-x.com](https://github.com/iop-alliance/OKH-krawler/issues/6)</s> (no sources of their "OS"H projects)
    - <s>[youmagine.com](https://github.com/iop-alliance/OKH-krawler/issues/2)</s> (no API)
- Images now ...
  - have tags
  - can occupy slots
  - can have captions (aka depiction texts)
- Ontologies
  - hosted on perma-URLs now (thx to [w3id])
  - following [Ontology hosting best-practices] (using [ontprox])
- ... the little things, lots and lots of little things!
- There is more data cleaning happening in the crawling process (LOTS of work, this one!)
- Last but not least: We have a permanent (at least 10 years) server hosting OKH data now!

## Learnings

> Phew, this business we got into ... it is haaard!

One has to do a lot:

- getting to the data in the first place
  - ... and efficiently so!
- keeping it up to date
- figuring out its structure
- figuring out its _actual_ content (vs the list of fields of the API/Schema)
- cleaning; tidying up the data
- mapping it to [our schema][OKH Ontology]; conceptually and practically
  - ... which includes: finding ways to make use of the original data without compromising "generality"/agnosticism of the OKH ontology (schema)
- keeping the data consistent between sources
- preventing duplicates, aka the [distributed identification] problem
- trying to convince people to work together on a shared solution, instead of going for their own, isolated one, even though that would be much easier for them
- [naming things] is hard, but crucial in this business

### Outlook

- update the IoPa website content about and references to OKH
- solving [distributed identification]
- establishing a [BoM standard], and therewith introducing ["atomized" BoMs](https://github.com/iop-alliance/OpenKnowHow/issues/145) (this standard needs to be very flexible and come with robust, practical tools)
- [naming things]
- check out [LinkML], either as a potential single-source-of-truth format, or as an intermediary one
- make the **scraper** production-ready
  - continuous mode
  - performance (faster)
  - maintainable (not Python nor web-tech)
  - implement robust project identification
- [Standards - Is it the solution?](https://github.com/iop-alliance/OpenKnowHow/issues/154)
  - difficult
  - frustrating
  - lots and lots of work
  - annoying for outsiders
  - ...

  ... but is there an alternative? \
  For now, we only see the option of "biting the sour apple".
  ... hm!? maybe add some sweet, soft sliced dates?
- Standards (including Ontologies)
  - further improve the ecosystem (software/tech wise)
  - establish stability over longer time-periods (as [LinkML] seems to have managed)
  - devise new, intuitive ways for visualization and editing \
    (-> General Graph-Data Browser and Editor -> TODO Write blog post)

[distributed identification]: https://github.com/iop-alliance/OpenKnowHow/issues/160
[LinkML]: https://linkml.io/
[BoM standard]: https://codeberg.org/OSEGermany/open-bom/
[OKH Ontology]: https://github.com/iop-alliance/OpenKnowHow/blob/master/src/spec/okh.ttl
[Ontology hosting best-practices]: https://www.w3.org/TR/swbp-vocab-pub/#negotiation
[ontprox]: https://github.com/elevont/ontprox
[w3id]: https://w3id.org
[naming things]: https://codeberg.org/OSEGermany/osh-ont/
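
As a rough illustration of the "manifest-lists" source mentioned under Changes (a recursive set of lists), here is a minimal sketch of how such a structure could be walked. Everything in it is an assumption made for illustration: the `ManifestList` shape, the `collect_manifests` helper and the example paths are hypothetical and do not reflect the actual manifest-list format or the crawler's code. It only shows the two things a recursive source needs at minimum: a guard against lists that reference each other, and skipping manifests that were already seen.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ManifestList:
    """Hypothetical shape of one list: direct manifests plus nested lists."""
    manifests: List[str] = field(default_factory=list)
    sub_lists: List[str] = field(default_factory=list)


def collect_manifests(root: str, fetch: Callable[[str], ManifestList]) -> List[str]:
    """Walk the lists breadth-first and return each manifest once, in first-seen order."""
    seen_lists = {root}      # guards against cycles between lists
    seen_manifests = set()   # skips manifests referenced more than once
    result: List[str] = []
    queue = deque([root])
    while queue:
        current = fetch(queue.popleft())
        for manifest in current.manifests:
            if manifest not in seen_manifests:
                seen_manifests.add(manifest)
                result.append(manifest)
        for sub in current.sub_lists:
            if sub not in seen_lists:
                seen_lists.add(sub)
                queue.append(sub)
    return result


# Hypothetical in-memory data standing in for lists fetched over the network.
EXAMPLE = {
    "lists/root": ManifestList(
        manifests=["project-a/manifest"],
        sub_lists=["lists/medical", "lists/tools"],
    ),
    "lists/medical": ManifestList(
        manifests=["project-b/manifest", "project-a/manifest"],
        sub_lists=["lists/root"],  # points back at the root: a cycle
    ),
    "lists/tools": ManifestList(manifests=["project-c/manifest"]),
}

found = collect_manifests("lists/root", EXAMPLE.__getitem__)
print(found)
```

Note that comparing manifest locations verbatim, as done here, only catches the trivial case of the same manifest being listed twice. Recognizing the same project published under different URLs or on different platforms is the much harder [distributed identification] problem listed above.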