Siddhanta Kosha
Siddhanta Kosha is a web portal for searchable and semantically annotated Bharatiya books (https://siddhantakosha.org). It offers paid, subscription-based access to consumers, and pays content curators to curate, enrich and annotate its content.
Bharatiya knowledge is embedded in millions of manuscripts, a few printed but most handwritten, across diverse technical, religious and social fields. Mining this knowledge with the latest technologies such as ChatGPT faces two major hurdles.
The first is making the manuscripts available as processable text in Indic languages. Indic languages differ greatly from English in grammar, but share a rich grammatical structure derived from Sanskrit. Annotating these texts grammatically (splitting the sandhis and samaasas) will greatly help automate further linguistic processing and semantic knowledge extraction. This splitting step is highly ambiguous and cannot be fully automated.
The second is that today's technologists are unfamiliar with the language and discursive style used in these texts. This can be overcome only by involving traditional shaastra experts in the interpretation.
Technical institutes such as the IITs can truly draw on Indic knowledge for modern innovation only after our textual heritage has been made accessible and intelligible in this way. STEM areas where India's traditional knowledge could have a positive impact today include health sciences, ecology, sustainable agriculture, architecture and town planning, and design. Non-STEM areas include psychology, political science, management and the arts.
Our mission is to create curated and grammatically tagged text versions of the top 10 seminal printed books in each of the following IKS (Indian Knowledge Systems) knowledge verticals, so that technical institutes can study them in depth. This will make the content of those books searchable and accessible via internet search engines and the latest text mining tools. As a result, the knowledge contained in these books will be available for contemporary research.
- Agriculture and Ecology
  - Source texts: Krishi and Vrikshaayurveda texts, Brihat Samhita
- Architecture and Civil Engineering
  - Source texts: Vaastu and Agama texts
- Ayurveda
  - Source texts: Ayurveda texts from ITRA library
- Knowledge Processing
  - Source texts: Nyaaya, Miimaamsa, Arthika Vyakarana
- Governance and Management
  - Source texts: Arthashaastra, Niitisaara, Sukra Niiti and their commentaries
- Mathematics
  - Source texts: Indic Math treatises
- Behavior modelling (Applied Jyotisha)
  - Source texts: Jyotisha texts related to predictive modelling of international relations, weather, countries, organisations
For each of these areas, we shall convene a workshop with domain experts to select the source texts with the greatest impact. If a book is already available in text form, we will use that text directly. Siddhanta Knowledge Foundation has developed Siddhantakosha.org, a searchable knowledge portal of Bharatiya granthas such as the above. This portal can support technical IKS research at multiple institutions. It will let users upload scanned Indic books and will offer text conversion, tagging and republication services.
The workflow starts with scanned images of printed Indian-language books and then performs the following steps:
- Convert the scanned version into text.
- Proofread the text.
- Grammatically annotate the compound words to identify the constituent words.
- Index the text to make it searchable via Google.
- Tag the technical terms and the location of their definitions in the text.
Example output
Original Sloka:
कर्मण्येवाधिकारस्ते मा फलेषु कदाचन।
मा कर्मफलहेतुर्भूर्मा ते सङ्गोऽस्त्वकर्मणि॥
Grammar-annotated version:
कर्मणि+एव+अधिकारः+ते मा फलेषु कदाचन ।
मा कर्म-फल-हेतुः+भूः+मा ते सङ्गः+अस्तु+अकर्मणि ॥
Concept-tagged version:
कर्मणि+एव+अधिकारः+ते मा फलेषु कदाचन ।
मा कर्म-फल-हेतुः+भूः+मा ते सङ्गः+अस्तु+अकर्मणि ॥
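The grammar-annotated form above lends itself to simple machine processing. The following is a minimal Python sketch, not the portal's actual implementation. It assumes, reading from the example, that "+" separates words that were joined by sandhi and "-" separates the members of a samaasa (compound). The function names `split_annotated` and `build_index` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Verse-end punctuation (danda and double danda); not part of any word.
DANDAS = "।॥"


def split_annotated(line: str) -> list[list[list[str]]]:
    """Parse one grammar-annotated line.

    Assumed notation (inferred from the example, not a published spec):
      "+" separates words joined by sandhi,
      "-" separates members of a compound (samaasa).

    Returns a list of chunks (whitespace-separated units of the line);
    each chunk is a list of words; each word is a list of compound members.
    """
    chunks = []
    for raw in line.split():
        chunk = raw.strip(DANDAS)
        if not chunk:  # the token consisted only of dandas
            continue
        chunks.append([word.split("-") for word in chunk.split("+")])
    return chunks


def build_index(lines: list[str]) -> dict[str, set[int]]:
    """Map each constituent (word or compound member) to the 1-based
    numbers of the lines in which it occurs."""
    index = defaultdict(set)
    for line_no, line in enumerate(lines, start=1):
        for chunk in split_annotated(line):
            for word in chunk:
                for member in word:
                    index[member].add(line_no)
    return dict(index)


verse = [
    "कर्मणि+एव+अधिकारः+ते मा फलेषु कदाचन ।",
    "मा कर्म-फल-हेतुः+भूः+मा ते सङ्गः+अस्तु+अकर्मणि ॥",
]
index = build_index(verse)
print(sorted(index["मा"]))  # [1, 2]
print(index.get("अस्तु"))   # {2}
```

The lookup for अस्तु illustrates why the annotation matters for search: in the original line the word is fused by sandhi into सङ्गोऽस्त्वकर्मणि, so a plain substring search for अस्तु would not find it there, whereas the annotated text exposes it as a separate, indexable word.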