Terminology harvesting, also known as term-harvesting, is a method for selecting and validating the terms that you will use to search the literature on your topic. Selecting words to search in databases such as PubMed, Google Scholar, and EMBASE can be challenging. Term-harvesting can optimize database searching and serve as evidence of how the search was created. The tools and methods suggested in this article give you ways to assess how well your collected terms cover the scope of your research topic, so that you can be confident your literature search will produce relevant results.
Before starting any research project, it is important to conceptualize the problem at hand and identify the characteristics that would make a term worth harvesting. Once you have identified the problem you wish to solve, the best source for extracting terms is a set of highly relevant papers that you discovered and reviewed during your preliminary topic review. One tool that can help you find such papers is SWIFT-Review, a literature prioritization tool that uses text mining to identify papers highly relevant to your research question. The tool is free but requires a software download.
Using frequency analyzers
After you have collected a set of papers that are highly relevant to your topic, you can use tools to identify terms that appear frequently in them. Some of these tools are listed below.
- Yale MeSH Analyzer will provide a list of keywords and MeSH controlled vocabulary terms based on an article's PMID (PubMed ID) entered on the tool’s homepage.
- Web of Science's InCites Citation Topics are available once you have logged into your account. First, enter terms related to your research topic into the search bar. Next, click the "Analyze Results" button. Then select "Citation Topics Micro" on the purple dropdown menu. Finally, scroll down on the results page to view a list of related topics and their frequency. This analyzer offers three levels of topics (macro, meso, and micro), depending on the level of granularity you need.
- Web of Science can also be used to collect MEDLINE MeSH Headings. Rather than searching the Web of Science Core Collection, select "MEDLINE" from the dropdown menu. Then enter your search term(s) into the search bar and click the "Search" button. Next, click the "Analyze Results" button. On the dropdown menu, select "MeSH Headings" to view a list of controlled vocabulary collected from the articles that your search found.
- OpenAlex provides a variety of information. The "Topic" box shows how frequently topics related to your search term appear in the results.
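If you have the titles or abstracts of your key papers as plain text, a rough frequency count can complement the tools above. The following is a minimal sketch, not a replacement for those analyzers: the stopword list, the `term_frequencies` function, and the sample abstracts are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list; extend it for your own topic.
STOPWORDS = {"the", "of", "and", "in", "a", "an", "to", "for", "with", "on", "by"}

def term_frequencies(texts, ngram=1):
    """Count how many texts contain each n-gram (document frequency)."""
    counts = Counter()
    for text in texts:
        # Lowercase words of two or more letters; hyphenated words stay whole.
        words = [w for w in re.findall(r"[a-z][a-z\-]+", text.lower())
                 if w not in STOPWORDS]
        # Stopwords are dropped first, so n-grams can span a removed stopword.
        grams = {" ".join(words[i:i + ngram])
                 for i in range(len(words) - ngram + 1)}
        counts.update(grams)  # a set, so each text counts a term at most once
    return counts

# Made-up abstracts, for illustration only.
abstracts = [
    "Telehealth interventions for rural diabetes care",
    "Diabetes self-management in rural communities",
    "Telehealth and diabetes outcomes",
]

freq = term_frequencies(abstracts)
for term, n in freq.most_common(5):
    print(term, n)
```

Counting the number of papers that contain a term, rather than raw occurrences, keeps one long paper from dominating the list. Passing `ngram=2` gives two-word phrases, which are often better candidate keywords than single words.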
Mapping controlled vocabulary
After you have collected a list of keywords related to each segment of your research question, map each keyword to the corresponding controlled vocabulary terms in your selected databases. Add all the controlled vocabulary (also known as subject headings) to your term-harvesting working document.
MeSH on Demand is a tool that can help you identify MeSH controlled vocabulary terms related to the set of highly relevant papers on your research topic. Simply copy and paste the text of the paper into the text box provided and click "Search."
Identifying a complementary set of keywords and controlled vocabulary may take a few iterations in which you try out different options. Consider how precise the search will need to be, based on how much literature already exists on the topic. The keywords and controlled vocabulary you select should account for both the subjectivity and the inconsistencies of indexing.
If you wish to use Unified Medical Language System (UMLS) concepts, consider using the MetaMap tool, one of the tools used to automatically index the articles in the National Library of Medicine database. For a deeper explanation of that tool, check out the NLM Medical Text Indexer (MTI).
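One way to keep track of this mapping across iterations is a simple lookup table. The sketch below assumes a hypothetical structure (keyword to database to subject heading) and a hypothetical `unmapped` helper. The headings shown are illustrative and should be verified in each database's own thesaurus.

```python
# Keyword -> {database: subject heading}.
# Entries are illustrative; verify each one in the database's thesaurus.
mapping = {
    "telehealth": {"MEDLINE": "Telemedicine", "Embase": "telemedicine"},
    "diabetes": {"MEDLINE": "Diabetes Mellitus", "Embase": "diabetes mellitus"},
    "rural": {"MEDLINE": "Rural Population"},
    "remote monitoring": {},  # no heading identified yet
}

def unmapped(mapping, databases):
    """Return (keyword, database) pairs that still lack a subject heading."""
    return [
        (keyword, db)
        for keyword, headings in mapping.items()
        for db in databases
        if db not in headings
    ]

gaps = unmapped(mapping, ["MEDLINE", "Embase"])
for keyword, db in gaps:
    print(f"No {db} heading recorded for '{keyword}'")
```

A keyword with no matching heading is not necessarily a problem. It may simply need to be searched as a keyword only, but listing the gaps makes that an explicit decision.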
Making decisions
After you have collected keywords (also known as "natural language terms") and controlled vocabulary terms (i.e., a database’s indexed terms) into a single working document, organize the terms by concept. This step will help with the development of search strings. Decide which terms (both keywords and controlled vocabulary) produce a balance between sensitivity and precision within the search that you are developing.
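To illustrate how terms organized by concept become a search string, the sketch below combines terms with OR within a concept and joins the concepts with AND. It uses the PubMed field tags `[mh]` (MeSH heading) and `[tiab]` (title/abstract); the `build_pubmed_query` function and the example concepts are assumptions made for illustration, and other databases use different syntax.

```python
def build_pubmed_query(concepts):
    """OR the terms within each concept, then AND the concepts together."""
    blocks = []
    for concept in concepts:
        headings = [f'"{h}"[mh]' for h in concept.get("mesh", [])]
        keywords = [f'"{k}"[tiab]' for k in concept.get("keywords", [])]
        terms = headings + keywords
        if terms:  # skip concepts with no terms yet
            blocks.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(blocks)

# Illustrative concepts; replace them with your own harvested terms.
concepts = [
    {"name": "telehealth", "mesh": ["Telemedicine"],
     "keywords": ["telehealth", "telemedicine"]},
    {"name": "diabetes", "mesh": ["Diabetes Mellitus"],
     "keywords": ["diabetes"]},
]

query = build_pubmed_query(concepts)
print(query)
# ("Telemedicine"[mh] OR "telehealth"[tiab] OR "telemedicine"[tiab]) AND ("Diabetes Mellitus"[mh] OR "diabetes"[tiab])
```

Adding more terms inside a concept generally raises sensitivity, while adding another concept with AND generally raises precision, so rebuilding the string after each change makes the trade-off easy to test.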
Documenting the terms
To ensure you have all the information captured in one place about the term-harvesting process you have followed, create a document or spreadsheet to store each iteration of this process. Your document or spreadsheet might include headings for each concept within your research question. Lists of keywords and controlled vocabulary can then be organized under each heading. Consider ranking the terms in order of relevance, depending on what kind of study you are conducting.
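If you prefer a spreadsheet, the log can be kept as a CSV file that any spreadsheet program will open. The sketch below shows one possible layout; the column names, the `write_term_log` function, and the sample rows are hypothetical, so adapt them to your project.

```python
import csv
import io

# Hypothetical columns for a term-harvesting log.
FIELDS = ["iteration", "concept", "term", "type", "source", "rank"]

def write_term_log(rows, handle):
    """Write harvested terms to an open text handle as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

# Illustrative rows only.
rows = [
    {"iteration": 1, "concept": "telehealth", "term": "Telemedicine",
     "type": "controlled vocabulary", "source": "MeSH on Demand", "rank": 1},
    {"iteration": 1, "concept": "telehealth", "term": "telehealth",
     "type": "keyword", "source": "key paper abstracts", "rank": 2},
]

# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_term_log(rows, buffer)
log_text = buffer.getvalue()
print(log_text)
```

To save to disk instead, pass a file opened with `open("term_log.csv", "w", newline="")` (the filename is a placeholder). Recording the iteration and the source of each term preserves the evidence of how the search was created.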
Testing the results
Ensure comprehensiveness by comparing your harvested terms to the research project’s protocol (if one is required). Consider the following:
- Are all concepts of your research question framework covered in your term-harvesting document?
- Have you considered your inclusion and exclusion criteria while conducting your term-harvesting?
- Have you added spelling variations to account for spelling differences in other countries? What about acronyms and truncated versions of the keywords?
- Double-check your search syntax, and note that it may change depending on which database(s) you are using for your test searches.
- Try your terms out in a test search in a few databases. Add any new terms that you find in those databases.
- When you run a test search with your harvested terms, does it locate the key articles that you have already identified?
- Is your test search finding too many or too few results?
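The check for key articles in the list above can be made concrete by comparing identifiers. The sketch below assumes you have exported the PMIDs from a test search and have a list of PMIDs for the key articles you already know. The `check_key_articles` function and the PMIDs shown are placeholders, not real records.

```python
def check_key_articles(key_pmids, retrieved_pmids):
    """Return the share of key articles found and the sorted list of those missed."""
    key = set(key_pmids)
    found = key & set(retrieved_pmids)
    missed = sorted(key - found)
    recall = len(found) / len(key) if key else 0.0
    return recall, missed

# Placeholder PMIDs, for illustration only.
key_pmids = ["11111111", "22222222", "33333333"]
retrieved_pmids = ["11111111", "22222222", "44444444", "55555555"]

recall, missed = check_key_articles(key_pmids, retrieved_pmids)
print(f"Found {recall:.0%} of key articles; missed: {missed}")
```

For each missed article, look at its title, abstract, and subject headings to see which term would have retrieved it, then consider adding that term to your working document.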
Some final considerations
While these tools and tips can help with the term-harvesting process, they do pose potential issues. Human error within the term-harvesting process is always a possibility, which is why the "Testing the results" section has been provided. It's important to keep in mind that most indexing within databases is performed by humans, which can also lead to errors. Because of this manual process, articles may take a while to get indexed in a database, which can make them even more challenging to locate. This is where subject-matter expertise comes in handy. If you do not consider yourself a subject-matter expert, connect with one who can make the final determination in term selection.
Further Reading
Bodenreider, O. (2004). The Unified Medical Language System (UMLS): Integrating biomedical terminology. Nucleic Acids Research, 32(Database issue), D267–D270. https://doi.org/10.1093/nar/gkh061. PubMed PMID: 14681409; PubMed Central PMCID: PMC308795. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC308795/
Bramer, W. M., de Jonge, G. B., Rethlefsen, M. L., Mast, F., & Kleijnen, J. (2018). A systematic approach to searching: an efficient and complete method to develop literature searches. Journal of the Medical Library Association: JMLA, 106(4), 531–541. https://doi.org/10.5195/jmla.2018.283
Howard, B.E., Phillips, J., Miller, K. et al. (2016). SWIFT-Review: a text-mining workbench for systematic review. Systematic Reviews 5, 87. https://doi.org/10.1186/s13643-016-0263-z
National Library of Medicine, National Institutes of Health, U.S. Department of Health and Human Services. (n.d.). MeSH. https://www.ncbi.nlm.nih.gov/mesh
National Library of Medicine, National Institutes of Health, U.S. Department of Health and Human Services. (n.d.). Using PubMed in evidence-based practice training course. https://www.nlm.nih.gov/oet/ed/pubmed/pubmed_in_ebp/02-100.html