How to Build My Traditional Chinese Medicine Brain with the Help of AI: The Journey of Analyzing Case Databases

Vol. A Tutorial for Myself

#Traditional Chinese Medicine Data Analysis #Data Mining #Inheritance of Traditional Chinese Medicine #Obsidian #Artificial Intelligence

Hey, what do you do once you have a medical case database?
Without really noticing, I have accumulated nearly 100,000 characters of medical case text while following my mentor in clinic and recording prescriptions. Once the case data reaches a certain scale, a natural idea emerges: it's time to let this valuable experience "speak"!
Here I share my framework and practical ideas for analyzing Traditional Chinese Medicine prescription data:

I. Establishing a Framework for Analyzing Traditional Chinese Medicine Prescription Data

| Step | Traditional Chinese Medicine Data Issues | Response Strategy | Tools/Technologies |
| --- | --- | --- | --- |
| 1. Data Collection | Case information is unstructured (free text); standards for tongue image recording are inconsistent | Structured entry: design standardized records for case information and clinical reports; image standardization: unify tongue image collection devices, methods, and storage norms | Create a Traditional Chinese Medicine Electronic Medical Record system (TCM EMR); standardize image collection devices and processes |
| 2. Data Cleaning | Terminology ambiguity: different names for the same symptom (e.g., "fever" vs. "body heat"); mixed units: grams, sticks, bags, etc.; missing information: incomplete records of syndrome types, tongue and pulse, symptoms; high subjectivity: syndrome determination relies on physician experience; complex prescriptions: many factors drive medication adjustments at follow-up | Standardize Traditional Chinese Medicine terminology; extract syndrome elements (break each syndrome down into its basic elements); unify dosage units to grams; fill in missing values based on syndrome differentiation logic, or leave them blank; clearly distinguish the basic prescription of the first visit from the adjusted prescriptions of follow-up visits | WHO International Standard Terminologies on Traditional Chinese Medicine (2022); "Traditional Chinese Medicine Diagnosis", edited by Deng Tietao; replicate the approach in Professor Zhu Wenfeng's "Traditional Chinese Medicine Syndrome Differentiation Mechanism" |
| 3. Feature Engineering | Complex multi-dimensional associations: Symptoms ↔ Syndrome Elements ↔ Syndrome Types ↔ Formulas | Build association networks: symptom-syndrome network, symptom-formula network; calculate formula similarity, e.g., the Jaccard index | Data mining; complex network analysis |
| 4. Analysis Modeling | Models need to be interpretable and conform to Traditional Chinese Medicine syndrome differentiation logic; small sample sizes | Syndrome classification model (SVM + Traditional Chinese Medicine rule engine); core prescription recommendation | Association rule algorithms |
| 5. Result Interpretation | Must conform to the theoretical system of Traditional Chinese Medicine; emphasize individual differences | Theoretical validation: compare model output with Traditional Chinese Medicine theory and the mentor's experience; real case retrospective: match similar cases in the EMR for self-validation | Retrospective review in the TCM EMR; expert review |
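
The formula-similarity idea in step 3 can be illustrated with a minimal sketch. It assumes each prescription has already been reduced to a set of standardized herb names and computes the Jaccard index, |A ∩ B| / |A ∪ B|. The helper name `jaccard` and the herb lists are made up for illustration; dosage is ignored here, which is a simplification.

```python
def jaccard(a, b):
    """Jaccard index of two herb collections: |A ∩ B| / |A ∪ B|."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty prescriptions: define similarity as 0
    return len(set_a & set_b) / len(union)


# Illustrative prescriptions (not real case data)
formula_1 = ["Gui Zhi", "Bai Shao", "Sheng Jiang", "Da Zao", "Zhi Gan Cao"]
formula_2 = ["Gui Zhi", "Bai Shao", "Sheng Jiang", "Da Zao", "Zhi Gan Cao", "Ge Gen"]
formula_3 = ["Ma Huang", "Gui Zhi", "Xing Ren", "Zhi Gan Cao"]

print(round(jaccard(formula_1, formula_2), 3))  # shares 5 of 6 distinct herbs
print(round(jaccard(formula_1, formula_3), 3))  # shares 2 of 7 distinct herbs
```

A dose-weighted similarity would probably separate a base prescription from its follow-up modifications better, but the set version is usually enough for a first look.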

Current Limitations and Exploration Directions

Linking prescription adjustments to therapeutic effects: medication changes across follow-up visits are complex and specific therapeutic effects are hard to quantify, so deeper "therapeutic effect-prescription" association rule mining is still needed.

II. Extracting Data

The first step in analysis is to convert the accumulated medical cases into structured data. My approach is:

Build a Traditional Chinese Medicine Electronic Medical Record Database (TCM EMR): I chose to use Obsidian to manage medical cases, storing each case as a structured .md file. This facilitates subsequent information extraction and linking.
Establish Key Field Extraction Standards: This is the step that most affects the quality of the results! You need to define clearly which information to extract from the case text. The CSV header I designed includes:
- Patient anonymized ID
- Baseline data: Gender, Age
- Diagnosis information: Main disease, Concurrent disease
- Syndrome information: Syndrome elements, Symptoms, Tongue image, Pulse image
- Prescription information: Prescription composition, Medication dosage (unified to grams)

For example:
[Image: example1]

This data is for demonstration purposes only and is not real data.
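
To make the field list concrete, here is a minimal sketch of reading one such row and splitting the prescription cell into herb-to-gram pairs. The column names, the `|` separator for multi-value fields, and the "name + dose + g, separated by `;`" cell format are all assumptions for illustration, and the row itself is made up.

```python
import csv
import io
import re

# Hypothetical column names mirroring the fields listed above; demo row only
CSV_TEXT = """patient_id,gender,age,main_disease,concurrent_disease,syndrome_elements,symptoms,tongue,pulse,prescription
P001,F,45,insomnia,,yin deficiency|heat,poor sleep|night sweats,red tongue with little coating,thready and rapid,Suan Zao Ren 15g; Zhi Mu 9g; Fu Ling 12g
"""

# Assumed cell format: "<herb name> <dose>g" items separated by ";"
DOSE_PATTERN = re.compile(r"^\s*(?P<herb>.+?)\s*(?P<dose>\d+(?:\.\d+)?)\s*g\s*$")


def parse_prescription(cell):
    """Turn 'Zhi Mu 9g; Fu Ling 12g' into {'Zhi Mu': 9.0, 'Fu Ling': 12.0}."""
    doses = {}
    for item in cell.split(";"):
        if not item.strip():
            continue  # skip empty fragments such as a trailing ';'
        match = DOSE_PATTERN.match(item)
        if match is None:
            # Anything not already in grams is flagged instead of guessed
            raise ValueError(f"Cannot parse dosage item: {item!r}")
        doses[match.group("herb")] = float(match.group("dose"))
    return doses


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
record = rows[0]
record["symptoms"] = record["symptoms"].split("|")
record["prescription"] = parse_prescription(record["prescription"])
print(record["symptoms"])
print(record["prescription"])
```

Raising an error on items that are not in grams is a deliberate choice in this sketch: it forces unit problems to be fixed during cleaning rather than silently carried into the analysis.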

III. Smart Use of AI: Accelerating Data Mining of Traditional Chinese Medicine Prescriptions

Once the data is structured as CSV, how do you analyze it efficiently?
My secret is to use AI to help write Python scripts. For detailed methods, refer to my previous blog post: "AI Communication Guide: How to Ask AI Correctly?"

Core ideas and processes:

First, clarify the ultimate goal!
Data Exploration: First, let AI analyze the CSV file to understand the data structure, field meanings, and provide suggestions and precautions.
Specify Concrete Needs: Clearly describe what you want AI to write in Python (or R, etc.), obtain the code generated by AI, and run it in your own environment.
Feedback and Optimization: Is the running result unsatisfactory? Check for errors, analyze reasons, and modify the prompt.
Iterative Loop: Repeat the cycle of "modify prompt -> generate new code -> run tests -> feedback" until the code output fully meets your analysis expectations. The process itself also deepens your understanding of the data and the problem.
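
For the data exploration step, a quick local profile of the file gives you something concrete to paste into the prompt. This is a minimal sketch using only the standard library; the function name `profile_csv` and the demo columns are hypothetical, and the demo rows are made up.

```python
import csv
import io
from collections import Counter


def profile_csv(handle):
    """Return row count, column names, and per-column empty-cell counts."""
    reader = csv.DictReader(handle)
    missing = Counter()
    n_rows = 0
    for row in reader:
        n_rows += 1
        for column, value in row.items():
            if column is None:
                continue  # extra cells beyond the header; ignored in this sketch
            if value is None or not value.strip():
                missing[column] += 1
    columns = list(reader.fieldnames or [])
    return {
        "rows": n_rows,
        "columns": columns,
        "missing": {column: missing[column] for column in columns},
    }


# Made-up demo data with a few gaps
demo = io.StringIO(
    "patient_id,gender,age,tongue,pulse\n"
    "P001,F,45,red tongue,thready\n"
    "P002,M,,pale tongue,\n"
    "P003,F,62,,wiry\n"
)
summary = profile_csv(demo)
print(summary)
```

Sharing a summary like this, rather than the raw cases, also keeps private data out of the prompt.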

Some code examples:

If you find the code uninteresting, you can ask GPT to personalize it~ (Muscle version & Cute girl version)

Tip: When doing this kind of research, set up a control panel of adjustable parameters in the script, so you can tune the analysis yourself without having to ask AI every time.
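
One way to read the "control panel" idea is a single settings object at the top of the script. The sketch below pairs it with herb-pair association rules computed by plain counting of support and confidence; this is not a full Apriori implementation, and the names `Settings` and `herb_pair_rules`, the default thresholds, and the prescriptions are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Settings:
    """Adjustable 'control panel' (names and defaults are hypothetical)."""
    min_support: float = 0.5     # minimum share of prescriptions containing the pair
    min_confidence: float = 0.8  # minimum P(consequent | antecedent)
    top_n: int = 10              # how many rules to keep


def herb_pair_rules(prescriptions, settings):
    """Mine one-to-one rules 'herb A -> herb B' by counting co-occurrences."""
    n = len(prescriptions)
    herb_counts = Counter()
    pair_counts = Counter()
    for herbs in prescriptions:
        unique = sorted(set(herbs))  # sorted so each pair has one canonical order
        herb_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < settings.min_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / herb_counts[antecedent]
            if confidence >= settings.min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    # Highest confidence first, then support, then names for a stable order
    rules.sort(key=lambda r: (-r[3], -r[2], r[0], r[1]))
    return rules[: settings.top_n]


# Made-up prescriptions for demonstration
prescriptions = [
    ["Huang Qi", "Bai Zhu", "Fang Feng"],
    ["Huang Qi", "Bai Zhu", "Dang Shen"],
    ["Huang Qi", "Dang Shen"],
    ["Bai Zhu", "Fu Ling"],
]
rules = herb_pair_rules(prescriptions, Settings(min_support=0.5, min_confidence=0.6))
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

Changing a threshold is then a one-line edit to `Settings`, with no new prompt needed.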

Example output results:

The results generated by GPT are satisfactory and give me a good enough picture of my mentor's medication patterns. Using them in a research project would require further optimization.
In fact, the Ancient and Modern Medical Case Cloud Platform already implements the entire research workflow and can be used directly for scientific research. So why spend time tinkering with all this? There are two reasons:
First, private data is not easy to upload to a cloud platform.
Second, I didn't buy it in college...

IV. To Be Continued

This is just the beginning! Next, I want to explore:

Wuyun Liuqi (Five Movements and Six Qi) Analysis: Convert consultation dates into seasonal/yin-yang parameters.
Therapeutic Effect Feedback: Quantify the degree of improvement during follow-up and include it in the analysis (standardized scoring needs to be designed).
Knowledge Graph: Build an interactive network of "Symptoms-Syndromes-Medicines-Effects".
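
As a starting point for the knowledge graph idea, the sketch below counts weighted edges between symptoms, syndromes, and herbs that appear in the same record. It assumes co-occurrence within one case is a reasonable proxy for a link, which is a strong simplification; the relation names, helper names, and records are hypothetical, and the "Effects" layer is left out because it depends on the efficacy scoring that has not been designed yet.

```python
from collections import Counter


def build_edges(records):
    """Count co-occurrence edges: symptom -> syndrome and syndrome -> herb."""
    edges = Counter()
    for record in records:
        for syndrome in record["syndromes"]:
            for symptom in record["symptoms"]:
                edges[(symptom, "indicates", syndrome)] += 1
            for herb in record["herbs"]:
                edges[(syndrome, "treated_with", herb)] += 1
    return edges


def neighbors(edges, node):
    """List (relation, target, weight) for edges leaving `node`, heaviest first."""
    out = [(rel, dst, weight) for (src, rel, dst), weight in edges.items() if src == node]
    return sorted(out, key=lambda edge: (-edge[2], edge[0], edge[1]))


# Made-up records for demonstration
records = [
    {"symptoms": ["fatigue", "poor appetite"],
     "syndromes": ["spleen qi deficiency"],
     "herbs": ["Dang Shen", "Bai Zhu"]},
    {"symptoms": ["fatigue", "loose stool"],
     "syndromes": ["spleen qi deficiency"],
     "herbs": ["Bai Zhu", "Fu Ling"]},
]
edges = build_edges(records)
for relation, target, weight in neighbors(edges, "spleen qi deficiency"):
    print(relation, target, weight)
```

An edge list like this can later be loaded into a graph library or rendered interactively once the edge definitions settle.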

🌱 If you are also exploring similar topics, feel free to share insights! I especially want to hear:

How do you handle data?

Are there any clever feature engineering methods?

How do you build knowledge graphs?