akihaye


An attempt to organize files using ChatGPT + Mac Terminal

As usual, while following my teacher in the clinic, once the patient's signs and symptoms have been gathered, I silently work out my own prescription and then compare it with the teacher's. Ideally the two differ by no more than three herbs or dosages. Today's topic, however, is not copying prescriptions or differentiating syndromes in the teacher's clinic. The teacher keeps electronic medical records, and a small computer running Windows 7 has been recording cases for seven or eight years, each saved as a txt file. This is excellent material for learning the teacher's clinical thinking and is well suited to data analysis.

As a long-time user of Obsidian for note-taking and data management, I decided to import these medical records into Obsidian as my case study database. Obsidian's powerful plugin library and local offline storage meet my needs, and the global search greatly facilitates the retrieval of prescription information.

[Image: Obsidian TCM folder panel]


Responding to Challenges

However, after copying the files to my own Mac, problems arose one after another.

First, Windows and Mac use different text encodings. Many txt files appeared garbled after being copied to the Mac, and some could not be opened at all. On inspection, the cause was that the teacher's small computer saved files in GB18030, while the Mac expects UTF-8. For more details, see the explanation on Zhihu: Why does Windows use GB18030 as the default for Chinese instead of UTF-8?

Secondly, Obsidian uses Markdown format for storage, which means I need to convert thousands of case records into md format.

But these are minor issues; they can be resolved with two lines of code using Terminal.
[Image: Mac Terminal]

iconv Command

iconv is a tool for converting file encodings. A single-file conversion can be read as "use iconv to convert a file from GB18030 to UTF-8 and write out the converted file." With thousands of case files, though, the conversion has to be run as a batch over the whole folder.
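A minimal sketch of what the single-file and batch forms could look like, assuming the records sit as .txt files in one folder. The throwaway demo folder, the sample file `case1.txt`, and the `utf8` output folder are assumptions for illustration, not the commands from the original post. Only `-f` (from encoding) and `-t` (to encoding) are standard iconv options.

```shell
# Work in a throwaway folder so real records are never touched (demo only).
demo=$(mktemp -d)
cd "$demo" || exit 1

# Create a sample file encoded in GB18030 (hypothetical content).
printf '中医\n' | iconv -f UTF-8 -t GB18030 > case1.txt

# Single file: read GB18030, print the UTF-8 text to the terminal.
iconv -f GB18030 -t UTF-8 case1.txt

# Batch: convert every .txt file into a separate utf8/ folder,
# leaving the originals as they are and reporting any failure.
mkdir -p utf8
for f in *.txt; do
  iconv -f GB18030 -t UTF-8 "$f" > "utf8/$f" || echo "failed: $f" >&2
done
```

On real records, skip the demo setup, cd into the records folder, and run only the loop. Writing into a separate folder means a file that fails to convert never overwrites its original.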

mv Command

Having resolved the encoding format issue, let's see how to convert files to the Markdown format supported by Obsidian:

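mv is the "move/rename" command. Here it is used to change the file extension, which can be read as "move the .txt file to a .md file." Since the contents are plain text, changing the extension is enough for Obsidian to treat each record as a note, and the same rename has to be applied in batch to every file.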
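A minimal sketch of the single and batch rename, assuming every .txt file in the current folder should become .md. The file names are made up for illustration. `${f%.txt}` is standard shell parameter expansion that strips the `.txt` suffix, and `mv -n` declines to overwrite a target that already exists.

```shell
# Demo folder with two hypothetical records.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch a.txt b.txt

# Single file: "move a.txt to a.md".
mv a.txt a.md

# Batch: rename every remaining .txt file to .md.
for f in *.txt; do
  mv -n "$f" "${f%.txt}.md"   # -n: never overwrite an existing .md file
done

ls
```

The `-n` flag matters when two records would end up with the same name; the second one is left as .txt instead of silently replacing the first.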


Taking Advantage of the Situation

I thought that after the conversion, the organization of the database could come to a close, but after a preliminary check of the outpatient case files, I discovered a frustrating problem!

  • There are duplicate files in the case records.
  • Among the duplicate files, there are cases where "file names are the same," but "contents are different."
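Both kinds of problem can at least be surfaced automatically before anything is deleted. A minimal sketch under assumptions: the folder layout and file names are invented for the demo, and `cksum` serves only as a quick fingerprint, so any pair it reports should be confirmed with `cmp` before a file is removed.

```shell
# Demo data: one true duplicate and one name clash (hypothetical names).
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p 2019 2020
printf 'formula A\n' > 2019/zhang.md
printf 'formula A\n' > 2020/zhang-copy.md   # same content, different name
printf 'formula B\n' > 2020/zhang.md        # same name, different content

# 1) Same content: sort by checksum, then print every file whose
#    checksum and size match the previous line.
find . -type f -name '*.md' -exec cksum {} + | sort -n |
  awk '{
    key = $1 " " $2                      # checksum + size
    file = $0; sub(/^[0-9]+ [0-9]+ /, "", file)
    if (key == prev) {
      if (!shown) print prevfile
      print file
      shown = 1
    } else {
      shown = 0
    }
    prev = key; prevfile = file
  }' > same_content.txt

# 2) Same name in different folders: list base names that occur more than once.
find . -type f -name '*.md' | sed 's|.*/||' | sort | uniq -d > same_name.txt

cat same_content.txt same_name.txt
```

Files listed in `same_content.txt` are candidates for deletion, while names in `same_name.txt` need a manual look, since their contents may differ.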

Faced with thousands of case files, should I check and delete redundant files by hand, one by one? Clearly unrealistic. Not only is it a huge waste of time for you and me, who are in our prime, but the repetitive mechanical work is disheartening enough to make one's scalp tingle just thinking about it.

After much contemplation, I decided to take advantage of the times and utilize ChatGPT's programming capabilities to help me complete these tedious tasks.

Communicating with ChatGPT

ChatGPT's coding capabilities are already quite impressive, but most of the time its output is still unsatisfactory. The focus should therefore be on how you communicate with ChatGPT.

  1. Ask questions aimed at getting GPT to output explanatory tutorials.
  2. Be detailed in your requests, as detailed as writing a thesis, making each step clear.
  3. Communicate repeatedly, just like a mentor asking you to revise your proposal multiple times; give GPT requirements, and it will connect the context to meet your needs.
  4. If encountering an unfamiliar knowledge point, start a "new conversation" to avoid affecting the current dialogue.
  5. If repeated revisions still yield unsatisfactory results, open a new conversation, identify the issues from the previous exchange, reorganize the question text, and ask again until satisfied.
  6. Back up the original files, run small sample tests, and if issues arise, feed them back to GPT for revision, repeating steps 3-5.
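Step 6 can be sketched in a few lines of Terminal. This is only an illustration: the folder names `records`, `records_backup`, and `sample`, and the sample size of three, are assumptions.

```shell
# Demo folder standing in for the real case records (hypothetical files).
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir records
for i in 1 2 3 4 5; do
  printf 'case %s\n' "$i" > "records/case$i.txt"
done

# Full backup first: -R copies the folder recursively, -p keeps timestamps.
cp -Rp records records_backup

# Small sample: copy the first three files into a scratch folder
# and try any GPT-generated command there before touching the rest.
mkdir sample
n=0
for f in records/*.txt; do
  [ "$n" -ge 3 ] && break
  cp -p "$f" sample/
  n=$((n + 1))
done

ls sample
```

If a generated command misbehaves on the sample, the output and error message are exactly what to feed back to GPT in steps 3-5.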

Here is my final question text (opened a "new conversation" and got satisfactory results from GPT, yay!)
[Image: How to ask a good question to GPT]

(This command line can be used directly, but be sure to cd into the folder first!)


Appendix: Using ChatGPT to Optimize Encoding and File Format Conversion Steps

Clarify the requirement: batch convert files in the folder from GB18030 to Unicode UTF-8 format and change to .md format!
In the ChatGPT dialogue, I proposed the requirement:
[Image: ask GPT 1]

Specific issues discovered while validating the command line were fed back to GPT for modification:
[Image: ask GPT 2]
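For reference, the stated requirement (GB18030 to UTF-8, then .txt to .md) can be handled in one pass. The following is a sketch under assumptions rather than the command GPT produced: the sample files are invented, and deleting each original after a successful conversion assumes a backup already exists.

```shell
# Demo folder with one GB18030 file and one plain ASCII file (hypothetical).
demo=$(mktemp -d)
cd "$demo" || exit 1
printf '中医\n' | iconv -f UTF-8 -t GB18030 > case1.txt
printf 'plain ascii\n' > case2.txt

for f in *.txt; do
  out="${f%.txt}.md"
  if [ -e "$out" ]; then
    # Never overwrite a note that is already there.
    echo "skipped (target exists): $f" >&2
  elif iconv -f GB18030 -t UTF-8 "$f" > "$out"; then
    rm "$f"                      # assumes a backup of the originals exists
  else
    rm -f "$out"                 # drop the partial output, keep the original
    echo "failed, kept original: $f" >&2
  fi
done

ls
```

Files that fail to convert stay behind as .txt, which makes them easy to spot and inspect one by one.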


This is just one person, one among the multitude, trying something new to solve a specific need.
Getting ChatGPT to help with the work is like a traditional Chinese medicine practitioner reading the signs correctly and differentiating the syndrome accurately before writing a personalized prescription. As long as the right method is found, the goal can be achieved!
