Should we develop a new DB for LLM dev?

In the article, PCP_Liu shares a story from his work as an AI engineer: a project to develop a machine learning model for a large media company that was building an intelligent system to monitor and filter content.

The project was successful and the system was deployed without any major issues. After some time, however, the system started producing incorrect results. On further investigation, PCP_Liu found that the problem stemmed from the training data: certain kinds of content were underrepresented, so the model had never learned to handle them.
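The article does not show how this was diagnosed, but a common first check is to look at how examples are distributed across labels. A minimal sketch in Python, with a hypothetical dataset and column names that are assumptions rather than details from the article:

```python
import pandas as pd

# Hypothetical labelled content dataset; the file and column names
# are illustrative, not taken from the article.
df = pd.read_csv("content_labels.csv")

# Count how many examples exist per label to spot thin coverage.
label_counts = df["label"].value_counts()
print(label_counts)

# Flag labels that fall below a chosen threshold (here, 1% of the data).
threshold = 0.01 * len(df)
underrepresented = label_counts[label_counts < threshold]
print("Under-represented labels:\n", underrepresented)
```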

To resolve this issue, he proposed an approach to improve the system's accuracy: divide the data into two sets, one for training and one for validation. By evaluating the model on data it had never been trained on, it was possible to see where it still made mistakes and to refine it until it reliably distinguished correct from incorrect results. The plan was successful and the intelligent system's accuracy improved.
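The article does not include code, but a standard train/validation split looks roughly like the following sketch, using scikit-learn and a synthetic dataset as a stand-in for the unspecified content data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the (unspecified) content data in the article.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Hold out 20% of the data as a validation set; stratify keeps the
# label balance the same in both splits.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Accuracy on held-out data shows how the model behaves on examples
# it has never seen, which training accuracy alone would not reveal.
print(f"Validation accuracy: {model.score(X_val, y_val):.3f}")
```

The point of the held-out set is not to teach the model more nuance directly, but to expose errors before deployment so the training data and model can be refined.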

PCP_Liu ends his story by stressing the importance of understanding the data before building an AI system. He reminds readers that data is the most important factor in a machine learning project, and that its structure and size deserve careful consideration before any design decisions are made.

Read more here: External Link