How do chatbots answer questions through knowledge graphs?

In 1950, Alan Turing published the landmark paper “Computing Machinery and Intelligence” and proposed a famous criterion for machine intelligence, the Turing Test: if a third party cannot distinguish the responses of a human from those of an AI machine, the machine can be said to exhibit artificial intelligence.

In 2008, Jarvis, the AI butler in Marvel’s “Iron Man”, showed audiences how an AI could precisely help a human (Tony Stark) handle whatever tasks were thrown at it…

At the beginning of 2023, ChatGPT, a free consumer-facing chatbot that had exploded out of the technology industry, swept the world.

According to a UBS research report, its monthly active users reached 100 million in January and were still growing, making it the fastest-growing consumer application in history. In addition, its owner, OpenAI, announced a Plus subscription said to cost around $20 per month, after earlier floating a $42-per-month professional tier.

When a new product reaches hundreds of millions of monthly users, traffic pours in, and monetization begins, aren’t you curious about the technologies behind it? For example, how does a chatbot process and query such massive amounts of data?

Anyone who has tried ChatGPT comes away with the same feeling: it is clearly smarter than Tmall Genie or Xiaomi’s Xiao AI. It is a chatbot with formidable conversational skills, a natural language processing tool, a large language model, and an artificial intelligence application rolled into one. It can interact with humans based on the context of a question, can reason and create, and can even refuse questions it deems inappropriate, rather than merely mimicking human conversation.

Although opinions of it are currently mixed, from the perspective of technological development it may even pass the Turing test. Consider: when we talk with it, given its broad knowledge (at least to a layperson) and its fluent, glib answers, someone with no prior warning would find it hard to tell whether the other party is a human or a machine. Perhaps this is exactly where the danger lies: the core of ChatGPT still belongs to deep learning, which is full of black boxes and hard to interpret!

So how does a chatbot manage to quickly organize and serve up a training corpus of some 300 billion words, distilled into 175 billion parameters? And how can it follow the context and converse freely with humans?

In fact, chatbots also have brains, and just like us humans they need to learn and be trained.

They use NLP (natural language processing), object recognition, multimodal recognition, and other techniques to structure large volumes of unstructured material, such as text and images, into a knowledge graph according to its semantic structure. This knowledge graph is the chatbot’s brain.
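
As a toy illustration of that structuring step, the sketch below pulls (subject, relation, object) triples out of raw text with a hand-written pattern. Real pipelines use trained NLP models rather than regular expressions, and the sentences here are invented for illustration:

```python
import re

text = ("Sam Altman is the founder of OpenAI. "
        "OpenAI is the developer of ChatGPT.")

# Toy pattern: "X is the <relation> of Y" -> triple (Y, relation, X).
pattern = r"([A-Z]\w+(?: \w+)*) is the (founder|developer) of ([A-Z]\w+)"

triples = [(m.group(3), m.group(2), m.group(1))
           for m in re.finditer(pattern, text)]
print(triples)
# [('OpenAI', 'founder', 'Sam Altman'), ('ChatGPT', 'developer', 'OpenAI')]
```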

What does a knowledge graph consist of?

It is composed of points (entities) and edges (relationships), and it can integrate related information about people, events, and things into a comprehensive graph, as shown in the figure below.
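
Here is a minimal sketch of that structure in Python with the networkx library; the entities and relationships are illustrative, not drawn from any production knowledge graph:

```python
import networkx as nx

# Points (entities) connected by labeled edges (relationships).
kg = nx.DiGraph()
kg.add_edge("OpenAI", "Sam Altman", relation="founder")
kg.add_edge("OpenAI", "ChatGPT", relation="developed")
kg.add_edge("Sam Altman", "Y Combinator", relation="led")

# "Who is the founder of OpenAI?" reduces to an edge lookup.
answer = [v for _, v, a in kg.out_edges("OpenAI", data=True)
          if a["relation"] == "founder"]
print(answer)  # ['Sam Altman']
```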

When asked “Who is the founder of OpenAI?”, the chatbot’s brain starts rapidly searching its own knowledge base: from the entity “OpenAI”, it follows an edge to another point, the founder “Sam Altman”.

In fact, when we mention “the founder of OpenAI”, the chatbot associates the entire subgraph around this point in its knowledge base. So by the time we ask related questions, it has, in a sense, already anticipated them. For example, when we ask “Was Musk a member of the founding team of OpenAI?”, a single query has already retrieved all the members.
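
In graph terms, “associating the subgraph around a point” is just a neighborhood query. A hedged sketch (the facts loaded here are illustrative):

```python
import networkx as nx

# Toy knowledge graph around the "OpenAI" node.
kg = nx.DiGraph()
kg.add_edge("OpenAI", "Sam Altman", relation="founder")
kg.add_edge("OpenAI", "Elon Musk", relation="founding_member")
kg.add_edge("OpenAI", "Greg Brockman", relation="founding_member")

# One neighborhood query retrieves every fact around the node, so a
# follow-up like "was Musk a founding member?" is already answered.
members = [v for _, v, a in kg.out_edges("OpenAI", data=True)
           if a["relation"] == "founding_member"]
print("Elon Musk" in members)  # True
```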

In addition, if other “learning materials” have been loaded into its library, then related graphs, such as one answering “What are the products of artificial-intelligence robots?”, are also associated in its “brain”, as shown in the figure below.

Of course, chatbots are like people: the answers they can give are limited by their own store of knowledge.

What determines whether a person’s mind is quick, whether he or she is smart? From a human point of view, the simplest criterion is the ability to infer the general case from a single instance.

The Master said: “I do not enlighten those who are not eager to learn, nor prompt those who are not struggling to put their thoughts into words. If I show one corner of a subject and the student cannot infer the other three, I do not repeat the lesson.”

——The Analects of Confucius

As early as two thousand years ago, Confucius emphasized the importance of drawing inferences from a single instance and understanding by analogy. For chatbots, likewise, the quality of their answers depends on the computing power behind their knowledge graphs.

We know that for a long time the construction of general-purpose knowledge graphs focused on NLP and visual presentation, while neglecting issues such as computational timeliness, flexibility of data modeling, and the interpretability of the query (computation) process and its results. Especially now, as the world moves from the era of big data to the era of deep data, traditional graphs built on SQL or NoSQL can no longer process massive, complex, dynamic data efficiently, let alone link, mine, and analyze it for insights. So what challenges do traditional knowledge graphs face?

First, low computing power (inefficiency). Knowledge graphs built on SQL or NoSQL database systems have inefficient underlying architectures and cannot process high-dimensional data at high speed.

Second, poor flexibility. Knowledge graphs based on relational databases, document databases, or low-performance graph databases are usually limited by their underlying architecture and cannot faithfully restore the real relationships between entities. For example, some support only simple graphs: when multi-edge graph data is loaded, either information is lost or composing the graph becomes very costly.

Third, little to show for the effort. Before 2020, very few people paid real attention to underlying computing power, and almost all knowledge-graph systems were built around just two parts, NLP and visualization. A knowledge graph without underlying computing power amounts to no more than ontology and triple extraction; it cannot deliver deep queries, speed, or interpretability.

From the topic of chatbots’ intelligent knowledge graphs, we have now arrived at another cutting-edge technology field: graph databases (graph computing).

What is a graph database (graph computing)?

A graph database is an application of graph theory: it stores the attribute information of entities and the relationship information between entities. By definition, a graph is a data structure composed of nodes and edges.
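
That definition can be sketched without any library at all. Below is a toy property graph in which both vertices and edges carry attributes; the field names and values are made up for illustration, not any vendor’s storage format:

```python
# A toy property graph: vertices and edges both carry attributes.
vertices = {
    "p1": {"type": "Person", "name": "Alice"},
    "c1": {"type": "Company", "name": "Acme"},
}
edges = [
    ("p1", "c1", {"type": "WORKS_AT", "since": 2020}),
]

# Adjacency index for fast "what does p1 connect to?" lookups.
adjacency = {}
for src, dst, attrs in edges:
    adjacency.setdefault(src, []).append((dst, attrs))

print(adjacency["p1"])  # [('c1', {'type': 'WORKS_AT', 'since': 2020})]
```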

Graphs are the foundation of knowledge-graph storage and application services, with strong data-association and knowledge-expression capabilities, which is why they are highly regarded in both academia and industry.

As shown in the figure above, with the help of a real-time graph database (graph computing) engine, the industry can discover the various relationships among different data in real time, and even find the optimal, intelligent way to reach a goal. This power derives from the high dimensionality of graph databases.

What is high dimensionality? A graph is not only a tool that matches the thinking habits of the human brain and can intuitively model the real world; it can also produce deep insights (through deep graph traversal).
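
A hedged sketch of what “deep traversal” means in practice: starting from one node and walking out k hops. The chain graph below is invented:

```python
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")])

# All nodes reachable within k hops of "A" -- a k-hop neighborhood query.
k = 3
reachable = nx.single_source_shortest_path_length(g, "A", cutoff=k)
print(reachable)  # {'A': 0, 'B': 1, 'C': 2, 'D': 3}
```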

Take the well-known “butterfly effect”: capturing the subtle relationship between two or more seemingly unrelated entities within massive amounts of data and information. From a data-processing architecture perspective, this is extremely difficult to achieve without graph database (graph computing) technology.

Risk control is a typical scenario. The 2008 financial crisis was triggered by the collapse of Lehman Brothers, the fourth-largest investment bank in the United States, yet no one expected that the failure of a 158-year-old bank would set off a chain of bankruptcies across the international banking industry; the breadth of its influence took everyone by surprise. Real-time graph database (graph computing) technology can find all the key nodes, risk factors, and risk-propagation paths, providing early warning for the entire financial system.
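
As a rough illustration of the propagation idea (not any vendor’s actual implementation): if inter-bank exposures are modeled as directed edges, then the institutions a single failure can reach, and the paths the shock travels, become simple reachability queries. The institutions and exposures below are invented:

```python
import networkx as nx

# Edge u -> v means "u's failure exposes v" (illustrative data).
exposure = nx.DiGraph()
exposure.add_edges_from([
    ("Lehman", "BankA"), ("Lehman", "FundB"),
    ("BankA", "BankC"), ("FundB", "BankC"), ("BankC", "BankD"),
])

at_risk = nx.descendants(exposure, "Lehman")  # everything reachable
paths = nx.single_source_shortest_path(exposure, "Lehman")
print(at_risk)         # {'BankA', 'FundB', 'BankC', 'BankD'} (set order varies)
print(paths["BankD"])  # ['Lehman', 'BankA', 'BankC', 'BankD'], one shortest chain
```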

[Note: The graphs above were all produced in Ultipa Manager. Readers who wish to explore further can read one of the articles in this series: Walking into the high visualization of Ultipa Manager.]

It should be pointed out that although many vendors can now construct knowledge graphs, the reality is that fewer than 5 out of every 100 graph companies (under 5%) use a (high-performance) graph database as their computing-power foundation.

Ultipa (Yingtu) is currently the world’s only fourth-generation real-time graph database. Through innovative patented technologies such as high-density concurrency, dynamic pruning, and multi-level storage-compute acceleration, it achieves ultra-deep, real-time drill-down on datasets of any magnitude.

First, high computing power.

Take the search for a company’s ultimate beneficial owner (also called the actual controller or major shareholder) as an example. The challenge is that in the real world there are often many nodes (shell-company entities) between the ultimate beneficiary and the company under inspection, or several natural persons or corporate entities control other companies through multiple investment and shareholding paths. Traditional relational or document databases, and even most graph databases, cannot solve this kind of graph-penetration problem in real time.
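
To see why this is a traversal problem, note how effective ownership is usually computed: percentages multiply along each shareholding path and sum across parallel paths. A minimal recursive sketch with invented companies and stakes (it assumes the shareholding structure is acyclic):

```python
# holdings[a] lists (b, pct): a owns pct of b (illustrative data).
holdings = {
    "Alice": [("HoldCo1", 0.6), ("HoldCo2", 0.5)],
    "HoldCo1": [("Target", 0.5)],
    "HoldCo2": [("Target", 0.4)],
}

def effective_ownership(owner, target, stake=1.0):
    """Sum stake*pct over every shareholding path from owner to target."""
    total = 0.0
    for company, pct in holdings.get(owner, []):
        if company == target:
            total += stake * pct
        else:
            total += effective_ownership(company, target, stake * pct)
    return total

print(effective_ownership("Alice", "Target"))  # 0.6*0.5 + 0.5*0.4 = 0.5
```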

Ultipa’s real-time graph database system addresses these challenges. Its highly concurrent data structures and high-performance compute-and-storage engine can mine deeply at speeds 100 times or more those of other graph systems, finding the ultimate beneficiary or uncovering a huge investment network in real time (within microseconds). Microsecond latency in turn means higher concurrency and system throughput, a 1,000x performance improvement over systems that claim millisecond latency!

Take a real case: Sun Deshun, the former president of China CITIC Bank, used a web of “shadow companies” to carry out the transfer of benefits through financial means.

As shown in the picture above, Sun Deshun used the public power of China CITIC Bank to approve loans for business owners. In return, the business owners either offered high-quality investment projects and opportunities in the name of investment, or completed direct transactions through shell companies, or injected huge sums into investment-platform companies actually controlled by Sun, which then invested those funds in the projects the owners provided. Money was used to generate money, profits and dividends were shared, and a community of interests took shape.

The Ultipa (Yingtu) real-time graph database system, through white-box penetration, digs out the complex relationships between people, between people and companies, and between companies, locking onto the ultimate figures behind the scenes in real time.

Second, flexibility.

The flexibility of a graph system is a broad topic, generally covering data modeling, query and computation logic, result presentation, interface support, and scalability.

Data modeling is the foundation of every relationship graph and is closely tied to the underlying capabilities of the graph system (graph database). For example, a graph system built on a columnar database such as ClickHouse cannot carry a financial-transaction graph at all, because the defining feature of a transaction network is multiple transfers between two accounts, while ClickHouse tends to merge multiple transfers into one, an unreasonable practice that distorts the data. Other graph systems, built on the concept of simple (single-edge) graphs, tend to use vertices (entities) to represent transactions; as a result the data volume balloons (wasting storage) and the complexity of graph queries grows exponentially (hurting timeliness).
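
The modeling difference is easy to demonstrate: a simple directed graph keeps at most one edge per account pair, so repeated transfers overwrite each other, while a multigraph preserves every transfer. The account names and amounts below are invented:

```python
import networkx as nx

transfers = [("acct1", "acct2", 100), ("acct1", "acct2", 250),
             ("acct1", "acct2", 75)]

simple = nx.DiGraph()      # one edge per (u, v): later add_edge calls
                           # just update the same edge's attributes
multi = nx.MultiDiGraph()  # parallel edges: every transfer survives
for u, v, amt in transfers:
    simple.add_edge(u, v, amount=amt)
    multi.add_edge(u, v, amount=amt)

print(simple.number_of_edges())  # 1 -- two transfers were silently lost
print(multi.number_of_edges())   # 3 -- the full transaction history
```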

Interface support affects user experience. A simple example: if a production graph system accepts only CSV, then every data format must first be converted to CSV before loading, which is obviously inefficient, yet this is exactly the case in many existing graph systems.

What about the flexibility of query and computation logic? Take the “butterfly effect” again: is there some causal (strongly correlated) effect between any two people, events, or things in the graph? If it is just a simple one-step association, any traditional search engine, big-data NoSQL framework, or even a relational database can handle it. But what about a deep association, such as the relationship between Newton and Genghis Khan? How would that be computed?

Ultipa’s real-time graph data system provides more than one way to solve such problems: point-to-point deep path search, multi-point network search, template-matching search under fuzzy conditions, and graph-oriented fuzzy text path search similar to a web search engine.
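
Setting any particular product aside, the simplest of these, point-to-point path search, can be sketched as follows; the intermediate entities linking the two figures are invented placeholders, not historical claims:

```python
import networkx as nx

g = nx.Graph()
g.add_edges_from([
    ("Genghis Khan", "EntityA"), ("EntityA", "EntityB"),
    ("EntityB", "EntityC"), ("EntityC", "Newton"),
])

# Point-to-point deep path search: the shortest chain linking two nodes.
print(nx.shortest_path(g, "Genghis Khan", "Newton"))
# ['Genghis Khan', 'EntityA', 'EntityB', 'EntityC', 'Newton']
```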

Many other graph tasks also depend on high flexibility and computing power: finding points, edges, and paths under flexible filter conditions; pattern recognition and community or customer-group discovery; finding all or specific neighbors of a node (or recursively discovering deeper neighbors); finding entities or associations with similar attributes in the graph. In short, a knowledge graph without graph computing power behind it is like a hollow, soulless shell, unable to handle a variety of challenging deep searches.
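
For instance, “finding points and edges based on flexible filter conditions” can be sketched as attribute predicates over a property graph; the node names and attributes below are invented:

```python
import networkx as nx

g = nx.DiGraph()
g.add_node("n1", type="Person", risk=0.9)
g.add_node("n2", type="Person", risk=0.2)
g.add_node("n3", type="Company", risk=0.8)
g.add_edge("n1", "n3", type="OWNS")

# Flexible filter: node type plus an attribute threshold.
risky_people = [n for n, a in g.nodes(data=True)
                if a.get("type") == "Person" and a.get("risk", 0) > 0.5]
print(risky_people)  # ['n1']
```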

Third, low code: what you see is what you get.

Beyond the high computing power and flexibility discussed above, a graph system also needs white-box explainability and low-code or no-code, form-based operation, so that it can empower the business in a WYSIWYG manner.

In the Ultipa (Yingtu) real-time graph database system, a developer only needs to type a single line of Ultipa GQL to complete an operation, while business staff can run queries with zero code through preset form plug-ins. This greatly improves employee efficiency, while helping organizations cut operating costs and break down communication barriers between departments.

To sum up, the combination of knowledge graphs and graph databases will help every industry accelerate the construction of business capabilities on the data middle platform. In industries that demand professionalism, security, stability, real-time performance, and accuracy, such as finance, relational databases supporting upper-layer applications cannot deliver adequate data-processing performance, and sometimes cannot complete the task at all. Only graph database (graph computing) technology can empower organizations to plan strategically and prevail from a thousand miles away!

Writing this, I suddenly remembered the hit “The Three-Body Problem” and a very interesting idea in it: the sophon lock. Roughly, to prevent Earth’s technology from surpassing its own, the Three-Body civilization locked down humanity’s basic science and threw up obstacles of every kind. Because leaps of human civilization depend on development and major breakthroughs in basic science, blocking basic science is tantamount to blocking Earth’s path to a higher civilization. What I want to tell you is this: graph technology is one of the basic infrastructures of artificial intelligence. To be precise, graph technology = augmented intelligence + explainable AI, an inevitable product of the convergence of AI and big data.