Abstract
Aim: The digital revolution, driven by advanced technologies and their dynamically growing user base, is generating unprecedented amounts of freely accessible data in cyberspace. Some of these data are relevant to national security, and collecting and analysing them is far beyond human capacity. This explosion of data calls for a rethinking of how data are collected from open sources, including the exploration of ways to automate the collection and processing of information.
Methodology: The study outlines the design and operation of a semi-automated system and examines the theoretical possibilities and pitfalls of its implementation. It does not examine the legal framework for setting up and operating the system. For reasons of space, the author has concentrated on highlighting the main points, as a detailed discussion of any single topic would exceed the scope of this article. The primary objective of the study is to stimulate reflection.
Findings: In addition to reviewing the relevant Hungarian and international literature, the author, who has been investigating the dark web and the possibilities of open-source information gathering for many years, relied heavily on his own research and experience.
Value: The theoretical system analysis presented in the study showed that, at the current state of technological development and given the limitations of artificial intelligence, such systems cannot be operated without human supervision. Therefore, only a semi-automated system appears to be a feasible option. Today, data processing is still unthinkable without data engineers skilled in this field. Despite these limitations, such a system can speed up the collection of data from open sources to an extraordinary extent, making intelligence work more efficient. In practice, not all elements need to be implemented at once: operating some of the components, or integrating them into existing systems, can already significantly speed up the data collection process and improve its efficiency.