How do different databases affect and vary the outcome of searches?
Updated: Feb 23
We live in what is often called the information age. Because we believe that information leads to power and success, and thanks to sophisticated technologies such as computers and satellites, we have been collecting tremendous amounts of information. With so much data stored in files, databases, and other repositories, it is increasingly important, if not necessary, to develop powerful means of analysing and interpreting that data and of extracting knowledge that can help in making profitable decisions.
Many new software tools have made waves in technology circles, with discussion centred on their applications to commerce, retail, consumer data, finance and other areas. A less-mentioned segment where database solutions are highly relevant is intellectual property and patent data management. Large volumes of data can improve the quality of business decisions by enabling text and data mining and delivering actionable insights, which is invaluable in Intellectual Property research.
Intellectual property (IP) databases let you see what your competition is up to, whether it be branding via trade mark application databases or technology via patent databases. Online searchable databases include individual patent office websites (such as uspto.gov or ipindiaservices.gov.in) or websites (such as worldwide.espacenet.com or patentscope.wipo.int/search/) which combine different country IP databases. IP databases are updated frequently and are a source of high quality data. For example, many patent offices will publish full details of patent specifications within 18 months from filing. Trade mark applications are published in many countries the same day as filing or within 6 months from filing. Consequently, IP databases can be an extremely accurate source of information on competitors’ activities.
Many Intellectual Property databases are now on the market. Each uses a different underlying algorithm and has its own distinctive features, so the same search strategy may produce different results on different databases. Some of the basic parameters to keep in mind when choosing a suitable database are discussed below:
Full Text Coverage: A patent search requires extensive country coverage, so a database that offers full text from the largest number of countries has the upper hand. The ability to run a non-patent literature search at the same time is also beneficial.
Complex Query Handling: When running complex search queries, especially in domains related to Life Sciences and Chemistry, it is crucial that nothing goes missing. A database that can handle long, complex queries with multiple levels of proximity operators stands out from the rest, since the basic idea behind these databases is to ensure exhaustiveness. A single document that would otherwise be missed may save millions of dollars. Thomson Innovation (now Derwent Innovation) takes a clear lead here because it can usually run narrow nested queries quickly and without errors.
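To make the idea of a proximity operator concrete, here is a minimal Python sketch of a "within N words" check. It is only an illustration: the function names `tokenize` and `near`, the sample abstract, and the choice to measure distance in word positions are all assumptions, and it does not reproduce the query syntax or behaviour of any particular database.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def near(text, term_a, term_b, max_gap):
    """Return True if term_a and term_b occur at most max_gap word
    positions apart, in either order (an assumed definition of proximity)."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)

# Hypothetical abstract text, used only for illustration.
abstract = ("A pharmaceutical composition comprising a kinase inhibitor "
            "and a polymer carrier")

print(near(abstract, "kinase", "carrier", 5))       # terms are 5 positions apart
print(near(abstract, "composition", "carrier", 3))  # terms are 8 positions apart
```

A nested query combines several such checks with AND, OR and NOT, which is why long queries become expensive for a database to evaluate.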
Data Export and Sanity: Many searchers prefer to export data and carry out their analysis outside the database. In such cases, one needs a database that exports results quickly and in a user-friendly format. Many databases take your export request and then email you a link to download the data, which can be really frustrating if you are searching in real time. Almost all paid databases these days offer both options: a download link within the database itself and a download link sent by email.
Processing Time: A searcher runs at least 50 search strings and analyses at least 500 documents on a normal day. Even a one-second delay in displaying each set of results or each document adds up to hours of lost productive time over a month. A database whose servers are either very powerful or located in your geographical region is a must for frequent searchers who conduct analysis on a daily basis. Google’s free patent search engine and Thomson Innovation are comparatively faster than other databases.
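The cost of a small delay can be estimated with simple arithmetic. The sketch below uses the figures above (50 searches and 500 documents a day) and assumes a one-second delay per page load and 250 working days a year; the working-day figure and the function name `yearly_delay_hours` are assumptions made for illustration.

```python
def yearly_delay_hours(searches_per_day, documents_per_day,
                       delay_seconds, working_days_per_year):
    """Estimate hours lost per year if every search and every document
    view is slowed down by delay_seconds."""
    page_loads_per_day = searches_per_day + documents_per_day
    seconds_per_year = page_loads_per_day * delay_seconds * working_days_per_year
    return seconds_per_year / 3600

# 50 searches and 500 documents per day come from the text;
# 250 working days per year is an assumption.
hours_lost = yearly_delay_hours(50, 500, 1.0, 250)
print(round(hours_lost, 1))
```

Under these assumptions the delay costs roughly nine minutes a day, or close to a full working week over a year.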
User Interface: For a searcher who spends around 8 hours a day at a monitor conducting searches, it is very important that the database's user interface is neat and well organized. A cluttered interface can make the searcher lose focus while scanning the result documents and can reduce efficiency. Questel Orbit and PatBase provide very neat, single-page access to all the important information.
Predictive Analytics: Data analysis carries little weight unless it can inform business decisions by projecting future trends. Some intellectual property databases can project how trends are likely to move on the basis of current and past data, and predictive analysis software built on IP data is likely to see more demand. For example, if a firm planning to invest in autonomous vehicles has historical patent analytics showing the trend map for GPS, radar, artificial intelligence and so on, predictive analysis can extrapolate those trends to estimate future activity in these areas.
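One very simple way to extrapolate a trend is to fit a straight line to yearly filing counts and extend it forward. The sketch below does this with ordinary least squares. The filing counts are made up for illustration, the function name `fit_line` is hypothetical, and commercial tools are likely to use more sophisticated models than a straight line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up yearly filing counts for one technology area (illustration only).
years = [2015, 2016, 2017, 2018, 2019]
filings = [100, 120, 140, 160, 180]

slope, intercept = fit_line(years, filings)
forecast_2021 = slope * 2021 + intercept
print(round(slope, 1), round(forecast_2021))
```

A straight line is only reasonable over short horizons; real filing activity can plateau or drop, so any such forecast should be read as a rough indication.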
Semantic Search: Semantic search is a data searching technique in which a search query aims not only to find keywords, but also to determine the intent and contextual meaning of the words a person is using. It discovers the concepts behind the words in order to find relevant matching documents. Google Search has done this well from the beginning and is constantly improving on this front. Innography's semantic search uses advanced numerical algorithms to examine word frequency, sequencing and patterns (independent of the specific words used) to detect the author's meaning and intent. Since authors use words consistently throughout a patent filing, this is a highly effective method for finding similar documents that don’t use the same keywords.
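As a rough illustration of comparing documents by word frequency, the sketch below scores candidates against a piece of text using bag-of-words cosine similarity. This is a much simpler technique than a real semantic engine: it still depends on shared words and ignores sequencing, and it is not Innography's or Google's algorithm. The function names and sample texts are assumptions.

```python
import math
import re
from collections import Counter

def term_counts(text):
    """Count how often each lowercase word occurs in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(counts_a, counts_b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a if t in counts_b)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical texts, used only for illustration.
query = term_counts("lidar sensor for vehicle obstacle detection")
doc_a = term_counts("vehicle obstacle detection using a lidar sensor array")
doc_b = term_counts("method for brewing coffee with a paper filter")

sim_a = cosine_similarity(query, doc_a)
sim_b = cosine_similarity(query, doc_b)
print(sim_a > sim_b)  # the lidar document ranks above the coffee document
```

Because the whole text is compared rather than a handful of chosen keywords, a long source document can surface related filings that a short keyword query would miss.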
Translated Text: With markets rising in the Asia Pacific countries, patent filing activity in the region has increased drastically in recent years. Patents filed in these countries are equally important for avoiding infringement and for preventing the revocation of patents. The most common problem with patents filed in this region is that the script is mostly logographic and entirely different from those of the Western world. Most databases provide machine translations of documents, using technology such as Google Translate, sometimes alongside in-house human translators. Thomson Innovation and PatBase seem to provide good translations in comparison to other databases.
Derwent World Patents Index: The Derwent World Patents Index (DWPI) is a database containing patent applications and grants from 44 of the world’s patent issuing authorities. Thomson Derwent provides access to more than 26 million patent documents issued by patenting authorities including the European Patent Office, France, Germany, Japan, the United Kingdom, the United States, and the WIPO. One of the biggest benefits of the DWPI is its English language abstracts, compiled by editorial staff, which greatly improve the searcher’s access to non-English language patent information. Each abstract details the nature and use of the invention described in a patent and is indexed into alphanumeric technology categories so that users can retrieve relevant patent documents. The DWPI is an excellent tool for searching global art because each indexed document is given its own new abstract of 200 to 500 words, written in English by a subject matter expert. Databases like Thomson Innovation (now Derwent Innovation), Delphion and STN are some of the best providers of DWPI data.
Chemical & Sequence Structure Search: A chemical structure search looks for chemical substances that are identical or highly similar to compounds of interest. Search projects involving substances are rarely complete without chemical structure searching, especially given the rapid growth in generic and exemplified chemical structures disclosed in patents. A sequence search is sometimes referred to as a nucleic acid or amino acid search. Its purpose is to discover extended amino acid (peptide and protein) or nucleic acid (gene) sequences that either exactly match or are similar to a sequence of interest. Chemical and sequence structure searches are an integral part of searches such as invalidation and FTO searches. Databases like STN and PatBase allow you to conduct such searches. They are most useful for staying on top of the latest pharmacological findings or for retrieving authoritative chemical and regulatory information on the substances you work with.
For those who work in R&D, patent information databases can serve as an inspiration and as a tool for solving problems in their daily work. They also provide information on competitors' activities. One of their main advantages is that they are very often the only source of information about certain technical solutions: because novelty is a precondition for protecting an invention, the patent publication is the first, and sometimes the only, publication of that solution.
Searching in different engines can be an iterative process: a keyword search can lead to patents that you can run a citation search with, which in turn can suggest new keywords for a traditional search. In this way, a second patent search engine can provide a second opinion on the results from the first database you used. One should use as many databases as possible, since they all have their strengths and weaknesses and the results of patent searches can have large commercial consequences. Basically, use whatever works: patent searching can be a very non-linear process, and the best and most confident searchers are happy to try a variety of new approaches.
In summary, the choice of search database does affect the outcome of a search, so one should choose a suitable database wisely.
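When the same strategy is run on two databases, a quick way to get that second opinion is to compare the two result lists and see which documents only one database returned. The sketch below does this with Python sets. The publication numbers are placeholders, the function name `compare_results` is hypothetical, and the numbers are assumed to have been normalized to a common format first, since databases often format them differently.

```python
def compare_results(results_a, results_b):
    """Split two lists of publication numbers into those found by both
    databases and those found by only one of them."""
    set_a, set_b = set(results_a), set(results_b)
    return {
        "both": sorted(set_a & set_b),
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
    }

# Placeholder publication numbers, not real documents.
db_a = ["US0000001A", "EP0000002A", "JP0000003A"]
db_b = ["EP0000002A", "JP0000003A", "CN0000004A"]

overlap = compare_results(db_a, db_b)
print(overlap["only_a"])  # documents the second database missed
print(overlap["only_b"])  # documents the first database missed
```

The documents that appear in only one list are the ones worth reviewing first, since they may suggest keywords or classifications that the other search did not cover.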