Application Fields



Most chemical structure information in the literature, including patents, is present as two-dimensional graphical representations. A chemist can interpret these images very easily, but they pose a large problem for the computer. For example, the following figure shows the same molecule drawn with two different tools. So far, the computer cannot perceive this equivalence from the pictures themselves. As a result, you cannot search for molecules in pictures or index documents by the pictures they contain. For example, try to search the Chemical Structure Lookup Service (CSLS) with an image:

 

Two depictions of Azithromycin.

 

On the other hand, once the picture is converted into a connection table, several chemoinformatics algorithms exist to solve this problem. After the conversion, a lot of information about the molecule can be computed directly or retrieved from chemical databases. So why not use the corresponding SMILES:

 

CN1C(C(C(C)(C(OC(C(C(C(C(C(CC(C1)C)(C)O)OC1C(O)C(N(C)C)CC(C)O1)C)OC1CC(OC)(C)C(O)C(C)O1)C)=O)CC)O)O)C
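To illustrate why a connection table makes the equivalence problem tractable, here is a minimal sketch in Python. It is not the method used by the tool described on this page. The tuple layout of the connection table and the helper names `normalise` and `same_molecule` are assumptions made for this example only. The check is a brute-force graph isomorphism test, so it is only practical for very small molecules; real chemoinformatics toolkits use canonical numbering or more efficient matching instead.

```python
from itertools import permutations

# Hypothetical connection-table layout for this sketch:
# (list of element symbols, set of (atom_i, atom_j, bond_order)).
# Hydrogens are left implicit.
ethanol_a = (["C", "C", "O"], {(0, 1, 1), (1, 2, 1)})       # C-C-O
ethanol_b = (["O", "C", "C"], {(0, 1, 1), (1, 2, 1)})       # O-C-C, same molecule
dimethyl_ether = (["C", "O", "C"], {(0, 1, 1), (1, 2, 1)})  # C-O-C, an isomer


def normalise(bonds):
    """Store every bond with the smaller atom index first."""
    return {(min(i, j), max(i, j), order) for i, j, order in bonds}


def same_molecule(table_a, table_b):
    """Return True if the two connection tables describe the same graph.

    Tries every atom renumbering of A, so the cost grows factorially
    with the number of atoms.
    """
    atoms_a, bonds_a = table_a
    atoms_b, bonds_b = table_b
    if sorted(atoms_a) != sorted(atoms_b) or len(bonds_a) != len(bonds_b):
        return False
    target = normalise(bonds_b)
    for perm in permutations(range(len(atoms_a))):
        # perm[i] is the atom of B that atom i of A is mapped onto.
        if any(atoms_a[i] != atoms_b[perm[i]] for i in range(len(atoms_a))):
            continue
        mapped = normalise({(perm[i], perm[j], order) for i, j, order in bonds_a})
        if mapped == target:
            return True
    return False


print(same_molecule(ethanol_a, ethanol_b))
print(same_molecule(ethanol_a, dimethyl_ether))
```

The two ethanol tables differ only in atom numbering, much as the two depictions differ only in layout, while dimethyl ether has the same atoms connected differently.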

 

So chemoCR™ is for the searching and indexing of chemical molecules in depictions.

 

In this highly interdisciplinary domain, interesting information is often presented as a combination of text and graphics. Combining textual information extraction methods with chemoCR™ for the multimodal extraction of Markush structures from patents and from QSAR tables has not yet been addressed.

The example patent page on the left shows a Markush structure. Support for this kind of extraction will be part of future work.

 

At the moment we are looking into reaction schemes.

 

 

Please contact us if you want to start a collaboration on these topics.
