example of fingerprint formats

by Andrew Dalke

Hi all,

I'm looking for examples of existing fingerprint file formats.
Can anyone here provide pointers to documentation (preferably)
or to code?

I know of:
   - an in-house format used by a client
   - the text oriented one from Mesa Analytics
        http://www.mesaac.com/Fingerprint.htm
   - two in OpenBabel

as well as a few ways to embed fingerprint data inside another data  
stream:
   - Daylight TDT (through their encoding)
   - PubChem encoding (using base64)
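As a generic illustration of embedding a fingerprint in a text stream, here is a minimal Java sketch that base64-encodes a java.util.BitSet. It is a stand-in, not PubChem's actual encoding, whose byte layout is not described in this thread. The class and method names are hypothetical.

```java
import java.util.Base64;
import java.util.BitSet;

/**
 * Sketch only: base64-embed a fingerprint and decode it again.
 * This is NOT PubChem's byte layout; it is a generic illustration.
 */
public class Base64FingerprintDemo {

    /** Encode the set bits as base64 text. */
    static String encode(BitSet fp) {
        // toByteArray() is little-endian: bit i is stored in byte i/8 at position i%8.
        return Base64.getEncoder().encodeToString(fp.toByteArray());
    }

    /** Decode base64 text produced by encode() back into a BitSet. */
    static BitSet decode(String text) {
        return BitSet.valueOf(Base64.getDecoder().decode(text));
    }

    public static void main(String[] args) {
        BitSet fp = new BitSet();
        fp.set(2);
        fp.set(4);
        fp.set(10);
        String text = encode(fp);
        System.out.println(text);                    // FAQ=
        System.out.println(decode(text).equals(fp)); // true
    }
}
```

BitSet.toByteArray() drops trailing zero bytes, so this encoding does not record the fingerprint length; a real format would have to store that separately.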

Searching the CDK code, I think the relevant code is
cdk/applications/FingerPrinter.java, but I'm not sure about that.
For example, that program has no way to specify which fingerprinter
to use.

For each input record, that program generates lines of the form:

Hit molecule's remark: REMARK
Fingerprint= {2, 4, 10}

where
  REMARK comes from the record
  the values 2, 4, 10 come from the 'on' bits in the fingerprint
  the syntax "{2, 4, 10}" is defined by Java's BitSet toString method.
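A minimal Java sketch of that line format: it produces "Fingerprint= {2, 4, 10}" with BitSet.toString() and parses it back. The parser is a hypothetical helper written for illustration, not code from CDK.

```java
import java.util.BitSet;

/**
 * Sketch: format and re-parse the "Fingerprint= {2, 4, 10}" line.
 * parseLine() is a hypothetical helper, not part of CDK.
 */
public class BitSetLineDemo {

    /** Build the fingerprint line the way the output above looks. */
    static String formatLine(BitSet fp) {
        // BitSet.toString() lists set-bit indices in ascending order as "{i, j, k}".
        return "Fingerprint= " + fp.toString();
    }

    /** Parse a "Fingerprint= {...}" line back into a BitSet. */
    static BitSet parseLine(String line) {
        int open = line.indexOf('{');
        int close = line.lastIndexOf('}');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("no {...} in: " + line);
        }
        BitSet fp = new BitSet();
        String body = line.substring(open + 1, close).trim();
        if (body.isEmpty()) {
            return fp; // "{}" means no bits are set
        }
        for (String tok : body.split(",")) {
            fp.set(Integer.parseInt(tok.trim()));
        }
        return fp;
    }

    public static void main(String[] args) {
        BitSet fp = new BitSet();
        fp.set(2);
        fp.set(4);
        fp.set(10);
        String line = formatLine(fp);
        System.out.println(line);                       // Fingerprint= {2, 4, 10}
        System.out.println(parseLine(line).equals(fp)); // true
    }
}
```

Note that this syntax records only which bits are on, not the length of the fingerprint.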

Am I missing something?

                                Andrew
                                dalke@...



-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Cdk-devel mailing list
Cdk-devel@...
https://lists.sourceforge.net/lists/listinfo/cdk-devel

Re: example of fingerprint formats

by Rajarshi Guha

On Jul 19, 2008, at 11:29 AM, Andrew Dalke wrote:

>
> I know of:
>    - an in-house format used by a client
>    - the text oriented one from Mesa Analytics
>         http://www.mesaac.com/Fingerprint.htm
>    - two in OpenBabel
>
> as well as a few ways to embed fingerprint data inside another data
> stream:
>    - Daylight TDT (through their encoding)
>    - PubChem encoding (using base64)

There are also the BCI plain text format (-bci) and the MOE plain
text format.

> Searching the CDK code, I think the relevant code is
> cdk/applications/FingerPrinter.java, but I'm not sure about that.
> For example, that program has no way to specify which fingerprinter
> to use.

That's correct; it hasn't been updated in quite a while.

> For each input record, that program generates lines of the form:
>
> Hit molecule's remark: REMARK
> Fingerprint= {2, 4, 10}
>
> where
>   REMARK comes from the record
>   the values 2, 4, 10 come from the 'on' bits in the fingerprint
>   the syntax "{2, 4, 10}" is defined by Java's BitSet toString method.
>
> Am I missing something?

No.

-------------------------------------------------------------------
Rajarshi Guha  <rguha@...>
GPG Fingerprint: D070 5427 CC5B 7938 929C  DD13 66A1 922C 51E7 9E84
-------------------------------------------------------------------
Finally I am becoming stupider no more
   - Paul Erdos' epitaph

Re: example of fingerprint formats

by Andrew Dalke

> There are also the BCI plain text format (-bci) and the MOE
> plain text format.

Do you happen to have a pointer to documentation?
And/or examples?

                                Andrew
                                dalke@...

Re: example of fingerprint formats

by Rajarshi Guha

On Jul 19, 2008, at 3:26 PM, Andrew Dalke wrote:
>> There are also the BCI plain text format (-bci) and the MOE
>> plain text format.
>
> Do you happen to have a pointer to documentation?

Unfortunately, I can't find the docs for the BCI or MOE formats.

> And/or examples?

However, here is an example of the BCI 1052 fingerprint format for a
one-molecule SMILES file:

MAKEBITS 7.0.3 1052 /usr/local/bci/support_data/bci1052.dic x.smi
CC1CCC2C(C)C(=O)OC3OC4(C)CCC1C23OO4 1 2 4 5 6 7 14 15 17 35 83 144 145 149 300 301 302 360 363 364 373 382 405 413 455 456 467 484 505 509 558 683 688 689 708 712 827 840 877 879 898 908 909 965 1029 1030 0 46


The first line is always a header containing the program name and
version (MAKEBITS 7.0.3), the fingerprint size, the dictionary file name
and the input SMILES file name. Each following line holds the
fingerprint for one molecule.
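A minimal Java sketch that splits the header line into its fields. The field order (program, version, fingerprint size, dictionary file, input file) is inferred from the single example above and is an assumption; the class and field names are hypothetical.

```java
/**
 * Sketch: split a BCI MAKEBITS header line into fields.
 * ASSUMPTION: field order is inferred from one example only:
 * program, version, fingerprint size, dictionary file, input file.
 */
public class BciHeaderDemo {

    /** Hypothetical holder for the header fields. */
    static final class Header {
        final String program;
        final String version;
        final int size;
        final String dictionary;
        final String input;

        Header(String program, String version, int size, String dictionary, String input) {
            this.program = program;
            this.version = version;
            this.size = size;
            this.dictionary = dictionary;
            this.input = input;
        }
    }

    static Header parseHeader(String line) {
        // Whitespace-separated; file paths containing spaces are not handled.
        String[] f = line.trim().split("\\s+");
        if (f.length != 5) {
            throw new IllegalArgumentException("expected 5 header fields, got " + f.length);
        }
        return new Header(f[0], f[1], Integer.parseInt(f[2]), f[3], f[4]);
    }

    public static void main(String[] args) {
        Header h = parseHeader(
            "MAKEBITS 7.0.3 1052 /usr/local/bci/support_data/bci1052.dic x.smi");
        System.out.println(h.size);       // 1052
        System.out.println(h.dictionary); // /usr/local/bci/support_data/bci1052.dic
    }
}
```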

In each fingerprint line:

Field 1: the name of the molecule, or the SMILES itself if no name is given
Fields 2 to N: the positions of the bits that are set to 1
Field N+1: always 0
Field N+2: the number of bits set to 1 in the fingerprint
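A minimal Java sketch of a reader for one such line, following the field layout just described and checking the trailing 0 and the bit count. It assumes whitespace-separated fields and a molecule name without spaces; the class names are hypothetical.

```java
import java.util.BitSet;

/**
 * Sketch: parse one BCI fingerprint line laid out as
 * name, set-bit positions..., 0, count-of-set-bits.
 * ASSUMPTION: fields are whitespace-separated and the name has no spaces.
 */
public class BciLineDemo {

    /** Hypothetical holder for one parsed line. */
    static final class BciRecord {
        final String name;
        final BitSet bits;

        BciRecord(String name, BitSet bits) {
            this.name = name;
            this.bits = bits;
        }
    }

    static BciRecord parseLine(String line) {
        String[] f = line.trim().split("\\s+");
        int n = f.length;
        if (n < 3) {
            throw new IllegalArgumentException("too few fields: " + line);
        }
        // The second-to-last field is documented as always 0.
        if (!f[n - 2].equals("0")) {
            throw new IllegalArgumentException("expected 0 before the bit count");
        }
        int count = Integer.parseInt(f[n - 1]);
        BitSet bits = new BitSet();
        for (int i = 1; i < n - 2; i++) {
            bits.set(Integer.parseInt(f[i]));
        }
        // Cross-check against the stored number of set bits.
        if (bits.cardinality() != count) {
            throw new IllegalArgumentException(
                "count field says " + count + " but found " + bits.cardinality());
        }
        return new BciRecord(f[0], bits);
    }

    public static void main(String[] args) {
        String line = "CC1CCC2C(C)C(=O)OC3OC4(C)CCC1C23OO4 1 2 4 5 6 7 14 15 17 35 83 144"
            + " 145 149 300 301 302 360 363 364 373 382 405 413 455 456 467 484 505"
            + " 509 558 683 688 689 708 712 827 840 877 879 898 908 909 965 1029 1030"
            + " 0 46";
        BciRecord r = parseLine(line);
        System.out.println(r.name);               // CC1CCC2C(C)C(=O)OC3OC4(C)CCC1C23OO4
        System.out.println(r.bits.cardinality()); // 46
    }
}
```

Reading the 0 and the count from the end of the line means the parser does not need to know how many bit positions precede them.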


For the MOE MACCS keys, an example is:

"FP:MACCS"
"46 106 107 112 134"
"46 106 107 134"
"46 103 106 107 134"
"46 74 107 134 149 160"

So, after the initial "FP:MACCS" line, there is one line per input
molecule, and each line lists the bit positions that are set to 1.
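A minimal Java sketch of reading the MOE MACCS text shown above into one BitSet per molecule. It assumes the first line is the "FP:MACCS" header and that every later line is a double-quoted, space-separated list of bit positions; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/**
 * Sketch: read MOE MACCS text into one BitSet per molecule.
 * ASSUMPTION: line 0 is the "FP:MACCS" header; each later line is a
 * double-quoted, space-separated list of set bit positions.
 */
public class MoeMaccsDemo {

    static List<BitSet> parse(List<String> lines) {
        List<BitSet> result = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) { // skip the header line
            String s = lines.get(i).trim();
            if (s.isEmpty()) {
                continue;
            }
            // Strip the surrounding double quotes.
            if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
                s = s.substring(1, s.length() - 1).trim();
            }
            BitSet bits = new BitSet();
            if (!s.isEmpty()) {
                for (String tok : s.split("\\s+")) {
                    bits.set(Integer.parseInt(tok));
                }
            }
            result.add(bits);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "\"FP:MACCS\"",
            "\"46 106 107 112 134\"",
            "\"46 106 107 134\"",
            "\"46 103 106 107 134\"",
            "\"46 74 107 134 149 160\"");
        List<BitSet> fps = parse(lines);
        System.out.println(fps.size()); // 4
        System.out.println(fps.get(3)); // {46, 74, 107, 134, 149, 160}
    }
}
```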


-------------------------------------------------------------------
Rajarshi Guha  <rguha@...>
GPG Fingerprint: D070 5427 CC5B 7938 929C  DD13 66A1 922C 51E7 9E84
-------------------------------------------------------------------
C Code.
C Code Run.
Run, Code, RUN!
        PLEASE!!!!