The Similarity Measurement of Human DNA Profile Using Fuzzy Similarity

This research investigated the similarity of human DNA profiles using a fuzzy similarity measure. Similarity was measured between the query DNA profile and the DNA profiles of biological family members such as the father, mother, brother, sister, grandmother and grandfather. The measurement was applied to the short tandem repeat (STR) alleles at sixteen loci. In the experiments, each simulation gave a matching result. This research is useful for the Indonesian National Police (POLRI) in identifying disaster victims, terrorism victims and victims of other criminal acts.
This is an open access article distributed under the Creative Commons 4.0 Attribution License (CC-BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. ©2017 by author.
Corresponding Author: Meira Parma Dewi, Department of Mathematics, Faculty of Mathematics and Natural Science (FMIPA), Universitas Negeri Padang, Padang, Indonesia. Email: meiraparma@fmipa.unp.ac.id

To establish the identity of a victim, DNA profiles from the victim's biological family are needed for comparison. The similarity of two human DNA profiles can be measured by comparing the STR values at each locus. In a conventional comparison the STR values of two alleles must be exactly the same; because an STR value can shift, two values that are not exactly equal are treated as a mismatch. Fuzzy similarity measurement is used so that the similarity of two DNA profile alleles can still be measured when such a shift has occurred.

Experimental Section
Fuzzy Inference System
The similarity of DNA profiles is measured by comparing each allele at the 16 loci of the victim's DNA profile with the reference DNA profile of the victim's alleged biological family. The family members used as references are the father, the mother, and the paternal and maternal grandfathers and grandmothers. The grandparents are used as references if one or both parents are not available.
The rules for matching the victim's DNA profile with the DNA profiles of the father and mother are as follows. If the father's DNA profile is not available, the similarity measurement uses the DNA profiles of the biological father and mother of the victim's father in place of the victim's father. The process of measuring the similarity of a locus does not change: one allele of the locus must match or be similar to the reference from the father's side (ref_A), and the other must match or be similar to the reference from the mother's side (Reff_B).
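The substitution rule above can be sketched in a few lines of Python. This is only an illustration: the function name `select_references` and the string labels for family members are hypothetical, and the handling of a missing mother is an assumption made by symmetry with the missing-father rule.

```python
def select_references(available):
    """Pick the reference profiles for each side of the family.

    `available` is a set of labels for relatives whose DNA profiles exist.
    The labels are hypothetical names chosen for this sketch.
    """
    # Father's side (ref_A): the father if present, otherwise his parents.
    if "father" in available:
        ref_a = ["father"]
    else:
        ref_a = [r for r in ("paternal_grandfather", "paternal_grandmother")
                 if r in available]

    # Mother's side (Reff_B in the text): assumed to follow the same rule.
    if "mother" in available:
        ref_b = ["mother"]
    else:
        ref_b = [r for r in ("maternal_grandfather", "maternal_grandmother")
                 if r in available]

    return ref_a, ref_b
```

For example, when only the mother and both paternal grandparents have been profiled, the father's side is represented by the two grandparents and the mother's side by the mother herself.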

Membership Functions of Input Variables and Output Variables
Each input variable has three membership functions: small, medium and large (written as "big" in the fuzzy rules).
The membership functions are shown in Figure 1.

Figure 1. Input Variable Membership Function
The output variable has three membership functions: low with value 0, medium with value 0.5 and high with value 1.
The fuzzy rules used are as follows:
If (father is small) and (mother is small) then (similarity is low)
If (father is small) and (mother is medium) then (similarity is low)
If (father is small) and (mother is big) then (similarity is medium)
If (father is medium) and (mother is small) then (similarity is low)
If (father is medium) and (mother is medium) then (similarity is medium)
If (father is medium) and (mother is big) then (similarity is high)
If (father is big) and (mother is small) then (similarity is medium)
If (father is big) and (mother is medium) then (similarity is high)
If (father is big) and (mother is big) then (similarity is high)
If (mother is small) and (grandfather is not big) and (grandmother is small) then (similarity is low)
If (mother is small) and (grandfather is big) and (grandmother is small) then (similarity is medium)
If (mother is small) and (grandfather is small) and (grandmother is big) then (similarity is medium)
If (mother is small) and (grandfather is small) and (grandmother is not big) then (similarity is low)
If (mother is medium) and (grandfather is not big) and (grandmother is small) then (similarity is low)
If (mother is medium) and (grandfather is big) and (grandmother is small) then (similarity is high)
If (mother is medium) and (grandfather is small) and (grandmother is big) then (similarity is high)
If (mother is medium) and (grandfather is small) and (grandmother is not big) then (similarity is medium)
If (mother is big) and (grandfather is not big) and (grandmother is small) then (similarity is medium)
If (mother is big) and (grandfather is big) and (grandmother is small) then (similarity is high)
If (mother is big) and (grandfather is small) and (grandmother is big) then (similarity is high)
If (mother is big) and (grandfather is small) and (grandmother is not big) then (similarity is medium)
If (father is small) and (grandfather of mother is not big) and (grandmother of mother is small) then (similarity is low)
If (father is small) and (grandfather of mother is big) and (grandmother of mother is small) then (similarity is medium)
If (father is small) and (grandfather of mother is small) and (grandmother of mother is big) then (similarity is medium)
If (father is small) and (grandfather of mother is small) and (grandmother of mother is not big) then (similarity is low)
If (father is medium) and (grandfather of mother is not big) and (grandmother of mother is small) then (similarity is low)
If (father is medium) and (grandfather of mother is big) and (grandmother of mother is small) then (similarity is high)
If (father is medium) and (grandfather of mother is small) and (grandmother of mother is big) then (similarity is high)
If (father is medium) and (grandfather of mother is small) and (grandmother of mother is not big) then (similarity is medium)
If (father is big) and (grandfather of mother is not big) and (grandmother of mother is small) then (similarity is medium)
If (father is big) and (grandfather of mother is big) and (grandmother of mother is small) then (similarity is high)
If (father is big) and (grandfather of mother is small) and (grandmother of mother is big) then (similarity is high)
If (father is big) and (grandfather of mother is small) and (grandmother of mother is not big) then (similarity is medium)
ISSN : 1411 3724 Eksakta : Berkala Ilmiah Bidang MIPA

Meira Parma Dewi, Nurtami Soedarsono
If (grandfather is not big) and (grandmother is not big) and (grandfather of mother is not big) and (grandmother of mother is not big) then (similarity is low)
If (grandfather is big) and (grandmother is not big) and (grandfather of mother is not big) and (grandmother of mother is not big) then (similarity is medium)
If (grandfather is not big) and (grandmother is big) and (grandfather of mother is not big) and (grandmother of mother is not big) then (similarity is medium)
If (grandfather is not big) and (grandmother is not big) and (grandfather of mother is big) and (grandmother of mother is not big) then (similarity is medium)
If (grandfather is not big) and (grandmother is not big) and (grandfather of mother is not big) and (grandmother of mother is big) then (similarity is medium)
If (grandfather is big) and (grandmother is not big) and (grandfather of mother is big) and (grandmother of mother is not big) then (similarity is high)
If (grandfather is big) and (grandmother is not big) and (grandfather of mother is not big) and (grandmother of mother is big) then (similarity is high)
If (grandfather is not big) and (grandmother is big) and (grandfather of mother is big) and (grandmother of mother is not big) then (similarity is high)
If (grandfather is not big) and (grandmother is big) and (grandfather of mother is not big) and (grandmother of mother is big) then (similarity is high)
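To make the rule base more concrete, the nine father/mother rules can be evaluated with a small Python sketch. Several things here are assumptions rather than details from the paper: the shapes and breakpoints of the `small`, `medium` and `big` membership functions (the original Figure 1 is not reproduced in this text), the use of `min` for "and", and the weighted-average defuzzification over the output values 0, 0.5 and 1. Inputs are assumed to lie in the range 0 to 1.

```python
# Hypothetical input membership functions on [0, 1].
def small(x):
    return max(0.0, 1.0 - x / 0.5)

def medium(x):
    return max(0.0, 1.0 - abs(x - 0.5) / 0.5)

def big(x):
    return max(0.0, (x - 0.5) / 0.5)

LEVELS = {"small": small, "medium": medium, "big": big}

# Output values as stated in the text: low = 0, medium = 0.5, high = 1.
OUTPUT = {"low": 0.0, "medium": 0.5, "high": 1.0}

# The nine (father, mother) -> similarity rules from the rule list.
RULES = [
    ("small", "small", "low"),
    ("small", "medium", "low"),
    ("small", "big", "medium"),
    ("medium", "small", "low"),
    ("medium", "medium", "medium"),
    ("medium", "big", "high"),
    ("big", "small", "medium"),
    ("big", "medium", "high"),
    ("big", "big", "high"),
]

def infer(father, mother):
    """Combine the two inputs into one similarity value.

    Rule strength is min(father level, mother level); the result is the
    strength-weighted average of the output values (an assumed method).
    """
    numerator = 0.0
    denominator = 0.0
    for father_level, mother_level, outcome in RULES:
        strength = min(LEVELS[father_level](father), LEVELS[mother_level](mother))
        numerator += strength * OUTPUT[outcome]
        denominator += strength
    return numerator / denominator if denominator > 0 else 0.0
```

With these assumed shapes, two fully matching inputs fire only the last rule, while one fully matching and one fully mismatching input fire a single "medium" rule.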

Measurement of Allele Similarity
The fuzzy similarity of DNA profiles is measured by first measuring the similarity of individual alleles. Each allele is assumed to be a triangle whose middle value is the allele's short tandem repeat (STR) value; the distance between the two feet is 0.4 and the height of the triangle is 1.
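The triangular representation can be illustrated with a short Python sketch. The paper does not spell out the similarity formula, so the measure used here is an assumption: the height of the intersection of the two triangles, which reproduces the value of 0.5 that the Conclusions report for a shift of 0.2. The names `triangle`, `allele_similarity` and `allele_similarity_numeric` are hypothetical.

```python
def triangle(x, center, half_width=0.2):
    """Membership of x in an allele triangle: base width 0.4, height 1."""
    return max(0.0, 1.0 - abs(x - center) / half_width)

def allele_similarity(a, b, half_width=0.2):
    """Height at which two equal triangles centred at a and b intersect.

    The triangles cross at the midpoint of a and b, which gives the
    closed form 1 - |a - b| / (2 * half_width), clipped at 0.
    """
    return max(0.0, 1.0 - abs(a - b) / (2.0 * half_width))

def allele_similarity_numeric(a, b, half_width=0.2, steps=4000):
    """Same quantity by brute force: the largest min() of both triangles."""
    low = min(a, b) - half_width
    high = max(a, b) + half_width
    best = 0.0
    for i in range(steps + 1):
        x = low + (high - low) * i / steps
        best = max(best, min(triangle(x, a, half_width),
                             triangle(x, b, half_width)))
    return best
```

Under this assumption, identical STR values give a similarity of 1, and values that differ by 0.4 or more give 0 because the triangles no longer overlap.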

Results and Discussion
The input to the system is a complete DNA profile consisting of 16 loci, each with two alleles. The DNA profile data are entered into the system manually. The data obtained from the identification of biological evidence (DNA evidence) by a PCR machine, in the form of an electropherogram, still contain noise. This noise is not taken into account when determining a person's DNA profile, so for each locus on the electropherogram only the two highest signals are read. These signals indicate the alleles of the locus. If there are two high signals at a locus, both are alleles; if there is only one signal that is significantly higher than the surrounding noise, the first and second alleles of that locus have the same value.
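The reading rule described above can be sketched as follows. In the paper the data are entered manually, so this is purely illustrative: the function name `call_alleles`, the `(STR value, signal height)` input format and the `min_ratio` cut-off used to decide whether the second signal is "significantly high" are all assumptions.

```python
def call_alleles(peaks, min_ratio=0.5):
    """Return the two alleles of one locus from its electropherogram peaks.

    `peaks` is a list of (str_value, height) pairs. The second-highest
    peak counts as an allele only if its height is at least `min_ratio`
    times the highest peak; this cut-off is a hypothetical choice.
    """
    if not peaks:
        raise ValueError("no signals at this locus")
    ranked = sorted(peaks, key=lambda p: p[1], reverse=True)
    top_value, top_height = ranked[0]
    if len(ranked) > 1 and ranked[1][1] >= min_ratio * top_height:
        # Two high signals: both are alleles.
        first, second = sorted((top_value, ranked[1][0]))
        return (first, second)
    # Only one significant signal: both alleles take the same value.
    return (top_value, top_value)
```

A locus with signals at 12 and 14 of similar height would be read as alleles (12, 14), while a single dominant signal at 9 would be read as (9, 9).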

Figure 3. Electropherogram Signal Indicating an Allele from the Locus
Each allele is represented as an isosceles triangle in which the distance between the two feet is 0.4, the height is 1, and the midpoint between the two feet is the STR value of the allele. For each locus, the fuzzy similarity of each allele is measured against the alleles of the reference at the same locus. The similarity value of the profile is obtained by adding up the similarity values of all alleles across the loci and dividing by 32 (16 loci with two alleles each).
The similarity of DNA profiles is thus measured by assigning a similarity value to each allele, which in turn produces a similarity value for each locus. The average of the similarity values of all loci is the similarity value of the DNA profile. Two DNA profiles are said to match if the similarity value is greater than 0.5.
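The averaging step can be sketched in Python as below. This is a simplified illustration under stated assumptions: it skips the fuzzy inference stage and scores alleles directly, it assumes the intersection-height similarity for two triangles of base width 0.4, and it assumes that each victim allele is scored against the closest allele of its reference, trying both ways of assigning the two alleles to the father's and mother's sides. All function names are hypothetical.

```python
def allele_similarity(a, b, half_width=0.2):
    """Assumed measure: intersection height of two allele triangles."""
    return max(0.0, 1.0 - abs(a - b) / (2.0 * half_width))

def best_match(allele, reference):
    """Similarity of one allele to the closest allele of a reference locus."""
    return max(allele_similarity(allele, r) for r in reference)

def locus_similarity(victim, ref_a, ref_b):
    """Sum of the two allele scores for one locus (between 0 and 2).

    One allele is compared with the father's side (ref_a) and the other
    with the mother's side (ref_b); the better assignment is kept.
    """
    v1, v2 = victim
    option_1 = best_match(v1, ref_a) + best_match(v2, ref_b)
    option_2 = best_match(v2, ref_a) + best_match(v1, ref_b)
    return max(option_1, option_2)

def profile_similarity(victim, ref_a, ref_b):
    """Sum all allele scores and divide by the allele count (32 for 16 loci)."""
    total = sum(locus_similarity(v, a, b)
                for v, a, b in zip(victim, ref_a, ref_b))
    return total / (2 * len(victim))

def is_match(similarity, threshold=0.5):
    """Profiles match when the similarity value exceeds the threshold."""
    return similarity > threshold
```

Each profile is a list of 16 `(allele_1, allele_2)` pairs in the same locus order, so that `profile_similarity(victim, father, mother)` returns a value between 0 and 1.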

Conclusions
A DNA profile is a biological fingerprint possessed by every human being that distinguishes one individual from another. To facilitate the identification of disaster victims, reference DNA profiles from the victims' biological families are needed. Fuzzy similarity measurement is used to measure the similarity of human DNA profiles because the alleles at a DNA profile locus can shift owing to several factors. If an allele shifts by 0.2 to the right or 0.2 to the left, a similarity value of 0.5 is obtained, so the two alleles being compared can still be said to match or be similar. To conclude that a person's DNA profile is similar to a reference, the similarity values of all alleles at all loci are summed and divided by 32, and the resulting similarity value must be >= 0.5.

Acknowledgement
Thank you to the late Dr. M. Rahmat Widyanto, M. Eng, who provided a great deal of help and many contributions. This research was part of my master's research at the University of Indonesia.