The exomes of four genotyped normal human genome HapMap samples (including NA12761, NA12813, and NA12892) were sequenced and analysed in an identical manner to Capan-1 in order to normalize for copy number variation and filter for common genome polymorphisms. All variants detected in the HapMap samples were disregarded in Capan-1, as these were most likely to be false positives or non-somatic. The remaining variants were subsequently filtered. The aCGH data were used to estimate the copy number status of each genomic region, and this was incorporated into the filtering. Heterozygous variants in single-copy regions were discarded and, elsewhere, a minimum number of reads bearing the variant allele per copy was required. The identification of indels based on alignment analyses is more biased than SNP identification, leading to different variant features. Hence, we used different filtering criteria depending on whether the variant was a SNP or an indel, but took the copy number status into consideration in both cases. For SNP filtering, the concordant genotypes for all four HapMap samples were used to establish that SNPs with a variant rate greater than 0.88 or less than 0.10 should be considered homozygous for the variant and reference allele, respectively. We observed that the heterozygous variant rate fluctuated from 0.33 to 0.67. In order to discard false variants located in low-depth regions, we applied a confidence threshold of 10 reads per genomic copy. For indel filtering, variants with a variant rate greater than 0.81 were considered to be homozygous. A threshold of 10 reads per genomic copy was applied, and only those variants where the number of reads bearing the variant allele was at least 0.75 times the number of reads estimated to correspond to one genomic copy were considered. After filtering, the remaining variants were classified according to their functional consequences.
We used an in-house Perl script to extract this information from Ensembl using the Ensembl Perl API, checking the functional consequences of each variant in every affected transcript of the gene. We also distinguished between previously described and novel variants using this tool.

Structural variations. These were identified using BreakDancer with default parameters. Filtering was based on depth, keeping those rearrangements supported by at least 10 different mate pairs.
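As an illustration of the SNP and indel filtering rules described above, the following Python sketch shows one way the thresholds could be combined. It is not the pipeline used in this study. The `Variant` record, the function names, and the "ambiguous" label for SNP variant rates falling outside the stated ranges are assumptions made for the example, as is the reading of "10 reads per genomic copy" as total depth divided by the aCGH-estimated copy number. Treating indels with a variant rate of 0.81 or below as heterozygous is likewise an assumption.

```python
from dataclasses import dataclass

MIN_READS_PER_COPY = 10  # confidence threshold stated in the text


@dataclass
class Variant:
    """Hypothetical record for one candidate variant."""
    kind: str           # "snp" or "indel"
    depth: int          # total reads covering the site
    variant_reads: int  # reads bearing the variant allele
    copy_number: int    # aCGH-estimated copies of the region


def call_zygosity(v: Variant) -> str:
    """Classify a variant from its variant rate (variant reads / depth)."""
    rate = v.variant_reads / v.depth
    if v.kind == "snp":
        if rate > 0.88:
            return "hom_var"
        if rate < 0.10:
            return "hom_ref"
        if 0.33 <= rate <= 0.67:
            return "het"
        # Assumption: the text does not say how intermediate rates are handled.
        return "ambiguous"
    # Indels: only the homozygous cutoff (0.81) is given in the text.
    return "hom_var" if rate > 0.81 else "het"


def passes_filter(v: Variant) -> bool:
    """Apply the depth, zygosity and copy-number rules to one variant."""
    if v.copy_number < 1 or v.depth == 0:
        return False
    per_copy = v.depth / v.copy_number  # reads estimated for one genomic copy
    if per_copy < MIN_READS_PER_COPY:
        return False
    zygosity = call_zygosity(v)
    if zygosity in ("hom_ref", "ambiguous"):
        return False
    # Heterozygous calls in single-copy regions are discarded.
    if zygosity == "het" and v.copy_number == 1:
        return False
    # Indels need variant reads of at least 0.75 times the per-copy depth.
    if v.kind == "indel" and v.variant_reads < 0.75 * per_copy:
        return False
    return True
```

Under these assumptions, a SNP with 30 variant reads out of 60 in a two-copy region is kept, whereas the same SNP in a single-copy region is discarded as an implausible heterozygous call.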