Commit 4a466f9a5e790fd0e2e0269ac53ec89e87c11981

Authored by Brice COLOMBIER
1 parent 160a9caea8
Exists in master

Add filter_highest_variance and improve README

Showing 4 changed files with 125 additions and 16 deletions

... ... @@ -3,4 +3,5 @@
3 3 *.txt
4 4 *.*~
5 5 *.org
  6 +flymd.*
1   -# Parallel traces preprocessing
  1 +# Scripts for preprocessing power traces
2 2  
3 3 ## Description
4 4  
5   -This script applies preprocessing to power traces by combining pairs of samples.
6   -The possible pairs of samples are taken inside a sliding window which slides over the trace.
  5 +The following preprocessing scripts are available:
  6 +
  7 +* **pairwise_operation**
  8 +
  9 + This script combines pairs of samples.
  10 +The possible pairs of samples are taken inside a sliding window over the trace.
7 11 The operation used to combine the samples can be chosen.
8 12 Thanks to the Python `multiprocessing` package, the trace is split into blocks that are processed in parallel.
9 13  
10   -Combining pairs of samples allows to launch a first-order CPA on a first-order masked implementation, which would otherwise require a second-order CPA.
  14 + Combining pairs of samples makes it possible to mount a first-order CPA on a first-order masked implementation, which would otherwise require a second-order CPA.
  15 +
  16 +* **downsample**
11 17  
  18 + This script reduces the size of the traces by keeping only every n<sup>th</sup> sample of the trace, starting at a specified offset.
  19 +
  20 +* **filter\_highest\_variance**
  21 +
  22 + This script identifies points of interest in the trace by keeping only a given ratio of the samples with the highest variance.
  23 +
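As a rough sketch (not the repository's actual implementation, which additionally splits the trace into blocks processed in parallel), the pairwise combination inside a sliding window can be expressed with plain `numpy`; the function name and loop bounds here are illustrative assumptions:

```python
import numpy as np

def pairwise_multiply(trace, window_size=5, min_dist=1):
    """Combine pairs of samples inside a sliding window (illustrative sketch).

    For each sample i, multiply it with every sample j such that
    min_dist <= j - i < window_size.
    """
    combined = []
    for i in range(len(trace)):
        for j in range(i + min_dist, min(i + window_size, len(trace))):
            combined.append(trace[i] * trace[j])
    return np.array(combined)

trace = np.arange(6, dtype=np.float64)
out = pairwise_multiply(trace, window_size=3, min_dist=1)
```

The output is longer than the input, since each sample contributes up to `window_size - min_dist` pairs.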
12 24 ## Install
13 25  
14 26 ```bash
15 27  
16 28  
17 29  
18 30  
19 31  
... ... @@ -25,20 +37,68 @@
25 37  
26 38 ## Usage
27 39  
28   -The script is typically used in the following manner:
  40 +These scripts take one positional parameter and multiple keyword arguments.
  41 +The positional parameter is the file in which the traces are stored in `numpy` format.
29 42  
  43 +
  44 +* **pairwise_operation**
  45 +
  46 +To perform parallel multiplication of samples on 4 cores using a sliding window of 5 samples and all possible pairs of samples:
  47 +
30 48 ```bash
31   -python preprocessing.py masked_traces.npy --op=multiplication --window_size=5 --min_dist=1 --dtype=float64 --ncores=4
  49 +python pairwise_operation.py masked_traces.npy --op=multiplication --window_size=5 --min_dist=1 --dtype=float64 --ncores=4
32 50 ```
33 51  
34   -The parameter is the file in which the traces are stored in `numpy` format.
35   -Options are detailed below.
  52 +To perform parallel absolute difference of samples on 16 cores using a sliding window of 100 samples and pairs of samples that are at least 80 samples away from one another:
36 53  
37   -## Options
  54 +```bash
  55 +python pairwise_operation.py masked_traces.npy --op=absolute_difference --window_size=100 --min_dist=80 --dtype=float64 --ncores=16
  56 +```
38 57  
39   -- `--op`: operation to compute on the pair of samples. Should belong to `{'addition','multiplication','squared_addition','absolute_difference'}`. In DPA book it is said that `absolute difference` is a good choice for second-order CPA attacks that leak the Hamming weight.
40   -- `--window_size`: width of the sliding window
41   -- `--min_dist`: minimum distance between two samples in a pair
42   -- `--dtype`: `numpy` data type for the samples of the processed trace
43   -- `--ncores`: number of cores to use for the parallel computation
  58 +* **downsample**
  59 +
  60 +To keep only every 4<sup>th</sup> sample starting from sample 10:
  61 +
  62 +```bash
  63 +python downsample.py masked_traces.npy --factor=4 --offset=10
  64 +```
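For reference, this kind of downsampling amounts to `numpy` slice indexing; a minimal sketch (the parameter names mirror the script's options, but the code is illustrative, not the script's actual source):

```python
import numpy as np

def downsample(traces, factor=4, offset=10):
    # Keep every factor-th sample of each trace, starting at `offset`
    return traces[:, offset::factor]

traces = np.arange(40).reshape(2, 20)  # 2 traces of 20 samples each
small = downsample(traces, factor=4, offset=10)  # keeps samples 10, 14, 18
```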
  65 +
  66 +* **filter\_highest\_variance**
  67 +
  68 +To keep only the 1% of samples with the highest variance:
  69 +```bash
  70 +python filter_highest_variance.py masked_traces.npy --ratio=0.01
  71 +```
  72 +
  73 +To keep only the 100 samples with the highest variance:
  74 +
  75 +```bash
  76 +python filter_highest_variance.py masked_traces.npy --nsamples=100
  77 +```
  78 +
  79 +
  80 +## Keyword arguments
  81 +
  82 +* **pairwise_operation**
  83 +
  84 + - `--op`: the operation to compute on the pair of samples. It should belong to `{'addition','multiplication','squared_addition','absolute_difference'}`
  85 +
  86 + In the DPA book, the absolute difference is reported to be a good choice for second-order CPA attacks on devices that leak the Hamming weight
  87 + - `--window_size`: the width of the sliding window
  88 + - `--min_dist`: the minimum distance between two samples in a pair
  89 + - `--dtype`: the `numpy` data type for the samples of the processed trace
  90 + - `--ncores`: the number of cores to use for the parallel computation
  91 +
  92 +* **downsample**
  93 +
  94 + - `--factor`: the downsampling factor n, to keep only every n<sup>th</sup> sample
  95 + - `--offset`: the offset at which downsampling starts
  96 +
  97 +* **filter\_highest\_variance**
  98 +
  99 + - `--ratio`: the ratio of samples with highest variance to keep
  100 +
  101 + **OR**
  102 + - `--nsamples`: the number of samples with highest variance to keep
  103 +
filter_highest_variance.py
  1 +import numpy as np
  2 +import argparse
  3 +
  4 +def filter_highest_variance(traces, ratio=0, nsamples=0):
  5 +
  6 + """
  7 + Extracts the samples with the highest variance along with their indices
  8 +
  9 + Keyword arguments:
  10 + traces: numpy array holding the traces
  11 + ratio: ratio of samples to keep
  12 + nsamples: alternatively, the absolute number of samples to keep
  13 + Returns:
  14 + traces: numpy array containing the filtered trace
  15 + indexes: the indexes of the samples of interest
  16 + """
  17 +
  18 + if ratio:
  19 + nb_to_keep = int(np.shape(traces)[1]*ratio)
  20 + elif nsamples:
  21 + nb_to_keep = nsamples
  22 + to_keep = np.argpartition(np.var(traces, axis=0), -nb_to_keep)[-nb_to_keep:]
  23 + permutation = to_keep.argsort()
  24 + return traces[:,to_keep[permutation]], to_keep[permutation]
  25 +
  26 +if __name__ == "__main__":
  27 +
  28 + # Parsing arguments
  29 + parser = argparse.ArgumentParser(description='Preprocess traces')
  30 + parser.add_argument("traces_name", type=str)
  31 + group = parser.add_mutually_exclusive_group()
  32 + group.add_argument("--ratio", type=float)
  33 + group.add_argument("--nsamples", type=int)
  34 + args = parser.parse_args()
  35 +
  36 + fake_nb_samples = 10
  37 + fake_nb_traces = 2
  38 + test_array = np.random.randint(100, size=(fake_nb_traces, fake_nb_samples))
  39 + traces = test_array
  40 + # Load traces from file
  41 + # traces = np.load(args.traces_name)
  42 +
  43 +print(traces)
  44 + if args.ratio:
  45 + filtered_variance_traces, indexes = filter_highest_variance(traces, ratio=args.ratio)
  46 + elif args.nsamples:
  47 + filtered_variance_traces, indexes = filter_highest_variance(traces, nsamples=args.nsamples)
  48 + print(filtered_variance_traces)
  49 + np.save("filtered_variance_" + args.traces_name, filtered_variance_traces)
pairwise_operation.py
... ... @@ -32,7 +32,7 @@
32 32 in the window_size with distance(x_i, x_j) > minimum_distance.
33 33  
34 34 Keyword arguments:
35   - traces_name: name of the file storing the traces
  35 + traces: numpy array holding the traces
36 36 window_size: size of the window in which pairwise operation is done
37 37 minimum_distance: minimum distance between two samples processed
38 38 operation: processing operation to apply on the pair of samples